diff --git a/CHANGELOG.md b/CHANGELOG.md
index dde48749d..5f532894f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,7 @@

 [Full Changelog](https://github.com/internetarchive/heritrix3/compare/3.4.0-20240909...HEAD)

-## [3.4.0-20240909 2024-09-09](https://github.com/internetarchive/heritrix3/releases/edit/3.4.0-20240909)
+## [3.4.0-20240909](https://github.com/internetarchive/heritrix3/releases/3.4.0-20240909) 2024-09-09

 [Full Changelog](https://github.com/internetarchive/heritrix3/compare/3.4.0-20220727...3.4.0-20240909)

@@ -14,28 +14,28 @@ Checkpoints and crawl state created with older versions of Heritrix will not be

 #### New Features
 - JDK 22 support
-- Added `ConfigurableExtractorJS` for more flexible JavaScript extraction. (#602)
-- Added `HostnameQueueAssignmentPolicyWithLimits` with optional name length limits. (#598)
-- `ExtractorHTML` can now extract more variants of alternative resolution image URLs. (#605)
-- `ExtractorHTTP` can now be configured with extra inferred paths (#597)
-- `ExtractorYoutubeDL` metadata records can now be optionally logged to crawl.log (#593)
+- Added `ConfigurableExtractorJS` for more flexible JavaScript extraction. ([#602](https://github.com/internetarchive/heritrix3/issues/602))
+- Added `HostnameQueueAssignmentPolicyWithLimits` with optional name length limits. ([#598](https://github.com/internetarchive/heritrix3/issues/598))
+- `ExtractorHTML` can now extract more variants of alternative resolution image URLs. ([#605](https://github.com/internetarchive/heritrix3/issues/605))
+- `ExtractorHTTP` can now be configured with extra inferred paths ([#597](https://github.com/internetarchive/heritrix3/issues/597))
+- `ExtractorYoutubeDL` metadata records can now be optionally logged to crawl.log ([#593](https://github.com/internetarchive/heritrix3/issues/593))

 #### Removals
-- Removed `ExtractorChrome` from contrib (#601)
+- Removed `ExtractorChrome` from contrib ([#601](https://github.com/internetarchive/heritrix3/issues/601))

 #### Fixes
-- Reduced false positive speculative URLs from meta tags (#595)
-- Fixed BdbModule resource leak on job teardown (f4280012ae5f23763f1e19d196a245ae49f9b697)
-- Corrected function name in `ScriptedProcessor` Javadoc. (#599)
+- Reduced false positive speculative URLs from meta tags ([#595](https://github.com/internetarchive/heritrix3/issues/595))
+- Fixed BdbModule resource leak on job teardown ([f4280012ae5f23763f1e19d196a245ae49f9b697](https://github.com/internetarchive/heritrix3/commit/f4280012ae5f23763f1e19d196a245ae49f9b697))
+- Corrected function name in `ScriptedProcessor` Javadoc. ([#599](https://github.com/internetarchive/heritrix3/issues/599))
 - Updated Maven builds to use HTTPS for resolving dependencies.
-- Reset CrawlURI status for hasPrerequisite() so that it isn't preserved between attempts (#600)
-- Fixed older junit3 tests not being run (#592)
-- Increased DiskSpaceMonitor default pause threshold to 8 GiB (#499)
-- Stopping logging authentication failures when header is missing (#539)
-- Fixed console still showing job running after crash (#549)
+- Reset CrawlURI status for hasPrerequisite() so that it isn't preserved between attempts ([#600](https://github.com/internetarchive/heritrix3/issues/600))
+- Fixed older junit3 tests not being run ([#592](https://github.com/internetarchive/heritrix3/issues/592))
+- Increased DiskSpaceMonitor default pause threshold to 8 GiB ([#499](https://github.com/internetarchive/heritrix3/issues/499))
+- Stopping logging authentication failures when header is missing ([#539](https://github.com/internetarchive/heritrix3/issues/539))
+- Fixed console still showing job running after crash ([#549](https://github.com/internetarchive/heritrix3/issues/549))

 #### Dependency Upgrades
-- Transitioned `PDFParser` and `ExtractorPDF` to pdfbox (#575)
+- Transitioned `PDFParser` and `ExtractorPDF` to pdfbox ([#575](https://github.com/internetarchive/heritrix3/issues/575))
 - Transitioned `ExtractorYoutubeDL` to yt-dlp
 - commons-net 3.9.0
 - com.rabbitmq:amqp-client 5.18.0