trafilatura-1.10.0

@adbar released this 30 May 15:45
b36b6fa

Breaking changes:

  • raise errors on deprecated CLI and function arguments (#581)
  • regroup classes and functions linked to deduplication (#582)
    trafilatura.hashing → trafilatura.deduplication
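
Code that has to run against releases both before and after the module move can resolve the import at runtime. The helper below is a minimal sketch and not part of trafilatura: `import_first_available` is a hypothetical name, and the attributes you then read from the returned module depend on what your code imported from `trafilatura.hashing` before.

```python
import importlib


def import_first_available(*module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError,
            # so a missing module simply moves on to the next name.
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")


# Intended use (requires trafilatura to be installed):
# dedup = import_first_available("trafilatura.deduplication", "trafilatura.hashing")
```

Listing the new module path first means the old path is only tried on older installations.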

Extraction:

  • port of is_probably_readerable from readability.js by @zirkelc in #587
  • Markdown table fixes by @naktinis in #601
  • fix list spacing in TXT output (#598)
  • CLI fixes: file processing options, mtime, and tests (#605)
  • CLI fix: read standard input as binary (#607)

Downloads:

  • fix deflate and add optional zstd to accepted encodings (#594)
  • spider fix: use internal download utilities for robots.txt (#590)
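
The idea of an optional encoding can be illustrated as follows: a codec is advertised in the `Accept-Encoding` header only when a decoder for it is installed. This is a sketch under stated assumptions, not trafilatura's implementation: the function name `build_accept_encoding`, the baseline list `gzip, deflate`, and the use of the third-party `zstandard` package as the zstd decoder are all assumptions made for illustration.

```python
import importlib.util


def build_accept_encoding(optional=(("zstd", "zstandard"),)):
    """Build an Accept-Encoding value.

    `optional` holds (header token, top-level module name) pairs; a token is
    added only if its module can be found in the current environment.
    """
    encodings = ["gzip", "deflate"]  # assumed baseline, always advertised
    for token, module_name in optional:
        if importlib.util.find_spec(module_name) is not None:
            encodings.append(token)
    return ", ".join(encodings)


# Example header dictionary for an HTTP client of your choice
headers = {"Accept-Encoding": build_accept_encoding()}
```

Checking for the module up front avoids advertising an encoding the client cannot decode.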

Maintenance:

  • add author XPaths (#567)
  • update justext and lxml dependencies (#593)
  • simplify code: unique function for length tests (#591)

Docs: