Version 2.2.0

@dolsysmith released this 22 Sep 12:41
7c4f064
  • Upgrades Python to 3.8 (#126, #131)
  • Upgrades Spark and PySpark to 3.1 (#128, #117)
  • Uses the Spark DataFrame API to create the full extracts at load time: Tweet IDs, full Tweet CSV, Tweet mentions, and Tweet users (#128)
  • Re-purposes the original gzipped JSONL files from SFM (Social Feed Manager) to create the full Tweet JSON extract, concatenating the files by date of harvest (#152)
  • Adds an environment variable for specifying the maximum file size of full extracts (#128)
  • Updates the TweetSets data model to align with twarc v1.12 (#150, #128)
  • Improves the indexing and extraction of full text and hashtags from extended Tweets (#150, #128)
  • Updates tests to cover the Spark schema used to create extracts (#135)
  • Prevents unauthorized users from accessing full dataset files (#148)
  • Clarifies the installation documentation and docker-compose.yml (#119, #95, #90)
  • Updates version pinning of Elasticsearch dependencies (#141)
  • Fixes bugs in the flask create-extract command (#125) and in checking whether a user should be directed to the full dataset (#120)
  • Prevents incorrectly formatted dates from being submitted in the form (#87)
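The full Tweet JSON extract is built by concatenating SFM's gzipped JSONL files by harvest date, subject to a configurable maximum file size. The following minimal Python sketch illustrates the general idea. It is not TweetSets' code. It assumes, hypothetically, that each harvest file name begins with a YYYYMMDD date stamp. The names `group_by_harvest_date` and `concat_by_date`, the output naming scheme, and the `max_bytes` parameter are also illustrative. `max_bytes` stands in for the environment variable, whose name is not given here. The sketch relies on the fact that concatenated gzip members form a valid gzip stream, so the files can be joined without being decompressed.

```python
import os
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical assumption: harvest file names start with a YYYYMMDD date stamp.
DATE_PATTERN = re.compile(r"^(\d{8})")


def group_by_harvest_date(paths):
    """Group input files by the date prefix in their file name, sorted by date."""
    groups = defaultdict(list)
    for path in paths:
        match = DATE_PATTERN.match(Path(path).name)
        if match:  # files without a date prefix are skipped
            groups[match.group(1)].append(Path(path))
    return {date: sorted(files) for date, files in sorted(groups.items())}


def concat_by_date(paths, out_dir, max_bytes):
    """Concatenate gzipped JSONL files byte-for-byte into one or more parts per date.

    A new part is started whenever appending the next input would push the
    current part past max_bytes; an input larger than max_bytes gets its own part.
    Returns the list of output paths in the order they were written.
    """
    outputs = []
    for date, files in group_by_harvest_date(paths).items():
        part, size, out = 0, 0, None
        for path in files:
            file_size = os.path.getsize(path)
            if out is None or (size > 0 and size + file_size > max_bytes):
                if out is not None:
                    out.close()
                part += 1
                dest = Path(out_dir) / f"tweets-{date}-{part:03d}.jsonl.gz"
                out = open(dest, "wb")
                outputs.append(dest)
                size = 0
            # Copying raw gzip bytes appends a complete gzip member to the part.
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
            size += file_size
        if out is not None:
            out.close()
    return outputs
```

Because each input is copied as a whole gzip member, `gzip.open(part, "rt")` reads every line of every input in a part as a single stream. Splitting only at file boundaries keeps the sketch simple, at the cost of letting an oversized input exceed the limit.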