
Releases: apache/iceberg

Apache Iceberg 1.7.0

08 Nov 15:12

Apache Iceberg 1.6.1

27 Aug 17:47
8e9d59d

Full Changelog: apache-iceberg-1.6.0...apache-iceberg-1.6.1

Apache Iceberg 1.6.0

23 Jul 12:06
229d8f6

Apache Iceberg 1.5.2

16 May 07:50
cbb8530

The 1.5.2 release contains the same changes as the 1.5.1 release. The 1.5.1 release had issues with the Spark runtime artifacts: certain artifacts were built with the wrong Scala version. Upgrading to 1.5.2 is strongly recommended for any system running 1.5.1.

Apache Iceberg 1.5.1

30 Apr 12:43
cbb8530

What's Changed

  • [1.5.x] API: Fix default FileIO#newInputFile ManifestFile, DataFile and DeleteFile implementations by @amogh-jahagirdar in #10114
  • [1.5.x] Core: Mark 502 and 504 failures as retryable to the exponential retry strategy by @amogh-jahagirdar in #10113
  • Core: Fix JDBC Catalog table commit when migrating from schema V0 to V1 (#10111) by @jbonofre in #10152
  • Core: Fix namespace SQL statement using ESCAPE character that works with MySQL/PostgreSQL (#10167) by @jbonofre in #10169
  • (1.5.x cherry-pick) Spark 3.5: Fix system function pushdown in CoW row-level commands by @amogh-jahagirdar in #10170
  • (1.5.x Cherry-pick) Spark 3.4: Fix system function pushdown in CoW row-level commands (#10119) by @amogh-jahagirdar in #10171

Full Changelog: apache-iceberg-1.5.0...apache-iceberg-1.5.1
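
The retry change in #10113 can be pictured with a small, library-independent sketch. This is not Iceberg's implementation: the function names, the base delay, and the cap below are assumptions chosen for illustration, and only the 502 and 504 status codes come from the release note.

```python
import time

# Status codes named in the release note; any other retryable codes are omitted here.
RETRYABLE_STATUS = frozenset({502, 504})


def backoff_delays(max_retries, base_ms=1000, cap_ms=64000):
    """Return the wait before each retry, doubling every time up to a cap."""
    return [min(cap_ms, base_ms * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(send, max_retries=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    send is a hypothetical zero-argument callable that returns an HTTP status code.
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        status = send()
        if status not in RETRYABLE_STATUS or attempt == max_retries:
            return status
        sleep(delays[attempt] / 1000.0)


# Simulated server: two gateway failures, then success. Waits are recorded, not slept.
responses = iter([502, 504, 200])
waits = []
final = call_with_retry(lambda: next(responses), sleep=waits.append)
print(final, waits)  # prints: 200 [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a production client would typically also add jitter.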

Apache Iceberg 1.5.0

11 Mar 11:00
2519ab4

Apache Iceberg 1.5.0 was released on March 11, 2024.
The 1.5.0 release adds a variety of new features and bug fixes.

  • API
    • Extend FileIO and add EncryptingFileIO (#9592)
    • Track partition statistics in TableMetadata (#8502)
    • Add sqlFor API to views to handle resolving a representation for a dialect (#9247)
  • Core
    • Add view support for REST catalog (#7913)
    • Add view support for JDBC catalog (#9487)
    • Add catalog type for glue, jdbc, nessie (#9647)
    • Support Avro file encryption with AES GCM streams (#9436)
    • Add ApplyNameMapping for Avro (#9347)
    • Add StandardEncryptionManager (#9277)
    • Add REST catalog table session cache (#8920)
    • Support view metadata compression (#8552)
    • Track partition statistics in TableMetadata (#8502)
    • Enable column statistics filtering after planning (#8803)
  • Spark
    • Remove support for Spark 3.2 (#9295)
    • Support views via SQL for Spark 3.4 and 3.5 (#9423, #9421, #9343, #9513, #9582)
    • Support executor cache locality (#9563)
    • Added support for delete manifest rewrites (#9020)
    • Support encrypted output files (#9435)
    • Add Spark UI metrics from Iceberg scan metrics (#8717)
    • Parallelize reading files in add_files procedure (#9274)
    • Support file and partition delete granularity (#9384)
  • Flink
    • Remove support for Flink 1.15
    • Add support for Flink 1.18 (#9211)
    • Emit watermarks from the IcebergSource (#8553)
    • Watermark read options (#9346)
  • Parquet
    • Support reading INT96 column in row group filter (#8988)
    • Add system config for unsafe Parquet ID fallback (#9324)
  • Kafka-Connect
    • Initial project setup and event data structures (#8701)
    • Sink connector with data writers and converters (#9466)
  • Spec
    • Add partition stats spec (#7105)
    • Add nanosecond timestamp types (#8683)
    • Add multi-arg transform (#8579)
  • Vendor Integrations
    • AWS: Support setting description for Glue table (#9530)
    • AWS: Update S3FileIO test to run when CLIENT_FACTORY is not set (#9541)
    • AWS: Add S3 Access Grants Integration (#9385)
    • AWS: Glue catalog strip trailing slash on DB URI (#8870)
    • Azure: Add FileIO that supports ADLSv2 storage (#8303)
    • Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
    • Nessie: Support views for NessieCatalog (#8909)
    • Nessie: Strip trailing slash for warehouse location (#9415)
    • Nessie: Infer default API version from URI (#9459)
  • Dependencies
    • Bump Nessie to 0.77.1
    • Bump ORC to 1.9.2
    • Bump Arrow to 15.0.0
    • Bump AWS Java SDK to 2.24.5
    • Bump Azure Java SDK to 1.2.20
    • Bump Google cloud libraries to 26.28.0

Note:

  1. To enable view support for JDBC catalog, configure jdbc.schema-version to V1 in catalog properties.
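
As a rough illustration of the note above, the sketch below builds Spark configuration entries for a JDBC catalog with `jdbc.schema-version` set to `V1`. The catalog name, JDBC URI, and warehouse path are placeholders, and the helper function itself is hypothetical; it assumes the catalog is used through Spark, where catalog properties are passed under the `spark.sql.catalog.<name>.` prefix.

```python
def jdbc_catalog_conf(catalog_name, uri, warehouse):
    """Build Spark conf entries for an Iceberg JDBC catalog with view support enabled."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "jdbc",  # catalog type for jdbc was added in 1.5.0 (#9647)
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.jdbc.schema-version": "V1",  # the property named in the note above
    }


# Placeholder values; replace with your own catalog name, database, and warehouse.
conf = jdbc_catalog_conf(
    "demo",
    "jdbc:postgresql://localhost:5432/iceberg",
    "s3://my-bucket/warehouse",
)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed lines can be appended to a `spark-sql` or `spark-submit` invocation; other engines accept the same `jdbc.schema-version` property through their own catalog configuration.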

Apache Iceberg 1.4.3

27 Dec 17:15
9a5d24f

What's Changed

  • Core: Scan only live entries in partitions table (#8969) by @Fokko in #9197
  • [1.4.x] Core: Fix missing files from transaction retries with conflicting manifest merges (#9230) by @nastra in #9337
  • [1.4.x] JDBC Catalog: Fix namespaceExists check with special characters (#8340) by @ismailsimsek in #9291
  • [1.4.x] Core: Expired Snapshot files in a transaction should be deleted by @bartash in #9223
  • [1.4.x] Core: Fix missing delete files from transaction (#9354) by @nastra in #9356

Full Changelog: apache-iceberg-1.4.2...apache-iceberg-1.4.3

Apache Iceberg 1.4.2

07 Nov 16:44
f6bb917

Full Changelog: apache-iceberg-1.4.1...apache-iceberg-1.4.2

Apache Iceberg 1.4.1

23 Oct 11:03
445664f

What's Changed

  • Core: Do not use a lazy split offset list in manifests (#8834) by @nastra in #8845
  • Core: Ignore split offsets when the last split offset is past the file length by @amogh-jahagirdar in #8861
  • AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) by @nastra in #8843
  • Flink: Reverting the default custom partitioner for bucket column (#8848) by @nastra in #8858

Full Changelog: apache-iceberg-1.4.0...apache-iceberg-1.4.1

Apache Iceberg 1.4.0

08 Oct 00:46
  • API
    • Implement bound expression sanitization (#8149)
    • Remove overflow checks in DefaultCounter causing performance issues (#8297)
    • Support incremental scanning with branch (#5984)
    • Add a validation API to DeleteFiles which validates files exist (#8525)
  • Core
    • Use V2 format by default in new tables (#8381)
    • Use zstd compression for Parquet by default in new tables (#8593)
    • Add strict metadata cleanup mode and enable it by default (#8397) (#8599)
    • Avoid generating huge manifests during commits (#6335)
    • Add a writer for unordered position deletes (#7692)
    • Optimize DeleteFileIndex (#8157)
    • Optimize lookup in DeleteFileIndex without useful bounds (#8278)
    • Optimize split offsets handling (#8336)
    • Optimize computing user-facing state in data tasks (#8346)
    • Don't persist useless file and position bounds for deletes (#8360)
    • Don't persist counts for paths and positions in position delete files (#8590)
    • Support setting system-level properties via environmental variables (#5659)
    • Add JSON parser for ContentFile and FileScanTask (#6934)
    • Add REST spec and request for commits to multiple tables (#7741)
    • Add REST API for committing changes against multiple tables (#7569)
    • Default to exponential retry strategy in REST client (#8366)
    • Support registering tables with REST session catalog (#6512)
    • Add last updated timestamp and snapshot ID to partitions metadata table (#7581)
    • Add total data size to partitions metadata table (#7920)
    • Extend ResolvingFileIO to support bulk operations (#7976)
    • Key metadata in Avro format (#6450)
    • Add AES GCM encryption stream (#3231)
    • Fix a connection leak in streaming delete filters (#8132)
    • Fix lazy snapshot loading history (#8470)
    • Fix unicode handling in HTTPClient (#8046)
    • Fix paths for unpartitioned specs in writers (#7685)
    • Fix OOM caused by Avro decoder caching (#7791)
  • Spark
    • Added support for Spark 3.5
      • Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg.
      • Support for WHEN NOT MATCHED BY SOURCE clause in MERGE.
      • Column pruning in merge-on-read operations.
      • Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism.
    • Dropped support for Spark 3.1
    • Deprecated support for Spark 3.2
    • Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466)
    • Increase default advisory partition size for writes in Spark 3.5 (#8660)
    • Support distributed planning in Spark 3.4 and 3.5 (#8123)
    • Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886)
    • Support fanout position delta writers in Spark 3.4 and 3.5 (#7703)
    • Use fanout writers for unsorted tables by default in Spark 3.5 (#8621)
    • Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897)
    • Output net changes across snapshots for carryover rows in CDC (#7326)
    • Display read metrics on Spark SQL UI (#7447) (#8445)
    • Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714)
    • Add fast_forward procedure (#8081)
    • Support filters when rewriting position deletes (#7582)
    • Support setting current snapshot with ref (#8163)
    • Make backup table name configurable during migration (#8227)
    • Add write and SQL options to override compression config (#8313)
    • Correct partition transform functions to match the spec (#8192)
    • Enable extra commit properties with metadata delete (#7649)
  • Flink
    • Add possibility of ordering the splits based on the file sequence number (#7661)
    • Fix serialization in TableSink with anonymous object (#7866)
    • Switch to FileScanTaskParser for JSON serialization of IcebergSourceSplit (#7978)
    • Custom partitioner for bucket partitions (#7161)
    • Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360)
    • Support alter table column (#7628)
  • Parquet
    • Add encryption config to read and write builders (#2639)
    • Skip writing bloom filters for deletes (#7617)
    • Cache codecs by name and level (#8182)
    • Fix decimal data reading from ParquetAvroValueReaders (#8246)
    • Handle filters with transforms by assuming data must be scanned (#8243)
  • ORC
    • Handle filters with transforms by assuming the filter matches (#8244)
  • Vendor Integrations
    • GCP: Fix single byte read in GCSInputStream (#8071)
    • GCP: Add properties for OAuth2 and update library (#8073)
    • GCP: Add prefix and bulk operations to GCSFileIO (#8168)
    • GCP: Add bundle jar for GCP-related dependencies (#8231)
    • GCP: Add range reads to GCSInputStream (#8301)
    • AWS: Add bundle jar for AWS-related dependencies (#8261)
    • AWS: Support configuring storage class for S3FileIO (#8154)
    • AWS: Add FileIO tracker/closer to Glue catalog (#8315)
    • AWS: Update S3 signer spec to allow an optional string body in S3SignRequest (#8361)
    • Azure: Add FileIO that supports ADLSv2 storage (#8303)
    • Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
    • Nessie: Provide better commit message on table registration (#8385)
  • Dependencies
    • Bump Nessie to 0.71.0
    • Bump ORC to 1.9.1
    • Bump Arrow to 12.0.1
    • Bump AWS Java SDK to 2.20.131
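
Because 1.4.0 changes two defaults for new tables (V2 format in #8381 and zstd Parquet compression in #8593), readers that cannot yet handle them may need tables created with explicit settings. The sketch below renders a `CREATE TABLE` statement that pins both properties. The helper and the table name are hypothetical, and the values `1` and `gzip` are assumed here to be the defaults from before 1.4.0; they are not stated in these notes.

```python
# Assumed pre-1.4.0 defaults (not stated in the release notes).
PRE_1_4_DEFAULTS = {
    "format-version": "1",
    "write.parquet.compression-codec": "gzip",
}


def create_table_sql(table, columns, properties):
    """Render a Spark SQL CREATE TABLE statement with explicit Iceberg table properties."""
    props = ", ".join(f"'{key}'='{value}'" for key, value in sorted(properties.items()))
    return f"CREATE TABLE {table} ({columns}) USING iceberg TBLPROPERTIES ({props})"


# Hypothetical table and schema, for illustration only.
sql = create_table_sql("demo.db.events", "id BIGINT, ts TIMESTAMP", PRE_1_4_DEFAULTS)
print(sql)
```

Setting the properties at creation time avoids relying on whichever default the writing engine's Iceberg version applies.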