TiSpark Changelog

All notable changes to this project will be documented in this file.

[TiSpark 3.2.0] 2023-01-09

New Features

Normalize the Java client in TiSpark and use the official client-java #2491
Compatible with TiDB v6.5.0 #2598
Support writing into the table with auto random primary key #2545
Support follower read #2546
Support writing into partition table with new collations #2565
Support partition pruning with to_days function when read from TiKV/TiFlash #2593
Support reading TiFlash load balancing with Round-Robin strategy client-java #662

Fixes

Bump Spark version from 3.0.2 to 3.0.3, from 3.1.1 to 3.1.3, from 3.2.1 to 3.2.3, from 3.3.0 to 3.3.1 #2544 #2607
Fix break change in partition read. For example, to_days is not supported #2552
Fix exception will be thrown when we insert into a table partitioned by year and the first col of the table is not date type #2554
Fix the bug that cluster index can not be used if the clustered index is not the Integer type #2560
Fix the bug that CBO may not choose min cost between TiKV table scan, TiKV index scan, and TiFlash scan #2563
Fix the bug that statistics are not collected which may affect the choice of the plan #2578
Fix the bug that partition pruning fails with the uppercase column #2593

Documents

Optimize user guide and dev guide #2533

[TiSpark 3.1.2] 2022-12-12

New Features

Support write into partition table with new collations #2570
Support read TiFlash load balancing with Round-Robin strategy #2576
Support partition pruning with to_days function when read from TiKV/TiFlash #2594 #2600

Fixes

Fix CBO to let TiSpark choose the min cost between TiKV table scan, TiKV index scan and TiFlash scan correctly #2568
Fix the region not find error when reading from TiKV #2575
Fix the bug that statistics is not collected which may affect the choice of plans #2589
Compatible with TiDB v6.5.0 #2602

[TiSpark 3.1.1] 2022-09-23

Fixes

Fix fail to read from some partition table #2553
Bump spark version to avoid vulnerabilities #2555
Fix error will be thrown when date col is not the first col in hash partition table #2556

[TiSpark 3.1.0] 2022-09-08

Compatibility Changes

We will not provide the mysql-connector-java dependency because of the limit of the GPL license #2457

New Features

Add authorization check for datasource api #2366
Make TiSpark's Explain clearer and easier to read #2439
Support host mapping in TiSpark #2436
Support bypass-TiDB write into partition table #2451
Support insert sql #2471
Support Spark 3.3 #2492
Only do auth check for tables in TiDB #2489
Support new Collation #2524

Fixes

Fix when TiDB has more than 10,000 tables in one Database, TiSpark may throw table not found exceptions #2433
Fix the bug that count/avg can not push down #2445
Fix the bug that when the primary key is not integer type, the two rows with null unique index will conflict. And the bug that when the unique index conflicts, the conflicting unique index column cannot be deleted correctly #2455
Fix the bug that exception would through when the size of pdAddresse is > 1 #2473
Fix the bug that Count(bit) should not be pushed down before TiKV 6.0.0 #2476
Upgraded Spark 3.1 support version from 3.0.2 to 3.0.3, upgraded Spark 3.1 support version from 3.1.1 to 3.1.3, upgraded Spark 3.2 support version from 3.2.1 to 3.2.2 #2486
Fix the bug that exception will be throw when date col is not the first col ref #2538

Documents

TiSpark Development Guide #2497

[TiSpark 3.0.2] 2022-08-29

Compatibility Changes

We will not provide the mysql-connector-java dependency because of the limit of the GPL license #2460

Fixes

Fix the bug that single column condition is in the incorrect if branch #2395
Fix when TiDB has more than 10,000 tables in one Database, TiSpark may throw Table not found exceptions #2440
Fix the bug that count/avg can not push down #2470
Fix the bug that when the primary key is not integer type, the two rows with null unique index will conflict and the bug that when the unique index conflicts, the conflicting unique index column cannot be deleted correctly #2515
Fix exception would through when the size of pdAddresse is > 1 #2478
Fix the bug that Count(bit) should not be pushed down before TiKV 6.0.0 #2485
Upgraded Spark 3.1 support version from 3.0.2 to 3.0.3, upgraded Spark 3.1 support version from 3.1.1 to 3.1.3, upgraded Spark3.2 support version from 3.2.1 to 3.2.2 #2488
Only do auth check for tables in TiDB #2500
Changed profile #2518

[TiSpark 3.0.1] 2022-06-23

Fixes

Fix the bug that the single column condition is in the incorrect if branch #2395
Fix the bug that the spark-shell is stuck when exiting #2402
Fix the bug that the maven dependency of TiSpark 3.0.0 can not be imported #2401
Shutdown recycler when closing ChannelFactory to avoid resource leak #2405

[TiSpark 3.0.0] 2022-06-15

Compatibility Changes

TiSpark without catalog plugin is no more supported. You must configure catalog configs and use tidb_catalog now #2252
TiSpark's jar has a new naming rule like tispark-assembly-{$spark_version}_{$scala_version}-{$tispark_verison} #2370

New Feature

Support DELETE statement #2276
Support Spark 3.2 #2287
Support telemetry to collect information #2316
Support stale read to read historical versions of data #2322
Support TLS with reload capability #2306 #2349 #2365 #2377

Fixes

Fix the wrong result of _tidb_rowid when set spark.tispark.show_rowid=true #2270
Fix sum not push down bug #2314
Fix limit not push down bug #2329
Avoid NoSuchElementException when setting catalog #2220
Avoid ClassCastException when cluster index with type Timestamp and Date #2319
Improve retry logic in write so that it will not retry in some scenarios which needn't retry #2279
Delete unused configuration spark.tispark.statistics.auto_load #2300
Upgrade jackson-databind from 2.9.10.8 to 2.12.6.1 #2285
Upgrade guava from 26.0-android to 29.0-android #2340
Upgrade mysql-connector-java from 5.1.44 to 5.1.49 #2367

Documents

Update communication channels #2228
Add limitation: new collations are not supported #2238

[TiSpark 2.5.2] 2022-08-30

Compatibility Changes

We will not provide the mysql-connector-java dependency because of the limit of the GPL license #2461

Fixes

Fix the bug that single column condition is incorrect if branch #2394
Fix when TiDB has more than 10,000 tables in one Database, TiSpark may throw Table not found exceptions #2441
Fix the bug that count/avg can not push down #2469
Fix the bug that when the primary key is not integer type, the two rows with null unique index will conflict and the bug that when the unique index conflicts, the conflicting unique index column cannot be deleted correctly #2516
Fix exception would through when the size of pdAddresse is > 1 #2477
Fix the bug that Count(bit) should not be pushed down before TiKV 6.0.0 #2484
Upgraded Spark 3.0 support version from 3.0.2 to 3.0.3, upgraded Spark 3.1 support version from 3.1.1 to 3.1.3 #2487
Only do auth check for tables in TiDB #2502
Change spark profile #2517

[TiSpark 2.5.1] 2022-05-16

Fixes

Fix limit not push down bug #2335
Fix ClassCastException when cluster index with type Timestamp and Date #2323
Upgrade jackson-databind from 2.9.10.8 to 2.12.6.1 #2288
Fix the wrong result of _tidb_rowid #2278
Fix set catalog throw NoSuchElementException #2254

Documents

Add limitation: TLS is not supported #2281
Add limitation: new collations are not supported #2251
Update communication channels #2244

[TiSpark 2.5.0] 2022-01-25

New feature

Support Spark 3.1.x and 3.0.x version.
Support Spark SQL's authentication and authorization through TiDB #2185.

Fixes

Fix duplicate range in RowIDAllocator #2156.
upgrade log4j to 2.17.1 #2197.
upgrade jackson-databind to 2.9.10.8 #2161.

Documents

Fix in user docs forget to use options while using BatchWrite #2169

Known issue

Spark SQL's authentication and authorization through TiDB will affect other datasource such as hive. See details in this issue #2224.
Batch write needs more tests and may be unstable. See details in this issue #2222.

[TiSpark 2.4.3] 2022-01-25

New feature

scala 2.11 and 2.12 are both supported

Fixes

Fix duplicate range in RowIDAllocator #2208.
upgrade log4j to 2.17.1 #2209.
upgrade jackson-databind to 2.9.10.8 #2209.

Documents

Fix in user docs forget to use options while using BatchWrite #2171.

Known issue

Batch write needs more tests and may be unstable. See details in this issue #2222.

[TiSpark 2.4.2] 2021-11-02

Since TiSpark v2.4.2, Only Scala v2.12 is supported.

New feature

BatchWrite: support clustered index (#2060) by @marsishandsome in #2079.
Update to support scala-2.12 by @birdstorm in #2149.
Update spark version info for v2.4 by @birdstorm in #2155.

Fixes

Fix region manager race (#1987) by @ti-srebot in #2092.
Fix ResolveLock not called in Scan API (#2089) by @ti-srebot in #2091.
Fix region split: region split key should be a valid key by @marsishandsome in #2101.
Optimize region cache miss (#2106) by @ti-srebot in #2107.
Fix catalog test (#2110) by @ti-srebot in #2113.
Fix channel shut-downed due to ping failure (#2120) by @ti-srebot in #2122.
Close TTLManager before creating a new one (#2114) by @ti-srebot in #2118.
Close PDClient without closing channel factory (#2123) by @ti-srebot in #2125.
Fix DateTime shift before the year 1582 (#2104) by @birdstorm in #2116.
Fix enum with trailing spaces (#2127) by @ti-srebot in #2134.
Fix not leader error due to region not invalidated (#2135) by @ti-srebot in #2138.
Fix BatchGet StackOverFlow Error (#2126) by @ti-srebot in #2140.

[TiSpark 2.3.16] 2021-05-17

Fixes

Change PARTITION_PER_SPLIT default value from 1 to 10 #2029
Fix array index out of bound when encoding #2041
fix batch write missing unpersist #2046
Fix bug: write decimal(16, 4) using batch write and read using tiflash #2055
Fix mistake setting current_ts to 0 #2067
Filter views in catalog #2071
BatchWrite: add argument deduplicate #2073

[TiSpark 2.4.1] 2021-05-17

Fixes

Fix bug: write decimal(16, 4) using batch write and read using tiflash #2055
Fix mistake setting current_ts to 0 #2067
Filter views in catalog #2071
BatchWrite: add argument deduplicate #2073

[TiSpark 2.4.0] 2021-04-15

New Features

Support TiDB-5.0

[TiSpark 2.3.14] 2021-04-12

Fixes

Fix GregorianCalendar #1952
Fix tiflash timestamp pushdown bug #1966
Do not select tidb_row_format_version if tidb is 3.x #1993
Update table statistics after batch write #2009
Fix oom cause missing data bug #2010

[TiSpark 2.3.13] 2021-03-05

Fixes

Disable index scan on partition table #1923
Fix jackson-databind #1910
Fix BatchGet retry #1907

[TiSpark 2.3.12] 2021-02-01

Fixes

Add retry in RangeSplitter #1886
Fix TiDAGRequest concurrent visit #1893
Fix partition table with partition column name date #1892

[TiSpark 2.3.11] 2020-12-17

Fixes

Fix the resolved txn status cache for pessimistic txn #1868
Fix overflow when convert big MyDecimal to BigDecimal #1869
Add retry in RowIDAllocator #1856
Fix select count where values contains null #1841
Do not retry for some key error #1799
Disable year type index scan #1790

New Features

Support auto_random column #1838
Support log desensitization #1805
Support insert null on auto increment column #1786

[TiSpark 2.3.10] 2020-11-06

Fixes

Fix maven-shade-plugin relocation cannot handle folder name with dot #1737
BatchWrite: add parameter maxRegionSplitNum #1729
Fix string insert overflow check #1727

[TiSpark 2.3.9] 2020-10-31

Fixes

Revert "fix TiSession Close bug" #1713
Fix typo in DateTime calculation #1710

[TiSpark 2.3.8] 2020-10-30

Fixes

Fix negative column id when there are more than 128 columns #1673
Fix incorrect date type value when timezone is set #1704
BatchWrite: fix auto increment column not primary key #1694
Fix TiSession Close bug #1677
Optimize region split number in small data size #1681

[TiSpark 2.3.7] 2020-10-27

Fixes

BatchWrite: enable user provided auto increment value in update mode #1658
BatchWrite: fix initial ttl expired #1660
BatchWrite: fix commit ts expired #1661

[TiSpark 2.3.6] 2020-10-16

Fixes

Fix covering index error when alter-primary-key is enabled #1640
Fix alter primary key for write #1644

[TiSpark 2.3.5] 2020-09-25

Fixes

Fix scala match error when predicate includes boolean type conversion #1620
Fix encoder v2 with type conversion and set default batch write version to v2 #1623
Filter sequence table from catalog #1627
Fix batch write failed because of MetaCodec.hashGet failed #1636

[TiSpark 2.3.4] 2020-09-15

Fixes

Filter Tombstone store #1582
Increase gRPC response default limit size to 2G #1597
Update txnPrewriteBatchSize & txnCommitBatchSize default value to 16K #1599
Update writeThreadPerTask default value to 2 #1600
Fix multiple database scan test bug #1603
Update grpc version to 1.29.0 #1609
BatchWrite: add parameter maxWriteTaskNumber #1616

[TiSpark 2.3.3] 2020-09-02

Fixes

Optimize scatter region logic #1582
Set regionSplitUsingSize default value to true #1584
User provided auto increment value is not supported #1586
Fix partition table on tiflash #1588
Use ShardRowIDBits to generate row id #1585

[TiSpark 2.3.2] 2020-08-21

Fixes

Fix batch write bug and optimize batch write execution #1562
Fix columnar batch execution #1566
Fix covering index generates incorrect plan when first column is not included in index #1573
Fix pytispark #1557

[TiSpark 2.3.1] 2020-07-08

Fixes

Ignore get version sql exception #1516

[TiSpark 2.3.0] 2020-07-06

New Features

Support working with TiDB-3.1 and TiDB-4.0
Support working with TiFlash
Support directly writing data to TiKV

Fixes

Support new row format encoding #1492
Fix test: add TiFlash replica avaiable check #1495
Fix string type pushdown #1500

[TiSpark 2.3.0-rc.1] 2020-06-22

Fixes

Fix column name case bug #1487
Fix resolve current lock npe bug #1485
Fix resolve lock bug with tiflash #1483

[TiSpark 2.3.0-rc] 2020-06-05

New Features

Support working with TiDB-3.1 and TiDB-4.0
Support working with TiFlash
Support directly writing data to TiKV

[TiSpark 2.1.9] 2020-05-11

Fixes

Fix desc temp view #1328
Fix prefix index on blob #1334
Shade io.opencensus to resolve grpc conflict #1352
Fix parition table isn't shown in show command #1374
Fix partition pruning when partition definition contains big integer #1385
Support TiDB-4.0 #1398

[TiSpark 2.1.8] 2019-12-12

Fixes

Fix UnsupportedOperationException: using stream rather than removeIf #1303

[TiSpark 2.1.7] 2019-12-09

Fixes

Add task retry if tikv is down #1207
Fix output offsets: add field type to Constant and ColumnRef when encoding proto #1231
Register udf ti_version for every sparksession #1258
Add timezone check #1275
Disable set, enum and bit pushed down #1242

[TiSpark 2.1.6] 2019-11-08

Fixes

Fix TopN push down bug #1185
Consider nulls order in TopN pushdown #1187
Fix Stack Overflow Error when reading from partition table #1179
Fix parsing view table's json bug #1174
Fix No Matching column bug #1162
Fix behavior of estimateTableSize #845
Fix Bit Type default value bug #1148
Fix fastxml security alert #1127
Fix bug: TiSpark Catalog has 10-20s delay #1108
Fix reading data from TiDB in Spark Structured Streaming #1104

[TiSpark 2.1.5] 2019-09-02

Fixes

Remove useless scala and jackson dependencies #1079
Fix range partition throw UnsupportedSyntaxException error #1088
Make TiSpark reading data from a hash partition table #1089

[TiSpark 2.2.0] 2019-08-30

New Features

Natively support writing data to TiKV (ACID) using Spark Data Source API

WARNING

DO NOT set spark.tispark.write.without_lock_table to true on production environment (you may lost data).

Improvements

Release one TiSpark jar (both support Spark-2.3.x and Spark-2.4.x) instead of two #933
Add spark version to TiSpark's udf ti_version #943
Bump grpc to 1.17 #982
Add retry mechanism for batchGet #986

Fixes

Catch UnsupportedSyntaxException when generating partition expressions #960
Fix TiSpark cannot read from a hash partition table #966
Prohibit extra index data type pushdown when doing index scan to avoid decoding extra column #995
Prohibit agg or groupby pushdown on double read #1004

[TiSpark 2.1.4] 2019-08-27

Fixes

Fix distinct without alias bug: disable pushdown aggregate with alias #1055
Fix reflection bug: pass in different arguments for different version of same function #1037

[TiSpark 2.1.3] 2019-08-15

Fixes

Fix cost model in table scan #1023
Fix index scan bug #1024
Prohibit aggregate or group by pushdown on double read #1027
Fix reflection bug for HDP release #1017
Fix scala compiler version #1019

[TiSpark 2.1.2] 2019-07-29

Fixes

Fix improper response with region error #922
Fix view parsing problem #953

[TiSpark 1.2.1]

Fixes

Fix count error, if advanceNextResponse is empty, we should read next region (#899)
Use fixed version of proto (#898)

[TiSpark 2.1.1]

Fixes

Add TiDB/TiKV/PD version and Spark version supported for each latest major release (#804) (#887)
Fix incorrect timestamp of tidbMapDatabase (#862) (#885)
Fix column size estimation (#858) (#884)
Fix count error, if advanceNextResponse is empty, we should read next region (#878) (#882)
Use fixed version of proto instead of master branch (#843) (#850)

[TiSpark 2.1]

Features

Support range partition pruning (Beta) (#599)
Support show columns command (#614)

Fixes

Fix build key ranges with xor expression (#576)
Fix cannot initialize pd if using ipv6 address (#587)
Fix default value bug (#596)
Fix possible IndexOutOfBoundException in KeyUtils (#597)
Fix outputOffset is incorrect when building DAGRequest (#615)
Fix incorrect implementation of Key.next() (#648)
Fix partition parser can't parser numerical value 0 (#651)
Fix prefix length may be larger than the value used. (#668)
Fix retry logic when scan meet lock (#666)
Fix inconsistent timestamp (#676)
Fix tempView may be unresolved when applying timestamp to plan (#690)
Fix concurrent DAGRequest issue (#714)
Fix downgrade scan logic (#725)
Fix integer type default value should be parsed to long (#741)
Fix index scan on partition table (#735)
Fix KeyNotInRegion may occur when retrieving rows by handle (#755)
Fix encode value long max (#761)
Fix MatchErrorException may occur when Unsigned BigInt contains in group by columns (#780)
Fix IndexOutOfBoundException when trying to get pd member (#788)

[TiSpark 2.0]

Features

Work with Spark 2.3
Support use $database statement
Support show databases statement
Support show tables statement
No need to use TiContext.mapTiDBDatabase, use $database.$table to identify a table instead
Support data type SET and ENUM
Support data type YEAR
Support data type TIME
Support isolation level settings
Support describe table command
Support cache tables and uncache tables
Support read from a TiDB partition table
Support use TiDB as metastore

Fixes

Fix JSON parsing (#491)
Fix count on empty table (#498)
Fix ScanIterator unable to read from adjacent empty regions (#519)
Fix possible NullPointerException when setting show_row_id true (#522)

Improved

Make ti version usable without selecting database (#545)

[TiSpark 1.2]

Fixes

Fixes compatibility with PDServer #480

[TiSpark 1.1]

Fixes multiple bugs:

Fix daylight saving time (DST) (#347)
Fix count(1) result is always 0 if subquery contains limit (#346)
Fix incorrect totalRowCount calculation (#353)
Fix request fail with Key not in region after retrying NotLeaderError (#354)
Fix ScanIterator logic where index may be out of bound (#357)
Fix tispark-sql dbName (#379)
Fix StoreNotMatch (#396)
Fix utf8 prefix index (#400)
Fix decimal decoding (#401)
Refactor not leader logic (#412)
Fix global temp view not visible in thriftserver (#437)

Adds:

Allow TiSpark retrieve row id (#367)
Decode json to string (#417)

Improvements:

Improve PD connection issue's error log (#388)
Add DB prefix option for TiDB tables (#416)

[TiSpark 1.0.1]

Fix unsigned index
Compatible with TiDB before and since 48a42f

[TiSpark 1.0 GA]

New Features

TiSpark provides distributed computing of TiDB data using Apache Spark.

Provide a gRPC communication framework to read data from TiKV
Provide encoding and decoding of TiKV component data and communication protocol
Provide calculation pushdown, which includes:
- Aggregate pushdown
- Predicate pushdown
- TopN pushdown
- Limit pushdown
Provide index related support
- Transform predicate into Region key range or secondary index
- Optimize Index Only queries
- Adaptive downgrade index scan to table scan per region
Provide cost-based optimization
- Support statistics
- Select index
- Estimate broadcast table cost
Provide support for multiple Spark interfaces
- Support Spark Shell
- Support ThriftServer/JDBC
- Support Spark-SQL interaction
- Support PySpark Shell
- Support SparkR

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

TiSpark Changelog

[TiSpark 3.2.0] 2023-01-09

New Features

Fixes

Documents

[TiSpark 3.1.2] 2022-12-12

New Features

Fixes

[TiSpark 3.1.1] 2022-09-23

Fixes

[TiSpark 3.1.0] 2022-09-08

Compatibility Changes

New Features

Fixes

Documents

[TiSpark 3.0.2] 2022-08-29

Compatibility Changes

Fixes

[TiSpark 3.0.1] 2022-06-23

Fixes

[TiSpark 3.0.0] 2022-06-15

Compatibility Changes

New Feature

Fixes

Documents

[TiSpark 2.5.2] 2022-08-30

Compatibility Changes

Fixes

[TiSpark 2.5.1] 2022-05-16

Fixes

Documents

[TiSpark 2.5.0] 2022-01-25

New feature

Fixes

Documents

Known issue

[TiSpark 2.4.3] 2022-01-25

New feature

Fixes

Documents

Known issue

[TiSpark 2.4.2] 2021-11-02

New feature

Fixes

[TiSpark 2.3.16] 2021-05-17

Fixes

[TiSpark 2.4.1] 2021-05-17

Fixes

[TiSpark 2.4.0] 2021-04-15

New Features

[TiSpark 2.3.14] 2021-04-12

Fixes

[TiSpark 2.3.13] 2021-03-05

Fixes

[TiSpark 2.3.12] 2021-02-01

Fixes

[TiSpark 2.3.11] 2020-12-17

Fixes

New Features

[TiSpark 2.3.10] 2020-11-06

Fixes

[TiSpark 2.3.9] 2020-10-31

Fixes

[TiSpark 2.3.8] 2020-10-30

Fixes

[TiSpark 2.3.7] 2020-10-27

Fixes

[TiSpark 2.3.6] 2020-10-16

Fixes

[TiSpark 2.3.5] 2020-09-25

Fixes

[TiSpark 2.3.4] 2020-09-15

Fixes

[TiSpark 2.3.3] 2020-09-02