You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I could not find a way to start the streaming job from where it left off previously in the kinesis stream even with using Spark streaming's "checkpointLocation" option.
TRIM_HORIZON starts all the way from the beginning and LATEST starts from the current timestamp. Let's say I stop (or restarts due to an error) the streaming job, how to restart from the offset where it left off? I believe this is possible with KCL 2.0 but "kinesis-sql" library only supports KCL 1.0. Is there a way to achieve this or not possible at all until we move to KCL 2.0?
Thanks,
Prashanth
The text was updated successfully, but these errors were encountered:
I sent an email to the maintainer of this repo, and didn't get a response. For now, I've forked the project here https://github.com/roncemer/kinesis-spark-connector and updated it to build for Spark 3.2.1. Under Spark 3.2.1, it appears to be working correctly. I allowed it to go overnight with no new records being posted to the Kinesis data stream. When I started posting records again, the records arrived in Spark, and were processed.
I issued pull request https://github.com/qubole/kinesis-sql/pull/113/files from my forked repo back to the original qubole/kinesis-sql repo. Be sure to read the full description under the Comments tab as well.
I could use the help of anyone who might be interested in this project. Apparently, qubole/kinesis-sql is abandoned for about two years, and the main guy who was maintaining it doesn't respond to emails. If anyone can get in touch with someone who has control of this repo, please ask them to make me a committer. Barring that, I will have to publish the jar file from my forked repo as a maven resource. In any case, if I end up maintaining this project, either the original or forked repo, I will need volunteers to help handle the builds each time a new version of Spark is released.
In the mean time, feel free to build your own jar from my forked repo and run it under Spark 3.2.1. Also, let me know if it works under 3.3.x, or if you run into any other problems.
Hi,
I could not find a way to start the streaming job from where it left off previously in the kinesis stream even with using Spark streaming's "checkpointLocation" option.
TRIM_HORIZON starts all the way from the beginning and LATEST starts from the current timestamp. Let's say I stop (or restarts due to an error) the streaming job, how to restart from the offset where it left off? I believe this is possible with KCL 2.0 but "kinesis-sql" library only supports KCL 1.0. Is there a way to achieve this or not possible at all until we move to KCL 2.0?
Thanks,
Prashanth
The text was updated successfully, but these errors were encountered: