[BUG] StarRocksSink deadlocks after multiple retries #377

Open
songpinru opened this issue Jul 18, 2024 · 0 comments

songpinru commented Jul 18, 2024

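I ran into a problem while using StarRocks:
Flink writes to StarRocks (SR). During a period when SR was faulty and could not accept writes, the Flink sink retried 3 times and still failed. I expected Flink to throw an error at that point and either restart or fail. Instead the job appeared to keep running normally, but it stopped writing to SR and stopped reading upstream data, leaving it hung.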
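
To make the expected behavior concrete, here is a minimal sketch of a sink with a background flush thread that hands its final failure back to the task thread. It is not the connector's code: `FlushFailureSketch`, `Loader`, the queue capacity of 1, and the 50 ms poll interval are all made up for illustration. The idea is that once retries are exhausted, the flush thread records the error, and the writer uses a timed `offer` instead of a blocking `put`, so it sees the error rather than waiting forever on a queue that nobody drains.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these names come from the StarRocks connector.
class FlushFailureSketch {
    /** Stand-in for the remote load call; throws when the load is rejected. */
    interface Loader {
        void load(String batch) throws Exception;
    }

    // Small bounded queue between the task thread and the flush thread.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    // Set by the flush thread once it gives up; read by the task thread.
    private final AtomicReference<Throwable> flushError = new AtomicReference<>();

    FlushFailureSketch(Loader loader, int maxRetries) {
        Thread flusher = new Thread(() -> {
            try {
                while (true) {
                    flushWithRetry(loader, queue.take(), maxRetries);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Throwable t) {
                // Retries exhausted: remember the failure, then let this thread end.
                flushError.set(t);
            }
        }, "flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    // Makes one initial attempt plus up to maxRetries further attempts.
    private static void flushWithRetry(Loader loader, String batch, int maxRetries)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                loader.load(batch);
                return;
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }

    /** Called by the task thread for every batch. */
    void write(String batch) {
        try {
            while (true) {
                // Check for an async failure before and between queue attempts.
                Throwable t = flushError.get();
                if (t != null) {
                    throw new IllegalStateException("async flush failed", t);
                }
                // Timed offer: a dead flush thread cannot block the writer forever.
                if (queue.offer(batch, 50, TimeUnit.MILLISECONDS)) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing batch", e);
        }
    }

    public static void main(String[] args) {
        FlushFailureSketch sink = new FlushFailureSketch(
                batch -> { throw new RuntimeException("load rejected"); }, 3);
        try {
            for (int i = 0; i < 3; i++) {
                sink.write("batch-" + i);
            }
        } catch (IllegalStateException e) {
            System.out.println("task failed as expected: " + e.getCause().getMessage());
        }
    }
}
```

If `write` used a plain blocking `queue.put(batch)` and never checked `flushError`, the task thread would hang as soon as the flush thread exited with the queue full. That matches the symptom described above, though I have not confirmed that this is the actual cause in the connector.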

Job information:
Checkpointing is not enabled, sink.properties.format=json, and all other settings are left at their defaults.

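Logs:
PS: I did not keep the logs from the original incident, so these are from another program. It likewise still failed after 3 retries, and the Flink job did not report an error.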

2024-07-02 10:43:16,953 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:16,953 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:16,974 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 0
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":1,"NumberTotalRows":0,"WriteDataTimeMs":9,"TxnId":6546603,"LoadTimeMs":10,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

	at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:17,976 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:17,976 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:17,994 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 1
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":7,"TxnId":6546605,"LoadTimeMs":8,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

	at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:18,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:18,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:18,807 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:18,807 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[50] bytes[26159] label[f71b437e-762e-4b90-ba05-b09d695a69fc].
2024-07-02 10:43:18,809 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[f71b437e-762e-4b90-ba05-b09d695a69fc].
2024-07-02 10:43:18,809 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '26210', thread: 63
2024-07-02 10:43:18,935 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[f71b437e-762e-4b90-ba05-b09d695a69fc].
2024-07-02 10:43:19,997 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:19,997 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:20,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:20,016 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 2
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":7,"TxnId":6546609,"LoadTimeMs":8,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

	at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:20,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:20,935 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:20,935 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[132] bytes[68885] label[d0521fc5-89dc-4d96-9b5c-7a85457fb342].
2024-07-02 10:43:20,937 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0521fc5-89dc-4d96-9b5c-7a85457fb342].
2024-07-02 10:43:20,941 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '69018', thread: 63
2024-07-02 10:43:21,177 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[d0521fc5-89dc-4d96-9b5c-7a85457fb342].
2024-07-02 10:43:22,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:22,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:23,018 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:23,018 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:23,036 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 3
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":7,"TxnId":6546611,"LoadTimeMs":8,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

	at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:23,177 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:23,177 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[116] bytes[60488] label[107e7e84-e340-4321-8916-2b05cc471494].
2024-07-02 10:43:23,179 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[107e7e84-e340-4321-8916-2b05cc471494].
2024-07-02 10:43:23,180 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '60605', thread: 63
2024-07-02 10:43:23,318 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[107e7e84-e340-4321-8916-2b05cc471494].
2024-07-02 10:43:24,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:24,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:25,319 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:25,319 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[71] bytes[39524] label[8f9b7f0b-3a47-4ee3-a93a-e17938cd6886].
2024-07-02 10:43:25,321 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[8f9b7f0b-3a47-4ee3-a93a-e17938cd6886].
2024-07-02 10:43:25,321 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '39596', thread: 63
2024-07-02 10:43:25,527 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[8f9b7f0b-3a47-4ee3-a93a-e17938cd6886].
2024-07-02 10:43:26,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:26,792 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:27,528 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:27,528 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[97] bytes[48808] label[3ce64d14-e276-4e6e-83a5-0c37793489bd].
2024-07-02 10:43:27,530 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[3ce64d14-e276-4e6e-83a5-0c37793489bd].
2024-07-02 10:43:27,530 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '48906', thread: 63
2024-07-02 10:43:27,712 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[3ce64d14-e276-4e6e-83a5-0c37793489bd].
2024-07-02 10:43:28,012 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:28,792 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:29,712 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.