What happened:
When a small amount of data is inserted (INSERT OVERWRITE) into a table with spark-sql, everything works fine, but once the data size exceeds a certain threshold (around 5000 rows in our case), the data cannot be written into the table completely and error logs are printed.
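For reference, the write is roughly like the minimal sketch below. Only the table name item_01 comes from the paths in the logs; the schema, column names, and row count are assumptions made for illustration:

```scala
// Minimal sketch of the kind of INSERT OVERWRITE that triggers the failure.
// Table schema and column names are hypothetical; the warehouse lives on JuiceFS.
import org.apache.spark.sql.SparkSession

object InsertOverwriteRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("juicefs-insert-overwrite-repro")
      .enableHiveSupport()
      .getOrCreate()

    // Works for a small number of rows; starts failing once the row
    // count goes past a few thousand (around 5000 in our case).
    val rows = spark.range(0, 10000)
      .selectExpr("id", "concat('val_', id) as value")
    rows.createOrReplaceTempView("src")

    spark.sql("INSERT OVERWRITE TABLE item_01 SELECT id, value FROM src")

    spark.stop()
  }
}
```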
The logs are as below:
spark driver log:
24/08/15 03:03:53 INFO ShuffleWriteClientImpl: Successfully send heartbeat to Coordinator grpc client ref to 10.39.215.217:19999
24/08/15 03:03:53 INFO ShuffleWriteClientImpl: Successfully send heartbeat to Coordinator grpc client ref to 10.39.215.218:19999
24/08/15 03:03:53 INFO RssShuffleManager: Finish send heartbeat to coordinator and servers
24/08/15 03:03:56 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 131) (10.42.0.245 executor 3): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:500)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:321)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: jfs://hive/warehouse/item_01/.hive-staging_hive_2024-08-15_02-58-50_459_6630558984109284475-2/-ext-10000/_temporary/0/_temporary/attempt_202408150258544842979019484022797_0014_m_000000_131/part-00000-a208cb54-a78d-43ef-81d7-abc0e871fcb3-c000
at io.juicefs.JuiceFileSystemImpl.error(JuiceFileSystemImpl.java:281)
at io.juicefs.JuiceFileSystemImpl.access$600(JuiceFileSystemImpl.java:76)
at io.juicefs.JuiceFileSystemImpl$FSOutputStream.close(JuiceFileSystemImpl.java:1018)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at io.juicefs.JuiceFileSystemImpl$BufferedFSOutputStream.close(JuiceFileSystemImpl.java:1139)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.close(HiveIgnoreKeyTextOutputFormat.java:99)
at org.apache.spark.sql.hive.execution.HiveOutputWriter.close(HiveFileFormat.scala:162)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseCurrentWriter(FileFormatDataWriter.scala:64)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:75)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:105)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:305)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
... 9 more
24/08/15 03:03:56 INFO TaskSetManager: Starting task 0.1 in stage 14.0 (TID 132) (10.42.3.149, executor 2, partition 0, ANY, 4472 bytes) taskResourceAssignments Map()
24/08/15 03:03:56 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 10.42.3.149:38447 (size: 133.4 KiB, free: 3.3 GiB)
24/08/15 03:03:56 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.42.3.149:44030
What you expected to happen:
I expect large amounts of data to be written into the table correctly and promptly with spark-sql, which stores its data in JuiceFS/MinIO and its metadata in MySQL.
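For context, the setup assumed here looks roughly like the sketch below: Spark accesses jfs:// paths through the JuiceFS Hadoop SDK, with MinIO as the object store behind the JuiceFS volume and MySQL as the metadata engine. The MySQL URL, host names, and warehouse path are placeholders, not taken from the issue:

```scala
// Rough sketch of the assumed Spark + JuiceFS wiring; all values are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark-sql-on-juicefs")
  // JuiceFS Hadoop SDK: route jfs:// paths to the JuiceFS client.
  .config("spark.hadoop.fs.jfs.impl", "io.juicefs.JuiceFileSystem")
  // Metadata engine (MySQL in this setup); the URL format here is an assumption.
  .config("spark.hadoop.juicefs.meta", "mysql://user:password@(mysql-host:3306)/juicefs")
  // Hive warehouse placed on the JuiceFS volume backed by MinIO.
  .config("spark.sql.warehouse.dir", "jfs://hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()
```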
Environment:
JuiceFS version (use juicefs --version) or Hadoop Java SDK version:
JuiceFS version 1.1.0
Cloud provider or hardware configuration running JuiceFS:
OS (e.g cat /etc/os-release):
Kernel (e.g. uname -a):
Object storage (cloud provider and region, or self maintained):
Metadata engine info (version, cloud provider managed or self maintained):
Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
Others:
Is this error unrelated to the content of the written data?
Does this error necessarily occur whenever the amount of data being written exceeds a certain size?
Is there any other network middleware between JuiceFS and MinIO that could truncate the returned data, resulting in format errors that cannot be parsed?