What happened:
When a small amount of data is inserted (INSERT OVERWRITE) into a table with spark-sql, everything works fine, but once the data size exceeds a certain threshold (around 5000 rows in our case), the data cannot be written into the table completely and error logs are printed.
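For reference, the write is roughly like the minimal sketch below. Only the table name item_01 comes from the paths in the logs; the schema, column names, and row count are assumptions made for illustration:

```scala
// Minimal sketch of the kind of INSERT OVERWRITE that triggers the failure.
// Table schema and column names are hypothetical; the warehouse lives on JuiceFS.
import org.apache.spark.sql.SparkSession

object InsertOverwriteRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("juicefs-insert-overwrite-repro")
      .enableHiveSupport()
      .getOrCreate()

    // Works for a small number of rows; starts failing once the row
    // count goes past a few thousand (around 5000 in our case).
    val rows = spark.range(0, 10000)
      .selectExpr("id", "concat('val_', id) as value")
    rows.createOrReplaceTempView("src")

    spark.sql("INSERT OVERWRITE TABLE item_01 SELECT id, value FROM src")

    spark.stop()
  }
}
```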
The logs are as below:
spark driver log:
24/08/15 03:03:53 INFO ShuffleWriteClientImpl: Successfully send heartbeat to Coordinator grpc client ref to 10.39.215.217:19999
24/08/15 03:03:53 INFO ShuffleWriteClientImpl: Successfully send heartbeat to Coordinator grpc client ref to 10.39.215.218:19999
24/08/15 03:03:53 INFO RssShuffleManager: Finish send heartbeat to coordinator and servers
24/08/15 03:03:56 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 131) (10.42.0.245 executor 3): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:500)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:321)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: jfs://hive/warehouse/item_01/.hive-staging_hive_2024-08-15_02-58-50_459_6630558984109284475-2/-ext-10000/_temporary/0/_temporary/attempt_202408150258544842979019484022797_0014_m_000000_131/part-00000-a208cb54-a78d-43ef-81d7-abc0e871fcb3-c000
at io.juicefs.JuiceFileSystemImpl.error(JuiceFileSystemImpl.java:281)
at io.juicefs.JuiceFileSystemImpl.access$600(JuiceFileSystemImpl.java:76)
at io.juicefs.JuiceFileSystemImpl$FSOutputStream.close(JuiceFileSystemImpl.java:1018)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at io.juicefs.JuiceFileSystemImpl$BufferedFSOutputStream.close(JuiceFileSystemImpl.java:1139)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.close(HiveIgnoreKeyTextOutputFormat.java:99)
at org.apache.spark.sql.hive.execution.HiveOutputWriter.close(HiveFileFormat.scala:162)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseCurrentWriter(FileFormatDataWriter.scala:64)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:75)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:105)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:305)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
... 9 more
24/08/15 03:03:56 INFO TaskSetManager: Starting task 0.1 in stage 14.0 (TID 132) (10.42.3.149, executor 2, partition 0, ANY, 4472 bytes) taskResourceAssignments Map()
24/08/15 03:03:56 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 10.42.3.149:38447 (size: 133.4 KiB, free: 3.3 GiB)
24/08/15 03:03:56 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.42.3.149:44030
What you expected to happen:
I expect large amounts of data to be written into the table correctly and promptly with spark-sql, which stores its data in JuiceFS/MinIO and its metadata in MySQL.
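For context, the setup assumed here looks roughly like the sketch below: Spark accesses jfs:// paths through the JuiceFS Hadoop SDK, with MinIO as the object store behind the JuiceFS volume and MySQL as the metadata engine. The MySQL URL, host names, and warehouse path are placeholders, not taken from the issue:

```scala
// Rough sketch of the assumed Spark + JuiceFS wiring; all values are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark-sql-on-juicefs")
  // JuiceFS Hadoop SDK: route jfs:// paths to the JuiceFS client.
  .config("spark.hadoop.fs.jfs.impl", "io.juicefs.JuiceFileSystem")
  // Metadata engine (MySQL in this setup); the URL format here is an assumption.
  .config("spark.hadoop.juicefs.meta", "mysql://user:password@(mysql-host:3306)/juicefs")
  // Hive warehouse placed on the JuiceFS volume backed by MinIO.
  .config("spark.sql.warehouse.dir", "jfs://hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()
```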
Environment:
JuiceFS version (use juicefs --version) or Hadoop Java SDK version:
JuiceFS version 1.1.0
Cloud provider or hardware configuration running JuiceFS:
OS (e.g cat /etc/os-release):
Kernel (e.g. uname -a):
Object storage (cloud provider and region, or self maintained):
Metadata engine info (version, cloud provider managed or self maintained):
Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
Others:
Is this error unrelated to the content of the written data?
Does this error necessarily occur whenever the amount of data being written exceeds a certain size?
Is there any other network middleware between JuiceFS and MinIO that could truncate the returned data, resulting in format errors that cannot be parsed?