Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HaplotypeCallerSpark Failed because GenotypesCache thrown NPE #8961

Open
Xuehai-Chen opened this issue Aug 23, 2024 · 2 comments
Open

HaplotypeCallerSpark Failed because GenotypesCache thrown NPE #8961

Xuehai-Chen opened this issue Aug 23, 2024 · 2 comments

Comments

@Xuehai-Chen
Copy link

Bug Report

Affected tool(s) or class(es)

HaplotypeCallerSpark

Affected version(s)

  • [ 4.6.0.0]

Description

spark task failed, here is the stack trace:

java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "cache" is null
	at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypesCache.ensureCapacity(GenotypesCache.java:84)
	at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypesCache.get(GenotypesCache.java:43)
	at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.makeGenotypeCall(GATKVariantContextUtils.java:341)
	at org.broadinstitute.hellbender.tools.walkers.genotyper.AlleleSubsettingUtils.subsetAlleles(AlleleSubsettingUtils.java:133)
	at org.broadinstitute.hellbender.tools.walkers.genotyper.AlleleSubsettingUtils.subsetAlleles(AlleleSubsettingUtils.java:48)
	at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:191)
	at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:263)
	at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:979)
	at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$assemblyFunction$0(HaplotypeCallerSpark.java:179)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1856)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298)
	at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at org.broadinstitute.hellbender.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1379)
	at org.broadinstitute.hellbender.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1395)
	at org.broadinstitute.hellbender.utils.iterators.PushToPullIterator.fillCache(PushToPullIterator.java:71)
	at org.broadinstitute.hellbender.utils.iterators.PushToPullIterator.advanceToNextElement(PushToPullIterator.java:58)
	at org.broadinstitute.hellbender.utils.iterators.PushToPullIterator.(PushToPullIterator.java:37)
	at org.broadinstitute.hellbender.utils.variant.writers.GVCFBlockCombiningIterator.(GVCFBlockCombiningIterator.java:14)
	at org.broadinstitute.hellbender.engine.spark.datasources.VariantsSparkSink.lambda$writeVariantsSingle$516343c4$1(VariantsSparkSink.java:127)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions$1(JavaRDDLike.scala:153)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:858)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:858)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:56)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:56)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:56)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:93)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

Steps to reproduce

Run HaplotypeCallerSpark multiple times, it had a chance to fail.
Looks like the method ensureCapacity of GenotypesCache is not synchronized. So when multiple task threads run into this method, the new added cache is not fully initialized.

Expected behavior

spark tasks success

Actual behavior

spark tasks failed

@Xuehai-Chen
Copy link
Author

Xuehai-Chen commented Aug 23, 2024

I think the root cause is the method ensureCapacity of GenotypesCache is not synchronized. So when multiple task threads run into this method, the new added cache is not fully initialized.

@gokalpcelik
Copy link
Contributor

Hi @Xuehai-Chen
HaplotypeCallerSpark is not developed regularly as the original HaplotypeCaller therefore it has its own quirks and issues present. It can be considered as an experimental tool/a conceptual tool to show that HaplotypeCaller may be accelerated using spark. It is not endorsed as a ready to be used tool for any purpose. Its development is not a high priority therefore we don't recommend using it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants