[Nextflow + Azure Batch] Unable to find size for VM name 'Standard_E4ads_v5' and location 'germanywestcentral' #5076

Open
landroutsosAIE opened this issue Jun 19, 2024 · 8 comments · May be fixed by #5108

Comments

@landroutsosAIE

landroutsosAIE commented Jun 19, 2024

Bug report

I am trying to use Azure Batch with Nextflow and Seqera, but I can't start any job. The error suggests a wrong VM name, a wrong location name, or that no VM of this name is available in the specified region.

Expected behavior and actual behavior

I am running this command as a test to check my Azure batch config at the Nextflow level:

nextflow run nf-core/rnaseq -profile test,docker -c .nextflow/azure_batch_19_06.config --outdir "az://firstcontainer/testrun_19_06/" -w "az://firstcontainer/work_19_06" -with-tower

My config file:

// Scale formula to use low-priority nodes only.
lowPriorityScaleFormula = '''
    lifespan = time() - time("{{poolCreationTime}}");
    interval = TimeInterval_Minute * {{scaleInterval}};
    $samples = $PendingTasks.GetSamplePercent(interval);
    $tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max($PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));
    $targetVMs = $tasks > 0 ? $tasks : max(0, $TargetLowPriorityNodes/2);
    targetPoolSize = max(0, min($targetVMs, {{maxVmCount}}));
    $TargetLowPriorityNodes = lifespan < interval ? {{vmCount}} : targetPoolSize;
    $TargetDedicatedNodes = 0;
    $NodeDeallocationOption = taskcompletion;
'''

process {
    executor = 'azurebatch'
    container = 'nfcore/rnaseq:latest'
    queue = 'Standard_E4_2ads_v5'
    withLabel:process_low {queue = 'Standard_E4_2ads_v5'}
    withLabel:process_medium {queue = 'Standard_E8_4ads_v5'}
    withLabel:process_high {queue = 'Standard_E16_8ads_v5'}
    withLabel:process_high_memory {queue = 'Standard_E32_16ads_v5'}
}
azure {
    storage {
        accountName = "<myaccountname>"
        accountKey = "<myaccountkey>"
    }
    batch {
        location = "germanywestcentral"
        accountName = "<mybatchname>"
        accountKey = "<myaccountkey>"

        autoPoolMode = false
        allowPoolCreation = true
        deletePoolsOnCompletion = true

        pools {
            Standard_E4_2ads_v5 {
                autoScale = true
                vmType = 'Standard_E4-2ads_v5'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E8_4ads_v5 {
                autoScale = true
                vmType = 'Standard_E8-4ads_v5'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E16_8ads_v5 {
                autoScale = true
                vmType = 'Standard_E16-8ads_v5'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E32_16ads_v5 {
                autoScale = true
                vmType = 'Standard_E32-16ads_v5'
                vmCount = 2
                maxVmCount = 10
                scaleFormula = lowPriorityScaleFormula
            }
        }
    }
}
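
For readers who want to reason about what the scale formula in the config above asks Azure Batch to do, here is a rough Python sketch of the same arithmetic. It is only an illustration and not how Azure Batch evaluates formulas. The function name `target_low_priority_nodes` and all of its parameters are hypothetical: `pending_samples` stands in for the `$PendingTasks` samples over the interval, `sample_percent` for `$PendingTasks.GetSamplePercent(interval)`, and `GetSample(1)` is assumed to mean the most recent sample.

```python
from statistics import mean


def target_low_priority_nodes(
    pending_samples,         # assumed: $PendingTasks samples over the interval, oldest first
    sample_percent,          # assumed: $PendingTasks.GetSamplePercent(interval)
    current_target,          # current value of $TargetLowPriorityNodes
    pool_age_minutes,        # "lifespan" in the formula
    scale_interval_minutes,  # {{scaleInterval}}
    vm_count,                # {{vmCount}}
    max_vm_count,            # {{maxVmCount}}
):
    """Mirror the arithmetic of the scale formula (illustrative only)."""
    latest = pending_samples[-1]
    if sample_percent < 70:
        # Too few samples to trust the average: use the latest sample only.
        tasks = max(0, latest)
    else:
        tasks = max(latest, mean(pending_samples))
    # With no pending tasks, halve the current target instead of dropping to zero at once.
    target_vms = tasks if tasks > 0 else max(0, current_target / 2)
    target_pool_size = max(0, min(target_vms, max_vm_count))
    # A pool younger than one scale interval keeps its initial size.
    if pool_age_minutes < scale_interval_minutes:
        return vm_count
    return target_pool_size
```

Under these assumptions, a pool with `vmCount = 2` and `maxVmCount = 20` starts at 2 nodes, is capped at 20 nodes however many tasks are pending, and shrinks by half per evaluation once the queue is empty.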

The expected behavior was for the rnaseq test to run correctly on Seqera, using Azure Batch for job scheduling and compute resource management, but it can't access the VMs I am specifying.

My Azure Batch quota is as follows: 256 for the EADSv5 VM series.

Program output

The error is this:

ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:UNTAR_SALMON_INDEX (salmon.tar.gz)'

Caused by:
  Unable to find size for VM name 'Standard_E4ads_v5' and location 'germanywestcentral'

The error in .nextflow.log is this:

Jun-19 10:01:08.286 [FileTransfer-9] DEBUG nextflow.file.FilePorter - Copying foreign file https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357074_1.fastq.gz to work dir: az://firstcontainer/work_19_06/stage-9cbe4492-a38b-4ffc-963e-534fe37e66e5/21/aa88e373263b112da5b5b5205d6d4a/SRR6357074_1.fastq.gz
Jun-19 10:01:08.304 [Task submitter] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:UNTAR_SALMON_INDEX (salmon.tar.gz); work-dir=az://firstcontainer/work_19_06/7c/3aa463c2576a48a891e0ee4c1e5e1c
  error [java.lang.IllegalArgumentException]: Unable to find size for VM name 'Standard_E4ads_v5' and location 'germanywestcentral'
Jun-19 10:01:08.313 [FileTransfer-8] DEBUG nextflow.file.FilePorter - Copying foreign file https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/GSE110004/SRR6357076_2.fastq.gz to work dir: az://firstcontainer/work_19_06/stage-9cbe4492-a38b-4ffc-963e-534fe37e66e5/44/698042ca0fd803daa2d7363806d8b9/SRR6357076_2.fastq.gz
Jun-19 10:01:08.313 [Task submitter] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:UNTAR_SALMON_INDEX (salmon.tar.gz)'

Caused by:
  Unable to find size for VM name 'Standard_E4ads_v5' and location 'germanywestcentral'

java.lang.IllegalArgumentException: Unable to find size for VM name 'Standard_E4ads_v5' and location 'germanywestcentral'
        at nextflow.cloud.azure.batch.AzBatchService.memoizedMethodPriv$getVmTypeStringString(AzBatchService.groovy:237)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1254)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1030)
        at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1036)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1019)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:97)
        at nextflow.cloud.azure.batch.AzBatchService$_closure5.doCall(AzBatchService.groovy)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1030)
        at groovy.lang.Closure.call(Closure.java:427)
        at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.lambda$call$0(Memoize.java:137)
        at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:137)
        at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:113)
        at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.call(Memoize.java:136)
        at nextflow.cloud.azure.batch.AzBatchService.getVmType(AzBatchService.groovy)
        at nextflow.cloud.azure.batch.AzBatchService.specFromPoolConfig(AzBatchService.groovy:542)
        at nextflow.cloud.azure.batch.AzBatchService.specForTask(AzBatchService.groovy:608)
        at nextflow.cloud.azure.batch.AzBatchService.getOrCreatePool(AzBatchService.groovy:615)
        at nextflow.cloud.azure.batch.AzBatchService.submitTask(AzBatchService.groovy:320)
        at nextflow.cloud.azure.batch.AzBatchTaskHandler.submit(AzBatchTaskHandler.groovy:91)
        at nextflow.processor.TaskPollingMonitor.submit(TaskPollingMonitor.groovy:196)
        at nextflow.processor.TaskPollingMonitor.submitPendingTasks(TaskPollingMonitor.groovy:565)
        at nextflow.processor.TaskPollingMonitor.submitLoop(TaskPollingMonitor.groovy:390)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1254)
        at groovy.lang.MetaClassImpl.invokeMethodClosure(MetaClassImpl.java:1042)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1128)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1030)
        at groovy.lang.Closure.call(Closure.java:427)
        at groovy.lang.Closure.call(Closure.java:406)
        at groovy.lang.Closure.run(Closure.java:498)
        at java.base/java.lang.Thread.run(Thread.java:829)

Environment

  • Nextflow version: 23.10.1.5891
  • Operating system: Linux

What could be the issue here? Thanks in advance!

@adamrtalbot
Collaborator

Hi @landroutsosAIE, that VM doesn't seem to appear in this list of VMs by region. I've updated the list here:

#5100

related to #2994
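
As a rough illustration of why a VM name that is missing from a per-region list produces the error above, here is a minimal Python sketch of such a lookup. It is not Nextflow's implementation: the table `VM_SIZES_BY_REGION`, its entries, and the function `find_vm_size` are all hypothetical, and the entries are invented for the example and do not reflect real Azure availability.

```python
# Hypothetical, hand-written table. Entries are made up for illustration only.
VM_SIZES_BY_REGION = {
    "germanywestcentral": {"Standard_D4s_v3", "Standard_E4s_v3"},
    "eastus": {"Standard_D4s_v3", "Standard_E4ads_v5"},
}


def find_vm_size(vm_name: str, location: str) -> str:
    """Return vm_name if the table lists it for location, otherwise raise ValueError."""
    known = VM_SIZES_BY_REGION.get(location, set())
    if vm_name not in known:
        raise ValueError(
            f"Unable to find size for VM name '{vm_name}' and location '{location}'"
        )
    return vm_name
```

With a lookup of this shape, a VM size that exists in Azure but is absent from the table fails in the same way as a misspelled name, which is why updating the list can resolve the error.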

@landroutsosAIE
Author

landroutsosAIE commented Jul 2, 2024

Hello @adamrtalbot. Thank you for your help. I will wait for the pull request to be accepted.
I have another problem with the same pipeline. I changed the Azure Batch config and now it works until the Salmon quant step. It stops with exit status 1, and the real error shows up in the command.log file:
Unable to download path: https://<ourblobstorage>/test_run_07_01/7a/4ac8cca20ea799d9c65917be044a73/salmon

So it can't download the salmon folder from the previous step.

We are running this pipeline in Seqera too. Of the four identical tasks, one succeeded, two failed with exit status 1, and one failed with exit status 137 (which I suppose is a RAM problem). We are using a max memory of 256 GB.
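
A note on the exit status 137 mentioned above: by shell convention, a status above 128 usually means the process was terminated by signal number `status - 128`, so 137 corresponds to signal 9 (SIGKILL). SIGKILL is often, though not always, sent by the kernel's out-of-memory killer. The small Python sketch below decodes such statuses; the function name `describe_exit_status` is hypothetical, and the signal names assume a POSIX system such as Linux.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status (values above 128 usually mean 'killed by a signal')."""
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"failed with exit code {status}"


print(describe_exit_status(137))
print(describe_exit_status(1))
```

A SIGKILL result by itself does not prove a memory problem, but it is a reasonable cue to compare the task's memory request with the memory of the VM it ran on.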

@adamrtalbot
Collaborator

That's unusual. Does the blob directory include the expected file? Does resume work? I presume the task that exited with error code 137 was running on a machine with 256 GB of storage?

When using Seqera Platform, you shouldn't need to specify any of this configuration. I would try removing any configuration of the storage and Batch accounts.

@landroutsosAIE
Author

landroutsosAIE commented Jul 2, 2024

Yes, the folder exists in the blob directory. I think the problem is with my Azure Batch config for Nextflow: it didn't use the high-memory VM series that I was assigning. I am now running the pipeline with only the high-memory VM (with 256 GB RAM) and I will get back to you. I am using Seqera (the -with-tower parameter) only for better monitoring of my pipeline.

@pditommaso
Member

Can this be considered solved by #5100?

@adamrtalbot
Collaborator

Currently getting this error 😱:

ERROR ~ Error executing process > 'sayHello (3)'

Caused by:
  Cannot find a VM for task 'sayHello (3)' matching these requirements: type=Standard_E4-2ads_v5, cpus=1, mem=-, location=useast

@adamrtalbot
Collaborator

After adding some logging, I can see it's failing to find the Azure VMs in a region:

Jul-03 11:17:09.642 [Task submitter] DEBUG n.c.azure.batch.AzBatchTaskHandler - [AZURE BATCH] Submitting task sayHello (4) - work-dir=az://scidev-useast/aa/a13d6b287c3b1e178396256bce01be
Jul-03 11:17:10.120 [Task submitter] DEBUG n.cloud.azure.batch.AzBatchService - [AZURE BATCH] guessing best VM given location=useast; cpus=1; mem=null; family=Standard_E4-2ads_v5
Jul-03 11:17:10.120 [Task submitter] DEBUG n.cloud.azure.batch.AzBatchService - [AZURE BATCH] Finding best VM given location=useast; cpus=1; mem=null; family=Standard_E4-2ads_v5
Jul-03 11:17:10.121 [Task submitter] WARN  n.cloud.azure.batch.AzBatchService - [AZURE BATCH] Unable to find Azure VM names for location: useast
Jul-03 11:17:10.121 [Task submitter] DEBUG n.cloud.azure.batch.AzBatchService - [AZURE BATCH] Found 0 VM types in location useast
Jul-03 11:17:10.121 [Task submitter] DEBUG n.cloud.azure.batch.AzBatchService - [AZURE BATCH] Listing VM families
Jul-03 11:17:10.121 [Task submitter] DEBUG n.cloud.azure.batch.AzBatchService - [AZURE BATCH] Found 0 VM types matching the criteria

@adamrtalbot
Collaborator

adamrtalbot commented Jul 3, 2024

Idiot

useast vs eastus. Going to add another check for that 🤦

Done: #5108
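
For anyone hitting the same `useast` vs `eastus` mix-up, the kind of check described above could look roughly like the Python sketch below. This is not the change made in #5108, just a minimal illustration of the idea. The names `KNOWN_REGIONS` and `check_region` are hypothetical, and the region list is a small hand-picked subset rather than the full set of Azure regions.

```python
import difflib

# Small, hand-picked subset of Azure region names, used only for illustration.
KNOWN_REGIONS = ["eastus", "eastus2", "westus", "westeurope", "germanywestcentral"]


def check_region(location: str, known=KNOWN_REGIONS) -> str:
    """Return location if it is a known region, otherwise raise with a suggestion."""
    if location in known:
        return location
    # Suggest the closest known name, if any is similar enough.
    suggestions = difflib.get_close_matches(location, known, n=1)
    hint = f" Did you mean '{suggestions[0]}'?" if suggestions else ""
    raise ValueError(f"Unknown Azure region '{location}'.{hint}")
```

Failing early on an unknown region gives a clearer message than the downstream "Found 0 VM types" result, which looks like a VM sizing problem rather than a typo in the location.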
