-
I'd recommend reading a bit about how Dask Gateway works: https://gateway.dask.org/. The machines running your computation are physically separate from the one where your client is running, and they don't share a filesystem. You could use something like an Azure Blob Storage container as a location that all nodes can read from and write to. Or, to answer the specific question of "How can I make a file available to the workers?", there's https://distributed.dask.org/en/stable/api.html?highlight=local%20file#distributed.Client.upload_file.
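One quick way to confirm that the workers can't see the notebook's files is to run a small existence check on every worker. The helper below is a hypothetical sketch (the name `missing_paths` is made up) that uses only the standard library; with a connected `distributed.Client`, `client.run` executes a function on every worker and returns a dict keyed by worker address.

```python
import os
from typing import Iterable, List


def missing_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths that do NOT exist on the machine executing this function.

    Called directly it checks the notebook server; sent through Client.run it
    checks each worker's own filesystem.
    """
    return [p for p in paths if not os.path.exists(p)]


# Usage (assumes `client` is a distributed.Client connected to a Dask Gateway cluster):
# paths = ["/home/jovyan/fldpln/libraries/spring/FLDPLN_tiled_10.mat"]
# print(missing_paths(paths))              # checked on the notebook server
# print(client.run(missing_paths, paths))  # {worker_address: [...]}, checked on each worker
```

If the local call returns an empty list while every worker reports the file as missing, that matches the separate-filesystem explanation.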
-
Based on my communication with the Dask Discourse group, I should NOT use client.upload_file or client.scatter to distribute the files to the workers. The gateway should have a way (or a policy) to make client data available to the cluster. So my question is: for the gateway provided by the PC, how can the client make their data accessible to the cluster? Is Azure Blob Storage the only option? Are there any documents or examples showing how to do this? Can anyone from the team help?
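As an illustration of the Blob Storage route, here is a minimal sketch of copying a local tile directory into a container once, from the notebook, so that workers can later read the tiles by URL instead of by `/home/jovyan/...` path. It assumes the `fsspec` filesystem interface (implemented for Azure Blob Storage by the `adlfs` package); the function name `upload_directory` and the account, token, and container names are hypothetical placeholders.

```python
from pathlib import Path
from typing import List


def upload_directory(fs, local_dir: str, remote_dir: str, pattern: str = "*.mat") -> List[str]:
    """Copy every file matching `pattern` under `local_dir` to `remote_dir` on `fs`.

    `fs` is any fsspec filesystem (e.g. adlfs.AzureBlobFileSystem). The directory
    layout below `local_dir` is preserved. Returns the remote paths written.
    """
    root = Path(local_dir)
    uploaded = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        remote_path = f"{remote_dir.rstrip('/')}/{rel}"
        fs.put_file(str(path), remote_path)  # one upload per local file
        uploaded.append(remote_path)
    return uploaded


# Usage (hypothetical account/container; requires the adlfs package and valid credentials):
# import adlfs
# fs = adlfs.AzureBlobFileSystem(account_name="<account>", sas_token="<sas-token>")
# upload_directory(fs, "/home/jovyan/fldpln/libraries/spring", "<container>/fldpln/spring")
```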
-
Can a PC user create a free Blob Storage account, and if so, what's the storage limit? My code uses scipy.io.loadmat() to read local .mat files. Assuming I can upload the .mat files to a storage account, how can the gateway workers read them from Blob Storage? I don't think loadmat() supports reading directly from Blob Storage.
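`scipy.io.loadmat` accepts an open file-like object as well as a filename, so one option is to open the blob through an fsspec filesystem and hand the file handle to `loadmat`. A minimal sketch, assuming `fsspec`/`adlfs` as the storage layer; `load_mat` is a hypothetical helper name. Note that `loadmat` cannot read MATLAB v7.3 (HDF5-based) files; those would need an HDF5 reader such as `h5py`.

```python
import scipy.io


def load_mat(fs, remote_path: str) -> dict:
    """Read a MATLAB .mat file from any fsspec filesystem (local, memory, Azure Blob, ...).

    The file is opened in binary mode and the handle is passed straight to
    scipy.io.loadmat, which accepts file-like objects as well as paths.
    """
    with fs.open(remote_path, "rb") as f:
        return scipy.io.loadmat(f)


# Usage on a worker (hypothetical account/container; requires adlfs and credentials):
# import adlfs
# fs = adlfs.AzureBlobFileSystem(account_name="<account>", sas_token="<sas-token>")
# tile = load_mat(fs, "<container>/fldpln/spring/FLDPLN_tiled_10.mat")
```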
-
I have an organizational Azure portal account, but I cannot create a Blob Storage account with it: Azure won't let me add a new resource group with the region set to (Europe) West Europe, which I believe is where the PC is located. Should I create a personal Azure account and then create the storage account? Is there any free research storage account? Right now, I just want to test how much Dask Gateway can speed up flood mapping.
-
Tom,
I was trying to compare the running time between the 'local' file system, Blob Storage, and parallelized Blob Storage using three test watersheds, from small to large. However, my PC account didn't allow me to upload the data for all three watersheds to my account's local storage, as the small and medium watershed data have used up my disk quota. Is it possible to increase my disk quota so that I can finish the test and comparison?
So far, with just the small and medium watersheds, I found that using Blob Storage is about 30 to 50% slower than using the local file system, but the parallelized version (with 20 workers) took only about 35% of the local file system's time. I think I can improve the parallelization further to save more time.
Thanks,
Xingong
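The timings reported above can be turned into speedup and parallel-efficiency figures. A small sketch, taking the serial local run as 1.0 and using only the percentages quoted: serial Blob Storage reads at 1.3 to 1.5 times the local time, and the 20-worker Blob Storage run at about 0.35. Speedup here is serial time divided by parallel time, and efficiency is speedup divided by the number of workers.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the serial one."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Speedup per worker; 1.0 would be perfect linear scaling."""
    return speedup(t_serial, t_parallel) / workers


LOCAL_SERIAL = 1.0        # serial run on the local file system (reference)
BLOB_SERIAL = (1.3, 1.5)  # serial run on Blob Storage: 30-50% slower
BLOB_PARALLEL = 0.35      # 20-worker run on Blob Storage: ~35% of the local time
WORKERS = 20

print(f"vs. local serial: {speedup(LOCAL_SERIAL, BLOB_PARALLEL):.2f}x")
for t in BLOB_SERIAL:
    print(f"vs. blob serial ({t}): {speedup(t, BLOB_PARALLEL):.2f}x, "
          f"efficiency {efficiency(t, BLOB_PARALLEL, WORKERS):.0%}")
```

With these numbers the parallel run is about 2.9 times faster than the serial local run and 3.7 to 4.3 times faster than the serial Blob Storage run, i.e. roughly 19 to 21% efficiency across 20 workers, which fits the expectation that the parallelization can still be improved.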
From: Li, Xingong
Sent: Friday, February 11, 2022 10:49 AM
To: microsoft/PlanetaryComputer ***@***.***
Subject: RE: [microsoft/PlanetaryComputer] Access to local files by Dask Gateway workers (Discussion #31)
Tom,
I've got those things figured out. I changed my code and uploaded one small watershed to Azure Blob Storage for testing. The parallelized flood mapping ran last night and took only half the time of the linear mapping! I'll try a large watershed (I hope it won't reach my storage limit, if there is one) and see the benefit of parallelization. I'm very new to Dask and would like to explore how to parallelize it better.
In addition to the flood mapping project, I also plan to do a global snow cover change analysis on PC using MODIS MOD10A1 & MYD10A1 data. I have done some of this before on Earth Engine (see the TrendySnow<https://trendysnow.herokuapp.com/> web application, built with EE as the backend engine), but there are limitations on using some trend analysis methods on EE. I think I may be able to do those analyses on PC given its open structure.
My trial account for Blob Storage lasts only one month. I'm wondering whether there is a way for me to apply for some storage resources to experiment with those ideas. Also, is there anything like geemap<https://geemap.org/> for PC that we could use to publish the maps generated on PC? Do you have something like PC academic advocates who can help test, use, and advocate for the platform?
Sorry for such a long email.
Thanks,
Xingong
From: Tom Augspurger ***@***.***
Sent: Friday, February 11, 2022 7:43 AM
To: microsoft/PlanetaryComputer ***@***.***
Cc: Li, Xingong ***@***.***; Author ***@***.***
Subject: Re: [microsoft/PlanetaryComputer] Access to local files by Dask Gateway workers (Discussion #31)
Either getting your organization to create you a storage account / container or using a personal account (with the free trial) sounds reasonable.
-
I'm trying to use PC and Dask Gateway to speed up a flood mapping application. My data (uploaded to PC and saved under my account) are divided into tiles (in MATLAB .mat format); each tile needs to be mapped to a GeoTIFF file, and all the tile maps are mosaicked once the individual tile mapping is done. I can run the application as a notebook on PC without any problem. When I tried to parallelize the application using Dask Gateway, it gave me the error "FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/fldpln/libraries/spring/FLDPLN_tiled_10.mat'". Below is the custom DAG for the parallelization. It seems to me that the gateway workers cannot access those tile files, and I suspect they also cannot write the maps. So my question is: how can I make my local file system available to the gateway workers?
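The map-then-mosaic structure described above can be sketched as a low-level Dask task graph in which every task receives a Blob Storage URL instead of a `/home/jovyan/...` path, so each worker resolves its input itself. `map_tile`, `mosaic`, and the `az://<container>/...` URLs are hypothetical placeholders; a real `map_tile` would read the .mat file (for example through fsspec) and write its GeoTIFF back to the container, because files written to a worker's local disk would be just as invisible to the notebook.

```python
from typing import Dict, List, Tuple

from dask.threaded import get  # local threaded scheduler; on a cluster use client.get


def map_tile(tile_url: str) -> str:
    """Placeholder for mapping one tile: returns the URL its GeoTIFF would be written to."""
    return tile_url.rsplit(".", 1)[0] + ".tif"


def mosaic(tif_urls: List[str]) -> List[str]:
    """Placeholder for mosaicking: just returns the tile GeoTIFF URLs it would merge."""
    return list(tif_urls)


def build_graph(tile_urls: List[str]) -> Tuple[Dict, str]:
    """Build a Dask graph with one map task per tile feeding a single mosaic task."""
    dsk: Dict = {}
    map_keys = []
    for i, url in enumerate(tile_urls):
        key = f"map-tile-{i}"
        dsk[key] = (map_tile, url)      # task: map_tile(url)
        map_keys.append(key)
    dsk["mosaic"] = (mosaic, map_keys)  # the list of keys is replaced by their results
    return dsk, "mosaic"


tiles = [f"az://<container>/fldpln/spring/FLDPLN_tiled_{i}.mat" for i in (10, 11, 12)]
graph, final_key = build_graph(tiles)
print(get(graph, final_key))
# With a Dask Gateway cluster: client.get(graph, final_key)
```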