
geoparquet FileNotFoundError #377

Open · sfalkena opened this issue Oct 3, 2024 · 2 comments

sfalkena commented Oct 3, 2024

Hi,

In the past I have used the geoparquet items associated with various datasets. For my current project I wanted to take a similar approach, but when I try to read any geoparquet, I get a FileNotFoundError. The snippet I am using is similar to the one in the example notebook:

import pystac_client
import planetary_computer
import geopandas

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)

asset = catalog.get_collection("sentinel-2-l2a").assets["geoparquet-items"]

s2l2a = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
s2l2a.head()

FileNotFoundError: items/sentinel-2-l2a.parquet

Some info about the most relevant packages in my environment:
python=3.10
planetary-computer=1.0.0
pystac-client=0.8.3
pyarrow=17.0.0
geopandas=0.14.4
adlfs=2024.7.0

Am I missing something, or has the interface changed?
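
In case it helps, this is roughly how I would try to isolate it next, going through an explicit adlfs filesystem instead of the URL in asset.href. This is just a sketch: I am assuming the blob path is items/sentinel-2-l2a.parquet as in the error above, and that pyarrow accepts an fsspec filesystem directly.

import adlfs
import pyarrow.parquet as pq

storage_options = asset.extra_fields["table:storage_options"]
print(storage_options.get("account_name"))  # expect "pcstacitems"

# Build the filesystem explicitly and read through pyarrow, bypassing the
# URL handling that geopandas.read_parquet does internally.
fs = adlfs.AzureBlobFileSystem(**storage_options)
table = pq.read_table(
    "items/sentinel-2-l2a.parquet", columns=["id"], filesystem=fs
)
print(table.num_rows)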

sfalkena (Author) commented Oct 4, 2024

Update: even when I run:

import adlfs
filesystem = adlfs.AzureBlobFileSystem(**asset.extra_fields["table:storage_options"])
filesystem.ls("")

it only lists my own containers. I am starting to suspect that adlfs is somehow falling back to a default Azure subscription instead of using account_name="pcstacitems" from the storage options. Could that be the case? To verify this, I ran the same code on another machine, and there it worked without issues. How can I find out which credentials adlfs silently picks up in the background?
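
The only thing I could think of checking so far is the environment, roughly like below. This is a sketch under the assumption that adlfs falls back to AZURE_STORAGE_* environment variables (e.g. a connection string for another account) when they are set, which would be one way it could end up listing a different account's containers.

import os
import adlfs

# Check whether any AZURE_STORAGE_* variables are present in the environment.
for var in (
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
    "AZURE_STORAGE_CONNECTION_STRING",
    "AZURE_STORAGE_SAS_TOKEN",
):
    print(var, "set" if os.environ.get(var) else "not set")

filesystem = adlfs.AzureBlobFileSystem(**asset.extra_fields["table:storage_options"])
print(filesystem.account_name)  # should be "pcstacitems" if the storage options are honoured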

777arc (Collaborator) commented Oct 5, 2024

As far as reading in the geoparquet goes, see if this helps:

import pystac_client
import planetary_computer
import dask.dataframe as dd

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)

asset = catalog.get_collection("sentinel-2-l2a").assets["geoparquet-items"]

ids = dd.read_parquet(
    "abfs://items/sentinel-2-l2a.parquet",
    columns=["id"],
    storage_options=asset.extra_fields["table:storage_options"]
)
parquet = ids["id"].compute() # turns a lazy collection into its in-memory equivalent
parquet.head()

As for adlfs, do you get the same result if you add anon=False?
filesystem = adlfs.AzureBlobFileSystem(**asset.extra_fields["table:storage_options"], anon=False)
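
If that still lists the wrong account, a rough way to verify that the signed storage options actually reach the pcstacitems account is below. Just a sketch; ls and exists are the standard fsspec filesystem calls.

import adlfs

filesystem = adlfs.AzureBlobFileSystem(
    **asset.extra_fields["table:storage_options"], anon=False
)
print(filesystem.exists("items/sentinel-2-l2a.parquet"))  # expect True
print(filesystem.ls("items")[:5])                         # peek at the container contents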
