
geoparquet FileNotFoundError #377

Open · sfalkena opened this issue Oct 3, 2024 · 2 comments

sfalkena commented Oct 3, 2024

Hi,

In the past I have used the geoparquet items associated with various datasets. For my current project I wanted to take a similar approach, but when I try to read any geoparquet, I get a FileNotFoundError. The snippet I am using is similar to the one in the example notebook:

import pystac_client
import planetary_computer
import geopandas

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)

asset = catalog.get_collection("sentinel-2-l2a").assets["geoparquet-items"]

s2l2a = geopandas.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
s2l2a.head()

FileNotFoundError: items/sentinel-2-l2a.parquet

Some info about the most relevant packages in my environment:
python=3.10
planetary-computer=1.0.0
pystac-client=0.8.3
pyarrow=17.0.0
geopandas=0.14.4
adlfs=2024.7.0

Am I missing something, or has the interface changed?
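
In case it helps, this is roughly how I would try to isolate it next, going through an explicit adlfs filesystem instead of the URL in asset.href. This is just a sketch: I am assuming the blob path is items/sentinel-2-l2a.parquet as in the error above, and that pyarrow accepts an fsspec filesystem directly.

import adlfs
import pyarrow.parquet as pq

storage_options = asset.extra_fields["table:storage_options"]
print(storage_options.get("account_name"))  # expect "pcstacitems"

# Build the filesystem explicitly and read through pyarrow, bypassing the
# URL handling that geopandas.read_parquet does internally.
fs = adlfs.AzureBlobFileSystem(**storage_options)
table = pq.read_table(
    "items/sentinel-2-l2a.parquet", columns=["id"], filesystem=fs
)
print(table.num_rows)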

sfalkena (Author) commented Oct 4, 2024

Update: even when I run:

import adlfs
filesystem = adlfs.AzureBlobFileSystem(**asset.extra_fields["table:storage_options"])
filesystem.ls("")

it only lists my own containers. I am starting to suspect that adlfs is somehow falling back to a default Azure subscription instead of using account_name="pcstacitems" from the storage options. Could that be the case? To verify this, I ran the same code on another machine, and there it worked without issues. How can I find out which credentials adlfs silently picks up in the background?
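
The only thing I could think of checking so far is the environment, roughly like below. This is a sketch under the assumption that adlfs falls back to AZURE_STORAGE_* environment variables (e.g. a connection string for another account) when they are set, which would be one way it could end up listing a different account's containers.

import os
import adlfs

# Check whether any AZURE_STORAGE_* variables are present in the environment.
for var in (
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
    "AZURE_STORAGE_CONNECTION_STRING",
    "AZURE_STORAGE_SAS_TOKEN",
):
    print(var, "set" if os.environ.get(var) else "not set")

filesystem = adlfs.AzureBlobFileSystem(**asset.extra_fields["table:storage_options"])
print(filesystem.account_name)  # should be "pcstacitems" if the storage options are honoured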

777arc (Collaborator) commented Oct 5, 2024

As far as reading in the geoparquet goes, see if this helps:

import pystac_client
import planetary_computer
import dask.dataframe as dd

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)

asset = catalog.get_collection("sentinel-2-l2a").assets["geoparquet-items"]

ids = dd.read_parquet(
    "abfs://items/sentinel-2-l2a.parquet",
    columns=["id"],
    storage_options=asset.extra_fields["table:storage_options"]
)
parquet = ids["id"].compute() # turns a lazy collection into its in-memory equivalent
parquet.head()

As for adlfs, do you get the same result if you add anon=False?
filesystem = adlfs.AzureBlobFileSystem(**asset.extra_fields["table:storage_options"], anon=False)
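
If that still lists the wrong account, a rough way to verify that the signed storage options actually reach the pcstacitems account is below. Just a sketch; ls and exists are the standard fsspec filesystem calls.

import adlfs

filesystem = adlfs.AzureBlobFileSystem(
    **asset.extra_fields["table:storage_options"], anon=False
)
print(filesystem.exists("items/sentinel-2-l2a.parquet"))  # expect True
print(filesystem.ls("items")[:5])                         # peek at the container contents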
