Temporary failure in name resolution #81

Closed
vncntprvst opened this issue Jun 11, 2024 · 7 comments

Comments

@vncntprvst

Hi,
I'm testing lindi, following this discussion.
This is the code I'm running:

```python
import pynwb
import lindi

# URL of the remote .nwb.lindi.json file
url = "https://lindi.neurosift.org/dandi/dandisets/000363/assets/21c622b7-6d8e-459b-98e8-b968a97a1585/nwb.lindi.json"

# Set up a local cache
local_cache = lindi.LocalCache(cache_dir='lindi_cache')

# Create the h5py-like client with cache
# (without cache: client = lindi.LindiH5pyFile.from_lindi_file(url))
client = lindi.LindiH5pyFile.from_lindi_file(url, local_cache=local_cache)

# Open using pynwb
with pynwb.NWBHDF5IO(file=client, mode="r") as io:
    nwbfile = io.read()

print(nwbfile)

trials_df = nwbfile.trials.to_dataframe()
units_df = nwbfile.units.to_dataframe()
```

It worked up to (and including) trials_df = nwbfile.trials.to_dataframe().
However, at units_df = nwbfile.units.to_dataframe(), I got this error:

Traceback (most recent call last):
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x748ecdad1660>: Failed to resolve 'dandiarchive.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/requests/adapters.py", line 589, in send
    resp = conn.urlopen(
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='dandiarchive.s3.amazonaws.com', port=443): Max retries exceeded with url: /blobs/886/c43/886c4302-846a-4ef5-996a-6f02d6a81a5f?response-content-disposition=attachment%3B%20filename%3D%22sub-440956_ses-20190207T120657_behavior%2Becephys%2Bimage%2Bogen.nwb%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUBRWC5GAEKH3223E%2F20240611%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240611T140109Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=b3d9ee6212e1188568d787dfc3ae894dcede3a9fca10cdba28e42b1f8039bde1 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x748ecdad1660>: Failed to resolve 'dandiarchive.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/hdmf/common/table.py", line 1225, in to_dataframe
    sel = self.__get_selection_as_dict(arg, df=True, **kwargs)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/hdmf/common/table.py", line 1063, in __get_selection_as_dict
    ret[name] = col.get(arg, df=df, index=index, **kwargs)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/hdmf/common/table.py", line 203, in get
    ret.append(self.__getitem_helper(i, **kwargs))
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/hdmf/common/table.py", line 172, in __getitem_helper
    end = self.data[arg]
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/lindi/LindiH5pyFile/LindiH5pyDataset.py", line 170, in __getitem__
    return self._get_item_for_zarr(self._zarr_array, args)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/lindi/LindiH5pyFile/LindiH5pyDataset.py", line 219, in _get_item_for_zarr
    return decode_references(zarr_array[selection])
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/zarr/core.py", line 800, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/zarr/core.py", line 926, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/zarr/core.py", line 968, in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/zarr/core.py", line 1343, in _get_selection
    self._chunk_getitems(
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/zarr/core.py", line 2179, in _chunk_getitems
    cdatas = self.chunk_store.getitems(ckeys, contexts=contexts)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/zarr/_storage/store.py", line 179, in getitems
    return {k: self[k] for k in keys if k in self}
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/zarr/_storage/store.py", line 179, in <dictcomp>
    return {k: self[k] for k in keys if k in self}
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/lindi/LindiH5pyFile/LindiReferenceFileSystemStore.py", line 147, in __getitem__
    val = _read_bytes_from_url_or_path(url, offset, length)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/lindi/LindiH5pyFile/LindiReferenceFileSystemStore.py", line 259, in _read_bytes_from_url_or_path
    response = requests.get(url_resolved, headers=headers)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/wanglab/mambaforge/envs/map_ephys/lib/python3.10/site-packages/requests/adapters.py", line 622, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='dandiarchive.s3.amazonaws.com', port=443): Max retries exceeded with url: /blobs/886/c43/886c4302-846a-4ef5-996a-6f02d6a81a5f?response-content-disposition=attachment%3B%20filename%3D%22sub-440956_ses-20190207T120657_behavior%2Becephys%2Bimage%2Bogen.nwb%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUBRWC5GAEKH3223E%2F20240611%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240611T140109Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=b3d9ee6212e1188568d787dfc3ae894dcede3a9fca10cdba28e42b1f8039bde1 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x748ecdad1660>: Failed to resolve 'dandiarchive.s3.amazonaws.com' ([Errno -3] Temporary failure in name resolution)"))

I haven't tested other assets, so I'm not sure whether this issue is specific to this one.

@magland
Collaborator

magland commented Jun 11, 2024

It worked for me, although it took a lot longer than I expected to load. I am going to investigate why this is taking so long.

I think your error was just a transient network failure: the DNS lookup for dandiarchive.s3.amazonaws.com failed. (I suppose we'll want to implement retries.)
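Because the failure was a transient DNS lookup error (`socket.gaierror`, wrapped by urllib3 and finally raised as `requests.exceptions.ConnectionError`), retrying each byte-range read with backoff would likely have masked it. Below is a minimal sketch, assuming the retry would wrap the `requests.get` call that the traceback shows in lindi's `_read_bytes_from_url_or_path`. `call_with_retries` is a hypothetical helper, not part of lindi. It retries on `OSError` by default, which covers `socket.gaierror`, Python's built-in `ConnectionError`, and `requests`' exceptions (they derive from `IOError`, an alias of `OSError`).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on transient errors with jittered exponential backoff.

    Hypothetical helper (not part of lindi). Errors not listed in retry_on
    propagate immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up: re-raise the last error unchanged
            # Delay doubles per attempt (0.5 s, 1 s, 2 s, ...) up to max_delay,
            # scaled by a random factor so parallel readers don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


# Example (assumes `requests` is installed and `url`/`headers` are defined):
# response = call_with_retries(lambda: requests.get(url, headers=headers, timeout=30))
```

The `sleep` parameter is injectable so the backoff can be tested without waiting. Retrying on `OSError` also catches non-transient local errors such as a missing file, so a real implementation might narrow `retry_on` to network exceptions.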

@magland
Collaborator

magland commented Jun 11, 2024

The .lindi.json file is itself around 80 MB, so it takes a bit of time to do the initial download.

Then there are a very large number of objects in the file. But I'm surprised that it takes so long for pynwb to load and process those. I'm taking a closer look...

And the units table I would expect to be very fast to load. Looking into it.

@rly
Contributor

rly commented Jun 11, 2024

It worked for me as well. The initial file open & read took about 30 seconds. The trials dataframe was fast. The units dataframe took another ~1.5 min.

When developing PyNWB/HDMF, we did not try to minimize the number of reads, especially when converting DynamicTable objects to pandas DataFrames, so there are likely to be inefficiencies there.

@magland
Collaborator

magland commented Jun 11, 2024

Regarding the units table: I suspect to_dataframe() may not make much sense in this context, because it may be trying to load all the spike times into the DataFrame. I'm not sure, but that may be why it takes so long. The actual loading of data via lindi should be efficient, though.

@rly
Contributor

rly commented Jun 11, 2024

maybe it is trying to put all the spike times in there

Yeah, all data in the table is read immediately (as opposed to lazily) when converting to a pandas DataFrame.

@oruebel

oruebel commented Jun 11, 2024

When developing PyNWB/HDMF, we did not try to minimize the number of reads, especially when converting DynamicTable

One specific example is reading spike_times from the units table or, more broadly, reading ragged array columns, where values in a VectorData are read via a VectorIndex. The related issue on hdmf_zarr, hdmf-dev/hdmf-zarr#141, describes this problem in more detail, and a corresponding issue on nwb_benchmarks, NeurodataWithoutBorders/nwb_benchmarks#13, tracks adding it to our test suite.
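In a ragged column such as spike_times, every row's values are stored back to back in one flat VectorData, and the VectorIndex holds each row's exclusive end offset, so row `i` is `data[index[i-1]:index[i]]` (with start 0 for row 0). The traceback in the original report shows hdmf resolving rows one at a time (`ret.append(self.__getitem_helper(i, ...))`, which reads `end = self.data[arg]`). Against a remote store, each of those reads can become a separate request. The following plain-Python sketch contrasts per-row resolution with reading the index and the data range once. `CountingArray`, `rows_per_row`, and `rows_bulk` are hypothetical names, and counting `__getitem__` calls is only a proxy for HTTP requests, since lindi/zarr may batch or cache chunk reads.

```python
from typing import Sequence


class CountingArray:
    """Wraps a list and counts indexing calls, standing in for a remote dataset
    where each indexing operation may turn into one or more HTTP requests."""

    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)
        self.reads = 0

    def __len__(self) -> int:
        return len(self._values)

    def __getitem__(self, key):
        self.reads += 1
        return self._values[key]


def rows_per_row(data: CountingArray, index: CountingArray) -> list[list[float]]:
    """Resolve each row separately: up to two index reads plus one data read per row."""
    rows = []
    for i in range(len(index)):
        start = 0 if i == 0 else index[i - 1]
        end = index[i]
        rows.append(list(data[start:end]))
    return rows


def rows_bulk(data: CountingArray, index: CountingArray) -> list[list[float]]:
    """Read the whole index once and the needed data range once, then split locally."""
    ends = list(index[:])  # one index read
    flat = list(data[0:ends[-1]]) if ends else []  # one data read
    rows, start = [], 0
    for end in ends:
        rows.append(flat[start:end])
        start = end
    return rows
```

For three units with index `[2, 5, 6]`, the per-row version issues 5 index reads and 3 data reads, while the bulk version issues 1 of each. The trade-off is that the bulk version holds the full spike_times range in memory at once, so for very large tables a chunked middle ground may be preferable. To sidestep the eager load entirely, hdmf's `DynamicTable.to_dataframe` accepts an `exclude` set of column names, e.g. `nwbfile.units.to_dataframe(exclude={"spike_times"})`. Its effect on load time for this asset has not been measured here.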

@vncntprvst
Author

Thanks for all the feedback, and for developing this tool. Admittedly, I didn't spend much time trying to debug this; I'm on a deadline. I'll definitely use it in my projects; it's really useful.

@magland magland closed this as completed Jun 12, 2024
4 participants