Occasional failure in HTTP bytes #85

Open
mrocklin opened this issue Jun 24, 2019 · 7 comments

@mrocklin
Member

When running CI in this project I sometimes run across the following error:

~/miniconda/envs/test/lib/python3.7/site-packages/dask/bag/core.py in reify()
   1603 def reify(seq):
   1604     if isinstance(seq, Iterator):
-> 1605         seq = list(seq)
   1606     if seq and isinstance(seq[0], Iterator):
   1607         seq = list(map(list, seq))
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bag/core.py in map_chunk()
   1769                 yield f(**k)
   1770     else:
-> 1771         for a in zip(*args):
   1772             yield f(*a)
   1773 
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bag/text.py in file_to_blocks()
    103 def file_to_blocks(lazy_file):
    104     with lazy_file as f:
--> 105         for line in f:
    106             yield line
    107 
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bytes/http.py in read()
    247             # EOF (python files don't error, just return no data)
    248             return b''
--> 249         self._fetch(self.loc, end)
    250         data = self.cache[self.loc - self.start:end - self.start]
    251         self.loc = end
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bytes/http.py in _fetch()
    258             self.start = start
    259             self.end = end + self.blocksize
--> 260             self.cache = self._fetch_range(start, self.end)
    261         elif start < self.start:
    262             if self.end - end > self.blocksize:
~/miniconda/envs/test/lib/python3.7/site-packages/dask/bytes/http.py in _fetch_range()
    320             if cl <= end - start:
    321                 # data size OK
--> 322                 return r.content
    323             else:
    324                 raise ValueError('Got more bytes (%i) than requested (%i)' % (
~/miniconda/envs/test/lib/python3.7/site-packages/requests/models.py in content()
    826                 self._content = None
    827             else:
--> 828                 self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
    829 
    830         self._content_consumed = True
~/miniconda/envs/test/lib/python3.7/site-packages/requests/models.py in generate()
    751                         yield chunk
    752                 except ProtocolError as e:
--> 753                     raise ChunkedEncodingError(e)
    754                 except DecodeError as e:
    755                     raise ContentDecodingError(e)
ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
You can ignore this error by setting the following in conf.py:
    nbsphinx_allow_errors = True
Notebook error:
CellExecutionError in applications/json-data-on-the-web.ipynb:
------------------
df.spec.value_counts().nlargest(20).to_frame().compute()
------------------

@martindurant, this seems to be in your general domain. Do you have any suggestions on what might be happening here?

@martindurant
Member

I'm not sure there's much we can do about broken connections; I can't see how it could be any fault of ours. Retries could be built into HTTPFileSystem, but perhaps it's better to retry the whole task in such cases.

@mrocklin
Member Author

Is there a good reason to avoid retries in HTTPFileSystem?

@martindurant
Member

No, but a couple of things that make it tricky:

  • it is hard to decide which set of errors should lead to a retry; perhaps we would have to retry on everything
  • some things, like establishing the initial connection, are already retried by requests/urllib
  • if it's a timeout, then a set of retries might take a very long time to fail
  • in the fsspec implementation, there is a non-seekable fallback mode, used when the file size is unavailable, that gives you a requests file-like object rather than an HTTPFile. I don't think we can easily intercept its read methods in order to catch errors.
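
The first and third concerns in the list above (which errors to retry, and how long a chain of retries takes to fail) can be made concrete with a small wrapper. This is only a sketch under stated assumptions, not anything that exists in dask or fsspec: `call_with_retries`, its parameters, and the default exception set are hypothetical names chosen for illustration.

```python
import time


def call_with_retries(func, retry_on=(ConnectionError, TimeoutError),
                      attempts=3, delay=0.1, deadline=None):
    """Call func() and retry only on the exception types in retry_on.

    Hypothetical helper for illustration; not part of dask or fsspec.
    `deadline` is an optional overall time budget in seconds, so that a
    series of slow timeouts cannot keep retrying indefinitely.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    start = time.monotonic()
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            out_of_attempts = attempt == attempts
            out_of_time = (deadline is not None
                           and time.monotonic() - start + delay > deadline)
            if out_of_attempts or out_of_time:
                raise  # give up and surface the most recent error
            time.sleep(delay)
```

Errors outside `retry_on` propagate immediately, which keeps the retry set explicit. To cover the failure in the traceback, one would presumably pass something like `retry_on=(requests.exceptions.ChunkedEncodingError, requests.exceptions.ConnectionError)`, since those do not derive from the built-in `ConnectionError`.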

@martindurant
Member

This SO answer might be the best way to do it globally: https://stackoverflow.com/a/15431343/3821154. It lets you be explicit about retries following a connection error, and the policy applies to all connections within a session.
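
A minimal sketch of that session-level approach, assuming `requests` and its `urllib3` dependency are installed. The helper name `make_retrying_session` and the particular retry counts and backoff value are illustrative choices, not something taken from dask or fsspec.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Return a requests.Session whose adapters retry failed requests.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    retry = Retry(
        total=total,                    # overall cap on retries
        connect=total,                  # connection-establishment errors
        read=total,                     # errors after the request was sent
        backoff_factor=backoff_factor,  # sleep between attempts grows with this
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mounting on both prefixes applies the policy to every URL
    # fetched through this session.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

As far as I can tell, an adapter-level policy like this acts while the request is being sent and the response headers are awaited. A reset in the middle of reading the body, like the `ChunkedEncodingError` in the traceback above, may not be covered by it, so this is likely only part of a solution.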

@ahirner

ahirner commented Jul 13, 2020

There has been quite a bit of refactoring of fsspec's HTTP implementation lately.

Are dask tests still flaky?
AFAICS, fsspec now returns an HTTPFile even if range requests are not possible. Does that mean a retry policy in fsspec makes more sense now @martindurant?

@martindurant
Member

HTTPFileSystem might now return an HTTPStreamFile where previously it returned a raw file-like requests response object. I don't think this changes anything from dask's point of view, except that we don't even try the "let's see if this is smaller than a block" approach. A retry would have to be for the whole of the request, not for each call to read. However, a retry on establishing the connection (here) would make sense.
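
To illustrate why a retry on a stream has to cover the whole request: a non-seekable stream cannot rewind to the failed position, so the only option is to reopen it and start over. The sketch below assumes a caller-supplied `open_stream` callable that returns a fresh file-like context manager each time; both that callable and `read_stream_with_restart` are hypothetical names, not fsspec API.

```python
def read_stream_with_restart(open_stream, attempts=3, chunk_size=65536,
                             retry_on=(OSError,)):
    """Read a non-seekable stream to the end, restarting from byte 0 on error.

    Hypothetical helper for illustration; not part of dask or fsspec.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        chunks = []  # partial data from a failed attempt is discarded
        try:
            with open_stream() as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        return b"".join(chunks)
                    chunks.append(chunk)
        except retry_on as e:
            last_error = e  # remember the error and reopen the stream
    raise last_error
```

The default `retry_on=(OSError,)` is deliberately broad: `requests.exceptions.RequestException` derives from `IOError`, which is an alias of `OSError` in Python 3, so it would also catch the `ChunkedEncodingError` shown earlier. A narrower tuple is probably preferable in practice.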

@martindurant
Member

(feel free to implement that in a PR, in case you have the time)

@jacobtomlinson jacobtomlinson added the help wanted Extra attention is needed label Oct 14, 2021