
OSError: Consistency check failed: file should be of size 18612 but has size 18605 (datasets/tau/scrolls@main/scrolls.py). #2645

Open
kaiqinhu opened this issue Oct 30, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

@kaiqinhu

Describe the bug

```python
from datasets import load_dataset, DownloadConfig
from datasets import Dataset

# note: this is a bare method reference, not a call, so it does not clear any cache
Dataset.cleanup_cache_files

scrolls_datasets = ["quality"]
download_config = DownloadConfig(force_download=True)
data = [load_dataset("tau/scrolls", dataset, force_download=True, download_config=download_config) for dataset in scrolls_datasets]
```

Reproduction

No response

Logs

$ python main.py
Downloading builder script:  20%|██████████████████▌                                                                           | 3.68k/18.6k [00:00<00:01, 11.4kB/s]
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    data = [load_dataset("tau/scrolls", dataset, force_download=True, download_config=download_config) for dataset in scrolls_datasets]
  File "main.py", line 12, in <listcomp>
    data = [load_dataset("tau/scrolls", dataset, force_download=True, download_config=download_config) for dataset in scrolls_datasets]
  File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 2606, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 2277, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1923, in dataset_module_factory
    raise e1 from None
  File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1889, in dataset_module_factory
    return HubDatasetModuleFactoryWithScript(
  File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1507, in get_module
    local_path = self.download_loading_script()
  File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1467, in download_loading_script
    return cached_path(file_path, download_config=download_config)
  File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 211, in cached_path
    output_path = get_from_cache(
  File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 690, in get_from_cache
    fsspec_get(
  File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 396, in fsspec_get
    fs.get_file(path, temp_file.name, callback=callback)
  File "/opt/conda/lib/python3.8/site-packages/huggingface_hub/hf_file_system.py", line 640, in get_file
    http_get(
  File "/opt/conda/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 570, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 18612 but has size 18605 (datasets/tau/scrolls@main/scrolls.py).
We are sorry for the inconvenience. Please retry with `force_download=True`.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
Downloading builder script: 100%|█████████████████████████████████████████████████████████████████████████████████████████████▉| 18.6k/18.6k [00:00<00:00, 36.0kB/s]

System info

- huggingface_hub version: 0.25.1
- Platform: Linux-4.9.151-015.ali3000.alios7.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.18
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /ossfs/workspace/hf_hub/token
- Has saved token ?: True
- Who am I ?: hukaiqin
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.3.0
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 9.3.0
- hf_transfer: N/A
- gradio: 4.13.0
- tensorboard: 2.6
- numpy: 1.23.5
- pydantic: 2.5.3
- aiohttp: 3.9.1
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /ossfs/workspace/hf_hub/hub
- HF_ASSETS_CACHE: /ossfs/workspace/hf_hub/assets
- HF_TOKEN_PATH: /ossfs/workspace/hf_hub/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
kaiqinhu added the bug (Something isn't working) label on Oct 30, 2024
@Wauplin
Contributor

Wauplin commented Oct 30, 2024

Hi @kaiqinhu, sorry for the inconvenience. This is usually due to a network issue while downloading. Can you retry with `force_download=True` or on a different network and let us know if the same error happens again (on the same file)? Thanks in advance!
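
For reference, the cache can also be bypassed entirely through `load_dataset`'s `download_mode` parameter; a minimal sketch for this dataset:

```python
from datasets import load_dataset

# "force_redownload" ignores previously cached files and re-fetches both the
# builder script and the data files
data = load_dataset("tau/scrolls", "quality", download_mode="force_redownload")
```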

@kaiqinhu
Author

Thanks for responding, but I already set `force_download=True` in `load_dataset()`, and I can't change the network because of the server-cluster settings.

@Wauplin
Contributor

Wauplin commented Oct 30, 2024

Can you try running

```python
from huggingface_hub import hf_hub_download

hf_hub_download("tau/scrolls", filename="scrolls.py", repo_type="dataset", force_download=True)
```

to check if it does the same? (It has less hidden logic.)
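
If that call succeeds, a minimal follow-up sketch to confirm the download is intact: `hf_hub_download` returns the local file path, so its size on disk can be compared against the 18612 bytes the server advertises (the consistency check in the traceback fails exactly when these differ).

```python
import os

from huggingface_hub import hf_hub_download

# download the builder script, bypassing anything already cached
path = hf_hub_download(
    "tau/scrolls", filename="scrolls.py", repo_type="dataset", force_download=True
)

# a truncated download would report fewer bytes than the expected 18612
print(path, os.path.getsize(path))
```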
