Heterogeneous #684

Draft: wants to merge 146 commits into base: main
146 commits
02afcf1
update
XiaohanZhangCMU May 23, 2024
b0436f9
update
XiaohanZhangCMU May 23, 2024
11df967
update
XiaohanZhangCMU May 23, 2024
cfca07b
update
XiaohanZhangCMU May 23, 2024
2d9d38e
Make cluser id a param
XiaohanZhangCMU May 24, 2024
ee01255
Remove prints
XiaohanZhangCMU May 24, 2024
7e17587
Remove prints
XiaohanZhangCMU May 24, 2024
d8227db
update
XiaohanZhangCMU Jun 2, 2024
f9871e6
Bump pydantic from 2.7.1 to 2.7.2 (#692)
dependabot[bot] Jun 3, 2024
4f338e1
Bump uvicorn from 0.29.0 to 0.30.1 (#691)
dependabot[bot] Jun 3, 2024
f5c57eb
Make sure epoch_size is an int (#693)
snarayan21 Jun 5, 2024
55c9f85
Bump databricks-sdk from 0.27.1 to 0.28.0 (#687)
dependabot[bot] Jun 5, 2024
f7c7a9a
Bump pytest from 8.2.1 to 8.2.2 (#697)
dependabot[bot] Jun 11, 2024
8904102
fix: expand user path for Writer's output directory. (#694)
huxuan Jun 11, 2024
1af4365
Bump pydantic from 2.7.2 to 2.7.3 (#696)
dependabot[bot] Jun 11, 2024
be97343
Fix edge cases with scalar or empty numpy array encoding (#702)
snarayan21 Jun 14, 2024
dc8ac7b
Raise IndexError in `Spanner` object instead of `ValueError` (#701)
snarayan21 Jun 14, 2024
ea4f0c3
Fix linting issues with numpy 2 (#705)
snarayan21 Jun 17, 2024
2b53acb
Bump pydantic from 2.7.3 to 2.7.4 (#704)
dependabot[bot] Jun 17, 2024
a5b9eea
Enable correct resumption from the end of an epoch (#700)
snarayan21 Jun 18, 2024
27d61d8
Fix `drop_first` checking in partitioning to account for `world_size`…
snarayan21 Jun 18, 2024
eb61352
fix convert imagenet (#708)
Hprairie Jun 20, 2024
02db72d
Bump pytest-split from 0.8.2 to 0.9.0 (#710)
dependabot[bot] Jun 24, 2024
517dc6d
Remove duplicate `dbfs:` prefix from error message (#712)
vanshcsingh Jun 25, 2024
67ab85c
a (#713)
bigning Jun 28, 2024
7089eef
Upgrade ci_testing, remove codeql (#714)
snarayan21 Jun 28, 2024
d9198ad
Wrap with nparray (#719)
XiaohanZhangCMU Jul 9, 2024
6bbcf5a
Bump pydantic from 2.7.4 to 2.8.2 (#718)
dependabot[bot] Jul 9, 2024
2f7defa
Bump databricks-sdk from 0.28.0 to 0.29.0 (#715)
dependabot[bot] Jul 9, 2024
55e83ec
Add HF File System Support to Streaming (#711)
orionw Jul 11, 2024
5f939c9
Improve error message on non-0 rank when index file download failed (…
bigning Jul 12, 2024
083191f
Bump pytest from 8.2.2 to 8.3.2 (#735)
dependabot[bot] Jul 29, 2024
551f360
Bump uvicorn from 0.30.1 to 0.30.3 (#730)
dependabot[bot] Jul 29, 2024
4f3bc22
Bump fastapi from 0.111.0 to 0.111.1 (#724)
dependabot[bot] Jul 29, 2024
b14cd7a
Update _version.py (#738)
mvpatel2000 Jul 30, 2024
54b6801
Make Pytest log in color in Github Action (#739)
eitanturok Jul 30, 2024
3a6a549
fix azure container name and blob name in download_from_azure (#733)
jaehwana2z Aug 2, 2024
2580d40
update
XiaohanZhangCMU May 23, 2024
dd8f1b9
update
XiaohanZhangCMU May 23, 2024
bcb9429
update
XiaohanZhangCMU May 23, 2024
3c50853
update
XiaohanZhangCMU May 23, 2024
df799fd
Make cluser id a param
XiaohanZhangCMU May 24, 2024
872d08d
Remove prints
XiaohanZhangCMU May 24, 2024
e975501
Remove prints
XiaohanZhangCMU May 24, 2024
03393dd
update
XiaohanZhangCMU Jun 2, 2024
44eb7f4
Add dbsql
XiaohanZhangCMU Aug 2, 2024
c9263ad
merge
XiaohanZhangCMU Aug 2, 2024
58a296f
update
XiaohanZhangCMU Aug 2, 2024
58fc267
update
XiaohanZhangCMU Aug 2, 2024
e591f35
update
XiaohanZhangCMU Aug 2, 2024
7332b0a
update
XiaohanZhangCMU Aug 2, 2024
d3c7e2f
update
XiaohanZhangCMU Aug 2, 2024
fac3b28
update
XiaohanZhangCMU Aug 2, 2024
2b632ef
update
XiaohanZhangCMU Aug 2, 2024
22105c3
update
XiaohanZhangCMU Aug 2, 2024
3de533e
update
XiaohanZhangCMU Aug 2, 2024
13c7d33
update
XiaohanZhangCMU Aug 2, 2024
e273453
update
XiaohanZhangCMU Aug 2, 2024
9a0b09b
update
XiaohanZhangCMU Aug 2, 2024
4bde02a
update
XiaohanZhangCMU Aug 2, 2024
7ce0c14
update
XiaohanZhangCMU Aug 2, 2024
0f3dbca
update
XiaohanZhangCMU Aug 2, 2024
d28f113
update
XiaohanZhangCMU Aug 2, 2024
493f186
update
XiaohanZhangCMU Aug 2, 2024
0c3917a
update
XiaohanZhangCMU Aug 2, 2024
5ba2200
update
XiaohanZhangCMU Aug 3, 2024
0bda7a2
update
XiaohanZhangCMU Aug 3, 2024
9d8e642
update
XiaohanZhangCMU Aug 3, 2024
e13fd71
update
XiaohanZhangCMU Aug 3, 2024
5e95aba
update
XiaohanZhangCMU Aug 3, 2024
4f29dfa
update
XiaohanZhangCMU Aug 3, 2024
ee5b568
update
XiaohanZhangCMU Aug 3, 2024
18ce277
update
XiaohanZhangCMU Aug 3, 2024
792efe9
update
XiaohanZhangCMU Aug 3, 2024
d0922bd
update
XiaohanZhangCMU Aug 3, 2024
c3c4715
update
XiaohanZhangCMU Aug 3, 2024
75401ed
update
XiaohanZhangCMU Aug 3, 2024
4c8545d
update
XiaohanZhangCMU Aug 3, 2024
df103e8
update
XiaohanZhangCMU Aug 3, 2024
0eb61aa
update
XiaohanZhangCMU Aug 3, 2024
3cd0f24
update
XiaohanZhangCMU Aug 3, 2024
a6e1ec0
update
XiaohanZhangCMU Aug 3, 2024
2e42bc9
update
XiaohanZhangCMU Aug 3, 2024
82b04d5
update
XiaohanZhangCMU Aug 3, 2024
184c44d
update
XiaohanZhangCMU Aug 3, 2024
0e36d21
update
XiaohanZhangCMU Aug 4, 2024
c6795b3
update
XiaohanZhangCMU Aug 4, 2024
0560271
update
XiaohanZhangCMU Aug 4, 2024
837c7cc
update
XiaohanZhangCMU Aug 9, 2024
cdef3df
update
XiaohanZhangCMU Aug 9, 2024
59d19ac
update
XiaohanZhangCMU Aug 11, 2024
4bbc8fd
update
XiaohanZhangCMU Aug 11, 2024
9050650
update
XiaohanZhangCMU Aug 11, 2024
2079d7f
update
XiaohanZhangCMU Aug 11, 2024
6f1e84a
update
XiaohanZhangCMU Aug 11, 2024
8f17ea2
update
XiaohanZhangCMU Aug 11, 2024
c7613d8
update
XiaohanZhangCMU Aug 11, 2024
44d0a6e
update
XiaohanZhangCMU Aug 11, 2024
2de90e7
update
XiaohanZhangCMU Aug 11, 2024
348f183
update
XiaohanZhangCMU Aug 11, 2024
d188c5a
update
XiaohanZhangCMU Aug 11, 2024
2b526c7
Fix column ordering
XiaohanZhangCMU Aug 21, 2024
335a78b
Fixed column size should appear in index.json
XiaohanZhangCMU Aug 22, 2024
caf9ce6
update
XiaohanZhangCMU Aug 28, 2024
34ab263
update
XiaohanZhangCMU Aug 29, 2024
0742df0
update
XiaohanZhangCMU Aug 30, 2024
45bb357
update
XiaohanZhangCMU Sep 6, 2024
9b719ae
Add print
XiaohanZhangCMU Sep 6, 2024
b5fe6c5
Add broadcast
XiaohanZhangCMU Sep 11, 2024
aa4ede7
update tests
XiaohanZhangCMU Sep 11, 2024
29ec017
update
XiaohanZhangCMU Sep 11, 2024
b4d3698
update
XiaohanZhangCMU Sep 11, 2024
9110fde
update
XiaohanZhangCMU Sep 11, 2024
30cae09
update
XiaohanZhangCMU Sep 11, 2024
01972e4
update
XiaohanZhangCMU Sep 12, 2024
6ba7e36
update
XiaohanZhangCMU Sep 12, 2024
825b586
update
XiaohanZhangCMU Sep 12, 2024
5abb617
update
XiaohanZhangCMU Sep 12, 2024
f1b07e1
update
XiaohanZhangCMU Sep 12, 2024
f1d1cd7
update
XiaohanZhangCMU Sep 12, 2024
44b9cc3
update
XiaohanZhangCMU Sep 12, 2024
cb08ce8
update
XiaohanZhangCMU Sep 12, 2024
052af99
update
XiaohanZhangCMU Sep 13, 2024
d83c7da
update
XiaohanZhangCMU Sep 13, 2024
9faed31
update
XiaohanZhangCMU Sep 13, 2024
cd9bb0c
update
XiaohanZhangCMU Sep 13, 2024
c88aef4
update
XiaohanZhangCMU Sep 13, 2024
c0e21b7
update
XiaohanZhangCMU Sep 13, 2024
9ffdb2d
update
XiaohanZhangCMU Sep 13, 2024
2cd31d9
update
XiaohanZhangCMU Sep 13, 2024
c54cb14
update
XiaohanZhangCMU Sep 13, 2024
3092e84
update
XiaohanZhangCMU Sep 13, 2024
91a56c9
update
XiaohanZhangCMU Sep 13, 2024
03e6309
update
XiaohanZhangCMU Sep 14, 2024
9179260
update
XiaohanZhangCMU Sep 14, 2024
565963f
update
XiaohanZhangCMU Sep 14, 2024
ab477cf
update
XiaohanZhangCMU Sep 14, 2024
fc2d9eb
update
XiaohanZhangCMU Sep 14, 2024
d825105
update
XiaohanZhangCMU Sep 16, 2024
b36770d
update
XiaohanZhangCMU Sep 18, 2024
95bb76a
update
XiaohanZhangCMU Sep 18, 2024
8744c82
update
XiaohanZhangCMU Sep 18, 2024
c531aa4
update
XiaohanZhangCMU Sep 19, 2024
745290e
update
XiaohanZhangCMU Sep 19, 2024
177631d
update
XiaohanZhangCMU Sep 19, 2024
2cce6ea
get host/token from WorkspaceClient.config
XiaohanZhangCMU Sep 29, 2024
58 changes: 0 additions & 58 deletions .github/workflows/codeql-analysis.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/linting.yaml
@@ -32,7 +32,7 @@ jobs:
uses: actions/checkout@v3
with:
repository: mosaicml/ci-testing
- ref: v0.0.2
+ ref: v0.0.9
path: ./ci-testing
- uses: ./ci-testing/.github/actions/code-quality
with:
18 changes: 18 additions & 0 deletions docs/source/how_to_guides/configure_cloud_storage_credentials.md
@@ -7,6 +7,7 @@ Streaming dataset supports the following cloud storage providers to stream your
- [Oracle Cloud Storage](#oracle-cloud-storage)
- [Azure Blob Storage](#azure-blob-storage-and-azure-datalake)
- [Databricks](#databricks)
- [Huggingface Datasets](#huggingface-datasets)

## Amazon S3

@@ -251,6 +252,23 @@ export AZURE_ACCOUNT_ACCESS_KEY='NN1KHxKKkj20ZO92EMiDQjx3wp2kZG4UUvfAGlgGWRn6sPR
```
````

## Huggingface Datasets

To authenticate Hugging Face Hub access, users must set their Hugging Face token ([HF_TOKEN](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/environment_variables#hftoken)) in the run environment. See [Hugging Face's documentation](https://huggingface.co/docs/huggingface_hub/guides/hf_file_system) for the URL format.

Set the Hugging Face token in the run environment as shown below:

````{tabs}
```{code-tab} py
import os
os.environ['HF_TOKEN'] = 'EXAMPLEFODNN7EXAMPLE'
```

```{code-tab} sh
export HF_TOKEN='EXAMPLEFODNN7EXAMPLE'
```
````
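
The section above can be complemented by a small pre-flight check. The following is only a sketch, not part of this PR: `require_hf_token` and `hf_dataset_url` are hypothetical helper names, `my-org/my-dataset` is a placeholder repository, and the URL layout assumes the `hf://datasets/<repo_id>[@<revision>]/<path>` form described in the linked Hugging Face file system guide. It does not URL-encode revisions that contain special characters.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN from the environment, or fail with a clear message.

    Hypothetical helper: streaming itself does not expose this function.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before reading from the Hugging Face Hub."
        )
    return token


def hf_dataset_url(repo_id: str, path_in_repo: str = "",
                   revision: Optional[str] = None) -> str:
    """Build an hf:// URL for a dataset repository (assumed URL layout)."""
    url = f"hf://datasets/{repo_id}"
    if revision:
        # Simple revisions only; special characters are not encoded here.
        url += f"@{revision}"
    if path_in_repo:
        url += "/" + path_in_repo.strip("/")
    return url


if __name__ == "__main__":
    # 'my-org/my-dataset' is a placeholder, not a real repository.
    print(hf_dataset_url("my-org/my-dataset", "train"))
```

Checking the token up front tends to surface a missing credential as one readable error rather than as a failed download later in the run.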

## Databricks

To authenticate Databricks access for both Unity Catalog and Databricks File System (DBFS), users must set their Databricks host (`DATABRICKS_HOST`) and access token (`DATABRICKS_TOKEN`) in the run environment.
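
As a minimal sketch of the requirement above, the snippet below reports which of the two variables are missing before any Databricks access is attempted. `missing_databricks_vars` is a hypothetical helper introduced for illustration; only the variable names `DATABRICKS_HOST` and `DATABRICKS_TOKEN` come from the text.

```python
import os
from typing import List, Mapping

# Variable names taken from the documentation paragraph above.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_databricks_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Databricks credentials found in the environment.")
```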
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -101,7 +101,7 @@ reportUnusedCoroutine = "error"
# Pytest
[tool.pytest.ini_options]
# By default, do not run remote tests
- addopts = "--cov=streaming --cov-fail-under=50 --codeblocks --strict-markers -m 'not daily and not remote' -ra --tb=native"
+ addopts = "--cov=streaming --cov-fail-under=50 --codeblocks --strict-markers -m 'not daily and not remote' -ra --tb=native --color=yes"

markers = [
# For distributed testing