

[feature] add git lfs for download data (#147)
* add git lfs for download data

* enable git oss cache

* update docs

* fix get project name bug
chengmengli06 authored Mar 29, 2022
1 parent f39329f commit 1938992
Showing 11 changed files with 600 additions and 26 deletions.
26 changes: 26 additions & 0 deletions .git_bin_path
@@ -0,0 +1,26 @@
{"leaf_name": "data/test", "leaf_file": ["data/test/batch_criteo_sample.tfrecord", "data/test/criteo_sample.tfrecord", "data/test/dwd_avazu_ctr_deepmodel_10w.csv", "data/test/embed_data.csv", "data/test/lookup_data.csv", "data/test/tag_kv_data.csv", "data/test/test.csv", "data/test/test_sample_weight.txt", "data/test/test_with_quote.csv"]}
{"leaf_name": "data/test/export", "leaf_file": ["data/test/export/data.csv"]}
{"leaf_name": "data/test/hpo_test/eval_val", "leaf_file": ["data/test/hpo_test/eval_val/events.out.tfevents.1597889819.j63d04245.sqa.eu95"]}
{"leaf_name": "data/test/inference", "leaf_file": ["data/test/inference/lookup_data_test80.csv", "data/test/inference/taobao_infer_data.txt"]}
{"leaf_name": "data/test/inference/fg_export_multi", "leaf_file": ["data/test/inference/fg_export_multi/saved_model.pb"]}
{"leaf_name": "data/test/inference/fg_export_multi/assets", "leaf_file": ["data/test/inference/fg_export_multi/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/fg_export_multi/variables", "leaf_file": ["data/test/inference/fg_export_multi/variables/variables.data-00000-of-00001", "data/test/inference/fg_export_multi/variables/variables.index"]}
{"leaf_name": "data/test/inference/fg_export_single", "leaf_file": ["data/test/inference/fg_export_single/saved_model.pb"]}
{"leaf_name": "data/test/inference/fg_export_single/assets", "leaf_file": ["data/test/inference/fg_export_single/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/fg_export_single/variables", "leaf_file": ["data/test/inference/fg_export_single/variables/variables.data-00000-of-00001", "data/test/inference/fg_export_single/variables/variables.index"]}
{"leaf_name": "data/test/inference/fm_export", "leaf_file": ["data/test/inference/fm_export/saved_model.pb"]}
{"leaf_name": "data/test/inference/fm_export/assets", "leaf_file": ["data/test/inference/fm_export/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/fm_export/variables", "leaf_file": ["data/test/inference/fm_export/variables/variables.data-00000-of-00001", "data/test/inference/fm_export/variables/variables.index"]}
{"leaf_name": "data/test/inference/lookup_export", "leaf_file": ["data/test/inference/lookup_export/saved_model.pb"]}
{"leaf_name": "data/test/inference/lookup_export/assets", "leaf_file": ["data/test/inference/lookup_export/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/lookup_export/variables", "leaf_file": ["data/test/inference/lookup_export/variables/variables.data-00000-of-00001", "data/test/inference/lookup_export/variables/variables.index"]}
{"leaf_name": "data/test/inference/tb_multitower_export", "leaf_file": ["data/test/inference/tb_multitower_export/saved_model.pb"]}
{"leaf_name": "data/test/inference/tb_multitower_export/assets", "leaf_file": ["data/test/inference/tb_multitower_export/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/tb_multitower_export/variables", "leaf_file": ["data/test/inference/tb_multitower_export/variables/variables.data-00000-of-00001", "data/test/inference/tb_multitower_export/variables/variables.index"]}
{"leaf_name": "data/test/inference/tb_multitower_placeholder_rename_export", "leaf_file": ["data/test/inference/tb_multitower_placeholder_rename_export/saved_model.pb"]}
{"leaf_name": "data/test/inference/tb_multitower_placeholder_rename_export/assets", "leaf_file": ["data/test/inference/tb_multitower_placeholder_rename_export/assets/pipeline.config"]}
{"leaf_name": "data/test/inference/tb_multitower_placeholder_rename_export/variables", "leaf_file": ["data/test/inference/tb_multitower_placeholder_rename_export/variables/variables.data-00000-of-00001", "data/test/inference/tb_multitower_placeholder_rename_export/variables/variables.index"]}
{"leaf_name": "data/test/latest_ckpt_test", "leaf_file": ["data/test/latest_ckpt_test/model.ckpt-500.data-00000-of-00001", "data/test/latest_ckpt_test/model.ckpt-500.index", "data/test/latest_ckpt_test/model.ckpt-500.meta"]}
{"leaf_name": "data/test/rtp", "leaf_file": ["data/test/rtp/taobao_fg_pred.out", "data/test/rtp/taobao_test_bucketize_feature.txt", "data/test/rtp/taobao_test_feature.txt", "data/test/rtp/taobao_test_input.txt", "data/test/rtp/taobao_train_bucketize_feature.txt", "data/test/rtp/taobao_train_feature.txt", "data/test/rtp/taobao_train_input.txt", "data/test/rtp/taobao_valid.csv", "data/test/rtp/taobao_valid_feature.txt"]}
{"leaf_name": "data/test/tb_data", "leaf_file": ["data/test/tb_data/taobao_ad_feature_gl", "data/test/tb_data/taobao_clk_edge_gl", "data/test/tb_data/taobao_multi_seq_test_data", "data/test/tb_data/taobao_multi_seq_train_data", "data/test/tb_data/taobao_noclk_edge_gl", "data/test/tb_data/taobao_test_data", "data/test/tb_data/taobao_test_data_kd", "data/test/tb_data/taobao_train_data", "data/test/tb_data/taobao_train_data_kd", "data/test/tb_data/taobao_user_profile_gl"]}
{"leaf_name": "data/test/tb_data_with_time", "leaf_file": ["data/test/tb_data_with_time/taobao_test_data_with_time", "data/test/tb_data_with_time/taobao_train_data_with_time"]}
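`.git_bin_path` is a JSON-lines index: each line names one tracked leaf directory (`leaf_name`) and lists the files stored directly under it (`leaf_file`). The sketch below is not taken from `git_lfs.py`. It is a minimal reader written against the format shown above, using hypothetical helper names `load_bin_path` and `check_leaf_files`, that loads the index and flags any listed file that does not sit directly inside its leaf directory.

```python
import json
import os
from typing import Dict, List


def load_bin_path(lines: List[str]) -> Dict[str, List[str]]:
    """Map each tracked leaf directory to the files listed under it."""
    leaves: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        leaves[record["leaf_name"]] = record["leaf_file"]
    return leaves


def check_leaf_files(leaves: Dict[str, List[str]]) -> List[str]:
    """Return listed files whose parent directory is not their leaf_name."""
    bad = []
    for leaf, files in leaves.items():
        for path in files:
            if os.path.dirname(path) != leaf:
                bad.append(path)
    return bad


# Two lines copied from the .git_bin_path diff above.
sample = [
    '{"leaf_name": "data/test/export", "leaf_file": ["data/test/export/data.csv"]}',
    '{"leaf_name": "data/test/inference/fm_export", '
    '"leaf_file": ["data/test/inference/fm_export/saved_model.pb"]}',
]
leaves = load_bin_path(sample)
print(sorted(leaves))
print(check_leaf_files(leaves))
```

Keeping one record per directory (rather than per file) matches the layout above, where each leaf directory is uploaded as a single unit in `.git_bin_url`.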
26 changes: 26 additions & 0 deletions .git_bin_url
@@ -0,0 +1,26 @@
{"leaf_path": "data/test", "sig": "656d73b4e78d0d71e98120050bc51387", "remote_path": "data/git_oss_sample_data/data_test_656d73b4e78d0d71e98120050bc51387"}
{"leaf_path": "data/test/export", "sig": "c2e5ad1e91edb55b215ea108b9f14537", "remote_path": "data/git_oss_sample_data/data_test_export_c2e5ad1e91edb55b215ea108b9f14537"}
{"leaf_path": "data/test/hpo_test/eval_val", "sig": "fef5f6cd659c35b713c1b8bcb97c698f", "remote_path": "data/git_oss_sample_data/data_test_hpo_test_eval_val_fef5f6cd659c35b713c1b8bcb97c698f"}
{"leaf_path": "data/test/inference", "sig": "e2c4b0f07ff8568eb7b8e1819326d296", "remote_path": "data/git_oss_sample_data/data_test_inference_e2c4b0f07ff8568eb7b8e1819326d296"}
{"leaf_path": "data/test/inference/fg_export_multi", "sig": "c6690cef053aed9e2011bbef90ef33e7", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_multi_c6690cef053aed9e2011bbef90ef33e7"}
{"leaf_path": "data/test/inference/fg_export_multi/assets", "sig": "7fe7a4525f5d46cc763172f5200e96e0", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_multi_assets_7fe7a4525f5d46cc763172f5200e96e0"}
{"leaf_path": "data/test/inference/fg_export_multi/variables", "sig": "1f9aad9744382c6d5b5f152d556d9b30", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_multi_variables_1f9aad9744382c6d5b5f152d556d9b30"}
{"leaf_path": "data/test/inference/fg_export_single", "sig": "c314cb4b77db30084cf5964bee6a0844", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_single_c314cb4b77db30084cf5964bee6a0844"}
{"leaf_path": "data/test/inference/fg_export_single/assets", "sig": "7fe7a4525f5d46cc763172f5200e96e0", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_single_assets_7fe7a4525f5d46cc763172f5200e96e0"}
{"leaf_path": "data/test/inference/fg_export_single/variables", "sig": "1f9aad9744382c6d5b5f152d556d9b30", "remote_path": "data/git_oss_sample_data/data_test_inference_fg_export_single_variables_1f9aad9744382c6d5b5f152d556d9b30"}
{"leaf_path": "data/test/inference/fm_export", "sig": "832943ab03b88e22c21a6624230c54bd", "remote_path": "data/git_oss_sample_data/data_test_inference_fm_export_832943ab03b88e22c21a6624230c54bd"}
{"leaf_path": "data/test/inference/fm_export/assets", "sig": "6e4e5aa125ff052fe1a1328df0737e0e", "remote_path": "data/git_oss_sample_data/data_test_inference_fm_export_assets_6e4e5aa125ff052fe1a1328df0737e0e"}
{"leaf_path": "data/test/inference/fm_export/variables", "sig": "8e7debd5c7417db815f3abd3e2940cfa", "remote_path": "data/git_oss_sample_data/data_test_inference_fm_export_variables_8e7debd5c7417db815f3abd3e2940cfa"}
{"leaf_path": "data/test/inference/lookup_export", "sig": "d0dd1bb6dd53617ddbf7cc66e8ebb102", "remote_path": "data/git_oss_sample_data/data_test_inference_lookup_export_d0dd1bb6dd53617ddbf7cc66e8ebb102"}
{"leaf_path": "data/test/inference/lookup_export/assets", "sig": "d1888265db24724e295f2249c66d8554", "remote_path": "data/git_oss_sample_data/data_test_inference_lookup_export_assets_d1888265db24724e295f2249c66d8554"}
{"leaf_path": "data/test/inference/lookup_export/variables", "sig": "adc3dc59b12dee9a1408b8b532247fc0", "remote_path": "data/git_oss_sample_data/data_test_inference_lookup_export_variables_adc3dc59b12dee9a1408b8b532247fc0"}
{"leaf_path": "data/test/inference/tb_multitower_export", "sig": "140de4544cd9d9c6e19a79df53d82611", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_export_140de4544cd9d9c6e19a79df53d82611"}
{"leaf_path": "data/test/inference/tb_multitower_export/assets", "sig": "e7ef90fa947d1c2de35d8d674d8c8d6c", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_export_assets_e7ef90fa947d1c2de35d8d674d8c8d6c"}
{"leaf_path": "data/test/inference/tb_multitower_export/variables", "sig": "198e6d7fbbe1aba7e314cc7be4ec1684", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_export_variables_198e6d7fbbe1aba7e314cc7be4ec1684"}
{"leaf_path": "data/test/inference/tb_multitower_placeholder_rename_export", "sig": "dc05357e52fd574cba48165bc67af906", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_placeholder_rename_export_dc05357e52fd574cba48165bc67af906"}
{"leaf_path": "data/test/inference/tb_multitower_placeholder_rename_export/assets", "sig": "750925c4866bf1db8c3188f604271c72", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_placeholder_rename_export_assets_750925c4866bf1db8c3188f604271c72"}
{"leaf_path": "data/test/inference/tb_multitower_placeholder_rename_export/variables", "sig": "56850b4506014ce1bd3ba9b6d60e2770", "remote_path": "data/git_oss_sample_data/data_test_inference_tb_multitower_placeholder_rename_export_variables_56850b4506014ce1bd3ba9b6d60e2770"}
{"leaf_path": "data/test/latest_ckpt_test", "sig": "d41d8cd98f00b204e9800998ecf8427e", "remote_path": "data/git_oss_sample_data/data_test_latest_ckpt_test_d41d8cd98f00b204e9800998ecf8427e"}
{"leaf_path": "data/test/rtp", "sig": "76cda60582617ddbb7cd5a49eb68a4b9", "remote_path": "data/git_oss_sample_data/data_test_rtp_76cda60582617ddbb7cd5a49eb68a4b9"}
{"leaf_path": "data/test/tb_data", "sig": "2005981b8f3eafd0ba74ab08fee0050b", "remote_path": "data/git_oss_sample_data/data_test_tb_data_2005981b8f3eafd0ba74ab08fee0050b"}
{"leaf_path": "data/test/tb_data_with_time", "sig": "1a7648f4ae55faf37855762bccbb70cc", "remote_path": "data/git_oss_sample_data/data_test_tb_data_with_time_1a7648f4ae55faf37855762bccbb70cc"}
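In every `.git_bin_url` entry above, `remote_path` appears to follow one pattern: the `git_oss_data_dir` prefix, then `leaf_path` with `/` replaced by `_`, then `_` and the 32-character `sig`. That pattern is inferred from the listed entries, not from the `git_lfs.py` source, and the sketch below does not attempt to reproduce how `sig` itself is computed. The helper names `expected_remote_path` and `check_entry` are hypothetical.

```python
import json

# Value of git_oss_data_dir in .git_oss_config_pub.
GIT_OSS_DATA_DIR = "data/git_oss_sample_data"


def expected_remote_path(leaf_path: str, sig: str,
                         data_dir: str = GIT_OSS_DATA_DIR) -> str:
    """Rebuild remote_path from leaf_path and sig (pattern inferred from .git_bin_url)."""
    return "%s/%s_%s" % (data_dir, leaf_path.replace("/", "_"), sig)


def check_entry(line: str) -> bool:
    """True if a .git_bin_url line follows the inferred naming pattern."""
    entry = json.loads(line)
    return entry["remote_path"] == expected_remote_path(entry["leaf_path"], entry["sig"])


# One line copied from the .git_bin_url diff above.
line = ('{"leaf_path": "data/test/hpo_test/eval_val", '
        '"sig": "fef5f6cd659c35b713c1b8bcb97c698f", '
        '"remote_path": "data/git_oss_sample_data/'
        'data_test_hpo_test_eval_val_fef5f6cd659c35b713c1b8bcb97c698f"}')
print(check_entry(line))  # True
```

Because `sig` is part of the object key, a changed leaf directory would be stored under a new remote path rather than overwriting the old one, assuming `sig` changes with the directory contents.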
5 changes: 5 additions & 0 deletions .git_oss_config_pub
@@ -0,0 +1,5 @@
bucket_name = easyrec
git_oss_data_dir = data/git_oss_sample_data
host = oss-cn-beijing.aliyuncs.com
git_oss_cache_dir = ${TMPDIR}/${PROJECT_NAME}/.git_oss_cache
git_oss_private_config = ~/.git_oss_config_private
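`.git_oss_config_pub` is a flat `key = value` file whose values may reference `${TMPDIR}`, `${PROJECT_NAME}` and `~`. Below is a minimal sketch of reading it, assuming values get shell-style `${VAR}` substitution followed by home-directory expansion. `parse_git_oss_config` is a hypothetical name, and this commit does not show how `git_lfs.py` derives `PROJECT_NAME`, so the example passes it in explicitly.

```python
import os
from string import Template
from typing import Dict, Iterable


def parse_git_oss_config(lines: Iterable[str], env: Dict[str, str]) -> Dict[str, str]:
    """Parse `key = value` lines, expanding ${VAR} from env and a leading ~."""
    config: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, _, value = line.partition("=")
        # safe_substitute leaves unknown ${VAR} references untouched
        value = Template(value.strip()).safe_substitute(env)
        config[key.strip()] = os.path.expanduser(value)
    return config


CONFIG_TEXT = """\
bucket_name = easyrec
git_oss_data_dir = data/git_oss_sample_data
host = oss-cn-beijing.aliyuncs.com
git_oss_cache_dir = ${TMPDIR}/${PROJECT_NAME}/.git_oss_cache
git_oss_private_config = ~/.git_oss_config_private
"""

# PROJECT_NAME is supplied by hand here (assumption).
cfg = parse_git_oss_config(CONFIG_TEXT.splitlines(),
                           {"TMPDIR": "/tmp", "PROJECT_NAME": "EasyRec"})
print(cfg["git_oss_cache_dir"])  # /tmp/EasyRec/.git_oss_cache
```

Keying the cache directory on `${PROJECT_NAME}` keeps caches for different checkouts apart under the same `$TMPDIR`.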
6 changes: 1 addition & 5 deletions .github/workflows/ci.yml
@@ -22,11 +22,7 @@ jobs:
PULL_REQUEST_NUM: ${{ github.event.pull_request.number }}
run: |
source activate /home/admin/tf12_py2/
if [ ! -e "/tmp/easyrec_data_20220113.tar.gz" ]
then
wget https://easyrec.oss-cn-beijing.aliyuncs.com/data/easyrec_data_20220113.tar.gz -O /tmp/easyrec_data_20220113.tar.gz
fi
tar -zvxf /tmp/easyrec_data_20220113.tar.gz
python git-lfs/git_lfs.py pull
source scripts/ci_test.sh
- name: LabelAndComment
env:
57 changes: 41 additions & 16 deletions docs/source/develop.md
@@ -41,30 +41,53 @@ pre-commit run -a
#### Unit tests

```bash
TEST_DEVICES=0,1 sh scripts/ci_test.sh
sh scripts/ci_test.sh
```

#### Odps tests
- Run a single test case

```bash
TEMPDIR=/tmp python -m easy_rec.python.test.odps_run --oss_config ~/.ossutilconfig [--odps_config {ODPS_CONFIG} --algo_project {ALOG_PROJ} --arn acs:ram::xxx:role/yyy TestPipelineOnOdps.*]
TEST_DEVICES='' python -m easy_rec.python.test.train_eval_test TrainEvalTest.test_tfrecord_input
```

#### Test data

Download the test data
#### Odps tests

```bash
wget https://easyrec.oss-cn-beijing.aliyuncs.com/data/easyrec_data_20220113.tar.gz
tar -xvzf easyrec_data_20220113.tar.gz
TMPDIR=/tmp python -m easy_rec.python.test.odps_run --oss_config ~/.ossutilconfig [--odps_config {ODPS_CONFIG} --algo_project {ALOG_PROJ} --arn acs:ram::xxx:role/yyy TestPipelineOnOdps.*]
```

If you add new data, run the following before `git commit` so that the data is committed to git-lfs:
#### Test data

```bash
python git-lfs/git_lfs.py add data/test/new_data
python git-lfs/git_lfs.py push
```
Test data lives under the data/test directory; the remote copy is stored in the oss://easyrec bucket, and the test data is managed with the git-lfs component.

- Sync data from the remote:
```bash
python git-lfs/git_lfs.py pull
```

- Add new data:
  - git-lfs config file: .git_oss_config_pub
  ```ini
  bucket_name = easyrec
  git_oss_data_dir = data/git_oss_sample_data
  host = oss-cn-beijing.aliyuncs.com
  git_oss_cache_dir = ${TMPDIR}/${PROJECT_NAME}/.git_oss_cache
  git_oss_private_config = ~/.git_oss_config_private
  ```
  - bucket_name: the OSS bucket that stores the data; defaults to easyrec
  - git_oss_data_dir: the storage directory inside the OSS bucket
  - host: the endpoint of the OSS bucket
  - git_oss_cache_dir: the local temporary directory used when updating data
  - git_oss_private_config: the [ossutil](https://help.aliyun.com/document_detail/120075.html) config used to push data to the OSS bucket.
    - For security reasons, write access to oss://easyrec is not currently open to contributors.
    - To submit test data, first push it to your own OSS bucket; after the pull request is merged, the data will be synced to oss://easyrec.

- Commands to commit data with git-lfs:
```bash
python git-lfs/git_lfs.py add data/test/new_data
python git-lfs/git_lfs.py push
```
  `git commit` also runs the pre-commit hook automatically, which performs the `git_lfs.py push` step.

### Documentation

@@ -73,18 +96,20 @@ python git-lfs/git_lfs.py push
If the documentation contains formulas or tables, we recommend writing it in reStructuredText, or using
[md-to-rst](https://cloudconvert.com/md-to-rst) to convert existing Markdown files to reStructuredText.

**Build the docs** # run in a Python 3 environment
**Build the docs**

```bash
# run in a Python 3 environment
bash scripts/build_docs.sh
```

### Build the package

Build the pip package
**Build the pip package**

```bash
python setup.py sdist bdist_wheel
```

### [Deployment](./release.md)
### Deployment
- MaxCompute and DataScience [deployment docs](./release.md)
3 changes: 1 addition & 2 deletions docs/source/quick_start/local_tutorial.md
@@ -5,8 +5,7 @@
```bash
git clone https://github.com/alibaba/EasyRec.git
cd EasyRec
wget https://easyrec.oss-cn-beijing.aliyuncs.com/data/easyrec_data_20220113.tar.gz
bash scripts/gen_proto.sh # generate the config-parsing .py files from the proto files
bash scripts/init.sh
python setup.py install
```

2 changes: 1 addition & 1 deletion easy_rec/python/test/odps_test_util.py
@@ -38,7 +38,7 @@ class OdpsOSSConfig:

  def __init__(self, script_path='./samples/odps_script'):
    self.time_stamp = int(time.time())
    temp_dir = os.environ.get('TEST_DIR', '/tmp')
    temp_dir = os.environ.get('TMPDIR', '/tmp')
    self.exp_dir = 'easy_rec_odps_test_%d' % self.time_stamp
    self.temp_dir = os.path.join(temp_dir, self.exp_dir)
    self.log_dir = os.path.join(self.temp_dir, 'logs/')
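After this change the ODPS test working directory is rooted at `$TMPDIR` (default `/tmp`) instead of `$TEST_DIR`. This is the same variable that the updated `develop.md` command sets and that `git_oss_cache_dir` expands, so one setting now controls both. The standalone function below mirrors the lines above for illustration only; `odps_test_dirs` is hypothetical and is not part of `odps_test_util.py`.

```python
import os
import time


def odps_test_dirs(environ=None, time_stamp=None):
    """Mirror how OdpsOSSConfig derives its working and log dirs after this commit."""
    environ = os.environ if environ is None else environ
    time_stamp = int(time.time()) if time_stamp is None else time_stamp
    temp_dir = environ.get('TMPDIR', '/tmp')  # read TEST_DIR before this commit
    exp_dir = 'easy_rec_odps_test_%d' % time_stamp
    root = os.path.join(temp_dir, exp_dir)
    return root, os.path.join(root, 'logs/')


print(odps_test_dirs({'TMPDIR': '/data/tmp'}, 1648540800))
# ('/data/tmp/easy_rec_odps_test_1648540800', '/data/tmp/easy_rec_odps_test_1648540800/logs/')
```

Passing the environment and timestamp as arguments only makes the sketch testable; the original class reads `os.environ` and `time.time()` directly.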