diff --git a/README.md b/README.md index bbca9b077..4a8071938 100644 --- a/README.md +++ b/README.md @@ -2,10 +2,10 @@ [中文](./README.zh.md) -FATE Flow is a multi-participant scheduling platform for managing secure, privacy-preserving federatedlearning end-to-end pipeline, based on: +FATE Flow is a multi-party federated task security scheduling platform for federated learning end-to-end pipeline -- [Shared-State scheduling architecture](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41684.pdf) -- Secure multi-participant communication across data centers +- [Shared-State Scheduling Architecture](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41684.pdf) +- Secure Multi-Party Communication Across Data Centers Providing production-level service capabilities: @@ -27,7 +27,7 @@ Please refer to [FATE](https://github.com/FederatedAI/FATE) ## Documentation -The official FATE Flow documentation is here [https://fate-flow.readthedocs.io/zh/latest/zh/](https://fate-flow.readthedocs.io/zh/latest/zh/) +The official FATE Flow documentation is here [https://fate-flow.readthedocs.io/en/latest/](https://fate-flow.readthedocs.io/en/latest/) ## License [Apache License 2.0](LICENSE) diff --git a/README.zh.md b/README.zh.md index 8a15130ec..434e78256 100644 --- a/README.zh.md +++ b/README.zh.md @@ -2,7 +2,7 @@ [English](./README.md) -FATE Flow是一个管理安全、隐私保护联邦学习端到端全流程的多参与方调度平台, 基于: +FATE Flow是一个联邦学习端到端全流程的多方联合任务安全调度平台, 基于: - [共享状态调度架构](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41684.pdf) - 跨数据中心的多方安全通信 @@ -27,7 +27,7 @@ FATE Flow是一个管理安全、隐私保护联邦学习端到端全流程的 ## 文档 -FATE Flow官方文档在这里[https://fate-flow.readthedocs.io/zh/latest/zh/](https://fate-flow.readthedocs.io/zh/latest/zh/) +FATE Flow官方文档在这里[https://fate-flow.readthedocs.io/en/latest/zh/](https://fate-flow.readthedocs.io/en/latest/zh/) ## License [Apache License 2.0](LICENSE) diff --git a/doc/cli/checkpoint.md b/doc/cli/checkpoint.md new file mode 100644 index 000000000..5fb850b0c --- /dev/null +++ b/doc/cli/checkpoint.md @@ -0,0 +1,84 @@ +## Checkpoint + +### list + +List checkpoints. + +```bash +flow checkpoint list --model-id --model-version --role --party-id --component-name +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| -------------- | ---------- | ------------------ | -------- | -------------- | +| model_id | | `--model-id` | No | Model ID | +| model_version | | `--model-version` | No | Model version | +| role | `-r` | `--role` | No | Party role | +| party_id | `-p` | `--party-id` | No | Party ID | +| component_name | `-cpn` | `--component-name` | No | Component name | + +**Example** + +```json +{ + "retcode": 0, + "retmsg": "success", + "data": [ + { + "create_time": "2021-11-07T02:34:54.683015", + "step_index": 0, + "step_name": "step_name", + "models": { + "HeteroLogisticRegressionMeta": { + "buffer_name": "LRModelMeta", + "sha1": "6871508f6e6228341b18031b3623f99a53a87147" + }, + "HeteroLogisticRegressionParam": { + "buffer_name": "LRModelParam", + "sha1": "e3cb636fc93675684bff27117943f5bfa87f3029" + } + } + } + ] +} +``` + +### get + +Get checkpoint information. 
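+
+The full synopsis and options follow below. As a concrete illustration, a single checkpoint can be pinned by step index; every value here is a placeholder (`$MODEL_ID` and `$MODEL_VERSION` identify the model produced by the training job):
+
+```bash
+flow checkpoint get \
+    --model-id $MODEL_ID \
+    --model-version $MODEL_VERSION \
+    -r guest -p 9999 \
+    -cpn hetero_lr_0 \
+    --step-index 0
+```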
+ +```bash +flow checkpoint get --model-id --model-version --role --party-id --component-name --step-index +``` + + +**Example** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| -------------- | ---------- | ------------------ | -------- | ------------------------------------------- | +| model_id | | `--model-id` | No | Model ID | +| model_version | | `--model-version` | No | Model version | +| role | `-r` | `--role` | No | Party role | +| party_id | `-p` | `--party-id` | No | Party ID | +| component_name | `-cpn` | `--component-name` | No | Component name | +| step_index | | `--step-index` | Yes | Step index, cannot be used with `step_name` | +| step_name | | `--step-name` | Yes | Step name, cannot be used with `step_index` | + +**Example** + +```json +{ + "retcode": 0, + "retmsg": "success", + "data": { + "create_time": "2021-11-07T02:34:54.683015", + "step_index": 0, + "step_name": "step_name", + "models": { + "HeteroLogisticRegressionMeta": "CgJMMhEtQxzr4jYaPxkAAAAAAADwPyIHcm1zcHJvcDD///////////8BOTMzMzMzM8M/QApKBGRpZmZYAQ==", + "HeteroLogisticRegressionParam": "Ig0KAng3EW1qASu+uuO/Ig0KAng0EcNi7a65ReG/Ig0KAng4EbJbl4gvVea/Ig0KAng2EcZwlVZTkOu/Ig0KAngwEVpG8dCbGvG/Ig0KAng5ESJNTx5MLve/Ig0KAngzEZ88H9P8qfO/Ig0KAng1EVfWP8JJv/K/Ig0KAngxEVS0xVXoTem/Ig0KAngyEaApgW32Q/K/KSiiE8AukPs/MgJ4MDICeDEyAngyMgJ4MzICeDQyAng1MgJ4NjICeDcyAng4MgJ4OUj///////////8B" + } + } +} +``` diff --git a/doc/cli/checkpoint.zh.md b/doc/cli/checkpoint.zh.md index dbc3b8e3d..eaf70d7cd 100644 --- a/doc/cli/checkpoint.zh.md +++ b/doc/cli/checkpoint.zh.md @@ -8,7 +8,7 @@ flow checkpoint list --model-id --model-version --role --party-id --component-name ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | -------------- | ------ | ------------------ | -------- | ---------- | @@ -53,7 +53,7 @@ flow checkpoint get --model-id --model-version --role ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | -------------- | ------ | ------------------ | -------- | ------------------------------------- | diff --git a/doc/cli/data.md b/doc/cli/data.md new file mode 100644 index 000000000..4960c98b7 --- /dev/null +++ b/doc/cli/data.md @@ -0,0 +1,175 @@ +## Data + +### upload + +Used to upload the input data for the modeling task to the storage system supported by fate + +```bash +flow data upload -c ${conf_path} +``` + +Note: conf_path is the parameter path, the specific parameters are as follows + +**Options** + +| parameter name | required | type | description | +| :------------------ | :--- | :----------- | ------------------------------------------------------------ | +| file | yes | string | data storage path | +| id_delimiter | yes | string | Data separator, e.g. "," | +| head | no | int | Whether the data has a table header | yes | int +| partition | yes | int | Number of data partitions | +| storage_engine | no | storage engine type | default "EGGROLL", also support "HDFS", "LOCALFS", "HIVE", etc. | +| namespace | yes | string | table namespace | yes +| table_name | yes | string | table name | +| storage_address | no | object | The storage address of the corresponding storage engine is required +| use_local_data | no | int | The default is 1, which means use the data from the client's machine; 0 means use the data from the fate flow service's machine. 
+| drop | no | int | Whether to overwrite uploads | +| extend_sid | no | bool | Whether to add a new column for uuid id, default False | +| auto_increasing_sid | no | bool | Whether the new id column is self-increasing (will only work if extend_sid is True), default False | + +**Example** + +- eggroll + + ```json + { + "file": "examples/data/breast_hetero_guest.csv", + "id_delimiter": ",", + "head": 1, + "partition": 10, + "namespace": "experiment", + "table_name": "breast_hetero_guest", + "storage_engine": "EGGROLL" + } + ``` + +- hdfs + + ```json + { + "file": "examples/data/breast_hetero_guest.csv", + "id_delimiter": ",", + "head": 1, + "partition": 10, + "namespace": "experiment", + "table_name": "breast_hetero_guest", + "storage_engine": "HDFS" + } + ``` + +- localfs + + ```json + { + "file": "examples/data/breast_hetero_guest.csv", + "id_delimiter": ",", + "head": 1, + "partition": 4, + "namespace": "experiment", + "table_name": "breast_hetero_guest", + "storage_engine": "LOCALFS" + } + ``` + +**return parameters** + +| parameter name | type | description | +| :------ | :----- | -------- | +| jobId | string | task id | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +**Example** + +```shell +{ + "data": { + "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081218319075660&role=local&party_id=0", + "code": 0, + "dsl_path": "/data/projects/fate/jobs/202111081218319075660/job_dsl.json", + "job_id": "202111081218319075660", + "logs_directory": "/data/projects/fate/logs/202111081218319075660", + "message": "success", + "model_info": { + "model_id": "local-0#model", + "model_version": "202111081218319075660" + }, + "namespace": "experiment", + "pipeline_dsl_path": "/data/projects/fate/jobs/202111081218319075660/pipeline_dsl.json", + "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081218319075660/local/0/job_runtime_on_party_conf.json", + "runtime_conf_path":"/data/projects/fate/jobs/202111081218319075660/job_runtime_conf.json", + "table_name": "breast_hetero_host", + "train_runtime_conf_path":"/data/projects/fate/jobs/202111081218319075660/train_runtime_conf.json" + }, + "jobId": "202111081218319075660", + "retcode": 0, + "retmsg": "success" +} + +``` + +### download + +**Brief description:** + +Used to download data from within the fate storage engine to file format data + +```bash +flow data download -c ${conf_path} +``` + +Note: conf_path is the parameter path, the specific parameters are as follows + +**Options** + +| parameter name | required | type | description | +| :---------- | :--- | :----- | -------------- | +| output_path | yes | string | download_path | +| table_name | yes | string | fate table name | +| namespace | yes | int | fate table namespace | + +Example: + +```json +{ + "output_path": "/data/projects/fate/breast_hetero_guest.csv", + "namespace": "experiment", + "table_name": "breast_hetero_guest" +} +``` + +**return parameters** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +**Example** + +```json +{ + "data": { + "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081457135282090&role=local&party_id=0", + "code": 0, + "dsl_path": "/data/projects/fate/jobs/202111081457135282090/job_dsl.json", + "job_id": "202111081457135282090", + "logs_directory": "/data/projects/fate/logs/202111081457135282090", + "message": 
"success", + "model_info": { + "model_id": "local-0#model", + "model_version": "202111081457135282090" + }, + "pipeline_dsl_path": "/data/projects/fate/jobs/202111081457135282090/pipeline_dsl.json", + "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081457135282090/local/0/job_runtime_on_party_conf.json", + "runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/job_runtime_conf.json", + "train_runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/train_runtime_conf.json" + }, + "jobId": "202111081457135282090", + "retcode": 0, + "retmsg": "success" +} + +``` \ No newline at end of file diff --git a/doc/cli/data.zh.md b/doc/cli/data.zh.md index 070d48573..43994268e 100644 --- a/doc/cli/data.zh.md +++ b/doc/cli/data.zh.md @@ -10,7 +10,7 @@ flow data upload -c ${conf_path} 注: conf_path为参数路径,具体参数如下 -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :------------------ | :--- | :----------- | ------------------------------------------------------------ | @@ -71,7 +71,7 @@ flow data upload -c ${conf_path} } ``` -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -121,7 +121,7 @@ flow data download -c ${conf_path} 注: conf_path为参数路径,具体参数如下 -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :---------- | :--- | :----- | -------------- | @@ -139,7 +139,7 @@ flow data download -c ${conf_path} } ``` -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | diff --git a/doc/cli/job.md b/doc/cli/job.md new file mode 100644 index 000000000..c355a0aa0 --- /dev/null +++ b/doc/cli/job.md @@ -0,0 +1,277 @@ +## Job + +### submit + +Build a federated learning job with two configuration files: job dsl and job conf, and submit it to the scheduler for execution + +```bash +flow job submit [options] +``` + +**Options** + +| parameter name | required | type | description | +| :-------------- | :------- | :----- | --------------- | +| -d, --dsl-path | yes | string | path to job dsl | +| -c, --conf-path | yes | string | job conf's path | + +**Returns** + +| parameter name | type | description | +| :------------------------------ | :----- | --------------------------------------------------------------------------------------------------------------------- | +| retcode | int | return code | +| retmsg | string | return message | +| jobId | string | Job ID | +| data | dict | return data | +| data.dsl_path | string | The path to the actual running dsl configuration generated by the system based on the submitted dsl content | +| data.runtime_conf_on_party_path | string | The system-generated path to the actual running conf configuration for each party based on the submitted conf content | +| data.dsl_path | string | The system-generated path to the actual running conf configuration for each party based on the submitted conf content | +| data.board_url | string | fateboard view address | +| data.model_info | dict | Model identification information | + +**Example** + +```json +{ + "data": { + "board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202111061608424372620&role=guest&party_id=9999", + "code": 0, + "dsl_path": "$FATE_PROJECT_BASE/jobs/202111061608424372620/job_dsl.json", + "job_id": "202111061608424372620", + "logs_directory": "$FATE_PROJECT_BASE/logs/202111061608424372620", + "message": "success", + "model_info": { + "model_id": "arbiter-10000#guest-9999#host-10000#model", + "model_version": "202111061608424372620" + }, + "pipeline_dsl_path": "$FATE_PROJECT_BASE/jobs/202111061608424372620/pipeline_dsl.json", + "runtime_conf_on_party_path": 
"$FATE_FATE_PROJECT_BASE/jobs/202111061608424372620/guest/9999/job_runtime_on_party_conf.json", + "runtime_conf_path":"$FATE_PROJECT_BASE/jobs/202111061608424372620/job_runtime_conf.json", + "train_runtime_conf_path": "$FATE_PROJECT_BASE/jobs/202111061608424372620/train_runtime_conf.json" + }, + "jobId": "202111061608424372620", + "retcode": 0, + "retmsg": "success" +} +``` + +### rerun + +Rerun a job + +```bash +flow job rerun [options] +``` + +**Options** + +| parameter name | required | type | description | +| :------------- | :------- | :--- | ----------- |------- | +| -j, --job-id | yes | string | job id path | +| --cpn, --component-name | no | string | Specifies which component to rerun from, unspecified components will not be executed if they have no upstream dependencies on the specified component; if not specified, the entire job will be rerun | +| --force | no | bool | The job will be rerun even if it succeeds; if not specified, the job will be skipped if it succeeds | + +**Returns** + +| parameter name | type | description | +| :------------- | :----- | ------------------ | +| retcode | int | return code | +| retmsg | string | return message | +| jobId | string | Job ID | +| data | dict | return data | + +**Example** + +```bash +flow job rerun -j 202111031100369723120 +``` + +```bash +flow job rerun -j 202111031100369723120 -cpn hetero_lr_0 +``` + +```bash +flow job rerun -j 202111031100369723120 -cpn hetero_lr_0 --force +``` + +### parameter-update + +Update the job parameters + +```bash +flow job parameter-update [options] +``` + +**Options** + +| parameter-name | required | type | description | +| :-------------- | :------- | :----- | ------------------------------------------------------------------------------------------------------------------ | +| -j, --job-id | yes | string | job id path | +| -c, --conf-path | yes | string | The contents of the job conf that needs to be updated, no need to fill in parameters that don't need to be updated | + +**Returns** + +| parameter name | type | description | +| :------------- | :----- | ---------------------------- | +| retcode | int | return code | +| retmsg | string | return message | +| jobId | string | Job ID | +| data | dict | Returns the updated job conf | + +**Example** + +Assuming that the job is updated with some of the execution parameters of the hetero_lr_0 component, the configuration file is as follows. +```bash +{ + "job_parameters": { + }, + "component_parameters": { + "common": { + "hetero_lr_0": { + "alpha": 0.02, + "max_iter": 5 + } + } + } +} +``` + +Execution of the following command takes effect. + +```bash +flow job parameter-update -j 202111061957421943730 -c examples/other/update_parameters.json +``` + +Execute the following command to rerun. + +```bash +flow job rerun -j 202111061957421943730 -cpn hetero_lr_0 --force +``` + +### stop + +Cancels or terminates the specified job + +**Options** + +| number | parameters | short format | long format | required parameters | parameter description | +| ------ | ---------- | ------------ | ----------- | ------------------- | --------------------- | +| 1 | job_id | `-j` | `--job_id` | yes | Job ID | + +**Example** + +``` bash +flow job stop -j $JOB_ID +``` + +### query + +Retrieve task information. 
+**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ------ | ---------- | ------------ | ------------ | ------------------- | --------------------- | +| 1 | job_id | `-j` | `--job_id` | no | Job ID | +| 2 | role | `-r` | `--role` | no | role | +| 3 | party_id | `-p` | `--party_id` | no | Party ID | +| 4 | status | `-s` | `--status` | No | Task status | + +**Example** + +``` bash +flow job query -r guest -p 9999 -s complete +flow job query -j $JOB_ID +``` + +### view + +Retrieve the job data view. +**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ------ | ---------- | ------------ | ------------ | ------------------- | --------------------- | +| 1 | job_id | `-j` | `--job_id` | yes | Job ID | +| 2 | role | `-r` | `--role` | no | role | +| 3 | party_id | `-p` | `--party_id` | no | Party ID | +| 4 | status | `-s` | `--status` | No | Task status | + +**Example** + +``` bash +flow job view -j $JOB_ID -s complete +``` + +### config + +Download the configuration file for the specified job to the specified directory. + +**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ------ | ----------- | ------------ | --------------- | ------------------- | --------------------- | +| 1 | job_id | `-j` | `--job_id` | yes | Job ID | +| 2 | role | `-r` | `--role` | yes | role | +| 3 | party_id | `-p` | `--party_id` | yes | Party ID | +| 4 | output_path | `-o` | `--output-path` | yes | output directory | + +**Example** + +``` bash +flow job config -j $JOB_ID -r host -p 10000 --output-path . /examples/ +``` + +### log + +Download the log file of the specified job to the specified directory. +**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ------ | ----------- | ------------ | --------------- | ------------------- | --------------------- | +| 1 | job_id | `-j` | `--job_id` | yes | Job ID | +| 2 | output_path | `-o` | `--output-path` | yes | output directory | + +**Example** + +``` bash +flow job log -j JOB_ID --output-path . /examples/ +``` + +### list + +Show the list of jobs. +**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ------ | ---------- | ------------ | ----------- | ------------------- | -------------------------------------- | +| 1 | limit | `-l` | `-limit` | no | Returns the number limit (default: 10) | + +**Example** + +``` bash +flow job list +flow job list -l 30 +``` + +### dsl + +Predictive DSL file generator. 
+**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ------ | -------------- | ------------ | ----------------- | ------------------- | ------------------------------------------------------------ | +| 1 | cpn_list | | `-cpn-list` | No | List of user-specified component names | +| 2 | cpn_path | | `-cpn-path` | No | User-specified path to a file with a list of component names | +| 3 | train_dsl_path | | `-train-dsl-path` | yes | path to the training dsl file | +| 4 | output_path | `-o` | `--output-path` | no | output directory path | + +**Example** + +``` bash +flow job dsl --cpn-path fate_flow/examples/component_list.txt --train-dsl-path fate_flow/examples/test_hetero_lr_job_dsl.json + +flow job dsl --cpn-path fate_flow/examples/component_list.txt --train-dsl-path fate_flow/examples/test_hetero_lr_job_dsl.json -o fate_flow /examples/ + +flow job dsl --cpn-list "dataio_0, hetero_feature_binning_0, hetero_feature_selection_0, evaluation_0" --train-dsl-path fate_flow/examples/ test_hetero_lr_job_dsl.json -o fate_flow/examples/ + +flow job dsl --cpn-list [dataio_0,hetero_feature_binning_0,hetero_feature_selection_0,evaluation_0] --train-dsl-path fate_flow/examples/ test_hetero_lr_job_dsl.json -o fate_flow/examples/ +``` diff --git a/doc/cli/job.zh.md b/doc/cli/job.zh.md index d7b60b1b6..a54e5f3b7 100644 --- a/doc/cli/job.zh.md +++ b/doc/cli/job.zh.md @@ -8,15 +8,14 @@ flow job submit [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :-------------- | :--- | :----- | -------------- | | -d, --dsl-path | 是 | string | job dsl的路径 | | -c, --conf-path | 是 | string | job conf的路径 | - -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------------------------------ | :----- | --------------------------------------------------------------------- | @@ -63,7 +62,7 @@ flow job submit [options] flow job rerun [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------------------------------------------------------------------------------- | @@ -71,7 +70,7 @@ flow job rerun [options] | -cpn, --component-name | 否 | string | 指定从哪个组件重跑,没被指定的组件若与指定组件没有上游依赖关系则不会执行;若不指定该参数则整个作业重跑 | | --force | 否 | bool | 作业即使成功也重跑;若不指定该参数,作业如果成功,则跳过重跑 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -102,14 +101,14 @@ flow job rerun -j 202111031100369723120 -cpn hetero_lr_0 --force flow job parameter-update [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :-------------- | :--- | :----- | ---------------------------------------------------- | | -j, --job-id | 是 | string | job id 路径 | | -c, --conf-path | 是 | string | 需要更新的job conf的内容,不需要更新的参数不需要填写 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------------------- | @@ -137,11 +136,13 @@ flow job parameter-update [options] ``` 执行如下命令生效: + ```bash flow job parameter-update -j 202111061957421943730 -c examples/other/update_parameters.json ``` 执行如下命令重跑: + ```bash flow job rerun -j 202111061957421943730 -cpn hetero_lr_0 --force ``` @@ -150,22 +151,22 @@ flow job rerun -j 202111061957421943730 -cpn hetero_lr_0 --force 取消或终止指定任务 -- *参数*: +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ------ | ------ | ---------- | -------- | -------- | | 1 | job_id | `-j` | `--job_id` | 是 | Job ID | -- *示例*: +**样例** - ``` bash - flow job stop -j $JOB_ID - ``` +``` bash +flow job stop -j $JOB_ID +``` ### query -- *介绍*: 检索任务信息。 -- *参数*: +检索任务信息。 +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | -------- | ------ | ------------ | 
-------- | -------- | @@ -174,17 +175,17 @@ flow job rerun -j 202111061957421943730 -cpn hetero_lr_0 --force | 3 | party_id | `-p` | `--party_id` | 否 | Party ID | | 4 | status | `-s` | `--status` | 否 | 任务状态 | -- *示例*: +**样例**: - ``` bash - flow job query -r guest -p 9999 -s complete - flow job query -j $JOB_ID - ``` +``` bash +flow job query -r guest -p 9999 -s complete +flow job query -j $JOB_ID +``` ### view -- *介绍*: 检索任务数据视图。 -- *参数*: +检索任务数据视图。 +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | -------- | ------ | ------------ | -------- | -------- | @@ -193,16 +194,16 @@ flow job rerun -j 202111061957421943730 -cpn hetero_lr_0 --force | 3 | party_id | `-p` | `--party_id` | 否 | Party ID | | 4 | status | `-s` | `--status` | 否 | 任务状态 | -- *示例*: +**样例**: - ``` bash - flow job view -j $JOB_ID -s complete - ``` +``` bash +flow job view -j $JOB_ID -s complete +``` ### config -- *介绍*: 下载指定任务的配置文件到指定目录。 -- *参数*: +下载指定任务的配置文件到指定目录。 +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ----------- | ------ | --------------- | -------- | -------- | @@ -211,38 +212,38 @@ flow job rerun -j 202111061957421943730 -cpn hetero_lr_0 --force | 3 | party_id | `-p` | `--party_id` | 是 | Party ID | | 4 | output_path | `-o` | `--output-path` | 是 | 输出目录 | -- *示例*: +**样例**: - ``` bash - flow job config -j $JOB_ID -r host -p 10000 --output-path ./examples/ - ``` +``` bash +flow job config -j $JOB_ID -r host -p 10000 --output-path ./examples/ +``` ### log -- *介绍*: 下载指定任务的日志文件到指定目录。 -- *参数*: +下载指定任务的日志文件到指定目录。 +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ----------- | ------ | --------------- | -------- | -------- | | 1 | job_id | `-j` | `--job_id` | 是 | Job ID | | 2 | output_path | `-o` | `--output-path` | 是 | 输出目录 | -- *示例*: +**样例**: - ``` bash - flow job log -j JOB_ID --output-path ./examples/ - ``` +``` bash +flow job log -j JOB_ID --output-path ./examples/ +``` ### list -- *介绍*: 展示任务列表。 -- *参数*: +展示任务列表。 +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ----- | ------ | --------- | -------- | ------------------------ | | 1 | limit | `-l` | `--limit` | 否 | 返回数量限制(默认:10) | -- *示例*: +**样例**: ``` bash flow job list @@ -251,8 +252,8 @@ flow job list -l 30 ### dsl -- *介绍*: 预测DSL文件生成器。 -- *参数*: +预测DSL文件生成器。 +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | -------------- | ------ | ------------------ | -------- | -------------------------------- | @@ -261,7 +262,7 @@ flow job list -l 30 | 3 | train_dsl_path | | `--train-dsl-path` | 是 | 训练dsl文件路径 | | 4 | output_path | `-o` | `--output-path` | 否 | 输出目录路径 | -- *示例*: +**样例**: ``` bash flow job dsl --cpn-path fate_flow/examples/component_list.txt --train-dsl-path fate_flow/examples/test_hetero_lr_job_dsl.json diff --git a/doc/cli/model.md b/doc/cli/model.md new file mode 100644 index 000000000..1620c967f --- /dev/null +++ b/doc/cli/model.md @@ -0,0 +1,386 @@ +## Model + +### load + +Load a model generated by `deploy` to Fate-Serving. 
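+
+The commands below read a JSON config via `-c`. A minimal sketch of that config, following the structure of the bundled `examples/model/publish_load_model.json` (party IDs and model identifiers are placeholders for your own deployment):
+
+```json
+{
+    "initiator": {
+        "party_id": "9999",
+        "role": "guest"
+    },
+    "role": {
+        "guest": ["9999"],
+        "host": ["10000"],
+        "arbiter": ["10000"]
+    },
+    "job_parameters": {
+        "model_id": "arbiter-10000#guest-9999#host-10000#model",
+        "model_version": "202111061608424372620"
+    }
+}
+```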
+ + +```bash +flow model load -c examples/model/publish_load_model.json +flow model load -c examples/model/publish_load_model.json -j +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| --------- | ---------- | ------------- | -------- | ---------------- | +| conf_path | `-c` | `--conf-path` | No | Config file path | +| job_id | `-j` | `--job-id` | Yes | Job ID | + +**Example** + +```json +{ + "data": { + "detail": { + "guest": { + "9999": { + "retcode": 0, + "retmsg": "success" + } + }, + "host": { + "10000": { + "retcode": 0, + "retmsg": "success" + } + } + }, + "guest": { + "9999": 0 + }, + "host": { + "10000": 0 + } + }, + "jobId": "202111091122168817080", + "retcode": 0, + "retmsg": "success" +} +``` + +### bind + +Bind a model generated by `deploy` to Fate-Serving. + +```bash +flow model bind -c examples/model/bind_model_service.json +flow model bind -c examples/model/bind_model_service.json -j +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| --------- | ---------- | ------------- | -------- | ---------------- | +| conf_path | `-c` | `--conf-path` | No | Config file path | +| job_id | `-j` | `--job-id` | Yes | Job ID | + +**Example** + +```json +{ + "retcode": 0, + "retmsg": "service id is 123" +} +``` + +### import + +Import the model from a file or storage engine. + +```bash +flow model import -c examples/model/import_model.json +flow model import -c examples/model/restore_model.json --from-database +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| ------------- | ---------- | ----------------- | -------- | ------------------------------------ | +| conf_path | `-c` | `--conf-path` | No | Config file path | +| from_database | | `--from-database` | Yes | Import the model from storage engine | + +**Example** + +```json +{ + "data": { + "board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202111091125358161430&role=local&party_id=0", + "code": 0, + "dsl_path": "/root/Codes/FATE-Flow/jobs/202111091125358161430/job_dsl.json", + "job_id": "202111091125358161430", + "logs_directory": "/root/Codes/FATE-Flow/logs/202111091125358161430", + "message": "success", + "model_info": { + "model_id": "local-0#model", + "model_version": "202111091125358161430" + }, + "pipeline_dsl_path": "/root/Codes/FATE-Flow/jobs/202111091125358161430/pipeline_dsl.json", + "runtime_conf_on_party_path": "/root/Codes/FATE-Flow/jobs/202111091125358161430/local/0/job_runtime_on_party_conf.json", + "runtime_conf_path": "/root/Codes/FATE-Flow/jobs/202111091125358161430/job_runtime_conf.json", + "train_runtime_conf_path": "/root/Codes/FATE-Flow/jobs/202111091125358161430/train_runtime_conf.json" + }, + "jobId": "202111091125358161430", + "retcode": 0, + "retmsg": "success" +} +``` + +### export + +Export the model to a file or storage engine. 
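+
+The `-c` config tells Flow which party's model to export and where to write it. This is an illustrative sketch only: the field names below are assumptions modelled on the bundled `examples/model/export_model.json`, so check that file in your installation before use.
+
+```json
+{
+    "model_id": "arbiter-10000#guest-9999#host-10000#model",
+    "model_version": "202111061608424372620",
+    "role": "guest",
+    "party_id": "9999",
+    "output_path": "/data/projects/fate/export_model"
+}
+```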
+ +```bash +flow model export -c examples/model/export_model.json +flow model export -c examples/model/store_model.json --to-database +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| ----------- | ---------- | --------------- | -------- | ---------------------------------- | +| conf_path | `-c` | `--conf-path` | No | Config file path | +| to_database | | `--to-database` | Yes | Export the model to storage engine | + +**Example** + +```json +{ + "data": { + "board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202111091124582110490&role=local&party_id=0", + "code": 0, + "dsl_path": "/root/Codes/FATE-Flow/jobs/202111091124582110490/job_dsl.json", + "job_id": "202111091124582110490", + "logs_directory": "/root/Codes/FATE-Flow/logs/202111091124582110490", + "message": "success", + "model_info": { + "model_id": "local-0#model", + "model_version": "202111091124582110490" + }, + "pipeline_dsl_path": "/root/Codes/FATE-Flow/jobs/202111091124582110490/pipeline_dsl.json", + "runtime_conf_on_party_path": "/root/Codes/FATE-Flow/jobs/202111091124582110490/local/0/job_runtime_on_party_conf.json", + "runtime_conf_path": "/root/Codes/FATE-Flow/jobs/202111091124582110490/job_runtime_conf.json", + "train_runtime_conf_path": "/root/Codes/FATE-Flow/jobs/202111091124582110490/train_runtime_conf.json" + }, + "jobId": "202111091124582110490", + "retcode": 0, + "retmsg": "success" +} +``` + +### migrate + +Migrate the model. + +```bash +flow model migrate -c examples/model/migrate_model.json +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| --------- | ---------- | ------------- | -------- | ---------------- | +| conf_path | `-c` | `--conf-path` | No | Config file path | + +**Example** + +```json +{ + "data": { + "arbiter": { + "10000": 0 + }, + "detail": { + "arbiter": { + "10000": { + "retcode": 0, + "retmsg": "Migrating model successfully. The Config of model has been modified automatically. New model id is: arbiter-100#guest-99#host-100#model, model version is: 202111091127392613050. Model files can be found at '/root/Codes/FATE-Flow/temp/fate_flow/arbiter#100#arbiter-100#guest-99#host-100#model_202111091127392613050.zip'." + } + }, + "guest": { + "9999": { + "retcode": 0, + "retmsg": "Migrating model successfully. The Config of model has been modified automatically. New model id is: arbiter-100#guest-99#host-100#model, model version is: 202111091127392613050. Model files can be found at '/root/Codes/FATE-Flow/temp/fate_flow/guest#99#arbiter-100#guest-99#host-100#model_202111091127392613050.zip'." + } + }, + "host": { + "10000": { + "retcode": 0, + "retmsg": "Migrating model successfully. The Config of model has been modified automatically. New model id is: arbiter-100#guest-99#host-100#model, model version is: 202111091127392613050. Model files can be found at '/root/Codes/FATE-Flow/temp/fate_flow/host#100#arbiter-100#guest-99#host-100#model_202111091127392613050.zip'." + } + } + }, + "guest": { + "9999": 0 + }, + "host": { + "10000": 0 + } + }, + "jobId": "202111091127392613050", + "retcode": 0, + "retmsg": "success" +} +``` + +### tag-list + +List tags of the model. + +``` bash +flow model tag-list -j +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| --------- | ---------- | ---------- | -------- | ----------- | +| job_id | `-j` | `--job_id` | No | Job ID | + +### tag-model + +Add or remove a tag from the model. 
+ +```bash +flow model tag-model -j -t +flow model tag-model -j -t --remove +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| -------- | ------ | ------------ | -------- | -------------- | +| job_id | `-j` | `--job_id` | No | Job ID | +| tag_name | `-t` | `--tag-name` | No | Tag name | +| remove | | `--remove` | Yes | Remove the tag | + +### deploy + +Configure predict DSL. + +```bash +flow model deploy --model-id --model-version +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| -------------- | ---------- | ------------------ | -------- | ------------------------------------------------------------ | +| model_id | | `--model-id` | No | Model ID | +| model_version | | `--model-version` | No | Model version | +| cpn_list | | `--cpn-list` | Yes | Components list | +| cpn_path | | `--cpn-path` | Yes | Load components list from a file | +| dsl_path | | `--dsl-path` | Yes | Predict DSL file path | +| cpn_step_index | | `--cpn-step-index` | Yes | Specify a checkpoint model to replace the pipeline model
Use `:` to separate component name and step index
E.g. `--cpn-step-index cpn_a:123` | +| cpn_step_name | | `--cpn-step-name` | Yes | Specify a checkpoint model to replace the pipeline model.
Use `:` to separate component name and step name
E.g. `--cpn-step-name cpn_b:foobar` | + +**Example** + +```json +{ + "retcode": 0, + "retmsg": "success", + "data": { + "model_id": "arbiter-9999#guest-10000#host-9999#model", + "model_version": "202111032227378766180", + "arbiter": { + "party_id": 9999 + }, + "guest": { + "party_id": 10000 + }, + "host": { + "party_id": 9999 + }, + "detail": { + "arbiter": { + "party_id": { + "retcode": 0, + "retmsg": "deploy model of role arbiter 9999 success" + } + }, + "guest": { + "party_id": { + "retcode": 0, + "retmsg": "deploy model of role guest 10000 success" + } + }, + "host": { + "party_id": { + "retcode": 0, + "retmsg": "deploy model of role host 9999 success" + } + } + } + } +} +``` + +### get-predict-dsl + +Get predict DSL of the model. + +```bash +flow model get-predict-dsl --model-id --model-version -o ./examples/ +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| ------------- | ---------- | ----------------- | -------- | ------------- | +| model_id | | `--model-id` | No | Model ID | +| model_version | | `--model-version` | No | Model version | +| output_path | `-o` | `--output-path` | No | Output path | + +### get-predict-conf + +Get the template of predict config. + +```bash +flow model get-predict-conf --model-id --model-version -o ./examples/ +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| ------------- | ---------- | ----------------- | -------- | ------------- | +| model_id | | `--model-id` | No | Model ID | +| model_version | | `--model-version` | No | Model version | +| output_path | `-o` | `--output-path` | No | Output path | + +### get-model-info + +Get model information. + +```bash +flow model get-model-info --model-id --model-version +flow model get-model-info --model-id --model-version --detail +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| ------------- | ---------- | ----------------- | -------- | ---------------------------- | +| model_id | | `--model-id` | No | Model ID | +| model_version | | `--model-version` | No | Model version | +| role | `-r` | `--role` | Yes | Party role | +| party_id | `-p` | `--party-id` | Yes | Party ID | +| detail | | `--detail` | Yes | Display detailed information | + +### homo-convert + +Convert trained homogenous model to the format of another ML framework. + +```bash +flow model homo-convert -c examples/model/homo_convert_model.json +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| --------- | ---------- | ------------- | -------- | ---------------- | +| conf_path | `-c` | `--conf-path` | No | Config file path | + +### homo-deploy + +Deploy trained homogenous model to a target online serving system. Currently the supported target serving system is KFServing. 
+ +```bash +flow model homo-deploy -c examples/model/homo_deploy_model.json +``` + +**Options** + +| Parameter | Short Flag | Long Flag | Optional | Description | +| --------- | ---------- | ------------- | -------- | ---------------- | +| conf_path | `-c` | `--conf-path` | No | Config file path | diff --git a/doc/cli/model.zh.md b/doc/cli/model.zh.md index 4c4d5ccdd..a09028233 100644 --- a/doc/cli/model.zh.md +++ b/doc/cli/model.zh.md @@ -9,7 +9,7 @@ flow model load -c examples/model/publish_load_model.json flow model load -c examples/model/publish_load_model.json -j ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | --------- | ------ | ------------- | -------- | -------- | @@ -57,7 +57,7 @@ flow model bind -c examples/model/bind_model_service.json flow model bind -c examples/model/bind_model_service.json -j ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | --------- | ------ | ------------- | -------- | -------- | @@ -82,12 +82,12 @@ flow model import -c examples/model/import_model.json flow model import -c examples/model/restore_model.json --from-database ``` -**参数** +**选项** -| 参数 | 短格式 | 长格式 | 可选参数 | 说明 | -| ------------- | ------ | ----------------- | -------- | -------------------------------- | -| conf_path | `-c` | `--conf-path` | 否 | 配置文件 | -| from_database | | `--from-database` | 是 | 从 Flow 配置的存储引擎中导入模型 | +| 参数 | 短格式 | 长格式 | 可选参数 | 说明 | +| ------------- | ------ | ----------------- | -------- | -------------------- | +| conf_path | `-c` | `--conf-path` | 否 | 配置文件 | +| from_database | | `--from-database` | 是 | 从存储引擎中导入模型 | **样例** @@ -124,12 +124,12 @@ flow model export -c examples/model/export_model.json flow model export -c examples/model/store_model.json --to-database ``` -**参数** +**选项** -| 参数 | 短格式 | 长格式 | 可选参数 | 说明 | -| ----------- | ------ | --------------- | -------- | ---------------------------------- | -| conf_path | `-c` | `--conf-path` | 否 | 配置文件 | -| to_database | | `--to-database` | 是 | 将模型导出到 Flow 配置的存储引擎中 | +| 参数 | 短格式 | 长格式 | 可选参数 | 说明 | +| ----------- | ------ | --------------- | -------- | ---------------------- | +| conf_path | `-c` | `--conf-path` | 否 | 配置文件 | +| to_database | | `--to-database` | 是 | 将模型导出到存储引擎中 | **样例** @@ -159,13 +159,13 @@ flow model export -c examples/model/store_model.json --to-database ### migrate -迁移模型 +迁移模型。 ```bash flow model migrate -c examples/model/migrate_model.json ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | --------- | ------ | ------------- | -------- | -------- | @@ -214,13 +214,13 @@ flow model migrate -c examples/model/migrate_model.json ### tag-list -获取模型的标签列表 +获取模型的标签列表。 ``` bash flow model tag-list -j ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | ------ | ------ | ---------- | -------- | ------- | @@ -228,14 +228,14 @@ flow model tag-list -j ### tag-model -向模型添加标签 +从模型中添加或删除标签。 ```bash flow model tag-model -j -t flow model tag-model -j -t --remove ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | -------- | ------ | ------------ | -------- | -------------- | @@ -245,13 +245,13 @@ flow model tag-model -j -t --remove ### deploy -配置预测 DSL +配置预测 DSL。 ```bash flow model deploy --model-id --model-version ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | -------------- | ------ | ------------------ | -------- | ------------------------------------------------------------ | @@ -313,7 +313,7 @@ flow model deploy --model-id --model-version flow model get-predict-dsl --model-id --model-version -o ./examples/ ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | ------------- | ------ | ----------------- | 
-------- | -------- | @@ -329,7 +329,7 @@ flow model get-predict-dsl --model-id --model-version flow model get-predict-conf --model-id --model-version -o ./examples/ ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | ------------- | ------ | ----------------- | -------- | -------- | @@ -346,7 +346,7 @@ flow model get-model-info --model-id --model-version flow model get-model-info --model-id --model-version --detail ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | ------------- | ------ | ----------------- | -------- | ------------ | @@ -358,13 +358,13 @@ flow model get-model-info --model-id --model-version ### homo-convert -基于横向训练的模型,生成其他 ML 框架的模型文件。 +基于横向训练的模型,生成其他 ML 框架的模型文件。 ```bash flow model homo-convert -c examples/model/homo_convert_model.json ``` -**参数** +**选项** | 参数 | 短格式 | 长格式 | 可选参数 | 说明 | | --------- | ------ | ------------- | -------- | -------- | @@ -378,8 +378,8 @@ flow model homo-convert -c examples/model/homo_convert_model.json flow model homo-deploy -c examples/model/homo_deploy_model.json ``` -**参数** +**选项** -| 参数 | 短格式 | 长格式 | 可选参数 | 说明 | -| --------- | ------ | ------------- | -------- | ---------------- | -| conf_path | `-c` | `--conf-path` | 否 | 任务配置文件路径 | +| 参数 | 短格式 | 长格式 | 可选参数 | 说明 | +| --------- | ------ | ------------- | -------- | -------- | +| conf_path | `-c` | `--conf-path` | 否 | 配置文件 | diff --git a/doc/cli/privilege.md b/doc/cli/privilege.md new file mode 100644 index 000000000..495a86116 --- /dev/null +++ b/doc/cli/privilege.md @@ -0,0 +1,168 @@ +## Privilege + +### grant + +Add privileges + +```bash +flow privilege grant [options] +``` + +**Options** + +| parameter name | required | type | description | +| :------------------ | :--- | :----- | ------------------------------------------------------------ | +| src-party-id | yes | string | originating-party-id | +| src-role | yes | string | originating-party-role | +| privilege-role | no | string | guest, host, arbiter, all, where all is all privileges granted +| privilege-command | no | string | "stop", "run", "create", all, where all is all privileges granted +| privilege-component | no | string | Lowercase for algorithm components, such as dataio, heteronn, etc., where all is all privileges granted + +**Example** + +- Give role privileges + + ```shell + flow privilege grant --src-party-id 9999 --src-role guest --privilege-role all + ``` + +- Give command privileges + + ```shell + flow privilege grant --src-party-id 9999 --src-role guest --privilege-command all + ``` + +- Grant component privileges + + ```shell + flow privilege grant --src-party-id 9999 --src-role guest --privilege-component all + ``` + +- Grant multiple privileges at the same time + + ```shell + flow privilege grant --src-party-id 9999 --src-role guest --privilege-role all --privilege-command all --privilege-component all + ``` + +**return parameters** + +| parameter-name | type | description | +| ------- | :----- | -------- | +| retcode | int | return-code | +| retmsg | string | return message | + +**Example** + +```shell +{ + "retcode": 0, + "retmsg": "success" +} +``` + +### delete + +Delete permissions + +```bash +flow privilege delete [options] +``` + +**Options** + +| parameter name | required | type | description | +| :------------------ | :--- | :----- | ------------------------------------------------------------ | +| src-party-id | yes | string | originating-party-id | +| src-role | yes | string | originating-party-role | +| privilege-role | no | string | guest, host, arbiter, all, where all is all privileges revoked +| 
privilege-command | no | string | "stop", "run", "create", all, where all is revoke all privileges +| privilege-component | no | string | lowercase for algorithm components, such as dataio, heteronn, etc., where all is revoke all privileges | + +**Example** + +- Revoke role privileges + + ```shell + flow privilege delete --src-party-id 9999 --src-role guest --privilege-role all + ``` + +- Revoke command privileges + + ```shell + flow privilege delete --src-party-id 9999 --src-role guest --privilege-command all + ``` + +- Revoke component privileges + + ```shell + flow privilege delete --src-party-id 9999 --src-role guest --privilege-component all + ``` + +- Grant multiple privileges at the same time + + ```shell + flow privilege delete --src-party-id 9999 --src-role guest --privilege-role all --privilege-command all --privilege-component all + ``` + +**return parameters** + +| parameter-name | type | description | +| ------- | :----- | -------- | +| retcode | int | return-code | +| retmsg | string | return message | + +**Example** + +```shell +{ + "retcode": 0, + "retmsg": "success" +} +``` + +### query + +Query permissions + +```bash +flow privilege query [options] +``` + +**Options** + +| parameter name | required | type | description | +| :----------- | :--- | :----- | ------------- | +| src-party-id | yes | string | originating-party-id | +| src-role | yes | string | originating-party-role | + +**Example** + +```shell +flow privilege query --src-party-id 9999 --src-role guest +``` + +- **return parameters** + + +| parameter name | type | description | +| ------- | :----- | -------- | +| retcode | int | return-code | +| retmsg | string | return message | +| data | object | return data | + +**Example** + +```shell +{ + "data": { + "privilege_command": [], + "privilege_component": [], + "privilege_role": [], + "role": "guest", + "src_party_id": "9999" + }, + "retcode": 0, + "retmsg": "success" +} + +``` diff --git a/doc/cli/privilege.zh.md b/doc/cli/privilege.zh.md index 445b28827..fa532ef54 100644 --- a/doc/cli/privilege.zh.md +++ b/doc/cli/privilege.zh.md @@ -1,4 +1,4 @@ -## privilege +## Privilege ### grant @@ -8,7 +8,7 @@ flow privilege grant [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :------------------ | :--- | :----- | ------------------------------------------------------------ | @@ -18,20 +18,20 @@ flow privilege grant [options] | privilege-command | 否 | string | ”stop”, “run”, “create”, all, 其中all为全部权限都给予 | | privilege-component | 否 | string | 算法组件的小写,如dataio,heteronn等等, 其中all为全部权限都给予 | -**样例** +**样例** - 赋予role权限 ```shell flow privilege grant --src-party-id 9999 --src-role guest --privilege-role all ``` - + - 赋予command权限 ```shell flow privilege grant --src-party-id 9999 --src-role guest --privilege-command all ``` - + - 赋予component权限 ```shell @@ -44,14 +44,14 @@ flow privilege grant [options] flow privilege grant --src-party-id 9999 --src-role guest --privilege-role all --privilege-command all --privilege-component all ``` -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | ------- | :----- | -------- | | retcode | int | 返回码 | | retmsg | string | 返回信息 | -**样例** +**样例** ```shell { @@ -68,17 +68,17 @@ flow privilege grant [options] flow privilege delete [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :------------------ | :--- | :----- | ------------------------------------------------------------ | | src-party-id | 是 | string | 发起方partyid | | src-role | 是 | string | 发起方role | | privilege-role | 否 | string | guest, host, arbiter,all, 其中all为全部权限都撤销 | -| privilege-command | 否 | 
string | ”stop”, “run”, “create”, all, 其中all为全部权限都撤销 | +| privilege-command | 否 | string | “stop”, “run”, “create”, all, 其中all为全部权限都撤销 | | privilege-component | 否 | string | 算法组件的小写,如dataio,heteronn等等, 其中all为全部权限都撤销 | -**样例** +**样例** - 撤销role权限 @@ -104,14 +104,14 @@ flow privilege delete [options] flow privilege delete --src-party-id 9999 --src-role guest --privilege-role all --privilege-command all --privilege-component all ``` -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | ------- | :----- | -------- | | retcode | int | 返回码 | | retmsg | string | 返回信息 | -**样例** +**样例** ```shell { @@ -128,20 +128,20 @@ flow privilege delete [options] flow privilege query [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :----------- | :--- | :----- | ------------- | | src-party-id | 是 | string | 发起方partyid | | src-role | 是 | string | 发起方role | -**样例** +**样例** ```shell -flow privilege query --src-party-id 9999 --src-role guest +flow privilege query --src-party-id 9999 --src-role guest ``` -- **返回参数** +**返回** | 参数名 | 类型 | 说明 | @@ -150,7 +150,7 @@ flow privilege query --src-party-id 9999 --src-role guest | retmsg | string | 返回信息 | | data | object | 返回数据 | -**样例** +**样例** ```shell { diff --git a/doc/cli/provider.md b/doc/cli/provider.md new file mode 100644 index 000000000..f8adbaca6 --- /dev/null +++ b/doc/cli/provider.md @@ -0,0 +1,164 @@ +## Provider + +### list + +List all current component providers and information about the components they provide + +```bash +flow provider list [options] +``` + +**Options** + +**Returns** + +| 参数名 | 类型 | 说明 | +| :------ | :----- | -------- | +| retcode | int | 返回码 | +| retmsg | string | 返回信息 | +| data | dict | 返回数据 | + +**Example** + +output: + +```json +{ + "data": { + "fate": { + "1.7.0": { + "class_path": { + "feature_instance": "feature.instance.Instance", + "feature_vector": "feature.sparse_vector.SparseVector", + "homo_model_convert": "protobuf.homo_model_convert.homo_model_convert", + "interface": "components.components.Components", + "model": "protobuf.generated", + "model_migrate": "protobuf.model_migrate.model_migrate" + }, + "components": [ + "heterolinr", + "homoonehotencoder", + "dataio", + "psi", + "homodatasplit", + "homolr", + "columnexpand", + "heterokmeans", + "heterosshelr", + "homosecureboost", + "heteropoisson", + "featureimputation", + "heterofeatureselection", + "heteropearson", + "heterodatasplit", + "ftl", + "heterolr", + "homonn", + "evaluation", + "featurescale", + "intersection", + "heteronn", + "datastatistics", + "heterosecureboost", + "sbtfeaturetransformer", + "datatransform", + "heterofeaturebinning", + "feldmanverifiablesum", + "heterofastsecureboost", + "federatedsample", + "secureaddexample", + "secureinformationretrieval", + "sampleweight", + "union", + "onehotencoder", + "homofeaturebinning", + "scorecard", + "localbaseline", + "labeltransform" + ], + "path": "${FATE_PROJECT_BASE}/python/federatedml", + "python": "" + }, + "default": { + "version": "1.7.0" + } + }, + "fate_flow": { + "1.7.0": { + "class_path": { + "feature_instance": "feature.instance.Instance", + "feature_vector": "feature.sparse_vector.SparseVector", + "homo_model_convert": "protobuf.homo_model_convert.homo_model_convert", + "interface": "components.components.Components", + "model": "protobuf.generated", + "model_migrate": "protobuf.model_migrate.model_migrate" + }, + "components": [ + "download", + "upload", + "modelloader", + "reader", + "modelrestore", + "cacheloader", + "modelstore" + ], + "path": "${FATE_FLOW_BASE}/python/fate_flow", + "python": "" + }, + 
"default": { + "version": "1.7.0" + } + } + }, + "retcode": 0, + "retmsg": "success" +} +``` + +Contains the `name`, `version number`, `codepath`, `list of provided components` + +### register + +Register a component provider + +```bash +flow provider register [options] +``` + +**Options** + +| 参数名 | 必选 | 类型 | 说明 | +| :--------------------- | :--- | :----- | ------------------------------| +| -c, --conf-path | 是 | string | 配置路径 | + +**Returns** + +| 参数名 | 类型 | 说明 | +| :------ | :----- | -------- | +| retcode | int | 返回码 | +| retmsg | string | 返回信息 | + +**Example** + +```bash +flow provider register -c $FATE_FLOW_BASE/examples/other/register_provider.json +``` + +conf: + +```json +{ + "name": "fate", + "version": "1.7.1", + "path": "${FATE_FLOW_BASE}/python/component_plugins/fateb/python/federatedml" +} +``` + +output: + +```json +{ + "retcode": 0, + "retmsg": "success" +} +``` diff --git a/doc/cli/provider.zh.md b/doc/cli/provider.zh.md index 64fab5400..98f61f657 100644 --- a/doc/cli/provider.zh.md +++ b/doc/cli/provider.zh.md @@ -8,9 +8,9 @@ flow provider list [options] ``` -**参数** +**选项** -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -125,13 +125,13 @@ flow provider list [options] flow provider register [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ------------------------------| | -c, --conf-path | 是 | string | 配置路径 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | diff --git a/doc/cli/resource.md b/doc/cli/resource.md new file mode 100644 index 000000000..37cb134f2 --- /dev/null +++ b/doc/cli/resource.md @@ -0,0 +1,90 @@ +## resource +## Resources + +### query + +For querying fate system resources + +```bash +flow resource query +``` + +**Options** + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +**Example** + +```json +{ + "data": { + "computing_engine_resource": { + "f_cores": 32, + "f_create_date": "2021-09-21 19:32:59", + "f_create_time": 1632223979564, + "f_engine_config": { + "cores_per_node": 32, + "nodes": 1 + }, + "f_engine_entrance": "fate_on_eggroll", + "f_engine_name": "EGGROLL", + "f_engine_type": "computing", + "f_memory": 0, + "f_nodes": 1, + "f_remaining_cores": 32, + "f_remaining_memory": 0, + "f_update_date": "2021-11-08 16:56:38", + "f_update_time": 1636361798812 + }, + "use_resource_job": [] + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### return + +Resources for returning a job + +```bash +flow resource return [options] +``` + +**Options** + +| parameter name | required | type | description | +| :----- | :--- | :----- | ------ | +| job_id | yes | string | job_id | + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +**Example** + +```json +{ + "data": [ + { + "job_id": "202111081612427726750", + "party_id": "8888", + "resource_in_use": true, + "resource_return_status": true, + "role": "guest" + } + ], + "retcode": 0, + "retmsg": "success" +} +``` diff --git a/doc/cli/resource.zh.md b/doc/cli/resource.zh.md index 174096308..93b01aa31 100644 --- a/doc/cli/resource.zh.md +++ b/doc/cli/resource.zh.md @@ -1,4 +1,4 @@ -## resource +## Resource ### query @@ -8,9 +8,9 @@ flow resource query ``` -**参数** +**选项** -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ 
-18,9 +18,9 @@ flow resource query | retmsg | string | 返回信息 | | data | object | 返回数据 | -样例: +**样例** -``` +```json { "data": { "computing_engine_resource": { @@ -56,13 +56,13 @@ flow resource query flow resource return [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :----- | :--- | :----- | ------ | | job_id | 是 | string | 任务id | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -70,7 +70,7 @@ flow resource return [options] | retmsg | string | 返回信息 | | data | object | 返回数据 | -样例: +**样例** ```json { diff --git a/doc/cli/server.md b/doc/cli/server.md new file mode 100644 index 000000000..92f1e9a7d --- /dev/null +++ b/doc/cli/server.md @@ -0,0 +1,111 @@ +## Server + +### versions + +List all relevant system version numbers + +```bash +flow server versions +``` + +**Options** + +None + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow server versions +``` + +Output: + +```json +{ + "data": { + "API": "v1", + "CENTOS": "7.2", + "EGGROLL": "2.4.0", + "FATE": "1.7.0", + "FATEBoard": "1.7.0", + "FATEFlow": "1.7.0", + "JDK": "8", + "MAVEN": "3.6.3", + "PYTHON": "3.6.5", + "SPARK": "2.4.1", + "UBUNTU": "16.04" + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### reload + +The following configuration items will take effect again after `reload` + + - All configurations after # engine services in $FATE_PROJECT_BASE/conf/service_conf.yaml + - All configurations in $FATE_FLOW_BASE/python/fate_flow/job_default_config.yaml + +```bash +flow server reload +``` + +**Options** + +None + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow server reload +``` + +Output: + +```json +{ + "data": { + "job_default_config": { + "auto_retries": 0, + "auto_retry_delay": 1, + "default_component_provider_path": "component_plugins/fate/python/federatedml", + "end_status_job_scheduling_time_limit": 300000, + "end_status_job_scheduling_updates": 1, + "federated_command_trys": 3, + "federated_status_collect_type": "PUSH", + "job_timeout": 259200, + "max_cores_percent_per_job": 1, + "output_data_summary_count_limit": 100, + "remote_request_timeout": 30000, + "task_cores": 4, + "task_memory": 0, + "task_parallelism": 1, + "total_cores_overweight_percent": 1, + "total_memory_overweight_percent": 1, + "upload_max_bytes": 4194304000 + }, + "service_registry": null + }, + "retcode": 0, + "retmsg": "success" +} +``` diff --git a/doc/cli/server.zh.md b/doc/cli/server.zh.md index 5e113cb9e..44c65e7b4 100644 --- a/doc/cli/server.zh.md +++ b/doc/cli/server.zh.md @@ -5,14 +5,14 @@ 列出所有相关系统版本号 ```bash -flow server +flow server versions ``` -**参数** +**选项** 无 -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -60,11 +60,11 @@ flow server versions flow server reload ``` -**参数** +**选项** 无 -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | diff --git a/doc/cli/table.md b/doc/cli/table.md new file mode 100644 index 000000000..577687ee0 --- /dev/null +++ b/doc/cli/table.md @@ -0,0 +1,184 @@ +## Table + +### info + +Query information about the fate table (real storage address, number, schema, etc.) 
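+
+The synopsis and options follow below. As a concrete illustration, assuming `-t` and `-n` are the short flags for `name` and `namespace` (they are not listed in the options table, so verify with `flow table info --help`), with placeholder values:
+
+```bash
+flow table info -t breast_hetero_guest -n experiment
+```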
+ +```bash +flow table info [options] +``` + +**Options** + +| parameter name | required | type | description +| :-------- | :--- | :----- | -------------- | +| name | yes | string | fate table name | +| namespace | yes | string | fate table namespace | + +**return parameters** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +Sample + +```json +{ + "data": { + "address": { + "home": null, + "name": "breast_hetero_guest", + "namespace": "experiment" + }, + "count": 569, + "exists": 1, + "namespace": "experiment", + "partition": 4, + "schema": { + "header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9", + "sid": "id" + }, + "table_name": "breast_hetero_guest" + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### delete + +You can delete table data with table delete + +```bash +flow table delete [options] +``` + +**Options** + +| parameter name | required | type | description | +| :-------- | :--- | :----- | -------------- | +| name | yes | string | fate table name | +| namespace | yes | string | fate table namespace | + +**return parameters** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +Sample + +```json +{ + "data": { + "namespace": "xxx", + "table_name": "xxx" + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### bind + +Real storage addresses can be mapped to fate storage tables via table bind + +```bash +flow table bind [options] +``` + +Note: conf_path is the parameter path, the specific parameters are as follows + +**Options** + +| parameter name | required | type | description | +| :------------- | :--- | :----- | ------------------------------------- | +| name | yes | string | fate table name | +| namespace | yes | string | fate table namespace | +| engine | yes | string | storage engine, supports "HDFS", "MYSQL", "PATH" | +| yes | object | real storage address | +| drop | no | int | Overwrite previous information | +| head | no | int | Whether there is a data table header | +| id_delimiter | no | string | Data separator | +| id_column | no | string | id field | +| feature_column | no | array | feature_field | + +**Example** + +- hdfs + +```json +{ + "namespace": "experiment", + "name": "breast_hetero_guest", + "engine": "HDFS", + "address": { + "name_node": "hdfs://fate-cluster", + "path": "/data/breast_hetero_guest.csv" + }, + "id_delimiter": ",", + "head": 1, + "partitions": 10 +} +``` + +- mysql + +```json +{ + "engine": "MYSQL", + "address": { + "user": "fate", + "passwd": "fate", + "host": "127.0.0.1", + "port": 3306, + "db": "experiment", + "name": "breast_hetero_guest" + }, + "namespace": "experiment", + "name": "breast_hetero_guest", + "head": 1, + "id_delimiter": ",", + "partitions": 10, + "id_column": "id", + "feature_column": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9" +} +``` + +- PATH + +```json +{ + "namespace": "xxx", + "name": "xxx", + "engine": "PATH", + "address": { + "path": "xxx" + } +} +``` +**return parameters** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +Sample + +```json +{ + "data": { + "namespace": "xxx", + "table_name": "xxx" + }, + "retcode": 0, + "retmsg": "success" +} +``` \ No newline at end of file diff --git a/doc/cli/table.zh.md b/doc/cli/table.zh.md index 
5ef5f872e..c3b562475 100644 --- a/doc/cli/table.zh.md +++ b/doc/cli/table.zh.md @@ -8,14 +8,14 @@ flow table info [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :-------- | :--- | :----- | -------------- | | name | 是 | string | fate表名 | | namespace | 是 | string | fate表命名空间 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -56,14 +56,14 @@ flow table info [options] flow table delete [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :-------- | :--- | :----- | -------------- | | name | 是 | string | fate表名 | | namespace | 是 | string | fate表命名空间 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -94,7 +94,7 @@ flow table bind [options] 注: conf_path为参数路径,具体参数如下 -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :------------- | :--- | :----- | ------------------------------------- | @@ -162,3 +162,23 @@ flow table bind [options] } } ``` +**返回** + +| 参数名 | 类型 | 说明 | +| :------ | :----- | -------- | +| retcode | int | 返回码 | +| retmsg | string | 返回信息 | +| data | object | 返回数据 | + +样例 + +```json +{ + "data": { + "namespace": "xxx", + "table_name": "xxx" + }, + "retcode": 0, + "retmsg": "success" +} +``` \ No newline at end of file diff --git a/doc/cli/tag.md b/doc/cli/tag.md new file mode 100644 index 000000000..0f4605d30 --- /dev/null +++ b/doc/cli/tag.md @@ -0,0 +1,89 @@ +## Tag + +### create + +Creates a label. + +**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ---- | ------------ | ------ | ------------ | -------- | -------- | +| 1 | tag_name | `-t` | `-tag-name` | yes | tag_name | +| 2 | tag_parameter_introduction | `-d` | `--tag-desc` | no | tag_introduction | + +**Example** + +``` bash +flow tag create -t tag1 -d "This is the parameter description of tag1." +flow tag create -t tag2 +``` + +### update + +Update the tag information. + +**Options** + +| number | parameters | short format | long format | required parameters | parameter description | +| ---- | ------------ | ------ | ---------------- | -------- | ---------- | +| 1 | tag_name | `-t` | `--tag-name` | yes | tag_name | +| 2 | new_tag_name | | `--new-tag-name` | no | new-tag-name | +| 3 | new_tag_desc | | `--new-tag-desc` | no | new tag introduction | + +**Example** + +``` bash +flow tag update -t tag1 --new-tag-name tag2 +flow tag update -t tag1 --new-tag-desc "This is the introduction of the new parameter." +``` + +### list + +Show the list of tags. + +**options** + +| number | parameters | short-format | long-format | required-parameters | parameter-introduction | +| ---- | ----- | ------ | --------- | -------- | ---------------------------- | +| 1 | limit | `-l` | `-limit` | no | Returns a limit on the number of results (default: 10) | + +**Example** + +``` bash +flow tag list +flow tag list -l 3 +``` + +### query + +Retrieve tags. + +**Options** + +| number | parameters | short-format | long-format | required parameters | parameter description | +| ---- | ---------- | ------ | -------------- | -------- | -------------------------------------- | +| 1 | tag_name | `-t` | `-tag-name` | yes | tag_name | +| 2 | with_model | | `-with-model` | no | If specified, information about models with this tag will be displayed | + +**Example** + +``` bash +flow tag query -t $TAG_NAME +flow tag query -t $TAG_NAME --with-model +``` + +### delete + +Delete the tag. 
+ +**Options** + +| number | parameters | short-format | long-format | required-parameters | parameters introduction | +| ---- | -------- | ------ | ------------ | -------- | -------- +| 1 | tag_name | `-t` | `---tag-name` | yes | tag_name | + +**Example** + +``` bash +flow tag delete -t tag1 +``` diff --git a/doc/cli/tag.zh.md b/doc/cli/tag.zh.md index 35f14ed58..9034c12c7 100644 --- a/doc/cli/tag.zh.md +++ b/doc/cli/tag.zh.md @@ -2,15 +2,16 @@ ### create -- *介绍*: 创建标签。 -- *参数*: +创建标签。 + +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ------------ | ------ | ------------ | -------- | -------- | | 1 | tag_name | `-t` | `--tag-name` | 是 | 标签名 | | 2 | tag_参数介绍 | `-d` | `--tag-desc` | 否 | 标签介绍 | -- *示例*: +**样例** ``` bash flow tag create -t tag1 -d "This is the 参数介绍 of tag1." @@ -19,8 +20,9 @@ flow tag create -t tag2 ### update -- *介绍*: 更新标签信息。 -- *参数*: +更新标签信息。 + +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ------------ | ------ | ---------------- | -------- | ---------- | @@ -28,7 +30,7 @@ flow tag create -t tag2 | 2 | new_tag_name | | `--new-tag-name` | 否 | 新标签名 | | 3 | new_tag_desc | | `--new-tag-desc` | 否 | 新标签介绍 | -- *示例*: +**样例** ``` bash flow tag update -t tag1 --new-tag-name tag2 @@ -37,14 +39,15 @@ flow tag update -t tag1 --new-tag-desc "This is the new 参数介绍." ### list -- *介绍*: 展示标签列表。 -- *参数*: +展示标签列表。 + +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ----- | ------ | --------- | -------- | ---------------------------- | | 1 | limit | `-l` | `--limit` | 否 | 返回结果数量限制(默认:10) | -- *示例*: +**样例** ``` bash flow tag list @@ -53,15 +56,16 @@ flow tag list -l 3 ### query -- *介绍*: 检索标签。 -- *参数*: +检索标签。 + +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ---------- | ------ | -------------- | -------- | -------------------------------------- | | 1 | tag_name | `-t` | `--tag-name` | 是 | 标签名 | | 2 | with_model | | `--with-model` | 否 | 如果指定,具有该标签的模型信息将被展示 | -- *示例*: +**样例** ``` bash flow tag query -t $TAG_NAME @@ -70,14 +74,15 @@ flow tag query -t $TAG_NAME --with-model ### delete -- *介绍*: 删除标签。 -- *参数*: +删除标签。 + +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | -------- | ------ | ------------ | -------- | -------- | | 1 | tag_name | `-t` | `--tag-name` | 是 | 标签名 | -- *示例*: +**样例** ``` bash flow tag delete -t tag1 diff --git a/doc/cli/task.md b/doc/cli/task.md new file mode 100644 index 000000000..d7aad000f --- /dev/null +++ b/doc/cli/task.md @@ -0,0 +1,38 @@ +## Task + +### query + +Retrieve Task information + +**Options** + +| number | parameters | short format | long format | required parameters | parameter description | +| ---- | -------------- | ------ | ------------------ | -------- | -------- | +| 1 | job_id | `-j` | `--job_id` | no | Job ID | +| 2 | role | `-r` | `--role` | no | role +| 3 | party_id | `-p` | `--party_id` | no | Party ID | +| 4 | component_name | `-cpn` | `--component_name` | no | component_name | +| 5 | status | `-s` | `--status` | No | Task status | + +**Example** + +``` bash +flow task query -j $JOB_ID -p 9999 -r guest +flow task query -cpn hetero_feature_binning_0 -s complete +``` + +### list + +Show the list of Tasks. 
+**Options** + +| number | parameters | short format | long format | required parameters | parameter description | +| ---- | ----- | ------ | --------- | -------- | ---------------------------- | +| 1 | limit | `-l` | `-limit` | no | Returns a limit on the number of results (default: 10) | + +**Example** + +``` bash +flow task list +flow task list -l 25 +``` diff --git a/doc/cli/task.zh.md b/doc/cli/task.zh.md index 823a4433a..e7c94ee1d 100644 --- a/doc/cli/task.zh.md +++ b/doc/cli/task.zh.md @@ -4,7 +4,7 @@ 检索Task信息 -- *参数*: +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | -------------- | ------ | ------------------ | -------- | -------- | @@ -14,7 +14,7 @@ | 4 | component_name | `-cpn` | `--component_name` | 否 | 组件名 | | 5 | status | `-s` | `--status` | 否 | 任务状态 | -- *示例*: +**样例** ``` bash flow task query -j $JOB_ID -p 9999 -r guest @@ -23,14 +23,14 @@ flow task query -cpn hetero_feature_binning_0 -s complete ### list -- *介绍*: 展示Task列表。 -- *参数*: +展示Task列表。 +**选项** | 编号 | 参数 | 短格式 | 长格式 | 必要参数 | 参数介绍 | | ---- | ----- | ------ | --------- | -------- | ---------------------------- | | 1 | limit | `-l` | `--limit` | 否 | 返回结果数量限制(默认:10) | -- *示例*: +**样例** ``` bash flow task list diff --git a/doc/cli/tracking.md b/doc/cli/tracking.md new file mode 100644 index 000000000..61525352b --- /dev/null +++ b/doc/cli/tracking.md @@ -0,0 +1,604 @@ +## Tracking + +### metrics + +Get a list of all metrics names generated by a component task + +```bash +flow tracking metrics [options] +``` + +**Options** + +| parameter name | required | type | description | +| :--------------------- | :--- | :----- | ----------------------------- | +| -j, --job-id | yes | string | job-id | +| -r, --role | yes | string | participant-role | +| -p, --partyid | yes | string |participant-id | +| -cpn, --component-name | yes | string | Component name, consistent with that in job dsl | + +**Returns** + +| parameter-name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | + +**Example** + +```bash +flow tracking metrics -j 202111081618357358520 -r guest -p 9999 -cpn evaluation_0 +``` + +Output: + +```json +{ + "data": { + "train": [ + "hetero_lr_0", + "hetero_lr_0_ks_fpr", + "hetero_lr_0_ks_tpr", + "hetero_lr_0_lift", + "hetero_lr_0_gain", + "hetero_lr_0_accuracy", + "hetero_lr_0_precision", + "hetero_lr_0_recall", + "hetero_lr_0_roc", + "hetero_lr_0_confusion_mat", + "hetero_lr_0_f1_score", + "hetero_lr_0_quantile_pr" + ] + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### metric-all + +Get all the output metrics for a component task + +```bash +flow tracking metric-all [options] +``` + +**Options** + +| parameter-name | required | type | description | +| :--------------------- | :--- | :----- | ----------------------------- | +| -j, --job-id | yes | string | job-id | +| -r, --role | yes | string | participant-role | +| -p, --partyid | yes | string |participant-id | +| -cpn, --component-name | yes | string | Component name, consistent with that in job dsl | + +**Returns** + +| parameter-name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow tracking metric-all -j 202111081618357358520 -r guest -p 9999 -cpn evaluation_0 +``` + +Output (limited space, only some of the metric data is shown and some values are omitted in the middle of the array type 
data): + +```json +{ + "data": { + "train": { + "hetero_lr_0": { + "data": [ + [ + "auc", + 0.293893 + ], + [ + "ks", + 0.0 + ] + ], + "meta": { + "metric_type": "EVALUATION_SUMMARY", + "name": "hetero_lr_0" + } + }, + "hetero_lr_0_accuracy": { + "data": [ + [ + 0.0, + 0.372583 + ], + [ + 0.99, + 0.616872 + ] + ], + "meta": { + "curve_name": "hetero_lr_0", + "metric_type": "ACCURACY_EVALUATION", + "name": "hetero_lr_0_accuracy", + "thresholds": [ + 0.999471, + 0.002577 + ] + } + }, + "hetero_lr_0_confusion_mat": { + "data": [], + "meta": { + "fn": [ + 357, + 0 + ], + "fp": [ + 0, + 212 + ], + "metric_type": "CONFUSION_MAT", + "name": "hetero_lr_0_confusion_mat", + "thresholds": [ + 0.999471, + 0.0 + ], + "tn": [ + 212, + 0 + ], + "tp": [ + 0, + 357 + ] + } + } + } + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### parameters + +After the job is submitted, the system resolves the actual component task parameters based on the component_parameters in the job conf combined with the system default component parameters + +```bash +flow tracking parameters [options] +``` + +**Options** + +| parameter_name | required | type | description | +| :--------------------- | :--- | :----- | ----------------------------- | +| -j, --job-id | yes | string | job-id | +| -r, --role | yes | string | participant-role | +| -p, --partyid | yes | string |participant-id | +| -cpn, --component-name | yes | string | Component name, consistent with that in job dsl | + + +**Returns** + +| parameter-name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow tracking parameters -j 202111081618357358520 -r guest -p 9999 -cpn hetero_lr_0 +``` + +Output: + +```json +{ + "data": { + "ComponentParam": { + "_feeded_deprecated_params": [], + "_is_raw_conf": false, + "_name": "HeteroLR#hetero_lr_0", + "_user_feeded_params": [ + "batch_size", + "penalty", + "max_iter", + "learning_rate", + "init_param", + "optimizer", + "init_param.init_method", + "alpha" + ], + "alpha": 0.01, + "batch_size": 320, + "callback_param": { + "callbacks": [], + "early_stopping_rounds": null, + "metrics": [], + "save_freq": 1, + "use_first_metric_only": false, + "validation_freqs": null + }, + "cv_param": { + "history_value_type": "score", + "mode": "hetero", + "n_splits": 5, + "need_cv": false, + "output_fold_history": true, + "random_seed": 1, + "role": "guest", + "shuffle": true + }, + "decay": 1, + "decay_sqrt": true, + "early_stop": "diff", + "early_stopping_rounds": null, + "encrypt_param": { + "key_length": 1024, + "method": "Paillier" + }, + "encrypted_mode_calculator_param": { + "mode": "strict", + "re_encrypted_rate": 1 + }, + "floating_point_precision": 23, + "init_param": { + "fit_intercept": true, + "init_const": 1, + "init_method": "random_uniform", + "random_seed": null + }, + "learning_rate": 0.15, + "max_iter": 3, + "metrics": [ + "auc", + "ks" + ], + "multi_class": "ovr", + "optimizer": "rmsprop", + "penalty": "L2", + "predict_param": { + "threshold": 0.5 + }, + "sqn_param": { + "memory_M": 5, + "random_seed": null, + "sample_size": 5000, + "update_interval_L": 3 + }, + "stepwise_param": { + "direction": "both", + "max_step": 10, + "mode": "hetero", + "need_stepwise": false, + "nvmax": null, + "nvmin": 2, + "role": "guest", + "score_name": "AIC" + }, + "tol": 0.0001, + "use_first_metric_only": false, + "validation_freqs": null + }, + "module": "HeteroLR" + }, + 
"retcode": 0, + "retmsg": "success" +} +``` + +### output-data + +Get the component output + +```bash +flow tracking output-data [options] +``` + +**options** + +| parameter-name | required | type | description | +| :--------------------- | :--- | :----- | ----------------------------- | +| -j, --job-id | yes | string | job-id | +| -r, --role | yes | string | participant-role | +| -p, --partyid | yes | string |participant-id | +| -cpn, --component-name | yes | string | Component name, consistent with that in job dsl | +| -o, --output-path | yes | string | Path to output data | + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | Return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow tracking output-data -j 202111081618357358520 -r guest -p 9999 -cpn hetero_lr_0 -o . / +``` + +Output : + +```json +{ + "retcode": 0, + "directory": "$FATE_PROJECT_BASE/job_202111081618357358520_hetero_lr_0_guest_9999_output_data", + "retmsg": "Download successfully, please check $FATE_PROJECT_BASE/job_202111081618357358520_hetero_lr_0_guest_9999_output_data directory " +} +``` + +### output-data-table + +Get the output data table name of the component + +```bash +flow tracking output-data-table [options] +``` + +**options** + +| parameter-name | required | type | description | +| :--------------------- | :--- | :----- | ----------------------------- | +| -j, --job-id | yes | string | job-id | +| -r, --role | yes | string | participant-role | +| -p, --partyid | yes | string |participant-id | +| -cpn, --component-name | yes | string | Component name, consistent with that in job dsl | + +**Returns** + +| parameter-name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow tracking output-data-table -j 202111081618357358520 -r guest -p 9999 -cpn hetero_lr_0 +``` + +output: + +```json +{ + "data": [ + { + "data_name": "train", + "table_name": "9688fa00406c11ecbd0bacde48001122", + "table_namespace": "output_data_202111081618357358520_hetero_lr_0_0" + } + ], + "retcode": 0, + "retmsg": "success" +} +``` + +### output-model + +Get the output model of a component task + +```bash +flow tracking output-model [options] +``` + +**options** + +| parameter-name | required | type | description | +| :--------------------- | :--- | :----- | ----------------------------- | +| -j, --job-id | yes | string | job-id | +| -r, --role | yes | string | participant-role | +| -p, --partyid | yes | string |participant-id | +| -cpn, --component-name | yes | string | Component name, consistent with that in job dsl | + +**Returns** + +| parameter-name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow tracking output-model -j 202111081618357358520 -r guest -p 9999 -cpn hetero_lr_0 +``` + +Output: + +```json +{ + "data": { + "bestIteration": -1, + "encryptedWeight": {}, + "header": [ + "x0", + "x1", + "x2", + "x3", + "x4", + "x5", + "x6", + "x7", + "x8", + "x9" + ], + "intercept": 0.24451607054764884, + "isConverged": false, + "iters": 3, + "lossHistory": [], + "needOneVsRest": false, + "weight": { + "x0": 0.04639947589856569, + "x1": 0.19899685467216902, + "x2": 
-0.18133550931649306, + "x3": 0.44928868756862206, + "x4": 0.05285905125502288, + "x5": 0.319187932844076, + "x6": 0.42578983446194013, + "x7": -0.025765956309895477, + "x8": -0.3699194462271593, + "x9": -0.1212094750908295 + } + }, + "meta": { + "meta_data": { + "alpha": 0.01, + "batchSize": "320", + "earlyStop": "diff", + "fitIntercept": true, + "learningRate": 0.15, + "maxIter": "3", + "needOneVsRest": false, + "optimizer": "rmsprop", + "partyWeight": 0.0, + "penalty": "L2", + "reEncryptBatches": "0", + "revealStrategy": "", + "tol": 0.0001 + }, + "module_name": "HeteroLR" + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### get-summary + +Each component allows to set some summary information for easy observation and analysis + +```bash +flow tracking get-summary [options] +``` + +**Options** + +| parameter-name | required | type | description | +| :--------------------- | :--- | :----- | ----------------------------- | +| -j, --job-id | yes | string | job-id | +| -r, --role | yes | string | participant-role | +| -p, --partyid | yes | string |participant-id | +| -cpn, --component-name | yes | string | Component name, consistent with that in job dsl | + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | dict | return data | +| jobId | string | job id | + +**Example** + +```bash +flow tracking get-summary -j 202111081618357358520 -r guest -p 9999 -cpn hetero_lr_0 +``` + +Output: + +```json +{ + "data": { + "best_iteration": -1, + "coef": { + "x0": 0.04639947589856569, + "x1": 0.19899685467216902, + "x2": -0.18133550931649306, + "x3": 0.44928868756862206, + "x4": 0.05285905125502288, + "x5": 0.319187932844076, + "x6": 0.42578983446194013, + "x7": -0.025765956309895477, + "x8": -0.3699194462271593, + "x9": -0.1212094750908295 + }, + "intercept": 0.24451607054764884, + "is_converged": false, + "one_vs_rest": false + }, + "retcode": 0, + "retmsg": "success" +} +``` + +### tracking-source + +For querying the parent and source tables of a table + +```bash +flow table tracking-source [options] +``` + +**Options** + +| parameter-name | required | type | description | +| :-------- | :--- | :----- | -------------- | +| name | yes | string | fate table name | +| namespace | yes | string | fate table namespace | + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +**Example** + +```json +{ + "data": [{"parent_table_name": "61210fa23c8d11ec849a5254004fdc71", "parent_table_namespace": "output_data_202111031759294631020_hetero _lr_0_0", "source_table_name": "breast_hetero_guest", "source_table_namespace": "experiment"}], + "retcode": 0, + "retmsg": "success" +} +``` + +### tracking-job + +For querying the usage of a particular table + +```bash +flow table tracking-job [options] +``` + +**Options** + +| parameter name | required | type | description | +| :-------- | :--- | :----- | -------------- | +| name | yes | string | fate table name | +| namespace | yes | string | fate table namespace | + +**Returns** + +| parameter name | type | description | +| :------ | :----- | -------- | +| retcode | int | return code | +| retmsg | string | return message | +| data | object | return data | + +**Example** + +```json +{ + "data": {"count":2, "jobs":["202111052115375327830", "202111031816501123160"]}, + "retcode": 0, + "retmsg": "success" +} +``` 
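Taken together, the tracking commands above are often chained in a small shell script once a job finishes. The sketch below is illustrative only: the job id, role, party id and component names are placeholders copied from the examples above, and the output directory is an assumption.

```bash
#!/bin/bash
# Post-job inspection sketch; replace the placeholders with real values.
JOB_ID=202111081618357358520
ROLE=guest
PARTY_ID=9999
CPN=hetero_lr_0

# List the metric names produced by the evaluation component
flow tracking metrics -j "$JOB_ID" -r "$ROLE" -p "$PARTY_ID" -cpn evaluation_0

# Fetch the summary set by the LR component
flow tracking get-summary -j "$JOB_ID" -r "$ROLE" -p "$PARTY_ID" -cpn "$CPN"

# Download the component's output data to a local directory (assumed writable)
flow tracking output-data -j "$JOB_ID" -r "$ROLE" -p "$PARTY_ID" -cpn "$CPN" -o ./tracking_output
```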
diff --git a/doc/cli/tracking.zh.md b/doc/cli/tracking.zh.md index 3ce8512c0..2b5972a5e 100644 --- a/doc/cli/tracking.zh.md +++ b/doc/cli/tracking.zh.md @@ -8,7 +8,7 @@ flow tracking metrics [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------- | @@ -17,7 +17,7 @@ flow tracking metrics [options] | -p, --partyid | 是 | string | 参与方id | | -cpn, --component-name | 是 | string | 组件名,与job dsl中的保持一致 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -64,7 +64,7 @@ flow tracking metrics -j 202111081618357358520 -r guest -p 9999 -cpn evaluation_ flow tracking metric-all [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------- | @@ -73,7 +73,7 @@ flow tracking metric-all [options] | -p, --partyid | 是 | string | 参与方id | | -cpn, --component-name | 是 | string | 组件名,与job dsl中的保持一致 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -173,7 +173,7 @@ flow tracking metric-all -j 202111081618357358520 -r guest -p 9999 -cpn evaluati flow tracking parameters [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------- | @@ -183,7 +183,7 @@ flow tracking parameters [options] | -cpn, --component-name | 是 | string | 组件名,与job dsl中的保持一致 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -303,7 +303,7 @@ flow tracking parameters -j 202111081618357358520 -r guest -p 9999 -cpn hetero_ flow tracking output-data [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------- | @@ -313,7 +313,7 @@ flow tracking output-data [options] | -cpn, --component-name | 是 | string | 组件名,与job dsl中的保持一致 | | -o, --output-path | 是 | string | 输出数据的存放路径 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -346,7 +346,7 @@ flow tracking output-data -j 202111081618357358520 -r guest -p 9999 -cpn hetero flow tracking output-data-table [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------- | @@ -355,7 +355,7 @@ flow tracking output-data-table [options] | -p, --partyid | 是 | string | 参与方id | | -cpn, --component-name | 是 | string | 组件名,与job dsl中的保持一致 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -394,7 +394,7 @@ flow tracking output-data-table -j 202111081618357358520 -r guest -p 9999 -cpn flow tracking output-model [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------- | @@ -403,7 +403,7 @@ flow tracking output-model [options] | -p, --partyid | 是 | string | 参与方id | | -cpn, --component-name | 是 | string | 组件名,与job dsl中的保持一致 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -486,7 +486,7 @@ flow tracking output-model -j 202111081618357358520 -r guest -p 9999 -cpn heter flow tracking get-summary [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :--------------------- | :--- | :----- | ----------------------------- | @@ -495,7 +495,7 @@ flow tracking get-summary [options] | -p, --partyid | 是 | string | 参与方id | | -cpn, --component-name | 是 | string | 组件名,与job dsl中的保持一致 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -545,14 +545,14 @@ flow tracking get-summary -j 202111081618357358520 -r guest -p 9999 -cpn hetero_ flow table tracking-source [options] ``` -**参数** +**选项** | 参数名 | 必选 | 
类型 | 说明 | | :-------- | :--- | :----- | -------------- | | name | 是 | string | fate表名 | | namespace | 是 | string | fate表命名空间 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -560,7 +560,7 @@ flow table tracking-source [options] | retmsg | string | 返回信息 | | data | object | 返回数据 | -样例: +**样例** ```json { @@ -574,20 +574,18 @@ flow table tracking-source [options] 用于查询某张表的使用情况 -**请求CLI** - ```bash flow table tracking-job [options] ``` -**参数** +**选项** | 参数名 | 必选 | 类型 | 说明 | | :-------- | :--- | :----- | -------------- | | name | 是 | string | fate表名 | | namespace | 是 | string | fate表命名空间 | -**返回参数** +**返回** | 参数名 | 类型 | 说明 | | :------ | :----- | -------- | @@ -595,7 +593,7 @@ flow table tracking-job [options] | retmsg | string | 返回信息 | | data | object | 返回数据 | -样例: +**样例** ```json { diff --git a/doc/configuration_instruction.md b/doc/configuration_instruction.md new file mode 100644 index 000000000..ae04a5d23 --- /dev/null +++ b/doc/configuration_instruction.md @@ -0,0 +1,374 @@ +# Configuration Instructions + +## 1. Description + +Contains the general configuration of the `FATE project` and the configuration of each subsystem + +## 2. Global configuration + +- Path: `${FATE_PROJECT_BASE}/conf/server_conf.yaml` +- Description: Commonly used configuration, generally needed to determine when deploying +- Note: Configuration items that are not listed below in the configuration file are internal system parameters and are not recommended to be modified + +```yaml +# If FATEFlow uses the registry, FATEFlow will register the FATEFlow Server address and the published model download address to the registry for the online system FATEServing; it will also get the FATEServing address from the registry. +use_registry: false +# Whether to enable higher security serialization mode +use_deserialize_safe_module: false +dependent_distribution: false +fateflow: + # you must set real ip address, 127.0.0.1 and 0.0.0.0 is not supported + host: 127.0.0.1 + http_port: 9380 + grpc_port: 9360 + http_app_key: + http_secret_key: + # support rollsite/nginx/fateflow as a coordination proxy + # rollsite support fate on eggroll, use grpc protocol + # nginx support fate on eggroll and fate on spark, use http or grpc protocol, default is http + # fateflow support fate on eggroll and fate on spark, use http protocol, but not support exchange network mode + + # format(proxy: rollsite) means rollsite use the rollsite configuration of fate_one_eggroll and nginx use the nginx configuration of fate_one_spark + # you also can customize the config like this(set fateflow of the opposite party as proxy): + # proxy: + # name: fateflow + # host: xx + # http_port: xx + # grpc_port: xx + proxy: rollsite + # support default/http/grpc + protocol: default +database: + name: fate_flow + user: fate + passwd: fate + host: 127.0.0.1 + port: 3306 + max_connections: 100 + stale_timeout: 30 +# The registry address and its authentication parameters +zookeeper: + hosts: + - 127.0.0.1:2181 + use_acl: false + user: fate + password: fate +# engine services +default_engines: + computing: standalone + federation: standalone + storage: standalone +fate_on_standalone: + standalone: + cores_per_node: 20 + nodes: 1 +fate_on_eggroll: + clustermanager: + # CPU cores of the machine where eggroll nodemanager service is running + cores_per_node: 16 + # the number of eggroll nodemanager machine + nodes: 1 + rollsite: + host: 127.0.0.1 + port: 9370 +fate_on_spark: + spark: + # default use SPARK_HOME environment variable + home: + cores_per_node: 
20 + nodes: 2 + linkis_spark: + cores_per_node: 20 + nodes: 2 + host: 127.0.0.1 + port: 9001 + token_code: MLSS + python_path: /data/projects/fate/python + hive: + host: 127.0.0.1 + port: 10000 + auth_mechanism: + username: + password: + linkis_hive: + host: 127.0.0.1 + port: 9001 + hdfs: + name_node: hdfs://fate-cluster + # default / + path_prefix: + rabbitmq: + host: 192.168.0.4 + mng_port: 12345 + port: 5672 + user: fate + password: fate + # default conf/rabbitmq_route_table.yaml + route_table: + pulsar: + host: 192.168.0.5 + port: 6650 + mng_port: 8080 + cluster: standalone + # all parties should use a same tenant + tenant: fl-tenant + # message ttl in minutes + topic_ttl: 5 + # default conf/pulsar_route_table.yaml + route_table: + nginx: + host: 127.0.0.1 + http_port: 9300 + grpc_port: 9310 +# external services +fateboard: + host: 127.0.0.1 + port: 8080 + +# on API `/model/load` and `/model/load/do` +# automatic upload models to the model store if it exists locally but does not exist in the model storage +# or download models from the model store if it does not exist locally but exists in the model storage +# this config will not affect API `/model/store` or `/model/restore` +enable_model_store: false +# default address for export model +model_store_address: + # use mysql as the model store engine +# storage: mysql +# database: fate_model +# user: fate +# password: fate +# host: 127.0.0.1 +# port: 3306 + # other optional configs send to the engine +# max_connections: 10 +# stale_timeout: 10 + # use redis as the model store engine +# storage: redis +# host: 127.0.0.1 +# port: 6379 +# db: 0 +# password: + # the expiry time of keys, in seconds. defaults None (no expiry time) +# ex: + # use tencent cos as model store engine + storage: tencent_cos + Region: + SecretId: + SecretKey: + Bucket: + +# The address of the FATE Serving Server needs to be configured if the registry is not used +servings: + hosts: + - 127.0.0.1:8000 +fatemanager: + host: 127.0.0.1 + port: 8001 + federatedId: 0 + +``` + +## 3. 
FATE Flow Configuration + +### 3.1 FATE Flow Server Configuration + +- Path: `${FATE_FLOW_BASE}/python/fate_flow/settings.py` +- Description: Advanced configuration, generally no changes are needed +- Note: Configuration items that are not listed below in the configuration file are internal system parameters and are not recommended to be modified + +```python +# Thread pool size of grpc server used by FATE Flow Server for multiparty FATE Flow Server communication, not set default equal to the number of CPU cores of the machine +GRPC_SERVER_MAX_WORKERS = None + +# Switch +# The upload data interface gets data from the client by default, this value can be configured at the time of the interface call using use_local_data +UPLOAD_DATA_FROM_CLIENT = True +# Whether to enable multi-party communication authentication, need to be used with FATE Cloud +CHECK_NODES_IDENTITY = False +# Whether to enable the resource authentication function, need to use with FATE Cloud +USE_AUTHENTICATION = False +# Resource privileges granted by default +PRIVILEGE_COMMAND_WHITELIST = [] +``` + +### 3.2 FATE Flow Default Job Configuration + +- Path: `${FATE_FLOW_BASE}/conf/job_default_config.yaml` +- Description: Advanced configuration, generally no changes are needed +- Note: Configuration items that are not listed below in the configuration file are internal system parameters and are not recommended to be modified +- Take effect: use flow server reload or restart fate flow server + +```yaml +# component provider, relative path to get_fate_python_directory +default_component_provider_path: federatedml + +# resource +# total_cores_overweight_percent +total_cores_overweight_percent: 1 # 1 means no overweight +total_memory_overweight_percent: 1 # 1 means no overweight +# Default task parallelism per job, you can configure a custom value using job_parameters:task_parallelism when submitting the job configuration +task_parallelism: 1 +# The default number of CPU cores per task per job, which can be configured using job_parameters:task_cores when submitting the job configuration +task_cores: 4 +# This configuration does not take effect as memory resources are not supported for scheduling at the moment +task_memory: 0 # mb +# The ratio of the maximum number of CPU cores allowed for a job to the total number of resources, e.g., if the total resources are 10 and the value is 0.5, then a job is allowed to request up to 5 CPUs, i.e., task_cores * task_parallelism <= 10 * 0.5 +max_cores_percent_per_job: 1 # 1 means total + +# scheduling +# Default job execution timeout, you can configure a custom value using job_parameters:timeout when submitting the job configuration +job_timeout: 259200 # s +# Timeout for communication when sending cross-participant scheduling commands or status +remote_request_timeout: 30000 # ms +# Number of retries to send cross-participant scheduling commands or status +federated_command_trys: 3 +end_status_job_scheduling_time_limit: 300000 # ms +end_status_job_scheduling_updates: 1 +# Default number of auto retries, you can configure a custom value using job_parameters:auto_retries when submitting the job configuration +auto_retries: 0 +# Default retry interval +auto_retry_delay: 1 #seconds +# Default multiparty status collection method, supports PULL and PUSH; you can also specify the current job collection mode in the job configuration +federated_status_collect_type: PUSH + +# upload +upload_max_bytes: 104857600 # bytes + +#component output +output_data_summary_count_limit: 100 +``` + +## 4. 
FATE Board Configuration + +- Path: `${FATE_BOARD_BASE}/conf/application.properties` +- Description: Commonly used configuration, generally needed to determine when deploying +- Note: Configuration items that are not listed below in the configuration file are internal system parameters and are not recommended to be modified + +```properties +# Service listening port +server.port=8080 +# fateflow address, referring to the http port address of fateflow +fateflow.url==http://127.0.0.1:9380 +# db address, same as the above global configuration service_conf.yaml inside the database configuration +fateboard.datasource.jdbc-url=jdbc:mysql://localhost:3306/fate_flow?characterEncoding=utf8&characterSetResults=utf8&autoReconnect= true&failOverReadOnly=false&serverTimezone=GMT%2B8 +# db configuration, same as the above global configuration service_conf.yaml inside the database configuration +fateboard.datasource.username= +# db configuration, same as the above global configuration service_conf.yaml inside the database configuration +fateboard.datasource.password= +server.tomcat.max-threads=1000 +server.tomcat.max-connections=20000 +spring.servlet.multipart.max-file-size=10MB +spring.servlet.multipart.max-request-size=100MB +# Administrator account configuration +server.board.login.username=admin +server.board.login.password=admin +server.ssl.key-store=classpath: +server.ssl.key-store-password= +server.ssl.key-password= +server.ssl.key-alias= +# When fateflo server enables api access authentication, you need to configure +HTTP_APP_KEY= +HTTP_SECRET_KEY= +``` + +## 5. EggRoll + +### 5.1 System configuration + +- Path: `${EGGROLL_HOME}/conf/eggroll.properties` +- Description: Commonly used configuration, generally needed to determine when deploying +- Note: Configuration items that are not listed below in the configuration file are internal system parameters and are not recommended to be modified + +```properties +[eggroll] +# core +# MySQL connection configuration, generally required for production applications +eggroll.resourcemanager.clustermanager.jdbc.driver.class.name=com.mysql.cj.jdbc. 
+# MySQL connection configuration, generally required for production applications +eggroll.resourcemanager.clustermanager.jdbc.url=jdbc:mysql://localhost:3306/eggroll_meta?useSSL=false&serverTimezone=UTC& characterEncoding=utf8&allowPublicKeyRetrieval=true +# Connect to MySQL account, this configuration is required for general production applications +eggroll.resourcemanager.clustermanager.jdbc.username= +# Connect to MySQL password, generally required for production applications +eggroll.resourcemanager.clustermanager.jdbc.password= + +# Data storage directory +eggroll.data.dir=data/ +# Log storage directory +eggroll.logs.dir=logs/ +eggroll.resourcemanager.clustermanager.host=127.0.0.1 +eggroll.resourcemanager.clustermanager.port=4670 +eggroll.resourcemanager.nodemanager.port=4670 + +# python path +eggroll.resourcemanager.bootstrap.eggg_pair.venv= +# pythonpath, usually you need to specify the python directory of eggroll and the python directory of fate +eggroll.resourcemanager.bootstrap.eggg_pair.pythonpath=python + +# java path +eggroll.resourcemanager.bootstrap.eggg_frame.javahome= +# java service startup parameters, no special needs, no need to configure +eggroll.resourcemanager.bootstrap.eggg_frame.jvm.options= +# grpc connection hold time for multi-party communication +eggroll.core.grpc.channel.keepalive.timeout.sec=20 + +# session +# Number of computing processes started per nodemanager in an eggroll session; replaced by the default parameters of the fate flow if using fate for committing tasks +eggroll.session.processors.per.node=4 + +# rollsite +eggroll.rollsite.coordinator=webank +eggroll.rollsite.host=127.0.0.1 +eggroll.rollsite.port=9370 +eggroll.rollsite.party.id=10001 +eggroll.rollsite.route.table.path=conf/route_table.json + +eggroll.rollsite.push.max.retry=3 +eggroll.rollsite.push.long.retry=2 +eggroll.rollsite.push.batches.per.stream=10 +eggroll.rollsite.adapter.sendbuf.size=100000 +``` + +### 5.2 Routing table configuration + +- Path: `${EGGROLL_HOME}/conf/route_table.json` +- Description: Commonly used configuration, generally needed to determine when deploying + - The routing table is mainly divided into two levels + - The first level indicates the site, if the corresponding target site configuration is not found, then use **default** + - The second level represents the service, if you can not find the corresponding target service, then use **default** + - The second level, usually set **default** as the address of our **rollsite** service, and **fateflow** as the grpc address of our **fate flow server** service + +```json +{ + "route_table": + { + "10001": + { + "default":[ + { + "port": 9370, + "ip": "127.0.0.1" + } + ], + "fateflow":[ + { + "port": 9360, + "ip": "127.0.0.1" + } + ] + }, + "10002": + { + "default":[ + { + "port": 9470, + "ip": "127.0.0.1" + } + ] + } + }, + "permission": + { + "default_allow": true + } +} +``` diff --git a/doc/document_navigation.md b/doc/document_navigation.md new file mode 100644 index 000000000..6d9e7f352 --- /dev/null +++ b/doc/document_navigation.md @@ -0,0 +1,66 @@ +# Document Navigation + +## 1. General Document Variables + +You will see the following `document variables` in all `FATE Flow` documentation, with the following meanings. + +- FATE_PROJECT_BASE: denotes the `FATE project` deployment directory, containing configuration, fate algorithm packages, fate clients and subsystems: `bin`, `conf`, `examples`, `fate`, `fateflow`, `fateboard`, `eggroll`, etc. 
+- FATE_BASE: The deployment directory of `FATE`, named `fate`, contains algorithm packages, clients: `federatedml`, `fate arch`, `fate client`, usually the path is `${FATE_PROJECT_BASE}/fate` +- FATE_FLOW_BASE: The deployment directory of `FATE Flow`, named `fateflow`, containing `fate flow server`, etc., usually the path is `${FATE_PROJECT_BASE}/fateflow` +- FATE_BOARD_BASE: the deployment directory of `FATE Board`, name `fateboard`, contains `fateboard`, usually the path is `${FATE_PROJECT_BASE}/fateboard` +- EGGROLL_HOME: the deployment directory for `EggRoll`, named `eggroll`, containing `rollsite`, `clustermanager`, `nodemanager`, etc., usually in `${FATE_PROJECT_BASE}/eggroll` + + Deploy the `FATE project` with reference to the main repository [FederatedAI/FATE](https://github.com/FederatedAI/FATE), the main directory structure is as follows + + - bin + - conf + - examples + - doc + - fate + - python + - fate_arch + - federatedml + - fateflow + - conf + - doc + - python + - fate_flow + - logs + - jobs + - fateboard + - conf + - fateboard.jar + - logs + - eggroll + - bin + - conf + - lib + - python + - data + - logs + - fate.env + +- FATE_VERSION: The version number of `FATE`, e.g. 1.7.0 +- FATE_FLOW_VERSION: the version number of `FATE Flow`, e.g. 1.7.0 +- version: Generally in the deployment documentation, it means the version number of `FATE project`, such as `1.7.0`, `1.6.0`. +- version_tag: generally in the deployment documentation, indicates the `FATE project` version tag, such as `release`, `rc1`, `rc10` + +## 2. Glossary of terms + +`component_name`: the name of the component when the task is submitted, a task can have more than one of the same component, but the `component_name` is not the same, equivalent to an instance of the class + +`componet_module_name`: the class name of the component + +`model_alias`: similar to `component_name`, which is the name of the output model that the user can configure inside dsl + +Example. + +In the figure `dataio_0` is `component_name`, `DataIO` is `componet_module_name`, `dataio` is `model_alias` + +! [](https://user-images.githubusercontent.com/1758850/124451776-52ee4500-ddb8-11eb-94f2-d43d5174ca4d.png) + +## 3. Reading guide + +1. you can first read [overall design](. /fate_flow.zh.md) +2. Refer to the main repository [FATE](https://github.com/FederatedAI/FATE) for deployment, either standalone (installer, Docker, source compiler) or cluster (Ansible, Docker, Kuberneters) +3. You can refer to the directory in order of navigation diff --git a/doc/faq.md b/doc/faq.md new file mode 100644 index 000000000..80fcce1ac --- /dev/null +++ b/doc/faq.md @@ -0,0 +1,102 @@ +# FAQ + +## 1. Description + +## 2. Log descriptions + +In general, to troubleshoot a problem, the following logs are required. 
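Once the job id is known, the log files listed under the version headings below can be inspected directly. A minimal sketch, assuming the v1.7+ directory layout and that `FATE_PROJECT_BASE` and `JOB_ID` are already set in the shell:

```bash
# Hedged example: follow the per-job scheduling log and scan the global
# logs for errors; paths assume the v1.7+ layout described below.
tail -f "${FATE_PROJECT_BASE}/fateflow/logs/${JOB_ID}/fate_flow_schedule.log"
grep -iE "error|exception" "${FATE_PROJECT_BASE}/fateflow/logs/fate_flow/fate_flow_stat.log"
grep -iE "error|exception" "${FATE_PROJECT_BASE}/fateflow/logs/fate_flow/fate_flow_schedule.log"
```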
+ +## v1.7+ + +- `${FATE_PROJECT_BASE}/fateflow/logs/$job_id/fate_flow_schedule.log`, this is the internal scheduling log of a certain task + +- `${FATE_PROJECT_BASE}/fateflow/logs/$job_id/*` These are all the execution logs of a certain task + +- `${FATE_PROJECT_BASE}/fateflow/logs/fate_flow/fate_flow_stat.log`, this is some logs that are not related to tasks + +- `${FATE_PROJECT_BASE}/fateflow/logs/fate_flow/fate_flow_schedule.log`, this is the overall scheduling log of all tasks + +- `${FATE_PROJECT_BASE}/fateflow/logs/fate_flow/fate_flow_detect.log`, which is the overall exception detection log for all tasks + +### v1.7- + +- `${FATE_PROJECT_BASE}/logs/$job_id/fate_flow_schedule.log`, this is the internal scheduling log for a particular task + +- `${FATE_PROJECT_BASE}/logs/$job_id/*` These are all the execution logs of a certain task + +- `${FATE_PROJECT_BASE}/logs/fate_flow/fate_flow_stat.log`, this is some logs that are not related to the task + +- `${FATE_PROJECT_BASE}/logs/fate_flow/fate_flow_schedule.log`, this is the overall scheduling log of all tasks + +- `${FATE_PROJECT_BASE}/logs/fate_flow/fate_flow_detect.log`, this is the overall exception detection log of all tasks + +## 3. Offline + +### upload failed + +- checking eggroll related services for exceptions. + +### submit job is stuck + +- check if both rollsite services have been killed + +### submit_job returns grpc exception + +- submit job link: guest fate_flow -> guest rollsite -> host rollsite -> host fate_flow +- check that each service in the above link is not hung, it must be ensured that each node is functioning properly. +- checking that the routing table is correctly configured. + +### dataio component exception: not enough values to unpack (expected 2, got 1) + +- the data separator does not match the separator in the configuration + +### Exception thrown at task runtime: "Count of data_instance is 0" + +- task has an intersection component and the intersection match rate is 0, need to check if the output data ids of guest and host can be matched. + +## 4. Serving + +### load model retcode returns 100, what are the possible reasons? + +- no fate-servings deployed + +- flow did not fetch the fate-servings address + +- flow reads the address of the fate-servings in priority order: + + 1. read from zk + + 2. if zk is not open, it will read from the fate-servings configuration file, the configuration path is + + - 1.5+: `${FATE_PROJECT_BASE}/conf/service_conf.yaml` + + - 1.5-: `${FATE_PROJECT_BASE}/arch/conf/server_conf.json` + +### load model retcode returns 123, what are the possible reasons? + +- Incorrect model information. +- This error code is thrown by fate-servings not finding the model. + +### bind model operation prompted "no service id"? + +- Customize the service_id in the bind configuration + +### Where is the configuration of servings? How do I configure it? 
+ +- v1.5+ Configuration path: `${FATE_PROJECT_BASE}/conf/service_conf.yaml` + +```yaml +servings: + hosts: + - 127.0.0.1:8000 +``` + +- v1.5- Configuration path: `${FATE_PROJECT_BASE}/arch/conf/server_conf.json` + +```json +{ + "servers": { + "servings": ["127.0.0.1:8000"] + } +} +``` diff --git a/doc/faq.zh.md b/doc/faq.zh.md index 35c3cf427..7f78cf7a8 100644 --- a/doc/faq.zh.md +++ b/doc/faq.zh.md @@ -6,7 +6,7 @@ 一般来说,排查问题,需要如下几个日志: -### 1.7+: +### v1.7+ - `${FATE_PROJECT_BASE}/fateflow/logs/$job_id/fate_flow_schedule.log`,这个是某个任务的内部调度日志 @@ -18,7 +18,7 @@ - `${FATE_PROJECT_BASE}/fateflow/logs/fate_flow/fate_flow_detect.log`,这个是所有任务的整体异常探测日志 -### 1.7-: +### v1.7- - `${FATE_PROJECT_BASE}/logs/$job_id/fate_flow_schedule.log`,这个是某个任务的内部调度日志 diff --git a/doc/fate_flow.md b/doc/fate_flow.md new file mode 100644 index 000000000..9360d44db --- /dev/null +++ b/doc/fate_flow.md @@ -0,0 +1,110 @@ +# Overall Design + +## 1. Logical Architecture + +- DSL defined jobs +- Top-down vertical subtask flow scheduling, multi-participant joint subtask coordination +- Independent isolated task execution work processes +- Support for multiple types and versions of components +- Computational abstraction API +- Storage abstraction API +- Cross-party transfer abstraction API + +![](./images/fate_flow_logical_arch.png) + +## 2. Service Architecture + +### 2.1 FATE + +![](./images/fate_arch.png) + +### 2.2 FATE Flow + +![](./images/fate_flow_arch.png) + +## 3. [Scheduling Architecture](./fate_flow_job_scheduling.md) + +### 3.1 A new scheduling architecture based on shared-state + +- Stripping state (resources, jobs) and managers (schedulers, resource managers) +- Resource state and job state are persisted in MySQL and shared globally to provide reliable transactional operations +- Improve the high availability and scalability of managed services +- Jobs can be intervened to support restart, rerun, parallel control, resource isolation, etc. + +![](./images/fate_flow_scheduling_arch.png) + +### 3.2 State-Driven Scheduling + +- Resource coordination +- Pull up the child process Executor to run the component +- Executor reports state to local Server and also to scheduler +- Multi-party task state calculation of federal task state +- Upstream and downstream task states compute job states + +![](./images/fate_flow_resource_process.png) + +## 4. [Multiparty Resource Coordination](./fate_flow_resource_management.md) + +- The total resource size of each engine is configured through the configuration file, and the system is subsequently interfaced +- The cores_per_node in the total resource size indicates the number of cpu cores per compute node, and nodes indicates the number of compute nodes. +- FATEFlow server reads the resource size configuration from the configuration file when it starts and registers the update to the database +- The resources are requested in Job dimension, and take effect when Job Conf is submitted, formula: task_parallelism*task_cores +- See separate section of the documentation for details + +## 5. [Data Flow Tracking](./fate_flow_tracking.md) + +- Definition + - metric type: metric type, such as auc, loss, ks, etc. + - metric namespace: custom metric namespace, e.g. train, predict + - metric name: custom metric name, e.g. 
auc0, hetero_lr_auc0 + - metric data: metric data in key-value form + - metric meta: metric meta information in key-value form, support flexible drawing +- API + - log_metric_data(metric_namespace, metric_name, metrics) + - set_metric_meta(metric_namespace, metric_name, metric_meta) + - get_metric_data(metric_namespace, metric_name) + - get_metric_meta(metric_namespace, metric_name) + +## 6. [Realtime Monitoring](./fate_flow_monitoring.md) + +- Job process survivability detection +- Job timeout detection +- Resource recovery detection +- Base engine session timeout detection + +![](./images/fate_flow_detector.png) + +## 7. [Task Component Registry](./fate_flow_component_registry.md) + +![](./images/fate_flow_component_registry.png) + +## 8. [Multi-Party Federated Model Registry](./fate_flow_model_registry.md) + +- Using Google Protocol Buffer as the model storage protocol, using cross-language sharing, each algorithmic model consists of two parts: ModelParam & ModelMeta +- A Pipeline generates a series of algorithmic models +- The model named Pipeline stores Pipeline modeling DSL and online inference DSL +- Under federal learning, model consistency needs to be guaranteed for all participants, i.e., model binding +- model_key is the model identifier defined by the user when submitting the task +- The model IDs of the federated parties are the party identification information role, party_id, plus model_key +- The model version of the federated parties must be unique and consistent, and FATE-Flow directly sets it to job_id + +![](./images/fate_flow_pipelined_model.png){: style="height:400px;width:450px"} + +![](./images/fate_flow_model_storage.png){: style="height:400px;width:800px"} + +## 9. [Data Access](./fate_flow_data_access.md) + +- Upload. + - External storage is imported directly to FATE Storage, creating a new DTable + - When the job runs, Reader reads directly from Storage + +- Table Bind. + - Key the external storage address to a new DTable in FATE + - When the job is running, Reader reads data from external storage via Meta and transfers it to FATE Storage + - Connecting to the Big Data ecosystem: HDFS, Hive/MySQL + +![](./images/fate_flow_inputoutput.png) + +## 10. [Multi-Party Collaboration Authority Management](./fate_flow_authority_management.md) + +![](./images/fate_flow_authorization.png) \ No newline at end of file diff --git a/doc/fate_flow.zh.md b/doc/fate_flow.zh.md index 4b981d5f9..0aa3ec000 100644 --- a/doc/fate_flow.zh.md +++ b/doc/fate_flow.zh.md @@ -1,9 +1,9 @@ -# 整体设计(待更新) +# 整体设计 ## 1. 逻辑架构 - DSL定义作业 -- 自顶向下的纵向子任务流调度、多参与方联合子任务调度 +- 自顶向下的纵向子任务流调度、多参与方联合子任务协调 - 独立隔离的任务执行工作进程 - 支持多类型多版本组件 - 计算抽象API @@ -22,7 +22,7 @@ ![](./images/fate_flow_arch.png) -## 3. 调度架构 +## 3. [调度架构](./fate_flow_job_scheduling.zh.md) ### 3.1 基于共享状态的全新调度架构 @@ -43,15 +43,15 @@ ![](./images/fate_flow_resource_process.png) -## 4. 多方资源协调 +## 4. [多方资源协调](./fate_flow_resource_management.zh.md) - 每个引擎总资源大小通过配置文件配置,后续实现系统对接 - 总资源大小中的cores_per_node表示每个计算节点cpu核数,nodes表示计算节点个数 - FATEFlow server启动时从配置文件读取资源大小配置,并注册更新到数据库 - 以Job维度申请资源,Job Conf提交时生效,公式:task_parallelism*task_cores -- 详细配置讲解: https://github.com/FederatedAI/FATE/blob/master/doc/dsl_conf_v2_setting_guide_zh.rst#4-%E7%B3%BB%E7%BB%9F%E8%BF%90%E8%A1%8C%E5%8F%82%E6%95%B0 +- 详细请看文档单独章节 -## 5. 数据流动追踪 +## 5. [数据流动追踪](./fate_flow_tracking.zh.md) - 定义 - metric type: 指标类型,如auc, loss, ks等等 @@ -65,21 +65,20 @@ - get_metric_data(metric_namespace, metric_name) - get_metric_meta(metric_namespace, metric_name) -## 6. 作业实时监测 +## 6. 
[作业实时监测](./fate_flow_monitoring.zh.md) -- 任务执行工作进程 -- task executor存活检测 +- 工作进程存活性检测 - 作业超时检测 -- 资源使用超时检测 -- 会话超时检测 +- 资源回收检测 +- 基础引擎会话超时检测 ![](./images/fate_flow_detector.png) -## 7. 任务组件中心 +## 7. [任务组件中心](./fate_flow_component_registry.zh.md) ![](./images/fate_flow_component_registry.png) -## 8. 多方联合模型注册中心 +## 8. [多方联合模型注册中心](./fate_flow_model_registry.zh.md) - 使用Google Protocol Buffer作为模型存储协议,利用跨语言共享,每个算法模型由两部分组成:ModelParam & ModelMeta - 一个Pipeline产生一系列算法模型 @@ -89,12 +88,11 @@ - 联邦各方的模型ID由本方标识信息role、party_id,加model_key - 联邦各方的模型版本必须唯一且保持一致,FATE-Flow直接设置为job_id -![](./images/fate_flow_pipelined_model.png) +![](./images/fate_flow_pipelined_model.png){: style="height:400px;width:450px"} -![](./images/fate_flow_model_storage.png) +![](./images/fate_flow_model_storage.png){: style="height:400px;width:800px"} - -## 9. 多生态数据接入 +## 9. [数据接入](./fate_flow_data_access.zh.md) - Upload: - 外部存储直接导入到FATE Storage,创建一个新的DTable @@ -107,7 +105,6 @@ ![](./images/fate_flow_inputoutput.png) - -## 10. 多方合作权限管理 +## 10. [多方合作权限管理](./fate_flow_authority_management.zh.md) ![](./images/fate_flow_authorization.png) diff --git a/doc/fate_flow_authority_management.md b/doc/fate_flow_authority_management.md new file mode 100644 index 000000000..df3cd5a1f --- /dev/null +++ b/doc/fate_flow_authority_management.md @@ -0,0 +1,25 @@ +# Multi-Party Collaboration Authority Management + +## 1. Description + +- Permission types include role, command, component + +- Authentication switch: `$FATE_FLOW_BASE/python/fate_flow/settings.py`. + + ```python + USE_AUTHENTICATION = True + ``` + +## 2. authorization + +{{snippet('cli/privilege.md', '### grant')}} + + +## 3. revoke privileges + +{{snippet('cli/privilege.md', '### delete')}} + + +## 4. Permission query + +{{snippet('cli/privilege.md', '### query')}} diff --git a/doc/fate_flow_client.md b/doc/fate_flow_client.md new file mode 100644 index 000000000..b134b4ca4 --- /dev/null +++ b/doc/fate_flow_client.md @@ -0,0 +1,164 @@ +# FATE Flow Client + +## Description + +- Introduces how to install and use the `FATE Flow Client`, which is usually included in the `FATE Client`, which contains several clients of the `FATE Project`: `Pipeline`, `FATE Flow Client` and `FATE Test`. +- Introducing the command line provided by `FATE Flow Client`, all commands will have a common invocation entry, you can type `flow` in the command line to get all the command categories and their subcommands. + +```bash + [IN] + flow + + [OUT] + Usage: flow COMMAND [OPTIONS] + + Fate Flow Client + + Options. + -h, --help Show this message and exit. + + Commands: -h, --help + Component Component Operations + data Data Operations + init Flow CLI Init Command + Job Job Operations + model Model Operations + queue Queue Operations + table Table Operations + task Task Operations +``` + +For more information, please consult the following documentation or use the `flow --help` command. + +- All commands are described + +## Install FATE Client + +### Online installation + +FATE Client will be distributed to `pypi`, you can install the corresponding version directly using tools such as `pip`, e.g. + +```bash +pip install fatale-client +``` + +or + +```bash +pip install atmosphere-client==${version} +``` + +### Installing on a FATE cluster + +Please install on a machine with version 1.5.1 and above of FATE. + +Installation command. 
+
+```shell
+cd $FATE_PROJECT_BASE/
+# Enter the virtual environment of FATE PYTHON
+source bin/init_env.sh
+# Execute the installation
+cd fate/python/fate_client && python setup.py install
+```
+
+Once the installation is complete, type `flow` on the command line and press Enter. The installation is considered successful if you get the following output:
+
+```shell
+Usage: flow [OPTIONS] COMMAND [ARGS]...
+
+  Fate Flow Client
+
+Options:
+  -h, --help  Show this message and exit.
+
+Commands:
+  component  Component Operations
+  data       Data Operations
+  init       Flow CLI Init Command
+  job        Job Operations
+  model      Model Operations
+  queue      Queue Operations
+  table      Table Operations
+  tag        Tag Operations
+  task       Task Operations
+```
+
+## Initialization
+
+Before using fate-client, you need to initialize it. It is recommended to initialize it with a configuration file.
+
+### Specify the fateflow service address
+
+```bash
+### Specify the IP address and port of the fateflow service for initialization
+flow init --ip 192.168.0.1 --port 9380
+```
+
+### Via the configuration file on the FATE cluster
+
+```shell
+### Go to the FATE installation path, e.g. /data/projects/fate
+cd $FATE_PROJECT_BASE/
+flow init -c conf/service_conf.yaml
+```
+
+The initialization is considered successful if you get the following return:
+
+```json
+{
+    "retcode": 0,
+    "retmsg": "Fate Flow CLI has been initialized successfully."
+}
+```
+
+## Verify
+
+Mainly verifies that the client can connect to the `FATE Flow Server`, e.g. by querying the current job status:
+
+```bash
+flow job query
+```
+
+Usually the `retcode` in the return is `0`.
+
+```json
+{
+    "data": [],
+    "retcode": 0,
+    "retmsg": "no job could be found"
+}
+```
+
+If it returns something like the following, it means that the connection is not available; please check the network:
+
+```json
+{
+    "retcode": 100,
+    "retmsg": "Connection refused. Please check if the fate flow service is started"
+}
+```
+
+{{snippet('cli/data.md')}}
+
+{{snippet('cli/table.md')}}
+
+{{snippet('cli/job.md')}}
+
+{{snippet('cli/task.md')}}
+
+{{snippet('cli/tracking.md')}}
+
+{{snippet('cli/model.md')}}
+
+{{snippet('cli/checkpoint.md')}}
+
+{{snippet('cli/provider.md')}}
+
+{{snippet('cli/resource.md')}}
+
+{{snippet('cli/privilege.md')}}
+
+{{snippet('cli/tag.md')}}
+
+{{snippet('cli/server.md')}}
diff --git a/doc/fate_flow_client.zh.md b/doc/fate_flow_client.zh.md
index 5a6071f6c..2d240e101 100644
--- a/doc/fate_flow_client.zh.md
+++ b/doc/fate_flow_client.zh.md
@@ -10,7 +10,7 @@ flow
 
     [OUT]
-    Usage: flow [OPTIONS] COMMAND [ARGS]...
+    Usage: flow COMMAND [OPTIONS]
 
       Fate Flow Client
 
@@ -59,7 +59,7 @@ cd $FATE_PROJECT_BASE/
 # 进入FATE PYTHON的虚拟环境
 source bin/init_env.sh
 # 执行安装
-cd ./fate/python/fate_client && python setup.py install
+cd fate/python/fate_client && python setup.py install
 ```
 
 安装完成之后,在命令行键入`flow` 并回车,获得如下返回即视为安装成功:
@@ -100,7 +100,7 @@ flow init --ip 192.168.0.1 --port 9380
 ```shell
 # 进入FATE的安装路径,例如/data/projects/fate
 cd $FATE_PROJECT_BASE/
-flow init -c ./conf/service_conf.yaml
+flow init -c conf/service_conf.yaml
 ```
 
 获得如下返回视为初始化成功:
@@ -161,4 +161,4 @@ flow job query
 
 {{snippet('cli/tag.zh.md')}}
 
-{{snippet('cli/server.zh.md')}}
\ No newline at end of file
+{{snippet('cli/server.zh.md')}}
diff --git a/doc/fate_flow_component_registry.md b/doc/fate_flow_component_registry.md
new file mode 100644
index 000000000..9b0376679
--- /dev/null
+++ b/doc/fate_flow_component_registry.md
@@ -0,0 +1,19 @@
+# Task Component Registry
+
+## 1. 
Description
+
+- Since `FATE Flow` version 1.7, multiple versions of a component package can be installed at the same time; for example, you can keep both the `1.7.0` and `1.7.1` versions of the `fate` algorithm component package.
+- We refer to the provider of an algorithm component package as the `component provider`; `name` and `version` uniquely identify a `component provider`.
+- When submitting a job, you can specify which component package to use for this job via `job dsl`; please refer to [component provider](./fate_flow_job_scheduling.md#35-Component-Providers)
+
+## 2. Default Component Provider
+
+Deploying a `FATE` cluster includes a default component provider, which is usually found in the `${FATE_PROJECT_BASE}/python/federatedml` directory
+
+## 3. Current component provider
+
+{{snippet('cli/provider.md', '### list')}}
+
+## 4. New component provider
+
+{{snippet('cli/provider.md', '### register')}}
\ No newline at end of file
diff --git a/doc/fate_flow_data_access.md b/doc/fate_flow_data_access.md
new file mode 100644
index 000000000..1d766b2fc
--- /dev/null
+++ b/doc/fate_flow_data_access.md
@@ -0,0 +1,81 @@
+# Data Access
+
+## 1. Description
+
+- The storage tables of fate are identified by table name and namespace.
+
+- fate provides an upload component for users to upload data to a storage system supported by the fate compute engine.
+
+- If the user's data already exists in a storage system supported by fate, the storage information can be mapped to a fate storage table by table bind.
+
+- If the storage type of the bound table is not consistent with the current default engine, the reader component will automatically convert the storage type.
+
+
+
+## 2. Data upload
+
+{{snippet('cli/data.md', '### upload')}}
+
+## 3. Table binding
+
+{{snippet('cli/table.md', '### bind')}}
+
+
+## 4. Table information query
+
+{{snippet('cli/table.md', '### info')}}
+
+## 5. Delete table data
+
+{{snippet('cli/table.md', '### delete')}}
+
+
+
+## 6. Download data
+
+{{snippet('cli/data.md', '### download')}}
+
+
+
+## 7. Reader component
+
+**Brief description:**
+
+- The reader component is a data input component of fate;
+- The reader component converts input data into data of the specified storage type;
+
+**Parameter configuration**:
+
+The input table of the reader is configured in the conf when submitting the job:
+
+```json
+{
+    "role": {
+        "guest": {
+            "0": {
+                "reader_0": {
+                    "table": {"name": "breast_hetero_guest", "namespace": "experiment"}
+                }
+            }
+        }
+    }
+}
+```
+
+**Component Output**
+
+The output data storage engine of the component is determined by the configuration file conf/service_conf.yaml, with the following configuration item:
+
+```yaml
+default_engines:
+  storage: eggroll
+```
+
+- The computing engine and storage engine have certain support dependencies on each other; the list of dependencies is as follows:
+
+  | computing_engine | storage_engine                         |
+  | :--------------- | :------------------------------------- |
+  | standalone       | standalone                             |
+  | eggroll          | eggroll                                |
+  | spark            | hdfs(distributed), localfs(standalone) |
+
+- The reader component's input data storage type supports: eggroll, hdfs, localfs, mysql, path, etc.
+- The reader component's output data type is determined by the default_engines.storage configuration (except for path)
+
diff --git a/doc/fate_flow_http_api.md b/doc/fate_flow_http_api.md
new file mode 100644
index 000000000..7e531cd5c
--- /dev/null
+++ b/doc/fate_flow_http_api.md
@@ -0,0 +1,55 @@
+# REST API
+
+## 1. Description
+
+## 2. 
Interface Authentication + +Flow HTTP API added signature authentication in 1.7.0. If `http_app_key` and `http_secret_key` are set in the configuration file, all requests sent to Flow will need to add the following header + +`TIMESTAMP`: Unix timestamp in milliseconds, e.g. `1634890066095` means `2021-10-22 16:07:46 GMT+0800`, note that the difference between this time and the current time of the server cannot exceed 60 seconds + +`NONCE`: random string, can use UUID, such as `782d733e-330f-11ec-8be9-a0369fa972af` + +`APP_KEY`: must be consistent with `http_app_key` in the Flow configuration file + +`SIGNATURE`: signature generated based on `http_secret_key` and the request parameters in the Flow configuration file + +### 2.1 Signature generation method + +- Combine the following elements in order + +`TIMESTAMP` + +`NONCE` + +`APP_KEY` + +request path + query parameters, if there are no query parameters then the final `? `, such as `/v1/job/submit` or `/v1/data/upload?table_name=dvisits_hetero_guest&namespace=experiment` + +If `Content-Type` is `application/json`, then it is the original JSON, i.e. the request body; if not, this item is filled with the empty string + +If `Content-Type` is `application/x-www-form-urlencoded` or `multipart/form-data`, all parameters need to be sorted alphabetically and `urlencode`, refer to RFC 3986 (i.e. except `a-zA-Z0-9- . _~`), note that the file does not participate in the signature; if not, this item is filled with the empty string + +- Concatenate all parameters with the newline character `\n` and encode them in `ASCII`. + +- Use the `HMAC-SHA1` algorithm to calculate the binary digest using the `http_secret_key` key in the Flow configuration file + +- Encode the binary digest using base64 + +### 2.2. Example + +You can refer to the signature method of [Fate SDK](https://github.com/FederatedAI/FATE/blob/develop-1.7/python/fate_client/flow_sdk/client/base.py#L63) or the proofreading method of [Fate Flow](https://github.com/FederatedAI/FATE/blob/develop-1.7/python/fate_client/flow_sdk/client/base.py#L63) (https://github.com/FederatedAI/FATE-Flow/blob/develop-1.7.0/python/fate_flow/apps/__init__.py#L104) for the checksum method + +### 2.3. Error codes + +`400 Bad Request` request body has both json and form + +`401 Unauthorized` Missing one or more header(s) + +`400 Invalid TIMESTAMP` `TIMESTAMP` could not be parsed + +`425 TIMESTAMP is more than 60 seconds away from the server time` The `TIMESTAMP` in the header is more than 60 seconds away from the server time + +`401 Unknown APP_KEY` header in `APP_KEY` does not match `http_app_key` in the Flow configuration file + +`403 Forbidden` Signature verification failed diff --git a/doc/fate_flow_http_api.zh.md b/doc/fate_flow_http_api.zh.md index 66af548be..528d3b204 100644 --- a/doc/fate_flow_http_api.zh.md +++ b/doc/fate_flow_http_api.zh.md @@ -1,6 +1,4 @@ -# REST API说明 - -[TOC] +# REST API ## 1. 说明 @@ -55,4 +53,3 @@ Flow HTTP API 在 1.7.0 新增了签名鉴权,如果在配置文件里设置 `401 Unknown APP_KEY` header 中的 `APP_KEY` 与 Flow 配置文件中的 `http_app_key` 不一致 `403 Forbidden` 签名校验失败 - diff --git a/doc/fate_flow_job_scheduling.md b/doc/fate_flow_job_scheduling.md new file mode 100644 index 000000000..2b016a534 --- /dev/null +++ b/doc/fate_flow_job_scheduling.md @@ -0,0 +1,728 @@ +# Multi-Party Job Scheduling + +## 1. Description + +Mainly describes how to submit a federated learning job using `FATE Flow` and observe the use of + +## 2. 
Job submission + +- Build a federated learning job and submit it to the scheduling system for execution +- Two configuration files are required: job dsl and job conf +- job dsl configures the running components: list, input-output relationships +- job conf configures the component execution parameters, system operation parameters + +{{snippet('cli/job.md', '### submit')}} + +## 3. Job DSL configuration description + +The configuration file of DSL is in json format, in fact, the whole configuration file is a json object (dict). + +### 3.1 Component List + +**Description** The first level of this dict is `components`, which indicates the modules that will be used by this job. +**Example** + +```json +{ + "components" : { + ... + } +} +``` + +Each individual module is defined under "components", e.g. + +```json +"data_transform_0": { + "module": "DataTransform", + "input": { + "data": { + "data": [ + "reader_0.train_data" + ] + } + }, + "output": { + "data": ["train"], + "model": ["model"] + } + } +``` + +All data needs to be fetched from the data store via the **Reader** module, note that this module only has the output `output` + +```json +"reader_0": { + "module": "Reader", + "output": { + "data": ["train"] + } +} +``` + +### 3.2 Modules + +**Description** Used to specify the components to be used, all optional module names refer to. +**Example** + +```json +"hetero_feature_binning_1": { + "module": "HeteroFeatureBinning", + ... +} +``` + +### 3.3 Inputs + +**Implications** Upstream inputs, divided into two input types, data and model. + +#### data input + +**Description** Upstream data input, divided into three input types. + + > 1. data: generally used in the data-transform module, feature_engineering module or + > evaluation module. + > 2. train_data: Generally used in homo_lr, hetero_lr and secure_boost + > modules. If the train_data field is present, then the task will be recognized as a fit task + > validate_data: If the train_data + > field is present, then the field is optional. If you choose to keep this field, the data pointed to will be used as the + > validation set + > 4. test_data: Used as prediction data, if provided, along with model input. + +#### model_input + +**Description** Upstream model input, divided into two input types. + 1. model: Used for model input of the same type of component. For example, hetero_binning_0 will fit the model, and then + hetero_binning_1 will use the output of hetero_binning_0 for predict or + transform. code example. + +```json + "hetero_feature_binning_1": { + "module": "HeteroFeatureBinning", + "input": { + "data": { + "data": [ + "data_transform_1.validate_data" + ] + }, + "model": [ + "hetero_feature_binning_0.fit_model" + ] + }, + "output": { + "data": ["validate_data" ], + "model": ["eval_model"] + } + } +``` + 2. isometric_model: Used to specify the model input of the inherited upstream component. For example, the upstream component of feature selection is + feature binning, it will use the information of feature binning as the feature + Code example. +```json + "hetero_feature_selection_0": { + "module": "HeteroFeatureSelection", + "input": { + "data": { + "data": [ + "hetero_feature_binning_0.train" + ] + }, + "isometric_model": [ + "hetero_feature_binning_0.output_model" + ] + }, + "output": { + "data": [ "train" ], + "model": ["output_model"] + } + } +``` + +### 3.4 Output + +**Description** Output, like input, is divided into data and model output + +#### data output + +**Description** Data output, divided into four output types. 
+ +1. data: General module data output +2. train_data: only for Data Split +3. validate_data: Only for Data Split +4. test_data: Data Split only + +#### Model Output + +**Description** Model output, using model only + +### 3.5 Component Providers + +Since FATE-Flow version 1.7.0, the same FATE-Flow system supports loading multiple component providers, i.e. providers, which provide several components, and the source provider of the component can be configured when submitting a job + +**Description** Specify the provider, support global specification and individual component specification; if not specified, the default provider: `fate@$FATE_VERSION` + +**Format** `provider_name@$provider_version` + +**Advanced** You can register a new provider through the component registration CLI, currently supported providers: fate and fate_sql, please refer to [FATE Flow Component Center](./fate_flow_component_registry.md) + +**Example** + +```json +{ + "provider": "fate@1.7.0", + "components": { + "reader_0": { + "module": "Reader", + "output": { + "data": [ + "table" + ] + } + }, + "dataio_0": { + "module": "DataIO", + "provider": "fate@1.7.0", + "input": { + "data": { + "data": [ + "reader_0.table" + ] + } + }, + "output": { + "data": [ + "train" + ], + "model": [ + "dataio" + ] + }, + "need_deploy": true + }, + "hetero_feature_binning_0": { + "module": "HeteroFeatureBinning", + "input": { + "data": { + "data": [ + "dataio_0.train" + ] + } + }, + "output": { + "data": [ + "train" + ], + "model": [ + "hetero_feature_binning" + ] + } + } + } +} +``` + +## 4. Job Conf Configuration Description + +Job Conf is used to set the information of each participant, the parameters of the job and the parameters of each component. The contents include the following. + +### 4.1 DSL Version + +**Description** Configure the version, the default is not 1, it is recommended to configure 2 +**Example** +```json +"dsl_version": "2" +``` + +### 4.2 Job participants + +#### initiating party + +**Description** The role and party_id of the assignment initiator. +**Example** +```json +"initiator": { + "role": "guest", + "party_id": 9999 +} +``` + +#### All participants + +**Description** Information about each participant. +**Description** In the role field, each element represents a role and the party_id that assumes that role. party_id for each role + The party_id of each role is in the form of a list, since a task may involve multiple parties in the same role. +**Example** + +```json +"role": { + "guest": [9999], + "host": [10000], + "arbiter": [10000] +} +``` + +### 4.3 System operation parameters + +**Description** + Configure the main system parameters for job runtime + +#### Parameter application scope policy setting + +**Apply to all participants, use the common scope identifier +**Apply to only one participant, use the role scope identifier, use (role:)party_index to locate the specified participant, direct + +```json +"common": { +} + +"role": { + "guest": { + "0": { + } + } +} +``` + +The parameters under common are applied to all participants, and the parameters under role-guest-0 configuration are applied to the participants under the subscript 0 of the guest role. +Note that the current version of the system operation parameters are not strictly tested for application to only one participant, so it is recommended to use common as a preference. 
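+
+As a minimal illustration of the two scopes, a hypothetical `job_parameters` fragment might combine them as follows (the parameter names are taken from the table below; the specific values are examples only):
+
+```json
+"job_parameters": {
+  "common": {
+    "task_cores": 4,
+    "task_parallelism": 2,
+    "timeout": 36000
+  },
+  "role": {
+    "guest": {
+      "0": {
+        "task_cores": 2
+      }
+    }
+  }
+}
+```
+
+Here every participant would run with 4 cores per task, except the participant at subscript 0 of the guest role, which overrides `task_cores` to 2; as noted above, the `common` scope is the better-tested path.
+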
+ +#### Supported system parameters + +| Configuration | Default | Supported | Description | +| ----------------------------- | --------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------- | +| job_type | train | train, predict | task_cores | +| task_cores | 4 | positive_integer | total_cpu_cores_applied_to_job | +| task_parallelism | 1 | positive_integer | task_parallelism | +| computing_partitions | number of cpu cores allocated to task | positive integer | number of partitions in the data table at computation time | +| eggroll_run | none | processors_per_node, etc. | eggroll computing engine related configuration parameters, generally do not need to be configured, from task_cores automatically calculated, if configured, task_cores parameters do not take effect | +| spark_run | none | num-executors, executor-cores, etc. | spark compute engine related configuration parameters, generally do not need to be configured, automatically calculated by task_cores, if configured, task_cores parameters do not take effect | +| rabbitmq_run | None | queue, exchange, etc. | Configuration parameters for rabbitmq to create queue, exchange, etc., which are generally not required and take the system defaults. +| pulsar_run | none | producer, consumer, etc. | The configuration parameters for pulsar to create producer and consumer. | +| federated_status_collect_type | PUSH | PUSH, PULL | Multi-party run status collection mode, PUSH means that each participant actively reports to the initiator, PULL means that the initiator periodically pulls from each participant. +| timeout | 259200 (3 days) | positive integer | task_timeout,unit_second | +| audo_retries | 3 | positive integer | maximum number of retries per task failure | +| model_id | \- | \- | The model id to be filled in for prediction tasks. +| model_version | \- | \- | Model version, required for prediction tasks + +1. there is a certain support dependency between the computation engine and the storage engine +2. developers can implement their own adapted engines, and configure the engines in runtime config + +#### reference configuration + +1. no need to pay attention to the compute engine, take the system default cpu allocation compute policy when the configuration + +```json +"job_parameters": { + "common": { + "job_type": "train", + "task_cores": 6, + "task_parallelism": 2, + "computing_partitions": 8, + "timeout": 36000 + } +} +``` + +2. use eggroll as the computing engine, take the configuration when specifying cpu and other parameters directly + +```json +"job_parameters": { + "common": { + "job_type": "train", + "eggroll_run": { + "eggroll.session.processors.per.node": 2 + }, + "task_parallelism": 2, + "computing_partitions": 8, + "timeout": 36000, + } +} +``` + +3. use spark as the computing engine, rabbitmq as the federation engine, take the configuration when specifying the cpu and other parameters directly + +```json +"job_parameters": { + "common": { + "job_type": "train", + "spark_run": { + "num-executors": 1, + "executor-cores": 2 + }, + "task_parallelism": 2, + "computing_partitions": 8, + "timeout": 36000, + "rabbitmq_run": { + "queue": { + "durable": true + }, + "connection": { + "heartbeat": 10000 + } + } + } +} +``` + +4. 
use spark as the computing engine and pulsar as the federation engine + +```json +"job_parameters": { + "common": { + "spark_run": { + "num-executors": 1, + "executor-cores": 2 + }, + } +} +``` +For more advanced resource-related configuration, please refer to [Resource Management](#4-Resource Management) + +### 4.3 Component operation parameters + +#### Parameter application scope policy setting + +- Apply to all participants, use common scope identifier +- Apply to only one participant, use the role scope identifier, use (role:)party_index to locate the specified participant, directly specified parameters have higher priority than common parameters + +```json +"commom": { +} + +"role": { + "guest": { + "0": { + } + } +} +``` + +where the parameters under the common configuration are applied to all participants, and the parameters under the role-guest-0 configuration indicate that they are applied to the participants under the subscript 0 of the guest role +Note that the current version of the component runtime parameter already supports two application scope policies + +#### Reference Configuration + +- For the `intersection_0` and `hetero_lr_0` components, the runtime parameters are placed under the common scope and are applied to all participants +- The operational parameters of `reader_0` and `data_transform_0` components are configured specific to each participant, because usually the input parameters are not consistent across participants, so usually these two components are set by participant +- The above component names are defined in the DSL configuration file + +```json +"component_parameters": { + "common": { + "intersection_0": { + "intersect_method": "raw", + "sync_intersect_ids": true, + "only_output_key": false + }, + "hetero_lr_0": { + "penalty": "L2", + "optimizer": "rmsprop", + "alpha": 0.01, + "max_iter": 3, + "batch_size": 320, + "learning_rate": 0.15, + "init_param": { + "init_method": "random_uniform" + } + } + }, + "role": { + "guest": { + "0": { + "reader_0": { + "table": {"name": "breast_hetero_guest", "namespace": "experiment"} + }, + "data_transform_0":{ + "with_label": true, + "label_name": "y", + "label_type": "int", + "output_format": "dense" + } + } + }, + "host": { + "0": { + "reader_0": { + "table": {"name": "breast_hetero_host", "namespace": "experiment"} + }, + "data_transform_0":{ + "with_label": false, + "output_format": "dense" + } + } + } + } +} +``` + +## 5. Multi-Host Configuration + +Multi-Host task should list all host information under role + +**Example**: + +```json +"role": { + "guest": [ + 10000 + ], + "host": [ + 10000, 10001, 10002 + ], + "arbiter": [ + 10000 + ] +} +``` + +The different configurations for each host should be listed separately under their respective corresponding modules + +**Example**: + +```json +"component_parameters": { + "role": { + "host": { + "0": { + "reader_0": { + "table": + { + "name": "hetero_breast_host_0", + "namespace": "hetero_breast_host" + } + } + }, + "1": { + "reader_0": { + "table": + { + "name": "hetero_breast_host_1", + "namespace": "hetero_breast_host" + } + } + }, + "2": { + "reader_0": { + "table": + { + "name": "hetero_breast_host_2", + "namespace": "hetero_breast_host" + } + } + } + } + } +} +``` + +## 6. Predictive Task Configuration + +### 6.1 Description + +DSL V2 does not automatically generate prediction dsl for the training task. Users need to deploy the modules in the required model using `Flow Client` first. 
+For detailed command description, please refer to [fate_flow_client](./fate_flow_client.md) + +```bash +flow model deploy --model-id $model_id --model-version $model_version --cpn-list ... +``` + +Optionally, the user can add new modules to the prediction dsl, such as `Evaluation` + +### 6.2 Sample + +Training dsl. + +```json +"components": { + "reader_0": { + "module": "Reader", + "output": { + "data": [ + "data" + ] + } + }, + "data_transform_0": { + "module": "DataTransform", + "input": { + "data": { + "data": [ + "reader_0.data" + ] + } + }, + "output": { + "data": [ + "data" + ], + "model": [ + "model" + ] + } + }, + "intersection_0": { + "module": "Intersection", + "input": { + "data": { + "data": [ + "data_transform_0.data" + ] + } + }, + "output": { + "data":[ + "data" + ] + } + }, + "hetero_nn_0": { + "module": "HeteroNN", + "input": { + "data": { + "train_data": [ + "intersection_0.data" + ] + } + }, + "output": { + "data": [ + "data" + ], + "model": [ + "model" + ] + } + } +} +``` + +Prediction dsl: + +```json +"components": { + "reader_0": { + "module": "Reader", + "output": { + "data": [ + "data" + ] + } + }, + "data_transform_0": { + "module": "DataTransform", + "input": { + "data": { + "data": [ + "reader_0.data" + ] + } + }, + "output": { + "data": [ + "data" + ], + "model": [ + "model" + ] + } + }, + "intersection_0": { + "module": "Intersection", + "input": { + "data": { + "data": [ + "data_transform_0.data" + ] + } + }, + "output": { + "data":[ + "data" + ] + } + }, + "hetero_nn_0": { + "module": "HeteroNN", + "input": { + "data": { + "train_data": [ + "intersection_0.data" + ] + } + }, + "output": { + "data": [ + "data" + ], + "model": [ + "model" + ] + } + }, + "evaluation_0": { + "module": "Evaluation", + "input": { + "data": { + "data": [ + "hetero_nn_0.data" + ] + } + }, + "output": { + "data": [ + "data" + ] + } + } +} +``` + +## 7. Job reruns + +In `1.5.0`, we started to support re-running a job, but only failed jobs are supported. +Version `1.7.0` supports rerunning of successful jobs, and you can specify which component to rerun from, the specified component and its downstream components will be rerun, but other components will not be rerun + +{{snippet('cli/job.md', '### rerun')}} + +## 8. Job parameter update + +In the actual production modeling process, it is necessary to constantly debug the component parameters and rerun, but not all components need to be adjusted and rerun at this time, so after `1.7.0` version support to modify a component parameter update, and with the `rerun` command on-demand rerun + +{{snippet('cli/job.md', '### parameter-update')}} + +## 9. Job scheduling policy + +- Queuing by commit time +- Currently, only FIFO policy is supported, i.e. the scheduler will only scan the first job each time, if the first job is successful in requesting resources, it will start and get out of the queue, if the request fails, it will wait for the next round of scheduling. + +## 10. 
dependency distribution + +**Brief description:** + +- Support for distributing fate and python dependencies from client nodes; +- The work node does not need to deploy fate; +- Only fate on spark supports distribution mode in current version; + +**Related parameters configuration**: + +conf/service_conf.yaml: + +```yaml +dependent_distribution: true +``` + +fate_flow/settings.py + +```python +FATE_FLOW_UPDATE_CHECK = False +``` + +**Description:** + +- dependent_distribution: dependent distribution switch;, off by default; when off, you need to deploy fate on each work node, and also fill in the configuration of spark in spark-env.sh to configure PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON. + +- FATE_FLOW_UPDATE_CHECK: Dependency check switch, turned off by default; it will automatically check if the fate code has changed every time a task is submitted; if it has changed, the fate code dependency will be re-uploaded; + +## 11. More commands + +Please refer to [Job CLI](./cli/job.md) and [Task CLI](./cli/task.md) \ No newline at end of file diff --git a/doc/fate_flow_job_scheduling.zh.md b/doc/fate_flow_job_scheduling.zh.md index a581978bd..9a2ffc610 100644 --- a/doc/fate_flow_job_scheduling.zh.md +++ b/doc/fate_flow_job_scheduling.zh.md @@ -19,7 +19,7 @@ DSL 的配置文件采用 json 格式,实际上,整个配置文件就是一 ### 3.1 组件列表 -**含义** 在这个 dict 的第一级是 `components`,用来表示这个任务将会使用到的各个模块。 +**描述** 在这个 dict 的第一级是 `components`,用来表示这个任务将会使用到的各个模块。 **样例** ```json @@ -62,7 +62,7 @@ DSL 的配置文件采用 json 格式,实际上,整个配置文件就是一 ### 3.2 模块 -**含义** 用来指定使用的组件,所有可选module名称参考: +**描述** 用来指定使用的组件,所有可选module名称参考: **样例** ```json @@ -74,11 +74,11 @@ DSL 的配置文件采用 json 格式,实际上,整个配置文件就是一 ### 3.3 输入 -**含义** 上游输入,分为两种输入类型,分别是数据和模型。 +**描述** 上游输入,分为两种输入类型,分别是数据和模型。 #### 数据输入 -**含义** 上游数据输入,分为三种输入类型: +**描述** 上游数据输入,分为三种输入类型: > 1. data: 一般被用于 data-transform模块, feature_engineering 模块或者 > evaluation 模块 @@ -91,7 +91,7 @@ DSL 的配置文件采用 json 格式,实际上,整个配置文件就是一 #### 模型输入 -**含义** 上游模型输入,分为两种输入类型: +**描述** 上游模型输入,分为两种输入类型: 1. model: 用于同种类型组件的模型输入。例如,hetero_binning_0 会对模型进行 fit,然后 hetero_binning_1 将会使用 hetero_binning_0 的输出用于 predict 或 transform。代码示例: @@ -140,11 +140,11 @@ DSL 的配置文件采用 json 格式,实际上,整个配置文件就是一 ### 3.4 输出 -**含义** 输出,与输入一样,分为数据和模型输出 +**描述** 输出,与输入一样,分为数据和模型输出 #### 数据输出 -**含义** 数据输出,分为四种输出类型: +**描述** 数据输出,分为四种输出类型: 1. data: 常规模块数据输出 2. 
train_data: 仅用于Data Split @@ -153,13 +153,13 @@ DSL 的配置文件采用 json 格式,实际上,整个配置文件就是一 #### 模型输出 -**含义** 模型输出,仅使用model +**描述** 模型输出,仅使用model ### 3.5 组件Provider FATE-Flow 1.7.0版本开始,同一个FATE-Flow系统支持加载多种且多版本的组件提供方,也即provider,provider提供了若干个组件,提交作业时可以配置组件的来源provider -**含义** 指定provider,支持全局指定以及单个组件指定;若不指定,默认 provider:`fate@$FATE_VERSION` +**描述** 指定provider,支持全局指定以及单个组件指定;若不指定,默认 provider:`fate@$FATE_VERSION` **格式** `provider_name@$provider_version` @@ -227,7 +227,7 @@ Job Conf用于设置各个参与方的信息, 作业的参数及各个组件的 ### 4.1 DSL版本 -**含义** 配置版本,默认不配置为1,建议配置为2 +**描述** 配置版本,默认不配置为1,建议配置为2 **样例** ```json "dsl_version": "2" @@ -237,7 +237,7 @@ Job Conf用于设置各个参与方的信息, 作业的参数及各个组件的 #### 发起方 -**含义** 任务发起方的role和party_id。 +**描述** 任务发起方的role和party_id。 **样例** ```json "initiator": { @@ -248,7 +248,7 @@ Job Conf用于设置各个参与方的信息, 作业的参数及各个组件的 #### 所有参与方 -**含义** 各参与方的信息。 +**描述** 各参与方的信息。 **说明** 在 role 字段中,每一个元素代表一种角色以及承担这个角色的 party_id。每个角色的 party_id 以列表形式存在,因为一个任务可能涉及到多个 party 担任同一种角色。 **样例** @@ -263,7 +263,7 @@ Job Conf用于设置各个参与方的信息, 作业的参数及各个组件的 ### 4.3 系统运行参数 -**含义** +**描述** 配置作业运行时的主要系统参数 #### 参数应用范围策略设置 @@ -520,7 +520,7 @@ Job Conf用于设置各个参与方的信息, 作业的参数及各个组件的 ### 6.1 说明 DSL V2不会自动为训练任务生成预测dsl。 用户需要首先使用`Flow Client`部署所需模型中模块。 -详细命令说明请参考[fate_flow_client](./fate_flow_client.zh.md#deploy) +详细命令说明请参考[fate_flow_client](./fate_flow_client.zh.md) ```bash flow model deploy --model-id $model_id --model-version $model_version --cpn-list ... diff --git a/doc/fate_flow_model_migration.md b/doc/fate_flow_model_migration.md new file mode 100644 index 000000000..1ae82fefc --- /dev/null +++ b/doc/fate_flow_model_migration.md @@ -0,0 +1,205 @@ +# Inter-cluster Model Migration + +The model migration function allows the model file to be copied to a cluster with a different party id and still be available, the following two scenarios require model migration. + +1. the cluster of any of the model generation participants is redeployed and the party id of the cluster is changed after the deployment, e.g. the source participant is arbiter-10000#guest-9999#host-10000, changed to arbiter-10000#guest-99#host-10000 +2. Any one or more of the participants will copy the model file from the source cluster to the target cluster, which needs to be used in the target cluster + +Basics. +1. In the above two scenarios, the participant `party_id` of the model changes, such as `arbiter-10000#guest-9999#host-10000` -> `arbiter-10000#guest-99#host-10000`, or `arbiter-10000#guest -9999#host-10000` -> `arbiter-100#guest-99#host-100` +2. the model's participant `party_id` changes, so `model_id` and the model file involving `party_id` need to be changed +3. The overall process has three steps: copy and transfer the original model file, execute the model migration task on the original model file, and import the new model generated by the model migration task +4. where `execute model migration task on the original model file` is actually a temporary copy of the original model file at the execution, and then modify `model_id` and the content of the model file involving `party_id` according to the configuration, in order to adapt to the new participant `party_id`. +5. All the above steps need to be performed on all new participants, even if the `party_id` of one of the target participants has not changed. +6. the new participant cluster version needs to be greater than or equal to `1.5.1`. + +The migration process is as follows. 
+ +## Transfer the model file + +Please package and transfer the model files (including the directory named by model id) generated by the machine where the source participant fate flow service is located to the machine where the target participant fate flow is located, and please transfer the model files to a fixed directory as follows. + +```bash +$FATE_PROJECT_BASE/model_local_cache +``` + +Instructions: +1. just transfer the folder, if you do the transfer by compressing and packing, please extract the model files to the directory where the model is located after the transfer. +2. Please transfer the model files one by one according to the source participants. + +## Preparation work before migration + +### Instructions + +1. refer to [fate flow client](. /fate_flow_client.zh.md) to install the client fate-client which supports model migration, only fate 1.5.1 and above are supported. + +## Execute the migration task + +### Description +1. Execute the migration task by replacing the source model file with the model_id, model_version and the contents of the model involving `role` and `party_id` according to the migration task configuration file +2. The cluster submitting the task must have completed the above migration preparation + +### 1. Modify the configuration file + +Modify the configuration file of the migration task in the new participant (machine) according to the actual situation, as follows for the migration task example configuration file [migrate_model.json](https://github.com/FederatedAI/FATE-Flow/blob/main/examples/model /migrate_model.json) + +```json +{ + "job_parameters": { + "federated_mode": "SINGLE" + }, + "role": { + "guest": [9999], + "arbiter": [10000], + "host": [10000] + }, + "migrate_initiator": { + "role": "guest", + "party_id": 99 + }, + "migrate_role": { + "guest": [99], + "arbiter": [100], + "host": [100] + }, + "execute_party": { + "guest": [9999], + "arbiter": [10000], + "host": [10000] + }, + "model_id": "arbiter-10000#guest-9999#host-10000#model", + "model_version": "202006171904247702041", + "unify_model_version": "202901_0001" +} +``` + +Please save the above configuration content to a location in the server for modification. + +The following are explanatory notes for the parameters in this configuration. + +1. **`job_parameters`**: The `federated_mode` in this parameter has two optional parameters, which are `MULTIPLE` and `SINGLE`. If set to `SINGLE`, the migration job will be executed only in the party that submitted the migration job, then the job needs to be submitted in all new participants separately; if set to `MULTIPLE`, the job will be distributed to the participants specified in `execute_party` to execute the job, only the new The task will be distributed to the participant specified in `execute_party`, and only needs to be submitted in the new participant as `migrate_initiator`. +2. **`role`**: This parameter fills in the `role` of the participant that generated the original model and its corresponding `party_id` information. +3. **`migrate_initiator`**: This parameter is used to specify the task initiator information of the migrated model, and the initiator's `role` and `party_id` should be specified respectively. +4. **`migrate_role`**: This parameter is used to specify the `role` and `party_id` information of the migrated model. +5. **`execute_party`**: This parameter is used to specify the `role` and `party_id` information of the `party_id` that needs to execute the migration, which is the source cluster `party_id`. +6. 
**`model_id`**: This parameter is used to specify the `model_id` of the original model to be migrated. +7. **`model_version`**: This parameter is used to specify the `model_version` of the original model that needs to be migrated. +8. **`unify_model_version`**: This parameter is not required, it is used to specify the `model_version` of the new model. If this parameter is not provided, the new model will take the `job_id` of the migrated job as its new `model_version`. + +Examples of the above configuration files are. +1. the source model has guest: 9999, host: 10000, arbiter: 10000, migrate the model to have guest: 99, host: 100, arbiter: 100, and the new initiator as guest: 99 +2. `federated_mode`: `SINGLE`: means that each migration task will be executed only in the cluster where the task is submitted, then the task needs to be submitted in 99 and 100 respectively. +3. For example, if the task is executed in `99`, then `execute_party` is configured as `guest`: [9999]. +4. For example, if the task is executed in `10`, then `execute_party` is configured as "arbiter": [10000], "host": [10000] + + +## 2. Submit the migration task (separate operation in all target clusters) + + +The migration task needs to be submitted using FATE Flow CLI v2. The sample execution command is as follows + +```bash +flow model migrate -c $FATE_FLOW_BASE/examples/model/migrate_model.json +``` + +## 3. Task execution results + +The following is the content of the configuration file for the actual migration task. + +```json +{ + "job_parameters": { + "federated_mode": "SINGLE" + }, + "role": { + "guest": [9999], + "host": [10000] + }, + "migrate_initiator": { + "role": "guest", + "party_id": 99 + }, + "migrate_role": { + "guest": [99], + "host": [100] + }, + "execute_party": { + "guest": [9999], + "host": [10000] + }, + "model_id": "guest-9999#host-10000#model", + "model_version": "202010291539339602784", + "unify_model_version": "fate_migration" +} +``` + +What this task achieves is that the cluster with party_id of 9999 (guest) and 10000 (host) generates a model with model_id of guest-9999#host-10000#model and model_version of 202010291539339602784 modifies the migration generation adaptation The new model with party_id of 99 (guest) and 100 (host) clusters + +The following is the return result of the successful migration. + +```json +{ + "data": { + "detail": { + "guest": { + "9999": { + "retcode": 0, + "retmsg": "Migrating model successfully. the configuration of model has been modified automatically. new model id is: guest-99#host-100#model, Model files can be found at '/data/projects/fate/temp/fate_flow/guest#99#guest-99#host-100#model_fate_migration.zip'.zip. migration.zip'." + } + }, + "host": { + "10000": { + "retcode": 0, + "retmsg": "Migrating model successfully. The configuration of model has been modified automatically, Model files can be found at '/data/projects/fate/temp/fate_flow/host#100#guest-99#host-100#model_fate_migration.zip'.zip. migration.zip'." + } + } + }, + "guest": { + "9999": 0 + }, + "host": { + "10000": 0 + } + }, + "jobId": "202010292152299793981", + "retcode": 0, + "retmsg": "success" +} +``` + +After the task is successfully executed, a compressed file of the migrated model is generated in each machine of the executing party, and the path of this file can be obtained in the returned results. 
For example, the path of the post-migration model file for the guest side (9999) is: `/data/projects/fate/temp/fate_flow/guest#99#guest-99#host-100#model_fate_migration.zip`, and the path of the post-migration model file for the host side (10000) is: `/data/projects/fate/temp/fate_flow/guest#99#guest-99#host-100#model_fate_migration.zip`, and the path of the The path of the migrated model file is: `/data/projects/fate/temp/fate_flow/host#100#guest-99#host-100#model_fate_migration.zip`. The new model_id and model_version can also be obtained from the return. + +## 4. Transfer files and import (operate separately in all target clusters) + +After the migration task is successful, please manually transfer the newly generated model compression files to the fateflow machines of the target clusters. For example, the new model compression file generated by guest party (99) in point 3 needs to be transferred to the guest (99) machine. The zip file can be placed anywhere on the corresponding machine. Next, you need to configure the model import task, see [import_model.json](https://github.com/FederatedAI/FATE/blob/master/python/fate_flow/) for the configuration file examples/import_model.json). + +The following example describes the configuration file for importing the migrated model in guest (99). + +``` +{ + "role": "guest", + "party_id": 99, + "model_id": "guest-99#host-100#model", + "model_version": "fate_migration", + "file": "/data/projects/fate/python/temp/guest#99#guest-99#host-100#model_fate_migration.zip" +} +``` + +Please fill in the role role, the current party_id, the new model_id and model_version of the migrated model, and the path to the zip file of the migrated model according to the actual situation. + +The following is a sample command to submit an imported model using FATE Flow CLI v2. + +```bash +flow model import -c $FATE_FLOW_BASE/examples/model/import_model.json +``` + +The import is considered successful when it returns the following. + +```json +{ + "retcode": 0, + "retmsg": "success" +} +``` + +The migration task is now complete and the user can use the new model_id and model_version for task submission to perform prediction tasks with the migrated model. diff --git a/doc/fate_flow_model_migration.zh.md b/doc/fate_flow_model_migration.zh.md index e9e744392..0de30a83c 100644 --- a/doc/fate_flow_model_migration.zh.md +++ b/doc/fate_flow_model_migration.zh.md @@ -41,7 +41,7 @@ $FATE_PROJECT_BASE/model_local_cache ### 1. 修改配置文件 -在新参与方(机器)中根据实际情况对迁移任务的配置文件进行修改,如下为迁移任务示例配置文件 [migrate_model.json](fateflow/examples/model/migrate_model.json) +在新参与方(机器)中根据实际情况对迁移任务的配置文件进行修改,如下为迁移任务示例配置文件 [migrate_model.json](https://github.com/FederatedAI/FATE-Flow/blob/main/examples/model/migrate_model.json) ```json { @@ -88,7 +88,7 @@ $FATE_PROJECT_BASE/model_local_cache 上述配置文件举例说明: 1. 源模型的参与方为guest: 9999, host: 10000, arbiter: 10000, 将模型迁移成参与方为guest: 99, host: 100, arbiter: 100, 且新发起方为guest: 99 -2. `federated_mode`: `SINGLE`: 表示每个迁移任务只在提交任务的集群执行任务,那么需要在99、100分别提交任务 +2. `federated_mode`: `SINGLE`: 表示每个迁移任务只在提交任务的集群执行任务,那么需要在99、100分别提交任务 3. 例如在`99`执行,则`execute_party`配置为"guest": [9999] 4. 例如在`10`执行,则`execute_party`配置为"arbiter": [10000], "host": [10000] @@ -102,8 +102,6 @@ $FATE_PROJECT_BASE/model_local_cache flow model migrate -c $FATE_FLOW_BASE/examples/model/migrate_model.json ``` - - ## 3. 
任务执行结果 如下为实际迁移任务的配置文件内容: @@ -137,8 +135,6 @@ flow model migrate -c $FATE_FLOW_BASE/examples/model/migrate_model.json 该任务实现的是,将party_id为9999(guest),10000(host)的集群生成的model_id为guest-9999#host-10000#model,model_version为202010291539339602784的模型修改迁移生成适配party_id为99(guest),100(host)集群的新模型 - - 如下为迁移成功的后得到的返回结果: ```json @@ -206,4 +202,4 @@ flow model import -c $FATE_FLOW_BASE/examples/model/import_model.json } ``` -迁移任务至此完成,用户可使用新的model_id及model_version进行任务提交,以利用迁移后的模型执行预测任务。 \ No newline at end of file +迁移任务至此完成,用户可使用新的model_id及model_version进行任务提交,以利用迁移后的模型执行预测任务。 diff --git a/doc/fate_flow_model_registry.md b/doc/fate_flow_model_registry.md new file mode 100644 index 000000000..9cbbb3081 --- /dev/null +++ b/doc/fate_flow_model_registry.md @@ -0,0 +1,78 @@ +# Multi-Party Federated Model Registry + +## 1. Description + +Models trained by FATE are automatically saved locally and recorded in the FATE-Flow database. models saved after each component run are called Pipeline models, and models saved at regular intervals while the component is running are called Checkpoint models. checkpoint models can also be used for retrying after a component run is unexpectedly interrupted The Checkpoint model can also be used for "breakpoints" when a component is retrying after an unexpected interruption. + +Checkpoint model support has been added since 1.7.0 and is not saved by default. To enable it, add the callback `ModelCheckpoint` to the DSL. + +### Local disk storage + +- Pipeline models are stored in `model_local_cache///variables/data//`. + +- Checkpoint models are stored in `model_local_cache///checkpoint//#`. + +### Remote storage engine + +Local disk is not reliable, so there is a risk of losing models. FATE-Flow supports exporting models to specified storage engines, importing from specified storage engines, and pushing models to engine storage when publishing models automatically. + +The storage engine supports Tencent Cloud Object Storage, MySQL and Redis, please refer to [Storage Engine Configuration](#5 - Storage Engine Configuration) + +## 2. Model + +{{snippet('cli/model.md', '## Model')}} + +## 3. Checkpoint + +{{snippet('cli/checkpoint.md', '## Checkpoint')}} + +## 4. Storage engine configuration + +### `enable_model_store` + +This option affects API `/model/load`. + +Automatic upload models to the model store if it exists locally but does not exist in the model storage, or download models from the model store if it does not exist locally but does not exist in the model storage. + +This option does not affect API `/model/store` or `/model/restore`. + +### `model_store_address` + +This config defines which storage engine to use. + +#### Tencent Cloud Object Storage + +```yaml +storage: tencent_cos +# get these configs from Tencent Cloud console +Region: +SecretId: +SecretKey: +Bucket: +``` + +#### MySQL + +```yaml +storage: mysql +database: fate_model +user: fate +password: fate +host: 127.0.0.1 +port: 3306 +# other optional configs send to the engine +max_connections: 10 +stale_timeout: 10 +``` + +#### Redis + +```yaml +storage: redis +host: 127.0.0.1 +port: 6379 +db: 0 +password: +# the expiry time of keys, in seconds. defaults None (no expiry time) +ex: +``` diff --git a/doc/fate_flow_monitoring.md b/doc/fate_flow_monitoring.md new file mode 100644 index 000000000..e7f8ba441 --- /dev/null +++ b/doc/fate_flow_monitoring.md @@ -0,0 +1,5 @@ +# Real-time Monitoring + +## 1. 
Description + +Mainly introduces `FATE Flow` to monitor job running status, Worker execution status, etc., in real time to ensure final consistency \ No newline at end of file diff --git a/doc/fate_flow_resource_management.md b/doc/fate_flow_resource_management.md new file mode 100644 index 000000000..4928d71e5 --- /dev/null +++ b/doc/fate_flow_resource_management.md @@ -0,0 +1,102 @@ +# Multi-Party Resource Coordination + +## 1. Description + +Resources refer to the basic engine resources, mainly CPU resources and memory resources of the compute engine, CPU resources and network resources of the transport engine, currently only the management of CPU resources of the compute engine is supported + +## 2. Total resource allocation + +- The current version does not automatically get the resource size of the base engine, so you configure it through the configuration file `$FATE_PROJECT_BASE/conf/service_conf.yaml`, that is, the resource size of the current engine allocated to the FATE cluster +- `FATE Flow Server` gets all the base engine information from the configuration file and registers it in the database table `t_engine_registry` when it starts. +- `FATE Flow Server` has been started and the resource configuration can be modified by restarting `FATE Flow Server` or by reloading the configuration using the command: `flow server reload`. +- `total_cores` = `nodes` * `cores_per_node` + +**Example** + +fate_on_standalone: is for executing a standalone engine on the same machine as `FATE Flow Server`, generally used for fast experiments, `nodes` is generally set to 1, `cores_per_node` is generally the number of CPU cores of the machine, also can be moderately over-provisioned + +```yaml +fate_on_standalone: + standalone: + cores_per_node: 20 + nodes: 1 +``` + +fate_on_eggroll: configured based on the actual deployment of `EggRoll` cluster, `nodes` denotes the number of `node manager` machines, `cores_per_node` denotes the average number of CPU cores per `node manager` machine + +```yaml +fate_on_eggroll: + clustermanager: + cores_per_node: 16 + nodes: 1 + rollsite: + host: 127.0.0.1 + port: 9370 +``` + +fate_on_spark: configured based on the resources allocated to the `FATE` cluster in the `Spark` cluster, `nodes` indicates the number of `Spark` nodes, `cores_per_node` indicates the average number of CPU cores per node allocated to the `FATE` cluster + +```yaml +fate_on_spark: + spark: + # default use SPARK_HOME environment variable + home: + cores_per_node: 20 + nodes: 2 +``` + +Note: Please make sure that the `Spark` cluster allocates the corresponding amount of resources to the `FATE` cluster, if the `Spark` cluster allocates less resources than the resources configured in `FATE` here, then it will be possible to submit the `FATE` job, but when `FATE Flow` submits the task to the `Spark` cluster, the task will not actually execute because the `Spark` cluster has insufficient resources. Insufficient resources, the task is not actually executed + +## 3. Job request resource configuration + +We generally use ``task_cores`'' and ``task_parallelism`' to configure job request resources, such as + +```json +{ +"job_parameters": { + "common": { + "job_type": "train", + "task_cores": 6, + "task_parallelism": 2, + "computing_partitions": 8, + "timeout": 36000 + } + } +} +``` + +The total resources requested by the job are `task_cores` * `task_parallelism`. 
When creating a job, `FATE Flow` will distribute the job to each `party` based on the above configuration, running role, and the engine used by the party (via `$FATE_PROJECT_BASE/conf/service_conf .yaml#default_engines`), the actual parameters will be calculated as follows + +## 4. The process of calculating the actual parameter adaptation for resource requests + +- Calculate `request_task_cores`: + - guest, host. + - `request_task_cores` = `task_cores` + - arbiter, considering that the actual operation consumes very few resources: `request_task_cores + - `request_task_cores` = 1 + +- Further calculate `task_cores_per_node`. + - `task_cores_per_node"` = max(1, `request_task_cores` / `task_nodes`) + + - If `eggroll_run` or `spark_run` configuration resource is used in the above `job_parameters`, then the `task_cores` configuration is invalid; calculate `task_cores_per_node`. + - `task_cores_per_node"` = eggroll_run["eggroll.session.processors.per.node"] + - `task_cores_per_node"` = spark_run["executor-cores"] + +- The parameter to convert to the adaptation engine (which will be presented to the compute engine for recognition when running the task). + - fate_on_standalone/fate_on_eggroll: + - eggroll_run["eggroll.session.processors.per.node"] = `task_cores_per_node` + - fate_on_spark: + - spark_run["num-executors"] = `task_nodes` + - spark_run["executor-cores"] = `task_cores_per_node` + +- The final calculation can be seen in the job's `job_runtime_conf_on_party.json`, typically in `$FATE_PROJECT_BASE/jobs/$job_id/$role/$party_id/job_runtime_on_party_conf.json ` + +## 5. Resource Scheduling Policy +- `total_cores` see [total_resource_allocation](#2-total-resource-allocation) +- `apply_cores` see [job_request_resource_configuration](#3-job-request-resource-configuration), `apply_cores` = `task_nodes` * `task_cores_per_node` * `task_parallelism` +- If all participants apply for resources successfully (total_cores - apply_cores) > 0, then the job applies for resources successfully +- If not all participants apply for resources successfully, then send a resource rollback command to the participants who have applied successfully, and the job fails to apply for resources + +## 6. Related commands + +{{snippet('cli/resource.md')}} diff --git a/doc/fate_flow_resource_management.zh.md b/doc/fate_flow_resource_management.zh.md index bcc52d8b3..43abf2490 100644 --- a/doc/fate_flow_resource_management.zh.md +++ b/doc/fate_flow_resource_management.zh.md @@ -4,7 +4,7 @@ 资源指基础引擎资源,主要指计算引擎的CPU资源和内存资源,传输引擎的CPU资源和网络资源,目前仅支持计算引擎CPU资源的管理 -## 1. 总资源配置 +## 2. 
总资源配置 - 当前版本未实现自动获取基础引擎的资源大小,因此你通过配置文件`$FATE_PROJECT_BASE/conf/service_conf.yaml`进行配置,也即当前引擎分配给FATE集群的资源大小 - `FATE Flow Server`启动时从配置文件获取所有基础引擎信息并注册到数据库表`t_engine_registry` @@ -15,7 +15,7 @@ fate_on_standalone:是为执行在`FATE Flow Server`同台机器的单机引擎,一般用于快速实验,`nodes`一般设置为1,`cores_per_node`一般为机器CPU核数,也可适量超配 -```json +```yaml fate_on_standalone: standalone: cores_per_node: 20 @@ -24,7 +24,7 @@ fate_on_standalone: fate_on_eggroll:依据`EggRoll`集群实际部署情况进行配置,`nodes`表示`node manager`的机器数量,`cores_per_node`表示平均每台`node manager`机器CPU核数 -```json +```yaml fate_on_eggroll: clustermanager: cores_per_node: 16 @@ -36,7 +36,7 @@ fate_on_eggroll: fate_on_spark:依据在`Spark`集群中配置给`FATE`集群的资源进行配置,`nodes`表示`Spark`节点数量,`cores_per_node`表示平均每个节点分配给`FATE`集群的CPU核数 -```json +```yaml fate_on_spark: spark: # default use SPARK_HOME environment variable @@ -47,11 +47,12 @@ fate_on_spark: 注意:请务必确保在`Spark`集群分配了对应数量的资源于`FATE`集群,若`Spark`集群分配资源少于此处`FATE`所配置的资源,那么会出现可以提交`FATE`作业,但是`FATE Flow`将任务提交至`Spark`集群时,由于`Spark`集群资源不足,任务实际不执行 -## 2. 作业申请资源配置 +## 3. 作业申请资源配置 我们一般使用`task_cores`和`task_parallelism`进行配置作业申请资源,如: ```json +{ "job_parameters": { "common": { "job_type": "train", @@ -59,13 +60,14 @@ fate_on_spark: "task_parallelism": 2, "computing_partitions": 8, "timeout": 36000 + } } } ``` 作业申请的总资源为`task_cores` * `task_parallelism`,创建作业时,`FATE Flow`分发作业到各`party`时会依据上述配置、运行角色、本方使用引擎(通过`$FATE_PROJECT_BASE/conf/service_conf.yaml#default_engines`),适配计算出实际参数,如下 -## 3. 资源申请实际参数适配计算过程 +## 4. 资源申请实际参数适配计算过程 - 计算`request_task_cores`: - guest、host: @@ -89,13 +91,13 @@ fate_on_spark: - 最终计算结果可以查看job的`job_runtime_conf_on_party.json`,一般在`$FATE_PROJECT_BASE/jobs/$job_id/$role/$party_id/job_runtime_on_party_conf.json` -## 4. 资源调度策略 +## 5. 资源调度策略 -- `total_cores`见上述[总资源配置](#41-总资源配置) -- `apply_cores`见上述[作业申请资源配置](#42-作业申请资源配置),`apply_cores` = `task_nodes` * `task_cores_per_node` * `task_parallelism` +- `total_cores`见上述[总资源配置](#2-总资源配置) +- `apply_cores`见上述[作业申请资源配置](#3-作业申请资源配置),`apply_cores` = `task_nodes` * `task_cores_per_node` * `task_parallelism` - 若所有参与方均申请资源成功(total_cores - apply_cores) > 0,则该作业申请资源成功 - 若非所有参与方均申请资源成功,则发送资源回滚指令到已申请成功的参与方,该作业申请资源失败 -## 5. 相关命令 +## 6. 相关命令 {{snippet('cli/resource.zh.md')}} diff --git a/doc/fate_flow_server_operation.md b/doc/fate_flow_server_operation.md new file mode 100644 index 000000000..957575281 --- /dev/null +++ b/doc/fate_flow_server_operation.md @@ -0,0 +1,13 @@ +# Server Operation + +## 1. Description + +Starting from version `1.7.0`, we provide some maintenance functions for `FATE Flow Server`, which will be further enhanced in future versions. + +## 2. View version information + +{{snippet('cli/server.md', '### versions')}} + +## 3. Reload the configuration file + +{{snippet('cli/server.md', '### reload')}} \ No newline at end of file diff --git a/doc/fate_flow_service_registry.md b/doc/fate_flow_service_registry.md new file mode 100644 index 000000000..acdb42d14 --- /dev/null +++ b/doc/fate_flow_service_registry.md @@ -0,0 +1,24 @@ +# Service Registry + +## 1. Description + +FATE-Flow interacts with FATE-Serving through Apache ZooKeeper. If `use_registry` is enabled in the configuration, Flow registers model download URLs with ZooKeeper when it starts, and Serving can get the models through these URLs. + +Likewise, Serving registers its own address with ZooKeeper, which Flow will fetch to communicate with. If `use_registry` is not enabled, Flow will try to communicate with the set `servings` address in the configuration file. + +## 2. 
Configuring the ZooKeeper service + +```yaml +zookeeper: + hosts: + - 127.0.0.1:2181 + use_acl: false + user: fate + password: fate +``` + +## 3. ZNode + +- FATE-Flow: `/FATE-SERVICES/flow/online/transfer/providers` + +- FATE-Serving: `/FATE-SERVICES/serving/online/publishLoad/providers` diff --git a/doc/fate_flow_tracking.md b/doc/fate_flow_tracking.md new file mode 100644 index 000000000..0190872db --- /dev/null +++ b/doc/fate_flow_tracking.md @@ -0,0 +1,49 @@ +# Data Flow Tracking + +## 1. Description + +## 2. Task output indicators + +## 2.1 List of metrics + +{{snippet('cli/tracking.md', '### metrics')}} + +### 2.2 All metrics + +{{snippet('cli/tracking.md', '### metric-all')}} + +## 3. Task run parameters + +{{snippet('cli/tracking.md', '### parameters')}} + +## 4. Task output data + +### 4.1 Download output data + +{{snippet('cli/tracking.md', '### output-data')}} + +### 4.2 Get the name of the data table where the output data is stored + +{{snippet('cli/tracking.md', '### output-data-table')}} + +## 5. Task output model + +{{snippet('cli/tracking.md', '### output-model')}} + +## 6. Task output summary + +{{snippet('cli/tracking.md', '### get-summary')}} + +## 7. Dataset usage tracking + +Tracing source datasets and their derived datasets, such as component task output datasets + +### 7.1 Source table query + +{{snippet('cli/tracking.md', '### tracking-source')}} + +### 7.2 Querying with table tasks + +{{snippet('cli/tracking.md', '### tracking-job')}} + +## 8. Developing the API \ No newline at end of file diff --git a/doc/images/fate_arch.png b/doc/images/fate_arch.png index 510318054..bd8b2eda6 100644 Binary files a/doc/images/fate_arch.png and b/doc/images/fate_arch.png differ diff --git a/doc/images/fate_flow_arch.png b/doc/images/fate_flow_arch.png index 499acf6aa..2bb3e3e3d 100644 Binary files a/doc/images/fate_flow_arch.png and b/doc/images/fate_flow_arch.png differ diff --git a/doc/images/fate_flow_logical_arch.png b/doc/images/fate_flow_logical_arch.png index 578019618..4c7677dac 100644 Binary files a/doc/images/fate_flow_logical_arch.png and b/doc/images/fate_flow_logical_arch.png differ diff --git a/doc/images/fate_flow_pipelined_model.png b/doc/images/fate_flow_pipelined_model.png index 7208cb21e..ae20a0105 100644 Binary files a/doc/images/fate_flow_pipelined_model.png and b/doc/images/fate_flow_pipelined_model.png differ diff --git a/doc/mkdocs/theme/overrides/home.html b/doc/mkdocs/theme/overrides/home.html index d6e2b70e6..adec1499a 100644 --- a/doc/mkdocs/theme/overrides/home.html +++ b/doc/mkdocs/theme/overrides/home.html @@ -56,11 +56,29 @@
-          FATE-Flow
-          Secure, Privacy-preserving Machine Learning Multi-Party Schduling System
+          {{page.title}}
+          FATE Flow is based on:
+            • Shared-State Scheduling Architecture
+            • Secure Multi-Party Communication
+          Providing production-level service capabilities:
+            • Data Access
+            • Multi-Party Federated Scheduling
+            • Multi-Party Resource Coordination
+            • Data Flow Tracking
+            • Real-time Monitoring
+            • Component Registry
+            • Multi-Party Federated Model Registry
+            • Multi-Party Cooperation Authority Management
+            • CLI, REST API, Python API
           Learn More    GitHub
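The capability list above closes with the client interfaces (CLI, REST API, Python API). For orientation only, and not part of this patch, driving a training job through the CLI looks roughly like the sketch below; the DSL/conf paths and the job id are placeholders:

```bash
# Submit a training job described by a DSL file and a runtime conf file (placeholder paths)
flow job submit -d examples/dsl/v2/hetero_lr/dsl.json -c examples/dsl/v2/hetero_lr/conf.json

# Query the job status using the job id returned by the submit call (placeholder id)
flow job query -j 202111080123456789
```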
diff --git a/doc/swagger.zh.md b/doc/swagger.zh.md new file mode 100644 index 000000000..860c06477 --- /dev/null +++ b/doc/swagger.zh.md @@ -0,0 +1,3 @@
+## Swagger API
+
+!!swagger swagger.yaml!!
\ No newline at end of file
diff --git a/doc/system_operational.md b/doc/system_operational.md new file mode 100644 index 000000000..97467029d --- /dev/null +++ b/doc/system_operational.md @@ -0,0 +1,75 @@
+# System Operation
+
+## 1. Description
+
+## 2. Log cleaning
+
+### 2.1 Job logs (N=14 days)
+
+- Machine: the machine where FATE Flow is located
+- Directory: ${FATE_PROJECT_BASE}/fateflow/logs/
+- Rule: directories start with $jobid; clean up directories whose $jobid is more than **N days** old
+- Reference command:
+
+```bash
+rm -rf ${FATE_PROJECT_BASE}/fateflow/logs/20200417*
+```
+
+### 2.2 EggRoll session logs (N=14 days)
+
+- Machine: EggRoll node
+- Directory: ${FATE_PROJECT_BASE}/eggroll/logs/
+- Rule: directories start with $jobid; clean up directories whose $jobid is more than **N days** old
+- Reference command:
+
+```bash
+rm -rf ${FATE_PROJECT_BASE}/eggroll/logs/20200417*
+```
+
+### 2.3 FATE Flow system logs (N=14 days)
+
+- Machine: the machine where FATE Flow is located
+- Directory: ${FATE_PROJECT_BASE}/logs/fate_flow/
+- Rule: log files end with yyyy-mm-dd; clean up files older than **N days**
+- Archive: log files end with yyyy-mm-dd; archive and keep 180 days of logs
+- Reference command:
+
+```bash
+rm -rf ${FATE_PROJECT_BASE}/logs/fate_flow/fate_flow_stat.log.2020-12-15
+```
+
+### 2.4 EggRoll system logs (N=14 days)
+
+- Machine: EggRoll deployment machine
+- Directory: ${FATE_PROJECT_BASE}/eggroll/logs/eggroll
+- Rule: directories are named yyyy/mm/dd; clean up directories older than **N days**
+- Archive: directories are named yyyy/mm/dd; archive and keep 180 days of logs
+- Reference command:
+
+```bash
+rm -rf ${FATE_PROJECT_BASE}/eggroll/logs/2020/12/15/
+```
+
+## 3. Data cleanup
+
+### 3.1 Computing temporary data (N=2 days)
+
+- Machine: EggRoll node
+- Directory: ${FATE_PROJECT_BASE}/eggroll/data/IN_MEMORY
+- Rule: namespaces start with $jobid; clean up namespaces whose $jobid is more than **N days** old
+- Reference command:
+
+```bash
+rm -rf ${FATE_PROJECT_BASE}/eggroll/data/IN_MEMORY/20200417*
+```
+
+### 3.2 Component output data (N=14 days)
+
+- Machine: EggRoll node
+- Directory: ${FATE_PROJECT_BASE}/eggroll/data/LMDB
+- Rule: namespaces start with output_data_$jobid; clean up namespaces whose $jobid is more than **N days** old
+- Reference command:
+ +```bash +rm -rf ${FATE_PROJECT_BASE}/eggroll/data/LMDB/output_data_20200417* +``` diff --git a/mkdocs.yml b/mkdocs.yml index 53f88311e..f8feaf84f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -27,7 +27,7 @@ nav: - configuration_instruction.md - system_operational.md - faq.md - - API: swagger.md + #- API: swagger.md theme: name: material @@ -47,13 +47,13 @@ theme: - media: "(prefers-color-scheme: light)" scheme: default toggle: - icon: material/lightbulb + icon: material/weather-sunny name: Switch to dark mode - media: "(prefers-color-scheme: dark)" - scheme: slate + scheme: slate primary: teal toggle: - icon: material/lightbulb-outline + icon: material/weather-night name: Switch to light mode @@ -78,6 +78,7 @@ markdown_extensions: - footnotes - meta - def_list + - attr_list - pymdownx.arithmatex - pymdownx.betterem: smart_enable: all diff --git a/python/fate_flow/operation/job_tracker.py b/python/fate_flow/operation/job_tracker.py index 93ba30f2d..f3bc3d831 100644 --- a/python/fate_flow/operation/job_tracker.py +++ b/python/fate_flow/operation/job_tracker.py @@ -394,25 +394,28 @@ def read_output_data_info_from_db(self, data_name=None): @classmethod @DB.connection_context() def query_output_data_infos(cls, **kwargs) -> typing.List[TrackingOutputDataInfo]: - tracking_output_data_info_model = cls.get_dynamic_db_model(TrackingOutputDataInfo, kwargs.get("job_id")) - filters = [] - for f_n, f_v in kwargs.items(): - attr_name = 'f_%s' % f_n - if hasattr(tracking_output_data_info_model, attr_name): - filters.append(operator.attrgetter('f_%s' % f_n)(tracking_output_data_info_model) == f_v) - if filters: - output_data_infos_tmp = tracking_output_data_info_model.select().where(*filters) - else: - output_data_infos_tmp = tracking_output_data_info_model.select() - output_data_infos_group = {} - # only the latest version of the task output data is retrieved - for output_data_info in output_data_infos_tmp: - group_key = cls.get_output_data_group_key(output_data_info.f_task_id, output_data_info.f_data_name) - if group_key not in output_data_infos_group: - output_data_infos_group[group_key] = output_data_info - elif output_data_info.f_task_version > output_data_infos_group[group_key].f_task_version: - output_data_infos_group[group_key] = output_data_info - return list(output_data_infos_group.values()) + try: + tracking_output_data_info_model = cls.get_dynamic_db_model(TrackingOutputDataInfo, kwargs.get("job_id")) + filters = [] + for f_n, f_v in kwargs.items(): + attr_name = 'f_%s' % f_n + if hasattr(tracking_output_data_info_model, attr_name): + filters.append(operator.attrgetter('f_%s' % f_n)(tracking_output_data_info_model) == f_v) + if filters: + output_data_infos_tmp = tracking_output_data_info_model.select().where(*filters) + else: + output_data_infos_tmp = tracking_output_data_info_model.select() + output_data_infos_group = {} + # only the latest version of the task output data is retrieved + for output_data_info in output_data_infos_tmp: + group_key = cls.get_output_data_group_key(output_data_info.f_task_id, output_data_info.f_data_name) + if group_key not in output_data_infos_group: + output_data_infos_group[group_key] = output_data_info + elif output_data_info.f_task_version > output_data_infos_group[group_key].f_task_version: + output_data_infos_group[group_key] = output_data_info + return list(output_data_infos_group.values()) + except Exception as e: + return [] @classmethod def get_output_data_group_key(cls, task_id, data_name): diff --git a/python/requirements.txt b/python/requirements.txt 
index 58bcfeed2..85ae2fd62 100644 --- a/python/requirements.txt +++ b/python/requirements.txt @@ -1,4 +1,8 @@ +pip>=21 apsw<3.10 +importlib_metadata<2.0.0 +markdown==3.3.5 +pkginfo==1.7.1 beautifultable==1.0.0 cachetools==3.0.0 cloudpickle==0.6.1 @@ -21,8 +25,8 @@ PyMySQL==0.9.3 pyspark==3.1.2 python-dotenv==0.13.0 redis==3.5.3 -urllib3==1.25.11 -requests==2.24.0 +urllib3==1.26.5 +requests==2.26.0 requests_toolbelt==0.9.1 ruamel-yaml==0.16.10 scikit-learn==0.24.2
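The dependency bumps above (notably `requests` and `urllib3`) only take effect once the Python environment is reinstalled. A minimal upgrade sketch follows, assuming a standard deployment where the environment is activated via `bin/init_env.sh` under `${FATE_PROJECT_BASE}`; both that script path and the requirements path are assumptions about your install:

```bash
# Activate the Python environment used by FATE Flow (path is an assumption)
source ${FATE_PROJECT_BASE}/bin/init_env.sh

# Reinstall the pinned dependencies from the updated requirements file
pip install --upgrade pip
pip install -r ${FATE_PROJECT_BASE}/fateflow/python/requirements.txt

# Confirm the upgraded versions
pip show requests urllib3 | grep -E '^(Name|Version)'
```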