Merge pull request #551 from FederatedAI/dev-2.0.0-rc
update doc
Showing 29 changed files with 1,906 additions and 480 deletions.
@@ -0,0 +1,122 @@
# FATE Data Access Guide

## 1. Upload Process
The data upload process is shown in the following diagram:

![Data Upload](./images/upload_data.png)
- The client uploads data to the server.
- The server wraps the upload parameters into a DAG job configuration containing two components, 'upload' and 'dataframe-transformer', and then calls the submit interface to submit the job.
- The 'upload' component stores the data in the FATE storage service.
- The 'dataframe-transformer' component converts the output of the 'upload' component into a dataframe and stores it in the FATE storage service.
- Metadata about the data is stored in the database.
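
The steps above can be pictured with a toy, in-memory model. This is only an illustrative sketch: the `storage` and `metadata_db` dictionaries and every function name are hypothetical stand-ins rather than FATE APIs, and the real components run as tasks of a submitted DAG job.

```python
import csv
import io

# Toy stand-ins for the FATE storage service and the metadata database.
# These are hypothetical names for illustration, NOT FATE APIs.
storage = {}
metadata_db = {}


def upload(namespace, name, csv_text):
    """'upload' step: keep the raw lines under (namespace, name)."""
    storage[(namespace, name, "raw")] = csv_text.splitlines()


def dataframe_transformer(namespace, name, delimiter=","):
    """'dataframe-transformer' step: turn the raw lines into row dicts."""
    raw = storage[(namespace, name, "raw")]
    reader = csv.DictReader(io.StringIO("\n".join(raw)), delimiter=delimiter)
    rows = list(reader)
    storage[(namespace, name, "dataframe")] = rows
    # Final step of the process: record metadata about the stored table.
    metadata_db[(namespace, name)] = {"count": len(rows), "data_type": "dataframe"}


def run_upload_job(namespace, name, csv_text):
    """Run both components in order, as the submitted job would."""
    upload(namespace, name, csv_text)
    dataframe_transformer(namespace, name)
    return metadata_db[(namespace, name)]


info = run_upload_job("experiment", "demo", "id,y,x0\n1,0,0.5\n2,1,0.7\n")
print(info)
```

The point of the model is the ordering: raw data is stored first, the dataframe is derived from it, and the metadata is written last.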

## 2. Data Upload Methods
Note: FATE provides several clients: SDK, CLI, and Pipeline. If the FATE Client is not deployed in your environment, you can install it with `pip install fate_client`. The operations below all use the CLI.

### 2.1 Upload Scenarios
- Client-server separation: the client and the server are installed on different machines.
- Client-server non-separation: the client and the server are installed on the same machine.

Difference: when the client and server are not separated, the step "the client uploads data to the server" in the process above can be skipped, which makes uploads of large data volumes more efficient. The two scenarios use different interfaces and parameters, so choose the one that matches your deployment.

### 2.2 Data Upload
#### 2.2.1 Configuration and Data Preparation
- Upload configurations are located in [examples-upload](https://github.com/FederatedAI/FATE-Flow/tree/v2.0.0/examples/upload)
```json
{
  "file": "examples/data/breast_hetero_guest.csv",
  "head": true,
  "partitions": 16,
  "extend_sid": true,
  "meta": {
    "delimiter": ",",
    "label_name": "y",
    "match_id_name": "id"
  },
  "namespace": "experiment",
  "name": "breast_hetero_guest"
}
```
- file: File path
- head: Whether the data includes a header row: true/false
- partitions: Number of data storage partitions
- extend_sid: Whether to generate an 'sid' column
- meta: Metadata about the data
- namespace, name: Together they reference the data table in FATE storage
- The example data is located in [upload-data](https://github.com/FederatedAI/FATE-Flow/tree/v2.0.0/examples/data)
- You can also use your own data; modify the "meta" information in the upload configuration accordingly.
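
When using your own data, it can be convenient to generate the configuration file rather than edit it by hand. The sketch below writes a file with the same keys as the example above; `build_upload_config` is a hypothetical helper (not part of the FATE Client), and the default values simply mirror the example.

```python
import json
import os
import tempfile


def build_upload_config(file, namespace, name, label_name="y",
                        match_id_name="id", delimiter=",", partitions=16):
    """Return a dict with the same keys as the example upload configuration."""
    return {
        "file": file,
        "head": True,          # the CSV has a header row
        "partitions": partitions,
        "extend_sid": True,    # ask for an 'sid' column to be generated
        "meta": {
            "delimiter": delimiter,
            "label_name": label_name,
            "match_id_name": match_id_name,
        },
        "namespace": namespace,
        "name": name,
    }


config = build_upload_config(
    "examples/data/breast_hetero_guest.csv", "experiment", "breast_hetero_guest"
)

# Write the configuration to a temporary directory for this demonstration.
config_path = os.path.join(tempfile.mkdtemp(), "upload_guest.json")
with open(config_path, "w", encoding="utf-8") as fh:
    json.dump(config, fh, indent=2)
print(config_path)
```

The resulting file can then be passed to the CLI with `-c`.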

#### 2.2.2 Data Upload Commands
##### Client-Server Non-Separation
```shell
flow data upload -c examples/upload/upload_guest.json
```
Note: Ensure that the file path in the configuration exists on the server.
##### Client-Server Separation
```shell
flow data upload-file -c examples/upload/upload_guest.json
```
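
A script that has to work in both scenarios can choose the subcommand from a single flag. The following is a minimal sketch: `upload_command` and its `client_on_server` parameter are hypothetical names, and only the two CLI invocations shown above are used.

```python
import shlex


def upload_command(config_path, client_on_server):
    """Build the CLI argument list for the matching upload scenario."""
    # Non-separation (same machine) uses 'upload'; separation uses 'upload-file'.
    subcommand = "upload" if client_on_server else "upload-file"
    return ["flow", "data", subcommand, "-c", config_path]


cmd = upload_command("examples/upload/upload_guest.json", client_on_server=False)
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` on a machine where the FATE Client is installed.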
#### 2.2.3 Upload Results
```json
{
  "code": 0,
  "data": {
    "name": "breast_hetero_guest",
    "namespace": "experiment"
  },
  "job_id": "202312281606030428210",
  "message": "success"
}
```
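
A caller will usually want the `job_id` and the table reference from this response. The sketch below parses the JSON shown above; `parse_upload_response` is a hypothetical helper, and treating any non-zero `code` as a failure is an assumption, since only the success case is documented here.

```python
import json


def parse_upload_response(text):
    """Return (job_id, namespace, name) from an upload response."""
    resp = json.loads(text)
    # Assumption: a non-zero "code" signals failure.
    if resp.get("code") != 0:
        raise RuntimeError(f"upload failed: {resp.get('message')}")
    data = resp["data"]
    return resp["job_id"], data["namespace"], data["name"]


sample = """
{
  "code": 0,
  "data": {"name": "breast_hetero_guest", "namespace": "experiment"},
  "job_id": "202312281606030428210",
  "message": "success"
}
"""
job_id, namespace, name = parse_upload_response(sample)
print(job_id, namespace, name)
```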

#### 2.2.4 Data Query
Because the upload runs asynchronously, confirm that it has succeeded before performing any subsequent operations.
```shell
flow table query --namespace experiment --name breast_hetero_guest
```
- A successful upload returns:
```json
{
  "code": 0,
  "data": {
    "count": 569,
    "data_type": "dataframe",
    "engine": "standalone",
    "meta": {},
    "name": "breast_hetero_guest",
    "namespace": "experiment",
    "path": "xxx",
    "source": {
      "component": "dataframe_transformer",
      "output_artifact_key": "dataframe_output",
      "output_index": null,
      "party_task_id": "202312281606030428210_transformer_0_0_local_0",
      "task_id": "202312281606030428210_transformer_0",
      "task_name": "transformer_0"
    }
  },
  "message": "success"
}
```
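
Because the upload is asynchronous, a script may need to poll until the table appears. The sketch below is one way to do that under stated assumptions: `query_table` assumes the CLI prints the JSON response to stdout, the shape of a "not ready" response is hypothetical, and `wait_for_table` with its `timeout` and `interval` parameters is an illustrative helper, not a FATE API.

```python
import json
import subprocess
import time


def query_table(namespace, name):
    """Run the query command shown above.

    Assumption: the CLI prints the JSON response to stdout.
    """
    result = subprocess.run(
        ["flow", "table", "query", "--namespace", namespace, "--name", name],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def wait_for_table(namespace, name, query=query_table, timeout=60.0, interval=2.0):
    """Poll until the query reports a dataframe table, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        resp = query(namespace, name)
        data = resp.get("data") or {}
        if resp.get("code") == 0 and data.get("data_type") == "dataframe":
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"table {namespace}.{name} not available")
        time.sleep(interval)


# Offline demonstration with a fake query function; the first response is a
# hypothetical "not ready" shape, the second mirrors the success case above.
responses = iter([
    {"code": 1, "message": "not ready"},
    {"code": 0, "data": {"count": 569, "data_type": "dataframe"}, "message": "success"},
])
table = wait_for_table("experiment", "breast_hetero_guest",
                       query=lambda ns, n: next(responses), interval=0.0)
print(table["count"])
```

Passing the query function as a parameter keeps the polling logic testable without a running FATE Flow server.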

## 3. Data Binding
Some algorithms may require special datasets. FATE Flow provides a data binding interface that makes such data available for use in FATE.

```shell
flow table bind --namespace bind_data --name breast_hetero_guest --path /data/projects/fate/fate_flow/data/xxx
```

## 4. Data Query
For uploaded or bound data tables, you can use the query interface to retrieve brief information about the data.

```shell
flow table query --namespace experiment --name breast_hetero_guest
```

## 5. Data Cleaning
You can use the `delete` CLI command to remove data tables that already exist in FATE.

```shell
flow table delete --namespace experiment --name breast_hetero_guest
```
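
The bind, query, and delete commands share the same `--namespace` and `--name` arguments, so a script can build all three from one function. This is a minimal sketch: `table_command` is a hypothetical helper, and it only assembles the invocations shown in sections 3 to 5.

```python
import shlex


def table_command(action, namespace, name, path=None):
    """Build a 'flow table' argument list for bind, query, or delete."""
    if action not in {"bind", "query", "delete"}:
        raise ValueError(f"unsupported action: {action}")
    cmd = ["flow", "table", action, "--namespace", namespace, "--name", name]
    if action == "bind":
        # Only 'bind' takes a path, as in the example in section 3.
        if not path:
            raise ValueError("bind requires a path")
        cmd += ["--path", path]
    return cmd


for action in ("query", "delete"):
    print(shlex.join(table_command(action, "experiment", "breast_hetero_guest")))
```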
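@@ -0,0 +1,121 @@
# FATE Data Access Guide
## 1. Upload Process
The flow chart for data upload is as follows:

![Data Upload](./images/upload_data.png)
- The client uploads the data to the server;
- The server wraps the upload parameters into a DAG job configuration that contains two components, upload and dataframe-transformer, and calls the submit interface to submit the job;
- The upload component stores the data in the FATE storage service;
- The transformer component converts the output of the upload component into a dataframe and stores it in the FATE storage service;
- The meta information of the data is stored in the DB.

## 2. Data Upload Methods
Note: the clients provided by FATE include SDK, CLI, and Pipeline. If the FATE Client is not deployed in your environment, you can install it with `pip install fate_client`. All operations below are written for the CLI.
### 2.1 Upload Scenarios
- Client and server separated: the installed client and server are on different machines
- Client and server not separated: the installed client and server are on the same machine

Difference between the two: when the client and server are not separated, the step "the client uploads the data to the server" can be dropped from the process above, which improves upload efficiency for large data volumes. The interfaces and parameters differ between the two scenarios, so choose the scenario that applies to you.

### 2.2 Data Upload
#### 2.2.1 Configuration and Data Preparation
- Upload configurations are located in [examples-upload](https://github.com/FederatedAI/FATE-Flow/tree/v2.0.0/examples/upload)
```json
{
  "file": "examples/data/breast_hetero_guest.csv",
  "head": true,
  "partitions": 16,
  "extend_sid": true,
  "meta": {
    "delimiter": ",",
    "label_name": "y",
    "match_id_name": "id"
  },
  "namespace": "experiment",
  "name": "breast_hetero_guest"
}
```
- file: File path
- head: Whether the data carries a header: true/false
- partitions: Number of data storage partitions
- extend_sid: Whether an sid column needs to be generated
- meta: Meta information of the data
- namespace, name: Reference to the data's storage table in FATE
- The upload data is located in [upload-data](https://github.com/FederatedAI/FATE-Flow/tree/v2.0.0/examples/data)
- You can also use your own data and modify the "meta" information in the upload configuration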

#### 2.2.2 Data Upload Commands
##### Client and Server Not Separated
```shell
flow data upload -c examples/upload/upload_guest.json
```
Note: the file path in the configuration must exist on the server.
##### Client and Server Separated
```shell
flow data upload-file -c examples/upload/upload_guest.json
```
#### 2.2.3 Upload Results
```json
{
  "code": 0,
  "data": {
    "name": "breast_hetero_guest",
    "namespace": "experiment"
  },
  "job_id": "202312281606030428210",
  "message": "success"
}
```

#### 2.2.4 Data Query
Because the whole upload is an asynchronous operation, you need to confirm that it succeeded before performing subsequent operations.
```shell
flow table query --namespace experiment --name breast_hetero_guest
```
- A successful upload returns:
```json
{
  "code": 0,
  "data": {
    "count": 569,
    "data_type": "dataframe",
    "engine": "standalone",
    "meta": {},
    "name": "breast_hetero_guest",
    "namespace": "experiment",
    "path": "xxx",
    "source": {
      "component": "dataframe_transformer",
      "output_artifact_key": "dataframe_output",
      "output_index": null,
      "party_task_id": "202312281606030428210_transformer_0_0_local_0",
      "task_id": "202312281606030428210_transformer_0",
      "task_name": "transformer_0"
    }
  },
  "message": "success"
}
```

## 3. Data Binding
Specific algorithms may need special datasets; FATE Flow provides a data bind interface to make such data available to FATE.

```shell
flow table bind --namespace bind_data --name breast_hetero_guest --path /data/projects/fate/fate_flow/data/xxx
```

## 4. Data Query
For uploaded or bound data tables, you can use the query interface to get brief information about the data.

```shell
flow table query --namespace experiment --name breast_hetero_guest
```

## 5. Data Cleaning
You can use the cleaning interface to remove data tables that already exist in FATE.

```shell
flow table delete --namespace experiment --name breast_hetero_guest
```