Merge pull request #176 from wangxinbiao/main
retrieve files from Minio for data processing
bjwswang authored Nov 9, 2023
2 parents 8b71e57 + 813660c commit 1aca1fc
Showing 24 changed files with 1,710 additions and 0 deletions.
109 changes: 109 additions & 0 deletions assets/data_process.drawio
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
<mxfile host="Electron" modified="2023-11-02T10:49:05.695Z" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/21.2.8 Chrome/112.0.5615.165 Electron/24.2.0 Safari/537.36" etag="MtKcBN_l9eNnbYWFwm7D" version="21.2.8" type="device">
<diagram name="Page 1" id="loeKpyqY9KO9q6GEpTu8">
<mxGraphModel dx="1026" dy="1843" grid="0" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="iNnB15NnGvmIqP9MPX9o-1" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;Data Processing HTTP Service&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#4472c4;strokeColor=none;fontColor=#ffffff;" parent="1" vertex="1">
<mxGeometry x="9" y="-170" width="810" height="40" as="geometry" />
</mxCell>
<mxCell id="iNnB15NnGvmIqP9MPX9o-23" value="" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#dce6f2;strokeColor=none;" parent="1" vertex="1">
<mxGeometry x="120" y="-120" width="700" height="60" as="geometry" />
</mxCell>
<mxCell id="iNnB15NnGvmIqP9MPX9o-24" value="&lt;font color=&quot;#ffffff&quot; style=&quot;font-size: 18px;&quot;&gt;Controller&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#4472c4;strokeColor=none;" parent="1" vertex="1">
<mxGeometry x="9" y="-120" width="100" height="60" as="geometry" />
</mxCell>
<mxCell id="iNnB15NnGvmIqP9MPX9o-25" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;Service&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#4472c4;fontColor=#FFFFFF;strokeColor=none;" parent="1" vertex="1">
<mxGeometry x="9" y="-55" width="100" height="60" as="geometry" />
</mxCell>
<mxCell id="iNnB15NnGvmIqP9MPX9o-26" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;Handle&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#4472c4;fontColor=#FFFFFF;strokeColor=none;" parent="1" vertex="1">
<mxGeometry x="9" y="10" width="100" height="115" as="geometry" />
</mxCell>
<mxCell id="iNnB15NnGvmIqP9MPX9o-27" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;Transform&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#4472c4;fontColor=#FFFFFF;strokeColor=none;" parent="1" vertex="1">
<mxGeometry x="9" y="130" width="100" height="60" as="geometry" />
</mxCell>
<mxCell id="iNnB15NnGvmIqP9MPX9o-29" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;Base Classes&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#4472c4;fontColor=#ffffff;strokeColor=none;" parent="1" vertex="1">
<mxGeometry x="9" y="197" width="100" height="60" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-3" value="" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#dce6f2;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="120" y="-55" width="700" height="60" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-5" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;minio_process&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#a20025;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="126" y="-50" width="208" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-6" value="&lt;font color=&quot;#ffffff&quot; style=&quot;font-size: 18px;&quot;&gt;text_clean_for_minio&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#5b9bd5;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="127" y="-116" width="689" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-7" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;database_process&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#a20025;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="339.5" y="-50" width="239" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-8" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;web_api_process&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#a20025;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="584" y="-50" width="229" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-9" value="" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#dce6f2;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="120" y="10" width="700" height="115" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-10" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;json&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="126" y="16" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-11" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;csv&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="126" y="70" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-12" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;txt&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="265" y="16" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-13" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;pdf&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="265" y="70" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-14" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;doc&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="403" y="16" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-15" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;markdown&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="403" y="70" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-16" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;html&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="541" y="16" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-17" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;ppt&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="541" y="70" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-18" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;excel&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d80073;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="679" y="16" width="135" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-19" value="" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#dce6f2;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="120" y="130" width="700" height="60" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-20" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;text&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#008a00;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="126" y="135" width="225" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-21" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;image&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#008a00;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="357.5" y="135" width="225" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-22" value="&lt;span style=&quot;font-size: 18px;&quot;&gt;table&lt;/span&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#008a00;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="589" y="135" width="225" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-23" value="" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#dce6f2;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="120" y="197" width="700" height="60" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-24" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;utils&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#1ba1e2;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="126" y="202" width="225" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-25" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;common&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#1ba1e2;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="358" y="202" width="225" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-26" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;OCR&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#1ba1e2;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="588" y="202" width="225" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-27" value="" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#dce6f2;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="119.5" y="262" width="700" height="60" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-28" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;Sanic&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#6a00ff;fontColor=#ffffff;strokeColor=#3700CC;" vertex="1" parent="1">
<mxGeometry x="126" y="267" width="684" height="50" as="geometry" />
</mxCell>
<mxCell id="6DU0XRBK3AAMFuq2KatI-29" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;Web Framework&lt;/font&gt;" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#4472c4;fontColor=#ffffff;strokeColor=none;" vertex="1" parent="1">
<mxGeometry x="9" y="262" width="100" height="60" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
Binary file added assets/data_process.drawio.png
7 changes: 7 additions & 0 deletions data-process/.gitignore
@@ -0,0 +1,7 @@
# python
__pycache__
.ipynb_checkpoints

mock_data

log
1 change: 1 addition & 0 deletions data-process/README.md
@@ -1 +1,2 @@
# Data Process
The documentation is currently maintained in Chinese. Please refer to README.zh.md for details.
30 changes: 30 additions & 0 deletions data-process/README.zh.md
@@ -0,0 +1,30 @@
# Data Process

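## Main features of the current version
Data Process handles data processing. It retrieves data from MinIO, databases, Web APIs, and similar sources, and supports the following data types: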
- txt
- json
- doc
- html
- excel
- csv
- pdf
- markdown
- ppt

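### Current text processing
Text processing currently covers cleaning of abnormal data, filtering, deduplication, and removal of private information.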
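
The four steps above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the e-mail pattern, and the `min_chars` threshold are all assumptions chosen for illustration, and only the standard library is used.

```python
import re
import unicodedata

# Hypothetical patterns; real rules would depend on the data and locale.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean(line: str) -> str:
    """Cleaning: normalize Unicode, drop control characters, collapse whitespace."""
    line = unicodedata.normalize("NFKC", line)
    line = CONTROL_RE.sub("", line)
    return re.sub(r"\s+", " ", line).strip()


def keep(line: str, min_chars: int = 5) -> bool:
    """Filtering: discard lines that are too short to be useful (assumed rule)."""
    return len(line) >= min_chars


def mask_privacy(line: str) -> str:
    """Privacy removal: replace e-mail addresses with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", line)


def process(lines):
    """Run cleaning, filtering, exact-match deduplication, then privacy masking."""
    seen = set()
    result = []
    for raw in lines:
        line = clean(raw)
        if not keep(line) or line in seen:
            continue
        seen.add(line)
        result.append(mask_privacy(line))
    return result
```

Deduplication here is exact-match on the cleaned line; near-duplicate detection would need a different technique.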

## Design
![Design](../assets/data_process.drawio.png)
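
The diagram stacks the layers Controller, Service, Handle (one handler per file type), and Transform. A rough sketch of how the Handle and Transform layers could be wired together is shown below; the function names and the extension-based dispatch are assumptions made for illustration, not the code in this repository, and the file bytes are passed in directly rather than fetched from MinIO.

```python
import csv
import io
import json
from pathlib import Path


def handle_txt(data: bytes) -> str:
    """Handle layer: decode plain text."""
    return data.decode("utf-8")


def handle_json(data: bytes) -> str:
    """Handle layer: parse and re-serialize JSON as text."""
    return json.dumps(json.loads(data), ensure_ascii=False)


def handle_csv(data: bytes) -> str:
    """Handle layer: flatten CSV rows into lines of space-separated cells."""
    rows = csv.reader(io.StringIO(data.decode("utf-8")))
    return "\n".join(" ".join(row) for row in rows)


# Hypothetical registry mapping file extensions to handlers.
HANDLERS = {".txt": handle_txt, ".json": handle_json, ".csv": handle_csv}


def transform_text(text: str) -> str:
    """Transform layer: collapse all runs of whitespace into single spaces."""
    return " ".join(text.split())


def process_file(name: str, data: bytes) -> str:
    """Service layer: pick a handler by extension, then apply the text transform."""
    suffix = Path(name).suffix.lower()
    handler = HANDLERS.get(suffix)
    if handler is None:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return transform_text(handler(data))
```

Keeping the handlers in a registry means a new file type only needs one new function and one new dictionary entry.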

## Local development
### Software requirements
Before setting up the data-process environment locally, make sure the following software is installed:
- Python 3.10.x

### Environment setup
Install the Python dependencies listed in requirements.txt.

### Run
Run `python data_manipulation/server.py`.
21 changes: 21 additions & 0 deletions data-process/data_manipulation/common/config.py
@@ -0,0 +1,21 @@
# Copyright 2023 KubeAGI.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

minio_access_key = os.getenv('MINIO_ACCESSKEY', 'minioadmin')
minio_secret_key = os.getenv('MINIO_SECRETKEY', 'minioadmin')
minio_api_url = os.getenv('MINIO_API_URL', '192.168.90.31:9000')
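# Set MINIO_SECURE to "false" for HTTP or "true" for HTTPS (env values are strings, so parse explicitly)
minio_secure = os.getenv('MINIO_SECURE', 'false').strip().lower() in ('1', 'true', 'yes')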
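
Environment variables are always strings, so a value such as `"False"` is truthy if it is used directly as a flag. The sketch below shows one way to read a boolean setting like `MINIO_SECURE`; the helper name `env_bool`, its `environ` parameter, and the set of accepted spellings are assumptions for illustration and are not part of this commit.

```python
import os

# Assumed spellings that count as "true"; anything else is treated as false.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool = False, environ=None) -> bool:
    """Read a boolean flag from the environment (or a mapping passed for testing)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


# Any non-empty string is truthy, which is why explicit parsing is needed.
naive = bool("False")
parsed = env_bool("MINIO_SECURE", environ={"MINIO_SECURE": "False"})
```

Here `naive` ends up `True` while `parsed` is `False`, which is the difference that matters when the value is handed to a client's `secure` option.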