This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #309 from IndustryEssentials/dev
Dev merge to master
Showing 55 changed files with 388 additions and 212 deletions.
```diff
@@ -82,3 +82,4 @@ mysql/
 redis/
 ymir-data/
 ymir-workplace
+.mir_lock
```
```diff
@@ -18,14 +18,14 @@
 - [2.2. Install YMIR-GUI](#22-%E5%AE%89%E8%A3%85-ymir-gui)
 - [2.3. Install and configure LabelStudio (optional)](#23-%E5%AE%89%E8%A3%85%E9%85%8D%E7%BD%AElabelstudio-%E5%8F%AF%E9%80%89)
 - [3. Using the GUI: a typical model production workflow](#3-gui%E4%BD%BF%E7%94%A8-%E5%85%B8%E5%9E%8B%E6%A8%A1%E5%9E%8B%E7%94%9F%E4%BA%A7%E6%B5%81%E7%A8%8B)
-- [3.1. Raw data preparation](#31-%E5%8E%9F%E5%A7%8B%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87)
-- [3.2. Data annotation](#32-%E6%95%B0%E6%8D%AE%E6%A0%87%E6%B3%A8)
-- [3.3. Model training](#33-%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B)
-- [Model iteration (improving model accuracy through iteration)](#%E6%A8%A1%E5%9E%8B%E8%BF%AD%E4%BB%A3%E9%80%9A%E8%BF%87%E8%BF%AD%E4%BB%A3%E6%8F%90%E5%8D%87%E6%A8%A1%E5%9E%8B%E7%B2%BE%E5%BA%A6)
-- [3.4. Data mining](#34-%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98)
-- [3.5. Merged training](#35-%E5%90%88%E5%B9%B6%E8%AE%AD%E7%BB%83)
-- [3.6. Model validation](#36-%E6%A8%A1%E5%9E%8B%E9%AA%8C%E8%AF%81)
-- [3.7. Model download](#37-%E6%A8%A1%E5%9E%8B%E4%B8%8B%E8%BD%BD)
+- [3.1. Label management](#31-%E6%A0%87%E7%AD%BE%E7%AE%A1%E7%90%86)
+- [3.2. Raw data preparation](#32-%E5%8E%9F%E5%A7%8B%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87)
+- [3.3. Data annotation](#33-%E6%95%B0%E6%8D%AE%E6%A0%87%E6%B3%A8)
+- [3.4. Model training](#34-%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B)
+- [3.5. Data mining](#35-%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98)
+- [3.6. Merged training](#36-%E5%90%88%E5%B9%B6%E8%AE%AD%E7%BB%83)
+- [3.7. Model validation](#37-%E6%A8%A1%E5%9E%8B%E9%AA%8C%E8%AF%81)
+- [3.8. Model download](#38-%E6%A8%A1%E5%9E%8B%E4%B8%8B%E8%BD%BD)
 - [4. Advanced: Ymir-CMD line usage guide](#4-%E8%BF%9B%E9%98%B6%E7%89%88ymir-cmd-line%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97)
 - [4.1 Installation](#41-%E5%AE%89%E8%A3%85)
 - [Option 1: install via pip](#%E6%96%B9%E5%BC%8F%E4%B8%80%E9%80%9A%E8%BF%87pip%E5%AE%89%E8%A3%85)
```
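```diff
@@ -140,9 +140,9 @@ The YMIR-GUI project package is on DockerHub. To install and deploy YMIR:
 Clone the YMIR deployment project to the local server with:
 `git clone git@github.com:IndustryEssentials/ymir.git`

-2. No configuration changes are needed; with the default configuration, start the service directly with: `sh ymir.sh start`
+2. No configuration changes are needed; with the default configuration, start the service directly with: `bash ymir.sh start`

-Once the service has started, it listens on port 12001 by default. Open [http://localhost:12001/](http://localhost:12001/); if the login page appears, the installation succeeded. To **stop the service**, run: `sh ymir.sh stop`
+Once the service has started, it listens on port 12001 by default. Open [http://localhost:12001/](http://localhost:12001/); if the login page appears, the installation succeeded. To **stop the service**, run: `bash ymir.sh stop`

 If no GPU is available and you need to run in CPU mode, switch to the CPU startup mode by editing the .env file and setting the SERVER_RUNTIME parameter to runc:

```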
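Switching to CPU mode, as described above, comes down to changing one `KEY=value` line in `.env`. The following is a minimal Python sketch of that edit, assuming `.env` consists of plain `KEY=value` lines; `set_env_value` is a hypothetical helper, not part of YMIR, and editing the file by hand works just as well.

```python
import re


def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key` set to `value` (hypothetical helper, not part of YMIR).

    An existing `KEY=...` line is replaced in place; if the key is missing,
    a new line is appended. Comments and other keys are left untouched.
    """
    line = f"{key}={value}"
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(env_text):
        # A function replacement keeps re.sub from interpreting backslashes in `value`.
        return pattern.sub(lambda _match: line, env_text, count=1)
    if env_text and not env_text.endswith("\n"):
        env_text += "\n"
    return env_text + line + "\n"


# Illustrative input only; these values are not YMIR's actual defaults.
sample = "LABEL_TASK_LOOP_SECONDS=60\nSERVER_RUNTIME=nvidia\n"
print(set_env_value(sample, "SERVER_RUNTIME", "runc"), end="")
```

Replacing the line in place, rather than appending a second `SERVER_RUNTIME=` entry, avoids depending on which duplicate a given `.env` loader honors.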
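```diff
@@ -200,27 +200,31 @@ LABEL_TASK_LOOP_SECONDS=60

 Train the model again on the updated dataset to improve it. Compared with annotating all of the data before training, the approach provided by the YMIR platform is more efficient and reduces the cost of annotating low-quality data. The cycle of mining, annotation, and training keeps expanding the pool of high-quality data and improves the model.

-## 3.1. Raw data preparation
+## 3.1. Label management

-Prepare datasets containing the training targets (a training set and a test set) to train the initial model. For a dataset that already has annotation files, make sure before importing it that its format meets the following requirements:
+When the dataset to be imported has annotation files, make sure that the annotation types belong to the system's existing label list; otherwise, go to the label management page and add custom labels so that the data can be imported, as shown below:
+
+![Label management](docs/images/%E6%96%B0%E5%A2%9E%E6%A0%87%E7%AD%BE.jpg)
+
+## 3.2. Raw data preparation
+
+Prepare datasets containing the training targets (a training set and a test set) to train the initial model. For a dataset that already has annotation files, its format must meet the following requirements before import:

 * The dataset must be a .zip file containing two folders, named images and annotations;
 * The images folder holds the images; supported formats are jpg, jpeg, and png;
 * The annotations folder holds the annotations in Pascal VOC format (if there are no annotation files, this folder is empty);
-* When the dataset has annotation files, the annotation types must belong to the platform's built-in label list; see the [label list](https://github.com/IndustryEssentials/ymir-proto/blob/master/ymir/ids/type_id_names.csv);
-* To add custom labels so that the data can be imported, see [How to modify the label category file](#71-常见问题).

 Dataset import supports four methods: public dataset import, network import, local import, and path import, as shown below:

 ![Data import guide](docs/images/%E6%95%B0%E6%8D%AE%E5%AF%BC%E5%85%A5%E5%BC%95%E5%AF%BC.jpeg)

-(1) Public dataset copy: import a built-in dataset of the public user. The dataset is stored under the public user and is copied to the current user. Users can filter datasets by label, as shown below:
+(1) Public dataset copy: import a built-in dataset of the public user. The dataset is stored under the public user and is copied to the current user, as shown below:

 ![Public dataset import 1](docs/images/%E5%85%AC%E5%85%B1%E6%95%B0%E6%8D%AE%E9%9B%86%E5%AF%BC%E5%85%A51.jpeg)

 ![Public dataset import 2](docs/images/%E5%85%AC%E5%85%B1%E6%95%B0%E6%8D%AE%E9%9B%86%E5%AF%BC%E5%85%A52.jpeg)

-Select a dataset that meets your criteria and click [OK] to start copying.
+Select a dataset, choose whether to also import the labels contained in the public dataset, and click [OK] to start copying.

 (2) Network import: enter the URL of the dataset, as shown below:

```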
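The import requirements listed above can be checked locally before uploading. Below is a minimal sketch using Python's standard `zipfile` module; `check_dataset_zip` is a hypothetical name, and treating Pascal VOC annotations as `.xml` files is an assumption based on the VOC format rather than on YMIR's own importer.

```python
import zipfile
from pathlib import PurePosixPath
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
REQUIRED_FOLDERS = ("images", "annotations")


def check_dataset_zip(zip_file) -> List[str]:
    """Return the problems found in a dataset .zip; an empty list means it looks valid.

    `zip_file` may be a path or a binary file object. Hypothetical pre-import
    check based on the listed rules, not YMIR's own validator.
    """
    with zipfile.ZipFile(zip_file) as archive:
        entries = archive.namelist()

    # A top-level folder appears either as an explicit "name/" entry or
    # implicitly as the first component of a nested file path.
    top_level = set()
    for name in entries:
        parts = PurePosixPath(name).parts
        if parts and (len(parts) > 1 or name.endswith("/")):
            top_level.add(parts[0])

    problems = [f"missing folder: {folder}/" for folder in REQUIRED_FOLDERS if folder not in top_level]
    for name in entries:
        if name.endswith("/"):
            continue  # directory entries carry no data
        path = PurePosixPath(name)
        folder, suffix = path.parts[0], path.suffix.lower()
        if folder == "images" and suffix not in IMAGE_SUFFIXES:
            problems.append(f"unsupported image format: {name}")
        elif folder == "annotations" and suffix != ".xml":
            # Assumption: Pascal VOC annotations are stored as .xml files.
            problems.append(f"not a Pascal VOC .xml file: {name}")
    return problems


# Usage (file name is illustrative):
# for problem in check_dataset_zip("my_dataset.zip"):
#     print(problem)
```

An empty `annotations/` folder only survives zipping as an explicit directory entry, which is why top-level folders are collected from both directory entries and nested file paths.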
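```diff
@@ -240,7 +244,7 @@ LABEL_TASK_LOOP_SECONDS=60

 ![VOC training and test sets](docs/images/voc%E8%AE%AD%E7%BB%83%E9%9B%86%E6%B5%8B%E8%AF%95%E9%9B%86.jpeg)

-## 3.2. Data annotation
+## 3.3. Data annotation

 If the imported training or test set has no labels, it needs to be annotated. Click the [Create annotation task] button on the task management page to open the page for creating a data annotation task, as shown below:

```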
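```diff
@@ -252,7 +256,7 @@ LABEL_TASK_LOOP_SECONDS=60

 Once the task is created, you are taken to the task management page, where you can follow its progress and details. When the task finishes, the system automatically fetches the annotation results and generates a dataset with the new annotations.

-## 3.3. Model training
+## 3.4. Model training

 Click the [Create training task] button on the task management page to open the page for creating a model training task, as shown below:

```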
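```diff
@@ -266,7 +270,7 @@ LABEL_TASK_LOOP_SECONDS=60

 ## Model iteration (improving model accuracy through iteration)

-## 3.4. Data mining
+## 3.5. Data mining

 In the early stage of model training, it is hard to find a large amount of high-quality data at once, so the initial model is often not accurate enough. Finding data that actually helps training has therefore long been a major problem in AI algorithm development, and it often consumes a great deal of algorithm engineers' time. To address this, YMIR provides mature mining algorithms that support mining across millions of images, quickly locating the data most useful for improving the model, which lowers annotation costs, shortens iteration time, and sustains continuous model iteration.

```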
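```diff
@@ -280,7 +284,7 @@ LABEL_TASK_LOOP_SECONDS=60

 Once the task is created, you are taken to the task management page, where you can follow its progress and details. When the task finishes, you can view the mined result dataset.

-## 3.5. Merged training
+## 3.6. Merged training

 ![Workflow (Chinese)](docs/images/%E6%B5%81%E7%A8%8B-%E4%B8%AD%E6%96%87.jpeg)

```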
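```diff
@@ -292,7 +296,7 @@ LABEL_TASK_LOOP_SECONDS=60

 ![Merge 2](docs/images/%E5%90%88%E5%B9%B62-1.jpeg)

-## 3.6. Model validation
+## 3.7. Model validation

 After each training run, you can validate the model, i.e. visually check how it performs on real images. On the [Model management] page, click the [Validate] button of the model to open the [Model validation] page, as shown below:

```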
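```diff
@@ -306,7 +310,7 @@ LABEL_TASK_LOOP_SECONDS=60

 You can download a model that meets your expectations, or keep using it for mining and start the next round of data mining, data annotation, and model training to improve it further.

-## 3.7. Model download
+## 3.8. Model download

 On the [Model list] page, click the [Download] button. The downloaded file is a tar package containing the model's network structure, network weights, hyperparameter configuration file, training environment parameters, and results, as shown below:

```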
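The downloaded model is a tar package, so its contents can be inspected without extracting it. A small sketch using Python's standard `tarfile` module; `summarize_model_tar` is hypothetical, and the member names used when testing it (`<prefix>-symbol.json`, `<prefix>-0000.params`, the usual mxnet checkpoint naming) are an assumption, not YMIR's documented package layout.

```python
import tarfile
from pathlib import PurePosixPath
from typing import Dict, List


def summarize_model_tar(tar_path: str) -> Dict[str, List[str]]:
    """Group the regular files in a model tar package by extension (hypothetical helper).

    Only member metadata is read; nothing is extracted, which also sidesteps
    path-traversal concerns with untrusted archives.
    """
    groups: Dict[str, List[str]] = {}
    # Mode "r" reads plain, gzip, bz2 or xz tar files transparently.
    with tarfile.open(tar_path, "r") as archive:
        for member in archive.getmembers():
            if member.isfile():
                suffix = PurePosixPath(member.name).suffix.lower()
                groups.setdefault(suffix, []).append(member.name)
    return groups


# Usage (path is illustrative):
# for suffix, names in summarize_model_tar("downloaded_model.tar").items():
#     print(suffix or "<no extension>", names)
```

Listing members first makes it easy to confirm that the weights, network structure, and configuration files are all present before unpacking with `tar -xf`.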
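````diff
@@ -406,6 +410,32 @@ $ mir init # initialize this directory as a mir repo
 $ mkdir ~/ymir-assets ~/ymir-models # create the asset and model storage directories; all image assets are stored here, while the mir repo keeps only references to them
 ```

+Labels in a mir repo are managed centrally through a label file. Open the label file `~/mir-demo-repo/labels.csv` and you will see the following content:

+```
+# type_id, preserved, main type name, alias...
+```

+In this file, each line represents one class label: the label id (starting from 0 and increasing), an empty field, the main label name, and optionally one or more aliases. For example, if the dataset to be imported contains the labels person, cat, and tv, you can edit this file as follows:

+```
+0,,person
+1,,cat
+2,,tv
+```

+A class label can have one or more aliases. For example, to make television an alias of tv, change the `labels.csv` file to:

+```
+0,,person
+1,,cat
+2,,tv,television
+```

+You can edit this file with vi or another editor. You can add aliases to classes or add new classes, but changing or deleting the main name or id of an existing class is not recommended.

+The `labels.csv` file can be shared among multiple mir repos via symbolic links.

 Users need to prepare three datasets in advance:

 1. Training set dataset-training, with annotations, used to train the initial model;
````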
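The `labels.csv` format described above (id, empty preserved field, main name, optional aliases) maps naturally onto two lookup tables: id to names, and name or alias to id. The sketch below uses Python's `csv` module and is an illustration, not mir's own loader; `load_labels` is a hypothetical name, and rejecting a name that appears under two different ids is a choice made here, not a documented mir rule.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def load_labels(lines: Iterable[str]) -> Tuple[Dict[int, List[str]], Dict[str, int]]:
    """Parse labels.csv-style lines (illustrative sketch, not mir's own loader).

    Each data line is: type_id, <empty preserved field>, main name[, alias, ...].
    Returns (id -> [main name, *aliases], name or alias -> id).
    """
    id_to_names: Dict[int, List[str]] = {}
    name_to_id: Dict[str, int] = {}
    for row in csv.reader(lines):
        if not row or row[0].lstrip().startswith("#"):
            continue  # skip blank lines and the "# type_id, ..." header comment
        type_id = int(row[0])
        names = [field.strip() for field in row[2:] if field.strip()]
        if not names:
            raise ValueError(f"label id {type_id} has no main name")
        id_to_names[type_id] = names
        for name in names:
            # Assumption for this sketch: a name or alias may belong to only one id.
            if name_to_id.get(name, type_id) != type_id:
                raise ValueError(f"{name!r} is used by ids {name_to_id[name]} and {type_id}")
            name_to_id[name] = type_id
    return id_to_names, name_to_id


labels_csv = """# type_id, preserved, main type name, alias...
0,,person
1,,cat
2,,tv,television
"""
id_to_names, name_to_id = load_labels(labels_csv.splitlines())
print(name_to_id["television"])  # 2
print(id_to_names[2])            # ['tv', 'television']
```

Because aliases resolve to the same id as the main name, looking up either `tv` or `television` yields the same class.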
````diff
@@ -741,44 +771,15 @@ Any code in the YMIR repo should follow the coding standards and will be ... in CI tests

 After training completes successfully, the system outputs the model id. Use this id to find the corresponding file under the `--model-location` location. The file is actually a tar archive that can be extracted directly with the tar command, yielding mxnet model files in params and json format.

-* How do I view and modify the class name label file?

-The label file is ymir/ids/type_id_names.csv under the installation directory of the ymir-proto package. Display its contents with the following command:

-```
-cat `pip show ymir-proto | grep 'Location: ' | cut -d ' ' -f2`/ymir/ids/type_id_names.csv
-```

-Its format is as follows:

-```
-0,frisbee
-1,car
-2,person
-3,surfboard
-4,cat
-5,bed
-6,clock
-7,pizza,pizza pie
-8,skateboard
-9,dining table,diningtable,board
-```

-Each line of the file is comma-separated: the first item is the class id, followed by the class names.

-Note the line with id 7, which has three items: 7, pizza, pizza pie. When a line has more than two items, the first item is the class id, the second is the main class name, and the rest are class aliases.

-In the ymir-cmd filter command, the values passed to -c and -C are mapped to class ids through this file. Users can enter either the main class name or an alias; both have the same effect.

-You can edit this file with vim or a similar editor. You can add aliases to classes or add new classes, but changing the main name or id of an existing class is not recommended.

 ## 7.2 License

 The YMIR open-source project is licensed under Apache 2.0. See the [LICENSE](https://github.com/IndustryEssentials/ymir/blob/master/LICENSE) file for details.

 ## 7.3 Contact us

-If you have other questions, please contact us:[email protected]
+If you have other questions, please contact us: [email protected]

 Or join our [Slack community](https://join.slack.com/t/ymir-users/shared_invite/zt-ywephyib-ccghwp8vrd58d3u6zwtG3Q), and we will answer your questions in real time.

 <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
 [![All Contributors](https://img.shields.io/badge/All%20Contributors-8-brightgreen)](#contributors-)
````
59 changes: 59 additions & 0 deletions
...nd/src/ymir-app/alembic/versions/5645ff0023eb_update_task_fractional_seconds_for_last_.py
@@ -0,0 +1,59 @@

```python
"""update task: fractional seconds for last_message_datetime

Revision ID: 5645ff0023eb
Revises: f30d60e75b8a
Create Date: 2022-02-10 20:06:04.527925

"""
from alembic import context, op
import sqlalchemy as sa
from sqlalchemy.dialects.mysql import DATETIME


# revision identifiers, used by Alembic.
revision = '5645ff0023eb'
down_revision = 'f30d60e75b8a'
branch_labels = None
depends_on = None


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    if context.get_x_argument(as_dictionary=True).get("sqlite", None):
        with op.batch_alter_table("task") as batch_op:
            batch_op.alter_column(
                "last_message_datetime",
                existing_type=sa.DateTime(),
                type_=DATETIME(fsp=6),
                existing_nullable=True,
            )
    else:
        op.alter_column(
            "task",
            "last_message_datetime",
            existing_type=sa.DateTime(),
            type_=DATETIME(fsp=6),
            existing_nullable=True,
        )
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    if context.get_x_argument(as_dictionary=True).get("sqlite", None):
        with op.batch_alter_table("task") as batch_op:
            batch_op.alter_column(
                "last_message_datetime",
                existing_type=DATETIME(fsp=6),
                type_=sa.DateTime(),
                existing_nullable=True,
            )
    else:
        op.alter_column(
            "task",
            "last_message_datetime",
            existing_type=DATETIME(fsp=6),
            type_=sa.DateTime(),
            existing_nullable=True,
        )
    # ### end Alembic commands ###
```
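This migration changes `task.last_message_datetime` from a generic `DateTime` to MySQL `DATETIME(fsp=6)`, which stores microseconds. A stdlib-only sketch of why sub-second precision can matter: two messages that arrive within the same second collapse to the same stored value at whole-second precision. The `is_newer` check is a hypothetical stand-in for any "only accept newer messages" logic; this page does not show how YMIR actually uses the column.

```python
from datetime import datetime, timedelta


def to_whole_seconds(ts: datetime) -> datetime:
    """Drop the fractional part, as a DATETIME column without fsp would store it.

    MySQL rounds by default instead of truncating; the sample values below are
    chosen so that rounding and truncation give the same result.
    """
    return ts.replace(microsecond=0)


def is_newer(incoming: datetime, stored: datetime) -> bool:
    """Hypothetical check: accept a message only if it is strictly newer."""
    return incoming > stored


first = datetime(2022, 2, 10, 20, 6, 4, 100_000)   # 20:06:04.100
second = first + timedelta(milliseconds=300)        # 20:06:04.400, same second

print(is_newer(second, first))                                      # True  (microsecond precision)
print(is_newer(to_whole_seconds(second), to_whole_seconds(first)))  # False (whole seconds)
```

With `DATETIME(6)` the two timestamps stay distinct, so ordering comparisons between messages from the same second remain meaningful.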
34 changes: 34 additions & 0 deletions
.../backend/src/ymir-app/alembic/versions/f30d60e75b8a_update_dataset_and_model_add_state.py
@@ -0,0 +1,34 @@

```python
"""update dataset and model: add state

Revision ID: f30d60e75b8a
Revises: faf55734ec0d
Create Date: 2022-02-10 10:57:28.067020

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = 'f30d60e75b8a'
down_revision = 'faf55734ec0d'
branch_labels = None
depends_on = None


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('dataset', sa.Column('state', sa.Integer(), nullable=True))
    op.create_index(op.f('ix_dataset_state'), 'dataset', ['state'], unique=False)
    op.add_column('model', sa.Column('state', sa.Integer(), nullable=True))
    op.create_index(op.f('ix_model_state'), 'model', ['state'], unique=False)
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_index(op.f('ix_model_state'), table_name='model')
    op.drop_column('model', 'state')
    op.drop_index(op.f('ix_dataset_state'), table_name='dataset')
    op.drop_column('dataset', 'state')
    # ### end Alembic commands ###
```
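For reference, `upgrade()` here amounts to adding a nullable integer `state` column and a non-unique index to each of the two tables. The following sketch replays roughly equivalent DDL with Python's built-in `sqlite3` module against minimal stand-in tables. The real `dataset` and `model` tables have more columns, and since revision 5645ff0023eb targets MySQL types, the production database is presumably MySQL. The index names match what `op.f('ix_dataset_state')` and `op.f('ix_model_state')` pass through unchanged; `apply_state_columns` is a hypothetical name.

```python
import sqlite3


def apply_state_columns(conn: sqlite3.Connection) -> None:
    """Roughly the DDL that upgrade() of f30d60e75b8a emits (sketch, not Alembic output)."""
    for table in ("dataset", "model"):
        conn.execute(f"ALTER TABLE {table} ADD COLUMN state INTEGER")  # nullable by default
        conn.execute(f"CREATE INDEX ix_{table}_state ON {table} (state)")  # unique=False


conn = sqlite3.connect(":memory:")
# Stand-in tables with a single column; only the new column and index matter here.
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY)")
apply_state_columns(conn)

for table in ("dataset", "model"):
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    indexes = [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]
    print(table, columns, indexes)
```

Unlike 5645ff0023eb, this migration has no SQLite batch branch. SQLite only supports `ALTER TABLE ... DROP COLUMN` from version 3.35.0, so `downgrade()` would likely need batch mode on older SQLite versions.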