Inference 2018 5

Jump to bottom Edit New page

Tao Luo edited this page Dec 9, 2019 · 1 revision

6-13 Need discussion this week

MKL dynamic link @tangjian下周会开始看
parallelDo 的性能差于v2，需要磨平吗
需要push NLP上线了吗

6-13 周报

@tangjian

性能

现在v2的性能可以跟以前磨平了，但是fluid没有跟之前磨平。

NLP

anakin开始复现我们的数据，线上用的v3,v4,5117等cpu，我们测的是V2的cpu。
[WIP] 序列标注开源任务中，遇到的是预测速度比较慢的问题，正在看，接口@焦振宇
[Merged] add initial memory flag in MB for infer
[Merged] Infer multi-threads API Demo and UT
[WIP] fix unknown use_mkldnn flag
[WIP] scope thread safe

MKLDNN

5-31 Need discussion this week

ResNet50在6148上的训练和预测性能，与Intel Team对齐（QA）@chengsi 先拿一个短期的机器
inference示例加入repo并能在ci上验证 https://github.com/PaddlePaddle/Paddle/issues/10990#issuecomment-393034634
将contrib/inference_api打包并部署
6148 机器
QA 人力不够
Train CPU with multi-thread @luotao 参考parallel do
~~TODO Inference with multi-thread @tangjian~~ done

5-31 会议纪要

6148机器 ready
MKLDNN 7.5 的 milestone 可以设成与 V2 对齐
CPU 训练多线程尝试下 ParallelDo
Inference 7.5 目标
- 高层 API 及文档
  - 原生实现
  - Anakin 集成
  - 子图初步集成 TensorRT
- CPU 两个模型达到上线标准
- 子图初步完成
  - 目标为 MLP ，争取 resnet
- 图像 4个 inference demo

5-31 周报

Inference engine && op 整体优化

整体进展

高层API 60%
- DONE 确定高层接口
- DONE 原生实现
- DOING 阿拉金
子图 30%
- DOING framework
- DOING TRT 支持
MKLDNN 30%
CPU核心模型优化 70%
- DOING OCR CPU
- DOING 情感分类 CPU
文档 && CI 40%
- DONE 旧接口文档初步
- TODO 新接口文档

本周进展

MKLDNN

[Merged] Blas optimized elementwise_add forward and backward passes (10% speedup on elemtwise_add op of OCR CRNN_CTC model): https://github.com/PaddlePaddle/Paddle/pull/10913 @intel-team
[Merged] Top K algorithm parallel version: https://github.com/PaddlePaddle/Paddle/pull/10941 @intel-team
[Doing] ResNet50 benchmark on fluid (MKLML version first) @intel-team
[Review] Withdraw MKLDNN Mul operator https://github.com/PaddlePaddle/Paddle/pull/10703 @intel-team
[Merged] speedup vInvSqrt vLogqp vTanh with mklml in V2：https://github.com/PaddlePaddle/Paddle/pull/10934 @tangjian
[Merged] fix inference_lib_dist deps：https://github.com/PaddlePaddle/Paddle/pull/10988 @tangjian

高层API

@panxin

merged, paddle inference interface implementation
merged, add tests and polish infer impl
图像，PS正在接入

@chunwei

doing, 新版高层接口接入阿拉金（阿拉金外网无 library，暂时手动拷贝验证）
doing, 新版高层接口fix + 文档
open, fc converter
merged, tensorrt engine op
merged, mul converter

CPU核心模型优化

OCR: 使用mklml动态库会导致其他任务coredump @luotao @visualization-team
- [已验证生效] 因为在使用MKLDNN和MKLML的时候会需要iomp.so，所以线上也必须要链接iomp。但是线上有很多服务已经使用了gomp，我们需要推荐用户直接使用iomp，链接方式是：target_link_libraries(${TARGET_NAME} "-L${MKLML_LIB_DIR} -liomp5 -Wl,--as-needed") @tangjian @yangjingyuan-OCR
- [正在进行中] 依然存在mklml动态库与其他任务的mkl静态库冲突的问题。尝试使用MKL大包中的-lmkl_core -lmkl_intel_lp64 -lmkl_sequential进行编译，尝试v1版本的情况。@luotao @yangjingyuan-OCR @lixuan-OCR
- [正在沟通中] 向Intel MKL组的 [email protected] 寻求单线程和多线程版的静态MKLML库
情感分类@tensor
- 正在出性能报告

文档 && CI版本

TeamCity上打印Git Commid Id @yanxu @luotao https://github.com/PaddlePaddle/Paddle/pull/10991

下周计划

高层API 正式发布（接口稳定，文档基本ready) @chunwei @panxin
子图人工配置整体跑通 @chunwei
CPU核心模型出性能报告以及解决线上bug（联合 Intel团队）@luotao @tensor

release-note

下周提供高层API的release note @chunwei

5-23 待讨论

how to sync the status, using Omniplan one by one?
MKLDNN bi-weekly time
when @guochaorong or someone others could take over the CI deployment?
Is the name of make inference_lib_dist suitable? since train in C++ end also use this commend: Add cpp trainer lib and demo
how about the application status of 6148 CPU?
issue:
- what save_inference_model should do: https://github.com/PaddlePaddle/Paddle/issues/10803
- op clip in save_inference_model https://github.com/PaddlePaddle/Paddle/issues/10811
- global mkldnn flag: https://github.com/PaddlePaddle/Paddle/issues/10765

5-23 周报

MKLDNN 进展

确认和波兰团队的双周会议，暂定下周三下午一点为第一次会议。
MKLDNN的时间规划，波兰团队邮件中说这周末给出（他们经理Marcin在外面参加会议）。
- we are merging internally the latest code, supporting MKL-DNN data layouts, elementwise_add and SUM to check, how are we doing with Resnet training/inference and to identify next bottlenecks.
- CRNN-CTC model shall also benefit from those upgrades. From OPs point of view, we will look at top_k layer and optimize algorithm by parallelizing it (as it is a sorting problem, MKL-DNN isn’t a good place to implement it)
Merged add mkldnn to paddle lib @qiaolongfei
Merged Reuse of pooling mkldnn primitives @intel team
Merged Update activations for MKL-DNN @intel team
Merged enable MKLDNN inference test @tangjian

CPU核心模型优化

子图Engine进展

framework
- Merged inference/analysis/data flow graph @yanchunwei
- PR refine/data flow graph @yanchunwei
subgraph test
- PR feature/mul converter @yanchunwei

高层接口封装

Merged Move contrib to paddle/ @yanchunwei
PR feature/inference api demo impl @yanchunwei

版本发布 & 文档

CI 编译

Merged add version and cmakecache in inference_lib @luotao
Merged change CMAKE_INSTALL_PREFIX in inference_lib_dist to FLUID_INSTALL_DIR @luotao
auto build and deploy fluid.tgz on TeamCity(cuda8.0_cudnn5_avx_mkl) @luotao @yanxu

存在问题：

包有多余路径：paddle/build/fluid_install_dir
GIT COMMIT ID在CI中没有打出来
其他版本的部署

文档

Merged Add Inference doc for fluid @weixing 官网展示

Bug fix

5-23 会议纪要

MKLDNN 如何与Intel团队沟通产出/承诺有疑惑，需要@wangyi确定
save_inference_model 需要接受targets 留空，默认输出所有逻辑（除了backward)

5-16 会议纪要

计划讨论

Refactor inference documentation and deploy on the official website
tracking the status of different branches
tracking the status of deployment seven models
the plan of MKLDNN, more clear milestones
refine the GitHub/projects
determine the date of the internal weekly meeting
have a GitHub wiki for tracking the common questions and bugs
Release the Inference Lib with a document containing version details such as the following information so that users can reproduce it
- commit id
- the compilation commands and flags
- a performance report (QA provides)
Higher API to hide the underlying details (including concepts, third-party symbols)

文档方面

建立文档机制，初期可以让 shanyi 帮把现有文档迁移到 paddle/doc，同时开始跑通部署到 official site 的流程

周会机制

每周三晚上 6:30~7:30，Inference内部讨论，增加远程协同的信息带宽

周会目的是，回顾过去一周的工作，讨论共享信息，以及确定下周大概的工作目标和方向

其中，会有如下流程

将需要讨论的内容汇总到 Need Discussion
讨论完，将会议纪要总结到 Weekly Status

面向 7 个上线模型的优化

明确上线的目标性能
确定模型优先级
每个模型，有沟通好完整的测试性能的方法，对应的测试数据等，自己可以复现业务线的指标
分析瓶颈，逐步优化

Inference 框架

优化框架，Kernel 的重复创建可以cache下，这点在 cudnn和 mkldnn上比较明显，具体收益不太确定
高层API，与图像沟通，他们会按照我们现在的高层API，先将他们的接口实现贡献过来，上层用很薄一层封装；未来我们的版本发布需要对性能问题负责

CPU 方向

MKLDNN需要用单线程测试结果
MKLML 对比 Openblas 有明显性能增益，可以考虑先替代 openblas 拿到 CPU的第一步收益
对于MKLDNN还未支持，但又是特定模型瓶颈的 OP，可以自己使用类似MKLDNN的方式优化（这点可能sys也是这么做的）

GPU 方向

尽早完成子图测试验证