2018 03 28
Tao Luo edited this page Dec 9, 2019 · 1 revision
- Fluid and MPI discussion: https://github.com/PaddlePaddle/Paddle/issues/9405#issuecomment-376769705
- Data augmentation discussion: https://github.com/PaddlePaddle/Paddle/issues/9413
- Complete Fluid
- ParallelExecutor
- Prototype complete
- https://github.com/PaddlePaddle/Paddle/pull/9080
Number of GPUs | 1 | 2 | 3 | 4 |
---|---|---|---|---|
Image/Sec | 18.639 | 27.8863 | 39.3787 | 52.9688 |
Speed Up | N/A | 1.4961264 | 2.11270454 | 2.84182628 |
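The Speed Up row above is each multi-GPU throughput divided by the single-GPU throughput. A minimal sketch that recomputes it from the table; the per-GPU efficiency (speed-up divided by GPU count) is an added illustration and not a figure from the original notes:

```python
# Throughput (images/sec) per GPU count, copied from the table above.
throughput = {1: 18.639, 2: 27.8863, 3: 39.3787, 4: 52.9688}

baseline = throughput[1]

def speed_up(num_gpus):
    """Throughput relative to the single-GPU run."""
    return throughput[num_gpus] / baseline

def efficiency(num_gpus):
    """Speed-up per GPU; 1.0 would be perfect linear scaling (added metric)."""
    return speed_up(num_gpus) / num_gpus

for n in sorted(throughput):
    print(f"{n} GPU(s): speed-up {speed_up(n):.4f}, efficiency {efficiency(n):.2%}")
```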
- Helping debug C++ readers.
- fluid dist train perf enhancements and bug fixes:
- https://github.com/PaddlePaddle/Paddle/issues/9454
- https://github.com/PaddlePaddle/Paddle/pull/9448
- https://github.com/PaddlePaddle/Paddle/pull/9425
- https://github.com/PaddlePaddle/Paddle/pull/9409
- https://github.com/PaddlePaddle/Paddle/pull/9377
- https://github.com/PaddlePaddle/Paddle/pull/9336
- MPI design discussions with @tangwei
- Reviews:
- inference:
- [Merge] add MKL for fluid static and shared library: https://github.com/PaddlePaddle/Paddle/pull/8887
- [Merge] remove vars when remove ops when merge batch norm: https://github.com/PaddlePaddle/Paddle/pull/9384
- survey TensorRT integration with @liuyiqun @yanchunwei
- change WITH_FLUID to WITH_FLUID_ONLY: https://github.com/PaddlePaddle/Paddle/pull/9427
- code review:
- MKLDNN:
- [Merge] Implementation of MKLDNN LRN: https://github.com/PaddlePaddle/Paddle/pull/9123, https://github.com/PaddlePaddle/Paddle/pull/9329
- [Merge] MKLDNN Relu Tanh Sqrt Abs activations added: https://github.com/PaddlePaddle/Paddle/pull/9081
- Implementation of MKLDNN FC: https://github.com/PaddlePaddle/Paddle/pull/9385
- Speed/sequence op1: https://github.com/PaddlePaddle/Paddle/pull/9217
- fix submit_local's paddle pip name issue: https://github.com/PaddlePaddle/Paddle/pull/9392
- doc: Build Sphinx tree for fluid directory: https://github.com/PaddlePaddle/Paddle/pull/9403
- MKLDNN:
- PR
- Parallel send gradients and backward ops, https://github.com/PaddlePaddle/Paddle/pull/9382
- Fix dist error with lr decay, https://github.com/PaddlePaddle/Paddle/pull/9489/files
- Fix dist compile error, https://github.com/PaddlePaddle/Paddle/pull/9320
- Review
- Discuss large model development plan with @longfei
- All Fluid API doc problems
- float16 support
- Validated the correctness of float16 mode on cifar10
- Enabling tensor core for float16 cudnn conv op: https://github.com/PaddlePaddle/Paddle/pull/9488
- Benchmarking float16 vs float32 on V100: https://github.com/kexinzhao/Paddle_benchmark/blob/master/float16_benchmark.md
- PR
- make dependency fix: https://github.com/PaddlePaddle/Paddle/pull/9342
- grpc make function fix: https://github.com/PaddlePaddle/Paddle/pull/9415
- docker paddle cmd fix: https://github.com/PaddlePaddle/Paddle/pull/9392
- Issues
- Benchmark dist Model
- [RDMA/GPUDirect] https://github.com/PaddlePaddle/Paddle/issues/9405
- [design doc] https://github.com/PaddlePaddle/Paddle/pull/9490
- Benchmark image_classification
Doc:
- Build Sphinx tree for fluid directory
- Add content for manually building the documentation (Chinese version)
PR
- Parallel Executor Baseline: https://github.com/PaddlePaddle/Paddle/pull/9035
- In place:
Issue:
- https://github.com/PaddlePaddle/Paddle/issues/9464
- https://github.com/PaddlePaddle/Paddle/issues/9416
- Average 'moving mean' and 'moving variance' of batch_norm op
- Refine OCR CTC model.
- Write document for OCR CTC model [WIP]
- Review:
- Make the first device share data with the global scope in parallel_do_op.
- Set stop_gradient=True for some variables in SSD API.
- Fix parallel training for MobileNet-SSD.
- [WIP] Profile and optimize MobileNet-SSD.
- Update and merge decoder for DeepASR model
- Solve the problem of fetching prediction and make data dim configurable
- [WIP] The draft of ONNX design doc
Code Review:
- https://github.com/PaddlePaddle/models/pull/788
- https://github.com/PaddlePaddle/models/pull/776
- https://github.com/PaddlePaddle/models/pull/769
- https://github.com/PaddlePaddle/Paddle/pull/9345
- Inference Framework
- Verify the correctness of resnet50
- Analyze the profiling data of Fluid and TensorRT
- Start the work of integrating TensorRT
- Mobile
- Support the MDL group
- NMT:
- Decouple the program desc with batch_size in Transformer.
- Refine the ReshapeOp enhancement.
- Transformer on NIST dataset related.
- PR
- Add CUDAPinnedPlace
- Add SE-ResNeXt-152_parallel_exe
- Add cos and sin
- Fix concat_op [merged]
- Add pinned memory [merged]
- Review
- Cpp parallel executor
- Fix the order of reads and write from buffered channel
- Fluid channels should match the semantics of Go Channels
- Improve layer_norm speed
- Fluid support for Abacus (discussed with @wuyi @yanxu @helin @wangyi @lidong)
- Fluid implementation: TODO: https://github.com/PaddlePaddle/Paddle/issues/9211
- support empty tensor https://github.com/PaddlePaddle/Paddle/pull/9338
- add split ids op https://github.com/PaddlePaddle/Paddle/pull/9370
- fix compile send_op on mac https://github.com/PaddlePaddle/Paddle/pull/9360
- WIP prefetch_op
- Others:
- Fix data transform when inplace https://github.com/PaddlePaddle/Paddle/pull/9450
- Paddle/python/paddle/fluid/tests/book/test_label_semantic_roles.py reports an error under CUDA after a regularization term is added to the CRF layer https://github.com/PaddlePaddle/Paddle/issues/9234
- Using the CRF layer on multi-threaded GPU fails if batch_size is not 1 https://github.com/PaddlePaddle/Paddle/issues/9261
- change boost download url to speed up download https://github.com/PaddlePaddle/Paddle/pull/9331
- Profiling of C++ Reader (instances/sec):
Net Config | Simple Demo Net | VGG16 |
---|---|---|
V2 Reader | 819.11 | 57.49 |
V2 Reader with cache | - | 58.9 |
C++ Reader | 1629.88 | 61.44 |
C++ Reader with DoubleBuffer | 2382.13 | DOING |
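The relative gains in the table above can be read off by dividing each reader's throughput by the V2 Reader figure for the same network. A small sketch using only the numbers reported above, treating the unmeasured cells (`-`, `DOING`) as missing; the function name `relative_to_baseline` is illustrative:

```python
# Instances/sec from the table above; None marks cells with no measurement yet.
results = {
    "V2 Reader": {"Simple Demo Net": 819.11, "VGG16": 57.49},
    "V2 Reader with cache": {"Simple Demo Net": None, "VGG16": 58.9},
    "C++ Reader": {"Simple Demo Net": 1629.88, "VGG16": 61.44},
    "C++ Reader with DoubleBuffer": {"Simple Demo Net": 2382.13, "VGG16": None},
}

def relative_to_baseline(results, baseline="V2 Reader"):
    """Return {reader: {net: ratio}} against the baseline, skipping missing cells."""
    ratios = {}
    for reader, per_net in results.items():
        ratios[reader] = {
            net: value / results[baseline][net]
            for net, value in per_net.items()
            if value is not None
        }
    return ratios

ratios = relative_to_baseline(results)
for reader, per_net in ratios.items():
    for net, ratio in per_net.items():
        print(f"{reader} on {net}: {ratio:.2f}x the V2 Reader")
```

On these figures the C++ Reader is roughly 2x the V2 Reader on the Simple Demo Net and about 1.07x on VGG16.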
- Kernels for increment_op:
- Reviews:
- [support empty tensor] https://github.com/PaddlePaddle/Paddle/pull/9338
- [SSD API Update] https://github.com/PaddlePaddle/Paddle/pull/9396
- [activation in place by default] https://github.com/PaddlePaddle/Paddle/pull/9417
- [Channel bug fix] https://github.com/PaddlePaddle/Paddle/pull/9423
- grpc throughput test:
- Add drop_out_op unit test
- SendOp can't capture sendop time:
- Improve LayerNorm speed by 3x-4x; Transformer speeds up by 15%~20%
- Follow up on P40 machines and configuration
- Have enough machines to develop and evaluate performance
- Have same configuration as Paddle Cloud machines
- Have 1 machine for continuous model evaluation.
- Follow up on 5.1 Paddle Cloud goals
- Review ParallelExecutor and ParallelGPUExecutor and profile speed
- Bug fix for AsyncReader: https://github.com/PaddlePaddle/models/pull/776, https://github.com/PaddlePaddle/models/pull/769
- Debug NaN problem for Transformer: https://github.com/PaddlePaddle/models/issues/786
- Finishing CSP Project
- Fluid channels should match the semantics of Go Channels https://github.com/PaddlePaddle/Paddle/pull/9265
- Fix the order of reads and write from buffered channel https://github.com/PaddlePaddle/Paddle/pull/9423
- Disabling channel test to debug issue https://github.com/PaddlePaddle/Paddle/pull/9491
- Channel Unit Tests timeout on CI https://github.com/PaddlePaddle/Paddle/issues/9503
- https://github.com/PaddlePaddle/Paddle/pull/9463#pullrequestreview-107942417
- https://github.com/PaddlePaddle/Paddle/pull/9393#pullrequestreview-107132746
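For context on the channel items above: a Go buffered channel delivers values in the order they were sent (FIFO), a send blocks once the buffer is full, and a receive blocks while it is empty. The sketch below illustrates only those Go semantics, using Python's standard `queue.Queue` as a stand-in. The `BufferedChannel` class is hypothetical and is not Fluid's channel implementation or API.

```python
import queue
import threading

class BufferedChannel:
    """Hypothetical stand-in for a Go-style buffered channel (not Fluid's API)."""

    def __init__(self, capacity):
        # queue.Queue is FIFO; put() blocks while `capacity` items are queued.
        self._q = queue.Queue(maxsize=capacity)

    def send(self, value):
        self._q.put(value)  # blocks when the buffer is full

    def receive(self):
        return self._q.get()  # blocks when the buffer is empty

def producer(ch, values):
    for v in values:
        ch.send(v)

ch = BufferedChannel(capacity=2)
sent = list(range(5))
t = threading.Thread(target=producer, args=(ch, sent))
t.start()

# A single receiver observes values in exactly the order they were sent.
received = [ch.receive() for _ in sent]
t.join()
print(received)
```

With one sender and one receiver, `received` equals `sent` even though the buffer holds only two values at a time, which is the ordering guarantee the "order of reads and write from buffered channel" fix refers to in Go terms.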
- Completing Fluid
- Discuss possible fluid imperative programming paradigms https://github.com/PaddlePaddle/Paddle/issues/9466
- Documentation and Translation PR Reviews
- https://github.com/PaddlePaddle/Paddle/pull/9427#pullrequestreview-107565595
- https://github.com/PaddlePaddle/Paddle/pull/9400#pullrequestreview-107142289
- https://github.com/PaddlePaddle/Paddle/pull/9381#pullrequestreview-107021784
- https://github.com/PaddlePaddle/Paddle/pull/9359#pullrequestreview-107022638
- https://github.com/PaddlePaddle/Paddle/pull/9356#pullrequestreview-107023057
- https://github.com/PaddlePaddle/Paddle/pull/9321#pullrequestreview-106577162
- https://github.com/PaddlePaddle/Paddle/pull/9296#pullrequestreview-107511008
- modelce -> teamcity
- visualdl PRs ...
- [Speed] ~1x acceleration of the sequence expand/grad op by merging CUDA kernels.
- [Speed] ~8x acceleration of the sequence pooling op (max, average, ...) by merging CUDA kernels
- [Speed] Accelerate the sequence softmax op by merging CUDA kernels
- [Benchmark] migrate the benchmark repo into paddle main repo
- [Benchmark] add scripts for model CI
- polish init code
- fix bug in parallel do
- fix bug in dropout
- multiple GPU executor implementation and testing with YangYang: https://github.com/PaddlePaddle/Paddle/pull/9035
- Discuss possible fluid imperative programming paradigms: https://github.com/PaddlePaddle/Paddle/issues/9466
- PR and issue reviews:
- https://github.com/PaddlePaddle/Paddle/pull/9331#pullrequestreview-106222963
- https://github.com/PaddlePaddle/Paddle/pull/9352#pullrequestreview-107006822
- https://github.com/PaddlePaddle/Paddle/issues/9348#issuecomment-376259111
- https://github.com/PaddlePaddle/Paddle/pull/9331#issuecomment-376613109
- https://github.com/PaddlePaddle/Paddle/pull/9382#pullrequestreview-107407493
- https://github.com/PaddlePaddle/edl/pull/14#pullrequestreview-107482878
PR:
- Create go_op design doc (https://github.com/PaddlePaddle/Paddle/pull/9389)
- Add in is_copy attribute to SelectCase (https://github.com/PaddlePaddle/Paddle/pull/9393)
- Add channel design document (https://github.com/PaddlePaddle/Paddle/pull/9463)
Discussions:
- Initial discussions about back propagation for CSP ops
- Discuss possible fluid imperative programming paradigms: https://github.com/PaddlePaddle/Paddle/issues/9466
PR:
- Create Text storage backend component: https://github.com/PaddlePaddle/VisualDL/pull/333
- Create Text frontend UI Vue component: https://github.com/PaddlePaddle/VisualDL/pull/337
- Connect Text backend and frontend component with real data: https://github.com/PaddlePaddle/VisualDL/pull/341
- Fix Travis CI script: https://github.com/PaddlePaddle/VisualDL/pull/336
- Fix time format issue and disappearing slider issue: https://github.com/PaddlePaddle/VisualDL/pull/343
Research and Demo
- Embedding Visualization: https://github.com/PaddlePaddle/VisualDL/issues/247#issuecomment-376629893, https://github.com/PaddlePaddle/VisualDL/issues/247#issuecomment-377047373
PR:
- Create Audio preview feature API: https://github.com/PaddlePaddle/VisualDL/pull/344
- Add Audio API Unit tests https://github.com/PaddlePaddle/VisualDL/pull/345
- VisualDL:
- Switched from Cytoscape to D3+Dagre, as the latter is more robust and can build more complex nodes
- Can distinguish different node types (input, operator, output) and will add different info for different node types
- Helped ECharts to market VisualDL: http://www.iqiyi.com/w_19rwr76q69.html
- PaddlePaddle.org:
- Code Review: