---
title: "joblib: A Simple, Easy-to-Understand Parallel World"
author:
  name: Cloudac7
date: 2023-06-15
categories:
  - Tricks
---

Joblib is a lightweight set of pipelining tools for Python that makes it quick and easy to parallelize mutually independent loop iterations and thereby save time.

## Installation

```bash
pip install joblib
```
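
To confirm that the installation worked, one option is to import the package and print its version. This is only a quick sanity check, not a step the original post prescribes:

```python
# Sanity check: the import fails if joblib is not installed in this environment.
import joblib

print(joblib.__version__)  # installed version string
```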

## Usage

Take printing the square roots of 0 through 9 as an example. With a plain loop, the code is straightforward. Here `sleep` stands in for the latency of the task itself:

```python
from math import sqrt
from time import sleep
from datetime import datetime

for i in range(10):
    print(datetime.now(), sqrt(i))
    sleep(1)  # simulate one second of work per iteration
```
The output is as follows, with one value printed per second:
```
2023-06-15 11:02:01.200605 0.0
2023-06-15 11:02:02.203232 1.0
2023-06-15 11:02:03.207490 1.4142135623730951
2023-06-15 11:02:04.211349 1.7320508075688772
2023-06-15 11:02:05.211646 2.0
2023-06-15 11:02:06.215645 2.23606797749979
2023-06-15 11:02:07.218109 2.449489742783178
2023-06-15 11:02:08.220789 2.6457513110645907
2023-06-15 11:02:09.225884 2.8284271247461903
2023-06-15 11:02:10.229875 3.0
```
With `joblib.Parallel`, the loop body is moved into a function that returns its result, and the loop can be rewritten as follows:
```python
from math import sqrt
from time import sleep
from datetime import datetime
from joblib import Parallel, delayed


def func(i):
    """Former loop body: compute, log, wait, and return the result."""
    result = sqrt(i)
    print(datetime.now(), result)
    sleep(1)  # simulate one second of work per task
    return result


# Run func(0) ... func(9) on two workers at a time.
Parallel(n_jobs=2)(delayed(func)(i) for i in range(10))
```
The output is shown below. Because `n_jobs` is set to 2, two workers run at the same time, so two lines of output appear every second:
```
2023-06-15 11:07:45.843004 0.0
2023-06-15 11:07:45.873286 1.0
2023-06-15 11:07:46.864319 1.4142135623730951
2023-06-15 11:07:46.891164 1.7320508075688772
2023-06-15 11:07:47.871781 2.0
2023-06-15 11:07:47.896673 2.23606797749979
2023-06-15 11:07:48.875107 2.449489742783178
2023-06-15 11:07:48.902218 2.6457513110645907
2023-06-15 11:07:49.880168 2.8284271247461903
2023-06-15 11:07:49.907760 3.0
```
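To quantify the speed-up instead of reading it off the timestamps, both variants can be timed. The sketch below is an illustration rather than part of the original example: it passes `time.sleep` directly to `delayed` as a stand-in task and uses a 0.2-second delay, both of which are assumptions chosen to keep the run short.
```python
from time import perf_counter, sleep

from joblib import Parallel, delayed

DELAY = 0.2   # assumed per-task latency in seconds (illustrative value)
N_TASKS = 10  # same number of tasks as in the example above

# Sequential baseline: tasks run one after another.
start = perf_counter()
for _ in range(N_TASKS):
    sleep(DELAY)
t_serial = perf_counter() - start

# The same tasks spread over two workers.
start = perf_counter()
Parallel(n_jobs=2)(delayed(sleep)(DELAY) for _ in range(N_TASKS))
t_parallel = perf_counter() - start

print(f"sequential: {t_serial:.2f} s")
print(f"n_jobs=2:   {t_parallel:.2f} s")
```
The sequential loop needs at least `N_TASKS * DELAY` seconds. The parallel run should take roughly half of that plus the time joblib needs to start its workers, and that start-up cost can dominate when tasks are very short.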
The `Parallel` call returns the results as a list:
```python
[0.0,
 1.0,
 1.4142135623730951,
 1.7320508075688772,
 2.0,
 2.23606797749979,
 2.449489742783178,
 2.6457513110645907,
 2.8284271247461903,
 3.0]
```
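It may help to see what `delayed` contributes here. As far as current joblib releases behave, `delayed(func)(i)` does not call `func`; it only records the function and its arguments as a tuple, which `Parallel` later dispatches to a worker. The sketch below inspects that tuple using `math.sqrt`; treat the tuple layout as an observed implementation detail rather than a guaranteed interface.
```python
from math import sqrt

from joblib import delayed

# Nothing is computed here: delayed only captures the call.
task = delayed(sqrt)(4)
func, args, kwargs = task

print(func is sqrt, args, kwargs)

# Executing the captured call by hand gives the actual result.
print(func(*args, **kwargs))
```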
The `Parallel` call above is therefore equivalent in effect to the following list comprehension:
```python
[func(i) for i in range(10)]
```
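This equivalence can be checked directly. The sketch below uses `math.sqrt` in place of `func` (an assumption made to avoid the one-second sleeps) and compares the two results. With the default list return, `Parallel` gives results in submission order, so the lists match element by element.
```python
from math import sqrt

from joblib import Parallel, delayed

parallel_result = Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10))
serial_result = [sqrt(i) for i in range(10)]

# Same values in the same order, regardless of which worker ran which task.
print(parallel_result == serial_result)
```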
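In this way, mutually independent loop iterations can be split into parallel tasks quickly and easily. Choose `n_jobs` to suit the environment you are working in, though, so that your jobs do not interfere with other users of the same machine.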
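
One way to pick `n_jobs` with some care is to derive it from the number of detected cores. The sketch below uses `joblib.cpu_count()` and caps the worker count at half the cores. That halving rule is a hypothetical policy for a shared machine, not a joblib recommendation. For comparison, `n_jobs=-1` asks joblib to use all available CPUs, which is rarely appropriate when others share the host.

```python
from math import sqrt

from joblib import Parallel, cpu_count, delayed

# Hypothetical policy: use at most half of the detected cores, but at least one.
n_jobs = max(1, cpu_count() // 2)
print(f"detected {cpu_count()} cores, using n_jobs={n_jobs}")

results = Parallel(n_jobs=n_jobs)(delayed(sqrt)(i) for i in range(10))
```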