update: joblib easy usage
Cloudac7 committed Jun 15, 2023
1 parent 420fe26 commit 54f76fa
Showing 1 changed file with 100 additions and 0 deletions.
100 changes: 100 additions & 0 deletions content/post/tricks/joblib-easy-usage/index.md
@@ -0,0 +1,100 @@
---
title: "joblib: A Simple, Easy-to-Understand Parallel World"
author:
  name: Cloudac7
date: 2023-06-15
categories:
  - Tricks
---

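Joblib is a lightweight pipelining tool for Python. It makes it quick and easy to run mutually independent loop iterations in parallel, which saves time.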

## Installation

```bash
pip install joblib
```

## Usage

Take printing the square roots of 0 through 9 as an example. With a plain loop, the code is easy to write. Here `sleep` simulates the latency of the task itself:

```python
from math import sqrt
from time import sleep
from datetime import datetime

for i in range(10):
    print(datetime.now(), sqrt(i))
    sleep(1)  # simulate a slow task
```
The output is shown below; one value is printed per second:
```
2023-06-15 11:02:01.200605 0.0
2023-06-15 11:02:02.203232 1.0
2023-06-15 11:02:03.207490 1.4142135623730951
2023-06-15 11:02:04.211349 1.7320508075688772
2023-06-15 11:02:05.211646 2.0
2023-06-15 11:02:06.215645 2.23606797749979
2023-06-15 11:02:07.218109 2.449489742783178
2023-06-15 11:02:08.220789 2.6457513110645907
2023-06-15 11:02:09.225884 2.8284271247461903
2023-06-15 11:02:10.229875 3.0
```
With `joblib.Parallel`, move the loop body into a function that returns its result, and the code can be rewritten as follows:
```python
from math import sqrt
from time import sleep
from datetime import datetime
from joblib import Parallel, delayed

def func(i):
    result = sqrt(i)
    print(datetime.now(), result)
    sleep(1)  # simulate a slow task
    return result

Parallel(n_jobs=2)(delayed(func)(i) for i in range(10))
```
The output is shown below. Because `n_jobs` is set to 2, two workers run at the same time, so two values are printed each second:
```
2023-06-15 11:07:45.843004 0.0
2023-06-15 11:07:45.873286 1.0
2023-06-15 11:07:46.864319 1.4142135623730951
2023-06-15 11:07:46.891164 1.7320508075688772
2023-06-15 11:07:47.871781 2.0
2023-06-15 11:07:47.896673 2.23606797749979
2023-06-15 11:07:48.875107 2.449489742783178
2023-06-15 11:07:48.902218 2.6457513110645907
2023-06-15 11:07:49.880168 2.8284271247461903
2023-06-15 11:07:49.907760 3.0
```
The final return value is a list:
```python
[0.0,
1.0,
1.4142135623730951,
1.7320508075688772,
2.0,
2.23606797749979,
2.449489742783178,
2.6457513110645907,
2.8284271247461903,
3.0]
```
So the code above has the same effect as the following list comprehension:
```python
[func(i) for i in range(10)]
```
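To check this equivalence, and the time saved, the sketch below runs both versions and compares them. It is illustrative only: the 0.2-second sleep is an assumption chosen to keep the run short, and the measured times will vary with the machine and with the cost of starting the workers.

```python
from math import sqrt
from time import perf_counter, sleep

from joblib import Parallel, delayed


def func(i):
    """Simulate a slow task, then return the square root of i."""
    sleep(0.2)  # assumed delay, shorter than the 1 s used above
    return sqrt(i)


# Serial baseline: a plain list comprehension.
start = perf_counter()
serial = [func(i) for i in range(10)]
serial_time = perf_counter() - start

# Parallel version: two workers, same inputs.
start = perf_counter()
parallel = Parallel(n_jobs=2)(delayed(func)(i) for i in range(10))
parallel_time = perf_counter() - start

# Results come back in input order, so the two lists should match.
print(parallel == serial)
print(f"serial: {serial_time:.2f}s, parallel: {parallel_time:.2f}s")
```

The first parallel call also pays for starting the worker processes, so the speedup on a task this small may be less than the ideal factor of two.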
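In this way, loop iterations that are independent of one another can be split into parallel tasks quickly and easily. Choose `n_jobs` to suit the environment you are running in, so that you do not interfere with other users.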
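One possible way to pick a conservative `n_jobs` on a shared machine is sketched below. The policy itself (half of the detected cores, capped by a hypothetical `MAX_WORKERS` constant) is an assumption for illustration, not a recommendation from joblib; only `joblib.cpu_count()` is a library call.

```python
from math import sqrt

from joblib import Parallel, cpu_count, delayed

# Hypothetical policy for a shared machine: use at most half of the
# detected cores, never more than MAX_WORKERS, and never fewer than 1.
MAX_WORKERS = 4  # illustrative cap, adjust to your environment

n_jobs = max(1, min(MAX_WORKERS, cpu_count() // 2))

results = Parallel(n_jobs=n_jobs)(delayed(sqrt)(i) for i in range(10))
print(n_jobs, results)
```

By contrast, `n_jobs=-1` asks joblib to use all available CPUs, which is usually a poor choice on a machine shared with other people.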
