Multithreading in Python (#171)
* Added first draft on concurrency.

* Added section on multiprocessing module.

* Added new material on performance.

* Adding new material on threads.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added exercise.

* Moved document.

* Moved again.

* Added section on threads, added skeleton for tests.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added an example for threading.

* Fixed test skeleton.

* Trying to run tests for async code.

* Added missing material.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added material on async.

* Added material on Locks and Queues.

* Removed unused cells.

* Removed unused cells.

* Improved tests of solutions.

* Improved tests.

* fix notebook json syntax error at exercise 1

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Moved loading of testsuite.

* Added TOC.

* Added quiz.

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update tutorial/tests/test_threads.py

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Update threads.ipynb

Co-authored-by: Despina Adamopoulou <[email protected]>

* Fixes to the text.

* Improvements in cell output.

* Fixed test arguments.

* Improved wording of exercise.

* Improved test comments and methods.

* Update tutorial/threads.py

Co-authored-by: Despina Adamopoulou <[email protected]>

* Add threads notebook to the index.

* Cosmetic changes.

* Apply suggestions from code review

Applied suggestions on the text

Co-authored-by: Edoardo Baldi <[email protected]>

* Make `threading` notebook examples work. (#175)

* Make notebook work.

* Added warning on different modules.

---------

Co-authored-by: Simone Baffelli <[email protected]>

* Added missing example on threading. (#176)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: despadam <[email protected]>
Co-authored-by: Despina Adamopoulou <[email protected]>
Co-authored-by: Aliaksandr Yakutovich <[email protected]>
Co-authored-by: Edoardo Baldi <[email protected]>
6 people authored Dec 13, 2023
1 parent b1cea87 commit cafd088
Showing 7 changed files with 1,343 additions and 8 deletions.
1 change: 1 addition & 0 deletions binder/environment.yml
@@ -17,3 +17,4 @@ dependencies:
   - markdown
   - pre-commit
   - attrs
+  - multiprocess
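For context, `multiprocess` is a third-party, API-compatible fork of the standard library's `multiprocessing` module that serializes work items with dill instead of pickle, so interactively defined functions and lambdas can be dispatched to worker processes. A minimal sketch of that drop-in usage (not taken from this commit; it assumes only the stdlib-style `Pool` interface):

import multiprocess

def square(x):
    return x * x

if __name__ == "__main__":
    with multiprocess.Pool(processes=4) as pool:
        # dill-based serialization also accepts lambdas, which stock multiprocessing rejects
        print(pool.map(lambda x: x * x, range(5)))
        print(pool.map(square, range(5)))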
Binary file added images/concurrency_vs_parallelism.jpg
Binary file added images/process_performance.png
10 changes: 2 additions & 8 deletions index.ipynb
@@ -19,15 +19,9 @@
     "\n",
     "- [Manage Python project](./manage_python_project.ipynb)\n",
     "- [Advanced functions](./functions_advanced.ipynb)\n",
-    "- [Advanced Object-oriented programming](./object_oriented_programming_advanced.ipynb)\n"
+    "- [Advanced Object-oriented programming](./object_oriented_programming_advanced.ipynb)\n",
+    "- [Parallelism and concurrency in Python](./threads.ipynb)\n"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  }
  ],
  "metadata": {
1,123 changes: 1,123 additions & 0 deletions threads.ipynb

Large diffs are not rendered by default.
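The notebook's 1,123 added lines are not rendered here. For orientation only, a basic `threading` example of the kind the commit message mentions ("Added an example for threading") could look like the sketch below; it assumes nothing beyond the standard library and is not copied from the notebook:

import threading
import time

def worker(name: str, delay: float) -> None:
    # Simulate an I/O-bound task: sleeping releases the GIL, so the threads overlap
    time.sleep(delay)
    print(f"{name} finished after {delay:.1f}s")

threads = [
    threading.Thread(target=worker, args=(f"thread-{i}", 0.1 * i)) for i in range(1, 4)
]
for t in threads:
    t.start()  # all threads run concurrently
for t in threads:
    t.join()   # wait for every thread to finish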

155 changes: 155 additions & 0 deletions tutorial/tests/test_threads.py
@@ -0,0 +1,155 @@
import asyncio
import functools
import pathlib
import random
import string
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from typing import Awaitable, Callable, Dict

import multiprocess
import pytest


class SecretServer:
    def __init__(self, key: str, timeout: float = 0.01):
self.key = key
self.inner_key = "/" + key
self.timeout = timeout
self.sequence = 0
self.reset_flag = False
        # Background task that periodically flags the sequence for reset
self.resetter: asyncio.Task = None

async def start(self):
self.resetter = asyncio.create_task(self.reset_sequence())

async def reset_sequence(self):
while True:
await asyncio.sleep(self.timeout)
self.reset_flag = True

async def get_value(self):
        # If a reset was flagged, restart the sequence and return the separator
if self.reset_flag:
self.sequence = 0
self.reset_flag = False
return "/"
await asyncio.sleep(self.timeout / len(self.inner_key) * 1.5)
seq = self.sequence
# Increase the sequence counter
self.sequence = (self.sequence + 1) % len(self.inner_key)
return self.inner_key[seq]

async def check_key(self, key: str):
return key == self.key


@pytest.fixture(scope="session")
def make_random_file(tmp_path_factory: pytest.TempPathFactory) -> Callable[[int], pathlib.Path]:
def inner_file(size: int = 1000):
file = tmp_path_factory.mktemp("data").joinpath("file.txt")
with open(file, "w") as f:
f.write("".join(random.choices(string.ascii_letters, k=size)))
return file

return inner_file


def read_segment(file: pathlib.Path, start: int, end: int) -> str:
with open(file) as f:
f.seek(start)
return f.read(end - start)


def segment_stat(segment: str) -> Dict[str, int]:
return Counter(segment.strip())


def count_words(
file: pathlib.Path, size: int, n_processes: int, index: int
) -> Dict[str, int]:
segment_size = size // n_processes
start = index * segment_size
end = start + segment_size
return segment_stat(read_segment(file, start, end))


def reference_exercise1(input_path: pathlib.Path, size: int) -> Dict[str, int]:
workers = multiprocess.cpu_count()
with ProcessPoolExecutor(workers) as executor:
result = executor.map(
functools.partial(count_words, input_path, size, workers), range(workers)
)
return dict(functools.reduce(lambda x, y: x + y, result, Counter()))


@pytest.mark.parametrize("size", [1000, 10000, 100000])
def test_exercise1_total_counts(
function_to_test: Callable,
    make_random_file: Callable[[int], pathlib.Path],
size: int,
):
rf = make_random_file(size)
reference_res = reference_exercise1(rf, size)
total_letters = sum(reference_res.values())
user_res = function_to_test(rf, size)
total_letters_user = sum(user_res.values())
assert total_letters == total_letters_user


@pytest.mark.parametrize("size", [1000, 10000, 100000])
def test_exercise1_counts(
function_to_test: Callable,
    make_random_file: Callable[[int], pathlib.Path],
size: int,
):
rf = make_random_file(size)
reference_res = reference_exercise1(rf, size)
user_res = function_to_test(rf, size)
assert user_res == reference_res


# #TODO: find a way to test that the user is using multiprocessing (directly or indirectly)
# def test_exercise1_processes(function_to_test: Callable, make_random_file: Callable[[None], pathlib.Path], monkeypatch: pytest.MonkeyPatch):
# with patch.object(multiprocessing.Process, "start") as process_mock:
# size = 1000
# rf = make_random_file(size)
# user_res = function_to_test(rf, size)
# assert process_mock.mock_calls or


def find_word(letters: list[str], separator: str) -> list[str]:
    """
    Join a list of letters and return the non-empty words delimited by the separator.
    """
    return [w for w in "".join(letters).split(separator) if len(w) > 0]


async def reference_exercise2(server: SecretServer) -> str:
rng = 50
    # Concurrently request 50 letters from the server
letters = await asyncio.gather(*[server.get_value() for _ in range(rng)])

# Function to concurrently check if the key is valid
async def check_key(key: str):
valid = await server.check_key(key)
return valid, key

res = await asyncio.gather(*[check_key(key) for key in find_word(letters, "/")])
# Return the first valid key
return [key for valid, key in res if valid][0]


@pytest.mark.parametrize("secret_key", ["Secret", "Very secret", "Extremely secret"])
def test_exercise2(function_to_test: Callable[[SecretServer], Awaitable[str]], secret_key: str):
server = SecretServer(secret_key, timeout=1)

async def run_test() -> str:
await server.start()
res = await function_to_test(server)
return res

res = asyncio.run(run_test())
print(res, secret_key)
assert secret_key == res
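The reference solution above uses `concurrent.futures.ProcessPoolExecutor`, and the commented-out test hints that user solutions may also use multiprocessing directly. Purely as an illustration (the helper names below are made up, not from the repository), a solution built on `multiprocess.Pool` that mirrors the reference's truncating segment split might look like this:

import pathlib
from collections import Counter
from typing import Dict

import multiprocess

def _count_segment(args) -> Counter:
    path, start, end = args
    with open(path) as f:
        f.seek(start)
        return Counter(f.read(end - start).strip())

def solution_exercise1(input_path: pathlib.Path, size: int) -> Dict[str, int]:
    workers = multiprocess.cpu_count()
    step = size // workers
    # Same truncating split as the reference: each worker reads one equal-sized segment
    chunks = [(input_path, i * step, (i + 1) * step) for i in range(workers)]
    with multiprocess.Pool(workers) as pool:
        counts = pool.map(_count_segment, chunks)
    return dict(sum(counts, Counter()))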
62 changes: 62 additions & 0 deletions tutorial/threads.py
@@ -0,0 +1,62 @@
import os
from concurrent.futures import ProcessPoolExecutor
from time import sleep

from .common import Question, Quiz


class Threads(Quiz):
def __init__(self, title="Decide if the following are parallel or not"):
q1 = Question(
question="One cashier serves two lines of people in a store",
options={
"Parallel": "What if the cashier is slow?",
"Not parallel": "Correct, there's only one cashier",
},
correct_answer="Not parallel",
shuffle=True,
)

q2 = Question(
question="A swimming pool offers multiple shower stalls",
options={
"Parallel": "Correct!",
"Not parallel": "We have more than one shower",
},
correct_answer="Parallel",
shuffle=True,
)

q3 = Question(
question="Multiple people take turns drinking from a cup",
options={
"Parallel": "Why are they sharing a cup?",
"Not parallel": "Correct!",
},
correct_answer="Not parallel",
shuffle=True,
)

super().__init__(questions=[q1, q2, q3])


def work(n: int, show: bool = False) -> int:
"""This function waits a small time and returns the number"""
pid = os.getpid()
if show:
print(f"{pid} Working on {n}\n")
sleep(0.001)
return n


def parallel_work(executor: ProcessPoolExecutor, n: int, batch_size=5) -> int:
"""Wrapper function to run the `work` function in parallel and compute the sum of their results"""
res = executor.map(work, range(n), chunksize=batch_size)
return sum(res)


def sequential_work(n: int) -> int:
"""
This function computes the sum of the results of the `work` function sequentially
"""
return sum([work(i) for i in range(n)])
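
A short usage sketch of these helpers, roughly what the notebook's performance comparison presumably does (the `tutorial.threads` import path is assumed from the file location; absolute timings will vary by machine):

import time
from concurrent.futures import ProcessPoolExecutor

from tutorial.threads import parallel_work, sequential_work

if __name__ == "__main__":
    n = 2000
    t0 = time.perf_counter()
    total_seq = sequential_work(n)
    t1 = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        total_par = parallel_work(executor, n, batch_size=50)
    t2 = time.perf_counter()
    # Both variants compute the same sum 0 + 1 + ... + (n - 1)
    assert total_seq == total_par == n * (n - 1) // 2
    print(f"sequential: {t1 - t0:.3f}s  parallel: {t2 - t1:.3f}s")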

0 comments on commit cafd088