Skip to content

Latest commit

 

History

History
353 lines (259 loc) · 9.83 KB

File metadata and controls

353 lines (259 loc) · 9.83 KB

IPython Cookbook, Second Edition This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.

Text on GitHub with a CC-BY-NC-ND license
Code on GitHub with a MIT license

Chapter 2 : Best practices in Interactive Computing

2.2. Using the latest features of Python 3

The latest version of the Python 2.x branch, Python 2.7, was released in 2010. It will reach its end of life in 2020. On the other hand, the first version of the Python 3.x branch, Python 3.0, was released in 2008. The decade-long transition period between Python 2 and Python 3, which are slightly incompatible, has been somewhat chaotic.

Choosing between Python 2 (also known as Legacy Python) and Python 3 used to be tricky since many Python users had not transitioned to Python 3 yet, and many libraries were only compatible with Python 2. These times are gone and it is now safe to stick with Python 3 in virtually all cases. The only exceptions are when you have to support old unmaintained libraries, or when your users cannot transition to Python 3 for whatever reason.

In addition to fixing bugs and annoyances of Python 2 (for example related to Unicode support), Python 3 brings many interesting features in terms of syntax, capabilities of the language, and new built-in libraries.

We use the latest stable version of Python in this recipe, which is Python 3.6.

How to do it...

  1. In Python 3, print() is a function whereas it was a statement in Python 2 (it was a historical oddity). This function may accept multiple arguments as well as a few options. Let's create a list:
my_list = list(range(10))

We can print the my_list object:

print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

But we can also print the ten numbers in the list:

print(*my_list)
0 1 2 3 4 5 6 7 8 9

Finally we can customize the separator and the end of the string to print:

print(*my_list, sep=" + ", end=" = %d" % sum(my_list))
0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45
  1. Python 3 supports more advanced variable unpacking features:
first, second, *rest, last = my_list
print(first, second, last)
0 1 9
rest
[2, 3, 4, 5, 6, 7, 8]
  1. In Python 3, variable names can contain Unicode characters. This technique may be useful when writing mathematical code. To type mathematical symbols in the Jupyter Notebook, write LaTeX code and press Tab. For example, to create a variable $\pi$, type \pi and then Tab:
from math import pi, cos
α = 2
π = pi
cos(α * π)
1.000
  1. Python 3.6 brings new string literals called f-strings. They offer a convenient syntax to define strings depending on existing variables:
a, b = 1, 2
f"The sum of {a} and {b} is {a + b}"
'The sum of 1 and 2 is 3'
  1. We can add custom annotations to function arguments and output. These function annotations convey no semantics, but they can be used in the code as we like. Here is an example coming from https://stackoverflow.com/a/7811344/1595060 :
def kinetic_energy(mass: 'kg',
                   velocity: 'm/s') -> 'J':
    """The annotations serve here as documentation."""
    return .5 * mass * velocity ** 2

These annotations are stored in the __annotations__ attribute of the function, and they can be used as follows:

annotations = kinetic_energy.__annotations__
print(*(f"{key} is in {value}"
        for key, value in annotations.items()),
      sep=", ")
mass is in kg, velocity is in m/s, return is in J

The typing module, which has been included in Python 3.5 on a provisional basis, implements several annotations that can be used to specify typing information in functions.

  1. Python 3.5 brings a new operator @ for matrix multiplication. It is supported by NumPy 1.10 and later:
import numpy as np
M = np.array([[0, 1], [1, 0]])

The * operator is the element-wise multiplication:

M * M
array([[0, 1],
       [1, 0]])

Previously, matrix multiplication could be performed with np.dot(). The new syntax is clearer:

M @ M
array([[1, 0],
       [0, 1]])
  1. Python 3.3 brings the new yield from syntax that allows you, among other things, to compose multiple generators. For example, the two following functions are equivalent:
def gen1():
    for i in range(5):
        for j in range(i):
            yield j
def gen2():
    for i in range(5):
        yield from range(i)
list(gen1())
[0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
list(gen2())
[0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
  1. The functools library provides a @lru_cache decorator to implement a simple in-memory caching system for Python functions:
import time

def f1(x):
    time.sleep(1)
    return x
%timeit -n1 -r1 f1(0)
1 s ± 0 ns per loop (mean ± std. dev. of 1 run,
    1 loop each)
%timeit -n1 -r1 f1(0)
1 s ± 0 ns per loop (mean ± std. dev. of 1 run,
    1 loop each)

Here, the two successive identical calls to f1(0) take one second. Now, let's define a cached version of this function:

from functools import lru_cache

@lru_cache(maxsize=32)  # keep the latest 32 calls
def f2(x):
    time.sleep(1)
    return x
%timeit -n1 -r1 f2(0)
1 s ± 0 ns per loop (mean ± std. dev. of 1 run,
    1 loop each)
%timeit -n1 -r1 f2(0)
6.14 µs ± 0 ns per loop (mean ± std. dev. of 1 run,
    1 loop each)

The first call takes one second, whereas the next one returns immediately. In the second case, the function is not called but the output corresponding to the argument of 0 was cached and returned.

  1. The new pathlib module offers filesystem facilities that are more convenient to use than the Python 2 os.path methods. The main class is Path:
from pathlib import Path

We instantiate a Path object representing the current directory:

p = Path('.')

Let's list all Markdown files in the directory:

sorted(p.glob('*.md'))
[PosixPath('00_intro.md'),
 PosixPath('01_py3.md'),
 PosixPath('02_workflows.md'),
 PosixPath('03_git.md'),
 PosixPath('04_git_advanced.md'),
 PosixPath('05_tips.md'),
 PosixPath('06_high_quality.md'),
 PosixPath('07_test.md'),
 PosixPath('08_debugging.md')]

We can easily get the contents of a text file:

_[0].read_text()
'# Introduction\n\n...\n'

Let's obtain the list of subdirectories:

[d for d in p.iterdir() if d.is_dir()]
[PosixPath('images'),
 PosixPath('.ipynb_checkpoints'),
 PosixPath('__pycache__'),

Finally, we list all files in the images subfolder (note the slash / operator on Path instances):

list((p / 'images').iterdir())
[PosixPath('images/github_new.png'),
 PosixPath('images/folder.png')]
  1. Python 3.4 provides a new statistics module which implements basic statistical routines. This module may be useful when depending on NumPy or SciPy is not desirable. Let's import the module:
import random as r
import statistics as st

We create a list of normally-distributed random variables:

my_list = [r.normalvariate(0, 1)
           for _ in range(100000)]

We compute the mean, median, and standard deviation:

print(st.mean(my_list),
      st.median(my_list),
      st.stdev(my_list),
      )
0.00073 -0.00052 1.00050

There's more...

Other interesting features of Python 3 include coroutines with the asyncio module and asynchronous operations with the new await and async keywords.

Here are a few references: