This article is the third in the series about High Computation with Python. If you missed the first and second parts, check this link and this one. The goal is to present approaches that make CPU-demanding tasks in Python run much faster.
The techniques covered in the series:
- Python Profiling - How to find bottlenecks
- Cython - Annotate your code and compile to C
- NumPy Vectors - Fast vector operations using numpy arrays
- NumPy integration with Cython - Fast numerical Python library wrapped by Cython
- PyPy - A Just-in-Time compiler for Python
- You define an array with numpy.array, in our case from a list of tuples whose fields are labeled keys and ranks (lines 29 and 30).
- Many operations are already implemented in numpy, such as numpy.in1d, which tests whether each element of the first vector is present in the second vector and returns an array of bools.
- numpy.sort sorts the elements by a key, in this example ranks (lines 16 and 17).
- diffs * diffs is an element-wise multiplication: element i of the result is diffs[i] * diffs[i], for every i from 0 to n-1 (line 36).
- size is an attribute of a numpy array that gives its total number of elements (m*n for an m-by-n array).
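The operations listed above can be tried in isolation with a minimal sketch like the one below. It is not the article's listing: the sample data and the names pair_dtype, first, second, by_rank, squared and grid are hypothetical and chosen only for illustration. It also uses numpy.isin, which the NumPy documentation recommends over numpy.in1d for new code; for one-dimensional inputs like these it performs the same membership test.

```python
import numpy as np

# Hypothetical sample data: (key, rank) records. The field names mirror the
# labels mentioned in the text; the values are made up for this sketch.
pair_dtype = [("keys", "U10"), ("ranks", "f8")]
first = np.array([("b", 2.0), ("c", 3.0), ("a", 1.0), ("d", 4.0)], dtype=pair_dtype)
second = np.array([("c", 1.0), ("a", 2.0), ("e", 3.0), ("b", 4.0)], dtype=pair_dtype)

# Boolean mask: True where a key of `first` also occurs in `second`.
# numpy.isin is the recommended replacement for numpy.in1d.
mask = np.isin(first["keys"], second["keys"])

# Sort the records of a structured array by the "ranks" field.
by_rank = np.sort(first, order="ranks")

# Element-wise product: squared[i] == diffs[i] * diffs[i].
# A new array is returned; diffs itself is left unchanged.
diffs = np.array([1.0, -2.0, 3.0])
squared = diffs * diffs

# size is the total number of elements, regardless of shape.
grid = np.zeros((2, 3))

print(mask)
print(by_rank["keys"])
print(squared)
print(diffs.size, grid.size)
```

Because every step runs inside numpy's compiled loops, none of these operations needs an explicit Python for loop over the elements.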
In the next post we will study PyPy, a JIT compiler that can speed up your code with minimal changes!