## Sunday, September 11, 2011

### High Performance Computation with Python - Part 01

Hi all,

I decided to start a series of posts about High Performance Computing with Python. I will give a brief explanation of how you can make CPU-bound tasks in Python run much faster. For machine learning developers, and for anyone who works with systems that demand lots of processing and memory, it is really important to have optimized algorithms for handling large data sets.

I've been working with recommender systems since last year, and I will show some of the techniques I used to make CPU-demanding tasks in Python run much faster. The techniques that will be covered:
1.  Python Profiling - How to find bottlenecks
2.  Cython -  Annotate your code and compile to C
3.  Numpy Vectors - Fast vector operations using numpy arrays
4.  Numpy integration with Cython - fast numerical Python library wrapped by Cython
5.  PyPy - Python's new Just-in-Time compiler
In this post I will begin with Python profiling, so let's get started!

The Problem

In this series we will analyze how to optimize the Spearman rank correlation coefficient, a measure used to compute the similarity between items in recommender systems: it assesses how well the relationship between two variables can be described by a monotonic function. It is similar to the Pearson correlation, but it compares the relative ranking of the preference values instead of the preference values themselves. That is, each user's preferences are sorted, and each item is assigned its position in that ordering as its rank.
If you want background on the correlation, please check this link. We will first look at improvements that make the Python code run a bit faster, and then at fast ways to convert the code directly to C for the best speed-ups.

Consider the input:
```
>>> ranks1= [('a', 2.5), ('b', 3.5), ('c', 3.0), ('d', 3.5)]

>>> ranks2= [('c', 3.5), ('b', 4.5), ('e', 1.0), ('d', 1.5)]
```

This is the output we're after:

```
>>> print spearman_correlation(ranks1, ranks2)

0.5
```
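That 0.5 comes straight from the classic rank-difference formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference per common key and n is the number of common keys. As a sanity check, here is the arithmetic worked by hand for the example above: under the descending sort used later in this post, the rank differences for the common keys b, c and d are 0, 1 and -1.

```python
# Spearman via the rank-difference formula:
#     rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
# The differences below were worked out by hand for the common keys
# b, c and d of the example rankings above.
diffs = [0, 1, -1]
n = len(diffs)
res = sum(d * d for d in diffs)
rho = 1 - 6.0 * res / (n * (n * n - 1))
print(rho)  # 0.5
```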

Pure Python implementation
Below we have the basic pure Python implementation. As you can see, the function spearman_correlation does the hard work of calculating the output (the correlation). We also use a helper function _rank_dists that compares the keys common to both rankings and computes the difference between their ranks. The inputs are lists of tuples, each with the key and the rating for the item. For this snippet, the script takes the number of elements of each list (n) as a parameter, so we can evaluate the coefficient and see how the computation time changes as we vary the input size.
```
import datetime
import random
import sys


def _rank_dists(ranks1, ranks2):
    """Finds the difference between the values in ranks1 and ranks2 for keys
    present in both dicts. If the arguments are not dicts, they are converted
    from (key, rank) sequences.
    """
    ranks1 = dict(ranks1)
    ranks2 = dict(ranks2)
    for k, v1 in ranks1.iteritems():
        try:
            yield k, v1 - ranks2[k]
        except KeyError:
            pass


def spearman_correlation(ranks1, ranks2):
    """Returns the Spearman correlation coefficient for two rankings, which
    should be dicts or sequences of (key, rank). The coefficient ranges from
    -1.0 (ranks are opposite) to 1.0 (ranks are identical), and is only
    calculated for keys in both rankings (for meaningful results, remove keys
    present in only one list before ranking)."""
    n = 0
    res = 0
    ranks1 = sorted(ranks1, key=lambda k: -k[1])
    ranks2 = sorted(ranks2, key=lambda k: -k[1])
    ranks1 = [(t[0], ix) for ix, t in enumerate(ranks1)]
    ranks2 = [(t[0], ix) for ix, t in enumerate(ranks2)]
    for k, d in _rank_dists(ranks1, ranks2):
        res += d * d
        n += 1
    try:
        return 1 - (6 * float(res) / (n * (n * n - 1)))
    except ZeroDivisionError:
        # Result is undefined if only one item is ranked
        return 0.0


if __name__ == '__main__':
    n = sys.argv[1]
    n = int(n)
    possible_items_x = range(1, n + 1)
    possible_items_y = range(1, n + 1)
    random.shuffle(possible_items_y)
    random.shuffle(possible_items_x)
    ranks1 = []
    ranks2 = []
    for x in xrange(n):
        item1 = possible_items_x.pop()
        item2 = possible_items_y.pop()
        ranks1.append((item1, random.random() * 4.0 + 1.0))
        ranks2.append((item2, random.random() * 4.0 + 1.0))
    start_time = datetime.datetime.now()
    print spearman_correlation(ranks1, ranks2)
    end_time = datetime.datetime.now()
    secs = end_time - start_time
    print "Main took", secs
```
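The listing above is Python 2 (iteritems, xrange, print statements). For readers on a newer interpreter, here is a minimal Python 3 sketch of the same two functions, checked against the example rankings from earlier in the post:

```python
def _rank_dists(ranks1, ranks2):
    """Yield (key, rank difference) for keys present in both rankings."""
    ranks1 = dict(ranks1)
    ranks2 = dict(ranks2)
    for k, v1 in ranks1.items():
        if k in ranks2:
            yield k, v1 - ranks2[k]


def spearman_correlation(ranks1, ranks2):
    """Spearman coefficient over the keys common to both rankings."""
    # Sort by preference (descending) and replace each value with its position.
    ranks1 = [(t[0], ix) for ix, t in enumerate(sorted(ranks1, key=lambda k: -k[1]))]
    ranks2 = [(t[0], ix) for ix, t in enumerate(sorted(ranks2, key=lambda k: -k[1]))]
    n = 0
    res = 0
    for k, d in _rank_dists(ranks1, ranks2):
        res += d * d
        n += 1
    try:
        return 1 - (6 * res / (n * (n * n - 1)))
    except ZeroDivisionError:
        # Undefined when fewer than two items are ranked in common.
        return 0.0


ranks1 = [('a', 2.5), ('b', 3.5), ('c', 3.0), ('d', 3.5)]
ranks2 = [('c', 3.5), ('b', 4.5), ('e', 1.0), ('d', 1.5)]
print(spearman_correlation(ranks1, ranks2))  # 0.5
```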

Profiling with cProfile and line_profiler

The cProfile module is the most common way to profile Python code. Take a look at it here. To run it over our script, just type the following command at your terminal:

```
$ python -m cProfile -o rep.prof spearman.py 199999
```

This generates a rep.prof file containing the profiling results. We can then load it into the pstats module and print, for example, the top 10 slowest functions.

```
>>> import pstats
>>> p = pstats.Stats('rep.prof')
>>> p.sort_stats('cumulative').print_stats(10)
```
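If you prefer to script this instead of typing at the interpreter, the snippet below is a self-contained sketch: it profiles a toy function with cProfile, dumps the stats to a file (slow_sum and the filename rep_demo.prof are made up for this illustration), and prints the five most expensive entries:

```python
import cProfile
import pstats


def slow_sum(n):
    # Deliberately heavy toy workload so the profiler has something to measure.
    return sum(i * i for i in range(n))


pr = cProfile.Profile()
pr.runcall(slow_sum, 200000)          # profile a single call
pr.dump_stats('rep_demo.prof')        # same file format cProfile -o produces

p = pstats.Stats('rep_demo.prof')
p.sort_stats('cumulative').print_stats(5)
```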

As you can see in the result above, spearman_correlation spends 0.755 seconds in its own lines of code and about 3.024 CPU seconds in total (including all the functions it calls). There are two spots we could analyze and improve. One is the sort operation, which is called only once yet costs about 20% of the total time; the other is _rank_dists, which is called 200000 times and could (maybe?!) be optimized to run faster.

But when the output comes with many lines, it can become hard to read. For this task you can use the library RunSnakeRun, a visualization tool that displays the profiled results.

So if you run at your terminal:

```
$ runsnake rep.prof
```

It will generate the following report. This tool is really useful, especially when you have a complex project with several modules: you can easily identify which functions are worth dealing with first!

But maybe you're wondering whether it is possible to check which lines are causing the code to run slowly. As you saw, cProfile can't answer that, so we have to turn to the line_profiler module. All we need to do is decorate our chosen function with @profile:

```
@profile
def spearman_correlation(ranks1, ranks2):
```

Then we run kernprof.py, telling it to do line-by-line profiling and which module we want to profile. Warning: since the profiling is now line-by-line, depending on your problem it will take longer to run!

```
$ kernprof.py -l -v pure_python.py 300 100
```

Here we can see that most of the time is spent in the for loop in _rank_dists (line 34) and in the building of the ranks (lines 31 and 32). If we could make those lines run faster, the entire function would run much faster. The advantage of line_profiler is that you can easily identify which lines are causing the biggest problems and focus directly on the bottlenecks, rather than guessing which lines are the slow ones!

REMEMBER to remove the @profile decorator after you're done with line_profiler, or Python will raise an exception, since it won't recognize the @profile decorator outside of kernprof.py.

In the next post I will cover the first optimization using Cython, where we will make some small modifications to the code. Stay tuned!

I hope you enjoyed this step-by-step analysis; it can be useful not only in scientific computing but also in tasks that are not purely CPU-bound. You could profile your web server to check which lines are running slowly!

PS: You could use the Python module dis to disassemble your custom module and get a deeper analysis by looking at the underlying bytecode. Just type:

```
>>> import dis
>>> import spearman
>>> dis.dis(spearman)
```
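For instance, disassembling a single function shows the bytecode instructions the interpreter executes for each source line (square here is a toy function made up for this illustration; the exact opcode names vary between Python versions):

```python
import dis


def square(x):
    # One source line, several bytecode instructions.
    return x * x


dis.dis(square)  # prints one row per bytecode instruction
```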

That's all,

Cheers,

Marcel Caraciolo

#### 2 comentários:

Anonymous said...

line_profiler was everything I needed. Thanks!

imarcelolz said...

excellent post!

I'm a Python developer looking for more performance.