## Sunday, September 11, 2011

Hi all,

I decided to start a series of posts about high performance with Python.  I will give a brief explanation of how you can make CPU-bound tasks in Python run much faster.  For machine learning developers, and for anyone who works with systems that demand heavy processing and memory, it is really important to have optimized algorithms for handling large data sets.

I've been working with recommender systems since last year, and I will show some of the techniques I used to make CPU-demanding tasks in Python run much faster. The techniques that will be covered:
1.  Python Profiling - How to find bottlenecks
2.  Cython - Annotate your code and compile to C
3.  Numpy Vectors - Fast vector operations using numpy arrays
4.  Numpy integration with Cython - fast numerical Python library wrapped by Cython
5.  PyPy - Python's new Just-in-Time compiler

In this post I will begin with Python profiling, so let's get started!

The Problem

In this series we will analyze how to optimize the computation of Spearman's rank correlation coefficient, a measure used to compute the similarity between items in recommender systems: it assesses how well the relationship between two variables can be described by a monotonic function. It is similar to the Pearson correlation, but compares the relative ranking of the preference values instead of the preference values themselves. That is, each user's preferences are sorted, and each item is then assigned its rank as its preference value, with 1 assigned to the least preferred item.
If you want more background on the correlation, please check this link. We will look at improvements in Python to make the code run a bit faster, and then we will look at fast ways to convert the code directly to C for the best speed-ups.
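For reference, writing d_i for the rank difference of item i and n for the number of items ranked in both lists, the coefficient is computed as:

```latex
\rho = 1 - \frac{6 \sum_{i} d_i^2}{n(n^2 - 1)}
```

This is exactly the expression you will find in the return statement of the implementation below.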

Consider the input:
>>> ranks1= [('a', 2.5), ('b', 3.5), ('c', 3.0), ('d', 3.5)]
>>> ranks2= [('c', 3.5), ('b', 4.5), ('e', 1.0), ('d', 1.5)]

This is the output we're after:

>>> print spearman_correlation(ranks1, ranks2)
0.5
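Before diving into the code, it is worth checking that 0.5 by hand. The sketch below (the helper name `to_rank_dict` is hypothetical; it mirrors the ranking scheme used by the implementation further down, where rank 0 goes to the most preferred item) reproduces the result for the sample input:

```python
# Hedged sketch: reproduce the 0.5 result for the sample input by hand.
ranks1 = [('a', 2.5), ('b', 3.5), ('c', 3.0), ('d', 3.5)]
ranks2 = [('c', 3.5), ('b', 4.5), ('e', 1.0), ('d', 1.5)]

def to_rank_dict(ranks):
    # Most preferred first; Python's stable sort breaks the b/d tie
    # in ranks1 by keeping their original order.
    ordered = sorted(ranks, key=lambda kv: -kv[1])
    return {key: position for position, (key, _) in enumerate(ordered)}

r1, r2 = to_rank_dict(ranks1), to_rank_dict(ranks2)
common = set(r1) & set(r2)                        # keys 'b', 'c', 'd'
res = sum((r1[k] - r2[k]) ** 2 for k in common)   # 0 + 1 + 1 = 2
n = len(common)                                   # 3
print(1 - 6.0 * res / (n * (n * n - 1)))          # 0.5
```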

Pure Python implementation
Below we have the basic pure-Python implementation. As you can see, the function spearman_correlation does the hard work of calculating the output (the correlation).  We also use a helper function, _rank_dists, that compares the keys common to both rankings and computes the difference between their ranks.  The inputs are sequences of tuples, each with the key and the rating for the item.  For this snippet I take the number of elements in each array (n) as a command-line parameter, so we can evaluate the coefficient and see the difference in computation time as we vary the input size.
import datetime
import sys
import random


def _rank_dists(ranks1, ranks2):
    """Finds the difference between the values in ranks1 and ranks2 for keys
    present in both dicts. If the arguments are not dicts, they are converted
    from (key, rank) sequences.
    """
    ranks1 = dict(ranks1)
    ranks2 = dict(ranks2)
    for k, v1 in ranks1.iteritems():
        try:
            yield k, v1 - ranks2[k]
        except KeyError:
            pass


def spearman_correlation(ranks1, ranks2):
    """Returns the Spearman correlation coefficient for two rankings, which
    should be dicts or sequences of (key, rank). The coefficient ranges from
    -1.0 (ranks are opposite) to 1.0 (ranks are identical), and is only
    calculated for keys in both rankings (for meaningful results, remove keys
    present in only one list before ranking)."""
    n = 0
    res = 0
    ranks1 = sorted(ranks1, key=lambda k: -k[1])
    ranks2 = sorted(ranks2, key=lambda k: -k[1])
    ranks1 = [(t[0], ix) for ix, t in enumerate(ranks1)]
    ranks2 = [(t[0], ix) for ix, t in enumerate(ranks2)]
    for k, d in _rank_dists(ranks1, ranks2):
        res += d * d
        n += 1
    try:
        return 1 - (6 * float(res) / (n * (n * n - 1)))
    except ZeroDivisionError:
        # Result is undefined if only one item is ranked
        return 0.0


if __name__ == '__main__':
    n = int(sys.argv[1])
    possible_items_x = range(1, n + 1)
    possible_items_y = range(1, n + 1)
    random.shuffle(possible_items_y)
    random.shuffle(possible_items_x)
    ranks1 = []
    ranks2 = []
    for x in xrange(n):
        item1 = possible_items_x.pop()
        item2 = possible_items_y.pop()
        ranks1.append((item1, random.random() * 4.0 + 1.0))
        ranks2.append((item2, random.random() * 4.0 + 1.0))
    start_time = datetime.datetime.now()
    print spearman_correlation(ranks1, ranks2)
    end_time = datetime.datetime.now()
    secs = end_time - start_time
    print "Main took", secs
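A side note on the timing code above: datetime measures a single run, which is fine for a long job but noisy for short ones. A hedged alternative (a generic sketch, not tied to this script) is the standard-library timeit module, which repeats the statement and lets you take the best of several runs:

```python
import timeit

# Time a descending sort of 1000 floats, repeated 3 x 200 times;
# min() over the repeats is the least noisy estimate.
setup = "import random; data = [random.random() for _ in range(1000)]"
stmt = "sorted(data, key=lambda x: -x)"
best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=200))
print("best of 3: %.4f s" % best)
```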

Profiling with cProfile and line_profiler

The cProfile module is the most common way to profile Python code. Take a look at its documentation here.  To profile our script, just run the following command at your terminal:

$ python -m cProfile -o rep.prof spearman.py 199999

This generates a rep.prof file containing the profiling results. We can then load it into the pstats module and print out, for example, the ten slowest functions:

>>> import pstats
>>> p = pstats.Stats('rep.prof')
>>> p.sort_stats('cumulative').print_stats(10)
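If you prefer to drive the profiler programmatically rather than from the command line, the same cProfile/pstats pair can be used from code. A minimal sketch with a toy function (slow_sum is just a hypothetical stand-in for the real workload):

```python
import cProfile
import pstats

def slow_sum(n):
    # Deliberately naive loop so the profiler has something to measure.
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(200000)
profiler.disable()

# Same report as the command-line workflow, printed straight to stdout.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(5)
```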

As you can see from the result above, it tells us that spearman_correlation spends 0.755 seconds in its own lines of code and, in total (including all the functions it calls), costs about 3.024 CPU seconds. There are two spots we could analyze and improve. One is the sort operation, which is called only once but costs about 20% of the total time; the other is _rank_dists, which is called 200000 times and could (maybe!) be optimized to run faster.

But sometimes, when the output comes with several lines, it becomes hard to read.  For this task you can use the library RunSnakeRun, a visualization tool for displaying profiling results.

So if you run at your terminal:

$ runsnake rep.prof

It will open a visual report. This tool is really useful, especially when you have a complex project with several modules: you can easily identify which functions are worth tackling first!

But maybe you're wondering whether it is possible to check which lines are causing the code to run slowly.  As you saw, cProfile can't answer that, so we have to check the line_profiler module.  All we need to do is decorate our chosen function with @profile:

@profile
def spearman_correlation(ranks1, ranks2):

Afterwards we run kernprof.py, telling it to do line-by-line profiling and which module we want to profile. Warning: since the profiling is now line-by-line, depending on your problem it will take much longer to run!

$ kernprof.py -l -v spearman.py 199999

Here we can see that most of the time is spent in the for loop inside _rank_dists (line 34), as well as in the building of the ranks (lines 31 and 32). So if we could make those lines run faster, the entire function would run much faster.  The advantage of line-by-line profiling is that you can easily identify which lines are causing you the biggest problems and focus directly on the bottlenecks, rather than guessing which lines are the slow ones!

REMEMBER to remove the @profile decorator after you're done with line_profiler, or Python will throw an exception, since it won't recognize the @profile decorator outside of kernprof.py.
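If you would rather not keep adding and removing the decorator, one common trick (a sketch, assuming you control the module) is to fall back to a no-op profile decorator whenever kernprof has not injected one:

```python
# kernprof injects `profile` into builtins when it runs your script;
# when it is absent, define a pass-through decorator ourselves so the
# decorated code still runs under a plain interpreter.
try:
    profile
except NameError:
    def profile(func):
        return func

@profile
def add(a, b):
    return a + b

print(add(2, 3))  # 5
```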

In the next post I will cover the first optimization using Cython, where we will make some small modifications to the code. Stay tuned!

I hope you enjoyed this step-by-step analysis; it may be useful not only in scientific computing, but also in non-CPU-bound tasks.  You could profile your web server to check out which lines are running slowly!

PS:  You could use the Python module dis to disassemble your module and get a deeper analysis by looking at the underlying bytecode. Just type:

>>> import dis
>>> import spearman
>>> dis.dis(spearman)

That's all,

Cheers,

Marcel Caraciolo

1. line_profile was everything what I needed. thanks!

2. excellent post!

I'm a Python developer and looking for more performance.
