Pages

High Performance Computation with Python - Part 03

Saturday, September 24, 2011

Hi all,


This article is the third one of the series about High Computation with Python.  For anyone that missed the first and the second parts check this link and this one.  The goal is to present approaches to make CPU-demanding tasks in Python run much faster.


The techniques that are being covered:


  1.  Python Profiling - How to find bottlenecks
  2.  Cython -  Annotate your code and compile to C
  3.  Numpy Vectors - Fast vector operations using numpy arrays
  4.  Numpy integration with Cython - fast numerical Python library wrapped by Cython
  5.  PyPy - Python's new Just in Time  Compiler

In this post I will talk about Numpy Vectors and how you can wrap it with Cython!


The Problem

In this series we will analyze how to optimize the statistical Spearman Rank's Correlation coefficient,  which it is a particular measure used to compute the similarity between items in recommender systems and assesses how well the relationship between two variables can be described using a monotonic function. The source code for this metric can be found in the first post.


Numpy


Numpy is a powerful extension to Python, adding support for large, multi-dimensional array and matrices, along with several mathematical functions to manipulate these arrays. To install it you can type this command at your terminal

$ sudo easy_install numpy

Or

$ pip install numpy

In our example we will change the spearman.py . Import the numpy library and change the spearman_correlation to look  the one below. If you run and test it you will ger the same output as before.



The numpy strength is that can simplify lots of operations on vectors or matrixes of numbers since they work directly in all list rather than on individual elements at one time.  So before we had nested for loops over individual terms in a list, now with numpy you could do the same job in a faster and simple way.
Some notes:

  • You define an array with numpy.array statement, in our case a list of tuples indexed by the labels keys and ranks. (lines 29 and 30).
  • Lots of operations already implemented in numpy, such as numpy.in1d which finds where the elements in the first vector are in the second vector returning an array os bools.
  •  We have numpy.sort which sort the elements based on a key, in this example (ranks) (lines 16 and 17).
  • diffs * diffs does a pairwise multiplication, think of it as diff[0] = diff[0] * diff[0]; diff[1] = diff[1] * diff[1]...; diff[n-1] = diff[n-1] * diff[n-1]. (line 36)
  • size is an attribute from numpy.array to fetch the m*n elements (count) from an array.
If it stills unclear I suggest you to try it at the command line, step-by-step to look over the results. Put a small number of elements in the array and see it in action.

Numpy with Cython


Numpy is a powerful library and uses very fast C optimized math libraries to perform these calculations very quickly. You can also wrap your python code with Cython. The main difference is the annotation of the numpy arrays. You can see the tutorial for further details.  The difference are how we import: cimport numpy as np and the assinature of the function _rank_dists.




Special Notes - Meeting Scipy

Another poweful library is Scipy, it is a package for Python that brings several algebra techniques for dealing with matrices and vectors.  One special module is the scipy.stats, which comes with the spearmanr function. It receives two arrays with the observations and returns the spearman coefficient. Amazing! Let's see our code below:


In the next post we will study the Pypy, a JIT Compiler which can speed your code with minimal changes at your code!

I hope you enjoyed this article,

Marcel Caraciolo

4 comments:

  1. Welcome to Wiztech Automation - Embedded System Training in Chennai. We have knowledgeable Team for Embedded Courses handling and we also are after Job Placements offer provide once your Successful Completion of Course. We are Providing on Microcontrollers such as 8051, PIC, AVR, ARM7, ARM9, ARM11 and RTOS. Free Accommodation, Individual Focus, Best Lab facilities, 100% Practical Training and Job opportunities.

    Embedded System Training in chennai
    Embedded System Training Institute in chennai
    Embedded Training in chennai
    Embedded Course in chennai
    Best Embedded System Training in chennai
    Best Embedded System Training Institute in chennai
    Best Embedded System Training Institutes in chennai
    Embedded Training Institute in chennai
    Embedded System Course in chennai
    Best Embedded System Training in chennai

    ReplyDelete
  2. WIZTECH Automation, Anna Nagar, Chennai, has earned reputation offering the best automation training in Chennai in the field of industrial automation. Flexible timings, hands-on-experience, 100% practical. The candidates are given enhanced job oriented practical training in all major brands of PLCs (AB, Keyence, ABB, GE-FANUC, OMRON, DELTA, SIEMENS, MITSUBISHI, SCHNEIDER, and MESSUNG)

    PLC training in chennai
    Automation training in chennai
    Best plc training in chennai
    PLC SCADA training in chennai
    Process automation training in chennai
    Final year eee projects in chennai
    VLSI training in chennai

    ReplyDelete
  3. Embedded system training: Wiztech Automation Provides Excellent training in embedded system training in Chennai - IEEE Projects - Mechanical projects in Chennai. Wiztech provide 100% practical training, Individual focus, Free Accommodation, Placement for top companies. The study also includes standard microcontrollers such as Intel 8051, PIC, AVR, ARM, ARMCotex, Arduino, etc.

    Embedded system training in chennai
    Embedded Course training in chennai
    Matlab training in chennai
    Android training in chennai
    LabVIEW training in chennai
    Robotics training in chennai
    Oracle training in chennai
    Final year projects in chennai
    Mechanical projects in chennai
    ece projects in chennai

    ReplyDelete