Hi all,
Let's continue our series of posts about High Performance with Python. In the last post I showed how to analyze your code using Python profiling. If you missed the first part, please check this link. To sum up, our goal is to present several techniques that make CPU-demanding tasks in Python run much faster.
The techniques that will be covered:
- Python Profiling - How to find bottlenecks
- Cython - Annotate your code and compile to C
- Numpy Vectors - Fast vector operations using numpy arrays
- Numpy integration with Cython - fast numerical Python library wrapped by Cython
- PyPy - Python's new Just in Time Compiler
In this post I will talk about Cython and how to compile your code to C with this powerful tool!
The Problem
In this series we analyze how to optimize Spearman's rank correlation coefficient, a statistical measure used to compute the similarity between items in recommender systems. It assesses how well the relationship between two variables can be described using a monotonic function. The source code for this metric can be found in the last post.
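To make the metric concrete, here is a minimal pure-Python sketch of the coefficient, using the same formula as the function optimized in this post: 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of each item. The names rank_positions and spearman_rho are hypothetical and exist only for this illustration, and the sketch assumes there are no tied scores.

```python
def rank_positions(scores):
    """Map each item index to its rank (0 = highest score).

    Assumes no tied scores; ties would need averaged ranks.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {item: rank for rank, item in enumerate(order)}


def spearman_rho(scores1, scores2):
    """Spearman's rank correlation for two equal-length score lists."""
    if len(scores1) != len(scores2):
        raise ValueError("both score lists must have the same length")
    n = len(scores1)
    if n < 2:
        # The coefficient is undefined for fewer than two items;
        # return 0.0, the same convention used later in this post.
        return 0.0
    ranks1 = rank_positions(scores1)
    ranks2 = rank_positions(scores2)
    # Sum of squared rank differences over all items.
    d_squared = sum((ranks1[i] - ranks2[i]) ** 2 for i in range(n))
    return 1 - 6.0 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    print(spearman_rho([5, 3, 1], [9, 6, 2]))  # same ordering
    print(spearman_rho([5, 3, 1], [2, 6, 9]))  # reversed ordering
```

Two lists that order the items identically give 1.0, and two lists that order them in exactly opposite ways give -1.0.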
Cython
Cython is a Python extension that lets developers annotate functions so they can be compiled to C. It takes a little time to develop but typically gives a nice speed-up. If you're just starting with Cython, I recommend checking this tutorial; it is quite useful for beginners.
In our previous example we decided to optimize the function spearman_correlation. So first we start a new module called spearman_correlation_cython.py and move the spearman_correlation function into it. In the original source you have to import spearman_correlation_cython and replace the reference to spearman_correlation(...) with spearman_correlation_cython.spearman_correlation(...).
So the code for your spearman_correlation_cython.py now is:
def spearman_correlation(ranks1, ranks2):
    """Returns the Spearman correlation coefficient for two rankings, which
    should be dicts or sequences of (key, rank). The coefficient ranges from
    -1.0 (ranks are opposite) to 1.0 (ranks are identical), and is only
    calculated for keys in both rankings (for meaningful results, remove keys
    present in only one list before ranking)."""
    n = 0
    res = 0
    ranks1 = sorted(ranks1, key=lambda k: -k[1])
    ranks2 = sorted(ranks2, key=lambda k: -k[1])
    ranks1 = [(t[0], ix) for ix, t in enumerate(ranks1)]
    ranks2 = [(t[0], ix) for ix, t in enumerate(ranks2)]
    for k, d in _rank_dists(ranks1, ranks2):
        res += d * d
        n += 1
    try:
        return 1 - (6 * float(res) / (n * (n * n - 1)))
    except ZeroDivisionError:
        # Result is undefined if only one item is ranked
        return 0.0
Next, rename spearman_correlation_cython.py to spearman_correlation_cython.pyx. Cython uses the .pyx extension to indicate a file that it will compile to C. Also add a new setup.py with the following contents:
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# for notes on compiler flags see:
# http://docs.python.org/install/index.html
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("spearman_correlation_cython", ["spearman_correlation_cython.pyx"])]
)
Now run the command:
$ python setup.py build_ext --inplace
This command runs the setup.py script that we just created, calling its build_ext command. The new module is built in place in the current directory, where you will see a newly generated spearman_correlation_cython.so file.
Run the new code using python cython_spearman 199999 and you will see a slight improvement in the speed of the calculation (still very minor!). You can see how well the slower Python calls are being replaced with faster Cython calls by using:
$ cython -a rank_dists.pyx
It will generate a rank_dists.html file. If you open it in your browser, you will see something like:
Result of cython -a spearman_correlation_cython
The workflow from here is incremental. Each type you add with Cython may improve the generated code. When it does, the dark yellow lines turn lighter and eventually white, which means that line needs no further improvement: it is faster! If you want to dig deeper, double-click any yellow line to expand it and see the C Python API calls that it makes.
Double-clicking one of the yellow lines in the HTML shows the C Python API calls
You can also add annotations. If you add type definitions (such as cdef int or cdef double) and rerun the cython -a ... command, you can watch the amount of yellow shrink in your browser. Don't forget to recompile using the setup.py command and confirm that the result is slightly faster!
Added some Cython types to the source code.
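One way to confirm a speed-up after each recompilation is to time the function before and after. The following is a small sketch using the standard timeit module; best_time and sum_squared_diffs are hypothetical names introduced only for this illustration, and sum_squared_diffs is a stand-in workload rather than the code from this series.

```python
import timeit


def best_time(func, *args, repeat=3, number=10):
    """Return the best average time (in seconds) of one call to func(*args)."""
    timer = timeit.Timer(lambda: func(*args))
    # Each entry of repeat() is the total time of `number` calls;
    # the minimum is the measurement least disturbed by other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def sum_squared_diffs(xs, ys):
    """Hypothetical stand-in workload: sum of squared differences."""
    total = 0
    for x, y in zip(xs, ys):
        d = x - y
        total += d * d
    return total


if __name__ == "__main__":
    xs = list(range(100000))
    ys = xs[::-1]
    print("%.6f s per call" % best_time(sum_squared_diffs, xs, ys))
```

To compare versions, pass the pure Python spearman_correlation and the compiled spearman_correlation_cython.spearman_correlation to the same helper with identical arguments.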
Cython Compiler Directives
You can also set several of the compiler directives that come with Cython. To enable them, you can use a comment at the top of the file, change setup.py, or decorate individual functions.
Using a comment at the top of the file:
#cython: boundscheck=False
Using a decorator:
import cython
...
@cython.boundscheck(False) # turn off boundscheck for this function
def f():
    ...
    with cython.boundscheck(True): # turn it temporarily on again for this block
        ...
Using setup.py:
ext_modules = [Extension("spam", ["spam.pyx"]),
               Extension("ham", ["ham.pyx"])]
for e in ext_modules:
    e.pyrex_directives = {"boundscheck": False}
One of the most used directives is profile, which makes Cython code visible to cProfile. It gives you the same kind of output as running cProfile on a normal Python module. Another common directive is boundscheck. Setting it to False disables out-of-bounds index checking on buffered arrays (common with numpy arrays), so since the code no longer checks for IndexError exceptions it runs faster. Remember that with the check disabled, an indexing mistake can end in a segmentation fault instead of an exception. So be careful when you decide to turn boundscheck off: be sure that the code is working correctly as you planned. There is also infer_types, which lets Cython guess the types of variables.
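For reference, this is roughly what profiling looks like on a plain Python function using the standard cProfile and pstats modules. With the profile directive enabled in a .pyx file, the same calls should also report the compiled functions. The names squared_distance_sum and profile_call are hypothetical and serve only this sketch.

```python
import cProfile
import io
import pstats


def squared_distance_sum(pairs):
    """Hypothetical workload: sum of squared differences of each pair."""
    total = 0
    for a, b in pairs:
        d = a - b
        total += d * d
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    pairs = [(i, 2 * i) for i in range(1000)]
    result, report = profile_call(squared_distance_sum, pairs)
    print(report)
```

The report lists call counts and times per function, which is the same kind of information used in the previous post to find the bottleneck.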
Cython directly with C
But you may be asking yourself whether it is possible to wrap your existing C libraries with Cython. Yes, it is! Cython uses external declarations to declare the C functions and variables from the library that you want to use. So let's see a quick example:
Let's consider a simple factorial function written in C that we want to wrap and call from Python/Cython:
fatorialEx.c
#include <stdio.h>

int fatorial(int n)
{
    /* Base case covers 0 and 1, so fatorial(0) does not recurse forever */
    if (n <= 1) {
        return 1;
    }
    return fatorial(n - 1) * n;
}
Now, you have to wrap it in your fatorial.pyx module:
fatorial.pyx
cdef extern from "fatorialEx.c":
    int fatorial(int n)

def fat(n):
    return fatorial(n)
Note the cdef extern line: it is how Cython knows which external declarations to include. Finally, create your setup.py module to build the extension:
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# for notes on compiler flags see:
# http://docs.python.org/install/index.html
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("fatorial", ["fatorial.pyx"])]
)
Build it:
$ python setup.py build_ext --inplace
You will see the module fatorial.so in the directory; this is the file that you will now import from Python. So in the Python console, type the following commands to test it:
>>> from fatorial import fat
>>> fat(5)
120
It works! I am providing the source for this example. For further information about writing your own extension, check the Cython docs.
That's all, I hope you enjoyed it!
Regards,
Marcel Caraciolo