## Friday, March 22, 2013

Hi all,

Over the last few weeks I've been working on a little project called benchy. Its goal is to answer some simple questions: which code is faster? Which algorithm consumes more memory? I know there are several tools suited to this task, but I wanted to build my own performance reports using Python.

Why did I create it? Since the beginning of the year I have been rewriting all the code of Crab, a Python framework for building recommender systems. One of the main components that needed refactoring was the pairwise metrics, such as cosine, Pearson, Euclidean, etc. I needed to test the performance of several versions of those functions. But doing this manually? It's boring. That's why benchy came along!

What can benchy do?

Benchy is a lightweight Python library for running performance benchmarks over alternative versions of code. How can we use it?

Let's take the cosine function, a popular pairwise function for measuring the similarity between two vectors or matrices in recommender systems.
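As a refresher, the cosine similarity of two vectors is their dot product divided by the product of their norms, and the cosine distance is one minus that similarity. A minimal pure-Python sketch (the function names here are illustrative and not taken from benchy or Crab):

```python
import math


def cosine_similarity(x, y):
    """Return x.y / (|x| * |y|) for two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def cosine_distance(x, y):
    """Cosine distance: 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(x, y)


print(cosine_similarity([1, 0], [0, 1]))      # orthogonal vectors: 0.0
print(cosine_distance([1, 2, 3], [2, 4, 6]))  # parallel vectors: (almost) 0.0
```

Zero vectors are not handled (they would divide by zero); the benchmarks below draw values between 1 and 5, so that case does not arise there.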

Let's define the benchmarks to test:
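The benchmark definitions themselves are not reproduced in this post. Judging from the report further down, each one bundles a name, a setup string and a statement string. As a hedged stand-in in modern Python 3 (the `BenchmarkSpec` class and the `*_SETUP` names are hypothetical, not benchy's API), two of them could be described like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical description of one benchmark (not benchy's real class)."""
    name: str       # label shown in reports, e.g. "numpy"
    setup: str      # code run before timing: imports, test data, helpers
    statement: str  # the code whose runtime and memory are measured


# Setup shared by every variant, copied from the report below.
COMMON_SETUP = """
import numpy
X = numpy.random.uniform(1, 5, (1000,))
"""

NUMPY_SETUP = COMMON_SETUP + """
import math
def cosine_distances(X, Y):
    return 1. - numpy.dot(X, Y) / (math.sqrt(numpy.dot(X, X)) *
                                   math.sqrt(numpy.dot(Y, Y)))
"""

SCIPY_SETUP = COMMON_SETUP + """
import scipy.spatial.distance as ssd
X = X.reshape(-1, 1)
def cosine_distances(X, Y):
    return 1. - ssd.cdist(X, Y, 'cosine')
"""

benchmarks = [
    BenchmarkSpec("numpy", NUMPY_SETUP, "cosine_distances(X, X)"),
    BenchmarkSpec("scipy.spatial", SCIPY_SETUP, "cosine_distances(X, X)"),
]
```

Keeping setup and statement as plain strings means they can be handed unchanged to a timer such as the standard library's `timeit`.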

With all the benchmarks created, we can test a single benchmark by calling its run method:
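The run call itself is not shown here. To give an idea of what a runtime measurement involves, here is a sketch built on the standard library's `timeit` rather than on benchy (`run_runtime` is a hypothetical name). Like `python -m timeit`, it grows the loop count in powers of ten until one batch takes at least 0.2 seconds, which is consistent with the powers of ten in the loops column of the reports below, and it keeps the best of three repeats. benchy's report describes its timing as an average, so its numbers may be computed differently:

```python
import timeit


def run_runtime(statement, setup="pass", repeat=3, min_total=0.2):
    """Time `statement` and return a dict shaped like a 'runtime' result.

    The loop count grows in powers of ten until one batch of loops takes
    at least `min_total` seconds, as `python -m timeit` does.
    """
    timer = timeit.Timer(statement, setup)
    loops = 1
    while timer.timeit(number=loops) < min_total:
        loops *= 10
    # Each entry is the total time, in seconds, of `loops` executions.
    totals = timer.repeat(repeat=repeat, number=loops)
    best_ms = min(totals) / loops * 1e3  # best per-loop time in milliseconds
    return {"repeat": repeat, "timing": best_ms, "loops": loops, "units": "ms"}


print(run_runtime("sorted(data)", setup="data = list(range(1000, 0, -1))"))
```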

The dict under the key 'memory' holds the memory results: the number of times the statement was repeated and the average memory consumption, in the given units. Likewise, the key 'runtime' holds the timing results: the number of repeats and the average time to execute the statement, in the given units.
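Memory can be measured in several ways, and the post does not say which one benchy uses. As an illustrative stand-in, the sketch below uses the standard library's `tracemalloc` to record the peak memory allocated while a statement runs and averages it over the repeats (`run_memory` and the `'usage'` key are hypothetical names). `tracemalloc` only sees allocations that go through Python's traced allocators, so a tool that reads the process's memory will report different numbers:

```python
import tracemalloc


def run_memory(statement, setup="pass", repeat=3):
    """Return a dict shaped like a 'memory' result: average peak allocation.

    Stand-in based on tracemalloc, not benchy's own measurement.
    """
    namespace = {}
    exec(setup, namespace)  # run the setup once, outside the measurement
    code = compile(statement, "<statement>", "exec")
    peaks = []
    for _ in range(repeat):
        tracemalloc.start()
        exec(code, namespace)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        peaks.append(peak)
    average_mb = sum(peaks) / len(peaks) / 2**20  # bytes -> MiB
    return {"repeat": repeat, "usage": average_mb, "units": "MB"}


print(run_memory("data = list(range(100000))"))
```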

Want a more presentable output? Call the method to_rst with the results as its parameter:

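Benchmark setup
```python
import numpy
X = numpy.random.uniform(1,5,(1000,))

import scipy.spatial.distance as ssd
# X becomes a (1000, 1) column, so cdist returns a (1000, 1000) matrix
X = X.reshape(-1,1)
def cosine_distances(X, Y):
    # cdist(..., 'cosine') returns distances, so this yields similarities
    return 1. - ssd.cdist(X, Y, 'cosine')
```
Benchmark statement
```python
cosine_distances(X, X)
```

| name | repeat | timing | loops | units |
|---|---|---|---|---|
| scipy.spatial 0.8.0 | 3 | 18.36 | 10 | ms |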
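That report is reStructuredText rendered as HTML. If you want a similar plain-text table from your own results, an RST simple table takes only a few lines of Python; `results_to_rst` is a hypothetical helper, not benchy's to_rst:

```python
def results_to_rst(rows, headers=("name", "repeat", "timing", "loops", "units")):
    """Render rows of values as a reStructuredText simple table."""
    cells = [list(map(str, headers))] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]
    border = "  ".join("=" * w for w in widths)

    def fmt(row):
        # Columns are separated by two spaces, as RST simple tables require.
        return "  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip()

    lines = [border, fmt(cells[0]), border]
    lines += [fmt(row) for row in cells[1:]]
    lines.append(border)
    return "\n".join(lines)


print(results_to_rst([("scipy.spatial 0.8.0", 3, 18.36, 10, "ms")]))
```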

Now let's check which one is faster and which one consumes less memory. Let's create a BenchmarkSuite, which acts as a container for benchmarks:
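The suite code is not shown here either. Conceptually, a suite is an ordered collection of named benchmarks; a minimal hypothetical stand-in (not benchy's BenchmarkSuite) could look like this:

```python
class Suite:
    """Hypothetical ordered container of named benchmarks."""

    def __init__(self):
        self._benchmarks = {}  # name -> (setup, statement); dicts keep insertion order

    def add(self, name, setup, statement):
        """Register a benchmark, rejecting duplicate names."""
        if name in self._benchmarks:
            raise ValueError(f"duplicate benchmark name: {name!r}")
        self._benchmarks[name] = (setup, statement)

    def __iter__(self):
        """Yield (name, setup, statement) tuples in insertion order."""
        for name, (setup, statement) in self._benchmarks.items():
            yield name, setup, statement

    def __len__(self):
        return len(self._benchmarks)


suite = Suite()
suite.add("builtin sum", "data = list(range(1000))", "sum(data)")
suite.add("python loop", "data = list(range(1000))",
          "total = 0\nfor v in data:\n    total += v")
print([name for name, _, _ in suite])  # ['builtin sum', 'python loop']
```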

Finally, let's run all the benchmarks together with the BenchmarkRunner. This class loads all the benchmarks from the suite, runs each individual analysis, and prints out useful reports:
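At its core, a runner times every benchmark in a suite and expresses each timing relative to the fastest one. The sketch below does that with `timeit` and a fixed loop count; `run_all` is a hypothetical name, not benchy's BenchmarkRunner, and it accepts any iterable of (name, setup, statement) tuples:

```python
import timeit


def run_all(benchmarks, number=1000, repeat=3):
    """Time each (name, setup, statement) and report it relative to the fastest."""
    timings = {}
    for name, setup, statement in benchmarks:
        totals = timeit.Timer(statement, setup).repeat(repeat=repeat, number=number)
        timings[name] = min(totals) / number * 1e3  # best per-loop time, in ms
    fastest = min(timings.values())
    report = []
    for name, ms in sorted(timings.items(), key=lambda item: item[1]):
        report.append((name, ms, ms / fastest))  # a ratio of 1.0 marks the baseline
    return report


benchmarks = [
    ("sum builtin", "data = list(range(1000))", "sum(data)"),
    ("python loop", "data = list(range(1000))",
     "total = 0\nfor v in data:\n    total += v"),
]
for name, ms, ratio in run_all(benchmarks):
    print(f"{name:12s} {ms:.5f} ms  {ratio:.2f}x")
```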

Next, we will plot the relative timings, which show how each benchmark compares to the reference one. Just call the method plot_relative:

As you can see in the graph above, the scipy.spatial.distance function is 2129x slower and the sklearn approach is 19x slower. The best one is the numpy approach. Now let's look at the absolute timings. Just call the method plot_absolute:

Besides the bars representing the timings, you may notice the line plot representing the memory consumption of each statement. The approach that consumes the least memory is nltk.cluster!

Finally, benchy also provides a full report for all benchmarks through the method to_rst:

## Performance Benchmarks

These historical benchmark graphs were produced with benchy on a machine with:

- Intel Core i5 950 processor
- Mac OS X 10.6
- Python 2.6.5 64-bit
- NumPy 1.6.1

### scipy.spatial 0.8.0

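Benchmark setup
```python
import numpy
X = numpy.random.uniform(1,5,(1000,))

import scipy.spatial.distance as ssd
# X becomes a (1000, 1) column, so cdist returns a (1000, 1000) matrix
X = X.reshape(-1,1)
def cosine_distances(X, Y):
    # cdist(..., 'cosine') returns distances, so this yields similarities
    return 1. - ssd.cdist(X, Y, 'cosine')
```
Benchmark statement
```python
cosine_distances(X, X)
```

| name | repeat | timing | loops | units |
|---|---|---|---|---|
| scipy.spatial 0.8.0 | 3 | 19.19 | 10 | ms |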

### sklearn 0.13.1

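Benchmark setup
```python
import numpy
X = numpy.random.uniform(1,5,(1000,))

# aliased to the common name; note that it returns similarities
from sklearn.metrics.pairwise import cosine_similarity as cosine_distances
```
Benchmark statement
```python
cosine_distances(X, X)
```

| name | repeat | timing | loops | units |
|---|---|---|---|---|
| sklearn 0.13.1 | 3 | 0.1812 | 1000 | ms |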

### nltk.cluster

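Benchmark setup
```python
import numpy
X = numpy.random.uniform(1,5,(1000,))

from nltk import cluster
def cosine_distances(X, Y):
    # nltk's cosine_distance is 1 - cosine similarity, so this is the similarity
    return 1. - cluster.util.cosine_distance(X, Y)
```
Benchmark statement
```python
cosine_distances(X, X)
```

| name | repeat | timing | loops | units |
|---|---|---|---|---|
| nltk.cluster | 3 | 0.01024 | 1e+04 | ms |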

### numpy

Benchmark setup
```python
import numpy
X = numpy.random.uniform(1,5,(1000,))

import numpy, math
def cosine_distances(X, Y):
    # 1 - cosine similarity of the 1-D vectors X and Y
    return 1. - numpy.dot(X, Y) / (math.sqrt(numpy.dot(X, X)) *
                                   math.sqrt(numpy.dot(Y, Y)))
```
Benchmark statement
```python
cosine_distances(X, X)
```

| name | repeat | timing | loops | units |
|---|---|---|---|---|
| numpy | 3 | 0.009339 | 1e+05 | ms |

### Final Results

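| name | repeat | timing | loops | units | time vs. baseline |
|---|---|---|---|---|---|
| scipy.spatial 0.8.0 | 3 | 19.19 | 10 | ms | 2055 |
| sklearn 0.13.1 | 3 | 0.1812 | 1000 | ms | 19.41 |
| nltk.cluster | 3 | 0.01024 | 1e+04 | ms | 1.097 |
| numpy | 3 | 0.009339 | 1e+05 | ms | 1 |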
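The last column is each statement's timing divided by the timing of the fastest one (numpy), so the baseline itself scores 1. Recomputing it from the rounded timings in the table reproduces the column up to rounding (for example about 19.40 instead of 19.41 for sklearn):

```python
# Per-loop timings in ms, copied from the "Final Results" table above.
timings_ms = {
    "scipy.spatial 0.8.0": 19.19,
    "sklearn 0.13.1": 0.1812,
    "nltk.cluster": 0.01024,
    "numpy": 0.009339,
}

baseline = min(timings_ms.values())  # the fastest statement (numpy) is the reference
relative = {name: t / baseline for name, t in timings_ms.items()}

for name, ratio in relative.items():
    print(f"{name:20s} {ratio:9.2f}")
```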

Final code!

I should say this micro-project is still a prototype; however, I tried to make it easily extensible. I have several ideas for extending it, but feel free to fork it and send suggestions and bug fixes. This project was inspired by vbench, an open-source framework for running performance benchmarks over your source repository's history. I recommend it!

As for me, benchy will help me test several alternative pairwise functions in Crab. :) Soon I will publish the performance results we got for the pairwise functions we built for Crab. :)

I hope you enjoyed it,

Regards,

Marcel Caraciolo

1. Thank you for your comparison of several ways to calculate cosine similarity, it saved me time!
