
Non-Personalized Recommender systems with Pandas and Python

Tuesday, October 22, 2013


Hi all,

At the last PythonBrasil I gave a tutorial on Python and data analysis focused on recommender systems, the main topic I've been studying for the last few years. There is a Python package called Pandas that is popular among statisticians and data scientists. I had watched several talks and keynotes about it, but I had never tried it myself. The tutorial gave me that chance, and afterwards the audience and I felt quite excited about the potential and power that this library gives.

This post starts a series of articles that I will write about recommender systems, and it also introduces the refreshed library that I am working on: Crab, a Python library for building recommender systems. :)

The first topic in the series is non-personalized recommender systems, with several examples using the Python package Pandas. In the future I will also post an alternative version of this post that uses Crab and shows how it handles the same examples.

But first let's introduce what Pandas is.

Introduction to Pandas


Pandas is a data analysis library for Python that is great for data preparation, joining and ultimately generating well-formed, tabular data that's easy to use in a variety of visualization tools or (as we will see here) machine learning applications. For a further introduction to pandas, check this website or this notebook.

Non-personalized Recommenders


Non-personalized recommenders recommend items to consumers based on what other consumers have said about the items on average. That is, the recommendations are independent of the customer, so each customer gets the same recommendation. For example, if you go to amazon.com as an anonymous user, it shows items that are currently being viewed by other members.

Generally the output comes in two flavours: predictions or recommendations. Predictions are simple statements in the form of scores, stars or counts. Recommendations, on the other hand, are generally just a list of items shown without any number associated with them.

Let's go through an example:

Simple Prediction using Average

On a scale of 1 to 5, the book Programming Collective Intelligence scored 4.5 stars out of 5.
This is an example of a simple prediction. It displays a simple average of the other customers' reviews of the book.
The math behind it is quite simple:

Score = (65 * 5 + 18 * 4 + 7 * 3 + 4 * 2 + 2 * 1) / (65 + 18 + 7 + 4 + 2)
Score = 428 / 96
Score = 4.46 ≈ 4.5 out of 5 stars
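
As a quick check, the same average can be computed in a few lines of Python. This is only a sketch: the star counts are the ones from the example above, while the variable names and the rounding to the nearest half star are my own assumptions about how such a score might be displayed.

```python
# Star counts from the example above: {stars: number of reviews}
star_counts = {5: 65, 4: 18, 3: 7, 2: 4, 1: 2}

# Weighted sum of stars and total number of reviews
total_points = sum(stars * n for stars, n in star_counts.items())
total_reviews = sum(star_counts.values())

score = total_points / total_reviews

# Assumption: the site shows the score rounded to the nearest half star
display_score = round(score * 2) / 2

print(total_points, total_reviews, round(score, 2), display_score)
# prints: 428 96 4.46 4.5
```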

The same page also displays the other books that customers bought after buying Programming Collective Intelligence: a list of recommended books presented to anyone who visits the product's page. This is an example of a recommendation.




But how did Amazon come up with those recommendations? There are several techniques that could be applied. One is association rule mining, a data mining technique that generates a set of rules and combinations of items that were bought together. Or it could be a simple measure based on the proportion of customers who bought both x and y among those who bought x. Let's explain using some maths:




Let X be the book Programming Collective Intelligence and let Y be any other book purchased by the customers who bought X. You need to compute the ratio given below for each book Y and sort the books in descending order of that ratio. Finally, pick the top K books and show them as related. :D

Score(X, Y) =  Total Customers who purchased X and Y / Total Customers who purchased X


Using this simple score function for all the books you will get:

Python for Data Analysis          100%
Startup Playbook                  100%
MongoDB Definitive Guide            0%
Machine Learning for Hackers        0%
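
The simple ratio above can be sketched with Pandas. The purchase log below is hypothetical: I made up five customers (A to E) whose purchases are consistent with the percentages shown, and the column names `customer` and `book` are assumptions too.

```python
import pandas as pd

# Hypothetical purchase log (one row per customer/book pair), invented
# only so that the counts match the percentages in the post
purchases = pd.DataFrame(
    [
        ("A", "Programming Collective Intelligence"),
        ("A", "Python for Data Analysis"),
        ("A", "Startup Playbook"),
        ("B", "Programming Collective Intelligence"),
        ("B", "Python for Data Analysis"),
        ("B", "Startup Playbook"),
        ("C", "Python for Data Analysis"),
        ("C", "Startup Playbook"),
        ("C", "MongoDB Definitive Guide"),
        ("D", "Startup Playbook"),
        ("D", "Machine Learning for Hackers"),
        ("E", "Startup Playbook"),
        ("E", "MongoDB Definitive Guide"),
    ],
    columns=["customer", "book"],
)

# Customer x book matrix: True if the customer bought the book
bought = pd.crosstab(purchases["customer"], purchases["book"]).astype(bool)

x = "Programming Collective Intelligence"
buyers_of_x = bought[bought[x]]

# Score(X, Y) = customers who bought X and Y / customers who bought X
simple_score = buyers_of_x.drop(columns=x).mean().sort_values(ascending=False)
print(simple_score)
```

Taking the mean of a boolean column gives the fraction of True values, which is exactly the ratio we want once the rows are restricted to the buyers of X.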


As we expected, the book Python for Data Analysis makes perfect sense. But why did Startup Playbook come out on top, when it has also been purchased by customers who have not purchased Programming Collective Intelligence? This is a well-known problem in e-commerce applications called the banana trap. Let's explain: in a grocery store most customers will buy bananas. If someone buys a razor and a banana, you cannot say that the purchase of the razor influenced the purchase of the banana. Hence we need to adjust the math to handle this case as well. The modified version:

Score(X, Y) =  (Total Customers who purchased X and Y / Total Customers who purchased X) / 
         (Total Customers who did not purchase X but got Y / Total Customers who did not purchase X)

Substituting the numbers we get:

Python for Data Analysis = (2 / 2) / (1 / 3) = 1 / (1/3) = 3

Startup Playbook = (2 / 2) / (3 / 3) = 1

The denominator acts as a normalizer, and you can see that Python for Data Analysis clearly stands out. Interesting, isn't it?
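
Here is a minimal Pandas sketch of the adjusted score. As before, the purchase log is hypothetical (five invented customers whose purchases match the counts used above), and the names `customer`, `book`, `has_x` and `adjusted` are my own choices for illustration.

```python
import pandas as pd

# Hypothetical purchase log, invented to match the counts in the post:
# 2 customers bought X, 3 did not
purchases = pd.DataFrame(
    [
        ("A", "Programming Collective Intelligence"),
        ("A", "Python for Data Analysis"),
        ("A", "Startup Playbook"),
        ("B", "Programming Collective Intelligence"),
        ("B", "Python for Data Analysis"),
        ("B", "Startup Playbook"),
        ("C", "Python for Data Analysis"),
        ("C", "Startup Playbook"),
        ("C", "MongoDB Definitive Guide"),
        ("D", "Startup Playbook"),
        ("D", "Machine Learning for Hackers"),
        ("E", "Startup Playbook"),
        ("E", "MongoDB Definitive Guide"),
    ],
    columns=["customer", "book"],
)

bought = pd.crosstab(purchases["customer"], purchases["book"]).astype(bool)

x = "Programming Collective Intelligence"
has_x = bought[x]                  # True for customers who bought X
others = bought.drop(columns=x)    # every candidate book Y

# Fraction of X buyers who also bought Y, and of non-buyers who bought Y
p_y_given_x = others[has_x].mean()
p_y_given_not_x = others[~has_x].mean()

# Adjusted score: a book that everybody buys (the "banana") ends up near 1.
# Note: if nobody outside X bought Y, the division yields inf or NaN.
adjusted = (p_y_given_x / p_y_given_not_x).sort_values(ascending=False)
print(adjusted)
```

With this data Python for Data Analysis scores 3 and Startup Playbook scores 1, matching the numbers above. A real dataset would need a rule for the books that nobody outside X has bought.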

In the next article I will work more with non-personalized recommenders, presenting some ranking algorithms that I developed for Atepassar.com for ranking professors. :)

Examples with a real dataset (let's play with the CourseTalk dataset)

To present non-personalized recommenders let's play with some data. I decided to crawl the data from CourseTalk, a popular ranking site for MOOCs. It is an aggregator of several MOOCs where people can rate the courses and write reviews. The dataset is a snapshot from 10/11/2013 and is used here only for study purposes.



Let's use Pandas to read all the data, show what we can do with Python, and present a list of top courses ranked by some non-personalized metrics :)
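
To give a rough idea of what such a ranking could look like, here is a small sketch using the simple average from earlier. It does not use the CourseTalk data: the ratings, the course names, the column names `course` and `rating`, and the minimum number of reviews are all hypothetical and only there to show the Pandas calls.

```python
import pandas as pd

# Hypothetical ratings, NOT the CourseTalk dataset
ratings = pd.DataFrame(
    {
        "course": ["Course A", "Course A", "Course A",
                   "Course B",
                   "Course C", "Course C"],
        "rating": [5, 4, 5, 5, 3, 4],
    }
)

# Average rating and number of reviews per course
stats = ratings.groupby("course")["rating"].agg(["mean", "count"])

# Assumption: ignore courses with too few reviews, since a single
# 5-star review would otherwise put a course at the top
min_reviews = 2
top = stats[stats["count"] >= min_reviews].sort_values("mean", ascending=False)
print(top)
```

Every visitor would see the same ordered list, which is what makes the ranking non-personalized.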

Update: For better analysis I hosted all the code in an IPython Notebook, available at the following link through nbviewer.

All the datasets and source code will be provided at Crab's GitHub. The idea is to work on those notebooks towards a future book about recommender systems :)

I hope you enjoyed this article, and stay tuned for the next one about another type of non-personalized recommender: ranking algorithms for vote up/vote down systems!

Special thanks to Diego Manillof for his tutorial :)

Cheers,

Marcel Caraciolo

44 comments:

  1. Great article, mate. Can't wait for next part!
    Good luck

  2. Great post, Marcel.

    I've been using pandas for a while now, it's really great for data management. The only downside is that pandas has limited out-of-core capabilities. My dataset is ~200GB big and I have to use a high-performance cluster to be able to use it with pandas. But apparently Wes McKinney is working on that (see his last post: http://wesmckinney.com/blog/?p=697).
