Recommender Engines and Data mining with Python at PythonBrasil Conference

Thursday, October 28, 2010

Hi all,

Last week I was at an important event in Curitiba called PythonBrasil. It is an annual event that brings together Brazilian developers to discuss technology and, of course, the Python programming language.

I also had the opportunity to give three presentations on topics that interest me. The official presentation was about recommendation engines with Python. It showed how developers can use Python to build recommendation engines, with several examples and an explanation of the main concepts behind the subject. The best part was the demos, where I used real data from the web, such as Twitter, to suggest users similar to me among the PythonBrasil followers. The other example was based on collective buying: I crawled some popular Brazilian web sites and gathered real offers in Curitiba. The main idea is to recommend new offers according to my interests and to what people similar to me also liked. It is the classic example of collaborative filtering, commonly used today by several e-commerce sites, including Amazon.
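
To make the collaborative filtering idea concrete, here is a minimal sketch of the user-based variant. It is not the code from the talk and it is not Crab's API: the ratings, the user and offer names, and the functions cosine_similarity and recommend are all made up for illustration.

```python
from math import sqrt

# Hypothetical ratings (illustration only): user -> {offer: rating from 1 to 5}.
ratings = {
    "marcel": {"sushi": 5, "spa": 1, "pizza": 4},
    "ana": {"sushi": 4, "spa": 2, "pizza": 5, "cinema": 5},
    "bruno": {"sushi": 1, "spa": 5, "cinema": 2, "gym": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the offers both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank offers the user has not rated by similarity-weighted ratings."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    # Weighted average of the neighbours' ratings for each unseen offer.
    scored = sorted(((totals[i] / weights[i], i) for i in totals), reverse=True)
    return [(item, round(score, 2)) for score, item in scored[:top_n]]


print(recommend("marcel", ratings))
```

With this toy data, "cinema" ranks ahead of "gym" for marcel, because ana, whose tastes are closest to his, rated it highly.
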
The presentation went great, with lots of feedback. If you want to take a look at the slides (they are in Portuguese), you can find them here:

The main contribution of this work is a new library for building recommendation engines in Python, called Crab. Together with some colleagues, I decided to develop this library as a powerful tool that lets developers use Python as the main language to build and use classical recommender algorithms in their applications. It is also extensible, so developers can easily add new algorithms to the engine. There is also an easy API, so users can plug it into their web apps running on different platforms such as Django, AppEngine, Web2Py, etc.

If you want to collaborate or are interested in this subject, please feel free to join us in this work. The project is hosted on GitHub at this link:

My second presentation was a lightning talk. What's that? It is an extra category of presentations at PythonBrasil where you have 5 minutes to speak about any topic you want. The one rule is 5 minutes, no more! I took it as a challenge, so one day before the presentation I developed a web crawler to scrape all the lectures submitted and approved at the PythonBrasil conference. With all this data in my hands, I decided to run an analysis to answer three questions I had in mind:

a) What are the main and most frequent topics presented in the keynotes at PythonBrasil?

b) Based on this information, how could we organize the speakers by those topics? That is, how could we group speakers with similar topics? It is a classic clustering problem.

c) What other information could we extract, such as the level of expertise of the lectures, the total time spent in the lectures, etc.?
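
Question (a) boils down to counting terms. A minimal sketch, assuming the crawler has already produced a list of talk titles: the titles, the stopword list and the top_terms function below are invented for illustration and are not the actual crawled data or code.

```python
import re
from collections import Counter

# Hypothetical talk titles standing in for the crawled data.
titles = [
    "Building web apps with Django",
    "Django for startups",
    "Data mining with Python",
    "Recommendation engines and data mining",
]

# A tiny stopword list, just enough for this example.
STOPWORDS = {"with", "for", "and", "the", "a", "of", "in"}


def top_terms(titles, n=3):
    """Count how often each non-stopword term appears across the titles."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z0-9]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


print(top_terms(titles))
```

On these four titles, the three most frequent terms are "django", "data" and "mining", each appearing twice.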

To answer all those questions I decided to use Python, Matplotlib and Ubigraph (a 3D visualization tool for graphs). It was really interesting because I could actually find some groups based on similar interests. The main subjects were Entrepreneurship, Hardware, Web, Design Patterns, Data Mining, Django and Artificial Intelligence.
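
The grouping from question (b) can be sketched with a clustering algorithm such as K-means. This is only an illustration under stated assumptions: each speaker is a vector saying whether their talks mention a subject, the speaker names and vectors are made up, and the farthest-point initialization is my choice to keep the example reproducible, not necessarily what the talk used.

```python
# Subjects used as vector dimensions (taken from the list above).
SUBJECTS = ["Django", "Web", "Data Mining", "Artificial Intelligence"]

# Hypothetical speakers: 1 means the speaker's talks mention that subject.
speakers = {
    "speaker_a": [1, 1, 0, 0],
    "speaker_b": [1, 0, 0, 0],
    "speaker_c": [0, 0, 1, 1],
    "speaker_d": [0, 0, 1, 0],
}


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Plain K-means; returns one cluster label per point."""
    # Deterministic start: first point, then the point farthest from
    # the centroids chosen so far (an assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [sq_dist(p, c) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(list(speakers.values()), k=2)
print(dict(zip(speakers, labels)))
```

Here speaker_a and speaker_b (web and Django) end up in one cluster, and speaker_c and speaker_d (data mining and AI) in the other.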

With those subjects I could now group the speakers using a simple clustering algorithm such as K-means and organize them based on what their topics were. I've recorded a short video presenting the result with Ubigraph. Take a look:

You can see the presentation, in Portuguese, here:

In the end, I think the event was awesome, with some great keynotes and, of course, lots of new contacts in my network. I have to say it is a great opportunity to meet great people and share ideas and technology!

Next year it will be in São Paulo, Brazil. I expect to be there!

Best regards,

Marcel Caraciolo

My new experiment: TweetTalk : A Twitter Post Chatter ;D Update tweets from your Google Talk Account!

Saturday, October 2, 2010

Hi all,

During my recent studies of chatter bots, I was inspired to build a new one, this time integrating Twitter and the Google Talk IM service. Its name is TweetTalk. What's the catch?

TweetTalk is a Jabber IM bot for anyone who wants to quickly post an update to their Twitter account. Instead of going to a separate client, from anywhere you can access Google Talk (webmail or desktop client) you can just write your tweet, and the bot will be responsible for sending the post to Twitter.

Let's see it in action:

The tweet status on Twitter:

As you can see, it is now really fast for me to send a tweet from my Gmail webmail. It is a simple experiment showing how this type of bot can improve your daily life and work.

If you want to try it, please add it to your contacts:

By the way, I am using Python to develop the main logic of the bot. For web communication I used Django + Google AppEngine + the Twitter API, and for the bot infrastructure, the Imified API.
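
The core of such a bot is small: take the text of an incoming chat message, check it, and hand it over to Twitter. Below is a rough sketch of that logic only. It is not TweetTalk's actual code, and it deliberately leaves out the Imified and Twitter API calls: handle_chat_message and the post_status callable are hypothetical names, with post_status standing in for whatever function really publishes the tweet.

```python
MAX_TWEET_LENGTH = 140  # Twitter's character limit at the time of writing


def handle_chat_message(text, post_status):
    """Turn one incoming chat message into a tweet and return the bot's reply.

    `post_status` is a hypothetical callable that publishes the text to
    Twitter; a real deployment would wrap a Twitter API client in it.
    """
    tweet = text.strip()
    if not tweet:
        return "Nothing to tweet: the message was empty."
    if len(tweet) > MAX_TWEET_LENGTH:
        extra = len(tweet) - MAX_TWEET_LENGTH
        return "Too long by %d characters, please shorten it." % extra
    post_status(tweet)
    return "Tweet posted: %s" % tweet


# Usage with a fake poster that only records what would have been sent.
sent = []
reply = handle_chat_message("Hello from Google Talk!", sent.append)
print(reply)
```

Passing the poster in as an argument keeps the message handling testable without touching the network; the web layer (Django on AppEngine in my case) would only need to call this function and return its reply to the chat.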

I am writing some new articles about performance evaluation, recommendation engines, REST APIs and SVM with Keyword/Term Extraction. 

Stay tuned!

Marcel Caraciolo