
Free Online Book for Introducing Data Mining!

Friday, December 31, 2010



For anyone who wants to get started with data mining, I highly recommend this book, which is available online for free. I found it last week, and one of the features that caught my attention was its structure and visualization: the table of contents is actually a Data Mining Map.




The book is practical and best suited for people who want an easy introduction to data mining. It has plenty of pictures, tables, definitions and exercises. The project was created by the Department of Chemical Engineering and Applied Chemistry at the University of Toronto.

Great work and a great find!

Marcel Caraciolo

Developing a Computer-Assisted Twitter User. Meet @caocurseiro!

Wednesday, December 29, 2010

Hi all,

It has been a while since my last post, because I have been working on some new stuff related to social recommendation and on my master's thesis. This post is about a recent project that I developed for a Brazilian social network called AtePassar. AtePassar is a social network for people who want to apply for positions in the Brazilian civil (government) service. One of its great features is that people can share what they are studying and meet others from all around Brazil who have the same interests or who will take the same exam. Can you imagine the possibilities?

AtePassar Main Page

It is a social network that brings students into a virtual space with several kinds of relationships: friends, study partners and even exam partners. All the network services are free for the users, and there is also an e-commerce store where users can buy study resources such as video lectures, papers and books. My current job there is to bring intelligence to the social network, specifically collective intelligence. Right now I am working on a big project, a social recommender for AtePassar (I will soon post here explaining how I built it), and on small projects to help the social network attract new users.

One of those small projects that I would like to talk about is a new Twitter bot that tries to simulate a human by controlling all the actions of a Twitter account. Let me explain more about it. The goal is not to develop a spam bot or a bot that only talks about a fixed set of terms. I imagined it as a bot that posts updates about open public positions in Brazil, motivational quotes to encourage people, and simple mentions related to AtePassar. It would also retweet tweets from other users who talk about open public exams in Brazil, send mentions to people interested in those subjects to introduce them to the AtePassar social network, and even follow those users.

I created the bot @caocurseiro, named after the mascot of the social network. It is written in Python and hosted on Google App Engine. What does it do?
  • Posts 1 to n updates at random times within an interval defined by the administrator.
  • Sends 1 to n mentions to new users (@newuser) at random times within an interval defined by the administrator.
  • Follows 1 to n new users at random times within an interval defined by the administrator.
  • Retweets 1 to n tweets at random times within an interval defined by the administrator.
Of course, since Twitter's policy is against all kinds of spam, I developed this bot to obey the rules and the rate limits established by Twitter. Another feature is that he even sleeps during the night (Brazilian local time) for a random period. I am still thinking about how to improve it by answering mentions directed at him and by avoiding the repeated posts that sometimes show up.
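
The bot's code is not published, so what follows is only a minimal sketch of how the behavior described above could be planned: each action runs 1 to n times at random moments inside an administrator-defined interval, and nothing is scheduled during the night. The action names, the `plan_actions` and `is_sleeping` helpers and the fixed 23:00 to 07:00 window are all assumptions made for illustration (the real bot sleeps for a random period), and rate-limit handling is left out.

```python
import random
from datetime import datetime, timedelta

# Hypothetical names and values; the post does not give the real settings.
ACTIONS = ("post_update", "send_mention", "follow_user", "retweet")
SLEEP_START_HOUR = 23  # assumed start of the night window (local time)
SLEEP_END_HOUR = 7     # assumed end of the night window


def is_sleeping(moment):
    """Return True if the moment falls inside the assumed night window."""
    return moment.hour >= SLEEP_START_HOUR or moment.hour < SLEEP_END_HOUR


def plan_actions(start, interval_minutes, max_per_action, rng):
    """Plan 1..max_per_action runs of each action at random times.

    Every run gets a random offset inside the interval; runs that would
    fall in the night window are dropped instead of being rescheduled.
    """
    plan = []
    for action in ACTIONS:
        for _ in range(rng.randint(1, max_per_action)):
            offset = rng.uniform(0, interval_minutes)
            when = start + timedelta(minutes=offset)
            if not is_sleeping(when):
                plan.append((when, action))
    return sorted(plan)


if __name__ == "__main__":
    schedule = plan_actions(datetime(2010, 12, 29, 10, 0), 60, 3, random.Random())
    for when, action in schedule:
        print(when.strftime("%H:%M"), action)
```

Passing the random generator in as `rng` keeps the plan reproducible when a seed is fixed, which makes this kind of scheduler easier to test.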

Our mascot has been online for about two days; he is already following 42 users and is followed by 25. Once I add a new retweet rule, under which he retweets other users' posts about open public exams and related subjects, I think the quality of the bot will improve a lot.
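
A retweet rule like this could be as simple as a keyword filter. The sketch below is one possible shape for it, not the bot's actual rule: the keyword list and the `should_retweet` helper are hypothetical, and a real rule would probably need more than substring matching.

```python
# Hypothetical keyword list; the post does not say which terms the rule uses.
EXAM_KEYWORDS = ("concurso", "edital", "vagas abertas")


def should_retweet(tweet_text, already_retweeted):
    """Return True for an unseen tweet that mentions an exam keyword.

    `already_retweeted` is a set of lowercased texts handled earlier,
    which also keeps the bot from repeating itself.
    """
    text = tweet_text.lower()
    if text in already_retweeted:
        return False
    return any(keyword in text for keyword in EXAM_KEYWORDS)
```

For example, `should_retweet("Novo edital publicado hoje", set())` matches on "edital", while a greeting with none of the keywords is skipped.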

Unfortunately the code is not open source yet, but I am working on releasing it. I think the biggest contribution is showing how to develop a computer-driven Twitter user that behaves as much like a human user as possible. Although he still uses static rules, maybe I could give him some intelligence by having him answer real questions about open public exams in Brazil (a future idea).

Caocurseiro, the computer-driven Twitter user

That's all,

Marcel Caraciolo

My lecture about Recommender Systems at IX Pernambuco Python User Group Meeting and my contributions.

Tuesday, December 7, 2010

Hi all,

It has been a while since my last post, but my master's thesis is taking up most of my available time. Soon I will be back with posts and content related to what I am working on now.

The main reason for this post is to publish my lectures that were recorded at the IX Pernambuco Python User Group Meeting (PUG-PE) last month, in November. I had the opportunity to talk more about what I am studying, which is recommender systems, and to give a lightning talk about lightning talks!

Since this blog is about artificial intelligence, I will focus on the recommender systems lecture. In it I introduced the main concepts behind recommender systems, how they work, and the advantages and drawbacks of each classic filtering algorithm. Both examples presented were also used in my lecture at PythonBrasil (the main meeting that brings together Python users from all over Brazil). The results of this project will be explained in two future posts. But first let me explain my main contributions in this field.

One is my current work at the startup Orygens, where I am developing a novel recommender system applied to social networks. The idea is to recommend users and content to the users of a Brazilian social network called AtePassar.

Main Profile of AtePassar



The other contribution is the development of a framework written purely in Python called Crab. I originally conceived it as a simple, easy-to-use recommendation engine framework that can be applied to any domain. It will also be used to test new approaches, since it will be easy to plug in new recommender algorithms and test them with the evaluation tools available. The project is open source and fully available on my GitHub account.
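
To give a feel for what "plugging in" a recommender and evaluating it can look like, here is a small self-contained sketch of user-based collaborative filtering with a precision check. It is not Crab's API: `UserBasedRecommender`, `cosine_similarity` and `precision_at_n` are hypothetical names chosen for this illustration, and the ratings are made up.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two {item: rating} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class UserBasedRecommender:
    """Hypothetical plug-in: anything with a recommend(user, n) method fits."""

    def __init__(self, ratings, similarity=cosine_similarity):
        self.ratings = ratings        # {user: {item: rating}}
        self.similarity = similarity  # swappable similarity function

    def recommend(self, user, n=3):
        """Rank items the user has not rated by similarity-weighted ratings."""
        own = self.ratings[user]
        scores = {}
        for other, other_ratings in self.ratings.items():
            if other == user:
                continue
            weight = self.similarity(own, other_ratings)
            if weight <= 0:
                continue  # ignore users with nothing in common
            for item, rating in other_ratings.items():
                if item not in own:
                    scores[item] = scores.get(item, 0.0) + weight * rating
        ranked = sorted(scores, key=lambda item: (-scores[item], item))
        return ranked[:n]


def precision_at_n(recommender, user, relevant, n=3):
    """Fraction of the top-n recommendations that are in `relevant`."""
    recommended = recommender.recommend(user, n)
    if not recommended:
        return 0.0
    hits = sum(1 for item in recommended if item in relevant)
    return hits / len(recommended)
```

Because the evaluation function only relies on `recommend(user, n)`, a different algorithm can be swapped in and scored the same way.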

The main page of the Crab Project


Today we have four collaborators, and we plan to keep moving forward with some demos and a distributed computing framework fully integrated with Crab. I will provide more information here on the blog soon, along with some demonstrations.


My video about Recommender Systems (in Portuguese).








Stay tuned for news! For any further information, please contact me.

Marcel Caraciolo