Providing Recommendations in Social Networks using Python: AtePassar Study Case

Monday, March 14, 2011

Hi all,

Recently I've been working on recommendations, especially in the context of social networks. One of my tasks is to investigate, build and analyze a recommendation engine capable of suggesting friends, study groups, videos and related content to a registered user of a social network.

The social network I am working on is called AtePassar, a Brazilian social network for people who want to apply for positions in the Brazilian civil (government) service. One of its great features is that people can share their study interests and meet people from all around Brazil who have the same interests or who will sit the same exam. Can you imagine the possibilities?

It is a social network that brings students into a virtual space with several kinds of relationships: friends, study partners and even exam partners.

AtePassar Social Network

We believe that interacting with other people and discovering relevant content are real needs for users of a social network, especially finding like-minded users, whose similarity reflects shared needs and opinions. So we decided to build a recommender system capable of recommending new friends based on similar interests, such as common friends, videos both users have watched or study groups both have joined. We have also developed a study group recommender, which suggests relevant study groups that the user's friends participate in and the active user does not. Finally, we have developed a video recommender, which suggests on-line video classes relevant to the user based on what his or her friends have already watched.
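To make the friend recommender idea concrete, here is a minimal sketch that ranks people the active user is not yet friends with by the number of friends they have in common. It is only an illustration, not the code running at AtePassar: the function name, the user names and the dictionary-of-sets data layout are all assumptions, and only the common-friends signal is used (common videos and study groups could be counted in the same way).

```python
from collections import Counter


def recommend_friends(user, friends, top_n=5):
    """Rank non-friends of `user` by how many friends they share with `user`.

    `friends` maps each user name to the set of that user's friends
    (a hypothetical in-memory layout, used here only for illustration).
    """
    own = friends.get(user, set())
    common = Counter()
    for friend in own:
        for candidate in friends.get(friend, set()):
            # Skip the user and people who are already friends.
            if candidate != user and candidate not in own:
                common[candidate] += 1
    # Most common friends first; ties are broken by name to keep the order stable.
    return sorted(common.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Made-up example data.
friends = {
    "ana": {"bruno", "carla"},
    "bruno": {"ana", "carla", "diego"},
    "carla": {"ana", "bruno", "diego", "elisa"},
    "diego": {"bruno", "carla"},
    "elisa": {"carla"},
}
suggestions = recommend_friends("ana", friends)
print(suggestions)
```

Here "diego" comes first because he shares two friends with "ana", while "elisa" shares only one.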

As you may have noticed, we focus mostly on collaborative filtering: we are interested in finding similar users and bringing unknown items that are close to the active user's historical preferences as possible recommendations.

Recommendations SideBar

One of our priorities in this recommendation process is to always explain to users why a recommendation is presented to them. We believe it is really important for the user to know the relevance of the recommendation, and it also helps us improve the acceptance rate. If a recommendation comes with the extra information that 4 of your friends also liked that item, it is more meaningful than receiving the recommendation without knowing the reasons.
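One simple way to carry the explanation along with the recommendation is to keep track of which friends support each suggested item. The sketch below does this for study groups: it suggests groups that the user's friends have joined and the user has not, and attaches the supporting friends and a short reason. The function name, the data layout, the group names and the wording of the reason are assumptions made for illustration only.

```python
from collections import defaultdict


def recommend_groups_with_reasons(user, friends, memberships):
    """Suggest study groups joined by friends of `user`, each with an explanation.

    `friends` maps a user to a set of friends and `memberships` maps a user
    to a set of study groups (both hypothetical layouts).
    """
    own_groups = memberships.get(user, set())
    supporters = defaultdict(list)
    for friend in sorted(friends.get(user, set())):
        for group in memberships.get(friend, set()):
            if group not in own_groups:
                supporters[group].append(friend)
    # Groups backed by more friends come first; ties are broken by group name.
    ranked = sorted(supporters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {
            "group": group,
            "friends": names,
            "reason": f"{len(names)} of your friends joined this group",
        }
        for group, names in ranked
    ]


# Made-up example data.
friends = {"ana": {"bruno", "carla"}}
memberships = {
    "ana": {"Direito Constitucional"},
    "bruno": {"Direito Constitucional", "Portugues"},
    "carla": {"Portugues", "Informatica"},
}
recommendations = recommend_groups_with_reasons("ana", friends, memberships)
for rec in recommendations:
    print(rec["group"], "-", rec["reason"], rec["friends"])
```

Keeping the list of supporting friends, and not just the count, makes it possible to show their names or avatars next to the suggestion.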

Another priority for us is to provide recommendations to all users, even new users who have just started using AtePassar and don't have enough information in their profile (friends, study groups, etc.) to produce relevant recommendations. I've developed a simple algorithm to address this common problem in recommender systems, called 'cold start'. Cold start happens especially with new users, where there is not enough user and item information, so it is hard for the recommendation system to produce recommendations. We decided to take the most accepted (most popular) recommendations among the users of our social network and offer them as suggestions to new users. We know this is not the best solution, because it is not personalized, but it provides recommendations out of the box for users who are new to the social network.
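The popularity fallback described above can be sketched in a few lines. This is a simplified illustration under assumed names and data structures, not the production algorithm: `accepted` is a hypothetical mapping from each user to the recommendations that user accepted, and `personalized` is a hypothetical mapping from each user to whatever the personalized recommenders produced.

```python
from collections import Counter


def most_popular(accepted, top_n=3):
    """Return the items accepted by the largest number of users."""
    counts = Counter(item for items in accepted.values() for item in items)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


def recommend(user, personalized, accepted, top_n=3):
    """Use personalized suggestions when available, popular items otherwise."""
    recs = personalized.get(user, [])
    if recs:
        return recs[:top_n]
    # Cold start: nothing is known about this user yet.
    return most_popular(accepted, top_n)


# Made-up example data.
accepted = {
    "ana": {"video:portugues-1", "group:tribunais"},
    "bruno": {"video:portugues-1", "group:receita"},
    "carla": {"video:portugues-1", "group:tribunais"},
}
personalized = {"ana": ["group:receita"]}

print(recommend("ana", personalized, accepted))       # personalized path
print(recommend("new_user", personalized, accepted))  # cold-start path
```

The new user receives the items that were accepted most often across the network, which are not personalized but are available from the first visit.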

We are running a poll among AtePassar users to find out whether they like the suggestions our system is providing. More than 60% of those who have answered the poll said that they like most of the suggestions. We are working hard to improve this recommendation process further, by bringing in more content to be recommended and by increasing its usefulness with extra information from the user's profile.

Here is a brief introduction (video) to the social recommender engine running at the AtePassar Social Network.

In the next posts I will go into more detail about the development and explain more about recommender engines, the area I am also working on in my master's thesis. I've also been using an open-source recommendation engine in this work. It is at an early stage, but we are improving it in small steps, with new releases every month. So far our framework, called Crab (written in Python), only supports collaborative filtering recommendations, and for the next releases we are planning to add content-based ones and distributed algorithms using map-reduce, etc.

If you want to take a look at our recommendation engine, please check it out at this link (it is hosted in my personal GitHub repository). In a previous post on my blog I introduced the framework, and I am planning to write a series of posts that go deeper into recommendation engines, explaining how to use it, how to evaluate it, etc.

I also wrote an introduction to recommendation engines, in case you are just starting out in this machine learning field. You can check it out here.

I hope you enjoyed it,

Marcel Caraciolo

Atepassar Social Network Friendship Connections Visualizations using GeoLocalization!

Friday, March 11, 2011

Hi all

I've been looking for visualization tools for social networks in order to build a visual representation of the AtePassar social network, one that helps me see how the users are connected and visualize the friendships between them. Some time ago I found this post about a new type of visualization created by the Facebook team. They plotted a map that showed how geography and political borders affect where people live relative to their friends. The visualization focused on which cities around the world have many friendships between them.

The result is shown here:

Facebook Friendship Visualization

If you want to know more about how they managed to create this map, you can check Facebook's blog. Inspired by this work, I decided to create one of my own by analyzing the AtePassar network. AtePassar is a well-known Brazilian social network where I work as a data mining analyst, creating and bringing collective intelligence to improve the features of the website.

AtePassar Social Network

So I created a Python script that exports the data from the AtePassar user profiles and converts it to a structured file with each user's current city and the number of friendships between each pair of cities. Then I merged the data with the longitude and latitude of each city. The coordinates of the Brazilian cities were obtained from the Datasus website, a Brazilian government data repository with statistics and data files about Brazil's population, health, geography, etc. You can download the database with the information on the cities here. To open and read it you can use a third-party library called dbfpy, which handles .dbf data files. The script is available for download at my personal repository on GitHub. You can use and modify it for your needs.
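The aggregation step can be sketched as follows. This is not the script from the repository, only a minimal illustration of the idea: the function names and the data layout are assumptions, the coordinates are approximate values typed in by hand rather than read from the Datasus .dbf file, each friendship is assumed to be listed once, and friendships inside the same city are skipped because only links between different cities are drawn on the map.

```python
from collections import Counter


def count_city_links(friendships, user_city):
    """Count friendships between each unordered pair of distinct cities."""
    links = Counter()
    for user_a, user_b in friendships:
        city_a, city_b = user_city.get(user_a), user_city.get(user_b)
        # Ignore users without a known city and friendships within one city.
        if city_a is None or city_b is None or city_a == city_b:
            continue
        links[tuple(sorted((city_a, city_b)))] += 1
    return links


def attach_coordinates(links, coords):
    """Merge the link counts with the (latitude, longitude) of both cities."""
    rows = []
    for (city_1, city_2), total in sorted(links.items()):
        if city_1 in coords and city_2 in coords:
            rows.append((city_1, coords[city_1], city_2, coords[city_2], total))
    return rows


# Made-up example data; coordinates are approximate (latitude, longitude).
user_city = {"u1": "Recife", "u2": "São Paulo", "u3": "São Paulo", "u4": "Brasília"}
friendships = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u4", "u1")]
coords = {
    "Recife": (-8.05, -34.88),
    "São Paulo": (-23.55, -46.63),
    "Brasília": (-15.78, -47.93),
}

links = count_city_links(friendships, user_city)
for row in attach_coordinates(links, coords):
    print(row)
```

Each output row holds the two endpoints of a line on the map and the number of friendships between them, which can then be used to set the brightness of that line.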

The result of the whole experiment is shown in the figure below. It offers some interesting insights:

Atepassar SocialNetwork until Feb 2011 - Friendship Visualization

  • There are several black areas in the map. Brazil is a huge country, and in several places, especially in the North region where the Amazon Forest is, the population density is quite low, so we don't have many users there. The North is also the region with the lowest number of AtePassar users. We see users only in the state capitals, so we believe that internet access is still a problem in that region, or maybe our network has not yet reached it.
  • We have a great number of users in Recife (PE), São Paulo (SP), Rio de Janeiro (RJ) and Brasília (DF), as you can see from the bright white lines that interconnect those cities, in contrast to the other cities and states on the map. We believe Recife stands out especially because the team behind AtePassar is from Recife, PE, so the marketing around the network is stronger there than in other cities. Another reason is that the provider of the videos available at AtePassar (the course and its teaching staff) is also quite well known around Recife, PE. São Paulo, Rio de Janeiro and Brasília are currently considered the cities with the greatest number of students registering for public exams, according to a survey by CorreioWeb, a popular news site specialized in public exams.

After seeing Perone's post on his blog, where he used the visualization tool Gource to create a new visualization for Google Analytics, I realized that the project could help me tell the history of the AtePassar Social Network. After writing some Python code, I decided to represent the users by their states, and I also replaced Gource's default user icon with Brazilian state flags (you can download them here).

The social network started in 2009 and was launched to the public in the middle of 2010; today it has more than 30 thousand registered users. We modified Gource so that it could represent the history of user registrations across the whole social network by showing the users and their hometowns. Unfortunately, Gource does not work with more than roughly 15,000 nodes, so I decided to show only part of the network's history, from its launch until April 2010.
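As a rough idea of how registration data can be fed to Gource, the sketch below writes events in Gource's pipe-delimited custom log format (timestamp|username|type|file), sorted chronologically. The mapping used here is an assumption made for illustration and may differ from what my script does: the state plays the role of the Gource "user" (so that a state flag can be shown as its icon), each registration is an "A" (added) event, and the "file" path is /state/city/username.

```python
def to_gource_log(registrations):
    """Build a Gource custom log from (timestamp, username, state, city) tuples.

    Each registration becomes one 'A' (added) event. Names containing the
    '|' character would break the format and are not handled in this sketch.
    """
    lines = []
    # Gource expects the entries in chronological order.
    for timestamp, username, state, city in sorted(registrations):
        lines.append(f"{timestamp}|{state}|A|/{state}/{city}/{username}")
    return "\n".join(lines)


# Made-up example data: Unix timestamps, user names, states and hometowns.
registrations = [
    (1262390400, "user_b", "SP", "Sao Paulo"),
    (1262304000, "user_a", "PE", "Recife"),
    (1262476800, "user_c", "PE", "Olinda"),
]

log = to_gource_log(registrations)
print(log)
```

The resulting text can be saved to a file and opened with Gource as a custom log.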

I've also tried the 3D visualization tool Ubigraph; however, since there were thousands of nodes, it didn't work for long periods. This time I tried to present the network from a different angle, by looking at the friendships between the users. It is clear in the video below that the network is centered around two users, who happen to be the founders of the social network, rjcf and marcoscampello. Another thing to notice is that there are many users with a low number of friendships. This happens because the timeline of the social network is the same for all users. Unlike Twitter, a user in AtePassar can see what everyone posts in the timeline. At the beginning of the social network we decided to show the posts of all users in order to stimulate interaction between them, but the team is watching carefully in case the timeline stream becomes overloaded. The video is presented below.

I was so excited with the results that I decided to use Gource to present the history of all the users who joined our local Python community here in Pernambuco, Brazil, in a lecture at one of our meetings. You can read more about it in this post.

I'd like to thank Andreas Kaltenbrunner for supporting me in this work and giving me some insights on how to draw the Brazilian map using coordinates. He did similar work on a Spanish social network called Tuenti. You can see his post about it here.

I hope you like it,


Marcel Caraciolo