Hi all,
In this post I will present one of the tecnhiques used at Atépassar, a brazilian social network that help students around Brazil in order to pass the exams for a civil job, our recommender system.
I will describe some of the data models that we use and discuss our approach to algorithmic innovation that combines offline machine learning with online testing. For this task we use distributed computing since we deal with over with 140 thousand users. MapReduce is a powerful technique and we use it by writting in python code with the framework MrJob. I recommend you to read further about it at my last post here.
One of our recommender techniques is the simple 'people you might know' recommender algorithm. Indeed, there are several components behind the algorithm since at Atépassar, users can follow other people as also be followed by other people. In this post I will talk about the basic idea of the algorithm which can be derivated for those other components. The idea of the algorithm is that if person A and person B do know each other but they have a lot of mutual friends, then the system should recommend that they connect with each other.
We will implement this algorithm using the MapReduce architecture in order to use Hadoop, which is a open source software for highly reliable, scalable distributed computing. I assume that you already familiar with those concepts, if not please take a look at those posts and see what map and reduce jobs are.
But before introducing the algoritm, let's present a simple algorithm in case of the bi-directional connections, that is, if I am your friend, you are also my friend. In order to recommend such friends, we first need to count the number of mutual friends that each pair of users have in the network. For this, we will need to implement a map reduce job that works similar to the job that count the frequency of words in a file. For every pair of friends in a list, we output the tuple {friend1, friend2}, 1:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class FriendsRecommender(MRJob): | |
def steps(self): | |
return [self.mr(self.map_input, self.count_number_of_friends), | |
self.mr(self.count_max_of_mutual_friends, | |
self.top_recommendations)] | |
def map_input(self, key, line): | |
''' | |
Compute a cartesian product using nested loops | |
for each friend in connection_list | |
Input (source -> {“friend1”, “friend2”, “friend3”}): | |
marcel,jonas,maria,jose,amanda | |
Output {[source, friend1], -1}; | |
{[friend1, friend2], 1};): | |
["jonas", "marcel"] -1 | |
["jonas", "maria"] 1 | |
["jonas", "jose"] 1 | |
["amanda", "jonas"] 1 | |
["marcel", "maria"] -1 | |
["jose", "maria"] 1 | |
["amanda", "maria"] 1 | |
["jose", "marcel"] -1 | |
["amanda", "jose"] 1 | |
["amanda", "marcel"] -1 | |
''' | |
input = line.split(';') | |
user_id, item_ids = input[0], input[1:] | |
for i in range(len(item_ids)): | |
f1 = item_ids[i] | |
if user_id < f1: | |
yield (user_id, f1), -1 | |
else: | |
yield (f1, user_id), -1 | |
for j in range(i + 1, len(item_ids)): | |
f2 = item_ids[j] | |
if f1 < f2: | |
yield (f1, f2), 1 | |
else: | |
yield (f2, f1), 1 |
In addition to this, we also output a tuple with -1 for every pair of users who are friends {user_id, friend}, -1.
Now, the reducer gets a input key denoting a pair of friends along with their list of their number of common friends.
In the reduce function, we can check if the first record has a -1 in the value. If there is such a reccord, we can ignore that pair of friends, because they already have a direct connection between them. Finally we aggregate the values of all the keys in the reducer and output the tuple for all the pair of friends which don't have third attribute as -1 in the key.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def count_number_of_friends(self, key, values): | |
''' | |
Count the number of mutual friends. | |
Input ({[friend1, friend2], [-1, -1]}; | |
{[friend1, friend2],[1, 1]};): | |
["jonas", "marcel"] -1 | |
["marcel", "maria"] -1 | |
["jose", "marcel"] -1 | |
["amanda", "marcel"] -1 | |
["fabio", "marcel"] 1 | |
["fabiola", "marcel"] 1 | |
["carol", "marcel"] 1 | |
["carol", "marcel"] 1 | |
Output ({[friend1, friend2], numberOfMutualFriends}): | |
["fabio", "marcel"] 1 | |
["fabiola", "marcel"] 1 | |
["marcel", "patricia"] 1 | |
["marcel", "paula"] 1 | |
["carol", "marcel"] 2 | |
''' | |
f1, f2 = key | |
mutual_friends_count = 0 | |
for value in values: | |
if value == -1: | |
return | |
mutual_friends_count += value | |
yield (f1, f2), mutual_friends_count |
After the first map-reduce job, we obtain a list of pair of persons along the number of common friends that they have. Our final map-reduce job looks at this list {[friend1, friend2], numberOfMutualFriends} and outputs a list of persons they have maximum number of common friends with. Our map job outputs {friend1,[numberOfMutualFriends, friend2]}, {friend2, [numberOfMutualFriends, friend1]} .
The reducer would look at the person in the key and our comparator would look at the numberOfCommonFriends to sort the keys. This would ensure that the tuples for the same person go on the same reducer and in a sorted order by the number of common friends. Our reducer then just need to look at the top 5 values and output the list (TOP_N).
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def count_max_of_mutual_friends(self, key, values): | |
''' | |
Prepare the dataset to yield the source and | |
get the top suggestions. | |
Input ({[friend1, friend2], numberOfMutualFriends}): | |
["fabio", "marcel"] 1 | |
["fabiola", "marcel"] 1 | |
["marcel", "patricia"] 1 | |
["marcel", "paula"] 1 | |
["carol", "marcel"] 2 | |
Output ({friend1, [numberOfMutualFriends, friend2]}, | |
{friend2, [numberOfMutualFriends, friend1]}): | |
"fabio", [1,"marcel"] | |
"marcel", [1,"fabio"] | |
"fabiola", [1,"marcel"] | |
"marcel", [1,"fabiola"] | |
"marcel", [1,"patricia"] | |
"patricia", [1,"marcel"] | |
"marcel", [1,"paula"] | |
"paula", [1,"marcel"] | |
"marcel", [2,"carol"] | |
"carol", [2,"marcel"] | |
''' | |
f1, f2 = key | |
# for score in values: | |
yield f1, (f2, int(values)) | |
yield f2, (f1, int(values)) | |
def top_recommendations(self, key, values): | |
''' | |
Get the TOP N recommendations for user. | |
Input ({friend1, [(numberOfMutualFriends, friend), | |
(numberOfMutualFriends2, friend)]}): | |
"fabio", [[1,"marcel"]] | |
"marcel", [[2,"carol"], [1,"fabio"], [1,"fabiola"], [1,"patricia"], | |
[1,"paula"]] | |
"fabiola", [[1,"marcel"]] | |
"patricia", [[1,"marcel"]] | |
"paula", [[1,"marcel"]] | |
"carol", [[2,"marcel"]] | |
Output ({friend1, [(numberOfMutualFriends, friend), | |
(numberOfMutualFriends2, friend)]}): | |
Ex: Get the top 3 suggestions. | |
"fabio", [[1,"marcel"]] | |
"marcel", [[2,"carol"], [1,"fabio"], [1,"fabiola"]] | |
"fabiola", [[1,"marcel"]] | |
"patricia", [[1,"marcel"]] | |
"paula", [[1,"marcel"]] | |
"carol", [[2,"marcel"]] | |
''' | |
recommendations = [] | |
for idx, (item, score) in enumerate(values): | |
recommendations.append((item, score)) | |
yield key, sorted(recommendations, key=lambda k: -k[1])[:TOP_N] |
The reducer would look at the person in the key and our comparator would look at the numberOfCommonFriends to sort the keys. This would ensure that the tuples for the same person go on the same reducer and in a sorted order by the number of common friends. Our reducer then just need to look at the top 5 values and output the list (TOP_N).
Now, we run through our social network data.
marcel;jonas,maria,jose,amanda
maria;carol,fabiola,amanda,marcel
amanda;paula,patricia,maria,marcel
carol;maria,jose,patricia
fabiola;maria
paula;fabio,amanda
patricia;amanda,carol
jose;marcel,carol
jonas;marcel,fabio
fabio;jonas,paula
carla
Let's see some interesting stuff. The recommended friends to me are:
"marcel" [["carol", 2], ["fabio", 1], ["fabiola", 1], ["patricia", 1], ["paula", 1]]
"maria" [["jose", 2], ["patricia", 2], ["jonas", 1], ["paula", 1]]
"patricia" [["maria", 2], ["jose", 1], ["marcel", 1], ["paula", 1]]
"paula" [["jonas", 1], ["marcel", 1], ["maria", 1], ["patricia", 1]]
"amanda" [["carol", 2], ["fabio", 1], ["fabiola", 1], ["jonas", 1], ["jose", 1]]
"carol" [["amanda", 2], ["marcel", 2], ["fabiola", 1]]
"fabio" [["amanda", 1], ["marcel", 1]]
"fabiola" [["amanda", 1], ["carol", 1], ["marcel", 1]]
"jonas" [["amanda", 1], ["jose", 1], ["maria", 1], ["paula", 1]]
"jose" [["maria", 2], ["amanda", 1], ["jonas", 1], ["patricia", 1]]
As I expected the person with most common friends to me is carol, with maria and jose.
marcel;jonas,maria,jose,amanda
maria;carol,fabiola,amanda,marcel
amanda;paula,patricia,maria,marcel
carol;maria,jose,patricia
fabiola;maria
paula;fabio,amanda
patricia;amanda,carol
jose;marcel,carol
jonas;marcel,fabio
fabio;jonas,paula
carla
Let's see some interesting stuff. The recommended friends to me are:
"marcel" [["carol", 2], ["fabio", 1], ["fabiola", 1], ["patricia", 1], ["paula", 1]]
"maria" [["jose", 2], ["patricia", 2], ["jonas", 1], ["paula", 1]]
"patricia" [["maria", 2], ["jose", 1], ["marcel", 1], ["paula", 1]]
"paula" [["jonas", 1], ["marcel", 1], ["maria", 1], ["patricia", 1]]
"amanda" [["carol", 2], ["fabio", 1], ["fabiola", 1], ["jonas", 1], ["jose", 1]]
"carol" [["amanda", 2], ["marcel", 2], ["fabiola", 1]]
"fabio" [["amanda", 1], ["marcel", 1]]
"fabiola" [["amanda", 1], ["carol", 1], ["marcel", 1]]
"jonas" [["amanda", 1], ["jose", 1], ["maria", 1], ["paula", 1]]
"jose" [["maria", 2], ["amanda", 1], ["jonas", 1], ["patricia", 1]]
As I expected the person with most common friends to me is carol, with maria and jose.
There are still plenty of things that can be done here in the implementation in order to have our final recommender, for example here we are not considering the followers in common or whether if we live at the same state. Even the results and scalability can be improved.
And what about recommending friends from Facebook? I decided to mine some data based on connections between my connections.
$python friends_recommender.py - r emr --num-ec2-instances 5 facebook_data.csv > output.dat
Let's see some results:
Let's pick my friend Rafael Carício. The top suggestions for him would be:
"Rafael_Caricio_584129827" [["Alex_Sandro_Gomes_625299988", 28], ["Oportunidadetirecife_Otir_100002020544265", 26], ["Thiago_Diniz_784848380", 20], ["Guilherme_Barreto_735956697", 18], ["Andre_Ferraz_100001464967635", 17], ["Sofia_Galvao_Lima_1527232153", 16], ["Edmilson_Rodrigues_1003183323", 15], ["Pericles_Miranda_100001613052998", 14], ["Andre_Santos_100002368908054", 14], ["Edemilson_Dantas_100000732193812", 13]]
Rafael is my old partner and studied with me at UFPE (University). All recommended are entrepreneurs or old students from CIN/UFPE.
And about my colleague Osvaldo Santana, one of the most famous python developers in Brazil.
"Osvaldo_Santana_Neto_649598880" [["Hugo_Lopes_Tavares_100000635436030", 14], ["Francisco_Souza_100000560629656", 12], ["Daker_Fernandes_Pinheiro_100000315704652", 8], ["Flavio_Ribeiro_100000349831236", 6], ["Jinmi_Lee_1260075333", 5], ["Alex_Sandro_Gomes_625299988", 5], ["Romulo_Jales_100001734547813", 5], ["Felipe_Andrade_750803015", 5], ["Adones_Cunha_707904846", 5], ["Flavio_Junior_100000544023443", 4]]
Facebook Data
And what about recommending friends from Facebook? I decided to mine some data based on connections between my connections.
$python friends_recommender.py - r emr --num-ec2-instances 5 facebook_data.csv > output.dat
Let's see some results:
Let's pick my friend Rafael Carício. The top suggestions for him would be:
"Rafael_Caricio_584129827" [["Alex_Sandro_Gomes_625299988", 28], ["Oportunidadetirecife_Otir_100002020544265", 26], ["Thiago_Diniz_784848380", 20], ["Guilherme_Barreto_735956697", 18], ["Andre_Ferraz_100001464967635", 17], ["Sofia_Galvao_Lima_1527232153", 16], ["Edmilson_Rodrigues_1003183323", 15], ["Pericles_Miranda_100001613052998", 14], ["Andre_Santos_100002368908054", 14], ["Edemilson_Dantas_100000732193812", 13]]
Rafael is my old partner and studied with me at UFPE (University). All recommended are entrepreneurs or old students from CIN/UFPE.
And about my colleague Osvaldo Santana, one of the most famous python developers in Brazil.
"Osvaldo_Santana_Neto_649598880" [["Hugo_Lopes_Tavares_100000635436030", 14], ["Francisco_Souza_100000560629656", 12], ["Daker_Fernandes_Pinheiro_100000315704652", 8], ["Flavio_Ribeiro_100000349831236", 6], ["Jinmi_Lee_1260075333", 5], ["Alex_Sandro_Gomes_625299988", 5], ["Romulo_Jales_100001734547813", 5], ["Felipe_Andrade_750803015", 5], ["Adones_Cunha_707904846", 5], ["Flavio_Junior_100000544023443", 4]]
Interesting! Recommended other Python Developers as also some entrepreneurs! :)
We could play more with it, but if you want to test with your own data download it by using this app.
We could play more with it, but if you want to test with your own data download it by using this app.
Twitter Friends
Great, but how can I use this algorithm in Twitter for example ?! It's a little different. In this scenario we don't assume the bi-directional link. Only because I follow you on Twitter it does not mean that you also follow me. We have now two entities: followers and the friends.
The basic idea is to count the number of directed paths. See the full code below:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#-*-coding: utf-8 -*- | |
''' | |
This module represents the recommender system for recommending | |
new friends based on 'mutual friends'. | |
''' | |
__author__ = 'Marcel Caraciolo <caraciol@gmail.com>' | |
from mrjob.job import MRJob | |
def combinations(iterable, r): | |
""" | |
Implementation of itertools combinations method. Re-implemented here because | |
of import issues in Amazon Elastic MapReduce. Was just easier to do this than | |
bootstrap. | |
More info here: http://docs.python.org/library/itertools.html#itertools.combinations | |
Input/Output: | |
combinations('ABCD', 2) --> AB AC AD BC BD CD | |
combinations(range(4), 3) --> 012 013 023 123 | |
""" | |
pool = tuple(iterable) | |
n = len(pool) | |
if r > n: | |
return | |
indices = range(r) | |
yield tuple(pool[i] for i in indices) | |
while True: | |
for i in reversed(range(r)): | |
if indices[i] != i + n - r: | |
break | |
else: | |
return | |
indices[i] += 1 | |
for j in range(i + 1, r): | |
indices[j] = indices[j - 1] + 1 | |
yield tuple(pool[i] for i in indices) | |
TOP_N = 10 | |
class FriendsRecommender(MRJob): | |
def steps(self): | |
return [self.mr(self.map_input, self.reverse_index), | |
self.mr(None, self.count_number_of_friends), | |
self.mr(None, self.top_recommendations)] | |
def map_input(self, key, line): | |
''' | |
Compute the reversed index for each connection | |
between the source. | |
Input (source -> {“friend1”, “friend2”, “friend3”}): | |
marcel,jonas,maria,jose,amanda | |
Output {[friend1], source, 1, | |
{“friend1”, “friend2”, “friend3”}}; | |
{[friend2], source, 1, | |
{“friend1”, “friend2”, “friend3”}}; | |
''' | |
input = line.split(';') | |
user_id, item_ids = input[0], input[1:] | |
for i in range(len(item_ids)): | |
f1 = item_ids[i] | |
yield f1, (user_id, 1, item_ids) | |
def reverse_index(self, key, values): | |
''' | |
Compute the cartesian product between among all | |
sources that shares common friends. | |
Input (friend1 -> (source, 1, | |
{“friend1”, “friend2”, “friend3”}}; | |
marcel, jonas,maria,jose,amanda | |
Output {(jonas, maria), 1}, {(maria, jose), 1}, | |
{(jonas,amanda), 1}, {(maria, amanda),1}, | |
{(jose, jonas), 1} | |
''' | |
final_list = [] | |
for value in values: | |
final_list.append(value) | |
for item1, item2 in combinations(final_list, 2): | |
f1, count1, f1_friends = item1 | |
f2, count2, f2_friends = item2 | |
if f1 not in f2_friends: | |
yield (f2, f1), (count1) | |
if f2 not in f1_friends: | |
yield (f1, f2), (count2) | |
def count_number_of_friends(self, key, values): | |
''' | |
Count the number of mutual friends. | |
Input {(source, recommended), [count1, count2, ... count n]} | |
{(jonas, maria), [1,1,1]}, | |
{(jose, jonas), [1,1]}, | |
{(maria, amanda), [1]}, | |
{(maria,jose), [1]}, | |
{(jonas,amanda), [1, 1, 1]} | |
Output ({friend1, friend2} numberOfMutualFriends): | |
(jonas, marcel ) 1 | |
(fabiola, marcel) 1 | |
(marcel, patricia) 1 | |
(marcel, paula) 1 | |
(carol, marcel) 2 | |
''' | |
user_id, possible_friend = key | |
yield (user_id), \ | |
(possible_friend, sum(values)) | |
def top_recommendations(self, key, values): | |
''' | |
Get the TOP N recommendations for user. | |
Input ({user_id, [(numberOfMutualFriends, friend), | |
(numberOfMutualFriends2, friend)]}): | |
[{marcel, [(patricia, 1), (paula, 1)]}, | |
{carol, [(marcel, 2)]}, | |
{fabiola, [(marcel,1)]}, | |
{jonas, [(marcel,1)]}] | |
Output ({user_id, [(numberOfMutualFriends, friend), | |
(numberOfMutualFriends2, friend)]}): | |
Ex: Get the top 3 suggestions. | |
[{marcel, [(patricia, 1), (paula, 1)]}, | |
{carol, [(marcel, 2)]}, | |
{fabiola, [(marcel,1)]}, | |
{jonas, [(marcel,1)]}] | |
''' | |
recommendations = [] | |
for idx, (item, score) in enumerate(values): | |
recommendations.append((item, score)) | |
yield key, sorted(recommendations, key=lambda k: -k[1])[:TOP_N] | |
if __name__ == '__main__': | |
FriendsRecommender.run() |
Let's see how I can get some new friends to recommend based on the list of users that posts about #recsys that I created.
Running over twitter and that is:"@marcelcaraciolo" [["@alansaid" 24], ["@zennogantner" 21], ["@neal_lathia" 21] ]
Great I didn't know! :D I liked the recommendations. By the way they are great references in recsys. But let's go further... Let's check the explanations? Why those users are recommended to me ? It is an important feature for improving the acceptance of the given recommendation.
"@marcelcaraciolo" [["@alansaid" 24, ["@pankaj", "@recsyschallenge", "@kdnuggets",
"@mitultiwari", "@sidooms", "@recsys2012", "@khmcnally", "@McWillemsem", "@LensKitRS",
"@pcastellis", "@dennisparra", "@filmaster", "@BamshadMobasher", "@sadrewge",
"@totopampin", "@recsyshackday", "@plamere", "@usabart", "@mymedialite', "@reclabs",
"@elehack","@omdb", "@osbourke", "@siah"]], ["@zenogantner" 21, ["@sandrewge",
"@nokiabetalabs", "@foursquareAPI", "@mitultiwari", "@kiwitobes", "@directededge",
"@plista", "@twitterapi", "@recsys2012", "@xamat", "@tlalek", "@namp",
"@lenskitRS", "@siah", "@ocelma", "@abellogin", "@mymedialite', "@totopampin", "@RecsysWiki","@ScipyTip", "@ogrisel"]], ["@neal_lathia" 21, ["@googleresearch",
"@totopampin", "@ushahidi", "@kaythaney", "@gj_uk", "@hmason",
"@jure", "@ahousley", "@peteskomoroch", "@xamat", "@tnhh", "@elizabethmdaly",
"@recsys2012", "@sandrewge", "@matboehmer", "@abellogin", "@pankaj', "@jerepick", "@alsothings","@edchi", "@zenogantner"]]
Ok, I haven't talked yet about Atépassar. How we recommend new friends at Atepassar ? Atepassar is designed as Twitter, so we also have the entities friends and followers. Besides the common friends we add other attributes in the final score. For instance let's consider now the state where each user lives informed by his user profile at Atepassar. The idea is to recommend people with mutual friends and also weighted by people who lives at the same state. For the purpose of illustration, let's start with a simple scoring approach by choosing our function to be a linear combination of state and mutual friends similarity. This gives an equation of the form
where u = user, i = the new friend, f1 = the friendship similarity and f2 = the state similarity. This equation defines a two-dimensional space like the one illustrated below:
fsim(u,i) = w1f1(u,i) + w 2 f2(u,i) + b,
where u = user, i = the new friend, f1 = the friendship similarity and f2 = the state similarity. This equation defines a two-dimensional space like the one illustrated below:
This is a sample of the my input:
murilodumps;SC;marcoscampelo
dan_sampaio;RJ
marvinslap;PE;jwalker;marcoscampelo;juliocesarfort;guilhermepaiva;hugofsantiago;bruno;naevio;Mila21Sousa;dansampaio;thaisleao;lucianaamancio;pedro;marciomarques83;x5;narah;pamelaresende;carolcani;itl;roberta_ferreira;sexxyalice;annelouiseadv;bribarbosa;espertinha3;fzanchin;claudiasm;cauecamacho;lucianac;bicalhoferreira;dougui;monnyke;mariaaugusta38;germanabarros;professor1;rosimartome;klauklau;lugentil;rodrigo_miranda;portoalegre;mczmendonca;itsfabio;CarolFollador;ricardofay;lorenameimei;josi_patricia;analaurafonseca;daiana_ugulino;narelle_moraes;kamyu;wallacevidal;falcaoblanco;julianabraggio;tiagoop;giselevix10;natanaelsilva;giovannafc;vivoquinha;alinne_silva_oliveira;amanda_cutrim;gabrielagrisa;bruna_estudando;valquiria_pereira_alves;deniseharue;daysianef;Dayanne_F;italo;Orlando;kauanny;marcelo_;bruna_jacob;adonescunha;fahbiuzinha;Barbabela;fale_rodrigo;michelle_rva;rlsleal12345;creuzamoura;tuliocfp;aeltonf;Cintia_Evelane;hellheize;dhelly;murilodumps;Indianara;thamy_ls;christiane_freire;JulianaGarbim;matheuslino;LMC;Gorrpo;guilhermevilela;gabi;dalekrause;vanessaformigosa;SCANDALL;elaine_regina;rafs_gomes;larylmacedo;erico;spencer;hitalos;daianefarias;rldourado;veronicacordeiro;carmemsmrocha;falcettijr;evertonera;nessa;vtcc;ricardoapjustino;leonardo;lopes21;marcelosantos;Verallucia;paolaseveroo
hugofsantiago;PE;daianefarias
kauanny;PE
Dayanne_F;PE;jwalker
guilhermepaiva;PE;marvinslap;jwalker;veronicacordeiro
italo;PE;marcoscampelo;jwalker;romero_britto;Syl;mariana_mbs
maria;PE
bruno;PE;marcoscampelo;jwalker;marvinslap;rldourado;anagloriaflor;dayannef;flmendes;adm_nathaly2010;Andersonpublicitario;diemesleno;misterobsom;jessica_soares_ribeiro;Orlando;cauecamacho;itl;kk_u;lucianaamancio;josi_patricia;mczmendonca;adonescunha;x5;lugentil;rafaelsantana;katarinebaf;aquista;analaurafonseca;auberio;naevio;mz;flaviooliveira;eduardocruz;robson_ribeiro;gabrielagrisa;fahbiuzinha;nattysilveira;petsabino;eduardovg88;bibc;ninecarvalho10;bjmm;marcossouza;masdesouza;espertinha3;valquiria_pereira_alves;narelle_moraes;rodrigo3n
And the output sample:
"marcelcaraciolo" [["Gileno", 0.44411764705882351], ["raphaelalex", 0.43999999999999995], ["marcossouza", 0.43611111111111112], ["roberta_gama", 0.42352941176470588], ["anagloriaflor", 0.40937499999999999], ["rodrigo3n", 0.40769230769230769], ["alissonpontes", 0.40459770114942528], ["andreza_cristina", 0.40370370370370368], ["naevio", 0.40370370370370368], ["adonescunha", 0.40327868852459015]]
Interesting among the top 10, 5 worked with me at Atepassar (it means lots of common friends). Let's see the code:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#-*-coding: utf-8 -*- | |
''' | |
This module represents the FriendsRecommender system for recommending | |
new friends based on friendship similarity and state similarity. | |
''' | |
__author__ = 'Marcel Caraciolo <caraciol@gmail.com>' | |
from mrjob.job import MRJob | |
def combinations(iterable, r): | |
""" | |
Implementation of itertools combinations method. | |
Re-implemented here because of import issues in Amazon Elastic MapReduce. | |
Was just easier to do this thanbootstrap. | |
More info here: | |
http://docs.python.org/library/itertools.html#itertools.combinations | |
Input/Output: | |
combinations('ABCD', 2) --> AB AC AD BC BD CD | |
combinations(range(4), 3) --> 012 013 023 123 | |
""" | |
pool = tuple(iterable) | |
n = len(pool) | |
if r > n: | |
return | |
indices = range(r) | |
yield tuple(pool[i] for i in indices) | |
while True: | |
for i in reversed(range(r)): | |
if indices[i] != i + n - r: | |
break | |
else: | |
return | |
indices[i] += 1 | |
for j in range(i + 1, r): | |
indices[j] = indices[j - 1] + 1 | |
yield tuple(pool[i] for i in indices) | |
TOP_N = 10 | |
class FriendsRecommender(MRJob): | |
def steps(self): | |
return [self.mr(self.map_input, self.group_by_keys), | |
self.mr(None, self.score_similarities), | |
self.mr(None, self.top_recommendations)] | |
def map_input(self, key, line): | |
input = line.split(';') | |
user_id, state, friend_ids = input[0], input[1], input[2:] | |
for i in xrange(len(friend_ids)): | |
f1 = friend_ids[i] | |
yield ('f', f1), (user_id, 1, state, friend_ids) | |
if state != 'None': | |
yield ('s', state), (user_id, 1, state, friend_ids) | |
def group_by_keys(self, type_and_key, values): | |
final_list = list(values) | |
key_type, key = type_and_key | |
#pass friends through | |
if key_type == 'f': | |
for itemA, itemB in combinations(final_list, 2): | |
user_idA, countA, stateA, f_idsA = itemA | |
user_idB, countB, stateB, f_idsB = itemB | |
union = set(f_idsA).union(f_idsB) | |
if user_idB not in f_idsA: | |
yield (user_idA, user_idB), ('f', 1.0 / len(union)) | |
if user_idA not in f_idsB: | |
yield (user_idB, user_idA), ('f', 1.0 / len(union)) | |
return | |
assert key_type == 's' | |
for itemA, itemB in combinations(final_list, 2): | |
user_idA, countA, stateA, f_idsA = itemA | |
user_idB, countB, stateB, f_idsB = itemB | |
if user_idB not in f_idsA: | |
yield (user_idA, user_idB), ('s', 1.0) | |
if user_idA not in f_idsB: | |
yield (user_idB, user_idA), ('s', 1.0) | |
def score_similarities(self, key, values): | |
user_id, rec_id = key | |
f_total = 0.0 | |
s_total = 0.0 | |
similarities = (list(values)) | |
for k, score in similarities: | |
if k == 'f': | |
f_total += score | |
else: | |
s_total += score | |
f_similarity = (0.7 * f_total) + (0.3 * s_total) | |
yield user_id, (rec_id, f_similarity) | |
def top_recommendations(self, key, values): | |
recommendations = [] | |
for idx, (item, score) in enumerate(values): | |
recommendations.append((item, score)) | |
yield key, sorted(recommendations, key=lambda k: -k[1])[:TOP_N] | |
if __name__ == '__main__': | |
FriendsRecommender.run() |
Conclusions
I hope you enjoyed this article,
Best regards,
Marcel Caraciolo
http://www.blogger.com/comment.g?blogID=13674163&postID=1391385784776498111&page=1&token=1361351179324
ReplyDeleteI like totally and agree. And I think that in order to be comfortable with your style is to wear it more often. So wear your style to the lab on days that you don't have to do anything bloody, muddy or otherwise gross!
ReplyDeletesubliminal advertising
Embedded system training: Wiztech Automation Provides Excellent training in embedded system training in Chennai - IEEE Projects - Mechanical projects in Chennai. Wiztech provide 100% practical training, Individual focus, Free Accommodation, Placement for top companies. The study also includes standard microcontrollers such as Intel 8051, PIC, AVR, ARM, ARMCotex, Arduino, etc.
ReplyDeleteEmbedded system training in chennai
Embedded Course training in chennai
Matlab training in chennai
Android training in chennai
LabVIEW training in chennai
Robotics training in chennai
Oracle training in chennai
Final year projects in chennai
Mechanical projects in chennai
ece projects in chennai
Wiztech Automation is the Leading Best quality PLC, Scada, DCS, Embedded, VLSI, PLC Automation Training Centre in Chennai. Wiztech’s Industrial PLC Training and the R & D Lab are fully equipped to provide through conceptual and practical knowledge aspects with hands on experience to its students.
ReplyDeletePLC training in Chennai
PLC training institute in Chennai
PLC training centre in Chennai
PLC, SCADA training in Chennai
Automation training in Chennai
DCS training in Chennai
Freelance Best Makeup & Hair Artist in Jaipur with huge experience and Specialization in Bridal and Wedding Makeup,Celebrity Makeup,Professional Makeup,Creative Makeup,Bollywood Makeup..
ReplyDeleteFiza Makeup Academy
Fiza Makeup and Hair Artist
Wedding Makeup Artist in jaipur
Bridal Makeup Artist in jaipur
Professional Makeup Artist in jaipur
Hair and Makeup Artist in jaipur
Celebrity Makeup Artist in jaipur
Creative Makeup Artist in jaipur
Bollywood Makeup Artist in jaipur
Character Makeup Artist in jaipur
Fiza Makeup Academy Rajasthan
Shree Ram Techno Solutions Provides CCTV Camera, Security Camera, Wireless Security, Attendance System, Access Control System, DVR, NVR, Spy Camera, Fire Alarm, Security Alarm, PCI, IP Network Camera, Dome Camera, IR Camera, CCTV, Camera Price, HIKVISION, SCATI, Time Machine
ReplyDeleteCCTV CAmera in jaipur at Rajasthan
Home security system in jaipur
Wireless Home Security System in jaipur
Realtime attendance machine in jaipur
cctv camera dealer in jaipur
Hikvision DVR in jaipur at Rajasthan
security system solutions in jaipur
Hey, nice site you have here! Keep up the excellent work!
ReplyDeleteBest Bridal Makeup Artist in Chennai
Best Article, Great Programming Tips which helps lot to analyze the studies.. Bridal Makeup Artist in Bangalore, Bridal Makeup Artist in Chennai
ReplyDeleteWiztech Automation- the front line training institute - is one of the best coaching centres for the students of engineering and also the qualified and employed engineers in Industrial Automation. Training is offered for all the automation based domains – PLC range of automations systems, Embedded and VLSI systems, apart from IT and Software, Mechanical designing, Web designing, SEO, etc. Wiztech has earned a name of repute in providing training and project services. It has been possible for Wiztech to enjoy the niche status since the company has the right infrastructure and facilities apart from qualified and experienced faculty members in the institute.
ReplyDeletePLC training in Cochin, Kerala
Automation training in Cochin, Kerala
Embedded System training in Cochin, Kerala
VLSI training in Cochin, Kerala
PLC training institute in Cochin, Kerala
Embedded training in Cochin, Kerala
Best plc training in Cochin, Kerala
ReplyDeleteReally its very useful information that you have shared and thanks for sharing the information with us.
Best Bridal Makeup Artist in Madurai
aem interview questions
ReplyDeletesalesforce interview questions oops abab interview questions
itil interview questions
informatica interview questions extjs interview questions
sap bi interview questions
Thank you for sharing valuable information’s like this post
ReplyDeleteBest bridal makeup in Chennai
The information which you have provided is very good. It is very useful who is looking for
ReplyDeleteJava online training Bangalore
myTectra offers Big Data and Hadoop training in Bangalore using Class Room.
ReplyDeletemyTectra offers Live Online Big Data and Hadoop training Globally.
Big Data and Hadoop training Unlike traditional systems, Big Data and Hadoop enables multiple types of analytic workloads to run on the same data, at the same time, at massive scale on industry-standard hardware.myTectra Big Data and Hadoop training is designed to help you become a expert Hadoop developer. myTectra offers Big Data Hadoop Training in Bangalore using Class Room. myTectra offers Live Online Big Data and Hadoop training Globally.
hadoop training in bangalore
Python has adopted as a language of choice for almost all the domain in IT including the most trending technologies such as Artificial Intelligence, Machine Learning, Data Science, Internet of Things (IoT), Cloud Computing technologies such as AWS, OpenStack, VMware, Google Cloud, etc.., Big Data Analytics, DevOps and Python is prepared language in traditional IT domain such as Web Application Development, Infrastructure Automation ,Software Testing, Mobile Testing.
ReplyDeletepython online training
Big Data and Hadoop is an ecosystem of open source components that fundamentally changes the way enterprises store, process, and analyze data.
ReplyDeletepython training in bangalore
aws training in bangalore
artificial intelligence training in bangalore
data science training in bangalore
machine learning training in bangalore
hadoop training in bangalore
devops training in bangalore
very nice blog.
ReplyDeleteBest bridal makeup artist in Chennai
Bridal makeup artist in Chennai
Makeup artist in Chennai
Wedding makeup artist in Chennai
Bridal makeup in Chennai
Events makeup artist in Chennai
Fashion makeup artist in Chennai
Movie makeup artist in chennai
Great information. Glad you shared the information.
ReplyDeleteAmazing content.
ReplyDeleteData Mining Service Providers in Bangalore
bridal makeup chennai
ReplyDeletebest bridal makeup artist in chennai
beauty parlour services in chennai
best hair salon in chennai
Excellence. For skin care related solution after makeup please check out and follow Welona clinic
ReplyDeleteskin lightening treatment in chennai
weight loss treatment in chennai
prp hair loss treatment in chennai
skin whitening treatment in chennai
skin care clinic in Chennai
best hair loss treatment in Chennai
skin care specialist in chennai
python training in bangalore | python online training
ReplyDeleteartificial intelligence training in bangalore | artificial intelligence online training
machine learning training in bangalore | machine learning online training
data science training in bangalore | data science online training
aws training in Bangalore | aws online training
This professional hacker is absolutely reliable and I strongly recommend him for any type of hack you require. I know this because I have hired him severally for various hacks and he has never disappointed me nor any of my friends who have hired him too, he can help you with any of the following hacks:
ReplyDelete-Phone hacks (remotely)
-Credit repair
-Bitcoin recovery (any cryptocurrency)
-Make money from home (USA only)
-Social media hacks
-Website hacks
-Erase criminal records (USA & Canada only)
-Grade change
-funds recovery
Email: onlineghosthacker247@ gmail .com
It's Awesome blog.Very nice blog at that time worst blog. Give us a call to place your order for your desired item.They will provide you a Best Bridal Makeup Artist in Chennai
ReplyDeleteMovie Makeup Artist in Chennai
Fashion Makeup Artist in Chennai
Event Makeup Artist in Chennai
Bridal Makeup Artist in Chennai
Wedding Makeup Artist In Chennai
Awesome blogs, thanks for sharing this.
ReplyDeleteThe article was up to the point and described the information very effectively.
ReplyDeletevé máy bay đi seoul bao nhiêu tiền
vé máy bay đi osaka
giá vé máy bay từ hcm đi tokyo
giá vé vietjet đi đài loan
vé máy bay từ hcm đi taipei
vé máy bay giá rẻ đi cao hùng
Programmed ATM Cards
ReplyDeleteDo you know that you can hack any ATM machine !!!
We have specially programmed ATMs that can be used to withdraw money at ATMs, shops and points of sale. We sell these cards to all our customers and interested buyers all over the world, the cards have a withdrawal limit every week.
Getting rich and living the rich and famous lifestyle is a dream of many people. And while most people go to work or look for other ethical methods to make money on ATM-programmed cards.
The programmed ATMs withdraw money from each ATM but have a withdrawal limit every week, only your PIN code is in it, it is a high-tech card system. The PROGRAMMED ATM card works on all card-based ATMs, anywhere in the world.
Email: atmservices44@aol.com
Email: hacklords.investors@gmail.com
An awesome blog thanks a lot for giving me this great opportunity to write on this.
ReplyDeletegiá vé máy bay từ việt nam sang mỹ
ve may bay vietnam airline tu han quoc ve viet nam
mua ve may bay gia re tu Nhat Ban ve Viet Nam
giá vé máy bay từ singapore về hà nội
Vé máy bay Vietnam Airline tu Dai Loan ve Viet Nam
chuyến bay thương mại từ canada về việt nam
Studyprovider has experts team are giving the homework help, assignment help, report, thesis, research writing services and LAW essay writing service available 24/7 seven days a week contact now.
ReplyDelete
ReplyDeleteLucassalon is one of the Best Hair Cut Salon in Hyderabad . We provide services-
Hair Volume treatment in Hyderabad
Best Hair colour Salon in Hyderabad
French Balayage Hair Colour in Hyderabad and more.
Thank you for sharing such useful information. Its a very nice and informative post. If you want to buy best car jacks than checkout buying guide, reviews and comparison of 7 best car jacks at Cherrycheck. Their information will help you to pick the perfect product for you.
ReplyDelete
ReplyDeleteThanks for sharing
Corporate Giftings company in Bangalore
Promotional gifts for corporate employess Bangalore
Joining kit for new Employess Bangalore
Gifting Company in Bangalore
Corporate gifts for emplyoess
Customised corporate gifts for employess Bangalore
Corporate gifting agency in Bangalore
Best Corporate gifting agency in Bangalore
Top Corporate gifting company in bangalore
Leading Corporate gifting company in Bangalore
Corporate gifting companies in Bangalore
Excellent contribution.
ReplyDeleteamazing circle of contribution
ReplyDelete