Google launches its new Prediction API: machine learning as a cloud service!

Thursday, May 20, 2010

Hi all,

I'd like to share some news I saw yesterday about the launch of the new Google Prediction API. During the annual Google I/O event, which started yesterday, Google released several new web services, including this new API.

So, what is the Google Prediction API? The Prediction API provides access to Google's machine learning algorithms to analyze your historical data and predict likely future outcomes. It makes it possible for developers and researchers to upload their data to Google Storage for Developers (another service launched during the event) and, with the Prediction API, make real-time decisions such as recommending products, evaluating user sentiment from blogs or even tweets, routing messages, or assessing suspicious activities.

The Prediction API implements supervised learning algorithms as a RESTful web service to let you leverage patterns in your data, providing more relevant information to your users. Run your predictions on Google's infrastructure and scale effortlessly as your data grows in size and complexity.

A simple screenshot (taken from the Google Prediction API home page) illustrates the idea of the service. In this example, it assesses the language of the text passed as a parameter.

Diagram showing French language prediction
Google Prediction API Workflow

According to the official home page of the API, it implements only supervised learning algorithms; unsupervised techniques, such as clustering, are not available.
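To make the idea concrete, here is a rough sketch of what a call to such a RESTful prediction service might look like. The endpoint path and the JSON layout below are my own illustrative assumptions, not the official Prediction API wire format.

```python
import json

def build_prediction_request(bucket, dataset, text):
    """Build a (hypothetical) prediction request against a model
    trained on data uploaded to Google Storage for Developers."""
    # Hypothetical REST endpoint: bucket/dataset identifies the training data.
    url = ("https://www.googleapis.com/prediction/v1/training/"
           "%s%%2F%s/predict" % (bucket, dataset))
    # Hypothetical request body: one unstructured-text input to classify.
    body = json.dumps({"data": {"input": {"text": [text]}}})
    return url, body

url, body = build_prediction_request("mybucket", "language-data",
                                     "Bonjour tout le monde")
print(url)
print(body)
```

In the language-detection example from the screenshot, the response would then carry one of the discrete output categories (e.g. "French").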

They don't say which specific algorithms they use or how they choose among the several available machine learning techniques (I am very curious about that). It supports the most common input types: numeric data or unstructured text. The output can be one of hundreds of discrete categories (it doesn't work with continuous outputs). And best of all, it is accessible from many platforms: Google App Engine, web and desktop apps (are mobile apps included?), and the command line.

Lastly, Google introduced another tool for analyzing your data: BigQuery. This API enables fast, interactive analysis over huge datasets (imagine trillions of records). Using SQL-like commands via a RESTful API, you can quickly explore and understand your massive data. It can help you, for example, analyze your network logs, identify seasonal sales trends, etc.
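For a feel of the network-logs example, here is a sketch. The SQL-like query is my guess at the kind of command the post mentions (the table name is made up), and the plain-Python loop below it shows what such a query computes on a tiny sample:

```python
# Hypothetical SQL-like query: count server errors per region.
query = """
SELECT region, COUNT(*) AS hits
FROM network_logs
WHERE status >= 500
GROUP BY region
"""

# The same aggregation sketched in plain Python over sample records:
logs = [
    {"region": "us", "status": 200},
    {"region": "us", "status": 503},
    {"region": "br", "status": 500},
    {"region": "br", "status": 502},
]
hits = {}
for row in logs:
    if row["status"] >= 500:  # keep only server errors
        hits[row["region"]] = hits.get(row["region"], 0) + 1
print(hits)
```

The appeal of BigQuery is that the query form scales to those trillions of records, while the loop form obviously would not.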

My opinion about this? Google took a huge step forward in helping today's applications use their historical data to improve usability, support decisions and, of course, make money! A new generation of applications using these techniques will appear in the next few years, using Natural Language Processing and Machine Learning to improve their services. A lot of data is available, and Google is helping users analyze it in order to make decisions quickly. With this RESTful interface, even a kid with a few lines of code could build a simple application to predict the weather in his city or a Twitter spam filter. Imagine the possibilities! You no longer need to bury yourself in machine learning and statistics books to add intelligence to your application or your data analysis.

Intelligence is now packaged into black boxes for anyone with basic programming knowledge. Let's see where this step leads. In any case, Google has moved a step closer to Cloud Data Analysis Computing (CDAC) (I invented this name).

What do you think about it? Let's wait for the next chapters!


Marcel Caraciolo

Web Services and Robots: How Can They Help You? Twitter Bots and Intelligent Agents

Monday, May 10, 2010

Hi folks,

In my free time (which I don't have often), I've been working on some projects involving Twitter and web services. These topics are closely related to my current master's thesis, which uses data from social networks such as Twitter, Foursquare and Gowalla to monitor users' behavior and recommend new content, services or products based on their interests. Delivering this content means the user receives information via the web, mobile phones or a specific interface.

One of my recent projects is building Twitter robots. What's that?! Web services offered through the Twitter interface. For instance, imagine a service where you ask for information about a movie on Twitter through a status update, such as '@movieTheaters Iron man 2'. The web service receives this data (since you mentioned @movieTheaters), parses and interprets it, and in response delivers the schedule of theaters near the user's location that are playing the movie, or the movie's synopsis, etc.
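The first step of such a bot is pulling the actual query out of the mention. Here is a minimal sketch, assuming the bot's handle comes first in the tweet; the handle and parsing rule are illustrative, not the real service's code:

```python
def parse_mention(tweet_text, bot_handle="@movieTheaters"):
    """Extract the query (e.g. 'Iron man 2') from a status update
    that addresses the bot. Returns None if the bot wasn't addressed."""
    text = tweet_text.strip()
    if not text.lower().startswith(bot_handle.lower()):
        return None  # the tweet doesn't start with a mention of the bot
    return text[len(bot_handle):].strip()

print(parse_mention("@movieTheaters Iron man 2"))  # -> Iron man 2
```

A real bot would receive these tweets by polling the Twitter mentions timeline and feed each extracted query to the movie-schedule lookup.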

Why Twitter?! Twitter has been a successful player in this new generation of microblogs and social networks. Built on 140-character messages shared between users, the number of tweets (i.e., messages) has now reached nearly 13 billion. Yet there are two unique things about Twitter's content that make it much more valuable than any other public database of this size.

A tweet has an author, a time, and possibly hashtags and @reply information, all of which is incredibly easy to access computationally. While web pages and blog posts often carry this information too, it is much harder to access: for computers, there is no simple way to respond to the author of most information on the web. Together these features open up some very interesting possibilities for Twitter robots (bots).
The structure of Twitter makes it relatively easy to extract the information contained in a tweet, especially when it's a single question: 'Where can I find...?', 'What's the best...?', 'How much does ... cost?' All these questions appear regularly in the Twitter stream. Building a Twitter bot that extracts these tweets and parses them for meaning within a specific field is extremely valuable. From the metadata attached to a status update, a bot can easily answer the author and reference the tweet it is replying to. The original author will pick up the response in their @replies or direct messages (for privacy) and see a link to the tweet being answered. Before presenting some examples, a word on the name: 'bot' is simply an abbreviation of 'robot'.
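Composing the answer from that metadata can be sketched like this. The field names are assumptions for illustration; a real bot would pass the original status id to the Twitter API so the answer threads back to the question:

```python
def build_reply(tweet, answer):
    """Compose a reply from a tweet's metadata (hypothetical fields:
    'author' is the screen name, 'id' is the status id)."""
    status = "@%s %s" % (tweet["author"], answer)
    return {
        "status": status[:140],              # respect the 140-character limit
        "in_reply_to_status_id": tweet["id"],  # threads the reply to the question
    }

reply = build_reply({"author": "alice", "id": 12345},
                    "Iron Man 2 is playing at 7pm near you.")
print(reply["status"])
```

Because the reply carries `in_reply_to_status_id`, the author sees it in their @replies linked to the exact tweet they asked with.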


Bots are intelligent agents that visit a number of search engines to identify information matching a search profile provided by a user. There are many kinds of bots designed for different purposes, such as software bots, stock bots, update bots, fun bots, chatter bots, and news bots. The possibilities for bots, especially on Twitter, are endless. Here I present two bots I developed that demonstrate this concept.


TransitoRe is a Twitter bot that crawls data from the many traffic cameras located along highways and streets throughout the city of Recife, Pernambuco, Brazil. The data is provided by the Recife city government and is updated every minute. Through Twitter, users can now get real-time traffic conditions along with each camera's location, since the tweets include the traffic status, the name of the highway, camera images and even the geographic coordinates where the cameras are placed. It's a useful web service for people living in Recife who want to quickly check traffic conditions and avoid congested streets before going out. The service runs on Google App Engine, and it's a perfect example of how Twitter can help its users obtain this kind of information. The user just needs to follow the bot, and it will post traffic updates at predefined intervals. Note that with this type of bot the user doesn't interact with it: it only broadcasts information, an approach that is also very popular for delivering weather and stock updates.
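The broadcast side of such a bot boils down to formatting each camera's data into a status update. A minimal sketch, assuming hypothetical field names and a simple truncation rule (not TransitoRe's actual code):

```python
def format_traffic_tweet(camera):
    """Turn one camera record into a status update carrying the road name,
    traffic level, geo coordinates and camera image link."""
    text = "%s: %s traffic (%s, %s) %s" % (
        camera["road"], camera["level"],
        camera["lat"], camera["lon"], camera["image_url"])
    return text[:140]  # stay within Twitter's 140-character limit

tweet = format_traffic_tweet({
    "road": "Av. Boa Viagem", "level": "heavy",
    "lat": "-8.1194", "lon": "-34.9042",
    "image_url": "http://example.com/cam42.jpg"})
print(tweet)
```

On App Engine, a cron job would run this for every camera each minute and post the results, with no user interaction required.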

TransitoRe : Twitter Bot

The next bot is a demonstration of a fully operational bot that interacts with the user. Tweetcomendas was developed by me (marcelcaraciolo) and my friend (ricardocaspirro), with design by (lucianacns). It is a web service running on Twitter that lets users easily track their SEDEX shipments. SEDEX is a popular express courier service, a division of Correios in Brazil, famous for delivering shipments and packages all over the country.
The difference between @tweetcomendas and @transitoRe is that the former uses direct messages, so the user must follow it before it can message him. The messages carry information from the Correios web service, delivering the latest status of the shipment in their system. The user just needs to send a reply containing the tracking code, and the bot automatically starts tracking the package, delivering real-time updates according to the Correios tracking system. This bot has a simple interaction with the user, demonstrating how these bots can converse with users to deliver what they need or, based on their interests, help them discover new content.
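The interactive step here is spotting the tracking code in the user's message. A sketch, assuming the usual Correios code shape (two letters, nine digits, 'BR'); treat the pattern as an assumption rather than the bot's actual validation logic:

```python
import re

# Assumed Correios tracking-code format: e.g. SS123456789BR.
TRACK_CODE = re.compile(r"\b([A-Z]{2}\d{9}BR)\b")

def extract_track_code(message):
    """Find a tracking code in a reply or direct message, or return None."""
    match = TRACK_CODE.search(message.upper())
    return match.group(1) if match else None

print(extract_track_code("please track SS123456789BR for me"))
```

Once a code is found, the bot would query the Correios system for that shipment and send each new status back to the user as a direct message.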
Here are some screenshots of the web service:

Tweetcomendas bot

tweetcomendas web site

These bots were all developed with free technologies: Python and Google App Engine. If you're interested in more details about these web services and how we developed them, post a comment on this blog! I will try to answer as soon as possible!

In conclusion, the possibilities for bots capitalizing on this concept are endless. Bots that provide directions, restaurant or product recommendations, or weather information are just a few ideas. As Twitter grows, the number of people a simple bot can reach continues to increase.

You can see a lot of other Twitter bots at this link.

I hope you enjoyed this post!

See you next time,

Marcel Caraciolo