It has been a while since my last post, but I am still alive! In this post I'll talk about one of my latest projects based on natural language processing (NLP): a visual word tree.
But what is a word tree? A word tree is a visual analysis tool for unstructured text, such as an article, a speech or a book. It is also a new visualization technique that makes it easy to explore repeated contexts. The main idea behind it is the concordance.
Concordances have been used for centuries by biblical scholars to see how different words occur in religious texts. A concordance is a special kind of index that shows, next to each word, some of the words that appear before or after it. For instance, consider the phrase "if love" in Romeo and Juliet, which occurs three times:

As the figure above shows, the words following "if love" contain many repeated phrases. For example, "be" follows "if love" in all three cases, and "be blind" follows in two. To create a word tree, the computer merges all the matching phrases, as in this diagram:

As you can see, the diagram has the shape of a tree in which each node represents a word, which makes it easy to read and visualize. It emphasizes the interactive exploration of short texts (like the short texts of the Bible). This visualization is called a WordTree and is based on a well-known data structure in computer science: the suffix tree (introduced in the 1970s).
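To make the merging step concrete, here is a minimal sketch in Python. It is not the code behind the tool described in this post: it uses a plain word-level trie (nested dictionaries) rather than a real suffix tree, and the function names and the three sample lines (quoted from memory of the play) are only illustrative assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def following_contexts(tokens, phrase, depth=3):
    """Yield up to `depth` tokens that follow each occurrence of `phrase`."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            yield tokens[i + n:i + n + depth]

def build_word_tree(contexts):
    """Merge contexts into nested dicts: word -> {"count": int, "children": {...}}."""
    root = {}
    for context in contexts:
        level = root
        for word in context:
            node = level.setdefault(word, {"count": 0, "children": {}})
            node["count"] += 1          # shared prefixes are counted once per occurrence
            level = node["children"]
    return root

# Illustrative input only: three lines containing "if love".
lines = [
    "If love be rough with you, be rough with love.",
    "If love be blind, love cannot hit the mark.",
    "Or, if love be blind, it best agrees with night.",
]
phrase = ["if", "love"]
contexts = [c for line in lines for c in following_contexts(tokenize(line), phrase)]
tree = build_word_tree(contexts)

print(tree["be"]["count"])                        # how often "be" follows "if love"
print(tree["be"]["children"]["blind"]["count"])   # how often "be blind" follows it
```

Each node keeps a count, so a renderer can later tell how many phrases share a given branch.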
IBM scientists have developed an interactive graphical tool based on it. Recently, at TEDx São Paulo, Fernanda Viegas from IBM gave a great keynote about data visualization techniques, in which she presented an application of the WordTree technique. The tool is also publicly available, free of charge, on the ManyEyes website.
But why did I develop one?
One problem with ManyEyes is that it cannot be used for commercial applications, and from what I've researched there is also a maximum number of words the tool supports. I found other tools like ManyEyes, but unfortunately they were either not open source or not publicly available.
Therefore, I decided to build my own implementation of a concordance based on the suffix tree. Unlike the ManyEyes tool, my goal is to automatically create word trees from statuses on the microblogging service Twitter. I was inspired by the work of Vettalabs, who developed a WordTree for Twitter in Java.
Twitter has a powerful mechanism called the retweet (RT), which lets users repeat any tweet that someone has already posted, in order to reinforce or support it and spread it to all of their followers (by retweeting a tweet, you announce it to more people). The more RTs a tweet gets, the more widely it has spread.
Thus, I've developed a simple system that monitors Twitter in real time, looking for tweets that contain keywords specified by the user. For every n tweets found, a new tree is created, which shows in an easy way what is being discussed about that topic, i.e., what is being said in relation to that keyword.
Let's see an example: I collected some tweets about Robin Hood, a movie recently released in Brazilian theaters. The figure below shows the resulting word tree:
As you can see, the number of tweets processed can be huge: thousands of items in one day. So I decided to prune the tree in order to present only relevant information. For that, we can use natural language processing techniques, such as keeping only the branches that have a subject in the first node and a verb in the second node of the tree. This could simplify the tree and focus it on texts that begin with a subject + verb (the action associated with the keyword), and so on.
The tree presented above is a reverse tree, which shows the words that precede a keyword (trees with greater depth). The other kind is the basic tree, where the keyword is the subject:
Both the suffix tree and the graph were developed in the Python programming language. The most interesting part is that you can easily visualize and extract information from Twitter in real time with this visualization tool. It also summarizes repeated words by increasing their font size, so the user can understand the content directly and intuitively, especially in an environment with lots of text and information.
I'd like to thank Murilo Queiroga, who gave me some tips for this work. Thanks, Murilo!
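The "one tree for every n tweets" idea can be sketched with a small generator. This is only an illustration: the statuses are a hard-coded list standing in for a live Twitter stream, and `batches_for_keyword` is a hypothetical helper name, not part of any Twitter library.

```python
def batches_for_keyword(statuses, keyword, n):
    """Yield lists of n statuses that mention `keyword` (case-insensitive).

    A trailing batch with fewer than n statuses is not emitted.
    """
    key = keyword.lower()
    batch = []
    for status in statuses:
        if key in status.lower():
            batch.append(status)
            if len(batch) == n:
                yield batch      # enough tweets collected: time to build a new tree
                batch = []

# Stand-in for a real-time stream (assumption: plain strings, one per tweet).
sample_statuses = [
    "Robin Hood is great",
    "Going to the beach today",
    "RT Robin Hood is great",
    "robin hood was too long",
    "Watching Robin Hood tonight",
    "Lunch time",
]

batches = list(batches_for_keyword(sample_statuses, "robin hood", n=2))
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Each emitted batch would then be handed to the tree-building step, so the picture is refreshed as new tweets arrive.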
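As a rough sketch of that pruning rule, the filter below keeps only branches whose first word looks like a subject and whose second word is a verb. The tag lookup is a tiny hand-made dictionary invented for this example (a real system would use a trained part-of-speech tagger), and treating any noun or pronoun as a "subject" is a deliberate simplification.

```python
# Hypothetical, hand-made tag lookup used only for illustration.
TOY_TAGS = {
    "movie": "NOUN", "he": "PRON",
    "is": "VERB", "opens": "VERB",
    "great": "ADJ", "the": "DET", "tonight": "ADV",
}

def keep_branch(context, tags=TOY_TAGS):
    """Return True if the branch starts with a subject-like word followed by a verb."""
    if len(context) < 2:
        return False
    first = tags.get(context[0])     # None for words missing from the lookup
    second = tags.get(context[1])
    return first in {"NOUN", "PRON"} and second == "VERB"

# Each context is the list of words on one branch of the tree.
contexts = [
    ["movie", "is", "great"],
    ["movie", "opens", "tonight"],
    ["the", "great", "movie"],
    ["tonight"],
]

pruned = [c for c in contexts if keep_branch(c)]
print(pruned)
```

Filtering the contexts before the tree is built keeps the tree itself simple: branches that fail the rule never get merged in.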
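The difference between the two kinds of tree comes down to which side of the keyword the contexts are taken from. Here is a minimal sketch, assuming a single-word keyword and a made-up sentence; the function names are illustrative only.

```python
def preceding_contexts(tokens, keyword, depth=3):
    """Yield up to `depth` tokens before each keyword occurrence, nearest word first."""
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - depth)
            yield tokens[start:i][::-1]   # reversed so the tree grows away from the keyword

def following_contexts(tokens, keyword, depth=3):
    """Yield up to `depth` tokens after each keyword occurrence."""
    for i, token in enumerate(tokens):
        if token == keyword:
            yield tokens[i + 1:i + 1 + depth]

tokens = "i loved the movie and you hated the movie today".split()

before = list(preceding_contexts(tokens, "movie"))   # input for a reverse tree
after = list(following_contexts(tokens, "movie"))    # input for a basic tree
print(before)
print(after)
```

Reversing the preceding words means both kinds of context can be merged by the same tree-building routine, with the keyword always at the root.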
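One simple way to turn repetition into font size is a linear mapping from a node's count to a point size. The sketch below is an assumption about how this could be done, not the scaling used in the tool; the size range and the sample counts are made up.

```python
def font_size(count, max_count, min_size=10, max_size=40):
    """Linearly map a node count (1..max_count) onto a font size in points."""
    if max_count <= 1:
        return min_size              # nothing repeats, so everything gets the base size
    ratio = (count - 1) / (max_count - 1)
    return round(min_size + ratio * (max_size - min_size))

# Hypothetical counts for three nodes of a word tree.
counts = {"be": 3, "blind": 2, "rough": 1}
largest = max(counts.values())
sizes = {word: font_size(count, largest) for word, count in counts.items()}
print(sizes)
```

With thousands of tweets, a logarithmic mapping may work better than a linear one, since a few very frequent words would otherwise dwarf everything else.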
See you next time,
Marcel Caraciolo
