Review of a new book: Practical Data Analysis

Tuesday, November 26, 2013

Hi all,

I was invited by the PacktPub team to review the book Practical Data Analysis by Hector Cuesta. I have been reading it for the last two weeks and I really enjoyed the way the book approaches its topics.

The book goes through the hot topics of data science by presenting several practical examples of data exploration, analysis and even some machine learning techniques. Python as the main platform for the sample code was a perfect choice, in my opinion, since Python has become more popular in the scientific community.

The book brings examples from several fields of study, such as stock prices, sentiment analysis on social networks, biology modelling scenarios, social graphs, MapReduce, text classification, data visualisation, etc. Many libraries and tools are presented, including NumPy, SciPy, PIL, Python, Pandas, NLTK, IPython, Wakari (I really liked that there is a dedicated chapter for this excellent on-line environment for scientific Python), etc. It also covers NoSQL databases such as MongoDB and visualisation libraries like D3.js.

I believe the biggest value proposition of this book is that it brings several tools together in one place and shows how they can be applied to data science. Many of the tools mentioned lack further examples or documentation, and this book can help any data scientist fill that gap.

However, the reader must not expect to learn machine learning and data science from this book. The theory is scarce, and I believe it was not the author's main goal for this book. For anyone looking to learn data science, this is not the right book. But for those who want an extra resource for sample code and inspiration, it will be a great pick!

The source code is available on GitHub, but it is better explained inside the book, which also includes illustrations. To sum up, I have to congratulate Hector for his effort in writing this book. The scientific community, including the Python group, will really enjoy this book! I did miss more material about installing the scientific software stack, since for beginners it can be really painful. But overall, it is well written and focused on practical problems! A guide for any scientist.

For me the best chapters were Chapter 6, Simulation of Stock Prices, where the visualisation using D3.js was great, and the last chapter, 14, about On-line Data Analysis with IPython and Wakari. It's the first time I have seen Wakari covered in a book! Everyone who works with scientific Python today should give this on-line tool a try some day! It's awesome!

Congratulations to PacktPub and Hector for the book!


Marcel Caraciolo