MIP, my proposal for a high-performance analysis pipeline for whole exome sequencing

Saturday, August 23, 2014

Hi all,

It has been a while since my last post, but the reason is worth the long absence. Since January I have been co-leading the bioinformatics and IT department of Genomika Diagnósticos.

Genomika is one of the most advanced clinical genetics laboratories in Brazil. Located in Recife, Pernambuco, in the Northeast of Brazil, it provides cutting-edge molecular testing of cancer samples to better define treatment options and prognosis, making personalized cancer management a reality. It also has a vast menu of tests to evaluate inherited diseases, including cancer susceptibility syndromes and rare disorders. Equipped with state-of-the-art next-generation sequencing instruments and a world-class team of specialists in the field of genetic testing, Genomika focuses on test methods that improve patient care and have an immediate impact on management. A pitch video about our lab and one of our exams is available (unfortunately, the spoken language is Portuguese).

Our video about sequencing exams (spoken in Portuguese)

My daily work is to provide tools, infrastructure and systems to support our clients and teams in the lab. One of the major teams is the molecular biology sector. It is responsible for the DNA sequencing exams, which include targeted panels, specific genes or exons, and whole exomes. Each of those genetic tests, before being delivered to the patient and the doctor, goes through several data pre-processing and analysis steps organised as an ordered sequence, which we call a pipeline.

There is a customised pipeline for clinical sequencing, where we bioinformaticians and specialists study the genetic basis of human phenotypes. In our lab pipeline we are interested in selecting and capturing the protein-coding portion of the genome (which we call the exome). This region, which accounts for only a small fraction of human DNA (around 1 to 3%, depending on the estimate), can be used to elucidate the genetic causes of many human diseases, from single-gene disorders to more complex genetic disorders, including complex traits and cancer.

Clinical Sequencing Pipeline overview

For this task, we use several tools that must handle large volumes of data, especially because of the new next-generation DNA sequencing machines (yes, we have one from Illumina at our lab). Those machines can produce large amounts of NGS data in less time and at lower cost.

Taking those challenges into account, we perform sequencing, alignment, variant detection and data analysis of human samples in order to find variants. We call this study variant analysis. Variant analysis looks for variant information, that is, possible mutations that may be associated with genetic diseases. Examples of a mutation or variant include a change of one nucleotide (A to T), known as a single nucleotide variant (SNV), or a small insertion or deletion (indel) that can impact the functional activity of the protein. Finding variants, and further identifying those related to diseases or genetic disorders, is a big challenge in terms of technology, tools and interpretation.

The reference genome is at the bottom; the variants are above.
In this example there is a possible change from A to G (an SNV) at a specific position of the genome.
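
To make the SNV and indel definitions concrete, here is a minimal Python sketch that labels a variant by comparing the reference and alternate alleles. The function name `classify_variant` and its labels are my own illustration, not part of MIP, and real variant callers handle many more cases (multi-allelic sites, structural variants and so on).

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing reference and alternate alleles (illustrative only)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"            # one nucleotide replaced by another
    if len(ref) < len(alt):
        return "insertion"      # bases added relative to the reference
    if len(ref) > len(alt):
        return "deletion"       # bases removed relative to the reference
    return "substitution"       # same length, more than one base changed


print(classify_variant("A", "G"))    # SNV
print(classify_variant("A", "ATG"))  # insertion
print(classify_variant("ATG", "A"))  # deletion
```

The A to G change from the figure above falls into the first case, while the other two calls show the small insertions and deletions grouped under "indel".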

In our lab we are developing a streamlined, highly automated pipeline for exome and targeted panel data analysis. The pipeline handles multiple datasets and integrates state-of-the-art tools for generating, annotating and analyzing sequence variants.

We named our internal pipeline tool MIP (Mutation Identification Pipeline). We established some minimal requirements for MIP so that we can use it with maximum performance and productivity:

1. It must be automatic. With a limited team like ours (2 or 3 bioinformaticians), we need an efficient service that can execute the complete analysis without anyone typing commands at terminals, calling software by hand or converting files among several data formats.

MIP pipeline overview for clinical sequencing. All those steps require tools and files in a specific format.
Our engine would be capable of managing and executing all or some of those steps with
specific parameters defined by the specialist.

2. It must be user-oriented. This means that the MIP platform must provide an easy-to-use interface, so that any researcher in the lab can use the system and start a sequencing analysis out of the box. It would allow biologists and geneticists to focus on what matters: the downstream experiments.

3. It must have a scale-out architecture. More and more high-throughput sequencing data is coming out of NGS instruments, so MIP must be designed as a building block for a scalable genomics infrastructure. This means we must work with distributed and parallel approaches and best practices from high-performance computing and big data to take efficient advantage of all the resources available in our infrastructure, while continuously optimizing to minimize the network and shared disk I/O footprint.

My draft proposal for our exome sequencing pipeline

4. It must offer richly detailed reports and smart software and dataset updates. To keep our execution engine healthy, our software stack must always be up to date. Since our engine is written on top of numerous open-source biological and big data packages, we need a self-contained management system that can not only check for new versions but also, with a few clicks, start an update and perform a post-check for any possible breakage in the pipeline. In addition to the third-party genomics software used in MIP, we are also developing our own tool for variant annotation. It is an engine that can query and analyze several genomic datasets and generate real-time interactive reports, in which researchers can filter variants based on specific criteria. Its outputs include QC reports, target and sequencing depth information, descriptions of the annotations, and variants hyperlinked to public datasets for further details about a variation.

Example of a web interface where a researcher can select any single annotation or combination of annotations to display. Links to the original data sources are readily available (figure from the WEP annotation system).

5. Finally, we think the most important requirement for MIP today is integration with our current LMS (laboratory management system), so that the filtered variants can be fed into our existing laboratory report analysis and publishing workflow. This means more productivity and automation with our existing infrastructure.

MIP could also be accessible via a RESTful API, through which the run outputs
 would be exchanged with our external LMS solution.
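
To illustrate the requirements above, here is a minimal Python sketch of an engine that runs an ordered list of steps, lets the specialist run only some of them or override their parameters, and serializes the filtered variants as JSON for an external system such as an LMS. Everything in it is an assumption made for illustration: the names `Step` and `run_pipeline`, the placeholder steps, the `min_depth` parameter and the made-up variant records are not MIP's actual design.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A step receives the shared run context and its parameters, and returns a new context.
StepFn = Callable[[dict, dict], dict]


@dataclass
class Step:
    name: str
    run: StepFn
    params: dict = field(default_factory=dict)


def run_pipeline(steps: List[Step], context: dict,
                 only: Optional[List[str]] = None,
                 overrides: Optional[Dict[str, dict]] = None) -> dict:
    """Run steps in order. `only` restricts the run to the named steps;
    `overrides` replaces default parameters for individual steps."""
    overrides = overrides or {}
    for step in steps:
        if only is not None and step.name not in only:
            continue
        params = {**step.params, **overrides.get(step.name, {})}
        context = step.run(context, params)
        context.setdefault("log", []).append(step.name)
    return context


def call_variants(context: dict, params: dict) -> dict:
    # Placeholder: a real step would invoke a variant caller on aligned reads.
    # These two records are made-up data for the example.
    variants = [
        {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "depth": 45},
        {"chrom": "2", "pos": 2000, "ref": "C", "alt": "T", "depth": 8},
    ]
    return {**context, "variants": variants}


def filter_variants(context: dict, params: dict) -> dict:
    # Keep only variants with enough sequencing depth (hypothetical criterion).
    kept = [v for v in context["variants"] if v["depth"] >= params["min_depth"]]
    return {**context, "variants": kept}


steps = [
    Step("call_variants", call_variants),
    Step("filter_variants", filter_variants, {"min_depth": 10}),
]

result = run_pipeline(steps, {"sample": "demo"})

# JSON payload that a REST endpoint could hand over to the LMS.
payload = json.dumps({"sample": result["sample"], "variants": result["variants"]})
print(payload)
```

Keeping each step behind the same small interface is one way to let the engine, rather than a person at a terminal, decide what runs and with which parameters. For example, `run_pipeline(steps, {"sample": "demo"}, overrides={"filter_variants": {"min_depth": 5}})` would keep both made-up variants.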

As you can see, there is a huge effort in coding, design and infrastructure to meet those requirements. But we are thrilled to make this happen! One of our current pieces of work in this project is the genv tool. Genv is what we call our Genomika environment builder. The basic idea is a tool, written in Python with the Fabric package, that provides instant access to biological software, programming libraries and data. The expected result is a fully automated infrastructure that installs all the software and data required to start the MIP pipeline. We are also thinking of providing pre-built images with Docker. Of course, I will need a whole post to explain more about it!
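
As a small taste of the kind of check an environment builder needs, the sketch below reports which command-line tools are available on the current machine. It uses only the Python standard library rather than Fabric, and the function names and the tool list are hypothetical examples, not genv's real code or configuration.

```python
import shutil
from typing import Dict, List


def check_tools(required: List[str]) -> Dict[str, bool]:
    """Report, for each required command-line tool, whether it is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in required}


def missing_tools(required: List[str]) -> List[str]:
    """Return the required tools that still need to be installed."""
    status = check_tools(required)
    return [tool for tool in required if not status[tool]]


# Hypothetical tool list, chosen only for illustration.
required = ["bwa", "samtools"]
for tool, found in check_tools(required).items():
    print(f"{tool}: {'found' if found else 'missing'}")
```

A builder could run a check like this before and after an installation, so that it only installs what is missing and can confirm the result afterwards.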

To sum up, I hope I have given an overview of one of the projects I have been working on this first semester. At Genomika Diagnósticos we are facing big scientific challenges, and the best part is that those tools are helping our lab provide the next level of health information to patients, all from our DNA!

If you are interested in working with us, keep checking our GitHub homepage for any open positions on our bioinformatics team.

Until next time!

