Using HPC Techniques to Accelerate NGS Workflows

Next-generation sequencing (NGS) describes the modern nucleotide sequencing technologies that allow analysis of genetic material with unprecedented speed and efficiency. Its advent is shifting genome assembly from a problem of laboratory-based chemistry to one well suited to high performance computing (HPC). In simple terms, NGS involves breaking up long DNA or RNA molecules into millions of small fragments (50 to 200 nucleotides) called “reads,” which are then assembled into larger fragments called contigs.

The process of taking genetic material, processing it on a sequencer, passing it to an HPC system for assembly, and outputting digital information in a form useful for research is contained in a “workflow,” the end-to-end flow of genetic information. Across this workflow there are bottlenecks, where individual steps can dramatically slow down the workflow as a whole.

The success of the new technologies — the introduction of sequencers that generate data snippets faster — has come at a price. Sequencers produce reads that are too small (< 150 bp) for commonly used assembler code sets based on overlap-layout-consensus algorithms.

Instead, de Bruijn graph-based assemblers have proven successful at assembling short reads. Taken one step further, distributed memory parallelism can substantially improve the performance and resource utilization of NGS workflows. Cray is working closely with the Broad Institute to optimize one of the modules in Trinity, an open source application for de novo reconstruction of RNA-seq data.

This open source and freely available application combines three independent software modules — Inchworm, Chrysalis and Butterfly — applied sequentially to process large volumes of RNA-seq reads. It partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, then processes each graph independently to extract full-length splicing isoforms and tease apart transcripts derived from paralogous genes.
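To make the de Bruijn graph idea concrete, here is a minimal Python sketch. It is a textbook-style illustration, not Trinity's implementation: the function names `build_de_bruijn` and `extend_contig`, the toy reads, and the choice of k = 4 are all assumptions made for this example. Each k-mer found in a read becomes an edge between its (k-1)-mer prefix and its (k-1)-mer suffix, and a contig is read off by following edges while the path is unambiguous.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Follow edges from `start` while exactly one successor exists.

    Stops at a dead end, at a branch, or when a node would be revisited.
    """
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (successor,) = graph[node]
        if successor in visited:
            break
        contig += successor[-1]
        visited.add(successor)
        node = successor
    return contig


# Toy data (hypothetical): three overlapping reads from one short sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, "ATG")
print(contig)  # ATGGCGTGCA
```

Real assemblers also have to handle sequencing errors, reverse complements, and branching caused by repeats or alternative splicing, none of which this sketch attempts.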

At the upcoming Intelligent Systems for Molecular Biology (ISMB) conference we’ll review some commonly used NGS workflows and highlight opportunities for improving their performance. Along with sharing our findings on the use of distributed memory parallelism, we’ll highlight our project to parallelize the Broad Institute’s Inchworm module in the Trinity RNA-Seq program, enabling efficient scaling to thousands of processors. We’ll focus on the first part of the code (Inchworm), which was parallelized using MPI, keeping in mind the whole structure of this workflow.
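One common way to spread k-mer data across distributed memory is to assign each k-mer to an "owner" process by hashing it, so that no single node has to hold the full table. The sketch below simulates that pattern in plain Python, without MPI, so it runs anywhere. It is an assumption-laden illustration and not a description of how the MPI version of Inchworm is actually organized: `owner_rank`, `kmers`, `distributed_kmer_count`, and the toy reads are hypothetical names and data invented here.

```python
import zlib
from collections import Counter


def owner_rank(kmer, n_ranks):
    """Pick the rank that owns a k-mer.

    crc32 is used because it gives the same value in every process,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(kmer.encode("ascii")) % n_ranks


def kmers(read, k):
    """Yield every k-mer in a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def distributed_kmer_count(reads, k, n_ranks):
    """Simulate hash-partitioned k-mer counting over n_ranks processes."""
    # Step 1: each simulated rank scans its own slice of the reads and
    # sorts the k-mers it finds into one outgoing bucket per owner rank.
    outgoing = [[[] for _ in range(n_ranks)] for _ in range(n_ranks)]
    for rank in range(n_ranks):
        for read in reads[rank::n_ranks]:
            for kmer in kmers(read, k):
                outgoing[rank][owner_rank(kmer, n_ranks)].append(kmer)

    # Step 2: exchange buckets. In an MPI program this would be an
    # all-to-all collective (for example MPI_Alltoallv); here it is a loop.
    tables = []
    for dest in range(n_ranks):
        table = Counter()
        for src in range(n_ranks):
            table.update(outgoing[src][dest])
        tables.append(table)
    return tables


# Toy data (hypothetical reads), split across four simulated ranks.
reads = ["ATGGCG", "GGCGTG", "CGTGCA", "ATGGCA"]
tables = distributed_kmer_count(reads, k=3, n_ranks=4)
for rank, table in enumerate(tables):
    print(rank, dict(table))
```

After the exchange, every k-mer lives on exactly one rank, so each rank can count or look up its own k-mers locally, and the per-rank memory footprint shrinks as more ranks are added.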

Beyond Trinity and Inchworm, we at Cray believe efficient distributed memory parallel implementations can improve most types of bioinformatics workflows, and that the gains justify the effort.

Cray at ISMB

Mark your calendars for a technical talk by Cray’s Pierre Carrier on using HPC techniques to accelerate NGS workflows (Session TT31, Tuesday, July 15 at 2:30 p.m.). Cray, along with representatives from the Broad Institute and Technische Universität Dresden, will also present a poster session on High-Performance De Novo RNA-Transcript Assembly Leveraging Distributed Memory and Massive Parallelization (Poster-B18, Monday, July 14 at 5:45 p.m.).

Carlos P. Sosa, Chemistry and Life Sciences Segment Manager

The post Using HPC Techniques to Accelerate NGS Workflows appeared first on Cray Blog.
