Rosienne Farrugia from the University of Malta explores the role of high throughput sequencing (HTS) in rare and complex diseases, including the move towards the clinical applications of genomics
High throughput sequencing (HTS) is poised to play an increasingly central role in elucidating the causes of both rare and complex diseases. Technological developments in recent years have revolutionised the approach to genetic studies, making it possible to query the entire genomic sequence and detect most variants within the genome. The same technology is also applicable to epigenomic and transcriptomic research, making it possible to generate multiple layers of high throughput data from a single sample. These data layers can be integrated to give a more complete picture of gene function.
HTS has changed the way genetics and molecular biology research is carried out. In the very near future, HTS will also find widespread application in the clinical diagnostics of Mendelian disorders, greatly improving on the 20% diagnostic sensitivity of the current Sanger candidate gene approach.
With the emerging gene-specific and mutation-specific therapies, HTS will in time supplant even other non-molecular diagnostic tests. This technology enables scientists to sequence entire genomes, generate RNA profiles and investigate genome-wide epigenetic changes at an increasingly fast rate and low cost. Technological advances have made it possible to sequence multiple human genomes in less than a week.
However, substantially longer time periods and heightened bioinformatics skills are required to analyse, understand and generate meaningful results from the enormous data sets generated by genomes, transcriptomes and epigenomes. The initial bottleneck is the data processing, analysis and integration of HTS data generated from different applications. Translating these findings into clinical applications presents even more challenges.
These challenges stem not only from the volume of data being generated, but also from its complexity. With high throughput sequencing, the DNA of an individual is sheared into many small fragments. All the fragments are then immobilised onto a solid support and ‘read’ simultaneously using fluorescently tagged nucleotides. Images are captured at every stage and the fluorescent signal is converted into a DNA sequence. Each sequence is linked to its coordinates on the solid support, effectively giving independent sequence data for each fragment. All the data is then put together again to build the entire sequence of the 3 billion nucleotides that make up the human genome.
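As a rough illustration of how a fluorescent signal becomes sequence data, the minimal Python sketch below picks the brightest of four channels at each imaging cycle and keys every resulting read by its position on the solid support. The data layout, the function names and the intensity values are all hypothetical and chosen only for illustration; real instruments apply far more elaborate signal correction and quality scoring.

```python
# Toy base calling. Assumption: each imaging cycle yields one intensity
# per channel, in the fixed order A, C, G, T (hypothetical layout).
CHANNELS = ("A", "C", "G", "T")


def call_bases(cycles):
    """Return the base with the strongest signal at each cycle."""
    return "".join(
        CHANNELS[max(range(4), key=lambda i: cycle[i])] for cycle in cycles
    )


def call_clusters(intensities):
    """Map each (x, y) coordinate on the support to its own read."""
    return {xy: call_bases(cycles) for xy, cycles in intensities.items()}


# Two fragments at different coordinates, three imaging cycles each.
# The values are invented for illustration only.
intensities = {
    (10, 42): [(0.9, 0.1, 0.0, 0.1), (0.1, 0.8, 0.1, 0.0), (0.0, 0.1, 0.2, 0.7)],
    (11, 7): [(0.1, 0.0, 0.9, 0.1), (0.2, 0.1, 0.7, 0.0), (0.8, 0.1, 0.1, 0.0)],
}

reads = call_clusters(intensities)
for xy, sequence in reads.items():
    print(xy, sequence)
```

Because each read is stored under its own coordinates, the fragments remain independent of one another until a later step puts them back together.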
Next, the data is compared to reference sequences so that variations that could be the cause of disease can be pulled out and analysed further. Bioinformatics techniques play an important role here since it is not possible to manually carry out the data capture, alignment and variant identification. Furthermore, computational algorithms and pipelines allow different scenarios to be designed and quickly executed to analyse the data under different models of inheritance.
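To give a flavour of what analysing the data under different models of inheritance can look like, the Python sketch below filters variants in a child and both parents. It is a deliberately simplified illustration, not the method used by any particular pipeline: it assumes an affected child with two unaffected parents, encodes each genotype as the number of non-reference alleles (0, 1 or 2), and ignores sequencing error, sex-linked inheritance and compound heterozygosity. The variant identifiers and function names are hypothetical.

```python
def fits_model(child, mother, father, model):
    """Check one variant against a simple inheritance model.

    Genotypes are counts of the non-reference allele (0, 1 or 2).
    Assumption: the child is affected and both parents are unaffected.
    """
    if model == "recessive":
        # Child carries two copies, one inherited from each carrier parent.
        return child == 2 and mother == 1 and father == 1
    if model == "de_novo":
        # Child carries one copy that neither parent has.
        return child == 1 and mother == 0 and father == 0
    raise ValueError(f"unknown model: {model}")


def candidates(variants, model):
    """Return the ids of the variants consistent with the chosen model."""
    return [
        v["id"]
        for v in variants
        if fits_model(v["child"], v["mother"], v["father"], model)
    ]


# Invented example genotypes for one family.
variants = [
    {"id": "var1", "child": 2, "mother": 1, "father": 1},
    {"id": "var2", "child": 1, "mother": 0, "father": 0},
    {"id": "var3", "child": 1, "mother": 1, "father": 0},
]

for model in ("recessive", "de_novo"):
    print(model, candidates(variants, model))
```

Switching the model is a one-word change, which is the sense in which different scenarios can be designed and executed quickly once the variants have been identified.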
As a result, life scientists who generate the data often struggle with its downstream handling because they lack basic computational and statistical knowledge, and so become dependent on the support of bioinformaticians or statisticians. That support is not easy to find, because bioinformaticians and statisticians with the biological background needed to analyse and evaluate these large data sets with real biological insight are scarce.
The TrainMALTA project, funded through an H2020 Twinning grant under agreement number 692014, aims to tackle these limitations by providing diverse forms of training in bioinformatic analysis, enabling researchers at the University of Malta to gain new insights into the genetic causes of disease. This support and coordination action aims to do so by giving researchers a solid understanding of the basis of data analysis, so that life scientists can analyse and interpret HTS data within a biological and clinical context. This is best achieved through collaborations between life scientists, bioinformaticians and statisticians.
A wide range of bioinformatics techniques and software tools are available today, enabling researchers to draw new insights from biological datasets. The volume of data being generated and the rapid development of bioinformatics techniques mean there is an ongoing need to provide high-quality training, an issue which lies at the core of the TrainMALTA project. It is also crucial to integrate this training with ongoing local research into the background of disease.
Thus, the priority of the TrainMALTA project is to equip researchers with the interdisciplinary skill sets that they need to analyse HTS data using informatics, open-source command line tools and high-throughput analysis pipelines. Just as important is a solid appreciation of the limitations of each technique and analysis pipeline being used. Analysis of these data sets could help researchers learn more about the underlying causes of disease, marking another step towards the wider goal of personalised medicine.
Please note: this is a commercial profile
Rosienne Farrugia
Department of Applied Biomedical Science
Faculty of Health Sciences
University of Malta
Tel: +356 2340 1107/3281
www.um.edu.mt/project/trainmalta
Twitter: @uniofmalta