Partner: Ostrava University Hospital

Field: Healthcare

Another example of collaboration is a case study developed with the Institute of Clinical and Molecular Pathology and Medical Genetics. The Department of Medical Genetics holds an exceptional position within Ostrava University Hospital because of its supra-regional importance, which stems from its catchment area covering the Moravian-Silesian region. The department offers genetic counselling and performs specialised examinations in cooperation with its cytogenetic and molecular genetic laboratories, as well as with laboratories throughout the Czech Republic and abroad. Medical genetics is a multidisciplinary field whose results are reflected in all areas of medical care, and its primary focus is prevention. The department deals with the diagnosis of congenital developmental defects, developmental disorders in childhood, monogenic diseases, and oncological, neurological, and neurodegenerative diseases, among others. Many of its clients are pregnant women and couples planning a pregnancy, so it also deals with the diagnosis of fertility disorders and with pregnancy planning for couples who carry genetic diseases, have a family history of disease, or are related by blood.

The main purpose of this cooperation was to test the execution of complex computational tasks in next-generation sequencing (NGS) on the supercomputing infrastructure of IT4Innovations, in order to establish a baseline for the computational complexity, scalability, and data volume of NGS pipelines.

IT4Innovations’ systems Barbora and Karolina were used to test the complexity of NGS data processing. The state-of-the-art processing pipeline is written in the Nextflow workflow language, and thus enables full use of the available computational resources. Nextflow pipelines consist of so-called processes, each of which performs a given task on the input data. The Nextflow runtime manages execution: it tracks which processes or tasks can run in parallel so that the limit of allocated hardware (HW) resources is not exceeded. Testing took place on two types of data: exome (large, 3 patients) and panel (small, MR-MIKRO4 panel, 17 patients).
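The idea of running tasks in parallel only while the allocated resources allow it can be illustrated with a minimal Python sketch. This is not Nextflow's actual scheduler and not the pipeline used in the study: the `Task` class, the `schedule` function, the task names, and all numbers are hypothetical, and dependencies between processes are deliberately left out so that only the resource limit is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpus: int      # CPU cores the task requests (hypothetical value)
    duration: int  # run time in arbitrary time units (hypothetical value)


def schedule(tasks, cpu_limit):
    """Greedy sketch: start each task as soon as enough CPUs are free.

    Returns a dict mapping task name to its start time. The sum of CPUs
    of concurrently running tasks never exceeds cpu_limit.
    """
    for task in tasks:
        if task.cpus > cpu_limit:
            raise ValueError(f"{task.name} requests more CPUs than allocated")

    pending = list(tasks)
    running = []  # (end_time, cpus) of tasks currently running
    starts = {}
    now = 0
    while pending:
        # Release the CPUs of tasks that have finished by the current time.
        running = [(end, cpus) for end, cpus in running if end > now]
        free = cpu_limit - sum(cpus for _, cpus in running)

        waiting = []
        for task in pending:
            if task.cpus <= free:
                starts[task.name] = now
                running.append((now + task.duration, task.cpus))
                free -= task.cpus
            else:
                waiting.append(task)
        pending = waiting

        if pending:
            # Jump to the moment the next running task finishes.
            now = min(end for end, _ in running)
    return starts


if __name__ == "__main__":
    demo = [Task("map_a", 4, 10), Task("map_b", 4, 10), Task("call_a", 2, 5)]
    print(schedule(demo, cpu_limit=8))
```

With an allocation of 8 CPUs, the two 4-CPU tasks in the demo start together, and the third task has to wait until one of them finishes.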

The solution consists of a set of NGS data processing benchmarks performed on the supercomputing infrastructure to provide a baseline for computational and data requirements. The benchmarks were run for both the exome and panel data types. Each pipeline run was divided into several parts: mapping, variant calling, and annotation. For each data type and pipeline part, the benchmark statistics recorded execution time, CPU hours used, peak CPU usage, peak RAM, peak RAM+swap, user data volume, and cache volume. The partner will use the benchmark results as a baseline for the HW specification in the procurement of dedicated HW to be located on the partner’s premises. This HW will significantly speed up the processing of genetic screening results for clinical patients.
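As a rough illustration of how per-task measurements might be rolled up into statistics per data type and pipeline part, consider the following Python sketch. The record layout, the field names, and the `summarise` function are assumptions made for this example, and the numbers in the demo are placeholders, not measurements from the study. CPU hours are approximated here as task time multiplied by the number of requested cores, which is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    data_type: str       # e.g. "exome" or "panel"
    part: str            # e.g. "mapping", "variant_calling", "annotation"
    task_time_h: float   # run time of one task in hours
    cpus: int            # cores requested by the task
    peak_ram_gb: float   # peak memory of the task in GB


def summarise(records):
    """Aggregate task records per (data_type, part).

    Task time and CPU hours are summed; peak RAM is the maximum seen.
    """
    summary = defaultdict(
        lambda: {"task_time_h": 0.0, "cpu_hours": 0.0, "peak_ram_gb": 0.0}
    )
    for rec in records:
        entry = summary[(rec.data_type, rec.part)]
        entry["task_time_h"] += rec.task_time_h
        entry["cpu_hours"] += rec.task_time_h * rec.cpus
        entry["peak_ram_gb"] = max(entry["peak_ram_gb"], rec.peak_ram_gb)
    return dict(summary)


if __name__ == "__main__":
    # Placeholder values for illustration only, not results of the benchmarks.
    records = [
        TaskRecord("exome", "mapping", 2.0, 4, 16.0),
        TaskRecord("exome", "mapping", 1.0, 4, 24.0),
        TaskRecord("panel", "annotation", 0.5, 2, 4.0),
    ]
    for key, stats in summarise(records).items():
        print(key, stats)
```

A summary of this shape, one row per data type and pipeline part, is the kind of table that could feed into a hardware specification.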

This success story was supported by the EuroCC project. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Germany, Bulgaria, Austria, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, the United Kingdom, France, the Netherlands, Belgium, Luxembourg, Slovakia, Norway, Switzerland, Turkey, Republic of North Macedonia, Iceland, Montenegro. This project has received funding from the Ministry of Education, Youth and Sports of the Czech Republic (ID:MC2101).