Janssen Pharmaceuticals

Field: information technology, pharmaceutical industry

Machine learning applications are increasingly used in areas such as the energy industry, industrial automation, robotics, the automotive industry, and biomedicine. At an industrial scale, these applications usually consist of a series of interconnected data processing steps.

Within the European ExCAPE project, funded by the Horizon 2020 programme, we cooperate with pharmaceutical companies on drug discovery using supercomputers and machine learning methods. The ability to accurately predict the activity of chemical compounds greatly reduces the cost and time required for this process, and has a significant impact on therapeutic innovation and on broadening the possible uses of medication. A number of tools have been developed to meet this challenge. The contribution of IT4Innovations is the open source software HyperLoom, which efficiently distributes interdependent computational tasks across distributed computing environments.

HyperLoom users can easily define dependencies between computational tasks and create a pipeline, which can then be executed on an HPC system. The software also allows users to execute pipelines containing a wide range of task types: basic built-in tasks, user-defined tasks, tasks wrapping third-party applications, and combinations of these.
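The idea of a pipeline as a set of tasks with dependencies can be illustrated with a minimal sketch. It does not use the HyperLoom API; it relies only on Python's standard `graphlib` module, and the task names (`load_data`, `featurize`, `train_model`, `predict`) are hypothetical placeholders chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# These names are illustrative only and are not part of HyperLoom.
pipeline = {
    "load_data": set(),
    "featurize": {"load_data"},
    "train_model": {"featurize"},
    "predict": {"train_model", "featurize"},
}


def execution_order(deps):
    """Return one valid order in which the tasks can be executed."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(pipeline)
print(order)
```

Any order produced this way runs every task only after the tasks it depends on; a cyclic definition would raise `graphlib.CycleError` instead.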

HyperLoom was designed to have minimal task planning and running overheads, and to deal efficiently with the varying execution times of different tasks. The software core is implemented in C++ and dynamically orchestrates tasks over the available computational resources according to user-defined task requirements. The core of HyperLoom consists of a server and several worker components. The server is responsible for scheduling tasks and running them on the workers, which run on compute nodes. Pipelines are defined and sent to the server via a Python interface.
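The server and worker arrangement described above can be sketched in a few lines. This is not HyperLoom's scheduler or API: it is a simplified stand-in that uses a thread pool from Python's standard library as the "workers", and the function `run_pipeline`, the task names, and the durations are all assumptions made for illustration. It shows the general principle of dispatching each task as soon as its dependencies have finished, without knowing execution times in advance.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_pipeline(deps, durations, n_workers=2):
    """Run tasks on a worker pool, dispatching each once its inputs are done.

    deps maps a task name to the set of tasks it depends on; durations maps
    a task name to a simulated execution time in seconds (hypothetical).
    Returns the task names in the order in which they finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are already satisfied.
            for task in sorter.get_ready():
                running[pool.submit(time.sleep, durations[task])] = task
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()  # re-raise any error from the worker
                sorter.done(task)  # may unlock dependent tasks
                finished.append(task)
    return finished


deps = {"a": set(), "b": set(), "c": {"a", "b"}}
durations = {"a": 0.05, "b": 0.01, "c": 0.0}
finished = run_pipeline(deps, durations)
print(finished)  # "c" is always last because it depends on both "a" and "b"
```

A production scheduler would additionally account for data locality, resource requirements per task, and communication between nodes, none of which this sketch attempts.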

Performance tests have shown that HyperLoom can execute pipelines containing hundreds of thousands of interconnected tasks with unknown execution times on tens to hundreds of compute nodes. This general-purpose tool is also applicable in other industrial sectors.


Vladimir Chupakhin

Pharma companies have collected a significant amount of protein-ligand interaction data, forming the so-called chemogenomics matrix of interactions between compounds and proteins. However, this matrix is very sparse, with less than 1% of its entries filled. Predictive modelling, using classification or regression models, can help fill in the matrix. The predictions are in turn used to speed up the drug design and development process, which can help cut costs as well as reduce animal use. While machine learning is widely used in every step of the drug design and discovery process, applying it to big data remains a hurdle given all the modelling steps involved: hyperparameter search, storage of models and predictions, etc. Together with IT4Innovations, we are working to overcome these challenges within the ExCAPE project.