The world's third most powerful supercomputer and Europe's number one is fully operational. The second pilot phase of the GPU-based LUMI supercomputer has been completed; as of February 2023, LUMI has been accepted and is officially ready to serve European scientists, including Czech scientists. LUMI contains a total of 10,240 graphics processors.

What were the first experiences of Czech scientists from IT4Innovations with pilot testing of the GPU partition of the LUMI supercomputer?

David Číž from IT4Innovations focuses on exploring the viability of training machine learning models on AMD graphics cards in an HPC environment using the TensorFlow machine learning framework. He created benchmarks that will serve as a baseline for LUMI users, and he also compared them with the same benchmarks run on the Karolina supercomputer. Of the pilot testing of LUMI, he says: “From start to finish, I was impressed with how smooth the experience already was for a pilot project. The documentation is nicely organized, with clear instructions and helpful examples, which makes connecting to LUMI and using its vast resources quick and easy. Creating the required environments with the appropriate software was made simple with the pre-made EasyBuild recipes, and the support team was always quick to help and answer questions. The system runs smoothly, quickly, and is intuitive to use.”
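Benchmarks of the kind described here typically follow a simple pattern: a few warm-up runs first (to absorb one-time costs such as kernel compilation and data movement), then several timed runs, reporting the best. The sketch below illustrates that pattern only; the article does not show the actual benchmark code, so a plain Python workload stands in for a TensorFlow training step, and the sizes and repeat counts are invented.

```python
import time

def benchmark(workload, warmup=2, repeats=5):
    """Time a workload: warm up first, then report the best of several runs."""
    for _ in range(warmup):          # warm-up runs absorb one-time costs
        workload()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return min(times)                # the best run is least affected by noise

# Hypothetical stand-in workload; a real GPU benchmark would time a
# TensorFlow training step in the same way.
def workload():
    return sum(i * i for i in range(100_000))

best = benchmark(workload)
print(f"best of 5 runs: {best * 1e3:.3f} ms")
```

Comparing the same harness output on two machines (here, LUMI versus Karolina) is what makes such numbers usable as a baseline.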

Sergiu Arapan from IT4Innovations focuses on two-dimensional van der Waals materials, which are promising candidates for future thermoelectric materials and for compact spintronic applications. He uses state-of-the-art computational methods to study the structural and physical properties of these materials. On his first experience with the LUMI supercomputer, he says: “Using the GPU nodes can speed up our electronic structure calculations by an order of magnitude, and thus accelerate the computational design of new materials with desired properties. We have been successfully running our code on the Karolina supercomputer, but with a different GPU architecture than on LUMI. The LUMI-G pilot phase offered us the opportunity to deploy our code on the AMD GPUs. Though we still need to rework some parts of the code to fully use the capability provided by LUMI, we noticed that the Cray compiler and the available mathematical libraries produce more performant code. We would also like to give credit to the professionalism of the LUMI support staff and the easy and clear online documentation.”

We also asked Oldřich Plchot from the Faculty of Information Technology of Brno University of Technology:

You were one of the first Czech scientists to get the unique opportunity to use the LUMI supercomputer, testing its GPU partition. Your project, entitled Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries, was allocated 8,000 node hours. Could you give us an overview of your research and explain why you use supercomputers?

“The Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries project is specific in its requirements for the amount of audio data to be processed, which need not be strictly annotated. Our goal was to train an embedding extractor that is used in biometric applications for speaker verification. In simple terms, for each utterance in a pair of speech utterances, we extract an embedding (a high-dimensional vector produced by a neural network) and then compare the two embeddings using, for example, their cosine distance to decide whether they come from the same speaker or from two different speakers. The embedding extractor is a deep convolutional neural network, and our proposed algorithm for its training can exploit weakly annotated data while optimising the objective function for speaker identification. By weakly annotated data, we mean training recordings that can contain an arbitrary number of speakers; during training, we know only whether a given speaker appears somewhere in each recording, not where. This approach makes it possible to take advantage of the large amount of data freely available from the Internet, circumventing the current fundamental problem where we are often faced with small amounts of training data for increasingly large neural networks. Acquiring and augmenting such data is significantly cheaper than having the data annotated and segmented manually. Thus, the computing power was essentially required to train this large neural network on roughly 10 times the amount of data typical for manually annotated training. Because the algorithm iteratively refines its estimate of where individual speakers occur in the weakly annotated data, a significantly larger number of training iterations was also required.”
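The pairwise verification step described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual pipeline: the four-dimensional vectors and the decision threshold are invented (real extractors produce embeddings with hundreds of dimensions, and thresholds are calibrated on held-out data).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb1, emb2, threshold=0.5):
    """Decide whether two embeddings likely come from the same speaker."""
    return cosine_similarity(emb1, emb2) >= threshold

# Toy "embeddings" standing in for neural-network outputs.
utterance_a = [0.9, 0.1, 0.3, 0.2]
utterance_b = [0.8, 0.2, 0.4, 0.1]    # points in nearly the same direction
utterance_c = [-0.7, 0.9, -0.2, 0.5]  # points elsewhere

print(same_speaker(utterance_a, utterance_b))  # similar direction -> True
print(same_speaker(utterance_a, utterance_c))  # dissimilar -> False
```

The expensive part of the pipeline is training the network that produces the embeddings; the comparison itself, as shown here, is cheap.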

What was your first user experience with the LUMI supercomputer like?

“We used only nodes with graphics accelerators for our computations, and we encountered several challenges in our initial experience with LUMI and its AMD accelerators during the pilot testing. For example, on Karolina we often mount squashfs or tar files via fusermount, and this feature is missing on LUMI; we were promised an implementation of this functionality. We are glad for the opportunity to use the remaining computing time on LUMI until the end of March.”

Institutions can apply for computational resources within Open Access Grant Competitions. The grant competition is announced three times a year for employees of research, scientific, and educational organizations.