In addition to its large supercomputers, IT4Innovations also operates smaller complementary systems.
These systems represent emerging, non-traditional, and highly specialised hardware architectures that are not yet common in supercomputing data centres.
New programming models, libraries, and application development tools are also deployed on the complementary systems. These systems thus allow research teams to test and compare experimental architectures with traditional ones (e.g., x86 CPUs with NVIDIA GPUs) and to optimise and accelerate computations in new research areas.
The complementary systems consist of several hardware platforms; their technical specifications are given below.
Compute partition 1 – Arm A64FX processors
The compute nodes of the first part of the complementary systems are built on Arm A64FX processors with integrated fast HBM2 memory. The partition is effectively a small fragment of Fugaku, one of the world's most powerful supercomputers of recent years (at the time of writing, the second most powerful), installed at the RIKEN Center for Computational Science in Japan. The configuration consists of eight HPE Apollo 80 compute nodes interconnected by a 100 Gb/s InfiniBand network. A short vectorisation sketch follows the configuration list below.
Configuration of each compute node:
- 1× Arm A64FX, 48 cores, 2 GHz, 32 GB HBM2 memory
- 400 GB SSD
- HDR InfiniBand 100 Gb/s
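The main attraction of the A64FX is the Scalable Vector Extension (SVE) paired with HBM2 bandwidth. Below is a minimal, vector-length-agnostic DAXPY sketch using SVE ACLE intrinsics; the function name and compile line are illustrative, not a prescribed workflow.

```cpp
// Minimal sketch: vector-length-agnostic DAXPY with Arm SVE intrinsics,
// the vector extension implemented by the A64FX (512-bit vectors).
// Illustrative compile line on an A64FX node:
//   g++ -O3 -march=armv8.2-a+sve daxpy.cpp
#include <arm_sve.h>
#include <cstdint>

void daxpy(double a, const double* x, double* y, int64_t n) {
    // svcntd() reports the number of 64-bit lanes at run time,
    // so the same binary works for any SVE vector length.
    for (int64_t i = 0; i < n; i += svcntd()) {
        svbool_t pg = svwhilelt_b64(i, n);          // predicate masks the tail
        svfloat64_t vx = svld1_f64(pg, x + i);      // masked load of x
        svfloat64_t vy = svld1_f64(pg, y + i);      // masked load of y
        vy = svmla_f64_x(pg, vy, vx, svdup_f64(a)); // y += a * x
        svst1_f64(pg, y + i, vy);                   // masked store back to y
    }
}
```

The predicate-driven loop needs no scalar remainder handling, which is the idiom SVE is designed around.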
Compute partition 2 – Intel processors, Intel PMEM, Intel FPGA (Altera)
The compute nodes in this part of the complementary systems are based on Intel technologies. The servers are equipped with third-generation Intel Xeon processors, persistent (non-volatile) Intel Optane memory (2 TB in one server, 8 TB in the other), and Intel Stratix 10 FPGA cards. A short persistent-memory sketch follows the configuration list below.
This part consists of two HPE ProLiant DL380 Gen10 Plus nodes in the following configuration:
- 2× Intel Xeon Gold 6338, 32 cores, 2 GHz
- 256 GB RAM
- 8 TB and 2 TB Intel Optane Persistent Memory (NVDIMM)
- 3.2 TB NVMe SSD
- 2× FPGA Bittware 520N-MX (Intel Stratix 10)
- HDR InfiniBand 100 Gb/s
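Optane persistent memory in App Direct mode is commonly programmed through the PMDK libraries. The sketch below uses libpmem to map a file on a DAX filesystem and make a write durable; the mount point /mnt/pmem is an assumed example path, not the cluster's actual configuration.

```cpp
// Minimal sketch: writing durably to Intel Optane PMem with PMDK's libpmem.
// Assumes App Direct mode and a DAX filesystem mounted at /mnt/pmem
// (path is illustrative). Link with: -lpmem
#include <libpmem.h>
#include <cstring>
#include <cstdio>

int main() {
    size_t mapped_len;
    int is_pmem;
    // Create and memory-map a 4 KiB file on the persistent-memory device.
    void* addr = pmem_map_file("/mnt/pmem/example", 4096,
                               PMEM_FILE_CREATE, 0666, &mapped_len, &is_pmem);
    if (addr == NULL) { perror("pmem_map_file"); return 1; }

    const char msg[] = "hello, persistent memory";
    memcpy(addr, msg, sizeof msg);

    // Flush CPU caches so the data survives power loss.
    if (is_pmem)
        pmem_persist(addr, sizeof msg);   // cache-line flush + fence
    else
        pmem_msync(addr, sizeof msg);     // fall back to msync on non-PMem

    pmem_unmap(addr, mapped_len);
    return 0;
}
```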
Compute partition 3 – AMD processors, AMD accelerators, AMD FPGA (Xilinx)
The third part of the complementary systems is built on AMD technologies. The servers are equipped with third-generation AMD EPYC processors, four AMD Instinct MI100 GPU accelerators interconnected by a fast bus (AMD Infinity Fabric), and two Xilinx Alveo FPGA cards of different performance classes. Xilinx is one of AMD's most significant recent acquisitions. A short peer-access sketch follows the configuration list below. This part consists of two HPE Apollo 6500 Gen10 Plus nodes in the following configuration:
- 2× AMD EPYC 7513, 32 cores, 2.6 GHz
- 256 GB RAM
- 3.2 TB NVMe SSD
- 4× AMD Instinct MI100 (AMD Infinity Fabric Link)
- FPGA Xilinx Alveo U250
- FPGA Xilinx Alveo U280
- HDR InfiniBand 100 Gb/s
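The Infinity Fabric links allow the MI100 cards to exchange data directly, without staging through host memory. A quick way to confirm this from code is the ROCm HIP peer-access query below; it is a minimal sketch, not a benchmark of the fabric.

```cpp
// Minimal sketch: checking GPU-to-GPU peer access between the MI100 cards.
// Compile with ROCm's hipcc: hipcc peers.cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);
    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int can = 0;
            // Error checking omitted for brevity in this sketch.
            hipDeviceCanAccessPeer(&can, src, dst);
            printf("GPU %d -> GPU %d: peer access %s\n",
                   src, dst, can ? "available" : "unavailable");
        }
    }
    return 0;
}
```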
Compute partition 4 – Edge server
The complementary systems also include the HPE EL1000 edge server, designed to process AI workloads directly at the data source, often outside the data centre. The server combines high compute performance for AI inference, provided by an NVIDIA Tesla T4 GPU accelerator, with several communication technologies (10 Gb Ethernet, Wi-Fi, LTE) and low power consumption. A short device-query sketch follows the configuration list below.
- 1× Intel Xeon D-1587, 16 cores, TDP 65 W
- 1× NVIDIA Tesla T4, 16 GB, TDP 70 W
- 128 GB RAM
- 1.92 TB SSD storage
- Interconnect:
- 2x 10 Gbps Ethernet,
- WiFi 802.11ac,
- LTE connectivity
- Power consumption of up to 500W
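Before deploying an inference workload, it can be useful to confirm what the edge node's GPU reports. The sketch below is a plain CUDA runtime device query; it assumes only that the CUDA toolkit is installed.

```cpp
// Minimal sketch: querying the Tesla T4 from the CUDA runtime,
// e.g. to verify the accelerator before deploying inference.
// Illustrative compile line: nvcc query.cu
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, %.1f GB, compute capability %d.%d\n",
               i, prop.name, prop.totalGlobalMem / 1e9,
               prop.major, prop.minor);
    }
    return 0;
}
```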
Compute partition 5 – FPGA Synthesis Server
FPGA design tools usually run for several hours up to a day to generate the final bitstream (logic design) for large FPGA chips. Because these tools are largely sequential, the complementary systems include a dedicated server for this task.
This server runs the development tools needed for the FPGA boards installed in compute partitions 2 and 3.
- AMD EPYC 72F3, 8 cores @ 3.7 GHz
- 128 GB of DDR4-3200 ECC memory; all channels are populated to maximise memory subsystem performance
- 2× 3.2 TB NVMe disks, configured as RAID 1
Compute partition 6 – ARM + CUDA GPGPU (Ampere) + DPU
This partition is built on the Gigabyte G242-P36 server platform with an Ampere Altra Q80-30 Arm processor (80 cores, 3.0 GHz) per node and includes CUDA-programmable GPGPU accelerators with the NVIDIA Ampere architecture as well as DPU processors. A short kernel sketch follows the configuration list below.
Configuration of each compute node:
- 512GB DIMM DDR4, 3200MHz, ECC, CL22
- 2x Micron 7400 PRO 1920GB NVMe M.2 Non-SED Enterprise SSD
- 2x NVIDIA A30 GPU Accelerator
- 2x NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x16, 16GB DDR + 64, 200Gb Ethernet
- Mellanox ConnectX-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8
- Mellanox ConnectX-6 VPI adapter card, 100Gb/s (HDR100, EDR IB and 100GbE), single-port QSFP56
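The A30 cards are programmed with standard CUDA. The sketch below is a minimal vector-add kernel using managed memory; `sm_80` matches the Ampere architecture, and the file name and flags are illustrative.

```cpp
// Minimal sketch: a CUDA kernel targeting the A30's Ampere architecture
// (compute capability 8.0). Illustrative compile: nvcc -arch=sm_80 vadd.cu
#include <cuda_runtime.h>
#include <cstdio>

__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified (managed) memory keeps the host/device plumbing minimal.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vadd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);   // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```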
Compute partition 7 – IBM Power
This partition consists of a single server with two IBM Power10 processors, fast memory, and fast NVMe storage. The server is suitable for computations whose performance is limited by the storage system and for porting applications to the Power platform. A short I/O benchmark sketch follows the specification list below.
IBM POWER S1022 server specification:
- 2× IBM Power10, 12 cores, typical 2.90 GHz (up to 4.0 GHz)
- 512 GB DDIMMs, 3200 MHz, 8 Gbit DDR4
- 2× enterprise 1.6 TB PCIe4 NVMe U.2 SSD
- 2× enterprise 6.4 TB PCIe4 NVMe U.2 SSD
- PCIe3 LP 2-port 25/10 Gb NIC & RoCE SR/Cu adapter
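Since this node targets storage-bound workloads, a simple way to sanity-check the NVMe drives is a direct-I/O read loop. The sketch below uses O_DIRECT to bypass the page cache; the file path is an assumed example.

```cpp
// Minimal sketch: sequential read throughput with O_DIRECT, bypassing the
// page cache so the drive itself is measured. Path is illustrative.
// Illustrative compile line: g++ -O2 ioread.cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdlib>
#include <cstdio>
#include <chrono>

int main() {
    int fd = open("/scratch/testfile", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    const size_t block = 1 << 20;                    // 1 MiB per read
    void* buf;
    if (posix_memalign(&buf, 4096, block)) return 1; // O_DIRECT needs alignment

    size_t total = 0;
    auto t0 = std::chrono::steady_clock::now();
    ssize_t n;
    while ((n = read(fd, buf, block)) > 0) total += n;
    auto t1 = std::chrono::steady_clock::now();

    double sec = std::chrono::duration<double>(t1 - t0).count();
    printf("read %.1f GB in %.2f s: %.2f GB/s\n",
           total / 1e9, sec, total / 1e9 / sec);
    free(buf); close(fd);
    return 0;
}
```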
Compute partition 8 – CPU with a very large L3 cache
This partition is based on the HPE ProLiant DL385 Gen10 Plus server with CPUs that feature a very large L3 cache. The platform enables the development of algorithms and libraries that require a large L3 cache (e.g., linear algebra on relatively small matrices). A short cache-sweep sketch follows the specification list below.
Specification:
- Server HPE ProLiant DL385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7773X Milan-X, 64 cores, 2.2GHz, 768 MB L3 cache
- 16x HPE 16GB (1x16GB) x4 DDR4-3200 Registered Smart Memory Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- HPE IB HDR100/EN 100Gb 1p QSFP56 Adptr
- HPE Cray Programming Environment for x86 Systems 2 Seats
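A simple way to see the effect of the Milan-X cache is a working-set sweep: stream over progressively larger buffers and watch throughput drop once the data no longer fits in the 768 MB L3. The sketch below is a minimal single-threaded version, so absolute numbers are modest; the knee in the curve is the point.

```cpp
// Minimal sketch: working-set sweep illustrating the very large L3 cache.
// Throughput stays high until the buffer spills from L3 into DRAM.
// Illustrative compile line: g++ -O2 sweep.cpp
#include <vector>
#include <chrono>
#include <cstdio>

int main() {
    // From 8 MB (fits easily) up to 2 GB (far exceeds even Milan-X's L3).
    for (size_t mb = 8; mb <= 2048; mb *= 2) {
        size_t n = mb * 1024 * 1024 / sizeof(double);
        std::vector<double> a(n, 1.0);
        double sum = 0.0;

        auto t0 = std::chrono::steady_clock::now();
        const int passes = 16;
        for (int p = 0; p < passes; ++p)
            for (size_t i = 0; i < n; ++i) sum += a[i];
        auto t1 = std::chrono::steady_clock::now();

        double sec = std::chrono::duration<double>(t1 - t0).count();
        double gbs = passes * n * sizeof(double) / 1e9 / sec;
        // Printing sum prevents the compiler from eliding the loop.
        printf("%5zu MB: %6.1f GB/s (sum=%g)\n", mb, gbs, sum);
    }
    return 0;
}
```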
Compute partition 9 – VDI (Virtual Desktop Infrastructure)
This partition consists of two HPE ProLiant DL385 Gen10 Plus servers, each equipped with two NVIDIA A40 48 GB GPUs. The platform provides users with a remote/virtual workstation running the MS Windows OS with a graphical interface, with a focus on 3D OpenGL and ray-tracing applications.
Configuration of each node:
- Server HPE ProLiant DL385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7413, 24 cores, 2.55GHz
- 16x HPE 32GB 2Rx4 PC4-3200AA-R Smart Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- 2x NVIDIA A40 48GB GPU Accelerator
Software available on compute partition 9:
- Academic VMware Horizon 8 Enterprise Term Edition: 10 Concurrent User Pack for 4 year term license; includes SnS
- 8x NVIDIA RTX Virtual Workstation, per concurrent user, EDU, perpetual license
- 32x NVIDIA RTX Virtual Workstation, per concurrent user, EDU SUMS per year
- 7x Windows Server 2022 Standard - 16 Core License Pack
- 10x Windows Server 2022 - 1 User CAL
- 40x Windows 10/11 Enterprise E3 VDA (Microsoft) per year
- Hardware VMware Horizon management
Network Infrastructure
The interconnection of the individual nodes of the complementary systems is provided by a high-speed, low-latency InfiniBand HDR network built on an NVIDIA/Mellanox switch with forty ports and speeds of up to 200 Gb/s. The infrastructure also includes a 10 Gb Ethernet network.
Software
An important part of complementary systems is software, which includes environments, compilers, numerical libraries, and algorithm development and debugging tools.
HPE Cray Programming Environment
The HPE Cray Programming Environment is a comprehensive tool for developing HPC applications in a heterogeneous environment. It supports all complementary systems architectures. It includes optimised libraries, support for the most widely used programming languages, and several tools for analysing, debugging, and optimising parallel algorithms.
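As an example of the kind of code the environment is built around, below is a minimal hybrid MPI + OpenMP program. With the Cray PE loaded, the compiler wrappers (cc, CC, ftn) add the MPI headers and libraries automatically, e.g. `CC -fopenmp hello.cpp`; the exact OpenMP flag may differ per underlying compiler.

```cpp
// Minimal sketch: hybrid MPI + OpenMP hello world, the kind of parallel
// program the Cray PE toolchain builds, analyses, and debugs.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each MPI rank spawns an OpenMP team; output shows the hybrid layout.
    #pragma omp parallel
    printf("rank %d/%d, thread %d/%d\n",
           rank, size, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
```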
Intel oneAPI
oneAPI is Intel's toolkit for developing applications deployed on heterogeneous platforms: CPUs, GPUs, and FPGAs. In the complementary systems, it is intended primarily for the FPGA cards.
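A minimal SYCL sketch is shown below. It runs on any device oneAPI can see; retargeting it to the Stratix 10 cards additionally involves Intel's FPGA extensions (an FPGA device selector and the FPGA compile flow), which are omitted here for brevity.

```cpp
// Minimal sketch: a oneAPI SYCL vector add on the default device.
// Illustrative compile line with Intel's DPC++ compiler: icpx -fsycl vadd.cpp
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    sycl::queue q{sycl::default_selector_v};  // FPGA targets would use a
                                              // selector from the FPGA extensions
    {
        sycl::buffer A(a), B(b), C(c);
        q.submit([&](sycl::handler& h) {
            sycl::accessor x(A, h, sycl::read_only);
            sycl::accessor y(B, h, sycl::read_only);
            sycl::accessor z(C, h, sycl::write_only);
            h.parallel_for(sycl::range<1>(1024),
                           [=](sycl::id<1> i) { z[i] = x[i] + y[i]; });
        });
    }   // buffer destructors copy the results back to the host vectors
    std::cout << "c[0] = " << c[0] << "\n";   // expect 3
    return 0;
}
```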
AMD ROCm
ROCm is an AMD software package that includes programming models, development tools, libraries, and integration tools for the most widely used AI frameworks that run on top of AMD GPU accelerators.
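As an illustration, below is a minimal HIP SAXPY. The API deliberately mirrors CUDA (hipMalloc, hipMemcpy, triple-chevron kernel launches), which keeps porting existing CUDA code to the MI100 cards largely mechanical.

```cpp
// Minimal sketch: a HIP SAXPY kernel for the MI100 cards in partition 3.
// Illustrative compile line: hipcc saxpy.cpp
#include <hip/hip_runtime.h>
#include <vector>
#include <cstdio>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    float *dx, *dy;
    hipMalloc((void**)&dx, n * sizeof(float));
    hipMalloc((void**)&dy, n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(2.0f, dx, dy, n);

    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);   // expect 2*1 + 2 = 4
    hipFree(dx); hipFree(dy);
    return 0;
}
```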