Using Supercomputers to Parallelize RTL Simulations
Guillem López-Paradís, Brian Li, Adrià Armejach, Stefan Wallentowitz, Miquel Moretó and Jonathan Balkind
Abstract
Since the popularization of multiprocessors in the last few decades, with current chips offering up to hundreds of cores, it has become the norm to parallelize software to obtain maximum performance. However, the software tools needed to develop hardware, e.g., RTL simulators, show little adoption of parallel techniques, and that adoption is typically constrained to a single node and a few threads. For example, some open-source and closed-source RTL simulators use pthreads, obtaining good speedups but leaving room for improvement. In this work, we discuss our past and current experience using Metro-MPI to improve RTL simulations of modern chips in our research. Metro-MPI exploits the natural boundaries present in chip designs to partition RTL simulations and leverages High Performance Computing (HPC) techniques to extract parallelism. For chip designs that scale in size by exploiting latency-insensitive interfaces such as networks-on-chip and AXI, Metro-MPI offers a new paradigm for RTL simulation scalability. We have tested the Metro-MPI implementation in OpenPiton with different cores, including CVA6, DVINO and Sargantana. We obtain significant speedups and energy reductions, and we enable large design-space exploration studies that would otherwise be prohibitively expensive.
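The partitioning idea can be illustrated with a minimal sketch. It assumes a design split into two partitions at a latency-insensitive valid/data boundary, with each partition simulated independently and exchanging only its boundary signals once per cycle. Threads and `queue.Queue` objects stand in for MPI ranks and MPI send/receive calls, and the functions `run_partition`, `producer_step`, `consumer_step` and `simulate` are hypothetical illustrations, not Metro-MPI's actual implementation or API.

```python
import queue
import threading


def run_partition(step, tx, rx, cycles, state):
    """Simulate one partition; only boundary signals cross to the peer."""
    boundary_in = None  # nothing has arrived from the peer before cycle 0
    for cycle in range(cycles):
        # Evaluate local logic using the peer's boundary signals from the previous cycle.
        boundary_out = step(cycle, boundary_in, state)
        tx.put(boundary_out)      # stand-in for sending boundary signals to the peer rank
        boundary_in = rx.get()    # stand-in for receiving the peer's boundary signals


def producer_step(cycle, boundary_in, state):
    # Hypothetical partition A: drives a valid flit for the first 5 cycles.
    # The consumer is always ready, so backpressure is omitted for brevity.
    return {"valid": cycle < 5, "data": cycle * 10}


def consumer_step(cycle, boundary_in, state):
    # Hypothetical partition B: accumulates every valid flit it receives.
    if boundary_in is not None and boundary_in["valid"]:
        state["total"] += boundary_in["data"]
    return {"ready": True}


def simulate(cycles=10):
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    prod_state, cons_state = {}, {"total": 0}
    threads = [
        threading.Thread(target=run_partition,
                         args=(producer_step, a_to_b, b_to_a, cycles, prod_state)),
        threading.Thread(target=run_partition,
                         args=(consumer_step, b_to_a, a_to_b, cycles, cons_state)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cons_state["total"]


if __name__ == "__main__":
    print(simulate())
```

Signals crossing the boundary arrive one cycle late, which a latency-insensitive interface tolerates by design. That tolerance is what lets each partition advance independently between exchanges.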