Wednesday, 22 November 2017

Benchmarking HPC systems

At SC17, we celebrated the 50th edition of the Top500 list. With nearly 25,000 list positions published over 25 years, the Top500 is an incredibly rich database of consistently measured performance data with associated system configurations, sites, vendors, etc. Each SC and ISC, the Top500 feeds community gossip, serious debate, the HPC media, and the ambitious imaginations of HPC marketing departments. Central to the Top500 list is the infamous HPL benchmark.

Benchmarks are used to answer questions such as (naively posed): “How fast is this supercomputer?”, “How fast is my code?”, “How does my code scale?”, “Which system/processor is faster?”.

In the context of HPC, benchmarking means the collection of quantifiable data on the speed, time, scalability, efficiency, or similar characteristics of a specific combination of hardware, software, configuration, and dataset. In practice, this means running well-understood test case(s) on various HPC platforms/configurations under specified conditions or rules (for consistency) and recording appropriate data (e.g., time to completion).
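As a rough illustration of "running a test case under specified conditions and recording appropriate data", here is a minimal sketch in Python. The names `benchmark`, `parallel_efficiency`, and `example_test_case` are made up for this post, and the workload is a trivial stand-in, not a real HPC code. The rule being applied (a fixed number of repeats, wall-clock time only) is just one possible choice.

```python
import statistics
import time


def benchmark(test_case, repeats=5):
    """Run test_case() `repeats` times; return wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        test_case()
        timings.append(time.perf_counter() - start)
    return timings


def parallel_efficiency(t_serial, t_parallel, workers):
    """Speedup divided by worker count; 1.0 would mean perfect scaling."""
    return (t_serial / t_parallel) / workers


def example_test_case():
    # Hypothetical stand-in workload; a real benchmark would launch
    # the application or kernel of interest with a fixed dataset.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    timings = benchmark(example_test_case, repeats=5)
    print(f"min {min(timings):.4f} s, median {statistics.median(timings):.4f} s")
```

Reporting both the minimum and the median is a common convention rather than a requirement: the minimum approximates the best the system can do, while the median is less sensitive to the occasional noisy run. The efficiency helper is one simple way to turn two timings into an answer to "how does my code scale?".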

These test cases may be full application codes, or subsets of those codes with representative performance behaviour, or standard benchmarks. HPL falls into the last category, although for some applications it could fall into the second category too. In fact, this is the heart of the debate over the continued relevance of the HPL benchmark for building the Top500 list: for how many real-world applications does it provide a meaningful performance guide? But, even moving away from HPL to “user codes”, selecting a set of benchmark codes is as much a political choice (e.g., reflecting stakeholders) as it is a technical choice.