Monday, 31 July 2017

HPC Getting More Choices - Technology Diversity

HPC has been easy for a while ...

When buying new workstations or personal computers, it is easy to adopt the simple mantra that a newer processor or higher clock frequency means your application will run faster. It is not totally true, but it works well enough. However, with High Performance Computing (HPC), it is more complicated.

HPC works by using parallel computing – the use of many computing elements together. The nature of these computing elements, how they are combined, the hardware and software ecosystems around them, and the challenges for the programmer and user vary significantly – between products and across time. Since HPC works by bringing together many technology elements, the interaction between those elements becomes as important as the elements themselves.

Whilst there has always been a variety of HPC technology solutions, there has been a strong degree of technical similarity of the majority of HPC systems in the last decade or so. This has meant that (i) code portability between platforms has been relatively easy to achieve and (ii) attention to on-node memory bandwidth (including cache optimization) and inter-node scaling aspects would get you a long way towards a single code base that performs well on many platforms.

Increase in HPC technology diversity

However, there has been a marked increase in the diversity of technology options over the last few years, and all signs are that this will continue for the next few years. This includes breaking the near-ubiquity of Intel Xeon processors, the use of many-core processors for the compute elements, increasing complexity (and choice) of the data storage (memory) and movement (interconnect) hierarchies of HPC systems, new choices in software layers, new processor architectures, etc.

This means that unless your code is adjusted to effectively exploit the architecture of your HPC system, your code may not run faster at all on the newer system.

It also means that HPC clusters are proving themselves where custom supercomputers might previously have been the only option, and custom supercomputers are delivering value where commodity clusters might previously have been the default.

Many-core processors, often referred to as accelerators, are processors that use a high degree of parallel processing within the chip – such as GPUs and Xeon Phi – and require different programming techniques to achieve the best performance. In the case of GPUs, a different language might be required (e.g., CUDA or OpenCL). Code written for one accelerator is likely to be non-trivial to port to another accelerator with good performance (maybe even when both accelerators are from the same family).

Even away from accelerators, there are strong signs that credible competitors to Intel’s Xeon CPU family are back in play. One or more of AMD’s EPYC x86 processors, ARM architecture candidates, or IBM’s OpenPower could take market share from Intel. This adds a further portability and performance tuning challenge to programmers.

Not just CPUs and GPUs

The data storage hierarchy that an executing HPC code must consider now spans registers, multiple levels of on-chip cache (some shared, some dedicated), off-chip cache or local high speed memory, on-node memory, memory on remote nodes (maybe split into topologically near and far nodes!), fast storage layers (e.g., cache buffers), and disk. Between each level of this hierarchy is a substantial sharp step in capacity, bandwidth and latency (and cost). A good compiler might manage the registers and lower cache levels, whilst some parallelisation strategies might minimize the performance effects of the off-node memories. Use of libraries or software frameworks might help manage the hierarchies.

However, the combination of processor diversity and data hierarchy means that software developers, and application buyers, must pay attention to the hardware details in a way that has not mattered this strongly for several years.

The rules of this new game are often reduced to simple-sounding ideals: expose more parallelism; avoid data movement; extra compute is cheaper than data movement; etc. Implementing them effectively turns out to be somewhat harder in practice.
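To make the "extra compute is cheaper than data movement" idea a little more concrete, here is a minimal sketch using a simplified roofline-style estimate (my choice of illustration, not something from a specific product or benchmark). It assumes a kernel takes as long as the slower of its compute and its memory traffic. The hardware figures and the two kernel variants are hypothetical, chosen only to show the shape of the trade-off.

```python
def kernel_time(flops, bytes_moved, peak_flops, mem_bandwidth):
    """Simplified roofline-style estimate (assumes compute and memory
    traffic overlap perfectly): the slower of the two sets the run time."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / mem_bandwidth
    return max(compute_time, memory_time)


# Hypothetical node: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK_FLOPS = 1.0e12
MEM_BANDWIDTH = 1.0e11

N = 10**8  # number of 8-byte (double precision) elements

# Variant A: load a precomputed coefficient for every element.
# Per element: 2 loads + 1 store = 24 bytes, 1 floating-point operation.
time_load = kernel_time(1 * N, 24 * N, PEAK_FLOPS, MEM_BANDWIDTH)

# Variant B: recompute the coefficient on the fly instead of loading it.
# Per element: 1 load + 1 store = 16 bytes, 11 floating-point operations.
time_recompute = kernel_time(11 * N, 16 * N, PEAK_FLOPS, MEM_BANDWIDTH)

print(f"load coefficient:      {time_load * 1e3:.1f} ms")
print(f"recompute coefficient: {time_recompute * 1e3:.1f} ms")
```

With these made-up numbers, both variants are limited by memory bandwidth, so doing eleven times the arithmetic still finishes sooner because it moves a third less data. Real codes are messier (caches, latency, vectorisation), which is exactly why the simple ideals are harder to apply than they sound.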

What increased HPC technology diversity means

The diversity is good, because it brings competition, which helps reduce prices, and - perhaps more importantly - drives continued innovation among the suppliers. The diversity is hard, because it adds decision risk in determining the optimum technology path for buyers, and adds complexity for software developers seeking portability and performance.

Ultimately, technology diversity keeps HPC alive and fun - and it keeps our consulting business going, as we provide impartial expert advice on the technology choices and mitigating decision risk!

Are you ready to seize the performance or cost opportunities? How are you managing the decision risk associated with finding the optimum technology path for you? Are you comfortable with the potential lost performance of not exploring the technology diversity?

To discuss this further, contact me at @hpcnotes, or use the comments box below.

Tuesday, 11 July 2017

SC17 Tutorials - HPC cost models, investment cases and acquisitions

Following our successful HPC tutorials at SC16 and OGHPC17, I'm delighted to report that we've had three tutorials accepted for SC17 in Denver this November, all continuing our mission to provide HPC training opportunities for HPC people other than just programmers.

At SC17, we will be delivering these three tutorials:
  • [Sun 12th, am] "Essential HPC Finance: Total Cost of Ownership (TCO), Internal Funding, and Cost-Recovery Models"
  • [Sun 12th, pm] "Extracting Value from HPC: Business Cases, Planning, and Investment"
  • [Mon 13th, am] "HPC Acquisition and Commissioning"
In a last-minute bit of co-ordination, Sharan Kalwani will be following these with his related tutorial "Data Center Design" on Mon 13th pm.

Are these tutorials any good?

The HPC procurement tutorial was successfully presented at SC13 (>100 attendees) and SC16 (~60 attendees). Feedback from the SC16 attendees was very positive: scored 4.6/5 overall and scored 2.9/3 for “recommend to a colleague”.

The HPC finance tutorial was successfully presented at SC16 (~60 attendees) and at the Rice Oil & Gas HPC conference 2017 (~30 attendees). Feedback from the SC16 attendees was very positive: scored 4.3/5 overall and scored 2.7/3 for “recommend to a colleague”.

The HPC business case tutorial is new for SC17.

What is the goal of the tutorials?

The tutorials provide an impartial, practical, non-sales focused guide to the business aspects of HPC facilities and services (including cloud), such as total cost of ownership, funding models, showing value and securing investment in HPC, and the process of purchasing and deploying an HPC system. All tutorials include exploration of the main issues, pros and cons of differing approaches, practical tips, hard-earned experience and potential pitfalls.

What is in the tutorials?

Essential HPC Finance Practice: Total Cost of Ownership (TCO), Internal Funding, and Cost-Recovery Models
  • Calculating and using TCO models
  • Pros and cons of different internal cost recovery and funding models
  • Updated from the SC16 base, with increased consideration of cloud vs in-house HPC
Extracting Value from HPC: Business Cases, Planning, and Investment
  • Applicable to either a first investment or an upgrade of existing capability
  • Most relevant to organizations with a clear purpose (e.g., industry) or those with a clear service mission (e.g., academic HPC facilities)
  • Identifying the value, building a business case, engaging stakeholders, securing funding, requirements capture, market survey, strategic choices, and more
HPC Acquisition and Commissioning
  • Procurement process including RFP
  • Specify what you want, yet enable the suppliers to provide innovative solutions beyond the specification both in technology and in the price
  • Bid evaluation, benchmarks, clarification processes
  • Demonstrate to stakeholders that the solution selected is best value for money
  • Contracting, project management, commissioning, acceptance testing

Who are the tutors?

Me (Andrew Jones, @hpcnotes), Owen Thomas (Red Oak Consulting), and Terry Hewitt. We have been involved in numerous major HPC procurements and other strategic HPC projects since 1990, as service managers, as bidders to funding agencies, as customers and as impartial advisors. We are all from the UK but have worked around the world, and the tutorials will be applicable to HPC projects and procurements anywhere. The tutorials are based on experiences across a diverse set of real-world cases in various countries, in private and public sectors.

What if you need even more depth?

These SC17 tutorials will deliver a lot of content in each half day. However, if you need more depth, or a fuller range of topics, or are looking for a CV step towards becoming a future HPC manager, then our joint TACC-NAG summer training institute is the right thing for you: "Where will future HPC leaders come from?"

Hope to see you at one (or more!) of our tutorials at SC17 this November in Denver.