Friday, 29 September 2017

Finding a Competitive Advantage with High Performance Computing

High Performance Computing (HPC), or supercomputing, is a critical enabling capability for many industries, including energy, aerospace, automotive, manufacturing, and more. However, one of the most important aspects of HPC is that it is not only an enabler; it is often also a differentiator – a fundamental means of gaining a competitive advantage.

Differentiating with HPC


Differentiating (gaining a competitive advantage) through HPC can include:
  • faster - complete calculations in a shorter time;
  • more - complete more computations in a given amount of time;
  • better - undertake more complex computations;
  • cheaper - deliver computations at a lower cost;
  • confidence - increase the confidence in the results of the computations; and
  • impact - effectively exploit the results of the computations in the business.
These are all powerful business benefits, enabling quicker and better decision making, reducing the cost of business operations, better understanding risk, supporting safety, etc.

Strategic delivery choices are the broad decisions about how to do/use HPC within an organization. This might include:
  • choosing between cloud computing and traditional in-house HPC systems (or points on a spectrum between these two extremes);
  • selecting between a cost-driven hardware philosophy and a capability-driven hardware philosophy;
  • deciding on a balance of internal capability and externally acquired capability;
  • choices on the balance of investment across hardware, software, people and processes.
The answers to these strategic choices will depend on the environment (market landscape, other players, etc.), how and where you want to navigate that environment, and why. This is an area where our consulting customers benefit from our expertise and experience. If I were to extract a core piece of advice from those many consulting projects, it would be: "explicitly make a decision rather than drift into one, and document the reasons, risk accepted, and stakeholder buy-in".

Which HPC technology?


A key means of differentiating with HPC, and one of the most visible, is through the choice of hardware technologies used and at what scale. The HPC market is currently enjoying (or is it suffering?) a broader range of credible hardware technology options than the previous few years.

Whilst systems based on Intel Xeon (Skylake is the current iteration) are still the dominant choice, there are several viable alternatives. Other processor options include AMD's EPYC, IBM's (Open)Power, and the ARM family (e.g., Cavium's ThunderX series).

The use of Graphics Processing Units (GPUs) for computations has become mature over recent years, and these are now a realistic candidate for use in production environments. The GPU-for-HPC space is dominated by NVidia in terms of presence, product and ecosystem maturity. AMD has a competing GPU product that has some possible performance advantages, with arguably a less rich ecosystem, although it is showing promise for maturing rapidly.

Intel’s manycore processor, the Xeon Phi (often referred to by the codename of the current generation, Knights Landing, or KNL), is an alternative that is attracting attention. The Xeon Phi promises higher performance than traditional Xeon processors, without the coding discontinuity of the GPU solutions.

The interaction of these hardware technology choices with the other aspects of the competitive game - i.e., software, people and processes - must also be considered.

Our team does a lot of benchmarking and other work to understand the relative performance, benefits, risks and futures of each technology. I'm not going to pass comment on the pros or cons of any of these options here, although I will note that there is no clear winner - it will depend on your business situation, application needs, other strategic choices, etc. - we can help you navigate and de-risk that decision space.

Software, People, Delivery


Perhaps the largest element of inertia in any substantial computational science effort is the application software. More than other components of the HPC ecosystem, application software (especially in-house software) persists for years, is expensive to evolve, and is subject to inertia from users, developers and funders. Yet, done right, application software can be one of the most capable elements in terms of creating a differentiation. Software must be part of the competitive landscape for HPC, along with how software interacts with the hardware and people choices.

The relative investments in people versus hardware, in which types of people and how they are engaged, can make substantial differences to the competitive landscape.

Finally, often forgotten when discussing HPC services, trends or competitive advantages, are the business aspects of HPC – how effectively HPC is delivered, measured, and used within the business. We are seeing growing demand for our training in this area (e.g., tutorials at SC), which is encouraging.

Finding the best HPC solution


Together, these elements provide a large multidimensional parameter space over which to seek meaningful differentiation in the use of HPC with respect to competitors.

HPC, especially in the context of competing, is best thought of not as IT – but as a powerful business/research tool that just happens to be built using IT components.

This philosophy underlines why HPC must take different approaches compared with normal IT across technology planning, acquisition, user service delivery, operational practices, cost-value calculations, etc. It also explains why HPC is never a commodity in a competitive sector – even if it employs mostly commodity components.

Whether the differentiation is in the HPC itself, or in the use of HPC-enabled results, or both, the optimum solution for one organization may be far from the optimum solution for another organization. Understanding the competitive possibilities, and creating the right edge over competitors through differentiated HPC is thus a critical part of any discussion of HPC planning, delivery and use.

To discuss this further contact me at @hpcnotes or www.linkedin.com/in/andrewjones, or use the comments box below.

Monday, 31 July 2017

HPC Getting More Choices - Technology Diversity

HPC has been easy for a while ...


When buying new workstations or personal computers, it is easy to adopt the simple mantra that a newer processor or higher clock frequency means your application will run faster. It is not totally true, but it works well enough. However, with High Performance Computing, HPC, it is more complicated.

HPC works by using parallel computing – the use of many computing elements together. The nature of these computing elements, how they are combined, the hardware and software ecosystems around them, and the challenges for the programmer and user vary significantly – between products and across time. Since HPC works by bringing together many technology elements, the interaction between those elements becomes as important as the elements themselves.

Whilst there has always been a variety of HPC technology solutions, there has been a strong degree of technical similarity of the majority of HPC systems in the last decade or so. This has meant that (i) code portability between platforms has been relatively easy to achieve and (ii) attention to on-node memory bandwidth (including cache optimization) and inter-node scaling aspects would get you a long way towards a single code base that performs well on many platforms.

Increase in HPC technology diversity


However, there is a marked trend of an increase in diversity of technology options over the last few years, with all signs that this is set to continue for the next few years. This includes breaking the near-ubiquity of Intel Xeon processors, the use of many-core processors for the compute elements, increasing complexity (and choice) of the data storage (memory) and movement (interconnect) hierarchies of HPC systems, new choices in software layers, new processor architectures, etc.

This means that unless your code is adjusted to effectively exploit the architecture of your HPC system, your code may not run faster at all on the newer system.

It also means HPC clusters proving themselves where custom supercomputers might have previously been the only option, and custom supercomputers delivering value where commodity clusters might have previously been the default.

Many-core processors, often referred to as accelerators, are processors that use a high degree of parallel processing within the chip – such as GPUs and Xeon Phi – and require different programming techniques to achieve the best performance. In the case of GPUs, a different language might be required (e.g., CUDA or OpenCL). It is likely that code written for one accelerator will be non-trivial to port and achieve good performance on another accelerator (maybe even when both accelerators are from the same family).

Even away from accelerators, there are strong signs that credible competitors to Intel’s Xeon CPU family are back in play. One or more of AMD’s EPYC x86 processors, ARM architecture candidates, or IBM’s OpenPower could take market share from Intel. This adds a further portability and performance tuning challenge to programmers.

Not just CPUs and GPUs


The data storage hierarchy that an executing HPC code must consider now spans registers, multiple levels of on-chip cache (some shared, some dedicated), off-chip cache or local high speed memory, on-node memory, memory on remote nodes (maybe split into topologically near and far nodes!), fast storage layers (e.g., cache buffers), and disk. Between each level of this hierarchy is a substantial sharp step in capacity, bandwidth and latency (and cost). A good compiler might manage the registers and lower cache levels, whilst some parallelisation strategies might minimize the performance effects of the off-node memories. Use of libraries or software frameworks might help manage the hierarchies.

However, the combination of processor diversity and data hierarchy means that software developers, and application buyers, must pay attention to the hardware details in a way that has not mattered this strongly for several years.

The rules of this new game are often reduced to simple sounding ideals: expose more parallelism; avoid data movement; extra compute is cheaper than data movement; etc. The reality of effective implementation turns out to be somewhat harder in practice.
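
As a rough illustration of why "avoid data movement" carries so much weight, here is a minimal Python sketch using a roofline-style estimate. This is a common back-of-envelope model, not something taken from this post, and the peak compute rate, memory bandwidth, and kernel intensities below are hypothetical numbers chosen only for illustration.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline-style estimate: performance is capped either by the
    compute peak or by how fast data can be fed from memory."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# Hypothetical node (assumed values, not measurements of any real system).
PEAK_GFLOPS = 2000.0   # peak compute rate, GFLOP/s
MEM_BW_GBS = 100.0     # sustained memory bandwidth, GB/s

# Hypothetical kernels, described by FLOPs performed per byte moved.
kernels = [
    ("vector update (little reuse)", 0.125),
    ("cache-blocked kernel", 4.0),
    ("dense matrix multiply (high reuse)", 32.0),
]

for name, intensity in kernels:
    gflops = attainable_gflops(PEAK_GFLOPS, MEM_BW_GBS, intensity)
    share = 100.0 * gflops / PEAK_GFLOPS
    print(f"{name}: about {gflops:.1f} GFLOP/s ({share:.1f}% of peak)")
```

In this toy model, a kernel only reaches the compute peak once it performs at least PEAK_GFLOPS / MEM_BW_GBS FLOPs for every byte it moves; below that point, extra arithmetic on data already in hand costs nothing, which is one way of reading "extra compute is cheaper than data movement". Real systems have several bandwidth levels rather than one, so treat this as a first approximation only.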

What increased HPC technology diversity means


The diversity is good, because it brings competition, which helps reduce prices, and - perhaps more importantly - drives continued innovation among the suppliers. The diversity is hard, because it adds decision risk in determining the optimum technology path for buyers, and adds complexity for software developers seeking portability and performance.

Ultimately, technology diversity keeps HPC alive and fun - and it keeps our consulting business going, as we provide impartial expert advice on the technology choices and mitigating decision risk!

Are you ready to seize the performance or cost opportunities? How are you managing the decision risk associated with finding the optimum technology path for you? Are you comfortable with the potential lost performance of not exploring the technology diversity?

To discuss this further contact me at @hpcnotes or www.linkedin.com/in/andrewjones, or use the comments box below.

Tuesday, 11 July 2017

SC17 Tutorials - HPC cost models, investment cases and acquisitions

Following our successful HPC tutorials at SC16 and OGHPC17, I'm delighted to report that we've had three tutorials accepted for SC17 in Denver this November, all continuing our mission to provide HPC training opportunities for HPC people other than just programmers.

At SC17, we will be delivering these three tutorials:
  • [Sun 12th, am] "Essential HPC Finance: Total Cost of Ownership (TCO), Internal Funding, and Cost-Recovery Models"
  • [Sun 12th, pm] "Extracting Value from HPC: Business Cases, Planning, and Investment"
  • [Mon 13th, am] "HPC Acquisition and Commissioning"
In a last minute bit of co-ordination, Sharan Kalwani will be following these with his related tutorial "Data Center Design" on Mon 13th pm.

Are these tutorials any good?


The HPC procurement tutorial was successfully presented at SC13 (>100 attendees) and SC16 (~60 attendees). Feedback from the SC16 attendees was very positive: scored 4.6/5 overall and scored 2.9/3 for “recommend to a colleague”.

The HPC finance tutorial was successfully presented at SC16 (~60 attendees) and at the Rice Oil & Gas HPC conference 2017 (~30 attendees). Feedback from the SC16 attendees was very positive: scored 4.3/5 overall and scored 2.7/3 for “recommend to a colleague”.

The HPC business case tutorial is new for SC17.

What is the goal of the tutorials?


The tutorials provide an impartial, practical, non-sales focused guide to the business aspects of HPC facilities and services (including cloud), such as total cost of ownership, funding models, showing value and securing investment in HPC, and the process of purchasing and deploying an HPC system. All tutorials include exploration of the main issues, pros and cons of differing approaches, practical tips, hard-earned experience and potential pitfalls.
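
To give a flavour of what a total cost of ownership calculation involves, here is a deliberately simplified Python sketch. It is not the model used in the tutorials; the class, its fields, and every figure are hypothetical, and a realistic model would include many more cost lines (facilities, software licences, decommissioning, data transfer, and so on).

```python
from dataclasses import dataclass


@dataclass
class InHouseSystem:
    """Hypothetical in-house HPC system; all fields are illustrative."""
    capital_cost: float          # purchase price
    annual_power_cost: float     # power and cooling per year
    annual_staff_cost: float     # operations and support staff per year
    annual_support_cost: float   # vendor maintenance per year
    lifetime_years: int          # planned service life
    core_hours_per_year: float   # theoretical capacity per year
    utilization: float           # fraction of capacity actually used

    def tco(self) -> float:
        """Capital plus running costs over the planned lifetime."""
        annual = (self.annual_power_cost
                  + self.annual_staff_cost
                  + self.annual_support_cost)
        return self.capital_cost + annual * self.lifetime_years

    def delivered_core_hours(self) -> float:
        """Core hours actually delivered to users over the lifetime."""
        return self.core_hours_per_year * self.utilization * self.lifetime_years

    def cost_per_core_hour(self) -> float:
        return self.tco() / self.delivered_core_hours()


def cloud_cost(core_hours: float, price_per_core_hour: float) -> float:
    """Pay-as-you-go cost for the same amount of work (price is assumed)."""
    return core_hours * price_per_core_hour


# Made-up example figures, in arbitrary currency units.
system = InHouseSystem(
    capital_cost=2_000_000.0,
    annual_power_cost=150_000.0,
    annual_staff_cost=300_000.0,
    annual_support_cost=100_000.0,
    lifetime_years=4,
    core_hours_per_year=10_000 * 8760,  # 10,000 cores, every hour of the year
    utilization=0.8,
)
CLOUD_PRICE = 0.03  # hypothetical price per core hour

print(f"In-house TCO: {system.tco():,.0f}")
print(f"In-house cost per delivered core hour: {system.cost_per_core_hour():.4f}")
print(f"Cloud cost for the same work: "
      f"{cloud_cost(system.delivered_core_hours(), CLOUD_PRICE):,.0f}")
```

Even a toy model like this shows how sensitive the comparison is to utilization and lifetime assumptions, which is why the inputs deserve as much scrutiny as the final number.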

What is in the tutorials?


Essential HPC Finance Practice: Total Cost of Ownership (TCO), Internal Funding, and Cost-Recovery Models
  • Calculating and using TCO models
  • Pros and cons of different internal cost recovery and funding models
  • Updated from the SC16 base, with increased consideration of cloud vs in-house HPC
Extracting Value from HPC: Business Cases, Planning, and Investment
  • Applicable to either a first investment or an upgrade of existing capability
  • Most relevant to organizations with a clear purpose (e.g., industry) or those with a clear service mission (e.g., academic HPC facilities)
  • Identifying the value, building a business case, engaging stakeholders, securing funding, requirements capture, market survey, strategic choices, and more
HPC Acquisition and Commissioning
  • Procurement process including RFP
  • Specify what you want, yet enable the suppliers to provide innovative solutions beyond the specification both in technology and in the price
  • Bid evaluation, benchmarks, clarification processes
  • Demonstrate to stakeholders that the solution selected is best value for money
  • Contracting, project management, commissioning, acceptance testing

Who are the tutors?


Me (Andrew Jones, @hpcnotes), Owen Thomas (Red Oak Consulting), and Terry Hewitt. We have been involved in numerous major HPC procurements and other strategic HPC projects since 1990, as service managers, bidders to funding agencies, as customers and as impartial advisors. We are all from the UK but have worked around the world and the tutorials will be applicable to HPC projects and procurements anywhere. The tutorials are based on experiences across a diverse set of real world cases in various countries, in private and public sectors.

What if you need even more depth?


These SC17 tutorials will deliver a lot of content in each half day. However, if you need more depth, or a fuller range of topics, or are looking for a CV step towards becoming a future HPC manager, then our joint TACC-NAG summer training institute is the right thing for you: "Where will future HPC leaders come from?"



Hope to see you at one (or more!) of our tutorials at SC17 this November in Denver.
@hpcnotes


Wednesday, 28 June 2017

Is cloud inevitable for HPC?

In 2009, I wrote this article for HPC Wire: "2009-2019: A Look Back on a Decade of Supercomputing", pretending to look back on supercomputing between 2009 and 2019 from the perspective of beyond 2020.

The article opens with the idea that owning your own supercomputer was a thing of the past:
"As we turn the decade into the 2020s, we take a nostalgic look back at the last ten years of supercomputing. It’s amazing to think how much has changed in that time. Many of our older readers will recall how things were before the official Planetary Supercomputing Facilities at Shanghai, Oak Ridge and Saclay were established. Strange as it may seem now, each country — in fact, each university or company — had its own supercomputer!"
I got this bit wrong:
"And then the critical step — businesses and researchers finally understood that their competitive asset was the capabilities of their modelling software and user expertise — not the hardware itself. Successful businesses rushed to establish a lead over their competitors by investing in their modelling capability — especially robustness (getting trustable predictions/analysis), scalability (being able to process much larger datasets than before) and performance (driving down time to solutions)."
Hardware still matters - in some cases - as a means of gaining a competitive advantage in performance or cost [We help advise if that is true for our HPC consulting customers, and how to ensure the operational and strategic advantage is measured and optimized].

And, of course, my predicted rush to invest in software and people hasn't quite happened yet.

Towards the end, I predicted three major computing providers, from which most people got their HPC needs:
"We have now left the housing and daily care of the hardware to the specialists. The volume of public and private demand has set the scene for strong HPC provision into the future. We have the three official global providers to ensure consumer choice, with its competitive benefits, but few enough providers to underpin their business cases for the most capable possible HPC infrastructure."
Whilst my predictions were a little off in timing, some could be argued to have come true, e.g., the rise to the top of Chinese supercomputing, the increasing likelihood of using someone else's supercomputer rather than buying your own (even if we still call it cloud), etc.

With the ongoing debate around cloud vs in-house HPC (where I am desperately trying to inject some impartial debate to balance the relentless and brash cloud marketing), re-visiting this article made an interesting trip down memory lane for me. I hope you might enjoy it too.

As I recently posted on LinkedIn:
"Cloud will never be the right solution for everyone/every use case. Cloud is rightly the default now for corporate IT, hosted applications, etc. But, this cloud-for-everything is unfortunately, wrongly, extrapolated to specialist computing (e.g.,  high performance computing, HPC), where cloud won't be the default for a long time.
For many HPC users, cloud is becoming a viable path to HPC, and very soon perhaps even the default option for many use cases. But, cloud is not yet, and probably never will be, the right solution for everyone. There will always be those who can legitimately justify a specialized capability (e.g., a dedicated HPC facility) rather than a commodity solution (i.e., cloud, even "HPC cloud"). The reasons for this might include better performance, specific operational constraints, lower TCO, etc. that only specialized facilities can deliver. 
The trick is to get an unbiased view for your specific situation, and you should be aware that most of the commentators on cloud are trying to sell cloud solutions or related services, so are not giving you impartial advice!"
[We provide that impartial advice on cloud, measuring performance, TCO, and related topics to our HPC consulting customers]


@hpcnotes

Wednesday, 21 June 2017

Deeply learning about HPC - ISC17 day 3 summary - Wednesday evening

For most of the HPC people gathered in Frankfurt for ISC17, Wednesday evening marks the end of the hard work, the start of the journey home for some, already home for others. A few hardy souls will hang on until Thursday for the workshops. So, as you relax with a drink in Frankfurt, trudge through airports on the way home, or catch up on the week's emails, here's my final daily summary of ISC17, as seen through the lens of twitter, private conversations, and the HPC media.

This follows my highlights blogs from Monday "Cutting through the ISC17 clutter"  (~20k views so far) and Tuesday "ISC17 information overload" (~4k views so far).

So what sticks out from the last day, and what sticks out from the week overall?

Deep Learning

Wednesday was touted by ISC as "deep learning day". If we follow the current convention (inaccurate but seemingly pervasive) of using deep learning, machine learning, AI (nobody actually spells out artificial intelligence), big data, data analytics, etc. as totally interchangeable terms (why let facts get in the way of good marketing?), then Wednesday was indeed deep learning day, judging by tweet references to one or more of the above. However, I struggle to nail down exactly what I am supposed to have learnt about HPC and deep learning from today's content. Perhaps you had to be there in person (there is a reason why attending conferences is better than watching via twitter).

I think my main observations are:
  • DL/ML/AI/BigData/analytics/... is a real and growing part of the HPC world - both in terms of "traditional" HPC users looking at these topics, and new users from these backgrounds peering into the HPC community to seek performance advantages.
  • A huge proportion of the HPC community doesn't really know what DL/ML/... actually means in practice (which software, use case, workflow, skills, performance characteristics, ...).
  • It is hard to find the reality behind the marketing of DL/ML/... products, technologies, and "success stories" of the various vendors. But, hey, what's new? - I was driven to deal with this issue for GPUs and cloud in my recent webinar "Dissecting the myths of Cloud and GPUs for HPC".
  • Between all of the above, I still feel there is a huge opportunity being missed: for users in either community and for the technology/product providers. I don't have the answers though.

Snippets

Barcelona (BSC) has joined other HPC centers (e.g., Bristol Isambard, Cambridge Peta5, ...) in buying a bit of everything to explore the technology diversity for future HPC systems: "New MareNostrum Supercomputer Reflects Processor Choices Confronting HPC Users".

Exascale is now a world-wide game: China, European countries, USA, Japan are all close enough to start talking about how they might get to exascale, rather than merely visions of wanting to get there.

People are on the agenda: growing the future HPC talent, e.g., the ISC STEM Student Day & Gala, the Student Cluster Competition, gender diversity (Women-in-HPC activities), and more.

Wrapping up

There are some parts of ISC that have been repeated over the years due to demand. Thomas Sterling's annual "HPC Achievement & Impact" keynote that traditionally closes ISC (presenting as I write this) is an excellent session and goes a long way towards justifying the technical program registration fee.

2017 sees the welcome return of Addison Snell's "Analyst Crossfire". With a great selection of questions, fast pace, and well chosen panel members, this is always a good event. Of course, I am biased towards the ISC11 Analyst Crossfire being the best one!

I'll join Addison's fun with my "one up, one down" for ISC17. Up is CSCS, not merely for Piz Daint knocking the USA out of the top 3 of the Top500, but for a sustained program of supercomputing over many years, culminating in this leadership position. Down is Intel - brings a decent CPU to market in Skylake but gets backlash for pricing, has to face uncertainty over the CORAL Aurora project, and in spite of a typically high profile presence at the show, a re-emerging rival AMD takes a good share of the twitter & press limelight with EPYC.


Until next time

That's all from me for ISC17. I'll be back with more blogs over the next few weeks, based on my recent conference talks (e.g., "Six Trends in HPC for Engineers" and "Measuring the Business Impact of HPC").

You can catch up with me in person at the SEG Annual Meeting, EAGE HPC Workshop (I'm presenting), the TACC-NAG Training Institute for Managers, and SC17 (I can reveal we will be delivering tutorials again, including a new one - more details soon!).

In the meantime, interact with me on twitter @hpcnotes, where I provide pointers to key HPC content, plus my comments and opinions on HPC matters (with a bit of F1 and travel geekery thrown in for fun).

Safe travels,

Tuesday, 20 June 2017

ISC17 information overload - Tuesday afternoon summary

I hope you've been enjoying a productive ISC17 if you are in Frankfurt, or if not have been able to keep up with the ISC17 news flow from afar.

My ISC17 highlights blog post from yesterday ("Cutting through the clutter of ISC17: Monday lunchtime summary") seems to have collected over 11,000 page-views so far. Since this hpcnotes blog normally only manages several hundred to a few thousand page views per post, I'm assuming a bot somewhere is inflating the stats. However, there are probably enough real readers to make me write another one. So here goes - my highlights of ISC17 news flow as of Tuesday mid-afternoon.

Monday, 19 June 2017

How to keep up with the HPC news from ISC17

Overwhelmed by the HPC information pouring out of ISC17? Twitter, press releases, media stories, exhibitors, presentations, etc.? How to keep up?

Twitter

  • @ischpc - the official ISC stream
  • @HPC_Guru - the anonymous tweeting wonder that feeds the HPC community's appetite for news, and adds targeted comments
  • @hpcnotes (me) - a subset of @hpc_guru's stream, plus my own extra snippets and opinion
The above three will get you most of what you need (in my opinion!) but you can gain useful additional information by exploring who the above three interact with throughout ISC17.

If you are a glutton, then follow #ISC17.

I'll update the above list throughout ISC17 if other tweeters become key commentators, but you might also find this (mildly out of date) list of HPC twitter accounts handy.

HPC Notes

Of course, I would say the most essential method is reading my ISC17 summary blogs!

Media

If you prefer commentary and press releases from the main HPC media then here are the main options:

  • Top500.org - your first port of call for the main announcements and editor Michael Feldman's analysis
  • The Next Platform - in depth analysis of the stories behind the press releases from Nicole Hemsoth and Timothy Prickett-Morgan
  • InsideHPC - a selection of announcements, plus audio/video news and interviews from the show floor, by Rich Brueckner
  • HPC Wire - the most comprehensive list of HPC press releases, with other articles by Tiffany Trader and Doug Black
Happy reading!




Cutting through the clutter of ISC17: Monday lunchtime summary

ISC, the HPC community's 2nd biggest annual gathering, is fully underway in Frankfurt now. ISC week is characterized by a vibrant twitter flood (#ISC17), topped up with a deluge of press releases (a small subset of which are actually news), plus a plethora of news and analysis pieces in the HPC media. And, of course, anyone physically present at ISC has presentations, meetings, and exhibitors further demanding their attention.

I go to ISC almost every year. It is a valuable use of time for anyone in the HPC community or who uses, or has an interest in, HPC even if they don't see themselves as part of the HPC community. However, I have decided not to attend ISC this year, due to other commitments. Instead, I will keep an eye on the "news" throughout the week and post a handful of summary blogs (like this one), which might be a useful catch-up on "news" so far, whether you are attending ISC or watching from afar.

Tuesday, 6 December 2016

Secrets, lies, women and money: the definitive summary of SC16 - Part 2

I'm usually not shy of speaking my opinions (if you read Part 1 of my summary of SC16, then you’ll know that marketing departments through the land of HPC are busy taking my name off their Christmas card lists 😀), but this Part 2 blog is probably sticking my neck out even further than normal, with some potentially uncomfortable opinions.

SC is arguably the main event of the year for the HPC/supercomputing community. And so it becomes an annual cauldron, relentlessly bubbling to the surface those issues that are most topical for the HPC world. In 2016, two of those issues were women and money.

Saturday, 26 November 2016

Secrets, lies, women and money: the definitive summary of SC16 - Part 1

Just over a week ago 11,000 people were making their way home from the biggest supercomputing event of the year – SC16 in Salt Lake City. With so much going on at SC, even those who were there in person likely still missed a huge proportion of what happened. It’s simply too busy to keep up with all the news during the week, too many events/talks/meetings happening in parallel, and much of the interesting stuff only gets talked about behind closed doors or through informal networking.

There were even a couple of top-notch tutorials on HPC acquisition and TCO/funding models :-)

Amongst this productive chaos, I was flattered to be told several times during SC that people find my blogs worth reading and commented they hadn’t seen any recently. I guess the subtext was “it’s about time I wrote some more”. So, I’ll make an effort to blog more often again. Starting with my thoughts on SC16 itself.

As ever, while I do soften the occasional punch in my writing (not usually in person though), there remains the possibility that some readers won’t like some of my opinions, and there’s always the risk of me straying into controversy in places.

I've got four topics to cover: secrets, lies, women and money.