High-performance computing and real-time systems
From a foreword by Jack Dongarra to [M14]:
High-performance computing (HPC) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
From [L04]:
The time between the presentation of a set of inputs to a system (stimulus) and the realization of the required behavio[u]r (response), including the availability of all associated outputs, is called the response time of the system. A real-time system is a system that must satisfy explicit (bounded) response-time constraints or risk severe consequences, including failure.
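As a rough illustration of the definition above, the sketch below times a single stimulus/response pair and compares the result with a deadline. The names `timed_response`, `meets_deadline`, and `double`, and the 10 ms deadline, are hypothetical and chosen only for this example. Note that measuring after the fact does not make a system real-time: a real-time system has to guarantee the bound, which ordinary Python on a general-purpose operating system cannot do.

```python
import time


def timed_response(handler, stimulus):
    """Run handler on one stimulus and return (response, response_time_seconds)."""
    start = time.perf_counter()        # stimulus is presented
    response = handler(stimulus)       # required behaviour is realized
    elapsed = time.perf_counter() - start
    return response, elapsed


def meets_deadline(response_time, deadline):
    """True if the measured response time is within the (bounded) constraint."""
    return response_time <= deadline


def double(x):
    # Hypothetical stand-in for the system under test.
    return 2 * x


response, elapsed = timed_response(double, 21)
print(response, meets_deadline(elapsed, deadline=0.010))
```

In practice one would collect many such measurements and look at the worst case, since a response-time constraint is a bound rather than an average.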
HPC is concerned with minimizing latency and maximizing throughput (both to be defined next). Bounded, and ideally deterministic, latency is also a concern of real-time systems. When researching and calibrating our models, we must process vast amounts of data and iterate quickly, so we need high throughput. When running in production, we risk losing money if we respond to changing market conditions too slowly, so we need low latency. We therefore need both high-performance and real-time computing.
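To make the distinction concrete before the formal definitions, here is a minimal sketch that assumes the common working meanings: latency as the time taken by one request, and throughput as the number of requests completed per unit of time. The function `summarize` and the timestamps are hypothetical illustration data, not measurements.

```python
def summarize(start_times, end_times):
    """Compute simple latency and throughput figures from request timestamps (seconds)."""
    latencies = [end - start for start, end in zip(start_times, end_times)]
    span = max(end_times) - min(start_times)   # wall-clock time for the whole batch
    return {
        "mean_latency": sum(latencies) / len(latencies),
        "max_latency": max(latencies),
        "throughput": len(latencies) / span,   # completed requests per second
    }


# Four overlapping (pipelined) requests, each taking 2 seconds.
starts = [0.0, 1.0, 2.0, 3.0]
ends = [2.0, 3.0, 4.0, 5.0]

stats = summarize(starts, ends)
print(stats)
```

Each request here has a latency of 2 seconds, yet four requests complete in 5 seconds. The two quantities are related but not interchangeable: overlapping work raises throughput without lowering the latency of any single request.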
A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).
A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.
It is becoming increasingly common to use a general-purpose graphics processing unit (GPGPU) as a modified form of stream processor (or vector processor), running compute kernels. This approach turns the massive computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power, rather than leaving it hardwired solely for graphical operations.
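The idea of a compute kernel can be sketched without any GPU: a kernel is a small function that computes one output element from its index, and a "launch" applies it to every index independently. The sketch below is a CPU-only illustration in plain Python. The names `saxpy_kernel` and `launch` are hypothetical and are not part of any GPU library. On real hardware the loop body would run across many threads in parallel instead of sequentially.

```python
def saxpy_kernel(i, a, x, y, out):
    """Kernel body: compute one element of out = a * x + y."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Apply the kernel to indices 0..n-1.

    This loop is sequential; a GPU would execute the iterations in parallel,
    which is valid because no iteration depends on another.
    """
    for i in range(n):
        kernel(i, *args)


a = 2.0
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), a, x, y, out)
print(out)  # [12.0, 24.0, 36.0]
```

The key property is that each index is processed independently, which is what allows a stream or vector processor to spread the work across many execution units.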
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence the term "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC).
An application-specific integrated circuit (ASIC) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use. For example, a chip designed to run a digital voice recorder or a high-efficiency bitcoin miner is an ASIC.
A Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning, particularly using Google's own TensorFlow software. Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.
Graphcore's Intelligence Processing Unit (IPU) is a massively parallel processor designed to accelerate machine intelligence. IPUs implement MIMD (Multiple Instruction, Multiple Data) parallelism and have distributed, local memory as the only form of memory on the device.
A quantum processing unit (QPU), also referred to as a quantum chip, is a physical (fabricated) chip that contains a number of interconnected qubits. It is the foundational component of a full quantum computer, which includes the housing environment for the QPU, the control electronics, and many other components.
- [L04] Phillip A. Laplante. Real-Time Systems Design and Analysis. IEEE Press, Wiley-Interscience, 2004.
- [M14] Dariusz Mrozek. High-Performance Computational Solutions in Protein Bioinformatics. Springer, 2014.