Serial Killers: The massively parallel processors behind the AI and crypto revolutions (fava beans and a nice chianti not required)
Before we arrive at the scene of the crime, let’s take a trip back in time…
The story of the Industrial Revolution tends to focus on new sources of power, such as waterwheels and steam engines, as its transformative aspect. Arguably, though, the real revolution was separating the production of goods into distinct tasks and then having specialized systems perform those tasks at scale. In the textile industry, the first generalists operating as a cottage industry – skilled individuals who could spin, weave, and sew – were swiftly overwhelmed once those tasks were separated out and performed by groups of specialists in the new factories.
Generalists worked serially, one task after another: carding wool or cotton, spinning it into a single yarn, weaving that yarn into fabric, and then making clothes. Factories set many workers on parallel tasks, with whole floors of spinning machines and looms each working on many threads at a time.
It is perhaps not surprising that computer pioneers adopted this analogy – from the late 1960s, the discrete sequences of instructions that a computer could be programmed to execute came to be called “threads”. A computer that could work through only one such sequence at a time was “single-threaded”, and one that could handle several in parallel was “multi-threaded”.
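To make the distinction concrete, here is a minimal sketch (ours, not from any particular system) of the same workload run serially and then in parallel. Python is used purely for illustration, and the process pool stands in for multiple hardware threads, since each worker can run on its own core:

```python
# Illustrative sketch: the same eight tasks run one after another (serial)
# and then spread across cores (parallel). The task contents are arbitrary.
import time
from concurrent.futures import ProcessPoolExecutor

def task(n: int) -> int:
    # Stand-in for one unit of work (carding, spinning, weaving...)
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [2_000_000] * 8

    start = time.perf_counter()
    serial = [task(n) for n in jobs]              # single-threaded: one at a time
    print(f"serial:   {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with ProcessPoolExecutor() as pool:           # parallel: one task per core
        parallel = list(pool.map(task, jobs))
    print(f"parallel: {time.perf_counter() - start:.2f}s")

    assert serial == parallel
```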
Home computers – a new cottage industry
The advent of personal computers in the late 1970s depended on reducing the cost of a useful computing device to the point where it fell within the discretionary spending of a sufficiently large section of society. Starting with 8-bit computers like the Apple II or Commodore PET, progressing through the 16-bit era and into the years of IBM PC-compatible dominance in the 90s and early 2000s (286, 386, 486 and Pentium processors), personal computer hardware was almost universally single-threaded. Clever scheduling meant that multitasking – the ability for two or more applications to appear to be running at the same time – existed at the operating system level. AmigaOS was a particularly early example, and the feature came to the PC with much fanfare in Windows 95. But even with operating-system-level multitasking, under the hood the processor dutifully executed instructions on only one thread at any given time. Serial, not parallel.
While a few rare personal computers with two or more processors had been available before, true multi-threading became widely available with the advent of hyper-threading on the Pentium 4 processor in 2002. Before long, multi-core processors, with each core able to manage up to two threads, were commonplace. Today, 4-, 6-, or 8-core processors offering anywhere from 4 to 16 threads are basic offerings, and workstation-class processors can have 28 or more cores. The single-threaded cottage industry of the early computer age has given way to multi-threaded factories inside the CPU.
Entering the third dimension
The single-threaded processors of the early 90s were nonetheless powerful enough to spark a 3D revolution. Raycasting techniques – pseudo-3D engines running entirely on the CPU – let gamers gun down everything from Nazis to the demons invading Mars… I did promise at the outset that there would be deaths.
True 3D engines, with texture mapping, lighting effects, transparency, greater color depths, and higher resolutions, required more simultaneous calculations than the processors of the day could support. A new generation of specialized coprocessors was born: the 3D graphics card.
Rather than being a second general-purpose processor capable of performing many different types of calculation with high precision, these new processors were tailored to perform the specific linear algebra and matrix manipulations that 3D games required, at a ‘good enough’ level of accuracy. Importantly, these graphics processing units, or GPUs, were built from many individually simple compute cores on a single chip, allowing large numbers of lower-precision calculations to be performed in parallel.
More than a pretty picture
Within just a few years, GPUs revolutionized PC gaming. In 1996, it was rare for a PC to be sold with a GPU; by 1999, a dedicated gamer wouldn’t consider a PC without one. Today, even the most business-oriented PC will run a processor with built-in 3D graphics acceleration, and gamers will spend thousands of dollars on AMD’s latest graphics cards. Whether discrete or integrated into the CPU, GPUs are ubiquitous.
Even with today’s multi-core, multi-threaded processors, the number of concurrent threads a GPU can run eclipses what the CPU can handle. With GPU hardware part of the standard PC configuration, it was inevitable that ways would be found to unlock this parallel computing power for other purposes. Grouped under the banner of “General Purpose Computing on Graphics Processing Units” (GPGPU), projects such as OpenCL give programmers access to the massively parallel architecture of modern GPUs.
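As a flavour of what GPGPU programming looks like, here is a minimal sketch using the PyOpenCL bindings (one of several possible stacks; it assumes the pyopencl package and an OpenCL-capable device are installed). A million additions are farmed out, one per GPU work-item:

```python
# Minimal GPGPU sketch: vector addition, one GPU work-item per element.
import numpy as np
import pyopencl as cl

a = np.random.rand(1_000_000).astype(np.float32)
b = np.random.rand(1_000_000).astype(np.float32)

ctx = cl.create_some_context()       # pick an available OpenCL device
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

program = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int gid = get_global_id(0);      // each work-item handles one element
    out[gid] = a[gid] + b[gid];
}
""").build()

program.add(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
assert np.allclose(result, a + b)
```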
One particular use case that has created massive demand – and led to GPU shortages – is blockchain technology, and cryptocurrency mining in particular. The proof-of-work schemes behind many cryptocurrencies require enormous numbers of cryptographic hash calculations, each one simple and independent of the others – exactly the kind of workload that maps onto a GPU’s parallel hardware as well as 3D graphics does – so mining software offloads the bulk of the work onto the GPU.
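To see why mining parallelizes so well, consider this toy proof-of-work loop using Python’s standard hashlib. It is a deliberately naive, single-threaded sketch – the point is that every nonce trial is independent, so real miners fan millions of them out across GPU threads at once:

```python
# Toy proof-of-work: find a nonce whose SHA-256 digest starts with enough
# zeros. Every trial is independent of every other, so the search can be
# split across as many parallel threads as the hardware offers.
import hashlib

def mine(header: bytes, difficulty: int) -> int:
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "little")).hexdigest()
        if digest.startswith(target):
            return nonce             # "winning" nonce found
        nonce += 1

# Difficulty 4 needs ~16**4 (about 65,000) attempts on average.
print(mine(b"toy block header", 4))
```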
Artificial Intelligence – super massive parallelization
Any machine learning system based on neural networks requires significant computing resources to run, and even greater resources to train. Even a relatively simple neural network will likely have hundreds or thousands of neurons per layer, and multiple layers. If every neuron in a layer is connected to every neuron in the previous layer, with each connection carrying its own weight and each neuron its own bias, the number of computations required rapidly skyrockets to an absurdly large number, as does the memory required to hold all of that information. Just running a trained model can bring a powerful machine to its knees – even the thread counts of modern GPUs can look insignificant against the scale of the task. Factor in the additional computations needed to train an AI and optimize those weights and biases using techniques such as backpropagation, and the computational task grows by an order of magnitude or more.
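A back-of-the-envelope sketch makes the scale obvious. The layer sizes below are hypothetical, and the code (NumPy) simply counts parameters and runs one dense layer – illustrative arithmetic, not any particular model:

```python
# How fully connected layers explode: parameters = weights + biases per layer.
import numpy as np

layers = [1024, 1024, 1024, 10]          # neurons per layer (made-up network)

params = sum(n_in * n_out + n_out        # one weight per connection,
             for n_in, n_out in zip(layers, layers[1:]))  # one bias per neuron
print(f"{params:,} parameters")          # ~2.1 million for this toy network

# A forward pass through one layer is a matrix-vector multiply: n_in * n_out
# multiply-accumulates, all independent -- ideal for parallel hardware.
x = np.random.rand(1024).astype(np.float32)
W = np.random.rand(1024, 1024).astype(np.float32)
b = np.zeros(1024, dtype=np.float32)
y = np.maximum(W @ x + b, 0.0)           # dense layer with ReLU activation
```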
This reality is why specialized AI hardware is increasingly important. New classes of AI-focused processors provide this super massive parallelization together with in-processor memory, allowing larger models to be trained and run far more efficiently on larger data sets. In our last article we drew attention to examples including Graphcore’s “Intelligence Processing Units” (IPUs). To use that example again (although other specialized AI hardware is available): compared to the few dozen threads a workstation processor can run, Graphcore’s latest-generation Colossus MK2 IPU can handle nearly nine thousand threads in parallel – and with multiple IPUs in each machine, there is simply no comparison with what can be achieved on general-purpose hardware.
While high-end GPUs can have very large numbers of cores, specialized AI hardware wins out again, this time on memory bandwidth. A graphics card may provide separate memory for the GPU, but the architecture is a combination of standard memory modules connected to the GPU across the circuit board. This limits the speed at which information can be fed into, and retrieved from, the GPU’s many compute cores. For 3D graphics or crypto mining this is usually not a major constraint; for running or training AI models it often is. Having silicon memory stores tied to each core as part of the processor architecture avoids this bottleneck, increases performance, and allows more efficient scaling when multiple specialized processors are linked together in one machine.
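A quick illustrative calculation shows why. All of the figures below are assumptions chosen for round numbers – not vendor specifications – but the imbalance they reveal is typical:

```python
# Why memory bandwidth, not arithmetic, bounds a dense layer (toy figures).
weights_bytes = 1024 * 1024 * 4     # one fp32 layer of 1024x1024 weights, ~4 MB
bandwidth = 500e9                   # hypothetical 500 GB/s off-chip memory bus
compute = 20e12                     # hypothetical 20 TFLOP/s across the cores

flops = 2 * 1024 * 1024             # one multiply + one accumulate per weight
time_compute = flops / compute      # time if arithmetic were the limit
time_memory = weights_bytes / bandwidth   # time just to fetch the weights

print(f"compute-bound: {time_compute * 1e6:.2f} us")   # ~0.1 us
print(f"memory-bound:  {time_memory * 1e6:.2f} us")    # ~8.4 us, ~80x slower
```

Keeping the weights in memory on the processor die itself is precisely what removes that bottleneck in IPU-style architectures.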
Even with all of these advantages in specialized AI hardware, avoiding wasted compute cycles by reducing the load through sparsity techniques (i.e. eliminating redundant computations where values are zero) makes a huge difference. As is so often the case, high-performance hardware paired with well-tuned software is the best approach.
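As a flavour of what sparsity buys, here is a sketch (NumPy/SciPy; the pruning threshold is arbitrary) in which small weights are zeroed out and a sparse matrix format ensures those zeros are never multiplied at all:

```python
# Magnitude pruning plus a sparse format: zeroed weights cost nothing.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)
W[np.abs(W) < 1.5] = 0.0            # crude pruning: ~87% of weights become zero

W_sparse = sparse.csr_matrix(W)     # stores only the surviving ~13% of weights
x = rng.standard_normal(1024).astype(np.float32)

y_dense = W @ x                     # computes every product, zeros included
y_sparse = W_sparse @ x             # touches only the non-zero weights
assert np.allclose(y_dense, y_sparse, atol=1e-3)
```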
Integrity of integration
With artificial intelligence riding high on the technology hype curve and being actively deployed in an ever-widening range of circumstances, running and training the best machine learning models is becoming an important differentiator for many companies. Competitive pressure for the best and “smartest” machines will only increase.
The enormous potential of these technology platforms can be entirely eroded by poor deployment, poor integration, and the age-old challenge of poor-quality data (garbage in, garbage out still applies…). Just as the wave of new enterprise resource planning (ERP) deployments in the early 2000s created significant opportunities for systems integrators, so will AI. Most organizations are unlikely to have significant in-house expertise in the design, deployment, and integration of these new AI platforms – buying in that expertise is the way to go.
Many of the contractual challenges associated with systems integration agreements will be familiar: requirements design, project schedules and the consequences of delay, milestone payment triggers, acceptance testing and deemed acceptance. The key to success will be clarity about the objectives and outcomes to be achieved, and the plan for achieving them. Complicating matters is the extent to which AI systems can ‘work’, in the sense of delivering a result, yet be suboptimal in accuracy or performance if not correctly structured, properly trained, and tuned. These issues take on new importance in the context of capital expenditure on third-party hardware and related software, and the increased legal responsibilities that may attach to operators of AI systems as regulatory requirements grow. We have looked at the EU’s proposed AI regulation before, and know that the compliance burden will be significant, with fines for non-compliance going even above the GDPR thresholds.
We will be discussing the implications of this exciting time in hardware at the European Technology Summit, in our ‘Hardware Renaissance’ panel.