
Intel moves to spin out foundry business, inks AI chip deal with AWS

Intel has announced a key customer win and changes to its foundry business as the beleaguered chipmaker looks to execute a turnaround. Intel is taking steps to transition its chip foundry division, Intel Foundry, to an independent subsidiary, Intel CEO Patrick Gelsinger said in a blog post. Intel Foundry’s leadership isn’t changing, and the subsidiary […]


Challengers Are Coming for Nvidia’s Crown



It’s hard to overstate Nvidia’s AI dominance. Founded in 1993, Nvidia first made its mark in the then-new field of graphics processing units (GPUs) for personal computers. But it’s the company’s AI chips, not PC graphics hardware, that vaulted Nvidia into the ranks of the world’s most valuable companies. It turns out that Nvidia’s GPUs are also excellent for AI. As a result, its stock is more than 15 times as valuable as it was at the start of 2020; revenues have ballooned from roughly US $12 billion in its 2019 fiscal year to $60 billion in 2024; and the AI powerhouse’s leading-edge chips are as scarce, and desired, as water in a desert.

Access to GPUs “has become so much of a worry for AI researchers, that the researchers think about this on a day-to-day basis. Because otherwise they can’t have fun, even if they have the best model,” says Jennifer Prendki, head of AI data at Google DeepMind. Prendki is less reliant on Nvidia than most, as Google has its own homespun AI infrastructure. But other tech giants, like Microsoft and Amazon, are among Nvidia’s biggest customers, and continue to buy its GPUs as quickly as they’re produced. Exactly who gets them and why is the subject of an antitrust investigation by the U.S. Department of Justice, according to press reports.

Nvidia’s AI dominance, like the explosion of machine learning itself, is a recent turn of events. But it’s rooted in the company’s decades-long effort to establish GPUs as general computing hardware that’s useful for many tasks besides rendering graphics. That effort spans not only the company’s GPU architecture, which evolved to include “tensor cores” adept at accelerating AI workloads, but also, critically, its software platform, called CUDA, to help developers take advantage of the hardware.

“They made sure every computer-science major coming out of university is trained up and knows how to program CUDA,” says Matt Kimball, principal data-center analyst at Moor Insights & Strategy. “They provide the tooling and the training, and they spend a lot of money on research.”

Released in 2006, CUDA helps developers use an Nvidia GPU’s many cores. That’s proved essential for accelerating highly parallelized compute tasks, including modern generative AI. Nvidia’s success in building the CUDA ecosystem makes its hardware the path of least resistance for AI development. Nvidia chips might be in short supply, but the only thing more difficult to find than AI hardware is experienced AI developers—and many are familiar with CUDA.
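To make the programming model concrete, here is a minimal sketch of how a CUDA kernel works: the same function runs across thousands of GPU threads, each handling one element of the data. CUDA itself is usually written in C or C++; this sketch uses Numba’s Python bindings to the same model and assumes Numba and a CUDA-capable GPU are installed.

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    # Each GPU thread computes exactly one element of the result.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

# Launch enough 256-thread blocks to cover all n elements.
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)
```

Spreading work across tens of thousands of such threads is what makes GPUs so well suited to the matrix math at the heart of neural networks, and it is exactly the kind of code a generation of developers has learned to write for Nvidia hardware specifically.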

That gives Nvidia a deep, broad moat with which to defend its business, but that doesn’t mean it lacks competitors ready to storm the castle, and their tactics vary widely. While decades-old companies like Advanced Micro Devices (AMD) and Intel are looking to use their own GPUs to rival Nvidia, upstarts like Cerebras and SambaNova have developed radical chip architectures that drastically improve the efficiency of generative AI training and inference. These are the competitors most likely to challenge Nvidia.

Nvidia’s Armory

While Nvidia has several types of GPUs deployed, the big guns found in data centers are the H100 and H200. As soon as the end of 2024, they will be joined by the B200, which nearly quadruples the H100’s performance on a per-GPU basis. Sources: Nvidia, MLPerf inferencing v4.1 results for Llama2-70B

AMD: The other GPU maker

Pro: AMD GPUs are convincing Nvidia alternatives

Con: Software ecosystem can’t rival Nvidia’s CUDA

AMD has battled Nvidia in the graphics-chip arena for nearly two decades. It’s been, at times, a lopsided fight. When it comes to graphics, AMD’s GPUs have rarely beaten Nvidia’s in sales or mindshare. Still, AMD’s hardware has its strengths. The company’s broad GPU portfolio extends from integrated graphics for laptops to AI-focused data-center GPUs with over 150 billion transistors. The company was also an early supporter and adopter of high-bandwidth memory (HBM), a form of memory that’s now essential to the world’s most advanced GPUs.

“If you look at the hardware…it stacks up favorably” to Nvidia, says Kimball, referring to AMD’s Instinct MI325X, a competitor of Nvidia’s H100. “AMD did a fantastic job laying that chip out.”

The MI325X, slated to launch by the end of the year, has over 150 billion transistors and 288 gigabytes of high-bandwidth memory, though real-world results remain to be seen. The MI325X’s predecessor, the MI300X, earned praise from Microsoft, which deploys AMD hardware, including the MI300X, to handle some GPT-3.5 and GPT-4 services. Meta and Dell have also deployed the MI300X, and Meta used the chips in parts of the development of its latest large language model, Llama 3.1.

There’s still a hurdle for AMD to leap: software. AMD offers an open-source platform, ROCm, to help developers program its GPUs, but it’s less popular than CUDA. AMD is aware of this weakness, and in July 2024, it agreed to buy Europe’s largest private AI lab, Silo AI, which has experience doing large-scale AI training using ROCm and AMD hardware. AMD also plans to purchase ZT Systems, a company with expertise in data-center infrastructure, to help the company serve customers looking to deploy its hardware at scale. Building a rival to CUDA is no small feat, but AMD is certainly trying.

Intel: Software success

Pro: Gaudi 3 AI accelerator shows strong performance

Con: Next big AI chip doesn’t arrive until late 2025

Intel’s challenge is the opposite of AMD’s.

While Intel lacks an exact match for Nvidia’s CUDA and AMD’s ROCm, it launched an open-source unified programming platform, oneAPI, in 2018. Unlike CUDA and ROCm, oneAPI spans multiple categories of hardware, including CPUs, GPUs, and FPGAs. So it can help developers accelerate AI tasks (and many others) on any Intel hardware. “Intel’s got a heck of a software ecosystem it can turn on pretty easily,” says Kimball.

Hardware, on the other hand, is a weakness, at least when compared to Nvidia and AMD. Intel’s Gaudi AI accelerators, the fruit of Intel’s 2019 acquisition of AI hardware startup Habana Labs, have made headway, and the latest, Gaudi 3, offers performance that’s competitive with Nvidia’s H100.

However, it’s unclear precisely what Intel’s next hardware release will look like, which has caused some concern. “Gaudi 3 is very capable,” says Patrick Moorhead, founder of Moor Insights & Strategy. But as of July 2024 “there is no Gaudi 4,” he says.

Intel instead plans to pivot to an ambitious chip, code-named Falcon Shores, with a tile-based modular architecture that combines Intel x86 CPU cores and Xe GPU cores; the latter are part of Intel’s recent push into graphics hardware. Intel has yet to reveal details about Falcon Shores’ architecture and performance, though, and it’s not slated for release until late 2025.

Cerebras: Bigger is better

Pro: Wafer-scale chips offer strong performance and memory per chip

Con: Applications are niche due to size and cost

Make no mistake: AMD and Intel are by far the most credible challengers to Nvidia. They share a history of designing successful chips and building programming platforms to go alongside them. But among the smaller, less proven players, one stands out: Cerebras.

The company, which specializes in AI for supercomputers, made waves in 2019 with the Wafer Scale Engine, a gigantic, wafer-size piece of silicon packed with 1.2 trillion transistors. The most recent iteration, Wafer Scale Engine 3, ups the ante to 4 trillion transistors. For comparison, Nvidia’s largest and newest GPU, the B200, has “just” 208 billion transistors. The computer built around this wafer-scale monster, Cerebras’s CS-3, is at the heart of the Condor Galaxy 3, which will be an 8-exaflop AI supercomputer made up of 64 CS-3s. G42, an Abu Dhabi–based conglomerate that hopes to train tomorrow’s leading-edge large language models, will own the system.

“It’s a little more niche, not as general purpose,” says Stacy Rasgon, senior analyst at Bernstein Research. “Not everyone is going to buy [these computers]. But they’ve got customers, like the [United States] Department of Defense, and [the Condor Galaxy 3] supercomputer.”

Cerebras’s WSE-3 isn’t going to challenge Nvidia, AMD, or Intel hardware in most situations; it’s too large, too costly, and too specialized. But it could give Cerebras a unique edge in supercomputers, because no other company designs chips on the scale of the WSE.

SambaNova: A transformer for transformers

Pro: Configurable architecture helps developers squeeze efficiency from AI models

Con: Hardware still has to prove relevance to mass market

SambaNova, founded in 2017, is another chip-design company tackling AI training with an unconventional chip architecture. Its flagship, the SN40L, has what the company calls a “reconfigurable dataflow architecture” composed of tiles of memory and compute resources. The links between these tiles can be altered on the fly to facilitate the quick movement of data for large neural networks.

Prendki believes such customizable silicon could prove useful for training large language models, because AI developers can optimize the hardware for different models. No other company offers that capability, she says.

SambaNova is also scoring wins with SambaFlow, the software stack used alongside the SN40L. “At the infrastructure level, SambaNova is doing a good job with the platform,” says Moorhead. SambaFlow can analyze machine learning models and help developers reconfigure the SN40L to accelerate the model’s performance. SambaNova still has a lot to prove, but its customers include SoftBank and Analog Devices.

Groq: Form for function

Pro: Excellent AI inference performance

Con: Application currently limited to inference

Yet another company with a unique spin on AI hardware is Groq. Groq’s approach is focused on tightly pairing memory and compute resources to accelerate the speed with which a large language model can respond to prompts.

“Their architecture is very memory based. The memory is tightly coupled to the processor. You need more nodes, but the price per token and the performance is nuts,” says Moorhead. The “token” is the basic unit of data a model processes; in an LLM, it’s typically a word or portion of a word. Groq’s performance is even more impressive, he says, given that its chip, called the Language Processing Unit Inference Engine, is made using GlobalFoundries’ 14-nanometer technology, several generations behind the TSMC technology that makes the Nvidia H100.
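For a sense of what counting tokens means in practice, the short snippet below uses OpenAI’s open-source tiktoken tokenizer (one of many tokenizers; Llama models ship their own) to show how a sentence breaks into whole words and word fragments. It is purely illustrative and assumes the tiktoken package is installed.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used byte-pair-encoding tokenizer
text = "Inference speed is measured in tokens per second."
token_ids = enc.encode(text)

print(len(token_ids))                        # number of tokens in the sentence
print([enc.decode([t]) for t in token_ids])  # the individual pieces: whole words and fragments
```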

In July, Groq posted a demonstration of its chip’s inference speed, which can exceed 1,250 tokens per second running Meta’s Llama 3 8-billion parameter LLM. That beats even SambaNova’s demo, which can exceed 1,000 tokens per second.

Qualcomm: Power is everything

Pro: Broad range of chips with AI capabilities

Con: Lacks large, leading-edge chips for AI training

Qualcomm, well known for the Snapdragon system-on-a-chip that powers popular Android phones like the Samsung Galaxy S24 Ultra and OnePlus 12, is a giant that can stand toe-to-toe with AMD, Intel, and Nvidia.

But unlike those peers, the company is focusing its AI strategy on inference and energy efficiency for specific tasks. Anton Lokhmotov, a founding member of the AI benchmarking organization MLCommons and CEO of Krai, a company that specializes in AI optimization, says Qualcomm has significantly improved the inference performance of its Qualcomm Cloud AI 100 servers in an important benchmark test. The servers’ performance increased from 180 to 240 samples per watt on ResNet-50, an image-classification benchmark, using “essentially the same server hardware,” Lokhmotov notes.

Efficient AI inference is also a boon on devices that need to handle AI tasks locally without reaching out to the cloud, says Lokhmotov. Case in point: Microsoft’s Copilot Plus PCs. Microsoft and Qualcomm partnered with laptop makers, including Dell, HP, and Lenovo, and the first Copilot Plus laptops with Qualcomm chips hit store shelves in July. Qualcomm also has a strong presence in smartphones and tablets, where its Snapdragon chips power devices from Samsung, OnePlus, and Motorola, among others.

Qualcomm is an important player in AI for driver-assist and self-driving platforms, too. In early 2024, Hyundai’s Mobis division announced a partnership to use the Snapdragon Ride platform, a rival to Nvidia’s Drive platform, for advanced driver-assist systems.

The Hyperscalers: Custom brains for brawn

Pro: Vertical integration focuses design

Con: Hyperscalers may prioritize their own needs and uses first

Hyperscalers—cloud-computing giants that deploy hardware at vast scales—are synonymous with Big Tech. Amazon, Apple, Google, Meta, and Microsoft all want to deploy AI hardware as quickly as possible, both for their own use and for their cloud-computing customers. To accelerate that, they’re all designing chips in-house.

Google began investing in AI processors much earlier than its competitors: The search giant’s Tensor Processing Units, first announced in 2015, now power most of its AI infrastructure. The sixth generation of TPUs, Trillium, was announced in May and is part of Google’s AI Hypercomputer, a cloud-based service for companies looking to handle AI tasks.

Prendki says Google’s TPUs give the company an advantage in pursuing AI opportunities. “I’m lucky that I don’t have to think too hard about where I get my chips,” she says. Access to TPUs doesn’t entirely eliminate the supply crunch, though, as different Google divisions still need to share resources.

And Google is no longer alone. Amazon has two in-house chips, Trainium and Inferentia, for training and inference, respectively. Microsoft has Maia, Meta has MTIA, and Apple is supposedly developing silicon to handle AI tasks in its cloud infrastructure.

None of these compete directly with Nvidia, as hyperscalers don’t sell hardware to customers. But they do sell access to their hardware through cloud services, like Google’s AI Hypercomputer, Amazon’s AWS, and Microsoft’s Azure. In many cases, hyperscalers offer services running on their own in-house hardware as an option right alongside services running on hardware from Nvidia, AMD, and Intel; Microsoft is thought to be Nvidia’s largest customer.


Chinese chips: An opaque future

Another category of competitor is born not of technical needs but of geopolitical realities. The United States has imposed restrictions on the export of AI hardware that prevent chipmakers from selling their latest, most capable chips to Chinese companies. In response, Chinese companies are designing homegrown AI chips.

Huawei is a leader. The company’s Ascend 910B AI accelerator, designed as an alternative to Nvidia’s H100, is in production at Semiconductor Manufacturing International Corp., a Shanghai-based foundry partially owned by the Chinese government. However, yield issues at SMIC have reportedly constrained supply. Huawei is also selling an “AI-in-a-box” solution, meant for Chinese companies looking to build their own AI infrastructure on-premises.

To get around the U.S. export control rules, Chinese industry could turn to alternative technologies. For example, Chinese researchers have made headway in photonic chips that use light, instead of electric charge, to perform calculations. “The advantage of a beam of light is you can cross one [beam with] another,” says Prendki. “So it reduces constraints you’d normally have on a silicon chip, where you can’t cross paths. You can make the circuits more complex, for less money.” It’s still very early days for photonic chips, but Chinese investment in the area could accelerate its development.

Room for more

It’s clear that Nvidia has no shortage of competitors. It’s equally clear that none of them will challenge—never mind defeat—Nvidia in the next few years. Everyone interviewed for this article agreed that Nvidia’s dominance is currently unparalleled, but that doesn’t mean it will crowd out competitors forever.

“Listen, the market wants choice,” says Moorhead. “I can’t imagine AMD not having 10 or 20 percent market share, Intel the same, if we go to 2026. Typically, the market likes three, and there we have three reasonable competitors.” Kimball says the hyperscalers, meanwhile, could challenge Nvidia as they transition more AI services to in-house hardware.

And then there are the wild cards. Cerebras, SambaNova, and Groq are the leaders in a very long list of startups looking to nibble away at Nvidia with novel solutions. They’re joined by dozens of others, including d-Matrix, Untether AI, Tenstorrent, and Etched, all pinning their hopes on new chip architectures optimized for generative AI. It’s likely many of these startups will falter, but perhaps the next Nvidia will emerge from the survivors.

AM Mediaworks

AM Mediaworks is a PR firm that has specialized in edtech and the future of learning for over a decade, partnering with leading corporations, global nonprofits, and notable disruptors. Over those 10 years, AM Mediaworks has driven integrated, strategic PR campaigns for hundreds of companies of every stage and size.

Founder Alyssa Miller’s communications programs drive brand engagement, thought leadership and visibility, and support funding, M&A, partnerships, events and global expansion.

The firm has a proven track record of securing headline news and op-eds that set industry trends and are transforming education at every stage, including LearnPlatform’s Evidence as a Service, Instructure’s dominance as the #1 lifelong learning platform, and Podium Education’s collaboration with Intel and charity:water to reach 1 million undergrads in 2023-2024 through a for-credit real-work program. For these reasons and more, AM Mediaworks is The EdTech Leadership Awards 2024 Winner for “Best PR Firm Working in Edtech” as part of The EdTech Awards from EdTech Digest.


Intel’s Latest FinFET Is Key to Its Foundry Plans



Last week at VLSI Symposium, Intel detailed the manufacturing process that will form the foundation of its foundry service for high-performance data center customers. For the same power consumption, the Intel 3 process results in an 18 percent performance gain over the previous process, Intel 4. On the company’s roadmap, Intel 3 is the last to use the fin field-effect transistor (FinFET) structure, which the company pioneered in 2011. But it also includes Intel’s first use of a technology that is essential to its plans long after the FinFET is no longer cutting edge. What’s more, the technology is crucial to the company’s plans to become a foundry and make high-performance chips for other companies.

Called dipole work-function metal, it allows a chip designer to select transistors of several different threshold voltages. Threshold voltage is the level at which a device switches on or off. With the Intel 3 process, a single chip can include devices having any of four tightly controlled threshold voltages. That’s important because different functions operate best with different threshold voltages. Cache memory, for example, typically demands devices with a high threshold voltage to prevent current leakage that wastes power, while other circuits might need the fastest-switching devices, with the lowest threshold voltage.
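A rough, first-order relation (a textbook approximation, not an Intel figure) shows why the choice matters: a transistor’s off-state leakage falls exponentially as its threshold voltage rises,

$$ I_{\mathrm{off}} \;\propto\; e^{-V_{\mathrm{th}}/(n V_T)}, \qquad V_T = \frac{kT}{q} \approx 26\ \mathrm{mV}\ \text{at room temperature}, $$

where \(n\) is typically a bit above 1 in a modern transistor. Raising the threshold voltage by roughly 60 to 100 millivolts therefore cuts leakage by about a factor of 10, which suits cache cells, while speed-critical logic wants a low threshold because drive current grows with the overdrive \(V_{DD} - V_{\mathrm{th}}\).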

Threshold voltage is set by the transistor’s gate stack, the layer of metal and insulation that controls the flow of current through the transistor. Historically, “the thickness of the metals determines the threshold voltage,” explains Walid Hafez, vice president of foundry technology development at Intel. “The thicker that work function metal is, the lower the threshold voltage is.” But this dependence on transistor geometry comes with some drawbacks as devices and circuits scale down.

Small deviations in the manufacturing process can alter the volume of the metal in the gate, leading to a somewhat broad range of threshold voltages. And that’s where the Intel 3 process exemplifies the change from Intel making chips only for itself to running as a foundry.

“The way an external foundry operates is very different” from an integrated device manufacturer like Intel was until recently, says Hafez. Foundry customers “need different things… One of those things they need is very tight variation of threshold voltage.”

Intel is different; even without the tight threshold-voltage tolerances, it can sell all its parts by steering the best-performing ones toward its data-center business and the lower-performing ones toward other market segments.

“A lot of external customers don’t do that,” he says. If a chip doesn’t meet their constraints, they may have to chuck it. “So for Intel 3 to be successful in the foundry space, it has to have those very tight variations.”

Dipoles ever after

Dipole work-function materials provide the needed control over threshold voltage regardless of how much room there is in the gate. The material is a proprietary mix of metals and other materials that, despite being only angstroms thick, has a powerful effect on a transistor’s silicon channel.

Intel’s use of dipole work-function materials means the gate surrounding each fin in a FinFET is thinner. Source: Intel

Like the old, thick metal gate, the new mix of materials electrostatically alters the silicon’s band structure to shift the threshold voltage. But it does so by inducing a dipole—a separation of charge—in the thin insulation between it and the silicon.

Because foundry customers have been demanding this kind of tight control from Intel, it’s likely that competitors TSMC and Samsung already use dipoles in their latest FinFET processes. What exactly such structures are made of is a trade secret, but lanthanum is a component in earlier research, and it was the key ingredient in other research presented by the Belgium-based microelectronics research center, Imec. That research was concerned with how best to build the material around stacks of horizontal silicon ribbons instead of one or two vertical fins.

In these devices, called nanosheets or gate all-around transistors, there are mere nanometers between each ribbon of silicon, so dipoles are a necessity. Samsung has already introduced a nanosheet process, and Intel’s, called 20A, is scheduled for later this year. Introducing dipole work function at Intel 3 helps get 20A and its successor 18A into a more mature state, says Hafez.

Flavors of Intel 3

Dipole work-function metal was not the only technology behind the 18 percent boost Intel 3 delivers over its predecessor. Others include more perfectly formed fins, more sharply defined contacts to the transistor, and lower resistance and capacitance in the interconnects. (Hafez details all that here.)

Intel is using the process to build its Xeon 6 CPUs. And the company plans to offer customers three variations on the technology, including one, 3-PT, with 9-micrometer through-silicon vias for use in 3D stacking. “We expect Intel 3-PT to be the backbone of our foundry processes for some time to come,” says Hafez.

Nvidia Conquers Latest AI Tests​



For years, Nvidia has dominated many machine learning benchmarks, and now there are two more notches in its belt.

MLPerf, the AI benchmarking suite sometimes called “the Olympics of machine learning,” has released a new set of training tests to help make more and better apples-to-apples comparisons between competing computer systems. One of MLPerf’s new tests concerns fine-tuning of large language models, a process that takes an existing trained model and trains it a bit more with specialized knowledge to make it fit for a particular purpose. The other is for graph neural networks, a type of machine learning behind some literature databases, fraud detection in financial systems, and social networks.

Even with the additions and the participation of computers using Google’s and Intel’s AI accelerators, systems powered by Nvidia’s Hopper architecture dominated the results once again. One system that included 11,616 Nvidia H100 GPUs—the largest collection yet—topped each of the nine benchmarks, setting records in five of them (including the two new benchmarks).

“If you just throw hardware at the problem, it’s not a given that you’re going to improve.” —Dave Salvator, Nvidia

The 11,616-H100 system is “the biggest we’ve ever done,” says Dave Salvator, director of accelerated computing products at Nvidia. It smashed through the GPT-3 training trial in less than 3.5 minutes. A 512-GPU system, for comparison, took about 51 minutes. (Note that the GPT-3 task is not a full training, which could take weeks and cost millions of dollars. Instead, the computers train on a representative portion of the data, at an agreed-upon point well before completion.)

Compared to Nvidia’s largest entrant on GPT-3 last year, a 3,584 H100 computer, the 3.5-minute result represents a 3.2-fold improvement. You might expect that just from the difference in the size of these systems, but in AI computing that isn’t always the case, explains Salvator. “If you just throw hardware at the problem, it’s not a given that you’re going to improve,” he says.

“We are getting essentially linear scaling,” says Salvator. By that he means that twice as many GPUs lead to a halved training time. “[That] represents a great achievement from our engineering teams,” he adds.

Competitors are also getting closer to linear scaling. This round, Intel deployed a system using 1,024 GPUs that performed the GPT-3 task in 67 minutes, versus a computer one-fourth the size that took 224 minutes six months ago. Google’s largest GPT-3 entry used 12 times the number of TPU v5p accelerators as its smallest entry and performed its task nine times as fast.
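A quick back-of-the-envelope check, using only the figures above: scaling efficiency can be estimated as the measured speedup divided by the growth in accelerator count,

$$ \text{efficiency} \;\approx\; \frac{T_{\text{small}}/T_{\text{large}}}{N_{\text{large}}/N_{\text{small}}}. $$

By that yardstick, Intel’s result works out to roughly \((224/67)/4 \approx 0.84\) and Google’s to \(9/12 = 0.75\). Nvidia’s 3.2-fold gain on about 3.2 times as many GPUs \((11{,}616/3{,}584)\) comes out near 1.0, though some of that gain is due to the software improvements described below.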

Linear scaling is going to be particularly important for upcoming “AI factories” housing 100,000 GPUs or more, Salvator says. He says to expect one such data center to come online this year, and another, using Nvidia’s next architecture, Blackwell, to start up in 2025.

Nvidia’s streak continues

Nvidia continued to boost training times despite using the same architecture, Hopper, as it did in last year’s training results. That’s all down to software improvements, says Salvator. “Typically, we’ll get a 2-2.5x [boost] from software after a new architecture is released,” he says.

For GPT-3 training, Nvidia logged a 27 percent improvement from the June 2023 MLPerf benchmarks. Salvator says there were several software changes behind the boost. For example, Nvidia engineers tuned up Hopper’s use of less accurate, 8-bit floating point operations by trimming unnecessary conversions between 8-bit and 16-bit numbers and better targeting of which layers of a neural network could use the lower precision number format. They also found a more intelligent way to adjust the power budget of each chip’s compute engines, and sped communication among GPUs in a way that Salvator likened to “buttering your toast while it’s still in the toaster.”

Additionally, the company implemented a scheme called flash attention. Invented in the Stanford University laboratory of SambaNova cofounder Chris Ré, flash attention is an algorithm that speeds transformer networks by minimizing writes to memory. When it first showed up in MLPerf benchmarks, flash attention shaved as much as 10 percent from training times. (Intel, too, used a version of flash attention, but not for GPT-3. It instead used the algorithm for one of the new benchmarks, fine-tuning.)
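For readers who want to see the technique in action, the sketch below calls PyTorch’s fused scaled-dot-product-attention routine, which dispatches to a FlashAttention-style kernel on supported GPUs. It is an illustration of the idea, not code from any MLPerf submission, and assumes PyTorch 2.x and a CUDA GPU.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch of 2 sequences, 8 heads, 1,024 tokens, 64-dim heads.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# The fused kernel computes softmax(QK^T / sqrt(d)) V in tiles, without writing the
# full 1,024 x 1,024 attention matrix to GPU memory -- the core flash-attention idea
# of minimizing traffic to slow off-chip memory.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```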

Using other software and network tricks, Nvidia delivered an 80 percent speedup in the text-to-image test, Stable Diffusion, versus its submission in November 2023.

New benchmarks

MLPerf adds new benchmarks and upgrades old ones to stay relevant to what’s happening in the AI industry. This year saw the addition of fine-tuning and graph neural networks.

Fine-tuning takes an already trained LLM and specializes it for use in a particular field. Nvidia, for example, took a trained 43-billion-parameter model and trained it on the GPU maker’s design files and documentation to create ChipNeMo, an AI intended to boost the productivity of its chip designers. At the time, the company’s chief technology officer, Bill Dally, said that training an LLM was like giving it a liberal arts education, and fine-tuning was like sending it to graduate school.

The MLPerf benchmark takes a pretrained Llama-2-70B model and asks the system to fine tune it using a dataset of government documents with the goal of generating more accurate document summaries.

There are several ways to do fine-tuning. MLPerf chose one called low-rank adaptation (LoRA). The method winds up training only a small portion of the LLM’s parameters, leading to a 3-fold lower burden on hardware and reduced use of memory and storage versus other methods, according to the organization.
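To illustrate what training only a small portion of the parameters means, here is a minimal PyTorch sketch of the LoRA idea: the pretrained weight matrix stays frozen, and only two small low-rank matrices are learned. The class name and hyperparameters are illustrative, not MLPerf’s reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the pretrained weights stay fixed
        # B @ A has the same shape as the frozen weight matrix, but only
        # rank * (in_features + out_features) parameters are trainable.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Wrapping a 4,096 x 4,096 projection this way trains about 65,000 parameters
# instead of roughly 16.8 million.
layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
```

Because B starts at zero, the wrapped layer initially behaves exactly like the pretrained one; fine-tuning then adjusts only the low-rank update.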

The other new benchmark involved a graph neural network (GNN). These are for problems that can be represented by a very large set of interconnected nodes, such as a social network or a recommender system. Compared to other AI tasks, GNNs require a lot of communication between the compute nodes in a computer system.

The benchmark trained a GNN on a database that captures the relationships among academic authors, papers, and institutes—a graph with 547 million nodes and 5.8 billion edges. The neural network was then trained to predict the right label for each node in the graph.
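As a simplified picture of what that training involves, the sketch below implements one round of message passing in plain PyTorch: each node averages its neighbors’ feature vectors and mixes the result with its own features, which is the representation a label is later predicted from. The layer and variable names are illustrative; the MLPerf benchmark and production GNN frameworks use more sophisticated layers.

```python
import torch
import torch.nn as nn

class MeanMessagePassing(nn.Module):
    """One GNN layer: combine each node's features with the mean of its neighbors'."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        # x: (num_nodes, in_dim); edge_index: (2, num_edges) of (source, target) pairs.
        src, dst = edge_index
        summed = torch.zeros_like(x).index_add_(0, dst, x[src])        # sum neighbor features
        deg = torch.zeros(x.size(0)).index_add_(0, dst, torch.ones(dst.size(0)))
        mean_neigh = summed / deg.clamp(min=1).unsqueeze(-1)           # average them
        return torch.relu(self.lin_self(x) + self.lin_neigh(mean_neigh))

# Toy graph: 3 nodes with 16-dimensional features, edges 0->1, 1->2, 2->0.
x = torch.randn(3, 16)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])
out = MeanMessagePassing(16, 32)(x, edge_index)
```

At the benchmark’s scale, the same gather-and-average step touches neighbors scattered across many accelerators, which is where the heavy inter-chip communication comes from.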

Future fights

Training rounds in 2025 may see head-to-head contests comparing new accelerators from AMD, Intel, and Nvidia. AMD’s MI300 series was launched about six months ago, and a memory-boosted upgrade, the MI325X, is planned for the end of 2024, with the next-generation MI350 slated for 2025. Intel says its Gaudi 3, generally available to computer makers later this year, will appear in MLPerf’s upcoming inferencing benchmarks. Intel executives have said the new chip has the capacity to beat the H100 at training LLMs. But the victory may be short-lived, as Nvidia has unveiled a new architecture, Blackwell, which is planned for late this year.
