
Challengers Are Coming for Nvidia’s Crown



It’s hard to overstate Nvidia’s AI dominance. Founded in 1993, Nvidia first made its mark in the then-new field of graphics processing units (GPUs) for personal computers. But those GPUs turned out to be excellent for AI as well, and it’s the company’s AI chips, not its PC graphics hardware, that vaulted Nvidia into the ranks of the world’s most valuable companies. As a result, its stock is worth more than 15 times what it was at the start of 2020; revenue has ballooned from roughly US $12 billion in its 2019 fiscal year to $60 billion in 2024; and the AI powerhouse’s leading-edge chips are as scarce and coveted as water in a desert.

Access to GPUs “has become so much of a worry for AI researchers, that the researchers think about this on a day-to-day basis. Because otherwise they can’t have fun, even if they have the best model,” says Jennifer Prendki, head of AI data at Google DeepMind. Prendki is less reliant on Nvidia than most, as Google has its own homespun AI infrastructure. But other tech giants, like Microsoft and Amazon, are among Nvidia’s biggest customers, and continue to buy its GPUs as quickly as they’re produced. Exactly who gets them and why is the subject of an antitrust investigation by the U.S. Department of Justice, according to press reports.

Nvidia’s AI dominance, like the explosion of machine learning itself, is a recent turn of events. But it’s rooted in the company’s decades-long effort to establish GPUs as general computing hardware that’s useful for many tasks besides rendering graphics. That effort spans not only the company’s GPU architecture, which evolved to include “tensor cores” adept at accelerating AI workloads, but also, critically, its software platform, called CUDA, to help developers take advantage of the hardware.

“They made sure every computer-science major coming out of university is trained up and knows how to program CUDA,” says Matt Kimball, principal data-center analyst at Moor Insights & Strategy. “They provide the tooling and the training, and they spend a lot of money on research.”

Released in 2006, CUDA helps developers use an Nvidia GPU’s many cores. That’s proved essential for accelerating highly parallelized compute tasks, including modern generative AI. Nvidia’s success in building the CUDA ecosystem makes its hardware the path of least resistance for AI development. Nvidia chips might be in short supply, but the only thing more difficult to find than AI hardware is experienced AI developers—and many are familiar with CUDA.
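For a sense of what that ecosystem buys developers, here is a minimal sketch of the data-parallel style CUDA popularized, written in Python with the Numba library’s CUDA bindings rather than in CUDA C++. It assumes an Nvidia GPU and the CUDA toolkit are installed, and the kernel and array names are purely illustrative:

```python
# Minimal sketch of CUDA-style data parallelism via Numba's CUDA bindings.
# Assumes an Nvidia GPU and CUDA toolkit; names here are illustrative only.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)            # this thread's global index
    if i < out.shape[0]:        # guard threads that fall past the end of the array
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # one lightweight thread per element

assert np.allclose(out, a + b)
```

The point is less the arithmetic than the model: thousands of lightweight threads mapped onto the GPU’s many cores, which is exactly the pattern modern AI workloads exploit.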

That gives Nvidia a deep, broad moat with which to defend its business, but that doesn’t mean it lacks competitors ready to storm the castle, and their tactics vary widely. While decades-old companies like Advanced Micro Devices (AMD) and Intel are looking to use their own GPUs to rival Nvidia, upstarts like Cerebras and SambaNova have developed radical chip architectures that drastically improve the efficiency of generative AI training and inference. These are the competitors most likely to challenge Nvidia.

Nvidia’s Armory

While Nvidia has several types of GPUs deployed, the big guns found in data centers are the H100 and H200. As early as the end of 2024, they will be joined by the B200, which nearly quadruples the H100’s performance on a per-GPU basis. Sources: Nvidia, MLPerf inferencing v4.1 results for Llama2-70B

AMD: The other GPU maker

Pro: AMD GPUs are convincing Nvidia alternatives

Con: Software ecosystem can’t rival Nvidia’s CUDA

AMD has battled Nvidia in the graphics-chip arena for nearly two decades. It’s been, at times, a lopsided fight. When it comes to graphics, AMD’s GPUs have rarely beaten Nvidia’s in sales or mindshare. Still, AMD’s hardware has its strengths. The company’s broad GPU portfolio extends from integrated graphics for laptops to AI-focused data-center GPUs with over 150 billion transistors. The company was also an early supporter and adopter of high-bandwidth memory (HBM), a form of memory that’s now essential to the world’s most advanced GPUs.

“If you look at the hardware…it stacks up favorably” to Nvidia, says Kimball, referring to AMD’s Instinct MI325X, a competitor of Nvidia’s H100. “AMD did a fantastic job laying that chip out.”

The MI325X, slated to launch by the end of the year, has over 150 billion transistors and 288 gigabytes of high-bandwidth memory, though real-world results remain to be seen. Its predecessor, the MI300X, earned praise from Microsoft, which deploys the chip to handle some ChatGPT services running GPT-3.5 and GPT-4. Meta and Dell have also deployed the MI300X, and Meta used the chips in parts of the development of its latest large language model, Llama 3.1.

There’s still a hurdle for AMD to leap: software. AMD offers an open-source platform, ROCm, to help developers program its GPUs, but it’s less popular than CUDA. AMD is aware of this weakness, and in July 2024 it agreed to buy Europe’s largest private AI lab, Silo AI, which has experience doing large-scale AI training on ROCm and AMD hardware. AMD also plans to purchase ZT Systems, a company with expertise in data-center infrastructure, to help it serve customers looking to deploy AMD hardware at scale. Building a rival to CUDA is no small feat, but AMD is certainly trying.
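In practice, much of that software gap is bridged by frameworks rather than by developers writing ROCm code directly. PyTorch’s ROCm builds, for example, expose AMD GPUs through the same torch.cuda interface used on Nvidia hardware, so a sketch like the one below should run unchanged on either vendor’s GPUs. It assumes a GPU-enabled PyTorch build and is an illustration, not AMD’s official guidance:

```python
# Device-agnostic sketch: on ROCm builds of PyTorch, AMD GPUs are exposed
# through the familiar torch.cuda interface, so this runs unchanged on
# Nvidia or AMD hardware (illustrative only).
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
name = torch.cuda.get_device_name(0) if device.type == "cuda" else "CPU"
print("Running on:", name)

x = torch.randn(4096, 4096, device=device)
y = torch.randn(4096, 4096, device=device)
z = x @ y                      # the matrix multiply runs on the GPU if one is present
print(z.shape)
```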

Intel: Software success

Pro: Gaudi 3 AI accelerator shows strong performance

Con: Next big AI chip doesn’t arrive until late 2025

Intel’s challenge is the opposite of AMD’s.

Intel lacks an exact match for Nvidia’s CUDA and AMD’s ROCm, but in 2018 it launched an open-source unified programming platform, oneAPI. Unlike CUDA and ROCm, oneAPI spans multiple categories of hardware, including CPUs, GPUs, and FPGAs, so it can help developers accelerate AI tasks (and many others) on any Intel hardware. “Intel’s got a heck of a software ecosystem it can turn on pretty easily,” says Kimball.

Hardware, on the other hand, is a weakness, at least when compared to Nvidia and AMD. Intel’s Gaudi AI accelerators, the fruit of Intel’s 2019 acquisition of AI hardware startup Habana Labs, have made headway, and the latest, Gaudi 3, offers performance that’s competitive with Nvidia’s H100.

However, it’s unclear precisely what Intel’s next hardware release will look like, which has caused some concern. “Gaudi 3 is very capable,” says Patrick Moorhead, founder of Moor Insights & Strategy. But as of July 2024 “there is no Gaudi 4,” he says.

Intel instead plans to pivot to an ambitious chip, code-named Falcon Shores, with a tile-based modular architecture that combines Intel x86 CPU cores and Xe GPU cores; the latter are part of Intel’s recent push into graphics hardware. Intel has yet to reveal details about Falcon Shores’ architecture and performance, though, and it’s not slated for release until late 2025.

Cerebras: Bigger is better

Pro: Wafer-scale chips offer strong performance and memory per chip

Con: Applications are niche due to size and cost

Make no mistake: AMD and Intel are by far the most credible challengers to Nvidia. They share a history of designing successful chips and building programming platforms to go alongside them. But among the smaller, less proven players, one stands out: Cerebras.

The company, which specializes in AI for supercomputers, made waves in 2019 with the Wafer Scale Engine, a gigantic, wafer-size piece of silicon packed with 1.2 trillion transistors. The most recent iteration, Wafer Scale Engine 3, ups the ante to 4 trillion transistors. For comparison, Nvidia’s largest and newest GPU, the B200, has “just” 208 billion transistors. The computer built around this wafer-scale monster, Cerebras’s CS-3, is at the heart of the Condor Galaxy 3, which will be an 8-exaflop AI supercomputer made up of 64 CS-3s. G42, an Abu Dhabi–based conglomerate that hopes to train tomorrow’s leading-edge large language models, will own the system.

“It’s a little more niche, not as general purpose,” says Stacy Rasgon, senior analyst at Bernstein Research. “Not everyone is going to buy [these computers]. But they’ve got customers, like the [United States] Department of Defense, and [the Condor Galaxy 3] supercomputer.”

Cerebras’s WSE-3 isn’t going to challenge Nvidia, AMD, or Intel hardware in most situations; it’s too large, too costly, and too specialized. But it could give Cerebras a unique edge in supercomputers, because no other company designs chips on the scale of the WSE.

SambaNova: A transformer for transformers

Pro: Configurable architecture helps developers squeeze efficiency from AI models

Con: Hardware still has to prove relevance to mass market

SambaNova, founded in 2017, is another chip-design company tackling AI training with an unconventional chip architecture. Its flagship, the SN40L, has what the company calls a “reconfigurable dataflow architecture” composed of tiles of memory and compute resources. The links between these tiles can be altered on the fly to facilitate the quick movement of data for large neural networks.

Prendki believes such customizable silicon could prove useful for training large language models, because AI developers can optimize the hardware for different models. No other company offers that capability, she says.

SambaNova is also scoring wins with SambaFlow, the software stack used alongside the SN40L. “At the infrastructure level, SambaNova is doing a good job with the platform,” says Moorhead. SambaFlow can analyze machine learning models and help developers reconfigure the SN40L to accelerate the model’s performance. SambaNova still has a lot to prove, but its customers include SoftBank and Analog Devices.

Groq: Form for function

Pro: Excellent AI inference performance

Con: Applications currently limited to inference

Yet another company with a unique spin on AI hardware is Groq. Groq’s approach focuses on tightly pairing memory and compute resources to increase the speed at which a large language model can respond to prompts.

“Their architecture is very memory based. The memory is tightly coupled to the processor. You need more nodes, but the price per token and the performance is nuts,” says Moorhead. The “token” is the basic unit of data a model processes; in an LLM, it’s typically a word or portion of a word. Groq’s performance is even more impressive, he says, given that its chip, called the Language Processing Unit Inference Engine, is made with GlobalFoundries’ 14-nanometer technology, several generations behind the TSMC process used to make Nvidia’s H100.

In July, Groq posted a demonstration of its chip’s inference speed, which can exceed 1,250 tokens per second running Meta’s 8-billion-parameter Llama 3 LLM. That beats even SambaNova’s demo, which can exceed 1,000 tokens per second.
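To make those figures concrete, here is a short sketch of what a token is and what such throughput implies, using OpenAI’s tiktoken tokenizer for convenience. Llama models use their own tokenizer, so the exact split differs, and the example text is made up:

```python
# Rough illustration of tokens and tokens-per-second, using OpenAI's tiktoken
# tokenizer for convenience (Llama uses its own tokenizer, so splits differ).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Groq builds chips aimed squarely at fast inference."
token_ids = enc.encode(text)

print(len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])   # the subword pieces the model actually sees

# At 1,250 tokens per second, a 500-token answer streams out in under half a second.
print(f"{500 / 1250:.2f} seconds for a 500-token response")
```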

Qualcomm: Power is everything

Pro: Broad range of chips with AI capabilities

Con: Lacks large, leading-edge chips for AI training

Qualcomm, well known for the Snapdragon system-on-a-chip that powers popular Android phones like the Samsung Galaxy S24 Ultra and OnePlus 12, is a giant that can stand toe-to-toe with AMD, Intel, and Nvidia.

But unlike those peers, the company is focusing its AI strategy more on AI inference and energy efficiency for specific tasks. Anton Lokhmotov, a founding member of the AI benchmarking organization MLCommons and CEO of Krai, a company that specializes in AI optimization, says Qualcomm has significantly improved the inference performance of its Cloud AI 100 servers in an important benchmark test. The servers’ performance increased from 180 to 240 samples per watt, a gain of roughly 33 percent, on ResNet-50, an image-classification benchmark, using “essentially the same server hardware,” Lokhmotov notes.

Efficient AI inference is also a boon on devices that need to handle AI tasks locally without reaching out to the cloud, says Lokhmotov. Case in point: Microsoft’s Copilot Plus PCs. Microsoft and Qualcomm partnered with laptop makers, including Dell, HP, and Lenovo, and the first Copilot Plus laptops with Qualcomm chips hit store shelves in July. Qualcomm also has a strong presence in smartphones and tablets, where its Snapdragon chips power devices from Samsung, OnePlus, and Motorola, among others.

Qualcomm is an important player in AI for driver-assist and self-driving platforms, too. In early 2024, Hyundai Mobis announced a partnership to use the Snapdragon Ride platform, a rival to Nvidia’s Drive platform, for advanced driver-assist systems.

The Hyperscalers: Custom brains for brawn

Pro: Vertical integration focuses design

Con: Hyperscalers may prioritize their own needs and uses first

Hyperscalers—cloud-computing giants that deploy hardware at vast scales—are synonymous with Big Tech. Amazon, Apple, Google, Meta, and Microsoft all want to deploy AI hardware as quickly as possible, both for their own use and for their cloud-computing customers. To accelerate that, they’re all designing chips in-house.

Google began investing in AI processors much earlier than its competitors: The search giant’s Tensor Processing Units, first announced in 2015, now power most of its AI infrastructure. The sixth generation of TPUs, Trillium, was announced in May and is part of Google’s AI Hypercomputer, a cloud-based service for companies looking to handle AI tasks.

Prendki says Google’s TPUs give the company an advantage in pursuing AI opportunities. “I’m lucky that I don’t have to think too hard about where I get my chips,” she says. Access to TPUs doesn’t entirely eliminate the supply crunch, though, as different Google divisions still need to share resources.

And Google is no longer alone. Amazon has two in-house chips, Trainium and Inferentia, for training and inference, respectively. Microsoft has Maia, Meta has MTIA, and Apple is reportedly developing silicon to handle AI tasks in its cloud infrastructure.

None of these compete directly with Nvidia, as hyperscalers don’t sell hardware to customers. But they do sell access to their hardware through cloud services, like Google’s AI Hypercomputer, Amazon’s AWS, and Microsoft’s Azure. In many cases, hyperscalers offer services running on their own in-house hardware as an option right alongside services running on hardware from Nvidia, AMD, and Intel; Microsoft is thought to be Nvidia’s largest customer.


Chinese chips: An opaque future

Another category of competitor is born not of technical needs but of geopolitical realities. The United States has imposed restrictions on the export of AI hardware that prevent chipmakers from selling their latest, most capable chips to Chinese companies. In response, Chinese companies are designing homegrown AI chips.

Huawei is a leader. The company’s Ascend 910B AI accelerator, designed as an alternative to Nvidia’s H100, is in production at Semiconductor Manufacturing International Corp. (SMIC), a Shanghai-based foundry partially owned by the Chinese government. However, yield issues at SMIC have reportedly constrained supply. Huawei is also selling an “AI-in-a-box” solution, meant for Chinese companies looking to build their own AI infrastructure on-premises.

To get around the U.S. export control rules, Chinese industry could turn to alternative technologies. For example, Chinese researchers have made headway in photonic chips that use light, instead of electric charge, to perform calculations. “The advantage of a beam of light is you can cross one [beam with] another,” says Prendki. “So it reduces constraints you’d normally have on a silicon chip, where you can’t cross paths. You can make the circuits more complex, for less money.” It’s still very early days for photonic chips, but Chinese investment in the area could accelerate its development.

Room for more

It’s clear that Nvidia has no shortage of competitors. It’s equally clear that none of them will challenge—never mind defeat—Nvidia in the next few years. Everyone interviewed for this article agreed that Nvidia’s dominance is currently unparalleled, but that doesn’t mean it will crowd out competitors forever.

“Listen, the market wants choice,” says Moorhead. “I can’t imagine AMD not having 10 or 20 percent market share, Intel the same, if we go to 2026. Typically, the market likes three, and there we have three reasonable competitors.” Kimball says the hyperscalers, meanwhile, could challenge Nvidia as they transition more AI services to in-house hardware.

And then there are the wild cards. Cerebras, SambaNova, and Groq are the leaders in a very long list of startups looking to nibble away at Nvidia with novel solutions. They’re joined by dozens of others, including d-Matrix, Untether AI, Tenstorrent, and Etched, all pinning their hopes on new chip architectures optimized for generative AI. It’s likely many of these startups will falter, but perhaps the next Nvidia will emerge from the survivors.

Antioxidants Slow Vision Loss in Late-Stage Dry AMD

16 July 2024 at 23:37
Researchers found that daily antioxidant supplements slow the progression of late-stage dry age-related macular degeneration (AMD). The supplements help preserve central vision by slowing the expansion of geographic atrophy regions in the retina. This finding supports the use of AREDS2 supplements for people with late dry AMD.

How Large Language Models Are Changing My Job



Generative artificial intelligence, and large language models in particular, are starting to change how countless technical and creative professionals do their jobs. Programmers, for example, are getting code segments by prompting large language models. And graphic arts software packages such as Adobe Illustrator already have tools built in that let designers conjure illustrations, images, or patterns by describing them.

But such conveniences barely hint at the massive, sweeping changes to employment predicted by some analysts. And already, in ways large and small, striking and subtle, the tech world’s notables are grappling with changes, both real and envisioned, wrought by the onset of generative AI. To get a better idea of how some of them view the future of generative AI, IEEE Spectrum asked three luminaries—an academic leader, a regulator, and a semiconductor industry executive—about how generative AI has begun affecting their work. The three, Andrea Goldsmith, Juraj Čorba, and Samuel Naffziger, agreed to speak with Spectrum at the 2024 IEEE VIC Summit & Honors Ceremony Gala, held in May in Boston.

Read on for thoughts from:

  1. Andrea Goldsmith, dean of engineering at Princeton University
  2. Juraj Čorba, senior expert on digital regulation and governance, Slovak Ministry of Investments, Regional Development, and Information
  3. Samuel Naffziger, senior vice president and a corporate fellow at Advanced Micro Devices

Andrea Goldsmith

Andrea Goldsmith is dean of engineering at Princeton University.

There must be tremendous pressure now to throw a lot of resources into large language models. How do you deal with that pressure? How do you navigate this transition to this new phase of AI?


Andrea Goldsmith: Universities generally are going to be very challenged, especially universities that don’t have the resources of a place like Princeton or MIT or Stanford or the other Ivy League schools. In order to do research on large language models, you need brilliant people, which all universities have. But you also need compute power and you need data. And the compute power is expensive, and the data generally sits in these large companies, not within universities.

So I think universities need to be more creative. We at Princeton have invested a lot of money in the computational resources for our researchers to be able to do—well, not large language models, because you can’t afford it. To do a large language model… look at OpenAI or Google or Meta. They’re spending hundreds of millions of dollars on compute power, if not more. Universities can’t do that.

But we can be more nimble and creative. What can we do with language models, maybe not large language models but with smaller language models, to advance the state of the art in different domains? Maybe it’s vertical domains of using, for example, large language models for better prognosis of disease, or for prediction of cellular channel changes, or in materials science to decide what’s the best path to pursue a particular new material that you want to innovate on. So universities need to figure out how to take the resources that we have to innovate using AI technology.

We also need to think about new models. And the government can also play a role here. The [U.S.] government has this new initiative, NAIRR, or National Artificial Intelligence Research Resource, where they’re going to put up compute power and data and experts for educators to use—researchers and educators.

That could be a game-changer because it’s not just each university investing their own resources or faculty having to write grants, which are never going to pay for the compute power they need. It’s the government pulling together resources and making them available to academic researchers. So it’s an exciting time, where we need to think differently about research—meaning universities need to think differently. Companies need to think differently about how to bring in academic researchers, how to open up their compute resources and their data for us to innovate on.

As a dean, you are in a unique position to see which technical areas are really hot, attracting a lot of funding and attention. But how much ability do you have to steer a department and its researchers into specific areas? Of course, I’m thinking about large language models and generative AI. Is deciding on a new area of emphasis or a new initiative a collaborative process?

Goldsmith: Absolutely. I think any academic leader who thinks that their role is to steer their faculty in a particular direction does not have the right perspective on leadership. I describe academic leadership as really about the success of the faculty and students that you’re leading. And when I did my strategic planning for Princeton Engineering in the fall of 2020, everything was shut down. It was the middle of COVID, but I’m an optimist. So I said, “Okay, this isn’t how I expected to start as dean of engineering at Princeton.” But the opportunity to lead engineering in a great liberal arts university that has aspirations to increase the impact of engineering hasn’t changed. So I met with every single faculty member in the School of Engineering, all 150 of them, one-on-one over Zoom.

And the question I asked was, “What do you aspire to? What should we collectively aspire to?” And I took those 150 responses, and I asked all the leaders and the departments and the centers and the institutes, because there already were some initiatives in robotics and bioengineering and in smart cities. And I said, “I want all of you to come up with your own strategic plans. What do you aspire to in these areas? And then let’s get together and create a strategic plan for the School of Engineering.” So that’s what we did. And everything that we’ve accomplished in the last four years that I’ve been dean came out of those discussions, and what it was the faculty and the faculty leaders in the school aspired to.

So we launched a bioengineering institute last summer. We just launched Princeton Robotics. We’ve launched some things that weren’t in the strategic plan that bubbled up. We launched a center on blockchain technology and its societal implications. We have a quantum initiative. We have an AI initiative using this powerful tool of AI for engineering innovation, not just around large language models, but it’s a tool—how do we use it to advance innovation and engineering? All of these things came from the faculty because, to be a successful academic leader, you have to realize that everything comes from the faculty and the students. You have to harness their enthusiasm, their aspirations, their vision to create a collective vision.

Juraj Čorba

Juraj Čorba is senior expert on digital regulation and governance, Slovak Ministry of Investments, Regional Development, and Information, and Chair of the Working Party on Governance of AI at the Organization for Economic Cooperation and Development.

What are the most important organizations and governing bodies when it comes to policy and governance on artificial intelligence in Europe?


Juraj Čorba: Well, there are many. And it also creates a bit of a confusion around the globe—who are the actors in Europe? So it’s always good to clarify. First of all we have the European Union, which is a supranational organization composed of many member states, including my own Slovakia. And it was the European Union that proposed adoption of a horizontal legislation for AI in 2021. It was the initiative of the European Commission, the E.U. institution, which has a legislative initiative in the E.U. And the E.U. AI Act is now finally being adopted. It was already adopted by the European Parliament.

So this started, you said 2021. That’s before ChatGPT and the whole large language model phenomenon really took hold.

Čorba: That was the case. Well, the expert community already knew that something was being cooked in the labs. But, yes, the whole agenda of large models, including large language models, came up only later on, after 2021. So the European Union tried to reflect that. Basically, the initial proposal to regulate AI was based on a blueprint of so-called product safety, which somehow presupposes a certain intended purpose. In other words, the checks and assessments of products are based more or less on the logic of the mass production of the 20th century, on an industrial scale, right? Like when you have products that you can somehow define easily and all of them have a clearly intended purpose. Whereas with these large models, a new paradigm was arguably opened, where they have a general purpose.

So the whole proposal was then rewritten in negotiations between the Council of Ministers, which is one of the legislative bodies, and the European Parliament. And so what we have today is a combination of this old product-safety approach and some novel aspects of regulation specifically designed for what we call general-purpose artificial intelligence systems or models. So that’s the E.U.

By product safety, you mean, if AI-based software is controlling a machine, you need to have physical safety.

Čorba: Exactly. That’s one of the aspects. So that touches upon the tangible products such as vehicles, toys, medical devices, robotic arms, et cetera. So yes. But from the very beginning, the proposal contained a regulation of what the European Commission called stand-alone systems—in other words, software systems that do not necessarily command physical objects. So it was already there from the very beginning, but all of it was based on the assumption that all software has its easily identifiable intended purpose—which is not the case for general-purpose AI.

Also, large language models and generative AI in general bring in this whole other dimension of propaganda, false information, deepfakes, and so on, which is different from traditional notions of safety in real-time software.

Čorba: Well, this is exactly the aspect that is handled by another European organization, different from the E.U., and that is the Council of Europe. It’s an international organization established after the Second World War for the protection of human rights, for protection of the rule of law, and protection of democracy. So that’s where the Europeans, but also many other states and countries, started to negotiate a first international treaty on AI. For example, the United States have participated in the negotiations, and also Canada, Japan, Australia, and many other countries. And then these particular aspects, which are related to the protection of integrity of elections, rule-of-law principles, protection of fundamental rights or human rights under international law—all these aspects have been dealt with in the context of these negotiations on the first international treaty, which is to be now adopted by the Committee of Ministers of the Council of Europe on the 16th and 17th of May. So, pretty soon. And then the first international treaty on AI will be submitted for ratifications.

So prompted largely by the activity in large language models, AI regulation and governance now is a hot topic in the United States, in Europe, and in Asia. But of the three regions, I get the sense that Europe is proceeding most aggressively on this topic of regulating and governing artificial intelligence. Do you agree that Europe is taking a more proactive stance in general than the United States and Asia?

Čorba: I’m not so sure. If you look at the Chinese approach and the way they regulate what we call generative AI, it would appear to me that they also take it very seriously. They take a different approach from the regulatory point of view. But it seems to me that, for instance, China is taking a very focused and careful approach. For the United States, I wouldn’t say that the United States is not taking a careful approach because last year you saw many of the executive orders, or even this year, some of the executive orders issued by President Biden. Of course, this was not a legislative measure, this was a presidential order. But it seems to me that the United States is also trying to address the issue very actively. The United States has also initiated the first resolution of the General Assembly at the U.N. on AI, which was passed just recently. So I wouldn’t say that the E.U. is more aggressive in comparison with Asia or North America, but maybe I would say that the E.U. is the most comprehensive. It looks horizontally across different agendas and it uses binding legislation as a tool, which is not always the case around the world. Many countries simply feel that it’s too early to legislate in a binding way, so they opt for soft measures or guidance, collaboration with private companies, et cetera. Those are the differences that I see.

Do you think you perceive a difference in focus among the three regions? Are there certain aspects that are being more aggressively pursued in the United States than in Europe or vice versa?

Čorba: Certainly the E.U. is very focused on the protection of human rights, the full catalog of human rights, but also, of course, on safety and human health. These are the core goals or values to be protected under the E.U. legislation. As for the United States and for China, I would say that the primary focus in those countries—but this is only my personal impression—is on national and economic security.

Samuel Naffziger

Samuel Naffziger is senior vice president and a corporate fellow at Advanced Micro Devices, where he is responsible for technology strategy and product architectures. Naffziger was instrumental in AMD’s embrace and development of chiplets, which are semiconductor dies that are packaged together into high-performance modules.

To what extent is large language model training starting to influence what you and your colleagues do at AMD?


Samuel Naffziger: Well, there are a couple levels of that. LLMs are impacting the way a lot of us live and work. And we certainly are deploying that very broadly internally for productivity enhancements, for using LLMs to provide starting points for code—simple verbal requests, such as “Give me a Python script to parse this dataset.” And you get a really nice starting point for that code. Saves a ton of time. Writing verification test benches, helping with the physical design layout optimizations. So there’s a lot of productivity aspects.

The other aspect to LLMs is, of course, we are actively involved in designing GPUs [graphics processing units] for LLM training and for LLM inference. And so that’s driving a tremendous amount of workload analysis on the requirements, hardware requirements, and hardware-software codesign, to explore.

So that brings us to your current flagship, the Instinct MI300X, which is actually billed as an AI accelerator. How did the particular demands influence that design? I don’t know when that design started, but the ChatGPT era started about two years ago or so. To what extent did you read the writing on the wall?

Naffziger: So we were just into the MI300—in 2019, we were starting the development. A long time ago. And at that time, our revenue stream from the Zen [an AMD architecture used in a family of processors] renaissance had really just started coming in. So the company was starting to get healthier, but we didn’t have a lot of extra revenue to spend on R&D at the time. So we had to be very prudent with our resources. And we had strategic engagements with the [U.S.] Department of Energy for supercomputer deployments. That was the genesis for our MI line—we were developing it for the supercomputing market. Now, there was a recognition that munching through FP64 COBOL code, or Fortran, isn’t the future, right? [laughs] This machine-learning [ML] thing is really getting some legs.

So we put some of the lower-precision math formats in, like Brain Floating Point 16 at the time, that were going to be important for inference. And the DOE knew that machine learning was going to be an important dimension of supercomputers, not just legacy code. So that’s the way, but we were focused on HPC [high-performance computing]. We had the foresight to understand that ML had real potential. Although certainly no one predicted, I think, the explosion we’ve seen today.

So that’s how it came about. And, just another piece of it: We leveraged our modular chiplet expertise to architect the 300 to support a number of variants from the same silicon components. So the variant targeted to the supercomputer market had CPUs integrated in as chiplets, directly on the silicon module. And then it had six of the GPU chiplets we call XCDs around them. So we had three CPU chiplets and six GPU chiplets. And that provided an amazingly efficient, highly integrated, CPU-plus-GPU design we call MI300A. It’s very compelling for the El Capitan supercomputer that’s being brought up as we speak.

But we also recognize that for the maximum computation for these AI workloads, the CPUs weren’t that beneficial. We wanted more GPUs. For these workloads, it’s all about the math and matrix multiplies. So we were able to just swap out those three CPU chiplets for a couple more XCD GPUs. And so we got eight XCDs in the module, and that’s what we call the MI300X. So we kind of got lucky having the right product at the right time, but there was also a lot of skill involved in that we saw the writing on the wall for where these workloads were going and we provisioned the design to support it.

Earlier you mentioned 3D chiplets. What do you feel is the next natural step in that evolution?

Naffziger: AI has created this bottomless thirst for more compute [power]. And so we are always going to be wanting to cram as many transistors as possible into a module. And the reason that’s beneficial is, these systems deliver AI performance at scale with thousands, tens of thousands, or more, compute devices. They all have to be tightly connected together, with very high bandwidths, and all of that bandwidth requires power, requires very expensive infrastructure. So if a certain level of performance is required—a certain number of petaflops, or exaflops—the strongest lever on the cost and the power consumption is the number of GPUs required to achieve a zettaflop, for instance. And if the GPU is a lot more capable, then all of that system infrastructure collapses down—if you only need half as many GPUs, everything else goes down by half. So there’s a strong economic motivation to achieve very high levels of integration and performance at the device level. And the only way to do that is with chiplets and with 3D stacking. So we’ve already embarked down that path. A lot of tough engineering problems to solve to get there, but that’s going to continue.
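A back-of-envelope sketch of the scaling argument Naffziger is making: for a fixed performance target, per-GPU capability sets how many devices must be interconnected and powered, and the surrounding infrastructure scales with that count. The target and per-GPU figures below are invented for illustration:

```python
# Back-of-envelope sketch: for a fixed performance target, more capable GPUs
# mean fewer devices to interconnect and power. All numbers are invented.
import math

TARGET_PETAFLOPS = 10_000                 # a hypothetical 10-exaflop system

for per_gpu_petaflops in (2, 4, 8):       # hypothetical per-GPU throughput
    gpus = math.ceil(TARGET_PETAFLOPS / per_gpu_petaflops)
    print(f"{per_gpu_petaflops} PFLOPS per GPU -> {gpus:,} GPUs to network, house, and power")
```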

And so what’s going to happen? Well, obviously we can add layers, right? We can pack more in. The thermal challenges that come along with that are going to be fun engineering problems that our industry is good at solving.
