
Amazon's Secret Weapon in Chip Design Is Amazon



Big-name makers of processors, especially those geared toward cloud-based AI, such as AMD and Nvidia, have been showing signs of wanting to own more of the business of computing, purchasing makers of software, interconnects, and servers. The hope is that control of the “full stack” will give them an edge in designing what their customers want.

Amazon Web Services (AWS) got there ahead of most of the competition when it purchased chip designer Annapurna Labs in 2015 and proceeded to design CPUs, AI accelerators, servers, and data centers as a vertically integrated operation. Ali Saidi, the technical lead for the Graviton series of CPUs, and Rami Sinno, director of engineering at Annapurna Labs, explained the advantages of vertically integrated design and of Amazon scale, and showed IEEE Spectrum around the company’s hardware testing labs in Austin, Tex., on 27 August.


What brought you to Amazon Web Services, Rami?

Rami Sinno. Photo: AWS

Rami Sinno: Amazon is my first vertically integrated company. And that was on purpose. I was working at Arm, and I was looking for the next adventure, looking at where the industry is heading and what I want my legacy to be. I looked at two things:

One is vertically integrated companies, because this is where most of the innovation is—the interesting stuff is happening when you control the full hardware and software stack and deliver directly to customers.

And the second thing is, I realized that machine learning, and AI in general, is going to be very, very big. I didn’t know exactly which direction it was going to take, but I knew that something generational was coming, and I wanted to be part of that. I’d had that experience before, when I was part of the group building the chips that went into BlackBerry phones; that was a fundamental shift in the industry. That feeling was incredible, to be part of something so big, so fundamental. And I thought, “Okay, I have another chance to be part of something fundamental.”

Does working at a vertically integrated company require a different kind of chip design engineer?

Sinno: Absolutely. When I hire people, the interview process is going after people that have that mindset. Let me give you a specific example: Say I need a signal integrity engineer. (Signal integrity makes sure a signal going from point A to point B, wherever it is in the system, makes it there correctly.) Typically, you hire signal integrity engineers that have a lot of experience in analysis for signal integrity, that understand layout impacts, can do measurements in the lab. Well, this is not sufficient for our group, because we want our signal integrity engineers also to be coders. We want them to be able to take a workload or a test that will run at the system level and be able to modify it or build a new one from scratch in order to look at the signal integrity impact at the system level under workload. This is where being trained to be flexible, to think outside of the little box has paid off huge dividends in the way that we do development and the way we serve our customers.


At the end of the day, our responsibility is to deliver complete servers in the data center directly for our customers. And if you think from that perspective, you’ll be able to optimize and innovate across the full stack. A design engineer or a test engineer should be able to look at the full picture, because that’s his or her job: deliver the complete server to the data center and look for where best to optimize. It might not be at the transistor level or at the substrate level or at the board level. It could be something completely different. It could be purely software. And having that knowledge, having that visibility, will allow the engineers to be significantly more productive and to deliver to the customer significantly faster. We’re not going to bang our heads against the wall to optimize the transistor when three lines of code downstream will solve the problem, right?

Do you feel like people are trained in that way these days?

Sinno: We’ve had very good luck with recent college grads. Recent college grads, especially the past couple of years, have been absolutely phenomenal. I’m very, very pleased with the way that the education system is graduating the engineers and the computer scientists that are interested in the type of jobs that we have for them.

The other place that we have been super successful in finding the right people is at startups. They know what it takes, because at a startup, by definition, you have to do so many different things. People who’ve done startups before completely understand the culture and the mindset that we have at Amazon.


What brought you to AWS, Ali?

Ali Saidi. Photo: AWS

Ali Saidi: I’ve been here about seven and a half years. When I joined AWS, I joined a secret project at the time. I was told: “We’re going to build some Arm servers. Tell no one.”

We started with Graviton 1. Graviton 1 was really the vehicle for us to prove that we could offer the same experience in AWS with a different architecture.

The cloud gave customers a very low-cost, low-barrier-to-entry way to try it and ask, “Does it work for my workload?” So Graviton 1 was really just the vehicle to demonstrate that we could do this, and to start signaling to the world that we want software around Arm servers to grow and that they’re going to be more relevant.

Graviton 2—announced in 2019—was kind of our first… what we think is a market-leading device that’s targeting general-purpose workloads, web servers, and those types of things.

It’s done very well. We have people running databases, web servers, key-value stores, lots of applications... When customers adopt Graviton, they bring one workload, and they see the benefits of bringing that one workload. And then the next question they ask is, “Well, I want to bring some more workloads. What should I bring?” There were some workloads where it effectively wasn’t powerful enough, particularly things like media encoding: taking videos and encoding them, re-encoding them, or encoding them to multiple streams. It’s a very math-heavy operation and required more [single-instruction multiple data] bandwidth. We needed cores that could do more math.

We also wanted to enable the [high-performance computing] market. So we have an instance type called Hpc7g where we’ve got customers like Formula One. They do computational fluid dynamics of how a car is going to disturb the air and how that affects following cars. It’s really just expanding the portfolio of applications. We did the same thing when we went to Graviton 4, which has 96 cores versus Graviton 3’s 64.


How do you know what to improve from one generation to the next?

Saidi: By far, most customers find great success when they adopt Graviton. Occasionally, they see performance that isn’t the same level as their other migrations. They might say, “I moved these three apps, and I got 20 percent higher performance; that’s great. But I moved this app over here, and I didn’t get any performance improvement. Why?” It’s really great to see the 20 percent. But for me, in the kind of weird way I am, the 0 percent is actually more interesting, because it gives us something to go and explore with them.

Most of our customers are very open to those kinds of engagements. So we can understand what their application is and build some kind of proxy for it. Or if it’s an internal workload, then we could just use the original software. And then we can use that to kind of close the loop and work on what the next generation of Graviton will have and how we’re going to enable better performance there.

What’s different about designing chips at AWS?

Saidi: In chip design, there are many different competing optimization points. You have all of these conflicting requirements, you have cost, you have scheduling, you’ve got power consumption, you’ve got size, what DRAM technologies are available and when you’re going to intersect them… It ends up being this fun, multifaceted optimization problem to figure out what’s the best thing that you can build in a timeframe. And you need to get it right.

One thing that we’ve done very well is take our initial silicon to production.

How?

Saidi: This might sound weird, but I’ve seen other places where the software and the hardware people effectively don’t talk. The hardware and software people in Annapurna and AWS work together from day one. The software people are writing the software that will ultimately be the production software and firmware while the hardware is being developed, in cooperation with the hardware engineers. By working together, we’re closing that iteration loop. When you are carrying the piece of hardware over to the software engineer’s desk, your iteration loop is years and years. Here, we are iterating constantly. We’re running virtual machines in our emulators before we have the silicon ready. We are taking an emulation of [a complete system] and running most of the software we’re going to run.

So by the time we get the silicon back [from the foundry], the software’s done. And we’ve seen most of the software work at this point, so we have very high confidence that it’s going to work.

The other piece of it, I think, is just being absolutely laser-focused on what we are going to deliver. You get a lot of ideas, but your design resources are approximately fixed. No matter how many ideas I put in the bucket, I’m not going to be able to hire that many more people, and my budget’s probably fixed. So every idea I throw in the bucket is going to use some resources. And if that feature isn’t really important to the success of the project, I’m risking the rest of the project. And I think that’s a mistake that people frequently make.

Are those decisions easier in a vertically integrated situation?

Saidi: Certainly. We know we’re going to build a motherboard and a server and put it in a rack, and we know what that looks like… So we know the features we need. We’re not trying to build a superset product that could allow us to go into multiple markets. We’re laser-focused into one.

What else is unique about the AWS chip design environment?

Saidi: One thing that’s very interesting for AWS is that we’re the cloud and we’re also developing these chips in the cloud. We were the first company to really push on running [electronic design automation (EDA)] in the cloud. We changed the model from “I’ve got 80 servers and this is what I use for EDA” to “Today, I have 80 servers. If I want, tomorrow I can have 300. The next day, I can have 1,000.”

We can compress some of the time by varying the resources that we use. At the beginning of the project, we don’t need as many resources; we can turn a lot of stuff off and effectively not pay for it. As we get to the end of the project, we need many more resources. And instead of saying, “Well, I can’t iterate this fast, because I’ve got this one machine, and it’s busy,” I can change that and say, “Well, I don’t want one machine; I’ll have 10 machines today.”

Instead of my iteration cycle being two days for a big design like this, instead of being even one day, with these 10 machines I can bring it down to three or four hours. That’s huge.
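Saidi’s numbers can be sanity-checked with Amdahl’s law. The sketch below assumes a one-machine baseline of one or two days and a hypothetical `parallel_fraction` describing how much of the EDA work splits evenly across machines; the function name and that parameter are illustrative, not anything AWS has described. Under perfect parallelism, 48 hours on one machine becomes 4.8 hours on 10, and 24 hours becomes 2.4 hours, which brackets the three-to-four-hour figure he quotes.

```python
def turnaround_hours(single_machine_hours: float, machines: int,
                     parallel_fraction: float = 1.0) -> float:
    """Estimate wall-clock hours for a job spread across `machines` (Amdahl's law).

    single_machine_hours: how long the job takes on one machine (assumed baseline).
    parallel_fraction: hypothetical share of the work that splits evenly
        across machines (1.0 = perfectly parallel, 0.0 = fully serial).
    """
    if machines < 1:
        raise ValueError("machines must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_part = single_machine_hours * (1.0 - parallel_fraction)
    parallel_part = single_machine_hours * parallel_fraction / machines
    return serial_part + parallel_part


if __name__ == "__main__":
    # "Two days" and "one day" on one machine, spread over 10 machines.
    for baseline in (48.0, 24.0):
        for fraction in (1.0, 0.95):
            hours = turnaround_hours(baseline, 10, fraction)
            print(f"{baseline:.0f} h on 1 machine -> {hours:.1f} h on 10 machines "
                  f"(parallel fraction {fraction})")
```

With just 5 percent of the work left serial, the one-day baseline lands near 3.5 hours and the two-day baseline near 7 hours, which shows how quickly any serial portion eats into the gains from renting more machines.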

How important is Amazon.com as a customer?

Saidi: They have a wealth of workloads, and we obviously are the same company, so we have access to some of those workloads in ways that with third parties, we don’t. But we also have very close relationships with other external customers.

So last Prime Day, we said that 2,600 Amazon.com services were running on Graviton processors. This Prime Day, that number more than doubled to 5,800 services running on Graviton. And the retail side of Amazon used over 250,000 Graviton CPUs in support of the retail website and the services around that for Prime Day.


The AI accelerator team is colocated with the labs that test everything from chips through racks of servers. Why?

Sinno: So Annapurna Labs has multiple labs in multiple locations. This location here in Austin is one of the smaller labs. But what’s so interesting about the lab here in Austin is that you have all of the hardware and many software development engineers for machine learning servers and for Trainium and Inferentia [AWS’s AI chips] effectively co-located on this floor. For hardware developers and engineers, having the labs co-located on the same floor has been very, very effective. It speeds execution and iteration for delivery to the customers. This lab is set up to be self-sufficient for anything that we need to do, at the chip level, at the server level, at the board level. Because again, as I convey to our teams, our job is not the chip; our job is not the board; our job is the full server to the customer.

How does vertical integration help you design and test chips for data-center-scale deployment?

Sinno: It’s relatively easy to create a bar-raising server, something that’s very high-performance and very low-power. If we create 10 of them, 100 of them, maybe 1,000 of them, it’s easy. You can cherry-pick this, you can fix this, you can fix that. But the scale AWS operates at is significantly higher. We need to train models that require 100,000 of these chips. 100,000! And training doesn’t run in five minutes. It runs for hours or days or even weeks. Those 100,000 chips have to be up for the duration. Everything that we do here is to get to that point.

We start from a “what are all the things that can go wrong?” mindset. And we implement all the things that we know. But when you are talking about cloud scale, there are always things that you have not thought of that come up. These are the 0.001-percent type issues.
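Why a 0.001-percent issue matters at this scale is a short probability argument. Purely for illustration, the sketch reads “0.001 percent” as an independent per-chip, per-job probability (an assumption, not a figure AWS gave). Under that reading, a 10-chip job sees the issue about 0.01 percent of the time, while a 100,000-chip job should expect about one occurrence, with roughly a 63 percent chance of hitting at least one.

```python
import math


def expected_incidents(chips: int, p_per_chip: float) -> float:
    """Expected number of chips that hit a rare issue during one job."""
    return chips * p_per_chip


def prob_at_least_one(chips: int, p_per_chip: float) -> float:
    """Probability that at least one of `chips` independent chips hits the issue.

    Computes 1 - (1 - p)**n via log1p/expm1, which stays accurate when p is tiny.
    """
    return -math.expm1(chips * math.log1p(-p_per_chip))


if __name__ == "__main__":
    # Assumption: "0.001 percent" interpreted as a per-chip, per-job probability.
    p = 0.001 / 100
    for n in (10, 1_000, 100_000):
        print(f"{n:>7} chips: expected {expected_incidents(n, p):.4f}, "
              f"P(at least one) = {prob_at_least_one(n, p):.4f}")
```

The independence assumption is the optimistic case; correlated failures (a shared firmware bug, a bad batch of parts) would make rare issues show up even more reliably in a large fleet.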

In this case, we do the debug first in the fleet. And in certain cases, we have to do debugs in the lab to find the root cause. And if we can fix it immediately, we fix it immediately. Being vertically integrated, in many cases we can do a software fix for it. We use our agility to rush a fix while at the same time making sure that the next generation has it already figured out from the get-go.


How India Is Starting a Chip Industry From Scratch



In March, India announced a major investment to establish a semiconductor-manufacturing industry. With US $15 billion in investments from companies, state governments, and the central government, India now has plans for several chip-packaging plants and the country’s first modern chip fab as part of a larger effort to grow its electronics industry.

But turning India into a chipmaking powerhouse will also require a substantial investment in R&D. And so the Indian government turned to IEEE Fellow and retired Georgia Tech professor Rao Tummala, a pioneer of some of the chip-packaging technologies that have become critical to modern computers. Tummala spoke with IEEE Spectrum during the IEEE Electronic Component Technology Conference in Denver, Colo., in May.



Rao Tummala is a pioneer of semiconductor packaging and a longtime research leader at Georgia Tech.

What are you helping the government of India to develop?

Rao Tummala: I’m helping to develop the R&D side of India’s semiconductor efforts. We picked 12 strategic research areas. If you explore research in those areas, you can make almost any electronic system. For each of those 12 areas, there’ll be one primary center of excellence. And that’ll be typically at an IIT (Indian Institute of Technology) campus. Then there’ll be satellite centers attached to those throughout India. So when we’re done with it, in about five years, I expect to see probably almost all the institutions involved.

Why did you decide to spend your retirement doing this?

Tummala: It’s my giving back. India gave me the best education possible at the right time.

I’ve been going to India and wanting to help for 20 years. But I wasn’t successful until the current government decided they’re going to make manufacturing and semiconductors important for the country. They asked themselves: What would be the need for semiconductors, in 10 years, 20 years, 30 years? And they quickly concluded that if you have 1.4 billion people, each consuming, say, $5,000 worth of electronics each year, it requires billions and billions of dollars’ worth of semiconductors.
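Taken at face value, the hypothetical in that answer multiplies out to an electronics market of roughly $7 trillion a year. How much of that would be semiconductor content is not something Tummala quantifies here, so the arithmetic below stops at the electronics total.

```latex
\underbrace{1.4\times 10^{9}}_{\text{people}}
\times
\underbrace{\$5{,}000}_{\text{per person per year}}
= \$7\times 10^{12}\ \text{per year}
```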


What advantages does India have in the global semiconductor space?

Tummala: India has the best educational system in the world for the masses. It produces the very best students in science and engineering at the undergrad level and lots of them. India is already a success in design and software. All the major U.S. tech companies have facilities in India. And they go to India for two reasons. It has a lot of people with a lot of knowledge in the design and software areas, and those people are cheaper [to employ].

What are India’s weaknesses, and is the government response adequate to overcoming them?

Tummala: India is clearly behind in semiconductor manufacturing. It’s behind in knowledge and behind in infrastructure. Government doesn’t solve these problems. All that the government does is set the policies and give the money. This has given companies incentives to come to India, and therefore the semiconductor industry is beginning to flourish.

Will India ever have leading-edge chip fabs?

Tummala: Absolutely. Not only will it have leading-edge fabs, but in about 20 years, it will have the most comprehensive system-level approach of any country, including the United States. In about 10 years, the size of the electronics industry in India will probably have grown about 10 times.

This article appears in the August 2024 print issue as “5 Questions for Rao Tummala.”

Your Gateway to a Vibrant Career in the Expanding Semiconductor Industry



This sponsored article is brought to you by Purdue University.

The CHIPS America Act was a response to a worsening shortfall in engineers equipped to meet the growing demand for advanced electronic devices. That need persists. In its 2023 policy report, Chipping Away: Assessing and Addressing the Labor Market Gap Facing the U.S. Semiconductor Industry, the Semiconductor Industry Association forecast a demand for 69,000 microelectronic and semiconductor engineers between 2023 and 2030—including 28,900 new positions created by industry expansion and 40,100 openings to replace engineers who retire or leave the field.

This number does not include another 34,500 computer scientists (13,200 new jobs, 21,300 replacements), nor does it count jobs in other industries that require advanced or custom-designed semiconductors for controls, automation, communication, product design, and the emerging systems-of-systems technology ecosystem.

Purdue University is taking charge, leading semiconductor technology and workforce development in the U.S. As early as Spring 2022, Purdue University became the first top engineering school to offer an online Master’s Degree in Microelectronics and Semiconductors.


“The degree was developed as part of Purdue’s overall semiconductor degrees program,” says Purdue Prof. Vijay Raghunathan, one of the architects of the semiconductor program. “It was what I would describe as the nation’s most ambitious semiconductor workforce development effort.”

Prof. Vijay Raghunathan, one of the architects of the online Master’s Degree in Microelectronics and Semiconductors at Purdue. Photo: Purdue University

Purdue built and announced its bold high-technology online program while the U.S. Congress was still debating the $53 billion “Creating Helpful Incentives to Produce Semiconductors for America Act” (CHIPS America Act), which would be passed in July 2022 and signed into law in August.

Today, the online Master’s in Microelectronics and Semiconductors is well underway. Students learn leading-edge equipment and software and prepare to meet the challenges they will face in a rejuvenated, and critical, U.S. semiconductor industry.

Is the drive for semiconductor education succeeding?

“I think we have conclusively established that the answer is a resounding ‘Yes,’” says Raghunathan. Like understanding big data, or being able to program, “the ability to understand how semiconductors and semiconductor-based systems work, even at a rudimentary level, is something that everybody should know. Virtually any product you design or make is going to have chips inside it. You need to understand how they work, what the significance is, and what the risks are.”

Earning a Master’s in Microelectronics and Semiconductors

Students pursuing the Master’s Degree in Microelectronics and Semiconductors will take courses in circuit design, devices and engineering, systems design, and supply chain management offered by several schools in the university, such as Purdue’s Mitch Daniels School of Business, the Purdue Polytechnic Institute, the Elmore Family School of Electrical and Computer Engineering, and the School of Materials Engineering, among others.

Professionals can also take one-credit-hour courses, which are intended to help students build “breadth at the edges,” a notion that grew out of feedback from employers: Tomorrow’s engineering leaders will need broad knowledge to connect with other specialties in the increasingly interdisciplinary world of artificial intelligence, robotics, and the Internet of Things.

“This was something that we embarked on as an experiment 5 or 6 years ago,” says Raghunathan of the one-credit courses. “I think, in hindsight, that it’s turned out spectacularly.”

A researcher adjusts imaging equipment in a lab in Birck Nanotechnology Center, home to Purdue’s advanced research and development on semiconductors and other technology at the atomic scale. Photo: Rebecca Robiños/Purdue University

The Semiconductor Engineering Education Leader

Purdue, which opened its first classes in 1874, is today an acknowledged leader in engineering education. U.S. News & World Report has ranked the university’s graduate engineering program among America’s 10 best every year since 2012 (and among the top 4 since 2022). And Purdue’s online graduate engineering program has ranked in the country’s top three since the publication started evaluating online grad programs in 2020. (Purdue has offered distance Master’s degrees since the 1980s. Back then, of course, course lectures were videotaped and mailed to students. With the growth of the web, “distance” became “online,” and the program has swelled.)

Thus, Microelectronics and Semiconductors Master’s Degree candidates can study online or on-campus. Both tracks take the same courses from the same instructors and earn the same degree. There are no footnotes, asterisks, or parentheses on the diploma to denote online or in-person study.


Students take classes at their own pace, using an integrated suite of proven online-learning applications for attending lectures, submitting homework, taking tests, and communicating with faculty and one another. Texts may be purchased or downloaded from the school library. And there is frequent use of modeling and analytical tools like MATLAB. In addition, Purdue is home to the national design-computing resource nanoHUB.org (with hundreds of modeling, simulation, teaching, and software-development tools) and its offspring, chipshub.org (specializing in tools for chip design and fabrication).

From R&D to Workforce and Economic Development

“If you look at our program, it will become clear why Purdue is increasingly considered America’s leading semiconductors university, because this is such a strategic priority for the entire university, from our President all the way down,” Prof. Raghunathan sums up. “We have a task force that reports directly to the President, a task force focused only on semiconductors and microelectronics. On all aspects—R&D, the innovation pipeline, workforce development, economic development to bring companies to the state. We’re all in as far as chips are concerned.”

Scaling Compute to Satiate AI



Fifty years ago, DRAM inventor and IEEE Medal of Honor recipient Robert Dennard created what essentially became the semiconductor industry’s path to perpetually increasing transistor density and chip performance. That path became known as Dennard scaling, and it helped codify Gordon Moore’s postulate that the number of transistors on a chip would double every 18 to 24 months. For decades it compelled engineers to push the physical limits of semiconductor devices.
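The idealized constant-field scaling rule that Dennard and his colleagues published in 1974 can be summarized in a few lines. Shrinking linear dimensions, oxide thickness, and supply voltage by the same factor κ > 1 (κ is a conventional symbol, not one used in this article) leaves power density unchanged:

```latex
\begin{aligned}
L,\ W,\ t_{ox},\ V &\;\to\; \frac{L}{\kappa},\ \frac{W}{\kappa},\ \frac{t_{ox}}{\kappa},\ \frac{V}{\kappa} \\
C \propto \frac{W L}{t_{ox}} &\;\to\; \frac{C}{\kappa}, \qquad f \;\to\; \kappa f \\
P = C V^{2} f &\;\to\; \frac{C}{\kappa}\cdot\frac{V^{2}}{\kappa^{2}}\cdot \kappa f = \frac{P}{\kappa^{2}} \\
A \propto W L &\;\to\; \frac{A}{\kappa^{2}}
\quad\Longrightarrow\quad \frac{P}{A} \;\to\; \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A}
\end{aligned}
```

With κ ≈ √2 per generation, the area per transistor halves, so density roughly doubles at constant power density. If V stops scaling while dimensions keep shrinking, power per transistor stays near C V² f while density still rises by κ², so power density grows by κ²; that is one standard way to read the breakdown of Dennard scaling in the mid-2000s.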

But in the mid-2000s, when Dennard scaling began running out of juice, chipmakers had to turn to exotic solutions like extreme ultraviolet (EUV) lithography systems to try to keep Moore’s Law on pace. On a visit to GlobalFoundries in Malta, N.Y., in 2017 to see the company install its first EUV system, senior editor Samuel K. Moore asked one expert what the fab would need to achieve even smaller device dimensions. “We’d probably have to build a particle accelerator under the parking lot,” the man joked. The idea seemed so fantastic that it stuck with Moore.

So when Tokyo-based tech journalist John Boyd recently pitched a story about an effort to harness a linear accelerator as an EUV light source, Moore was excited. Boyd’s visit to the High Energy Accelerator Research Organization, known as KEK, in Tsukuba, Japan, became the basis for “Is the Future of Moore’s Law in a Particle Accelerator?” As he reports, KEK’s system generates light by “boosting electrons to relativistic speeds and then deviating their motion in a particular way.”

So far, KEK researchers have managed to use a 17-megaelectron-volt electron beam to generate bursts of 20-micrometer infrared light, a long way from the 13.5-nanometer wavelength that is the current industry standard. But the KEK team is optimistic about its technology’s prospects.
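One way to gauge that gap is the standard on-axis undulator resonance condition, λ = λ_u(1 + K²/2)/(2γ²), where γ is the electrons’ Lorentz factor, λ_u the undulator period, and K the undulator strength. The sketch assumes KEK’s light comes from this kind of undulator emission (the article says only that the electrons’ motion is deviated) and holds λ_u and K fixed, which is a simplification rather than KEK’s design. Because λ falls as 1/γ², shrinking the wavelength by a factor of about 1,480 calls for roughly a 38-fold increase in γ, or a beam on the order of 670 MeV.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.51099895  # m_e * c^2


def lorentz_gamma(kinetic_energy_mev: float) -> float:
    """Lorentz factor of an electron with the given kinetic energy."""
    return 1.0 + kinetic_energy_mev / ELECTRON_REST_ENERGY_MEV


def kinetic_energy_mev(gamma: float) -> float:
    """Inverse of lorentz_gamma: kinetic energy for a given Lorentz factor."""
    return (gamma - 1.0) * ELECTRON_REST_ENERGY_MEV


def energy_for_wavelength(current_energy_mev: float, current_wavelength_m: float,
                          target_wavelength_m: float) -> float:
    """Kinetic energy needed to hit a target wavelength with the same undulator.

    Assumes lambda = lambda_u * (1 + K**2 / 2) / (2 * gamma**2) with lambda_u and K
    held fixed, so gamma_target = gamma_now * sqrt(lambda_now / lambda_target).
    """
    gamma_now = lorentz_gamma(current_energy_mev)
    gamma_target = gamma_now * math.sqrt(current_wavelength_m / target_wavelength_m)
    return kinetic_energy_mev(gamma_target)


if __name__ == "__main__":
    ratio = 20e-6 / 13.5e-9  # 20 micrometers down to 13.5 nanometers
    print(f"wavelength ratio: {ratio:.0f}x; gamma must grow ~{math.sqrt(ratio):.1f}x")
    print(f"beam energy needed: ~{energy_for_wavelength(17.0, 20e-6, 13.5e-9):.0f} MeV")
```

This is an order-of-magnitude exercise only; a real EUV source would also change the undulator period and strength, which shifts the required beam energy.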

While the industry’s ability to affordably make smaller devices has certainly slowed, Moore believes that scaling has a few tricks up its sleeve yet. In addition to brighter light sources like the one KEK is working on, future complementary field-effect transistors (CFETs) will build two transistors in the space of one.


In the shorter term, Moore says stacking chips is the most effective way to keep increasing the amount of logic and memory you can throw at a problem.

“There are always going to be functions in a CPU or GPU that don’t scale as well as core processor logic. Increasingly, it doesn’t make sense to try to keep building all these parts using the core logic’s bleeding-edge chip processes,” Moore says. “It makes more sense to build each part with its best, most economical process, and put them back together as a stack, or at least in the same package.”

To meet the demands of the booming AI sector, makers of GPUs will need to stack up. When former Taiwan Semiconductor Manufacturing Co. chairman Mark Liu and TSMC chief scientist H.-S. Philip Wong wanted to get their message out about the future of CMOS, they approached Moore. The result is “The Path to a 1-Trillion-Transistor GPU.” In addition to Wong’s corporate role, he’s also an academic. One of the worries he’s repeatedly expressed to Moore is that AI and software generally are pulling talent away from semiconductor engineering.

“I believe Wong and Liu want young, technically minded people to understand the importance of keeping semiconductor advances going and to make them want to be part of that effort,” Moore says. “They want to show that semiconductor engineering has a career-long future despite much talk of the death of Moore’s Law.”
