
Why OpenAI’s new model is such a big deal

17 September 2024 at 10:59

This story is from The Algorithm, our weekly newsletter on AI. To get it in your inbox first, sign up here.

Last weekend, I got married at a summer camp, and during the day our guests competed in a series of games inspired by the show Survivor that my now-wife and I orchestrated. When we were planning the games in August, we wanted one station to be a memory challenge, where our friends and family would have to memorize part of a poem and then relay it to their teammates so they could re-create it with a set of wooden tiles. 

I thought OpenAI’s GPT-4o, its leading model at the time, would be perfectly suited to help. I asked it to create a short wedding-themed poem, with the constraint that each letter could only appear a certain number of times so we could make sure teams would be able to reproduce it with the provided set of tiles. GPT-4o failed miserably. The model repeatedly insisted that its poem worked within the constraints, even though it didn’t. It would correctly count the letters only after the fact, while continuing to deliver poems that didn’t fit the prompt. Without the time to meticulously craft the verses by hand, we ditched the poem idea and instead challenged guests to memorize a series of shapes made from colored tiles. (That ended up being a total hit with our friends and family, who also competed in dodgeball, egg tosses, and capture the flag.)    
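For the curious, the constraint itself is easy to state in code. Here is a minimal sketch, with a made-up tile inventory rather than our actual set, of how you might check whether a candidate poem fits a pile of letter tiles:

```python
from collections import Counter

def fits_tiles(poem: str, tiles: Counter) -> bool:
    """Return True if the poem can be spelled with the available letter tiles."""
    needed = Counter(ch for ch in poem.upper() if ch.isalpha())
    return all(tiles[letter] >= count for letter, count in needed.items())

# Hypothetical tile inventory -- not the actual set from the wedding games.
tiles = Counter({"A": 6, "E": 8, "H": 2, "N": 4, "O": 5, "R": 4, "S": 3, "T": 3, "W": 2})

print(fits_tiles("Two hearts now one", tiles))          # True: fits within the tiles
print(fits_tiles("Love letters last forever", tiles))   # False: needs letters we don't have
```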

However, last week OpenAI released a new model called o1 (previously referred to under the code name “Strawberry” and, before that, Q*) that blows GPT-4o out of the water for this type of task.

Unlike previous models that are well suited for language tasks like writing and editing, OpenAI o1 is focused on multistep “reasoning,” the type of process required for advanced mathematics, coding, or other STEM-based questions. It uses a “chain of thought” technique, according to OpenAI. “It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working,” the company wrote in a blog post on its website.

OpenAI’s tests point to resounding success. The model ranks in the 89th percentile on questions from the competitive coding organization Codeforces and would be among the top 500 high school students in the USA Math Olympiad, which covers geometry, number theory, and other math topics. The model is also trained to answer PhD-level questions in subjects ranging from astrophysics to organic chemistry. 

In math olympiad questions, the new model is 83.3% accurate, versus 13.4% for GPT-4o. In the PhD-level questions, it averaged 78% accuracy, compared with 69.7% from human experts and 56.1% from GPT-4o. (In light of these accomplishments, it’s unsurprising the new model was pretty good at writing a poem for our nuptial games, though still not perfect; it used more Ts and Ss than instructed to.)

So why does this matter? The bulk of LLM progress until now has been language-driven, resulting in chatbots or voice assistants that can interpret, analyze, and generate words. But in addition to getting lots of facts wrong, such LLMs have failed to demonstrate the types of skills required to solve important problems in fields like drug discovery, materials science, coding, or physics. OpenAI’s o1 is one of the first signs that LLMs might soon become genuinely helpful companions to human researchers in these fields. 

It’s a big deal because it brings “chain-of-thought” reasoning in an AI model to a mass audience, says Matt Welsh, an AI researcher and founder of the LLM startup Fixie. 

“The reasoning abilities are directly in the model, rather than one having to use separate tools to achieve similar results. My expectation is that it will raise the bar for what people expect AI models to be able to do,” Welsh says.

That said, it’s best to take OpenAI’s comparisons to “human-level skills” with a grain of salt, says Yves-Alexandre de Montjoye, an associate professor in math and computer science at Imperial College London. It’s very hard to meaningfully compare how LLMs and people go about tasks such as solving math problems from scratch.

Also, AI researchers say that measuring how well a model like o1 can “reason” is harder than it sounds. If it answers a given question correctly, is that because it successfully reasoned its way to the logical answer? Or was it aided by a sufficient starting point of knowledge built into the model? The model “still falls short when it comes to open-ended reasoning,” Google AI researcher François Chollet wrote on X.

Finally, there’s the price. This reasoning-heavy model doesn’t come cheap. Though access to some versions of the model is included in premium OpenAI subscriptions, developers using o1 through the API will pay three times as much as they pay for GPT-4o—$15 per 1 million input tokens in o1, versus $5 for GPT-4o. The new model also won’t be most users’ first pick for more language-heavy tasks, where GPT-4o continues to be the better option, according to OpenAI’s user surveys. 
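For a sense of scale, here is a back-of-the-envelope comparison using those published input-token prices; output tokens are billed separately and are left out of this rough sketch:

```python
# Back-of-the-envelope comparison of API input-token costs, using the prices
# cited above ($15 vs. $5 per 1 million input tokens). Output tokens are billed
# separately and are ignored here.
PRICE_PER_MILLION_INPUT = {"o1": 15.00, "gpt-4o": 5.00}   # USD per 1M input tokens

def input_cost(model: str, input_tokens: int) -> float:
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000

for model in ("o1", "gpt-4o"):
    print(f"{model}: ${input_cost(model, 10_000_000):,.2f} per 10M input tokens")
# o1: $150.00 per 10M input tokens
# gpt-4o: $50.00 per 10M input tokens
```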

What will it unlock? We won’t know until researchers and labs have the access, time, and budget to tinker with the new model and find its limits. But it’s surely a sign that the race for models that can outreason humans has begun. 

Now read the rest of The Algorithm


Deeper Learning

Chatbots can persuade people to stop believing in conspiracy theories

Researchers believe they’ve uncovered a new tool for combating false conspiracy theories: AI chatbots. A team from MIT Sloan and Cornell University found that chatting about a conspiracy theory with a large language model (LLM) reduced people’s belief in it by about 20%—even among participants who claimed that their beliefs were important to their identity. 

Why this matters: The findings could represent an important step forward in how we engage with and educate people who espouse such baseless theories, says Yunhao (Jerry) Zhang, a postdoctoral fellow affiliated with the Psychology of Technology Institute who studies AI’s impacts on society. “They show that with the help of large language models, we can—I wouldn’t say solve it, but we can at least mitigate this problem,” he says. “It points out a way to make society better.” Read more from Rhiannon Williams here.

Bits and Bytes

Google’s new tool lets large language models fact-check their responses

Called DataGemma, it uses two methods to help LLMs check their responses against reliable data and cite their sources more transparently to users. (MIT Technology Review)

Meet the radio-obsessed civilian shaping Ukraine’s drone defense 

Since Russia’s invasion, Serhii “Flash” Beskrestnov has become an influential, if sometimes controversial, force—sharing expert advice and intel on the ever-evolving technology that’s taken over the skies. His work may determine the future of Ukraine, and wars far beyond it. (MIT Technology Review)

Tech companies have joined a White House commitment to prevent AI-generated sexual abuse imagery

The pledges, signed by firms like OpenAI, Anthropic, and Microsoft, aim to “curb the creation of image-based sexual abuse.” The companies promise to set limits on what models will generate and to remove nude images from training data sets where possible.  (Fortune)

OpenAI is now valued at $150 billion

The valuation emerged from talks the company is currently engaged in to raise $6.5 billion. Given that OpenAI is becoming increasingly costly to operate, and could lose as much as $5 billion this year, it’s tricky to see how it all adds up. (The Information)

Google’s new tool lets large language models fact-check their responses

12 September 2024 at 15:00

As long as chatbots have been around, they have made things up. Such “hallucinations” are an inherent part of how AI models work. However, they’re a big problem for companies betting big on AI, like Google, because they make the responses those models generate unreliable. 

Google is releasing a tool today to address the issue. Called DataGemma, it uses two methods to help large language models fact-check their responses against reliable data and cite their sources more transparently to users. 

The first of the two methods is called Retrieval-Interleaved Generation (RIG), which acts as a sort of fact-checker. If a user prompts the model with a question—like “Has the use of renewable energy sources increased in the world?”—the model will come up with a “first draft” answer. Then RIG identifies what portions of the draft answer could be checked against Google’s Data Commons, a massive repository of data and statistics from reliable sources like the United Nations or the Centers for Disease Control and Prevention. Next, it runs those checks and replaces any incorrect original guesses with correct facts. It also cites its sources to the user.
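In very rough, stubbed-out form, that interleaved check-and-correct loop looks something like the sketch below. The model and the Data Commons lookup are replaced with plain Python stand-ins; this is an illustration of the idea, not Google’s actual code or API.

```python
# A toy, runnable sketch of the Retrieval-Interleaved Generation (RIG) idea:
# draft first, then check tagged statistics against a data store and swap in
# verified values with citations. The model and Data Commons are stubbed out.

DATA_COMMONS_STUB = {
    "share of global electricity from renewables, 2022": ("<verified value>", "<source>"),
}

def llm_first_draft(question: str) -> str:
    # Stand-in for the model's "first draft," which tags each guessed statistic.
    return ("Yes. Renewables supplied [STAT: share of global electricity from "
            "renewables, 2022 | guess: <model guess>] of the world's electricity in 2022.")

def rig_answer(question: str) -> str:
    draft = llm_first_draft(question)
    while "[STAT: " in draft:                       # interleave checks into the draft
        start = draft.index("[STAT: ")
        end = draft.index("]", start)
        topic = draft[start + len("[STAT: "):end].split(" | ")[0]
        value, source = DATA_COMMONS_STUB[topic]    # replace the guess with checked data
        draft = draft[:start] + f"{value} ({source})" + draft[end + 1:]
    return draft

print(rig_answer("Has the use of renewable energy sources increased in the world?"))
```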

The second method, which is commonly used in other large language models, is called Retrieval-Augmented Generation (RAG). Consider a prompt like “What progress has Pakistan made against global health goals?” In response, the model examines which data in the Data Commons could help it answer the question, such as information about access to safe drinking water, hepatitis B immunizations, and life expectancies. With those figures in hand, the model then builds its answer on top of the data and cites its sources.
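The RAG flow works in the opposite order: fetch the relevant figures first, then write the answer on top of them. Again, a stubbed-out sketch rather than Google’s implementation:

```python
# A toy sketch of the Retrieval-Augmented Generation (RAG) flow: retrieve the
# relevant statistics first, then generate an answer grounded in (and citing)
# them. Both the retrieval step and the model are stand-ins, not Google's code.

def retrieve_relevant_stats(question: str) -> dict:
    # Stand-in for querying Data Commons for tables relevant to the question.
    return {
        "access to safe drinking water": "<value from Data Commons>",
        "hepatitis B immunization coverage": "<value from Data Commons>",
        "life expectancy at birth": "<value from Data Commons>",
    }

def llm_answer_with_context(question: str, stats: dict) -> str:
    # Stand-in for the model writing its answer on top of the retrieved figures.
    cited = "; ".join(f"{name}: {value}" for name, value in stats.items())
    return f"Answer to {question!r}, grounded in retrieved data -> {cited}"

question = "What progress has Pakistan made against global health goals?"
print(llm_answer_with_context(question, retrieve_relevant_stats(question)))
```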

“Our goal here was to use Data Commons to enhance the reasoning of LLMs by grounding them in real-world statistical data that you could source back to where you got it from,” says Prem Ramaswami, head of Data Commons at Google. Doing so, he says, will “create more trustable, reliable AI.”

It is only available to researchers for now, but Ramaswami says access could widen further after more testing. If it works as hoped, it could be a real boon for Google’s plan to embed AI deeper into its search engine.  

However, it comes with a host of caveats. First, the usefulness of the methods is limited by whether the relevant data is in the Data Commons, which is more of a data repository than an encyclopedia. It can tell you the GDP of Iran, but it’s unable to confirm the date of the First Battle of Fallujah or when Taylor Swift released her most recent single. In fact, Google’s researchers found that with about 75% of the test questions, the RIG method was unable to obtain any usable data from the Data Commons. And even if helpful data is indeed housed in the Data Commons, the model doesn’t always formulate the right questions to find it. 

Second, there is the question of accuracy. When testing the RAG method, researchers found that the model gave incorrect answers 6% to 20% of the time. Meanwhile, the RIG method pulled the correct stat from Data Commons only about 58% of the time (though that’s a big improvement over the 5% to 17% accuracy rate of Google’s large language models when they’re not pinging Data Commons). 

Ramaswami says DataGemma’s accuracy will improve as it gets trained on more and more data. The initial version has been trained on only about 700 questions, and fine-tuning the model required his team to manually check each individual fact it generated. To further improve the model, the team plans to increase that data set from hundreds of questions to millions.

A skeptic’s guide to humanoid-robot videos

27 August 2024 at 11:00

This story is from The Algorithm, our weekly newsletter on AI. To get it in your inbox first, sign up here.

We are living in “humanoid summer” right now, if you didn’t know. Or at least it feels that way to Ken Goldberg, a roboticist extraordinaire who leads research in the field at the University of California, Berkeley, and has founded several robotics companies. Money is pouring into humanoid startups, including Figure AI, which raised $675 million earlier this year. Agility Robotics has moved past the pilot phase, launching what it’s calling the first fleet of humanoid robots at a Spanx factory in Georgia.

But what really makes it feel like humanoid summer is the videos. Seemingly every month brings a new moody, futuristic video featuring a humanoid staring intensely (or unnervingly) into the camera, jumping around, or sorting things into piles. Sometimes they even speak.

Such videos have heightened currency in robotics right now. As Goldberg says, you can’t just fire up a humanoid robot at home and play around with it the way you can with the latest release of ChatGPT. So for anyone hoping to ride the AI wave or demonstrate their progress—like a startup or an academic seeking lab funding—a good humanoid video is the best marketing tool available. “The imagery, visuals, and videos—they’ve played a big role,” he says.  

But what do they show, exactly? I’ve watched dozens of them this year, and I confess I frequently oscillate between being impressed, scared, and bored. I wanted a more sophisticated eye to help me figure out the right questions to ask. Goldberg was happy to help. 

Watch out for movie magic

First, some basics. The most important thing to know is whether a robot is being tele-operated by a human off screen rather than executing the tasks autonomously. Unfortunately, you can’t tell unless the company discloses it in the video, which companies don’t always do.

The second issue is selection bias. How many takes were necessary to get that perfect shot? If a humanoid shows off an impressive ability to sort objects, but it took 200 tries to do the task successfully, that matters. 

Lastly, is the video sped up? Oftentimes that can be totally reasonable if it’s skipping over things that don’t demonstrate much about the robot (“I don’t want to watch the paint dry,” Goldberg says). But if the video is sped up to intentionally hide something or make the robot seem more effective than it is, that’s worth flagging. All of these editing decisions should, ideally, be disclosed by the robotics company or lab. 

Look at the hands

A trope I’ve noticed in humanoid videos is that they show off the robot’s hands by having the fingers curl gently into a fist. A robotic hand with that many usable joints is indeed more complex than the grippers shown on industrial robots, Goldberg says, but those humanoid hands may not be capable of what the videos sometimes suggest. 

For example, humanoids are often shown holding a box while walking. The shot may suggest they’re using their hands the way humans would—placing their fingers underneath the box and lifting up. But often, Goldberg says, the robots are actually just squeezing the box horizontally, with the force coming from the shoulder. It still works, but not the way I’d imagined. Most videos don’t show the hands doing much at all—unsurprising, since hand dexterity requires enormously complicated engineering. 

Evaluate the environment

The latest humanoid videos prove that robots are getting really good at walking and even running. “A robot that could outrun a human is probably right around the corner,” Goldberg says. 

That said, it’s important to look out for what the environment is like for the robot in the video. Is there clutter or dust on the floor? Are there people getting in its way? Are there stairs, pieces of equipment, or slippery surfaces in its path? Probably not. The robots generally show off their (admittedly impressive) feats in pristine environments, not quite like the warehouses, factories, and other places where they will purportedly work alongside humans. 

Watch out for empty boxes

Humanoids are sometimes not as strong as the videos of their physical feats can suggest; I was surprised to hear that many would struggle to hold even a hammer at arm’s length. They can carry more when they hold the weight close to the core, but their carrying capacity varies dramatically as their arms are outstretched. Keep this in mind when you watch a robot move boxes from one belt to the other, since those boxes might be empty. 

There are countless other questions to ask amid the humanoid hype, not the least of which is how much these things might end up costing. But I hope this at least gives you some perspective as the robots become more prevalent in our world.


Now read the rest of The Algorithm

Deeper Learning

We finally have a definition for open-source AI

Open-source AI is everywhere right now. The problem is, no one agrees on what it actually is. Now we may finally have an answer. The Open Source Initiative (OSI), the self-appointed arbiters of what it means to be open source, has released a new definition, which it hopes will help lawmakers develop regulations to protect consumers from AI risks. Among other details, the definition says that an open-source AI system can be used for any purpose without permission, and researchers should be able to inspect its components and study how the system works. The definition requires transparency about what training data was used, but it does not require model makers to release such data in full. 

Why this matters: The previous lack of an open-source standard presented a problem. Although we know that the decisions of OpenAI and Anthropic to keep their models, data sets, and algorithms secret make their AI closed source, some experts argue that Meta and Google’s freely accessible models, which are open to anyone to inspect and adapt, aren’t truly open source either, because licenses restrict what users can do with the models and because the training data sets aren’t made public. An agreed-upon definition could help. Read more from Rhiannon Williams and me here.

Bits and Bytes

How to fine-tune AI for prosperity

Artificial intelligence could put us on the path to a booming economic future, but getting there will take some serious course corrections. (MIT Technology Review)

A new system lets robots sense human touch without artificial skin

Even the most capable robots aren’t great at sensing human touch; you typically need a computer science degree or at least a tablet to interact with them effectively. That may change, thanks to robots that can now sense and interpret touch without being covered in high-tech artificial skin. (MIT Technology Review)

(Op-ed) AI could be a game changer for people with disabilities

It feels unappreciated (and underreported) that AI-based software can truly be an assistive technology, enabling people to do things they otherwise would be excluded from. (MIT Technology Review)

Our basic assumption—that photos capture reality—is about to go up in smoke

Creating realistic and believable fake photos is now trivially easy. We are not prepared for the implications. (The Verge)

We finally have a definition for open-source AI

Open-source AI is everywhere right now. The problem is, no one agrees on what it actually is. Now we may finally have an answer. The Open Source Initiative (OSI), the self-appointed arbiters of what it means to be open source, has released a new definition, which it hopes will help lawmakers develop regulations to protect consumers from AI risks. 

Though OSI has published much about what constitutes open-source technology in other fields, this marks its first attempt to define the term for AI models. It asked a 70-person group of researchers, lawyers, policymakers, and activists, as well as representatives from big tech companies like Meta, Google, and Amazon, to come up with the working definition. 

According to the group, an open-source AI system can be used for any purpose without securing permission, and researchers should be able to inspect its components and study how the system works.

It should also be possible to modify the system for any purpose—including to change its output—and to share it with others to use, with or without modifications, for any purpose. In addition, the standard attempts to define a level of transparency for a given model’s training data, source code, and weights. 

The previous lack of an open-source standard presented a problem. Although we know that the decisions of OpenAI and Anthropic to keep their models, data sets, and algorithms secret make their AI closed source, some experts argue that Meta and Google’s freely accessible models, which are open to anyone to inspect and adapt, aren’t truly open source either, because of licenses that restrict what users can do with the models and because the training data sets aren’t made public. Meta, Google, and OpenAI have been contacted for their response to the new definition but did not reply before publication.

“Companies have been known to misuse the term when marketing their models,” says Avijit Ghosh, an applied policy researcher at Hugging Face, a platform for building and sharing AI models. Describing models as open source may cause them to be perceived as more trustworthy, even if researchers aren’t able to independently investigate whether they really are open source.

Ayah Bdeir, a senior advisor to Mozilla and a participant in OSI’s process, says certain parts of the open-source definition were relatively easy to agree upon, including the need to reveal model weights (the parameters that help determine how an AI model generates an output). Other parts of the deliberations were more contentious, particularly the question of how public training data should be.

The lack of transparency about where training data comes from has led to innumerable lawsuits against big AI companies, from makers of large language models like OpenAI to music generators like Suno, which do not disclose much about their training sets beyond saying they contain “publicly accessible information.” In response, some advocates say that open-source models should disclose all their training sets, a standard that Bdeir says would be difficult to enforce because of issues like copyright and data ownership. 

Ultimately, the new definition requires that open-source models provide information about the training data to the extent that “a skilled person can recreate a substantially equivalent system using the same or similar data.” It’s not a blanket requirement to share all training data sets, but it also goes further than what many proprietary models or even ostensibly open-source models do today. It’s a compromise.

“Insisting on an ideologically pristine kind of gold standard that actually will not effectively be met by anybody ends up backfiring,” Bdeir says. She adds that OSI is planning some sort of enforcement mechanism, which will flag models that are described as open source but do not meet its definition. It also plans to release a list of AI models that do meet the new definition. Though none are confirmed, the handful of models that Bdeir told MIT Technology Review are expected to land on the list are relatively small names, including Pythia by Eleuther, OLMo by Ai2, and models by the open-source collective LLM360.

A new system lets robots sense human touch without artificial skin

21 August 2024 at 20:00

Even the most capable robots aren’t great at sensing human touch; you typically need a computer science degree or at least a tablet to interact with them effectively. That may change, thanks to robots that can now sense and interpret touch without being covered in high-tech artificial skin. It’s a significant step toward robots that can interact more intuitively with humans. 

To understand the new approach, led by the German Aerospace Center and published today in Science Robotics, consider the two distinct ways our own bodies sense touch. If you hold your left palm facing up and press lightly on your left pinky finger, you may first recognize that touch through the skin of your fingertip. That makes sense–you have thousands of receptors on your hands and fingers alone. Roboticists often try to replicate that blanket of sensors for robots through artificial skins, but these can be expensive and ineffective at withstanding impacts or harsh environments.

But if you press harder, you may notice a second way of sensing the touch: through your knuckles and other joints. That sensation–a feeling of torque, to use the robotics jargon–is exactly what the researchers have re-created in their new system.

Their robotic arm contains six sensors, each of which can register even incredibly small amounts of pressure against any section of the device. After precisely measuring the amount and angle of that force, a series of algorithms can then map where a person is touching the robot and analyze what exactly they’re trying to communicate. For example, a person could draw letters or numbers anywhere on the robotic arm’s surface with a finger, and the robot could interpret directions from those movements. Any part of the robot could also be used as a virtual button.
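To get a feel for how joint measurements can reveal a touch location, here is a deliberately simplified sketch using a planar two-link arm: it searches for the point on the arm whose expected torque signature best matches what the joints report. It is a toy version of the general idea, not the DLR team’s actual algorithm or arm model.

```python
# Simplified, planar sketch of joint-torque-based contact localization: given
# torques measured at the joints, find the point on the arm whose expected
# torque signature best explains them. Toy 2-link model, not the DLR system.
import numpy as np

L1, L2 = 0.4, 0.3            # link lengths (m)
q1, q2 = 0.6, 0.4            # joint angles (rad)

def contact_jacobian(link: int, s: float) -> np.ndarray:
    """2x2 Jacobian of a contact point located a distance s along the given link."""
    if link == 1:
        return np.array([[-s * np.sin(q1), 0.0],
                         [ s * np.cos(q1), 0.0]])
    a = q1 + q2
    return np.array([[-L1 * np.sin(q1) - s * np.sin(a), -s * np.sin(a)],
                     [ L1 * np.cos(q1) + s * np.cos(a),  s * np.cos(a)]])

def link_normal(link: int) -> np.ndarray:
    """Unit vector perpendicular to the link, i.e. the push direction."""
    a = q1 if link == 1 else q1 + q2
    return np.array([-np.sin(a), np.cos(a)])

def locate_contact(tau_measured: np.ndarray):
    """Search candidate points; return (link, s, force) minimizing torque residual."""
    best = None
    for link, length in ((1, L1), (2, L2)):
        for s in np.linspace(0.01, length, 50):
            tau_unit = contact_jacobian(link, s).T @ link_normal(link)
            force = float(tau_measured @ tau_unit) / float(tau_unit @ tau_unit)
            residual = np.linalg.norm(tau_measured - force * tau_unit)
            if best is None or residual < best[0]:
                best = (residual, link, s, force)
    return best[1:]

# Simulate a 5 N push normal to link 2, 0.2 m along it, then recover it.
tau = contact_jacobian(2, 0.2).T @ (5.0 * link_normal(2))
print(locate_contact(tau))   # expect roughly (2, 0.2, 5.0)
```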

It means that every square inch of the robot essentially becomes a touch screen, except without the cost, fragility, and wiring of one, says Maged Iskandar, a researcher at the German Aerospace Center and lead author of the study. 

“Human-robot interaction, where a human can closely interact with and command a robot, is still not optimal, because the human needs an input device,” Iskandar says. “If you can use the robot itself as a device, the interactions will be more fluid.”

A system like this could offer a cheaper and simpler way to give robots not only a sense of touch but also a new way for people to communicate with them. That could be particularly significant for larger robots, like humanoids, which continue to receive billions in venture capital investment. 

Calogero Maria Oddo, a roboticist who leads the Neuro-Robotic Touch Laboratory at the BioRobotics Institute but was not involved in the work, says the development is significant, thanks to the way the research combines sensors, elegant use of mathematics to map out touch, and new AI methods to put it all together. Oddo says commercial adoption could be fairly quick, since the investment required is more in software than hardware, which is far more expensive.

There are caveats, though. For one, the new model cannot handle more than two points of contact at once. In a fairly controlled setting like a factory floor that might not be an issue, but in environments where human-robot interactions are less predictable, it could present limitations. And the sorts of sensors needed to communicate touch to a robot, though commercially available, can also cost tens of thousands of dollars.

Overall, though, Oddo envisions a future where skin-based sensors and joint-based ones are merged to give robots a more comprehensive sense of touch.

“We humans and other animals have integrated both solutions,” he says. “I expect robots working in the real world will use both, too, to interact safely and smoothly with the world and learn.”

Why you’re about to see a lot more drones in the sky

20 August 2024 at 11:00

This story is from The Algorithm, our weekly newsletter on AI. To get it in your inbox first, sign up here.

If you follow drone news closely—and you’re forgiven if you don’t—you may have noticed over the last few months that the Federal Aviation Administration (FAA) has been quite busy. For decades, the agency had been a thorn in the side of drone evangelists, who wanted more freedom to fly drones in shared airspaces or dense neighborhoods. The FAA’s rules have made it cumbersome for futuristic ideas like drones delivering packages to work at scale.

Lately, that’s been changing. The agency recently granted Amazon’s Prime Air program approval to fly drones beyond the visual line of sight from its pilots in parts of Texas. The FAA has also granted similar waivers to hundreds of police departments around the country, which are now able to fly drones miles away, much to the ire of privacy advocates. 

However, while the FAA doling out more waivers is notable, there’s a much bigger change coming in less than a month. It promises to be the most significant drone decision in decades, and one that will decide just how many drones we all can expect to see and hear buzzing above us in the US on a daily basis. 

By September 16—if the FAA adheres to its deadline—the agency must issue a Notice of Proposed Rulemaking about whether drones can be flown beyond a visual line of sight. In other words, rather than issuing one-off waivers to police departments and delivery companies, it will propose a rule that applies to everyone using the airspace and aims to minimize the safety risk of drones flying into one another or falling and injuring people or property below. 

The FAA was first directed to come up with a rule back in 2018, but it hasn’t delivered. The September 16 deadline was put in place by the most recent FAA Reauthorization Act, signed into law in May. The agency will have 16 months after releasing the proposed rule to issue a final one.

Who will craft such an important rule, you ask? There are 87 organizations on the committee advising the FAA. Half are commercial operators like Amazon and FedEx, drone manufacturers like Skydio, or other tech interests like Airbus or T-Mobile. There are also a handful of privacy groups like the American Civil Liberties Union, as well as academic researchers. 

It’s unclear where exactly the agency’s proposed rule will fall, but experts in the drone space told me that the FAA has grown much more accommodating of drones, and they expect this ruling to be reflective of that shift. 

If the rule makes it easier for pilots to fly beyond their line of sight, nearly every type of drone pilot will benefit from fewer restrictions. Groups like search and rescue pilots could more easily use drones to find missing persons in the wilderness without an FAA waiver, which is hard to obtain quickly in an emergency situation. 

But if more drones take to the skies with their pilots nowhere in sight, it will have massive implications. “The [proposed rule] will likely allow a broad swatch of operators to conduct wide-ranging drone flights beyond their visual line of sight,” says Jay Stanley, a senior policy analyst at the American Civil Liberties Union’s Speech, Privacy, and Technology Project. “That could open up the skies to a mass of delivery drones (from Amazon and UPS to local ‘burrito-copters’ and other deliveries), local government survey or code-enforcement flights, and a whole new swath of police surveillance operations.”

Read more about what’s coming next for drones from me here.


Now read the rest of The Algorithm

Deeper Learning

The US wants to use facial recognition to identify migrant children as they age

The US Department of Homeland Security (DHS) is looking into ways it might use facial recognition technology to track the identities of migrant children, “down to the infant,” as they age, according to John Boyd, assistant director of the department’s Office of Biometric Identity Management (OBIM), where a key part of his role is to research and develop future biometric identity services for the government. The previously unreported project is intended to improve how facial recognition algorithms track children over time.

Why this matters: Facial recognition technology (FRT) has traditionally not been applied to children, largely because training data sets of real children’s faces are few and far between and consist of either low-quality images drawn from the internet or small sample sizes with little diversity. Such limitations reflect the significant sensitivities regarding privacy and consent when it comes to minors. A DHS program trained specifically on images of children, immigrants’ rights organizations and privacy advocates told MIT Technology Review, raises serious concerns about whether children will be able to opt out of biometric data collection. Read more from Eileen Guo here.

Bits and Bytes

A new public database lists all the ways AI could go wrong

The AI Risk Repository documents over 700 potential risks advanced AI systems could pose. It’s the most comprehensive source yet of information about previously identified issues that could arise from the creation and deployment of these models. (MIT Technology Review)

Escaping Spotify’s algorithm

According to a 2022 report published by Distribution Strategy Group, at least 30% of songs streamed on Spotify are recommended by AI. By delivering what people seem to want, has Spotify killed the joy of music discovery? (MIT Technology Review)

How ‘Deepfake Elon Musk’ became the internet’s biggest scammer

An AI-powered version of Mr. Musk has appeared in thousands of inauthentic ads, contributing to billions in fraud. (The New York Times)

Google’s conversational assistant Gemini Live has launched

Google’s Gemini Live, which was teased back in May, is the company’s closest answer to OpenAI’s GPT-4o. The model can hold conversations in real time and you can interrupt it mid-sentence. Google finally rolled it out earlier this week. (Google)

What’s next for drones

16 August 2024 at 11:00

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

Drones have been a mainstay technology among militaries, hobbyists, and first responders alike for more than a decade, and in that time the range of available models has skyrocketed. No longer limited to small quadcopters with insufficient battery life, drones are aiding search and rescue efforts, reshaping wars in Ukraine and Gaza, and delivering time-sensitive packages of medical supplies. And billions of dollars are being plowed into building the next generation of fully autonomous systems. 

These developments raise a number of questions: Are drones safe enough to be flown in dense neighborhoods and cities? Is it a violation of people’s privacy for police to fly drones overhead at an event or protest? Who decides what level of drone autonomy is acceptable in a war zone?

Those questions are no longer hypothetical. Advancements in drone technology and sensors, falling prices, and easing regulations are making drones cheaper, faster, and more capable than ever. Here’s a look at four of the biggest changes coming to drone technology in the near future.

Police drone fleets

Today more than 1,500 US police departments have drone programs, according to tracking conducted by the Atlas of Surveillance. Trained police pilots use drones for search and rescue operations, monitoring events and crowds, and other purposes. The Scottsdale Police Department in Arizona, for example, successfully used a drone to locate a lost elderly man with dementia, says Rich Slavin, Scottsdale’s assistant chief of police. He says the department has had useful but limited experiences with drones to date, but its pilots have often been hamstrung by the “line of sight” rule from the Federal Aviation Administration (FAA). The rule stipulates that pilots must be able to see their drones at all times, which severely limits the drone’s range.

Soon, that will change. On a rooftop somewhere in the city, Scottsdale police will in the coming months install a new police drone capable of autonomous takeoff, flight, and landing. Slavin says the department is seeking a waiver from the FAA to be able to fly its drone past the line of sight. (Hundreds of police agencies have received a waiver from the FAA since the first was granted in 2019.) The drone, which can fly up to 57 miles per hour, will go on missions as far as three miles from its docking station, and the department says it will be used for things like tracking suspects or providing a visual feed of an officer at a traffic stop who is waiting for backup. 

“The FAA has been much more progressive in how we’re moving into this space,” Slavin says. That could mean that around the country, the sight (and sound) of a police drone soaring overhead will become much more common. 

The Scottsdale department says the drone, which it is purchasing from Aerodome, will kick off its drone-as-first-responder program and will play a role in the department’s new “real-time crime center.” These sorts of centers are becoming increasingly common in US policing, and allow cities to connect cameras, license plate readers, drones, and other monitoring methods to track situations on the fly. The rise of the centers, and their associated reliance on drones, has drawn criticism from privacy advocates who say they conduct a great deal of surveillance with little transparency about how footage from drones and other sources will be used or shared. 

In 2019, the police department in Chula Vista, California, was the first to receive a waiver from the FAA to fly beyond line of sight. The program sparked criticism from members of the community who alleged the department was not transparent about the footage it collected or how it would be used. 

Jay Stanley, a senior policy analyst at the American Civil Liberties Union’s Speech, Privacy, and Technology Project, says the waivers exacerbate existing privacy issues related to drones. If the FAA continues to grant them, police departments will be able to cover far more of a city with drones than ever, all while the legal landscape is murky about whether this would constitute an invasion of privacy. 

“If there’s an accumulation of different uses of this technology, we’re going to end up in a world where from the moment you step out of your front door, you’re going to feel as though you’re under the constant eye of law enforcement from the sky,” he says. “It may have some real benefits, but it is also in dire need of strong checks and balances.”

Scottsdale police say the drone could be used in a variety of scenarios, such as responding to a burglary in progress or tracking a driver with a suspected connection to a kidnapping. But the real benefit, Slavin says, will come from pairing it with other existing technologies, like automatic license plate readers and hundreds of cameras placed around the city. “It can get to places very, very quickly,” he says. “It gives us real-time intelligence and helps us respond faster and smarter.”

While police departments might indeed benefit from drones in those situations, Stanley says the ACLU has found that many deploy them for far more ordinary cases, like reports of a kid throwing a ball against a garage or of “suspicious persons” in an area.

“It raises the question about whether these programs will just end up being another way in which vulnerable communities are over-policed and nickeled and dimed by law enforcement agencies coming down on people for all kinds of minor transgressions,” he says.

Drone deliveries, again

Perhaps no drone technology is more overhyped than home deliveries. For years, tech companies have teased futuristic renderings of a drone dropping off a package on your doorstep just hours after you ordered it. But they’ve never managed to expand them much beyond small-scale pilot projects, at least in the US, again largely due to the FAA’s line of sight rules. 

But this year, regulatory changes are coming. Like police departments, Amazon’s Prime Air program was previously limited to flying its drones within the pilot’s line of sight. That’s because drone pilots don’t have radar, air traffic controllers, or any of the other systems commercial flight relies on to monitor airways and keep them safe. To compensate, Amazon spent years developing an onboard system that would allow its drones to detect nearby objects and avoid collisions. The company says it showed the FAA in demonstrations that its drones could fly safely in the same airspace as helicopters, planes, and hot air balloons. 

In May, Amazon announced the FAA had granted the company a waiver and permission to expand operations in Texas, more than a decade after the Prime Air project started. And in July, the FAA cleared one more roadblock by allowing two companies—Zipline as well as Google’s Wing Aviation—to fly in the same airspace simultaneously without the need for visual observers. 

While all this means your chances of receiving a package via drone have ticked up ever so slightly, the more compelling use case might be medical deliveries. Shakiba Enayati, an assistant professor of supply chains at the University of Missouri–St. Louis, has spent years researching how drones could conduct last-mile deliveries of vaccines, antivenom, organs, and blood in remote places. She says her studies have found drones to be game changers for getting medical supplies to underserved populations, and if the FAA extends these regulatory changes, it could have a real impact. 

That’s especially true in the steps leading up to an organ transplant, she says. Before an organ can be transplanted into a recipient, a number of blood tests must be sent back and forth to make sure the recipient can accept it, which takes time if the blood is being transferred by car or even helicopter. “In these cases, the clock is ticking,” Enayati says. If drones were allowed to be used in this step at scale, it would be a significant improvement.

“If the technology is supporting the needs of organ delivery, it’s going to make a big change in such an important arena,” she says.

That development could come sooner than using drones for delivery of the actual organs, which have to be transported under very tightly controlled conditions to preserve them.

Domesticating the drone supply chain

Signed into law last December, the American Security Drone Act bars federal agencies from buying drones from countries thought to pose a threat to US national security, such as Russia and China. That’s significant. China is the undisputed leader when it comes to manufacturing drones and drone parts: over 90% of law enforcement drones in the US are made by Shenzhen-based DJI, and many of the drones used by both sides in the war in Ukraine come from Chinese companies. 

The American Security Drone Act is part of an effort to curb that reliance on China. (Meanwhile, China is stepping up export restrictions on drones with military uses.) As part of the act, the US Department of Defense’s Defense Innovation Unit has created the Blue UAS Cleared List, a list of drones and parts the agency has investigated and approved for purchase. The list applies to federal agencies as well as programs that receive federal funding, which often means state police departments or other non-federal agencies. 

Since the US is set to spend such significant sums on drones—with $1 billion earmarked for the Department of Defense’s Replicator initiative alone—getting on the Blue List is a big deal. It means those federal agencies can make large purchases with little red tape. 

Allan Evans, CEO of US-based drone part maker Unusual Machine, says the list has sparked a significant rush of drone companies attempting to conform to the US standards. His company manufactures a first-person view flight controller that he hopes will become the first of its kind to be approved for the Blue List.

The American Security Drone Act is unlikely to affect private purchases in the US of drones used by videographers, drone racers, or hobbyists, which will overwhelmingly still be made by China-based companies like DJI. That means US-based drone companies, at least in the short term, will survive only by catering to the US defense market.  

“Basically any US company that isn’t willing to have ancillary involvement in defense work will lose,” Evans says. 

The coming months will show the law’s true impact: Because the US fiscal year ends in September, Evans says he expects to see a host of agencies spending their use-it-or-lose-it funding on US-made drones and drone components in the next month. “That will indicate whether the marketplace is real or not, and how much money is actually being put toward it,” he says.

Autonomous weapons in Ukraine

The drone war in Ukraine has largely been one of attrition. Drones have been used extensively for surveying damage, finding and tracking targets, or dropping weapons since the war began, but on average these quadcopter drones last just three flights before being shot down or rendered unnavigable by GPS jamming. As a result, both Ukraine and Russia prioritized accumulating high volumes of drones with the expectation that they wouldn’t last long in battle. 

Now they’re having to rethink that approach, according to Andriy Dovbenko, founder of the UK-Ukraine Tech Exchange, a nonprofit that helps startups involved in Ukraine’s war effort and eventual reconstruction raise capital. While working with drone makers in Ukraine, he says, he has seen the demand for technology shift from big shipments of simple commercial drones to a pressing need for drones that can navigate autonomously in an environment where GPS has been jammed. With 70% of the front lines suffering from jamming, according to Dovbenko, both Russian and Ukrainian drone investment is now focused on autonomous systems. 

That’s no small feat. Drone pilots usually rely on video feeds from the drone as well as GPS technology, neither of which is available in a jammed environment. Instead, autonomous drones operate with various types of sensors like LiDAR to navigate, though this can be tricky in fog or other inclement weather. Autonomous drones are a new and rapidly changing technology, still being tested by US-based companies like Shield AI. The evolving war in Ukraine is raising the stakes and the pressure to deploy affordable and reliable autonomous drones.  

The transition toward autonomous weapons also raises serious yet largely unanswered questions about how much humans should be taken out of the loop in decision-making. As the war rages on and the need for more capable weaponry rises, Ukraine will likely be the testing ground for whether and how the moral line is drawn. But Dovbenko says stopping to find that line during an ongoing war is impossible. 

“There is a moral question about how much autonomy you can give to the killing machine,” Dovbenko says. “This question is not being asked right now in Ukraine because it’s more of a matter of survival.”

Google’s new weather prediction system combines AI with traditional physics

Researchers from Google have built a new weather prediction model that combines machine learning with more conventional techniques, potentially yielding accurate forecasts at a fraction of the current cost. 

The model, called NeuralGCM and described in a paper in Nature today, bridges a divide that’s grown among weather prediction experts in the last several years. 

While new machine-learning techniques that predict weather by learning from years of past data are extremely fast and efficient, they can struggle with long-term predictions. General circulation models, on the other hand, which have dominated weather prediction for the last 50 years, use complex equations to model changes in the atmosphere and give accurate projections, but they are exceedingly slow and expensive to run. Experts are divided on which tool will be most reliable going forward. But the new model from Google instead attempts to combine the two. 

“It’s not sort of physics versus AI. It’s really physics and AI together,” says Stephan Hoyer, an AI researcher at Google Research and a coauthor of the paper. 

The system still uses a conventional model to work out some of the large atmospheric changes required to make a prediction. It then incorporates AI, which tends to do well where those larger models fall flat—typically for predictions on scales smaller than about 25 kilometers, like those dealing with cloud formations or regional microclimates (San Francisco’s fog, for example). “That’s where we inject AI very selectively to correct the errors that accumulate on small scales,” Hoyer says.
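Conceptually, each step of a hybrid model like this pairs a conventional physics update with a learned nudge. The toy sketch below illustrates that structure only; it is not NeuralGCM’s code, the “physics” is a simple diffusion on a coarse grid, and the “network” is a placeholder function standing in for a trained model.

```python
# Toy illustration of the hybrid "physics + learned correction" structure.
# The physics step is 1-D diffusion on a coarse periodic grid; the neural
# correction is stubbed with a fixed function standing in for a trained network.
import numpy as np

N, DT, NU = 64, 0.1, 0.05                          # grid size, time step, diffusivity

def physics_step(u: np.ndarray) -> np.ndarray:
    """Coarse, conventional dynamics: explicit finite-difference diffusion."""
    lap = np.roll(u, -1) - 2 * u + np.roll(u, 1)   # periodic Laplacian
    return u + DT * NU * lap

def learned_correction(u: np.ndarray) -> np.ndarray:
    """Stand-in for a neural network trained to fix small-scale errors."""
    return -0.01 * (u - u.mean())                  # placeholder damping term

def hybrid_step(u: np.ndarray) -> np.ndarray:
    return physics_step(u) + learned_correction(u) # physics first, then the AI nudge

u = np.sin(np.linspace(0, 2 * np.pi, N, endpoint=False))
for _ in range(100):                               # roll the model forward in time
    u = hybrid_step(u)
print(u.min(), u.max())
```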

The result, the researchers say, is a model that can produce quality predictions faster with less computational power. They say NeuralGCM is as accurate as one-to-15-day forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF), which is a partner organization in the research. 

But the real promise of technology like this is not in better weather predictions for your local area, says Aaron Hill, an assistant professor at the School of Meteorology at the University of Oklahoma, who was not involved in this research. Instead, it’s in larger-scale climate events that are prohibitively expensive to model with conventional techniques. The possibilities could range from predicting tropical cyclones with more notice to modeling more complex climate changes that are years away. 

“It’s so computationally intensive to simulate the globe over and over again or for long periods of time,” Hill says. That means the best climate models are hamstrung by the high costs of computing power, which presents a real bottleneck to research. 

AI-based models are indeed more compact. Once trained, typically on 40 years of historical weather data from ECMWF, a machine-learning model like Google’s GraphCast can run on less than 5,500 lines of code, compared with the nearly 377,000 lines required for the model from the National Oceanic and Atmospheric Administration, according to the paper. 

NeuralGCM, according to Hill, seems to make a strong case that AI can be brought in for particular elements of weather modeling to make things faster, while still keeping the strengths of conventional systems.

“We don’t have to throw away all the knowledge that we’ve gained over the last 100 years about how the atmosphere works,” he says. “We can actually integrate that with the power of AI and machine learning as well.”

Hoyer says using the model to predict short-term weather has been useful for validating its predictions, but that the goal is indeed to be able to use it for longer-term modeling, particularly for extreme weather risk. 

NeuralGCM will be open source. While Hoyer says he looks forward to having climate scientists use it in their research, the model may also be of interest to more than just academics. Commodities traders and agricultural planners pay top dollar for high-resolution predictions, and the models used by insurance companies for products like flood or extreme weather insurance are struggling to account for the impact of climate change. 

While many of the AI skeptics in weather forecasting have been won over by recent developments, according to Hill, the fast pace is hard for the research community to keep up with. “It’s gangbusters,” he says—it seems as if a new model is released by Google, Nvidia, or Huawei every two months. That makes it difficult for researchers to actually sort out which of the new tools will be most useful and apply for research grants accordingly. 

“The appetite is there [for AI],” Hill says. “But I think a lot of us still are waiting to see what happens.”

Correction: This story was updated to clarify that Stephan Hoyer is a researcher at Google Research, not Google DeepMind.

Robot-packed meals are coming to the frozen-food aisle

Advances in artificial intelligence are coming to your freezer, in the form of robot-assembled prepared meals. 

Chef Robotics, a San Francisco–based startup, has launched a system of AI-powered robotic arms that can be quickly programmed with a recipe to dole out accurate portions of everything from tikka masala to pesto tortellini. After experiments with leading brands, including Amy’s Kitchen, the company says its robots have proved their worth and are being rolled out at scale to more production facilities. They are also being offered to new customers in the US and Canada. 

You might think the meals that end up in the grocery store’s frozen aisle, at Starbucks, or on airplanes are robot-packed already, but that’s rarely the case. Workers are often much more flexible than robots and can handle production lines that frequently rotate recipes. Not only that, but certain ingredients, like rice or shredded cheese, are hard to portion out with robotic arms. That means the vast majority of meals from recognizable brands are still typically hand-packed. 

However, advancements from AI have changed the calculus, making robots more useful on production lines, says David Griego, senior director of engineering at Amy’s.

“Before Silicon Valley got involved, the industry was much more about ‘Okay, we’re gonna program—a robot is gonna do this and do this only,’” he says. For a brand with so many different meals, that wasn’t very helpful. But the robots Griego is now able to add to the production line can learn how scooping a portion of peas is different from scooping cauliflower, and they can improve their accuracy for next time. “It’s astounding just how they can adapt to all the different types of ingredients that we use,” he says. Meal-packing robots suddenly make much more financial sense. 

Rather than selling the machines outright, Chef uses a service model, where customers pay a yearly fee that covers maintenance and training. Amy’s currently uses eight systems (each with two robotic arms) spread across two of its plants. One of these systems can now do the work of two to four workers depending on which ingredients are being packed, Griego says. The robots also reduce waste, since they can pack more consistent portions than their human counterparts. One-arm systems typically cost less than $135,000 per year, according to Chef CEO Rajat Bhageria.

With these advantages in mind, Griego imagines the robots handling more and more of the meal assembly process. “I have a vision,” he says, “where the only thing people would do is run the systems.” They’d make sure the hoppers of ingredients and packaging materials were full, for example, and the robots would do the rest. 

Robot chefs have been getting more skilled in recent years thanks to AI, and some companies have promised that burger-flipping and nugget-frying robots can provide cost savings to restaurants. But much of this technology has seen little adoption in the restaurant industry so far, says Bhageria. That’s because fast-casual restaurants often only need one cook running the grill, and if a robot cannot fully replace that person because it still needs supervision, it makes little sense to use it. Packaged meal companies, however, have a larger source of labor costs that they want to bring down: plating and assembly.

“That’s going to be the highest bang for our buck for our customers,” Bhageria says. 

The notion that more flexible robots could mean broader adoption in new industries is no surprise, says Lerrel Pinto, who leads the General-Purpose Robotics and AI Lab at New York University and is not involved with Chef or Amy’s Kitchen. 

“A lot of robots deployed in the real world are used in a very repetitive way, where they’re supposed to do the same thing over and over again,” he says. Deep learning has caused a paradigm shift over the past few years, sparking the idea that more generally capable robots might be not only possible but necessary for more widespread adoption. If Chef’s robots can perform without frequent stops for repair or training, they could deliver material savings to food companies and shift how they use human labor, Pinto says: “In the next few years, we will probably see a lot more companies trying to actually deploy these types of learning-based robots in the real world.”

One new challenge the robots have created for Amy’s, Griego says, is maintaining the look of a hand-packed meal when it was assembled by a robot. The company’s cheese enchilada dish in particular was causing trouble: it’s finished with a hand-distributed sprinkling of cheddar on top, but Amy’s panel of examiners said the cheese on the robot-packed dish looked too machine-spread, sending Griego back to the drawing board.

“The first few tests went pretty well,” he says. After a couple of changes, the robots are ready to take over. Amy’s plans to bring them to more of its facilities and train them on a growing list of ingredients, meaning your frozen meals are increasingly likely to be packed by a robot.

Update: This story has been amended to include updated pricing information from Chef.

AI is poised to automate today’s most mundane manual warehouse task

Before almost any item reaches your door, it traverses the global supply chain on a pallet. More than 2 billion pallets are in circulation in the United States alone, and $400 billion worth of goods are exported on them annually. However, loading boxes onto these pallets is a task stuck in the past: Heavy loads and repetitive movements leave workers at high risk of injury, and in the rare instances when robots are used, they take months to program using handheld computers that have changed little since the 1980s.

Jacobi Robotics, a startup spun out of the labs of the University of California, Berkeley, says it can vastly speed up that process with AI command-and-control software. The researchers approached palletizing—one of the most common warehouse tasks—as primarily an issue of motion planning: How do you safely get a robotic arm to pick up boxes of different shapes and stack them efficiently on a pallet without getting stuck? And all that computation also has to be fast, because factory lines are producing more varieties of products than ever before—which means boxes of more shapes and sizes.

After much trial and error, Jacobi’s founders, including roboticist Ken Goldberg, say they’ve cracked it. Their software, built upon research from a paper they published in Science Robotics in 2020, is designed to work with the four leading makers of robotic palletizing arms. It uses deep learning to generate a “first draft” of how an arm might move an item onto the pallet. Then it uses more traditional robotics methods, like optimization, to check whether the movement can be done safely and without glitches. 
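That propose-then-verify structure can be sketched in a few lines. The functions below are placeholders standing in for a learned model and a real collision checker, not Jacobi’s software.

```python
# Schematic sketch of the "learned first draft, then conventional check" pattern.
# propose_path stands in for a neural network and in_collision for a real
# collision/limit checker; neither is Jacobi's actual code.
import numpy as np

def propose_path(start: np.ndarray, goal: np.ndarray, n: int = 20) -> np.ndarray:
    """Stand-in for a learned model's first-draft joint-space path."""
    return np.linspace(start, goal, n)             # here: a naive straight-line guess

def in_collision(q: np.ndarray) -> bool:
    """Stand-in for a conventional collision/limit check on one configuration."""
    return bool(np.any(np.abs(q) > np.pi))         # toy joint-limit test

def plan(start: np.ndarray, goal: np.ndarray):
    path = propose_path(start, goal)               # 1. fast learned proposal
    for q in path:                                 # 2. verify every waypoint
        if in_collision(q):
            return None                            # 3. reject; fall back to a slower planner
    return path

start, goal = np.zeros(6), np.array([0.5, -0.3, 0.8, 0.0, 0.4, -0.2])
path = plan(start, goal)
print("feasible" if path is not None else "needs replanning")
```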

Jacobi aims to replace the legacy methods customers are currently using to train their bots. In the conventional approach, robots are programmed using tools called “teaching pendants,” and customers usually have to manually guide the robot to demonstrate how to pick up each individual box and place it on the pallet. The entire coding process can take months. Jacobi says its AI-driven solution promises to cut that time down to a day and can compute motions in less than a millisecond. The company says it plans to launch its product later this month.

Billions of dollars are being poured into AI-powered robotics, but most of the excitement is geared toward next-generation robots that promise to be capable of many different tasks—like the humanoid robot that has helped Figure raise $675 million from investors, including Microsoft and OpenAI, and reach a $2.6 billion valuation in February. Against this backdrop, using AI to train a better box-stacking robot might feel pretty basic. 

Indeed, Jacobi’s seed funding round is trivial in comparison: $5 million led by Moxxie Ventures. But amid hype around promised robotics breakthroughs that could take years to materialize, palletizing might be the warehouse problem AI is best poised to solve in the short term. 

“We have a very pragmatic approach,” says Max Cao, Jacobi’s co-founder and CEO. “These tasks are within reach, and we can get a lot of adoption within a short time frame, versus some of the moonshots out there.”

Jacobi’s software product includes a virtual studio where customers can build replicas of their setups, capturing factors like which robot models they have, what types of boxes will come off the conveyor belt, and which direction the labels should face. A warehouse moving sporting goods, say, might use the program to figure out the best way to stack a mixed pallet of tennis balls, rackets, and apparel. Then Jacobi’s algorithms will automatically plan the many movements the robotic arm should take to stack the pallet, and the instructions will be transmitted to the robot.

The approach merges the benefits of fast computing provided by AI with the accuracy of more traditional robotics techniques, says Dmitry Berenson, a professor of robotics at the University of Michigan, who is not involved with the company.

“They’re doing something very reasonable here,” he says. A lot of modern robotics research is betting big on AI, hoping that deep learning can augment or replace more manual training by having the robot learn from past examples of a given motion or task. But by making sure the predictions generated by deep learning are checked against the results of more traditional methods, Jacobi is developing planning algorithms that will likely be less prone to error, Berenson says.

The planning speed that could result “is pushing this into a new category,” he adds. “You won’t even notice the time it takes to compute a motion. That’s really important in the industrial setting, where every pause means delays.”
