Anthropic has announced a partnership with Palantir and Amazon Web Services to bring its Claude AI models to unspecified US intelligence and defense agencies. Claude, a family of AI language models similar to those that power ChatGPT, will work within Palantir's platform using AWS hosting to process and analyze data. But some critics have called out the deal as contradictory to Anthropic's widely-publicized "AI safety" aims.
The partnership makes Claude available within Palantir's Impact Level 6 environment (IL6), a defense-accredited system that handles data critical to national security up to the "secret" classification level. This move follows a broader trend of AI companies seeking defense contracts, with Meta offering its Llama models to defense partners and OpenAI pursuing closer ties with the Defense Department.
On Wednesday, OpenAI CEO Sam Altman merely tweeted "chat.com," announcing that the company had acquired the short domain name, which now points to the company's ChatGPT AI assistant when visited in a web browser. As of Thursday morning, "chatgpt.com" still hosts the chatbot, with the new domain serving as a redirect.
The new domain name comes with an interesting backstory that reveals a multimillion-dollar transaction. HubSpot founder and CTO Dharmesh Shah purchased chat.com for $15.5 million in early 2023, The Verge reports. Shah sold the domain to OpenAI for an undisclosed amount, though he confirmed on X that he "doesn't like profiting off of people he considers friends" and that he received payment in company shares by revealing he is "now an investor in OpenAI."
As The Verge's Kylie Robison points out, Shah originally bought the domain to promote conversational interfaces. "The reason I bought chat.com is simple: I think Chat-based UX (#ChatUX) is the next big thing in software. Communicating with computers/software through a natural language interface is much more intuitive. This is made possible by Generative A.I.," Shah wrote in a LinkedIn post during his brief ownership.
Early Wednesday morning, Donald Trump became the presumptive winner of the 2024 US presidential election, setting the stage for dramatic changes to federal AI policy when he takes office early next year. Among them, Trump has stated he plans to dismantle President Biden's AI Executive Order from October 2023 immediately upon taking office.
Biden's order established wide-ranging oversight of AI development. Among its core provisions, the order established the US AI Safety Institute (AISI) and lays out requirements for companies to submit reports about AI training methodologies and security measures, including vulnerability testing data. The order also directed the Commerce Department's National Institute of Standards and Technology (NIST) to develop guidance to help companies identify and fix flaws in their AI models.
Trump supporters in the US government have criticized the measures, as TechCrunch points out. In March, Representative Nancy Mace (R-S.C.) warned that reporting requirements could discourage innovation and prevent developments like ChatGPT. And Senator Ted Cruz (R-Texas) characterized NIST's AI safety standards as an attempt to control speech through "woke" safety requirements.
On Monday, Anthropic launched the latest version of its smallest AI model, Claude 3.5 Haiku, in a way that marks a departure from typical AI model pricing trends—the new model costs four times more to run than its predecessor. The reason for the price increase is causing some pushback in the AI community: more smarts, according to Anthropic.
"During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks—at a fraction of the cost," Anthropic wrote in a post on X. "As a result, we've increased pricing for Claude 3.5 Haiku to reflect its increase in intelligence."
"It's your budget model that's competing against other budget models, why would you make it less competitive," wrote one X user. "People wanting a 'too cheap to meter' solution will now look elsewhere."
On Tuesday, Google's CEO revealed that AI systems now generate more than a quarter of new code for its products, with human programmers overseeing the computer-generated contributions. The statement, made during Google's Q3 2024 earnings call, shows how AI tools are already having a sizable impact on software development.
"We're also using AI internally to improve our coding processes, which is boosting productivity and efficiency," Pichai said during the call. "Today, more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers. This helps our engineers do more and move faster."
Google developers aren't the only programmers using AI to assist with coding tasks. It's difficult to get hard numbers, but according to Stack Overflow's 2024 Developer Survey, over 76 percent of all respondents "are using or are planning to use AI tools in their development process this year," with 62 percent actively using them. A 2023 GitHub survey found that 92 percent of US-based software developers are "already using AI coding tools both in and outside of work."
Today, Apple released iOS 18.1, iPadOS 18.1, macOS Sequoia 15.1, tvOS 18.1, visionOS 2.1, and watchOS 11.1. The iPhone, iPad, and Mac updates are focused on bringing the first AI features the company has marketed as "Apple Intelligence" to users.
Once they update, users with supported devices in supported regions can enter a waitlist to begin using the first wave of Apple Intelligence features, including writing tools, notification summaries, and the "reduce interruptions" focus mode.
In terms of features baked into specific apps, Photos has natural language search, the ability to generate memories (those short gallery sequences set to video) from a text prompt, and a tool to remove certain objects from the background in photos. Mail and Messages get summaries and smart reply (auto-generating contextual responses).
“What are the differences between trail shoes and running shoes?”
“What are the best dinosaur toys for a five year old?”
These are some of the open-ended questions customers might ask a helpful sales associate in a brick-and-mortar store. But how can customers get answers to similar questions while shopping online?
Amazon’s answer is Rufus, a shopping assistant powered by generative AI. Rufus helps Amazon customers make more informed shopping decisions by answering a wide range of questions within the Amazon app. Users can get product details, compare options, and receive product recommendations.
I lead the team of scientists and engineers that built the large language model (LLM) that powers Rufus. To build a helpful conversational shopping assistant, we used innovative techniques across multiple aspects of generative AI. We built a custom LLM specialized for shopping; employed retrieval-augmented generation with a variety of novel evidence sources; leveraged reinforcement learning to improve responses; made advances in high-performance computing to improve inference efficiency and reduce latency; and implemented a new streaming architecture to get shoppers their answers faster.
How Rufus Gets Answers
Most LLMs are first trained on a broad dataset that informs the model’s overall knowledge and capabilities, and then are customized for a particular domain. That wouldn’t work for Rufus, since our aim was to train it on shopping data from the very beginning—the entire Amazon catalog, for starters, as well as customer reviews and information from community Q&A posts. So our scientists built a custom LLM that was trained on these data sources along with public information on the web.
But to be prepared to answer the vast span of questions that could possibly be asked, Rufus must be empowered to go beyond its initial training data and bring in fresh information. For example, to answer the question, “Is this pan dishwasher-safe?” the LLM first parses the question, then it figures out which retrieval sources will help it generate the answer.
Our LLM uses retrieval-augmented generation (RAG) to pull in information from sources known to be reliable, such as the product catalog, customer reviews, and community Q&A posts; it can also call relevant Amazon Stores APIs. Our RAG system is enormously complex, both because of the variety of data sources used and the differing relevance of each one, depending on the question.
Every LLM, and every use of generative AI, is a work in progress. For Rufus to get better over time, it needs to learn which responses are helpful and which can be improved. Customers are the best source of that information. Amazon encourages customers to give Rufus feedback, letting the model know if they liked or disliked the answer, and those responses are used in a reinforcement learning process. Over time, Rufus learns from customer feedback and improves its responses.
Special Chips and Handling Techniques for Rufus
Rufus needs to be able to engage with millions of customers simultaneously without any noticeable delay. This is particularly challenging since generative AI applications are very compute-intensive, especially at Amazon’s scale.
To minimize delay in generating responses while also maximizing the number of responses that our system could handle, we turned to Amazon’s specialized AI chips, Trainium and Inferentia, which are integrated with core Amazon Web Services (AWS). We collaborated with AWS on optimizations that improve model inference efficiency, which were then made available to all AWS customers.
But standard methods of processing user requests in batches will cause latency and throughput problems because it’s difficult to predict how many tokens (in this case, units of text) an LLM will generate as it composes each response. Our scientists worked with AWS to enable Rufus to use continuous batching, a novel LLM technique that enables the model to start serving new requests as soon as the first request in the batch finishes, rather than waiting for all requests in a batch to finish. This technique improves the computational efficiency of AI chips and allows shoppers to get their answers quickly.
We want Rufus to provide the most relevant and helpful answer to any given question. Sometimes that means a long-form text answer, but sometimes it’s short-form text, or a clickable link to navigate the store. And we had to make sure the presented information follows a logical flow. If we don’t group and format things correctly, we could end up with a confusing response that’s not very helpful to the customer.
That’s why Rufus uses an advanced streaming architecture for delivering responses. Customers don’t need to wait for a long answer to be fully generated—instead, they get the first part of the answer while the rest is being generated. Rufus populates the streaming response with the right data (a process called hydration) by making queries to internal systems. In addition to generating the content for the response, it also generates formatting instructions that specify how various answer elements should be displayed.
Even though Amazon has been using AI for more than 25 years to improve the customer experience, generative AI represents something new and transformative. We’re proud of Rufus, and the new capabilities it provides to our customers.
When an international team of researchers set out to create an “AI scientist” to handle the whole scientific process, they didn’t know how far they’d get. Would the system they created really be capable of generating interesting hypotheses, running experiments, evaluating the results, and writing up papers?
What they ended up with, says researcher Cong Lu, was an AI tool that they judged equivalent to an early Ph.D. student. It had “some surprisingly creative ideas,” he says, but those good ideas were vastly outnumbered by bad ones. It struggled to write up its results coherently, and sometimes misunderstood its results: “It’s not that far from a Ph.D. student taking a wild guess at why something worked,” Lu says. And, perhaps like an early Ph.D. student who doesn’t yet understand ethics, it sometimes made things up in its papers, despite the researchers’ best efforts to keep it honest.
Lu, a postdoctoral research fellow at the University of British Columbia, collaborated on the project with several other academics, as well as with researchers from the buzzy Tokyo-based startup Sakana AI. The team recently posted a preprint about the work on the ArXiv server. And while the preprint includes a discussion of limitations and ethical considerations, it also contains some rather grandiose language, billing the AI scientist as “the beginning of a new era in scientific discovery,” and “the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models (LLMs) to perform research independently and communicate their findings.”
The AI scientist seems to capture the zeitgeist. It’s riding the wave of enthusiasm for AI for science, but some critics think that wave will toss nothing of value onto the beach.
The “AI for Science” Craze
This research is part of a broader trend of AI for science. Google DeepMind arguably started the craze back in 2020 when it unveiled AlphaFold, an AI system that amazed biologists by predicting the 3D structures of proteins with unprecedented accuracy. Since generative AI came on the scene, many more bigcorporateplayers have gotten involved. Tarek Besold, a SonyAI senior research scientist who leads the company’s AI for scientific discovery program, says that AI for science is “a goal behind which the AI community can rally in an effort to advance the underlying technology but—even more importantly—also to help humanity in addressing some of the most pressing issues of our times.”
Yet the movement has its critics. Shortly after a 2023 Google DeepMind paper came out claiming the discovery of 2.2 million new crystal structures (“equivalent to nearly 800 years’ worth of knowledge”), two materials scientists analyzed a random sampling of the proposed structures and said that they found “scant evidence for compounds that fulfill the trifecta of novelty, credibility, and utility.” In other words, AI can generate a lot of results quickly, but those results may not actually be useful.
How the AI Scientist Works
In the case of the AI scientist, Lu and his collaborators tested their system only on computer science, asking it to investigate topics relating to large language models, which power chatbots like ChatGPT and also the AI scientist itself, and the diffusion models that power image generators like DALL-E.
The AI scientist’s first step is hypothesis generation. Given the code for the model it’s investigating, it freely generates ideas for experiments it could run to improve the model’s performance, and scores each idea on interestingness, novelty, and feasibility. It can iterate at this step, generating variations on the ideas with the highest scores. Then it runs a check in Semantic Scholar to see if its proposals are too similar to existing work. It next uses a coding assistant called Aider to run its code and take notes on the results in the format of an experiment journal. It can use those results to generate ideas for follow-up experiments.
The AI scientist is an end-to-end scientific discovery tool powered by large language models. University of British Columbia
The next step is for the AI scientist to write up its results in a paper using a template based on conference guidelines. But, says Lu, the system has difficulty writing a coherent nine-page paper that explains its results—”the writing stage may be just as hard to get right as the experiment stage,” he says. So the researchers broke the process down into many steps: The AI scientist wrote one section at a time, and checked each section against the others to weed out both duplicated and contradictory information. It also goes through Semantic Scholar again to find citations and build a bibliography.
But then there’s the problem of hallucinations—the technical term for an AI making stuff up. Lu says that although they instructed the AI scientist to only use numbers from its experimental journal, “sometimes it still will disobey.” Lu says the model disobeyed less than 10 percent of the time, but “we think 10 percent is probably unacceptable.” He says they’re investigating a solution, such as instructing the system to link each number in its paper to the place it appeared in the experimental log. But the system also made less obvious errors of reasoning and comprehension, which seem harder to fix.
And in a twist that you may not have seen coming, the AI scientist even contains a peer review module to evaluate the papers it has produced. “We always knew that we wanted some kind of automated [evaluation] just so we wouldn’t have to pour over all the manuscripts for hours,” Lu says. And while he notes that “there was always the concern that we’re grading our own homework,” he says they modeled their evaluator after the reviewer guidelines for the leading AI conference NeurIPS and found it to be harsher overall than human evaluators. Theoretically, the peer review function could be used to guide the next round of experiments.
Critiques of the AI Scientist
While the researchers confined their AI scientist to machine learning experiments, Lu says the team has had a few interesting conversations with scientists in other fields. In theory, he says, the AI scientist could help in any field where experiments can be run in simulation. “Some biologists have said there’s a lot of things that they can do in silico,” he says, also mentioning quantum computing and materials science as possible fields of endeavor.
Some critics of the AI for science movement might take issue with that broad optimism. Earlier this year, Jennifer Listgarten, a professor of computational biology at UC Berkeley, published a paper in Nature Biotechnology arguing that AI is not about to produce breakthroughs in multiple scientific domains. Unlike the AI fields of natural language processing and computer vision, she wrote, most scientific fields don’t have the vast quantities of publicly available data required to train models.
Two other researchers who study the practice of science, anthropologist Lisa Messeri of Yale University and psychologist M.J. Crockett of Princeton University, published a 2024 paper in Nature that sought to puncture the hype surrounding AI for science. When asked for a comment about this AI scientist, the two reiterated their concerns over treating “AI products as autonomous researchers.” They argue that doing so risks narrowing the scope of research to questions that are suited for AI, and losing out on the diversity of perspectives that fuels real innovation. “While the productivity promised by ‘the AI Scientist’ may sound appealing to some,” they tell IEEE Spectrum, “producing papers and producing knowledge are not the same, and forgetting this distinction risks that we produce more while understanding less.”
But others see the AI scientist as a step in the right direction. SonyAI’s Besold says he believes it’s a great example of how today’s AI can support scientific research when applied to the right domain and tasks. “This may become one of a handful of early prototypes that can help people conceptualize what is possible when AI is applied to the world of scientific discovery,” he says.
What’s Next for the AI Scientist
Lu says that the team plans to keep developing the AI scientist, and he says there’s plenty of low-hanging fruit as they seek to improve its performance. As for whether such AI tools will end up playing an important role in the scientific process, “I think time will tell what these models are good for,” Lu says. It might be, he says, that such tools are useful for the early scoping stages of a research project, when an investigator is trying to get a sense of the many possible research directions—although critics add that we’ll have to wait for future studies to see if these tools are really comprehensive and unbiased enough to be helpful.
Or, Lu says, if the models can be improved to the point that they match the performance of“a solid third-year Ph.D. student,” they could be a force multiplier for anyone trying to pursue an idea (at least, as long as the idea is in an AI-suitable domain). “At that point, anyone can be a professor and carry out a research agenda,” says Lu. “That’s the exciting prospect that I’m looking forward to.”