Apple, Microsoft Shrink AI Models to Improve Them

By: Shubham Agarwal

20 June 2024 at 18:00

Tech companies have been caught up in a race to build the biggest large language models (LLMs). In April, for example, Meta announced the 400-billion-parameter Llama 3, which contains twice the number of parameters—or variables that determine how the model responds to queries—than OpenAI’s original ChatGPT model from 2022. Although not confirmed, GPT-4 is estimated to have about 1.8 trillion parameters.

In the last few months, however, some of the largest tech companies, including Apple and Microsoft, have introduced small language models (SLMs). These models are a fraction of the size of their LLM counterparts and yet, on many benchmarks, can match or even outperform them in text generation.

On 10 June, at Apple’s Worldwide Developers Conference, the company announced its “Apple Intelligence” models, which have around 3 billion parameters. And in late April, Microsoft released its Phi-3 family of SLMs, featuring models housing between 3.8 billion and 14 billion parameters.

OpenAI’s CEO Sam Altman believes we’re at the end of the era of giant models.

In a series of tests, the smallest of Microsoft’s models, Phi-3-mini, rivalled OpenAI’s GPT-3.5 (175 billion parameters), which powers the free version of ChatGPT, and outperformed Google’s Gemma (7 billion parameters). The tests evaluated how well a model understands language by prompting it with questions about mathematics, philosophy, law, and more. What’s more interesting, Microsoft’s Phi-3-small, with 7 billion parameters, fared remarkably better than GPT-3.5 in many of these benchmarks.

Aaron Mueller, who researches language models at Northeastern University in Boston, isn’t surprised SLMs can go toe-to-toe with LLMs in select functions. He says that’s because scaling the number of parameters isn’t the only way to improve a model’s performance: Training it on higher-quality data can yield similar results too.

Microsoft’s Phi models were trained on fine-tuned “textbook-quality” data, says Mueller, which have a more consistent style that’s easier to learn from than the highly diverse text from across the Internet that LLMs typically rely on. Similarly, Apple trained its SLMs exclusively on richer and more complex datasets.

The rise of SLMs comes at a time when the performance gap between LLMs is quickly narrowing and tech companies look to deviate from standard scaling laws and explore other avenues for performance upgrades. At an event in April, OpenAI’s CEO Sam Altman said he believes we’re at the end of the era of giant models. “We’ll make them better in other ways.”

Because SLMs don’t consume nearly as much energy as LLMs, they can also run locally on devices like smartphones and laptops (instead of in the cloud) to preserve data privacy and personalize them to each person. In March, Google rolled out Gemini Nano to the company’s Pixel line of smartphones. The SLM can summarize audio recordings and produce smart replies to conversations without an Internet connection. Apple is expected to follow suit later this year.

More importantly, SLMs can democratize access to language models, says Mueller. So far, AI development has been concentrated into the hands of a couple of large companies that can afford to deploy high-end infrastructure, while other, smaller operations and labs have been forced to license them for hefty fees.

Since SLMs can be easily trained on more affordable hardware, says Mueller, they’re more accessible to those with modest resources and yet still capable enough for specific applications.

In addition, while researchers agree there’s still a lot of work ahead to overcome hallucinations, carefully curated SLMs bring them a step closer toward building responsible AI that is also interpretable, which would potentially allow researchers to debug specific LLM issues and fix them at the source.

For researchers like Alex Warstadt, a computer science researcher at ETH Zurich, SLMs could also offer new, fascinating insights into a longstanding scientific question: How children acquire their first language. Warstadt, alongside a group of researchers including Northeastern’s Mueller, organizes BabyLM, a challenge in which participants optimize language-model training on small data.

Not only could SLMs potentially unlock new secrets of human cognition, but they also help improve generative AI. By the time children turn 13, they’re exposed to about 100 million words and are better than chatbots at language, with access to only 0.01 percent of the data. While no one knows what makes humans so much more efficient, says Warstadt, “reverse engineering efficient humanlike learning at small scales could lead to huge improvements when scaled up to LLM scales.”

Normal view

How to stop LinkedIn from training AI on your data

Apple, Microsoft Shrink AI Models to Improve Them