Apple’s latest research on running large language models on smartphones offers the clearest signal yet that the iPhone maker plans to deliver more AI capabilities on its own hardware rather than in the cloud, an approach that promises users both speed and privacy.

Tim Bradshaw for the Financial Times:

The paper, entitled “LLM in a Flash”, offers a “solution to a current computational bottleneck”, its researchers write.

Its approach “paves the way for effective inference of LLMs on devices with limited memory”, they said. Inference refers to how large language models, the large data repositories that power apps like ChatGPT, respond to users’ queries. Chatbots and LLMs normally run in vast data centres with much greater computing power than an iPhone.

The paper was published on December 12 but caught wider attention after Hugging Face, a popular site for AI researchers to showcase their work, highlighted it late on Wednesday. It is the second Apple paper on generative AI this month and follows earlier moves to enable image-generating models such as Stable Diffusion to run on its custom chips…

Ensuring that queries are answered on an individual’s own device without sending data to the cloud is also likely to bring privacy benefits, a key differentiator for Apple in recent years.

“Our experiment is designed to optimise inference efficiency on personal devices,” its researchers said.