The clearest illustration of Apple's on-device AI momentum comes not from Apple's announcements but from the constraints developers are hitting in production. One iOS developer filing against apple/coreai-models describes a deployed app where portfolio data cannot leave the device, which makes on-device inference the product rather than an optimization. The request for native Core AI support for Gemma 4 E4B is explicit: the current llama.cpp workaround via Metal works, but battery and thermal costs on long agent sessions are a real shipping problem. Apple's privacy architecture has already created a category of apps for which the question is not whether to run locally but how to do it without melting the phone.
That category will expand. WWDC 2026 added image input to Apple Foundation Models, which opens on-device captioning and alt-text generation to apps that had no path to those capabilities before. The Rollercoaster.dev mobile issue proposing on-device alt-text via Foundation Models treats it as a solved architectural question: the only constraint named is an internal app policy, not a capability gap. Practitioners already treat the Foundation Models layer as sufficient for a class of sensitive inference tasks, and the llama.cpp escape hatch that preceded Core AI maturity is becoming a temporary workaround rather than a permanent architecture.