I don't think it is very hard to predict the exponential rise of intelligence in large LLMs in the near future, but what about on-prem local models and their rise? Every day I see reports of people running decent models on their own M4 or dual-card Windows rigs, models up to 27B on airplanes. When does this become the dedicated fallback for API access to the big three?

I am old enough to remember search in its infancy, and I have seen a lot of people comparing this moment to when search engines (think Google/Yahoo/MSN) fought it out and Google became king of the hill. This feels different. No one wants to revisit dominance by a single player the way search played out. The foreseeable future looks more like two tiers, the paid and the local, with both accelerating quickly. Search was also a horsepower/hardware problem: whoever had the most crawlers, crawled the most frequently, and served the quickest SERP load times generally did better. AI is still a different animal. People seem to be more patient for answers this time around; it isn't retrieval, it is the thinking. My gut tells me "good enough" local models for 75% of problems are an easy call in the next 3-5 years. Not just as backups but, depending on the use case, as the first answer for procedure-based utility (i.e., tasks with less thinking involved).
At some point the big three are going to fill their moats with crocodiles. It has already started in the form of a monopoly on hardware, but I don't know what that looks like for student local models that just need to run. We will still need large-scale models (teacher/jedi) to distill from before the end user has a smaller model to work with locally (student/padawan), but the reliance on an external AI can at least be backfilled when the service is down or the client is just token-poor, and I think there is enough of a movement here to be paying attention.
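To make the backfill idea concrete, here is a minimal sketch of a client that tries a hosted API first and falls back to a local model when the call fails or the quota runs dry. The hosted endpoint URL, API-key variable, and model names are hypothetical placeholders; the local side assumes Ollama's OpenAI-compatible endpoint on its default port, but any OpenAI-style local server would work the same way.

```python
# Minimal sketch: hosted model first, local student model as the backfill.
# Assumptions (not from the post): the hosted side is any OpenAI-style
# chat-completions API; the local side is Ollama's OpenAI-compatible
# endpoint. URLs, env var, and model names are illustrative placeholders.
import os
import requests

HOSTED_URL = "https://api.example.com/v1/chat/completions"  # hypothetical hosted endpoint
LOCAL_URL = "http://localhost:11434/v1/chat/completions"    # Ollama's default OpenAI-compatible port


def chat(messages: list[dict], timeout: float = 30.0) -> str:
    """Try the hosted model first; backfill with the local student model."""
    payload = {"messages": messages}
    try:
        resp = requests.post(
            HOSTED_URL,
            json={**payload, "model": "big-three-frontier"},  # placeholder model name
            headers={"Authorization": f"Bearer {os.environ['HOSTED_API_KEY']}"},
            timeout=timeout,
        )
        # Treat quota exhaustion (HTTP 429) the same as an outage:
        # any non-2xx status or network error falls through to local.
        resp.raise_for_status()
    except (requests.RequestException, KeyError):
        resp = requests.post(
            LOCAL_URL,
            json={**payload, "model": "gemma3:27b"},  # whatever fits in local memory
            timeout=timeout,
        )
        resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat([{"role": "user", "content": "Why does a local fallback matter?"}]))
```

The design choice worth noting: treating "token poor" (rate limited) identically to "service down" keeps the routing logic dumb, so the local model stays a pure backstop rather than another thing to manage.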
Good reads:
📄 Running local models on an M4 with 24GB memory
📄 Local AI Needs to be the Norm
📄 Claude Code: connect to a local model when your quota runs out
📄 I ran Gemma 4 as a local model in Codex CLI