Finally, in this post, we put our money where our mouth is. We dragged a 2016 enterprise relic out of the closet — NAY, out of the grave, a single Intel Xeon running on agonizingly slow DDR3 RAM with absolutely no GPU to speak of — and forced it to run a cutting-edge, 26-billion-parameter Mixture-of-Experts architecture at reading speed. We did without throwing exotic hardware at the problem. Instead we treated the deployment pipeline as a serious thing, and mapped the architecture directly to physical hardware, tuning memory allocation, and unlocking the absolute limits of CPU cache optimization.

Very cool post. I'm may go necro up some old servers and mess with brining them back up to speed now.
I wonder if, by pooling resources, something like this could start within the old IT-guard communities, spread, and begin to tip the scales between homespun infrastructure and membership-based services.
How does that then affect the open model ecosystem? From my understanding we still need the data center footprint at that point, but the last mile ( you typing something in ) could* be converted in maybe 50% of use cases?
