Ollama, a popular way to run local AI models, seems to be one of the shadiest (Zetaphor, Sleeping Robots). I’ve recently switched to LM Studio and use it with Gemma4, highly recommended.
- Ollama built its reputation as the easiest way to run local LLMs, but that ease rested almost entirely on llama.cpp, an engine it failed to credit, in violation of the MIT license’s attribution requirement, for more than 400 days.
- After years of downplaying that dependency, Ollama forked away from llama.cpp to build its own backend, which promptly reintroduced bugs the upstream project had already fixed, while benchmarks show throughput penalties of 30 to 70 percent compared to running llama.cpp directly.
- The project has since added a closed-source GUI app, a cloud-hosted model tier with murky third-party data handling, and a proprietary model storage format that makes it deliberately awkward to migrate to other tools.
- The classic VC wrapper playbook: borrow credibility from open source, minimize attribution to look self-sufficient to investors, then build lock-in before pivoting to the monetizable cloud product.
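On the storage-format point, migration is awkward but not impossible: Ollama stores model weights as content-addressed blobs (files named `sha256-<hex>` under `~/.ollama/models/blobs`), and the blob holding the weights is typically a plain GGUF file that llama.cpp or LM Studio can load directly. A minimal sketch of recovering it, assuming the manifest layout Ollama uses (a JSON manifest whose `layers` entries carry a `mediaType` like `application/vnd.ollama.image.model`); the exact paths, media-type string, and the `llama3`/`latest` model name below are assumptions that may vary by version:

```python
import json
from pathlib import Path

# MediaType that tags the model-weights layer in an Ollama manifest
# (an assumption based on the observed registry layout; may vary by version).
MODEL_MEDIA_TYPE = "application/vnd.ollama.image.model"

def model_blob_path(manifest: dict, blobs_dir: Path) -> Path:
    """Map a parsed Ollama manifest to its on-disk weights blob.

    Blobs are stored content-addressed as 'sha256-<hex>' files, so we
    find the layer tagged as model weights and rewrite its
    'sha256:<hex>' digest into the corresponding blob filename.
    """
    for layer in manifest.get("layers", []):
        if layer.get("mediaType") == MODEL_MEDIA_TYPE:
            digest = layer["digest"]  # e.g. "sha256:abc123..."
            return blobs_dir / digest.replace(":", "-", 1)
    raise ValueError("no model-weights layer found in manifest")

# Hypothetical usage (default paths on Linux/macOS; model name illustrative):
#   models = Path.home() / ".ollama" / "models"
#   manifest_file = (models / "manifests" / "registry.ollama.ai"
#                    / "library" / "llama3" / "latest")
#   manifest = json.loads(manifest_file.read_text())
#   gguf = model_blob_path(manifest, models / "blobs")
#   # copy or symlink `gguf` into your llama.cpp / LM Studio model directory
```

The blob is read-only and hash-named, so copying (or symlinking, to avoid duplicating tens of gigabytes) under a human-readable `.gguf` name is usually all the "migration" that's needed.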