Engineer ran Gemma 4 31B and Qwen 36B on an M5 Max MacBook during a 10-hour offline flight, built a DuckDB billing tool, and documented real power and context limits.
Key Takeaways
The M5 Max at a sustained 70-80W burns roughly 1% of battery per minute; using an iPhone cable instead of the MacBook's own cable gave up 34W of the 70W BA seat-power cap to avoidable throttling.
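A rough check on that burn rate, where P_net is the net wattage drawn from the battery and E the pack capacity in Wh (the post does not state capacity; 72 Wh and 100 Wh are the 14-inch and 16-inch MacBook Pro classes):

\[
\text{drain rate} \approx \frac{P_{\text{net}}}{0.6 \cdot E}\ \%/\text{min}
\quad\Rightarrow\quad
1\,\%/\text{min} \approx \frac{43\ \text{W}}{0.6 \times 72\ \text{Wh}} \approx \frac{60\ \text{W}}{0.6 \times 100\ \text{Wh}}
\]

An implied net drain of 43-60 W is consistent with 70-80 W of sustained draw only partially offset by a throttled seat supply.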
Gemma 4 31B and Qwen 36B via LM Studio matched frontier models for tight-scope work: refactors, CLI scaffolding, and docs across roughly 4M tokens processed.
Context throughput and latency degrade past 100k tokens; agentic loops required manual interrupts, and it was unclear whether the fault lay with the opencode orchestration layer or the model itself.
A DuckDB-backed billing analytics tool for loveholidays' cloud spend, built entirely offline, surfaced cross-service cost correlations that standard dashboards missed.
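The tool's code isn't published in the post; the sketch below shows the kind of cross-service correlation query DuckDB handles easily offline, against an entirely hypothetical billing-export schema and service names (none of these identifiers are from the post).

```python
import duckdb

con = duckdb.connect()  # in-memory database; works fully offline

# Hypothetical billing export: one CSV of daily cost per service.
# Assumed columns: usage_date (DATE), service (TEXT), cost_usd (DOUBLE).
con.execute("""
    CREATE TABLE costs AS
    SELECT * FROM read_csv_auto('billing_export.csv')
""")

# Pivot to one row per day with a column per service of interest,
# then compute the pairwise correlation of daily spend.
spend_correlation = con.execute("""
    WITH daily AS (
        SELECT
            usage_date,
            SUM(cost_usd) FILTER (WHERE service = 'bigquery') AS bigquery_cost,
            SUM(cost_usd) FILTER (WHERE service = 'gke')      AS gke_cost
        FROM costs
        GROUP BY usage_date
    )
    SELECT corr(bigquery_cost, gke_cost) FROM daily
""").fetchone()[0]

print(f"BigQuery vs GKE daily spend correlation: {spend_correlation:.2f}")
```

A single local DuckDB process over exported CSVs is enough for this class of analysis, which is what makes it workable without connectivity.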
Local inference enforces discipline around prompt size, tool-call overhead, and context compaction – habits the author argues transfer directly to cheaper cloud usage.
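The post names these habits without showing an implementation; below is a minimal sketch of what budget-enforced context compaction can look like, with a hypothetical message format, a crude characters-per-token estimate, and a drop-oldest policy rather than summarisation.

```python
# Hypothetical message history: a list of {"role": ..., "content": ...} dicts.
# Token counts use a crude ~4-characters-per-token estimate; a real harness
# would use the model's tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def compact_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt plus the most recent turns that fit the budget;
    older turns are simply dropped rather than summarised."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for msg in reversed(rest):                # walk newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))      # restore chronological order

# Trim well below the ~100k-token region where the post saw degradation.
history = [
    {"role": "system", "content": "You are a refactoring assistant."},
    {"role": "user", "content": "Here is the whole repo: " + "x" * 60_000},
    {"role": "user", "content": "Now just rename the billing CLI flag."},
]
print([m["content"][:30] for m in compact_history(history, budget=8_000)])
```

Real harnesses usually summarise the dropped turns instead of discarding them; the point is that the budget is checked before every model call.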
Hacker News Comment Review
A commenter running qwen3 and gemma4 through the pi, Claude Code, and Codex harnesses on a 64GB M3 Max reports identical loop failures and questions whether the local-LLM productivity narrative holds up at all.
The model versioning in the post is loose: “Qwen 4.6 36B” is likely Qwen3.6-35B-A3B, which matters for anyone trying to reproduce benchmarks or compare results.
The physical constraints of economy seating were called out as the binding limit for most people, ahead of power or connectivity: a 14” laptop in a window seat is the real bottleneck.