Engineer ran Gemma 4 31B and Qwen 36B on an M5 Max MacBook during a 10-hour offline flight, built a DuckDB billing tool, and documented real power and context limits.
Key Takeaways
The M5 Max at a sustained 70-80W burns roughly 1% of battery per minute; using an iPhone cable instead of the MacBook's own cable gave up 34W of the 70W BA seat-power cap to avoidable throttling.
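A rough check on that burn rate, where P_net is the net wattage drawn from the battery and E the pack capacity in Wh (the post does not state capacity; 72 Wh and 100 Wh are the 14-inch and 16-inch MacBook Pro classes):

\[
\text{drain rate} \approx \frac{P_{\text{net}}}{0.6 \cdot E}\ \%/\text{min}
\quad\Rightarrow\quad
1\,\%/\text{min} \approx \frac{43\ \text{W}}{0.6 \times 72\ \text{Wh}} \approx \frac{60\ \text{W}}{0.6 \times 100\ \text{Wh}}
\]

An implied net drain of 43-60 W is consistent with 70-80 W of sustained draw only partially offset by a throttled seat supply.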
Gemma 4 31B and Qwen 36B via LM Studio matched frontier models for tight-scope work: refactors, CLI scaffolding, and docs across roughly 4M tokens processed.
Context throughput and latency degrade past 100k tokens; agentic loops required manual interrupts, and it was unclear whether the fault lay with the opencode orchestration layer or the model itself.
A DuckDB-backed billing analytics tool for loveholidays' cloud spend, built entirely offline, surfaced cross-service cost correlations that standard dashboards missed.
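The tool's code isn't published in the post; the sketch below shows the kind of cross-service correlation query DuckDB handles easily offline, against an entirely hypothetical billing-export schema and service names (none of these identifiers are from the post).

```python
import duckdb

con = duckdb.connect()  # in-memory database; works fully offline

# Hypothetical billing export: one CSV of daily cost per service.
# Assumed columns: usage_date (DATE), service (TEXT), cost_usd (DOUBLE).
con.execute("""
    CREATE TABLE costs AS
    SELECT * FROM read_csv_auto('billing_export.csv')
""")

# Pivot to one row per day with a column per service of interest,
# then compute the pairwise correlation of daily spend.
spend_correlation = con.execute("""
    WITH daily AS (
        SELECT
            usage_date,
            SUM(cost_usd) FILTER (WHERE service = 'bigquery') AS bigquery_cost,
            SUM(cost_usd) FILTER (WHERE service = 'gke')      AS gke_cost
        FROM costs
        GROUP BY usage_date
    )
    SELECT corr(bigquery_cost, gke_cost) FROM daily
""").fetchone()[0]

print(f"BigQuery vs GKE daily spend correlation: {spend_correlation:.2f}")
```

A single local DuckDB process over exported CSVs is enough for this class of analysis, which is what makes it workable without connectivity.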
Local inference enforces discipline around prompt size, tool-call overhead, and context compaction – habits the author argues transfer directly to cheaper cloud usage.
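The post names these habits without showing an implementation; below is a minimal sketch of what budget-enforced context compaction can look like, with a hypothetical message format, a crude characters-per-token estimate, and a drop-oldest policy rather than summarisation.

```python
# Hypothetical message history: a list of {"role": ..., "content": ...} dicts.
# Token counts use a crude ~4-characters-per-token estimate; a real harness
# would use the model's tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def compact_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt plus the most recent turns that fit the budget;
    older turns are simply dropped rather than summarised."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for msg in reversed(rest):                # walk newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))      # restore chronological order

# Trim well below the ~100k-token region where the post saw degradation.
history = [
    {"role": "system", "content": "You are a refactoring assistant."},
    {"role": "user", "content": "Here is the whole repo: " + "x" * 60_000},
    {"role": "user", "content": "Now just rename the billing CLI flag."},
]
print([m["content"][:30] for m in compact_history(history, budget=8_000)])
```

Real harnesses usually summarise the dropped turns instead of discarding them; the point is that the budget is checked before every model call.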
Hacker News Comment Review
A commenter running qwen3 and gemma4 through the pi, Claude Code, and Codex harnesses on a 64GB M3 Max reports identical loop failures and questions whether the local-LLM productivity narrative holds up at all.
The model versioning in the post is loose: “Qwen 4.6 36B” is likely Qwen3.6-35B-A3B, which matters for anyone trying to reproduce benchmarks or compare results.
The physical constraints of economy seating were called out as the binding limit for most people, ahead of power or connectivity: a 14” laptop in a window seat is the real bottleneck.