LAN cluster architecture write-up.
Detailed write-up coming in a later update.
A single scheduler tracks node health and routes inference requests. The idea is to keep one source of truth for "who is free", so that workers stay stateless and replaceable.
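The scheduler's role can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names Scheduler, NodeState, heartbeat, route and the 5-second timeout are all assumptions made for the example. A node counts as healthy if it has sent a heartbeat recently, and a request goes to the healthy node with the most free slots.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeState:
    free_slots: int    # how many more requests the node says it can take
    last_seen: float   # timestamp of the node's latest heartbeat


class Scheduler:
    """Hypothetical single source of truth for which node is free."""

    def __init__(self, heartbeat_timeout: float = 5.0) -> None:
        # Assumed timeout; the real value is not given in the write-up.
        self.heartbeat_timeout = heartbeat_timeout
        self.nodes: Dict[str, NodeState] = {}

    def heartbeat(self, node_id: str, free_slots: int,
                  now: Optional[float] = None) -> None:
        # Workers keep no routing state; they only report in.
        now = time.monotonic() if now is None else now
        self.nodes[node_id] = NodeState(free_slots, now)

    def route(self, now: Optional[float] = None) -> Optional[str]:
        # Pick the healthy node with the most free slots, or None if none.
        now = time.monotonic() if now is None else now
        candidates = [
            (node_id, state)
            for node_id, state in self.nodes.items()
            if now - state.last_seen <= self.heartbeat_timeout
            and state.free_slots > 0
        ]
        if not candidates:
            return None
        node_id, state = max(candidates, key=lambda item: item[1].free_slots)
        state.free_slots -= 1  # reserve the slot until the next heartbeat
        return node_id
```

Because all routing state lives in the scheduler, a worker that disappears simply stops sending heartbeats and drops out of the candidate list; nothing has to be cleaned up on the worker side.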
Each LAN machine runs a model shard or full model and reports capacity; horizontal scale comes from adding nodes, not bigger hardware.
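As an illustration of what a capacity report might contain, here is a small sketch. The field names (node_id, model, shard_index, max_concurrent, in_flight) and the JSON encoding are assumptions for the example; the write-up does not specify the actual message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CapacityReport:
    """Hypothetical payload a node sends to describe itself."""
    node_id: str
    model: str
    shard_index: Optional[int]  # None means the node holds the full model
    max_concurrent: int         # requests the node can serve at once
    in_flight: int              # requests it is serving right now

    @property
    def free_slots(self) -> int:
        return max(0, self.max_concurrent - self.in_flight)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "CapacityReport":
        return cls(**json.loads(raw))


def cluster_free_slots(reports: Iterable[CapacityReport]) -> int:
    # Simplification: treats every node as independently usable, which
    # only holds for full-model nodes. Sharded nodes would need grouping.
    return sum(report.free_slots for report in reports)
```

Under this simplification, adding a node raises the cluster total by that node's free slots, which is the sense in which capacity grows with node count rather than with hardware size.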
Low-latency bidirectional streams carry tokens as they are generated, so the dashboard shows partial output without polling.
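The push-based flow can be sketched with an in-process queue standing in for the network stream. The write-up does not name the transport, so asyncio.Queue is purely a stand-in, and the function names are hypothetical. Only the worker-to-dashboard direction is shown.

```python
import asyncio
from typing import List


async def generate_tokens(tokens: List[str], out: asyncio.Queue) -> None:
    # Stand-in for a worker: push each token as soon as it exists.
    for token in tokens:
        await asyncio.sleep(0)  # yield control, as real generation would
        await out.put(token)
    await out.put(None)  # sentinel: generation finished


async def render_partial(inbox: asyncio.Queue) -> List[str]:
    # Stand-in for the dashboard: wake up on each token and record the
    # partial text, with no polling loop.
    snapshots: List[str] = []
    text = ""
    while True:
        token = await inbox.get()
        if token is None:
            break
        text += token
        snapshots.append(text)
    return snapshots


async def demo(tokens: List[str]) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(generate_tokens(tokens, queue))
    snapshots = await render_partial(queue)
    await producer
    return snapshots
```

Running asyncio.run(demo(["Hel", "lo", " LAN"])) returns the successive partial outputs ["Hel", "Hello", "Hello LAN"]. The consumer is woken by each arriving token instead of asking repeatedly whether new output exists.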
The static frontend (hosted here) visualizes cluster state; live data requires the LAN coordinator, so the page degrades to an offline view off-network.
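The fallback logic might look like the sketch below. It is written in Python only to illustrate the control flow; the real frontend presumably runs in the browser. The status URL, the "nodes" field, and the function names are assumptions for the example.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def offline_view() -> Dict[str, Any]:
    # Placeholder state shown when the coordinator cannot be reached.
    return {"mode": "offline", "nodes": []}


def fetch_status(url: str, timeout: float = 1.0) -> Dict[str, Any]:
    # Short timeout so an off-network page falls back quickly.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def load_cluster_view(
    url: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_status,
) -> Dict[str, Any]:
    try:
        data = fetch(url)
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts;
        # ValueError covers malformed URLs and invalid JSON.
        return offline_view()
    if not isinstance(data, dict):
        return offline_view()
    return {"mode": "live", "nodes": data.get("nodes", [])}
```

Passing the fetch function as a parameter keeps the fallback path easy to exercise without a coordinator on the network.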