LARQL — The Model IS the Database

vindex anatomy

🔮

gate_vectors.bin

W_gate rows as a KNN index. The "what does this model know" lookup, at 0.008 ms per layer query.

~3.3 GB (Gemma 4B)
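To make the "W_gate rows as a KNN index" idea concrete, here is a minimal sketch of a brute-force nearest-neighbour lookup over one layer's gate rows. The toy matrix, the function name `top_k_features`, and the use of a plain dot product as the similarity score are all assumptions for illustration; the source does not say how LARQL scores or searches the rows.

```python
import numpy as np

def top_k_features(gate_rows: np.ndarray, query: np.ndarray, k: int = 3):
    """Return (feature_index, score) for the k rows with the largest dot product."""
    scores = gate_rows @ query                      # one score per feature row
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]       # unordered top-k candidates
    top = top[np.argsort(-scores[top])]             # order them best-first
    return [(int(i), float(scores[i])) for i in top]

# Toy stand-in for one layer of W_gate: 4 features, hidden size 3 (hypothetical data).
gate_rows = np.array(
    [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]],
    dtype=np.float32,
)
query = np.array([0.9, 0.8, 0.0], dtype=np.float32)

hits = top_k_features(gate_rows, query, k=2)
```

Feature 3 responds to both active dimensions of the query, so it ranks first, followed by feature 0. A real index at model scale would likely avoid scanning every row, but the exhaustive version shows what the lookup computes.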

📖

embeddings.bin

W_embed matrix as a token lookup. Maps tokens ↔ vectors. Powers DESCRIBE and INFER token output.

~2.5 GB (Gemma 4B)
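The token ↔ vector mapping can be sketched in a few lines. The four-word vocabulary, the 2-dimensional vectors, and the cosine-similarity reverse lookup are assumptions chosen to keep the example small; they are not taken from the actual embeddings.bin layout.

```python
import numpy as np

# Toy vocabulary and embedding matrix (hypothetical data, one row per token).
vocab = ["Paris", "France", "Berlin", "Germany"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}
embed = np.array(
    [[1.0, 0.0],
     [0.8, 0.6],
     [0.0, 1.0],
     [-0.6, 0.8]],
    dtype=np.float32,
)

def token_to_vector(token: str) -> np.ndarray:
    """Forward direction: a token is a row index into the embedding matrix."""
    return embed[token_to_id[token]]

def vector_to_token(vec: np.ndarray) -> str:
    """Reverse direction: pick the row with the highest cosine similarity.

    Assumes vec is non-zero.
    """
    sims = (embed @ vec) / (np.linalg.norm(embed, axis=1) * np.linalg.norm(vec))
    return vocab[int(np.argmax(sims))]

round_trip = vector_to_token(token_to_vector("Berlin"))
nearest = vector_to_token(np.array([0.9, 0.1], dtype=np.float32))
```

The forward direction is an exact row read; the reverse direction is a nearest-neighbour search, which is why an arbitrary vector can still be rendered as a token.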

🔗

down_meta.bin

Per-feature output metadata. Down projection becomes edge labels — this is what makes DESCRIBE work.

binary, compact

📋

index.json

Config, layer band mappings, provenance. The manifest that tells LARQL how to interpret everything else.

a few KB

patch size vs full model

1 fact patch	~10 KB
1,000 fact patch	~10 MB
LoRA adapter (typical)	~200 MB
Full model (Gemma 4B)	~8 GB
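The ratios implied by these sizes are easy to check. The sketch below uses only the approximate figures listed above and assumes decimal units (1 KB = 1,000 bytes), which the source does not specify.

```python
# Approximate sizes from the list above, in bytes (decimal units assumed).
KB, MB, GB = 10**3, 10**6, 10**9
sizes = {
    "1 fact patch": 10 * KB,
    "1,000 fact patch": 10 * MB,
    "LoRA adapter (typical)": 200 * MB,
    "Full model (Gemma 4B)": 8 * GB,
}

# Cost per fact in the 1,000-fact patch: the same ~10 KB as a single-fact patch.
per_fact = sizes["1,000 fact patch"] // 1000

# How many times larger the alternatives are than a 1,000-fact patch.
lora_vs_patch = sizes["LoRA adapter (typical)"] // sizes["1,000 fact patch"]
model_vs_patch = sizes["Full model (Gemma 4B)"] // sizes["1,000 fact patch"]
model_vs_fact = sizes["Full model (Gemma 4B)"] // sizes["1 fact patch"]
```

On these figures, patch size grows linearly with the number of facts, a typical LoRA adapter is about 20 times the size of a 1,000-fact patch, and the full model is about 800 times the size.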

extraction levels

LEVEL	SIZE	ENABLES
BROWSE	~3 GB	DESCRIBE, WALK, SELECT
INFERENCE	~6 GB	+ INFER
ALL	~10 GB	+ COMPILE, TRACE
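One reading of the table is that each level adds statements on top of the level before it. The sketch below encodes that cumulative interpretation; the "+" prefix suggests it, but it remains an assumption, and the helper names `enabled_statements` and `minimum_level` are hypothetical.

```python
# Levels in increasing order, with the statements each one adds (from the table above).
LEVELS = ["BROWSE", "INFERENCE", "ALL"]
ADDED = {
    "BROWSE": {"DESCRIBE", "WALK", "SELECT"},
    "INFERENCE": {"INFER"},
    "ALL": {"COMPILE", "TRACE"},
}

def enabled_statements(level: str) -> set:
    """Union of the statements added by this level and every level below it."""
    if level not in LEVELS:
        raise ValueError(f"unknown extraction level: {level}")
    enabled = set()
    for name in LEVELS[: LEVELS.index(level) + 1]:
        enabled |= ADDED[name]
    return enabled

def minimum_level(statement: str) -> str:
    """Smallest extraction level that enables the given statement."""
    for name in LEVELS:
        if statement in ADDED[name]:
            return name
    raise ValueError(f"unknown statement: {statement}")
```

Under this reading, a TRACE query needs the ALL extraction, while DESCRIBE works on the smallest BROWSE extraction.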

residual stream trace

Answer trajectory — layer by layer

TRACE "The capital of France is" FOR "Paris";

LAYER	RANK	PROB	ATTN	FFN	WHO

how it works — components

Query parser (larql-lql)

A SQL-like grammar (USE, SELECT, TRACE) tokenized and parsed into an executable AST. The theory: treating a transformer as a queryable store means the surface language must map cleanly onto tensor operations — so the parser stays declarative and side-effect free.
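As an illustration of "tokenized and parsed into an executable AST", here is a minimal sketch that handles only the TRACE statement shown earlier. The token pattern, the `Trace` node, and the case-insensitive keywords are assumptions; the real larql-lql grammar is not described in this document and may differ.

```python
import re
from dataclasses import dataclass

# One token per match: a double-quoted string, a bare word, or a semicolon.
TOKEN_RE = re.compile(r'\s*(?:"([^"]*)"|([A-Za-z_]+)|(;))')

@dataclass(frozen=True)
class Trace:
    """AST node for: TRACE "<prompt>" FOR "<target>";"""
    prompt: str
    target: str

def tokenize(source: str) -> list:
    """Split source text into (kind, value) pairs without side effects."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        string, word, _semi = match.groups()
        if string is not None:
            tokens.append(("STRING", string))
        elif word is not None:
            tokens.append(("KEYWORD", word.upper()))
        else:
            tokens.append(("SEMI", ";"))
        pos = match.end()
    return tokens

def parse_trace(source: str) -> Trace:
    """Parse a single TRACE statement into a Trace node."""
    tokens = tokenize(source)
    # Compare the statement shape, ignoring the contents of string literals.
    shape = [(kind, None if kind == "STRING" else value) for kind, value in tokens]
    expected = [("KEYWORD", "TRACE"), ("STRING", None),
                ("KEYWORD", "FOR"), ("STRING", None), ("SEMI", ";")]
    if shape != expected:
        raise SyntaxError('expected: TRACE "<prompt>" FOR "<target>";')
    return Trace(prompt=tokens[1][1], target=tokens[3][1])

ast = parse_trace('TRACE "The capital of France is" FOR "Paris";')
```

Both functions are pure: they return new values and touch no model state, which matches the declarative, side-effect-free design described above. Execution against the weights would be a separate step that consumes the `Trace` node.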

Vindex (model-as-database index)

Weights are indexed as rows/columns you can address directly. Instead of training a separate DB, the model's parameter matrices are the table — embeddings and projections become addressable vectors, so a query resolves to a slice of the network.
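"A query resolves to a slice of the network" can be pictured with a memory-mapped weight file. The sketch below assumes a hypothetical layout (raw float32 values, row-major, shaped layers × features × hidden) and a toy file written on the spot; the actual on-disk format of the vindex files is not given here.

```python
import os
import tempfile
import numpy as np

# Toy dimensions; in practice these would come from a manifest such as index.json.
LAYERS, FEATURES, HIDDEN = 2, 4, 3

def open_gate_index(path: str) -> np.memmap:
    """Map the file read-only; nothing is loaded until a slice is touched."""
    return np.memmap(path, dtype=np.float32, mode="r",
                     shape=(LAYERS, FEATURES, HIDDEN))

def gate_vector(index: np.memmap, layer: int, feature: int) -> np.ndarray:
    """Address one feature directly, like a primary-key lookup on (layer, feature)."""
    return np.asarray(index[layer, feature])

# Write a small stand-in weight file (hypothetical data and file name).
weights = np.arange(LAYERS * FEATURES * HIDDEN, dtype=np.float32)
weights = weights.reshape(LAYERS, FEATURES, HIDDEN)
path = os.path.join(tempfile.mkdtemp(), "toy_gate_vectors.bin")
weights.tofile(path)

index = open_gate_index(path)
row = gate_vector(index, layer=1, feature=2)
```

Because the file is mapped rather than read, fetching one row touches only the bytes for that row, which is one way a multi-gigabyte weight file can behave like a table with cheap point lookups.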

Residual-stream trace (logit lens)

The trace table projects each layer's hidden state to vocabulary logits, revealing where a prediction crystallizes. Theory: the residual stream is a running sum of layer contributions, so reading it layer-by-layer exposes the "phase transition" where rank collapses to 1.
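The running-sum view can be sketched directly. This simplified version adds each layer's contribution to the residual, projects the result to vocabulary logits, and records the target token's rank and probability. It omits the final normalization that logit-lens implementations usually apply before unembedding, and the toy unembedding matrix, layer outputs, and function name are assumptions for illustration.

```python
import numpy as np

def target_rank_per_layer(layer_outputs, unembed, target_id):
    """Return one (rank, probability) pair per layer for the target token."""
    residual = np.zeros(unembed.shape[1])
    rows = []
    for delta in layer_outputs:
        residual = residual + delta                  # residual stream as a running sum
        logits = unembed @ residual                  # project hidden state to the vocabulary
        probs = np.exp(logits - logits.max())        # numerically stable softmax
        probs /= probs.sum()
        rank = 1 + int(np.sum(logits > logits[target_id]))  # rank 1 = top prediction
        rows.append((rank, float(probs[target_id])))
    return rows

# Toy setup: vocabulary of 3 tokens, hidden size 3, identity unembedding (hypothetical).
unembed = np.eye(3)
layer_outputs = [
    np.array([1.0, 0.0, 0.0]),   # early layer favours token 0
    np.array([0.0, 0.5, 0.0]),   # evidence for token 1 starts to accumulate
    np.array([0.0, 2.0, 0.0]),   # a late layer pushes token 1 to the top
]

trace = target_rank_per_layer(layer_outputs, unembed, target_id=1)
ranks = [rank for rank, _ in trace]
```

In this toy run the target sits at rank 2 for the first two layers and reaches rank 1 at the third, which is the kind of transition the LAYER, RANK, and PROB columns of the trace table are meant to expose.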

REPL + demo/live modes

The browser REPL runs pre-baked responses (Option A) or proxies to a local larql-server (Option B). This site is static, so live inference needs your own endpoint — the same isolation model used for other server-backed demos here.
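The choice between the two modes amounts to a small dispatch step. In the sketch below, `send` stands in for whatever transport reaches a local larql-server; it is passed in as a callable because the server's request format is not described here. The `run_query` name and the placeholder response text are likewise hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

def run_query(
    query: str,
    baked: Dict[str, str],
    send: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (mode, response): live if a transport is configured, else demo."""
    if send is not None:
        # Option B: forward the query to the user's own endpoint.
        return ("live", send(query))
    # Option A: look up a pre-baked response, ignoring whitespace differences.
    key = " ".join(query.split())
    if key in baked:
        return ("demo", baked[key])
    return ("demo", "No pre-baked response for this query; connect a local server to run it live.")

# Hypothetical pre-baked table with a placeholder response.
BAKED = {'TRACE "The capital of France is" FOR "Paris";': "(pre-baked trace output)"}

demo = run_query('TRACE  "The capital of France is" FOR "Paris";', BAKED)
live = run_query("SELECT", BAKED, send=lambda q: f"server handled: {q}")
```

Keeping the transport injectable means the static page never needs to know an endpoint in advance, which fits the isolation model described above.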

The model IS the database.
