LARQL — Lazarus Query Language

The model IS the database.

Decompile transformer weights into a queryable graph.
No GPU. No fine-tuning. No retraining.

Rust + Python
Apache-2.0
475 ★ on GitHub
No GPU required
safetensors / GGUF / MLX
~10KB per fact patch
800× smaller than full model
0.008ms gate KNN per layer
9 model families
20+ LQL statement types
LQL Interactive Shell
USE vindex
DESCRIBE France
WALK
INFER
INSERT edge
SHOW RELATIONS
STATS
HELP
CLEAR
LQL operations
DESCRIBE
WALK
INFER
INSERT
TRACE
PATCH
DIFF
COMPILE
crate architecture
larql-models: weight loading, quant/dequant
larql-vindex: extract, query, mutate, patch
larql-inference: forward pass, BLAS, Metal GPU
larql-lql: parser, executor, REPL
larql-server: HTTP/gRPC (used in Option B)
supported models
Gemma 2/3
Llama 2/3
Mistral 7B
Mixtral MoE
Qwen 2/2.5
Phi 2/3
DeepSeek V2/V3
GPT-2
connect local instance
Run locally:
cargo run -p larql-server -- model.vindex --port 8080
Or via Cloudflare Tunnel for remote access.
Once connected, all REPL queries route to your local instance.
vindex anatomy
🔮
gate_vectors.bin
W_gate rows as a KNN index. The "what does this model know" lookup — 0.008ms per layer query.
~3.3 GB (Gemma 4B)
📖
embeddings.bin
W_embed matrix as a token lookup. Maps tokens ↔ vectors. Powers DESCRIBE and INFER token output.
~2.5 GB (Gemma 4B)
🔗
down_meta.bin
Per-feature output metadata. Down projection becomes edge labels — this is what makes DESCRIBE work.
binary, compact
📋
index.json
Config, layer band mappings, provenance. The manifest that tells LARQL how to interpret everything else.
~few KB
patch size vs full model
1 fact patch: ~10 KB
1,000 fact patch: ~10 MB
LoRA adapter (typical): ~200 MB
Full model (Gemma 4B): ~8 GB
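
The ratios behind these sizes can be checked with a few lines of arithmetic. This is a sketch that assumes decimal units (1 GB = 10^9 bytes) and takes the approximate figures above at face value. Under that assumption, the "800× smaller than full model" figure corresponds to the 1,000 fact patch.

```python
# Approximate sizes from the list above, in bytes (decimal units assumed).
KB, MB, GB = 10**3, 10**6, 10**9

SIZES = {
    "1 fact patch": 10 * KB,
    "1,000 fact patch": 10 * MB,
    "LoRA adapter (typical)": 200 * MB,
    "Full model (Gemma 4B)": 8 * GB,
}


def shrink_factor(name):
    """How many times smaller `name` is than the full model."""
    return SIZES["Full model (Gemma 4B)"] / SIZES[name]


for name in SIZES:
    print(f"{name}: {shrink_factor(name):,.0f}x smaller than the full model")
```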
extraction levels
LEVEL      SIZE    ENABLES
BROWSE     ~3 GB   DESCRIBE, WALK, SELECT
INFERENCE  ~6 GB   + INFER
ALL        ~10 GB  + COMPILE, TRACE
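
Reading the "+" entries as cumulative (an assumption: each level adds statements to those of the level before it), the table can be sketched as a small lookup. The function names are hypothetical and are not part of LARQL.

```python
# Extraction levels in increasing order, with the statements each one adds.
LEVELS = [
    ("BROWSE", {"DESCRIBE", "WALK", "SELECT"}),
    ("INFERENCE", {"INFER"}),
    ("ALL", {"COMPILE", "TRACE"}),
]


def enabled_statements(level):
    """Return every statement available at `level`, including lower levels."""
    available = set()
    for name, added in LEVELS:
        available |= added
        if name == level:
            return available
    raise ValueError(f"unknown extraction level: {level!r}")


def minimum_level(statement):
    """Return the smallest extraction level that enables `statement`."""
    for name, added in LEVELS:
        if statement in added:
            return name
    raise ValueError(f"statement not covered by the table: {statement!r}")


print(minimum_level("TRACE"))                   # ALL
print(sorted(enabled_statements("INFERENCE")))  # ['DESCRIBE', 'INFER', 'SELECT', 'WALK']
```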
residual stream trace
Answer trajectory — layer by layer
TRACE "The capital of France is" FOR "Paris";
LAYER RANK PROB ATTN FFN WHO
how it works — components

Query parser (larql-lql)

LQL is a SQL-like grammar (USE, SELECT, TRACE) that is tokenized and parsed into an executable AST. The theory: treating a transformer as a queryable store means the surface language must map cleanly onto tensor operations, so the parser stays declarative and side-effect free.
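
As an illustration of the tokenize-then-parse step, here is a minimal Python sketch for a single statement shape, TRACE "<prompt>" FOR "<target>";. It is not the larql-lql parser (which is written in Rust): the token pattern, the Trace node, and the function names are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical token pattern: double-quoted strings, bare words, and ';'.
TOKEN_RE = re.compile(r'"([^"]*)"|([A-Za-z_][A-Za-z0-9_]*)|(;)')


@dataclass(frozen=True)
class Trace:
    """Illustrative AST node for: TRACE "<prompt>" FOR "<target>";"""
    prompt: str
    target: str


def tokenize(text):
    """Return a list of (kind, value) pairs; kinds are 'str', 'word', 'semi'."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        if match.group(1) is not None:
            tokens.append(("str", match.group(1)))
        elif match.group(2) is not None:
            tokens.append(("word", match.group(2).upper()))  # keywords are case-insensitive here
        else:
            tokens.append(("semi", ";"))
        pos = match.end()
    return tokens


def parse_trace(text):
    """Parse one TRACE statement into a Trace node; no side effects."""
    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens]
    if (kinds != ["word", "str", "word", "str", "semi"]
            or tokens[0][1] != "TRACE" or tokens[2][1] != "FOR"):
        raise SyntaxError('expected: TRACE "<prompt>" FOR "<target>";')
    return Trace(prompt=tokens[1][1], target=tokens[3][1])


print(parse_trace('TRACE "The capital of France is" FOR "Paris";'))
```

The parser only builds a value describing the request. Executing it against tensors would be a separate step, which is one way to keep parsing declarative.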

Vindex (model-as-database index)

Weights are indexed as rows and columns that you can address directly. Instead of building a separate DB, the model's parameter matrices are the tables: embeddings and projections become addressable vectors, so a query resolves to a slice of the network.
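
To make "a query resolves to a slice of the network" concrete, the sketch below treats each row of a toy gate matrix as one addressable feature and finds the rows that best match a query vector. It assumes brute-force dot-product scoring in pure Python. How LARQL actually builds and searches gate_vectors.bin is not described here, and the matrix values are made up.

```python
import heapq


def top_k_features(gate_rows, query, k=2):
    """Return (feature_index, score) for the k rows with the largest dot product."""
    scored = (
        (sum(w * x for w, x in zip(row, query)), index)
        for index, row in enumerate(gate_rows)
    )
    best = heapq.nlargest(k, scored)
    return [(index, score) for score, index in best]


# Toy "gate matrix": 4 features, hidden size 3.
gate_rows = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
query = [2.0, 1.0, 0.0]

print(top_k_features(gate_rows, query))  # [(0, 2.0), (2, 1.5)]
```

The returned indices are row numbers in the weight matrix, so the lookup result points back into the model's own parameters rather than into a copy of them.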

Residual-stream trace (logit lens)

The trace table projects each layer's hidden state to vocabulary logits, revealing the layer at which a prediction crystallizes. Theory: the residual stream is a running sum of layer contributions, so reading it layer by layer exposes the "phase transition" where the target token's rank drops to 1.
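
The running-sum reading can be sketched on toy numbers. The example below accumulates per-layer contributions into a residual vector, projects it through a small unembedding matrix after each layer, and reports the target token's rank. It is a simplification under stated assumptions: the residual starts at zero rather than at a token embedding, the final normalization used in practice is omitted, and all values are invented.

```python
def logit_lens_ranks(layer_updates, unembed, target):
    """Return the target token's rank (1 = top) after each layer's contribution."""
    residual = [0.0] * len(layer_updates[0])
    ranks = []
    for update in layer_updates:
        # The residual stream is the running sum of layer contributions.
        residual = [r + u for r, u in zip(residual, update)]
        # Project the intermediate state to vocabulary logits.
        logits = [sum(w * r for w, r in zip(row, residual)) for row in unembed]
        # Rank = 1 + number of tokens scoring strictly higher than the target.
        ranks.append(1 + sum(1 for value in logits if value > logits[target]))
    return ranks


# Toy setup: 3 tokens, hidden size 2. Token 1 is the target.
unembed = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, -1.0],
]
layer_updates = [
    [1.0, 0.0],  # early layer favors token 0
    [0.0, 0.5],  # small push toward the target
    [0.0, 2.0],  # late layer makes the target the top token
]

print(logit_lens_ranks(layer_updates, unembed, target=1))  # [2, 2, 1]
```

In this toy run the rank stays at 2 for two layers and reaches 1 at the last one, which is the kind of transition the trace table is meant to surface.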

REPL + demo/live modes

The browser REPL either returns pre-baked responses (Option A) or proxies to a local larql-server (Option B). This site is static, so live inference needs your own endpoint, the same isolation model used for other server-backed demos here.
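
The two modes can be sketched as one routing function. This is a hypothetical illustration, not the site's code: the canned strings are placeholders, and the live backend is passed in as a callable so the sketch makes no claim about larql-server's actual HTTP or gRPC interface.

```python
# Placeholder pre-baked responses for demo mode (Option A).
CANNED = {
    "HELP": "demo: list of available statements",
    "STATS": "demo: canned statistics",
}


def run_query(query, backend=None):
    """Route a REPL query: to `backend` if connected, else to canned responses."""
    if backend is not None:
        # Option B: forward the raw query to the user's own endpoint.
        return backend(query)
    # Option A: normalize the statement and look up a pre-baked answer.
    key = query.strip().rstrip(";").strip().upper()
    return CANNED.get(key, "demo mode: no canned response for this query")


def fake_backend(query):
    """Stand-in for a local server connection, for illustration only."""
    return f"live: {query}"


print(run_query("stats;"))                # demo: canned statistics
print(run_query("STATS;", fake_backend))  # live: STATS;
```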