.bh envelope

Minimal usable prototype of the Hierarchical Bits thesis in a third domain: ML model weights. The same envelope as
bhmem and bhtrace, a different domain.
A checkpoint is a hierarchy: model → layers → tensors, and for MoE
layer → experts → tensors. The heavy residual is the raw weight bytes; the
structure (which tensor, which shape, which layer, which expert) is tiny.
bhckpt writes one envelope where the structure is the index (read
instantly) and the weight blocks are read only on demand:

    from bhckpt import Tensor, CheckpointStore, CheckpointReader

    # Build and save. `data` holds the tensor's weights and is defined elsewhere.
    store = CheckpointStore({"dim": 256, "moe_layers": [2, 3], "n_experts": 6})
    store.add(Tensor("model.embed_tokens.weight", [3000, 256], "fp16", data, codec="fp16"))
    store.save("model.bckpt")

    # Read by structure.
    ck = CheckpointReader("model.bckpt")
    ck.summary()                  # architecture + tensor list (names/shapes/sizes), no weights
    ck.tensor("lm_head.weight")   # one tensor: its block
    ck.layer(0)                   # every tensor of one layer: those blocks
    ck.expert(2, 0)               # ONE MoE expert of layer 2 (a sub-branch): those blocks
    ck.full()                     # all weights: the baseline

Each reading returns (result, stats) with the bytes actually read (real
seeks) — gain measured, not claimed.
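
One way to measure bytes actually read is to route every block read through a small counting wrapper. The sketch below is illustrative only: `CountingReader`, `read_at`, and the `stats` keys are hypothetical names, not bhckpt's API, and the 1000-byte in-memory buffer stands in for a checkpoint file.

```python
import io


class CountingReader:
    """Wrap a binary file object and count the bytes really read from it.

    Hypothetical helper for illustration; not part of bhckpt.
    """

    def __init__(self, raw):
        self.raw = raw
        self.bytes_read = 0
        self.seeks = 0

    def read_at(self, offset, size):
        # Seek to the block, read it, and count what was actually returned.
        self.raw.seek(offset)
        self.seeks += 1
        chunk = self.raw.read(size)
        self.bytes_read += len(chunk)
        return chunk


# Stand-in "file" of 1000 zero bytes.
file_size = 1000
f = CountingReader(io.BytesIO(bytes(file_size)))

block = f.read_at(200, 50)  # read one 50-byte block at offset 200
stats = {
    "bytes_read": f.bytes_read,
    "seeks": f.seeks,
    "pct_of_file": 100 * f.bytes_read / file_size,
}
```

Because the counter sits at the read call itself, the reported percentage reflects real I/O rather than an estimate derived from the index.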
Realistic demo (a small transformer, 2 dense + 2 MoE layers, 57 tensors, 16.2 MB):

| reading | % of file read | vs flat (reads all) |
|---|---|---|
| summary(): architecture, no weights | 0.06% | 1,779× less |
| expert(2, 0): one MoE expert | 4.91% | 20× less |
| layer(0): one dense layer | 8.15% | 12× less |
| tensor('embed_tokens'): one tensor | 9.54% | 10× less |
| full() | 100% | 1× (baseline) |

MoE routing: loading one expert of layer 2 reads 0.80 MB vs 4.72 MB for the whole layer — ~6× less to activate a single expert. On a real multi-GB MoE, that is the difference between loading one expert and the whole mixture.
Inspect the architecture of a multi-GB checkpoint instantly (read the index, not the weights); load one expert without the rest.
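
Selecting a layer or a single expert can be sketched as a prefix match on dotted tensor names. This is a minimal sketch, not bhckpt's implementation: the `select` helper, the `model.layers.<i>.mlp.experts.<e>` naming scheme, and the byte counts are all assumptions made for illustration.

```python
def select(table, prefix):
    """Return the table entries whose dotted name sits under `prefix`.

    Hypothetical helper. The trailing dot keeps "model.layers.2"
    from also matching "model.layers.20".
    """
    prefix = prefix.rstrip(".") + "."
    return [t for t in table if t["name"].startswith(prefix)]


# Made-up index entries; names and sizes are illustrative only.
table = [
    {"name": "model.layers.2.attn.q_proj.weight", "nbytes": 100},
    {"name": "model.layers.2.mlp.experts.0.w1.weight", "nbytes": 400},
    {"name": "model.layers.2.mlp.experts.0.w2.weight", "nbytes": 400},
    {"name": "model.layers.2.mlp.experts.1.w1.weight", "nbytes": 400},
    {"name": "model.layers.20.attn.q_proj.weight", "nbytes": 100},
]

layer2 = select(table, "model.layers.2")                 # whole layer
expert0 = select(table, "model.layers.2.mlp.experts.0")  # one expert only

layer_bytes = sum(t["nbytes"] for t in layer2)
expert_bytes = sum(t["nbytes"] for t in expert0)
```

Only the matched entries need their weight blocks read, so the cost of activating one expert scales with that expert's tensors rather than with the whole layer.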
.bckpt format

    MAGIC(4)
    header_len(4) + header_json   {arch{...}, n_tensors, total_bytes}
    table_len(4)  + table_json    [{name, shape, dtype, codec, nbytes, off, size}, ...]
    weight_block_0 ... block_n    raw tensor bytes (the residual)

Position encodes the hierarchy via the dotted tensor names. The header + table are the structure index; weight blocks are read by seek only for the tensors a query asks for.
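
The layout above can be sketched with the standard `struct` and `json` modules. Several details are assumptions, since the text does not specify them: the magic value `b"BCK0"`, little-endian unsigned 32-bit length fields, `off` measured from the start of the weight area, `size` as the stored block size with `nbytes` as the decoded size, and a `"raw"` codec. The function names are hypothetical and this is not bhckpt's own code.

```python
import io
import json
import struct

MAGIC = b"BCK0"  # assumed 4-byte magic; the real value is not given


def write_ckpt(f, arch, tensors):
    """Write header, table, then weight blocks.

    tensors: list of (name, shape, dtype, raw_bytes).
    """
    table, off = [], 0
    for name, shape, dtype, raw in tensors:
        # With the assumed "raw" codec, stored size equals decoded nbytes.
        table.append({"name": name, "shape": shape, "dtype": dtype,
                      "codec": "raw", "nbytes": len(raw),
                      "off": off, "size": len(raw)})
        off += len(raw)
    header = json.dumps({"arch": arch, "n_tensors": len(tensors),
                         "total_bytes": off}).encode()
    table_json = json.dumps(table).encode()
    f.write(MAGIC)
    f.write(struct.pack("<I", len(header)) + header)
    f.write(struct.pack("<I", len(table_json)) + table_json)
    for _, _, _, raw in tensors:
        f.write(raw)


def read_index(f):
    """Read only the structure index (header + table), no weight bytes."""
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("bad magic")
    (hlen,) = struct.unpack("<I", f.read(4))
    header = json.loads(f.read(hlen))
    (tlen,) = struct.unpack("<I", f.read(4))
    table = json.loads(f.read(tlen))
    return header, table, f.tell()  # f.tell() is where weight blocks start


def read_tensor(f, table, data_start, name):
    """Seek to one tensor's block and read just those bytes."""
    entry = next(t for t in table if t["name"] == name)
    f.seek(data_start + entry["off"])
    return f.read(entry["size"])


# Round trip on an in-memory buffer.
buf = io.BytesIO()
write_ckpt(buf, {"dim": 4}, [
    ("a.weight", [2], "fp16", b"\x01\x02\x03\x04"),
    ("b.weight", [1], "fp16", b"\x05\x06"),
])
header, table, data_start = read_index(buf)
b_bytes = read_tensor(buf, table, data_start, "b.weight")
```

Reading `b.weight` touches the index and one two-byte block; the bytes of `a.weight` are never read.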
Selective per-tensor read already exists — safetensors does
header + offsets + mmap. That is the anchor (credibility, not novelty). The
new piece is the union:
layer(i) and especially expert(i, e)
(load one MoE expert without the mixture) are the natural units for partial
loading and MoE serving.

    X:/miniconda3/python.exe X:/bitH/bhckpt/demo.py                # measured demo → RESULTS_BHCKPT_DEMO.md
    X:/miniconda3/python.exe -m pytest X:/bitH/bhckpt/tests/ -q    # correctness as a gate (8/8)

A minimal, usable prototype — not a product. It does the full loop (build a
checkpoint → save → read by structure → measure) with tested correctness. The
shape is deliberately identical to bhmem and bhtrace: one envelope, now a
third domain.