hierarchical-bits

BH Applicability Scorecard — research-grounded sweep of 20 domains

Method: one web-grounded research agent per domain gathered real data volumes, the tools used today, and whether interpretations conflict; each domain was then scored 0-3 on five criteria. The composite below is computed from those scores with declared weights (transparent, reproducible in scorecard.py). Scores are research-informed estimates, not market data.
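As a rough illustration of the weighted composite described in the method, here is a minimal sketch. The criterion names, the weights, and the normalization to a 0-100 scale are all assumptions made for this example; the actual criteria and weights are the ones declared in scorecard.py, which this sketch does not reproduce.

```python
from typing import Mapping

MAX_SCORE = 3  # each criterion is scored 0-3, per the method above

# Hypothetical criterion names and weights, for illustration only.
# The real ones are declared in scorecard.py.
WEIGHTS = {
    "substrate": 30,
    "layers": 25,
    "selective_read": 20,
    "rival_layers": 15,
    "scale": 10,
}


def composite(scores: Mapping[str, int], weights: Mapping[str, int] = WEIGHTS) -> int:
    """Weighted mean of 0-3 scores, rescaled to 0-100 (assumed scale)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    for name, value in scores.items():
        if not 0 <= value <= MAX_SCORE:
            raise ValueError(f"{name}: score {value} is outside 0-{MAX_SCORE}")
    weighted = sum(weights[name] * scores[name] for name in weights)
    best_possible = MAX_SCORE * sum(weights.values())
    return round(100 * weighted / best_possible)


# A made-up domain, not a row from the table.
example = {"substrate": 3, "layers": 3, "selective_read": 2, "rival_layers": 1, "scale": 1}
print(composite(example))  # 230 / 300 of the maximum -> 77
```

Keeping the weights in one declared mapping is what makes a composite of this kind auditable: changing a weight changes every score in a way anyone can recompute.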

[Figure: map]

Read it along two axes: the composite says how BH-shaped a domain is (substrate + many layers + selective read). The verdict says whether there is novel work to do, because being BH-shaped is not enough if the store-once + selective-read pattern is already mature SOTA.

#	domain	composite	verdict	scale	why
1	Knowledge graphs (multi-ontology)	100	ANCHOR	16B triples	named graphs + RDF-star already SOTA
2	Medical imaging (multi-reader)	95	ANCHOR	WSI 1-100 GB	DICOM SEG references source — mature
3	Earth obs / satellite	95	ANCHOR	10+ PB	COG + STAC already do it
4	Legal eDiscovery	95	ANCHOR	TB / matter	Relativity store-once + coding fields
5	Video archives / MAM	95	ANCHOR	PB masters	MAM timecoded tracks — mature
6	Dataset versioning / data lakes	95	ANCHOR	PB	lakeFS/DVC zero-copy IS the win
7	Autonomous driving datasets	95	DELEGATE	Waymo 2 TB	sensor signal -> codecs; sidecars mature
8	Data labeling / annotation ★	93	ANCHOR	COCO 20 GB	CVAT/Label Studio store-once — mature
9	Genomics (variant calling)	93	ANCHOR	ref 3 Gbp	CRAM + tabix selective — mature
10	Geospatial map tiles	90	ANCHOR	PB	COG/PMTiles/vector tiles — mature
11	CAD / BIM	90	BUILD	tens of GB	federation duplicates; rival overlays UNSERVED
12	Model checkpoints / MoE ★	85	ANCHOR	base GB-TB	S-LoRA / Punica already SOTA
13	Agent memory ★	75	ANCHOR	KB-MB text	substrate LIGHT; Mem0/Zep mature
14	Scientific sim / HPC ensembles	75	DELEGATE	CMIP6 30 PB	per-member dense -> Zarr/zfp
15	Time-series / IoT	68	DELEGATE	1-50 TB/day	Gorilla codec owns the signal
16	RLHF / preference data	62	NO	MB-GB text	substrate = label order; payload IS the point
17	Distributed tracing ★	58	ANCHOR	TB/day	many distinct traces; Tempo mature
18	3D scenes / glTF (LOD)	45	DELEGATE	MB-TB	geometry -> Draco/Nanite; LOD = dup copies
19	Vector DBs / embeddings *	30	DELEGATE	TB	control: vectors ARE payload; PQ/HNSW own it
20	Image / audio codecs *	25	DELEGATE	exabytes	control: signal IS payload; codecs irreducible

★ = prototype already built in this repo (bhmem / bhtrace / bhckpt / bhanno). * = control domain, where the payload is the signal itself.

The honest finding

Eleven of the 20 domains score HIGH (90-100), and twelve are ANCHOR. The BH shape is everywhere, but store-the-substrate-once + selective-read is already mature production SOTA almost everywhere it matters: DICOM (medical), COG+STAC (satellite/GIS), lakeFS/DVC (data lakes), CRAM+tabix (genomics), MAM (video), Relativity (legal), named graphs (KGs), S-LoRA (checkpoints). In those domains BH earns credibility by analogy; it is not a novel build.
Among the 20 domains surveyed, CAD/BIM (90) was the only one classified BUILD. The research found no mainstream tool that stores ONE canonical building substrate once with many additive AND rival discipline/version overlays co-registered as first-class layers with selective branch/region reads; today, federation duplicates the model and clashes are handled ad hoc. That union is the gap. (A claim about this sample, not about every possible domain.)
The recurring under-served slice is the rival layer, not the shared one. Across annotation, MAM, eDiscovery, genomics and KGs, existing tools store the substrate once but treat conflicting interpretations as noise to adjudicate into one ground truth — not as first-class, queryable, co-registered rival layers. That is precisely what bhanno models. So the sweep suggests BH’s principal still-under-explored contribution is treating rival interpretations as first-class entities — narrower and sharper than ‘a universal format’. Stated as the differential observed so far, not a final reduction of what BH can be.
Our four built prototypes all land ANCHOR. They proved the generalization (one envelope, many domains) honestly — but the sweep shows their economic win is largely already-solved. That is the method working: it refuses to let us overclaim.
Controls behaved. Image/audio codecs (25) and vector DBs (30) sit at the bottom as DELEGATE — the score discriminates dense-signal from structure-dominant.

Recommendation

Two moves, in order. First, formalize the principle (see BH_PRINCIPLE.md): the sweep shifts BH from a file format to a representation model — multiple concurrent, possibly contradictory interpretations sharing one immutable substrate and remaining queryable without forced adjudication. That definition is what separates BH from the ANCHOR systems; it is the real contribution this sweep surfaced. Then, if a measured test is wanted, the natural candidate in this sample is CAD/BIM — the only domain here classified BUILD — a .bh where the building object-graph is the substrate, with rival discipline/version overlays + clash annotations co-registered, selective branch reads, and the deferred-adjudication face.
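To make the representation model concrete, here is a minimal in-memory sketch: one immutable substrate, rival overlays registered against its elements, selective reads, and a query that surfaces disagreements without resolving them. Every name here (`Claim`, `Model`, `read`, `rivals`, the element ids and layer names) is hypothetical and chosen for illustration; this is not the .bh format, nor the API of bhanno or any other prototype in this repo.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    layer: str    # which overlay (discipline, version, reader) made the claim
    element: str  # id of the substrate element the claim is registered to
    value: str    # the interpretation itself


@dataclass
class Model:
    substrate: frozenset[str]  # element ids: stored once, never mutated
    claims: list[Claim] = field(default_factory=list)

    def add(self, claim: Claim) -> None:
        # Co-registration: every claim must point at an existing element.
        if claim.element not in self.substrate:
            raise KeyError(f"unknown substrate element: {claim.element}")
        self.claims.append(claim)

    def read(self, element: str, layers: set[str] | None = None) -> list[Claim]:
        # Selective read: one element, optionally restricted to some layers.
        return [
            c for c in self.claims
            if c.element == element and (layers is None or c.layer in layers)
        ]

    def rivals(self) -> dict[str, list[Claim]]:
        # Elements whose layers disagree. Nothing is adjudicated or dropped.
        by_element: dict[str, list[Claim]] = {}
        for c in self.claims:
            by_element.setdefault(c.element, []).append(c)
        return {
            element: claims
            for element, claims in by_element.items()
            if len({c.value for c in claims}) > 1
        }


model = Model(substrate=frozenset({"wall-12", "duct-7"}))
model.add(Claim("structural", "wall-12", "load-bearing"))
model.add(Claim("architectural", "wall-12", "removable partition"))
model.add(Claim("mep", "duct-7", "supply air"))

print(sorted(model.rivals()))  # ['wall-12']
print([c.value for c in model.read("wall-12", layers={"structural"})])  # ['load-bearing']
```

The design choice worth noting is that `rivals()` returns every conflicting claim side by side rather than picking a winner, which is the deferred-adjudication behavior the principle calls for.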
