znation/arbvis


arbvis

Visualize arbitrary binary files in a way that makes structure visible at a glance. arbvis lays bytes out along a Hilbert curve and colors them by value range. Null regions, ASCII text, compressed payloads, and section boundaries all produce recognizable visual signatures. The default 2D mode renders a zoomable image (one pixel per byte); the 3D mode lifts the same idea into a volume you can fly through, using opacity to reveal the cube's interior.

For ML model weights, use modelweightvis, built on top of arbvis. arbvis renders .safetensors / .gguf / .bin checkpoints as raw bytes; modelweightvis adds tensor-format parsing, an architectural layout that stacks transformer blocks at each tensor's natural element shape, MoE expert-vs-expert diffs, finetune auto-detection, and dtype-aware coloring. Architecturally, modelweightvis is a thin crate that registers tensor-aware plugins and hooks against arbvis's registry — see Relationship to modelweightvis below.

Quick start

arbvis /tmp/foo.bin --out ./out
# then serve ./out over HTTP and open index.html in a browser

The output is a Leaflet.js tile pyramid you can zoom across; at maximum zoom, one pixel is one byte. Add --3d for the volume viewer:

arbvis /tmp/foo.bin --3d --out ./out3d

To publish either as a live, shareable visualization, swap --out for --space:

arbvis hf://datasets/owner/dataset --space me/dataset-vis

What you see

Byte-Hilbert layout

1 px = 1 byte along a Hilbert curve over the concatenated input bytes. The curve preserves locality: nearby bytes in the file end up nearby in the image, so contiguous regions (a string table, a compressed payload, an embedded image) appear as coherent blobs rather than scattered noise.
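To make the locality claim concrete, here is a minimal Python sketch of the classic iterative index-to-coordinate Hilbert mapping. It is an illustration only: arbvis itself relies on the fast_hilbert crate, whose orientation and internals may differ, and the names hilbert_d2xy and order_for are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order-wide square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def order_for(n_bytes):
    """Smallest order whose grid (4**order cells) can hold n_bytes pixels."""
    order = 0
    while 4 ** order < n_bytes:
        order += 1
    return order


# Walk the first 64 offsets of a hypothetical file on an 8x8 grid.
points = [hilbert_d2xy(3, d) for d in range(64)]
# Manhattan distance between each pair of consecutive offsets.
steps = [abs(x1 - x0) + abs(y1 - y0)
         for (x0, y0), (x1, y1) in zip(points, points[1:])]
```

Every entry in steps is 1: two bytes that are adjacent in the file are always adjacent in the image, which is why contiguous regions show up as blobs.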

Byte colors

Raw bytes are colored by range (based on Stairwell's approach):

Value        Color
0x00         Black
0x01–0x1F    Green (control characters)
0x20–0x7E    Blue (printable ASCII)
0x7F–0xFE    Red (high bytes)
0xFF         White
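The range table can be read as a small classifier. The sketch below returns only the color name for each byte; the exact RGB values arbvis uses are not stated here, so none are assumed, and the function names are hypothetical.

```python
from collections import Counter


def byte_color_class(b):
    """Return the color name for one byte value, following the range table."""
    if not 0 <= b <= 0xFF:
        raise ValueError("byte value out of range")
    if b == 0x00:
        return "black"
    if b <= 0x1F:
        return "green"   # control characters
    if b <= 0x7E:
        return "blue"    # printable ASCII
    if b <= 0xFE:
        return "red"     # high bytes
    return "white"       # 0xFF


def class_histogram(data):
    """Count how many bytes of `data` fall into each color class."""
    return Counter(byte_color_class(b) for b in data)


sample = b"\x00\x00\x01Hello\x7f\xff"
histogram = class_histogram(sample)
```

A histogram like this gives a rough numeric counterpart to what the image shows: a mostly blue file is text, a mostly black one is padding.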

Diff colors

In --diff mode, each pixel encodes the byte-wise difference between the two inputs. Identical bytes render as black; the larger the delta, the brighter the pixel.
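As a rough sketch of that mapping, the snippet below turns two byte strings into per-pixel brightness values. Two details are assumptions made for illustration, not documented behavior: brightness is taken as the absolute difference, and the shorter input is padded with zeros.

```python
def diff_brightness(a, b):
    """Per-byte brightness (0 = identical, 255 = maximal delta), aligned at offset 0."""
    n = max(len(a), len(b))
    out = []
    for i in range(n):
        x = a[i] if i < len(a) else 0  # assumption: pad the shorter input with zeros
        y = b[i] if i < len(b) else 0
        out.append(abs(x - y))         # assumption: brightness = absolute delta
    return out


pixels = diff_brightness(b"abc\x00", b"abd\xff")
```

Here the first two pixels are black (identical bytes), the third is almost black (a delta of 1), and the last is fully bright.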

3D mode (--3d)

--3d lays the bytes along a 3D Hilbert curve inside a cube — the natural generalization of the 2D layout — and emits a self-contained Three.js viewer bundle (index.html, volume.bin, points.bin, meta.json). It deploys as a Hugging Face Space exactly like 2D (--space).

arbvis model.safetensors --3d --out ./out3d      # local bundle (serve over HTTP)
arbvis hf://datasets/owner/dataset --3d --space me/vis-3d   # deploy a Space

Where 2D color is fully opaque, 3D uses opacity to encode density so you can see through the cube to its internal structure instead of just an opaque shell. The viewer has two modes (toggle in the panel):

  • Volume (default) — a GPU ray-march of a bounded voxel grid. Color encodes the mean byte value (the same byte-color scheme as 2D); opacity comes from an adjustable, log-style transfer function. Render and download cost depend on the grid resolution, not the input size, so a multi-GB file renders as smoothly as a small one.
    • Opacity source: Activity (default: mean byte "brightness", so null/padding regions fade to transparent and real data stands out) or Density (how many bytes fall in each voxel).
    • Opacity / Contrast / Threshold / Quality sliders tune the transfer function and ray-march step count.
  • Points — the exact byte positions as a point cloud with additive blending (per-point color + size/opacity). The cloud is a streamed level-of-detail octree (the 3D analog of the 2D tile pyramid): the viewer fetches only the nodes in view at the zoom-appropriate resolution and refines as you zoom in, converging toward one point per byte — so you can drill into individual points without downloading the whole cloud up front. A small wholesale buffer is kept as a file:// fallback.
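To give a feel for what a log-style transfer function does in Volume mode, here is a hypothetical sketch. The formula, the parameter names (gain, contrast, threshold), and their defaults are assumptions chosen for illustration; the viewer's actual shader may differ.

```python
import math


def opacity(value, gain=1.0, contrast=50.0, threshold=0.0):
    """Map a normalized voxel value in [0, 1] to an opacity in [0, 1].

    Hypothetical curve: values at or below `threshold` are fully transparent,
    the rest follow a logarithmic ramp whose steepness is set by `contrast`
    (must be > 0), scaled by `gain` and clamped to 1.
    """
    if value <= threshold:
        return 0.0
    t = (value - threshold) / (1.0 - threshold)
    curve = math.log1p(contrast * t) / math.log1p(contrast)
    return min(1.0, gain * curve)


low = opacity(0.1)
high = opacity(0.5)
```

The logarithmic ramp lifts faint voxels far more than a linear one would (low is well above 0.1), which keeps sparse regions visible without saturating dense ones.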

Controls: drag to rotate · right-drag to pan · scroll to zoom (the camera auto-frames the occupied region on load). Zooming in streams finer detail.

Grid resolution (--grid N) — the voxel cube side, a power of two in 2–512 (default 256). Higher is more detailed but a larger download (≈ N³ · 4 bytes; 128³ ≈ 8 MB, 256³ ≈ 64 MB, 512³ ≈ 512 MB).
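The download estimate follows directly from the N³ · 4 formula. A small sketch, with volume_bytes as a hypothetical helper name:

```python
def volume_bytes(n):
    """Approximate volume size in bytes for a grid of side n (N^3 * 4)."""
    is_power_of_two = n > 0 and n & (n - 1) == 0
    if not is_power_of_two or not 2 <= n <= 512:
        raise ValueError("--grid expects a power of two between 2 and 512")
    return n ** 3 * 4


MIB = 1024 * 1024
sizes = {n: volume_bytes(n) // MIB for n in (128, 256, 512)}
```

Doubling the grid side multiplies the download by eight, so each step up from the default is a significant cost.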

Point budget (--point-budget N) — max points stored in the streamed LOD octree (default 8_000_000). Files within it render every byte exactly; larger files are subsampled to fit. The viewer streams only on-screen nodes, so a bigger budget mostly costs disk and build time, not client bandwidth.
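One simple way to picture "subsampled to fit" is a uniform stride over the file offsets. This is an assumption for illustration only: the real selection happens inside the LOD octree build and may pick points differently.

```python
def sample_offsets(n_bytes, budget=8_000_000):
    """Offsets to keep so that at most `budget` points are stored.

    Hypothetical strategy: keep every byte when the file fits the budget,
    otherwise keep every stride-th byte.
    """
    if n_bytes <= budget:
        return range(n_bytes)
    stride = -(-n_bytes // budget)  # ceiling division
    return range(0, n_bytes, stride)


exact = sample_offsets(100, budget=1000)
thinned = sample_offsets(25, budget=10)
```

With a budget of 10 and 25 bytes the stride is 3, so 9 offsets are kept; a file within the budget keeps every offset.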

Like the 2D viewer, the 3D bundle loads Three.js from a CDN and fetches its data over HTTP — open index.html through a web server, not a file:// URL.

Not yet implemented

The 3D mode is being delivered incrementally. Shipped so far: octree level-of-detail streaming for the point cloud (described above), which gives exact drill-down up to the --point-budget, streamed on demand. It covers both the raw byte cloud and structured layouts: a VoxelRenderer can emit per-element points via point_weight + render_points, so modelweightvis's arch view, for example, refines toward individual weights. Still deferred:

  • Bricked sparse-voxel streaming for the volume mode (GigaVoxels-style page table + brick pool with empty-space skipping), so the ray-marched volume drills past a single bounded grid too.
  • WebGPU compute-shader rendering (software point rasterization) for far higher on-screen point counts.
  • 3D file-boundary overlays and an interactive transfer-function editor with a density histogram.

Supported input formats

  • Plain binary — anything not specifically detected is rendered byte-for-byte.
  • JSON / JSONL — structure-aware in diff mode (see below).

Anything else — .safetensors, .gguf, PyTorch .bin — is rendered as plain bytes here. For tensor-format awareness use modelweightvis.

Comparing two files: --diff

arbvis --diff a.bin b.bin --out ./out
arbvis --diff hf://owner/repo/a.json hf://owner/repo/b.json --out hf://datasets/me/vis/diff

Plain-byte diff aligns the two inputs at offset 0 and computes per-byte deltas. Whole directories work too — each file pairs up by name across the two roots.

JSON / JSONL structure-aware diff

When both --diff inputs have a .json or .jsonl extension, arbvis aligns them by structure (object keys, array elements, value boundaries) before computing byte deltas, so a single-key insertion near the top of a file doesn't smear every following byte across the canvas.
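The smearing problem is easy to reproduce. The sketch below compares an offset-0 byte diff with a diff that pairs top-level keys first. It is a simplified illustration that assumes both inputs are JSON objects and aligns only top-level keys; arbvis's actual alignment also handles array elements and value boundaries, and the function names are hypothetical.

```python
import json


def changed_bytes(a, b):
    """Count positions where two byte strings differ, aligned at offset 0."""
    n = max(len(a), len(b))
    return sum(1 for i in range(n) if a[i:i + 1] != b[i:i + 1])


def changed_by_key(old_text, new_text):
    """Pair top-level keys first, then diff each value's bytes on its own."""
    old, new = json.loads(old_text), json.loads(new_text)
    report = {}
    for key in sorted(set(old) | set(new)):
        a = json.dumps(old[key]).encode() if key in old else b""
        b = json.dumps(new[key]).encode() if key in new else b""
        report[key] = changed_bytes(a, b)
    return report


old = '{"a": 1, "b": "hello", "c": [1, 2, 3]}'
new = '{"_new": 0, "a": 1, "b": "hello", "c": [1, 2, 3]}'

naive = changed_bytes(old.encode(), new.encode())  # every shifted byte counts
aligned = changed_by_key(old, new)                 # only the inserted key counts
```

The naive count covers most of the file because everything after the insertion has shifted, while the key-aligned report attributes the change to "_new" alone.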

Output destinations

arbvis writes a self-contained web-viewer bundle. There are two ways to get one:

  • --out DIR — write the bundle to a local directory (or an hf:// URL).
  • --space NAMESPACE/REPO — render the bundle and deploy a live Hugging Face Space.

--out and --space work the same in both 2D and --3d.

Local bundle (--out DIR)

arbvis file1.bin file2.bin --out ./out
# serve it locally, e.g.:  python3 -m http.server -d ./out

In 2D this generates a Leaflet pyramid (out/tiles/{z}/{x}/{y}.{ext}, out/index.html, out/labels.json):

  • Full resolution at every zoom level (1 px = 1 byte at max zoom).
  • Vector file boundaries — sharp at every scale, not baked into pixels.
  • No size limit — works on files of any size; lower zoom levels are averaged.
  • HTML labels positioned at each region's area-weighted centroid.
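The "lower zoom levels are averaged" step can be sketched as repeated 2×2 averaging. The example works on a small grayscale grid for brevity; treating each pixel as a single intensity, the integer rounding, and the helper names are assumptions, and only the tiles/{z}/{x}/{y}.{ext} path pattern comes from the bundle layout above.

```python
def downsample(grid):
    """Halve a square grid (even side length) by averaging each 2x2 block."""
    n = len(grid)
    return [
        [
            (grid[y][x] + grid[y][x + 1] + grid[y + 1][x] + grid[y + 1][x + 1]) // 4
            for x in range(0, n, 2)
        ]
        for y in range(0, n, 2)
    ]


def build_pyramid(base):
    """Return zoom levels from full resolution down to a single pixel."""
    levels = [base]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


def tile_path(z, x, y, ext="avif"):
    """Relative path of one tile inside the bundle directory."""
    return f"tiles/{z}/{x}/{y}.{ext}"


base = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [10, 20, 0, 0],
    [30, 40, 0, 0],
]
levels = build_pyramid(base)
```

Only the last level built this way is lossy with respect to the bytes; the full-resolution level still holds one value per source byte.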

In --3d it generates the volume bundle (index.html, volume.bin, points.bin, meta.json) — see 3D mode. Either bundle loads its rendering library from a CDN and fetches its data over HTTP, so open index.html through a web server, not a file:// URL.

[Image: arbvis screenshot]

Multiple unrelated files (images, parquet, mp3, an SSH key) concatenated and rendered together — each file's content signature is immediately distinguishable.

HF Hub output

--out accepts an hf:// URL and uploads the bundle directly to the Hub:

arbvis dir/ --out hf://datasets/me/vis/dir

Note: --out hf://… uploads the bundle files to the target repo, but the Hub won't render index.html on its own. Use --space for a working URL.

Deploy a viewable Space (--space)

arbvis hf://datasets/owner/dataset --space me/dataset-vis
arbvis hf://datasets/owner/dataset --3d --space me/dataset-vis-3d

Renders the bundle and deploys a Docker Space that serves the viewer. The bundle data lives in an auto-created sibling bucket repo (me/dataset-vis_bucket); the Space itself is stateless and just proxies it.

Tile format (--tile-format, 2D only)

avif (default): tiles are ~30–50% smaller over the wire than png, and the format is supported in every modern browser. Leaf tiles are encoded near-lossless (each pixel is one source byte); pyramid tiles are lossy at quality 85.

png: universal fallback, for byte-for-byte regression checks or audiences without AVIF support.

Working with the Hub

hf:// URLs work as both input and output. Forms accepted:

hf://owner/repo[@rev][/path]                     # model (default), optional revision
hf://models/owner/repo[@rev][/path]              # explicit model
hf://datasets/owner/repo[@rev][/path]
hf://spaces/owner/repo[@rev][/path]
hf://buckets/owner/bucket[/path]                 # no revision concept

Whole-repo URLs (no /path) expand to every file in the repo. Single-file URLs fetch just that file.
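The accepted forms can be read as a small grammar. The parser below is a sketch of that grammar, not arbvis's own code: the HfUrl type and parse_hf_url name are hypothetical, and it makes two simplifying assumptions (an owner is never literally named models, datasets, spaces, or buckets, and a revision contains no slash).

```python
from typing import NamedTuple, Optional


class HfUrl(NamedTuple):
    kind: str             # "model", "dataset", "space", or "bucket"
    owner: str
    name: str
    rev: Optional[str]    # None when no @rev was given
    path: Optional[str]   # None for whole-repo URLs


KINDS = {"models": "model", "datasets": "dataset",
         "spaces": "space", "buckets": "bucket"}


def parse_hf_url(url):
    prefix = "hf://"
    if not url.startswith(prefix):
        raise ValueError("not an hf:// URL")
    parts = url[len(prefix):].split("/")
    kind = "model"  # a bare owner/repo defaults to a model repo
    if parts[0] in KINDS:
        kind = KINDS[parts[0]]
        parts = parts[1:]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected owner/repo")
    owner, name = parts[0], parts[1]
    rev = None
    if "@" in name:
        if kind == "bucket":
            raise ValueError("buckets have no revision concept")
        name, rev = name.split("@", 1)
    path = "/".join(parts[2:]) or None
    return HfUrl(kind, owner, name, rev, path)


whole_repo = parse_hf_url("hf://owner/repo")
single_file = parse_hf_url("hf://datasets/owner/repo@main/data/a.json")
```

A result with path set to None corresponds to the whole-repo case, which expands to every file; a non-empty path selects one file or subdirectory.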

Streaming (--stream)

By default, hf:// inputs are downloaded to the local HF cache (via the hf CLI) before rendering, and tile output is staged on local disk before upload. --stream flips both: input bytes are range-fetched per tile, and tiles are pushed to the Hub as they are produced. The disk-backed default is faster and more recoverable; use --stream only when input or output data won't fit on local disk.

Xet xorb visualization (--show-xet-xorbs)

arbvis hf://datasets/owner/dataset --show-xet-xorbs --out ./out

For xet-backed Hub files, colors each region by the xorb (content-addressed chunk) it was reconstructed from: hue encodes xorb ID, intensity encodes the underlying byte. Useful for seeing how a file is partitioned across the CAS.
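A hue-plus-intensity encoding of this kind can be sketched with HSV. The details here are assumptions for illustration: xorbs are identified by a small integer index rather than their real content hash, hues are spread with a golden-ratio step so neighbouring indices get distinct colors, and saturation is fixed at 1.

```python
import colorsys

GOLDEN_STEP = 0.61803398875  # assumption: spreads successive hues far apart


def xorb_pixel(xorb_index, byte):
    """RGB pixel whose hue identifies the xorb and whose brightness is the byte."""
    hue = (xorb_index * GOLDEN_STEP) % 1.0
    value = byte / 255.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return round(r * 255), round(g * 255), round(b * 255)


bright = xorb_pixel(0, 255)
dark = xorb_pixel(3, 0)
mid = xorb_pixel(5, 128)
```

One consequence of this scheme is that null bytes render black in every xorb, so the partitioning is easiest to read in regions with non-zero data.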

modelweightvis layers a dtype-aware element coloring on top of the same xorb hue for .safetensors / .gguf inputs; arbvis covers the generic byte path.

Other useful flags

  • --title TEXT — title shown in the viewer info panel (defaults to "arbvis" or "arbvis diff").
  • -l, --file-list FILE — read input paths from FILE, one per line; - reads from stdin.
  • --regen-html DIR — rebuild index.html for an existing bundle directory without re-rendering (2D or, with --3d, the volume bundle). Useful after editing the viewer template.
  • --space OWNER/REPO --out LOCAL_DIR (with no input files) — re-deploy an already-rendered bundle to a Space without re-rendering. Add --3d to re-deploy a volume bundle.

arbvis --regen-html ./out
arbvis --space me/vis --out ./out

Relationship to modelweightvis

arbvis is the byte-only foundation: Hilbert layout, byte coloring, JSON-aware diff, Hub I/O, tile pyramid, Space deploy, xet xorb path, streaming. It has no knowledge of tensors, model formats, or transformer architecture — .safetensors and .gguf get the same byte-Hilbert treatment as any other binary.

modelweightvis is a separate crate that extends arbvis through its generic plugin surface (no fork, no patch). It is one specialization of arbvis, and a new one (for any structured binary format) plugs in the same way:

  • FormatPlugin impls parse .safetensors / .gguf / pickle headers and stuff ModelInfo into each source's extension map.
  • LayoutPlugin impls add the architectural transformer layout and the MoE summary / CKA panel layouts.
  • SourceProvider impls turn an invocation (--moe, a repo-level or directory --diff) into render sources.
  • A layout-keyed LeafLoader/LeafRenderer pair draws the arch layout.
  • DiffSourceBuilder adds tensor-aware file-pair diffing.
  • PrepareSourcesExtension fetches sidecar config.

The modelweightvis binary builds an arbvis::Registry::with_defaults(), calls modelweightvis::register_all(&mut registry, &args), and hands off to arbvis::run. Same renderer, same Hub I/O, same tile pyramid, just with the tensor-aware plugins registered.

Which to use:

  • arbvis — for non-model binaries (any file format), JSON/JSONL diffs, plain-byte diffs, the xet xorb path on arbitrary content. Smaller dependency footprint (no candle-core / regex / zip / half).
  • modelweightvis — for .safetensors / .gguf / .bin model checkpoints, architectural transformer layout, --moe-summary / --moe-cka / --probe, --diff-metric, --finetune / --no-finetune, --layout. Inherits arbvis's full CLI surface (--out, --3d, --space, --stream, --show-xet-xorbs, --regen-html, etc.) — no need to use both binaries.

Building

Requires Rust (stable) and the official Hugging Face hf CLI on $PATH (install via pip install -U huggingface_hub, brew install huggingface-cli, or curl -LsSf https://hf.co/cli/install.sh | bash). arbvis shells out to hf for every Hub download / upload / sync.

cargo build --release
./target/release/arbvis <file> --out ./output

Or install into your PATH:

cargo install --path .

For modelweightvis, see the standalone modelweightvis repo — it depends on arbvis via a pinned git revision and inherits arbvis's full CLI surface.

Credits

Color scheme inspired by Stairwell's binary visualization post. Built on clap (CLI), image + png + rav1e (tile encoding), fast_hilbert (2D curve mapping; the 3D curve is a hand-rolled Skilling transform), the official Hugging Face hf CLI (Hub I/O) + xet-core-structures (per-tile xet decode), Leaflet.js (the 2D viewer), and Three.js (the 3D viewer).
