zog
zog is a high-performance command-line utility for querying massive JSONL datasets. Designed as a “Blind Scanner,” it skips full JSON tree construction to extract data at the physical speed limit of the underlying hardware.
Engineering Highlights
- SIMD Fast Path: Leverages AVX2 and NEON instructions to scan delimiters and keys, skipping dozens of CPU branches per cycle. Zig’s first-class SIMD support automatically targets platform-specific vector units.
- Zero-Allocation Model: Operates entirely on input buffers with zero heap allocations during the hot path, maximizing cache locality.
- Async I/O: Implemented double-buffered asynchronous file I/O to saturate NVMe SSD bandwidth.
- Pre-Compiled Query Plans: Compiles user queries into an optimized execution plan at startup to eliminate runtime interpretation overhead.
Performance Benchmarks
In head-to-head comparisons on a 1 GB JSONL dataset (6.7 million lines), zog demonstrates significant performance leads over industry-standard tools:
- Key Matching: 4.03 GB/s, outperforming
ripgrep(2.30 GB/s) andjq(0.07 GB/s). - Complex Logic: 2.24 GB/s, roughly 4.1x faster than
ripgrep. - Aggregations: Runs at ~2 GB/s, providing a 40x speedup over
jqfor large-scale data analysis.