AgenticVision

Benchmarks

Measured on Apple Silicon (M2 Pro, 16 GB) with release builds. All times are wall-clock averages over 100 iterations unless noted.
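
As a rough illustration of what "wall-clock average over 100 iterations" means, the sketch below times a closure with `std::time::Instant` and divides the total elapsed time by the iteration count. `mean_wall_clock` is a hypothetical helper written for this page, not part of the AgenticVision codebase, and the actual benchmark suite may measure differently (for example with warm-up runs or outlier rejection).

```rust
use std::time::{Duration, Instant};

/// Runs `op` `iterations` times and returns the mean wall-clock time per call.
/// Hypothetical helper for illustration; not an AgenticVision API.
fn mean_wall_clock<F: FnMut()>(iterations: u32, mut op: F) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    // Total elapsed time divided by the number of calls.
    start.elapsed() / iterations
}
```

Calling `mean_wall_clock(100, || { /* operation under test */ })` would then yield a per-call average comparable in spirit to the figures on this page.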

Test environment

| Component | Value |
| --- | --- |
| Hardware | Apple M2 Pro, 16 GB RAM |
| OS | macOS 14 |
| Rust | 1.76 (release build, LTO) |
| Embedding model | CLIP ViT-B/32 (512-dim) |
| Artifact format | .avis (64-byte header + JSON payload) |
| Image library | image crate with Lanczos3 resampling |
| ONNX runtime | ort crate (single intra-thread) |

Core operations

| Operation | Typical time | Notes |
| --- | --- | --- |
| Image capture (file to embed to store) | ~47 ms | 1024x768 PNG |
| Image capture (screenshot) | ~62 ms | Includes screen grab |
| Image capture (4K image) | ~95 ms | 3840x2160 |
| Similarity search (top-5, 100 captures) | ~1-2 ms | Cosine distance on embeddings |
| Similarity search (top-5, 1000 captures) | ~8 ms | Linear scan |
| Visual diff | <1 ms | Pixel comparison, same dimensions |
| Visual compare | ~2 ms | Embedding cosine + optional diff |
| Quality score computation | <1 ms | Resolution + metadata + embedding norm |
| OCR extraction | ~120 ms | Depends on text density |
| MCP tool round-trip | ~7 ms | stdio transport overhead |

Performance tiers

Operations fall into three latency tiers:

| Tier | Latency | Operations |
| --- | --- | --- |
| Sub-millisecond | <1 ms | Quality score, visual diff (same dimensions), single embedding cosine |
| Low millisecond | 1-10 ms | Similarity search (up to 1K captures), visual compare, MCP tool overhead |
| Capture-bound | 40-120 ms | Image capture with embedding, OCR extraction, screenshot grab |

The bottleneck for capture operations is CLIP inference (224x224 resize + ONNX forward pass). Without a model loaded (fallback mode), capture drops to ~5 ms since only thumbnail generation runs.

Artifact size

| Captures | Approximate .avis size |
| --- | --- |
| 100 | ~2 MB |
| 1,000 | ~18 MB |
| 10,000 | ~170 MB |

Sizes vary with image thumbnails and metadata. Thumbnails are JPEG-encoded at 85% quality with a maximum dimension of 512 pixels, which is the primary size driver. Embedding vectors (512 x 4 bytes = 2 KB each) are a smaller contributor.
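
The embedding arithmetic above can be checked with a few lines of Rust. This is only a back-of-the-envelope sketch: it assumes each embedding component is a 4-byte `f32` counted at its raw binary size (the size it occupies once encoded in the JSON payload may differ), and `embedding_bytes` and `embedding_share` are hypothetical helper names.

```rust
/// Size in bytes of one f32 embedding component (assumed raw binary size).
const BYTES_PER_F32: usize = 4;

/// Raw size of a single embedding vector with `dims` components.
fn embedding_bytes(dims: usize) -> usize {
    dims * BYTES_PER_F32
}

/// Fraction of the average per-capture size taken up by the raw embedding,
/// given a total artifact size in bytes and the number of captures.
fn embedding_share(total_bytes: f64, captures: usize, dims: usize) -> f64 {
    let per_capture = total_bytes / captures as f64;
    embedding_bytes(dims) as f64 / per_capture
}
```

For the 1,000-capture row (~18 MB, taking 1 MB as 10^6 bytes), `embedding_share(18_000_000.0, 1000, 512)` comes out at roughly 0.11, which is consistent with thumbnails being the dominant contributor.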

Scaling analysis

| Captures | Similarity search (top-5) | File open time | Memory (RSS) |
| --- | --- | --- | --- |
| 100 | ~1 ms | ~5 ms | ~12 MB |
| 1,000 | ~8 ms | ~40 ms | ~45 MB |
| 10,000 | ~80 ms | ~350 ms | ~380 MB |

Similarity search scales linearly with capture count because it performs a full scan over embeddings. File open time scales linearly with payload size (JSON deserialization). For stores above 10,000 captures, consider using workspaces to partition captures across multiple .avis files.
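
To make the linear scaling concrete, here is a minimal sketch of a full-scan top-k search by cosine similarity. It is not the AgenticVision implementation: `cosine_similarity` and `top_k` are hypothetical names, and the store is modelled as a plain slice of `Vec<f32>` embeddings. Every stored embedding is scored once per query, so the work grows in proportion to the capture count.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Full linear scan: scores every stored embedding against the query and
/// returns the (index, score) pairs of the `k` best matches, highest first.
fn top_k(query: &[f32], store: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, e)| (i, cosine_similarity(query, e)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Partitioning captures across several stores, as suggested above, shrinks the slice each query has to scan.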

Comparison with alternatives

| Aspect | AgenticVision (.avis) | SQLite + blobs | Filesystem + JSON sidecar |
| --- | --- | --- | --- |
| Embedding storage | Native (per-observation) | Manual schema | Separate files |
| Similarity search | Built-in cosine scan | Manual query | External library |
| Visual diff | Built-in pixel comparison | Not available | External tool |
| Portability | Single file | Single file | Directory tree |
| Concurrent access | File locking + PID sessions | WAL mode | No safety |
| MCP integration | Native stdio transport | Manual adapter | Manual adapter |

Reproducing benchmarks

To reproduce these numbers on your own hardware:

```bash
# Clone and build
git clone https://github.com/agentralabs/agentic-vision
cd agentic-vision

# Run the benchmark suite
cargo bench --package agentic-vision

# Run the stress tests for MCP tool latency
cargo test --package agentic-vision-mcp --test phase0_stress -- --nocapture
cargo test --package agentic-vision-mcp --test phase1_v2_stress -- --nocapture

# Quick validation
cargo test --workspace
cargo build --release
agentic-vision-mcp info
```

Benchmark results depend on available hardware, background load, and whether a CLIP ONNX model is installed. Numbers above reflect a quiet system with the model loaded.

Notes

These numbers are directional and depend on hardware, image size, and embedding model. Real-world performance may differ based on:

  • Disk I/O speed for artifact writes
  • Image resolution and format (PNG vs JPEG)
  • Number of existing captures (affects search time)
  • OCR text density
  • Whether the CLIP model is loaded or running in fallback mode
  • Background system load during screenshot capture