AgenticVision
API Reference
AgenticVision exposes its capabilities through MCP tools, resources, and prompts. Run agentic-vision-mcp info to verify tool discovery.
MCP Tools
vision_capture
Capture an image and store it in visual memory.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| source | object | yes | — | Image source. type is one of file, base64, screenshot, or clipboard. Include path for file, data and mime for base64, and an optional region for screenshot. |
| description | string | no | — | Human-readable label for the capture. |
| labels | string[] | no | [] | Tags for filtering and organization. |
| extract_ocr | boolean | no | false | Run OCR on the captured image. |
Returns: { capture_id, timestamp, dimensions, quality_score }
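A vision_capture call can be sketched as an MCP tools/call request. The envelope below follows the JSON-RPC 2.0 shape that MCP uses for tool calls. The helper name build_capture_request, the file path, and the label values are illustrative assumptions, not part of AgenticVision.

```python
import json


def build_capture_request(path, description=None, labels=None,
                          extract_ocr=False, request_id=1):
    """Build a JSON-RPC `tools/call` request for vision_capture (file source)."""
    arguments = {
        # source.type is one of: file, base64, screenshot, clipboard.
        "source": {"type": "file", "path": path},
        "labels": labels or [],
        "extract_ocr": extract_ocr,
    }
    # description has no default, so it is only sent when provided.
    if description is not None:
        arguments["description"] = description
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "vision_capture", "arguments": arguments},
    }


request = build_capture_request(
    "/tmp/login-page.png",  # hypothetical path
    description="Login page before submit",
    labels=["login", "baseline"],
    extract_ocr=True,
)
print(json.dumps(request, indent=2))
```

For a base64 source, the source object would carry data and mime instead of path, as described in the parameter table.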
vision_query
Search captures by filters.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| description_contains | string | no | — | Substring match on capture descriptions. |
| labels | string[] | no | — | Filter by label tags. |
| min_quality | number | no | — | Minimum quality score (0.0-1.0). |
| sort_by | string | no | "recent" | Sort order: recent or quality. |
| max_results | number | no | 20 | Maximum captures to return. |
| before | number | no | — | Unix timestamp upper bound. |
| after | number | no | — | Unix timestamp lower bound. |
| session_ids | number[] | no | — | Filter by session. |
Returns: array of capture metadata objects.
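The filters above can be combined freely. The sketch below assembles a vision_query arguments object on the client side, omits filters that were not set, and rejects values outside the documented ranges. The helper build_query_args and its validation rules are assumptions for illustration, and the timestamps are assumed to be in seconds; the server may validate differently.

```python
def build_query_args(description_contains=None, labels=None, min_quality=None,
                     sort_by="recent", max_results=20, before=None, after=None,
                     session_ids=None):
    """Assemble vision_query arguments, dropping filters that were not set."""
    if sort_by not in ("recent", "quality"):
        raise ValueError("sort_by must be 'recent' or 'quality'")
    if min_quality is not None and not 0.0 <= min_quality <= 1.0:
        raise ValueError("min_quality must be within 0.0-1.0")
    if before is not None and after is not None and after > before:
        raise ValueError("after must not be later than before")

    candidate = {
        "description_contains": description_contains,
        "labels": labels,
        "min_quality": min_quality,
        "sort_by": sort_by,
        "max_results": max_results,
        "before": before,
        "after": after,
        "session_ids": session_ids,
    }
    return {key: value for key, value in candidate.items() if value is not None}


now = 1_700_000_000  # fixed example timestamp, assumed to be in seconds
args = build_query_args(
    labels=["login"],
    min_quality=0.6,
    sort_by="quality",
    after=now - 24 * 3600,  # only captures from the last 24 hours
    before=now,
)
print(args)
```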
vision_similar
Find visually similar captures using CLIP embedding distance.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| capture_id | number | no | — | Find captures similar to this one. |
| embedding | number[] | no | — | Raw embedding vector to search with, as an alternative to capture_id. |
| top_k | number | no | 10 | Number of results. |
| min_similarity | number | no | 0.5 | Minimum cosine similarity threshold. |
| event_types | string[] | no | — | Filter by event type. |
Returns: array of { capture_id, similarity_score, metadata }.
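To make top_k and min_similarity concrete, the following sketch ranks candidates by cosine similarity in plain Python. It illustrates the general technique only and is not AgenticVision's implementation. The three-dimensional vectors are toy stand-ins for real CLIP embeddings, and rank_similar is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, candidates, top_k=10, min_similarity=0.5):
    """Return up to top_k (capture_id, score) pairs scoring at least min_similarity."""
    scored = [(cid, cosine_similarity(query, emb)) for cid, emb in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= min_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return kept[:top_k]


# Toy embeddings keyed by capture ID (illustrative values only).
embeddings = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
}
results = rank_similar([1.0, 0.0, 0.0], embeddings, top_k=2)
for capture_id, score in results:
    print(capture_id, round(score, 4))
```

Capture 3 is orthogonal to the query, so its similarity of 0.0 falls below the default min_similarity of 0.5 and it is dropped before top_k is applied.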
vision_compare
Compare two captures for visual similarity.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id_a | number | yes | — | First capture ID. |
| id_b | number | yes | — | Second capture ID. |
| detailed | boolean | no | false | Include detailed diff data. |
Returns: { similarity_score, dimensions_match, summary }.
vision_diff
Pixel-level diff between two captures.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id_a | number | yes | — | First capture ID. |
| id_b | number | yes | — | Second capture ID. |
Returns: { changed_pixel_count, total_pixels, change_percentage, bounding_boxes }.
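The returned fields relate to each other arithmetically. The sketch below computes them for two small grids of pixel values. It rests on several assumptions not confirmed by this reference: change_percentage is on a 0-100 scale, a single bounding box encloses all changed pixels, and boxes use the same { x, y, w, h } layout as vision_track regions.

```python
def pixel_diff(image_a, image_b):
    """Diff two equally sized images given as 2D lists of pixel values."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have identical dimensions")

    # Collect (x, y) coordinates of every pixel that differs.
    changed = [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(image_a, image_b))
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b))
        if pixel_a != pixel_b
    ]
    total = sum(len(row) for row in image_a)

    bounding_boxes = []
    if changed:
        xs = [x for x, _ in changed]
        ys = [y for _, y in changed]
        # Assumption: one box around all changes, in { x, y, w, h } form.
        bounding_boxes.append({
            "x": min(xs), "y": min(ys),
            "w": max(xs) - min(xs) + 1, "h": max(ys) - min(ys) + 1,
        })

    return {
        "changed_pixel_count": len(changed),
        "total_pixels": total,
        "change_percentage": 100.0 * len(changed) / total if total else 0.0,
        "bounding_boxes": bounding_boxes,
    }


before = [[0, 0, 0, 0] for _ in range(4)]
after = [row[:] for row in before]
after[1][1] = 255  # row 1, column 1
after[2][2] = 255  # row 2, column 2

result = pixel_diff(before, after)
print(result)
```

Here 2 of 16 pixels differ, so change_percentage is 12.5 and the box spans the 2x2 area starting at (1, 1).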
vision_ocr
Extract text from a capture using OCR.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| capture_id | number | yes | — | Capture to extract text from. |
| language | string | no | "eng" | OCR language code. |
Returns: { text, confidence, regions }.
vision_track
Configure tracking for a UI region. Captures must be triggered externally.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| region | object | yes | — | { x, y, w, h } in pixels. |
| interval_ms | number | no | 1000 | Minimum interval between captures, in milliseconds. |
| max_captures | number | no | 100 | Stop after this many captures. |
| on_change_threshold | number | no | 0.95 | Similarity threshold; a similarity below this value counts as a change. |
Returns: { track_id, region, status }.
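Because captures must be triggered externally, the caller decides when to capture. The sketch below models one way the three tuning parameters could interact in such a caller-side loop. The RegionTracker class, its offer method, and the outcome strings are hypothetical and only illustrate the documented semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionTracker:
    interval_ms: int = 1000
    max_captures: int = 100
    on_change_threshold: float = 0.95
    capture_count: int = 0
    last_capture_ms: Optional[int] = None

    def offer(self, now_ms, similarity_to_previous):
        """Decide what to do with a capture opportunity at time now_ms."""
        if self.capture_count >= self.max_captures:
            return "stopped"  # max_captures reached
        if (self.last_capture_ms is not None
                and now_ms - self.last_capture_ms < self.interval_ms):
            return "skipped"  # too soon after the previous capture
        self.capture_count += 1
        self.last_capture_ms = now_ms
        # A similarity below the threshold counts as a change.
        if (similarity_to_previous is not None
                and similarity_to_previous < self.on_change_threshold):
            return "changed"
        return "unchanged"


tracker = RegionTracker(max_captures=3)
outcomes = [
    tracker.offer(0, None),     # first capture, nothing to compare against
    tracker.offer(500, 0.99),   # inside the 1000 ms interval
    tracker.offer(1000, 0.99),  # similar enough to the previous capture
    tracker.offer(2000, 0.80),  # below 0.95, so a change
    tracker.offer(3000, 0.80),  # max_captures already reached
]
print(outcomes)
```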
vision_link
Link a visual capture to an AgenticMemory node.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| capture_id | number | yes | — | Capture to link. |
| memory_node_id | number | yes | — | Target memory node ID. |
| relationship | string | no | "observed_during" | One of: observed_during, evidence_for, screenshot_of. |
Returns: { link_id, capture_id, memory_node_id, relationship }.
vision_health
Evaluate visual memory reliability.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| low_quality_threshold | number | no | 0.45 | Captures with a quality score below this are flagged as low quality. |
| stale_after_hours | number | no | 168 | Hours before a capture is considered stale. |
| max_examples | number | no | 20 | Max examples per category. |
Returns: { total_captures, low_quality, stale, unlinked, unlabeled }.
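The thresholds above can be read as simple per-capture checks. The sketch below applies them to a list of capture records. It assumes that each category holds a list of capture IDs capped at max_examples, that timestamps are Unix seconds, and that each record carries a hypothetical linked flag; the actual response shape may differ.

```python
def assess_health(captures, now, low_quality_threshold=0.45,
                  stale_after_hours=168, max_examples=20):
    """Bucket captures into the categories reported by a health check."""
    stale_cutoff = now - stale_after_hours * 3600
    buckets = {"low_quality": [], "stale": [], "unlinked": [], "unlabeled": []}

    for capture in captures:
        cid = capture["capture_id"]
        if capture["quality_score"] < low_quality_threshold:
            buckets["low_quality"].append(cid)
        if capture["timestamp"] < stale_cutoff:
            buckets["stale"].append(cid)
        if not capture.get("linked", False):  # hypothetical field
            buckets["unlinked"].append(cid)
        if not capture.get("labels"):
            buckets["unlabeled"].append(cid)

    report = {"total_captures": len(captures)}
    # Keep at most max_examples IDs per category.
    report.update({name: ids[:max_examples] for name, ids in buckets.items()})
    return report


now = 1_700_000_000  # fixed example timestamp in seconds
captures = [
    {"capture_id": 1, "timestamp": now - 3600, "quality_score": 0.9,
     "labels": ["login"], "linked": True},
    {"capture_id": 2, "timestamp": now - 200 * 3600, "quality_score": 0.3,
     "labels": [], "linked": False},
]
report = assess_health(captures, now)
print(report)
```

Capture 2 is 200 hours old, scores 0.3, and has neither labels nor a link, so it appears in all four categories.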
session_start
Start a new vision session.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| session_id | number | no | auto | Explicit session ID; assigned automatically when omitted. |
Returns: { session_id }.
session_end
End the current vision session and persist its data.
Returns: { session_id, capture_count }.
Quality score
Every capture receives a quality score (0.0-1.0) computed from:
- Resolution: higher resolution scores higher.
- Embedding confidence: CLIP embedding norm.
- Metadata completeness: presence of description and labels.
- OCR yield: text extraction success rate (if OCR was requested).
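This reference does not state how the four factors are weighted. As a minimal sketch, assuming every factor is already normalized to 0.0-1.0 and all factors count equally, the score could be an unweighted mean that leaves out OCR yield when OCR was not requested. Both function names and the equal weighting are assumptions.

```python
def metadata_completeness(description, labels):
    """Fraction of the two metadata fields (description, labels) that are present."""
    return (int(bool(description)) + int(bool(labels))) / 2


def quality_score(resolution, embedding_confidence, metadata, ocr_yield=None):
    """Unweighted mean of the available factors (assumed equal weighting)."""
    factors = [resolution, embedding_confidence, metadata]
    if ocr_yield is not None:  # only counted when OCR was requested
        factors.append(ocr_yield)
    for value in factors:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each factor must be within 0.0-1.0")
    return sum(factors) / len(factors)


completeness = metadata_completeness("Login page", [])  # description only
without_ocr = quality_score(0.8, 0.9, completeness)
with_ocr = quality_score(0.8, 0.9, completeness, ocr_yield=0.6)
print(round(without_ocr, 3), round(with_ocr, 3))
```

Under these assumptions, adding labels to the capture would raise the metadata factor from 0.5 to 1.0 and lift the overall score accordingly.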
CLI subcommands
agentic-vision-mcp serve # Start MCP server
agentic-vision-mcp validate # Validate artifact path
agentic-vision-mcp info # List available tools and capabilities
agentic-vision-mcp completions # Shell completions
agentic-vision-mcp repl # Interactive REPL