Perceptron — Vision SDK

Docs: https://docs.perceptron.inc/

Image and video analysis via the Perceptron Python SDK. Pass file paths or URLs directly — the SDK handles base64 conversion automatically.

Setup

bash
pip install perceptron
export PERCEPTRON_API_KEY=ak_...
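
Before calling the SDK it can be useful to fail early when the key is missing. The helper below is a hypothetical sketch and not part of the Perceptron SDK; it only assumes that the key lives in the PERCEPTRON_API_KEY environment variable, as this document states.

```python
import os


def require_api_key(env=None, name="PERCEPTRON_API_KEY"):
    """Return the API key from the environment, or raise with a clear message.

    `require_api_key` is a hypothetical helper, not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before calling the SDK")
    return key


# Example with an explicit mapping instead of the real environment.
print(require_api_key({"PERCEPTRON_API_KEY": "ak_example"}))
```

Passing a mapping explicitly keeps the check easy to test without touching the real environment.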

Quick Reference

Task	Function	Example
Describe / Q&A	question()	question("photo.jpg", "What's in this image?")
Grounded Q&A

Python SDK

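python
from perceptron import configure, detect, caption, ocr, ocr_markdown, question

# Configure (or set the PERCEPTRON_API_KEY environment variable)
configure(provider="perceptron", api_key="ak_...")

# Visual Q&A: the most common operation
result = question("photo.jpg", "What's happening in this image?")
print(result.text)

# Grounded Q&A: get bounding boxes along with the answer
result = question("photo.jpg", "Where is the damage?", expects="box")
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x},{box.top_left.y}) → ({box.bottom_right.x},{box.bottom_right.y})")

# Object detection
result = detect("warehouse.jpg", classes=["forklift", "person"])
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x},{box.top_left.y}) → ({box.bottom_right.x},{box.bottom_right.y})")

# OCR
result = ocr("receipt.jpg", prompt="Extract the total amount")
print(result.text)

result = ocr_markdown("document.png")  # structured Markdown output
print(result.text)

# Captioning
result = caption("scene.png", style="detailed")
print(result.text)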
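
Detection results often contain several boxes per label, so some post-processing is usually needed. The sketch below keeps the largest box for each label. The `Point` and `Box` dataclasses are hypothetical stand-ins that only borrow the attribute names (`mention`, `top_left`, `bottom_right`) used in this document's SDK examples; they are not the SDK's own types.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Box:  # hypothetical stand-in for an SDK box annotation
    mention: str
    top_left: Point
    bottom_right: Point


def box_area(box):
    """Area of a box in normalized units (0 if the corners are inverted)."""
    width = max(0, box.bottom_right.x - box.top_left.x)
    height = max(0, box.bottom_right.y - box.top_left.y)
    return width * height


def largest_per_mention(boxes):
    """Keep only the largest box for each mention label."""
    best = {}
    for box in boxes:
        if box.mention not in best or box_area(box) > box_area(best[box.mention]):
            best[box.mention] = box
    return best


boxes = [
    Box("person", Point(100, 100), Point(300, 500)),
    Box("person", Point(0, 0), Point(50, 50)),
    Box("forklift", Point(400, 200), Point(900, 800)),
]
for mention, box in largest_per_mention(boxes).items():
    print(mention, box_area(box))
```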

DSL Composition (Advanced)

Build custom multimodal workflows:

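python
from perceptron import perceive, image, text, system

@perceive(expects="box", model="isaac-0.2-2b-preview")
def find_hazards(img_path):
    return [system("<hint>BOX</hint>"), image(img_path), text("Find all safety hazards")]

result = find_hazards("factory.jpg")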

Structured Outputs

Constrain responses to Pydantic models, JSON schemas, or regex:

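python
from perceptron import perceive, image, text, pydantic_format
from pydantic import BaseModel

class Scene(BaseModel):
    objects: list[str]
    count: int

@perceive(response_format=pydantic_format(Scene))
def count_objects(path):
    return image(path) + text("List all objects and count them. Return JSON.")

result = count_objects("photo.jpg")
scene = Scene.model_validate_json(result.text)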
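
Even with a constrained response, it can help to validate the returned text before using it. The sketch below does this with the standard library only, as a dependency-free alternative to Pydantic validation. The `Scene` dataclass and the `parse_scene` helper are hypothetical, and the sample string stands in for the text returned by the model.

```python
import json
from dataclasses import dataclass


@dataclass
class Scene:
    objects: list
    count: int


def parse_scene(raw_text):
    """Parse a JSON reply and check that it has the expected fields and types."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    objects, count = data.get("objects"), data.get("count")
    if not isinstance(objects, list) or not all(isinstance(o, str) for o in objects):
        raise ValueError("'objects' must be a list of strings")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(count, int) or isinstance(count, bool):
        raise ValueError("'count' must be an integer")
    return Scene(objects=objects, count=count)


# Sample reply standing in for the model's text output.
scene = parse_scene('{"objects": ["forklift", "person"], "count": 2}')
print(scene.count, scene.objects)
```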

Pixel Coordinate Conversion

All spatial outputs use normalized coordinates (0–1000). Convert to pixels:

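python
pixel_boxes = result.points_to_pixels(width=1920, height=1080)

# Or standalone:
from perceptron import scale_points_to_pixels
pixel_pts = scale_points_to_pixels(result.points, width=1920, height=1080)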
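
The arithmetic behind this conversion can be sketched without the SDK. The helpers below are hypothetical illustrations, not SDK functions; they assume that a coordinate of 1000 maps to the full image width or height and that results are rounded to the nearest pixel.

```python
def normalized_to_pixels(x, y, width, height, scale=1000):
    """Map a point from the normalized 0..scale grid to pixel coordinates."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return round(x / scale * width), round(y / scale * height)


def box_to_pixels(top_left, bottom_right, width, height):
    """Convert a normalized box, given as two (x, y) corners, to pixel corners."""
    return (
        normalized_to_pixels(*top_left, width, height),
        normalized_to_pixels(*bottom_right, width, height),
    )


# A box covering the central region of a 1920x1080 frame.
print(box_to_pixels((250, 250), (750, 750), width=1920, height=1080))
# → ((480, 270), (1440, 810))
```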

CLI Script

Located at: /scripts/perceptron_cli.py

Requires the PERCEPTRON_API_KEY environment variable. The provider is always perceptron.

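bash
P=/scripts/perceptron_cli.py

# Visual Q&A
python3 $P question photo.jpg "What do you see?"
python3 $P question photo.jpg "Where is the car?" --expects box

# Object detection
python3 $P detect photo.jpg --classes person,car
python3 $P detect photo.jpg --classes forklift --format json --pixels
python3 $P detect ./frames/ --classes defect  # batch-process a directory

# OCR
python3 $P ocr document.png
python3 $P ocr receipt.jpg --output markdown

# Captioning
python3 $P caption scene.png --style detailed

# Custom perception
python3 $P perceive frame.png --prompt "Describe this scene" --expects box

# Batch processing
python3 $P batch --images img1.jpg img2.jpg --prompt "Describe" --output results.json

# Parse raw model output
python3 $P parse --mode points

# List models
python3 $P models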

Models

Model	Best for	Speed	Temp
isaac-0.2-2b-preview (default)	General use, detection, OCR	Fast	0.0
isaac-0.2-1b	Quick/simple tasks	Fastest	0.0

Override with model="..." in any SDK call or --model ... in CLI.

Grounding (expects parameter)

Value	Returns	Use case
text (default)	Plain text	Q&A, descriptions, OCR
box

Video Analysis

Extract frames with ffmpeg, then analyze:

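bash
# Extract a single frame at 5 seconds
ffmpeg -ss 5 -i video.mp4 -frames:v 1 -q:v 2 /tmp/frame.jpg

# Then analyze it
python3 $P question /tmp/frame.jpg "What's happening?"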

For continuous monitoring, extract multiple frames and batch process.
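
Multi-frame extraction can be scripted by building one ffmpeg command per timestamp. The sketch below is a minimal illustration using the `-ss`, `-frames:v 1` and `-q:v 2` flags; the function names, the output file naming, and the added `-y` (overwrite) flag are assumptions of this example. It requires ffmpeg on the PATH only when `extract_frames` is actually called.

```python
import subprocess
from pathlib import Path


def frame_commands(video, timestamps, out_dir):
    """Build one ffmpeg command per timestamp (in seconds)."""
    out_dir = Path(out_dir)
    commands = []
    for index, seconds in enumerate(timestamps):
        target = out_dir / f"frame_{index:04d}.jpg"
        commands.append(
            ["ffmpeg", "-y", "-ss", str(seconds), "-i", str(video),
             "-frames:v", "1", "-q:v", "2", str(target)]
        )
    return commands


def extract_frames(video, timestamps, out_dir):
    """Run the commands and return the paths of the frames that were written."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for command in frame_commands(video, timestamps, out_dir):
        # check=True raises CalledProcessError if ffmpeg fails.
        subprocess.run(command, check=True, capture_output=True)
        written.append(Path(command[-1]))
    return written


# Show the commands for one frame every 5 seconds over the first 15 seconds.
for command in frame_commands("video.mp4", range(0, 15, 5), "frames"):
    print(" ".join(command))
```

The resulting frame paths can then be passed to the CLI or SDK for batch analysis.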

Reference Files

For deeper SDK usage, consult these when needed:

- references/capabilities.md: Focus mode, reasoning, streaming, ICL, structured outputs, annotation types
- references/prompting.md: Optimal prompts per task, vision hints (<hint>BOX</hint>), temperature guide
- references/api.md: SDK configuration, models, image formats, streaming, best practices

Perceptron — Vision SDK

Docs: https://docs.perceptron.inc/

Image and video analysis via the Perceptron Python SDK. Pass file paths or URLs directly; the SDK handles base64 conversion automatically.

Setup

bash
pip install perceptron
export PERCEPTRON_API_KEY=ak_...

Quick Reference

Task	Function	Example
Describe / Q&A	question()	question("photo.jpg", "What's in this image?")
Grounded Q&A

Python SDK

python
from perceptron import configure, detect, caption, ocr, ocr_markdown, question

# Configure (or set the PERCEPTRON_API_KEY environment variable)
configure(provider="perceptron", api_key="ak_...")

# Visual Q&A: the most common operation
result = question("photo.jpg", "What's happening in this image?")
print(result.text)

# Grounded Q&A: get bounding boxes along with the answer
result = question("photo.jpg", "Where is the damage?", expects="box")
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x},{box.top_left.y}) → ({box.bottom_right.x},{box.bottom_right.y})")

# Object detection
result = detect("warehouse.jpg", classes=["forklift", "person"])
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x},{box.top_left.y}) → ({box.bottom_right.x},{box.bottom_right.y})")

# OCR
result = ocr("receipt.jpg", prompt="Extract the total amount")
print(result.text)

result = ocr_markdown("document.png")  # structured Markdown output
print(result.text)

# Captioning
result = caption("scene.png", style="detailed")
print(result.text)

DSL Composition (Advanced)

Build custom multimodal workflows:

python
from perceptron import perceive, image, text, system

@perceive(expects="box", model="isaac-0.2-2b-preview")
def find_hazards(img_path):
    return [system("<hint>BOX</hint>"), image(img_path), text("Find all safety hazards")]

result = find_hazards("factory.jpg")

Structured Outputs

Constrain responses to Pydantic models, JSON schemas, or regex:

python
from perceptron import perceive, image, text, pydantic_format
from pydantic import BaseModel

class Scene(BaseModel):
    objects: list[str]
    count: int

@perceive(response_format=pydantic_format(Scene))
def count_objects(path):
    return image(path) + text("List all objects and count them. Return JSON.")

result = count_objects("photo.jpg")
scene = Scene.model_validate_json(result.text)

Pixel Coordinate Conversion

All spatial outputs use normalized coordinates (0–1000). Convert to pixels:

python
pixel_boxes = result.points_to_pixels(width=1920, height=1080)

# Or standalone:
from perceptron import scale_points_to_pixels
pixel_pts = scale_points_to_pixels(result.points, width=1920, height=1080)

CLI Script

Located at: /scripts/perceptron_cli.py

Requires the PERCEPTRON_API_KEY environment variable. The provider is always perceptron.

bash
P=/scripts/perceptron_cli.py

# Visual Q&A
python3 $P question photo.jpg "What do you see?"
python3 $P question photo.jpg "Where is the car?" --expects box

# Object detection
python3 $P detect photo.jpg --classes person,car
python3 $P detect photo.jpg --classes forklift --format json --pixels
python3 $P detect ./frames/ --classes defect  # batch-process a directory

# OCR
python3 $P ocr document.png
python3 $P ocr receipt.jpg --output markdown

# Captioning
python3 $P caption scene.png --style detailed

# Custom perception
python3 $P perceive frame.png --prompt "Describe this scene" --expects box

# Batch processing
python3 $P batch --images img1.jpg img2.jpg --prompt "Describe" --output results.json

# Parse raw model output
python3 $P parse --mode points

# List models
python3 $P models

Models

Model	Best for	Speed	Temp
isaac-0.2-2b-preview (default)	General use, detection, OCR	Fast	0.0
isaac-0.2-1b	Quick/simple tasks	Fastest	0.0

Override with model="..." in any SDK call or --model ... in CLI.

Grounding (expects parameter)

Value	Returns	Use case
text (default)	Plain text	Q&A, descriptions, OCR
box

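Video Analysis

Extract frames with ffmpeg, then analyze:

bash
# Extract a single frame at 5 seconds
ffmpeg -ss 5 -i video.mp4 -frames:v 1 -q:v 2 /tmp/frame.jpg

# Then analyze it
python3 $P question /tmp/frame.jpg "What's happening?"

For continuous monitoring, extract multiple frames and batch process.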

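Reference Files

For deeper SDK usage, consult these when needed:

- references/capabilities.md: Focus mode, reasoning, streaming, ICL, structured outputs, annotation types
- references/prompting.md: Optimal prompts per task, vision hints (<hint>BOX</hint>), temperature guide
- references/api.md: SDK configuration, models, image formats, streaming, best practices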
