Agent Learner
An AI toolkit for configuring, benchmarking, comparing, and optimizing agent prompts and evaluation results. Agent Learner provides persistent, file-based logging for each command category with timestamped entries, summary statistics, multi-format export, and full-text search across all records.
Commands
| Command | Description |
|---|---|
| configure | Configure agent settings: log configuration entries or view recent ones |
| benchmark | Benchmark agent performance: log benchmark results or view history |
| compare | Compare agent outputs: log comparison data or view recent comparisons |
| prompt | Prompt management: log prompt variations or view recent prompts |
| evaluate | Evaluate agent outputs: log evaluation results or view history |
| fine-tune | Fine-tune parameters: log fine-tuning sessions or view recent ones |
| analyze | Analyze agent behavior: log analysis entries or view recent analyses |
| cost | Cost tracking: log cost data or view recent cost entries |
| usage | Usage monitoring: log usage metrics or view recent usage data |
| optimize | Optimize configurations: log optimization runs or view history |
| test | Test agent behavior: log test results or view recent tests |
| report | Report generation: log report entries or view recent reports |
| stats | Show summary statistics across all log categories (entry counts, data size, first entry date) |
| export <fmt> | Export all data in json, csv, or txt format to the data directory |
| search <term> | Full-text search across all log files (case-insensitive) |
| recent | Show the 20 most recent entries from the activity history log |
| status | Health check: show version, data directory, total entries, disk usage, and last activity |
| help | Show the full help message with all available commands |
| version | Print the current version string |
Each data command (configure, benchmark, compare, etc.) works in two modes:
- Without arguments: displays the 20 most recent entries from that category
- With arguments: saves the input as a new timestamped entry and reports the total count
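The two modes can be sketched as a single shell function. This is a hypothetical illustration, not the tool's actual source: the function name `log_or_show`, the timestamp layout, and the confirmation message are all assumptions, and the sketch writes to a temporary directory rather than the real data directory.

```bash
# Hypothetical sketch of the two-mode behaviour; not the tool's actual source.

# Use a throwaway directory so the sketch never touches real data.
DATA_DIR="$(mktemp -d)"

# log_or_show CATEGORY [TEXT...]
#   no TEXT -> print the 20 most recent entries for CATEGORY
#   TEXT    -> append "timestamp|TEXT" and report the new total
log_or_show() {
  local category="$1"
  shift
  local file="$DATA_DIR/$category.log"

  if [ "$#" -eq 0 ]; then
    # View mode: a missing log simply prints nothing.
    [ -f "$file" ] && tail -n 20 "$file"
    return 0
  fi

  # Log mode: the timestamp format here is an assumption.
  printf '%s|%s\n' "$(date '+%Y-%m-%d %H:%M')" "$*" >> "$file"
  echo "Saved. Total $category entries: $(grep -c '' "$file")"
}

log_or_show benchmark "GPT-4o on MMLU: 88.7% accuracy"
log_or_show benchmark "Second run: 89.1% accuracy"
log_or_show benchmark   # no arguments: lists both entries
```

Keeping both modes behind one entry point means every category shares the same code path, which is one plausible reason the commands behave so uniformly.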
Data Storage
All data is stored in plain text files under the data directory:
- Category logs: $DATA_DIR/<command>.log, one file per command (e.g., configure.log, benchmark.log, prompt.log); each entry is stored as timestamp|value
- History log: $DATA_DIR/history.log, an audit trail of every command executed, with timestamps
- Export files: $DATA_DIR/export.<fmt>, generated by the export command in json, csv, or txt format
Default data directory: ~/.local/share/agent-learner/
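Because each entry is a single timestamp|value line, a category log can be inspected with ordinary shell tools. The sketch below is an assumption-laden illustration: the sample entries, the timestamp layout, and the helper name `summarize_log` are hypothetical, and it only approximates the kind of figures the stats command reports.

```bash
# Hypothetical sketch: summarise one category log in timestamp|value format.
DATA_DIR="$(mktemp -d)"   # stand-in for the real data directory

# Sample entries; the timestamp layout is an assumption.
cat > "$DATA_DIR/benchmark.log" <<'EOF'
2025-03-01 09:15|GPT-4o on MMLU: 88.7% accuracy
2025-03-02 14:40|Latency check: 1.2s average | warm cache
EOF

# summarize_log FILE: print entry count, size in bytes, and first entry date.
summarize_log() {
  local file="$1" first_ts first_value
  # read splits on the first "|" only: the rest of the line, including any
  # further "|" characters, lands in the last variable.
  IFS='|' read -r first_ts first_value < "$file"
  echo "entries=$(grep -c '' "$file") bytes=$(wc -c < "$file" | tr -d ' ') first=$first_ts"
}

summarize_log "$DATA_DIR/benchmark.log"

# Walk every entry, separating the timestamp from the value.
while IFS='|' read -r ts value; do
  printf '%s -> %s\n' "$ts" "$value"
done < "$DATA_DIR/benchmark.log"
```

Splitting only on the first delimiter matters if logged text can itself contain a pipe character, as the second sample entry does.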
Requirements
- Bash (with set -euo pipefail support)
- Standard Unix utilities: grep, cat, date, echo, wc, du, head, tail, basename
- No external dependencies or API keys required
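A quick preflight check can confirm that the listed utilities are on the PATH before installing. This is a hypothetical helper (the name `check_requirements` is an assumption), not something the tool is known to ship.

```bash
# Hypothetical preflight check; the utility list mirrors the requirements above.
check_requirements() {
  local missing=0 tool
  for tool in grep cat date echo wc du head tail basename; do
    # command -v succeeds for executables on PATH and for shell builtins.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_requirements; then
  echo "all required utilities found"
fi
```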
When to Use
- Benchmarking agent performance: when you need to track and compare benchmark results across different agent configurations, models, or prompt strategies
- Prompt engineering iteration: when you're testing multiple prompt variations and want to log each version with results for later comparison
- Cost and usage tracking: when you need to monitor API costs and usage metrics over time to optimize spending
- Fine-tuning experiments: when running fine-tuning sessions and you want to log parameters, results, and observations for reproducibility
- Cross-category analysis: when you need to search across all logged data (benchmarks, prompts, evaluations, costs) to find patterns or specific entries
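Since the logs are plain text, the cross-category search described in the last item can be approximated with grep alone. The sketch below is an illustration under assumptions: the sample entries, the helper name `search_logs`, and the "category: entry" output format are hypothetical and may differ from what the search command prints.

```bash
# Hypothetical sketch of case-insensitive search across category logs.
DATA_DIR="$(mktemp -d)"   # stand-in for the real data directory
printf '%s\n' '2025-03-01 09:15|MMLU Accuracy 88.7%' > "$DATA_DIR/benchmark.log"
printf '%s\n' '2025-03-02 10:00|March batch: total $0.47' > "$DATA_DIR/cost.log"
printf '%s\n' '2025-03-03 11:30|Judge accuracy dropped on code tasks' > "$DATA_DIR/evaluate.log"

# search_logs TERM: print "category: entry" for every matching line.
search_logs() {
  local term="$1" file line
  for file in "$DATA_DIR"/*.log; do
    [ -f "$file" ] || continue
    # -i ignores case; -F treats TERM as a literal string, not a regex.
    # grep exits 1 when nothing matches, which is not an error here.
    grep -iF -- "$term" "$file" | while IFS= read -r line; do
      printf '%s: %s\n' "$(basename "$file" .log)" "$line"
    done || true
  done
}

search_logs accuracy
```

Prefixing each hit with its category makes it easy to see at a glance whether a term shows up in benchmarks, evaluations, or both.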
Examples
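```bash
# Initialize and check status
agent-learner status

# Log a benchmark result
agent-learner benchmark "GPT-4o on MMLU: 88.7% accuracy, avg latency 1.2s"

# Log a prompt variation
agent-learner prompt "System: You are a helpful coding assistant. Always explain your reasoning step by step."

# Compare two configurations
agent-learner compare "GPT-4o vs Claude-3.5: GPT-4o 12% faster, Claude 5% more accurate on code tasks"

# Track costs (single quotes stop the shell from expanding $0)
agent-learner cost 'March batch: 12,450 input tokens, 3,200 output tokens, total $0.47'

# View all recent benchmarks
agent-learner benchmark

# Search across all logs for a specific term
agent-learner search accuracy

# Export all data as JSON
agent-learner export json

# View summary statistics
agent-learner stats

# Show recent activity
agent-learner recent
```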
Output
All commands return output to stdout. Export files are written to the data directory:
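```bash
agent-learner export json   # → ~/.local/share/agent-learner/export.json
agent-learner export csv    # → ~/.local/share/agent-learner/export.csv
agent-learner export txt    # → ~/.local/share/agent-learner/export.txt
```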
Every command execution is logged to $DATA_DIR/history.log for auditing purposes.
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
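Agent Learner
An AI toolkit for configuring, benchmarking, comparing, and optimizing agent prompts and evaluation results. Agent Learner provides persistent, file-based logging for each command category, with timestamped entries, summary statistics, multi-format export, and full-text search across all records.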
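Commands
| Command | Description |
|---|---|
| configure | Configure agent settings: log configuration entries or view recent ones |
| benchmark | Benchmark agent performance: log benchmark results or view history |
| compare | Compare agent outputs: log comparison data or view recent comparisons |
| prompt | Prompt management: log prompt variations or view recent prompts |
| evaluate | Evaluate agent outputs: log evaluation results or view history |
| fine-tune | Fine-tune parameters: log fine-tuning sessions or view recent ones |
| analyze | Analyze agent behavior: log analysis entries or view recent analyses |
| cost | Cost tracking: log cost data or view recent cost entries |
| usage | Usage monitoring: log usage metrics or view recent usage data |
| optimize | Optimize configurations: log optimization runs or view history |
| test | Test agent behavior: log test results or view recent tests |
| report | Report generation: log report entries or view recent reports |
| stats | Show summary statistics across all log categories (entry counts, data size, first entry date) |
| export <fmt> | Export all data in json, csv, or txt format to the data directory |
| search <term> | Full-text search across all log files (case-insensitive) |
| recent | Show the 20 most recent entries from the activity history log |
| status | Health check: show version, data directory, total entries, disk usage, and last activity |
| help | Show the full help message with all available commands |
| version | Print the current version string |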
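Each data command (configure, benchmark, compare, etc.) works in two modes:
- Without arguments: displays the 20 most recent entries from that category
- With arguments: saves the input as a new timestamped entry and reports the total count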
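Data Storage
All data is stored in plain text files under the data directory:
- Category logs: $DATA_DIR/<command>.log, one file per command (e.g., configure.log, benchmark.log, prompt.log); each entry is stored as timestamp|value
- History log: $DATA_DIR/history.log, an audit trail of every command executed, with timestamps
- Export files: $DATA_DIR/export.<fmt>, generated by the export command in json, csv, or txt format
Default data directory: ~/.local/share/agent-learner/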
Requirements
- Bash (with set -euo pipefail support)
- Standard Unix utilities: grep, cat, date, echo, wc, du, head, tail, basename
- No external dependencies or API keys required
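When to Use
- Benchmarking agent performance: when you need to track and compare benchmark results across different agent configurations, models, or prompt strategies
- Prompt engineering iteration: when you're testing multiple prompt variations and want to log each version with its results for later comparison
- Cost and usage tracking: when you need to monitor API costs and usage metrics to optimize spending
- Fine-tuning experiments: when running fine-tuning sessions and you want to log parameters, results, and observations for reproducibility
- Cross-category analysis: when you need to search across all logged data (benchmarks, prompts, evaluations, costs) to find patterns or specific entries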
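Examples
```bash
# Initialize and check status
agent-learner status

# Log a benchmark result
agent-learner benchmark "GPT-4o on MMLU: 88.7% accuracy, avg latency 1.2s"

# Log a prompt variation
agent-learner prompt "System: You are a helpful coding assistant. Always explain your reasoning step by step."

# Compare two configurations
agent-learner compare "GPT-4o vs Claude-3.5: GPT-4o 12% faster, Claude 5% more accurate on code tasks"

# Track costs (single quotes stop the shell from expanding $0)
agent-learner cost 'March batch: 12,450 input tokens, 3,200 output tokens, total $0.47'

# View all recent benchmarks
agent-learner benchmark

# Search across all logs for a specific term
agent-learner search accuracy

# Export all data as JSON
agent-learner export json

# View summary statistics
agent-learner stats

# Show recent activity
agent-learner recent
```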
Output
All commands return output to stdout. Export files are written to the data directory:
```bash
agent-learner export json   # → ~/.local/share/agent-learner/export.json
agent-learner export csv    # → ~/.local/share/agent-learner/export.csv
agent-learner export txt    # → ~/.local/share/agent-learner/export.txt
```
Every command execution is logged to $DATA_DIR/history.log for auditing purposes.