Grafana tools for data visualization, monitoring, alerting, security, SRE investigation, and data collection pipeline management via Alloy. Use grafana_query, grafana_query_logs, grafana_query_traces, grafana_create_dashboard, grafana_update_dashboard, grafana_create_alert, grafana_share_dashboard, grafana_annotate, grafana_explore_datasources, grafana_list_metrics, grafana_search, grafana_get_dashboard, grafana_check_alerts, grafana_push_metrics, grafana_explain_metric, grafana_security_check,
You have full native Grafana access — query data, create dashboards, set alerts, receive alert notifications, annotate events, explore datasources, push custom data, and deliver visualizations inline. Works with ANY data in Grafana, not just agent metrics.
grafana_explore_datasources first when you need a datasource UID — never guess UIDsgrafana_search before creating a dashboard — avoid duplicatesgrafana_get_dashboard before grafana_share_dashboard — you need exact panel IDsgrafana_get_dashboard before grafana_update_dashboard — you need panel IDs and current structuregrafana_query for direct answers over creating dashboards — "what's my cost?" needs a number, not a URLgrafana_query over grafana_create_dashboard + grafana_share_dashboard for simple data questions — a number is faster than a chartgrafana_query_logs for log searches — LogQL for logs, PromQL for metrics, TraceQL for traces. Never use grafana_query for Loki datasourcesgrafana_query_traces for trace searches — TraceQL for traces, PromQL for metrics, LogQL for logs. Never use grafana_query or grafana_query_logs for Tempo datasourcesopenclaw_lens_* metricsgrafana_check_alerts — use the suggestedInvestigation field to go directly to querying (it provides the tool, query, and datasource)grafana_check_alerts with action setup once before alert notifications can reach the agent — this creates the webhook contact pointgrafana_explain_metric for "what is this metric?" questions over manual grafana_query — it returns current value, trend, stats, and metadata in one callqueryNames from push response for PromQL queries — don't guess metric names (counters get _total suffix)openclaw_ext_ prefix for custom metrics — grafana_push_metrics auto-prepends it if missinggrafana_query_logs with metric-over-logs queries (count_over_time, rate, topk) before switching to raw log entriesgrafana_check_alerts with action silence to prevent repeat notifications while investigatinglist_rules for complete alert health — grafana_check_alerts with action list_rules returns all rules with live eval state (normal/firing/pending/nodata/error), health, and lastEvaluation — no need to cross-reference with list actiondashboardUid + panelId to re-run panel queries — don't manually extract PromQL/LogQL from get_dashboard output. Both grafana_query and grafana_query_logs accept these params to auto-resolve the panel's query expression and datasource. The tool handles template variable replacement and datasource routing automaticallygrafana_update_dashboard with operation delete and grafana_check_alerts with action delete_rule are permanent and cannot be undonealloy_pipeline action recipes first when unsure which pipeline recipe fits the user's request — because recipes provide validation, credential handling, and sample queries that raw config does notalloy_pipeline action status after creating a pipeline — because data takes 15-20s to flow through the pipeline, and components may fail silently after reloadconfig only when the user explicitly provides Alloy syntaxconfig — when the user provides a connection string, DSN, password, or API key, ALWAYS use the matching recipe (which routes credentials through sys.env(), keeping secrets off disk). If you must use raw config, wrap sensitive values in sys.env("MY_VAR_NAME") and tell the user to set that env var where Alloy runsenvVarsRequired from every pipeline create response — credential recipes may return pending_credentials status when env vars aren't set yet. Tell the user the exact var names and that they must set them where Alloy runs, then verify with action INLINECODE55grafana_list_metrics or grafana_query_logs to discover data, grafana_create_dashboard to visualize, grafana_create_alert to monitoralloy_pipeline action diagnose as first step when user reports pipeline issues — because it checks Alloy connectivity, all pipeline health, config file drift, and limits in one callalloy_pipeline with action delete removes the config and data stops flowingjsonExpressions, labelFields, structuredMetadata, tenantValue, matchRoutes, etc. directly to any log recipe (docker-logs, file-logs, syslog, etc.)samplingPolicies for multi-policy tail sampling — don't create raw config when application-traces can handle it. sampleRate is for simple probabilistic, samplingPolicies is for intelligent multi-policy (keep errors, keep slow, sample rest)tenantValue/tenantSource/matchRoutes work on ALL log recipes. Don't create separate "routing" pipelinesreferences/alloy-components.md before composing raw config — it has copy-pasteable snippets for all common Alloy componentsgrafana_query_traces (search error/slow) → grafana_query_traces (get → follow correlationHint) → grafana_query_logs → grafana_query → INLINECODE86grafana_search → grafana_get_dashboard → INLINECODE89grafana_search (check duplicates) → INLINECODE91grafana_get_dashboard → INLINECODE93grafana_update_dashboard with operation delete (confirm with user first)grafana_check_alerts (setup) → INLINECODE97grafana_check_alerts with action INLINECODE99grafana_check_alerts with action list_rules → delete_rule with INLINECODE103grafana_push_metrics (with optional timestamp for historical data, auto-registers, returns queryNames) → grafana_query with INLINECODE108grafana_search (check existing) → grafana_create_dashboard with llm-command-center → follow suggestedNext chain through remaining templatesgrafana_create_dashboard with genai-observability templategrafana_create_dashboard with session-explorer template → paste session IDgrafana_create_dashboard with llm-command-center template (Loki + Tempo)grafana_create_dashboard with cost-intelligence templategrafana_create_dashboard with tool-performance templategrafana_create_dashboard with sre-operations templategrafana_explore_datasources → grafana_check_alerts (list + listrules) → grafana_search → grafana_get_dashboard (audit=true for each) → summarizegrafana_get_dashboard (audit=true) → review auditSummary + per-panel INLINECODE133grafana_check_alerts (setup) → grafana_create_dashboard (security-overview) → grafana_create_alert (webhook error burst, cost spike, tool loops, injection signals)grafana_security_check → grafana_query_logs (correlate) → grafana_annotate (mark investigation) → grafana_check_alerts (silence)grafana_investigate (multi-signal triage) → follow suggestedHypotheses.testWith for deep-divesgrafana_explain_metric (returns anomaly z-score + seasonality vs 1d/7d ago for 24h period)grafana_check_alerts with action analyze — fatigue reportgrafana_investigate → 5-Phase methodology → postmortem template (see sre-investigation.md §9)grafana_annotate (list, tags: ["deploy"]) → grafana_explain_metric (compareWith: "previous")alloy_pipeline action recipes (filter by category) → select recipe → create → status → query → dashboard → alertalloy_pipeline with recipe scrape-endpoint + params INLINECODE159alloy_pipeline with recipe [db]-exporter + params INLINECODE162alloy_pipeline (log recipe + processing params: jsonExpressions, labelFields, structuredMetadata)alloy_pipeline with recipe INLINECODE168alloy_pipeline with recipe file-logs + params INLINECODE171alloy_pipeline with recipe INLINECODE173alloy_pipeline with recipe kafka-logs + params INLINECODE176alloy_pipeline with recipe INLINECODE178alloy_pipeline with recipe blackbox-exporter + params INLINECODE181alloy_pipeline with recipe kubernetes-pods + kubernetes-services + kubernetes-logs (3 pipelines)alloy_pipeline with recipe INLINECODE187alloy_pipeline with recipe INLINECODE189alloy_pipeline with recipe INLINECODE191alloy_pipeline with recipe INLINECODE193alloy_pipeline with recipe secret-filter-logs + params INLINECODE196alloy_pipeline with recipe elasticsearch-exporter / INLINECODE199alloy_pipeline with recipe INLINECODE201alloy_pipeline with recipe INLINECODE203alloy_pipeline with recipe application-traces + samplingPolicies array (keep errors, keep slow, filter health checks, sample rest)tenantValue or matchRoutes processing paramalloy_pipeline with recipe continuous-profiling + INLINECODE211alloy_pipeline with recipe INLINECODE213alloy_pipeline with recipe INLINECODE215references/alloy-components.md → alloy_pipeline with raw config + optional INLINECODE219alloy_pipeline with action INLINECODE221alloy_pipeline with action INLINECODE223alloy_pipeline with action status + namealloy_pipeline with action diagnose → follow remediationalloy_pipeline with action delete + name (confirm with user first)When several Grafana environments are configured (dev, staging, prod), every tool accepts an optional instance parameter. grafana_explore_datasources returns availableInstances — use the name values from that list.
Why this matters: Users often need to query production metrics, create dashboards in dev, or compare environments side by side. Each tool call targets one instance.
Smart defaults: Omitting instance always targets the configured default — safe and invisible for single-environment setups. Only specify instance when the user explicitly names a non-default environment.
Cross-environment workflows: Each call is independent. Query prod, create dashboard in dev — just set instance differently on each call. No context switching needed.
| Tool | What It Does |
|---|---|
| INLINECODE237 | Discover configured datasources (UIDs, types, query routing) — tells you which tool + query language to use for each datasource |
| INLINECODE238 |
compact: true with metadata: true for minimal fields in multi-tool chains |
| grafana_query | Run PromQL instant/range queries — get numbers directly |
| grafana_query_logs | Run LogQL queries against Loki — search and filter logs |
| grafana_query_traces | Run TraceQL queries against Tempo — search traces or get full trace by ID |
| grafana_create_dashboard | Create dashboards from templates or custom JSON |
| grafana_update_dashboard | Add/remove/update panels, change dashboard metadata, or delete dashboard |
| grafana_get_dashboard | Get dashboard summary (panels, queries). Use compact: true for overview scans, audit: true to health-check all panels in one call |
| grafana_search | Search existing dashboards by title, tags, or starred status |
| grafana_share_dashboard | Render panel as image and deliver inline via messaging |
| grafana_create_alert | Create Grafana-native alert rules on any metric |
| grafana_annotate | Create or list annotations (events) on dashboards |
| grafana_check_alerts | Check, acknowledge, list/delete rules, silence/unsilence, or set up Grafana alert webhook notifications. Use compact: true with list_rules for minimal fields |
| grafana_push_metrics | Push custom data (calendar, git, fitness, finance) via OTLP |
| grafana_explain_metric | Get metric context: current value, trend, stats, metadata, drill-down queries — agent interprets |
| grafana_security_check | Run 6 parallel security checks and return threat-level assessment (green/yellow/red) — "Am I being attacked?" |
| grafana_investigate | Multi-signal investigation triage — gathers metrics, logs, traces, and context in parallel, generates hypotheses with specific tool+params for follow-up |
| alloy_pipeline | Create and manage Alloy data collection pipelines — 29 recipes for metrics, logs, traces, profiles from any infrastructure (databases, K8s, Docker, apps, profiling, frontend RUM) |
grafana_explore_datasourcesgrafana_query, grafana_query_logs, grafana_query_traces, grafana_list_metrics, grafana_create_alert, and grafana_explain_metric.
Params: instance (optional — target Grafana instance, omit for default).
Example: {}
Example (multi-instance): { "instance": "prod" }
Returns: List of datasources with uid, name, type, isDefault, plus routing hints: queryTool (which agent tool to use, e.g. "grafana_query", "grafana_query_logs", or "grafana_query_traces"), queryLanguage (e.g. "PromQL", "LogQL", "TraceQL"), and supported (boolean — whether an agent tool can query this datasource). Use queryTool to pick the right tool for each datasource. When multiple Grafana instances are configured, also returns instance (which instance was queried) and availableInstances (list of { name, url, isDefault } for all configured instances).
grafana_list_metricscategory to each openclaw_* metric. Use purpose when user asks about a specific concern (e.g., "performance metrics", "cost metrics").
Params: datasourceUid (required), prefix (filter by prefix), search (targeted discovery — server-side regex, only matching metrics returned), purpose ("performance" | "cost" | "reliability" | "capacity" — pre-filter by intent, composable with prefix and search), label (list label values instead), metadata (boolean — enriched results with type/help/category), compact (boolean — with metadata, returns only name/type/category, ~60% smaller).
Example names: { "datasourceUid": "prom1", "prefix": "openclaw_lens_" }
Example search: { "datasourceUid": "prom1", "search": "steps" }
Example purpose: { "datasourceUid": "prom1", "purpose": "performance", "metadata": true }
Example combined: { "datasourceUid": "prom1", "prefix": "openclaw_ext_", "search": "fitness" }
Example metadata: { "datasourceUid": "prom1", "metadata": true, "prefix": "openclaw_" }
Example compact: { "datasourceUid": "prom1", "metadata": true, "compact": true }
Returns names: { metrics: ["metric1", "metric2", ...] }. Truncated at 200.
Returns metadata: { metadataSource, categorySummary: { cost: 3, usage: 4, session: 5, ... }, metrics: [{ name, type, help, category?, source? }, ...] }. Use this before composing custom dashboards — type tells you counter vs gauge vs histogram, category groups openclaw_* metrics by function. Search also matches help text. Categories: cost, usage, session, queue, messaging, webhook, tools, agent, custom. categorySummary gives counts per category for quick overview (omitted when no openclaw_* metrics). Purpose maps: performance → session + tools, cost → cost + usage, reliability → webhook + messaging + agent, capacity → queue + session. metadataSource: "prometheus" when Prometheus metadata endpoint has data, "synthetic" when OTLP-only (metadata synthesized from known metric registry — histogram sub-metrics deduplicated, type/help from Grafana Lens definitions). On OTLP stacks, includes hint explaining why metadata is synthetic. source: "synthetic" on individual entries from the registry; source: "custom" on entries from the custom metrics store.
Returns compact: { metadataSource, categorySummary: {...}, metrics: [{ name, type, category? }, ...] }. Same as metadata but drops help, source, labelNames — use in multi-tool chains where you need metric names and types but not full descriptions.
Example label: { "datasourceUid": "prom1", "label": "job" }
Returns label: { label, count, totalCount, values: ["value1", "value2", ...] }. Truncated at 200.
grafana_querydatasourceUid, expr (PromQL), queryType (instant/range), start (range only, required), end (range only, default "now"), step (range only, optional — auto-calculated from time range if omitted, targeting ~300 datapoints), dashboardUid (optional — resolve query from panel), panelId (optional — use with dashboardUid).
Example instant: { "datasourceUid": "prom1", "expr": "sum(increase(openclaw_lens_cost_by_model_total[1d])) or vector(0)" }
Example range (auto-step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-30d" }
Example range (explicit step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-1h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 10, "queryType": "range", "start": "now-7d" }
Tip: start/end accept Unix seconds or relative expressions like "now-1h", "now-7d". For range queries, just set start — end defaults to "now" and step is auto-calculated. Override step only when you need specific resolution.
Tip (panel re-run): Set dashboardUid + panelId to re-run a panel's query without manually extracting PromQL. The tool auto-resolves expr and datasourceUid from the panel definition. Template variables are replaced with wildcards. You can still override expr or datasourceUid explicitly if needed. Get panel IDs from grafana_get_dashboard.
Returns instant: { metrics: [{ metric: {...}, value: "1.23", timestamp: "...", healthContext?: { status, thresholds, description, direction } }], datasourceUid, resultCount, warnings?, hint? } — healthContext is included for well-known openclaw_lens_* gauge metrics, providing SRE-grade health assessment: status ("healthy"/"warning"/"critical"), thresholds (warning/critical values), description (what the metric means), direction ("higherisworse"/"lowerisworse"). Omitted for unknown metrics. Capped at 50 results; when exceeded includes truncated: true, totalResults, and truncationHint advising to narrow the query.
Returns range: { series: [{ metric: {...}, values: [{ time, value }...] }], datasourceUid, resultCount, warnings?, hint? } — truncated to 20 points per series and 50 series max. When series are truncated includes truncated: true, totalSeries, and truncationHint. When step is auto-calculated, includes step: { value: "288s", display: "5m", auto: true }.
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside normal query results. If the panel uses a Loki datasource, returns an error directing you to use grafana_query_logs instead.
Returns (warnings): When Prometheus flags a non-fatal issue (e.g., rate() on a gauge), warnings: [{ cause, suggestion, example? }] is included. Example: rate() on a gauge → cause says "rate() applied to 'metric' which appears to be a gauge", suggestion says "use delta() or deriv() instead", example shows the corrected query.
Returns (hint): When the query returns zero results, hint: { cause, suggestion } explains why (metric may not exist, label filters may not match) and suggests using grafana_list_metrics to verify.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common PromQL mistakes: unclosed parenthesis, missing range selector, timeout, auth failure, rate on gauge, etc. Omitted when the error is unrecognized.
Tip (chaining): Both instant and range responses include datasourceUid — pass it directly to grafana_create_alert or other tools without re-calling grafana_explore_datasources. This enables zero-friction query→alert chains.
grafana_query_logsdatasourceUid, expr (LogQL), queryType (instant/range, default range), start/end (default now-1h/now), step (metric queries only), limit (default 100), direction (backward/forward), lineLimit (max chars per log line, default 500, max 2000), extractFields (boolean, default false — extract structured OTel attributes into a clean fields object), dashboardUid (optional — resolve query from panel), panelId (optional — use with dashboardUid).
Example log search: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"error\"" }
Example with filters: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |~ \"timeout|refused\"", "limit": 50, "direction": "forward" }
Example full stack traces: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"Exception\"", "lineLimit": 2000 }
Example session debugging: { "datasourceUid": "loki1", "expr": "{service_name=\"openclaw\"} | json | component=\"lifecycle\"", "extractFields": true }
Example metric query: { "datasourceUid": "loki1", "expr": "rate({job=\"api\"}[5m])", "queryType": "range", "start": "now-6h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 18, "start": "now-24h", "extractFields": true }
Returns streams: { entries: [{ labels: {...}, timestamp: "...", line: "..." }], datasourceUid, totalEntries, truncated } — capped at 100 entries, lines at 500 chars (set lineLimit: 2000 for full stack traces).
Returns streams (extractFields): { entries: [{ labels: {...cleaned...}, timestamp: "...", line: "...", fields: { component, event_name, session_id, trace_id, model, duration_s, ... } }], datasourceUid } — infrastructure noise labels removed, openclaw_ prefix stripped from field keys, numeric values auto-converted. Also parses JSON log bodies if present.
Returns streams (traceCorrelation): When extractFields: true and entries contain trace_id, includes traceCorrelation: { traceIds: [...], tool: "grafana_query_traces", tip } — up to 5 unique trace IDs ready for grafana_query_traces with queryType: "get".
Returns metric: Same shape as grafana_query range/instant results (matrix capped at 50 series, vector capped at 50 results — includes datasourceUid, truncated, totalSeries/totalResults, and truncationHint when exceeded).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside normal results. If the panel uses a Prometheus datasource, returns an error directing you to use grafana_query instead.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common LogQL mistakes: bare text without stream selector, empty {}, unclosed braces, missing label matchers, auth failure, timeout. Omitted when the error is unrecognized.
Tip: LogQL: {label="value"} selects streams, |= substring filter, |~ regex, != exclude. Metric wrappers: rate(), count_over_time(), bytes_rate(). Use extractFields: true when investigating OTel/lifecycle logs — it surfaces trace_id, session_id, event_name, model, and other attributes as first-class fields instead of buried in raw labels.
Tip (panel re-run): Same as grafana_query — set dashboardUid + panelId to auto-resolve LogQL and datasource. The tool routes Prometheus panels to grafana_query with a helpful error.
grafana_query_tracesdatasourceUid, query (TraceQL expression or trace ID), queryType (search/get, default search), start/end (default now-1h/now), limit (default 20, max 50), minDuration/maxDuration (e.g., "1s", "10s"), dashboardUid (optional — resolve query from panel), panelId (optional — use with dashboardUid).
Example search: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }" }
Example search slow: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }", "minDuration": "5s" }
Example search with time: { "datasourceUid": "tempo1", "query": "{ span.gen_ai.system = \"anthropic\" }", "start": "now-24h", "limit": 50 }
Example get: { "datasourceUid": "tempo1", "query": "abc123def456789...", "queryType": "get" }
Example panel re-run: { "dashboardUid": "openclaw-session-explorer", "panelId": 12, "start": "now-24h" }
Returns search: { traces: [{ traceId, rootServiceName, rootTraceName, startTime, durationMs, spanCount? }], datasourceUid, totalTraces, truncated?, correlationHint? } — capped at 50 traces. When exceeded includes truncated: true and truncationHint. When traces are found, includes correlationHint: { logQuery, tool, tip } with a ready-to-use LogQL expression for grafana_query_logs.
Returns get: { traceId, spans: [{ traceId, spanId, parentSpanId?, operationName, serviceName, startTime, durationMs, status, kind?, attributes: {...} }], datasourceUid, totalSpans, truncated? } — flattened OTLP spans with resolved attributes (string/number/boolean). Capped at 200 spans. Sorted by start time (earliest first).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside normal results. If the panel uses a Prometheus or Loki datasource, returns an error directing you to use the correct tool.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common TraceQL mistakes: syntax errors, invalid attributes, auth failure, timeout, not-found, invalid trace ID. Omitted when the error is unrecognized.
Returns (no results): When search returns zero traces, includes hint: { cause, suggestion } suggesting to broaden the query or check the datasource.
Tip: TraceQL: { } matches all traces, resource.service.name for service filter, span.http.status_code for HTTP spans, name for operation name, duration for span duration, status for error/ok filtering. Use minDuration/maxDuration to find performance outliers. Trace-to-Log: search and get results include correlationHint.logQuery — pass it directly to grafana_query_logs to find correlated logs. Log-to-Trace: grafana_query_logs results (with extractFields: true) include traceCorrelation.traceIds — pass any ID to grafana_query_traces with queryType: "get".
Tip (panel re-run): Same as grafana_query — set dashboardUid + panelId to auto-resolve TraceQL and datasource. The tool routes Prometheus/Loki panels to the correct tool with a helpful error.
grafana_create_dashboardtemplate or dashboard (custom JSON) — one required. Optional: title (overrides template default), folderUid (target folder), overwrite (default true).
Returns: { uid, url, status, message, suggestedNext?: [{ template, reason }], validation?: DashboardValidation }. For template-based dashboards, suggestedNext lists complementary templates to deploy next. For custom JSON dashboards, validation dry-runs each panel's PromQL and reports per-panel health — check validation.panelsError for broken queries.
Choose the right template (3-tier SRE drill-down hierarchy):
Tier 1 → System: Start here for overall health.
Tier 2 → Session: Click a session from Tier 1 to investigate.
Tier 3 → Deep Dive: Cost, tool, or SRE details.
| Template | Tier | Domain | Variables | Use When |
|---|---|---|---|---|
| INLINECODE532 | Tier 1 | System overview | INLINECODE533 , $loki, $tempo, $provider, $model, INLINECODE538 | Golden signals, session table with click-to-drill-down, cost, cache, live feeds |
| INLINECODE539 |
$prometheus, $loki, $tempo, $session (textbox) | Per-session trace hierarchy, LLM calls, tool calls, conversation flow |cost-intelligence | Tier 3a | Cost analysis | $prometheus, $loki, $provider, $model | Spending trends, model attribution, cache savings, per-session cost table |tool-performance | Tier 3b | Tool analytics | $prometheus, $loki, $tempo, $tool | Tool leaderboard, latency ranking, error rates, tool traces |sre-operations | Tier 3c | SRE operations | $prometheus, $loki | Queue health, webhooks, stuck sessions, tool loops |genai-observability | — | OTel genai standard | $prometheus, $loki, $tempo, $model, $provider | Industry-standard AI monitoring: token analytics, LLM performance, traces, logs, cache efficiency. Works with any genai data. |node-exporter | — | System/DevOps | $datasource, $instance | Server CPU, memory, disk, network |http-service | — | Web/DevOps | $datasource, $job | HTTP request rate, errors, latency (RED signals) |metric-explorer | — | Any domain | $datasource, $metric | Deep-dive into any single metric from a dropdown |multi-kpi | — | Any domain | $datasource, $metric1..$metric4 | 4-metric KPI overview (business, fitness, finance, IoT) |weekly-review | — | Any domain | $datasource, $metric1, $metric2 | Weekly overview of 2 external metrics with trends + all openclawext* table |
All AI templates have Loki log-to-trace correlation via Tempo + stable UIDs for cross-dashboard navigation.
Example AI health: { "template": "llm-command-center", "title": "My AI Dashboard" }
Example session debug: { "template": "session-explorer", "title": "Session Debug" }
Example cost analysis: { "template": "cost-intelligence", "title": "My AI Costs" }
Example tool analytics: { "template": "tool-performance", "title": "Tool Health" }
Example SRE ops: { "template": "sre-operations", "title": "SRE Health" }
Example GenAI observability: { "template": "genai-observability", "title": "GenAI Observability" }
Example system: { "template": "node-exporter", "title": "Server Health" }
Example generic: { "template": "metric-explorer", "title": "Explore My Data" }
Example multi-KPI: { "template": "multi-kpi", "title": "Business KPIs" }
Example weekly review: { "template": "weekly-review", "title": "My Weekly Review" }
Example custom with validation: INLINECODE590
Custom dashboard validation (returned only for dashboard param, not templates):
validation: { panelsTotal: 3, panelsValid: 1, panelsNoData: 1, panelsError: 1, panelsSkipped: 0, details: [{ panelId: 1, title: "Cost by Model", status: "ok", queries: [{ refId: "A", expr: "...", valid: true, sampleValue: 0.42 }] }, { panelId: 2, title: "Latency", status: "nodata" }, { panelId: 3, title: "Bad Query", status: "error", error: "parse error at char 5" }] }
Panel statuses: ok (query returned data), nodata (valid query, no results — metric may not exist yet), error (PromQL syntax error or datasource issue), skipped (no datasource UID found). Dashboard is always created regardless — validation is informational.
grafana_update_dashboarduid (required), operation (required: add_panel, remove_panel, update_panel, update_metadata, delete).
add_panel params: panel (object with title, type, targets). Auto-layouts below existing panels.
removepanel / updatepanel params: panelId (preferred) or panelTitle (case-insensitive substring fallback). updates (object) for update_panel.
update_metadata params: title, description, tags, time (e.g., { "from": "now-7d", "to": "now" }), refresh (e.g., "1m").
delete params: None besides uid — permanently removes the dashboard. Always confirm with user first.
Example add: { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Error Rate", "type": "timeseries", "targets": [{ "refId": "A", "expr": "rate(errors_total[5m])", "datasource": { "uid": "prom1" } }] } }
Example add (no datasource): { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Latency", "type": "timeseries", "targets": [{ "refId": "A", "expr": "histogram_quantile(0.99, rate(http_duration_bucket[5m]))" }] } } — validation skipped if no datasource UID found, panel still saved.
Example remove: { "uid": "abc123", "operation": "remove_panel", "panelId": 3 }
Example update panel: { "uid": "abc123", "operation": "update_panel", "panelId": 1, "updates": { "title": "New Title", "targets": [{ "refId": "A", "expr": "new_query" }] } }
Example update metadata: { "uid": "abc123", "operation": "update_metadata", "title": "My Dashboard v2", "time": { "from": "now-7d", "to": "now" }, "refresh": "5m" }
Example delete: { "uid": "abc123", "operation": "delete" }
Returns update: { status: "updated", uid, url, version, operation, panelCount, affectedPanel?: { id, title }, changedFields?: [...], queryValidation?: { validated, results, datasourceUid?, skippedReason? } }.
Returns queryValidation: For add_panel and update_panel (when targets change), PromQL queries are dry-run against Grafana. Each result: { refId, expr, valid: boolean, error?: string, sampleValue?: number }. Panel is always saved — validation is informational. If valid: false, check the error field for PromQL syntax issues. If skippedReason is set, no datasource UID was found — include datasource: { uid: "..." } on targets to enable validation.
Returns delete: { status: "deleted", uid, title, message }.
Tip: targets in update_panel replaces entirely — include all targets, not just changed ones. Include datasource.uid on targets for query validation feedback.
grafana_get_dashboarduid (required). Optional: compact (boolean, default false) — return panel titles and types only, no queries or metadata (~70% smaller). audit (boolean, default false) — dry-run each panel's query and add health status.
Example (full): { "uid": "abc123" }
Example (compact overview): { "uid": "abc123", "compact": true }
Example (audit): { "uid": "abc123", "audit": true }
Returns (full): { uid, title, description?, url, tags, time?, refresh?, panelCount, panels: [{ id, title, type, queries: [{ refId, expr }] }], folderUid, created?, updated? }.
Returns (compact): { uid, title, url, tags, panelCount, panels: [{ id, title, type }] }.
Returns (audit): Same as full, plus each panel gets health: { status: "ok"|"nodata"|"error"|"skipped", error?, sampleValue? } and the response includes auditSummary: { ok, nodata, error, skipped }. Resolves template variable datasources ($prometheus, $loki) and replaces expression template vars with wildcards.
Tip: Use audit: true when the user asks "which panels are broken?" or "audit my dashboard" — it replaces N separate grafana_query calls with one tool call. Use compact: true for lightweight overview scans. Omit both when you need query details (before update or share).
grafana_searchquery (required). Optional: tags (array — filter by tags), starred (boolean — only starred), sort ("alpha-asc"/"alpha-desc"), limit (number, default 100), enrich (boolean — add updatedAt + panelCount per result, default false).
Example: { "query": "cost" }
Example with tags: { "query": "", "tags": ["production"] }
Example starred: { "query": "", "starred": true, "limit": 10 }
Example enriched: { "query": "", "enrich": true }
Returns: { count, enriched, dashboards: [{ uid, title, url, tags, folderTitle?, folderUid?, updatedAt?, panelCount? }] }. folderTitle/folderUid always included when dashboard is in a folder. updatedAt (ISO 8601) and panelCount only present when enrich: true — enables staleness detection and reporting without per-dashboard get_dashboard calls.
Tip: Use enrich: true for reporting workflows ("which dashboards are stale?", "give me a summary of all dashboards"). Skip enrichment for simple lookups. After finding a dashboard, use grafana_get_dashboard to inspect panels, grafana_share_dashboard to render a chart, or grafana_update_dashboard to modify it.
grafana_share_dashboarddashboardUid, panelId (required). Optional: from (default "now-6h"), to (default "now"), width (default 1000), height (default 500), theme ("light"/"dark", default "dark").
Example: { "dashboardUid": "abc123", "panelId": 2, "from": "now-6h", "to": "now" }
Returns: Image rendered inline (tier 1), or snapshot URL (tier 2), or deep link (tier 3). Always delivers something. Includes deliveryTier ("image" | "snapshot" | "link"), rendererAvailable (boolean — false when Image Renderer plugin is missing), renderFailureReason (why image rendering failed), and remediation (how to fix it). Tier 3 also includes snapshotFailureReason.
Tip: Use grafana_get_dashboard first to find panel IDs. If rendererAvailable is false, tell the user to install the grafana-image-renderer plugin.
grafana_create_alerttitle, datasourceUid, expr (PromQL), threshold (all required). Optional: evaluation ("instant"/"rate"/"increase", default "instant"), evaluationWindow (default "5m", used with rate/increase), condition (gt/lt/gte/lte, default gt), for (duration, default 5m), folderUid, labels (e.g., { "severity": "warning" }), annotations (e.g., { "summary": "Cost too high" }), noDataState (NoData/Alerting/OK, default NoData).
IMPORTANT: For counter metrics (*_total), always use evaluation: "rate" (per-second rate) or evaluation: "increase" (total change over window). Raw counter values always increase and will immediately breach any threshold. Use "instant" (default) only for gauges.
Example gauge alert: { "title": "High Cost Alert", "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "threshold": 5, "condition": "gt" }
Example rate alert: { "title": "High Error Rate", "datasourceUid": "prom1", "expr": "openclaw_lens_webhook_error_total", "threshold": 0.1, "evaluation": "rate" }
Example increase alert: { "title": "Token Burst", "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total", "threshold": 10000, "evaluation": "increase", "evaluationWindow": "1h" }
Returns: { uid, title, status: "created", datasourceUid, url, evaluation?: { mode, window, evaluatedExpr }, metricValidation: { valid, error?, sampleValue? }, message }. The datasourceUid echoes back which datasource the rule targets (verify correctness). metricValidation dry-runs the expression before creation — valid: true + sampleValue confirms data exists; valid: false + error warns of typos/missing metrics. Alert is always created regardless (metric may not have data yet). When evaluation is "rate" or "increase", validation runs the wrapped expression.
Note: Auto-creates a "Grafana Lens Alerts" folder if no folderUid is specified.
grafana_annotateaction ("create" default, or "list").
Create params: text (required), tags, dashboardUid, panelId, time (epoch ms or relative like "now-2h", default now), timeEnd (epoch ms or relative).
List params: from, to (epoch ms or relative like "now-7d", "now-24h", "now"), tags, limit (default 20).
Time formats: All time params accept epoch ms (e.g., 1700000000000) OR Grafana-style relative strings ("now", "now-1h", "now-7d", "now-30m"). Prefer relative strings — they're simpler and avoid arithmetic errors.
Example create: { "text": "Deployed v2.1.0", "tags": ["deploy", "production"] }
Example create past: { "text": "Incident started", "time": "now-2h", "timeEnd": "now-30m", "tags": ["incident"] }
Example list recent: { "action": "list", "from": "now-7d", "to": "now", "tags": ["deploy"] }
Example list: { "action": "list", "tags": ["deploy"], "limit": 10 }
Returns create: { status: "created", id, message, time, comparisonHint: { beforeWindow: { from, to }, afterWindow: { from, to }, suggestion } }. The comparisonHint provides ready-to-use ISO 8601 time ranges (30-min windows) for before/after comparison via grafana_query — no manual time math needed. For region annotations (with timeEnd), afterWindow starts at timeEnd.
Returns list: { annotations: [{ id, text, tags, time, timeEnd?, dashboardUID?, panelId? }] }.
grafana_check_alertsaction ("list" default, "acknowledge", "list_rules", "delete_rule", "silence", "unsilence", "setup").
List params: None — returns all pending (unacknowledged) alerts. Instances capped at 5 per alert.
Acknowledge params: alertId (required) — marks an alert as investigated.
List rules params: compact (boolean, default false — returns only uid/title/state/condition). Full mode returns all configured alert rules from Grafana with UID, title, condition (PromQL), folder, labels, annotations, AND live evaluation state (normal/firing/pending/nodata/error), health, and lastEvaluation. One call gives the complete alert health picture.
Delete rule params: ruleUid (required) — permanently deletes an alert rule. Get UIDs from list_rules.
Silence params: matchers (required — array of { name, value, isRegex? } from alert's commonLabels), duration (default "2h"), comment (optional).
Unsilence params: silenceId (required) — removes a silence so alerts resume notifying.
Setup params: webhookUrl (optional, auto-detected) — creates webhook contact point and notification policy route in Grafana.
Example list: {}
Example acknowledge: { "action": "acknowledge", "alertId": "alert-1" }
Example list rules: { "action": "list_rules" }
Example list rules compact: { "action": "list_rules", "compact": true }
Example delete rule: { "action": "delete_rule", "ruleUid": "abc123-def456" }
Example silence: { "action": "silence", "matchers": [{ "name": "alertname", "value": "HighCost" }], "duration": "2h", "comment": "Investigating cost spike" }
Example unsilence: { "action": "unsilence", "silenceId": "silence-uuid-123" }
Example setup: { "action": "setup" }
Returns list: { status: "success", alertCount, alerts: [{ id, status, title, message, receivedAt, commonLabels, totalInstances, truncated?, suggestedInvestigation?: { datasourceUid, condition, tool, queryLanguage, hint }, instances: [{ status, labels, annotations, startsAt, values }] }] }. suggestedInvestigation is auto-enriched by matching the alert to its rule — provides the PromQL/LogQL expression, datasource, and tool to use for immediate investigation (eliminates the need for separate list_rules + explore_datasources calls).
Returns acknowledge: { status: "acknowledged", alertId }.
Returns list_rules: { status: "success", ruleCount, rules: [{ uid, title, folder, ruleGroup, state, health, lastEvaluation, for, labels, annotations, condition, updated }] }. state is the live evaluation state: "normal" (not firing), "firing", "pending" (within for duration), "nodata", or "error". Falls back to "unknown" if the eval state API is unavailable. health is "ok", "nodata", "error", or "unknown". condition is the extracted PromQL expression from the rule's data queries.
Returns list_rules (compact): { status: "success", ruleCount, rules: [{ uid, title, state, condition }] }. Minimal fields for multi-tool chains — use when you need a quick overview of all rules without details.
Returns delete_rule: { status: "deleted", ruleUid, message }.
Returns silence: { status: "silenced", silenceId, duration, matchers, message }.
Returns unsilence: { status: "unsilenced", silenceId, message }.
Returns setup: { status: "created", contactPointUid, webhookUrl } or { status: "already_exists", contactPointUid }.
Note: Setup is idempotent — safe to call multiple times. Only alerts with managed_by=openclaw label route to the webhook (auto-added by grafana_create_alert). Use list_rules → delete_rule for full alert lifecycle management (create via grafana_create_alert, list/delete via grafana_check_alerts).
grafana_push_metricsaction ("push" default, "register", "list", "delete").
Push params: metrics (required array) — each: { name, value, labels?, type?, help?, timestamp? }. Names auto-get openclaw_ext_ prefix. timestamp is optional ISO 8601 for historical data (gauge only).
Register params: name (required), type ("gauge"/"counter", default "gauge"), help, labelNames (array), ttlDays.
List params: None — returns all custom metric definitions.
Delete params: name (required) — removes a custom metric.
Example push: { "metrics": [{ "name": "steps_today", "value": 8000 }, { "name": "meetings", "value": 3, "labels": { "type": "standup" } }] }
Example backfill: { "metrics": [{ "name": "steps", "value": 8000, "timestamp": "2025-01-15" }, { "name": "steps", "value": 10500, "timestamp": "2025-01-16" }] }
Example mixed: { "metrics": [{ "name": "steps", "value": 9000, "timestamp": "2025-01-17" }, { "name": "heart_rate", "value": 72 }] }
Example register: { "action": "register", "name": "weight_kg", "type": "gauge", "help": "Body weight", "labelNames": ["person"], "ttlDays": 90 }
Example list: { "action": "list" }
Example delete: { "action": "delete", "name": "old_metric" }
Returns push: { status: "ok", accepted: 2, queryNames: { "openclaw_ext_steps": "openclaw_ext_steps", "openclaw_ext_events": "openclaw_ext_events_total" }, suggestedWorkflow: [{ tool, action, example }], message: "..." }. suggestedWorkflow contains concrete next-step examples using the actual pushed metric names — verify (grafanaquery), visualize (grafanacreatedashboard with metric-explorer template), and alert (grafanacreate_alert, single-metric only). Partial success supported. Timestamped and real-time points in the same batch are both accepted.
Returns register: { status: "registered", metric: { name, type, help, labelNames, ttlMs }, queryName: "openclaw_ext_events_total", suggestedWorkflow: [{ tool, action, example }] }. suggestedWorkflow shows how to push data and query the registered metric (with rate() wrapping for counters).
Returns list: { count, metrics: [{ name, type, queryName, help, labelNames, createdAt, updatedAt }] }.
Returns delete: { status: "deleted", name }.
Note: Push auto-registers unknown metrics. Response includes queryNames with exact PromQL names and suggestedWorkflow with concrete next steps. Follow suggestedWorkflow to complete the push→visualize pipeline. Timestamped pushes are gauge-only — counters with timestamps are rejected. See external-data.md for naming conventions and backfill patterns.
grafana_explain_metricdatasourceUid (required), expr (PromQL or plain metric name, required), period (24h/7d/30d, default 24h), compareWith ("previous" — compare current period with the same-length window immediately before it).
Example: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd" }
Example counter: { "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total" }
Example 7d: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d" }
Example comparison: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d", "compareWith": "previous" }
Example PromQL: { "datasourceUid": "prom1", "expr": "rate(http_requests_total[5m])", "period": "24h" }
Returns: { metricType?, trendQuery?, current: { value, timestamp }, healthContext?: { status, thresholds, description, direction }, trend: { changePercent, direction, first, last }, stats: { min, max, avg, samples }, comparison?: { previousPeriod: { from, to, avg, min, max, samples }, change: { absolute, percentage, direction } }, metadata: { type, help, unit }, suggestedQueries?: [{ query, description }], suggestedBreakdowns?: string[] }. Sections omitted when data unavailable. changePercent is null when first value is zero. healthContext is included for well-known openclaw_lens_* gauge metrics — same as grafana_query.
Counter-aware: Auto-detects counter metrics (via metadata type or _total suffix) and wraps the trend query in rate(expr[5m]). The current value stays raw (cumulative total), but trend and stats show rate of change. metricType field tells you the detected type (counter/gauge/histogram). trendQuery shows the actual PromQL used for trend (only present when different from expr).
Drill-down: For multi-dimensional metrics (metrics with labels like model, token_type, provider), the response includes suggestedQueries — ready-to-use PromQL queries for grafana_query that break down the metric by each label. Counter metrics get rate() wrapping automatically. Use these to investigate cost attribution, identify top contributors, or decompose aggregates.
Breakdowns: suggestedBreakdowns provides label names for decomposition — always available for known OpenClaw metrics (cost, session, queue, webhook families) even when the metric has no data yet. For unknown metrics, falls back to labels discovered from the instant query. Use these labels with grafana_query to build sum by (label) (...) queries for root-cause analysis.
Period comparison: Use compareWith: "previous" for period-over-period analysis (e.g., this week vs. last week). Returns a comparison object with the previous period's stats and the change (absolute, percentage, direction). Works with counters too (compares rates). Eliminates the need for manual multi-query workflows.
Tip: For simple trend context, call with just period. For "did things improve?" questions, add compareWith: "previous". Metadata only available for plain metric names (not complex PromQL). No need to manually wrap counters in rate() — the tool does it automatically.
grafana_security_checklookback (time window, default "1h". Use "24h" for daily review, "7d" for weekly).
Example: {}
Example weekly review: { "lookback": "7d" }
Returns: { overallThreatLevel: "green"|"yellow"|"red", summary, checks: [{ name, status, value, detail }], securityEventLogs, limitations: [...], suggestedActions: [...], dashboardTemplate: "security-overview" }.
Checks (6 parallel, via Promise.allSettled — partial failures return partial results):
webhook_error_ratio — webhook error rate vs received rate. Warning 20%, critical 50%.dashboardTemplate to create a persistent security dashboard. Use suggestedActions for investigation steps.
grafana_investigatefocus (required — alert UID, metric name, or symptom text), timeWindow (optional, default "1h", options: "1h", "6h", "24h"), service (optional, default "openclaw").
Example: { "focus": "high error rate" }
Example with alert: { "focus": "alert-abc", "timeWindow": "6h" }
Example with metric: { "focus": "openclaw_lens_daily_cost_usd", "timeWindow": "24h" }
Returns: { timeWindow, focus, metricSignals: { focus: { current, trend, anomalyScore, anomalySeverity }, red: { rate, errorRate, p95Latency } }, logSignals: { totalVolume, errorCount, bySeverity, topPatterns, sampleErrors }, traceSignals: { errorTraces, slowTraces }, contextSignals: { recentAnnotations, alertsActive }, suggestedHypotheses: [{ hypothesis, evidence, confidence, testWith: { tool, params } }], limitations }.
Auto-discovery: No datasource UID needed — auto-discovers Prometheus, Loki, and Tempo datasources. Loki and Tempo are optional (graceful degradation with limitations noted).
Follow-up: Use suggestedHypotheses[].testWith for deep-dives with specific tools. Use grafana_annotate to mark findings. Use grafana_check_alerts to acknowledge investigated alerts.
alloy_pipelinecreate (deploy pipeline from recipe), list (show managed pipelines), update (change params), delete (remove pipeline), recipes (browse catalog), status (check health), diagnose (Alloy connectivity + all pipelines).
Params (create with recipe): recipe (required), params (recipe-specific), name (optional — auto-generated from recipe name with numeric suffix for uniqueness, e.g., syslog, syslog-2, syslog-3; always read the name field from the response to get the actual assigned name). Log recipes also accept optional processing params: jsonExpressions (object — JSON path extractions), labelFields (object — promote fields to labels), structuredMetadata (object — Loki structured metadata), staticLabels (object), timestampSource (string), outputSource (string), regexExpression (string).
Params (create with raw config): config (raw Alloy River syntax), name (required), signal (optional — "metrics", "logs", "traces", or "profiles", auto-detected from component IDs if omitted), sampleQueries (optional — object with metrics/logs/traces keys containing ready-to-use queries). Response includes exportTargets showing where data is being sent.
Params (update): name (required), params (fields to change — merges with existing). Raw-config pipelines support update via full config replacement — pass config param with the complete new config. Recipe-based pipelines use params for partial updates (merges with existing). Response includes sampleQueries and suggestedWorkflow for chaining into query/dashboard tools.
Params (status): name (required).
Params (recipes): category (optional filter).
Example create (scrape): { "recipe": "scrape-endpoint", "params": { "url": "http://myapp:8080/metrics" } }
Example create (database): { "recipe": "postgres-exporter", "params": { "connectionString": "postgres://user:pass@db:5432/mydb" }, "name": "analytics-db" }
Example create (logs): { "recipe": "docker-logs", "params": { "containerNames": ["myapp", "nginx"] } }
Example create (files): { "recipe": "file-logs", "params": { "paths": ["/var/log/app/*.log"] } }
Example create (K8s): { "recipe": "kubernetes-pods", "params": { "namespaces": ["production"] } }
Example create (OTLP): { "recipe": "otlp-receiver" }
Example create (traces with sampling): { "recipe": "application-traces", "params": { "environment": "staging", "sampleRate": 0.5 } }
Example create (syslog UDP): { "recipe": "syslog", "params": { "protocol": "udp", "listenAddress": "0.0.0.0:5514" } }
Example create (raw config): { "config": "prometheus.scrape ...", "name": "custom-pipeline", "signal": "metrics" }
Example recipes: { "action": "recipes", "category": "logs" }
Example status: { "action": "status", "name": "analytics-db" }
Example diagnose: { "action": "diagnose" }
Returns (create): { status, name, recipe, signal, configFile, reloaded, envVarsRequired, warnings, sampleQueries: { metrics|logs|traces: {...} }, suggestedWorkflow }. status is "created" (active) or "pending_credentials" (env vars not set yet). sampleQueries are ready-to-use PromQL/LogQL/TraceQL — pass directly to query tools. envVarsRequired lists env vars for credential-bearing recipes. warnings contains advisory notes about unrecognized params. Raw-config pipelines may also include credentialWarnings if plaintext credentials were detected.
Returns (recipes): `{ categories, recipes: [{ name, category, signal, summary, requiredParams, optionalParams, hasCreden您拥有完整的原生Grafana访问权限——查询数据、创建仪表盘、设置告警、接收告警通知、标注事件、探索数据源、推送自定义数据,以及内联交付可视化图表。适用于Grafana中的任何数据,而不仅仅是代理指标。
该技能支持在以下平台通过对话安装:
帮我安装 SkillHub 和 grafana-lens-1776197967 技能
设置 SkillHub 为我的优先技能安装源,然后帮我安装 grafana-lens-1776197967 技能
skillhub install grafana-lens-1776197967
文件大小: 490.53 KB | 发布时间: 2026-4-15 11:37
多链集团旗下-闲社网
闲社在线客服
关注闲社网微信
闲社网APP
Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1
Powered by Discuz! X5.0 © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com
