VMware Aria Operations
VMware Aria Operations (vRealize Operations) AI-assisted monitoring — 27 MCP tools for resources, alerts, alert definitions, capacity planning, anomaly detection, report automation, and platform health.
Domain-focused monitoring skill for Aria Operations 8.x / vRealize Operations 8.x.
Companion skills: vmware-nsx (networking), vmware-aiops (VM lifecycle), vmware-monitor (read-only vSphere), vmware-avi (AVI/ALB/AKO), vmware-pilot (workflow orchestration), vmware-policy (audit/policy).
What This Skill Does
| Category | Tools | Count |
|---|---|---|
| Resources | list, get details, metrics, health badge, top consumers | 5 |
| Alerts | list, get details, acknowledge, cancel, list definitions | 5 |
| Alert Definitions | list symptoms, create definition, enable/disable, delete | 4 |
| Capacity | cluster overview, remaining capacity, time remaining, rightsizing | 4 |
| Reports | list templates, generate, list, get status+download URL, delete | 5 |
| Anomaly | list anomalies, risk badge | 2 |
| Health | Aria platform health, collector group status | 2 |
Total: 27 tools (21 read-only + 6 write)
Quick Install
CODEBLOCK0
When to Use This Skill
Performance monitoring (daily proactive checks):
- Check VM contention: CPU Ready %, Memory Balloon, Swap usage
- Fetch time-series metrics for any resource (CPU, memory, disk, network)
- Find top consumers by CPU/memory/disk/network
- Detect ML-based anomalies and risk scores
Alert management:
- List, investigate, acknowledge, or cancel active alerts
- List or filter alert definitions (templates)
- Create new alert definitions from symptom definitions (post-RCA)
- Enable or disable alert definitions; delete obsolete ones
Capacity planning:
- Cluster capacity remaining (CPU, memory, disk headroom)
- Time-until-full prediction per cluster
- Right-sizing: find over-provisioned or under-utilized VMs
- Capacity overview with Aria's built-in recommendations
Report automation:
- Generate scheduled or on-demand reports (capacity, performance, SLA)
- Poll report status until COMPLETED; get PDF/CSV download URL
- Delete generated reports after download
Use companion skills for:
- VM lifecycle: create, clone, snapshot, power → INLINECODE0
- NSX networking: segments, gateways, NAT, routing → INLINECODE1
- vSphere inventory, real-time alarms, events → INLINECODE2
- Storage: iSCSI, vSAN, datastores → INLINECODE3
- Load balancing, AVI/ALB, AKO, Ingress → INLINECODE4
Related Skills — Skill Routing
| User Intent | Recommended Skill |
|---|---|
| Aria Operations monitoring, alerts, capacity | vmware-aria ← this skill |
| VM lifecycle, deployment, guest ops | vmware-aiops |
| NSX networking: segments, gateways, NAT, routing | vmware-nsx |
| Read-only vSphere inventory, events, alarms | vmware-monitor |
| Storage: iSCSI, vSAN, datastores | vmware-storage |
| Multi-step workflows with approval | vmware-pilot |
| Load balancer, AVI, ALB, AKO, Ingress | vmware-avi (uv tool install vmware-avi) |
| Audit log query | vmware-policy (vmware-audit CLI) |
Common Workflows
Daily VM Health Check (Proactive Ops)
Catch contention before users complain. Key metrics: CPU Ready, Memory Balloon, Disk Latency.
1. Find top CPU consumers → INLINECODE7
2. Check CPU Ready on hot VMs → INLINECODE8
   - >5% = warning, >10% = problem, >20% = critical
3. Check memory pressure → INLINECODE9
   - Balloon >0 = ESXi reclaiming memory; Swap >0 = severe, act immediately
4. List active CRITICAL/IMMEDIATE alerts → INLINECODE10
5. Check ML anomalies → INLINECODE11
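
The thresholds in the checklist above can be turned into a small triage helper. The following is a minimal sketch, not part of the skill itself: the function names and the returned labels are hypothetical, and it assumes the metric values have already been fetched as plain numbers.

```python
def classify_cpu_ready(pct: float) -> str:
    """Map a CPU Ready percentage to the bands used in the checklist."""
    if pct > 20:
        return "critical"
    if pct > 10:
        return "problem"
    if pct > 5:
        return "warning"
    return "ok"


def classify_memory(balloon: float, swapped: float) -> str:
    """Any swap is treated as severe; any ballooning means ESXi is reclaiming memory."""
    if swapped > 0:
        return "severe"
    if balloon > 0:
        return "reclaiming"
    return "ok"
```

Swap is checked before balloon because a VM that is both ballooning and swapping should be reported at the more urgent level.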
Investigate High CPU Alert
1. List active CRITICAL alerts → INLINECODE12
2. Get alert details + symptoms → INLINECODE13
3. Find top CPU consumers → INLINECODE14
4. Fetch 24h CPU metrics for the hot VM → INLINECODE15
5. Check risk badge → INLINECODE16
6. Acknowledge the alert → INLINECODE17
Capacity Planning
1. List clusters → INLINECODE18
2. Get remaining capacity → INLINECODE19
3. Predict time until full → INLINECODE20
4. Get capacity overview with recommendations → INLINECODE21
5. Find rightsizing candidates → INLINECODE22
Post-Incident: Create Detection Alert (RCA Follow-up)
After resolving an incident, create an early-warning alert to prevent recurrence:
1. Find matching symptom definition → INLINECODE23
2. Create alert definition referencing symptoms → INLINECODE24
3. Verify it appears in definitions → INLINECODE25
4. Enable it → INLINECODE26
Generate Capacity Report
1. Find report template → INLINECODE27
2. Trigger report generation → INLINECODE28
3. Poll until completed → vmware-aria report get <report-id> (repeat until status == COMPLETED)
4. Download via the returned download_url (PDF) or INLINECODE32
5. Clean up → INLINECODE33
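
The polling step can be automated with a simple loop. The sketch below is illustrative only: `wait_for_report` and its `fetch` callback are hypothetical names, and `fetch` stands in for whatever returns the report record (for example, a wrapper around the get_report tool). It assumes only what the workflow states, namely that the record has a `status` field that eventually equals COMPLETED.

```python
import time
from typing import Callable


def wait_for_report(
    fetch: Callable[[str], dict],
    report_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch(report_id) until status is COMPLETED or the timeout is reached."""
    waited = 0.0
    while True:
        report = fetch(report_id)
        if report.get("status") == "COMPLETED":
            return report
        if waited >= timeout_s:
            raise TimeoutError(f"report {report_id} not completed after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays; the interval and timeout defaults are arbitrary and should be tuned to how long reports take in your environment.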
Multi-Target Operations
All commands accept --target <name> to operate against a specific Aria Ops instance:
CODEBLOCK1
Usage Mode
| Scenario | Recommended | Why |
|---|---|---|
| Local/small models (Ollama, Qwen) | CLI | ~2K tokens vs ~8K for MCP |
| Cloud models (Claude, GPT-4o) | Either | MCP gives structured JSON I/O |
| Automated pipelines | MCP | Type-safe parameters, structured output |
MCP Tools (27 — 21 read, 6 write)
All MCP tools accept an optional target parameter to select which Aria Operations instance to connect to.
| Category | Tool | Type | Description |
|---|---|---|---|
| Resource | INLINECODE36 | Read | List VMs, hosts, clusters by resource kind |
| | get_resource | Read | Get resource details with health, risk, efficiency badges |
| | get_resource_metrics | Read | Fetch time-series metric stats for any resource |
| | get_resource_health | Read | Get health badge score (0–100) |
| | get_top_consumers | Read | Rank resources by CPU, memory, disk, or network usage |
| Alerts | list_alerts | Read | List active alerts with criticality and resource info |
| | get_alert | Read | Get alert details: symptoms and recommendations |
| | acknowledge_alert | Write | Mark an alert as acknowledged (does not close it) |
| | cancel_alert | Write | Cancel (dismiss) an active alert |
| | list_alert_definitions | Read | List alert templates configured in Aria Ops |
| Alert Defs | list_symptom_definitions | Read | List symptom definitions; use their IDs when creating alert defs |
| | create_alert_definition | Write | Create new alert definition from symptom definition IDs |
| | set_alert_definition_state | Write | Enable or disable an alert definition |
| | delete_alert_definition | Write | Delete an alert definition permanently |
| Capacity | get_capacity_overview | Read | Cluster capacity recommendations from Aria |
| | get_remaining_capacity | Read | Remaining CPU, memory, disk before hitting limits |
| | get_time_remaining | Read | Days until cluster capacity is exhausted |
| | list_rightsizing_recommendations | Read | VMs to resize: over/under-provisioned |
| Reports | list_report_definitions | Read | List available report definition templates |
| | generate_report | Write | Trigger report generation (async; returns report_id) |
| | list_reports | Read | List generated reports, optionally by definition |
| | get_report | Read | Poll report status + get PDF/CSV download URLs |
| | delete_report | Write | Delete a generated report |
| Anomaly | list_anomalies | Read | Machine-learning anomalies across monitored resources |
| | get_resource_riskbadge | Read | Risk score (0–100): likelihood of future problems |
| Health | get_aria_health | Read | Aria platform internal services health |
| | list_collector_groups | Read | Collector agents status and connectivity |
Read/write split: 21 read-only, 6 write. All write operations are audit-logged to ~/.vmware/audit.db (via vmware-policy).
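
For automated pipelines that speak MCP directly, a tool invocation is a JSON-RPC `tools/call` request. The sketch below shows how such a request might be assembled with the optional target parameter described above. `build_tool_call` is a hypothetical helper, and the `criticality` argument name in the usage example is an assumption, not taken from the tool's schema.

```python
import json
from typing import Optional


def build_tool_call(
    name: str,
    arguments: Optional[dict] = None,
    target: Optional[str] = None,
    request_id: int = 1,
) -> str:
    """Serialize an MCP tools/call request, adding `target` only when given."""
    args = dict(arguments or {})
    if target is not None:
        args["target"] = target
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    }
    return json.dumps(request)


# Usage: list alerts on the instance named "prod".
# The "criticality" argument name is assumed for illustration.
payload = build_tool_call("list_alerts", {"criticality": "CRITICAL"}, target="prod")
```

Most MCP client libraries build this envelope for you; the sketch is mainly useful for seeing where the target parameter sits in the request.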
CLI Quick Reference
CODEBLOCK2
Key Metric Names (for resource metrics command)
| Metric | API Key | What It Means |
|---|---|---|
| CPU Ready % | INLINECODE65 | vCPU waiting for physical core; >5% = warning |
| CPU Used | cpu.used.summation | Actual CPU execution time |
| CPU Demand | cpu.demand.average | Total MHz requested by VM |
| Memory Active | mem.active.average | Actively used by guest OS (sizing) |
| Memory Consumed | mem.consumed.average | Footprint on host (capacity) |
| Memory Balloon | mem.balloon.average | >0 = ESXi reclaiming memory |
| Memory Swap | mem.swapped.average | >0 = severe pressure |
| Disk Read Latency | disk.read.average | Read I/O latency (ms) |
| Disk Write Latency | disk.write.average | Write I/O latency (ms) |
| Net Received | net.received.average | Inbound network (KB/s) |
| Net Transmitted | net.transmitted.average | Outbound network (KB/s) |
Full CLI reference with all options and output formats: see INLINECODE76
Troubleshooting
"Token not found" error after setup
The token acquisition request failed. Verify:
1. Aria Ops is reachable: INLINECODE77
2. The auth_source in config matches your environment (LOCAL, LDAP, AD)
3. The password env var follows the naming convention: INLINECODE79
Resources appear missing from list_resources
The collector agent may be offline. Check list_collector_groups for any collectors in a non-RUNNING state. Restart the affected collector from the Aria Ops UI under Administration > Collector Groups.
Metrics return empty data
The resource may not have metric collection configured, or the requested metric key is incorrect. Verify metric keys against the resource's available metrics in the Aria Ops UI (Metrics tab on the resource detail page).
"Password not found" error
Variable names follow the pattern VMWARE_ARIA_<TARGET_NAME_UPPER>_PASSWORD where hyphens become underscores. Example: target prod needs VMWARE_ARIA_PROD_PASSWORD. Check your ~/.vmware-aria/.env file.
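
To check which variable a given target expects, the naming rule can be expressed in a few lines. This is a sketch of the rule as described above, not the skill's own code; `password_env_var` and `load_password` are hypothetical names.

```python
import os
from typing import Mapping


def password_env_var(target: str) -> str:
    """Upper-case the target name and replace hyphens with underscores."""
    return f"VMWARE_ARIA_{target.upper().replace('-', '_')}_PASSWORD"


def load_password(target: str, environ: Mapping[str, str] = os.environ) -> str:
    """Look up the password, naming the expected variable if it is missing."""
    name = password_env_var(target)
    if name not in environ:
        raise KeyError(f"Password not found: set {name}")
    return environ[name]
```

For example, a target named `lab-east` (a made-up name) would map to `VMWARE_ARIA_LAB_EAST_PASSWORD`.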
Safety
- Read-heavy: 21 of 27 tools are read-only
- Audit logging: Write operations logged to ~/.vmware/audit.db (SQLite WAL, via vmware-policy) with timestamp, user, target, operation, and result
- Token expiry handling: OpsToken refreshed automatically 60 seconds before expiry (30-minute validity window)
- Prompt injection defense: API text values sanitized via _sanitize(), which strips control characters and truncates to 500 chars
- Credential safety: Passwords loaded only from environment variables (.env file), never from INLINECODE88
- Input validation: resource_id and alert_id validated before API calls; criticality values validated against known enum
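
As an illustration of the prompt injection defense, the sketch below shows one way a sanitizer with the described behavior could look. It is not the skill's actual `_sanitize()` implementation. Here "control characters" is taken to mean Unicode category `Cc`, which also removes newlines and tabs; the real function may treat whitespace differently.

```python
import unicodedata


def sanitize(text: str, max_len: int = 500) -> str:
    """Drop control characters (Unicode category Cc), then truncate to max_len."""
    cleaned = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    return cleaned[:max_len]
```

Stripping before truncating means removed characters do not count toward the 500-character limit.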
Setup
CODEBLOCK3
All tools are automatically audited via vmware-policy. Audit logs: INLINECODE89
Full setup guide with multi-target config, MCP server setup, and Docker: see INLINECODE90
Architecture
CODEBLOCK4
The MCP server uses stdio transport (local only, no network listener). Connections to Aria Ops use HTTPS on port 443 with OpsToken authentication (30-minute token validity, auto-refreshed).
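
The refresh timing can be sketched as follows, using the 30-minute validity stated above and the 60-second refresh margin listed under Safety. The `OpsToken` dataclass, `ensure_fresh`, and the `acquire` callback are hypothetical names for illustration; the skill's internal token handling may be structured differently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

TOKEN_VALIDITY_S = 30 * 60  # 30-minute validity window
REFRESH_MARGIN_S = 60       # refresh this many seconds before expiry


@dataclass
class OpsToken:
    value: str
    issued_at: float

    def needs_refresh(self, now: float) -> bool:
        """True once the token is within the refresh margin of its expiry."""
        return now >= self.issued_at + TOKEN_VALIDITY_S - REFRESH_MARGIN_S


def ensure_fresh(
    token: Optional[OpsToken],
    acquire: Callable[[], str],
    now: Callable[[], float] = time.time,
) -> OpsToken:
    """Return the current token, acquiring a new one if it is missing or near expiry."""
    current = now()
    if token is None or token.needs_refresh(current):
        return OpsToken(value=acquire(), issued_at=current)
    return token
```

With these constants a token is replaced 1,740 seconds after issue, so a request started just before the refresh point still has about a minute of validity left.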
Audit & Safety
All operations are automatically audited via vmware-policy (@vmware_tool decorator):
- Every tool call logged to ~/.vmware/audit.db (SQLite, framework-agnostic)
- Policy rules enforced via ~/.vmware/rules.yaml (deny rules, maintenance windows, risk levels)
- Risk classification: each tool tagged as low/medium/high/critical
- View recent operations: INLINECODE94
- View denied operations: INLINECODE95
vmware-policy is automatically installed as a dependency — no manual setup needed.
License
MIT — github.com/zw008/VMware-Aria