cluster

# Cluster: Data Clustering Analysis Tool

Cluster is a command-line data clustering analysis tool that supports k-means and hierarchical clustering algorithms. It reads numerical data from CSV/JSONL sources, performs clustering, evaluates cluster quality, and exports results.

Data is stored in `~/.cluster/data.jsonl` as JSONL records. Each record represents a clustering run with its parameters, assignments, centroids, and evaluation metrics.

## Prerequisites

- Python 3.8+ with standard library (no external packages required for basic operations)
- `bash` shell

## Commands

### `run`

Run a clustering algorithm on input data.

**Environment Variables:**

- `INPUT` (required): Path to input CSV/JSONL file with numerical data
- `K`: Number of clusters (default: 3)
- `ALGORITHM`: Algorithm to use: `kmeans` or `hierarchical` (default: kmeans)
- `MAX_ITER`: Maximum iterations for k-means (default: 100)
- `SEED`: Random seed for reproducibility

**Example:**

```bash
INPUT=/path/to/data.csv K=5 ALGORITHM=kmeans bash scripts/script.sh run
```

### `assign`

Assign new data points to existing clusters from a previous run.

**Environment Variables:**

- `RUN_ID` (required): ID of the clustering run to use
- `INPUT` (required): Path to new data points (CSV/JSONL)

**Example:**

```bash
RUN_ID=abc123 INPUT=/path/to/new_data.csv bash scripts/script.sh assign
```

### `centroids`

Display or export centroid coordinates for a clustering run.

**Environment Variables:**

- `RUN_ID` (required): ID of the clustering run
- `FORMAT`: Output format: `table`, `json`, `csv` (default: table)

### `evaluate`

Evaluate clustering quality with silhouette score, inertia, and Davies-Bouldin index.

**Environment Variables:**

- `RUN_ID` (required): ID of the clustering run to evaluate

### `visualize`

Generate a text-based or ASCII visualization of cluster assignments.

**Environment Variables:**

- `RUN_ID` (required): ID of the clustering run
- `DIMS`: Dimensions to plot, comma-separated (default: first two)

### `export`

Export clustering results to a file.

**Environment Variables:**

- `RUN_ID` (required): ID of the run to export
- `OUTPUT`: Output file path (default: stdout)
- `FORMAT`: Export format: `json`, `csv`, `jsonl` (default: json)

### `import`

Import a previously exported clustering run.

**Environment Variables:**

- `INPUT` (required): Path to the file to import

### `config`

View or update configuration settings.

**Environment Variables:**

- `KEY`: Configuration key to set
- `VALUE`: Configuration value

### `list`

List all stored clustering runs with summary info.

**Environment Variables:**

- `LIMIT`: Maximum runs to display (default: 20)
- `SORT`: Sort field: `date`, `k`, `score` (default: date)

### `stats`

Show aggregate statistics across all clustering runs.

### `help`

Display usage information and available commands.

### `version`

Display the current version of the cluster tool.

## Data Storage

All clustering runs are stored in `~/.cluster/data.jsonl`. Each line is a JSON object with fields:

- `id`: Unique run identifier
- `timestamp`: ISO 8601 creation time
- `algorithm`: Algorithm used
- `k`: Number of clusters
- `centroids`: List of centroid coordinates
- `assignments`: Mapping of data point indices to cluster IDs
- `metrics`: Evaluation metrics (silhouette, inertia, etc.)
- `input_file`: Source data file path
- `num_points`: Number of data points clustered

## Configuration

Config is stored in `~/.cluster/config.json`. Available keys:

- `default_k`: Default number of clusters (default: 3)
- `default_algorithm`: Default algorithm (default: kmeans)
- `max_iterations`: Default max iterations (default: 100)
- `random_seed`: Default random seed (default: 42)

---

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
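As a rough sketch of how the stored records might be consumed outside the tool, the snippet below reads a JSONL store laid out as described under Data Storage and picks the run with the highest silhouette score. It assumes `metrics` is a JSON object with a `silhouette` key, which the field list suggests but does not spell out. `load_runs` and `best_run` are hypothetical helper names used for illustration, not part of the cluster tool.

```python
import json


def load_runs(path):
    """Parse a JSONL file into a list of run records, skipping blank lines."""
    runs = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                runs.append(json.loads(line))
    return runs


def best_run(runs, metric="silhouette"):
    """Return the run with the highest value for `metric`, or None.

    Assumption: each record's `metrics` field is a dict keyed by metric
    name. Runs that lack the requested metric are ignored.
    """
    scored = [r for r in runs if metric in (r.get("metrics") or {})]
    return max(scored, key=lambda r: r["metrics"][metric], default=None)
```

Under these assumptions, `best_run(load_runs(os.path.expanduser("~/.cluster/data.jsonl")))` (after `import os`) would return the stored record with the best silhouette score, or `None` if no run has one. Higher is better for silhouette; for a metric such as inertia or the Davies-Bouldin index, where lower is better, `min` would be the appropriate choice instead.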
