Clone GitLab Skill
Batch-clone all projects from one or more GitLab groups, preserve the folder hierarchy, check out key branches, and generate an Excel index.
When to Use
- User wants to batch-clone all projects from one or more GitLab top-level groups
- Need to maintain the original group/subgroup folder hierarchy locally
- Need to check out multiple branches (default + latest active + release/prod)
- Need an Excel index file (`01.Index.xlsx`) with per-group sheets
Requirements
Before starting, collect from user:
| Parameter | Required | Default | Notes |
|---|---|---|---|
| GitLab URL | ✅ | — | e.g. `https://gitlab.company.com` |
| Personal Access Token | ✅ | — | Needs `read_api` + `read_repository` scopes |
| Target Group(s) | ✅ | — | Group, sub-group, or project paths, comma-separated. Supports a top-level group (`myGroup`), a sub-group path (`myGroup/mySubGroup`), or a direct project path (`myGroup/mySubGroup/my-project`) |
| Local Storage Path | ❌ | `~/Desktop/Code` | Where repos are stored |
| Auth Method | ❌ | HTTPS+Token | Or SSH if a key is configured |
| Mode | ❌ | `clone` | `clone` (default), `update` (pull only), or `sync` (clone + pull + clean up stale Excel rows) |
| Workers | ❌ | 4 | Parallel clone workers (`GITLAB_WORKERS`) |
| Total Timeout | ❌ | 0 (none) | Global timeout in seconds (`GITLAB_TOTAL_TIMEOUT`); 0 = no limit |
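As a rough illustration of how these parameters and their defaults might be read from the environment, here is a minimal sketch. It is not the script's actual code: the `Config` class and `load_config` function are hypothetical, and the names `GITLAB_URL`, `GITLAB_TOKEN`, `GITLAB_BASE_DIR`, and `GITLAB_MODE` are assumed to follow the same `GITLAB_` prefix as the variables named in the table.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Config:
    url: str
    token: str
    groups: list[str]
    base_dir: Path
    mode: str = "clone"
    workers: int = 4
    total_timeout: int = 0  # 0 means no global limit


def load_config(env=None) -> Config:
    """Build a Config from environment variables, applying the table's defaults."""
    env = os.environ if env is None else env
    required = ("GITLAB_URL", "GITLAB_TOKEN", "GITLAB_GROUPS")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    mode = env.get("GITLAB_MODE", "clone")
    if mode not in ("clone", "update", "sync"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Comma-separated paths; tolerate stray spaces, slashes, and empty entries.
    groups = [g.strip().strip("/") for g in env["GITLAB_GROUPS"].split(",") if g.strip()]
    return Config(
        url=env["GITLAB_URL"].rstrip("/"),
        token=env["GITLAB_TOKEN"],
        groups=groups,
        base_dir=Path(env.get("GITLAB_BASE_DIR", "~/Desktop/Code")).expanduser(),
        mode=mode,
        workers=int(env.get("GITLAB_WORKERS", "4")),
        total_timeout=int(env.get("GITLAB_TOTAL_TIMEOUT", "0")),
    )
```

Validating the required variables up front means a missing token fails immediately rather than partway through a long clone run.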
Workflow
Step 1: Collect Input
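Ask the user for the four core parameters above (GitLab URL, token, target groups, and local storage path) and pass them to the script as environment variables:

```
GITLAB_URL, GITLAB_TOKEN, GITLAB_GROUPS (comma-separated), GITLAB_BASE_DIR
```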
Step 2: Run the Script
```bash
cd <skill-dir>/scripts
python3 clone_and_index.py
```
The script handles everything:
1. Resolves each input path and auto-detects whether it is a top-level group, a sub-group, or a direct project
   - For top-level groups: fetches all subgroups recursively (skips any it is denied access to)
   - For sub-group paths (e.g. `myGroup/mySubGroup`): resolves that sub-group directly and syncs only it and its descendants
   - For direct project paths: syncs only that specific project
2. Clones new projects / pulls existing ones (with a 5-minute timeout per project)
3. Uses multiprocessing for parallel clone/pull (kills the process group on timeout to prevent orphaned git processes)
4. Checks out branches: default + latest active + release/prod
5. Incrementally updates `01.Index.xlsx` every 50 projects (so partial results survive crashes)
6. On SIGTERM/SIGINT, emergency-flushes pending results to Excel before exiting
7. Prints per-group project counts during discovery and real-time progress with elapsed time
8. In `sync` mode, the final Excel write removes rows for deleted/archived projects and handles cross-group migrations
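The per-project timeout with a process-group kill could look roughly like the sketch below. It is an illustration under stated assumptions, not the script's implementation: `run_with_timeout` is a hypothetical name, and the approach is POSIX-only because it relies on `os.killpg`.

```python
import os
import signal
import subprocess


def run_with_timeout(args, cwd=None, timeout=300):
    """Run a command in its own process group and kill the whole group on timeout.

    Returns (ok, output). The 300-second default mirrors the 5-minute limit.
    """
    proc = subprocess.Popen(
        args,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child becomes leader of a new process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode == 0, out
    except subprocess.TimeoutExpired:
        # Signal the group so helpers spawned by git (ssh, remote helpers) die too.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited
        proc.communicate()  # reap the child and drain the pipe
        return False, f"timed out after {timeout}s"
```

A call such as `run_with_timeout(["git", "clone", url, dest])` would then return `(False, "timed out after 300s")` for a hung clone instead of leaving a stray git process behind.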
Step 3: Report Results
After the script finishes, report:
- Total projects cloned/updated
- Any failed/timed-out projects (the script prints a summary table)
Modes
Update Only
If the projects already exist locally and the user only wants to update them:

```bash
GITLAB_MODE=update python3 clone_and_index.py
```

This skips cloning and only runs `git fetch --all && git pull` on existing repos, re-checks out the branches, and refreshes the Excel index.
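The branch re-checkout follows the same rule as in the workflow: default + latest active + release/prod. One plausible reading of that rule is sketched below. The function name `pick_branches` is hypothetical, and two points are assumptions rather than documented behavior: "latest active" is taken to mean the branch with the most recent commit, and "release/prod" is taken to mean any branch whose name contains `release` or `prod`.

```python
from datetime import datetime


def pick_branches(branches, default_branch, keywords=("release", "prod")):
    """Return the branch names to check out, without duplicates.

    `branches` maps branch name -> datetime of its latest commit.
    """
    picked = []
    if default_branch in branches:
        picked.append(default_branch)
    if branches:
        # "Latest active": the branch with the newest commit (assumption).
        latest = max(branches, key=lambda name: branches[name])
        if latest not in picked:
            picked.append(latest)
    for name in sorted(branches):
        # "release/prod": substring match on the branch name (assumption).
        if any(word in name.lower() for word in keywords) and name not in picked:
            picked.append(name)
    return picked
```

Keeping the selection in a pure function like this makes the rule easy to adjust if a team uses a different naming scheme for release branches.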
Sync (Full Sync with Cleanup)
For a complete sync that also cleans up stale data in the Excel:

```bash
GITLAB_MODE=sync python3 clone_and_index.py
```
This behaves like clone mode (new repos are cloned, existing ones are pulled), but additionally:
- Removes Excel rows for projects that no longer exist on GitLab (deleted/archived)
- Handles projects that moved between groups (updates the path, removes the old row)
- Only cleans up sheets belonging to the groups specified in `GITLAB_GROUPS` (won't touch other groups' data)
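The cleanup rules above can be sketched as a pure function over in-memory rows. This is only an illustration: `reconcile_rows` and the simplified row shape (`id` and `path` keys) are hypothetical, and matching by project ID is assumed from the hidden Project ID column described in the Excel specification.

```python
def reconcile_rows(sheets, live_projects, synced_groups):
    """Return the sheets as they should look after a sync cleanup.

    sheets:        {sheet_name: [{"id": int, "path": str, ...}, ...]}
    live_projects: {project_id: "group/sub/project"} as currently found on GitLab
    synced_groups: top-level group names covered by this run
    """
    # Sheets outside the synced groups are copied through untouched.
    result = {name: list(rows) for name, rows in sheets.items() if name not in synced_groups}
    seen = set()
    for name, rows in sheets.items():
        if name not in synced_groups:
            continue
        result.setdefault(name, [])
        for row in rows:
            path = live_projects.get(row["id"])
            if path is None or row["id"] in seen:
                continue  # deleted/archived project, or a duplicate stale row
            seen.add(row["id"])
            # A moved project lands on the sheet of its new top-level group.
            top_group = path.split("/", 1)[0]
            result.setdefault(top_group, []).append({**row, "path": path})
    return result
```

Matching on the numeric ID rather than the path is what lets a moved project be recognized as the same row instead of a deletion plus a new entry.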
Excel Specification
File: 01.Index.xlsx inside <local-path>/ (e.g. ~/Desktop/Code/01.Index.xlsx)
Sheets: One sheet per top-level group (sheet name = group name, e.g. "myGroup")
Columns:
| Col | Field | Content |
|---|---|---|
| A | 主Group名称 | Top-level group (e.g. myGroup) |
| B | 子Group路径 | Full group path without project name |
| C | Project路径 | Full path (e.g. myGroup/mySubGroup/myProject) |
| D | Project名称 | Project name |
| E | Project描述 | GitLab description |
| F | 已checkout分支 | All local branches, one per line |
| G | 分支最新提交时间 | Corresponding commit times, one per line |
| H | SSH Git链接 | `ssh_url_to_repo` |
| I | 下载时间 | Clone/update timestamp |
| J | Project ID | GitLab project ID (hidden column, used for matching) |
Sort: A (asc) → B (asc) → D (asc)
Formatting: Frozen header row, thin borders on all cells, F/G columns left-aligned with wrap, other columns center-aligned, UTF-8 encoding.
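To make the column layout and sort order concrete, the sketch below maps a GitLab project record to columns A to J and sorts rows by A, then B, then D. The helper names `build_row` and `sort_rows` are hypothetical. The record keys (`path_with_namespace`, `name`, `description`, `ssh_url_to_repo`, `id`) are fields of GitLab's project API response, and plain case-sensitive string ordering is an assumption.

```python
def build_row(project, branches, downloaded_at):
    """Map a GitLab project record to the ten columns A-J.

    `branches` is a list of (branch_name, commit_time) string pairs.
    """
    full_path = project["path_with_namespace"]
    return [
        full_path.split("/", 1)[0],                # A: top-level group
        full_path.rsplit("/", 1)[0],               # B: group path without project name
        full_path,                                 # C: full project path
        project["name"],                           # D: project name
        project.get("description") or "",          # E: description (may be null)
        "\n".join(name for name, _ in branches),   # F: branches, one per line
        "\n".join(when for _, when in branches),   # G: commit times, one per line
        project["ssh_url_to_repo"],                # H: SSH clone URL
        downloaded_at,                             # I: clone/update timestamp
        project["id"],                             # J: project ID (hidden column)
    ]


def sort_rows(rows):
    """Sort by column A, then B, then D, all ascending."""
    return sorted(rows, key=lambda row: (row[0], row[1], row[3]))
```

Columns F and G are built from the same list so that each commit time stays on the same line as its branch.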
Security
- Token is passed via environment variable, never logged
- After clone, remote URL is reset to remove embedded token
- If clone times out or crashes, a cleanup step removes token from `.git/config`
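Resetting the remote URL amounts to removing the userinfo part from the HTTPS clone URL. A minimal sketch of that step, using a hypothetical helper named `strip_credentials` (the script's own cleanup may work differently):

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return `url` without any embedded user:password (or user:token) part."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing embedded, e.g. an SSH-style git@host:path URL
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

The cleaned value could then be written back with `git remote set-url origin <clean-url>`, which updates the URL stored in `.git/config`.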
Output Structure
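```
~/Desktop/Code/
├── 01.Index.xlsx
├── myGroup/
│   ├── SubGroup1/
│   │   ├── project-a/
│   │   └── project-b/
│   └── SubGroup2/
│       └── project-c/
└── AnotherGroup/
    └── ...
```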