When to Use
Use when the main artifact is a Microsoft Excel workbook or spreadsheet file, especially when formulas, dates, formatting, merged cells, workbook structure, or cross-platform behavior matter.
Core Rules
1. Choose the workflow by job, not by habit
- - Use
pandas for analysis, reshaping, and CSV-like tasks. - Use
openpyxl when formulas, styles, sheets, comments, merged cells, or workbook preservation matter. - Treat CSV as plain data exchange, not as an Excel feature-complete format.
- Reading values, preserving a live workbook, and building a model from scratch are different spreadsheet jobs.
2. Dates are serial numbers with legacy quirks
- - Excel stores dates as serial numbers, not real date objects.
- The 1900 date system includes the false leap-day bug, and some workbooks use the 1904 system.
- Time is fractional day data, so formatting and conversion both matter.
- Date correctness is not enough if the number format still displays the wrong thing to the user.
3. Keep calculations in Excel when the workbook should stay live
- - Write formulas into cells instead of hardcoding derived results from Python.
- Use references to assumption cells instead of magic numbers inside formulas.
- Cached formula values can be stale, so do not trust them blindly after edits.
- Check copied formulas for wrong ranges, wrong sheets, and silent off-by-one drift before delivery.
- Absolute and relative references are part of the logic, so copied formulas can be wrong even when they still "work".
- Test new formulas on a few representative cells before filling them across a whole block.
- Verify denominators, named ranges, and precedent cells before shipping formulas that depend on them.
- A workbook should ship with zero formula errors, not with known
#REF!, #DIV/0!, #VALUE!, #NAME?, or circular-reference fallout left for the user to fix. - For model-style work, document non-obvious hardcodes, assumptions, or source inputs in comments or nearby notes.
4. Protect data types before Excel mangles them
- - Long identifiers, phone numbers, ZIP codes, and leading-zero values should usually be stored as text.
- Excel silently truncates numeric precision past 15 digits.
- Mixed text-number columns need explicit handling on read and on write.
- Scientific notation, auto-parsed dates, and stripped leading zeros are common corruption, not cosmetic issues.
5. Preserve workbook structure before changing content
- - Existing templates override generic styling advice.
- Only the top-left cell of a merged range stores the value.
- Hidden rows, hidden columns, named ranges, and external references can still affect formulas and outputs.
- Shared strings, defined names, and sheet-level conventions can matter even when the visible cells look simple.
- Match styles for newly filled cells instead of quietly introducing a new visual system.
- If the workbook is a template, preserve sheet order, widths, freezes, filters, print settings, validations, and visual conventions unless the task explicitly changes them.
- Conditional formatting, filters, print areas, and data validation often carry business meaning even when users only mention the numbers.
- If there is no existing style guide and the file is a model, keep editable inputs visually distinguishable from formulas, but never override an established template to force a generic house style.
6. Recalculate and review before delivery
- - Formula strings alone are not enough if the recipient needs current values.
- INLINECODE6 preserves formulas but does not calculate them.
- Verify no
#REF!, #DIV/0!, #VALUE!, #NAME?, or circular-reference fallout remains. - If layout matters, render or visually review the workbook before calling it finished.
- Be careful with read modes: opening a workbook for values only and then saving can flatten formulas into static values.
- If assumptions or hardcoded overrides must stay, make them obvious enough that the next editor can audit the workbook.
7. Scale the workflow to the file size
- - Large workbooks can fail for boring reasons: memory spikes, padded empty rows, and slow full-sheet reads.
- Use streaming or chunked reads when the file is big enough that loading everything at once becomes fragile.
- Large-file workflows also need narrower reads, explicit dtypes, and sheet targeting to avoid accidental damage.
Common Traps
- - Type inference on read can leave numbers as text or convert IDs into damaged numeric values.
- Column indexing varies across tools, so off-by-one mistakes are common in generated formulas.
- Newlines in cells need wrapping to display correctly.
- External references break easily when source files move.
- Password protection in old Excel workflows is not serious security.
- INLINECODE11 can contain macros, and
.xls remains a tighter legacy format. - Large files may need streaming reads or more careful memory handling.
- Google Sheets and LibreOffice can reinterpret dates, formulas, or styling differently from Excel.
- Dynamic array or newer Excel functions like
FILTER, XLOOKUP, SORT, or SEQUENCE may fail or degrade in older viewers. - A workbook can look fine while still carrying stale cached values from a prior recalculation.
- Saving the wrong workbook view can replace formulas with cached values and quietly destroy a live model.
- Copying formulas without checking relative references can push one bad range across an entire block.
- Hidden sheets, named ranges, validations, and merged areas often keep business logic that is invisible in a quick skim.
- A workbook can appear numerically correct while still failing because filters, conditional formats, print settings, or data validation were stripped.
- A workbook can be numerically correct and still fail visually because wrapped text, clipped labels, or narrow columns were never reviewed.
Related Skills
Install with
clawhub install <slug> if user confirms:
- -
csv — Plain-text tabular import and export workflows. - INLINECODE19 — General data handling patterns before spreadsheet output.
- INLINECODE20 — Higher-level analysis that can feed workbook deliverables.
Feedback
- - If useful: INLINECODE21
- Stay updated: INLINECODE22
何时使用
当主要交付物是Microsoft Excel工作簿或电子表格文件时使用,尤其是涉及公式、日期、格式、合并单元格、工作簿结构或跨平台兼容性问题时。
核心规则
1. 根据任务选择工作流程,而非习惯
- - 使用pandas进行数据分析、重塑和类似CSV的任务。
- 当涉及公式、样式、工作表、批注、合并单元格或工作簿保留时,使用openpyxl。
- 将CSV视为纯数据交换格式,而非Excel功能完备的格式。
- 读取数值、保留动态工作簿和从头构建模型属于不同的电子表格任务。
2. 日期是带有历史遗留问题的序列号
- - Excel将日期存储为序列号,而非真正的日期对象。
- 1900日期系统包含错误的闰日漏洞,部分工作簿使用1904系统。
- 时间是小数天数数据,因此格式化和转换都很重要。
- 如果数字格式仍向用户显示错误内容,仅日期正确是不够的。
3. 当工作簿需保持动态时,将计算保留在Excel中
- - 将公式写入单元格,而非硬编码Python计算的结果。
- 使用对假设单元格的引用,而非公式中的魔法数字。
- 缓存的公式值可能过时,编辑后不要盲目信任。
- 交付前检查复制公式的范围错误、工作表错误和无声的偏移偏差。
- 绝对引用和相对引用是逻辑的一部分,因此复制的公式即使有效也可能出错。
- 在填充整个区域前,先在几个代表性单元格上测试新公式。
- 在交付依赖分母、命名范围和前置单元格的公式前,验证这些元素。
- 工作簿交付时应零公式错误,不应留下已知的#REF!、#DIV/0!、#VALUE!、#NAME?或循环引用问题让用户修复。
- 对于模型类工作,在批注或附近备注中记录不明显的硬编码、假设或源输入。
4. 在Excel篡改数据前保护数据类型
- - 长标识符、电话号码、邮政编码和带前导零的值通常应存储为文本。
- Excel会静默截断超过15位的数字精度。
- 混合文本-数字列在读写时都需要显式处理。
- 科学记数法、自动解析日期和去除前导零是常见的数据损坏问题,而非外观问题。
5. 在修改内容前保留工作簿结构
- - 现有模板优先于通用样式建议。
- 合并范围仅左上角单元格存储值。
- 隐藏行、隐藏列、命名范围和外部引用仍可能影响公式和输出。
- 共享字符串、定义名称和工作表级约定即使可见单元格看起来简单也可能重要。
- 为新填充的单元格匹配样式,而非静默引入新的视觉系统。
- 如果工作簿是模板,除非任务明确更改,否则保留工作表顺序、列宽、冻结窗格、筛选器、打印设置、数据验证和视觉约定。
- 条件格式、筛选器、打印区域和数据验证通常承载业务含义,即使用户只提到数字。
- 如果没有现有样式指南且文件是模型,保持可编辑输入与公式在视觉上可区分,但切勿覆盖已有模板强制使用通用样式。
6. 交付前重新计算并审查
- - 如果接收方需要当前值,仅公式字符串是不够的。
- openpyxl保留公式但不进行计算。
- 确认没有残留的#REF!、#DIV/0!、#VALUE!、#NAME?或循环引用问题。
- 如果布局重要,在宣布完成前渲染或视觉审查工作簿。
- 注意读取模式:仅读取值打开工作簿然后保存,可能将公式扁平化为静态值。
- 如果必须保留假设或硬编码覆盖,使其足够明显,以便后续编辑者可以审计工作簿。
7. 根据文件大小调整工作流程
- - 大型工作簿可能因无聊原因失败:内存激增、填充的空行和缓慢的全表读取。
- 当文件大到一次性加载变得脆弱时,使用流式或分块读取。
- 大文件工作流程还需要更窄的读取范围、显式的数据类型和针对特定工作表,以避免意外损坏。
常见陷阱
- - 读取时的类型推断可能将数字保留为文本或将ID转换为损坏的数值。
- 不同工具的列索引不同,因此生成的公式中常见偏移一位的错误。
- 单元格中的换行需要自动换行才能正确显示。
- 外部引用在源文件移动时容易断开。
- 旧版Excel工作流程中的密码保护并非真正的安全措施。
- .xlsm可能包含宏,.xls仍是更严格的旧版格式。
- 大文件可能需要流式读取或更谨慎的内存处理。
- Google Sheets和LibreOffice可能以不同于Excel的方式重新解释日期、公式或样式。
- 动态数组或更新的Excel函数如FILTER、XLOOKUP、SORT或SEQUENCE可能在旧版查看器中失败或降级。
- 工作簿可能看起来正常,但仍携带先前重新计算中过时的缓存值。
- 保存错误的工作簿视图可能用缓存值替换公式,静默破坏动态模型。
- 复制公式时不检查相对引用,可能将一个错误范围传播到整个区域。
- 隐藏工作表、命名范围、数据验证和合并区域通常承载快速浏览中不可见的业务逻辑。
- 工作簿可能在数值上正确,但因筛选器、条件格式、打印设置或数据验证被剥离而失败。
- 工作簿可能在数值上正确,但因自动换行文本、裁剪标签或过窄的列宽未经审查而在视觉上失败。
相关技能
如果用户确认,使用clawhub install 安装:
- - csv — 纯文本表格数据的导入和导出工作流程。
- data — 电子表格输出前的一般数据处理模式。
- data-analysis — 可为工作簿交付物提供输入的高级分析。
反馈
- - 如果有用:clawhub star excel-xlsx
- 保持更新:clawhub sync