Tax Filing
A guided workflow for preparing US federal income tax returns. This skill covers all filer types — US citizens, resident aliens (RA), and nonresident aliens (NRA) — by first determining the correct filer type, then routing to the appropriate forms and procedures. Both citizen/RA and NRA workflows are fully self-contained in this skill, including PDF form field mappings, cross-form validation, and the safe update_form.py script.
Step 1: Gather Source Documents
Before anything else, ask the user what documents they have. Common source docs:
| Document | What it tells you |
|---|---|
| W-2 | Wages, federal/state tax withheld, employer HSA contributions |
| 1099-NEC | Contractor / self-employment income |
| 1099-INT | Bank interest |
| 1099-DIV | Dividends (qualified and ordinary) |
| 1099-B | Stock/crypto sales (proceeds and cost basis) |
| 1099-MISC | Other income (royalties, rents, etc.) |
| 1099-SA / 5498-SA | HSA distributions and contributions |
| 1098 | Mortgage interest paid |
| 1098-T | Tuition paid (education credits) |
| I-94 | Travel history (needed for NRA determination) |
If the user has an I-94 or mentions a visa type, that's a strong signal they may be NRA — proceed to Step 2 with that in mind.
Step 2: Determine Filer Type
This is the critical routing decision. Read references/filing-status.md for the full decision tree. The short version:
1. US citizen or green card holder → Resident. File Form 1040. Go to Step 3a.
2. Visa holder (F-1 including OPT, J-1, H-1B, etc.) → Apply the Substantial Presence Test (SPT):
   - Count days present: current year days + (1/3 × prior year days) + (1/6 × two years ago days). The test also requires at least 31 days of presence in the current year.
   - If total ≥ 183 → Resident alien (unless an exemption applies). File Form 1040. Go to Step 3a.
   - If total < 183 → Nonresident alien. File Form 1040-NR. Go to Step 3b.
   - Exemption: F-1 and J-1 students are exempt from SPT for their first 5 calendar years, so days in that period do not count toward the total. They remain NRA. File Form 1040-NR. Go to Step 3b.
3. Dual-status (changed status mid-year) → complex case. Note it for the user and suggest professional review for the transition period.
Ask the user directly if unclear. Don't assume.
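The day-count formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the bundled scripts: the names `PresenceDays`, `spt_total`, and `meets_spt` are hypothetical, and the inputs are assumed to be countable days only (exempt F-1/J-1 days already removed).

```python
from dataclasses import dataclass


@dataclass
class PresenceDays:
    """Countable days of US presence (exempt days assumed already excluded)."""
    current_year: int
    prior_year: int
    two_years_ago: int


def spt_total(days: PresenceDays) -> float:
    """Weighted count: current year + 1/3 prior year + 1/6 two years ago."""
    return days.current_year + days.prior_year / 3 + days.two_years_ago / 6


def meets_spt(days: PresenceDays) -> bool:
    """True when the weighted total reaches 183 and the current year has 31+ days."""
    return days.current_year >= 31 and spt_total(days) >= 183


# Hypothetical example: 120 countable days in each of the three years
example = PresenceDays(current_year=120, prior_year=120, two_years_ago=120)
print(spt_total(example))  # 120 + 40 + 20 = 180.0
print(meets_spt(example))  # False, because 180 is below 183
```

A result of `True` only says the day count is met; the closer-connection and treaty exceptions in references/filing-status.md still need a human check.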
Step 3a: Citizen / Resident Alien Workflow (Form 1040)
Read references/form-routing.md to determine which schedules and forms are needed based on the user's income types. For field-level details on individual schedule lines and common pitfalls, read references/common-schedules.md when filling specific forms.
Workflow
1. Determine filing status: Single, MFJ, MFS, HOH, QSS (see references/filing-status.md)
2. Map income to forms: use the routing table in references/form-routing.md
3. Standard vs. itemized deduction: the 2025 standard deduction is $15,750 (Single) and $31,500 (MFJ) after the July 2025 law change; the earlier inflation-adjusted figures were $15,000 and $30,000. Confirm the amount against the Form 1040 instructions for the year being filed. Itemize only if total Schedule A deductions exceed the standard deduction.
4. Calculate key amounts from source docs:
   - Total wages (sum of all W-2 Box 1)
   - Total interest/dividends (1099-INT/DIV)
   - Net self-employment income (1099-NEC minus expenses → Schedule C → Schedule SE)
   - Capital gains/losses (1099-B → Form 8949 → Schedule D)
   - Above-the-line deductions (HSA, student loan interest, SE tax deduction → Schedule 1)
   - AGI = Total income - Adjustments
   - Taxable income = AGI - Deduction (standard or itemized)
5. Fill PDF forms: use scripts/update_form.py (bundled with this skill) or write equivalent code following the three critical rules:
   - Never write to the same path as input
   - Always use auto_regenerate=False
   - Iterate all pages
6. Cross-validate: check the validation rules below
7. Final review: re-extract all fields and confirm consistency
Cross-Form Validation Rules (Form 1040)
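Line numbers below follow the 2024 Form 1040 layout; confirm them against the form year being filed.

1. W-2 Box 1 (all) → 1040 Line 1a (total wages)
2. Schedule 1 Line 10 → 1040 Line 8 (additional income)
3. Schedule 1 Line 26 → 1040 Line 10 (adjustments)
4. 1040 Line 9 (total income) = Lines 1z + 2b + 3b + 4b + 5b + 6b + 7 + 8 (this reduces to Line 1z + Line 8 only when there is no interest, dividend, retirement, Social Security, or capital gain income)
5. 1040 Line 11 (AGI) = Line 9 - Line 10
6. 1040 Line 12 = standard deduction or Schedule A total; Line 13 = qualified business income deduction; Line 14 = Line 12 + Line 13
7. 1040 Line 15 (taxable income) = Line 11 - Line 14, not less than zero
8. Schedule C Line 31 (net profit) → Schedule SE Line 2
9. Schedule SE Line 12 (SE tax) → Schedule 2 Line 4; Schedule SE Line 13 (deduction for half of SE tax) → Schedule 1 Line 15
10. Schedule D Line 16 or 21 → 1040 Line 7 (capital gain/loss)
11. W-2 Box 2 (all) → 1040 Line 25a (federal tax withheld)
12. Estimated payments (1040-ES) → 1040 Line 26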
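Two of the arithmetic checks above (total income and AGI) can be automated once field values have been re-extracted. The sketch below is illustrative only: `check_form_1040` and the `INCOME_LINES` tuple are hypothetical names, the set of income lines is an assumption based on the 2024 form, and the sample dollar amounts are made up.

```python
from decimal import Decimal

# Assumption: these are the income lines summed into Line 9 (2024 Form 1040 layout).
INCOME_LINES = ("1z", "2b", "3b", "4b", "5b", "6b", "7", "8")


def check_form_1040(lines: dict) -> list:
    """Return a list of mismatch messages; an empty list means both checks passed."""
    def amount(key: str) -> Decimal:
        # Missing lines are treated as zero, as on a blank form line.
        return Decimal(str(lines.get(key, "0")))

    problems = []
    expected_total = sum((amount(k) for k in INCOME_LINES), Decimal("0"))
    if amount("9") != expected_total:
        problems.append(f"Line 9 is {amount('9')}, expected {expected_total}")
    expected_agi = amount("9") - amount("10")
    if amount("11") != expected_agi:
        problems.append(f"Line 11 is {amount('11')}, expected {expected_agi}")
    return problems


# Hypothetical extracted values keyed by line label
filled = {"1z": 85000, "2b": 120, "8": 4000, "9": 89120, "10": 1500, "11": 87620}
print(check_form_1040(filled))  # [] when the lines agree
```

Returning messages instead of raising lets one pass report every inconsistency, which suits the re-extract-and-verify loop used in this workflow.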
Common Mistakes (Citizen/RA)
- Forgetting to file Schedule SE when you have 1099-NEC income (self-employment tax is separate from income tax)
- Using the wrong cost basis from 1099-B (check Box 1e; if it is blank, you must work out the basis yourself and report it on Form 8949)
- Double-counting employer HSA contributions (W-2 Box 12 Code W): these go on Form 8889 Line 9, not Line 2
- Missing the $3,000 capital loss limit: net losses over $3,000 ($1,500 for MFS) carry forward to later years instead of all deducting in one year
- Filing as Single when Head of Household applies (HOH has a larger standard deduction and wider tax brackets)
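The capital loss limit is easy to get wrong by hand, so a small worked example may help. This is a simplified sketch: `apply_capital_loss_limit` is a hypothetical helper, and it ignores the taxable-income adjustment in the IRS carryover worksheet, so treat the carryforward as an approximation.

```python
def apply_capital_loss_limit(net_gain_or_loss: int, limit: int = 3000) -> tuple:
    """Return (amount reported this year, loss carried forward as a positive number).

    Pass limit=1500 for Married Filing Separately.
    """
    if net_gain_or_loss >= 0:
        return net_gain_or_loss, 0  # gains are reported in full, nothing carries
    deductible = max(net_gain_or_loss, -limit)  # cap the loss at the limit
    carryforward = deductible - net_gain_or_loss  # remainder moves to later years
    return deductible, carryforward


print(apply_capital_loss_limit(-10000))  # (-3000, 7000)
print(apply_capital_loss_limit(-2000))   # (-2000, 0)
print(apply_capital_loss_limit(5000))    # (5000, 0)
```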
Step 3b: Nonresident Alien Workflow (Form 1040-NR)
This section covers the complete NRA filing workflow. For NRA-specific field-to-line mappings, see references/form-field-maps.md. For PDF recovery procedures, see references/pypdf-recovery.md.
Critical pypdf Rules
These rules prevent data loss. Violating them will corrupt PDF files. The bundled scripts/update_form.py enforces all three automatically — use it instead of writing update logic from scratch.
1. NEVER write output to the same path as input. PdfReader uses lazy reading: if you write to the same file, you truncate it while the reader still holds references into it. Page annotations (already in memory) may survive, but the AcroForm catalog gets corrupted during the partial read/write overlap. Always write to a temp path first, then copy.
2. Always use auto_regenerate=False when calling update_page_form_field_values(). The default True removes /AP (appearance stream) entries from each field. Without appearance streams, some PDF viewers render the field as blank even though the /V value is correct: the data is there but invisible.
3. Iterate all pages when updating fields, even if you think fields are on page 1. Some IRS forms silently split fields across pages, so if you only update page 0, fields on page 1 are skipped with no error.
4. If a PDF gets corrupted (field tree broken but annotation values survive):
   - Check annotations directly: page.get("/Annots") → annot.get("/V")
   - Rebuild the AcroForm /Fields array from page annotations
   - Read references/pypdf-recovery.md when you see this symptom; it has the full step-by-step repair procedure
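For readers writing equivalent code rather than calling the bundled script, the three write rules might look like the sketch below. It is not the contents of scripts/update_form.py: `is_same_path` and `update_fields` are hypothetical names, and pypdf is imported lazily so the path check can be used without it installed. The `append` then `update_page_form_field_values` sequence follows the pattern shown in the pypdf forms documentation.

```python
import os
import shutil
import tempfile


def is_same_path(input_path: str, output_path: str) -> bool:
    """Rule 1 helper: True when both paths resolve to the same file."""
    return os.path.realpath(input_path) == os.path.realpath(output_path)


def update_fields(input_path: str, output_path: str, values: dict) -> None:
    """Fill form fields on every page and write the result via a temp file."""
    from pypdf import PdfReader, PdfWriter  # lazy import; requires pypdf

    if is_same_path(input_path, output_path):
        raise ValueError("output path must differ from input path")

    reader = PdfReader(input_path)
    writer = PdfWriter()
    writer.append(reader)  # copy pages and form data into the writer

    for page in writer.pages:  # Rule 3: visit every page, not just page 0
        if "/Annots" in page:  # skip pages that carry no form widgets
            writer.update_page_form_field_values(
                page, values, auto_regenerate=False  # Rule 2
            )

    # Rule 1: write to a temp file first, then copy to the destination.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    try:
        with open(tmp_path, "wb") as handle:
            writer.write(handle)
        shutil.copyfile(tmp_path, output_path)
    finally:
        os.remove(tmp_path)
```

A call such as `update_fields("f1040nr.pdf", "f1040nr_filled.pdf", {"some_field": "123"})` would use hypothetical file and field names; real field names come from the field discovery step.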
Core Update Function
A bundled script at scripts/update_form.py encodes all three critical rules above plus post-write verification. Use it for all form updates:
CODEBLOCK0
CODEBLOCK1
The script automatically verifies that fields survived the write and warns if the output looks corrupted. Ensure pypdf is available: pip install pypdf --break-system-packages.
Field Discovery Workflow
Before modifying any form, always extract and map fields first.
Step 1: Extract all field names and values
CODEBLOCK2
Step 2: Map fields to line numbers via Y-position
Before this step, read references/form-field-maps.md for the expected field-to-line table — it covers 1040-NR, 8843, Schedule NEC, Schedule OI, Form 8833, Form 8889, and Schedule 1. Use it as a reference while verifying the Y-position analysis below.
IRS PDFs use positional layout. Extract annotation rectangles to determine which line a field corresponds to:
CODEBLOCK3
Compare the Y-position ordering against the physical form layout to create a definitive field-to-line map.
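One way to do the Y-position comparison is to collect each widget's rectangle and sort in reading order. The sketch below is an assumption-laden illustration, not the skill's own extraction code: `FieldBox`, `order_fields`, and `extract_boxes` are hypothetical names, and widgets whose name lives on a parent field will show an empty name here. PDF coordinates start at the bottom-left corner, so a larger y means higher on the page.

```python
from typing import NamedTuple


class FieldBox(NamedTuple):
    name: str   # partial field name from /T (may be empty for child widgets)
    page: int   # zero-based page index
    x: float    # left edge of /Rect
    y: float    # bottom edge of /Rect


def order_fields(boxes: list) -> list:
    """Sort by page, then top-to-bottom (descending y), then left-to-right."""
    return sorted(boxes, key=lambda b: (b.page, -b.y, b.x))


def extract_boxes(pdf_path: str) -> list:
    """Collect widget rectangles from every page (requires pypdf)."""
    from pypdf import PdfReader  # lazy import so order_fields works without pypdf

    boxes = []
    for page_index, page in enumerate(PdfReader(pdf_path).pages):
        if "/Annots" not in page:
            continue
        for ref in page["/Annots"]:
            annot = ref.get_object()
            if annot.get("/Subtype") != "/Widget" or "/Rect" not in annot:
                continue
            rect = annot["/Rect"]
            name = str(annot["/T"]) if "/T" in annot else ""
            boxes.append(FieldBox(name, page_index, float(rect[0]), float(rect[1])))
    return boxes


# Hypothetical boxes: two fields on one row, one lower down, one on the next page
sample = [
    FieldBox("b", 0, 300.0, 700.0),
    FieldBox("a", 0, 100.0, 700.0),
    FieldBox("c", 0, 100.0, 650.0),
    FieldBox("d", 1, 100.0, 750.0),
]
print([box.name for box in order_fields(sample)])  # ['a', 'b', 'c', 'd']
```

Fields on the same printed line can differ by a point or two in y, so the sorted order is a starting point to check against the physical form rather than a final map.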
Step 3: Check checkboxes and radio buttons
CODEBLOCK4
NRA Form Suite Overview
A typical NRA (F-1 OPT) filing includes these forms. See references/form-field-maps.md for complete field-to-line mappings.
| Form | Purpose | Key Fields |
|---|---|---|
| 1040-NR | Main return | Income lines, AGI, tax, withholding, refund |
| Schedule 1 | Additional income/adjustments | Contractor income (Line 3, via Schedule C), HSA deduction |
| Schedule NEC | Tax on non-effectively-connected income | Dividends, capital gains, NEC tax |
| Schedule OI | Other information | Visa type, country, treaty claims, days present |
| Form 8843 | Statement for exempt individuals | Days of presence, visa status, exclusion days |
| Form 8833 | Treaty-based return position | Treaty article, exemption amount |
| Form 8889 | HSA | Contributions, employer contributions, deduction |
NRA Workflow Steps
1. Gather source docs (W-2, 1099s, 5498-SA, I-94)
2. Extract all fields from all filled PDFs
3. Build field-to-line maps using Y-position analysis and INLINECODE24
4. Calculate key NRA amounts:
   - Total wages (W-2 Box 1) → 1040-NR Line 1a
   - Treaty-exempt income → Line 1k (requires Form 8833)
   - Net wages = Line 1a minus Line 1k → Line 1z
   - Contractor income (1099-NEC) → Schedule C → Schedule 1 Line 3 → Schedule 1 Line 10 → 1040-NR Line 8 (on current forms Schedule 1 Line 8h is jury duty pay, not contractor income)
   - HSA deduction → Form 8889 → Schedule 1 → 1040-NR Line 10
   - AGI = Line 9 (total ECI) - Line 10 (adjustments)
   - NRA cannot take standard deduction (must itemize or take $0)
   - Non-effectively connected income (dividends, capital gains) → Schedule NEC at flat rates
5. Fill PDF forms using scripts/update_form.py (different output path!)
6. Cross-validate every number against source docs and between forms (see rules below)
7. Apply fixes using the safe update function
8. Re-extract and verify all fields after each fix
9. Final verification: read every form one more time and confirm consistency
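The NRA amounts in step 4 chain together, so it can help to compute them in one place before touching any PDF. The following is a rough sketch that follows the arithmetic stated in this workflow; `nra_income_lines` is a hypothetical helper and the dollar figures are invented for illustration.

```python
from decimal import Decimal


def nra_income_lines(wages, treaty_exempt, schedule1_income, adjustments) -> dict:
    """Derive the main 1040-NR income lines from four inputs."""
    wages, treaty_exempt = Decimal(wages), Decimal(treaty_exempt)
    schedule1_income, adjustments = Decimal(schedule1_income), Decimal(adjustments)

    line_1z = wages - treaty_exempt        # net wages after the treaty exemption
    line_9 = line_1z + schedule1_income    # total effectively connected income
    line_11a = line_9 - adjustments        # AGI
    return {
        "1a": wages,
        "1k": treaty_exempt,
        "1z": line_1z,
        "8": schedule1_income,
        "9": line_9,
        "10": adjustments,
        "11a": line_11a,
    }


# Hypothetical filer: $40,000 wages, $5,000 treaty exemption,
# $2,000 contractor income, $1,000 HSA deduction
lines = nra_income_lines(40000, 5000, 2000, 1000)
print(lines["1z"], lines["9"], lines["11a"])  # 35000 37000 36000
```

Keeping the values in a dict keyed by line label makes it straightforward to compare them later with what was re-extracted from the filled form.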
Cross-Form Validation Rules (NRA / Form 1040-NR)
After filling, validate these consistency checks:
1. W-2 Box 1 → 1040-NR Line 1a (wages)
2. Schedule 1 Line 10 → 1040-NR Line 8 (additional income from Sch 1)
3. 1040-NR Line 1a minus treaty exempt = Line 1z (if treaty applies)
4. 1040-NR Line 9 (total ECI) = Line 1z + Line 8 (or sum of all income lines)
5. 1040-NR Line 11a (AGI) = Line 9 - Line 10 (adjustments)
6. Schedule NEC total tax → 1040-NR Line 23a
7. Form 8843 Line 4b = Schedule OI current year days (days to exclude)
8. Form 8843 Line 4a days must match Schedule OI for each year
9. W-2 Box 2 → 1040-NR Line 25a (federal tax withheld)
10. Form 8833 exemption amount → 1040-NR Line 1k (treaty exempt income)
Common NRA Mapping Errors
Watch for these — they are the most frequent mistakes when auto-filling:
- Contractor income on Line 5b (pensions) instead of Line 8 (additional income from Schedule 1)
- Wages duplicated on both Line 1a and Line 1h (Line 1h is "other earned income", not a repeat)
- AGI placed on Line 6 (reserved/future use) instead of Line 11a
- Treaty exempt amount missing from Line 1k when Form 8833 is filed
- Line 1z left empty: it should equal the total of Lines 1a through 1h minus the exempt amount
- Days of presence wrong on Form 8843: they must match I-94 travel history exactly, not assume 365
US-China Tax Treaty Quick Reference
This section covers the US-China treaty as a concrete example. Similar treaties exist for other countries (e.g., India Article 21(2), South Korea Article 21(1)) — verify article numbers and rates against the specific treaty if your country differs.
For Chinese nationals on F-1 visa:
- Article 20(c): $5,000 exemption on wages/scholarship for students, reported on Line 1k of 1040-NR; requires Form 8833
- Article 9(2): 10% rate on dividends (vs 30% default), reported on Schedule NEC
- IRC 871(i)(2)(A): Bank deposit interest is exempt for NRAs; do NOT report on any form
- IRC 871(a)(2): Capital gains taxed at 30% flat if present 183+ days, reported on Schedule NEC
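As a worked example of the two treaty figures above, the sketch below applies the $5,000 wage exemption and compares the 10% treaty dividend rate with the 30% default. The helper names and the sample amounts are hypothetical, and the code assumes the filer qualifies under both articles.

```python
from decimal import Decimal

STUDENT_WAGE_EXEMPTION = Decimal("5000")  # Article 20(c) cap from the list above
TREATY_DIVIDEND_RATE = Decimal("0.10")    # Article 9(2)
DEFAULT_NEC_RATE = Decimal("0.30")        # default flat rate on NEC income


def treaty_exempt_wages(wages: Decimal) -> Decimal:
    """Exempt amount for Line 1k: wages up to the cap."""
    return min(wages, STUDENT_WAGE_EXEMPTION)


def dividend_tax(dividends: Decimal, rate: Decimal) -> Decimal:
    """Flat-rate tax on dividends, rounded to cents."""
    return (dividends * rate).quantize(Decimal("0.01"))


# Hypothetical filer: $40,000 wages and $800 of dividends
print(treaty_exempt_wages(Decimal("40000")))                 # 5000
print(dividend_tax(Decimal("800"), TREATY_DIVIDEND_RATE))    # 80.00
print(dividend_tax(Decimal("800"), DEFAULT_NEC_RATE))        # 240.00
```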
Workflow Summary
1. Gather source documents from the user
2. Determine filer type (citizen / RA / NRA) and filing status
3. Route to the correct workflow (3a for 1040, 3b for 1040-NR)
4. Identify required forms and schedules
5. Calculate amounts from source documents
6. Fill PDF forms safely (different output path, auto_regenerate=False, iterate all pages)
7. Cross-validate all numbers between forms
8. Final review: re-extract every form and confirm consistency