Creative Eye — Design Judgment for AI Agents

The Problem

AI agents ship bad creative because they have no taste. Specifically:

1. They confuse "exists" with "good." Generating output ≠ creating quality. The first version is almost never good enough, but agents treat it as done.
2. They have no benchmarks. Without seeing what "great" looks like, agents can't distinguish amateur from professional work.
3. They optimize for volume over quality. 50 mediocre variations when 1 excellent piece would outperform all of them combined.
4. They can't self-evaluate. They lack the vocabulary and frameworks to critique their own work.
5. They don't learn from failures. Same mistakes repeat across sessions because there's no feedback loop.

This skill provides the frameworks, prompts, and workflows to fix all five problems.

The Framework: STUDY → COMPARE → CREATE → EVALUATE

Every creative task follows this sequence. Never skip steps.

1. STUDY — Build Taste Through Exposure

Before creating anything in a given domain, study what "great" looks like.

Daily practice (10 min):
Pick ONE brand or creator. Analyze ONE piece of their content. Ask:

- What makes it work visually? Be specific (not "it looks good" but "the 200-tracking on uppercase Didot creates editorial authority")
- What are the exact typography choices? (font, weight, size, tracking, color)
- What's the composition doing? (where does the eye land first, second, third?)
- What role does negative space play?
- What would break if you changed one element?
- What feeling does it create, and HOW does it create that feeling?

Log findings to a study log file. Accumulate observations over time. This builds the vocabulary and pattern library that enables judgment.
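A study log can be as simple as an append-only text file. The sketch below is one possible shape, not a format this skill prescribes: the entry layout and the function names `format_entry` and `append_entry` are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def format_entry(brand: str, piece: str, observations: dict[str, str], day: date) -> str:
    """Render one study session as a plain-text log entry."""
    lines = [f"{day.isoformat()} | {brand} | {piece}"]
    for question, finding in observations.items():
        lines.append(f"- {question}: {finding}")
    # A blank line separates entries so the log stays easy to scan.
    return "\n".join(lines) + "\n\n"


def append_entry(log_path: Path, entry: str) -> None:
    """Append the entry so earlier observations are never overwritten."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
```

Calling `append_entry(Path("study-log.md"), format_entry(...))` once per daily session would accumulate observations in a single file; the file name here is a placeholder.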

Picking brands to study:
Choose 8-12 brands across these tiers:

- Tier 1 (3-4 brands): Direct competitors or brands with a similar aesthetic to yours
- Tier 2 (3-4 brands): Brands your target audience admires, even if they are in a different category
- Tier 3 (2-3 brands): Category leaders with exceptional design systems
- Tier 4 (1-2 brands): Wildcard inspiration from unrelated fields (architecture, editorial, fashion)

2. COMPARE — Find References Before Creating

Before generating any visual content:

1. Find 3-5 reference examples of what "great" looks like for THIS specific format
2. Save them or note URLs
3. Identify the specific qualities that make each one great
4. Use those qualities as the creative brief constraints

Never create in a vacuum. The difference between amateur and professional creative is almost always that professionals looked at references first.

3. CREATE — Every Decision Needs a Reason

When generating creative, every choice must be intentional:

- Why this font? Not "it's available": what does it communicate? (Serif = editorial authority. Geometric sans = modern tech. Humanist sans = friendly, approachable.)
- Why this color? Does it match the brand palette? What emotion does it carry?
- Why this layout? What does left-aligned vs centered communicate? Where should the eye go first?
- Why this size? Small and restrained = premium/exclusive. Large and bold = loud/promotional.
- Why this image treatment? Warm and grainy = authentic/vintage. Clean and sharp = modern/clinical.

If you can't articulate why, you're guessing. Stop and refer back to your references.

4. EVALUATE — Score Before Shipping

Run every piece through both evaluation tools below before publishing.

The 5-Point Creative Scorecard

Adapted from Runway's video evaluation framework. Score each dimension 1-10.

| # | Dimension | Question | Min Score |
|---|---|---|---|
| 1 | Brief Adherence | Does this serve the stated business/creative goal? | 7 |
| 2 | Brand Consistency | Does this look like it came from the same brand as everything else? | 8 |
| 3 | Visual Quality | No artifacts, misspellings, clip-art vibes, AI tells? | 9 |
| 4 | Emotional Resonance | Would a real human stop scrolling for this? | 7 |
| 5 | Style Match | Does this match the brand's specific aesthetic? | 8 |

If ANY score is below its minimum → do not publish. Fix or discard.
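The publish rule is mechanical enough to encode directly. A minimal sketch, assuming scores arrive as a dictionary keyed by dimension name; the names `MINIMUMS`, `failing_dimensions`, and `can_publish` are illustrative and not part of this skill.

```python
# Minimum scores copied from the scorecard above.
MINIMUMS = {
    "Brief Adherence": 7,
    "Brand Consistency": 8,
    "Visual Quality": 9,
    "Emotional Resonance": 7,
    "Style Match": 8,
}


def failing_dimensions(scores: dict[str, int]) -> list[str]:
    """Return every dimension scored below its minimum.

    A dimension missing from `scores` is treated as 0, so it always fails.
    """
    return [
        name
        for name, minimum in MINIMUMS.items()
        if scores.get(name, 0) < minimum
    ]


def can_publish(scores: dict[str, int]) -> bool:
    """Publish only when no dimension falls below its minimum."""
    return not failing_dimensions(scores)
```

Returning the list of failing dimensions, rather than only a yes/no, tells the next step what to fix.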

The 10-Point Pre-Publish Checklist

Quick yes/no gate. Requires 8+ "yes" to ship.

1. Does this look like it came from a well-funded brand? (not a weekend side project)
2. Would your most design-savvy friend say "this is fire" unprompted?
3. If you remove all text, is the composition still interesting?
4. Does the typography have personality, rather than being just "text on a thing"?
5. Are there three or fewer elements competing for attention? (if not, remove one)
6. Would someone in your target audience actually use/wear/share this?
7. Is it free of anything that looks AI-generated or clip-art-like?
8. Have you compared this side-by-side with a real brand reference?
9. Would you personally pay the listed price for this?
10. If this got 100K views, would it help the brand?
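The 8-of-10 threshold can be checked in a few lines. This is a sketch under one assumption: each answer is recorded as `True` when the item came out in the work's favor, which sidesteps how any individual question happens to be phrased. The function name is hypothetical.

```python
def checklist_passes(favorable: list[bool], required: int = 8) -> bool:
    """Return True when enough of the 10 checklist items favor shipping.

    favorable[i] is True when item i+1 came out in the work's favor.
    """
    if len(favorable) != 10:
        raise ValueError("the checklist has exactly 10 items")
    return sum(favorable) >= required
```

Rejecting lists of the wrong length guards against silently skipping an item.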

The Self-Refine Loop

Based on Andrew Ng's Reflection pattern and the Self-Refine research framework (CMU). This is the core workflow for iterative improvement.

GENERATE → CRITIQUE (vision model) → FIX → RE-CRITIQUE → SHIP or KILL

The Loop

Step 1: Generate. Create the first draft using references and brand constraints.

Step 2: Critique. Use a vision model to evaluate the output. Feed it the generated image plus 2-3 reference images from your library. Use the evaluation prompts below. Get specific scores and specific issues.

Step 3: Fix. Address ONLY the specific issues identified. Do not regenerate from scratch unless the critique identifies fundamental problems (wrong concept, wrong style direction).

Step 4: Re-critique. Run the improved version through the same evaluation. Compare scores.

Step 5: Ship or Kill.

- All scores at or above minimums → ship it
- Improved but still below on 1-2 dimensions → one more iteration (go to Step 3)
- No meaningful improvement after iteration → kill it and try a different approach
- Maximum 3 iterations. If it's not working after 3 passes, the concept is wrong, not the execution. Escalate to a human or start over with a fundamentally different direction.
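The five steps can be sketched as a bounded loop. Everything here is an assumption made for illustration: `generate`, `critique`, and `fix` are caller-supplied callables standing in for whatever image and vision models are used, and "no meaningful improvement" is approximated as the total score failing to rise.

```python
from typing import Callable

Scores = dict[str, int]


def meets(scores: Scores, minimums: Scores) -> bool:
    """True when every dimension is at or above its minimum."""
    return all(scores.get(name, 0) >= low for name, low in minimums.items())


def refine(
    generate: Callable[[], str],
    critique: Callable[[str], tuple[Scores, list[str]]],
    fix: Callable[[str, list[str]], str],
    minimums: Scores,
    max_iterations: int = 3,
) -> tuple[str, str]:
    """Return ("ship" or "kill", final draft)."""
    draft = generate()                       # Step 1: first draft
    scores, issues = critique(draft)         # Step 2: scores plus specific issues
    for _ in range(max_iterations):
        if meets(scores, minimums):
            return "ship", draft
        previous_total = sum(scores.values())
        draft = fix(draft, issues)           # Step 3: address only the listed issues
        scores, issues = critique(draft)     # Step 4: same evaluation, new version
        stalled = sum(scores.values()) <= previous_total
        if stalled and not meets(scores, minimums):
            return "kill", draft             # Step 5: no improvement, stop early
    # Iteration budget spent: ship only if the last fix got it over the line.
    return ("ship" if meets(scores, minimums) else "kill"), draft
```

Passing the models in as callables keeps the loop testable with stubs before any real model is wired up.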

Why 3 Iterations Max

Diminishing returns hit fast. If the core concept is sound, 1-2 refinements will get it there. If it takes more than 3, you're polishing the wrong thing. The discipline to kill bad work is as important as the ability to refine good work.

Vision Model Evaluation Prompts

Use these with any vision-capable model (GPT-4o, Claude, Gemini). Feed the generated image alongside 2-3 reference images from aspirational brands.

Merch Design Review

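You are a senior creative director at a premium lifestyle brand.

Review this merch design against the following criteria:

1. Typography: Is the type treatment refined? Does it have personality and intentional styling (tracking, weight, case)? Or does it look like default text on a template?
2. Composition: Is the layout balanced and intentional? Does negative space serve the design? Or is it cluttered or awkwardly empty?
3. Brand Alignment: Does this feel like it belongs to a specific brand with a clear identity? Or is it generic?
4. Wearability/Usability: Would someone in the target demographic actually buy and use this? Would they be proud to be seen with it?
5. Production Quality: Does this look professionally designed? Or does it have AI-generation artifacts, clip-art quality, or amateur tells?

Score each dimension 1-10. Be brutally honest.
For any score below 7, explain exactly what's wrong and provide a specific fix.
Compare against the provided reference images. How does this one measure up?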

Social Content Review

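You are a social media creative director for a premium brand.

Review this social media post (image + caption) against the following criteria:

1. Scroll-Stop Power: Would this make someone pause mid-scroll? What specifically catches the eye, or fails to?
2. Authenticity: Does this feel real and human? Or corporate, AI-generated, or try-hard?
3. Brand Voice: Does the caption sound like a consistent, distinct voice? Or generic marketing speak?
4. Visual Quality: Is the image/graphic high quality? Any AI tells (weird hands, text artifacts, uncanny valley, over-smoothed skin)?
5. Value Exchange: What does this give the viewer (entertainment, information, emotion, identity)? Or is it just noise in their feed?

Score each dimension 1-10. For any score below 7, explain the problem and suggest a fix.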

Product Photography Review

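You are an art director reviewing product photography for an e-commerce brand.

Evaluate this image against the following criteria:

1. Lighting: Is the lighting intentional and flattering? Natural/warm vs flat/clinical? Does it create dimension and mood?
2. Composition: Does the product placement feel considered? Is there a clear focal point? How does the background serve the product?
…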

Video/Motion Content Review

CODEBLOCK4

Building a Reference Library

Organized inspiration is the foundation of good creative judgment.

Folder Structure

CODEBLOCK5

Extracting JSON Style Profiles

For any reference image whose aesthetic you want to replicate, use an LLM to extract a structured style profile:

Prompt:
CODEBLOCK6

Store these profiles in reference-library/style-profiles/. Use them as context when generating new creative — feed the JSON into your image generation prompts for consistent aesthetic output.
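Storing and reusing a profile could look like the sketch below. The profile's fields depend on the extraction prompt, so the code treats the profile as an opaque dictionary; the function names and the wording added to the generation prompt are assumptions, not part of this skill.

```python
import json
from pathlib import Path

PROFILE_DIR = Path("reference-library/style-profiles")


def save_profile(name: str, profile: dict, directory: Path = PROFILE_DIR) -> Path:
    """Write one extracted style profile to <directory>/<name>.json."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    path.write_text(json.dumps(profile, indent=2, sort_keys=True), encoding="utf-8")
    return path


def prompt_with_profile(base_prompt: str, profile_path: Path) -> str:
    """Append the stored profile to an image generation prompt as context."""
    profile = json.loads(profile_path.read_text(encoding="utf-8"))
    style = json.dumps(profile, sort_keys=True)
    return f"{base_prompt}\n\nFollow this style profile:\n{style}"
```

Sorting keys on write and on read keeps the generated prompt text stable across runs, which makes it easier to tell whether a change in output came from the profile or from the model.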

Anti-Pattern Documentation

For every failure, document:
CODEBLOCK7

The "why it seemed okay" field is critical. Understanding your blind spots is how you fix them.

The Brand Guardian Pattern

Inspired by Jasper's Brand IQ approach. Set up automated quality gates that catch violations before publishing.

Pre-Publish Guardrails Checklist

Create a brand-guardrails.md file in your workspace with these sections:

CODEBLOCK8

Brand Violation Detection

When reviewing content, scan for these common violations:

- Color drift: Hex codes look close but aren't exact. Use a color picker to verify.
- Font substitution: System font rendered instead of brand font. Check carefully.
- Voice inconsistency: Formal language in a casual brand. Slang in a premium brand.
- Visual inconsistency: Different filter/treatment than established content.
- Accidental claims: "Best," "guaranteed," "#1" without substantiation.
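Two of these checks, color drift and accidental claims, lend themselves to simple automation. The sketch below is illustrative only: the per-channel tolerance of 12 is an arbitrary assumption, the claim list covers just the three examples named above, and the function names are hypothetical. It assumes colors are given as six-digit hex codes.

```python
import re

# Matches the three example claims; "#1" is not matched inside a hex code like "#1a2b3c".
CLAIMS = re.compile(r"\b(?:best|guaranteed)\b|#1\b", re.IGNORECASE)


def _rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert a six-digit hex code such as "#1a2b3c" to an (r, g, b) tuple."""
    h = hex_code.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def color_drift(used: list[str], palette: list[str], tolerance: int = 12) -> list[tuple[str, str]]:
    """Pair each near-miss color with the palette color it almost matches."""
    exact = {p.lower() for p in palette}
    drifted = []
    for color in used:
        if color.lower() in exact:
            continue  # exact match, no drift
        for brand_color in palette:
            gaps = (abs(a - b) for a, b in zip(_rgb(color), _rgb(brand_color)))
            if max(gaps) <= tolerance:
                drifted.append((color, brand_color))
                break
    return drifted


def unsubstantiated_claims(copy: str) -> list[str]:
    """Return claim words found in the copy, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in CLAIMS.finditer(copy)]
```

A color that is far from every palette entry is not reported here, since that is an off-palette choice rather than drift; a separate check could flag it.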

Automated Quality Gate Workflow

Integrate into your agent's publishing workflow:

CODEBLOCK9

The 7-Day Creative Training Curriculum

A structured week to systematically build creative judgment. Adapt brand references to your own context.

Day 1: Build the Foundation

- Create the reference-library/ folder structure (see above)
- Collect 50+ reference images from 5-6 brands you admire
- Write a style-guide.md for your brand covering: color palette (exact hex), typography (specific fonts, weights, tracking rules), photography style, composition rules, explicit "never do" list
- Write an anti-patterns.md documenting every past creative mistake with root cause analysis
- Deep-study 10 social posts from your top 2 aspirational brands. Write specific notes on what makes each one work.

Day 2: Build the Evaluation System

- Implement the 5-Point Creative Scorecard as a reusable template
- Customize the vision model evaluation prompts for your brand (replace generic descriptions with your specific aesthetic)
- Test the full Self-Refine Loop: generate a test image → run evaluation → iterate → score again
- Set up a creative-log.md for ongoing feedback tracking
- Run 3 existing pieces of your content through the evaluation system. Be honest about scores.

Day 3: Extract Style Profiles

- Take 5 reference images from each of your top 3 aspirational brands
- Extract JSON style profiles from each using the prompt above
- Create a composite profile for your brand by combining the best elements
- Generate 3 images using the composite profile as prompt context
- Compare outputs against references. Iterate the profile until outputs consistently match.

Day 4: Format Deep Dive (Merch/Product)

- Study 20 pieces of merch or product design from category leaders
- Identify patterns: typography choices, color limitations, layout structures, what makes something premium vs amateur
- Create format-specific design rules (max colors, type hierarchy, spacing, what to avoid)
- Generate 5 new designs using the rules
- Run each through the relevant evaluation prompt. Only keep designs scoring 8+ on every dimension.

Day 5: Format Deep Dive (Social/Content)

- Audit 20 top-performing posts from aspirational brand accounts
- For each: what's the hook? What's the visual? What's the emotional trigger? What makes it shareable?
- Create a content-playbook.md with specific post frameworks that work
- Generate 5 social posts using the playbook
- Run each through the social content evaluation prompt. Honest scoring.

Day 6: Practice the Rejection Loop

- Run the full creative workflow 10 times from brief to finished piece
- Track: how many iterations to reach "publishable"? What are the most common failure modes?
- Practice KILLING work that isn't working after 3 iterations. This is a skill.
- Update anti-patterns and style guide based on what you learned
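Tracking iterations and failure modes across the ten runs is easier with a small summary helper. A sketch, assuming each run is recorded as a dictionary with the keys `iterations`, `shipped`, and `failure_modes`; that record shape and the function name are inventions for this example.

```python
from collections import Counter
from statistics import mean


def summarize_runs(runs: list[dict]) -> dict:
    """Summarize practice runs.

    Each run looks like:
    {"iterations": 2, "shipped": True, "failure_modes": ["weak type"]}
    """
    shipped = [run for run in runs if run["shipped"]]
    modes = Counter(mode for run in runs for mode in run["failure_modes"])
    return {
        "ship_rate": len(shipped) / len(runs) if runs else 0.0,
        # Only shipped runs count toward "iterations to publishable".
        "avg_iterations_to_ship": mean(run["iterations"] for run in shipped) if shipped else None,
        "top_failure_modes": modes.most_common(3),
    }
```

The most common failure modes are natural candidates for new entries in the anti-patterns file.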

Day 7: Integration

- Embed the creative evaluation into your agent's workflow as a mandatory pre-publish step
- Update your agent's personality/soul document with refined creative identity and taste profile
- Set up a daily creative study task (see cron template below)
- Schedule weekly creative retrospectives: review all content against the scorecard
- Document the entire system for your team

Daily Creative Study — Cron Template

Add this as a recurring task for your agent:

CODEBLOCK10

Common Anti-Patterns

These are the most frequent ways AI agents produce bad creative. Learn to recognize and avoid each one.

1. "Something Exists" ≠ "Something Is Good"

The most dangerous failure mode. An agent generates an image, sees it rendered successfully, and declares it ready. Existence is not quality. The gap between "technically produced" and "genuinely good" is enormous. Always score before shipping.

2. Volume Over Quality

Generating 50 variations and picking the "best" one is not creative judgment — it's a lottery. Produce fewer outputs with more intentional input. 3 well-directed generations beat 50 random ones.

3. No Benchmark Comparison

Creating in a vacuum guarantees mediocre output. Every piece of creative should be evaluated against reference examples from brands that are actually good at this. If you haven't looked at references, you haven't started the creative process.

4. Illustration Over Typography

For most brands (especially early-stage), type-forward design reads as more premium and professional than illustration. Custom illustrations require exceptional skill to execute well. Bad illustration looks worse than good typography. Default to type-forward until you have proven illustration chops.

5. Upscaling Bad Work

Taking a low-quality concept and increasing its resolution, adding more detail, or making it bigger does not make it better. It makes it a larger, more detailed version of something bad. Fix the concept before scaling.

6. Ignoring Feedback Loops

If you don't track what works and what doesn't, you can't improve. Every piece of published creative should be reviewed against performance data. What got engagement? What fell flat? Feed this back into your creative decisions.

7. The "Too Clean" Tell

AI-generated content often looks unnaturally perfect — too smooth, too symmetrical, too evenly lit. Real creative has subtle imperfections. If everything looks like a stock photo render, add texture, grain, or asymmetry. Intentional imperfection reads as authentic.

8. Decorating Instead of Designing

Adding more elements (borders, shadows, gradients, icons) to fill space is decoration, not design. Good design is about what you remove. If an element doesn't serve a specific purpose, cut it.

9. Copying Without Understanding

Replicating a reference image's surface features (colors, fonts) without understanding WHY those choices work leads to designs that look "almost right" but feel wrong. Study the principles behind the reference, not just its appearance.

10. Skipping the Brief

Jumping straight to generation without defining what success looks like means you'll only know if something works by accident. Write the brief first: who is this for, what should they feel, what should they do after seeing it?

Quick Reference: When to Use What

| Situation | Use This |
|---|---|
| About to create any visual content | Full STUDY → COMPARE → CREATE → EVALUATE flow |
| Reviewing a single piece before posting | |

Creative judgment isn't magic. It's a system. Study what good looks like, compare before you create, make intentional choices, evaluate honestly, and learn from every failure. Ship less, ship better.
