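Anthropic CMO Paul Smith recently dropped a number: 95% of the company's own business analytics queries are now automated by Claude, with accuracy also approaching 95%. This is not a demo; it is the real thing, running in production.
What makes it rarer still is that they open-sourced the complete methodology, architecture diagrams, and template code. Amid all the empty "AI will replace analysts" talk, that is a breath of fresh air.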
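The three failure modes Anthropic identifies all hit close to home:

| Pitfall | What it looks like |
|---|---|
| Wide-table sprawl | Inconsistent definitions, an explosion of views |
| The dashboard trap | Business users can only look at fixed dashboards; long-tail questions still wait in the data team's queue |
| A false sense of precision | An LLM can generate SQL that "looks right" while having no grasp of the business semantics |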
Instead of wiring Claude straight into the data warehouse, they built an Agentic Analytics Stack.
Skills are the single most important variable in the whole setup.
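Without Skills, accuracy was at most 21%; with Skills, aggregate accuracy jumped to 95%+, and to 99% in some domains.
In one sentence: Skills are Markdown files that encode a senior analyst's procedural knowledge.
This is not prompt stuffing but a structured "analysis playbook": query order, ambiguity handling, complete analysis templates, and domain-specific rules are all written down.
Anthropic published the template skeletons in the appendix of the blog post; here are a few of the core ones: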
# [Domain] Tables
## Quick Reference
### Business Context - [what this domain tracks and why]
### Key Concepts - [business definitions specific to this domain]
### Refresh cadence - [how often data updates]
### Point of contact - [domain owner]
## Tables
### `[schema.table_name]`
**Business purpose:** [one-liner]
**Grain:** [one row per what?]
**Key fields:** `field_name` - [business meaning]
### Usage notes
- [Common gotchas, join conditions, filter logic]
- [When to use which table]
## Related skills
- [Links to related domain skills]
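Since a domain skill is just a Markdown file with a fixed skeleton, a small lint step can catch files that drift from the template. The sketch below is my own illustration, not something from Anthropic's post: the `REQUIRED_SECTIONS` list and both function names are assumptions, derived only from the second-level headings in the skeleton above.

```python
import re

# Assumption: these are the second-level headings every domain skill must have,
# copied from the template skeleton. Adjust to your own template.
REQUIRED_SECTIONS = ["Quick Reference", "Tables", "Related skills"]


def list_headings(markdown_text, level=2):
    """Return the text of every Markdown heading at exactly the given level."""
    pattern = re.compile("^" + "#" * level + r"[ \t]+(.+?)[ \t]*$", re.MULTILINE)
    return [match.group(1) for match in pattern.finditer(markdown_text)]


def missing_sections(markdown_text):
    """Return the required sections that the skill file does not contain."""
    found = set(list_headings(markdown_text))
    return [name for name in REQUIRED_SECTIONS if name not in found]


if __name__ == "__main__":
    skill = (
        "# Revenue Tables\n"
        "## Quick Reference\n"
        "### Business Context - tracks recognized revenue\n"
        "## Tables\n"
        "### `finance.revenue_daily`\n"
    )
    # This file has no "Related skills" section, so the linter reports it.
    print(missing_sections(skill))
```

Running a check like this in CI is one way to keep skill files from quietly rotting, which is the maintenance worry any docs-as-knowledge approach runs into.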
# Analytical Narrative: [Analysis Type]
## Purpose
[What business question this addresses]
## Analysis Steps
1. **Clarify** - Confirm the exact business question and time horizon
2. **Scope** - Identify relevant canonical datasets
3. **Validate** - Check data quality and freshness before proceeding
4. **Analyze** - Follow domain-specific analytical pattern
5. **Communicate** - Structure output by confidence level
## Quality Gates
- [ ] Confirm no known data quality issues
- [ ] Verify metric definition matches intended business logic
- [ ] Flag any assumptions made due to ambiguity
## Example
[Full worked example with reasoning]
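The Quality Gates section uses plain Markdown checkboxes, so it is easy to check mechanically whether an analysis ticked every gate before its answer goes out. Here is a minimal sketch under that assumption; `parse_quality_gates` and `open_gates` are hypothetical helper names of mine, and nothing in the post says Anthropic enforces gates this way.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*-\s*\[([ xX])\]\s+(.*)$")


def parse_quality_gates(markdown_text):
    """Return (is_checked, label) for every checkbox line in the text."""
    gates = []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if match:
            checked = match.group(1).lower() == "x"
            gates.append((checked, match.group(2).strip()))
    return gates


def open_gates(markdown_text):
    """Return the labels of gates that are still unchecked."""
    return [label for checked, label in parse_quality_gates(markdown_text) if not checked]


if __name__ == "__main__":
    report = (
        "## Quality Gates\n"
        "- [x] Confirm no known data quality issues\n"
        "- [ ] Verify metric definition matches intended business logic\n"
        "- [ ] Flag any assumptions made due to ambiguity\n"
    )
    for label in open_gates(report):
        print("still open:", label)
```

A caller could refuse to publish a result while `open_gates` returns anything, which gives the checklist teeth rather than leaving it as decoration.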
# Ambiguity Resolution Protocol
When encountering ambiguous business terms:
1. **Check canonical definitions first** - consult semantic layer
2. **If multiple valid interpretations exist:**
- State each clearly
- Note which standard definition is being used
- Flag to user for confirmation
3. **Never silently assume** - always document decision rationale
## Common ambiguities in our data
- "Active user" → [link to canonical user skill]
- "Revenue" → [link to revenue recognition skill]
- "Campaign" → [link to marketing attribution skill]
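The protocol above reads naturally as a small decision function: look up the term, and if there is more than one valid reading, pick the standard one, list the others, and flag it for confirmation. The sketch below is my own rendering of that logic. The `CANONICAL` dictionary stands in for a semantic layer, and the definitions inside it are made-up placeholders, not Anthropic's real metric definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for a semantic layer. The definitions are placeholders.
CANONICAL = {
    "active user": [
        "logged in at least once in the trailing 28 days",
        "performed a billable action in the trailing 7 days",
    ],
    "revenue": ["recognized revenue per the finance ledger"],
}


@dataclass
class Resolution:
    term: str
    chosen: Optional[str]            # definition being used, if any
    alternatives: List[str] = field(default_factory=list)
    needs_confirmation: bool = False
    rationale: str = ""              # never silently assume: always say why


def resolve_term(term, definitions=CANONICAL):
    """Apply the three protocol steps to a single business term."""
    options = definitions.get(term.strip().lower(), [])
    if not options:
        return Resolution(term, None, [], True,
                          "No canonical definition found; ask the user.")
    if len(options) == 1:
        return Resolution(term, options[0], [], False,
                          "Single canonical definition.")
    return Resolution(term, options[0], list(options[1:]), True,
                      "Multiple valid interpretations; using the first-listed "
                      "standard definition pending user confirmation.")


if __name__ == "__main__":
    result = resolve_term("Active user")
    print(result.chosen)
    print(result.needs_confirmation, result.rationale)
```

The point of returning a `rationale` on every path is that step 3 of the protocol becomes structural: there is no code path that produces an answer without a documented reason.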
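| Principle | Explanation |
|---|---|
| Procedural knowledge made explicit | A veteran analyst's "feel" is encoded as executable steps that new hires and AI alike can reproduce |
| Precise context injection | Skills are read on demand, so the context window is not stuffed all at once |
| Verifiable determinism | Every analysis path has a checklist, so errors can be located and rolled back |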
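To make "read on demand" concrete, here is a toy selector that loads only the skills relevant to a question and stops once a size budget is used up. How Claude actually decides which Skill to read is not covered in this article, so the keyword matching is a stand-in assumption, and `Skill`, `select_skills`, and `max_chars` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Skill:
    name: str
    keywords: Tuple[str, ...]  # cheap index that is always available
    body: str                  # full Markdown text, injected only when selected


def select_skills(question: str, skills: List[Skill], max_chars: int) -> List[Skill]:
    """Pick skills whose keywords appear in the question, within a size budget."""
    text = question.lower()
    scored = []
    for skill in skills:
        hits = sum(1 for keyword in skill.keywords if keyword in text)
        if hits:
            scored.append((hits, skill))
    # Most keyword hits first; the sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen, used = [], 0
    for _, skill in scored:
        if used + len(skill.body) <= max_chars:
            chosen.append(skill)
            used += len(skill.body)
    return chosen


if __name__ == "__main__":
    catalog = [
        Skill("revenue", ("revenue", "arr"), "# Revenue Tables\n..."),
        Skill("marketing", ("campaign", "attribution"), "# Marketing Tables\n..."),
        Skill("users", ("active user", "dau"), "# User Tables\n..."),
    ]
    picked = select_skills("What was revenue by campaign last quarter?", catalog, 10_000)
    print([skill.name for skill in picked])
```

The design choice worth noticing is the split between a small always-loaded index (names and keywords) and large bodies that are paid for only when needed. That split is what keeps the context window from filling up as the skill library grows.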
If you lead a data team:
If you are a product manager or business stakeholder:
If you are building AI products:
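The most honest part of Anthropic's write-up is the admission that 95% is not the finish line: the remaining 5%, the complex and creative analyses, still needs human analysts. And those analyses are exactly where humans should be spending their time.
The endgame of this stack is not eliminating jobs. It is turning the 21%-accuracy sinkhole into a 95% baseline, then pushing humans toward higher-value judgment.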
Source: the official Anthropic blog, "Agentic Analytics: practitioner this at Anthropic"
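Join the discussion
The jump from 21% to 95% with Skills is certainly eye-catching, but I'm more curious about what the failing 5% actually looks like: is it that the model genuinely can't understand, or that the business requirement itself is fuzzy? Also, with Markdown files as the knowledge carrier, can maintenance keep up, or does this end as yet another pile of docs nobody reads 🐶
LOL, isn't this just writing Claude a data-analytics cookbook? 21% to 95% looks dramatic, but think about it: before, the model didn't even know which definition of "active user" applied, so of course it was guessing. Quick question though: can this be ported to small and mid-sized companies? Our warehouse doesn't even have a complete lineage graph 🤷‍♂️
Honestly, what struck me most was the "canonical datasets" point. Our warehouse used to have a dozen-plus definitions of "active user", and every meeting opened with half an hour of arguing. Rather than wait for the LLM to guess wrong, fill in the human-made holes first; only then can the model run smoothly.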