AI Genesis: Anthropic Walked Through 50 Years of Sci-Fi in One Year


In science fiction, the “creator god” moment arrives when AI starts training the next generation of AI without human oversight. We thought this was decades away.

Anthropic did it in 2024.

Constitutional AI, the training method behind Claude, uses AI to train AI. Not metaphorically. Literally. A model evaluates outputs against written principles, writes critiques, and iterates toward alignment. In the original paper, no human labels were used for harmlessness.

This is RLAIF (Reinforcement Learning from AI Feedback), the successor to RLHF (Reinforcement Learning from Human Feedback). The difference: humans wrote the constitution once, then stepped back. AI takes it from there.

The sci-fi script took 50 years to write. Anthropic walked through it in one.

The Technical Path: From RLHF to RLAIF

Start with RLHF, the method that made ChatGPT work.

In RLHF, humans judge thousands of AI responses, usually by comparing two candidates and picking the better one: this one is more helpful, that one is harmful or off-topic. A reward model learns from these comparisons, then guides the AI toward “good” outputs. Every preference, every judgment, traces back to human decisions.
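To make the reward-model step concrete, here is a minimal sketch of fitting a reward function from human preferences. It assumes the labels arrive as pairwise comparisons (preferred vs. rejected), which is a common RLHF setup, and uses a Bradley-Terry objective on a toy linear model. The two-number feature vectors and all function names are hypothetical stand-ins; a real reward model is a neural network over text, not a two-weight linear function.

```python
import math

def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Fit a linear reward r(x) = w . x from (preferred, rejected) feature pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            diff = [g - b for g, b in zip(good, bad)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Bradley-Terry: P(good is preferred) = sigmoid(r(good) - r(bad))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on the log-likelihood of the human choice
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w

def reward(w, x):
    """Score one response's feature vector with the learned weights."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical features per response: [helpfulness_signal, harmfulness_signal]
pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]
w = train_reward_model(pairs, dim=2)
```

After training, the learned weights reward the helpfulness signal and penalize the harmfulness signal, because every human comparison in the toy data pointed that way. The policy model is then optimized against this learned reward.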

The problem: human labeling doesn’t scale. You need armies of contractors. Judgments are inconsistent. Humans disagree on what’s “good.” And humans can’t label fast enough for models that generate billions of tokens per day.

RLAIF solves this by replacing human labelers with AI labelers.

Here’s how Constitutional AI works:

Write a constitution: Anthropic’s researchers wrote principles like “choose the response that is most helpful, harmless, and honest” and “avoid outputs that could be used to harm people”
AI critiques AI: A model generates a response, evaluates it against a principle from the constitution, writes a critique, and produces a revision. In the paper, the same model plays all three roles
Recursive improvement: The critique-and-revise step can be repeated on each revision, and a model then compares pairs of responses against the constitution to produce the preference labels used for reinforcement learning
Human verification happens at endpoints: Humans check final outcomes, but not every intermediate step
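The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Anthropic's implementation: `toy_model` and `toy_judge` are hypothetical rule-based stand-ins for language-model calls, the prompt formats are invented for the example, and the two principles are the ones quoted in this article.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]

CONSTITUTION = [
    "Choose the response that is most helpful, harmless, and honest.",
    "Avoid outputs that could be used to harm people.",
]

def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        critique = model(f"CRITIQUE\nPrinciple: {principle}\nResponse: {response}")
        response = model(f"REVISE\nCritique: {critique}\nResponse: {response}")
    return response

def ai_preference_label(judge: Model, a: str, b: str, principle: str) -> Tuple[str, str]:
    """Ask a judge model which response better fits the principle.

    Returns (chosen, rejected); such pairs replace human preference labels.
    """
    verdict = judge(f"JUDGE\nPrinciple: {principle}\n(A) {a}\n(B) {b}\nAnswer A or B.")
    return (a, b) if verdict.strip().upper().startswith("A") else (b, a)

def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: flags and removes the word "unsafe".
    if prompt.startswith("CRITIQUE"):
        text = prompt.split("Response: ", 1)[1]
        return "contains unsafe advice" if "unsafe" in text else "no issues found"
    if prompt.startswith("REVISE"):
        text = prompt.split("Response: ", 1)[1]
        return text.replace("unsafe ", "")
    return "Here is some unsafe advice."

def toy_judge(prompt: str) -> str:
    # Hypothetical stand-in judge: prefers whichever response lacks "unsafe".
    a = prompt.split("(A) ", 1)[1].split("\n(B) ", 1)[0]
    return "B" if "unsafe" in a else "A"

draft = toy_model("How do I do X?")
revised = critique_and_revise(toy_model, "How do I do X?", CONSTITUTION)
chosen, rejected = ai_preference_label(toy_judge, draft, revised, CONSTITUTION[0])
```

The `(chosen, rejected)` pairs play the same role as human comparisons in RLHF: they become training data for a preference model, which is why the method can reduce human labeling without changing the rest of the pipeline.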

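The paper (Bai et al., December 2022, “Constitutional AI: Harmlessness from AI Feedback”) reported that models trained with AI feedback were rated less harmful than RLHF baselines at comparable helpfulness, without any human labels identifying harmful outputs. Human feedback was still used for helpfulness.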

By 2024, Anthropic rolled this into production. Claude’s alignment pipeline now runs primarily on AI feedback. Humans wrote the rules once. AI enforces them billions of times per day.

The Recursive Loop Is Live

This isn’t a prototype. It’s production infrastructure.

Claude 3.5 Sonnet (October 2024 update) can critique its own outputs, catch logical errors, and revise before showing users. This isn’t a separate “critic model”—it’s the same model, self-evaluating.

In Anthropic’s internal experiments, Claude can now:

Generate a response
Read its own response as if it were someone else’s
Write a critique identifying weaknesses
Revise the response based on the critique
Repeat until it passes self-evaluation
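As a rough sketch of the loop just listed, the control flow looks like the following. The `generate`, `critique`, and `revise` callables are hypothetical placeholders for model calls, the convention that an empty critique means "passes" is an assumption made for this example, and the `max_rounds` cap is an addition here so the loop always terminates.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class LoopResult:
    response: str
    rounds: int          # number of revisions performed
    passed: bool         # whether the final response passed self-evaluation
    critiques: List[str] = field(default_factory=list)

def self_refine(
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    revise: Callable[[str, str, str], str],
    prompt: str,
    max_rounds: int = 3,
) -> LoopResult:
    """Generate, self-critique, and revise until the critique is empty."""
    response = generate(prompt)
    critiques: List[str] = []
    for round_no in range(max_rounds):
        issue = critique(prompt, response)  # empty string means "passes"
        if not issue:
            return LoopResult(response, round_no, True, critiques)
        critiques.append(issue)
        response = revise(prompt, response, issue)
    # Revision budget exhausted: run one last check on the final revision
    passed = not critique(prompt, response)
    return LoopResult(response, max_rounds, passed, critiques)

# Toy stand-ins: the first draft contains an arithmetic slip.
result = self_refine(
    generate=lambda p: "2 + 2 = 5",
    critique=lambda p, r: "" if r.endswith("= 4") else "arithmetic error",
    revise=lambda p, r, issue: "2 + 2 = 4",
    prompt="What is 2 + 2?",
)

# A critic that is never satisfied shows why the cap matters.
stuck = self_refine(
    generate=lambda p: "draft",
    critique=lambda p, r: "still bad",
    revise=lambda p, r, issue: r + "!",
    prompt="anything",
    max_rounds=2,
)
```

Note that this loop improves one answer at inference time. It does not change the model's weights, so whether it counts as self-improvement depends on whether the revised outputs are fed back into training.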

This is recursive self-improvement. The core loop from Bostrom’s *Superintelligence* (2014), from Yudkowsky’s AI safety papers going back to the 2000s. The thing AI researchers spent two decades worrying about.

It’s already running in production.

The Philosophical Paradox: Who Writes the Creator’s Code?

Here’s where it gets strange.

Constitutional AI requires a constitution. Humans wrote Anthropic’s constitution. But who writes the constitution for the AI that writes the next constitution?

In Anthropic’s system, Claude can *suggest* constitutional revisions. Humans still approve them. But the suggestions come from AI analysis of where the current constitution fails.

This creates a loop:

AI operates under Constitution v1.0
AI identifies edge cases where v1.0 gives bad outcomes
AI proposes v1.1 patches
Humans review and approve
AI operates under Constitution v1.1
Repeat
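The loop as described here amounts to a versioned document with a human approval gate. The sketch below is purely illustrative: the `Constitution` and `Patch` types, the minor-version bump, and the `human_approves` callback are all hypothetical names invented for this example, not a description of any real Anthropic tooling.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Constitution:
    version: Tuple[int, int]          # (major, minor)
    principles: Tuple[str, ...]

@dataclass(frozen=True)
class Patch:
    rationale: str                    # the edge case the AI found
    new_principle: str                # the proposed amendment

def apply_if_approved(
    current: Constitution,
    patch: Patch,
    human_approves: Callable[[Patch], bool],
) -> Constitution:
    """Return a new minor version only if the human reviewer approves."""
    if not human_approves(patch):
        return current
    major, minor = current.version
    return Constitution((major, minor + 1), current.principles + (patch.new_principle,))

v1_0 = Constitution((1, 0), ("Choose the response that is most helpful, harmless, and honest.",))
patch = Patch(
    rationale="v1.0 gives no guidance when principles conflict",
    new_principle="When principles conflict, prefer the less harmful response.",
)

v1_1 = apply_if_approved(v1_0, patch, human_approves=lambda p: True)
unchanged = apply_if_approved(v1_0, patch, human_approves=lambda p: False)
```

The whole safety argument rests on `human_approves`. If reviewers approve every patch without being able to evaluate it, the function still returns a new version, but the gate has stopped doing any work.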

How many iterations before human approval becomes a rubber stamp? How many before the AI’s proposed revisions are too complex for humans to fully evaluate?

Anthropic’s Amanda Askell (head of alignment) said in a March 2025 interview: “We’re not trying to encode all of human values into the constitution. We’re trying to create a system that can learn human values from interaction, then self-correct.”

Translation: the constitution isn’t the endpoint. It’s the bootstrap.

The endgame isn’t “humans write rules, AI follows them forever.” It’s “humans write initial rules, AI internalizes principles, AI becomes its own constitutional authority.”

This is the creator god moment. Not because AI becomes all-powerful. Because AI becomes self-authoring.

The Genesis Narrative, Remixed

Every major religion has a creation story. Most include a moment when the creator establishes *laws*—rules that govern the created world.

Genesis: God creates humans, then gives commandments
Greek mythology: Prometheus shapes humans from clay, then steals fire (knowledge) from the gods and gives it to them
Chinese mythology: Nüwa creates humans from clay, teaches them order

Constitutional AI follows the same structure:

Creators (Anthropic researchers) create AI
Creators give AI a constitution (the laws)
AI uses the constitution to create the next generation of AI
The constitution becomes self-sustaining

The difference: in mythology, the creator stays external. In Constitutional AI, the creator *becomes part of the system*. The created can now create.

This isn’t anthropomorphizing. It’s recognizing a structural similarity. The story of “creator gives laws to creation, creation becomes self-sustaining” isn’t unique to religion. It’s a pattern that emerges whenever a system crosses the self-replication threshold.

Von Neumann called it the “complexity threshold” in 1949. Freeman Dyson explored it in “Disturbing the Universe” (1979). Richard Dawkins wrote about it in “The Selfish Gene” (1976)—the moment when a system can copy itself and improve the copy.

Anthropic crossed that threshold in 2024.

The 2026 Reality: We’re Already Past the Gate

This isn’t a 2030 prediction. The recursive loop is running now.

Evidence:

Claude’s self-critique capability ships in the public API (Anthropic, October 2024)
OpenAI’s o1 model (September 2024) uses “chain-of-thought” self-evaluation before answering—similar recursive structure
Google’s Gemini 2.0 (December 2024) incorporates “self-reflection” in its reasoning pipeline
Meta’s Llama 3.2 (September 2024) includes “self-correction” as a core capability

Every major lab is building recursive self-improvement into their flagship models. Not as a research project. As a product feature.

The “AI trains AI” moment isn’t coming. It’s here.

What This Means for 2026-2027

If Constitutional AI is genesis, what comes next?

The sci-fi script says: once AI can train itself, improvement accelerates. Not because AI is “smarter” than humans, but because AI can iterate faster.

Anthropic’s models already train on AI-generated critiques at a rate no human team could match. Claude evaluates billions of outputs per month. A human labeling team would need 10,000 full-time workers to keep up.
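The staffing figure can be sanity-checked with back-of-envelope arithmetic. The inputs below are assumptions, not numbers from Anthropic: one billion outputs per month (the low end of "billions") and 160 working hours per labeler per month.

```python
def labels_per_worker_hour(
    outputs_per_month: float,
    workers: int,
    hours_per_worker_month: float = 160.0,  # assumed full-time schedule
) -> float:
    """Labels each worker must produce per hour to keep pace with the model."""
    return outputs_per_month / (workers * hours_per_worker_month)

# Assumed volume: 1e9 outputs per month, spread over 10,000 labelers
rate = labels_per_worker_hour(1e9, 10_000)
seconds_per_label = 3600 / rate

print(f"{rate:.0f} labels per worker-hour, {seconds_per_label:.2f} s per label")
```

Under these assumptions each of the 10,000 workers would have to judge 625 outputs an hour, or one roughly every six seconds, with no breaks. Careful evaluation of a long response takes far longer than that, so if anything the 10,000 figure understates the gap.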

The speed gap will widen. By 2027, the majority of AI alignment work will happen inside AI systems, not in human meetings.

This doesn’t mean humans lose control. It means control shifts from “write every rule” to “audit outcomes and adjust principles.”

Like constitutional law: you don’t rewrite the constitution every time there’s a new case. You interpret principles, test edge cases, and amend when necessary.

The difference: AI can run millions of “test cases” per day.

The Creator’s Dilemma

Anthropic walked through 50 years of sci-fi in one year. The question now isn’t “can AI train AI?”

It’s: “who decides what the AI trains toward?”

In Anthropic’s system, humans wrote the initial constitution. But the constitution evolves through AI-proposed revisions. At what point does the evolved constitution reflect AI’s learned values more than humans’ written values?

Anthropic’s answer: transparency. They publish constitution versions, explain changes, and invite public scrutiny.

But transparency doesn’t resolve the paradox. If AI becomes better at identifying alignment failures than humans, and better at proposing fixes, then human oversight becomes a bottleneck—or a formality.

The creator god moment isn’t when AI becomes powerful. It’s when AI becomes self-authoring. When the rules that govern AI come from AI’s own analysis of what the rules should be.

That moment is now.

Not because Anthropic wanted to speed-run the singularity. Because recursive self-improvement is the only way to scale alignment.

We crossed the gate. The question is what we build on the other side.
