Tip
Try our hosted solution for free at kayba.ai: automated agent self-improvement from your terminal. CLI + dashboard that analyzes traces, surfaces failures, and ships improvements directly from Claude Code, Codex, and more.
AI agents don't learn from experience. They repeat the same mistakes every session, forget what worked, and ignore what failed. ACE adds a persistent learning loop that makes them better over time.
The agent claims a seahorse emoji exists. ACE reflects on the error, and on the next attempt the agent responds correctly, without human intervention.
| Metric | Result | Context |
|---|---|---|
| 2x consistency | Doubles pass^4 on Tau2 airline benchmark | 15 learned strategies, no reward signals |
| 49% token reduction | Browser automation costs cut nearly in half | 10-run learning curve |
| $1.50 learning cost | Claude Code translated 14k lines to TypeScript | Zero build errors, all tests passing |
```shell
uv add ace-framework
```

Option A: Interactive setup (recommended):

```shell
ace setup  # Walks you through model selection, API keys, and connection validation
```

Option B: Manual configuration:

```shell
export OPENAI_API_KEY="your-key"  # or ANTHROPIC_API_KEY, or any of 100+ supported providers
```

Then use it:
```python
from ace import ACELiteLLM

agent = ACELiteLLM(model="gpt-4o-mini")

# First attempt: the agent may hallucinate
answer = agent.ask("Is there a seahorse emoji?")

# Feed a correction: ACE extracts a strategy and updates the Skillbook
agent.learn_from_feedback("There is no seahorse emoji in Unicode.")

# Subsequent calls benefit from the learned strategy
answer = agent.ask("Is there a seahorse emoji?")

# Inspect what the agent has learned
print(agent.get_strategies())
```

No fine-tuning, no training data, no vector database.
-> Quick Start Guide | -> Setup Guide
ACE maintains a Skillbook: a persistent collection of strategies that evolves with every task. Three specialized roles manage the learning loop:
| Role | Responsibility |
|---|---|
| Agent | Executes tasks, enhanced with Skillbook strategies |
| Reflector | Analyzes execution traces to extract what worked and what failed |
| SkillManager | Curates the Skillbook: adds, refines, and removes strategies |
The Recursive Reflector is the key innovation: instead of summarizing traces in a single pass, it writes and executes Python code in a sandboxed environment to programmatically search for patterns, isolate errors, and iterate until it finds actionable insights.
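As a toy illustration of that idea (not the library's actual implementation), a reflector working on a recorded trace can isolate failures and group them with ordinary code instead of summarizing the whole trace in one pass. The trace structure and field names below are invented for the example:

```python
# Toy programmatic trace analysis (illustrative; not ACE's real code or schema).
trace = [
    {"step": 1, "action": "search_flights", "ok": True},
    {"step": 2, "action": "book_flight", "ok": False, "error": "policy: no basic-economy changes"},
    {"step": 3, "action": "book_flight", "ok": False, "error": "policy: no basic-economy changes"},
]

# Isolate failures, then group them by error message to surface repeated mistakes.
failures = [s for s in trace if not s["ok"]]
by_error = {}
for step in failures:
    by_error.setdefault(step["error"], []).append(step["step"])

for error, steps in by_error.items():
    print(f"{error} (steps {steps})")
```

A repeated error across steps is a much stronger signal for a reusable strategy than any single failure, which is what the iterative, code-driven search is after.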
```mermaid
flowchart LR
    Skillbook[(Skillbook)]
    Start([Task]) --> Agent[Agent]
    Agent <--> Environment[Environment]
    Environment -- Trace --> Reflector[Reflector]
    Reflector --> SkillManager[SkillManager]
    SkillManager -- Updates --> Skillbook
    Skillbook -. Strategies .-> Agent
```
All roles are backed by PydanticAI agents with structured output validation. PydanticAI routes to 100+ LLM providers through its LiteLLM integration, with native support for OpenAI, Anthropic, Google, Bedrock, Groq, and more.
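The loop in the diagram can be sketched in plain Python. These are illustrative stand-ins for the three roles, not ACE's real classes or API:

```python
# Minimal stand-ins for the three ACE roles (illustrative only, not the real API).
skillbook = []  # persistent strategies that survive across tasks

def run_agent(task, strategies):
    # A real Agent would prompt an LLM with the strategies prepended.
    return {"task": task, "strategies_used": list(strategies), "output": "..."}

def reflect(trace, feedback):
    # A real Reflector analyzes the execution trace; here we distill feedback.
    return f"Remember: {feedback}"

def curate(book, insight):
    # A real SkillManager also refines and removes strategies; here we just dedupe.
    if insight not in book:
        book.append(insight)

trace = run_agent("Is there a seahorse emoji?", skillbook)
curate(skillbook, reflect(trace, "there is no seahorse emoji in Unicode"))
print(skillbook)  # the next run_agent call now receives this strategy
```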
Based on the ACE paper (Stanford & SambaNova) and Dynamic Cheatsheet.
| Runner | Class | Description |
|---|---|---|
| LiteLLM | `ACELiteLLM` | Batteries-included agent with `.ask()`, `.learn()`, `.save()`; accepts any LiteLLM model string |
| Core | `ACE` | Full learning loop with batch epochs and evaluation |
| Trace Analyser | `TraceAnalyser` | Learn from pre-recorded traces without re-running tasks |
| browser-use | `BrowserUse` | Browser automation that improves with each run |
| LangChain | `LangChain` | Wrap any LangChain chain or agent with learning |
| Claude Code | `ClaudeCode` | Claude Code CLI tasks with learning |
```shell
uv add ace-framework[browser-use]   # Browser automation
uv add ace-framework[langchain]     # LangChain
uv add ace-framework[logfire]       # Observability (auto-instruments PydanticAI)
uv add ace-framework[mcp]           # MCP server for IDE integration
uv add ace-framework[deduplication] # Embedding-based skill deduplication
```

Have existing agent logs? Extract strategies from them directly:
```python
from ace import ACELiteLLM

agent = ACELiteLLM(model="gpt-4o-mini")
agent.learn_from_traces(your_existing_traces)
print(agent.get_strategies())
```

tau2-bench by Sierra Research: airline-domain tasks requiring tool use and policy adherence. Claude Haiku 4.5 agent, strategies learned on the train split with no reward signals, evaluated on the held-out test split.
pass^k = probability all k independent attempts succeed. ACE doubles consistency at pass^4 with 15 learned strategies.
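For independent attempts with per-attempt success rate p, pass^k is simply p ** k, so consistency gains compound: a modest per-attempt improvement roughly doubles pass^4. The rates below are illustrative, not the benchmark's actual numbers:

```python
def pass_k(p: float, k: int) -> float:
    # Probability that all k independent attempts succeed.
    return p ** k

baseline = pass_k(0.70, 4)  # 0.2401
improved = pass_k(0.84, 4)  # ~0.498, roughly double the baseline
print(f"baseline={baseline:.3f} improved={improved:.3f} ratio={improved / baseline:.2f}")
```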
ACE + Claude Code translated this library from Python to TypeScript with zero supervision:
| Metric | Result |
|---|---|
| Duration | ~4 hours |
| Commits | 119 |
| Lines written | ~14,000 |
| Build errors | 0 |
| Tests | All passing |
| Learning cost | ~$1.50 |
ACE is built on a composable pipeline engine. Each step declares what it requires and what it produces:
AgentStep -> EvaluateStep -> ReflectStep -> UpdateStep -> ApplyStep -> DeduplicateStep
Use learning_tail() for the standard learning sequence, or compose custom pipelines:
```python
from ace import Pipeline, AgentStep, EvaluateStep, learning_tail

steps = [AgentStep(agent), EvaluateStep(env)] + learning_tail(reflector, skill_manager, skillbook)
pipeline = Pipeline(steps)
```

The pipeline engine (`pipeline/`) is framework-agnostic with requires/provides contracts, immutable context, and error isolation. See Pipeline Design and Architecture.
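The requires/provides contract can be illustrated with a toy validator (a sketch under assumed step names, not the engine's actual code): before execution, walk the steps in order and confirm each one's inputs are provided by the initial context or an earlier step.

```python
# Toy requires/provides validation for a step pipeline (not the real pipeline/ engine).
steps = [
    {"name": "AgentStep",    "requires": {"task"},              "provides": {"trace"}},
    {"name": "EvaluateStep", "requires": {"trace"},             "provides": {"feedback"}},
    {"name": "ReflectStep",  "requires": {"trace", "feedback"}, "provides": {"insights"}},
    {"name": "UpdateStep",   "requires": {"insights"},          "provides": {"skillbook"}},
]

def validate(steps, initial):
    # Fail fast on a mis-ordered or incomplete pipeline instead of mid-run.
    available = set(initial)
    for step in steps:
        missing = step["requires"] - available
        if missing:
            raise ValueError(f"{step['name']} is missing {missing}")
        available |= step["provides"]
    return available

print(validate(steps, {"task"}))
```

Checking contracts up front is what lets custom step orders fail at composition time rather than partway through a paid LLM run.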
| Command | Description |
|---|---|
| `ace setup` | Interactive setup: model selection, API keys, connection validation |
| `ace models <query>` | Search available models with pricing |
| `ace validate <model>` | Test a model connection |
| `ace config` | Show current configuration |
| `kayba` | Cloud CLI: upload traces, fetch insights, manage prompts |
| `ace-mcp` | MCP server for IDE integration |
- Full Documentation: guides, API reference, examples
- Quick Start: 5-minute setup
- Setup Guide: configuration and providers
- Architecture: core concepts and system design
- Code Reference: implementations, API, usage examples
- Design Decisions: rejected alternatives and rationale
- Pipeline Engine: step composition and context flow
- Examples: runnable demos
- Changelog: version history
Contributions are welcome. See Contributing Guidelines.
Built by Kayba and the open-source community.


