██╗ ██╗███╗ ███╗██╗███╗ ██╗ █████╗
██║ ██║████╗ ████║██║████╗ ██║██╔══██╗
██║ ██║██╔████╔██║██║██╔██╗ ██║███████║
██║ ██║██║╚██╔╝██║██║██║╚██╗██║██╔══██║
███████╗██║██║ ╚═╝ ██║██║██║ ╚████║██║ ██║
╚══════╝╚═╝╚═╝ ╚═╝╚═╝╚═╝ ╚═══╝╚═╝ ╚═╝
from Latin līmen — "threshold"
Cross the boundary between known and unknown.
Built by The Agile Monkeys.
Give Limina a problem with a measurable goal. It will autonomously research it — forming hypotheses, running experiments, challenging its own direction — until it finds a solution backed by evidence, or tells you what it learned trying.
Limina is an autonomous research harness for AI agents. You describe a problem with clear success criteria, and the agent works through it: break it down, survey existing approaches, form hypotheses, design and run experiments, challenge its own assumptions, and iterate — until it reaches a solution or exhausts the approaches and tells you what it learned.
It works on anything with a measurable outcome. Our team uses it for:
- Optimizing a search engine for a large e-commerce platform
- A/B testing product features
- Researching state-of-the-art approaches for audio transcription
- Optimizing social media reach with data-driven experiments
- Investigating root causes of production performance issues
Everything the agent does is written to a persistent knowledge base (kb/). Hypotheses link to experiments. Experiments link to findings. Decisions are logged with reasoning. If the agent gets stuck, it escalates to you instead of guessing. You don't just get a result — you get the full trail of how it got there and why.
This repository is a template/starter system — clone it, start an agent, and describe your problem.
- Technical leads — You need to make a decision between approaches and don't have weeks to run the comparison yourself. Limina does the legwork and gives you the evidence to decide.
- Product teams — You want to optimize a metric — conversion rate, latency, cost, user engagement — and need systematic experimentation, not guesswork.
- Research engineers — You're tired of manually setting up experiment after experiment, tracking what you tried, and remembering why you discarded something three days ago. The agent keeps the full trail for you.
- Scientists — Your research involves systematic evaluation across many variables. Limina runs the loop — hypothesize, test, record, review, iterate — so you can focus on the questions, not the bookkeeping.
- Business intelligence — You have a question that requires more than pulling a dashboard. Something that needs real investigation: gathering data from multiple sources, testing assumptions, building evidence for a recommendation.
- Anyone with a goal that can be measured — If you can define what "better" looks like, Limina can research how to get there.
Define a mission. Describe your research objective — what you're trying to figure out, what "better" means, what resources the agent can use, and when it should come to you for a decision.
Let it run. The agent breaks the problem into tasks, forms hypotheses, runs experiments, and iterates toward your success criteria. It works across hours or days and picks up where it left off after interruptions.
Steer when needed. When the agent hits something it can't decide on its own — needs more budget, wants to try a risky approach, reached a fork — it stops and asks you.
Get the result. When the agent meets your success criteria — or determines it can't — you have the solution, the full research trail, and the reasoning behind every decision it made along the way.
Open Claude Code or Codex and paste:
Install the Limina research skill by running:
curl -fsSL https://raw.githubusercontent.com/theam/limina/main/setup.sh | bash
Then ask me to change my Claude Code working directory to the folder where I want
my research project to live, and help me set up a new Limina research project.
The agent will install the skill, ask you to switch to your preferred directory, then guide you through everything — project name, research objective, context, success criteria.
When setup is done, open Claude Code in the new project directory:
cd <your-project-name> && claude --dangerously-skip-permissionsThe agent reads the methodology automatically and starts researching.
As the agent works, it builds a knowledge base in kb/:
kb/
├── mission/
│ ├── CHALLENGE.md ← your research brief
│ └── BACKLOG.md ← task tracking
├── research/
│ ├── hypotheses/H001.md ← what it thinks might work
│ ├── experiments/E001.md ← how it tested each hypothesis
│ └── findings/F001.md ← what it learned
├── reports/
│ └── SR001.md ← strategic review
└── tasks/
├── T001.md
└── T002.md
Check progress anytime by reading the files in kb/ or asking the agent for a status update. When it gets stuck or needs a decision, it will ask you.
The agent will ask you about your problem interactively. You'll get better results if your description reads like a research brief — here's what to include:
- Research objective — what problem you're trying to solve or improve
- Evaluation target — what "better" means and what failure is unacceptable
- Baseline — the current system, method, or repo to beat or replace
- Resource envelope — what compute, budget, datasets, APIs, and services are available
- Autonomy boundaries — what the agent is allowed to generate on its own (evaluation sets, synthetic data, benchmarks)
- Escalation rules — when it should ask you for more budget, tools, or approvals
Research & optimization:
Your objective is to improve a multilingual retrieval system for a product catalog.
The system should support both natural-language intent queries and traditional keyword search.
Success requires high precision, high recall, and strong latency. Missing relevant items or
returning irrelevant ones is not acceptable.
You have an existing baseline system to improve.
You may use the datasets, services, and API keys available in the project environment.
You also have a bounded compute budget and should optimize for effective iteration, not long
expensive runs by default.
If evaluation data does not exist, generate it yourself and document how it was created.
If additional tools, budget, or access are needed, ask with a clear justification.
Investigation & root cause analysis:
Our API's P99 latency jumped from 120ms to 800ms after the last deploy.
We need to find the root cause and a fix.
The service is a Node.js app on ECS with a PostgreSQL database.
You have access to the repo, CloudWatch logs, and APM traces.
Success means P99 back under 200ms with the fix verified in staging.
If you need access to production or want to run load tests, ask first.
Product optimization:
We need to improve the conversion rate of our landing page.
Current conversion is 2.3% and we want to reach 4%.
Run A/B tests on copy, layout, and CTA variations. You can generate
test variants and analyze results from our analytics API.
Track what you tested, what worked, and why.
If you need to deploy a variant to production, ask first.
You describe the problem
→ Agent decomposes into tasks
→ Hypothesis → Experiment → Finding
→ Reviews direction, challenges assumptions
→ Iterates from persistent state across sessions
- You describe the research objective, constraints, and available resources.
- The agent decomposes the work into tasks, questions, and hypotheses.
- The agent runs experiments, gathers evidence, and records findings.
- The agent reviews the direction, challenges assumptions, and updates the plan.
- The agent continues from persistent state across sessions instead of starting over.
Limina works with Claude Code, Codex, and OpenCode. Claude Code loads CLAUDE.md automatically; Codex and OpenCode load AGENTS.md. Both files are functionally equivalent — they guide the agent through the same methodology using runtime-specific tools.
| Capability | Claude Code | Codex | OpenCode |
|---|---|---|---|
| Ask the user for missing information | AskUserQuestion |
request_user_input or a direct question |
Direct question |
| Delegate work | Slash commands and Claude agents | spawn_agent / send_input |
— |
| Communicate status | Active session/chat | Active session/chat | Active session/chat |
| Validate KB state | python3 scripts/kb_validate.py |
python3 scripts/kb_validate.py |
python3 scripts/kb_validate.py |
cook is a universal orchestration CLI that handles work-review-gate cycles across any agent runtime. Use it when you want the agent to run fully autonomously with built-in review gates.
npm install -g @let-it-cook/cliContinue research (open-ended):
cook "Continue research" review \
"Review current status and verify if we achieved the target mission" \
"DONE if we achieved the target mission, else ITERATE"Research with iteration cap:
cook "Continue research" review \
"Review current status and verify if we achieved the target mission" \
"DONE if we achieved the target mission, else ITERATE" \
--max-iterations 10Mixed agents (Codex work, Claude review):
cook "Continue research" review \
"Review current status and verify if we achieved the target mission" \
"DONE if we achieved the target mission, else ITERATE" \
--work-agent codex --review-agent claudeChallenge review:
cook "Run /challenge with target 'Research direction'" review \
"Read the CR report and assess whether critical issues were addressed" \
"DONE if no critical issues remain, else ITERATE"- A persistent knowledge base in
kb/ - A research-first workflow:
- research: Hypothesis → Experiment → Finding
- engineering: Investigation → Feature → Implementation → Retrospective
- First-class review artifacts: Challenge Reviews and Strategic Reviews
- Adapters for Claude Code, Codex, and OpenCode
- Core artifact templates in
templates/ - A read-only KB validator:
python3 scripts/kb_validate.py
The system is built around a persistent knowledge base in kb/.
- Durable state lives in
kb/, not only in conversation context - Every unit of work is a task
- Research tasks follow Hypothesis → Experiment → Finding
- Engineering tasks follow Investigation → Feature → Implementation → Retrospective
- Reviews are first-class artifacts: Challenge Reviews and Strategic Reviews
DECISIONS.mdandCEO_REQUESTS.mdare mission ledgers, not file-backed artifact types
These are the file-backed artifact types enforced by the validator:
| Prefix | Meaning | Location |
|---|---|---|
T |
Task | kb/tasks/ |
H |
Hypothesis | kb/research/hypotheses/ |
E |
Experiment | kb/research/experiments/ |
F |
Finding | kb/research/findings/ |
L |
Literature review | kb/research/literature/ |
FT |
Feature spec | kb/engineering/features/ |
INV |
Investigation | kb/engineering/investigations/ |
IMP |
Implementation log | kb/engineering/implementations/ |
RET |
Retrospective | kb/engineering/retrospectives/ |
CR |
Challenge review | kb/reports/ |
SR |
Strategic review | kb/reports/ |
The validator is read-only in v1. It checks:
- last-ID declarations in
BACKLOG.md - task file and backlog row consistency
INDEX.mdcoverage for core artifact files- research traceability: experiments link to hypotheses, findings link to experiments
- engineering traceability across investigations, features, implementations, retrospectives
- challenge review and strategic review metadata and naming
- malformed filenames, duplicate IDs, and ID gaps
Found a bug? Have an idea? We'd love your input.
- Open an issue to report problems or suggest features
- Start a discussion to ask questions or share how you're using Limina
Apache 2.0, © The Agile Monkeys. See LICENSE.