AlphaApollo is an agentic reasoning framework that orchestrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a broad range of paradigms, including tool-integrated reasoning, agentic post-training (e.g., multi-turn supervised fine-tuning and reinforcement learning), and agentic self-evolution. The framework offers extensible environments and toolsets for easy customization, extension, and scalable deployment of agentic reasoning workflows.
- [2026.01] We are excited to release AlphaApollo, an agentic LLM system for advanced reasoning.
- [2025.10] Our technical report is released; see arXiv:2510.06261 for details.
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh

- Tool-integrated reasoning rollout with seamless environment interaction
- Dynamic memory updates for multi-turn reasoning
- Multi-turn supervised fine-tuning (SFT)
- Reinforcement learning algorithms: GRPO, PPO, DAPO, and more
- Multi-round, multi-model solution refinement with shared state
- Iterative improvement via feedback and executable checks
- Python interpreter
- Retrieval-Augmented Generation (RAG)
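To make the GRPO entry in the list above concrete, here is a minimal sketch of the group-relative advantage that gives the algorithm its name: several answers are sampled for the same question, and each reward is normalized against the mean and standard deviation of that group. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration; the RL backend AlphaApollo builds on may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other rollouts for the same prompt.

    Illustrative only: `eps` and the population standard deviation are
    assumptions, not values taken from AlphaApollo.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers end up with a positive advantage and incorrect ones with a negative advantage, while a group in which every answer receives the same reward contributes zero advantage.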
Detailed quick-start commands (including script entrypoints) are documented in quick-start.md.
Note: Before using the local RAG module, please follow RAG Service Setup.
# no-tool reasoning
python3 -m alphaapollo.workflows.test \
--model.path=Qwen/Qwen2.5-3B-Instruct \
--preprocess.data_source=math-ai/aime24

# tool-integrated reasoning
python3 -m alphaapollo.workflows.test \
--model.path=Qwen/Qwen2.5-3B-Instruct \
--preprocess.data_source=math-ai/aime24 \
--env.informal_math.enable_python_code=true \
--env.informal_math.enable_local_rag=false \
--env.max_steps=4

Single-question evaluation:
# Select specific dataset samples (e.g., the 0th AIME test question) and test
python3 -m alphaapollo.workflows.test \
--model.path=Qwen/Qwen2.5-3B-Instruct \
--preprocess.module=alphaapollo.data_preprocess.prepare_custom_data \
--preprocess.data_source=math-ai/aime24 \
--preprocess.splits=test \
--preprocess.sample_indices=0 \
--data.path=~/data/custom_data/test.parquet

# Directly evaluate a plain text question (not from a dataset)
python3 -m alphaapollo.workflows.test \
--model.path=Qwen/Qwen2.5-3B-Instruct \
--preprocess.module=alphaapollo.data_preprocess.prepare_single_question \
--preprocess.question_text="What is the sum of integers from 1 to 1000?" \
--preprocess.ground_truth="500500" \
--data.path=~/data/single_question/test.parquet

# multi-turn SFT
python3 -m alphaapollo.workflows.sft \
--model.partial_pretrain=Qwen/Qwen2.5-3B-Instruct \
--preprocess.data_source=AI-MO/NuminaMath-TIR

# multi-turn RL
python3 -m alphaapollo.workflows.rl \
--model.path=Qwen/Qwen2.5-3B-Instruct \
--preprocess.data_source=HuggingFaceH4/MATH-500 \
--algorithm.adv_estimator=grpo

Before running the self-evolution scripts, make sure every model they reference is already being served.
python alphaapollo/utils/ray_serve_llm.py --model_path Qwen/Qwen3-4B-Instruct-2507 --gpus "0,1" --port 8000 --model_id "qwen3_4b_inst"

# single-model evolution
python3 -m alphaapollo.workflows.evo \
--preprocess.data_source=math-ai/aime24 \
--run.dataset_name=aime24 \
--policy_model_cfg.model_name=qwen3_4b_inst \
--policy_model_cfg.base_url=http://localhost:8000/v1 \
--verifier_cfg.model_name=qwen3_4b_inst \
--verifier_cfg.base_url=http://localhost:8000/v1

+------------------------------------------------------------------+
|                   alphaapollo/data_preprocess                    |
|                  (dataset preparation scripts)                   |
+------------------------------------------------------------------+
                                  |
                                  V
+------------------------------------------------------------------+
|                         alphaapollo/core                         |
|                           (core code)                            |
|                                                                  |
|  +----------------------+              +----------------------+  |
|  | generation/          |              | tools/               |  |
|  |                      | <----------> |   - python_code      |  |
|  |                      |              |   - rag/             |  |
|  +----------------------+              +----------------------+  |
|                                 ^                                |
|                                 |                                |
|                                 V                                |
|  +------------------------------------------------------------+  |
|  | environments/                                              |  |
|  |   - informal_math_training/                                |  |
|  |   - informal_math_evolving/                                |  |
|  |   - memory/                                                |  |
|  |   - prompts/                                               |  |
|  +------------------------------------------------------------+  |
+------------------------------------------------------------------+
- Environment package in alphaapollo/core/environments/informal_math_training/
- Prompts in alphaapollo/core/environments/prompts/informal_math_training.py
- Environment package in alphaapollo/core/environments/informal_math_evolving/
- Prompts in alphaapollo/core/environments/prompts/informal_math_evolving.py
- Python Code implementation: alphaapollo/core/tools/python_code.py
- RAG implementation: alphaapollo/core/tools/rag/
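As a rough picture of how the environment and tool modules listed above could interact during a tool-integrated rollout, the sketch below alternates between a policy and a Python tool until the policy stops requesting code or a step budget runs out. Everything in it is hypothetical: the `<python>` tag format, `run_python`, `rollout`, and `toy_policy` are illustrative names, not AlphaApollo's actual interfaces, and the bare `exec` has no sandboxing.

```python
import contextlib
import io
import re

# Hypothetical tag format for a tool call; the real prompt format may differ.
CODE_PATTERN = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def run_python(code: str) -> str:
    """Execute code and return what it printed, or the error it raised."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only: no sandbox, no timeout
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def rollout(policy, question: str, max_steps: int = 4) -> list[str]:
    """Alternate policy replies and tool feedback for at most max_steps turns."""
    history = [question]
    for _ in range(max_steps):
        reply = policy(history)
        history.append(reply)
        match = CODE_PATTERN.search(reply)
        if match is None:
            break  # no tool call, so the reply is treated as the final answer
        history.append(run_python(match.group(1)))
    return history


def toy_policy(history: list[str]) -> str:
    """Stand-in for a model: request code once, then answer from the output."""
    if len(history) == 1:
        return "<python>print(sum(range(1, 1001)))</python>"
    return f"The answer is {history[-1]}."


transcript = rollout(toy_policy, "What is the sum of integers from 1 to 1000?")
```

The step budget here plays the same role as `--env.max_steps` in the quick-start commands: it bounds how many model turns a single question can consume.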
AlphaApollo is built upon the open-source projects verl, verl-agent, vllm, and sglang. We sincerely thank the contributors of these projects for their valuable work and support.
If you find AlphaApollo useful in your research, please consider citing our work:
@article{zhou2025alphaapollo,
title = {{AlphaApollo}: A System for Deep Agentic Reasoning},
author = {Zhou, Zhanke and Cao, Chentao and Feng, Xiao and Li, Xuan and Li, Zongze and Lu, Xiangyu and Yao, Jiangchao and Huang, Weikai and Cheng, Tian and Zhang, Jianghangfan and Jiang, Tangyu and Xu, Linrui and Zheng, Yiming and Miranda, Brando and Liu, Tongliang and Koyejo, Sanmi and Sugiyama, Masashi and Han, Bo},
journal = {arXiv preprint arXiv:2510.06261},
year = {2025}
}