
AlphaApollo: A System for Deep Agentic Reasoning

AlphaApollo is an agentic reasoning framework that orchestrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a broad range of paradigms, including tool-integrated reasoning, agentic post-training (e.g., multi-turn supervised fine-tuning and reinforcement learning), and agentic self-evolution. The framework offers extensible environments and toolsets for easy customization, extension, and scalable deployment of agentic reasoning workflows.

News

  • [2026.01] We are excited to release AlphaApollo, an agentic LLM reasoning system for advanced reasoning.
  • [2025.10] Our technical report is released; see here for details.

Installation

conda create -n alphaapollo python=3.12 -y
conda activate alphaapollo

git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo

bash installation.sh

Supported features

  • Tool-integrated reasoning rollout with seamless environment interaction
  • Dynamic memory updates for multi-turn reasoning
  • Multi-turn supervised fine-tuning (SFT)
  • Reinforcement learning algorithms: GRPO, PPO, DAPO, and more
  • Multi-round, multi-model solution refinement with shared state
  • Iterative improvement via feedback and executable checks
  • Python interpreter
  • Retrieval-Augmented Generation (RAG)
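To make the first two features concrete, here is a minimal sketch of a tool-integrated rollout with a rolling memory. All names (`Memory`, `rollout`, the action dict shape) are hypothetical illustrations, not the actual AlphaApollo API:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Rolling conversation state shared across reasoning turns."""
    turns: list = field(default_factory=list)

    def update(self, role, content):
        self.turns.append({"role": role, "content": content})

def rollout(question, policy, tools, max_steps=4):
    """Multi-turn loop: the policy proposes an action; tool calls are
    executed and their observations are written back into memory."""
    memory = Memory()
    memory.update("user", question)
    for _ in range(max_steps):
        action = policy(memory.turns)
        memory.update("assistant", action["content"])
        if action["type"] == "final":
            return action["content"]
        observation = tools[action["tool"]](action["content"])
        memory.update("tool", observation)
    return None

# Toy policy: call the calculator tool once, then answer with its result.
def toy_policy(turns):
    if turns[-1]["role"] == "tool":
        return {"type": "final", "content": turns[-1]["content"]}
    return {"type": "tool", "tool": "python", "content": "1 + 2"}

print(rollout("What is 1 + 2?", toy_policy, {"python": lambda c: str(eval(c))}))
```

The same loop shape underlies both evaluation rollouts and the trajectories collected for multi-turn SFT/RL.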

Quick-start recipes

Detailed quick-start commands (including script entrypoints) are documented in quick-start.md.

Note: Before using the local RAG module, please follow RAG Service Setup.

Agentic reasoning

# no-tool reasoning
python3 -m alphaapollo.workflows.test \
  --model.path=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.data_source=math-ai/aime24
# tool-integrated reasoning
python3 -m alphaapollo.workflows.test \
  --model.path=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.data_source=math-ai/aime24 \
  --env.informal_math.enable_python_code=true \
  --env.informal_math.enable_local_rag=false \
  --env.max_steps=4
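The `--env.informal_math.enable_python_code=true` flag turns on a Python interpreter tool. A stripped-down version of such a tool (hypothetical, not the project's actual implementation, which would also sandbox and time-limit execution) might look like:

```python
import contextlib
import io

def run_python(code: str) -> str:
    """Execute model-emitted Python and return captured stdout, or the
    exception text so the model can self-correct on the next turn."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {"__builtins__": __builtins__})
    except Exception as e:
        return f"{type(e).__name__}: {e}"
    return buf.getvalue()

print(run_python("print(sum(range(1, 11)))"))  # → 55
```

Returning the exception text instead of raising is deliberate: the error message becomes a tool observation the model can react to.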

Single-question evaluation:

# Select specific dataset samples (e.g., the 0th AIME test question) and test
python3 -m alphaapollo.workflows.test \
  --model.path=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.module=alphaapollo.data_preprocess.prepare_custom_data \
  --preprocess.data_source=math-ai/aime24 \
  --preprocess.splits=test \
  --preprocess.sample_indices=0 \
  --data.path=~/data/custom_data/test.parquet
# Directly evaluate a plain text question (not from a dataset)
python3 -m alphaapollo.workflows.test \
  --model.path=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.module=alphaapollo.data_preprocess.prepare_single_question \
  --preprocess.question_text="What is the sum of integers from 1 to 1000?" \
  --preprocess.ground_truth="500500" \
  --data.path=~/data/single_question/test.parquet
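For the single-question example above, the final answer is compared against `--preprocess.ground_truth`. A minimal exact-match check with light normalization (a hypothetical sketch; the framework's actual scoring may be more elaborate, e.g. symbolic math equivalence) could be:

```python
def normalize(ans: str) -> str:
    """Light normalization before exact-match comparison."""
    return ans.strip().strip("$").replace(",", "").lower()

def exact_match(prediction: str, ground_truth: str) -> bool:
    """True if the normalized prediction equals the normalized reference."""
    return normalize(prediction) == normalize(ground_truth)

print(exact_match("500,500", "500500"))  # → True
```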

Agentic learning

# multi-turn SFT
python3 -m alphaapollo.workflows.sft \
  --model.partial_pretrain=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.data_source=AI-MO/NuminaMath-TIR
# multi-turn RL
python3 -m alphaapollo.workflows.rl \
  --model.path=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.data_source=HuggingFaceH4/MATH-500 \
  --algorithm.adv_estimator=grpo
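The `--algorithm.adv_estimator=grpo` option selects GRPO's group-relative advantage estimator: rewards from a group of rollouts for the same prompt are normalized by the group's mean and standard deviation, so no value network is needed. A small sketch of that normalization (illustrative only; the actual estimator lives in the training backend):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: center each rollout's reward on the
    group mean and scale by the group standard deviation."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts of one prompt: two solved (reward 1), two failed (reward 0).
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Correct rollouts get positive advantages and incorrect ones negative, purely relative to their own group.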

Agentic self-evolution

Before running the self-evolution scripts, serve every model the workflow needs (in the single-model example below, one served model acts as both policy and verifier).

python alphaapollo/utils/ray_serve_llm.py --model_path Qwen/Qwen3-4B-Instruct-2507 --gpus "0,1" --port 8000 --model_id "qwen3_4b_inst"
# single-model evolution
python3 -m alphaapollo.workflows.evo \
  --preprocess.data_source=math-ai/aime24 \
  --run.dataset_name=aime24 \
  --policy_model_cfg.model_name=qwen3_4b_inst \
  --policy_model_cfg.base_url=http://localhost:8000/v1 \
  --verifier_cfg.model_name=qwen3_4b_inst \
  --verifier_cfg.base_url=http://localhost:8000/v1
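Conceptually, the policy/verifier pairing above implements a propose-verify-refine loop: the policy model drafts a solution, the verifier critiques it, and the critique seeds the next round. A toy sketch of that control flow (hypothetical names and stub models, not the actual workflow code):

```python
def self_evolve(question, propose, verify, max_rounds=3):
    """Iterative refinement: propose a solution, verify it, and feed the
    verifier's critique back into the next proposal."""
    feedback = None
    solution = None
    for _ in range(max_rounds):
        solution = propose(question, feedback)
        ok, feedback = verify(question, solution)
        if ok:
            return solution
    return solution  # best effort after max_rounds

# Stub models: the proposal succeeds once feedback is available.
propose = lambda q, fb: "correct" if fb else "wrong"
verify = lambda q, s: (s == "correct", "try again")
print(self_evolve("demo", propose, verify))  # → correct
```

In the real system, `propose` and `verify` would be chat-completion calls to the served endpoints configured via `--policy_model_cfg` and `--verifier_cfg`.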

Code Structure

+------------------------------------------------------------------+
| alphaapollo/data_preprocess                                      |
| (dataset preparation scripts)                                    |
+------------------------------------------------------------------+
                               |
                               V
+------------------------------------------------------------------+
| alphaapollo/core                                                 |
| (core code)                                                      |
|                                                                  |
|  +----------------------+              +----------------------+  |
|  | generation/          |              | tools/               |  |
|  |                      | <----------> | - python_code        |  |
|  |                      |              | - rag/               |  |
|  +----------------------+              +----------------------+  |
|              ^                                                   |
|              |                                                   |
|              V                                                   |
|  +------------------------------------------------------------+  |
|  | environments/                                              |  |
|  | - informal_math_training/                                  |  |
|  | - informal_math_evolving/                                  |  |
|  | - memory/                                                  |  |
|  | - prompts/                                                 |  |
|  +------------------------------------------------------------+  |
+------------------------------------------------------------------+

Informal Math Environment (Training): [diagram omitted]

Informal Math Environment (Evolving): [diagram omitted]

Tools (for reference): [diagram omitted]

Acknowledgement

AlphaApollo is built upon the open-source projects verl, verl-agent, vllm, and sglang. We sincerely thank the contributors of these projects for their valuable work and support.

Cite

If you find AlphaApollo useful in your research, please consider citing our work:

@article{zhou2025alphaapollo,
  title = {{AlphaApollo}: A System for Deep Agentic Reasoning},
  author = {Zhou, Zhanke and Cao, Chentao and Feng, Xiao and Li, Xuan and Li, Zongze and Lu, Xiangyu and Yao, Jiangchao and Huang, Weikai and Cheng, Tian and Zhang, Jianghangfan and Jiang, Tangyu and Xu, Linrui and Zheng, Yiming and Miranda, Brando and Liu, Tongliang and Koyejo, Sanmi and Sugiyama, Masashi and Han, Bo},
  journal = {arXiv preprint arXiv:2510.06261},
  year = {2025}
}
