pat-jj/Awesome-Adaptation-of-Agentic-AI
Repo for "Adaptation of Agentic AI"
Awesome Adaptation of Agentic AI
A curated list of papers on adaptation strategies of agentic AI systems. This repository accompanies the paper "Adaptation of Agentic AI" (Ongoing Work).
Cite this paper:
@article{jiang2025adaptation,
title={Adaptation of Agentic AI},
author={Jiang, Pengcheng and Lin, Jiacheng and Shi, Zhiyi and Wang, Zifeng and He, Luxi and Wu, Yichen and Zhong, Ming and Song, Peiyang and Zhang, Qizheng and Wang, Heng and others},
journal={arXiv preprint arXiv:2512.16301},
year={2025}
}
Agent Adaptation
A1: Tool Execution Signaled Agent Adaptation
Development Timeline:
RL-based Methods
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
|---|---|---|---|---|---|---|
| 2025.11 | Orion | arXiv | IR | Retrievers | LFM2 | GRPO |
| 2025.10 | olmOCR2 | arXiv | Document OCR | Synthetic Document Verifier | Qwen2.5-VL | SFT, GRPO |
| 2025.10 | AlphaProof | Nature'25 | Formal Theorem Proving | Lean Compiler | Transformer (3B Enc-Dec) | SFT, AlphaZero, TTRL |
| 2025.10 | ToolExpander | arXiv | Tool-Calling | Various APIs | Qwen2.5 | SFT, GRPO |
| 2025.09 | BFS-Prover-V2 | arXiv | Formal Theorem Proving | Lean Compiler | Qwen2.5 | BFS-Guided AlphaZero-like EI |
| 2025.09 | WebGen-Agent | arXiv | Website Generation | VLM, GUI Agent, Code Executor | Various Models | SFT, Step-GRPO |
| 2025.09 | Tool-R1 | arXiv | General Tool-Augmented Reasoning, Multimodal QA | Code Execution, Multimedia Tools | Qwen2.5 | GRPO |
| 2025.08 | FTRL | arXiv | Multi-Step Tool-Use | Simulated APIs | Qwen3 | GRPO |
| 2025.08 | Goedel-Prover-V2 | arXiv | Formal Theorem Proving | Lean Compiler | Qwen3 | SFT, GRPO |
| 2025.07 | Leanabell-Prover-V2 | arXiv | Formal Theorem Proving | Lean Compiler | Qwen2.5 | SFT, AlphaZero-like EI |
| 2025.06 | Router-R1 | NeurIPS'25 | Multi-Round Routing | LLM Routing Pool | Qwen2.5, LLaMA3.2 | PPO |
| 2025.05 | R1-Code-Interpreter | arXiv | Coding | Code Execution Sandbox | Qwen2.5 | GRPO |
| 2025.05 | Tool-N1 | arXiv | Tool-Calling | Various APIs | Qwen2.5 | GRPO |
| 2025.04 | DeepSeek-Prover-V2 | arXiv | Formal Theorem Proving | Lean Compiler | DeepSeek-V2 | SFT, GRPO |
| 2025.04 | Kimina-Prover | arXiv | Formal Theorem Proving | Lean Compiler | LLaMA-2 | SFT, AlphaZero-like EI |
| 2025.04 | SQL-R1 | NeurIPS'25 | Text2SQL Search | SQL Engine | Qwen2.5, OmniSQL | SFT, GRPO |
| 2025.03 | Rec-R1 | TMLR'25 | Recommendation Optimization | Recommendation System | Qwen2.5, LLaMA3.2 | GRPO |
| 2025.03 | ReZero | arXiv | Web Search, IR | Web Search Engine | LLaMA3.2 | GRPO |
| 2025.03 | Code-R1 | --- | Coding | Code Executor | Qwen2.5 | GRPO |
| 2025.02 | DeepRetrieval | COLM'25 | Web Search, IR, Text2SQL | Search Engine, Retrievers, SQL Executor | Qwen2.5, LLaMA3.2 | PPO, GRPO |
| 2025.01 | DeepSeek-R1-Zero (Code) | Nature | Coding | Code Executor | DeepSeek-V3-Base | GRPO |
| 2024.10 | RLEF | ICML'25 | Coding | Code Executor | LLaMA3.1 | PPO |
| 2024.08 | DeepSeek-Prover-V1.5 | ICLR'25 | Formal Theorem Proving | Lean 4 Prover | DeepSeek-Prover-V1.5-RL | SFT, GRPO |
| 2024.05 | LeDex | NeurIPS'24 | Coding | Code Executor | StarCoder, CodeLLaMA | SFT, PPO |

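
What makes the methods above "tool execution signaled" is that the reward comes from actually running the tool (a Lean compiler, a code executor, a retriever) rather than from judging the agent's text. Below is a minimal sketch, assuming a coding task scored by unit tests. `execution_reward`, its test format, and the toy `add` task are hypothetical; plain `exec` stands in for a real sandbox; and the advantage step follows GRPO's group-relative normalization (each sampled response's reward minus the group mean, divided by the group standard deviation, with population std as an assumption here).

```python
from statistics import mean, pstdev


def execution_reward(candidate_src: str, func_name: str,
                     tests: list[tuple[tuple, object]]) -> float:
    """Hypothetical tool-execution reward: fraction of unit tests passed.

    Plain exec() stands in for a real, isolated code-execution sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # NOT sandboxed; illustration only
        fn = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to compile or define func_name earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed test
    return passed / len(tests)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style outcome advantages: (r_i - group mean) / (group std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: one prompt, a group of three sampled completions scored by execution.
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
group = [
    "def add(a, b):\n    return a + b",  # correct: passes all 3 tests
    "def add(a, b):\n    return a - b",  # passes only the (0, 0) case
    "def add(a, b) return a + b",        # syntax error: never runs
]
rewards = [execution_reward(src, "add", tests) for src in group]
advantages = group_relative_advantages(rewards)
```

Completions that pass more tests than their siblings receive positive advantages and the rest negative ones; the remainder of the GRPO update (clipped policy ratio, KL penalty) is omitted from this sketch.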
SFT & DPO Methods
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
|---|---|---|---|---|---|---|
| 2024.12 | AWL | ICML'25 | Scientific Reasoning, Adaptive Tool Usage | Scientific Simulators | Llama-3.1-8B, Qwen-2.5-{14/32}B | SFT, DPO |
| 2024.10 | LeReT | ICLR'25 | IR | Dense Retriever | LLaMA3, Gemma2 | DPO-like (IPO) |
| 2024.10 | ToolFlow | NAACL'25 | Tool-Calling | Various APIs | LLaMA3.1 | SFT |
| 2024.06 | TP-LLaMA | NeurIPS'24 | Tool-Calling | Various APIs | LLaMA2 | SFT, DPO |
| 2024.05 | AutoTools | WWW'25 | Automated Tool-Calling | Various APIs | GPT4, LLaMA3, Mistral | SFT |
| 2024.03 | CYCLE | OOPSLA'24 | Coding | Code Executor | CodeGen, StarCoder | SFT |
| 2024.02 | RetPO | NAACL'25 | IR | Retriever | LLaMA2-7B | SFT, DPO |
| 2024.02 | CodeAct | ICML'24 | Coding | Code Executor | LLaMA2, Mistral | SFT |
| 2024.01 | NExT | ICML'24 | Program Repair | Code Executor | PaLM2 | SFT |
| 2023.07 | ToolLLM | ICLR'24 | Tool-Calling, API Planning, Multi-Tool Reasoning | Real-World APIs | LLaMA, Vicuna | SFT |
| 2023.06 | ToolAlpaca | arXiv | Multi-Turn Tool-Use | Simulated APIs | Vicuna | SFT |
| 2023.05 | Gorilla | NeurIPS'24 | Tool-Calling, API Retrieval | Various APIs | LLaMA | SFT |
| 2023.05 | TRICE | NAACL'24 | Math Reasoning, QA, Multilingual QA, Knowledge Retrieval | Calculator, WikiSearch, Atlas QA Model, NLLB Translator | ChatGLM, Alpaca, Vicuna | SFT |
| 2023.02 | Toolformer | NeurIPS'23 | QA, Math | Calculator, QA System, Search Engine, Translation System, Calendar | GPT-J | SFT |

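
Several entries above (e.g., TP-LLaMA, RetPO, AWL) list DPO, which trains directly on preference pairs, for instance a successful versus a failed tool-use trajectory for the same prompt. A minimal sketch of the standard per-pair DPO loss, computed from summed sequence log-probabilities under the trained policy and a frozen reference model; how each method builds its pairs differs, and the log-probability values and `beta=0.1` in the usage lines are placeholders.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(log pi(y_w|x) - log ref(y_w|x))
                            - (log pi(y_l|x) - log ref(y_l|x))])"""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# Usage: before training the policy equals the reference, so the margin is 0.
untrained = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# After an update that raises the chosen trajectory and lowers the rejected one.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

The loss starts at log 2 when the policy matches the reference and falls as the policy shifts probability toward the preferred trajectory relative to the reference.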
A2: Agent Output Signaled Agent Adaptation
Development Timeline:
Methods with Tools
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
|---|---|---|---|---|---|---|
| 2025.10 | TT-SI | arXiv | Tool-Calling | Various APIs | Qwen2.5 | Test-Time Fine-Tuning |
| 2025.10 | A²FM | arXiv | Web Navigation, Math, QA | Search Engine, Crawl, Code Executor | Qwen2.5 | APO, GRPO |
| 2025.09 | VerlTool | arXiv | Math, QA, SQL, Visual, Web Search, Coding | Code Interpreter, Search Engine, SQL Executor, Vision Tools | Qwen2.5, Qwen3 | GRPO |
| 2025.08 | MedResearcher-R1 | arXiv | Medical Multi-Hop QA | Medical Retriever, Web Search API, Document Reader | MedResearcher-R1 | SFT, GRPO |
| 2025.08 | Agent Lightning | arXiv | Text-to-SQL, RAG, Math | SQL Executor, Retriever, Calculator | LLaMA3.2 | LightningRL |
| 2025.07 | CodePRM | ACL'25 | Coding | Code Executor | Qwen2.5-Coder | SFT |
| 2025.07 | DynaSearcher | arXiv | Multi-Hop QA, RAG | Document Search, KG Search | Qwen2.5, LLaMA3.1 | GRPO |
| 2025.06 | MMSearch-R1 | arXiv | Web Browsing, QA, Multimodal Search | Image Search, Web Browsing, Retriever | Qwen2.5 | REINFORCE, SFT |
| 2025.06 | Self-Challenging | arXiv | Web Browsing, Calculation, Retail, Airline | Code Interpreter, Web Browser, Database APIs | LLaMA3.1 | REINFORCE, SFT |
| 2025.05 | StepSearch | EMNLP'25 | Multi-Hop QA | Search Engine, Retriever | Qwen2.5 | StePPO |
| 2025.05 | ZeroSearch | arXiv | Multi-Hop QA, QA | Search Engine, Web Search | Qwen2.5, LLaMA3.2 | REINFORCE, GRPO, PPO, SFT |
| 2025.05 | AutoRefine | NeurIPS'25 | Multi-Hop QA, QA | Retriever | Qwen2.5 | GRPO |
| 2025.04 | ReTool | arXiv | Math | Code Interpreter | Qwen2.5 | PPO |
| 2025.04 | ToolRL | arXiv | Tool-Calling | Various Tools | Various Models | GRPO |
| 2025.04 | DeepResearcher | arXiv | QA, Multi-Hop Reasoning, Deep Research | Web Search API, Web Browser | Qwen2.5 | GRPO |
| 2025.03 | ReSearch | NeurIPS'25 | QA | Search Engine, Retriever | Qwen2.5 | GRPO |
| 2025.03 | Search-R1 | COLM'25 | QA | Search Engine, Retriever | Qwen2.5 | PPO, GRPO |
| 2025.03 | R1-Searcher | arXiv | QA | Retriever | LLaMA3.1, Qwen2.5 | REINFORCE++ |
| 2025.02 | RAS | arXiv | QA | Retriever | LLaMA2, LLaMA3.2 | SFT |
| 2025.01 | Agent-R | arXiv | Various Tasks | Monte Carlo Tree Search | Qwen2.5, LLaMA3.2 | SFT |
| 2024.06 | Re-ReST | EMNLP'24 | Multi-Hop QA, VQA, Sequential Decision, Coding | Various APIs | Various Models | DPO |
| 2024.06 | RPG | EMNLP'24 | RAG, QA, Multi-Hop Reasoning | Search Engine, Retriever | LLaMA2, GPT3.5 | SFT |
| 2023.10 | Self-RAG | ICLR'24 | RAG, QA, Fact Verification | Retriever | LLaMA2 | SFT |
| 2023.10 | FireAct | arXiv | QA | Search API | GPT3.5, LLaMA2, CodeLLaMA | SFT |

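
In A2 the learning signal is the agent's own final output, typically checked against a reference answer, while tool calls are only intermediate steps. A minimal sketch of such an outcome reward, assuming the final answer is wrapped in `<answer>...</answer>` tags and scored by SQuAD-style exact match; the tag format, `outcome_reward`, and the normalization are illustrative assumptions, not any listed method's exact reward.

```python
import re
import string

# Assumed output convention: the final answer sits inside <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(rollout: str, gold_answers: list[str]) -> float:
    """1.0 if the last <answer> span exactly matches any gold answer, else 0.0.

    Only the agent's final output is scored; intermediate tool calls are ignored.
    """
    matches = ANSWER_RE.findall(rollout)
    if not matches:
        return 0.0  # malformed rollout: no final answer emitted
    prediction = normalize(matches[-1])
    return float(any(prediction == normalize(g) for g in gold_answers))


# Usage: the search call in the rollout does not affect the reward directly.
hit = outcome_reward("<search>capital of France</search><answer> Paris. </answer>", ["Paris"])
miss = outcome_reward("<answer>Lyon</answer>", ["Paris"])
malformed = outcome_reward("Paris", ["Paris"])
```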
Methods without Tools
| Time | Method | Venue | Task(s) | Tool(s) | Agent Backbone | Tuning |
|---|---|---|---|---|---|---|
| 2025.10 | Empower | arXiv | Coding | --- | Gemma3 | SFT |
| 2025.10 | KnowRL | arXiv | Knowledge Calibration | --- | LLaMA3.1, Qwen2.5 | REINFORCE++ |
| 2025.10 | GRACE | arXiv | Embedding Tasks | --- | Qwen2.5, Qwen3, LLaMA3.2 | GRPO |
| 2025.06 | Magistral | arXiv | Math, Coding | --- | Magistral | PPO, GRPO |
| 2025.05 | EHRMind | arXiv | EHR-based Reasoning | --- | LLaMA3 | SFT, GRPO |
| 2025.01 | Kimi k1.5 | arXiv | Math, Coding | --- | Kimi k1.5 | GRPO |
| 2025.01 | DeepSeek-R1-Zero (Math) | Nature | Math | --- | DeepSeek-V3-Base | GRPO |
| 2024.09 | SCoRe | ICLR'25 | Math, Coding, QA | --- | Gemini1.0 Pro, Gemini1.5 Flash | REINFORCE |
| 2024.07 | RISE | NeurIPS'24 | Math | --- | LLaMA2, LLaMA3, Mistral | SFT |
| 2024.06 | TextGrad | Nature | Various Tasks | --- | GPT3.5, GPT4o | Prompt Tuning |
| 2023.03 | Self-Refine | NeurIPS'23 | Dialogue, Math, Coding | --- | GPT3.5, GPT4, Codex | Test-Time Prompting |

Tool Adaptation
T1: Agent-Agnostic Tool Adaptation
Foundational Systems and Architectures
| Year.Month | Method Name | Venue | Paper Name |
|---|---|---|---|
| 2021.08 | Neural Operators | JMLR'23 | Neural Operator: Learning Maps Between Function Spaces |
| 2023.09 | HuggingGPT | NeurIPS'23 | HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face |
| 2023.08 | ViperGPT | ICCV'23 | ViperGPT: Visual Inference via Python Execution for Reasoning |
| 2025.07 | SciToolAgent | Nature Comp. Sci.'25 | SciToolAgent: A Knowledge-Graph-Driven Scientific Agent for Multitool Integration |

Categories and Training Methods
| Year.Month | Method Name | Venue | Paper Name |
|---|---|---|---|
| 2021.01 | CLIP | ICML'21 | Learning Transferable Visual Models from Natural Language Supervision |
| 2023.04 | SAM | ICCV'23 | Segment Anything |
| 2024.06 | SAM-CLIP | CVPR'24 | SAM-CLIP: Merging Vision Foundation Models Towards Semantic and Spatial Understanding |
| 2022.12 | Whisper | ICML'23 | Robust Speech Recognition via Large-Scale Weak Supervision |
| 2024.02 | CodeAct | ICML'24 | Executable Code Actions Elicit Better LLM Agents |
| 2020.04 | DPR | EMNLP'20 | Dense Passage Retrieval for Open-Domain Question Answering |
| 2020.04 | ColBERT | SIGIR'20 | ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT |
| 2021.12 | Contriever | TMLR'22 | Unsupervised Dense Information Retrieval with Contrastive Learning |
| 2022.12 | E5 | arXiv | Text Embeddings by Weakly-Supervised Contrastive Pre-training |
| 2021.07 | AlphaFold2 | Nature | Highly Accurate Protein Structure Prediction with AlphaFold |
| 2023.03 | ESMFold | Science | Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model |

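
T1 tools are trained on their own objectives with no particular agent in the loop and are then plugged into agents as-is. Dense retrievers such as DPR, Contriever, and E5 are trained contrastively; a minimal sketch of the in-batch-negative InfoNCE loss follows, where each query's paired passage is the positive and every other passage in the batch acts as a negative. Plain-list embeddings, dot-product scoring, and a temperature of 1.0 are simplifying assumptions; depending on the method, the real recipes add hard negatives, momentum queues, or tuned temperatures.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(queries: list[list[float]],
                              passages: list[list[float]],
                              temperature: float = 1.0) -> float:
    """Mean InfoNCE loss: passages[i] is the positive for queries[i];
    all other passages in the batch serve as negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, p) / temperature for p in passages]
        m = max(logits)  # shift for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]  # -log softmax(logits)[i]
    return total / len(queries)


# Usage: matched query/passage embeddings give a lower loss than mismatched ones.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```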
T2: Agent-Supervised Tool Adaptation
Development Timeline:
| Time | Method | Venue | Task(s) | Tool Backbone | Agent Backbone | Tuning |
|---|---|---|---|---|---|---|
| 2025.10 | QAgent | arXiv | QA, RAG | Qwen2.5-3B | Qwen-7B | GRPO |
| 2025.10 | AgentFlow | arXiv | Web Search, Planning, Reasoning, Math | Qwen2.5-7B | Qwen2.5-7B | Flow-GRPO |
| 2025.10 | Advisor Models | arXiv | Math, Reasoning | Qwen2.5-7B, Qwen3-8B | GPT-4o-Mini, GPT-5, Claude4-Sonnet, GPT-4.1-Mini | GRPO |
| 2025.10 | AutoGraph-R1 | arXiv | KG Construction, RAG | KG Constructor (Qwen2.5-3B/7B) | Frozen RAG Generator (Qwen2.5-7B) | GRPO |
| 2025.10 | MAE | arXiv | Math, Coding, Commonsense Reasoning | Qwen2.5-3B | Qwen2.5-3B | REINFORCE++ |
| 2025.09 | Mem-α | arXiv | Retrieval, Test-Time Learning, Long-Range Understanding | Qwen3-4B | Qwen3-4B, Qwen3-32B, GPT-4.1-Mini | GRPO |
| 2025.08 | AI-SearchPlanner | arXiv | Web QA | Qwen3-32B | Qwen2.5-7B | PPO |
| 2025.08 | Memento | arXiv | Long-Horizon Reasoning, Web Research, QA, Academic Reasoning | Q-function (two-layer MLPs) | GPT-4.1 | Soft Q-Learning |
| 2025.08 | R-Zero | arXiv | Math, Reasoning | Qwen3-4B, Qwen3-8B, OctoThinker-3B, OctoThinker-8B | Qwen3-4B, Qwen3-8B, OctoThinker-3B, OctoThinker-8B | GRPO |
| 2025.06 | Sysformer | arXiv | QA, RAG | Small Transformer | LLaMA-2-7B, LLaMA-3.1-8B, Mistral-7B, Phi-3.5-mini, Zephyr-7B-beta | Supervised Learning |
| 2025.05 | s3 | EMNLP'25 | QA, RAG | Qwen2.5-7B | Qwen2.5-7B, Qwen2.5-14B, Claude-3-Haiku | PPO |
| 2024.10 | Matryoshka Pilot | NeurIPS'25 | Math, Planning, Reasoning | LLaMA3-8B, Qwen2.5-7B | GPT-4o-Mini, GPT-3.5-Turbo | DPO, IDPO |
| 2024.06 | CoBB | EMNLP'24 | QA, Math | Mistral-7B-inst-v2 | GPT-3.5-Turbo, Claude-3-Haiku, Phi-3-mini-4k-inst, Gemma-1.1-7B-it, Mistral-7B-inst-v2 | SFT, ORPO |
| 2024.05 | MedAdapter | EMNLP'24 | Medical QA, NLI, RQE | BERT-Base-Uncased | GPT-3.5-Turbo | SFT, BPO |
| 2024.03 | BLADE | AAAI'25 | Domain-Specific QA | BLOOMZ-1b7 | ChatGPT, ChatGLM, Baichuan, Qwen | SFT, BPO |
| 2024.02 | ARL2 | ACL'24 | QA | LLaMA2-7B | GPT-3.5-Turbo | Contrastive Learning |
| 2024.02 | EVOR | EMNLP'24 | RAG-based Coding | GPT-3.5-Turbo | GPT-3.5-Turbo, CodeLLaMA | Prompt Engineering |
| 2024.02 | Bbox-Adapter | ICML'24 | QA | DeBERTa-v3-base (0.1B), DeBERTa-v3-large (0.3B) | GPT-3.5-Turbo, Mixtral-8x7B | Contrastive Learning |
| 2024.01 | Proxy-Tuning | COLM'24 | QA, Math, Code | LLaMA2-7B | LLaMA2-70B | Proxy-Tuning |
| 2024.01 | BGM | ACL'24 | QA, Personalized Generation (NQ, HotpotQA, Email, Book) | T5-XXL-11B | PaLM2-S | SFT, PPO |
| 2023.10 | RA-DIT | ICLR'24 | Knowledge-Intensive Tasks (MMLU, NQ, TQA, ELI5, HotpotQA, etc.) | DRAGON+ | LLaMA-65B | SFT, LSR |
| 2023.06 | LLM-R | EACL'24 | Zero-Shot NLU (Reading Comprehension, QA, NLI, Paraphrase, Sentiment, Summarization) | E5-base | GPT-Neo-2.7B, LLaMA-13B, GPT-3.5-Turbo | Contrastive Learning |
| 2023.05 | AAR | ACL'23 | Zero-Shot Generalization (MMLU, PopQA) | ANCE, Contriever | Flan-T5-Small, InstructGPT | Contrastive Learning |
| 2023.05 | ToolkenGPT | NeurIPS'23 | Numerical Reasoning, QA, Plan Generation | Token Embedding | GPT-J 6B, OPT-6.7B, OPT-13B | Proxy-Tuning |
| 2023.03 | UPRISE | EMNLP'23 | Zero-Shot NLU (Reading Comprehension, QA, NLI, Paraphrase, Sentiment, Summarization) | GPT-Neo-2.7B | BLOOM-7.1B, OPT-66B, GPT-3-175B | Contrastive Learning |
| 2023.01 | REPLUG | NAACL'24 | QA | Contriever | GPT-3-175B, PaLM, Codex, LLaMA-13B | Proxy-Tuning, LSR |

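
In T2 the roles flip: the agent stays frozen (often a closed model such as GPT-3.5-Turbo or GPT-4o-Mini in the table above) and only the tool is trained, with the agent's downstream success as supervision. A minimal sketch of one such tool reward, assuming the tool is rewarded by how much its output improves the frozen agent's exact-match accuracy over a no-tool baseline. `tool_reward`, `toy_agent`, and the gain-over-baseline shaping are hypothetical stand-ins; each listed method defines its own signal.

```python
from typing import Callable

# Hypothetical signature: the frozen agent maps (question, context) -> answer string.
FrozenAgent = Callable[[str, str], str]


def is_correct(prediction: str, gold: str) -> float:
    return float(prediction.strip().lower() == gold.strip().lower())


def tool_reward(question: str, gold: str, tool_output: str,
                frozen_agent: FrozenAgent, baseline_context: str = "") -> float:
    """Reward for the trainable tool: the frozen agent's accuracy with the
    tool's output minus its accuracy with a baseline context."""
    with_tool = is_correct(frozen_agent(question, tool_output), gold)
    without_tool = is_correct(frozen_agent(question, baseline_context), gold)
    return with_tool - without_tool


def toy_agent(question: str, context: str) -> str:
    """Stand-in for a frozen LLM: copies an 'Answer:' line from context, else gives up."""
    for line in context.splitlines():
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return "unknown"


# Usage: only the tool's output changes between calls; the agent never updates.
helpful = tool_reward("Capital of France?", "Paris", "Answer: Paris", toy_agent)
misleading = tool_reward("Capital of France?", "Paris", "Answer: Lyon", toy_agent)
```

Under this shaping a tool earns credit only when it changes the frozen agent's outcome; the reward can also go negative if the tool's output leads an otherwise correct agent astray.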
Citation
If you find this repository useful, please consider citing our survey using the BibTeX entry given at the top of this page.
Contributing
We welcome contributions! Please feel free to submit a Pull Request to add new papers or update existing entries.
(ノ◕ヮ◕)ノ*:・゚✧ Keep exploring the awesome world of agentic AI! ✧゚・: *ヽ(◕ヮ◕ヽ)