Awesome Adaptation of Agentic AI

A curated list of papers on adaptation strategies of agentic AI systems. This repository accompanies the paper "Adaptation of Agentic AI" (Ongoing Work).

Cite this paper:

@article{jiang2025adaptation,
  title={Adaptation of Agentic AI},
  author={Jiang, Pengcheng and Lin, Jiacheng and Shi, Zhiyi and Wang, Zifeng and He, Luxi and Wu, Yichen and Zhong, Ming and Song, Peiyang and Zhang, Qizheng and Wang, Heng and others},
  journal={arXiv preprint arXiv:2512.16301},
  year={2025}
}

Agent Adaptation
- A1: Tool Execution Signaled
- A2: Agent Output Signaled
Tool Adaptation
- T1: Agent-Agnostic Tool Adaptation
- T2: Agent-Supervised Tool Adaptation

Agent Adaptation

A1: Tool Execution Signaled Agent Adaptation

Development Timeline:

RL-based Methods

Time	Method	Venue	Task(s)	Tool(s)	Agent Backbone	Tuning
2025.11	Orion	arXiv Paper	IR	Retrievers	LFM2	GRPO
2025.10	olmOCR2	arXiv Paper Code	Document OCR	Synthetic Document Verifier	Qwen2.5-VL	SFT, GRPO
2025.10	AlphaProof	Nature'25 Paper	Formal Theorem Proving	Lean Compiler	Transformer (3B Enc-Dec)	SFT, AlphaZero, TTRL
2025.10	ToolExpander	arXiv Paper	Tool-Calling	Various APIs	Qwen2.5	SFT, GRPO
2025.09	BFS-Prover-V2	arXiv Paper Code	Formal Theorem Proving	Lean Compiler	Qwen2.5	BFS-Guided AlphaZero-like EI
2025.09	WebGen-Agent	arXiv Paper Code	Website Generation	VLM, GUI Agent, Code Executor	Various Models	SFT, Step-GRPO
2025.09	Tool-R1	arXiv Paper Code	General Tool-Augmented Reasoning, Multimodal QA	Code Execution, Multimedia Tools	Qwen2.5	GRPO
2025.08	FTRL	arXiv Paper Code	Multi-Step Tool-Use	Simulated APIs	Qwen3	GRPO
2025.08	Goedel-Prover-V2	arXiv Paper Code	Formal Theorem Proving	Lean Compiler	Qwen3	SFT, GRPO
2025.07	Leanabell-Prover-V2	arXiv Paper Code	Formal Theorem Proving	Lean Compiler	Qwen2.5	SFT, AlphaZero-like EI
2025.06	Router-R1	NeurIPS'25 Paper Code	Multi-Round Routing	LLM Routing Pool	Qwen2.5, LLaMA3.2	PPO
2025.05	R1-Code-Interpreter	arXiv Paper Code	Coding	Code Execution Sandbox	Qwen2.5	GRPO
2025.05	Tool-N1	arXiv Paper Code	Tool-Calling	Various APIs	Qwen2.5	GRPO
2025.04	DeepSeek-Prover-V2	arXiv Paper Code	Formal Theorem Proving	Lean Compiler	DeepSeek-V3	SFT, GRPO
2025.04	Kimina-Prover	arXiv Paper Code	Formal Theorem Proving	Lean Compiler	Qwen2.5	SFT, AlphaZero-like EI
2025.04	SQL-R1	NeurIPS'25 Paper Code	Text2SQL Search	SQL Engine	Qwen2.5, OmniSQL	SFT, GRPO
2025.03	Rec-R1	TMLR'25 Paper Code	Recommendation Optimization	Recommendation System	Qwen2.5, LLaMA3.2	GRPO
2025.03	ReZero	arXiv Paper Code	Web Search, IR	Web Search Engine	LLaMA3.2	GRPO
2025.03	Code-R1	--- Code	Coding	Code Executor	Qwen2.5	GRPO
2025.02	DeepRetrieval	COLM'25 Paper Code	Web Search, IR, Text2SQL	Search Engine, Retrievers, SQL exec.	Qwen2.5, LLaMA3.2	PPO, GRPO
2025.01	DeepSeek-R1-Zero (Code)	Nature'25 Paper	Coding	Code Executor	DeepSeek-V3-Base	GRPO
2024.10	RLEF	ICML'25 Paper	Coding	Code Executor	LLaMA3.1	PPO
2024.08	DeepSeek-Prover-V1.5	ICLR'25 Paper Code	Formal Theorem Proving	Lean 4 Prover	DeepSeekMath-Base	SFT, GRPO
2024.05	LeDex	NeurIPS'24 Paper	Coding	Code Executor	StarCoder, CodeLLaMA	SFT, PPO
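In A1 methods the learning signal is produced by the tool itself: whether generated code executes and passes tests, or whether a Lean proof compiles. The snippet below is a minimal sketch of that idea, not the implementation of any listed method. A toy executor scores each sampled program by the fraction of unit tests it passes, and the rewards are converted into GRPO-style group-relative advantages (reward minus group mean, divided by group standard deviation). The function names, the `solve` entry point, and the test cases are hypothetical, and Python's `exec` is not a real sandbox.

```python
import statistics
from typing import Callable, List, Tuple

# Hypothetical unit tests for a function named `solve`: (input, expected output).
IOPair = Tuple[int, int]


def execution_reward(program: str, tests: List[IOPair]) -> float:
    """Execute a candidate program and return the fraction of tests it passes.

    Toy stand-in for a code-executor tool; `exec` offers no isolation, so a
    real system would run candidates in a proper sandbox.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)  # the "tool call": run the candidate code
        solve: Callable[[int], int] = namespace["solve"]
    except Exception:
        return 0.0  # code that fails to compile or lacks `solve` earns nothing
    passed = 0
    for x, expected in tests:
        try:
            if solve(x) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error on one test simply counts as a failure
    return passed / len(tests)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward standardized within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    tests = [(1, 2), (2, 4), (5, 10)]
    # A group of completions sampled for the same prompt ("double the input").
    group = [
        "def solve(x):\n    return 2 * x",  # passes all three tests
        "def solve(x):\n    return x + 1",  # passes only (1, 2)
        "def solve(x):\n    return x *",    # syntax error
    ]
    rewards = [execution_reward(p, tests) for p in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Running it yields rewards of about [1.0, 0.33, 0.0]; the correct program receives a positive advantage and the one that fails to compile a negative one, which is the signal a policy-gradient update would then use.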

SFT & DPO Methods

Time	Method	Venue	Task(s)	Tool(s)	Agent Backbone	Tuning
2024.12	AWL	ICML'25 Paper Code	Scientific Reasoning, Adaptive Tool Usage	Scientific Simulators	Llama-3.1-8B, Qwen-2.5-{14/32}B	SFT, DPO
2024.10	LeReT	ICLR'25 Paper Code	IR	Dense Retriever	LLaMA3, Gemma2	DPO-like (IPO)
2024.10	ToolFlow	NAACL'25 Paper	Tool-Calling	Various APIs	LLaMA3.1	SFT
2024.06	TP-LLaMA	NeurIPS'24 Paper	Tool-Calling	Various APIs	LLaMA2	SFT, DPO
2024.05	AutoTools	WWW'25 Paper Code	Automated Tool-Calling	Various APIs	GPT4, LLaMA3, Mistral	SFT
2024.03	CYCLE	OOPSLA'24 Paper	Coding	Code Executor	CodeGen, StarCoder	SFT
2024.02	RetPO	NAACL'25 Paper Code	IR	Retriever	LLaMA2-7B	SFT, DPO
2024.02	CodeAct	ICML'24 Paper Code	Coding	Code Executor	LLaMA2, Mistral	SFT
2024.01	NExT	ICML'24 Paper	Program Repair	Code Executor	PaLM2	SFT
2023.07	ToolLLM	ICLR'24 Paper Code	Tool-Calling, API Planning, Multi-Tool Reasoning	Real-World APIs	LLaMA, Vicuna	SFT
2023.06	ToolAlpaca	arXiv Paper Code	Multi-Turn Tool-Use	Simulated APIs	Vicuna	SFT
2023.05	Gorilla	NeurIPS'24 Paper Code	Tool-Calling, API Retrieval	Various APIs	LLaMA	SFT
2023.05	TRICE	NAACL'24 Paper Code	Math Reasoning, QA, Multilingual QA, Knowledge Retrieval	Calculator, WikiSearch, Atlas QA Model, NLLB Translator	ChatGLM, Alpaca, Vicuna	SFT
2023.02	Toolformer	NeurIPS'23 Paper Code	QA, Math	Calculator, QA system, Search Engine, Translation System, Calendar	GPT-J	SFT
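Several methods in this table (e.g. AWL, TP-LLaMA, RetPO) list DPO in the Tuning column. As a sketch, assuming a preference pair has already been built from tool feedback (say, a trajectory whose tool call succeeded preferred over one whose call failed), the standard per-pair DPO loss can be computed from summed sequence log-probabilities under the trained policy and a frozen reference model. How each paper constructs its pairs differs; the function names and the `beta = 0.1` default are illustrative.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    "chosen" is the trajectory preferred by the (assumed) tool-feedback labeling,
    e.g. one whose API call succeeded; "rejected" is the dispreferred one.
    """
    chosen_logratio = policy_chosen - ref_chosen        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_logratio = policy_rejected - ref_rejected  # log pi(y_l|x) - log pi_ref(y_l|x)
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log 2.
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy has raised the preferred trajectory's log-probability: the loss drops.
    print(dpo_loss(-10.0, -15.0, -12.0, -15.0))
```

With the policy equal to the reference the loss is log 2 ≈ 0.693; shifting probability mass toward the trajectory whose tool call succeeded lowers it.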

A2: Agent Output Signaled Agent Adaptation

Development Timeline:

Methods with Tools

Time	Method	Venue	Task(s)	Tool(s)	Agent Backbone	Tuning
2025.10	TT-SI	arXiv Paper	Tool Calling	Various APIs	Qwen2.5	Test-Time Fine-Tuning
2025.10	A²FM	arXiv Paper Code	Web Navigation, Math, QA	Search Engine, Crawl, Code Executor	Qwen2.5	APO, GRPO
2025.09	VerlTool	arXiv Paper Code	Math, QA, SQL, Visual, Web Search, Coding	Code Interpreter, Search Engine, SQL Executor, Vision Tools	Qwen2.5, Qwen3	GRPO
2025.08	MedResearcher-R1	arXiv Paper Code	Medical Multi-hop QA	Medical Retriever, Web Search API, Document Reader	MedResearcher-R1	SFT, GRPO
2025.08	Agent Lightning	arXiv Paper Code	Text-to-SQL, RAG, Math	SQL Executor, Retriever, Calculator	LLaMA3.2	LightningRL
2025.07	CodePRM	ACL'25 Paper	Coding	Code Executor	Qwen2.5-Coder	SFT
2025.07	DynaSearcher	arXiv Paper Code	Multi-Hop QA, RAG	Document Search, KG Search	Qwen2.5, LLaMA3.1	GRPO
2025.06	MMSearch-R1	arXiv Paper Code	Web Browsing, QA, Multimodal Search	Image Search, Web Browsing, Retriever	Qwen2.5	REINFORCE, SFT
2025.06	Self-Challenging	arXiv Paper	Web Browsing, Calculation, Retail, Airline	Code Interpreter, Web Browser, Database APIs	LLaMA3.1	REINFORCE, SFT
2025.05	StepSearch	EMNLP'25 Paper Code	Multi-Hop QA	Search Engine, Retriever	Qwen2.5	StePPO
2025.05	ZeroSearch	arXiv Paper Code	Multi-Hop QA, QA	Search Engine, Web Search	Qwen2.5, LLaMA3.2	REINFORCE, GRPO, PPO, SFT
2025.05	AutoRefine	NeurIPS'25 Paper Code	Multi-Hop QA, QA	Retriever	Qwen2.5	GRPO
2025.04	ReTool	arXiv Paper Code	Math	Code Interpreter	Qwen2.5	PPO
2025.04	ToolRL	arXiv Paper Code	Tool Calling	Various Tools	Various Models	GRPO
2025.04	DeepResearcher	arXiv Paper Code	QA, Multi-Hop Reasoning, Deep Research	Web Search API, Web Browser	Qwen2.5	GRPO
2025.03	ReSearch	NeurIPS'25 Paper Code	QA	Search Engine, Retriever	Qwen2.5	GRPO
2025.03	Search-R1	COLM'25 Paper Code	QA	Search Engine, Retriever	Qwen2.5	PPO, GRPO
2025.03	R1-Searcher	arXiv Paper Code	QA	Retriever	LLaMA3.1, Qwen2.5	REINFORCE++
2025.02	RAS	arXiv Paper Code	QA	Retriever	LLaMA2, LLaMA3.2	SFT
2025.01	Agent-R	arXiv Paper Code	Various Tasks	Monte Carlo Tree Search	Qwen2.5, LLaMA3.2	SFT
2024.06	Re-ReST	EMNLP'24 Paper Code	Multi-Hop QA, VQA, Sequential Decision, Coding	Various APIs	Various Models	DPO
2024.06	RPG	EMNLP'24 Paper Code	RAG, QA, Multi-hop Reasoning	Search Engine, Retriever	LLaMA2, GPT3.5	SFT
2023.10	Self-RAG	ICLR'24 Paper Code	RAG, QA, Fact Verification	Retriever	LLaMA2	SFT
2023.10	FireAct	arXiv Paper Code	QA	Search API	GPT3.5, LLaMA2, CodeLLaMA	SFT
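In A2 the signal comes from the agent's own final output rather than from whether individual tool calls succeeded. Below is a minimal sketch of one such outcome reward for the QA-style tasks above, assuming (hypothetically) that the agent wraps its final answer in `<answer>...</answer>` tags and that exact match after SQuAD-style normalization is the metric. The listed papers differ in the exact reward they use; the tag names and helper functions here are illustrative.

```python
import re
import string
from typing import List

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(trajectory: str, gold_answers: List[str]) -> float:
    """1.0 if the last <answer>...</answer> span matches any gold answer, else 0.0.

    Tool calls and tool results inside the trajectory are ignored: only the
    agent's final output is scored (the A2 signal).
    """
    answers = re.findall(r"<answer>(.*?)</answer>", trajectory, flags=re.DOTALL)
    if not answers:
        return 0.0  # no final answer emitted -> no reward
    prediction = normalize_answer(answers[-1])
    return float(any(prediction == normalize_answer(g) for g in gold_answers))


if __name__ == "__main__":
    # Hypothetical trajectory: a search call, its (abridged) result, then the answer.
    trajectory = (
        "<search>tallest structure in Paris</search>"
        "<information>...</information>"
        "<answer>The Eiffel Tower.</answer>"
    )
    print(outcome_reward(trajectory, ["Eiffel Tower"]))  # 1.0
```

Because the reward ignores the intermediate search calls, the agent is free to learn when and how to use tools as long as the final answer improves; that is the main contrast with the execution-based rewards of A1.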

Methods without Tools

Time	Method	Venue	Task(s)	Tool(s)	Agent Backbone	Tuning
2025.10	Empower	arXiv Paper Code	Coding	---	Gemma3	SFT
2025.10	KnowRL	arXiv Paper Code	Knowledge Calibration	---	LLaMA3.1, Qwen2.5	REINFORCE++
2025.10	GRACE	arXiv Paper Code	Embedding Tasks	---	Qwen2.5, Qwen3, LLaMA3.2	GRPO
2025.06	Magistral	arXiv Paper	Math, Coding	---	Magistral	PPO, GRPO
2025.05	EHRMind	arXiv Paper Code	EHR-based Reasoning	---	LLaMA3	SFT, GRPO
2025.01	Kimi k1.5	arXiv Paper Code	Math, Coding	---	Kimi k1.5	GRPO
2025.01	DeepSeek-R1-Zero (Math)	Nature'25 Paper	Math	---	DeepSeek-V3-Base	GRPO
2024.09	SCoRe	ICLR'25 Paper Code	Math, Coding, QA	---	Gemini1.0 Pro, Gemini1.5 Flash	REINFORCE
2024.07	RISE	NeurIPS'24 Paper Code	Math	---	LLaMA2, LLaMA3, Mistral	SFT
2024.06	TextGrad	Nature Paper Code	Various Tasks	---	GPT3.5, GPT4o	Prompt Tuning
2023.03	Self-Refine	NeurIPS'23 Paper Code	Dialogue, Math, Coding	---	GPT3.5, GPT4, Codex	Test-Time Prompting
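Not every entry updates weights: Self-Refine (test-time prompting) and TextGrad (prompt tuning) adapt the agent through feedback on its own output. The loop below sketches the Self-Refine pattern of draft, critique, and revise. `generate`, `feedback`, and `refine` are hypothetical callables that would normally be prompts to the same frozen LLM; the toy sorting stand-ins exist only to make the example runnable.

```python
from typing import Callable, List, Optional


def self_refine(
    task: str,
    generate: Callable[[str], str],
    feedback: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    max_iters: int = 3,
) -> List[str]:
    """Self-Refine-style loop: draft, critique, revise until the critic is satisfied.

    `feedback` returns None to signal "stop". Returns every draft for inspection.
    """
    drafts = [generate(task)]
    for _ in range(max_iters):
        critique = feedback(task, drafts[-1])
        if critique is None:  # stopping condition reached
            break
        drafts.append(refine(task, drafts[-1], critique))
    return drafts


if __name__ == "__main__":
    # Toy stand-ins for LLM prompts, purely illustrative.
    def toy_generate(task: str) -> str:
        return "3 1 2"

    def toy_feedback(task: str, draft: str) -> Optional[str]:
        tokens = draft.split()
        return None if tokens == sorted(tokens) else "numbers are not in ascending order"

    def toy_refine(task: str, draft: str, critique: str) -> str:
        return " ".join(sorted(draft.split()))

    print(self_refine("sort 3 1 2", toy_generate, toy_feedback, toy_refine))
```

Here it returns ['3 1 2', '1 2 3']: one revision, after which the critic signals that it is satisfied. The `max_iters` cap bounds cost when the critic never converges.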

Tool Adaptation

T1: Agent-Agnostic Tool Adaptation

Foundational Systems and Architectures

Year.Month	Method Name	Venue	Paper Name
2021.08	Neural Operators	JMLR'23 Paper	Neural Operator: Learning Maps Between Function Spaces
2023.03	HuggingGPT	NeurIPS'23 Paper Code	HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
2023.03	ViperGPT	ICCV'23 Paper Code	ViperGPT: Visual Inference via Python Execution for Reasoning
2025.07	SciToolAgent	Nature Comp. Sci.'25 Paper	SciToolAgent: A Knowledge-Graph-Driven Scientific Agent for Multitool Integration

Categories and Training Methods

Year.Month	Method Name	Venue	Paper Name
2021.01	CLIP	ICML'21 Paper Code	Learning Transferable Visual Models from Natural Language Supervision
2023.04	SAM	ICCV'23 Paper Code	Segment Anything
2024.06	SAM-CLIP	CVPR'24 Paper	SAM-CLIP: Merging Vision Foundation Models Towards Semantic and Spatial Understanding
2022.12	Whisper	ICML'23 Paper Code	Robust Speech Recognition via Large-Scale Weak Supervision
2024.02	CodeAct	ICML'24 Paper Code	Executable Code Actions Elicit Better LLM Agents
2020.04	DPR	EMNLP'20 Paper Code	Dense Passage Retrieval for Open-Domain Question Answering
2020.04	ColBERT	SIGIR'20 Paper Code	ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
2021.12	Contriever	TMLR'22 Paper Code	Unsupervised Dense Information Retrieval with Contrastive Learning
2022.12	E5	arXiv Paper Code	Text Embeddings by Weakly-Supervised Contrastive Pre-training
2021.07	AlphaFold2	Nature Paper Code	Highly Accurate Protein Structure Prediction with AlphaFold
2023.03	ESMFold	Science Paper	Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model
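T1 tools are trained without reference to any particular agent. The dense retrievers above (DPR, Contriever, E5) all rely on contrastive objectives; a common form is InfoNCE with in-batch negatives, where each query's paired passage is the positive and the other passages in the batch serve as negatives. This is a minimal sketch with toy vectors; the papers differ in negative mining, temperature, and encoder setup.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity, as in DPR-style bi-encoders."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_infonce(queries: List[Vector], passages: List[Vector],
                     temperature: float = 1.0) -> float:
    """Mean InfoNCE loss: passages[i] is the positive for queries[i];
    every other passage in the batch acts as a negative."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, p) / temperature for p in passages]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_normalizer = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_normalizer - logits[i]  # -log softmax probability of the positive
    return total / len(queries)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(in_batch_infonce(queries, matched))  # ~0.313
    print(in_batch_infonce(queries, swapped))  # ~1.313
```

With matched toy embeddings the loss is log(1 + e) − 1 ≈ 0.31; swapping the positives raises it to log(1 + e) ≈ 1.31. Reusing the rest of the batch as negatives is what keeps this objective cheap to train at scale.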

T2: Agent-Supervised Tool Adaptation

Development Timeline:

Time	Method	Venue	Task(s)	Tool Backbone	Agent Backbone	Tuning
2025.10	QAgent	arXiv Paper Code	QA, RAG	Qwen2.5-3B	Qwen-7B	GRPO
2025.10	AgentFlow	arXiv Paper Code	Web Search, Planning, Reasoning, Math	Qwen2.5-7B	Qwen2.5-7B	Flow-GRPO
2025.10	Advisor Models	arXiv Paper Code	Math, Reasoning	Qwen2.5-7B, Qwen3-8B	GPT-4o-Mini, GPT-5, Claude4-Sonnet, GPT-4.1-Mini	GRPO
2025.10	AutoGraph-R1	arXiv Paper Code	KG Construction, RAG	KG Constructor (Qwen2.5-3B/7B)	Frozen RAG Generator (Qwen2.5-7B)	GRPO
2025.10	MAE	arXiv Paper Code	Math, Coding, Commonsense Reasoning	Qwen2.5-3B	Qwen2.5-3B	REINFORCE++
2025.09	Mem-α	arXiv Paper Code	Retrieval, Test-Time Learning, Long-Range Understanding	Qwen3-4B	Qwen3-4B, Qwen3-32B, GPT-4.1-Mini	GRPO
2025.08	AI-SearchPlanner	arXiv Paper	Web QA	Qwen3-32B	Qwen2.5-7B	PPO
2025.08	Memento	arXiv Paper Code	Long-Horizon Reasoning, Web Research, QA, Academic Reasoning	Q-function (two-layer MLPs)	GPT-4.1	Soft Q-Learning
2025.08	R-Zero	arXiv Paper Code	Math, Reasoning	Qwen3-4B, Qwen3-8B, OctoThinker-3B, OctoThinker-8B	Qwen3-4B, Qwen3-8B, OctoThinker-3B, OctoThinker-8B	GRPO
2025.06	Sysformer	arXiv Paper	QA, RAG	Small Transformer	LLaMA-2-7B, LLaMA-3.1-8B, Mistral-7B, Phi-3.5-mini, Zephyr-7B-beta	Supervised Learning
2025.05	s3	EMNLP'25 Paper Code	QA, RAG	Qwen2.5-7B	Qwen2.5-7B, Qwen2.5-14B, Claude-3-Haiku	PPO
2024.10	Matryoshka Pilot	NeurIPS'25 Paper Code	Math, Planning, Reasoning	LLaMA3-8B, Qwen2.5-7B	GPT-4o-Mini, GPT-3.5-Turbo	DPO, IDPO
2024.06	CoBB	EMNLP'24 Paper Code	QA, Math	Mistral-7B-inst-v2	GPT-3.5-Turbo, Claude-3-Haiku, Phi-3-mini-4k-inst, Gemma-1.1-7B-it, Mistral-7B-inst-v2	SFT, ORPO
2024.05	MedAdapter	EMNLP'24 Paper Code	Medical QA, NLI, RQE	BERT-Base-Uncased	GPT-3.5-Turbo	SFT, BPO
2024.03	BLADE	AAAI'25 Paper Code	Domain-Specific QA	BLOOMZ-1b7	ChatGPT, ChatGLM, Baichuan, Qwen	SFT, BPO
2024.02	ARL2	ACL'24 Paper Code	QA	LLaMA2-7B	GPT-3.5-Turbo	Contrastive Learning
2024.02	EVOR	EMNLP'24 Paper Code	RAG-based Coding	GPT-3.5-Turbo	GPT-3.5-Turbo, CodeLLaMA	Prompt Engineering
2024.02	BBox-Adapter	ICML'24 Paper Code	QA	DeBERTa-v3-base (0.1B), DeBERTa-v3-large (0.3B)	GPT-3.5-Turbo, Mixtral-8x7B	Contrastive Learning
2024.01	Proxy-Tuning	COLM'24 Paper Code	QA, Math, Code	LLaMA2-7B	LLaMA2-70B	Proxy-Tuning
2024.01	BGM	ACL'24 Paper	QA, Personalized Generation (NQ, HotpotQA, Email, Book)	T5-XXL-11B	PaLM2-S	SFT, PPO
2023.10	RA-DIT	ICLR'24 Paper	Knowledge-Intensive Tasks (MMLU, NQ, TQA, ELI5, HotpotQA, etc.)	DRAGON+	LLaMA-65B	SFT, LSR
2023.06	LLM-R	EACL'24 Paper Code	Zero-shot NLU (Reading Comprehension, QA, NLI, Paraphrase, Sentiment, Summarization)	E5-base	GPT-Neo-2.7B, LLaMA-13B, GPT-3.5-Turbo	Contrastive Learning
2023.05	AAR	ACL'23 Paper Code	Zero-Shot Generalization (MMLU, PopQA)	ANCE, Contriever	Flan-T5-Small, InstructGPT	Contrastive Learning
2023.05	ToolkenGPT	NeurIPS'23 Paper Code	Numerical Reasoning, QA, Plan Generation	Token Embedding	GPT-J 6B, OPT-6.7B, OPT-13B	Proxy-Tuning
2023.03	UPRISE	EMNLP'23 Paper Code	Zero-shot NLU (Reading Comprehension, QA, NLI, Paraphrase, Sentiment, Summarization)	GPT-Neo-2.7B	BLOOM-7.1B, OPT-66B, GPT-3-175B	Contrastive Learning
2023.01	REPLUG	NAACL'24 Paper Code	QA	Contriever	GPT3-175B, PaLM, Codex, LLaMA-13B	Proxy-Tuning, LSR
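In T2 the agent stays frozen and supervises the tool. REPLUG and RA-DIT list LSR (LM-supervised retrieval) in the Tuning column: roughly, the retriever's distribution over retrieved documents is pushed toward a distribution derived from how much each document helps the frozen LM produce the gold output. The sketch below computes KL(P_R || Q_LM) for one query under stated assumptions (the LM scores are log-likelihoods, and both distributions are temperature-scaled softmaxes); the papers' exact scoring functions and temperatures may differ.

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax (max-shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lsr_loss(retriever_scores: List[float], lm_scores: List[float],
             gamma: float = 1.0, beta: float = 1.0) -> float:
    """KL(P_R || Q_LM) over the documents retrieved for one query.

    retriever_scores[i]: similarity the trainable retriever assigns to document i.
    lm_scores[i]: assumed log-likelihood of the gold output under the frozen LM
    when document i is prepended. Only the retriever would receive gradients.
    """
    p_retriever = softmax(retriever_scores, gamma)
    q_lm = softmax(lm_scores, beta)
    return sum(p * math.log(p / q) for p, q in zip(p_retriever, q_lm) if p > 0.0)


if __name__ == "__main__":
    lm_scores = [-10.0, -1.0]  # the frozen LM finds document 1 far more helpful
    print(lsr_loss([5.0, 0.0], lm_scores))  # retriever favors the unhelpful doc: large
    print(lsr_loss([0.0, 5.0], lm_scores))  # retriever agrees with the LM: small
```

The loss is zero when the two distributions agree and grows when the retriever ranks highly a document the LM finds unhelpful. Since the LM is never updated, the same trained retriever can be paired with closed, API-only agents, which is why many rows in this table use black-box backbones.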

Citation

If you find this repository useful, please consider citing our survey:

@article{jiang2025adaptation,
  title={Adaptation of Agentic AI},
  author={Jiang, Pengcheng and Lin, Jiacheng and Shi, Zhiyi and Wang, Zifeng and He, Luxi and Wu, Yichen and Zhong, Ming and Song, Peiyang and Zhang, Qizheng and Wang, Heng and others},
  journal={arXiv preprint arXiv:2512.16301},
  year={2025}
}

Contributing

We welcome contributions! Please feel free to submit a Pull Request to add new papers or update existing entries.

(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Keep exploring the awesome world of agentic AI! ✧ﾟ･: *ヽ(◕ヮ◕ヽ)

pat-jj/Awesome-Adaptation-of-Agentic-AI