Adaptation of Agentic AI: A Unified Framework for Optimizing Agents and Tools

A recent research collaboration between Stanford, Harvard, UC Berkeley, and Caltech addresses the current limitations of Agentic AI systems, such as unreliable tool use and poor long-horizon planning. The paper proposes a unified, mathematically defined framework for how these systems—composed of large language models, planning modules, tool interfaces, and memory—should adapt and improve over time.

Modeling Agentic Systems

The research models an agentic AI system as a foundation model agent supported by three critical components (a minimal structural sketch in code follows the list):

  • Planning Module: Decomposes goals into actions, using fixed procedures such as Chain-of-Thought or dynamic methods such as ReAct, which interleaves reasoning with tool feedback.
  • Tool Use Module: Connects the agent to external environments (web search, APIs, code execution).
  • Memory Module: Stores short-term context and long-term knowledge via Retrieval-Augmented Generation (RAG).
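
To make this anatomy concrete, here is a minimal structural sketch in Python. It is not the paper's formalism: every class, function, and string below is an illustrative stand-in, with trivial placeholders where a real system would call an LLM, a retriever, or an external API.

```python
# Minimal structural sketch of the three-component agent anatomy.
# All names here are illustrative, not taken from the paper.
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Short-term context plus a long-term store queried RAG-style."""
    context: list[str] = field(default_factory=list)
    long_term: dict[str, str] = field(default_factory=dict)

    def retrieve(self, query: str) -> str:
        # Stand-in for a real retriever (vector search, BM25, ...).
        return self.long_term.get(query, "")

    def remember(self, item: str) -> None:
        self.context.append(item)


def plan(goal: str, memory: Memory) -> list[str]:
    """Planning module: decompose a goal into tool-level steps.
    A real system would prompt an LLM (Chain-of-Thought, ReAct)
    and condition on retrieved memory."""
    hint = memory.retrieve(goal)  # RAG-style lookup; empty for a fresh goal
    return [f"synthesize: {hint}"] if hint else [f"search: {goal}", f"synthesize: {goal}"]


def call_tool(step: str) -> str:
    """Tool use module: dispatch a step to an external environment."""
    kind, _, payload = step.partition(": ")
    if kind == "search":
        return f"<results for {payload!r}>"
    return f"<draft answer for {payload!r}>"


def run_agent(goal: str) -> str:
    memory = Memory()
    observation = ""
    for step in plan(goal, memory):
        observation = call_tool(step)
        memory.remember(observation)  # feed tool output back as context
    return observation


print(run_agent("summarize the adaptation framework"))
```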

Adaptation involves modifying prompts or parameters within these components using techniques ranging from Supervised Fine-Tuning (SFT) to Reinforcement Learning (PPO, GRPO).
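
To make the reinforcement-learning end of that spectrum concrete, the sketch below computes the group-relative advantages at the core of GRPO: rewards for a group of completions sampled from one prompt are normalized by the group's own mean and standard deviation, removing the need for a learned value critic. The function name and example rewards are illustrative.

```python
# Group-relative advantage, as used in GRPO: each completion's reward is
# normalized against the sampled group's mean and standard deviation.
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four completions for one prompt, scored by a verifiable signal
# (e.g., did the generated SQL execute correctly).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # ≈ [1.0, -1.0, 1.0, -1.0]
```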

The Four Adaptation Paradigms

The core contribution of the paper is a framework defined by two dimensions: whether the adaptation target is the agent or the tool, and whether the supervision signal comes from tool execution or from the final agent output. Crossing the two yields four distinct paradigms (sketched as a lookup table in code after the list):

  • A1 (Tool-Execution-Signaled Agent Adaptation): The agent is optimized using verifiable feedback from tool usage (e.g., SQL execution accuracy or code correctness). Examples include Toolformer and DeepRetrieval.
  • A2 (Agent-Output-Signaled Agent Adaptation): The agent is optimized based on the quality of its final answer. To keep the agent from sidestepping its tools and merely maximizing answer likelihood, this approach often requires sparse rewards propagated through the full trajectory.
  • T1 (Agent-Agnostic Tool Adaptation): Tools are optimized independently to be generally useful (e.g., improving a retrieval system’s accuracy) without reference to a specific agent.
  • T2 (Agent-Supervised Tool Adaptation): Tools or memory systems are optimized to serve a frozen, powerful agent. Examples include systems like s3 and AgentFlow, where a fixed generator supervises a learned searcher.
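
Read as code, the framework is a simple lookup: the paradigm is fully determined by which component is trained and where the supervision signal originates. The enums and label strings below are illustrative, not the paper's notation.

```python
# The 2x2 framework as a lookup table over (adaptation target, signal source).
from enum import Enum


class Target(Enum):
    AGENT = "agent"
    TOOL = "tool"


class Signal(Enum):
    TOOL_EXECUTION = "tool execution"
    AGENT_OUTPUT = "final agent output"


PARADIGM = {
    (Target.AGENT, Signal.TOOL_EXECUTION): "A1: tool-execution-signaled agent adaptation",
    (Target.AGENT, Signal.AGENT_OUTPUT): "A2: agent-output-signaled agent adaptation",
    (Target.TOOL, Signal.TOOL_EXECUTION): "T1: agent-agnostic tool adaptation",
    (Target.TOOL, Signal.AGENT_OUTPUT): "T2: agent-supervised tool adaptation",
}

print(PARADIGM[(Target.TOOL, Signal.AGENT_OUTPUT)])
```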

Key Takeaways for Implementation

The researchers argue that practical, robust systems will likely require a hybrid approach. While A1 and A2 methods provide infrequent, costly updates to the strong base model, scalability and robustness are best achieved through frequent T1 and T2 adaptations: refining retrievers, search policies, and memory stores around the agent.
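
To illustrate the T2 pattern in miniature, the toy sketch below tunes a single searcher parameter against the output quality of a frozen "generator". Everything in it is a hypothetical stand-in: the scoring function replaces a frozen LLM judged on its final answer, and naive hill climbing replaces the reinforcement learning used in systems like s3.

```python
# Toy T2 sketch: only the searcher's parameter is updated; the "generator"
# is frozen and merely scores outcomes. All names and the hill-climbing
# update are hypothetical stand-ins for real RL training.
import random

random.seed(0)


def frozen_generator_score(retrieval_quality: float) -> float:
    """Stand-in for the frozen agent: answer quality as a function of
    what the searcher retrieved. Peaks at an (arbitrary) optimum of 0.8."""
    return -(retrieval_quality - 0.8) ** 2


theta = 0.2  # the searcher's single tunable parameter (illustrative)
for _ in range(200):
    candidate = theta + random.gauss(0.0, 0.05)  # propose a nearby policy
    if frozen_generator_score(candidate) > frozen_generator_score(theta):
        theta = candidate  # keep only updates that improve the frozen agent's output

print(f"learned searcher parameter: {theta:.2f}")  # typically lands near 0.8
```

The design point the sketch is meant to surface: because the generator stays frozen, the expensive model is never touched, and the cheap, swappable component absorbs all the training, which is exactly why the authors see T1/T2 adaptations as the frequent, scalable part of a hybrid system.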

Mentoring question

Given the trade-off between the high cost of training foundation models and the flexibility of external tools, in which scenarios would you prioritize T2 (optimizing tools for a frozen agent) over A1 (optimizing the agent itself)?

Source: https://share.google/rWDO5EgHiLBTfkrA9

