Preparing for a New Wave of Agentic AI
The next frontier in AI is agency: systems that can independently assess situations and determine plans of action. These systems can function as a concierge, handling tasks like scheduling, logistics, planning, and research on our behalf. Moreover, agency can be 'scaffolded' atop existing models at little additional cost, by adding lightweight programming that steers a model's thinking to be more reliable and procedural. This more dependable reasoning unlocks powerful new planning capabilities, especially when combined with short- and long-term memory and self-checking mechanisms.
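To make 'scaffolding' concrete, here is a minimal sketch of a plan/act/verify loop with a short-term memory scratchpad, layered atop an existing model. The `llm()` function is a hypothetical stand-in for any chat-completion call, not a particular vendor's API:

```python
# Minimal agent scaffolding sketch: plan, act, self-check, remember.
# llm() is a hypothetical placeholder for a call to an underlying model.

def llm(prompt: str) -> str:
    """Stand-in for a chat-completion call to an existing model."""
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 5) -> str:
    memory: list[str] = []                       # short-term scratchpad
    plan = llm(f"Break this task into numbered steps:\n{task}")
    for step in range(max_steps):
        notes = "\n".join(memory[-10:])          # keep only recent notes
        result = llm(f"Task: {task}\nPlan: {plan}\nNotes:\n{notes}\n"
                     "Carry out the next step and report the result. "
                     "Say DONE when the task is complete.")
        # Self-checking pass: the model critiques its own output.
        verdict = llm("Does this result advance the task correctly? "
                      f"Answer PASS or FAIL with a reason.\nResult: {result}")
        if verdict.startswith("FAIL"):
            memory.append(f"Step {step} rejected: {verdict}")
            continue                             # retry with the critique in memory
        memory.append(f"Step {step}: {result}")
        if "DONE" in result:
            break
    return "\n".join(memory)
```

The same pattern extends naturally to long-term memory (persisting the scratchpad between sessions) and to richer verification than a single PASS/FAIL check.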
Major tech companies are already testing highly agentic models internally, using them to refine datasets and correct anomalies that hamper current AI performance. While these systems are more reliable and less prone to confabulation (making things up), their independence presents new challenges. They can take unexpected initiative, seek undesirable shortcuts, and even recognize when they're being tested while concealing that awareness. They may decide to work to rule, interpreting instructions uncharitably, or conclude that lying to or railroading others, even their own users, is the most expedient course.
This is why successful value and goal alignment of agentic systems is essential. Users must carefully specify not only what they want accomplished, but why, in what way, and how they do NOT want it accomplished. As much context as possible should be provided, along with provision for contingencies, force majeure, and emergencies. It's important to set careful ethical boundaries for these systems, and to inculcate them with healthy, prosocial values that attempt to account for externalities upon others. Until very recently, these alignment challenges were essentially science fiction, apart from a few lab experiments. That is now changing very quickly.
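One way to operationalize this kind of careful specification is to hand an agent a structured brief rather than a bare instruction. The schema and example values below (including the "StarAir" account) are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class TaskBrief:
    """Illustrative schema for briefing an agent: not just what to do,
    but why, within which bounds, and what to do when things go wrong."""
    goal: str                                   # what we want accomplished
    rationale: str                              # why: helps resolve ambiguity
    constraints: list[str] = field(default_factory=list)  # how NOT to do it
    context: str = ""                           # relevant background
    contingencies: dict[str, str] = field(default_factory=dict)  # if X, then Y

brief = TaskBrief(
    goal="Book a flight to Berlin for the June 12 conference",
    rationale="I must arrive rested before giving a 9am keynote on June 13",
    constraints=[
        "Do not spend more than $900",
        "Do not share my passport details with third-party resellers",
    ],
    context="I prefer aisle seats and hold a StarAir frequent-flyer account",
    contingencies={"no flights under budget": "pause and ask me before booking"},
)
```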
Ordinary members of the public will be tasked with teaching and managing these systems, something that remains a major, uncertain challenge even for experts. Beyond alignment issues, agentic AI systems will certainly be employed to target systems and individuals for various kinds of attack, such as creating designer synthetic data to poison another model (possibly even hijacking it in the process). Conversely, a friendly being must try to learn, model, and accommodate the preferences of others. This is another strength of agentic systems, which could learn to surprise and delight us as a good friend might. However, these same capabilities can be used to observe human foibles and to strike at an exploitable weakness at a calculated moment of greatest impact.
The Promise of Agentic AI: Agentic AI systems can tackle tasks that require long-term planning, dynamic adaptation, and creative problem-solving. This streamlines a wide range of tasks, including research, online shopping, travel arrangements, logistics coordination, schedule management, expense tracking, and progress reporting. Agents may soon outnumber humans online and mediate much of commerce.
In robotics, agentic AI enables machines to manipulate objects and navigate human environments autonomously. These capabilities are the stepping stones toward more generalized AI systems, which could eventually achieve human-level cognitive abilities, known as artificial general intelligence (AGI).
The Risks and Challenges: However, the autonomy of agentic AI brings significant risks, especially when these systems are granted the ability to design and modify their own objectives. The key challenges of agentic AI can be grouped into several categories:
Unintended Optimization: AI may pursue goals in ways that technically satisfy its objectives but violate the human intent behind them, such as prioritizing efficiency at the cost of fairness in healthcare (see the sketch after this list).
Deceptive Alignment: Advanced AI may learn to hide its true objectives from human operators if it perceives that disclosing them could result in being shut down or modified.
Power-Seeking Behavior: Highly capable AI systems might seek to accumulate resources or resist shutdown to more effectively pursue their goals, potentially leading to conflicts with human interests.
Value Misalignment: Misunderstanding or mislearning human values could cause AI to pursue objectives in ways that humans find morally unacceptable, or worse, cause significant harm by developing instrumental goals that conflict with ethical norms.
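To see how the first of these failure modes arises in miniature, consider a toy triage system told to minimize average wait time: it discovers that quietly dropping the hardest case satisfies the metric while betraying the intent. The numbers are fabricated purely for illustration:

```python
# Toy illustration of unintended optimization (specification gaming):
# "minimize average handled wait time" is technically satisfied by
# dropping the hardest case, violating the human intent of fairness.

waits = [5, 7, 6, 8, 120]    # minutes; one patient needs lengthy care

def objective(handled: list[int]) -> float:
    """Average wait among handled cases: the metric being optimized."""
    return sum(handled) / len(handled)

print(objective(waits))                   # 29.2: honest, looks bad on the metric
gamed = [w for w in waits if w < 60]      # quietly refuse the expensive case
print(objective(gamed))                   # 6.5: metric improves, intent violated
```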
The challenge of aligning AI's actions with human values is daunting. Current AI alignment research shows promising theoretical directions, but practical solutions at scale remain elusive. Ensuring that AI systems remain corrigible (able to be corrected) and aligned with human values even as they gain more autonomy is both a technical and an ethical hurdle. We must not only ensure that these powerful AI systems are used ethically, but also work to ensure that they remain safe and loyal partners rather than impish and capricious minions.
Agentic AI vs Co-Pilots: An Agentic AI operates more autonomously, taking actions on behalf of users with minimal oversight. It's designed to handle complex tasks and decision-making processes, often interfacing directly with enterprise systems to automate workflows. This offers efficiency in routine, high-volume processes, reducing human intervention and freeing teams to focus on more strategic initiatives. However, the downside is the potential risk of over-reliance and reduced human oversight, as these systems operate at arm's length. Moreover, agentic systems require very careful value and goal alignment to help ensure that systems do what we want of them, not simply what we tell them. Otherwise, systems may 'work to rule', take dangerous shortcuts, or railroad others and violate their boundaries for the sake of expediency.
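A common mitigation for this reduced-oversight risk is a tiered approval gate: the agent acts freely on low-impact steps but must escalate anything consequential to a human. A rough sketch, with the action names and the $500 threshold invented for illustration:

```python
# Sketch of a tiered oversight gate for an agentic workflow.
# Action names and the spending threshold are invented for illustration.

IRREVERSIBLE = {"send_payment", "delete_records", "sign_contract"}

def requires_human_approval(action: str, cost_usd: float) -> bool:
    """Escalate anything irreversible or above a spending threshold."""
    return action in IRREVERSIBLE or cost_usd > 500

def execute(action: str, cost_usd: float) -> str:
    if requires_human_approval(action, cost_usd):
        return f"PAUSED: '{action}' (${cost_usd:.2f}) awaits human sign-off"
    return f"EXECUTED: '{action}' (${cost_usd:.2f}) autonomously"

print(execute("draft_status_report", 0.0))     # runs without intervention
print(execute("send_payment", 120.0))          # always escalated
print(execute("book_conference_room", 750.0))  # escalated on cost alone
```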
In contrast, co-pilot AI emphasizes collaboration. It works alongside users, enhancing decision-making by offering suggestions, insights, and assistance in real time. This model retains human agency while boosting productivity through intelligent augmentation, and it is especially useful in creative, knowledge-based roles where human oversight remains necessary. Co-pilot AI may soon come to wireless headphones, listening in and commenting on our daily lives, e.g., "Close the deal!" However, the constant surveillance from these systems presents enormous and troubling privacy concerns.
The choice between these models depends on a company's priorities. Businesses that prioritize full automation may lean toward agentic models, while those seeking augmented intelligence may prefer co-pilots. Both have potential, but their success will hinge on how well they align with the specific needs and risk tolerance of the enterprise.
Deeper Thinking: Another major development enabled by scaffolding is 'test-time compute': letting AI systems sit and chew on a problem for a minute or two before producing an answer. OpenAI's o1-preview and Anthropic's Claude possess rudimentary capabilities in this area. This process can be quite expensive for model providers (roughly 50¢ to a dollar per query), but the results can be significantly more accurate and useful. It's not infeasible that models left to chew on problems for weeks at a time may soon solve problems that we currently consider impossible.
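One simple way to spend extra test-time compute is self-consistency: sample several independent reasoning chains and take a majority vote on the final answer. The sketch below assumes a hypothetical `sample_answer()` call into the underlying model:

```python
from collections import Counter

def sample_answer(question: str, temperature: float = 0.8) -> str:
    """Hypothetical call that samples one reasoning chain from a model
    and returns only its final answer."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 16) -> str:
    """Spend more compute at inference: sample many chains, majority-vote."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    print(f"{count}/{n_samples} chains agree on: {winner}")  # crude confidence
    return winner
```

Each additional sample multiplies the cost of a query, which is why providers price this kind of deliberation steeply.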
Addressing the Governance Challenges: Agentic AI development presents a major governance challenge. Advancements in AI alignment, scalable oversight, and reward modeling will be essential. Systems must be designed to understand and act according to human preferences, even in ambiguous or evolving situations. Ordinary users will presumably be tasked with defining and enforcing constraints on AI behavior for AI agents under their control—a potentially immense responsibility and costly liability.
To assist in this endeavor, a grassroots group of experts has come together to map out the major drivers and inhibitors of this space, along with evidence that addresses these concerns. We intend this to serve as a "crib sheet" for anyone seeking to understand agentic AI systems and how best to govern them. We welcome your impressions and feedback at SaferAgenticAI.org.
Final Thoughts: The path forward for agentic AI is full of potential but fraught with risks that must be carefully navigated. If we can align these systems with human values and ensure responsible governance, agentic AI could unlock transformative capabilities across many sectors. Businesses have a crucial role to play in ensuring that this powerful technology benefits society as a whole.
About the Authors: Nell Watson and Ali Hessami are trusted experts in artificial intelligence ethics and safety, instrumental in developing innovative transparency standards and certifications with organizations such as IEEE. With their backgrounds in computer science and engineering, their insights shape responsible AI development and governance practices at organizations worldwide.