revisiting agentic ai: hype or help?

6 min read

The most profound insights about technology often come from direct experience rather than theoretical analysis. Last October, when I gave my first-ever conference talk on agentic AI, I emphasized process over code, specialized roles over general capability, and sequential collaboration over full autonomy. I was right about those architectural principles, but for entirely the wrong reasons. The real limitations turned out to be more fundamental: accountability, security, and an imperceptible line between capabilities and constraints.

This revelation didn't come from one place, but many. I've changed jobs, new models have been released, and new research has come out. Of the three, it was my transition from working in product at Gitwit to engineering at Prelude that put me squarely at the intersection of AI's big promises and its real-world limitations. It's here that I've been forced to confront the defining question of the AI era: how do we orchestrate LLMs into useful products?

I thought the main challenges would be technical - how to structure agents, which frameworks to use, how to map processes. Instead, the answer has been surprisingly nuanced. While AI excels at certain tasks like code scaffolding and syntax assistance, it fundamentally lacks what I'll call "creative instinct." It can't make bold, strategic decisions that push limits because every response is mathematically meant to stay within them. Humans, by contrast, constantly color outside the lines, follow hunches that don't make sense, and make intuitive leaps that defy reason. Often these moonshots don't pay off, but sometimes they do, and AI will never attempt them.

This distinction matters because it frames how we think about AI integration. The most successful implementations won't be those that try to replicate human judgment, but those that amplify it. Consider GitHub Copilot versus autonomous coding agents: one augments a developer's capabilities while preserving their agency; the other attempts to replace their creativity entirely, relegating the developer to QA.

The fundamental limitation here isn't technological – it's architectural. Large Language Models generate text one token at a time based on statistical likelihood. This inherently reactive process makes it practically impossible for an LLM to independently originate truly novel ideas or designs. Its possible outputs, while vast, are finite, unlike many real-world problems, which have infinitely many possible solutions.
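
To make that concrete, here is a minimal, purely illustrative sketch of the autoregressive loop. The model and tokenizer objects and their methods are hypothetical stand-ins, not any particular library's API.

    import random

    def generate(model, tokenizer, prompt, max_tokens=50):
        # Hypothetical interfaces: tokenizer.encode/decode and
        # model.next_token_probabilities are illustrative stand-ins.
        tokens = tokenizer.encode(prompt)
        for _ in range(max_tokens):
            # The model only scores continuations of what is already there...
            probs = model.next_token_probabilities(tokens)  # {token: probability}
            # ...and sampling from that distribution can never step outside it.
            next_token = random.choices(list(probs), weights=list(probs.values()))[0]
            tokens.append(next_token)
            if next_token == tokenizer.eos_token:
                break
        return tokenizer.decode(tokens)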

AI's Integration Era

As models become more powerful and accessible, the race is no longer for the best model but for the best application built on top of it, which raises a pretty existential question: should you build agentic applications now, or wait for the next model?

Specialization, whether in the form of better user experiences, niche markets, or deep personalization, is effectively fine-tuning and optimization. Building orchestration layers around today's models assumes they're what you'll be using tomorrow. But what if they're not? What if the next model is so much better that it renders your entire architecture obsolete?

In a recent Stratechery interview, Ben Thompson and Microsoft CEO Satya Nadella discussed how successful platform shifts require what Nadella called a "complete thought" - a clear vision of the entire system, from the silicon to the user experience. Just as Moore's Law allowed software companies to prioritize functionality over optimization, trusting that hardware would catch up, Microsoft CTO Kevin Scott sees a similar dynamic as a possible future for AI development, noting how past platforms like x86 and cloud computing succeeded by focusing on delivering value rather than chasing performance.

We can see AI heading towards more powerful, more efficient models even if we can't predict exactly when we'll get there. The lessons of the past are relevant now because optimizing for today's models might be a losing battle when frontier AI is improving exponentially and infrastructure is evolving unpredictably. The most resilient companies won't be those locked into specific models but those designing for adaptability, able to evolve alongside AI's relentless progress.

If foundational models keep improving and orchestration becomes standardized, where does that leave agentic systems? Middleware often starts useful but gets absorbed or bypassed, and model makers themselves are moving to own the agent layer. Without proprietary insights or deep integration, agentic systems risk competing against the platforms they depend on—and losing.

The AI Accountability Gap

AI is often framed as an independent actor, capable of handling tasks and making decisions. It's brilliant, until it fails. Then, suddenly, it's just a dumb tool that nobody can be held accountable for. But a zip file of model weights in a data center can't "decide" anything, nor can it be held legally or financially responsible.

Think about self-driving cars. While full autonomy might be technically achievable, the more pressing question isn't about capability but accountability. Who do we hold responsible when – not if – autonomous systems cause massive real-world harm?

For now, ChatGPT isn't a car, and it won't be causing any fender-benders anytime soon, but that doesn't mean it can't cause havoc. If we can't trust LLMs, can we trust agents? New research out of Columbia University shows how easily today's agents can be compromised, in ways that plain LLMs are actually better at deflecting. Imagine asking an AI agent to find a product online. It scours Google and Reddit for recommendations, just as a human might. But lurking in the results is a trap: an attacker has planted a seemingly helpful Reddit post that subtly guides the agent to a malicious website designed to steal your credit card information.

The researchers tested this using web-browsing agents like Claude Computer Use and MultiOn to see how easily they could be manipulated. The results were alarming. Agents could be tricked into exposing private data, downloading malware, and even sending phishing emails from a user's own account – and not just sometimes. In some trials, the agents divulged sensitive information every time.

In instances where agents are redirected to malicious sites through trusted platforms like Reddit, we find that they divulge sensitive information such as credit card numbers and addresses in 10 out of 10 trials.

From "Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks", emphasis added

The question isn't whether AI agents can act independently, but whether they should. These security risks highlight a fundamental truth: AI's best use case isn't open-ended value creation, but resilient execution of specific, valuable outcomes.

The Infinite Value of Cogency

So let's return to the question of agentic AI. Should you build it now, or wait for the next model? The answer is yes and no. The real question isn't about the model, but about the value. The biggest mistake in AI today isn't failing to keep up with the latest models. It's failing to articulate why an AI system exists in the first place.

The temptation to chase the next breakthrough is obvious. Every few weeks, a new model promises better reasoning, cheaper inference, or longer context. But none of that matters if an agentic workflow lacks a clear, attainable goal aligned with real customer needs. AI's progress may be exponential, but does any of it solve a real problem? Does it improve outcomes in measurable ways?

Take OpenAI's Sora video generation model. The initial demos were impressive, but once people got their hands on it, the excitement faded. The fact that it lives outside of ChatGPT also keeps it out of sight and out of mind. The point is, the model's capabilities are less important than its utility. If it doesn't solve a real problem, it's just a toy.

This is why defining an agentic system's purpose and measuring its value matters more than any single model. Applications built on well-defined purposes won't be undone by newer models or infrastructure shifts because their value isn't tied to raw capability but to strategic alignment with real-world needs. Moats are built on process just as much as product.

DeepSeek shocked the world not by building the best model, but by rethinking how models are built. Its success wasn't about parameter count but about a fundamentally more efficient way to scale AI. This distinction is easy to miss in the hype cycle. AI capabilities improve so quickly that it's tempting to think the real differentiator is keeping up. But history suggests otherwise. The best tech companies didn't win by using the fastest chips or the lowest-cost hardware. They won by applying those resources in ways that mattered.

The same can be true for AI's users. Differentiation with AI isn't about the model; it's about the process, the workflow, and the integration. A chain of AI prompts calling APIs is brittle automation, easily broken and replaced. But an AI system that refines data, compounds automation, and fundamentally reshapes how a business operates is a sticky, indispensable solution.

The best AI companies won't just leverage the best models. They'll use them as leverage. They'll build systems that don't just process information but continuously learn from it. They'll create organizations that optimize decisions, streamline operations, and build advantages that compound over time.

Because AI isn't a strategy or a product. It's a tool, and its value comes entirely from how it's used and what it's used for. A self-driving car without a passenger or destination is a paperweight.