Are AI agents ready for work?

Brilliant Noise 6 May 2026

We refresh this page regularly to keep pace with fast-moving AI platforms and policies.

The gap between the conversation and the reality

The agent conversation has run ahead of the agent reality.

According to a 2026 survey by WRITER and Workplace Intelligence, 97% of executives report that their company deployed AI agents in the past year. Yet only 12% of agent initiatives successfully reach production at scale, according to the Composio AI Agent Report. McKinsey’s 2025 State of AI puts it slightly differently: 23% of organisations are now scaling agentic AI in at least one function, with another 39% experimenting. Gartner’s prediction is starker still – more than 40% of agent projects will fail by 2027.

Between the headline numbers and the operational reality is a wide gap. This piece is an attempt to look honestly into that gap: where agents are genuinely earning their keep, where the conversation has outrun the work, and what marketing, comms and strategy leaders should actually do about it today.

What we actually mean by “agents”

The word “agent” is doing a lot of work in 2026, and not all of it the same.

For our purposes, an AI agent is a system that can plan, decide and execute multi-step tasks on someone’s behalf – using tools, accessing other systems, and adapting along the way. Chatbots answer questions. Agents take action on workflows. The distinction matters because the engineering, governance and risk profiles of the two are very different.

A few useful sub-distinctions:

  • Single-purpose agents handle one task end-to-end – book a meeting, reconcile an invoice, triage a ticket.
  • Multi-step agents chain several actions across systems within one workflow.
  • Multi-agent systems orchestrate several specialised agents under a central coordinator – for example one agent qualifying a lead, another drafting outreach, a third checking compliance.

Most enterprise deployments today are in the first two categories. The third is where the more breathless predictions live, and where the data shows failure rates climbing fastest.

Where agents are genuinely working

The domains where agents are delivering measurable value share a set of features. They have a defined scope. They have clear success criteria. They sit on top of structured, reasonably clean data. They can keep a human in the loop. The cost of being wrong is recoverable.

Concretely, agents are doing real work today in:

  • IT operations and helpdesk – triaging tickets, resetting access, pulling diagnostic information.
  • Finance operations – reconciling transactions, processing invoices, flagging anomalies.
  • Employee service – answering policy questions, processing routine requests, surfacing internal knowledge.
  • Onboarding and offboarding – running checklists across multiple systems, provisioning and revoking access.
  • Customer support – handling first-touch interactions, escalating where appropriate.
  • Sales operations – enriching CRM records, drafting outreach, keeping records up to date.

To make these less abstract, here are the three patterns we see most often in organisations getting real value from agents today:

The marketing operations agent. Sits on top of a team’s CRM and email tools. When a qualified lead arrives, it pulls relevant context – company size, recent activity, content engagement – drafts a personalised first-touch email, and queues it for a marketer to approve. The marketer goes from writing first emails to reviewing them. Stack: typically a CRM (HubSpot, Salesforce or similar), a marketing automation platform, and an LLM via API underneath.
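A minimal sketch of this pattern in Python. Everything named here is illustrative: `Lead`, `pull_context` and `stub_llm` stand in for a real CRM lookup and an LLM API call, which will differ by stack. The structural point is that the agent drafts and queues for approval, and never sends.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Lead:
    name: str
    company: str
    company_size: int
    recent_activity: list[str]

def pull_context(lead: Lead) -> str:
    """Summarise the CRM context the draft should reference."""
    activity = "; ".join(lead.recent_activity) or "no recent activity"
    return f"{lead.company} ({lead.company_size} employees), recent: {activity}"

def draft_first_touch(lead: Lead, generate: Callable[[str], str]) -> dict:
    """Draft a personalised email and queue it for a human to approve.

    `generate` stands in for an LLM call; the agent only queues a draft,
    it has no permission to send.
    """
    context = pull_context(lead)
    body = generate(f"Draft a short first-touch email to {lead.name}. Context: {context}")
    return {"to": lead.name, "body": body, "status": "awaiting_approval"}

# A stub in place of a real LLM API call.
stub_llm = lambda prompt: f"[draft based on: {prompt}]"

queued = draft_first_touch(
    Lead("Ada", "Example Ltd", 120, ["downloaded pricing guide"]), stub_llm
)
print(queued["status"])  # awaiting_approval
```

The design choice worth copying is the `status` field: the workflow makes "human approves before send" a property of the data, not a convention the team has to remember.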

The content production agent. Used by editorial and content teams for the unglamorous middle of the production process. An editor drops in a brief; the agent pulls relevant context from existing assets, drafts a first version, runs it through a tone-of-voice check, flags the questions only the editor can answer, and gets it ready for a human review pass. The editor’s day shifts from drafting to deciding. Stack: a CMS or document store, a style guide or tone-of-voice document, and an LLM with a long enough context window to hold the brief, the existing material and the draft at once.

The internal helpdesk agent. Lives inside an organisation’s chat tool or ticketing system, handling the first wave of IT and HR queries. Resolves routine ones directly – password resets, access requests, policy questions – and escalates anything ambiguous to a human with the full context attached. Stack: a chat surface (Slack, Teams) or service desk (ServiceNow, Jira Service Management) at the front; HR and IT systems behind; an LLM doing the reasoning.
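The triage-and-escalate loop at the heart of this pattern can be sketched as follows. The `classify` function below is a deliberately crude keyword stand-in for the LLM doing the reasoning, and the intents are hypothetical; what matters is the shape: routine intents resolve directly, and everything else escalates to a human with the full context attached.

```python
# Canned resolutions for intents the agent may handle unsupervised.
ROUTINE_INTENTS = {
    "password_reset": "Sent a reset link to your registered email.",
    "policy_question": "Linked the relevant policy page.",
}

def classify(query: str) -> str:
    """Crude intent classifier; a real deployment would use an LLM here."""
    q = query.lower()
    if "password" in q:
        return "password_reset"
    if "policy" in q or "holiday" in q:
        return "policy_question"
    return "unknown"

def handle(query: str) -> dict:
    """Resolve routine queries directly; escalate anything ambiguous,
    passing the human the context the agent already gathered."""
    intent = classify(query)
    if intent in ROUTINE_INTENTS:
        return {"action": "resolved", "reply": ROUTINE_INTENTS[intent]}
    return {"action": "escalated", "context": {"query": query, "intent": intent}}

print(handle("I forgot my password")["action"])  # resolved
print(handle("My laptop is on fire")["action"])  # escalated
```

Note that the escalation path carries the gathered context with it, so the human picks up where the agent left off rather than starting the conversation again.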

But notice what’s common to the working examples: a clear job, a clean handoff, a contained surface area, and a way back out if something goes wrong. None of these are accidental.

Where the hype is outrunning reality

The same survey data that shows real progress also reveals where the conversation has detached from the work. Three patterns keep repeating.

The pilot-to-production gap. 67% of organisations report measurable gains from agent pilots, but only 10–12% successfully scale to production. Pilots happen in clean conditions, on curated data, with the team that built them watching closely. Production is messy. The agent has to handle real traffic, real edge cases, real integrations, real auditors. Most organisations underestimate the engineering required to bridge that gap.

The governance gap. The same WRITER survey that found 97% deployment also found that 36% of organisations have no formal plan for supervising agents, and 35% admit they couldn’t immediately “pull the plug” on a rogue agent. And while 82% of executives say they’re confident their policies protect against unauthorised agent actions, only around 14% of agents reach production with full security or IT approval. Confidence and competence are pulling apart.

The expectation gap. The same survey that captures real adoption also reports that 75% of executives expect AI agents to be part of their company’s C-suite within five years. Set against the production failure rates, this is aspirational rather than operational – a long-running thought experiment turned into a survey question. Whether AI agents hold C-suite roles is not the interesting 2030 conversation. The interesting 2030 conversation is whether organisations have the foundations in place to use any agent reliably.

The pattern across all three: the technology is real, the productivity gains are real, but the gap between individual super-users delivering five-times productivity and organisational outcomes that show up in the P&L is wider than the agent conversation lets on.

From prompting to delegating

In our piece on AI literacy, we made the case that the skill of working with AI is shifting. Prompt fluency was about crafting good inputs. Agent fluency is about delegating well: choosing the right task to hand off, setting clear boundaries on what an agent should and shouldn’t do, and knowing how to review the work it sends back.

That shift is harder than it sounds. Most organisations have built decades of muscle memory around assigning tasks to people. They have processes for hiring, supervising, reviewing, escalating. None of that translates cleanly to a non-human worker. An agent doesn’t have a manager checking in. It doesn’t have peers noticing if its outputs feel off. It doesn’t get tired and ask for help. The mechanisms that catch human errors don’t apply.

Building the equivalent for agents – clear task definitions, monitoring, audit trails, escalation paths, regular review – is the single most under-invested area in agent deployment today.

What to do today

For marketing, comms and strategy leaders weighing where to invest, four practical steps.

Start with specific, well-governed use cases. The teams getting real value from agents are picking specific, well-scoped tasks with clear success criteria. Don’t start with “transform the customer journey”. Start with “automate the first-touch reply on inbound enquiries in a specific category”. You’ll learn far more, faster.

Build the foundations before scaling. The data this year is unambiguous: agent failures rarely come from the model. They come from the data, the integration, the governance and the observability around the agent. Invest in those before scaling anything ambitious.

Default to human-in-the-loop. For anything consequential, the agent should be making recommendations or executing reversible steps, with a human checking the high-stakes decisions. Trust expands as outcomes earn it. Skipping that stage is how you end up needing to pull the plug on something you can’t easily pull the plug on.
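That graduated-trust model can be expressed as a simple policy gate, sketched below. The action names and the two permission sets are hypothetical; the point is the rule structure: reversible actions execute freely, high-stakes actions wait for explicit human approval, and anything unrecognised never runs at all.

```python
REVERSIBLE = {"draft_email", "tag_record"}    # safe to execute and undo
HIGH_STAKES = {"send_email", "issue_refund"}  # require human sign-off

def run_action(action: str, approved: bool = False) -> str:
    """Execute reversible actions directly; gate high-stakes ones
    behind explicit human approval; reject everything else."""
    if action in REVERSIBLE:
        return "executed"
    if action in HIGH_STAKES:
        return "executed" if approved else "pending_approval"
    return "rejected"  # unknown actions are never executed

print(run_action("draft_email"))         # executed
print(run_action("issue_refund"))        # pending_approval
print(run_action("issue_refund", True))  # executed
```

Expanding trust then becomes an explicit, auditable act: moving an action from `HIGH_STAKES` to `REVERSIBLE` once its outcomes have earned it, rather than a quiet loosening nobody signed off.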

Measure outcomes, not activity. It’s easy to count agent invocations or tasks completed. The harder, more important question is: did the customer get a better answer? Did the team save time on real work? Did the error rate go down? Build the measurement framework before you build the agent.

And remember the security frame from our earlier piece: every permission you grant an agent is a permission an attacker could exploit if the agent is compromised. Agents expand the data perimeter. Vet integrations and limit scope.

Where this leaves you

The agent era is real. The technology is moving fast, capability is growing, and the use cases that already work are genuinely valuable for the teams running them.

But it’s a foundation-building era. The companies that will get the most from agents are the ones doing the unglamorous work first: cleaning data, defining processes, designing governance, training people on how to delegate well. The flashy demos are mostly demos. The compounding returns will go to the organisations that treat agents as systems – with all the discipline, accountability and integration that implies.

Used thoughtfully, agents can do extraordinary work. The job of leadership now is to put the foundations in place so that they can.

If you spot a change in the platforms or the deployment landscape that affects this guidance, tell us. We keep this page updated so it stays practical and current.

Last updated: May 2026