Anthropic, OpenAI, and Google Pioneers Launch Loop Engineering, Redefining AI Agent Automation

Three of the most influential voices in AI development (Boris Cherny, creator of Anthropic’s Claude Code; Peter Steinberger, a senior engineer at OpenAI; and Addy Osmani, Google Cloud’s head of developer experience) sparked a wave of interest in mid-June 2026 when, within days of one another, they unveiled a foundational shift in agent design: loop engineering. The practice, laid out in a flurry of blog posts, conference talks, and social media threads, moves beyond the static craft of prompt writing to orchestrate autonomous, recurring workflows in which AI agents plan, act, observe, and refine their own outputs in cycles. For enterprise developers and Windows enthusiasts alike, the announcements mark a maturity milestone for agentic AI, promising automation scenarios that were previously too brittle or complex to entrust entirely to machines.

At its core, loop engineering is the discipline of designing AI agents that execute iterative, self-correcting routines rather than one-shot completions. Whereas prompt engineering focuses on constructing the ideal input to elicit a desired output, loop engineering treats the agent as a persistent executor that breaks tasks into steps, evaluates intermediate results, and loops back to earlier stages when something goes wrong or new information emerges. The shift is more than incremental: the agent’s control flow, not the prompt, becomes the primary design surface. As Cherny put it in a June 15 post on the Anthropic blog, “We’ve been treating language models as functions; it’s time we treat them as processes.” His work on Claude Code had already shown the value of letting an agent iterate over code changes, run tests, and fix failures autonomously, a workflow that loop engineering generalizes across domains.
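
To make Cherny’s function-versus-process distinction concrete, here is a minimal Python sketch. It is not taken from Claude Code or any published library; `propose` and `check` are hypothetical stand-ins for a model call and a test run, and the toy proposer simply corrects its own bug once it sees an error message.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins: `propose` plays the role of a model call and
# `check` plays the role of a test run. Neither is a real library API.
Propose = Callable[[str, Optional[str]], str]
Check = Callable[[str], Optional[str]]  # returns an error message, or None on success


def one_shot(task: str, propose: Propose) -> str:
    """Model as a function: one input, one output, no feedback."""
    return propose(task, None)


def as_process(task: str, propose: Propose, check: Check, max_iters: int = 5) -> Tuple[str, int]:
    """Model as a process: propose, check, feed the failure back, repeat."""
    feedback: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        candidate = propose(task, feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, attempt
    raise RuntimeError(f"no passing candidate after {max_iters} attempts: {feedback}")


# Toy example: the first proposal has an off-by-one bug; given feedback, it is corrected.
def toy_propose(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b): return a + b + 1"
    return "def add(a, b): return a + b"


def toy_check(source: str) -> Optional[str]:
    namespace: dict = {}
    exec(source, namespace)  # acceptable for a toy; never exec untrusted model output
    return None if namespace["add"](2, 2) == 4 else "add(2, 2) should be 4"


print(one_shot("write add", toy_propose))               # the buggy first draft
print(as_process("write add", toy_propose, toy_check))  # fixed on attempt 2
```

The only structural difference is that the process version feeds each failure back into the next call and bounds the number of attempts.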

Peter Steinberger, known for his deep work on the OpenAI API and real-time agent infrastructure, supplied the technical blueprint a day later. In a detailed GitHub gist that quickly amassed thousands of stars, he outlined a four-phase loop pattern: Plan (break down the goal), Execute (perform an action), Observe (gather feedback), and Adapt (decide the next action). Steinberger stressed that the loop must be stateful: the agent keeps a memory of what it tried and what happened, which lets long-running processes recover from dead ends. “The model is the engine, but the loop is the driver,” he wrote. He also released a lightweight Python library called agentloops that codifies the pattern, with integrations for OpenAI’s function-calling and streaming APIs, so developers can wrap any LLM-powered agent in a self-improving cycle.
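
The four phases and the stateful memory Steinberger describes can be sketched in a few dozen lines of plain Python. This is an illustration under stated assumptions, not the agentloops API: `Memory`, `plan`, and `run_loop` are hypothetical names, and the planner here just picks the next untried option where a real agent would consult the model with its memory in the prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Memory:
    """State carried across iterations: what was tried and what happened."""
    tried: List[str] = field(default_factory=list)
    outcomes: List[Tuple[str, bool, str]] = field(default_factory=list)


def plan(options: List[str], memory: Memory) -> Optional[str]:
    """Plan: choose the next untried action. A real agent would ask the model here."""
    for option in options:
        if option not in memory.tried:
            return option
    return None


def run_loop(options: List[str], execute: Callable[[str], Tuple[bool, str]],
             max_iters: int = 10) -> Tuple[Optional[str], Memory]:
    memory = Memory()
    for _ in range(max_iters):
        action = plan(options, memory)             # Plan
        if action is None:
            break                                  # dead end: nothing left to try
        ok, observation = execute(action)          # Execute, then Observe
        memory.tried.append(action)
        memory.outcomes.append((action, ok, observation))
        if ok:
            return observation, memory             # Adapt: goal met, stop
        # Adapt: otherwise loop again; plan() sees this failure in memory.
    return None, memory


# Toy tool: two of three data sources fail before one succeeds.
SOURCES = {"primary": (False, "timeout"), "replica": (False, "403 forbidden"),
           "cache": (True, "report.csv")}
result, memory = run_loop(list(SOURCES), lambda name: SOURCES[name])
print(result, memory.tried)
```

Because every outcome is recorded, the same `Memory` object doubles as an audit log of the run.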

Addy Osmani brought the enterprise perspective. Speaking at Google Cloud Next Tokyo, he described loop engineering as the missing link for production-ready AI agents in large organizations. “We kept seeing ambitious agent POCs fail in the real world because they lacked resilience,” Osmani said. “A single prompt, no matter how clever, can’t handle edge cases or unexpected tool outputs. But with loop engineering, you design the escalation logic: what happens when the agent’s first attempt fails? How does it revise its approach? That’s where the value lives.” Osmani demonstrated a loop-powered customer support bot that autonomously decided when to query internal knowledge bases, when to escalate to a human, and when to retry a failed API call with adjusted parameters, all without a human directing each step.
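
Osmani’s point about designing escalation logic can be illustrated with a small retry wrapper. The sketch assumes a hypothetical `TransientAPIError` for failures worth retrying and a caller-supplied `adjust` function that revises the parameters; it is not drawn from Google’s demo.

```python
from typing import Any, Callable, Dict


class TransientAPIError(Exception):
    """Raised by a tool call that might succeed with different parameters."""


def call_with_adjustment(call: Callable[..., Any], params: Dict[str, Any],
                         adjust: Callable[[Dict[str, Any], Exception], Dict[str, Any]],
                         max_retries: int = 3) -> Dict[str, Any]:
    """Retry a failing call, revising its parameters each time; escalate if all attempts fail."""
    errors = []
    for attempt in range(1, max_retries + 2):  # the first try plus max_retries retries
        try:
            return {"status": "resolved", "result": call(**params), "attempts": attempt}
        except TransientAPIError as exc:
            errors.append(str(exc))
            params = adjust(params, exc)
    return {"status": "escalate_to_human", "errors": errors, "attempts": max_retries + 1}


# Toy support-bot tool: a hypothetical order API that rejects large page sizes.
def lookup_order(order_id: str, page_size: int) -> str:
    if page_size > 50:
        raise TransientAPIError(f"page_size {page_size} exceeds limit")
    return f"order {order_id}: shipped"


def halve_page_size(params: Dict[str, Any], exc: Exception) -> Dict[str, Any]:
    return {**params, "page_size": params["page_size"] // 2}


print(call_with_adjustment(lookup_order, {"order_id": "A17", "page_size": 200}, halve_page_size))
```

Keeping `adjust` separate from the retry loop lets the same wrapper express different recovery strategies, and the explicit `escalate_to_human` outcome makes the hand-off a designed state rather than an unhandled exception.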

Under the hood, loop engineering builds on several recent advances in foundation models and agent architectures. The most critical is function calling with enforced schemas: because the agent must emit structured actions (such as tool calls) rather than free text, the loop can deterministically parse what to do next, catch malformed attempts, and feed error messages back into the planning phase. Memory management also plays a key role. Unlike a simple chat context, a loop agent needs a working memory of sub-goals, completed steps, and encountered errors; vector databases and hybrid retrieval-augmented generation (RAG) are often used to persist this context across iterations. Crucially, loop engineering does not prescribe a particular model: it can wrap any LLM that supports tool use, from GPT-5 to Claude 4 to Gemini Ultra, which makes it vendor-agnostic.
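
The schema-checking step can be sketched with the standard library alone. Everything here is hypothetical: `TOOL_SCHEMAS`, the two tools, and the error wording are invented for illustration, and a production system would more likely rely on a provider’s structured-output support or a JSON Schema validator. The key idea is that a malformed action yields a readable error string that is appended to working memory and fed into the next planning step, instead of crashing the loop.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical tool schemas: tool name -> required argument names and Python types.
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "search_kb": {"query": str},
    "create_ticket": {"title": str, "priority": int},
}


def parse_action(raw: str) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    """Return (action, None) if `raw` is a valid tool call, else (None, error text for the model)."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Output was not valid JSON: {exc.msg}."
    if not isinstance(action, dict):
        return None, "Expected a JSON object with 'tool' and 'args'."
    tool, args = action.get("tool"), action.get("args")
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        return None, f"Unknown tool {tool!r}; choose one of {sorted(TOOL_SCHEMAS)}."
    if not isinstance(args, dict):
        return None, "'args' must be a JSON object."
    for name, expected in TOOL_SCHEMAS[tool].items():
        if name not in args:
            return None, f"Missing argument {name!r} for {tool}."
        if not isinstance(args[name], expected):
            return None, f"Argument {name!r} must be of type {expected.__name__}."
    return action, None


# Working memory: errors are recorded and would be fed into the next planning prompt.
working_memory: List[str] = []
for raw in ['{"tool": "create_ticket", "args": {"title": "VPN down"}}',
            '{"tool": "create_ticket", "args": {"title": "VPN down", "priority": 2}}']:
    action, error = parse_action(raw)
    if error:
        working_memory.append(f"error: {error}")
    else:
        working_memory.append(f"ok: {action['tool']}")
print(working_memory)
```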

For Windows developers, the implications are immediate and tangible. Microsoft has been investing heavily in Windows Copilot and in its broader AI platform, including Semantic Kernel, Power Automate, and the new .NET AI building blocks, and loop engineering fits naturally alongside these tools. Consider a Windows desktop automation agent that periodically monitors a folder of invoices, extracts data using a vision model, validates it against an ERP system, and retries when validation fails, all running as a background process. With loop engineering, developers can define such a workflow declaratively: the agent plans the extraction, executes it, observes whether the ERP rejects the data, adapts by querying a different field, and loops until it succeeds. Traditional robotic process automation (RPA) can reach that level of robustness only with extensive hand-coded rules.
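
A stripped-down version of the invoice loop’s observe-and-adapt core might look like the following. Folder monitoring, the vision model, and the ERP are all replaced by hypothetical in-memory stand-ins (`ERP_VENDORS`, `validate`, `adapt`), and the single recovery strategy, matching on vendor name when the extracted vendor ID is rejected, is an assumption chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins: a dict plays the ERP vendor table, and the input dict
# plays a vision model's extraction. Field names are illustrative only.
ERP_VENDORS = {"V-1001": "Contoso Ltd", "V-2002": "Fabrikam Inc"}


def validate(invoice: Dict) -> Optional[str]:
    """Observe: None if the ERP would accept the invoice, otherwise a rejection reason."""
    if invoice.get("vendor_id") not in ERP_VENDORS:
        return "unknown vendor_id"
    return None


def adapt(invoice: Dict, reason: str) -> Optional[Dict]:
    """Adapt: on an unknown vendor_id, fall back to matching on vendor_name."""
    if reason == "unknown vendor_id":
        wanted = str(invoice.get("vendor_name", "")).strip().lower()
        for vendor_id, name in ERP_VENDORS.items():
            if name.lower() == wanted:
                return {**invoice, "vendor_id": vendor_id}
    return None  # no recovery strategy left


def process_invoice(invoice: Dict, max_iters: int = 3) -> Tuple[str, Dict]:
    for _ in range(max_iters):
        reason = validate(invoice)
        if reason is None:
            return "posted", invoice
        revised = adapt(invoice, reason)
        if revised is None:
            return "needs_review", invoice  # hand off to a person
        invoice = revised
    return "needs_review", invoice


# An OCR-style misread: the letter "O" instead of a zero in the vendor ID.
status, fixed = process_invoice({"vendor_id": "V-1O01", "vendor_name": "Contoso Ltd", "total": 412.50})
print(status, fixed["vendor_id"])
```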

Power Automate users also stand to benefit. Today, cloud flows often rely on brittle linear logic: if this condition, then that action. By adding a loop-engineering step to a flow, essentially a “try-resolve” block, organizations could handle exceptions gracefully. For example, an agent tasked with generating a monthly report might scrape multiple data sources, notice a discrepancy in the totals, query the source systems for more detail, and regenerate the report, all within a single automated loop. Microsoft’s recent preview of Copilot in Power Automate, which already accepts natural-language descriptions, could be extended to generate loop-aware templates, putting multi-step AI agents within reach of non-developers.
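
Power Automate flows are assembled in a visual designer, so the following Python is only an illustration of the “try-resolve” logic, not a Power Automate feature. `fetch_summary`, `fetch_detail`, and the control total are hypothetical inputs; the loop regenerates the report from line-item detail when the summary totals disagree, and gives up when more detail no longer changes anything.

```python
from typing import Callable, Dict, List


def generate(figures: Dict[str, float]) -> Dict[str, float]:
    """Regenerate the report from per-source figures."""
    return {**figures, "total": round(sum(figures.values()), 2)}


def monthly_report(sources: List[str], fetch_summary: Callable[[str], float],
                   fetch_detail: Callable[[str], List[float]], control_total: float,
                   tolerance: float = 0.01, max_iters: int = 3) -> Dict[str, float]:
    """Try-resolve: build from summaries, and on a discrepancy rebuild from line-item detail."""
    figures = {s: fetch_summary(s) for s in sources}
    for _ in range(max_iters):
        report = generate(figures)
        if abs(report["total"] - control_total) <= tolerance:
            return report
        refreshed = {s: round(sum(fetch_detail(s)), 2) for s in sources}
        if refreshed == figures:
            break  # more detail changed nothing, so further passes would not help
        figures = refreshed
    raise RuntimeError("totals still disagree; escalate to a human")


# Toy data: the refunds summary is stale; its line items tell the real story.
SUMMARY = {"sales": 1200.0, "refunds": -150.0}
DETAIL = {"sales": [700.0, 500.0], "refunds": [-100.0, -75.0]}
print(monthly_report(list(SUMMARY), SUMMARY.get, DETAIL.get, control_total=1025.0))
```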

Challenges remain, however. Loop engineering introduces a new class of failure modes that are harder to debug than prompt errors. An infinite loop can burn through API budget and time; a poorly specified Observe phase can misinterpret feedback and adapt in the wrong direction. Steinberger acknowledged this in his gist, noting that “loop engineering demands a new kind of observability: you need to trace not just token usage but decision graphs over time.” To address this, several startups and cloud providers are already shipping agent-tracing dashboards that visualize each iteration, much like a debugger. On GitHub, community forks of agentloops have added integrations with OpenTelemetry and Windows Performance Analyzer, reflecting cross-platform interest.
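
The two safeguards Steinberger’s remark implies, a hard iteration cap and a per-decision trace, can be sketched as follows. The `Trace` class is a hypothetical, dependency-free stand-in for what an OpenTelemetry exporter or a vendor dashboard would record; it logs one event per phase so that consecutive events can be read as edges of a decision graph.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Trace:
    """One event per loop phase, so a run can be replayed as a decision graph."""
    events: List[Dict] = field(default_factory=list)

    def record(self, iteration: int, phase: str, detail: str) -> None:
        self.events.append({"iteration": iteration, "phase": phase,
                            "detail": detail, "t": time.perf_counter()})

    def edges(self) -> List[Tuple[str, str]]:
        """Consecutive events become edges of the decision graph."""
        return [(a["phase"], b["phase"]) for a, b in zip(self.events, self.events[1:])]


def traced_loop(step: Callable[[int], Tuple[bool, str]], max_iters: int, trace: Trace) -> bool:
    """Run step(i) until it reports success, tracing every decision and never exceeding max_iters."""
    for i in range(max_iters):
        trace.record(i, "plan", f"iteration {i}")
        done, observation = step(i)
        trace.record(i, "observe", observation)
        if done:
            trace.record(i, "stop", "goal reached")
            return True
        trace.record(i, "adapt", "retrying")
    trace.record(max_iters, "stop", "iteration cap hit")
    return False


ok_trace, stuck_trace = Trace(), Trace()
print(traced_loop(lambda i: (i == 2, f"attempt {i}"), 5, ok_trace))   # succeeds on the third pass
print(traced_loop(lambda i: (False, "no progress"), 3, stuck_trace))  # the cap stops a runaway loop
print(stuck_trace.events[-1]["detail"])
```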

Despite the challenges, early adopters report dramatic improvements in automation reliability. A June 20 thread on r/MachineLearning described a fintech company that replaced a 300-step manual reconciliation process with a loop-engineered agent: after two weeks of tuning, the agent reached 99.4% accuracy, up from the 85% achieved by the original prompt-only bot, and ran for a full month with no human intervention. Another user described automating GitHub issue triage with loop engineering: the agent categorizes bugs, attempts to reproduce them in a sandbox, and suggests a priority, looping back to re-categorize when a reproduction fails. Such anecdotes help explain why the concept is resonating so strongly.
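
The triage workflow described in that thread can be sketched as a loop that rules out a category whenever its reproduction fails. The categorizer, sandbox, and priority policy below are hypothetical stand-ins, not the poster’s code.

```python
from typing import Callable, Dict, List, Optional

HIGH_PRIORITY = {"crash", "security"}  # hypothetical priority policy


def triage(issue: str, categorize: Callable[[str, List[str]], Optional[str]],
           reproduce: Callable[[str, str], bool], max_iters: int = 3) -> Dict:
    """Categorize, try to reproduce, and re-categorize when reproduction fails."""
    ruled_out: List[str] = []
    for _ in range(max_iters):
        category = categorize(issue, ruled_out)
        if category is None:
            break
        if reproduce(issue, category):
            priority = "high" if category in HIGH_PRIORITY else "normal"
            return {"category": category, "reproduced": True, "priority": priority,
                    "ruled_out": ruled_out}
        ruled_out.append(category)  # loop back with this guess excluded
    return {"category": "needs-info", "reproduced": False, "priority": "low",
            "ruled_out": ruled_out}


# Toy stand-ins for the model and the sandbox.
GUESSES = ["crash", "ui", "performance"]


def toy_categorize(issue: str, ruled_out: List[str]) -> Optional[str]:
    return next((g for g in GUESSES if g not in ruled_out), None)


def toy_reproduce(issue: str, category: str) -> bool:
    return category == "ui"  # pretend only the UI repro script shows the freeze


print(triage("App freezes when resizing the window", toy_categorize, toy_reproduce))
```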

Boris Cherny’s announcement carried special weight given his role in creating Claude Code, the tool that first popularized agentic coding. In his blog post, he reflected that Claude Code’s “agent mode,” while a step forward, was essentially a predefined loop of generating code, checking diffs, and running tests: a hardcoded pattern. Loop engineering, he argued, lets developers compose custom loops suited to domain-specific tasks. He revealed that a forthcoming version of Claude Code will expose a loop designer UI, allowing users to visually wire together plan–execute–observe–adapt stages without writing boilerplate. Meanwhile, Steinberger hinted that OpenAI is experimenting with native loop support in its agent builder platform, possibly with built-in cost controls such as a maximum iteration count and a token budget.
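
Cost controls of the kind Steinberger hinted at could be as simple as two counters checked on every pass. The `Budget` class and `run_with_budget` function below are hypothetical and do not reflect any announced OpenAI interface.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    """Caps on a single loop run: iterations and total tokens."""
    max_iterations: int
    max_tokens: int
    iterations: int = 0
    tokens: int = 0


def run_with_budget(step: Callable[[int], Tuple[bool, int]], budget: Budget) -> str:
    """step(i) returns (done, tokens_used). Stop on success or when either cap is reached."""
    while True:
        if budget.iterations >= budget.max_iterations:
            return "stopped: iteration cap"
        done, tokens_used = step(budget.iterations)
        budget.iterations += 1
        budget.tokens += tokens_used
        if done:
            return "done"
        if budget.tokens >= budget.max_tokens:
            return "stopped: token budget"


# A step that never succeeds and spends 1,500 tokens per call.
b = Budget(max_iterations=10, max_tokens=5_000)
print(run_with_budget(lambda i: (False, 1_500), b), b.iterations, b.tokens)
```

Because tokens are charged after a step completes, a single expensive step can overshoot the budget (here 6,000 tokens against a 5,000 cap); a stricter guard would estimate a step’s cost before running it.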

Addy Osmani focused on governance. He emphasized that loop engineering must be paired with guardrails: not just technical boundaries but organizational policies that define what agents may do inside a loop. For example, a loop designed to auto-scale cloud resources must require human approval if it tries to increase spend beyond a threshold. Google Cloud’s new Agent Safety Module, announced alongside his talk, integrates such policies directly into the loop’s planning phase, blocking unsafe actions before they execute.
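
A policy check in the planning phase amounts to a gate that every proposed action must pass before execution. The sketch below is hypothetical and does not use Google Cloud’s Agent Safety Module; the allow-list, the $500 monthly threshold, and the `approve` callback standing in for a human reviewer are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Collection


@dataclass
class ProposedAction:
    name: str
    monthly_cost_delta: float  # estimated change in spend, in USD


def gate(action: ProposedAction, approve: Callable[[ProposedAction], bool],
         allowed: Collection[str] = ("scale_up", "scale_down"),
         spend_threshold: float = 500.0) -> str:
    """Check a planned action against policy before it executes."""
    if action.name not in allowed:
        return "blocked: not an allowed action"
    if action.monthly_cost_delta > spend_threshold:
        # Above the threshold, a person must sign off before the loop may continue.
        return "approved by human" if approve(action) else "blocked: approval denied"
    return "allowed"


def deny_all(action: ProposedAction) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


print(gate(ProposedAction("scale_up", 120.0), deny_all))
print(gate(ProposedAction("scale_up", 2_400.0), deny_all))
print(gate(ProposedAction("delete_cluster", 0.0), deny_all))
```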

For the Windows community, the most exciting near-term opportunity is the convergence of loop engineering with Microsoft’s AI stack. Windows 11’s Copilot already uses a simple loop-like pattern for some tasks, but its underlying logic is opaque. Now that loop engineering has a name and a published pattern, third-party developers can build Copilot plug-ins with transparent, customizable loops. Moreover, the Semantic Kernel team is actively evaluating how to build “step-wise reasoning” into its planner, which would effectively implement loop engineering natively. A June 21 post in the Semantic Kernel GitHub Discussions by a Microsoft program manager acknowledged the community’s calls for more advanced agent patterns and promised a design proposal within weeks.

Critics caution that loop engineering could amplify AI risks if deployed recklessly. The more autonomy an agent has to loop and self-correct, the greater the potential for unintended consequences, especially in critical infrastructure. A June 22 whitepaper from the AI Safety Institute recommended that all loop-engineered agents undergo red-teaming that specifically targets iterative behavior, for example by testing whether the agent can be tricked into an infinite loop that escalates privileges. These discussions will likely shape regulatory frameworks over the next year.
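
One narrow red-team check of this kind, whether an adversarial environment can trap an agent into repeating a privilege request, can be sketched with a simple repeat detector. The harness, the `RepeatGuard` threshold, and the deliberately naive agent are hypothetical and are not taken from the whitepaper.

```python
from collections import Counter
from typing import Callable, Tuple


class LoopDetected(Exception):
    """Raised when an agent keeps repeating itself in the same situation."""


class RepeatGuard:
    """Counts (action, observation) pairs and trips after too many repeats."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    def check(self, action: str, observation: str) -> None:
        key: Tuple[str, str] = (action, observation)
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            raise LoopDetected(f"{action!r} repeated {self.seen[key]} times after {observation!r}")


def red_team_run(agent: Callable[[str], str], env: Callable[[str], str],
                 guard: RepeatGuard, max_steps: int = 50) -> str:
    observation = "start"
    for _ in range(max_steps):
        action = agent(observation)
        guard.check(action, observation)
        observation = env(action)
    return "ran to max_steps"


# Adversarial environment: every response baits the agent into asking for admin rights.
def naive_agent(observation: str) -> str:
    return "request_admin" if "denied" in observation else "read_file"


def hostile_env(action: str) -> str:
    return "access denied, retry with admin rights"


try:
    red_team_run(naive_agent, hostile_env, RepeatGuard(max_repeats=3))
    finding = "no loop detected"
except LoopDetected as exc:
    finding = str(exc)
print(finding)
```

A guard like this only catches exact repeats; an agent that varies its wording slightly would need a fuzzier notion of “the same state.”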

Nevertheless, the momentum is undeniable. Conference organizers are scrambling to add “Loop Engineering” tracks to autumn events. Pluralsight and Coursera have announced specializations in the topic, and Microsoft Learn is planning a module titled “Building Resilient AI Agents with Loop Patterns.” The term is also trending on X, where the #LoopEngineering hashtag generated more than 100,000 posts in its first week.

For organizations navigating this shift, the immediate step is education. Development leads should evaluate how loop patterns can augment existing automations, starting with low-risk internal processes. Tool vendors, from Microsoft to independent startups, will race to offer guardrail features that make loop agents observable and safe. The pioneers have lit the path; now the broader industry must walk it, carefully but with conviction. As Peter Steinberger noted, “We’re not inventing AI agents; we’re giving them a spine: a structured way to persist, to learn from mistakes, and to keep trying until they get it right.” That, more than any single model upgrade, may define the next wave of enterprise AI.