On measuring productivity gains through AI

Anil Turaga • June 8, 2026

What the AI productivity measurement problem means for companies that don't own their AI layer.

Cognition (the company behind Devin) recently announced the AI Productivity Guarantee [1]: a commitment to cover Devin's costs in credits if Cognition can't show that it saved you the equivalent in engineering hours. The proof comes from an agent that evaluates each completed session, using the transcript, the resulting PR and codebase context to estimate how many hours a human would have needed. That estimate is converted into dollars at an hourly bill rate, and the guarantee is backed for up to $10M.
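To make the arithmetic concrete, here is a minimal sketch of the hours-to-dollars comparison. The names (`Session`, `estimated_savings`, `credit_owed`) are hypothetical, and two things are assumptions rather than Cognition's published terms: that the credit equals the shortfall between tool cost and estimated savings, and that the $10M figure acts as a simple cap.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    estimated_human_hours: float  # output of the evaluator agent (hypothetical field)


def estimated_savings(sessions, hourly_bill_rate):
    """Convert estimated human hours into dollars at a flat bill rate."""
    total_hours = sum(s.estimated_human_hours for s in sessions)
    return total_hours * hourly_bill_rate


def credit_owed(sessions, hourly_bill_rate, tool_cost, cap=10_000_000):
    """Assumed rule: credit the shortfall, never negative, never above the cap."""
    shortfall = tool_cost - estimated_savings(sessions, hourly_bill_rate)
    return min(max(shortfall, 0.0), cap)


sessions = [Session("a", 2.0), Session("b", 3.0)]
print(estimated_savings(sessions, hourly_bill_rate=100))       # 5 hours at $100/hour
print(credit_owed(sessions, hourly_bill_rate=100, tool_cost=800))
```

The whole calculation rests on the per-session hour estimates, which is why the accuracy of the estimator matters so much.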

For any company measuring AI ROI, this is the first automated methodology validated in production [2]. Across 258 sessions from 126 enterprise users, Cognition's estimator reached an r²_log of 0.74. Notably, only 25 of those sessions served as a development set for the estimator; the remaining 233 were held out. Anthropic ran a comparable exercise on a thousand Jira tickets and got 0.46, working only from ticket titles and descriptions.
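For readers who want to compute a similar score on their own data, the sketch below shows one plausible reading of r²_log: the squared Pearson correlation between log-transformed predicted hours and log-transformed human estimates. The exact formula is not spelled out here, so treat this definition, and the function name `r2_log`, as assumptions.

```python
import math


def r2_log(predicted_hours, human_hours):
    """Squared Pearson correlation of log(predicted) vs. log(human).

    Assumes all values are positive; math.log raises ValueError otherwise.
    """
    x = [math.log(p) for p in predicted_hours]
    y = [math.log(h) for h in human_hours]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Illustrative numbers only, not data from any of the studies.
predicted = [0.5, 1.0, 4.0, 6.0, 20.0]
human = [1.0, 1.5, 3.0, 10.0, 16.0]
print(round(r2_log(predicted, human), 2))
```

Working in log space means a miss of 2 hours versus 1 counts the same as 20 versus 10, which suits task durations that span orders of magnitude.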

Estimator accuracy across measurement approaches

METR (34 sessions, technical staff) 0.83

Cognition (233 held-out sessions, enterprise) 0.74

Anthropic (1,000 Jira tickets, titles and descriptions only) 0.46

From Cognition's published comparison. Richer session data correlates with better accuracy.

Predicted vs. human estimates from Cognition's evaluator. The gray band shows sessions within 2× of the true estimate. Source: cognition.ai/blog/ai-productivity
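The 2× band is easy to reproduce as a single number. The sketch below counts the share of sessions whose predicted hours fall within a given factor of the reference estimate; `share_within_factor` is a hypothetical helper, not something taken from Cognition's post.

```python
def share_within_factor(predicted_hours, reference_hours, factor=2.0):
    """Fraction of predictions within [reference / factor, reference * factor]."""
    pairs = list(zip(predicted_hours, reference_hours))
    hits = sum(1 for p, r in pairs if r / factor <= p <= r * factor)
    return hits / len(pairs)


# Illustrative numbers only: the third prediction is 5x the reference.
print(share_within_factor([1.0, 3.0, 10.0], [2.0, 2.0, 2.0]))
```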

Replicating this is relatively easy for software engineering tasks. Claude Code, Codex and similar tools store full session transcripts locally, and those transcripts can be linked to PRs with little effort. Most other business functions don't have this setup.

Sales, finance and legal mostly run AI through platforms like Microsoft Copilot that don't have a straightforward way to export session transcripts and associated artifacts. Enterprise platforms like SAP have also been tightening API access for third-party AI integrations [3].

Without deep access to data there's no measurement, and without measurement there's no ROI case to make.

↓ No session data outside engineering

↓ No measurement

↓ No ROI case for further investment

↓ AI investment stays concentrated in engineering

↺ repeats

The exit is owning the AI layer. Kirkland & Ellis, the world's highest-grossing law firm, is going to spend $500M to build its own AI platform in partnership with Palantir and Scale AI [4]. Most organizations won't have that runway, but the direction is the same regardless of scale. A technology partner with deep expertise can help with the entire lifecycle of platform building and AIOps.

IT services companies face this more directly than others. As measurement methodologies like Cognition's spread, clients will start asking about the productivity gains of their developers using AI. For service firms, building this capability internally is also a competitive investment. You get to accelerate your own developers with internal platforms based on real data, and the experience of running it on your own teams feeds directly into how you build these platforms for clients.

[1] cognition.ai/blog/ai-guarantee

[2] cognition.ai/blog/ai-productivity

[3] theregister.com - AI clause in new SAP API policy

[4] ft.com - Kirkland & Ellis AI platform