Building the Context Layer Inside People Operations

Executive summary

Enterprise AI adoption is nearly universal and realized value is rare. The cause sits below the model. HR agents fail in production because the data is fragmented, the definitions are contested, and the rules live in people’s heads instead of in a form a machine can read.

The fix is a context layer: a governed set of machine-readable definitions, identities, and rules that every HR agent must resolve against before it answers or acts. People Operations is the right place to build it first, because it owns the richest tribal knowledge in the company and carries the highest cost when context is wrong.

This paper sets out what is uniquely hard in HR, the foundation in priority order, a 60 to 90 day starting plan, and the metrics that keep you honest. The throughline: only the function that runs the work can author the ground truth the agents need. That reframes People Operations from a consumer of AI into the owner of its foundation.

01 · The problem

The failure is below the model

Start with the gap that should worry every HR leader. Adoption is high. Value is rare. And most teams do not measure enough to know which side of that gap they are on.

The pattern repeats across every major research house in 2025 and 2026. Gartner’s HR survey found 88% of HR leaders say their organization has not realized significant business value from AI tools. SHRM’s State of AI in HR 2026 found that 56% of HR teams do not formally measure AI success at all, and only 49% have an AI policy. McKinsey’s 2025 work found about 88% of companies use AI somewhere, only about a third have scaled it, and only around 6% are real high performers. The MIT NANDA study supplied the headline that 95% of enterprise generative AI efforts show no measurable return, with the root cause named as the learning gap: tools that never connect to how the work actually runs.

Read together, these say one thing. The bottleneck is not model quality. It is the foundation underneath the model.

The governance gap, in one statistic

Roughly 74% of organizations plan to deploy agentic AI within two years. Only about 21% have a mature governance model for it. Agents are scaling faster than the guardrails. In HR, where the data is the most sensitive in the company, that gap is not a risk to tolerate. It is the risk.

02 · What is uniquely hard in HR

The definitions everyone assumes and no one wrote down

Horizontal AI advice skips the part that actually breaks HR agents. A general data strategy never has to answer what a case is, or who counts as a worker. In People Operations, these are the whole game, and each one is contested across systems.

What is a case, and what is resolved

In modern case management the case itself, not the workflow, is the unit that matters, and it carries many interdependent workflows. The same case type is named one way in the employee-facing catalog, another way in the back-office topic taxonomy, and a third way in the HR service record. An agent that cannot resolve those three names to one definition will route, count, and close cases inconsistently. And resolved is not closed: a case can be closed in the system while the employee’s underlying issue remains unresolved. If the machine cannot tell the difference, every metric built on it is fiction.
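A minimal sketch of what one definition could look like as data. The system names, labels, and the resolution rule (closed plus employee confirmation) are illustrative assumptions, not taken from any real product or from the sources cited here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical alias table: (source system, local label) -> canonical case type.
# Every system name and label below is an illustrative assumption.
CASE_TYPE_ALIASES = {
    ("catalog", "Update my benefits"): "benefits_change",
    ("topic", "Benefits > Enrollment change"): "benefits_change",
    ("service_record", "BEN-CHG"): "benefits_change",
}


def canonical_case_type(system: str, label: str) -> Optional[str]:
    """Return the canonical case type, or None if the label is unmapped."""
    return CASE_TYPE_ALIASES.get((system, label))


@dataclass
class Case:
    case_type: str
    status: str                # workflow status, e.g. "open" or "closed"
    employee_confirmed: bool   # did the employee confirm the issue is fixed?


def is_resolved(case: Case) -> bool:
    """Assumed rule: resolved means closed AND confirmed by the employee."""
    return case.status == "closed" and case.employee_confirmed
```

An unmapped label returns None rather than a guess, so the gap in the taxonomy shows up as something to fix instead of a silently misrouted case.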

Who counts as a worker

Identity is the quiet killer. The same person carries different identifiers across the HRIS, the case system, and the directory. Worse, contingent workers are often kept out of the HRIS on purpose because they live in a separate vendor management system. An HR agent wired only to the HRIS is blind to a large slice of the workforce. The fix is a single canonical worker identity with merge rules at ingest and scheduled de-duplication, and access tied to contract dates.
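The identity fix can be sketched as a small registry. This is an illustration under stated assumptions: the merge key (a normalized email address), the ID format, and the class and function names are all hypothetical, and a real merge rule would likely use more than one attribute.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class Worker:
    canonical_id: str
    email: str
    worker_type: str                      # "employee" or "contingent"
    contract_end: Optional[date] = None   # expected for contingent workers
    source_ids: Dict[str, str] = field(default_factory=dict)  # system -> local ID


class WorkerRegistry:
    """Hypothetical registry. Merging on lowercased email is an assumption."""

    def __init__(self) -> None:
        self._by_email: Dict[str, Worker] = {}

    def ingest(self, system: str, local_id: str, email: str,
               worker_type: str = "employee",
               contract_end: Optional[date] = None) -> Worker:
        key = email.strip().lower()
        worker = self._by_email.get(key)
        if worker is None:
            # First sighting of this person: mint a canonical ID.
            worker = Worker(f"W{len(self._by_email) + 1:06d}", key,
                            worker_type, contract_end)
            self._by_email[key] = worker
        # Merge rule: attach this system's local ID to the one canonical record.
        worker.source_ids[system] = local_id
        if contract_end is not None:
            worker.contract_end = contract_end  # keep the latest known end date
        return worker


def has_access(worker: Worker, today: date) -> bool:
    """Contingent access ends with the contract. Employees are not date-gated here."""
    if worker.worker_type != "contingent":
        return True
    return worker.contract_end is not None and today <= worker.contract_end
```

The sketch covers merge at ingest and contract-dated access only. Scheduled de-duplication would be a separate batch job over the same records.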

Service levels, severity, and sensitivity

HR service levels split response time from resolution time and tier both by severity. Those tiers have to be data the agent reads, not a convention in someone’s memory. And HR holds the most sensitive data in the company: identifiers, salary, health, performance, disciplinary, and investigation files. Data sensitivity classes and legal-hold rules are hard constraints on what any agent may retrieve or surface.
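As a sketch, tiers and sensitivity classes can be plain data the agent reads before it retrieves anything. The tier names, time targets, class names, and the rule that a legal hold blocks retrieval outright are assumptions made for illustration. Real values belong to your own service catalog and legal team.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum
from typing import Dict


@dataclass(frozen=True)
class SlaTier:
    response: timedelta     # time to first response
    resolution: timedelta   # time to resolution


# Illustrative targets only, keyed by severity.
SLA_TIERS: Dict[str, SlaTier] = {
    "high": SlaTier(timedelta(hours=4), timedelta(days=2)),
    "medium": SlaTier(timedelta(hours=8), timedelta(days=5)),
    "low": SlaTier(timedelta(days=1), timedelta(days=10)),
}


class Sensitivity(IntEnum):
    GENERAL = 1        # e.g. published policy text
    CONFIDENTIAL = 2   # e.g. salary
    RESTRICTED = 3     # e.g. health, disciplinary, investigation files


def may_retrieve(record_class: Sensitivity, agent_clearance: Sensitivity,
                 on_legal_hold: bool) -> bool:
    """Assumed hard constraint: a legal hold blocks retrieval at any clearance."""
    if on_legal_hold:
        return False
    return record_class <= agent_clearance
```

Keeping response and resolution as separate fields mirrors the split described above, so a breach of one cannot hide behind the other.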

Why this is the high-leverage move

An HR agent works in the demo and dies in production because case, resolved, and worker are contested, system-specific definitions. The leverage is not a better model. It is a machine-readable definition layer that every agent resolves against. Only the function that runs case management can author it credibly.

03 · The foundation

What to build, in priority order

The context layer is not one product. It is a stack, and the order matters. Build it bottom up.

Layer	What it is	Why it comes first
1. Canonical worker identity	One worker ID, merge rules at ingest, scheduled de-dup, contingent workers included from the VMS.	Every other layer is wrong if the agent cannot tell who it is talking about.
2. Machine-readable definitions	Case taxonomy, the resolution rule, SLA tiers by severity, data-sensitivity classes, encoded as data.	This is the ground truth agents resolve against. Wiki pages do not count.
3. Governed context layer	Definitions, access policy, and lineage enforced in the query path, respecting existing permissions.	Governance has to fire before an answer is generated, not after.
4. Sensitive-data guardrails	Sensitivity classes, legal hold, purpose limits, so a self-replanning agent cannot drift into special-category data.	In HR a wrong norm is a privacy incident, not a bad chart.
5. Compliance scaffolding	Worker notice, human in the loop, log retention, explanation capability.	The EU AI Act puts HR AI in its high-risk class. Build for it now.
6. Measurement	Accuracy, autonomous resolution, escalation, drift, against a human baseline.	56% of HR teams measure nothing. Pick this fight on day one.

The compliance work and the capability work are the same work

HR and employment AI is high-risk under the EU AI Act, with duties to inform workers, keep a human in the loop, retain logs for at least six months, and explain decisions. A proposed delay to late 2027 is not yet law, so do not defer. The audit trail those rules demand is the same lineage and logging the context layer needs to function. Build it once.

04 · The plan

A 60 to 90 day start

This plan is a composite drawn from patterns that work, not a vendor framework. The goal is one defensible win, not a platform.

Days 0–30

Inventory and define

Map every system that holds worker data: the HRIS, the case system, the VMS. Author version one of the machine-readable definitions: what a case is, what resolved means, the SLA tier matrix, the data-sensitivity classes, and the canonical worker identity. Pick one high-volume, low-sensitivity case type, such as a benefits or policy lookup, as the beachhead.

Days 30–60

Ground and guardrail

Stand up a context layer over the beachhead that enforces definitions and permissions in the query path. Wire in canonical identity. Add runtime guardrails: purpose limitation, a human in the loop on any decision that affects employment, and logging on from day one. Instrument three numbers, not one: deflection, autonomous resolution, and accuracy.
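The guardrails in this phase can be sketched as one function that sits in front of the model. Everything named here is hypothetical: the purpose table, the `Request` fields, and the `generate` callable standing in for whatever produces the answer. The point is the ordering, with policy checks first, generation second, and a log entry either way.

```python
import json
import logging
from dataclasses import dataclass
from typing import Callable, FrozenSet

audit_log = logging.getLogger("hr_agent.audit")


@dataclass(frozen=True)
class Request:
    worker_id: str
    case_type: str
    purpose: str
    affects_employment: bool


# Hypothetical purpose-limitation table: case type -> purposes allowed for it.
ALLOWED_PURPOSES = {
    "benefits_lookup": frozenset({"answer_employee_question"}),
}


def guarded_answer(req: Request, generate: Callable[[Request], str]) -> dict:
    """Run policy checks before generation, and log every decision."""
    allowed: FrozenSet[str] = ALLOWED_PURPOSES.get(req.case_type, frozenset())
    if req.purpose not in allowed:
        outcome = {"status": "denied", "reason": "purpose_not_allowed"}
    elif req.affects_employment:
        # Human in the loop: nothing is generated, a person decides.
        outcome = {"status": "escalated", "reason": "human_review_required"}
    else:
        outcome = {"status": "answered", "answer": generate(req)}
    # One audit record per request, whatever the outcome.
    audit_log.info(json.dumps({"worker_id": req.worker_id,
                               "case_type": req.case_type,
                               "purpose": req.purpose,
                               "status": outcome["status"]}))
    return outcome
```

Because `generate` is only called on the final branch, a denied or escalated request never reaches the model at all. Retention of the audit records is left to whatever log store sits behind the logger.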

Days 60–90

Pilot, measure, govern

Run the beachhead agent on real cases against a human baseline. Publish the number you can defend. Execute the compliance hygiene: worker notice, an oversight roster, log retention. Then decide to scale or kill on measured value. This is how you avoid the pilot purgatory that swallows most programs.

05 · Measurement

Measure the things that can embarrass you

The old software metrics do not capture autonomous decisions. The agent metric stack that does:

Autonomous resolution rate. The share of cases the agent closes end to end with no human. This is the honest number.
Deflection. Cases that never became a ticket. Useful, but softer and larger than resolution. Never report deflection as if it were resolution. In published vendor case studies, deflection runs roughly 1.6 times the autonomous-resolution figure.
Accuracy and hallucination rate. How often the answer is correct. Leaders target hallucination below 1%.
Escalation and override rate. Your proxy for where the agent hits the edge of its knowledge or its context.
Drift. Behavior shifts as models update and data changes. Watch it, or stale context will compound errors quietly across every downstream agent.

If you report only deflection, you are reporting the flattering number. Report autonomous resolution and accuracy against a human baseline, or you are not measuring at all.
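A minimal sketch of how the first four numbers could be computed from the same interaction log, so they are always reported side by side. The record fields and the choice of denominator (all interactions) are assumptions. Drift is not shown, since it needs the same metrics tracked over time.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interaction:
    became_ticket: bool       # False means the request was deflected
    closed_by_agent: bool     # closed end to end with no human
    escalated: bool           # handed to, or overridden by, a human
    correct: Optional[bool]   # reviewer verdict; None if not reviewed


def agent_metrics(interactions: Iterable[Interaction]) -> dict:
    rows = list(interactions)
    if not rows:
        raise ValueError("no interactions to measure")
    n = len(rows)
    reviewed = [r for r in rows if r.correct is not None]
    return {
        "deflection_rate": sum(not r.became_ticket for r in rows) / n,
        "autonomous_resolution_rate": sum(r.closed_by_agent for r in rows) / n,
        "escalation_rate": sum(r.escalated for r in rows) / n,
        # Accuracy is only defined over interactions a human actually reviewed.
        "accuracy": (sum(r.correct for r in reviewed) / len(reviewed)
                     if reviewed else None),
    }
```

An interaction can be deflected without being resolved, for example when the employee gives up. That is why the two rates are computed separately and why deflection tends to come out larger.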

06 · The authority claim

Why People Operations owns this

Enterprise systems are very good at recording outcomes: the final status, the closed case. They are poor at recording the reasoning that produced them. That reasoning still lives in chat threads, side conversations, and people’s heads, and it has rarely been treated as data. That reasoning is the context layer waiting to be built.

In People Operations, the reasoning is the playbook in your best caseworker’s head. It is the most valuable and least governed asset in the company. The function that runs case management is the only one that can turn it into ground truth, because it is the only one that knows what is true. That is the reframe. People Operations is not a buyer of someone else’s AI. It is the owner of the foundation that decides whether any of the company’s HR AI works at all.

Takeaways

What to remember

The failure is below the model. 88% see no significant value, 56% measure nothing, 95% show no measurable return. The cause is fragmented data and missing definitions, not weak models.
Case, resolved, and worker are contested definitions. Settle them as machine-readable data, or every agent and every metric inherits the confusion.
Identity is the quiet killer. One canonical worker ID, including contingent staff, or the agent is blind to part of the workforce.
Govern in the query path. Permissions and policy have to fire before an answer is generated, not after it ships.
Compliance and capability are one build. The EU AI Act audit trail is the same logging the context layer needs. Do not defer it.
Measure the honest number. Autonomous resolution and accuracy against a human baseline, not deflection alone.
People Operations owns the ground truth. Only the function that runs the work can author the context. That is the authority, and the opportunity.

Sources

Where this comes from

Drawn from 2025 and 2026 research across independent houses, primary regulation, and named case studies. Vendor metrics are flagged and should be read as ceilings, not typical results.

Gartner, HR Leaders survey, Oct 2025: 88% no significant value from AI. [verified via HR Executive]
SHRM, State of AI in HR 2026: 39% adoption, 56% measure nothing, 49% have a policy. [primary]
McKinsey QuantumBlack, State of AI 2025: ~88% adopt, ~6% high performers, ~39% any EBIT impact. [corroborated]
MIT NANDA, The GenAI Divide 2025: 95% no measured return, cause is the learning gap. [reframe: no 6-month ROI, not 95% crashed]
Deloitte / TechTarget: 74% deploying agentic AI in 2 years, ~21% have mature governance. [verified]
Mercer, Global Talent Trends 2026: C-suite confidence down from 65% (2024) to 51% (2026). [primary]
HR Acuity, ServiceNow HRSD community: the case as unit; case-type naming across catalog, topic, and service layers. [practitioner]
Rizing / SuccessFactors, MiHCM: contingent-worker exclusion from HRIS; canonical ID, merge rules, scheduled de-dup. [practitioner]
IAPP: GDPR purpose limitation and agentic self-replanning; employer liability and operationalizing principles. [verified]
EU AI Act, Annex III pt.4 and Articles 26 and 86: HR AI as high-risk; worker notice, human oversight, log retention, explanation. [primary]
Gibson Dunn, Covington: Digital Omnibus proposes delay to Dec 2027, not yet law; do not defer compliance. [verified]
ServiceNow Knowledge Graph; Workday Agent System of Record and Case Agent; Microsoft Employee Self-Service Agent: governed-grounding architectures. [vendor]
Moveworks case studies (Johnson Controls, Databricks, Equinix): deflection vs autonomous-resolution gap. [vendor]
DataRobot, Fin.ai: agent metric stack: resolution, accuracy, escalation, drift. [vendor]
Diginomica (Phil Wainewright, Foundation Capital): systems record outcomes, not reasoning. [verified]
