When policymakers and researchers want to know whether AI is displacing workers in specific roles, they need more than headline unemployment numbers — they need a way to link job losses to the nature of the work itself. That's where occupational AI exposure scoring comes in.
Why this matters now
Labor market analysts have long tracked what workers do when they lose jobs. The newer question is why they lose them, and specifically whether the tasks in a given role are the kind that AI systems can plausibly substitute for. Without an occupational exposure framework, an AI-driven spike in unemployment among knowledge workers looks identical to one caused by an interest rate shock. With it, you can start to separate the signals.
How it works
Occupational AI exposure is a structured estimate of how susceptible the tasks within a job category are to automation or augmentation by AI systems. The core mechanism works in three stages: task decomposition, capability matching, and scoring aggregation.
@title Occupational AI exposure scoring pipeline
Job category
│
▼
Task decomposition
(break role into discrete subtasks)
│
▼
Capability matching
(map subtasks to known AI capabilities)
│
▼
Exposure score
(aggregate substitutability estimate)
@caption Roles are scored by decomposing tasks and matching each to current AI capability coverage.
Researchers start by breaking each occupational category into its constituent tasks — drafting text, classifying images, routing decisions, physical manipulation, and so on. Each task is then assessed against what current AI systems can actually do reliably. A task like "summarize documents" maps cleanly to large language model capabilities; "negotiate in person" maps much less cleanly. The scores are aggregated, often weighted by how much time workers spend on each task, to produce a single exposure index per occupation.
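The time-weighted aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not a published methodology: the `Task` class, the `exposure_score` function, the occupation, and every number in the example are hypothetical, and real studies differ in how they rate capability match.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    time_share: float        # fraction of working time spent on the task (hypothetical)
    capability_match: float  # 0.0 = no AI coverage, 1.0 = fully covered (hypothetical)


def exposure_score(tasks):
    """Time-weighted average of per-task capability match."""
    total_time = sum(t.time_share for t in tasks)
    if total_time <= 0:
        raise ValueError("time shares must sum to a positive number")
    weighted = sum(t.time_share * t.capability_match for t in tasks)
    # Dividing by total_time keeps the score in [0, 1] even if shares do not sum to 1.
    return weighted / total_time


# Invented task breakdown for an illustrative occupation.
paralegal = [
    Task("summarize documents", 0.5, 0.9),
    Task("draft routine filings", 0.3, 0.7),
    Task("negotiate in person", 0.2, 0.1),
]

score = exposure_score(paralegal)
print(round(score, 2))  # 0.68
```

Half of this invented role's time goes to a task that maps cleanly to language models, so the index lands high even though in-person negotiation scores close to zero.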
The exposure score is not a prediction that workers will be displaced. It is a measure of task-level substitutability — how much of the work could be handled by AI given current capabilities. That distinction matters. High exposure means the role is structurally vulnerable if adoption accelerates; it doesn't mean displacement has already occurred.
When exposure scores are joined to administrative data — like unemployment insurance claims — analysts can ask a much sharper question: are workers in high-exposure roles filing claims at higher rates than their low-exposure counterparts? That comparison is what separates an AI-specific labor signal from general economic noise.
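The comparison above amounts to a join followed by a grouped rate. The sketch below assumes a hypothetical exposure table, hypothetical claims records, and an arbitrary 0.5 cutoff between "high" and "low" exposure; none of these values come from real data, and a real analysis would also need to control for industry, region, and the business cycle before reading anything into the gap.

```python
# Hypothetical exposure index per occupation (0 to 1).
exposure_by_occupation = {
    "paralegal": 0.68,
    "copywriter": 0.80,
    "electrician": 0.15,
    "nurse": 0.25,
}

# Hypothetical administrative records: (occupation, covered workers, UI claims).
claims_records = [
    ("paralegal", 1000, 40),
    ("copywriter", 500, 35),
    ("electrician", 2000, 40),
    ("nurse", 1500, 30),
]


def claim_rates_by_exposure(exposure, records, threshold=0.5):
    """Join claims to exposure scores and return the claim rate per exposure group."""
    totals = {"high": [0, 0], "low": [0, 0]}  # group -> [workers, claims]
    for occupation, workers, n_claims in records:
        if occupation not in exposure:
            continue  # skip occupations with no exposure score (inner join)
        group = "high" if exposure[occupation] >= threshold else "low"
        totals[group][0] += workers
        totals[group][1] += n_claims
    return {
        group: (n_claims / workers if workers else None)
        for group, (workers, n_claims) in totals.items()
    }


rates = claim_rates_by_exposure(exposure_by_occupation, claims_records)
print(rates)
```

With these invented numbers the high-exposure group files claims at 5% against 2% for the low-exposure group. The gap only counts as an AI-specific signal if it persists once general economic conditions are accounted for.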
Real-world applications
This framework draws on several adjacent technical domains that professionals are increasingly expected to understand.
Retrieval-augmented generation (RAG) systems rely on text embeddings and vector databases to retrieve relevant content at inference time. Understanding which occupational tasks those systems can perform well — information lookup, summarization, classification — directly informs exposure scoring for knowledge-work roles. If you're building or evaluating a RAG pipeline, you're implicitly touching the capability side of the exposure equation.
Vector databases store the dense numerical representations that make semantic search possible. Exposure researchers use similar embedding-based techniques to compare job description language against descriptions of AI-automatable tasks — the same retrieval logic that powers enterprise search and recommendation systems.
Text embeddings are at the core of both. When a researcher encodes thousands of occupational task descriptions and clusters them by semantic similarity to known AI capabilities, that's an embedding workflow. The same skill transfers directly to building document retrieval systems, semantic classifiers, and agent memory architectures.
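The matching step in that workflow reduces to a nearest-neighbor search over vectors. The sketch below uses hand-written three-dimensional vectors as stand-ins for real embeddings, which would come from an embedding model and have hundreds of dimensions. The capability names, task names, and vector values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(task_vector, capability_vectors):
    """Return (capability name, similarity) for the closest capability."""
    return max(
        ((name, cosine_similarity(task_vector, vec))
         for name, vec in capability_vectors.items()),
        key=lambda pair: pair[1],
    )


# Toy stand-ins for embeddings of known AI capabilities (hypothetical values).
capability_vectors = {
    "text summarization": [0.9, 0.1, 0.0],
    "image classification": [0.1, 0.9, 0.0],
}

# Toy stand-ins for embeddings of occupational task descriptions (hypothetical values).
task_vectors = {
    "summarize documents": [0.8, 0.2, 0.1],
    "negotiate in person": [0.1, 0.1, 0.9],
}

for task, vector in task_vectors.items():
    name, similarity = best_match(vector, capability_vectors)
    print(f"{task}: closest capability = {name}, similarity = {similarity:.2f}")
```

In this toy setup "summarize documents" sits close to the summarization capability, while "negotiate in person" is far from every capability listed. A vector database performs the same lookup at scale, using approximate search instead of the exhaustive loop shown here.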
For product managers and engineers, the practical takeaway is that the measurement infrastructure for AI's labor market impact and the infrastructure for building AI products share a common technical foundation.
Where to go deeper
If occupational AI exposure interests you, the natural next step is understanding the underlying machinery. Start with text embeddings to see how semantic similarity is quantified, then move to vector databases to understand how those representations are stored and queried at scale. From there, retrieval-augmented generation shows you how retrieval and generation combine in production systems — which also happens to be one of the clearest illustrations of what AI can and cannot yet reliably substitute in knowledge work.