There’s no shortage of AI commentary in security. Depending on your tolerance, it’s either exciting or exhausting (Personally, I’m working on how I can move all of my meetings during RSAC from the Moscone Center to Sully’s Marina Lounge to dodge the inevitable onslaught…don’t tell Matt).

The recent Microsoft State of the SOC report is at least grounded in data. It shows that 76% of leaders believe routine IoC lookups should be automated and more than 90% expect AI to reduce manual effort across SOC workflows. The report also reveals 64% want an environment to build their own SOC-focused AI agents rather than rely on pre-built, out-of-the-box tools.

That tells us something important. Security leaders expect automation to work inside their environment, and they expect control over it once it’s deployed.

The part that doesn’t get enough airtime in these discussions is whether the data layer most SOCs operate on is structured well enough for that expectation to be realistic.


AI Runs on Whatever Architecture You Give It

There’s a common assumption running through AI conversations in security: that you can layer intelligence on top of existing systems and the outcomes will materially improve.

In practice, AI behaves more like a multiplier. It amplifies the quality, structure, and accessibility of the data it operates on. If your data is coherent and well-modeled, AI accelerates useful reasoning. If your data is fragmented and inconsistently structured, AI output is more like a shot of Fernet at the Marina Lounge.

This dynamic shows up well beyond security. MIT Sloan and others studying enterprise AI adoption have been saying some version of the same thing for years: the teams that get real results invested in their data layer first. Clean inputs. Consistent schemas. Reliable access. The teams that struggle usually don’t struggle because the model was wrong or the prompt wasn’t clever enough. They struggle because the data underneath it is incomplete, inconsistent, or dependent on brittle integrations that no one wants to touch once they’re in production.

In the SOC, the same dynamic applies. An AI agent tasked with triage, enrichment, or correlation is only as effective as the telemetry it can access and the consistency with which that telemetry is structured.


The Data Reality Most SOCs Are Actually Operating In

The same report that highlights the AI mandate also makes clear how most environments are actually built.

On average, less than half of an organization’s security data resides in its SIEM. For the last fifteen years we have treated the SIEM as the gravitational center of the SOC. In practice, it captures a meaningful portion of telemetry, but nowhere near all the data required for most SOC workflows.

The rest lives across endpoint platforms, identity providers, SaaS applications, cloud control planes, and other operational systems of record that were never architected with “take me to your SIEM” as a design goal.

On top of that, a large percentage of SOCs still ingest data manually several times per week because it is not automatically integrated. Analysts are spending a meaningful portion of their time correlating and aggregating data across tools. Go visit your SOC and watch a senior analyst pivot between five consoles just to answer a basic question about a user session. You’ll see this is not an edge case.

None of this is new. Security data is distributed by design. Identity systems, cloud infrastructure, SaaS platforms, network and endpoint tools were built to run the business, not to conform to a unified security schema. Centralization captures some of that data, but rarely all of it, and almost never without cost, latency, and a healthy amount of engineering therapy.

When AI agents are introduced into that environment, they operate within those constraints. If a significant portion of the context required to understand an alert lives outside the system where the AI runs, the agent’s reasoning is necessarily bounded. If core entities like users, devices, and workloads are represented differently across systems, correlation becomes more fragile.

That isn’t a failure of AI. It’s a reflection of the underlying architecture.


Normalization Is Where This Actually Gets Decided

AI conversations tend to focus on models, workflows, and automation logic. Much less attention is paid to the data model beneath those workflows.

Normalization in security operations means establishing consistent representations of core entities across systems: users, devices, IP addresses, workloads, applications. It means that authentication events from an identity provider, process telemetry from an endpoint platform, and API activity from a cloud service can be correlated without custom translation logic for every use case.

Could an AI system theoretically reconcile all of that on the fly? In some cases, yes. A sufficiently capable model can infer that “user_id,” “principal,” and “account_name” probably refer to the same concept. But doing that dynamically increases complexity, increases the chance of subtle errors, and pushes more responsibility into probabilistic reasoning instead of deterministic structure. It also increases processing overhead and cost in ways that don’t always show up in a demo (Extra Usage: More tokens please 😁).
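The deterministic alternative is unglamorous: declare the alias mapping once and resolve it before any model sees the data. Here is a minimal sketch of that idea; the alias names and canonical `user` key are illustrative, since real source schemas vary by vendor.

```python
# Hypothetical field aliases for the "user" entity; real schemas vary by vendor.
USER_ALIASES = ("user_id", "principal", "account_name", "upn")

def normalize_user(event: dict) -> dict:
    """Map whichever user field a source emits onto one canonical key."""
    out = dict(event)
    for alias in USER_ALIASES:
        if alias in out:
            # Canonicalize casing/whitespace so joins don't silently miss.
            out["user"] = str(out.pop(alias)).strip().lower()
            break
    return out

# Events from two different tools collapse onto the same entity key:
idp_event = normalize_user({"principal": "JDoe@example.com", "action": "login"})
edr_event = normalize_user({"user_id": "jdoe@example.com", "process": "powershell.exe"})
assert idp_event["user"] == edr_event["user"]
```

The point isn't the ten lines of code; it's that this resolution happens once, in a place you can test and audit, instead of being re-inferred probabilistically on every query.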

AI agents reason over entities and relationships. If those entities are expressed differently across tools, ambiguity is introduced into every automated decision path. The model may still produce output, but the reliability of that output now depends on how much implicit normalization and guesswork is happening behind the scenes.

A well-modeled, normalized data layer reduces that ambiguity before the AI ever touches the workflow. It allows automation logic to operate on predictable structures and enables cross-domain joins without layering inference on top of inference. That doesn’t eliminate risk, but it moves more of the system back into deterministic ground.
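Once entities share a canonical key, a cross-domain join stops being an inference problem and becomes a plain lookup. A sketch, assuming events have already been normalized onto a shared `user` field as above:

```python
from collections import defaultdict

def join_by_entity(*streams, key="user"):
    """Group events from multiple normalized sources by one canonical entity key."""
    joined = defaultdict(list)
    for stream in streams:
        for event in stream:
            joined[event[key]].append(event)
    return dict(joined)

# Illustrative, pre-normalized events from two domains:
idp = [{"user": "jdoe", "source": "idp", "action": "mfa_denied"}]
edr = [{"user": "jdoe", "source": "edr", "process": "rundll32.exe"}]

timeline = join_by_entity(idp, edr)
# All cross-domain context for the entity is now a single deterministic lookup:
assert len(timeline["jdoe"]) == 2
```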

It also makes the system easier to audit and explain. When an AI-assisted workflow flags a user or recommends containment, you want to be able to trace that decision back through structured, consistent data, not a chain of inferred mappings and best guesses. In regulated environments especially, explainability is not optional.

This work is not particularly glamorous, but it does have a disproportionate impact on whether AI-assisted investigations remain reliable and defensible when the pressure is on.


Why Composable Agent Workflows Demand Better Data

The preference for an environment where teams can build their own agent workflows reflects the heterogeneity of modern security environments. Workflows are specific, risk tolerance varies, and regulatory requirements differ. A one-size-fits-all automation package rarely captures that nuance.

Custom AI agents require predictable inputs. They require confidence that when a workflow references a user or device, it is operating on a unified representation of that entity across systems. Without that foundation, the automation story becomes more complicated than advertised. Instead of reducing engineering overhead, it can shift it.


AI Will Separate Data-Driven SOCs From Everyone Else

Over the next several years, most enterprise SOCs will deploy AI-assisted triage, enrichment, and investigation capabilities. The difference in outcomes will not be determined solely by which model is adopted. It will be driven by differences in data architecture maturity.

Two organizations can deploy similar AI capabilities. The one with broad, coherent access to identity, endpoint, cloud, and contextual data will see faster investigations and higher confidence decisions. The one operating in a highly fragmented environment may reduce some repetitive tasks but struggle to materially compress investigation timelines.

AI doesn’t erase architectural debt. It just makes it run faster.

AI raises the ceiling only as high as the foundation allows.


So We Built a Security Data Mesh

We tackled this challenge by building a Security Data Mesh. Security data is inherently distributed, and treating centralization as a prerequisite for usability introduces cost, latency, and ongoing maintenance burdens that compound over time.

A Security Data Mesh connects to distributed systems where data already resides and provides a normalized, coherent layer across those sources. It enables federated access and cross-source correlation through a consistent data model rather than requiring bulk ingestion into a single system.

This allows teams to maintain architectural flexibility while giving analysts and AI agents access to structured, cross-domain context. It also reduces reliance on brittle ETL pipelines that must be updated every time a schema changes.
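Conceptually, the pattern looks something like the sketch below: each connector queries its system in place and translates results into a shared model, so correlation happens over one schema without bulk ingestion. The `Connector` class, field maps, and canonical field names here are entirely illustrative, not our actual API.

```python
# Illustrative sketch of federated access: connectors query sources in place
# and map results onto one canonical schema. Names here are hypothetical.

class Connector:
    def __init__(self, name, fetch, field_map):
        self.name = name
        self.fetch = fetch          # callable that queries the source system
        self.field_map = field_map  # source field -> canonical field

    def query(self, **filters):
        for raw in self.fetch(**filters):
            row = {self.field_map.get(k, k): v for k, v in raw.items()}
            row["source"] = self.name
            yield row

def federated_query(connectors, **filters):
    """Ask every source the same question; results arrive in one shared schema."""
    for c in connectors:
        yield from c.query(**filters)

# A fake identity source standing in for a real system of record:
idp = Connector(
    "idp",
    fetch=lambda **f: [{"principal": "jdoe", "ts": 1700000000}],
    field_map={"principal": "user", "ts": "timestamp"},
)
rows = list(federated_query([idp]))
assert rows[0]["user"] == "jdoe" and rows[0]["source"] == "idp"
```

The design choice worth noticing: the schema translation lives in each connector, so when a source changes its format, you update one field map rather than an ETL pipeline and every downstream workflow that depends on it.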

The outcome is data that’s accessible and modeled to support the next generation of automation.

Security leaders are right to expect AI to reduce manual effort. The results will depend on the foundation they’ve built. The organizations that see the greatest impact will be the ones that treat AI as an acceleration layer built on top of deliberate, well-structured data architecture.

If you’re exploring how to get more out of your existing data, reach out and let’s connect. Our SecDataOps experts are standing by. Even a short working session can surface where fragmentation is quietly taxing your SOC.