The Definitive Guide to What Cybersecurity Mesh Architecture (CSMA) Should Look Like
This is Part 1 of a 3-part blog series about Data Mesh intended to educate and serve as a decision-making tool for security leaders rethinking their data strategy. You can read more in Part 2 and Part 3.
Introduction
Our compatriots on the data & analytics side of the house often get hit with the same dreaded label we do in security, especially as a centralized organization: “You’re a bottleneck!” In certain cases, that’s not too far from the truth: when a single function serves many different needs, bottlenecks happen. Data teams need to proffer data in various ways – tables, materialized views, dashboards, Excel spreadsheets, and more – to answer the business- and mission-critical questions distributed teams have. Security teams face the same issue, especially in “shift-left” subdisciplines such as cloud security and application security (AppSec), and even in more banal subdisciplines such as internal audit or security governance.
To head off the problems that stack up and lead to the dreaded bottleneck label, in 2019 Zhamak Dehghani laid out four principles of a Data Mesh as an architectural pattern for centralized data teams to use. That said, Data Mesh wasn’t literally proposed because of any label; it’s better read as a metacritique of the state of data operations at the time – and those truths still hold today, for them and for us in security.
We’ll cover the Data Mesh principles in depth later, but the history matters because the security industry also has “its own” Data Mesh! Cybersecurity Mesh Architecture (CSMA) is a concept coined by Gartner and echoed by companies such as Fortinet, Okta, and many others across various glossaries and blog posts. When you examine what CSMA is versus a proper Data Mesh, you may raise your eyebrows.
To muddy the waters even more, here at Query, we call our platform a (Federated) Security Data Mesh that aligns more closely with the original definition of Data Mesh. You’ll also encounter the term Security Data Fabric in the market – and that’s actually what Query is at the infrastructure level. Here’s the distinction that matters:
- Data Fabric is a technology pattern: infrastructure for federated access and integration across distributed data sources without moving the data
- Data Mesh is an organizational pattern: domain ownership, data products, and federated governance
They’re complementary. Query provides the data fabric infrastructure (federated query virtualization, OCSF abstraction, no data movement) that makes Data Mesh consumption patterns possible. But the data fabric alone doesn’t give you a Data Mesh – you still need the organizational transformation around domain ownership and accountability. More on this throughout the document.
Gartner’s Cybersecurity Mesh Architecture (CSMA) also borrows the ‘mesh’ terminology – but as we’ll see, it prescribes centralizing data and control planes rather than truly decentralizing them. Even with sophisticated federated infrastructure like Query, implementing a true Security Data Mesh requires organizational transformation around domain ownership. CSMA doesn’t even provide the infrastructure – it just centralizes everything with extra steps and calls it ‘mesh architecture’. I’m here to tell you that that is architecturally backwards.
The purpose of this document is first and foremost to be an educational piece, and secondarily a decision-making tool for security leaders who are rethinking their data strategy. Yes, I will shill for my own company (they do pay my bills, after all) and talk about what we do and how it fits in the Mesh (albeit not 100%). Additionally, Data Mesh as a concept (no matter how favorably I talk about it) is very hard to implement fully; it is not a “silver bullet” nor is it a “religion”. It is simply a data architecture model that we in security can borrow from our compatriots in data & analytics, because we have the same compartmentalized and fractured issues they do in terms of access to data.
All that notwithstanding, my mission is to bring true SecDataOps practices to security practitioners, and a properly implemented Security Data Mesh will benefit many organizations – which is why it’s critical to understand what Data Mesh actually requires versus what CSMA delivers. Finally, this isn’t an attack on Gartner, but it is an attack on CSMA inasmuch as it’s called a “mesh architecture”. By the end of this document you should be well armed to undertake a Data Mesh project and have the concepts demystified.
(Author’s Note: Any critique on CSMA is primarily derived from Gartner’s Cybersecurity Leader Guide to Cybersecurity Mesh Architecture from 24 October 2025, a companion to their revised CSMA 3.0 guideline published in June 2025, as well as the CSMA 3.0 guideline proper: The Future of Security Architecture Is Here: Cybersecurity Mesh Architecture 3.0 (CSMA). Supplemental open-source articles and blogs were used for research and directly quoted or linked where appropriate.)
All About That Mesh
Data Mesh is all about decentralization along sociotechnological lines – with some variation: sometimes the actual technology is decentralized, and in other cases it’s the people and processes that are decentralized.
Really, Data Mesh centralizes almost nothing – no data platforms, no pipeline orchestration, no central data teams owning products. The closest approximation to centralization is federated coordination of governance policies and interoperability standards. However, even governance isn’t centralized – it’s federated decision-making with computational enforcement.
This will be a bit hypocritical given that we straight up call our platform a Security Data Mesh, but Mesh is meant to be an architectural & ownership pattern, not a physical product (though it can be realized through the sum of its parts). Zhamak laid out four key principles of a Data Mesh, each with its own core concepts and the functionality it’s meant to provide.
- Domain Ownership
- Data as a Product
- Self-Service Data Infrastructure
- Federated (Computational) Governance

Before we explore these principles in greater depth, I want to talk about the problem set that leads to the data bottleneck – what Zhamak called the “Great Divide of Data”, a qualitative split between the two types of data that are consumed: operational data and analytical data.
From the perspective of data & analytics teams, operational data is typically the more “raw” or transactional data. Looked at from a Medallion Architecture perspective, that would be your “Bronze Layer”: data in its native form, stored behind first-party vendor APIs (e.g., Workday ERP, CrowdStrike Falcon, Zscaler ZIA, ServiceNow CMDB, etc.) or in Relational Database Management Systems (RDBMS) such as Microsoft SQL Server or PostgreSQL.
Through a security-centric lens, you can think of operational data as what is commonly labeled “raw logs” or “telemetry”. That could be CSVs of firewall data from Palo Alto VM-Series or Cisco ASA firewalls, logs forwarded via the Zscaler NSS, or Netskope or Trellix logs. It can be email trace data from IronPort or Mimecast, or a myriad of host-based logs such as Windows Event Logs, OSQuery, Sysmon, or vendor-specific host logs such as Carbon Black process observations or Microsoft Defender for Endpoint tables like DeviceImageLoadEvents.
Where this data lives is just as varied. I’ve seen security teams store data in PostgreSQL, or use their Security Information & Event Management (SIEM) tool – Splunk, DataDog SIEM, or Elastic Security – as a database (in practice, they *are* document databases, technically). That doesn’t account for data lakes built on S3, GCS, or Azure Storage Accounts, or purpose-built data intelligence platforms such as Databricks or Snowflake, which are popular places to land security data.
Analytical data, on the other hand, is your deduplicated, normalized, curated data – what would be considered the Gold Layer in a Medallion Architecture. This is ultimately the data that domain teams will produce and/or consume, and it’s already “shaped” in a way that enables whatever consumption models or business processes are tied to it. You would typically see this data in a data warehouse such as Snowflake, Microsoft Fabric, Amazon Redshift, Teradata, or IBM Netezza. Additionally, being so closely tied to Medallion Architecture, you’d also find this data in a myriad of data lake variants: data lakes (proper), data lakehouses, data lakemarts, and otherwise.
In security, we don’t have a direct corollary to analytical data, outside of teams that go through the effort of cleansing and preparing data for consumption. And even then, is it meant to answer a business question (inasmuch as security is a business process), or is it more of the same transactional data? You be the judge, but typically that treatment is reserved for normalization, in-situ enrichment – adding ASN, geolocation, asset ownership, vulnerability exploits, et al. – or flattening data so it can be queried more reliably. Yes, there are so-called Security Analytics tools out there, and CSMA’s hierarchy even calls out a Security Analytics & Intelligence Layer (SAIL) as a specific Layer.
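To make that treatment concrete, here is a minimal sketch of in-situ enrichment and flattening in Python – the lookup tables, field names, and event shape are hypothetical stand-ins for real ASN/GeoIP/CMDB services, not any vendor’s schema:

```python
import ipaddress

# Hypothetical, hard-coded lookup tables stand in for real ASN/GeoIP/CMDB services.
ASN_LOOKUP = {"198.51.100.0/24": {"asn": 64496, "org": "ExampleNet"}}
GEO_LOOKUP = {"198.51.100.0/24": {"country": "US", "city": "Austin"}}
ASSET_OWNERS = {"web-prod-01": "payments-team"}

def enrich_and_flatten(raw_event: dict) -> dict:
    """Turn a nested 'operational' firewall event into a flat, enriched record."""
    src_ip = raw_event["src"]["ip"]
    hostname = raw_event.get("dst", {}).get("hostname", "unknown")

    enriched = {
        "src_ip": src_ip,
        "dst_hostname": hostname,
        "action": raw_event.get("action", "unknown"),
        "asset_owner": ASSET_OWNERS.get(hostname, "unassigned"),
    }
    # In-situ enrichment: attach ASN and geolocation for the source address.
    for cidr, asn in ASN_LOOKUP.items():
        if ipaddress.ip_address(src_ip) in ipaddress.ip_network(cidr):
            enriched.update({"src_asn": asn["asn"], "src_asn_org": asn["org"]})
            enriched.update(GEO_LOOKUP[cidr])
    return enriched

print(enrich_and_flatten({"src": {"ip": "198.51.100.23"},
                          "dst": {"hostname": "web-prod-01"}, "action": "allow"}))
```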
This divide between operational and analytical data is precisely where problems emerge, in both traditional data teams and security operations. Bridging the divide is what pipelines are for – they provide the linkage between operational and analytical, and you cannot have the latter without the former. That is also where failures always crop up, and where someone inevitably screams into the void: “bottleneck!” Extraction, Transformation, and Loading (ETL) pipelines are what’s hot in security right now, with no fewer than two dozen companies attempting to wrangle with Cribl for sovereignty over IT and security data mobility.

Even so, using a SaaS ETL provider that is laser-focused on security data is hardly infallible. Depending on the unique facets of the data – types, formats, storage locations, query engines, and usage – brittleness creeps in. As the great warrior-poets of Mobb Deep once said, “Make one false move and it’s an up-north trip”: change a timestamp format, remove or add an unexpected column, change a data type, rename a column? Your detections, analytics, fancy ML models, visualizations, and SPL or KQL queries are all cooked!
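To illustrate just how little drift it takes, here is a minimal sketch of the kind of schema contract check a pipeline needs just to notice the breakage before detections silently fail – the expected schema and field names are assumptions, not any vendor’s actual log format:

```python
# Hypothetical expected contract for a firewall log feed; a real contract would be
# versioned alongside the pipeline code and enforced in CI or at ingest time.
EXPECTED_SCHEMA = {
    "event_time": "iso8601",
    "src_ip": "string",
    "dst_ip": "string",
    "action": "string",
    "bytes_out": "int",
}

def detect_drift(record: dict) -> list[str]:
    """Return a list of human-readable drift findings for one record."""
    findings = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in record:
            findings.append(f"missing column: {column}")
        elif expected_type == "int" and not isinstance(record[column], int):
            findings.append(f"type change: {column} is {type(record[column]).__name__}, expected int")
    for column in record:
        if column not in EXPECTED_SCHEMA:
            findings.append(f"unexpected column: {column}")
    return findings

# A vendor rename of 'bytes_out' to 'sent_bytes' is enough to break everything downstream.
print(detect_drift({"event_time": "2025-06-01T12:00:00Z", "src_ip": "10.0.0.1",
                    "dst_ip": "10.0.0.2", "action": "allow", "sent_bytes": 100}))
```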
That gets us back to Zhamak’s principles. If the ultimate goal of the central data (security or SecDataOps) team is to facilitate the transformation from operational to analytical, why not give ownership – suzerainty, even – over that domain of data to those closest to it? What has centralization gotten us anyway? Expensive SIEMs, even more expensive “transformation” and “modernization”, and dozens of categories of tools across thousands of companies with misleading or outright paradoxical marketing.
The best way to beat centralization is with decentralization.
Domain Ownership
A core concept of Data Mesh is decentralization, which is simple enough to conceptualize: not all of the data (operational or analytical) needs to be in the same physical (logical?) location, or even owned by the same teams. It’s okay if you are using a data lakehouse built with Google Cloud Storage, Unity Catalog, Apache Iceberg, and StarRocks in addition to Snowflake or Amazon Redshift Serverless. This is in the same vein as what we here at Query proselytize: keep your data in its most advantageous location, be it behind a native API, in a SIEM, a warehouse, or otherwise.
What’s more important is that you can access that data wherever it lives – like using Query, for instance – and that the data products produced ultimately follow governance and interoperability standards set forth in your organization. In turn, those standards must be agreed upon and conceptualized working backwards from your jobs to be done.
The partner to that decentralization is the distribution of accountability and responsibility: those closest to the data should own the production, computation, and overall change & scalability of said data. That is what a domain is – it’s composed along business/function lines, not tool lines (tool-centricity being a large premise of CSMA).
In security, we’re a bit ahead of the game here, given our native decomposition across subdisciplinary lines. Some examples:
- The TVM domain Team owns vulnerability data products – whether that data comes from CrowdStrike Spotlight, Qualys, or Tenable is an implementation detail hidden behind their data product interface
- The IAM Security domain Team owns identity/access data products – consuming from Entra ID, CrowdStrike Identity Protection, and Okta, then serving a unified identity security product
- The Endpoint Security domain Team owns endpoint detection data products – CrowdStrike Falcon is one source, but they might also integrate Carbon Black or Microsoft Defender for Endpoint
Here is a litmus test for you: if you switch tools, does your domain still exist? The TVM domain persists whether you use Amazon Inspector, CrowdStrike Spotlight, or Qualys. But a ‘CrowdStrike Data domain’ would disappear if you switched to SentinelOne – that’s tool-centric organization, not Data Mesh.
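As a rough sketch of what “hidden behind their data product interface” can look like, consider the following Python example – the connector classes and their return shapes are hypothetical, not real vendor SDKs:

```python
from typing import Protocol

class VulnSource(Protocol):
    """Anything that can yield vulnerability findings for the TVM domain."""
    def fetch_findings(self) -> list[dict]: ...

class SpotlightConnector:
    def fetch_findings(self) -> list[dict]:
        # Placeholder: a real connector would call the vendor API here.
        return [{"cve": "CVE-2024-0001", "severity": "High", "hostname": "web-prod-01"}]

class QualysConnector:
    def fetch_findings(self) -> list[dict]:
        return [{"cve": "CVE-2024-0002", "severity": "Critical", "hostname": "db-prod-02"}]

class TVMDomainProduct:
    """The consumer-facing interface persists even if the sources behind it change."""
    def __init__(self, sources: list[VulnSource]):
        self.sources = sources

    def vulnerability_findings(self, min_severity: str = "High") -> list[dict]:
        order = ["Low", "Medium", "High", "Critical"]
        findings = [f for src in self.sources for f in src.fetch_findings()]
        return [f for f in findings if order.index(f["severity"]) >= order.index(min_severity)]

# Swapping Qualys for another scanner changes the constructor arguments, not the domain.
tvm = TVMDomainProduct([SpotlightConnector(), QualysConnector()])
print(tvm.vulnerability_findings())
```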

With the example of a larger product suite (or, ahem, “platformization”) such as Palo Alto Cortex & Prisma Cloud, CrowdStrike Falcon, Microsoft (M365) Security, or SentinelOne Singularity, there will still be a singular owner of that tool – a group of administrators or a platform team – but in a Data Mesh, domains are not defined along tool lines. Domain teams own their functional area and are accountable for data quality, regardless of which tools generate the underlying data. Tool administration might be centralized (a “CrowdStrike admin team”), but data product ownership sits with the functional domain team.
- Tool Owner = You administer/operate CrowdStrike Falcon
- Domain Owner = You’re accountable for the “Endpoint Security” domain
In practice, many security organizations are organized around tool ownership (e.g., the CrowdStrike team, the M365 Security team), especially for the large platforms mentioned in the previous paragraph. This often reflects vendor support models, licensing boundaries, and the reality that these platforms require specialized administration. While this is better than full centralization, it isn’t pure Data Mesh – it’s a transitional model where tool teams act as domain owners.
Even if your Data Mesh project is undertaken solely for the security organization, do not fall into the trap of expecting small groups (or individuals) within the SecDataOps task force or the SOC to be domain owners. They certainly have a role to play in the Data Mesh, and they may end up owning a small piece of the pie, but ultimately SecDataOps task forces should handle data infrastructure and the SOC will primarily be a consumer. If there are individuals doing discrete jobs in the SOC, such as cyber threat intelligence, then there may be a precedent for a domain team in the SOC to ensure CTI/OSINT is distributed across Data Mesh consumers.
In a true domain-oriented model, the engineering team managing your payments API would own the security logs for that API as a data product, with SLOs around data quality, discoverability, and freshness. The SOC would consume that product through standardized interfaces, perhaps with help from the SecDataOps task force or team. In the CSMA model, the Payments API team ships raw logs to a centralized SIEM where someone else figures out normalization, correlation, and retention – exactly the centralized evil twin of Data Mesh.
However you boil it down, domain-ownership is the first important principle of Data Mesh. You want to enable the domain teams to do their work but also push accountability of action down to them. In the next Principle, you will learn about enablement, because there will still be a data skills gap in most security organizations despite any “best-of-breed” tooling in place.
Query provides the federated query infrastructure that is the consumption plane of a Security Data Mesh. We don’t store your data – we virtualize access to it wherever it lives, using OCSF as the interoperability standard that makes cross-domain queries possible. Instead of consumers needing to know which APIs hold what data across CrowdStrike’s dozens of endpoints or hundreds of tables in Microsoft Sentinel or Snowflake, we map them to OCSF Event Classes on your behalf (or enable you to map it with no-code or AI).
This means you can query for Vulnerability Findings by normalized severity, status, or impacted hostname – and Query handles the federated execution across CrowdStrike Spotlight, Qualys, Tenable, or wherever that domain’s data lives. The query is parallelized across any connector mapped to that OCSF class, abstracting away the technical complexity of disparate APIs, schemas, and authentication mechanisms.
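A greatly simplified sketch of that fan-out pattern (not Query’s actual engine – the connector functions and result shapes here are invented for illustration) might look like this:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical connector callables that each return results already mapped to an
# OCSF Vulnerability Finding-like shape; this is a teaching sketch, not Query's engine.
def spotlight_connector(filters: dict) -> list[dict]:
    return [{"severity": "Critical", "status": "Open", "device": {"hostname": "web-prod-01"}}]

def qualys_connector(filters: dict) -> list[dict]:
    return [{"severity": "High", "status": "In Progress", "device": {"hostname": "db-prod-02"}}]

def federated_vulnerability_query(filters: dict) -> list[dict]:
    """Fan the same normalized query out to every connector mapped to the class."""
    connectors = [spotlight_connector, qualys_connector]
    with ThreadPoolExecutor(max_workers=len(connectors)) as pool:
        results = pool.map(lambda c: c(filters), connectors)
    # Flatten and apply the normalized filter once, regardless of source schema.
    findings = [f for batch in results for f in batch]
    return [f for f in findings if f["severity"] == filters.get("severity", f["severity"])]

print(federated_vulnerability_query({"severity": "Critical"}))
```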

However, Query doesn’t replace the organizational work of domain ownership. The TVM team still needs to be accountable for the quality, freshness, and reliability of vulnerability data, even if Query makes it easier for consumers to access. Query provides the “how” (federated infrastructure), but your organization must still define the “who” (domain owners) and “what” (data product SLOs). Of course, if you were consuming via a direct API static schema Connector in our product, that “what” is largely on us and the downstream provider.
Without federated infrastructure like Query – or the organizational willingness to build it yourself – security organizations default to the centralized bottleneck model. SecDataOps becomes responsible for pulling operational data, hosting it in common locations, performing cross-joins and enrichment, then disseminating finalized datasets to consumers.
This centralized pattern fails at scale because SecDataOps may lack both the domain expertise to define quality for every use case AND the engineering bandwidth to keep up with every domain’s evolving needs. The skills gap in data engineering across most security organizations makes this even harder.
This centralized bottleneck is exactly what CSMA prescribes – aggregate data from decentralized sources into centralized platforms (Security Analytics and Intelligence Layer (SAIL)), then provide unified dashboards (Operations Dashboard Layer). Gartner dresses it up with “mesh” terminology, but it’s the same hub-and-spoke model that creates the bottlenecks Data Mesh was designed to solve. More on that later.
Query and Data Mesh: Where We Fit
✅ What Query Provides:
- Federated query infrastructure across 50+ sources
- OCSF-based schema standardization and mapping
- Self-service consumption without centralization
- Query translation, auth, pagination handled for you
⚠️ What Query Doesn’t Replace:
- Organizational domain ownership and accountability
- Data product quality SLOs and monitoring
- Cultural shift to treat data as products
- Domain teams’ responsibility for their data
Think of Query as the railroad tracks and trains that let domain teams ship their data products. But domains still need to decide what to ship, maintain quality control, and respond when consumers have issues.
Data as a Product
An important paradigm of a Data Mesh, Data as a Product means that data must be a first-class citizen. Teams are beleaguered enough without having to worry about pipeline brittleness, concurrency, latency, complex microservices architecture – they just want the damn data. It’s the domain teams’ job to serve it in accordance with governance and interoperability standards to the consumers.

The most important property of a data product is that it must be trustworthy, and trustworthiness is defined with your domain owners, other stakeholders (Consumers), and ultimately the taskings they’re looking to accomplish when readily enabled with such data. Trustworthiness could be tied to timeliness, quality, latency, deduplication, pre-enrichment, other facets, or some combination of them (or all of them). This goes back to the federation of governance: it must be agreed upon by everyone operating within the Data Mesh.
There is an implicit “chain of trust” when you have different SaaS providers in play. If you’re in security you know this all too well: well-documented and performant APIs are not a strong suit of many security vendors. You’re essentially at their whims for what authentication protocols & mechanisms, rate limits, pagination, filtering, and schema documentation are provided. A lot of these tools aren’t even meant for ad-hoc search and advanced filtering, with asynchronous APIs usually being the mechanism to bulk-move data.
You have to conduct your own exploratory data analysis and metadata management to deal with missing keys, mutated schemas, mismatched types, nullability, and other important facets of the data. Ultimately, you must make do within the confines of your governance efforts and the minimum SLAs you must fulfill.
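Here is a minimal sketch of the defensive handling a domain team ends up writing against a hypothetical vendor endpoint – the pagination token, field names, and nullable columns are all assumptions, not any real vendor’s API:

```python
import requests

def fetch_all_findings(base_url: str, token: str) -> list[dict]:
    """Page through a hypothetical vendor endpoint, tolerating schema quirks."""
    findings, cursor = [], None
    while True:
        params = {"limit": 100, **({"cursor": cursor} if cursor else {})}
        resp = requests.get(f"{base_url}/findings", params=params,
                            headers={"Authorization": f"Bearer {token}"}, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        for item in body.get("items", []):
            findings.append({
                # Missing keys and nullable fields get explicit defaults...
                "id": item.get("id") or item.get("finding_id", "unknown"),
                "severity": (item.get("severity") or "informational").lower(),
                # ...and inconsistent types get coerced before anything downstream sees them.
                "first_seen": str(item.get("first_seen", "")),
            })
        cursor = body.get("next_cursor")
        if not cursor:
            return findings
```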
Again, all of those details are important to the domain teams, and not the consumers. Those pains must be as opaque as possible, and if the datasets have issues, domain teams must take the efforts to mitigate them. In a way, the domain team would be proffering a ready-made Silver Layer in a medallion architecture layout. The challenge is most security teams lack the data engineering expertise to build this infrastructure themselves. That’s where Query comes in. (As well as my insistence that security organizations become SecDataOps organizations).
As far as the Query role in enabling Data as a Product, domain teams want to serve their data products from where the data naturally lives – behind CrowdStrike APIs, in Snowflake tables, within Microsoft Sentinel – without building custom query infrastructure for consumers. Query handles the federated query complexity that would otherwise force domain teams to either:
- Build their own APIs, authentication, pagination, query translation layer (unrealistic for most security teams)
- Funnel all data into a centralized SIEM/warehouse where SecDataOps handles queries (the centralized bottleneck)
Instead, domain teams map their data sources to OCSF schemas through Query. Consumers query the TVM domain for Vulnerability Findings via FSQL, GraphQL, or the Query UI – and Query federates that query across CrowdStrike Spotlight, Qualys, and Tenable in parallel. The domain team owns the data product and its SLOs; Query provides the consumption infrastructure.
Critically, Query doesn’t store the data – we virtualize access to it. The TVM team’s data product remains distributed across their chosen tools/platforms. If they need to materialize data centrally for specific use cases (compliance archives, ML training data), the Query managed pipelines can orchestrate that movement to S3, Google Cloud Storage, Cribl Stream, or other destinations. This is to enable selective centralization only, especially in cases where native APIs do not retain data beyond typical 14-30-90 day time periods.
This is a conflux of distributed query planning & execution, query translations (using the correct disjunctive or conjunctive normal forms), managing pagination, nullability, missing keys, in-situ transformations, streaming, session management, secrets management, and more. Without Query – or similar federated infrastructure – security organizations typically default to a centralized bottleneck model, where SecDataOps pulls telemetry and logs from operational systems, lands them in centralized stores, performs cross-joins and enrichment, then disseminates curated datasets to downstream users.
This centralized pattern consistently fails at scale: even the most skilled SecDataOps teams rarely have the domain expertise to define quality and semantics for every source, nor the engineering bandwidth to continuously adapt pipelines as each domain’s schema, tooling, or operational context evolves. The result is brittle integrations, delayed insights, and ultimately loss of trust in the data.
By providing this infrastructure layer, Query enables domain teams to serve data products from where the data naturally lives, without needing deep data engineering expertise. The TVM team can serve vulnerability data products directly from CrowdStrike Spotlight, Qualys, and Tenable APIs – with Query handling the technical complexity while they maintain accountability for data quality and SLOs.

Now, stepping back from infrastructure specifics to the broader principle: everything that makes up a dataset is the Product. The pipelines, the code, the orchestrators, the streaming services, and the like are all one unit. Said another way: you must write your code, build your schedulers, streaming infrastructure, and anything else in a performant way to enable that data Product. Just like you wouldn’t try to sell a SaaS tool without a UI or API (right?), if any part of that chain has an issue then that data is not ready to be a Product – to say nothing of any discrepancies against agreed-upon governance and interoperability standards.
The choices you make do matter in the lens of the next Data Mesh Principle, Self-Service Data Infrastructure, because if your choices lock you into being unable to meaningfully abstract data access in such a way to empower domain team Consumers (or provide patterns for lesser skilled domain team Producers), you’ve already failed your Data Mesh project.
I’ll use that big-brained phrase again: decentralization occurs along sociotechnological lines, and it’s unfair to make a domain team accountable for something when they cannot operate the machinery. It would be like suddenly making your SecOps team responsible for vulnerability management without giving them access to Qualys or Amazon Inspector, or any documented processes or KPIs to adhere to. Said another way, any time you hear “we should implement a Data Mesh”, you should think of “enablement” as your strategic objective.
Self-Service Data Infrastructure
Enablement is what this Principle is all about. Self-Service Data Infrastructure is all about abstracting complexity – like when making data products – from the consumption and production model. As the central data & analytics or SecDataOps organization, you must meaningfully abstract all of the “sharp edges” and difficulty to get at the data that teams need to execute on their “jobs to be done”.
During the height of the “cloud wars” from 2017-2020, “self-service” almost always meant a service catalog – templated infrastructure blueprints exposed through tools like ServiceNow to help teams rapidly provision resources. That’s not what Data Mesh means by self-service.
The Self-Service piece lies at the “top” of the data platform that underpins your Mesh; it has nothing to do with the infrastructure that powers it. You can certainly use a service catalog of old, and in many ways having blueprints instead of doing custom builds helps with both the enablement and empowerment of your domain teams.
Infrastructure wise, your Data Mesh can have every component running on Databricks, using Amazon Elastic Kubernetes Service (EKS), using Amazon EMR Serverless Applications, home-built systems with your own DAGs, or otherwise. Again, that is second-order business to the data Product. However, you want to expose APIs, metadata management, schemas, documentation, and anything else in the way that the domain teams want it such that they can produce with your tools, or consume from them.
If your team wants the data in CSV because their workflow includes adding CSVs of data capture from three relevant security APIs into ServiceNow SIR Incidents, then provide an API or a location they can consume CSV from. If they are not familiar with using Python requests or using PowerShell or bash, then provide them an interactive web portal where they can ClickOps their way to victory. However, if your Cyber Data Science team is building seasonality and anomaly detection ML models, they probably need an API that’ll accept a list of features, or perhaps they’ll want to consume from a materialized view in Snowflake, or pick up well-partitioned and compacted Apache Parquet files.
It goes back to understanding the jobs to be done, the technical skills (or limitations) of a domain team, and what exactly they need to be set up for success. If a domain team wants to build their own Analytical Data, they’ll need a composable way to define what views look like, how to materialize them, or a domain-specific language or configuration to define data Products – with the pipelines, orchestration, authentication, governance, etc. handled for them.
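One way to picture that kind of configuration is a declarative data product definition that a self-service platform could turn into pipelines, schedules, and access controls – every name below is an assumption about what such a platform might accept, not any real product’s syntax:

```python
# A hypothetical declarative data product definition; every key and value here is an
# assumption for illustration, not a real platform's configuration schema.
tvm_container_vulns = {
    "product": "tvm.container_vulnerabilities",
    "owner": "tvm-domain-team",
    "sources": ["crowdstrike_spotlight", "qualys"],
    "schema": "ocsf.vulnerability_finding",   # the interoperability standard the platform enforces
    "refresh": {"schedule": "hourly", "mode": "incremental"},
    "slo": {"freshness_minutes": 90, "completeness_pct": 99.0},
    "outputs": [
        {"format": "materialized_view", "location": "snowflake://SEC_DB.TVM.CONTAINER_VULNS"},
        {"format": "parquet", "location": "s3://sec-data-products/tvm/container-vulns/"},
    ],
    "access": {"consumers": ["soc", "cyber-data-science"], "pii": False},
}
```

The point of a definition like this is that the domain team declares the “what” (sources, schema, SLOs, outputs) while the self-service platform supplies the “how” (pipelines, orchestration, authentication, governance checks).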
Zhamak lays out three examples of “Planes” used in a Self-Service Data Platform: Data infrastructure provisioning plane (DIPP), Data product developer experience plane (DPDXP), and the Data mesh supervision plane (DMSP) which are matched against specific personas and not necessarily the entirety of a domain team.
- DIPP: This is essentially a (Data) Infrastructure Lifecycle Management interface where the underlying infrastructure (as much as they want to consume from it) is orchestrated and built out; it could be a service catalog. It allows the entire data Product, or any sub-component of it (orchestrators, streams, Kafka workers, underlying code), to be requested and run. Perhaps a vulnerability management team wants hourly pulls of deduplicated container vulnerabilities instead of daily, or the Cyber Data Science team needs to provision a larger EMR Application or wants to include Iceberg or Hudi support in it – it all happens here.
- DPDXP: This plane serves as the primary interface for data product developers and analysts, providing a significantly higher abstraction level than the Data Infrastructure Provisioning Plane (DIPP). Its core function is to simplify the entire data product lifecycle. It does so through declarative interfaces, which allow users to define what they want to achieve rather than how to achieve it. This abstraction shields developers and analysts from underlying infrastructure complexities, letting them focus on data product development without deep knowledge of the operational nuances of the data platform. This plane also plays a crucial role in automatically implementing standardized cross-cutting concerns, such as data governance, security, observability, and quality, which are often tedious and error-prone when implemented manually.
- DMSP: This utility will live at the Data Mesh “level” and is used for consumption and exploration. Showing interdependence, hierarchy, schemas, metadata, and even as detailed as specific graphs, correlations, enrichment, and more. For instance, you could render a semantic layer using UML, LookML (from Google Cloud Looker), YANG, or even render it within a property graph database. This is where benefits from a unified data model such as the Open Cybersecurity Schema Framework (OCSF) come into play, as all Attributes (key:value pairs) are standardized and strongly-typed.

Query operates across all three of these example planes:
- As a DIPP: Query allows domain teams to build analytics, detections, and data pipelines. Its federated infrastructure enables managed detections across distributed security data without centralization, correlating events from various domains (IAM, Network Security, EDR) via federated queries, eliminating ETL. Query also orchestrates selective data movement to S3, GCS, Azure, Cribl Stream, or Splunk for specific uses like compliance or ML training, making centralization intentional, not the default.
- As a DPDXP: For domain teams serving data products through APIs, Query abstracts the complexity of making those products consumable. Instead of building custom APIs, pagination handlers, authentication systems, and query parsers, domain teams can expose their data through existing APIs (CrowdStrike, Snowflake, etc.) and leverage the Query infrastructure to make them accessible via OCSF schemas.
- As a DMSP: Query provides discovery, exploration, and consumption of data products across the Data Mesh. The OCSF mappings create a semantic layer where analysts can explore “what vulnerability data exists” without needing to know “which APIs have vulnerability data.” This is delivered via our patented query planning & execution engine, which only sends translated queries and normalizes the data that matches your search intent. In our platform, interdependence between domains becomes visible – you can see that Vulnerability Findings come from the TVM domain (CrowdStrike Spotlight, Qualys) while Asset Inventory comes from the IT Operations domain. This can be further enabled by our AI Agents, which provide mission-specific natural language interfaces into our platform.
Whichever way you approach this breakdown, Self-Service is the key here (obviously) and if you cannot abstract enough of it to match the skill levels of your domain teams, then your Data Mesh project will fail.
Think of other interfaces that your team would find beneficial. Your planes could very well be aligned to domains or discrete jobs-to-be-done, such as a plane for vulnerability management or one for enterprise security that contains ways to get at pre-enriched and joined data across your HRIS/ERP, ServiceNow, and Entra ID or Okta environments. What planes will you build for your Self-Service Data Infrastructure?
Federated Computational Governance
For data enablement operations to be feasible, a Data Mesh requires a governance model that supports decentralization, domain self-sovereignty, and interoperability through global standardization. This model also needs a dynamic topology and, critically, automated execution of decisions by the platform. This approach is termed federated computational governance by Zhamak.
Federated computational governance involves a decision-making framework led by a federation of domain data product owners and data platform product owners. This group maintains autonomy and local decision-making power while establishing and adhering to global rules. These rules apply to all data products and their interfaces, ensuring a healthy and interoperable ecosystem.

The primary challenge for this group is to balance centralization and decentralization, determining which decisions are localized to each domain and which are made globally. Ultimately, global decisions aim to foster interoperability and create a compounding network effect through the discovery and composition of data products.
Traditional governance is operational – a central committee reviews data models, approves schemas, and manually enforces standards. This creates bottlenecks and doesn’t scale across autonomous domains. Don’t you remember we’re trying to avoid the “b-word”? Federated computational governance flips this: global standards are encoded in the platform and automatically enforced. Domains can move fast because policy compliance happens through automated checks, not manual reviews. The federation decides what the rules are; the platform ensures they’re followed.
For instance, at the global level your SecDataOps team may standardize on OCSF and govern all datasets against it. Rather than manual reviews, the platform automatically validates that any data product exposed through the Self-Service Data Infrastructure planes passes OCSF schema validation. If a domain team tries to serve a data product with invalid OCSF schemas, the deployment fails automatically – no manual gatekeeper needed. In practice, this would be a GitOps-driven workflow, with checks automated and reported on to feed continuous improvement (along with SLO-level metrics to support the entire Data Mesh ecosystem).
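A minimal sketch of what that computational check could look like in a CI job – the required attribute list is a simplified stand-in for real OCSF class validation, not the full published schema:

```python
import sys

# Simplified stand-in for OCSF Vulnerability Finding (class_uid 2002) requirements;
# a real check would validate against the published OCSF schema, not this short list.
REQUIRED_ATTRIBUTES = {"class_uid", "activity_id", "severity_id", "time", "metadata", "finding_info"}

def validate_data_product(sample_records: list[dict]) -> list[str]:
    errors = []
    for i, record in enumerate(sample_records):
        missing = REQUIRED_ATTRIBUTES - record.keys()
        if missing:
            errors.append(f"record {i}: missing OCSF attributes {sorted(missing)}")
        if record.get("class_uid") != 2002:
            errors.append(f"record {i}: class_uid is not Vulnerability Finding (2002)")
    return errors

if __name__ == "__main__":
    # In a GitOps workflow this runs on every pull request that changes the data product;
    # a non-zero exit blocks the merge, so no human gatekeeper is in the loop.
    sample = [{"class_uid": 2002, "activity_id": 1, "severity_id": 4,
               "time": 1717243200000, "metadata": {"uid": "abc-123"}, "finding_info": {}}]
    problems = validate_data_product(sample)
    if problems:
        print("\n".join(problems))
        sys.exit(1)
    print("OCSF validation passed")
```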
However, this would be to the detriment of teams unfamiliar with OCSF or with legacy data that doesn’t map cleanly. So the federation (or, Ministry of Awesome Data) might decide to also expose Bronze Layer data (raw, unvalidated) for domains that need it, allowing them to apply their own transformations. Decentralization! The federation sets the rule: “All Silver/Gold layer products must be OCSF-compliant, Bronze layer has no schema requirements” – and the platform enforces it computationally.
Global standards shouldn’t overspecify implementation details that domain teams understand better than any central committee. For example, it may not make sense to centrally mandate a specific primary key (if using OLTP or systems that can deduplicate or assert uniqueness on a specific key) in any given data Product. That should be up to the domain team – they know their data best.
The closest construct to a primary key in OCSF is metadata.uid, but if you’re consuming data from ServiceNow SIR Incidents it should be up to the domain team to pick if it will be sys_id, incident_id, or correlation_id mapped to metadata.uid. Likewise for Defender XDR data stored in the DeviceProcessEvents table in Sentinel, some will use the ReportId while others may generate a hash of the entire record.
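A small sketch of two domain-local choices for the same global attribute – the field names come from the sources above, but the hashing choice and function names are purely illustrative:

```python
import hashlib
import json

def servicenow_sir_to_uid(record: dict) -> str:
    # The owning domain decides which ServiceNow field is authoritative for metadata.uid.
    return record.get("sys_id") or record.get("correlation_id", "")

def defender_device_process_to_uid(record: dict) -> str:
    # Another domain prefers a deterministic hash of the whole Sentinel row over ReportId.
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

print(servicenow_sir_to_uid({"sys_id": "6816f79cc0a8016401c5a33be04be441"}))
print(defender_device_process_to_uid({"ReportId": 42, "DeviceName": "web-prod-01"}))
```

Both are valid under the global rule (“every record has a stable metadata.uid”); the federation doesn’t need to pick one for everybody.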
In a centralized governance model, every time a domain team wants to add a new data source, change a retention policy, or modify their schema, they wait for the central security governance board to review and approve. This creates the very bottleneck Data Mesh was designed to solve.
In federated computational governance, the TVM domain team can independently decide to add Snyk as a new vulnerability data source, set their own retention policy (within regulatory minimums), and evolve their data product schema – as long as they remain OCSF-compliant and meet global data quality standards. The platform validates compliance automatically; the SOC can immediately start consuming the updated product without any manual coordination.
This is greatly in contrast with CSMA’s Centralized Policy, Posture, and Playbook Management Layer (C3PM) where a central team builds, manages, and orchestrates all security policies, then translates them into native tool configurations. While Policy-as-Code is slick, it’s just in reference to security policy and not data governance. There’s no concept of domains setting their own local policies within global constraints, because CSMA doesn’t have domains in the first place.
I’ve name-dropped several CSMA Layers throughout this section, now it’s time to dive deeper into CSMA and my overall critique. In this section you learned all about Data Mesh as a decentralized architectural pattern, with some security-specific details. In the next section you’ll get the same level of deep understanding for CSMA 3.0.
If you’re considering your security data strategy and want to explore the benefits of a decentralized approach, reach out to us. SecDataOps savages are standing by…
Stay Dangerous.
This is Part 1 of a 3-part blog series about Data Mesh intended to educate and serve as a decision-making tool for security leaders rethinking their data strategy. You can read more in Part 2 and Part 3.
