Building the Right Data Architecture blog header

October 15, 2025

Building the Right Architecture for Distributed Security Data

Security data is no longer confined to a single source or centralized SIEM. It’s dispersed across clouds, SaaS platforms, identity systems, EDR tools, and more. This decentralized landscape presents a challenge: how do you operationalize security when your data lives everywhere?

The answer isn’t to move your data. Moving big data is expensive, burdensome, and wasteful, especially when the desired outcome can be achieved without moving the data. You can build a security operations architecture that accesses and acts on the data where it lives—a truly federated approach.

If you’re building your architecture for distributed security data, here’s what you need to include.

Architectural Components You Need

1. Secure Data Access over Vendor APIs

Every data source, from cloud logs to SaaS telemetry, must be accessed via authenticated and authorized APIs supported by the source platform’s vendor. This tier is the backbone for federated architectures.

The challenge of course, is that these APIs are all different and non-standard. That’s where the data mesh comes in.

2. Data Mesh (with Data Pipeline option)

A data mesh creates a uniform layer of access across federated sources, effectively allowing the higher layers to remain agnostic of heterogeneous vendor APIs. An important aspect of the mesh is that it is read-only, as it is not meant to modify or re-organize data. In scenarios where reorganization is needed to optimize storage format or to move to a local data lake, that’s when the pipelining capability (optional) of the mesh can be used.

3. Data Standardization / Normalization

Security data is noisy and not in a common structure across sources. Normalization and standardization with something like OCSF (Open Cybersecurity Schema Framework), enables unified queries, analytics, and ML across heterogeneous sources.

4. Distributed Query Engine

You need a distributed query engine that can push compute to the edge or to the local lake—querying remote sources without (or minimal) data movement. It should be able to understand entities and their relationships, and traverse the cross-platform chain to support the use-case coming from the higher tiers.

5. Federated Search Tier

Federated search is the foundation of this architecture. It lets you search across distributed datasets—cloud, SaaS, on-prem, via a simple and common way to search. Searching should support parallelism, dependency resolution, and bring only the relevant subset of results back.

6. API Tier (MCP, SPL, REST or GraphQL)

A robust API layer to support a query language. The query can be coming from an MCP client, or a REST or GraphQL client.

7. Agentic AI (with A2A)

Bring your AI to where your data resides instead of moving data to your AI. Agentic AI leverages our APIs and the data mesh to deliver answers and provide context-aware assistance, detections, and insights uniformly across all data.

8. Federated Detections Tier

With the data mesh architecture, the opportunity exists to detect threats based upon context from multiple distributed data sources. That’s what we define as Federated Detections.

9. Analyst Experience Tier

Console
SIEM App
AI Copilots

Your security team still needs a place to work. The console, in combination with an app for your SIEM, provides unified investigation, triage, and detection engineering across distributed data, normalized into a usable schema.

CoPilot AI-assistant(s) help analysts write queries, correlate data, and investigate incidents faster. With agentic AI, copilots leverage federated access to provide instant insights.

Look for a Complete Federated Security Stack

This stack creates a layered, composable system designed for SecDataOps teams where data remains distributed but operational capabilities are unified.

FEDERATED SECURITY STACK

L9: Analyst Experience Tier

Analyst Console / SIEM App
Copilot (leveraging MCP and Agentic AI)

L8: Agentic AI

A2A
Access controls
Data controls
Safety checks

L7: API Tier

MCP, Query Language (SPL, FSQL, …), REST, GraphQL
Data Access Gateway

L6: Federated Detections Tier

Detections that leverage cross-platform data and analytics

L5: Federated Search Tier

Filter, aggregate, quantify, analyze

L4: Distributed Query Engine

Query Planner & Executor
Traverse & Join Relationships

NORMALIZED API GATEWAY (L1-L3):

L3: Data Standardization and Normalization

OCSF Standardization
Extract Entities, Relationships

L2: Data Mesh Architecture (with Pipeline)

Multi-Cloud, SaaS, Hybrid, On-prem
Pipelining (optional) to move to local lake

L1: APIs and Secure Access

Leverage Platform APIs

Figure 1: Layers of a complete Federated Security Stack

What To Watch Out For

1. Partial stack doesn’t work. Ask for an integrated solution.

Many vendors offer partial solutions. Some sell just the AI, others just a query engine or SIEM-lite capabiltiies. If you’re offered only Agentic AI or a search UI, you’re still left to integrate everything else on your own — APIs, normalization, query federation, and user interface.

Unless you have the engineering resources and time to build this stack yourself, a partial solution is not a solution.

2. Exposing raw/un-normalized native data to LLMs is dangerous.

Exposing raw data can lead to prompt injection, poisoning and other LLM security issues. You would also be exposing critical internal organizational context, often with PII, to the LLM provider. Normalize and standardize data in a structured manner, filter it further, and remove PII before your LLM gets to see your data.

While LLMs can be beneficial with raw data as well, limit that access to smaller approved samples used during training of your model for it to understand data structure and configure desired normalization and standardization.

3. Letting Agentic AI loose on your data can quickly become expensive.

It is very expensive (10x more than search) for LLMs to process data. That is why LLM providers limit rates and charge by number of tokens. Security data is big data, so token costs quickly get impractical. Who remembers their splunk ingestion pricing? Multiply that by 10x!

You want your federated security stack to manage what subset of data is exposed in what way to LLMs.

4. Be very skeptical of AI-driven automated alert triage and actions.

While vendors will pitch AI-driven auto-triage and actions, be skeptical and build in the right checks and balances. There certainly is opportunity there but a safer place to start is using AI to aid the analyst with investigation. Let the analyst be the final authority, not the AI. AI will mature further and the day may come when it can replace some analyst work fully, but today is not that day.

Query Gives You a Complete Federated Security Solution

Query provides a complete out-of-the-box federated security stack. It connects to your data where it lives, normalizes it using OCSF, allows secure federated queries, and safely delivers AI-assisted triage and investigation—all without requiring data movement or engineering resources.

Within 30 minutes, your team can be querying across Microsoft, AWS, CrowdStrike, and other platforms without building pipelines or normalization layers yourself.

Summary

To handle distributed, cloud-native, high-volume security data, you need a federated architecture. This isn’t optional anymore, it’s essential. Choose a solution that offers the full stack: secure access, normalization, federated search, AI, and analyst tooling. Query provides all of that turnkey.

Contributed by:

Dhiraj Sharan

Chief Scientist & Founder, Query