February 18, 2025

Microsoft Security & Query Federated Search: Better Together

Introduction

The Microsoft Security ecosystem is large, which is certainly one of the understatements of all time, but when you combine the actual security-related SKUs and security-relevant data it is VERY large. Everything from Microsoft Entra ID (formerly known as Azure Active Directory) to the mobile device management platform Microsoft Intune, as well as the several different types of Microsoft Defender products, can be security-relevant.

However, getting at this data has always been a challenge, requiring out-of-the-box and custom pipelines and orchestration to move the data into places where it can be consumed, such as Microsoft Sentinel, Azure Blob Storage, or external destinations. With centralization comes duplication and, if the environment is large enough, the real risk of data not being fresh enough for your incident responders, investigators, or other Security Operations (SecOps) personnel to use.

In this blog you will learn how to make the most of your Microsoft Security data (and relevant data in other locations) by using Query Federated Search. This allows your security and observability staff to get at exactly the data they need from exactly the right place in Microsoft without having to move it. Likewise, they can get the same value even if you are partially centralized within Sentinel, Azure Log Analytics, or elsewhere in the Microsoft ecosystem.

Overview of Microsoft Security capabilities

This section will not attempt to enumerate every difference and nuance of the variety of licensing plans and tiers of M365 (e.g., F5, E5, E3, etc.). For further information, refer to the M365 Maps Feature Matrix!

The core of every Azure tenant or Microsoft M365 environment is Entra ID. This makes sense because it is your Directory–also called a Tenant–where you associate your Azure subscriptions, assign licenses, create users, monitor devices, and more. Entra ID itself has security features and security-relevant data, such as keeping track of all users and their devices (and associated metadata), along with some security-specific tools such as Conditional Access, Privileged Identity Management (PIM), and the audit and authentication logs. While some of these features are being deprecated (such as Risky Sign-Ins), just using a robust license such as Entra ID P2 affords you a bevy of tools and telemetry to use.

From here, you have the Defender SKUs, the most well known being Microsoft Defender for Endpoint (MDE, formerly known as MDATP), which provides Endpoint Detection & Response (EDR), Vulnerability Management, and other host-based configuration and security management capabilities. MDE itself can be consumed using a variety of license models, and the Microsoft Defender console unifies all other Defender SKU findings and telemetry into a single place. For instance, you also have:

Defender for Cloud Apps
Defender for Identity
Defender for IoT
Defender for Office 365
And several other Defenders!

Related to the Defender ecosystem is Microsoft Sentinel (formerly known as Azure Sentinel) which is a next-generation Security Information & Event Management (SIEM) tool that unifies log management, hunting & detections, as well as response and SOC efficiency metrics tracking in a central location. Using Defender Extended Detection & Response (XDR) you can extend and seamlessly move Defender-related telemetry into Sentinel to track Alerts and Incidents in a central spot and have host-based telemetry moved into Sentinel.

When combined with other out-of-the-box Collection Rules (the widgets that ingest and normalize data) and Analytics (the rules that trigger Alerts on the data), as well as vendor-provided and custom mechanisms, Sentinel can be a very formidable SIEM. When you consider the integrations into Azure services such as Azure Logic Apps, it makes a lot of sense why folks choose Sentinel to unify their detection and response efforts. Under the covers of Sentinel is Azure Log Analytics, a columnar, NoSQL-like database that is queryable using Kusto Query Language (KQL) and offers fast and easy ingestion to onboard outside and custom sources into Sentinel as well.

Looping back to security-relevant data from security-specific data, there is Microsoft Intune, which can integrate with MDE. Intune is a Mobile Device Management (MDM) platform–previously branded under Microsoft Endpoint Manager (MEM)–that allows you to consume host-based telemetry from your onboarded hosts and push down configurations and compliance standards. For instance, you can consume software and hardware information across your enterprise, but also configure rules for different operating systems. These rules can include automatically installing MDE, applying certain security features, and other system-specific configurations.

Outside of these tools, there’s security-relevant information in other SKUs. For instance, Microsoft Defender for Office 365 is managed via different Admin centers in Microsoft, where you configure Anti-Spam, Anti-Phishing, and Anti-Malware rules. If you wanted to view compliance-related data you’d need to make it to the Compliance Center, where security data on your AI chatbot usage is located within Microsoft Purview, but if you wanted to consume DLP data for your Amazon S3 buckets or Azure Blob, you’d need to go to Azure Purview specifically. There are even more tools than were mentioned here; we recommend further reading and exploration using the M365 Maps and other resources.

In the next section, we will explore how an organization would consume this data via API and the challenges posed with doing that.

Consuming Microsoft data via API

As covered in the previous section, there are a lot of tools, which are accessed with a bevy of different plans and licensing tiers. Surely, there has to be a way to consume this? The answer is: sort of.

Much of the information across the entire Microsoft ecosystem is not consumable from the same place. The Microsoft Graph API is the central hub for most consumable Microsoft-related data and is incredibly important. The Graph API has several dozen endpoints that allow you to consume users, devices, mail, reports, tasks, and security-specific data such as Alerts from all of your Defender SKUs. That list barely scratches the surface, and does not even take into account that there is also a beta version of the Graph that offers different data and experimental endpoints too.
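As a rough illustration of what consuming Alerts from the Graph can look like, the Python sketch below pages through results by following the `@odata.nextLink` value that the Graph returns on each page. It assumes the `security/alerts_v2` endpoint on Graph v1.0 and a bearer token you have already acquired with the right permissions. The `get_json` parameter is a hypothetical stand-in for your HTTP client, included so the paging logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint for Defender alerts on Graph v1.0; verify against current docs.
GRAPH_ALERTS_URL = "https://graph.microsoft.com/v1.0/security/alerts_v2"


def http_get_json(url: str, token: str) -> dict:
    """Standard-library GET that returns the decoded JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_alerts(
    token: str, get_json: Callable[[str, str], dict] = http_get_json
) -> Iterator[dict]:
    """Yield every alert, following @odata.nextLink until no further page is offered."""
    url = GRAPH_ALERTS_URL
    while url:
        page = get_json(url, token)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")
```

Throttling, retries, and token refresh are left out here; a production consumer would need all three.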

Microsoft Intune and MDE each have their own API endpoints as well, with their own drawbacks and different ways to get at the telemetry you want. For instance, if you wanted to get all vulnerabilities, you’d need to first loop through all of your onboarded devices (called machines in MDE) and retrieve the vulnerabilities for each one individually. If you wanted to audit Intune-specific actions you would need to consume the Audit Log from Intune and not from the Graph API. While Sentinel and other tools can offer ways to get at some, or most, of the relevant data, there are still challenges in knowing where to get it all and how to best use it in your environment.
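The per-machine loop described above can be sketched in Python as follows. This is a minimal illustration, not a complete client: the base URL and the `/machines` and `/machines/{id}/vulnerabilities` paths reflect our understanding of the MDE API and should be checked against the current documentation, and `get_json` is a hypothetical callable that performs an authenticated GET and returns the decoded JSON.

```python
from typing import Callable

# Assumed MDE API base URL; confirm for your cloud and region.
MDE_API = "https://api.securitycenter.microsoft.com/api"


def collect_vulnerabilities(get_json: Callable[[str], dict]) -> dict[str, list[dict]]:
    """Map each machine ID to its vulnerabilities: one listing call, then one call per machine."""
    machines = get_json(f"{MDE_API}/machines").get("value", [])
    findings: dict[str, list[dict]] = {}
    for machine in machines:
        machine_id = machine["id"]
        resp = get_json(f"{MDE_API}/machines/{machine_id}/vulnerabilities")
        findings[machine_id] = resp.get("value", [])
    return findings
```

Note the cost: a fleet of N machines means N + 1 requests before paging and rate limits are even considered, which is the drawback the paragraph above points at.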

There is also the issue of certain tools not providing a way to consume their alerting, finding, or generic information via API. Take Microsoft Purview, for instance: if you wanted to review blocked activity from employees or tenant guests misusing generative AI, there is currently no way to consume that data via API. Likewise, there is not a stable way to consume Microsoft Defender for Cloud Apps (formerly known as MCAS) data either; you need to combine a variety of API endpoints to approximate it.

The largest challenge is not that so many different license plans and tiers are confusing, or even that data has to be retrieved from different sources; it is the contextualization. That covers both the context in which you have to consume and get at different data, and the format of the data as you consume it from APIs or from Azure Log Analytics by way of Sentinel and KQL queries.

A lot of the data is incredibly rich but is not straightforward to normalize in a way that makes consumption easy. Just take the fact that a “device” is known by three different terms across three different tools that do communicate (Entra ID, Intune, Defender). Each of these gives you different data points, and you may need all three to paint a clear picture of configuration, metadata, and ownership.

Once you get past these difficulties, now you must unify it in some way. At the end of the day, asset management data and its associated metadata is simply metadata for most SecOps use cases. The entry point to SecOps activities, be it for incident response or threat hunting, is some outside stimulus such as an Alert, a bulletin, or your own findings from threat intelligence, threat modeling, and/or purple team exercises. The first data points you may require are the Alert or Incident data, and lower-level telemetry from the environment. This can include process activity, network logs, authentication logs, audit logs, or all of them.

Again, this can be an issue since even centralization in Sentinel can be difficult for operational teams to manage atop their regular job. You can use Defender XDR to push a lot of this device-specific information and host-based telemetry into Sentinel, but getting the extra tidbits from Intune and Entra ID, as well as Entra ID-specific logs such as audit logs or sign-in logs, requires more configuration. If you need outside logging data such as AWS CloudTrail, Google Workspace authentication logs, or other cloud-specific logs, that only compounds the issues.

It is not that Microsoft does not give you the ability to consume and centralize all of this telemetry – you can make the argument that Sentinel is the best it ever has been on that front – it is that it still shouldn’t come down to centralization to make sense of it all. While we are huge proponents of Security Data Operations – the concept of effectively operating and taking ownership of security data – we are aware that not every SecOps program can expand in that direction all at once.

In the next section, we want to explore how Microsoft tools combined with Query Federated Search are “better together”, and how federated search can help alleviate the burdens of centralization and the specialist information required such as knowing how to finesse the APIs or write a lot of advanced KQL queries and notebooks to contextualize the data.

Federated Search for Microsoft Security data

Federated Search is the mechanism by which an operator can use a single, centralized tool to search across disparate data sources. The entire lifecycle of the search is handled on behalf of the operator: everything from capturing the intent of the search, to translating the query, to performantly retrieving the downstream data. Query Federated Search takes this further by also normalizing and standardizing the data into a unified format, the Query Data Model (QDM), which is derived from the Open Cybersecurity Schema Framework (OCSF). This allows two-way translation of your search intent and the results, which makes the data far easier to analyze, aggregate, filter, pivot, and ultimately utilize for SecOps.
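To make that lifecycle concrete, here is a deliberately simplified Python sketch of the fan-out pattern: one search intent is translated per source, dispatched in parallel, and the results are normalized into a common shape. Everything in it is hypothetical, including the `Connector` type, its three callables, and the `source` field; it illustrates the general technique and is not how Query itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Connector:
    """Hypothetical connector: translates, fetches, and normalizes for one source."""
    name: str
    translate: Callable[[str], str]      # search intent -> source-native query
    fetch: Callable[[str], list[dict]]   # source-native query -> raw records
    normalize: Callable[[dict], dict]    # raw record -> common schema


def federated_search(intent: str, connectors: list[Connector]) -> list[dict]:
    """Run one intent against every connector in parallel and merge normalized results."""

    def run(conn: Connector) -> list[dict]:
        raw = conn.fetch(conn.translate(intent))
        return [{**conn.normalize(record), "source": conn.name} for record in raw]

    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(run, connectors))  # map preserves connector order
    return [row for rows in per_source for row in rows]
```

The data never leaves its source until a search asks for it, and each source is queried in its own native dialect.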

If this is your first time reading about the OCSF, or if you are coming back to it after an absence, consider reading our beginner and executive-friendly blog: Query Absolute Beginner’s Guide to OCSF. For a more detailed explanation of OCSF, see our other blog: Definitive Guide to Open Cybersecurity Schema Framework (OCSF) Mapping.

Note first that federated search solutions should not be confused with federated query solutions, which rely on a shared query engine or language to reach out to another similar source. For instance, federated queries are possible from databases or data warehouses such as Databricks because the common ground is SQL, allowing you to query other databases and warehouses from within Databricks. This does not work for data sources that do not use SQL, and there is no intervention on the performance tuning or safety of the query that is being dispatched.

Another key differentiating factor of Federated Search is the ability to search this data without having to duplicate or move it again. From a cost, data sovereignty, and performance perspective this is important because you are not subsidizing the persistent storage of a vendor, you’re utilizing the investments and vectors you already have access to. You do not need to worry about privileged or sensitive data being cloned into another source, and you do not need to wait for the extra roundtrip time of the data being duplicated. This just-in-time fulfilment of searches allows your SecOps team to concentrate on gathering evidence and making decisions and less on the cost, privacy and security impacts of the searches.

Federated Search provides teams the flexibility and freedom of choice when it comes to their consumption models and security data architectures. For instance, there is absolutely nothing wrong with keeping your Sentinel or Defender XDR investment, and for all intents and purposes, you should keep it! Using Query Federated Search, you let our platform be the expert KQL author and data retriever atop your Defender telemetry.

For data that is a bit harder to normalize or that comes with API limitations, leave the data in place! For instance, you can get access to all of the different device data from Entra ID, Intune, and MDE without having to move it out of the API. That way you are guaranteed to have the most up-to-date data whenever you search for it, and have one less pipeline or orchestrator that you need to configure and monitor. You do not need to worry about deduplication or pivoting, or making sense of the data, as our normalization into QDM puts all like data into the same Attributes. Whether the identifier is an id, a machineId, an aadDeviceId, or a device_id, in Query it is simply a Resource UID to pivot from and visualize.
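The idea of collapsing differently named identifiers into one attribute can be shown with a small Python sketch. The four field names come from the paragraph above. The `resource_uid` key, the priority order, and the function itself are assumptions made for illustration and do not represent Query's actual normalization logic.

```python
# Candidate identifier fields in priority order. The names come from the text above;
# the ordering is an assumption made for this sketch.
ID_FIELDS = ("aadDeviceId", "machineId", "device_id", "id")


def to_resource_uid(record: dict) -> dict:
    """Return a copy of the record with the first identifier found exposed as resource_uid."""
    for field in ID_FIELDS:
        value = record.get(field)
        if value:
            return {**record, "resource_uid": str(value)}
    raise KeyError(f"no known identifier field in record: {sorted(record)}")
```

For example, `to_resource_uid({"machineId": "m1"})` and `to_resource_uid({"device_id": "m1"})` both produce a record whose `resource_uid` is `"m1"`, so downstream pivots only ever need to look at one key.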

If you are using Defender XDR, you can continue to consume your unified Incidents from Sentinel directly with the API. After your search concludes with Query, you can use your same automation to close the result and append the search history or a saved search from Query into the comments when closing it. If you struggle to consume disparate data from Defender SKUs, you can search it from Query without having to move it out of the Microsoft Graph. Even more impactful is that Query emulates search mechanisms and operators that Microsoft does not natively provide. For instance, if you wanted to search for Alerts where a specific Command Line argument, or a specific set of Resources (by IP, hostname, ID, or otherwise) are implicated, you can easily fulfill that using Query Federated Search.

Consider the following two reference architectures. The first illustrates some of the interconnectedness of the various Microsoft security tools across the various plans and license tiers and shows Query Federated Search at the center of all of the related-but-decentralized data. This allows you to use your full Microsoft security investment and likely be able to gain access to different data sources that are not necessarily easy to centralize into Sentinel or Azure Log Analytics, without needing to migrate off of Sentinel.

However, not everyone uses the same license tiers, nor does everyone centralize data into Sentinel. This can be because of associated costs or an unwillingness to swap to a different SIEM or query language (KQL) from their current investments. A popular SIEM choice for use with Microsoft data is Google SecOps SIEM (formerly known as Google Chronicle). SecOps SIEM offers many out-of-the-box Microsoft ingestion Feeds (what Google calls their connectors) to absorb important sources of logging and assets, and allows you to query them all with the Unified Data Model (UDM) query language.

Query can still interoperate in this model by keeping the current ingestion in place, or optionally, allowing you to start to draw down the amount of data sent across the wire into SecOps to make room for other log sources. Your SOC can also unify their view across the other Microsoft data which may not make it into the SecOps SIEM with the benefit of the UDM data and Microsoft data formatted into the same QDM schemas for quicker decision making.

Notice that the Feeds for Microsoft data into Google SecOps SIEM have some differences in the data available for consumption versus the Query Federated Search Connectors. The differences are only slight, but the overlaps are plentiful. Working backwards from the use cases in your SOC (informed by threat modeling and threat intelligence) as well as the individual data coming from each feed can inform your SOC about what feeds to keep active. You can slowly move over to keeping certain data in place (such as Intune and Alerts) while keeping the Entra ID logs in Google SecOps SIEM.

Likewise, you can access data in parallel that you may not have access to in your SIEM without processing. By not needing to maintain custom log forwarding, ETL, and/or orchestration infrastructure you can free up your SecDataOps and Detection Engineering teams to perform more valuable work such as getting bespoke log sources onboarded, creating better detection artifacts, or working on response and recovery efforts.

Whether your data is centralized, decentralized, or somewhere in between, Query Federated Search can greatly bolster decision support within your SOC and wider organization when it comes to Microsoft security data.

Conclusion

In this blog you learned about Microsoft Security services, both direct security tooling and security-relevant tooling and sources. You also learned about the different plans and licensing tiers, and how this data is consumed and interconnected across consoles and APIs. Finally, you learned about what federated search is, and how Query Federated Search can help your SOC access and utilize Microsoft security data.

While the reference architectures covered did not account for every possible variation of data architectures or log sources, hopefully this illustrates the various overlaps and consumption models that can be supported. Whether you are a massive Microsoft Azure or M365 E5 shop, or only consume a handful of Azure and M365 services, this data is better together when used with Query Federated Search.

If you want to see how Query Federated Search works in real time with this disparate Microsoft data, initiate a Proof of Value with our federated search platform, or learn more about SecDataOps reference architectures and workshops, reach out to our sales team! Operators are standing by to help you more efficiently operate with security data.

Stay Dangerous

Contributed by:

Jonathan Rau

VP/Distinguished Engineer, Query