November 16, 2023

Context-Based Data Enrichment for Cybersecurity Investigations

It is said that ‘Knowledge is Power.’ For an analyst investigating an alert, having an extra boost of contextual knowledge can be liberating. Let’s look at how we can incorporate additional sources of knowledge in our alert investigation workflow. The truth will set you free!

Dealing with a high volume of raw alerts?

According to Trend Micro, 70% of SOC teams are emotionally overwhelmed by alert volume. Raw alerts, in isolation, are akin to a handful of puzzle pieces without the accompanying picture. They may hint at a part of the story but often lack the detail required for understanding. This is where the power of context-based data enrichment becomes relevant. By layering raw data with contextual insights collected from internal and external sources, analysts can act on the alert more efficiently.

How context-based data enrichment can help prioritize

Augmenting key data points (e.g., an IP address, domain name, or file hash) with additional context paints a fuller picture and helps the analyst triage an alert more effectively. The augmentation can come from a myriad of sources – internal telemetry, identity and asset repositories, cyber threat intelligence (CTI), and, critically, Open Source Intelligence (OSINT). Let’s dig a bit further into enrichment from OSINT, i.e., intelligence from analyzing public information sources…
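Before anything can be enriched, the key data points have to be pulled out of the raw alert. Here is a minimal sketch of that step, assuming the alert arrives as free text. The function name `extract_indicators` and the regular expressions are illustrative assumptions, not a production-grade extractor (for example, the domain pattern would also match file names such as `report.pdf`).

```python
import ipaddress
import re

# Hypothetical, deliberately simple patterns for illustration only.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def extract_indicators(text: str) -> dict[str, list[str]]:
    """Return de-duplicated IPv4 addresses, domains, and SHA-256 hashes found in text."""
    ips = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            # Reject strings that look like IPs but are not valid (e.g. 999.1.1.1).
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        ips.append(candidate)

    domains = [match.lower() for match in DOMAIN.findall(text)]
    hashes = [match.lower() for match in SHA256.findall(text)]

    # dict.fromkeys removes duplicates while preserving first-seen order.
    return {
        "ipv4": list(dict.fromkeys(ips)),
        "domain": list(dict.fromkeys(domains)),
        "sha256": list(dict.fromkeys(hashes)),
    }
```

Each extracted value then becomes a lookup key for the enrichment sources discussed in this post.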

Enrichment from OSINT Tools

OSINT tools, whether community or commercial, are increasingly relevant reference sources for analysts. Let’s look at some examples of the context they provide:

Advanced Domain and IP Analysis:

Beyond the basic Whois lookup, tools like Shodan, Censys, and RiskIQ (now Microsoft Defender Threat Intelligence) offer a granular view of domain relationships, IP histories, and even digital certificates. For instance, an IP might not just be linked to a malicious server but could be part of a larger network of compromised nodes.

Harnessing Threat Intelligence Platforms:

A sudden surge in discussions around a specific software vulnerability in underground forums can be a precursor to an imminent attack. Platforms like AlienVault OTX, MISP, Recorded Future, and IntSights provide threat landscapes based on current data.

Location and Time Context:

Tools like MaxMind or IP2Location can provide geospatial data on IP addresses. This can be crucial in understanding if an attack is originating from a region known for cybercrime. Additionally, timing patterns (e.g., attacks that consistently occur at off-peak hours) can offer clues about the threat actor’s location or intent.
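The timing side of this can be sketched in a few lines. The example below is an assumption-laden illustration: it takes timezone-aware event timestamps and reports what fraction fall outside business hours for a chosen UTC offset. The function name `off_peak_ratio` and the 09:00 to 18:00 default window are hypothetical choices, not a standard.

```python
from datetime import datetime, timedelta, timezone


def off_peak_ratio(timestamps, utc_offset_hours=0, start_hour=9, end_hour=18):
    """Fraction of events outside [start_hour, end_hour) in the given UTC offset.

    Assumes every timestamp is timezone-aware; naive datetimes would be
    interpreted in the local system timezone by astimezone().
    """
    if not timestamps:
        return 0.0
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    off_peak = sum(
        1
        for ts in timestamps
        if not start_hour <= ts.astimezone(local_tz).hour < end_hour
    )
    return off_peak / len(timestamps)
```

A consistently high ratio for one source is a hint worth attaching to the alert; on its own it proves nothing about the actor's location or intent.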

Code Repositories and Their Hidden Secrets:

Beyond just GitHub or Pastebin, there are myriad platforms where threat actors might inadvertently (or intentionally) leave traces. Tools like Gitrob, PasteHunter, and Gitleaks can be configured to continuously monitor and alert on suspicious code commits, configuration files, or data dumps that match predefined patterns or IoCs.
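To make "match predefined patterns or IoCs" concrete, here is a small sketch of the underlying idea. It is not how any of the tools named above are implemented; the rule names, the two regular expressions, and the `scan_text` function are assumptions chosen for illustration, and real scanners ship far larger, tuned rule sets.

```python
import re

# Hypothetical rule set for illustration only.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text, iocs=()):
    """Return (line_number, rule_name) pairs for every pattern or IoC hit."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Pattern rules: secrets or key material that should never be public.
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
        # IoC rules: plain substring match against known indicators.
        for ioc in iocs:
            if ioc in line:
                findings.append((line_number, f"ioc:{ioc}"))
    return findings
```

Running this over each new commit or paste, and raising an alert whenever the list is non-empty, is the essence of the continuous monitoring described above.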

While platforms like Twitter and Reddit are valuable, hackers are using Tor to untraceably access underground forums. Tools like Cybersixgill probe the dark web for mentions of emerging threats, attack plans, or zero-day exploits. If you are red-teaming, Recon-ng provides web-based open source reconnaissance, and theHarvester harvests emails, subdomains, and names.

Did we just create more work?

We talked about a lot of toolsets above. We have to be careful about how we incorporate these tool categories into the analyst’s investigation workflow. Otherwise, we run the risk of creating more work, the very opposite of what we were trying to achieve.

The naive approach is for the analyst to open multiple consoles, run individual searches, and manually compile the results in their head, on a notepad, or in a ticket. That can lead to a tiring scatterbrain syndrome; see our blogs Why So Many Tabs? and Measuring Analysts Searches per Investigation (ASPI).

Alternatively, one would expect their SIEM to solve this problem. But does it?

Does your SIEM bill increase with enrichment data volume? And does your SIEM look up live intelligence?

The vision of SIEM was that it would collect information from all relevant sources, and thus provide all the context that the analyst would need to triage an alert.

I am sure everyone would agree that the vision has fallen short. I blame the word ‘collect’ above. SIEM vendors were so tuned to the model of their collection and storage pipeline – thanks to their volume-based pricing model – that they even applied it to intelligence sources. After all, if the SIEM is charging by data size, the more intelligence data you add, the higher the bill! Unfortunately, collected intelligence quickly becomes outdated, which is a problem when there is a zero-day threat. Who wants intelligence that is (a) expensive, and (b) stale? SIEMs should be able to investigate and correlate with the latest intelligence in real time, while the analyst is investigating. That is possible via APIs.
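One way to picture the difference between stored and live intelligence is a lookup that refuses to serve anything older than a short freshness window. The sketch below is an assumption, not a description of any SIEM: `FreshIntelCache` is a hypothetical class, and `fetch` stands in for whatever API call retrieves intelligence from the source.

```python
import time
from typing import Callable


class FreshIntelCache:
    """Serve intelligence on demand, re-fetching once an entry is too old."""

    def __init__(
        self,
        fetch: Callable[[str], dict],  # hypothetical live API lookup
        max_age_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def lookup(self, indicator: str) -> dict:
        now = self._clock()
        cached = self._entries.get(indicator)
        # Reuse the cached answer only while it is inside the freshness window.
        if cached is not None and now - cached[0] <= self._max_age:
            return cached[1]
        # Otherwise go back to the source for current intelligence.
        result = self._fetch(indicator)
        self._entries[indicator] = (now, result)
        return result
```

Setting `max_age_seconds` to zero effectively turns every lookup into a live query; a small positive value trades a little staleness for fewer API calls during a single investigation.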

The inability of SIEMs to look up live intelligence led to some customers installing SOAR. Let’s talk about that…

Did your SOAR solve it for you?

SOAR did give analysts access to live intelligence and the ability to attach it to the alert. However, the challenge has been the playbook implementations. In most scenarios, SOAR requires familiarity with either Python or a proprietary macro language. SOAR seems to be a sore subject – see our survey results Top SOAR: Learnings, Successes, and Challenges.

So now what? There is another option: Federated Search.

Have you tried Federated Search?

Federated Search would let analysts simultaneously search any OSINT and also internal sources to provide contextual enrichment. Analysts can choose what sources to query, based upon the event(s) they may be triaging. The analyst gets the most up-to-date information on-demand from the source.
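The core mechanic, fanning one query out to several sources at once, can be sketched as follows. This is a simplified illustration under stated assumptions, not any vendor's implementation: each "connector" is a hypothetical function that takes the query string and returns a list of hits, and each connector is expected to enforce its own network timeout.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A connector is any callable that queries one source (hypothetical interface).
Connector = Callable[[str], list[dict]]


def federated_search(query: str, connectors: dict[str, Connector]) -> dict[str, dict]:
    """Run the same query against every selected source in parallel."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        # Submit all lookups first so they run concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in connectors.items()}
        for name, future in futures.items():
            try:
                results[name] = {"ok": True, "hits": future.result()}
            except Exception as exc:
                # One failing source should not hide results from the others.
                results[name] = {"ok": False, "error": str(exc)}
    return results
```

The analyst's choice of sources maps onto the `connectors` dictionary: pass only the connectors relevant to the event being triaged.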

An IP address that’s linked to a Tor exit node, has a history of malicious activity, and originates from a high-risk region would visually show up with that detail, allowing analysts to prioritize it above a generic alert.
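As a rough illustration of how those three signals could feed prioritization, consider the sketch below. The weights, the `IpContext` fields, and the `priority_score` function are all hypothetical; any real scoring model would need tuning against your own alert history, and the list of high-risk regions is a policy decision rather than a fact the code can supply.

```python
from dataclasses import dataclass


@dataclass
class IpContext:
    """Enrichment gathered for one IP address (hypothetical shape)."""
    is_tor_exit_node: bool = False
    prior_malicious_reports: int = 0
    country_code: str = ""


# Hypothetical weights, chosen only so the three signals sum to 100.
TOR_WEIGHT = 40
HISTORY_WEIGHT = 35
REGION_WEIGHT = 25


def priority_score(ctx: IpContext, high_risk_countries: set[str]) -> int:
    """Add up the weights of every risk signal present in the context."""
    score = 0
    if ctx.is_tor_exit_node:
        score += TOR_WEIGHT
    if ctx.prior_malicious_reports > 0:
        score += HISTORY_WEIGHT
    if ctx.country_code.upper() in high_risk_countries:
        score += REGION_WEIGHT
    return score
```

An alert whose IP scores 100 would sort above one that scores 0, which is the "prioritize above a generic alert" behavior described here.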

Some vendors’ federated search may turn out to be vanilla textual search. Make sure to use one that can normalize data and run relevant queries with an understanding of common cybersecurity schemas, can correlate results across platforms, and can run follow-up searches automatically. Here is a guide on how you can evaluate federated search for security.
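To show why normalization matters for correlation, here is a toy sketch. The alias table and field names are invented for illustration and do not correspond to any particular schema or product; the point is only that records from different platforms cannot be joined until they agree on what a field is called.

```python
# Hypothetical alias table: canonical field -> names different sources might use.
FIELD_ALIASES = {
    "ip": ("ip", "src_ip", "SourceIp", "ip_address"),
    "verdict": ("verdict", "classification", "Reputation"),
}


def normalize(source: str, record: dict) -> dict:
    """Map one source-specific record onto the canonical field names."""
    normalized = {"source": source}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in record:
                normalized[canonical] = record[alias]
                break
    return normalized


def correlate_by_ip(records: list[dict]) -> dict[str, list[dict]]:
    """Group normalized records from all platforms by IP address."""
    grouped: dict[str, list[dict]] = {}
    for record in records:
        ip = record.get("ip")
        if ip is not None:
            grouped.setdefault(ip, []).append(record)
    return grouped
```

A plain textual search would treat `SourceIp` and `src_ip` as unrelated strings; once both are normalized, results about the same address line up.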

In Conclusion

In the ever-evolving threat landscape, enriching the context around an alert is king. Context-based data enrichment, powered by the right mix of OSINT tools and integrations, will ensure that analysts can get the relevant information on the alert they are triaging. Use federated search to connect all the dots from all relevant data sources and get a 360-degree view to understand the full story.

Contributed by:

Dhiraj Sharan

Chief Scientist & Founder, Query