March 5, 2024

Amazon S3 Buckets Integrated Into Query To Enrich Federated Search

Amazon Athena for S3

Amazon Athena is a serverless analytics service on the Amazon Web Services (AWS) Cloud, built on Trino and Presto, that lets you interactively analyze and query data stored in Amazon Simple Storage Service (S3) buckets. Athena works with tables registered in the AWS Glue Data Catalog, with open table formats such as Apache Iceberg and Delta Lake, and with file formats such as plaintext (.txt), Apache Parquet, Apache Avro, Apache ORC, JSON, XML, CSV, and more.

Query integrates with Amazon Athena to query data within Amazon S3 buckets. This data can range from general information technology (IT) data to specialized security data, including high-cost, high-volume logs such as authorization logs, packet captures, DNS or DHCP traffic, and telemetry from Endpoint Detection & Response (EDR) or Host-based Intrusion Detection/Prevention Systems (HIDPS).

Within the Query Federated Search platform, this data source is considered a Dynamic Schema platform because the data can be in any format you choose. To help you onboard your data quickly, Query provides a no-code Configure Schema workflow that lets you introspect tables, auto-discover schema mapping opportunities and time-based partitions, and map the source data into the Query Data Model (QDM). This allows you to model nearly any logging, event, or asset data stored within an Amazon Athena database and table. Refer to the Configure Schema section of the Query product docs for more information.

Query’s integration with Amazon Athena and S3 allows analysts to search for the following entities:

IP Addresses (IPv4 and IPv6)
Domains & Hostnames
URLs & URIs
Email Addresses
Usernames & User IDs
File Hashes (e.g., MD5, SHA1, SHA256, etc.)
File Names or Directories
Resource IDs (e.g., Agent or Device IDs, cloud resource IDs)
Process Names
MAC Addresses

For example, the analyst could obtain the following context:

Because Amazon S3 with Athena is considered a “green field” for data, security architects and engineers can decide which high-volume data sources to store there and make searchable.
Network flow logs, Windows event data, and DNS data, for instance, are rather large (noisy) telemetry sources. You could put any of them into an Amazon S3 with Athena database and then search for any of the entities above with Query.

To integrate Amazon S3 with Athena, see the integration documentation in the Query product docs.

The Query Federated Search Connector for Amazon Athena (for Amazon S3) can use any table or view registered in the AWS Glue Data Catalog, whether you created it with a CREATE TABLE statement in Athena SQL DDL, with AWS Glue Crawlers, or by other means. Query supports any table format, including Glue, Snowflake and Delta Lake, though bugs may occur if there are nuances with data types or operators.
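As a rough illustration of the CREATE TABLE route, the following Python sketch assembles an Athena DDL statement for Parquet files in S3. The helper name build_create_table_ddl, the database, table, and column names, and the bucket path are all hypothetical and chosen only for illustration; your own tables will differ, and this is not taken from the Query documentation.

```python
def build_create_table_ddl(database, table, columns, partitions, location):
    """Return a CREATE EXTERNAL TABLE statement for Parquet data in S3.

    columns and partitions are lists of (name, athena_type) pairs.
    All names used here are illustrative assumptions.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n)\n"
    )
    if partitions:
        # A time-based partition column lets searches prune data by date.
        parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions)
        ddl += f"PARTITIONED BY ({parts})\n"
    ddl += f"STORED AS PARQUET\nLOCATION '{location}'"
    return ddl


# Hypothetical DNS log table partitioned by a date string.
ddl = build_create_table_ddl(
    database="security_lake",
    table="dns_logs",
    columns=[
        ("event_time", "timestamp"),
        ("src_ip", "string"),
        ("query_name", "string"),
    ],
    partitions=[("dt", "string")],
    location="s3://example-bucket/dns-logs/",
)
print(ddl)
```

Running the resulting statement in the Athena query editor would register the table in the Glue Data Catalog, which is where the connector looks for tables and views.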

The integration normalizes data pulled from Amazon S3 into Query’s OCSF-based QDM, which then enables cross-platform joins, compounding the analyst’s ability to investigate. Query normalizes S3 data into QDM based on the schema mapping outlined by the analyst. With the federated join capabilities, the analyst can then see context on an entity pulled from the other data sources Query is integrated with.
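To make the idea of a schema mapping concrete, here is a minimal Python sketch of mapping-driven normalization. It is not Query's implementation: the SCHEMA_MAPPING dictionary, the source column names, and the dotted target paths are assumptions that only imitate the nested style of an OCSF-like model.

```python
from typing import Any

# Hypothetical mapping from source column names to dotted target paths.
SCHEMA_MAPPING = {
    "event_time": "time",
    "src_ip": "src_endpoint.ip",
    "query_name": "query.hostname",
}


def normalize(row: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Copy mapped columns from a flat row into a nested record."""
    out: dict[str, Any] = {}
    for source, target in mapping.items():
        if source not in row:
            continue  # the column is absent from this row; skip it
        *parents, leaf = target.split(".")
        node = out
        for key in parents:
            # Create intermediate objects such as "src_endpoint" on demand.
            node = node.setdefault(key, {})
        node[leaf] = row[source]
    return out


record = normalize(
    {"src_ip": "10.0.0.5", "query_name": "example.com", "bytes": 512},
    SCHEMA_MAPPING,
)
print(record)
```

Columns without a mapping entry (bytes in this example) are dropped, so every source ends up in one shared shape, which is what makes a join on a common entity field possible.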

Using the Query Federated Search platform, you can quickly surface any supported entity that matches in your data, such as IP addresses, process names, domains, hostnames, MAC addresses, and more. The platform crafts Athena SQL statements on your behalf to pull out exactly the data you require. All results are normalized, deduplicated, correlated, and enriched so that an entity-based search surfaces similar data points across all of your onboarded resources.
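The kind of statement involved can be pictured with a short Python sketch that builds an entity search restricted to a time-based partition. This is an assumption-laden illustration rather than the SQL Query actually generates: the function build_entity_search and the table and column names are hypothetical, and the ? placeholders assume the values are supplied separately through a parameterized query.

```python
import re

# Only plain identifiers may be interpolated into the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_entity_search(table, entity_column, partition_column,
                        value, start, end, limit=100):
    """Return (sql, params) for an entity lookup bounded by a partition range."""
    for name in (table, entity_column, partition_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = (
        f'SELECT * FROM "{table}" '
        f'WHERE "{entity_column}" = ? '
        f'AND "{partition_column}" BETWEEN ? AND ? '
        f"LIMIT {int(limit)}"
    )
    # Values travel separately from the SQL text instead of being inlined.
    return sql, [value, start, end]


sql, params = build_entity_search(
    table="dns_logs",
    entity_column="src_ip",
    partition_column="dt",
    value="10.0.0.5",
    start="2024-03-01",
    end="2024-03-05",
)
print(sql)
print(params)
```

Filtering on the partition column keeps the scan limited to the relevant date range, which matters for the high-volume log sources described earlier.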

Contributed by:

Query

Simplifying Search