Public cloud and networking make for odd bedfellows. Cloud networking is not just the virtualization of networking. In traditional environments, appliances and network taps monitor traffic directly, and at OSI Layers 1 through 4 you can tap those devices to pull telemetry. In the cloud, the underlying network is virtualized and operated by the provider, which makes direct monitoring more complex. Amazon VPC Flow Logs are an agentless source of network logging for OSI Layers 3 and 4 that helps SRE, NetOps, SecOps, and platform teams alike.
By its very nature, the public cloud (Amazon Web Services, Microsoft Azure, Google Cloud Platform) does not allow you to use traditional appliance-based monitoring at the lower levels of the OSI model, as there is nowhere to “plug in.” There are ways to physically connect to the cloud backbones, such as AWS Direct Connect, but Amazon VPC Flow Logs are the easiest logs to procure and are as close to Layer 3 and 4 traffic capture as you can get without using virtual appliances.
Understanding Flow Logs
Amazon VPC Flow Logs enable you to capture information about the IP traffic going to and from network interfaces in your VPC.
Amazon VPC Flow Logs still describe OSI Layers 3 and 4. However, they are recorded out-of-the-loop, outside the path of your traffic, and are therefore an abstraction: they contain flow metadata rather than captured packets, and they are less complete than what you would get from running virtual taps on the machines themselves, where that is possible.
AWS has the concept of an Elastic Network Interface (ENI), which is a virtual Network Interface Card (vNIC) that has paravirtual attachments to instances, database services, and even AWS Lambda functions which are configured to run in a VPC. With that in mind, there is not always a host on which you can install a sensor or agent, which is why the ENI distinctions are so important in the Flow Logs themselves.
Capturing out-of-the-loop matters for performance: collection adds no latency and consumes no bandwidth on the monitored interfaces. The main purpose of a Flow Log record is to capture an IP traffic flow, characterized by its 5-tuple (source address, destination address, source port, destination port, and protocol) per ENI. By default, flows are aggregated over windows of up to 10 minutes for most ENIs; the window can be lowered to 1 minute, and for Nitro-based instances it is always 1 minute or less.
Before getting too far ahead, let’s examine the basic format of an Amazon VPC Flow Log record. In plain-text form, each record is a single space-separated line.
2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
- “2” – The VPC Flow Log version. Each version adds fields to the VPC Flow Logs; version 2 is the default.
- “123456789010” – AWS account ID. This is the account that owns the network interface being recorded, and it is the account the Flow Log is accessed from.
- “eni-1235b8ca123456789” – The ID of the Elastic Network Interface. Remember, this can simply be the vNIC of an EC2 instance or represent a NAT Gateway or any other ENI-backed resource within a VPC.
- “172.31.16.139” – Source IP address, where the traffic originated.
- “172.31.16.21” – Destination IP address, where the traffic is targeted or flowing through.
- “20641” – Source IANA port number, this is a typical ephemeral port
- “22” – Destination IANA port number, in this case, the traffic is bound for TCP 22 (Secure Shell [SSH])
- “6” – IANA-assigned IP protocol number (6 corresponds to TCP). Other common values are 17 (UDP) and 1 (ICMP); Remote Desktop Protocol traffic appears as TCP or UDP on port 3389 rather than under its own protocol number.
- “20” – Number of packets exchanged
- “4249” – Number of bytes exchanged
- “1418530010” – Start of the capture window (Unix/Epoch timestamp)
- “1418530070” – End of the capture window (Unix/Epoch timestamp)
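- “ACCEPT” – Action taken regarding the traffic (ACCEPT/REJECT)
- “OK” – Log status (OK means logging was successful; NODATA means there was no traffic to or from the interface during the capture window; SKIPDATA means some records were skipped during the window)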
In summary, from this example log entry, we can gather that a network interface with ID eni-1235b8ca123456789 in AWS account 123456789010 had a flow of traffic between IP addresses 172.31.16.139 and 172.31.16.21, using TCP protocol (protocol number 6) over ports 20641 and 22, respectively. The traffic was allowed (ACCEPT), and it involved 20 packets and 4249 bytes during the capture window between the timestamps 1418530010 and 1418530070. The log status is OK, indicating that the log entry was created successfully.
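To make the field-by-field breakdown concrete, here is a minimal Python sketch that splits the example record into named fields. It assumes the default version 2 ordering described above; the field names and the `parse_default_record` helper are illustrative choices, not part of any AWS SDK.

```python
from datetime import datetime, timezone

# Field order of the default (version 2) format, as described above.
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_default_record(line):
    """Split one default-format record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(DEFAULT_FIELDS, values):
        if value == "-":
            # Records without data carry "-" in place of missing values.
            record[name] = None
        elif name in INT_FIELDS:
            record[name] = int(value)
        else:
            record[name] = value
    return record


sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_default_record(sample)

# The capture window is the difference between the two epoch timestamps.
window_seconds = record["end"] - record["start"]
started_at = datetime.fromtimestamp(record["start"], tz=timezone.utc)
print(record["interface_id"], record["dstport"], record["action"])
print(window_seconds, started_at.isoformat())
```

Converting the numeric fields up front keeps later aggregation (summing bytes, comparing ports) free of string handling.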
VPC Flow Logs come in two formats: a default format and a custom format. The custom format lets you specify the field ordering and add metadata fields from later versions of the schema, using any of the fields below. As of the publication of this blog, Amazon VPC Flow Logs are at Version 5, which is reflected below. Version 2 fields make up the default format and ordering; you must use a custom format to select any field from Version 3, 4, or 5 (and later versions) or to change the ordering. No two flow logs may end up looking the same, so ensure your changes are well-documented and well-governed.
version srcaddr dstaddr srcport dstport protocol start end type packets bytes account-id vpc-id subnet-id instance-id interface-id region az-id sublocation-type sublocation-id action tcp-flags pkt-srcaddr pkt-dstaddr pkt-src-aws-service pkt-dst-aws-service traffic-path flow-direction log-status
Given that VPC Flow Logs are space-separated, it is important that you apply consistency in how the VPC Flow Logs are constructed throughout your environment to improve outcomes for analysis and data work.
It is worth creating a Flow Log with every field enabled at least once to see what the data looks like in your environment, then trimming the fields that add no value. If your organization configures (relatively) flat networks, for instance without Transit Gateways or multiple ENIs per instance, it may not make sense to capture fields such as “pkt-dstaddr”. If you do not use AWS Local Zones or AWS Wavelength Zones, then “sublocation-id” will not be populated. The full schema is available from the Flow log records section of the Amazon Virtual Private Cloud User Guide.
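Because a custom format can reorder and extend the fields, a parser has to be driven by the same format definition the Flow Log was created with. The sketch below assumes the format is written as a `${field} ${field}` string; the helper names, the chosen fields, and the sample record values are made up for illustration.

```python
import re


def fields_from_log_format(log_format):
    """Return the field names, in order, from a '${a} ${b}' style string."""
    return re.findall(r"\$\{([a-z0-9-]+)\}", log_format)


def parse_custom_record(line, field_names):
    """Map one space-separated record onto the given field order."""
    values = line.split()
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, got {len(values)}")
    return dict(zip(field_names, values))


# Hypothetical custom format: a subset of fields in a non-default order.
LOG_FORMAT = ("${version} ${interface-id} ${srcaddr} ${dstaddr} "
              "${pkt-srcaddr} ${pkt-dstaddr} ${flow-direction} ${action}")

# Made-up record matching the format above.
line = ("5 eni-0123456789abcdef0 10.0.1.5 10.0.2.9 "
        "10.0.1.5 203.0.113.7 egress ACCEPT")

names = fields_from_log_format(LOG_FORMAT)
record = parse_custom_record(line, names)
print(record["dstaddr"], record["pkt-dstaddr"], record["flow-direction"])
```

Keeping the format string in one shared place (version control, a parameter store) means producers and parsers cannot silently drift apart.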
Creation and Collection
Proper interpretation begins with proper governance for how Flow Logs are created and collected. This begins with automation and orchestration constructs that create and manage them for every VPC in an AWS environment, apply a consistent ordering and specification of fields, and centrally collect them. Consistency matters even more when logs are streamed as plain text because, unlike Parquet, space-separated values do not carry their field names, so every consumer must know the field order in advance. Lastly, the point of collection should be standardized, though you can diversify this, as up to five Flow Log configurations are supported per VPC.
As of this writing, AWS supports three destinations for VPC Flow Logs. The first is Amazon CloudWatch Logs, a near real-time log streaming service that is useful for quick and basic analysis. CloudWatch Logs Insights allows analysts to interactively search and analyze logs using a query language built on pipes and basic commands, with support for statistical and mathematical operations. CloudWatch Logs also supports Subscription Filters, which extend the streaming of logs to additional destinations such as Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, and AWS Lambda functions. This can be useful for cross-Account aggregation or for sending logs to log collection services such as Amazon OpenSearch Service.
The second destination is Kinesis Data Firehose directly, instead of proxying to it through CloudWatch Logs and Subscription Filters. Firehose is a fully managed streaming service with support for multiple destinations of its own. It supports custom buffering strategies for writing batches of data, record conversion (such as JSON to Parquet), and dynamic partitioning, depending on the destination. If you want to hydrate a data lake built on S3 or perform record conversion en route to a SIEM or log aggregation service, then Firehose is a great selection.
Lastly, you can send Flow Logs to Amazon S3 which is well suited for long term storage and archival, as well as the basis for a data lake. To this end, you can simply dump the raw text logs into S3, or let the Flow Log service convert the records into Apache Parquet, which is a columnar data format that allows for fast reads and is very efficiently compressed – leading to faster query times and cheaper storage costs, respectively. Additionally, you can choose to use Hive-compatible partitions which will change how the folders (S3 prefix) are assembled. When using S3 as a destination, you can perform managed queries against the data using Amazon Athena which is a Presto/Trino-based managed service for efficiently querying and analyzing big data stored in S3 using SQL.
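One way to keep creation consistent is to generate every Flow Log request from a single field list. The sketch below builds the keyword arguments for the EC2 `CreateFlowLogs` API as exposed by boto3's `create_flow_logs`; the parameter names are given as I understand them and should be checked against the current boto3 documentation. The VPC ID, bucket ARN, field selection, and helper name are hypothetical.

```python
# One agreed-upon field order for the whole environment (illustrative choice).
FIELD_ORDER = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes", "start", "end",
    "action", "log-status", "pkt-srcaddr", "pkt-dstaddr", "flow-direction",
]


def build_flow_log_request(vpc_id, bucket_arn, fields=FIELD_ORDER):
    """Build keyword arguments for a VPC-level Flow Log delivered to S3."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "ALL",  # capture accepted and rejected traffic
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
        # Custom format string: "${version} ${account-id} ..."
        "LogFormat": " ".join(f"${{{name}}}" for name in fields),
        "MaxAggregationInterval": 60,  # seconds
        "DestinationOptions": {
            "FileFormat": "parquet",
            "HiveCompatiblePartitions": True,
            "PerHourPartition": True,
        },
    }


# Hypothetical identifiers, for illustration only.
request = build_flow_log_request(
    "vpc-0123456789abcdef0", "arn:aws:s3:::example-flow-log-bucket")
print(request["LogFormat"])

# With credentials configured, the request could then be submitted with:
#   import boto3
#   boto3.client("ec2").create_flow_logs(**request)
```

Generating the request from one list means every VPC gets the same fields in the same order, which is what makes downstream parsing and Athena table definitions reusable.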
To derive value from Flow Logs, it is also important to understand the limitations. You cannot collect logs from VPCs peered outside of your AWS Account. If multiple ENIs, Transit Gateway, or same-Account Peering are used and “pkt-dstaddr” and “pkt-srcaddr” are not captured, you will be unable to see the intended destination and original source in some scenarios. Additionally, traffic to and from AWS’ built-in DNS, NTP, IMDS, DHCP, and Windows License Activation services is not captured, nor is mirrored traffic, traffic to the default VPC router (and other reserved IP addresses), or traffic between endpoint ENIs and Network Load Balancers.
If it is important for your use case to capture this type of activity at Layers 3 and 4, you will have to use host-based solutions such as forcing traffic through a forward proxy like Squid, using host-based IDS/IPS such as Suricata, or using other virtualized appliances at the extreme edges of your Amazon VPC networking construct. This too will need to be properly governed, maintained, cost-controlled, and data engineered to be able to correlate with and enrich AWS VPC Flow Logs.
The scenarios in which Flow Logs are useful depend on the persona. NetOps and platform teams most regularly use them for troubleshooting and network metrics collection, whereas SecOps and other security professionals may also use them for troubleshooting, proving controls, and as part of investigations. This can include the following use cases:
- Diagnosing and troubleshooting AWS Security Groups and Network Access Control Lists
- Monitoring general reachability across ports and protocols to internet-facing destinations
- Monitoring the usage of specific ports (e.g., SMB, SSH, RDP, FTP, etc.)
- Determining traffic direction to monitor specific ingress or egress use cases.
- Monitoring what IP version (IPv4 or IPv6) is being used, or auditing for Elastic Fabric Adapter usage.
- Identifying “top talkers” by aggregating packets or bytes transferred during a flow.
For cybersecurity use cases, Flow Logs are some of the most important logs you can collect in an AWS environment, despite the inherent limitations of virtualization, as well as the types and sources of traffic that are not included. You would be hard-pressed to find an easier to retrieve and parse agentless log source for network data in your AWS VPCs.
Use Case: Network Troubleshooting
VPC Flow Logs are typically used for network troubleshooting, such as tracing traffic dropped due to AWS Security Group rules. This is easy to do by analyzing the direction, source, and destination addresses and ports. Outside of Security Group related troubleshooting, you can analyze packets and bytes to generate a “top talker” list. This analysis can be useful to determine which instances are the target of higher volumes of traffic – either adversarial or as part of a business evaluation. These are important metrics and use cases to develop for baselining other time-series and anomaly-based evaluations. Additionally, for threat hunters, understanding the standard baseline is important to distinguish “business-as-usual” versus novel tradecraft.
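The two analyses described here, finding rejected traffic and ranking “top talkers”, can be sketched in a few lines of Python. The records below are made up, contain only the fields the sketch needs, and the helper names are illustrative.

```python
from collections import Counter

# Made-up, already-parsed records for illustration.
records = [
    {"srcaddr": "10.0.1.5", "dstaddr": "10.0.2.9", "dstport": 443,
     "bytes": 5200, "action": "ACCEPT"},
    {"srcaddr": "10.0.1.5", "dstaddr": "10.0.2.9", "dstport": 443,
     "bytes": 4100, "action": "ACCEPT"},
    {"srcaddr": "10.0.3.7", "dstaddr": "10.0.2.9", "dstport": 22,
     "bytes": 120, "action": "REJECT"},
    {"srcaddr": "10.0.4.2", "dstaddr": "10.0.2.9", "dstport": 443,
     "bytes": 800, "action": "ACCEPT"},
]


def top_talkers(records, key="srcaddr", n=3):
    """Sum bytes per address and return the n largest as (address, bytes)."""
    totals = Counter()
    for r in records:
        totals[r[key]] += r["bytes"]
    return totals.most_common(n)


def rejected_flows(records):
    """Return (source, destination, destination port) for rejected traffic."""
    return [(r["srcaddr"], r["dstaddr"], r["dstport"])
            for r in records if r["action"] == "REJECT"]


talkers = top_talkers(records)
rejects = rejected_flows(records)
print(talkers)
print(rejects)
```

Passing `key="dstaddr"` ranks the busiest targets instead of the busiest senders, which is the view you want when asking which instances receive the most traffic.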
VPC Flow Logs can also be used to troubleshoot automated network changes. While traffic constructs like VPC Route Tables and Transit Gateway are not dynamic by nature, their creation and modification can be orchestrated using DevOps automation tools or custom workflows (such as using AWS Lambda). Additionally, certain virtualized network devices can further modify VPC networking resources and reach across availability zones (AZs) using ICMP. While automated network operations can be useful to maintain high availability of services, they can also create – temporary or not – traffic ‘blackholes’ in your VPCs. VPC Flow Logs can allow you to see where something may have been pointed at the wrong object or simply had nowhere to go.
Use Case: Threat Hunting
Beyond these basic use cases, some threat hunting can be done using Flow Logs as well. Examples include finding accepted outbound traffic from application-related instances on file-transfer related ports, or above-average volumes of data being transferred. Cyber threat intelligence can sharpen these hunts, either by pointing to specific tradecraft, such as an actor who uses a specific port or known destination and chunks the data out, or by supplying Indicators of Compromise (IOCs) such as known destination IP addresses.
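A hunt along these lines can be sketched as a filter over parsed records. Everything specific below is an assumption for illustration: the VPC CIDR, the port list, the byte threshold, the placeholder IOC address (taken from a documentation range), and the sample records.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")     # hypothetical VPC range
FILE_TRANSFER_PORTS = {20, 21, 22, 69, 445, 990}   # illustrative selection
KNOWN_BAD = {"203.0.113.66"}                       # placeholder IOC
BYTES_THRESHOLD = 50_000_000                       # illustrative cutoff

# Made-up, already-parsed records.
records = [
    {"srcaddr": "10.0.1.5", "dstaddr": "203.0.113.66", "dstport": 443,
     "bytes": 1200, "action": "ACCEPT"},
    {"srcaddr": "10.0.1.8", "dstaddr": "198.51.100.20", "dstport": 21,
     "bytes": 75_000_000, "action": "ACCEPT"},
    {"srcaddr": "10.0.1.9", "dstaddr": "10.0.2.4", "dstport": 22,
     "bytes": 900, "action": "ACCEPT"},
    {"srcaddr": "10.0.1.9", "dstaddr": "198.51.100.20", "dstport": 21,
     "bytes": 300, "action": "REJECT"},
]


def hunt(records):
    """Flag accepted outbound flows that match any of the hunting criteria."""
    findings = []
    for r in records:
        if r["action"] != "ACCEPT":
            continue
        src = ipaddress.ip_address(r["srcaddr"])
        dst = ipaddress.ip_address(r["dstaddr"])
        # Outbound here means: starts inside the VPC, ends outside it.
        if src not in VPC_CIDR or dst in VPC_CIDR:
            continue
        reasons = []
        if r["dstaddr"] in KNOWN_BAD:
            reasons.append("ioc-match")
        if r["dstport"] in FILE_TRANSFER_PORTS:
            reasons.append("file-transfer-port")
        if r["bytes"] >= BYTES_THRESHOLD:
            reasons.append("large-transfer")
        if reasons:
            findings.append((r["srcaddr"], r["dstaddr"], reasons))
    return findings


findings = hunt(records)
for finding in findings:
    print(finding)
```

In practice, the threshold would come from a baseline of your own traffic rather than a fixed constant, and the IOC set from a threat intelligence feed.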
Use Case: Incident Response
Flow Logs can also be used for post-incident (“post-blast”) assessments or as part of incident response playbooks where the attacker is known to have gained initial entry but has not yet worked their way down the full kill chain and needs to be contained. Flow Logs can paint a picture of network-borne attacks, showing when they were initiated and indicating lateral movement throughout an environment. However, in the cloud, this needs to be paired with AWS CloudTrail logs, as identity is a much more prevalent lateral movement vector.
A simpler investigation loop is interdicting network floods, which can be the byproduct of a malicious Distributed Denial of Service (DDoS) campaign or of malfunctioning software that impacts availability. Many persistent source addresses, or large amounts of traffic at times when other “routine” traffic in your Flow Logs is absent, can be evidence of a flood-style DDoS event.
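As a rough illustration of the flood heuristic, the sketch below counts distinct sources and total packets per destination within one batch of records and flags destinations that exceed both limits. The thresholds, helper name, and sample records are assumptions chosen to keep the example small, not tuned values.

```python
from collections import defaultdict

# Made-up records from a single capture window.
records = [
    {"srcaddr": "198.51.100.1", "dstaddr": "10.0.2.9", "packets": 6000},
    {"srcaddr": "198.51.100.2", "dstaddr": "10.0.2.9", "packets": 5500},
    {"srcaddr": "198.51.100.3", "dstaddr": "10.0.2.9", "packets": 7000},
    {"srcaddr": "198.51.100.1", "dstaddr": "10.0.2.9", "packets": 4000},
    {"srcaddr": "10.0.1.5", "dstaddr": "10.0.2.10", "packets": 40},
]


def flood_candidates(records, min_sources=3, min_packets=10_000):
    """Return destinations hit by many sources with a high packet count."""
    sources = defaultdict(set)
    packets = defaultdict(int)
    for r in records:
        sources[r["dstaddr"]].add(r["srcaddr"])
        packets[r["dstaddr"]] += r["packets"]
    return sorted(
        dst for dst in sources
        if len(sources[dst]) >= min_sources and packets[dst] >= min_packets
    )


candidates = flood_candidates(records)
print(candidates)
```

Requiring both conditions avoids flagging a single chatty but legitimate client, or many clients sending negligible traffic.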
Use Case: Machine Learning Augmentation
Additionally, more advanced use cases using big data statistics and even machine learning algorithms can be applied to Flow Logs. An easy jumping-off point is Z-Scoring, which measures how many standard deviations a single data point lies from the mean of a distribution. In Flow Logs, this relies on defining your own capture periods, such as hourly aggregations, and a specific target metric. The metric can be bytes or packets transferred, the amount of traffic in a specific direction to a collection of specific hosts, ENIs, or IP addresses, or the number of times ports are reached.
Z-Scoring and other baselining also lay the groundwork for machine learning use cases, and are especially well suited to time-series algorithms that can derive anomalies and identify trends such as seasonality or outliers (which may or may not be anomalies in their own right). Amazon has first-party algorithms within Amazon SageMaker, such as Random Cut Forest (RCF), that are well suited for this analysis, and you can go as far as forecasting usage and other trends. Combined with threat models for the downstream applications, actionable cyber threat intelligence, and other supporting telemetry, machine learning can rapidly help close the OODA (Observe, Orient, Decide, Act) Loop for your analysts and detection engineers alike.
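For a value x in a set with mean μ and standard deviation σ, the Z-score is (x − μ) / σ. A minimal sketch over hourly byte totals follows; the numbers are made up, the cutoff of 2 is an arbitrary illustrative choice, and the population standard deviation is used for simplicity.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the Z-score of each value relative to the whole series."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A perfectly flat series has no deviation to measure.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Made-up bytes transferred per hour for one ENI (in megabytes).
hourly_megabytes = [100, 110, 90, 105, 95, 100, 400]

scores = z_scores(hourly_megabytes)
outlier_hours = [i for i, z in enumerate(scores) if abs(z) > 2]
print(outlier_hours)
```

With very short series, a single extreme value inflates the standard deviation and caps how large its own Z-score can get, so longer baselines give more meaningful scores.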
Use Case: Compliance
Lastly, simply having and deriving value from Flow Logs, even just basic metrics, can help you pass muster for regulatory compliance as well as industry best practice and compliance frameworks. Organizations that store or process Cardholder Data (CHD) are in scope for the Payment Card Industry Data Security Standard (PCI DSS), which has requirements for logging. Similarly, CIS Controls v8, ISO 27001:2022 Annex A, and several other frameworks directly (or indirectly) require logging. The important piece to remember is that turning logging on is the easy part; deriving actionable and provable security value from it is much more important.
Amazon VPC Flow Logs capture IP traffic flows and provide essential information for monitoring, analysis, and troubleshooting. This article covers where Flow Logs can be stored and the limitations of the logs, including the types of traffic that cannot be captured – meaning other data sources are still necessary. When it comes to deriving value from different sources of telemetry – be it from your MDM, Active Directory, or VPC Flow Logs – it can be a heavy burden on analysts and engineers alike to marshal, manage, and efficiently query and analyze all related data.
Using Query, a managed federated search platform, analysts can simply connect their downstream data sources, be it a data lake on S3, a SIEM, or direct APIs of security tools, and query the logs using a natural-language-like interface. All data is normalized into a fork of the Open Cybersecurity Schema Framework (OCSF) for efficient analysis and pivoting.