Zeek & Corelight - Encrypted Traffic Collection

Zeek, and Corelight sensors specifically, divide the process of handling and analyzing data into four distinct areas, as illustrated in the diagram:

Inputs
Packages
Enrichments
Outputs

Much of the advanced analysis and detection in Zeek occurs through packages. These packages are collections of targeted detections, inferences, and data transformations designed to provide deeper visibility into adversary activities. The main categories of collections in Corelight NDR include:

Core
Encrypted Traffic (ETC)
C2 (Command and Control)
Entity
Analyzers
ICS/OT (Industrial Control Systems/Operational Technology)
Custom

Below is a high-level overview of the Encrypted Traffic Collection packages and the detections, inferences, data transformations, and analysis they provide.

Encrypted Traffic Collection (ETC)

An increasing amount of traffic on our networks is encrypted, and attackers use encryption to hide their connections among regular network traffic. Without decrypting (intercepting) that traffic, its contents cannot be examined directly. However, Zeek can still capture and analyze the unencrypted portions of these connections, and these portions yield many useful insights. Richard Bejtlich compared this type of analysis to examining a wrapped gift. By studying the wrapping, you can make educated guesses about the contents and the sender. You can often predict what is inside with a high degree of accuracy and spot signs that it isn't benign.

Encryption Analysis - Certificate Hygiene

In TLS 1.2 and earlier, certificates are transmitted in plain text during the handshake, so their data can be captured and analyzed. TLS 1.3 encrypts the certificate, which limits this visibility. The 'Cert Hygiene' package is designed to identify poor certificate practices that could lead to exploitation, as well as patterns in certificate attributes that may indicate malware activity. IT professionals follow routine practices when setting up secure communications for legitimate systems. In the same way, malware authors often follow predictable certificate management patterns so they can quickly stand up operational malware and covert connections.

Certificate hygiene analysis is only effective if your environment practices strong certificate management and avoids the use of self-signed, outdated, or expired certificates in regular system operations.

Certificate analyses that can be performed on network traffic include:

Self-signed certificate usage: Systems using self-signed certificates are often running default configurations. Self-signed certificates are the easiest to set up and cost nothing. They do not chain to a trusted certificate authority, so their authenticity cannot be verified automatically; only someone within your organization can confirm it.
Soon to expire certificates: Certificates that are close to expiration can indicate neglect or oversight in certificate management.
Weak encryption: Older encryption protocols, such as SSL v3 or TLS 1.0/TLS 1.1, are more vulnerable to attacks.
New certificates: Hackers who use legitimately issued certificates often obtain them shortly before or soon after compromising a system, so a very recently issued certificate warrants a closer look.
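The checks above can be expressed as simple rules. This is a minimal sketch that makes several assumptions. Each certificate is reduced to a dictionary with subject, issuer, and validity timestamps; this is a hypothetical shape loosely modeled on x509.log. A subject equal to the issuer is treated as self-signed, which is only a heuristic. Version strings follow the TLSv10 style that Zeek writes to ssl.log. The 30-day and 7-day windows are illustrative thresholds, not package defaults.

```python
from datetime import datetime, timedelta, timezone

# Version strings as Zeek writes them in ssl.log's "version" field (assumed format).
WEAK_VERSIONS = {"SSLv2", "SSLv3", "TLSv10", "TLSv11"}

def hygiene_flags(cert, tls_version, now,
                  expiry_window=timedelta(days=30), new_window=timedelta(days=7)):
    """Return hygiene findings for one certificate.

    cert is a hypothetical simplified dict with 'subject', 'issuer',
    'not_valid_before' and 'not_valid_after' (timezone-aware datetimes).
    """
    flags = []
    if cert["subject"] == cert["issuer"]:
        flags.append("self-signed")  # heuristic: subject equals issuer
    if cert["not_valid_after"] < now:
        flags.append("expired")
    elif cert["not_valid_after"] <= now + expiry_window:
        flags.append("expiring-soon")
    if now - cert["not_valid_before"] <= new_window:
        flags.append("newly-issued")
    if tls_version in WEAK_VERSIONS:
        flags.append("weak-protocol")
    return flags
```

For example, a two-day-old self-signed certificate that expires in ten days and is negotiated over TLSv10 collects four findings. A long-lived, CA-issued certificate on TLSv13 collects none.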

The following examples from the Zeek documentation on ssl.log (https://docs.zeek.org/en/master/logs/ssl.html) and its connection to x509.log (https://docs.zeek.org/en/master/logs/x509.html) demonstrate how certificate information is captured from network traffic.

Note: Web encryption standards evolved and were renamed from SSL to TLS (Transport Layer Security) long ago, but encrypted connections are still often referred to as SSL. Zeek continues to use ssl.log as the log for data collected from TLS sessions.

With certificate values come additional opportunities for threat hunting.

Hackers tend to be cost-conscious and are unlikely to pay for certificates; instead, they often opt for free or self-signed ones. This isn't a definitive red flag on its own, but it can be an initial clue that prompts further investigation. Detecting a suspicious certificate is about pulling a thread: what starts as a minor anomaly can lead to more evidence of malicious activity, or it may prove harmless. However, legitimate applications now widely use free domain-validated certificate services such as Let's Encrypt (https://letsencrypt.org/) and ZeroSSL (https://zerossl.com/). This has reduced the weight of a free certificate as an indicator of suspicious activity. To strengthen your analysis, consider additional factors such as:
- The age of the suspicious certificate.
- Whether the certificate is commonly used by other devices or software in your environment.
Default certificates in malware and Command & Control (C2) communications: Many threat actor tools ship with default certificate settings. Advanced attackers often customize their certificates to blend in with regular traffic, but many rely on the defaults because changing them takes time and effort. For example, this Indicator of Compromise (IOC) from the PoshC2 framework (https://labs.nettitude.com/blog/detecting-poshc2-indicators-of-compromise/) highlights the default settings of the self-signed certificate deployed with this C2 software.

'Detecting PoshC2 – Indicators of Compromise' - LRQA Nettitude Labs
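Matching certificate subjects against known tool defaults can be sketched as follows. The IOC entry is a hypothetical placeholder, not the actual PoshC2 values; substitute the published defaults for whatever framework you are hunting. The distinguished-name parser is deliberately naive and assumes a simple comma-separated subject with no escaped commas.

```python
# Hypothetical placeholder values; replace them with the published defaults for the tool you hunt.
DEFAULT_CERT_IOCS = {
    "example-c2-framework": {"CN": "example-default", "OU": "ExampleUnit", "O": "ExampleOrg"},
}

def parse_dn(dn):
    """Split a simple 'CN=a,O=b,C=US' distinguished name into a dict (no escaped commas)."""
    fields = {}
    for part in dn.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def match_default_cert(subject):
    """Return the IOC entries whose fields all match the certificate subject."""
    fields = parse_dn(subject)
    return [name for name, ioc in DEFAULT_CERT_IOCS.items()
            if all(fields.get(key) == value for key, value in ioc.items())]
```

Requiring every IOC field to match keeps a single coincidental value, such as a common CN, from triggering a hit.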

Certificate Chain FUID Analysis: FUID stands for 'File Unique Identifier.' Certificates follow a chain of trust: the web server certificate is usually signed by an intermediate Certificate Authority (CA), which in turn is signed by a root CA. The Zeek ssl.log records FUID values that allow pivoting to the certificate data for an encrypted session. It stores client_cert_chain_fuids for client certificates and cert_chain_fuids for server certificates.
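A minimal sketch of that pivot follows. It assumes ssl.log and x509.log records have already been parsed into dictionaries. It also assumes that x509.log's id field holds the same FUID referenced by cert_chain_fuids and client_cert_chain_fuids, as in Zeek versions that log those FUID fields. The function names are illustrative. Chain order follows the ssl.log vector, which starts with the leaf certificate.

```python
def index_x509(x509_records):
    """Map each certificate's FUID (x509.log 'id' field) to its record."""
    return {rec["id"]: rec for rec in x509_records}

def certs_for_session(ssl_record, x509_by_fuid):
    """Return (server_chain, client_chain) x509 records for one ssl.log entry."""
    def resolve(field):
        # Keep the ssl.log order; skip FUIDs with no matching x509 record.
        return [x509_by_fuid[f] for f in ssl_record.get(field) or [] if f in x509_by_fuid]
    return resolve("cert_chain_fuids"), resolve("client_cert_chain_fuids")
```

Building the index once and reusing it keeps the join linear in the number of log entries.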

The diagrams below illustrate the certificate chain-of-trust concept for both client and server certificates.

This data can also open up additional threat-hunting opportunities and allow for correlation across different Zeek logs, such as ssl.log, x509.log, and files.log.

The certificate data can be hashed to generate a unique certificate identifier (a fingerprint) that can be searched and can provide valuable insight into the certificate’s lineage.
Broken chains of trust.
Suspicious or missing information in key certificate fields such as common name, organizational unit, organization, and country. Many malware implementations populate only the CN, often with a random string that has high entropy.
- CN = Common Name
- OU = Organizational Unit
- O = Organization
- C = Country
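Two of these checks can be sketched in a few lines. The first is a SHA-256 fingerprint over the certificate's DER bytes, a common way to build a searchable certificate identifier. The second is a Shannon-entropy test on the CN, combined with a check for missing subject fields. The sketch assumes the subject has already been split into a dictionary keyed by CN, OU, O, and C. The 3.5 bits-per-character threshold is an illustrative assumption to tune against your own traffic.

```python
import hashlib
import math
from collections import Counter

def sha256_fingerprint(der_bytes: bytes) -> str:
    """Hex SHA-256 of the DER-encoded certificate, a common certificate identifier."""
    return hashlib.sha256(der_bytes).hexdigest()

def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string's character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

def subject_anomalies(subject: dict, entropy_threshold: float = 3.5) -> list:
    """Flag missing OU/O/C fields and a CN that looks randomly generated."""
    findings = [f"missing {key}" for key in ("OU", "O", "C") if not subject.get(key)]
    cn = subject.get("CN", "")
    if cn and shannon_entropy(cn) >= entropy_threshold:
        findings.append("high-entropy CN")
    return findings
```

A CN such as www.example.com scores about 3.2 bits per character and passes. A 12-character random string with no repeated characters scores log2(12), about 3.58, and is flagged. Short random CNs can fall below any fixed threshold, so treat entropy as one signal among several.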

Encryption Analysis - Encrypted DNS

DNS is an older protocol that was not originally designed to support encryption. The initial standard, RFC 882 ( https://datatracker.ietf.org/doc/html/rfc882 ), was published in 1983. In recent years, encryption standards for DNS have been introduced, such as DNS over TLS (DoT) in RFC 7858 and DNS Queries over HTTPS (DoH) in RFC 8484.

DNS over TLS (DoT) runs the standard binary DNS protocol over a TLS-encrypted TCP connection, typically on port 853. However, it faced some implementation challenges, including firewalls blocking the uncommon port and limited browser support. DNS over HTTPS (DoH) became more popular because it encapsulates DNS requests within HTTPS traffic on the standard HTTPS port (TCP 443), bypassing some of those hurdles.
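To make the DoH encapsulation concrete, here is a sketch of the RFC 8484 GET form. An ordinary wire-format DNS query is base64url-encoded without padding and sent to the resolver over HTTPS in the dns query parameter, using the application/dns-message media type. The resolver URL is hypothetical. The query uses DNS ID 0, which RFC 8484 recommends for HTTP cache friendliness.

```python
import base64
import struct

def build_dns_query(name, qtype=1):
    """Minimal wire-format DNS query (RFC 1035) with ID 0, RD set, one question, class IN."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)  # ID, flags (RD), QD/AN/NS/AR counts
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS (1 = IN)

def doh_get_url(resolver_url, name):
    """RFC 8484 GET: the DNS message travels base64url-encoded, without padding, in ?dns=."""
    encoded = base64.urlsafe_b64encode(build_dns_query(name)).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"
```

For www.example.com this yields ?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB. On the network, a sensor sees only a TLS session to the resolver on port 443; the DNS question inside it is encrypted.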

The diagrams below, based on the NSA's guidance 'Adopting Encrypted DNS in Enterprise Environments,' illustrate the differences between traditional DNS and DNS over HTTPS implementations. They highlight possible visibility points for analyzing traditional DNS traffic (Note: There are pros and cons to analyzing DNS traffic with NDR before and after the DNS resolver that are beyond the scope of this article). They also show the network visibility gaps that DoH implementations create.

(https://media.defense.gov/2021/Jan/14/2002564889/-1/-1/0/CSI_ADOPTING_ENCRYPTED_DNS_U_OO_102904_21.PDF )

One major challenge with encrypted DNS is that many traditional cybersecurity detections rely on visibility into DNS traffic. These detections filter domains based on known malicious domains, restricted content categories, reputation, and other advanced analyses. With DNS encryption, much of this visibility is lost. The diagram below shows how DoH prevents this type of analysis.

(https://media.defense.gov/2021/Jan/14/2002564889/-1/-1/0/CSI_ADOPTING_ENCRYPTED_DNS_U_OO_102904_21.PDF )

Zeek’s DoH detection provides insight into DNS over HTTPS (DoH) traffic through its ssl.log and a new DoH.log. It doesn't decrypt the traffic, but it alerts you to connections that use encrypted DNS. Organizations can then decide which encrypted DNS services are approved and deploy mitigations. One example is running their own encrypted DNS resolvers that forward queries upstream as conventional, non-DoH DNS, which can then be monitored and analyzed. By configuring internal systems to use these controlled resolvers rather than public ones, organizations maintain greater visibility into their DNS traffic.
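A sketch of the visibility check described above follows. It flags ssl.log entries that go to port 853 (DoT), or whose SNI (the server_name field) matches a known public DoH resolver. Entries bound for an approved internal resolver are skipped. The resolver hostnames are a small example list, not a complete feed, and doh.corp.example is a hypothetical internal resolver. SNI-based matching misses clients that omit SNI or use Encrypted Client Hello.

```python
# Example public DoH hostnames (not exhaustive); doh.corp.example is a hypothetical approved resolver.
KNOWN_PUBLIC_DOH = {"dns.google", "cloudflare-dns.com", "mozilla.cloudflare-dns.com", "dns.quad9.net"}
APPROVED_RESOLVERS = {"doh.corp.example"}

def encrypted_dns_findings(ssl_records):
    """Return (client, resolver, reason) for encrypted DNS that bypasses approved resolvers."""
    findings = []
    for rec in ssl_records:
        sni = (rec.get("server_name") or "").lower().rstrip(".")
        port = rec.get("id.resp_p")
        if sni in APPROVED_RESOLVERS:
            continue  # sanctioned internal resolver
        if port == 853:
            findings.append((rec["id.orig_h"], sni or rec.get("id.resp_h"), "possible DoT (port 853)"))
        elif sni in KNOWN_PUBLIC_DOH:
            findings.append((rec["id.orig_h"], sni, "public DoH resolver"))
    return findings
```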

Encryption Analysis - Encryption Detections

Zeek packages can provide valuable insights into how encryption is used in network communications, helping identify both normal and suspicious encryption activities. Some common network behaviors related to encryption that may indicate evasion tactics include:

Non-SSL traffic on standard encryption ports (e.g., port 443)
Incomplete or improper encryption negotiation (e.g., no complete handshake)
SSL traffic on non-standard ports

Non-SSL traffic on standard encryption ports

Attackers may avoid encryption but still use standard encryption ports like 443, knowing that this port is typically allowed through network defenses. The query below uses standard CrowdStrike logging. It searches the Zeek conn.log, where the service and conn_state fields are recorded, for non-SSL traffic to port 443. It excludes connection attempts that received no reply (conn_state S0). The results are grouped by the source of the traffic and sorted by count for easy analysis.

#path=conn | service!=ssl | id.resp_p=443 | conn_state!=S0 | groupBy(id.orig_h) | sort()

Another common variation of this tactic is SSH traffic over port 443 between internal systems and the Internet. This technique can be used to tunnel an SSH session from an external source into an internal environment. Attackers may leverage it for remote access, but vendors also frequently use this method to securely manage, patch, and troubleshoot their systems deployed behind organizational firewalls. This use of SSH over port 443 has been observed in Ransomware-as-a-Service (RaaS) operations. In those cases, threat actors installed OpenSSH on a compromised Windows system and tunneled the SSH connection out of the network over port 443 (https://www.microsoft.com/en-us/security/blog/2022/10/18/defenders-beware-a-case-for-post-ransomware-investigations/).

Below is a CrowdStrike query to identify this type of traffic, along with a diagram illustrating remote vendor support via shell access.

#path=conn | service=ssh | id.resp_p=443 | conn_state!=S0 | groupBy(id.orig_h) | sort()
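For environments without CrowdStrike, the two port 443 checks above can be mirrored in plain code. This sketch assumes Zeek conn.log records parsed into dictionaries; conn.log is where Zeek records the service and conn_state fields. Because service can list several analyzers separated by commas, it is split before testing. An SSH session on port 443 is also non-SSL, so it appears in both result lists.

```python
from collections import Counter

def services(rec):
    """Zeek's conn.log 'service' may hold several comma-separated analyzer names."""
    value = rec.get("service") or ""
    return set(value.split(",")) if value and value != "-" else set()

def port_443_anomalies(conn_records):
    """Count, per originator, port 443 connections that are not SSL and that carry SSH."""
    non_tls, ssh = Counter(), Counter()
    for rec in conn_records:
        if rec.get("id.resp_p") != 443 or rec.get("conn_state") == "S0":
            continue  # keep only port 443 connections that received a reply
        svc = services(rec)
        if "ssl" not in svc:
            non_tls[rec["id.orig_h"]] += 1
        if "ssh" in svc:
            ssh[rec["id.orig_h"]] += 1
    return non_tls.most_common(), ssh.most_common()
```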

Another traffic pattern sometimes seen with malicious C2 is TLS that authenticates with a pre-shared key (PSK). A normal web client-to-server TLS session would instead use a standard X.509 certificate.
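One way to surface this pattern from ssl.log is sketched below. It flags negotiated cipher suites whose names contain _PSK_, as TLS 1.2 PSK suites such as TLS_PSK_WITH_AES_128_GCM_SHA256 do. As a looser second rule, it flags established, non-resumed sessions at TLS 1.2 or earlier that logged no server certificate. TLS 1.3 is excluded from the second rule because it encrypts the certificate. Resumed sessions are excluded because they legitimately skip it. The field names follow Zeek's ssl.log (cipher, established, resumed, version, cert_chain_fuids); the function name is illustrative.

```python
LEGACY_VERSIONS = {"SSLv3", "TLSv10", "TLSv11", "TLSv12"}  # versions that send certificates in the clear

def psk_tls_suspects(ssl_records):
    """Return (uid, reason) for sessions that look like PSK or certificate-less TLS."""
    suspects = []
    for rec in ssl_records:
        cipher = rec.get("cipher") or ""
        if "_PSK_" in cipher:
            suspects.append((rec["uid"], "PSK cipher suite"))
        elif (rec.get("established") and not rec.get("resumed")
              and rec.get("version") in LEGACY_VERSIONS
              and not rec.get("cert_chain_fuids")):
            suspects.append((rec["uid"], "established without a server certificate"))
    return suspects
```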

Encryption without proper negotiation

Corelight's encrypted traffic collection includes an additional log called the etc_viz log, which stands for 'encrypted traffic collection visibility.' This log provides details on how encryption is negotiated between a client and server. Historically, some malware has used custom encryption. If malware controls both ends of secure communication, it doesn't need to follow standard encryption protocols.

TLS runs over TCP, so a session begins with the standard TCP three-way handshake between client and server:

Client → SYN → Server

Client ← SYN/ACK ← Server

Client → ACK → Server

Next comes the TLS handshake (simplified here, as in TLS 1.2):

Client → Client Hello → Server

Client ← Server Hello + Certificate ← Server

Client → Client Key Exchange → Server

Client → Client Finished → Server

Client ← Server Finished ← Server

Client ↔ Encrypted Data ↔ Server

In a typical TLS session, the Client Hello is the fourth packet: the first packet carrying data after the three-way handshake. Some malware deviates from this, sending a payload first before initiating encryption.
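The expectation that the first data packet is a Client Hello can be tested directly on the client's first payload. In the TLS record format, byte 0 is the content type (22 for handshake). Bytes 1-2 are the record version, with major version 3 for SSLv3 through TLS 1.3, and bytes 3-4 are the record length. Byte 5 is the handshake message type (1 for ClientHello). This sketch checks only those fixed fields. It is a heuristic, not a full parser, and does not reproduce how the etc_viz analyzer works.

```python
def looks_like_client_hello(payload: bytes) -> bool:
    """Heuristic: does a client's first TCP payload start with a TLS ClientHello?"""
    if len(payload) < 6:
        return False
    return (payload[0] == 0x16       # record content type 22 = handshake
            and payload[1] == 0x03   # record major version 3 (SSLv3 through TLS 1.3)
            and payload[5] == 0x01)  # handshake message type 1 = ClientHello
```

A connection to port 443 whose first client payload fails this test, such as a plaintext HTTP request or an SSH banner, is the kind of deviation described above.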

If the Zeek protocol analyzer detects custom encryption, it flags it with an 'E!/e!' entry in the etc_viz.log file. Custom encryption is highly suspicious, but it also appears in legitimate software, such as older VoIP gateways and Telegram, which uses its own MTProto encryption protocol.

Below is a CrowdStrike query that counts etc_viz log entries by the visibility status string in the viz_stat field. The results can then be searched for the suspicious encryption flag.

#path=etc_viz | groupBy([viz_stat]) | sort(_count)

Encryption on other ports

The most common ports for HTTPS are 443 and 8443, but you may also encounter other ports like 4443, 2053, 2083, 2087, and 2096. Threat actors sometimes use non-standard ports to reduce scrutiny of their encrypted communications. Zeek identifies protocols by their content through dynamic protocol detection rather than by port number. As a result, it can recognize TLS and extract certificate data regardless of the port used.

Below is a CrowdStrike query that counts SSL traffic by destination port, helping to detect SSL connections on non-standard ports.

#path=ssl | groupBy(id.resp_p) | sort()
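The same grouping can be sketched over parsed ssl.log records. The expected-port set is an assumption. It includes 443 and 8443 from the text above, plus ports that commonly carry implicit TLS for other services: 465 (SMTPS), 636 (LDAPS), 993 (IMAPS), and 995 (POP3S). Extend it with whatever is normal in your environment.

```python
from collections import Counter

# 443/8443 from the text; 465, 636, 993, 995 are common implicit-TLS service ports (assumed normal here).
EXPECTED_TLS_PORTS = {443, 8443, 465, 636, 993, 995}

def tls_ports_report(ssl_records, expected=EXPECTED_TLS_PORTS):
    """Count ssl.log entries per responder port and pull out the unexpected ports."""
    counts = Counter(rec["id.resp_p"] for rec in ssl_records)
    unusual = {port: n for port, n in counts.items() if port not in expected}
    return counts.most_common(), unusual
```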

Encryption Analysis - RDP Inferences

RDP (Remote Desktop Protocol) is frequently used by both threat actors and network administrators, making it challenging to differentiate between legitimate and malicious RDP activity. While Windows logs can track RDP connections, they often lack sufficient detail. Zeek RDP logs, however, can provide valuable additional insights for RDP monitoring and analysis.

SOCs should be alert to common red flags, such as RDP connections from external to internal systems. Beyond basic connectivity analysis, Corelight has developed an RDP inference package that flags suspicious attributes based on the RDP encryption setup and data transmission patterns, enhancing detection of potentially malicious activity.
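A basic version of the external-to-internal red flag can be sketched with Python's ipaddress module. It treats "external" as a globally routable source and "internal" as a private destination. This is a simplification; a real deployment would use its own list of internal networks, which is the role Zeek's Site::local_nets plays. RDP is identified by the default port 3389 or by the rdp service label in conn.log. The function name is illustrative.

```python
import ipaddress

def inbound_rdp(conn_records, rdp_port=3389):
    """Return (src, dst, port) for RDP from a globally routable source to a private destination."""
    hits = []
    for rec in conn_records:
        src = ipaddress.ip_address(rec["id.orig_h"])
        dst = ipaddress.ip_address(rec["id.resp_h"])
        services = (rec.get("service") or "").split(",")
        is_rdp = rec.get("id.resp_p") == rdp_port or "rdp" in services
        if is_rdp and src.is_global and dst.is_private:
            hits.append((str(src), str(dst), rec.get("id.resp_p")))
    return hits
```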

RDP is an extension of the T.120 family of protocol standards. It supports multiple channels, allowing separate virtual channels for transmitting presentation data, serial device communication, and other inputs like keyboard and mouse activity. The primary function of RDP is to transmit display output from the remote server to the client while relaying keyboard and mouse input from the client back to the remote server. Given the sensitivity of this connection, RDP is encrypted. Even so, certain inferences can still be made. These include the RDP authentication method (e.g., Kerberos ticket), whether authentication succeeded, and how specific attack tools like Metasploit, rdpscan, and SharpRDP behave after establishing an RDP connection. Additionally, RDP hash fingerprinting can provide further context about the connection.

Encryption Analysis - SSH Inferences

The Secure Shell (SSH) protocol enables secure command transmission over an unsecured network using cryptography to authenticate and encrypt connections between devices. SSH is commonly used for remote server management, infrastructure control, and file transfers. Like RDP inferences, SSH inferences analyze specific traffic patterns between clients and servers to gain insights about the activity. For example, it can determine whether authentication was certificate-based or password-based, whether there were multiple failed authentication attempts (indicating possible brute-force attacks), whether keystrokes were entered, or if files were transferred (suggesting secure copy). Additionally, it can assess the size of file transfers. The diagram below from Corelight’s 'Introducing the Corelight SSH Inference Package' blog illustrates these concepts.

https://corelight.com/blog/corelight-ssh-inference-package

The possible inference codes for an SSH session, which the package adds to that session's ssh.log entry, are:

Code	Name	Description
ABP	Client Authentication Bypass	A client wasn’t adhering to expectations of SSH either through server exploit or by the client and server switching to a protocol other than SSH after encryption begins.
BF	Client Brute Force Guessing	A client made a number of authentication attempts that exceeded some configured, per-connection threshold.
BFS	Client Brute Force Success	A client made a number of authentication attempts that exceeded some configured, per-connection threshold, and then authenticated successfully.
SFD	Small Client File Download	A file transfer occurred in which the server sent a sequence of bytes to the client.
LFD	Large Client File Download	A file transfer occurred in which the server sent a sequence of bytes to the client. Large files are identified dynamically based on trains of MTU-sized packets.
SFU	Small Client File Upload	A file transfer occurred in which the client sent a sequence of bytes to the server.
LFU	Large Client File Upload	A file transfer occurred in which the client sent a sequence of bytes to the server. Large files are identified dynamically based on trains of MTU-sized packets.
KS	Keystrokes	An interactive session occurred in which the client sent user-driven keystrokes to the server.
SC	Capabilities Scanning	A client exchanged capabilities with the server and then disconnected.
SP	Other Scanning	A client and server didn’t exchange encrypted packets, but the client wasn’t a version or capabilities scanner.
SV	Version Scanning	A client exchanged version strings with the server and then disconnected.
SA	Authentication Scanning	The client scanned authentication methods with the server and then disconnected.
APWA	Automated Password Authentication	The client authenticated with an automated password tool (like sshpass).
IPWA	Interactive Password Authentication	The client interactively typed their password to authenticate.
PKA	Public Key Authentication	The client automatically authenticated using pubkey authentication.
NA	None Authentication	The client successfully authenticated using the None method.
MFA	Multifactor authentication	The server required a second form of authentication (a code) after a password or public key was accepted, and the client successfully provided it.
UA	Unknown authentication	The authentication method is not determined or is unknown.
AUTO	Automated interaction	The client is a script or automated utility and not driven by a user.
BAN	Server Banner	The server sent the client a pre-authentication banner, likely for legal reasons.
CTS	Client trusted server	The client already has an entry in its known_hosts file for this server.
CUS	Client untrusted server	The client did not have an entry in its known_hosts file for this server.
RSP	Reverse SSH Provisioned	The client connected with a -R flag, which provisions the ports to be used for a Reverse Session set up at any future time.
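Because the codes are compact, they are easy to triage programmatically. The sketch below assumes the codes for a session are already available as a list; the exact field name in your ssh.log export may differ. Its priority tiers are an illustrative choice built from the table above, not a Corelight scoring scheme.

```python
# Illustrative tiers built from codes in the table above; not a Corelight scoring scheme.
HIGH = {"ABP", "BFS", "RSP"}
MEDIUM = {"BF", "LFU", "LFD", "NA", "CUS"}

def triage_ssh(codes):
    """Return (priority, sorted codes) for one session's inference codes."""
    found = set(codes)
    if found & HIGH or {"BF", "KS"} <= found:
        # Interactive keystrokes in a session that also saw brute forcing is treated as high.
        priority = "high"
    elif found & MEDIUM:
        priority = "medium"
    else:
        priority = "low"
    return priority, sorted(found)
```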

Encryption Analysis - Stepping Stones

SSH connections often involve a host connecting to another host, and then from there, connecting to additional systems. This is a common practice in segmented networks, where administrators use a jump host to access systems that only allow connections from that specific host. However, threat actors also leverage this technique to move laterally through a network in search of vulnerable systems to exploit.

For example, SSH-Snake (https://github.com/MegaManSec/SSH-Snake), a recent hacking tool, is designed to propagate through networks using SSH. With traditional log and traffic analysis, each connection appears isolated, making it difficult and time-consuming to track the full path across multiple systems.

The 'stepping stones' package is based on research by Yin Zhang and Vern Paxson (https://www.icir.org/vern/papers/stepping-sec00.pdf). It helps detect these SSH connectivity patterns by identifying related SSH connections and stitching them together in a specialized Zeek log called stepping.log, providing a clearer picture of the full connection path.
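To illustrate the idea of stitching, the sketch below links an SSH connection A→B to a connection B→C when the second starts and ends while the first is still open. It then reports each resulting path. This nested-interval heuristic is a simplified stand-in chosen for illustration. It is not the timing-correlation algorithm from the Zhang and Paxson paper, and the Conn record type is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Conn:
    """Hypothetical minimal SSH connection record (times in epoch seconds)."""
    uid: str
    src: str
    dst: str
    start: float
    end: float

def stitch_chains(conns):
    """Link A->B to B->C when B->C starts after and ends before A->B (nested in time)."""
    ordered = sorted(conns, key=lambda c: c.start)
    child = {}          # uid -> next-hop connection
    has_parent = set()  # uids already claimed as a next hop
    for a in ordered:
        for b in ordered:
            if (b.src == a.dst and a.start < b.start and b.end <= a.end
                    and b.uid not in has_parent):
                child[a.uid] = b
                has_parent.add(b.uid)
                break  # one next hop per connection in this simplified sketch
    chains = []
    for c in ordered:
        if c.uid in has_parent:
            continue  # not the first hop of a chain
        hops = [c]
        while hops[-1].uid in child:
            hops.append(child[hops[-1].uid])
        if len(hops) > 1:
            chains.append([hops[0].src] + [h.dst for h in hops])
    return chains
```

Because each link requires a strictly later start time, following the links can never loop back on itself.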

Encryption Analysis - VPN insights

Zeek's Spicy parser generator has made developing complex protocol parsers much more accessible than hand-writing them in C++. With Spicy, numerous parsers have been created for VPN protocols such as WireGuard, OpenVPN, IPsec, DTLS, and more. Why is this valuable for network traffic analysis? VPN usage continues to grow, fueled by free services and lightweight software implementations like SoftEther (https://www.softether.org/). As a result, it's critical for organizations to monitor VPN activity.

VPNs themselves aren’t inherently bad, but from an organizational standpoint, understanding VPN usage is essential. Key questions include: Which systems are using VPNs for connectivity? Which VPN providers are in use, and are they approved? What are the details of the connections, such as time, data transferred, and geolocation? Is the VPN being used for legitimate remote access, or to bypass filtering?

This type of analysis can help detect malicious VPN usage. For instance, the popular command and control (C2) framework Sliver uses WireGuard as one of its primary communication channels.
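As an illustration of the kind of structure a VPN parser keys on, consider WireGuard. Its messages begin with a one-byte type followed by three reserved zero bytes. The handshake messages have fixed sizes: 148 bytes for initiation, 92 for response, and 64 for a cookie reply. Transport data messages consist of a 16-byte header, ciphertext padded to a multiple of 16 bytes, and a 16-byte authentication tag. The sketch below classifies a UDP payload on that basis. It is a heuristic, not Zeek's Spicy WireGuard analyzer, and matching on type and size alone can produce false positives.

```python
import struct

# Fixed message sizes from the WireGuard protocol; transport data (type 4) is variable-length.
WG_FIXED_SIZES = {1: 148, 2: 92, 3: 64}
WG_NAMES = {1: "handshake initiation", 2: "handshake response", 3: "cookie reply", 4: "transport data"}

def classify_wireguard(udp_payload: bytes):
    """Return the WireGuard message name a UDP payload is shaped like, or None."""
    if len(udp_payload) < 4:
        return None
    # Type byte plus three zero bytes, read as one little-endian 32-bit value.
    (msg_type,) = struct.unpack("<I", udp_payload[:4])
    if msg_type not in WG_NAMES:
        return None
    if msg_type in WG_FIXED_SIZES and len(udp_payload) != WG_FIXED_SIZES[msg_type]:
        return None
    if msg_type == 4 and (len(udp_payload) < 32 or len(udp_payload) % 16 != 0):
        return None  # 16-byte header + padded ciphertext + 16-byte tag
    return WG_NAMES[msg_type]
```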