Network Detection Response (NDR) - Web Traffic Analysis Part 2

brencronin
Oct 23, 2024

Updated: Nov 10, 2024

Overview of major security tools related to web traffic protection

Protection from Inside to Outside

Traditional web protection largely focuses on preventing users and systems from being exposed to malicious sites, whether intentionally or accidentally, where they could be exploited. While this isn't the full scope of web security, it plays a significant role in safeguarding against threats.

There are several traditional cyber tools related to web traffic, such as web proxies and Next-Generation Firewalls (NGFWs). Web proxies, typically positioned at an organization’s network boundary, function as intentional Man-in-the-Middle systems, intercepting users' web requests. This allows proxies to inspect URLs and categorize websites (e.g., business, sports, news). Organizations can then enforce policies on which sites users are permitted to access, which also enhances cybersecurity. Most drive-by downloads and watering hole malware infections come from non-business categories (e.g., pornography, gaming, free software downloads), so restricting access to these higher-risk categories greatly reduces the risk of users being infected through drive-by downloads. Many web proxy tools now also perform more advanced categorization of URLs based on other factors such as Cyber Threat Intelligence (CTI) and domain age.

Many traditional firewalls added some web protections, like blocking known malicious URLs and using signature-based detection to prevent web exploits against web servers. Most Next-Generation Firewalls (NGFWs) expanded their feature sets to include web proxy functions, enabling URL categorization and policy enforcement based on site categories. Additionally, NGFWs and standalone tooling commonly provide other capabilities such as URL reputation checks and sandboxing to analyze the behavior of websites, which can include the analysis of web-based JavaScript execution and redirects.

Now that we've covered traditional cyber tools like web proxies and NGFWs for web, it's clear what they excel at when properly configured:

Blocking site access based on categorization
Blocking site access using URL signatures
Blocking malicious web requests and responses
Controlling or blocking the execution of active site code (e.g., JavaScript)

While these capabilities are crucial for prevention, they can fall short when it comes to the advanced analysis required to detect more sophisticated web threats, especially when the URL isn't categorized as malicious. Note that many NGFWs are continually evolving to keep pace with the cyber arms race, adding advanced features for deeper analysis of web and other types of traffic.

Protection from Outside to Inside

Most organizations have systems that require an Internet presence (e.g., web servers) or need to maintain connectivity to the Internet. These systems must remain connected while being protected from malicious activity. This is where traditional tools like firewalls and Web Application Firewalls (WAFs) play a crucial role. While firewalls block unauthorized network access, WAFs specifically filter and monitor traffic to prevent harmful inputs, keeping your web servers secure.

Web proxies, Next-Generation Firewalls (NGFWs), and Web Application Firewalls (WAFs) are crucial components of an organization's security stack. However, they don't catch or prevent every type of attack. In the next section, we'll explore common attack methods and how advanced network data analysis, specifically using Zeek data, can help fill in detection gaps where other tools fall short.

Advanced Web Traffic Analysis using Zeek

Command & Control Communications

A significant aspect of Command & Control (C&C) communications analysis involves a thorough examination of web data, particularly the request headers. These headers provide valuable information to the web server about the host accessing the page and are formatted as key-value pairs. Certain fields within these headers can serve as indicators of potentially malicious activity.

Analyzing Request Headers

Here are some critical web request header fields to consider:

Host: Indicates the domain that was contacted.
User-Agent: Describes the browser or client application being used.
Referer (often written Referrer): Shows the URL of the page the user was on before the current request.
Accept: Lists the types of content and formats that the client browser will accept, with the first item typically being the preferred format.
X-Forwarded-For: Carries the originating client address when the connection is made through a proxy. The value is supplied by the proxy chain or the client, so it can be spoofed.

By analyzing these headers, security analysts can better understand the nature of the traffic and identify potential indicators of malicious transactions. A very common analysis technique is to look at the user agent string. The User-Agent string is a line of text sent by a web browser or client to a web server as part of a web request. It identifies the client software, including details about the browser type, version, operating system, and device being used. The diagram below displays an example standard User-Agent value within a request header.

Request Header Analysis - Example User Agent Strings

A common analysis technique is to examine user-agent strings in the environment and identify anything unusual. However, with the rise of BYOD (Bring Your Own Device) and a wide variety of apps using different user-agent strings, along with threat actors often mimicking legitimate ones, this method is becoming less reliable. While it’s still an extremely useful analysis technique, the process has become more tedious and less effective than it once was.

Despite its limitations, user-agent string data can still be highly effective when analyzed within the context of your specific environment. For instance, two commonly used system administration tools that initiate web connections are PowerShell on Windows and cURL on Linux. This is where context becomes critical. Is it normal for certain systems in your environment to make web connections using these tools? And if so, is it typical for them to connect to the Internet? Understanding what is expected in your environment is key to identifying unusual or potentially malicious behavior.

For example, on a system running PowerShell, it is easy to download files using the built-in cmdlet Invoke-WebRequest.

Invoke-WebRequest -Uri "https://example.com/testfile.zip" -OutFile "C:\Path\To\Save\testfile.zip"

When PowerShell makes the web request, its default user agent string will have indicators of PowerShell.

Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.19041.4894

Adding a visualization or alert rule query that looks for PowerShell in user agent string fields, while filtering out destinations (for example, amazonaws) or systems that normally do this, can be a very powerful threat hunting and detection technique.

user_agent.original: *Powershell*

Another example is cURL, which is available on many Linux systems.

curl -o /var/tmp/nothingtosee.zip https://example.com/testfile.zip

When cURL makes the web request, its default user agent string will have indicators of cURL.

curl/8.#.#

Adding a visualization or alert rule query that looks for cURL in user agent string fields, while filtering out destinations or systems that normally do this, can be a very powerful threat hunting and detection technique.

user_agent.original: *curl*

Another common Linux utility for file downloads is wget, which is short for "World Wide Web get".

wget https://example.com/testfile.zip

When wget makes the web request, its default user agent string will have indicators of wget, typically in the form wget/version.

wget

Adding a visualization or alert rule query that looks for wget in user agent string fields, while filtering out destinations or systems that normally do this, can be a very powerful threat hunting and detection technique.

user_agent.original: *wget*
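The three user agent searches above can also be combined into a single pass over exported HTTP records. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: the record keys "src" and "user_agent" and the allowed_sources allowlist are hypothetical names chosen for illustration, not Zeek or Elastic field names.

```python
# Substrings taken from the default user agents shown above.
TOOL_MARKERS = ("powershell", "curl", "wget")


def flag_tool_user_agents(records, allowed_sources=frozenset()):
    """Return (source, tool) pairs for scripted clients that are not allowlisted.

    Each record is a dict with the hypothetical keys "src" and "user_agent".
    """
    hits = []
    for rec in records:
        ua = (rec.get("user_agent") or "").lower()
        # First marker found in the user agent, or None if it looks like a browser.
        tool = next((m for m in TOOL_MARKERS if m in ua), None)
        if tool is not None and rec.get("src") not in allowed_sources:
            hits.append((rec.get("src"), tool))
    return hits


sample = [
    {"src": "10.0.0.5", "user_agent": "Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.19041.4894"},
    {"src": "10.0.0.9", "user_agent": "curl/8.5.0"},
    {"src": "10.0.0.7", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/130.0"},
]
print(flag_tool_user_agents(sample, allowed_sources={"10.0.0.9"}))
```

The allowlist is what makes this usable in practice: systems that are expected to use these tools are filtered out, so only the unexpected sources remain.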

Both cURL and wget allow the user agent string to be changed easily, and attackers may do so as a defense evasion tactic. This is why knowing the normal patterns in your environment is crucial. For example, if you typically see web requests from Linux systems using the default cURL user agent, any deviation from this should raise a red flag and prompt investigation into why the traffic is using a different user agent string. You can also extend your detection strategy to alert when shell commands attempt to change the user agent string value, for example curl with the -A switch and wget with the --user-agent switch.

curl -A "user-agent-name-here"
wget --user-agent="user-agent-name-here"
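A check for these switches in collected command lines (for example from shell history or process auditing) could look roughly like the sketch below. It is illustrative only: the function name is hypothetical, and it covers just the curl -A / --user-agent and wget -U / --user-agent options, not every way a header can be overridden.

```python
import shlex


def overrides_user_agent(command_line):
    """Return True if a curl or wget command line sets a custom user agent."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:  # unbalanced quotes and similar parse failures
        return False
    if not tokens:
        return False
    tool = tokens[0].rsplit("/", 1)[-1]  # strip a leading path such as /usr/bin/
    args = tokens[1:]
    if tool == "curl":
        # -A may be followed by a space or have the value attached.
        return any(a.startswith("-A") or a == "--user-agent" for a in args)
    if tool == "wget":
        # wget accepts -U as well as --user-agent, with or without "=".
        return any(
            a.startswith("-U") or a == "--user-agent" or a.startswith("--user-agent=")
            for a in args
        )
    return False


print(overrides_user_agent('curl -A "user-agent-name-here" https://example.com/'))
print(overrides_user_agent("curl -o /var/tmp/nothingtosee.zip https://example.com/testfile.zip"))
```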

Another suspicious pattern related to user agent string analysis is HTTP traffic on an uncommon port where an interesting user agent is paired with an interesting MIME type in the response.

(event.dataset:"http" AND (http.response.mime_types:("application/java-archive" OR "application/mshelp" OR "application/chrome-ext" OR "application/x-object" OR "application/x-executable" OR "application/x-sharedlib" OR "application/x-mach-o-executable" OR "application/x-dosexec" OR "application/x-java-applet" OR "application/x-java-jnlp-file" OR "text/x-php" OR "text/x-perl" OR "text/x-ruby" OR "text/x-python" OR "text/x-awk" OR "text/x-tcl" OR "text/x-lua" OR "text/x-msdos-batch") AND user_agent.original:(*certutil* OR powershell OR microsoft OR python OR libwww-perl OR go-http OR java OR lua-resty-http OR winhttp OR "vb project" OR ruby)) AND (NOT (source.port:("80" OR "8000" OR "8080" OR "8888"))) AND (destination.ip_public: true))

When analyzing traffic for suspicious MIME types, it's important to filter out the most common benign types to improve the accuracy of your investigation. Some of the most common benign MIME types are:

application/x-x509-*
application/ocsp*
image/*
audio/*
video/*
text/*
application/xml
application/chrome-ext

Another pattern to look for is HTTP traffic with no Host header or no User-Agent header set.

(event.dataset:"http" AND ((NOT (http.request.header_names:"USER-AGENT")) OR (NOT (http.request.header_names:"HOST"))))
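The same missing-header check can be expressed outside the SIEM. The sketch below is a minimal illustration, assuming each HTTP record exposes a list of request header names; the function name and the input shape are hypothetical.

```python
# Headers that a normal browser request is expected to carry.
REQUIRED_HEADERS = {"HOST", "USER-AGENT"}


def missing_required_headers(header_names):
    """Return the required header names absent from a request, sorted.

    Header names are compared case-insensitively.
    """
    present = {name.upper() for name in header_names}
    return sorted(REQUIRED_HEADERS - present)


print(missing_required_headers(["Host", "Accept"]))
```

An empty result means both headers were present; anything else is worth a closer look at the client that sent the request.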

Another approach is to analyze traffic for abnormal or non-conforming HTTP requests, which could signal the use of a non-standard HTTP implementation.

(event.dataset:"weird" AND weird.name:"bad_HTTP_request")

Other User-Agent Analysis - BITS

Another example of useful user-agent string analysis is examining usage of the Microsoft Background Intelligent Transfer Service (BITS) user agent string. BITS is a system service built into Windows that transfers files between a client and a server in the background, using idle network bandwidth. It is commonly used by applications like Windows Update, Microsoft Endpoint Configuration Manager, and other system services that need to download or upload large files. Elastic Security Labs recently posted a research article on a malware called Bitsloth that uses the BITS service for its Command & Control (https://www.elastic.co/security-labs/bits-and-bytes-analyzing-bitsloth). Although BITS is most commonly used for legitimate system telemetry, it is simple for malicious threat actors to set up their own BITS server using Python (https://github.com/SafeBreach-Labs/SimpleBITSServer).

user_agent.original: *Microsoft BITS*

Due to the extensive use of BITS for Windows system telemetry, simply searching for BITS user-agent strings isn’t an effective threat-hunting technique. However, analyzing BITS traffic directed to unusual destinations, outside of standard providers like Microsoft, Dell, or major CDNs, can be valuable. For instance, the Elastic Security Bitsloth report identified an IP IOC engaging in BITS communication with a compromised system, clearly indicating that this traffic wasn’t legitimate Microsoft activity.
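One way to apply this idea is to keep only BITS requests whose destination falls outside an allowlist of expected providers. The Python sketch below is an assumption-laden illustration: the record keys "user_agent" and "host" are hypothetical, and the suffix list is a placeholder that would need to be built from what is actually normal in your environment.

```python
# Placeholder allowlist; replace with the providers seen in your own baseline.
EXPECTED_SUFFIXES = ("microsoft.com", "windowsupdate.com", "dell.com")


def unusual_bits_destinations(records, expected_suffixes=EXPECTED_SUFFIXES):
    """Return BITS records whose destination host is not an expected provider."""
    hits = []
    for rec in records:
        if "microsoft bits" not in (rec.get("user_agent") or "").lower():
            continue  # not BITS traffic
        host = (rec.get("host") or "").lower().rstrip(".")
        # Match the suffix itself or any subdomain of it.
        if any(host == s or host.endswith("." + s) for s in expected_suffixes):
            continue
        hits.append(rec)
    return hits


sample = [
    {"user_agent": "Microsoft BITS/7.8", "host": "download.windowsupdate.com"},
    {"user_agent": "Microsoft BITS/7.8", "host": "203.0.113.50"},
    {"user_agent": "curl/8.5.0", "host": "203.0.113.50"},
]
print(unusual_bits_destinations(sample))
```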

Other User-Agent Analysis - Axios

A suspicious user-agent string frequently seen in Man-in-the-Middle attacks, as noted in this report (https://fieldeffect.com/blog/field-effect-discovers-m365-adversary-in-the-middle-campaign), is 'Axios.' Axios is a promise-based HTTP client used by developers to fetch data from their own or third-party servers. While its user-agent string is customizable (https://www.zenrows.com/blog/axios-user-agent#useragent-in-axios), it’s often left unchanged.

Consider the typical identity chain compromise, where attackers compromise one legitimate user, send phishing emails to their contacts, and continue compromising additional accounts. With large-scale phishing operations, some attempts are blocked, and accounts are reset. To quickly log in to compromised accounts and escalate access before detection, attackers rely on automation, and scripted logins built on HTTP client libraries such as Axios can leave that library's user agent in the logs.

user_agent.original: *Axios*

Analyzing Response Headers

Response headers are another useful data source. They provide valuable data points that can aid in analysis and help detect malicious activity. Some key fields in the response header include:

Server: Identifies the software running on the server that generated the response.
Date: Indicates when the response was generated. Caches use it, together with headers such as Expires and Cache-Control, to decide when a stored response has become stale.
Content-Type: Specifies the media type (MIME type) of the resource being returned, informing the client (typically a web browser) how to process or display the data. Examples include:
- text/html: For HTML documents (web pages).
- application/json: For JSON data.
- text/css: For CSS stylesheets.
- image/png: For PNG images.
- application/pdf: For PDF files.
Content-Length: Specifies the size of the response body in bytes, telling the client how much data to expect, which is crucial for handling large files or streams.
Last-Modified: Indicates the date and time the resource was last updated on the server. This helps with caching by allowing the client to check if the content has changed since the last request.
Set-Cookie: Allows the server to send cookies to the client, which are then stored and sent back in subsequent requests. Cookies are used for session management, user tracking, and other data handling.

The MIME type can be analyzed to identify suspicious file transfers over the web. Below are examples of Elastic queries against Zeek data that can alert on suspicious traffic patterns.

Multiple compressed files transferred over HTTP can be indicative of attackers exfiltrating data via compressed files.

(event.dataset:"http" AND (http.request.method:("POST" OR "PUT") AND file.mime_type:("application/vnd.ms-cab-compressed" OR "application/warc" OR "application/x-7z-compressed" OR "application/x-ace" OR "application/x-arc" OR "application/x-archive" OR "application/x-arj" OR "application/x-compress" OR "application/x-cpio" OR "application/x-dmg" OR "application/x-eet" OR "application/x-gzip" OR "application/x-lha" OR "application/x-lrzip" OR "application/x-lz4" OR "application/x-lzma" OR "application/x-lzh" OR "application/x-lzip" OR "application/x-rar" OR "application/x-rpm" OR "application/x-stuffit" OR "application/x-tar" OR "application/x-xz" OR "application/x-zoo" OR "application/zip")) AND (NOT (http.request.referrer:*)))

Analyzing multiple compressed files transferred outbound is similar to inspecting compressed files over HTTP, but also includes outbound files tracked in the Zeek files log, which logs all files seen on the network, not just those transferred via HTTP. To ensure the rule's effectiveness, verify that your Zeek or Corelight device is properly configured with accurate local_orig and local_resp variables aligned to your organization’s subnets.

(event.dataset:"files" AND (NOT (file.size:"0")) AND file.mime_type:("application/vnd.ms-cab-compressed" OR "application/warc" OR "application/x-7z-compressed" OR "application/x-ace" OR "application/x-arc" OR "application/x-archive" OR "application/x-arj" OR "application/x-compress" OR "application/x-cpio" OR "application/x-dmg" OR "application/x-eet" OR "application/x-gzip" OR "application/x-lha" OR "application/x-lrzip" OR "application/x-lz4" OR "application/x-lzma" OR "application/x-lzh" OR "application/x-lzip" OR "application/x-rar" OR "application/x-rpm" OR "application/x-stuffit" OR "application/x-tar" OR "application/x-xz" OR "application/x-zoo" OR "application/zip"))

Other examples of suspicious behavior include clients transferring large amounts of data over HTTP. The number of bytes transferred in each session is recorded in Zeek logs and can be analyzed for further insights.

(event.dataset:"http" AND http.response.body.bytes:* AND http.response.body.bytes >10000000)

When analyzing data transfer anomalies, you're more likely to identify irregularities in larger volumes of data. To facilitate this analysis, it's essential to convert byte outputs into more manageable quantities.

To convert bytes to megabytes for bytes sent and received, use the following formulas:

megabytes = (orig_bytes / 1024)/1024

megabytes = (resp_bytes / 1024)/1024
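The conversion above can be applied per source host to make outliers easier to spot. This is a small Python sketch, assuming records exported with hypothetical keys "src" and "orig_bytes" (the latter named after the Zeek field used in the formula); the aggregation function is illustrative, not part of Zeek.

```python
def bytes_to_megabytes(byte_count):
    """Convert bytes to megabytes using the 1024-based formula above."""
    return byte_count / 1024 / 1024


def megabytes_by_source(records):
    """Sum orig_bytes per source host and report the totals in megabytes."""
    totals = {}
    for rec in records:
        totals[rec["src"]] = totals.get(rec["src"], 0) + rec["orig_bytes"]
    return {src: round(bytes_to_megabytes(total), 2) for src, total in totals.items()}


sample = [
    {"src": "10.0.0.5", "orig_bytes": 1048576},
    {"src": "10.0.0.5", "orig_bytes": 524288},
    {"src": "10.0.0.9", "orig_bytes": 0},
]
print(megabytes_by_source(sample))
```

The same function works for resp_bytes by swapping the key that is summed.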

Another useful analysis for all traffic in and out of your network, not just web traffic, is looking for connections to unexpected countries by examining the resp_cc (responder country code) field in Zeek data.

Standard internet traffic typically involves a user entering or clicking a URI, which triggers a DNS lookup to find the corresponding IP address for routing. However, some traffic bypasses this step and goes directly to an IP address, known as using 'naked IP addresses' because no DNS resolution is involved. Some web proxies or next-generation firewalls (NGFWs) may not be configured to properly analyze this type of traffic, allowing it to exit the network unchecked. Attackers exploit this by using naked IP addresses as a fallback method when they can't establish connectivity through standard URIs. The query below looks for HTTP requests sent to a naked IP address for files with commonly abused extensions.

event.dataset:http AND ((destination.domain_ends_with_integer: true AND destination.domain_has_dot: true ) OR (destination.domain_has_colon: true AND destination.domain_has_dot: false)) AND url.extension:(apm OR app OR appref\\-ms OR bas OR bat OR chi OR chm OR chq OR chw OR dll OR exe OR gadget OR hta OR inf OR jar OR jnlp OR jse OR lnk OR mde OR mht OR msi OR msix OR msixbundle OR pif OR pkg OR pl OR ps1 OR ps1xml OR ps2 OR ps2xml OR psc1 OR psc2 OR psd1 OR psdm1 OR psm1 OR py OR pyc OR pyo OR pyw OR pyz OR reg OR scr OR sct OR vbe OR vbs OR ws OR wsb OR wsc OR wsf OR xpi OR xz OR z OR zip OR zipx)
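The "naked IP" test itself can be sketched in Python with the standard ipaddress module. This is a minimal illustration, assuming the input is the value of the HTTP Host header (optionally with a port); the function name is hypothetical.

```python
import ipaddress


def is_naked_ip(host_header):
    """Return True if a Host header value is an IP literal rather than a name."""
    host = host_header.strip()
    if host.startswith("["):
        # Bracketed IPv6 literal, optionally followed by :port.
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # Hostname or IPv4 address followed by :port.
        host = host.rsplit(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


for value in ("93.184.216.34:8080", "example.com", "[2001:db8::1]:443"):
    print(value, is_naked_ip(value))
```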

Other Command & Control (C2) Detections

Tools like Corelight provide advanced analysis of Zeek log data, enabling deeper inspection of traffic to detect potential Command and Control (C2) activity. For example, Cobalt Strike, a widely used C2 framework, allows operators to configure communication profiles for interacting with C2 servers. Advanced frameworks also offer 'malleable profiles' that randomize communication between infected systems and C2 servers, making detection more challenging.

However, many operators lack a deep understanding of C2 profile configurations, often relying on pre-built profiles. These default settings become valuable detection points, as many users fail to customize them. For instance, the pre-built jQuery profile found at https://raw.githubusercontent.com/threatexpress/malleable-c2/master/jquery-c2.3.11.profile contains unique values that can be leveraged for detection.

As highlighted in the DFIR Report’s article, https://thedfirreport.com/2022/01/24/cobalt-strike-a-defenders-guide-part-2/ , the jQuery profile reveals specific settings for how Cobalt Strike’s Beacon communicates with its C2 server, offering critical insight for defenders. Corelight analyzes traffic patterns for numerous C2 profiles and raises a notice log if one is detected. An alert rule query can be built as shown below.

event.dataset: notice and notice.note : "HTTP_C2::C2_Traffic_Observed"

Note that not all C2 traffic uses HTTP. Many frameworks use other protocols such as HTTPS, mTLS, WireGuard, and DNS, so it is important not to limit detection rules to HTTP C2 notices. The wildcard alert rule below matches any notice that mentions C2.

 event.dataset : notice and (notice.message : *C2* or notice.note : *C2*)

Installing malware or unwanted software & Users redirected to Man-in-the-Middle (MitM) for session and credential stealing

Zeek, being an analysis-based system rather than an inline traffic analyzer, functions primarily as a reactive or alerting tool for malware downloads and man-in-the-middle (MitM) attacks. The main defense against these threats typically comes from firewalls and EDR/XDR solutions having the malicious redirect IP and/or URL in their threat intelligence. However, Zeek's data analysis provides valuable insight for investigating incidents and can alert on suspicious activities that might be overlooked by firewalls or EDR/XDR due to deployment gaps or the attacker’s ability to evade detection.

A well-known threat group, SocGholish, is notorious for tricking users with fake update prompts, leading them to download malicious .zip or .js files. When users click these files, their machines get infected. A key preventive measure is blocking the execution of .js files, whether downloaded or unzipped, and allowing them to open only in a text editor. This helps prevent such infections from occurring.

Analyzing MIME content types can provide valuable visibility into suspicious file downloads. For instance, if an executable file is downloaded but the MIME type doesn't match the file extension, it could indicate an attempt to disguise the true nature of the file. This mismatch is often used to hide malicious downloads.

(event.dataset:"http" AND http.resp_mime_types:("application/java-archive" OR "application/mshelp" OR "application/chrome-ext" OR "application/x-object" OR "application/x-executable" OR "application/x-dosexec" OR "application/x-msdownload" OR "application/vnd.microsoft.portable-executable") AND (NOT (url.original:(*.exe OR *.dll OR *.msi))))
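The mismatch logic behind this query can be sketched in Python as follows. It is a simplified illustration, not the Corelight rule: the function name is hypothetical, and the MIME type and extension sets are a subset of the values in the query above.

```python
import posixpath
from urllib.parse import urlsplit

# Subset of the executable MIME types and extensions used in the query above.
EXECUTABLE_MIME_TYPES = {
    "application/x-dosexec",
    "application/x-msdownload",
    "application/x-executable",
    "application/vnd.microsoft.portable-executable",
}
EXECUTABLE_EXTENSIONS = {".exe", ".dll", ".msi"}


def is_disguised_executable(url, mime_type):
    """Return True when content is executable but the URL extension is not."""
    path = urlsplit(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    is_executable = mime_type.strip().lower() in EXECUTABLE_MIME_TYPES
    return is_executable and extension not in EXECUTABLE_EXTENSIONS


print(is_disguised_executable("http://example.com/update.jpg", "application/x-dosexec"))
print(is_disguised_executable("http://example.com/setup.exe?v=1", "application/x-dosexec"))
```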

Attackers often use LNK files, which are Windows shortcut files, to execute malicious code. However, it's important to note that LNK files are rarely downloaded or shared over the internet. Therefore, detecting the transfer of LNK files online can serve as a valuable indicator of suspicious activity.

(event.dataset:"http" AND (http.request.method.text:"GET" AND url.extension:(lnk OR LNK OR inf OR INF)) AND (NOT (http.request.referrer:*)))

Remote Access through web shells

The first step in combating web shells is prevention. Since deploying a web shell typically involves adding or modifying code on the web server, often through exploiting a vulnerability, it's crucial to regularly patch all components associated with the server. This practice significantly mitigates the risk of commonly known vulnerabilities.

Beyond prevention, effective detection is essential. Common data sources for detecting web shells are displayed in the diagram below.

Host based:
- Host-based file artifacts: Web shells run code on the exploited server, often blending in with legitimate web server code, making detection challenging. However, tools that analyze server code using static analysis, entropy measurements, pattern detection, and heuristic analysis can help identify web shell-like behavior. Splunk offers a free tool called 'ShellSweep,' which automates this process for detecting web shells: https://www.splunk.com/en_us/blog/security/shellsweepplus-web-shell-detection-tool.html
- Host-based network artifacts: Web server log files, such as Apache's access.log, record details about incoming requests, including the date and time, source IP address, requested resources, and the user agent string.
Network based:
- NGFW and WAF
- Analyzing network traffic using Zeek

The diagram underscores the importance of these detection strategies. Hackers often hide web shells in unusual locations that typical web traffic doesn’t touch. In the example below, the web shell is embedded within image files, allowing attackers to bypass standard server-side web shell detection, which may not inspect the code within image files.

Some techniques for detecting web shells through monitoring web server network traffic flows for anomalous behaviors include:

New or Rarely Accessed URIs: A web server resource URI that is newly accessed, and accessed primarily by a limited number of remote systems, could indicate that a new web shell was implemented at that URI.
Hidden Page Access: Normal web traffic usually targets a limited set of common pages, whereas web shells may direct requests to hidden pages, often appearing as web requests without a referring page.
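Both heuristics can be combined in a simple aggregation over web server request records. The following Python sketch is illustrative only: the record keys "uri", "src", and "referrer" and the max_clients threshold are assumptions made for the example, and real traffic would also need a time window and a baseline of known pages.

```python
from collections import defaultdict


def rarely_accessed_uris(records, max_clients=2):
    """Return URIs reached by few distinct clients and never via a referring page."""
    clients = defaultdict(set)
    with_referrer = defaultdict(int)
    for rec in records:
        uri = rec["uri"]
        clients[uri].add(rec["src"])
        if rec.get("referrer"):
            with_referrer[uri] += 1
    return sorted(
        uri
        for uri, sources in clients.items()
        if len(sources) <= max_clients and with_referrer[uri] == 0
    )


sample = [
    {"uri": "/index.php", "src": "198.51.100.1", "referrer": "https://example.com/"},
    {"uri": "/index.php", "src": "198.51.100.2", "referrer": None},
    {"uri": "/index.php", "src": "198.51.100.3", "referrer": None},
    {"uri": "/uploads/img/x.php", "src": "203.0.113.7", "referrer": None},
    {"uri": "/uploads/img/x.php", "src": "203.0.113.7", "referrer": None},
]
print(rarely_accessed_uris(sample))
```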

Below are some example rules that use Zeek HTTP data in Elastic ECS format and could help detect this activity. Rule reference: Corelight https://github.com/corelight/Elasticsearch_rules/blob/main/Elastic%20SIEM%20Rules/Elastic_Corelight_rules.ndjson

The rule below looks for successful HTTP PUT or POST requests to file extensions that do not normally accept uploads.

(event.dataset:"http" AND (url.original:(*.jpg OR *.jpeg OR *.gif OR *.png OR *.icon OR *.ico OR *.xml OR *.swf OR *.svg OR *.ppt OR *.pptx OR *.doc OR *.docx OR *.rtf OR *.pdf OR *.tif OR *.zip OR *.mov) AND http.request.method.text:("POST" OR "PUT") AND http.response.status_code:2*) AND (NOT (http.response.body.bytes:"0")))

The rule below looks for HTTP PUT or POST requests to a single location from only a few IPs over a set time period.

(event.dataset:"http" AND (url.original:(*.aspx OR *.asp OR *.php OR *.jsp OR *.jspx OR *.war OR *.ashx OR *.asmx OR *.ascx OR *.asx OR *.cshtml OR *.cfm OR *.cfc OR *.cfml OR *.wss OR *.do OR *.action OR *.pl OR *.plx OR *.pm OR *.xs OR *.t OR *.pod OR *.php-s OR *.pht OR *.phar OR *.phps OR *.php7 OR *.php5 OR *.php4 OR *.php3 OR *.phtml OR *.py OR *.rb OR *.rhtml OR *.cgi OR *.dll OR *.ayws OR *.erb OR *.rjs OR *.hta OR *.htc OR *.cs OR *.kt OR *.lua OR *.vbhtml) AND http.request.method.text:("POST" OR "PUT")) AND (NOT (http.response.status_code:4*)))

A few other techniques for detecting web shells through monitoring web server network traffic flows for anomalous behaviors include:

Unexpected Network Flows: If a compromised web server is used to proxy requests into the internal network, it may initiate web requests to internal nodes.
Anomalous Responses: A non-web server node, such as a network device, suddenly responding to web requests from outside the network.
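The first of these patterns, a web server initiating connections into the internal network, can be sketched as a filter over connection records. This Python example is a rough illustration under stated assumptions: the record keys "src" and "dst" and the web_servers set are hypothetical, and "internal" is approximated with the ipaddress module's is_private check rather than your organization's actual subnets.

```python
import ipaddress


def web_server_internal_requests(records, web_servers):
    """Return records where a known web server initiates traffic to a private address."""
    hits = []
    for rec in records:
        if rec["src"] not in web_servers:
            continue  # only connections initiated by web servers are of interest
        if ipaddress.ip_address(rec["dst"]).is_private:
            hits.append(rec)
    return hits


sample = [
    {"src": "10.0.1.10", "dst": "10.0.5.20"},      # web server to internal node
    {"src": "10.0.1.10", "dst": "93.184.216.34"},  # web server to the Internet
    {"src": "10.0.2.30", "dst": "10.0.5.20"},      # workstation to internal node
]
print(web_server_internal_requests(sample, web_servers={"10.0.1.10"}))
```

Legitimate back-end connections (for example to a database) would need to be excluded before a result like this is treated as suspicious.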

The concept shown in this diagram is that web shell code may be implemented on non-web server devices within the target's network to facilitate remote access, either directly or by proxying through a system that has external access.

The example below analyzes SMB logs for file paths accessed that are web related. Adversaries may place a web shell on a file share and execute that web shell by accessing it on an existing website.

(event.dataset:"smb_files" AND file.path:(*inetpub* OR *wwwroot*) AND (file.name:(*.aspx OR *.asp OR *.php OR *.jsp OR *.jspx OR *.war OR *.ashx OR *.asmx OR *.ascx OR *.asx OR *.cshtml OR *.cfm OR *.cfc OR *.cfml OR *.wss OR *.do OR *.action OR *.pl OR *.plx OR *.pm OR *.xs OR *.t OR *.pod OR *.php\\-s OR *.pht OR *.phar OR *.phps OR *.php7 OR *.php5 OR *.php4 OR *.php3 OR *.phtml OR *.py OR *.rb OR *.rhtml OR *.cgi OR *.dll OR *.ayws OR *.erb OR *.rjs OR *.hta OR *.htc OR *.cs OR *.kt OR *.lua OR *.vbhtml) OR file.name:/.*[^a-zA-Z0-9\\.\\_\\-][a-zA-Z0-9\\.\\_\\-]{1,3}\\.[A-Za-z0-9]{2,3}$/))