
Log Collection Microsoft Defender, AMA and Microsoft Sentinel SIEM

  • brencronin
  • Mar 25

Updated: 4 days ago


Application log collection to Sentinel SIEM in a Microsoft Defender environment


Microsoft Defender for Endpoint (MDE) is a highly capable EDR platform that, when deployed to workstations and servers, collects extensive telemetry including process creation, network connections, file activity, registry changes, authentication events, and security logs. Out of the box, it provides deep endpoint visibility, functionally similar to an enhanced, enterprise-grade Sysmon.


In many cases, when Microsoft Defender for Endpoint (MDE) is deployed on a server or workstation, a separate logging agent is unnecessary. MDE already collects the majority of system-level telemetry that traditional logging agents are designed to capture. Deploying both MDE and an additional logging agent to collect the same system logs results in duplicate data ingestion, which can significantly increase SIEM costs without adding meaningful visibility. To optimize efficiency and cost, organizations should rely on MDE for endpoint telemetry and only use additional agents for data sources not natively covered (e.g., application-specific logs).


The corner case for server logging agents


However, a key limitation of MDE is its lack of native support for collecting application-specific logs. For example, critical web server logs such as Apache access and error logs (/var/log/httpd/access_log and /var/log/httpd/error_log) are not ingested by MDE. This creates a gap when organizations need to centralize application telemetry in a SIEM like Microsoft Sentinel. The challenge extends beyond Apache to any custom or application-specific logging, with additional complexity depending on whether the workload is running on Linux or Windows.


At a high level, there are two primary approaches for centralized application log collection:


  1. Agent-Based


  2. Syslog Collection


  • Linux: Configure the host to forward application logs via syslog to the SIEM.

  • Windows: Native syslog is not supported, requiring deployment of a log collection agent to ingest and forward application logs via syslog.


Azure Monitor Agent (AMA) with Defender for Servers


The recommended enterprise approach is integrating MDE with Microsoft Defender for Servers (via Microsoft Defender for Cloud), which enables deployment of the Azure Monitor Agent (AMA). AMA is responsible for collecting custom log files.


Using Azure Monitor, you define a Data Collection Rule (DCR) that specifies the log file paths to ingest. For example, a DCR can target:


  • /var/log/httpd/access_log

  • /var/log/httpd/error_log


AMA then forwards these logs to a Log Analytics workspace, making them available in Microsoft Sentinel. The data is typically stored in a custom table (e.g., ApacheLogs_CL), where it can be leveraged for KQL-based threat hunting, analytics rule development, and detections.
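To make the moving parts of such a DCR concrete, here is a minimal sketch in Python that assembles a DCR-style definition for the two Apache files. It is an illustration under stated assumptions, not a deployable template: the workspace resource ID is a placeholder, the names `apacheLogFiles` and `sentinelWorkspace` are hypothetical, and the layout (`streamDeclarations`, `dataSources.logFiles`, `destinations`, `dataFlows`) follows the general shape of a text-log DCR as I understand it. A real rule typically also needs a location and, depending on the setup, a data collection endpoint, so verify the result against the current Azure Monitor documentation before use.

```python
import json

# Placeholder resource ID (assumption): replace with a real workspace resource ID.
WORKSPACE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
)
# Custom streams use a "Custom-" prefix; ApacheLogs_CL is the example table name.
STREAM = "Custom-ApacheLogs_CL"


def build_apache_dcr(file_patterns, workspace_id=WORKSPACE_ID, stream=STREAM):
    """Return a dict shaped like the properties of a DCR that collects text log files."""
    return {
        # Schema of the incoming records: one raw text line plus a timestamp.
        "streamDeclarations": {
            stream: {
                "columns": [
                    {"name": "TimeGenerated", "type": "datetime"},
                    {"name": "RawData", "type": "string"},
                ]
            }
        },
        # What the agent should read on the host.
        "dataSources": {
            "logFiles": [
                {
                    "name": "apacheLogFiles",  # hypothetical data source name
                    "streams": [stream],
                    "filePatterns": list(file_patterns),
                    "format": "text",
                }
            ]
        },
        # Where the data goes.
        "destinations": {
            "logAnalytics": [
                {"name": "sentinelWorkspace", "workspaceResourceId": workspace_id}
            ]
        },
        # Wiring: stream -> destination, passed through unchanged ("source").
        "dataFlows": [
            {
                "streams": [stream],
                "destinations": ["sentinelWorkspace"],
                "transformKql": "source",
                "outputStream": stream,
            }
        ],
    }


dcr = build_apache_dcr(["/var/log/httpd/access_log", "/var/log/httpd/error_log"])
print(json.dumps(dcr, indent=2))
```

The point of the sketch is that the file paths live in one centrally managed object rather than in per-server configuration files.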


Logging via syslog and not via an agent


Many organizations dislike the overhead of YAA ('Yet Another Agent') and look to native syslog to solve the problem. Two issues arise immediately:


  • If the application is running on Windows, Windows doesn't support syslog natively.

  • Even on Linux-based servers, which support syslog natively, collecting application-specific log files requires a more complex configuration.


Syslog’s Relationship to Application Log Files


Many administrators are accustomed to easily forwarding logs such as /var/log/messages and /var/log/secure to a SIEM by configuring rsyslog (e.g., via /etc/rsyslog.conf or /etc/rsyslog.d/forward.conf). This works seamlessly because core system components, kernel services, PAM, SSH, sudo, and others, natively generate logs through the syslog API. These events are emitted as structured messages with predefined facility and severity fields, allowing rsyslog to immediately process and forward them based on its configuration.


In contrast, most application logs do not natively integrate with syslog. For example, Apache writes its access and error logs directly to disk (/var/log/httpd/access_log, /var/log/httpd/error_log) using its own logging mechanisms, completely bypassing the syslog subsystem. As a result, rsyslog has no inherent awareness of these files or their contents. From its perspective, they are simply static files on the filesystem unless explicitly configured to monitor and ingest them (e.g., via file input modules).


Example configuration to collect application logs via syslog (example: /var/log/httpd/access_log and /var/log/httpd/error_log)


/etc/rsyslog.d/apache_forward.conf
```
# Load the imfile module to read from flat log files
module(load="imfile")
# Apache Access Log
input(type="imfile"
      File="/var/log/httpd/access_log"
      Tag="apache-access"
      Severity="info"
      Facility="local6"
      PersistStateInterval="10"
      freshStartTail="off")

# Apache Error Log
input(type="imfile"
      File="/var/log/httpd/error_log"
      Tag="apache-error"
      Severity="error"
      Facility="local6"
      PersistStateInterval="10"
      freshStartTail="off")

# Forward all local6 facility messages to remote syslog server
# TCP (recommended for reliability)
local6.* action(type="omfwd"
                Target="192.168.1.100"
                Port="514"
                Protocol="tcp"
                action.resumeRetryCount="100"
                queue.type="linkedList"
                queue.size="10000"
                queue.filename="apache_fwd_queue"
                queue.saveOnShutdown="on")
```

Config file notes:


  • imfile stands for input module file. It is essentially rsyslog's built-in log file tailer: think of it as rsyslog's version of tail -f, but integrated into the pipeline. When you load imfile and point it at a file path, rsyslog:

    • Opens the file and tracks its read position (stored in a state file)

    • Monitors the file for new lines being appended

    • Takes each new line and wraps it in a syslog message structure, assigning the facility, severity, and tag you configured

    • Injects that synthetic syslog message into the rsyslog pipeline

  • At that point the Apache log line looks to the rest of rsyslog like any other syslog message, and all the normal forwarding, filtering, and routing rules apply to it. That is why you then assign Facility="local6": you are giving this synthetic message a facility code so you can route it with the local6.* forwarding rule.

  • `Facility="local6"` assigns these logs to the local6 syslog facility. Local facilities (local0–local7) are reserved for custom use. Pick whichever local facility is not already in use in your environment, and be consistent between sender and receiver. You need that facility assignment precisely because the message did not arrive from a real syslog sender with a facility already set, so you have to manufacture one to give rsyslog something to filter and route on. In short: native syslog sources push into rsyslog, while file-based sources like Apache have to be pulled by rsyslog using imfile and then given syslog metadata that the application never provided in the first place.

  • `Tag` values (`apache-access` and `apache-error`) are prepended to each log line, making it easy to filter and parse on the receiving end.

  • `freshStartTail="off"` means rsyslog will read the file from the beginning on first run, sending historical log content. Set this to `on` if you only want new lines from the point rsyslog starts.

  • `PersistStateInterval="10"` saves the file read position every 10 lines, so rsyslog knows where it left off if restarted and avoids re-sending lines.

  • The queue configuration provides a disk-assisted buffer so log lines are not lost if the remote server is temporarily unreachable.
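The notes above can be sketched in a few lines of Python. This is not how rsyslog is implemented; it is a simplified model, with hypothetical function names (`syslog_pri`, `read_new_lines`, `wrap`), of the three ideas involved: remembering a read offset, picking up only complete new lines, and manufacturing syslog metadata for each line. The facility and severity codes come from the syslog protocol, where the PRI value is facility × 8 + severity, so local6 (22) with info (6) gives 182.

```python
import os
import tempfile

# Numeric codes defined by the syslog protocol (RFC 3164 / RFC 5424).
FACILITIES = {f"local{i}": 16 + i for i in range(8)}  # local0..local7 -> 16..23
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3, "error": 3,  # "error": legacy alias of err
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def syslog_pri(facility, severity):
    """PRI value that prefixes a syslog message: facility * 8 + severity."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def read_new_lines(path, offset):
    """Return (complete new lines, new offset) for data appended after offset."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    end = data.rfind(b"\n") + 1  # leave a trailing partial line for the next pass
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def wrap(line, tag, facility, severity):
    """Simplified framing; a real syslog message also carries a timestamp and hostname."""
    return f"<{syslog_pri(facility, severity)}>{tag}: {line}"


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "access_log")  # stand-in for /var/log/httpd/access_log
    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write("GET /index.html 200\n")
    lines, offset = read_new_lines(log_path, 0)      # first pass reads from the start
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write("GET /missing 404\n")
    more, offset = read_new_lines(log_path, offset)  # second pass sees only the new line
    for line in lines + more:
        print(wrap(line, "apache-access", "local6", "info"))
```

Starting the first pass at offset 0 corresponds to `freshStartTail="off"`, and persisting `offset` between runs plays the role of the imfile state file.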


AMA to SIEM Ingest versus Syslog/Logstash SIEM Ingest: Pros and Cons


Azure Monitor Agent (AMA) → Sentinel


In this architecture AMA runs on the server, a Data Collection Rule (DCR) tells it what to collect and where to send it, and it ships data directly to a Log Analytics workspace over HTTPS.

Pros:

  • Custom log collection via DCR. AMA supports custom log file collection through DCR configuration, so application logs such as /var/log/httpd/access_log can be collected directly without a complex rsyslog imfile configuration. On Windows, a logging agent is needed for application log collection anyway.

  • Data Collection Rules are centrally managed. DCRs are defined in Azure and applied to agents centrally. You do not need to touch individual server configurations to change what is collected. Instead, you update the DCR and the change propagates to all associated agents.

  • TLS encrypted transport natively. AMA sends over HTTPS to Azure endpoints by default. There is no additional TLS configuration required unlike syslog over TCP.

  • Easier connection to FISMA boundaries. A DCR is a scoped Azure resource, so you can design your DCRs to align with your FISMA system boundaries, which makes boundary identification more direct.

  • Works on both Linux and Windows natively.

  • No intermediate infrastructure. AMA ships directly to Log Analytics. There is no syslog server or Logstash node to operate, patch, or monitor. The collection pipeline is essentially just the agent and the DCR.

Cons:

  • Agent on every server. You must install, maintain, and monitor AMA on every source. In large environments or environments with strict change management this is non-trivial. Agent updates, compatibility issues, and agent health monitoring become ongoing operational responsibilities.

  • Azure connectivity required. Every monitored server needs outbound HTTPS access to Azure Monitor endpoints. In air-gapped or highly restricted environments this may not be acceptable.


Syslog / Logstash → Sentinel


In this architecture logs flow from the server to a syslog receiver (or directly to Logstash), Logstash processes and enriches them, and then a Logstash output plugin forwards them into a Log Analytics workspace via the HTTP Data Collector API or the newer Logs Ingestion API.


Pros:

  • No agent on the monitored server (in a pure syslog push model). For environments where installing agents on servers is politically difficult, requires lengthy change management, or is prohibited by policy, syslog push requires nothing on the source beyond rsyslog configuration.

  • Logstash enrichment capability. Before data hits Sentinel you can parse, transform, enrich with threat intelligence, drop noisy fields, normalize field names, and reshape the data into whatever structure you want. This is a significant advantage: the data arrives in Sentinel clean and pre-processed rather than as raw log lines.


Cons:

  • Complex configuration on the server. Because the application log files are not hooked into syslog natively, each server needs to maintain a more complex configuration to collect them. If application log files change format, path, or name, the configuration file has to be updated on every affected server.

  • Encryption complexity. Plain syslog sends everything in plaintext. To encrypt the transport, you have to add TLS on top of it yourself, which brings the added complexity of certificate management.

  • Sentinel context is limited. Data ingested via the HTTP/Logs Ingestion API as a custom table does not automatically benefit from Sentinel's built-in entity mapping, UEBA enrichment, or analytics rule templates the same way native connector data does.

  • Difficult connection to FISMA boundaries. Syslog source identification is self-reported, hostname-dependent, and requires manual mapping to system boundaries that must be continuously maintained and is never fully authoritative.

  • Parsing maintenance burden. Logstash grok patterns or dissect filters (for Apache, for example) need to be written and maintained. When log formats change or new fields are added, the parser breaks and someone has to fix it.

  • Log lines arrive as raw strings. Unless Logstash parses them, Apache and IIS log lines land in Sentinel as a single unstructured text field. Writing KQL against raw text is significantly harder than querying structured fields.

  • Additional infrastructure to operate. You need to run, maintain, patch, and monitor the syslog receiver and Logstash nodes. These become critical infrastructure: if they go down, log collection stops. This is operational overhead AMA does not require.
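To illustrate the difference between a raw string and structured fields, here is a minimal sketch that parses one access log line in Apache's Common Log Format. In the pipeline described above this work would be done by a Logstash grok or dissect filter; Python's `re` module is swapped in here purely for illustration, and the pattern and the `parse_access_line` helper are assumptions that cover only the default format, not custom `LogFormat` directives.

```python
import re

# Common Log Format: client ident user [timestamp] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Split one access log line into named fields, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" when no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    parts = fields["request"].split()
    if len(parts) == 3:
        fields["method"], fields["path"], fields["protocol"] = parts
    return fields


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_access_line(sample)
print(record["client"], record["method"], record["path"], record["status"], record["size"])
```

Once fields such as status and path exist as separate columns, a query can filter on them directly instead of pattern-matching inside one text field. The flip side is the maintenance burden noted above: a line that does not match the pattern returns `None` and needs handling.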


Agent versus Syslog


In a pure Microsoft environment where all your sources are Azure VMs or Arc-enabled servers, AMA is the cleaner and lower-overhead choice. In a heterogeneous environment that includes network devices, non-Azure infrastructure, or multi-SIEM requirements, a syslog/Logstash pipeline either complements or replaces AMA depending on the source type. Most mature SOC environments end up running both: AMA for Windows and Linux servers, and syslog/Logstash for network devices and specialty appliances, with Logstash sometimes acting as a normalization layer before everything lands in Sentinel.


Agent Deployment Scope — Targeted Installation vs. Universal Baseline


One of the earliest and most consequential decisions in a SIEM log collection architecture is determining the scope of agent deployment. Once you have decided to use an agent like AMA, you face a second decision: do you deploy it selectively only to servers that have specific log collection needs, or do you deploy it universally to every server in the environment as a standard baseline, regardless of whether that server currently has application logs to collect?


Option 1 — Targeted Deployment: Install the Agent Only on Servers with Known Collection Requirements


In this model you identify the specific servers that have application log files to collect (Apache servers, IIS servers, custom application servers) and install AMA only on those hosts, either manually or through a configuration management tool like Red Hat Satellite by creating a targeted host group.

Pros:

  • Smaller agent footprint across the environment. Fewer servers running the agent means fewer update cycles, fewer compatibility issues to manage, and less overall operational overhead from the agent itself.

  • Simpler initial rollout. Deploying to a defined, bounded set of servers is faster and easier to validate than a full-environment rollout.

  • Easier to justify to server owners. Asking to install an agent on a specific server because it runs Apache is a concrete, explainable request. Asking to install an agent on every server in the environment is a harder conversation.

Cons:

  • Creates a reactive collection posture. New servers or new applications deployed without notifying the SOC team will not have agents installed. Log collection gaps appear silently: you do not know what you are missing because you only know about what you deliberately targeted.

  • Operationally inconsistent. Different servers end up in different states, some with agents, some without, some with outdated agent versions because they were not included in the standard update process. Over time the environment becomes difficult to reason about.

  • FISMA and compliance risk. If a server within an authorization boundary is not in the targeted group, because the inventory was wrong or the server was added without following the process, its logs are absent from the SIEM. Demonstrating boundary-complete log collection to an auditor becomes difficult.

  • Higher long-term maintenance burden per server. Because agent deployment is not standardized, each targeted server becomes a one-off configuration that has to be tracked and maintained individually.


Option 2 — Universal Baseline Deployment: Install the Agent on All Servers via Satellite, Activate Collection Only Where Needed


In this model AMA is deployed to every managed server through a configuration management tool like Red Hat Satellite as part of the standard server build or baseline configuration. The agent is installed everywhere but DCR assignment, which is what actually activates log collection and registers the server back to Sentinel, is applied selectively only to servers where collection is required.

Pros:

  • Agent deployment becomes a solved problem. Because AMA is part of the standard build, every server that comes out of Satellite already has the agent. There is no separate process to track for agent installation and no risk of a server being missed simply because no one requested an agent deployment for it.

  • Separates deployment from activation. Having the agent installed and having it actively collecting logs are two different things in this model. You can install universally without immediately incurring collection costs, because the agent sits dormant until a DCR is assigned to it. This gives you the infrastructure to collect from any server on short notice without having to go through an agent installation process at that moment.

  • Faster incident response capability. If during a threat hunt or incident investigation you determine you need logs from a server that was not previously in scope, you can assign a DCR and begin collection immediately. Without the universal baseline you would first need to install the agent, which adds time and change management process to an already time-sensitive situation.

  • Supports future collection needs without process overhead. As new application log collection requirements emerge, activating collection on an already-agented server is a DCR assignment, a low-friction change. Without the universal baseline every new requirement triggers an agent installation process.

Cons:

  • Larger initial rollout scope. Deploying to every server in the environment via Satellite requires more planning, testing, and coordination than a targeted rollout, particularly in large or heterogeneous environments.

  • Agent overhead on every server. Even a dormant agent consumes a small amount of memory, CPU, and disk on every server. In environments with resource-constrained servers or very large fleet sizes this aggregate overhead is worth quantifying before committing to universal deployment.

  • Requires mature configuration management. This model depends on Satellite or an equivalent tool being reliable, well-maintained, and covering the full server fleet. If your Satellite coverage is incomplete the universal baseline is only as universal as your Satellite reach, which may create a false sense of completeness.

  • Organizational change management. Pushing a new agent to every server in the environment is a significant change that may require broad approval, testing across OS versions and application types, and coordination with multiple server owner teams. This is a one-time cost but should not be underestimated.

  • Cost planning needed. Even dormant agents may generate some minimal telemetry depending on configuration. At very large scale this needs to be accounted for in Log Analytics cost modeling.
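The separation between installation and activation can be modeled as simple set arithmetic over three lists: the server inventory, the hosts with the agent installed, and the hosts with a DCR associated. The sketch below is only a reasoning aid with hypothetical hostnames and a hypothetical `classify_fleet` helper. In practice the three lists would come from your inventory system, Satellite, and Azure respectively.

```python
def classify_fleet(inventory, agent_installed, dcr_associated):
    """Group inventoried hosts by agent and DCR state."""
    inventory = set(inventory)
    agent_installed = set(agent_installed)
    dcr_associated = set(dcr_associated)
    return {
        # Agent present and a DCR assigned: logs are flowing.
        "collecting": sorted(inventory & agent_installed & dcr_associated),
        # Agent present but no DCR: ready to activate on short notice.
        "dormant": sorted((inventory & agent_installed) - dcr_associated),
        # In inventory but no agent: a gap in the baseline itself.
        "no_agent": sorted(inventory - agent_installed),
    }


# Hypothetical hostnames for illustration only.
inventory = ["web01", "web02", "db01", "app01"]
agent_installed = ["web01", "web02", "db01"]
dcr_associated = ["web01", "web02"]

state = classify_fleet(inventory, agent_installed, dcr_associated)
for name, hosts in state.items():
    print(f"{name}: {hosts}")
```

Under a universal baseline the `no_agent` group should be empty, and any host that appears there indicates incomplete configuration management coverage rather than a deliberate scoping decision.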


The Practical Recommendation


For most environments operating under compliance frameworks like FISMA, the universal baseline deployment through Satellite is the stronger long-term architecture. The operational consistency, compliance defensibility, and incident response agility it provides outweigh the higher initial rollout cost. The key insight is that separating agent installation from collection activation removes the false choice between "deploy everywhere and pay for everything" and "deploy selectively and accept gaps." You get broad infrastructure readiness with targeted, cost-controlled collection: the DCR, not the agent deployment itself, becomes the precise instrument of control.




References


Azure Monitor Agent: What You Need to Know Before Deploying


Collect guest log data from virtual machines with Azure Monitor














 
 
 

©2021 by croninity. Proudly created with Wix.com
