Incident Response (IR) Overview
I view IR as a set of standardized blocks that you fit together to handle each incident uniquely. Examples of standardized IR blocks include live memory analysis with different tools (e.g., Redline, Volatility), autoruns analysis, Sysmon log analysis, registry analysis, memory acquisition, disk acquisition, disk analysis, and network forensics.
Common high-level themes across all IR include:
The incident needs proper scoping in order to put together the best possible plan.
Certain IR techniques only work within a limited time window (e.g., they cannot be performed, or no longer produce relevant data, once the system is in a certain state, such as live memory analysis after a reboot).
Management will want answers about the incident “yesterday.”
Incident Response (IR) Frameworks
There are two popular industry standards for IR phases:
NIST
1. Preparation
2. Detection and Analysis
3. Containment, Eradication, and Recovery
4. Post-Incident Activity
SANS (PICERL)
1. Preparation
2. Identification
3. Containment
4. Eradication
5. Recovery
6. Lessons Learned
Planning
Planning involves two key components:
Security Detection Controls Coverage
Understand your cyber instrumentation and telemetry, and leverage all available telemetry for IR. What is available?
Endpoint data (e.g., event logs, registry, user execution, etc.)
Network data (e.g., Zeek, proxies, VPNs, firewall logs)
EDR data
Incident Response Procedures (outlined in this document)
Security Detection Controls Coverage
Cybersecurity systems (sensors & instrumentation) are deployed throughout the organization to detect possible security incidents.
Planning of what cybersecurity systems to deploy is a risk-based decision.
‘Lessons Learned’ from incidents feed back into planning for security controls.
KPI metrics:
% of security control coverage against MITRE ATT&CK techniques used by threat actors targeting your organization
% of security control coverage against all MITRE ATT&CK techniques
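Both coverage KPIs reduce to the same set arithmetic. The sketch below assumes you already maintain two lists of ATT&CK technique IDs: the techniques relevant to the threats facing your organization and the techniques your deployed controls can detect. Both sets here are hypothetical values; for the second KPI, substitute the full ATT&CK technique list as the denominator.

```python
# Hypothetical inputs: replace with your threat-informed technique list and the
# techniques your deployed controls have been validated to detect.
THREAT_TECHNIQUES = {"T1566", "T1078", "T1557", "T1114.003", "T1486"}
DETECTED_TECHNIQUES = {"T1566", "T1078", "T1114.003", "T1059.001"}


def coverage_pct(detected: set[str], techniques: set[str]) -> float:
    """Percentage of `techniques` that at least one control detects."""
    if not techniques:
        return 0.0
    return 100.0 * len(detected & techniques) / len(techniques)


if __name__ == "__main__":
    # 3 of the 5 threat techniques are covered
    print(f"Threat-informed coverage: {coverage_pct(DETECTED_TECHNIQUES, THREAT_TECHNIQUES):.1f}%")
```

Counting a technique as covered only once a control has actually been tested against it keeps this KPI meaningful.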
Identification
Incidents are primarily identified through alert telemetry from SIEM and EDR platforms integrated into SOAR. Because alert telemetry varies, triaged alerts fall into three high-level types: 1. false positives; 2. true positives that the system blocked, so there is no compromise; and 3. true positives that the system did not block, resulting in a successful user compromise, malware infection, or malicious insider activity.
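The three alert types come down to two questions: was the activity malicious, and did a control block it? A minimal sketch of that triage decision follows; the enum and function names are illustrative, not part of any SOAR product.

```python
from enum import Enum


class AlertType(Enum):
    FALSE_POSITIVE = 1
    TRUE_POSITIVE_BLOCKED = 2      # malicious, but blocked: no compromise
    TRUE_POSITIVE_NOT_BLOCKED = 3  # malicious and not blocked: treat as an incident


def triage(is_malicious: bool, was_blocked: bool) -> AlertType:
    """Classify an alert into one of the three high-level types."""
    if not is_malicious:
        return AlertType.FALSE_POSITIVE
    if was_blocked:
        return AlertType.TRUE_POSITIVE_BLOCKED
    return AlertType.TRUE_POSITIVE_NOT_BLOCKED


if __name__ == "__main__":
    print(triage(is_malicious=True, was_blocked=False).name)  # TRUE_POSITIVE_NOT_BLOCKED
```

Only the third type moves on to Containment; the first type is where alert tuning opportunities come from.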
Identification - Determining impacted users and systems
Identification - Determining what the impact is
Identification - Determining the scope of the impact
Left of compromise/malware/insider threat (e.g., where and when it started, such as a phishing email)
Right of compromise/malware/insider threat (e.g., what the Threat Actor (TA) did during the compromise, such as persisting access, loading apps, downloading data, viewing data, or sending additional phishing emails).
This process is iterative where additional IOCs may expand the scope of the investigation.
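The iterative scoping loop is essentially a breadth-first search: every new IOC is pivoted on until no new indicators turn up. In this sketch, the `RELATED` table is a hypothetical stand-in for the SIEM/EDR queries you would actually run for each pivot (e.g., which hosts contacted this domain, which hosts have this hash).

```python
from collections import deque

# Hypothetical pivot results. In practice each lookup is a SIEM/EDR query,
# e.g., "which hosts resolved this domain?" or "where else is this hash?"
RELATED = {
    "phish@evil.example": ["evil.example"],
    "evil.example": ["203.0.113.10", "HOST-A"],
    "HOST-A": ["a1b2c3d4-hash"],
    "a1b2c3d4-hash": ["HOST-A", "HOST-B"],
}


def expand_scope(seed_iocs: list[str]) -> set[str]:
    """Breadth-first expansion: pivot on every new IOC until none are left."""
    scope = set(seed_iocs)
    queue = deque(seed_iocs)
    while queue:
        current = queue.popleft()
        for related in RELATED.get(current, []):
            if related not in scope:  # only new IOCs expand the investigation
                scope.add(related)
                queue.append(related)
    return scope


if __name__ == "__main__":
    print(sorted(expand_scope(["phish@evil.example"])))
```

The `scope` set doubles as the "already investigated" list, which keeps the loop from revisiting indicators that point back at each other.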
KPI Metric - Mean time to Address (MttA)
Other Key Identification Concepts
Start with the high-value targets that attackers frequently go after (e.g., Domain Controllers, file servers, Exchange servers).
After triaging the high-value targets, work on root cause analysis and lateral movement.
Divide the work into two workstreams: one works forward from a point in time, and the other works backward from it.
Look for lateral movement outside of normal business hours, and look for quick wins (e.g., service installs, malicious PowerShell activity, scheduled task creation).
Prioritize investigating systems that the attacker is known to have touched but where no evidence of the attacker was immediately found.
Look for threat actors (TAs) covering their tracks and using obfuscation:
Deleting all their tools and moving on to different systems
Altering communications
Using encryption for communications
Implementing multiple backdoors
Implementing one-off backdoors (e.g., 9 systems had the same backdoors, but the 10th system had different backdoors)
Never stop looking, and
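The off-hours and quick-win hunting ideas above can be sketched as a filter over exported Windows event records. The Event IDs are standard Windows ones (7045: service installed, 4698: scheduled task created, 4104: PowerShell script block logged, 4624: successful logon, where logon type 3 is network and 10 is RemoteInteractive/RDP). The record layout and the 08:00-18:00, Monday-Friday business hours are assumptions. Each hit is a lead to review, not a verdict.

```python
from datetime import datetime

# Standard Windows Event IDs that make good "quick wins"
QUICK_WIN_EVENTS = {
    7045: "service installed",
    4698: "scheduled task created",
    4104: "PowerShell script block logged",
}
LATERAL_LOGON_TYPES = {3, 10}  # Event 4624 logon types: network, RemoteInteractive (RDP)


def is_off_hours(ts: datetime, start_hour: int = 8, end_hour: int = 18) -> bool:
    """Assumed business hours: Monday-Friday, start_hour <= hour < end_hour."""
    return ts.weekday() >= 5 or not (start_hour <= ts.hour < end_hour)


def off_hours_leads(events: list[dict]) -> list[tuple[str, str]]:
    """Return (host, reason) pairs for quick-win events outside business hours."""
    leads = []
    for event in events:
        if not is_off_hours(event["time"]):
            continue
        event_id = event["event_id"]
        if event_id in QUICK_WIN_EVENTS:
            leads.append((event["host"], QUICK_WIN_EVENTS[event_id]))
        elif event_id == 4624 and event.get("logon_type") in LATERAL_LOGON_TYPES:
            leads.append((event["host"], f"type {event['logon_type']} logon"))
    return leads


# Hypothetical exported records
EVENTS = [
    {"host": "FS01", "event_id": 7045, "time": datetime(2024, 3, 9, 2, 14)},   # Saturday
    {"host": "DC01", "event_id": 4624, "logon_type": 10, "time": datetime(2024, 3, 6, 23, 5)},
    {"host": "WS17", "event_id": 4698, "time": datetime(2024, 3, 6, 10, 30)},  # business hours
]

if __name__ == "__main__":
    for host, reason in off_hours_leads(EVENTS):
        print(host, reason)
```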
Containment
The most critical activity related to a confirmed incident is containing the incident. This phase also involves determining the scope of the incident so other possible user compromises related to the initial user compromise can be contained.
KPI Metric - Mean time to Contain (MttC)
Eradication
Eradication deals with completely removing the TA from the environment. This can include:
Deleting (remediating) the initial phishing emails from the system
Removing TA-added MFA methods
Removing TA-created mailbox rules
Removing TA-added apps
Removing malware from the system
KPI Metric - Mean time to Resolve (MttR)
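The three time-based KPIs (MttA, MttC, MttR) can be computed from per-incident timestamps. This sketch assumes each KPI is measured from detection time to acknowledgment, containment, and resolution respectively. Organizations define these start and end points differently, so adjust the field names (all hypothetical here) to match your ticketing or SOAR data.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records exported from a ticketing/SOAR system
INCIDENTS = [
    {"detected": datetime(2024, 5, 1, 9, 0), "acknowledged": datetime(2024, 5, 1, 9, 20),
     "contained": datetime(2024, 5, 1, 11, 0), "resolved": datetime(2024, 5, 2, 9, 0)},
    {"detected": datetime(2024, 5, 3, 14, 0), "acknowledged": datetime(2024, 5, 3, 14, 40),
     "contained": datetime(2024, 5, 3, 15, 0), "resolved": datetime(2024, 5, 3, 21, 0)},
]


def mean_minutes(records: list[dict], start: str, end: str) -> float:
    """Mean elapsed minutes between two timestamp fields across records."""
    return mean((r[end] - r[start]).total_seconds() / 60 for r in records)


if __name__ == "__main__":
    print("MttA (min):", mean_minutes(INCIDENTS, "detected", "acknowledged"))  # 30.0
    print("MttC (min):", mean_minutes(INCIDENTS, "detected", "contained"))     # 90.0
    print("MttR (min):", mean_minutes(INCIDENTS, "detected", "resolved"))      # 930.0
```

A single long-running incident can skew a mean considerably; reporting the median (`statistics.median`) alongside it gives a steadier picture.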
Recovery
Recovery can be simple or complex depending upon the scope of the compromise or malware infection.
Simple recovery can include:
The user successfully resets their password to a new, secure password that the TA does not know (i.e., the user regains access to systems for work).
The machine is removed from isolation (or wiped) and the user regains access to systems for work.
Complex recovery can include:
Verifying that multiple servers are free of malware and are back online and functional.
Implementing additional security controls to permanently block the TA from accessing the systems.
Lessons Learned
A post incident analysis should be conducted to understand lessons learned and facilitate an improved response to future events.
"Lessons Learned process:
Preparation: How could we have avoided the incident altogether? This includes changes to your network architecture, system configuration, user training, or even policy. What policies or tools could have improved the entire process?
Identification: What telemetry sources (IDS, net flow, DNS, etc.) could have made it easier or faster to identify this attack? What signatures or threat intelligence could have helped?
Containment: Which containment measures were effective? Which were not? Could other containment measures have been useful if they’d been more easily deployable?
Eradication: Which eradication steps went well? What could have gone better?
Recovery: What slowed the recovery? (Hint: focus on communication, as that’s one of the toughest parts of recovery to do well.) What did the response to recovery tell us about the adversary?"
Roberts, Scott J; Brown, Rebekah. Intelligence-Driven Incident Response: Outwitting the Adversary (Kindle Locations 889-892). O'Reilly Media. Kindle Edition.
High Level Incident Categories
The type and scope of the incident dictate the IR techniques, tools, and processes to use. At the base level, incidents fall into a few main categories (U.M.I.):
User Compromise
Malware Infected Systems
Insider Threat
Other important base level response procedures related to these main categories are:
Phishing
Data Exfiltration
Ransomware
These are separated because ‘phishing’ in itself is not a cybersecurity incident. Phishing is a mechanism to compromise user accounts or infect systems with malware. Data exfiltration and ransomware are the most common follow-on Threat Actor (TA) actions after a user compromise or malware infection. Additionally, data exfiltration is a common action by malicious insiders.
User Credential Compromise - Identification
High Level User Compromise alert response procedures
1. Quickly triage the alerts for additional suspicious user activity. The more alerts and suspicious activities, the higher the likelihood of account compromise. Do not hold off on contacting the user until the investigation is complete; a quick discussion with the user can often determine whether the account was compromised.
Example: Strange logon locations, tokens, hours, etc. - Entra ID Sign-in Logs, Defender Alerts, Defender Incidents
Example: Strange MFA registration activity - Azure Identity Protection, Defender Alerts, Defender Incidents
Example: Strange user activity (e.g., browsing SharePoint)
Example: Strange mailbox activity and rules - Exchange Admin Center - Mailboxes, Defender Email & Collaboration > Explorer
Example: AiTM-related alert (consider the user compromised)
2. Contact the user to verify their activities. Note: you cannot rely solely on email for contact. If the TA has access to the compromised user's mailbox, they can simply reply, “yes, that was me.”
You want to quickly determine whether the user's own activity triggered the alert. The most common example is a user having logon issues; another common example is a user who is traveling.
Do not be vague in your communications. Be precise, e.g., “…we saw suspicious logon activity from X location…” or “…we saw a new MFA registration on your account…”
If the user cannot confirm the activity, assume their account was compromised. There are two ways a user may fail to confirm the activity:
User confirms they did not perform the suspicious activity.
Ask the user whether they recently had to enter their credentials into a system, specifically from an email (this will help speed up the investigation in finding the initial attack vector).
User does not respond
3. Compromise determination:
Is the user compromised? There are three main ways to confirm that the user is compromised. When the user is confirmed compromised, move quickly to Containment.
Overwhelming evidence of compromise (e.g., AiTM alert, multiple alerts)
Strong evidence of compromise and the user cannot be reached to confirm
User confirms that they were not the source of the activity
User not compromised:
Close the alert.
Look for possible alert tuning opportunities.
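The determination logic in steps 2 and 3 can be written as a small decision function. This is a sketch: the threshold for "multiple alerts" is an assumption, and `None` stands for a user who has not responded. Overwhelming evidence is checked first so that an AiTM alert marks the user compromised regardless of what the user says.

```python
from typing import Optional

MULTIPLE_ALERTS = 2  # assumed threshold for "multiple alerts"


def determine_compromise(aitm_alert: bool, alert_count: int,
                         user_says_it_was_them: Optional[bool]) -> tuple[bool, str]:
    """Return (compromised, reason).

    user_says_it_was_them: True if the user confirms the activity, False if they
    deny it, None if they cannot be reached.
    """
    if aitm_alert or alert_count >= MULTIPLE_ALERTS:
        return True, "overwhelming evidence"
    if user_says_it_was_them is None:
        return True, "evidence of compromise and user cannot be reached"
    if not user_says_it_was_them:
        return True, "user denies the activity"
    return False, "user confirms the activity: close alert and review tuning"


if __name__ == "__main__":
    print(determine_compromise(aitm_alert=False, alert_count=1, user_says_it_was_them=None))
```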
User Credential Compromise - Containment
It is of paramount importance to reset the user's credentials and revoke all tokens.
Reset credentials and revoke sessions on the compromised account.
Verify the threat actor is no longer logging into the account (i.e., you only see logons attributable to the legitimate user's current location).
Determine initial attack vector
Defender > Email & Collaboration > Explorer - Once identified - move to Eradication to remove phishing email from environment
Sites such as Virus T
Determine attack scope (full attack story).
What did the TA do? Did they send phishing emails from the compromised account? Did they view files?
Check the user's mailbox for newly created or suspicious forwarding rules - Exchange Admin Center - Mailboxes
It may be necessary to quarantine affected devices to prevent lateral movement - Intune admin center > Devices
Add suspected malicious addresses or domains to Threat Policies > Tenant Block List to prevent incoming communication from those sources.
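Reviewing forwarding rules can be partly automated once the user's rules are exported. The field names below are modeled on Exchange Online inbox rule properties as returned by `Get-InboxRule` (`ForwardTo`, `ForwardAsAttachmentTo`, `RedirectTo`, `DeleteMessage`); treat the exact export format as an assumption, since real exports wrap addresses in display strings. `ORG_DOMAINS` is a hypothetical accepted-domain list.

```python
ORG_DOMAINS = {"contoso.com"}  # hypothetical accepted domains


def is_external(address: str) -> bool:
    """True if the address's domain is not one of the organization's domains."""
    return address.rsplit("@", 1)[-1].lower() not in ORG_DOMAINS


def suspicious_rules(rules: list[dict]) -> list[tuple[str, list[str]]]:
    """Flag rules that forward/redirect mail outside the org or delete mail."""
    findings = []
    for rule in rules:
        targets = (rule.get("ForwardTo", []) + rule.get("ForwardAsAttachmentTo", [])
                   + rule.get("RedirectTo", []))
        reasons = [f"sends mail to external address {t}" for t in targets if is_external(t)]
        if rule.get("DeleteMessage"):
            reasons.append("deletes messages")
        if reasons:
            findings.append((rule["Name"], reasons))
    return findings


# Hypothetical export of one user's inbox rules
RULES = [
    {"Name": "..", "RedirectTo": ["drop@evil.example"], "DeleteMessage": True},
    {"Name": "Manager copy", "ForwardTo": ["boss@contoso.com"]},
]

if __name__ == "__main__":
    for name, reasons in suspicious_rules(RULES):
        print(name, reasons)
```

Internal forwarding is not flagged here, but a newly created internal rule can still be TA activity, so the rule creation time is worth checking as well.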
User Credential Compromise - Eradication
After finding the initial phishing email, remediate it from the system [ZAP].
Use Defender > Email & Collaboration > Explorer > Select Email > Take Action > Submit Remediation to Action Center
NOTE - Capture the email for future reference and analysis by pulling the email header via Defender > Email & Collaboration > Explorer > Select Email > Open in New Window > Copy Header and Save
Remove TA-added MFA methods if applicable.
Remove TA-created mailbox rules if applicable.
Remove TA-added apps if applicable.
Check whether browser-saved passwords were downloaded. If so, advise the user to reset any other credentials saved in the browser.
Notify the third party that sent the initial phishing email that they have a compromised user who was the source of a phishing campaign.
Notify the ISPs that hosted malicious sites used as part of the compromise.
Add suspected malicious addresses or domains to Threat Policies > Tenant Block List to prevent incoming communication from those sources.
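Once the header is saved, Python's standard `email` package can pull out the fields most useful for the notification and blocking steps above: sender, return path, authentication results, and the `Received` chain with the relaying IPs. The header below is fabricated sample data using reserved example domains and a documentation IP range.

```python
from email.parser import HeaderParser

# Fabricated sample header (reserved example domains, documentation IP)
RAW_HEADER = """\
Received: from mail.evil.example (mail.evil.example [203.0.113.25])
 by mx.contoso.com; Tue, 7 May 2024 10:02:11 +0000
Authentication-Results: spf=fail smtp.mailfrom=evil.example; dkim=none
From: "IT Support" <support@contoso-helpdesk.example>
Return-Path: <bounce@evil.example>
Subject: Password expires today
Message-ID: <abc123@evil.example>

"""


def summarize_header(raw: str) -> dict:
    """Extract the header fields most useful for notification and blocking."""
    msg = HeaderParser().parsestr(raw)
    return {
        "from": msg["From"],
        "return_path": msg["Return-Path"],
        "auth": msg["Authentication-Results"],
        "received": msg.get_all("Received", []),  # top entry = last hop
        "message_id": msg["Message-ID"],
    }


if __name__ == "__main__":
    for field, value in summarize_header(RAW_HEADER).items():
        print(f"{field}: {value}")
```

Keeping the original header text alongside this summary preserves the full `Received` chain for later analysis.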
User Credential Compromise - Recovery
Verify that there is no TA logon activity using the compromised credentials.
Verify all remnants of the TA's activity (MFA methods, mailbox rules, apps, etc.) are removed.
Verify that all initial-vector phishing emails are deleted, as well as any phishing emails sent during the compromise. If phishing emails were sent to a third party, notify the third party that they may have received phishing emails from a compromised account within your organization.
Reset additional credentials that may have been compromised (credentials saved in browsers or viewed on SharePoint).
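Verifying that the TA is no longer signing in amounts to filtering exported sign-in records (for example, from Entra ID sign-in logs) down to those after the credential reset that come from outside the user's expected locations. The record fields and country values below are hypothetical. Failed attempts are kept on purpose, because they may indicate the TA is still trying the old credentials.

```python
from datetime import datetime, timezone


def post_reset_anomalies(signins: list[dict], reset_time: datetime,
                         expected_countries: set[str]) -> list[dict]:
    """Sign-ins (successful or failed) after the reset from unexpected countries."""
    return [s for s in signins
            if s["time"] > reset_time and s["country"] not in expected_countries]


RESET = datetime(2024, 5, 7, 12, 0, tzinfo=timezone.utc)
SIGNINS = [  # hypothetical export
    {"time": datetime(2024, 5, 7, 8, 0, tzinfo=timezone.utc), "country": "DE", "status": "success"},
    {"time": datetime(2024, 5, 7, 13, 0, tzinfo=timezone.utc), "country": "US", "status": "success"},
    {"time": datetime(2024, 5, 7, 14, 30, tzinfo=timezone.utc), "country": "DE", "status": "failure"},
]

if __name__ == "__main__":
    for s in post_reset_anomalies(SIGNINS, RESET, {"US"}):
        print(s["time"].isoformat(), s["country"], s["status"])
```

An empty result over a reasonable window supports, but does not prove, that the TA has lost access.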
High Level User Malware alert response procedure
Contact the affected users to verify their activities. In many cases, the system alerts on suspicious files that turn out to be unique or custom software tools.
If the user admits to getting infected through web surfing or clicking a phishing link, or says they have no idea what the file is, move to Containment.
If the EDR did not automatically contain the malware, move to Containment.
If the user says they are using custom software, investigate the software further to determine why the system identified it as malware.
If analysis still indicates strong evidence that the vendor software is malware, the user may need to contact the vendor to find out why their software is being detected as malware.
Malware - Containment
There are two separate paths depending upon whether the EDR automatically stopped the malware.
If the EDR automatically stopped the malware:
Remove the malware files
Determine how the malware got onto the system (email, download, etc)
If the EDR did not contain the malware:
Contain infected devices to isolate them from the network and prevent the spread of malware.
Defender > Assets > Devices > Select Device > Contain Device [three dots in upper right corner]
Work with IT on next steps to remove the malware.
Add an IOC for the malware hash in the EDR.
Threat hunt for the malware hash in the EDR and Elastic.
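Fleet-wide hash hunting happens in the EDR and Elastic themselves, but for a collected triage package or a mounted disk image, a local sweep for the IOC hash takes only a few lines of standard-library Python. This sketch assumes SHA-256 IOCs; the demo directory and sample bytes are made up.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hunt(root: Path, ioc_hashes: set[str]) -> list[Path]:
    """Return files under `root` whose SHA-256 matches an IOC (case-insensitive)."""
    wanted = {h.lower() for h in ioc_hashes}
    return sorted(p for p in root.rglob("*") if p.is_file() and sha256_file(p) in wanted)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        sample = root / "sub" / "dropper.bin"
        sample.write_bytes(b"stand-in for a malware sample")
        (root / "notes.txt").write_text("benign")
        ioc = {hashlib.sha256(b"stand-in for a malware sample").hexdigest().upper()}
        print(hunt(root, ioc) == [sample])  # True
```

Normalizing hashes to lowercase avoids missed matches when IOC feeds publish uppercase hex.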