IRP: Breach Detection, Containment, Recovery - Beyond Theory

Breach containment isn’t about stopping the attacker; it’s about stopping the bleeding before you can figure out how to stop the attacker.

Imagine you walk into your kitchen and the sink is overflowing, water everywhere. Your first thought isn’t to figure out why the sink is overflowing (clogged drain, broken pipe, etc.). It’s to turn off the faucet. That’s containment.

Here’s a simulated incident: a web server is exhibiting unusual outbound network traffic, potentially indicating a compromise and data exfiltration.

# Initial observation: High outbound bandwidth from webserver-prod-01
$ sar -n DEV 1 5 | grep eth0
12:00:01     eth0: 1234567890.1234567890  1234567890.1234567890   0.00    0.00
12:00:02     eth0: 1234567890.1234567890  1234567890.1234567890   0.00    0.00
12:00:03     eth0: 1234567890.1234567890  1234567890.1234567890   0.00    0.00
12:00:04     eth0: 1234567890.1234567890  1234567890.1234567890   0.00    0.00
12:00:05     eth0: 1234567890.1234567890  1234567890.1234567890   0.00    0.00

# Wait, that's not right. Let's try a more active monitor.
$ timeout 60 tcpdump -i eth0 -nn 'tcp and port 443' 2>/dev/null | awk '$2 == "IP" {print $3, $4, $5}' | sort | uniq -c | sort -nr
 56789 192.0.2.10.443 > 203.0.113.5.443:
 12345 192.0.2.10.443 > 198.51.100.20.443:
  5000 192.0.2.10.443 > 192.0.2.15.443:

# The server `webserver-prod-01` (IP 192.0.2.10) is sending a lot of data to external IPs.
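
Packet counts can mislead, because an empty ACK counts the same as a full data segment. The following is a minimal sketch that sums payload bytes per destination instead. It assumes the one-line IPv4 text format that `tcpdump -nn` prints (ending in `length N`); the function name `bytes_by_dest` is made up for this example.

```shell
# Sum TCP payload bytes per destination address from `tcpdump -nn` text on stdin.
# Assumed input format (one packet per line):
#   12:00:01.000000 IP 192.0.2.10.443 > 203.0.113.5.51000: Flags [P.], ..., length 1448
bytes_by_dest() {
  awk '$2 == "IP" && $4 == ">" {
    dst = $5
    sub(/:$/, "", dst)          # drop the trailing colon
    sub(/\.[0-9]+$/, "", dst)   # drop the port, keep the address
    for (i = 6; i < NF; i++)
      if ($i == "length") {
        n = $(i + 1)
        sub(/[^0-9].*$/, "", n) # tolerate trailing punctuation after the number
        bytes[dst] += n
        break
      }
  }
  END { for (d in bytes) printf "%d %s\n", bytes[d], d }' | sort -nr
}

# Usage on the suspect host (needs root):
#   timeout 60 tcpdump -i eth0 -nn 'tcp and src host 192.0.2.10' 2>/dev/null | bytes_by_dest
```

The output is one `bytes address` line per destination, largest first, which makes a bulk transfer stand out from ordinary request traffic.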

The core problem is that a compromised system can become a launchpad for further attacks, either against your other internal systems or externally, damaging your reputation and potentially incurring legal liabilities. Containment aims to sever these pathways.

Isolate the Affected System

This is the most immediate and impactful step. You need to stop the system from communicating with anything else.

Diagnosis: Confirm network connectivity.

# From another machine, try to ping the webserver
ping webserver-prod-01
# Try to SSH into the webserver
ssh webserver-prod-01

If these succeed, the system is still network-connected.
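
Many networks drop ICMP, so a failed ping alone does not prove isolation. A rough sketch of a TCP-level check from another machine follows. It relies on bash’s `/dev/tcp` redirection and the coreutils `timeout` command; the function names `port_open` and `check_isolation` are hypothetical helpers, not part of any tool.

```shell
# bash-specific: uses the /dev/tcp pseudo-device.
# Returns 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report which of the given ports on a host still accept connections.
# Returns 0 only if none of the tested ports is reachable.
check_isolation() {
  local host=$1; shift
  local reachable=0
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "STILL REACHABLE: $host:$port"
      reachable=1
    fi
  done
  [ "$reachable" -eq 0 ] && echo "no tested port reachable on $host"
  return "$reachable"
}

# Usage: check_isolation webserver-prod-01 22 80 443
```

This only tests inbound reachability from one vantage point. Whether the server can still send traffic out has to be confirmed separately, for example from firewall logs.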

Fix: This depends on your infrastructure.
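- Cloud (AWS): Security Groups contain only allow rules, so you isolate an instance by taking permissions away. Revoke the allow-all egress rule on the Security Group attached to the instance.
```
aws ec2 revoke-security-group-egress --group-id sg-xxxxxxxxxxxxxxxxx --ip-permissions '[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]'
```
  This removes the default rule that permits all outbound traffic; with no egress rule left, the instance cannot open new outbound connections. If other instances share the group, move the compromised instance to a dedicated quarantine group with no rules instead. Depending on connection tracking, flows that were already established may persist for a while, so confirm that the traffic actually stops.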
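- On-Premise (Firewall): Create an explicit deny rule at the top of the firewall policy for the server’s IP address, on the interface where the server’s traffic enters the firewall.
```
# Example Cisco ASA syntax; INSIDE_IN and the interface name "inside" are placeholders
access-list INSIDE_IN line 1 extended deny ip host 192.0.2.10 any
access-group INSIDE_IN in interface inside
```
  This inserts, as the first entry, a rule that blocks all IP traffic originating from 192.0.2.10 to any destination. If an ACL is already bound to that interface, add the deny line to that ACL and skip the access-group command, since an interface takes only one ACL per direction.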
- On-Premise (Network Switch/Router ACL): Apply an Access Control List on the switch port or router interface where the server connects.
```
# Example Juniper Junos syntax
set firewall family inet filter BLOCK_OUTBOUND term BLOCK_SERVER from source-address 192.0.2.10/32
set firewall family inet filter BLOCK_OUTBOUND term BLOCK_SERVER then discard
set firewall family inet filter BLOCK_OUTBOUND term ALLOW_REST then accept
set interfaces ge-0/0/1 unit 0 family inet filter input BLOCK_OUTBOUND
```
  This filters traffic arriving on ge-0/0/1 from the server and discards any packet with source address 192.0.2.10. The final accept term matters: Junos filters end with an implicit discard, so without it every other host behind that interface would be cut off too.
Why it works: This cuts the server off at the network layer, so it can no longer send data out of your network, which stops further exfiltration and limits lateral movement.

Identify and Block Malicious Destinations

Even if you can’t isolate the server immediately, you can stop it from talking to the specific bad guys.

Diagnosis: Use the tcpdump output from earlier or firewall logs to identify the IPs the server is communicating with. Look for unusual, high-volume connections to external, non-standard IPs.
```
# Re-run tcpdump to capture more recent traffic if needed
$ timeout 60 tcpdump -i eth0 -nn 'tcp and port 443' 2>/dev/null | awk '$2 == "IP" {print $3, $4, $5}' | sort | uniq -c | sort -nr
```
In our example, 203.0.113.5 and 198.51.100.20 are suspicious.
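
To separate internal peers from external ones quickly, a small filter over a `count address` summary can help. The sketch below treats RFC 1918, loopback, and link-local IPv4 prefixes as internal and everything else as external. It is plain pattern matching with no address validation, and `classify_ip` and `external_only` are illustrative names. Your own public ranges would need to be added to the internal patterns.

```shell
# Print "internal" for RFC 1918, loopback, and link-local IPv4 addresses,
# "external" for everything else. Pattern matching only; no validation.
classify_ip() {
  case "$1" in
    10.*|192.168.*|127.*|169.254.*) echo internal ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo internal ;;
    *) echo external ;;
  esac
}

# Read "count address" pairs on stdin and keep only external destinations.
external_only() {
  while read -r count addr; do
    [ "$(classify_ip "$addr")" = external ] && echo "$count $addr"
  done
  return 0
}

# Usage: printf '%s\n' '500 10.0.0.7' '900 203.0.113.5' | external_only
```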

Fix: Add firewall rules to block outbound connections to these IPs.

Cloud (AWS): Security Groups cannot express deny rules, so add deny entries to the network ACL of the server’s subnet instead. The rule numbers must be lower than those of the ACL’s allow entries, and the entries affect every instance in that subnet.

aws ec2 create-network-acl-entry --network-acl-id acl-xxxxxxxxxxxxxxxxx --egress --rule-number 50 --protocol=-1 --rule-action deny --cidr-block 203.0.113.5/32
aws ec2 create-network-acl-entry --network-acl-id acl-xxxxxxxxxxxxxxxxx --egress --rule-number 51 --protocol=-1 --rule-action deny --cidr-block 198.51.100.20/32

On-Premise (Firewall):

# Example Palo Alto Networks PAN-OS syntax (configure mode); replace the zone placeholders
set rulebase security rules BLOCK_EXTERNAL_IPS from <your-internal-zone> to <your-external-zone>
set rulebase security rules BLOCK_EXTERNAL_IPS source 192.0.2.10
set rulebase security rules BLOCK_EXTERNAL_IPS destination [ 203.0.113.5 198.51.100.20 ]
set rulebase security rules BLOCK_EXTERNAL_IPS application any service any
set rulebase security rules BLOCK_EXTERNAL_IPS action deny
set rulebase security rules BLOCK_EXTERNAL_IPS description "Block known bad IPs outbound"
move rulebase security rules BLOCK_EXTERNAL_IPS top
commit

This creates a rule that explicitly denies traffic from the web server (192.0.2.10) to the specified malicious IPs. The rule has to sit above any rule that would allow the same traffic, and it takes effect only after the commit.

Why it works: This prevents the compromised server from communicating with known command-and-control servers or data drop points, even if it’s still running.

Disable Compromised User Accounts or Services

If the compromise is tied to specific credentials or a running process, disabling them can stop the attack.

Diagnosis: Examine process lists and user logins on the server.
```
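# Look for suspicious processes; root is included because attackers often escalate privileges
$ ps -eo pid,ppid,user,lstart,args | grep -Ew 'bash|sh|nc|ncat|wget|curl' | grep -v grep
# Check recent logins, showing source IP addresses
$ last -ai | head -n 20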
```
If you see processes like nc (netcat) or wget running with unusual arguments, or logins from unexpected users/IPs, this is a strong indicator.
Fix:
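- User Account: Lock the account and end its existing sessions.
```
# Linux
# Lock the password (SSH key logins still work after this step alone)
sudo usermod -L compromised_user
# Expire the account so that every login method is refused
sudo chage -E 0 compromised_user
# Kill processes the user is already running (capture memory first if you need it as evidence)
sudo pkill -KILL -u compromised_user
```
  Locking and expiring the account block new logins as compromised_user. They do not touch sessions or processes that already exist, which is why the pkill step is needed.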
- Service: Stop and disable the service.
```
# If it's a systemd service
sudo systemctl stop suspicious.service
sudo systemctl disable suspicious.service
```
  This stops the malicious process from executing and prevents it from starting on reboot.
Why it works: This removes the specific identity or mechanism the attacker is using to operate on the system, effectively cutting off their access through that vector.

Take a Forensic Snapshot

Before you wipe or rebuild, you need evidence.

Diagnosis: This isn’t a diagnostic step, but a preparatory one. You’ve identified a compromise and are moving to containment.
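Fix: Create a memory dump and a disk image of the affected system. Capture memory first when you can, because it is lost at power-off and changes with every action you take on the host.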
- Disk Imaging (e.g., using dd or specialized forensic tools):
```
# From a separate forensic workstation or a live CD
# WARNING: dd overwrites whatever of= points to. Double-check that if= is the source disk and of= is the image file.
sudo dd if=/dev/sda of=/mnt/forensic_storage/webserver-prod-01.img bs=4M status=progress
```
  This creates a bit-for-bit copy of the server’s primary disk (/dev/sda) to a safe, external storage location (/mnt/forensic_storage/webserver-prod-01.img).
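- Memory Dump (e.g., using LiME to acquire the dump; Volatility is a tool for analyzing it afterwards):
```
# Using LiME (Linux Memory Extractor)
sudo insmod /path/to/lime.ko "path=/mnt/forensic_storage/webserver-prod-01.mem format=lime"
```
  Loading this kernel module dumps the system’s RAM contents into a file, capturing volatile data like running processes and network connections that are lost on reboot. The module has to be built for the exact kernel version the server is running, and the output path should point at external storage so the dump does not overwrite evidence on the local disk.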
Why it works: This preserves the state of the compromised system for later analysis, allowing investigators to understand the attack vector, scope, and impact without altering the original evidence.
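
Evidence is only useful if you can later show it has not changed. One common approach, sketched here with the coreutils `sha256sum` command, is to record a digest next to each image right after acquisition and re-check it before analysis. The helper names `seal_evidence` and `verify_evidence` are invented for this example.

```shell
# Record a SHA-256 digest next to each evidence file (file.img -> file.img.sha256).
seal_evidence() {
  for f in "$@"; do
    ( cd "$(dirname "$f")" && sha256sum "$(basename "$f")" > "$(basename "$f").sha256" ) || return 1
  done
}

# Re-compute the digests and compare; returns non-zero on the first mismatch.
verify_evidence() {
  for f in "$@"; do
    ( cd "$(dirname "$f")" && sha256sum --check --quiet "$(basename "$f").sha256" ) || return 1
  done
}

# Usage:
#   seal_evidence /mnt/forensic_storage/webserver-prod-01.img /mnt/forensic_storage/webserver-prod-01.mem
#   verify_evidence /mnt/forensic_storage/webserver-prod-01.img
```

Storing a copy of the digest files somewhere other than the evidence volume makes the check harder to defeat.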

Revert to a Known Good State

Once contained and potentially imaged, the fastest way to ensure a clean system is often to replace it.

Diagnosis: You’ve performed containment, taken snapshots, and are ready to remediate.
Fix:
- Rebuild: Provision a new server from scratch using your standard build pipeline (e.g., Terraform, Ansible, Chef).
- Restore: Restore from a known good backup taken before the compromise.
```
# Example of restoring a database from a backup
psql -U myuser -d mydatabase < /path/to/good_backup.sql
```
- Patch/Scan: If the compromise was minor and you have a strong understanding of the vulnerability, you might patch the existing system, but rebuilding is generally safer.
Why it works: This ensures that no residual malware or backdoors remain on the system, providing the highest confidence in its integrity.
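
Picking a backup "taken before the compromise" is easy to get wrong under pressure. As a rough sketch, the function below prints the newest file in a backup directory whose modification time is strictly earlier than a cutoff. It assumes GNU `stat`, one backup per file, and trustworthy modification times; `latest_backup_before` and the paths in the usage line are hypothetical. The cutoff should be the earliest sign of attacker activity, which is often well before the moment of detection.

```shell
# Print the newest regular file in a directory whose mtime is strictly before
# the cutoff (epoch seconds). Returns 1 if no file qualifies.
latest_backup_before() {
  local dir=$1 cutoff=$2
  local best="" best_mtime=0 f mtime
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    mtime=$(stat -c %Y "$f")   # GNU stat: mtime as epoch seconds
    if [ "$mtime" -lt "$cutoff" ] && [ "$mtime" -gt "$best_mtime" ]; then
      best=$f
      best_mtime=$mtime
    fi
  done
  [ -n "$best" ] || return 1
  echo "$best"
}

# Usage: latest_backup_before /path/to/backups "$(date -d '2024-01-01 00:00' +%s)"
```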

The next hurdle you’ll likely face after containment is understanding how the breach occurred, which involves deeper forensic analysis.
