# Firewall Management Lab — Technical Writeup

*Audience: SOC leads, network engineers, IT directors reviewing this project as part of a hiring process.*

---

## 1. Goals

This lab is meant to prove one thing: that I can operate a firewall the way a production SOC does — with documented policy, a change management gate in front of every rule modification, and the ability to investigate a blocked-traffic alert end to end.

It is deliberately **not** a red-team or attack lab. The value here is the defensive process, not exploit development. The firewall is not connected to anything I care about compromising; the traffic is simulated or between virtual machines. The point is the workflow and the documentation habit.

Specific things this lab is designed to practice:

- Writing and maintaining pf firewall rules in a way that is auditable (labeled, commented, CR-referenced)
- Using Suricata for IDS alerting without tuning it into silence — knowing which alerts matter
- Following a change management process even when working alone, because the habit is the skill
- Investigating a blocked or alerted flow from alert through log correlation through root-cause determination
- Writing incident reports that a real team could act on

What it doesn't claim: production experience, enterprise scale, or a hardened threat model. The lab runs on repurposed hardware with simulated traffic. The discipline is real; the blast radius is not.

---

## 2. Topology

```
                    [WAN — 203.0.113.0/30]
                           |
                           | em0
                    [pfSense Firewall]
                   /                  \
               em1                   em2 / em3 / em4
              /                             |
    [DMZ — 172.16.0.0/24]         [Core Switch (L3)]
     172.16.0.10  reverse proxy      /      |       \
     172.16.0.20  DNS resolver      /       |        \
                                   /        |         \
                       [VLAN 10]  [VLAN 20]  [VLAN 30]
                    192.168.10.0  192.168.20.0  10.0.30.0
                       /24           /24            /24
                     Users         Servers        Mgmt
```

### Addressing plan

| Segment | Interface | Subnet | Gateway | Purpose |
|---------|-----------|--------|---------|---------|
| WAN | em0 | 203.0.113.0/30 | 203.0.113.1 | Upstream ISP (RFC 5737 doc range in lab) |
| DMZ | em1 | 172.16.0.0/24 | 172.16.0.1 | Perimeter services |
| VLAN 10 | em2 | 192.168.10.0/24 | 192.168.10.1 | Workstations |
| VLAN 20 | em3 | 192.168.20.0/24 | 192.168.20.1 | Internal servers |
| VLAN 30 | em4 | 10.0.30.0/24 | 10.0.30.1 | Out-of-band management |

### Hosts

| Name | IP | VLAN | Role |
|------|----|------|------|
| proxy | 172.16.0.10 | DMZ | NGINX reverse proxy |
| dns | 172.16.0.20 | DMZ | Unbound recursive resolver |
| ws01 | 192.168.10.10 | 10 | User workstation |
| ws02 | 192.168.10.11 | 10 | User workstation |
| web | 192.168.20.10 | 20 | Web application server |
| files | 192.168.20.20 | 20 | SMB file server |
| syslog | 192.168.20.30 | 20 | syslog-ng log collector / backup server |
| mgmt | 10.0.30.5 | 30 | Admin workstation |
| fw | 10.0.30.1 | 30 | pfSense management IP |

### Trust model

Three zones, ordered by trust level:

**Untrusted (WAN):** No unsolicited inbound traffic is permitted. Anti-spoofing rules block RFC 1918 and other bogon source addresses arriving on the WAN interface. Outbound NAT masquerades all internal subnets behind the WAN IP.

**DMZ (semi-trusted):** The DMZ can be opened to the WAN for services that need it; currently none are, because no public services are exposed. DMZ hosts may initiate outbound HTTP/HTTPS for updates and may send syslog to the log collector (192.168.20.30). That syslog flow is the one permitted path from the DMZ into an internal segment; all other DMZ-to-LAN traffic is denied.

**Internal LAN (trusted, segmented):** Three VLANs with explicit deny between them. Users can reach the internet and the file server; any other user-to-server flow is denied unless a specific rule permits it. SSH administration is permitted to every segment, but only from the management VLAN. No other segment can initiate connections into the management VLAN; only the firewall itself can reach it.

---

## 3. Default policy stance

Every interface has `block in log all` as its last rule. There is no blanket "allow outbound" rule on the inside interfaces. pf's state tracking passes return traffic for connections that a rule has already permitted, but every new connection, from any direction, requires an explicit permit.

**Why explicit-deny and not a stateful-allow-out?**

Many small firewalls are configured with an implicit "allow outbound, deny inbound" policy. That stops drive-by attacks from the internet but does nothing if a host inside the network is compromised or misconfigured. The explicit-deny model means that *any* new flow — including user-to-server, server-to-internet, or DMZ-to-internal — requires a written rule. That rule has a change-request ID in its label. This creates an audit trail and forces a conscious decision for every access path.

**What "change-managed" means operationally:**

Every time I add, modify, or remove a rule, I fill out the change request template before touching the firewall config. The template forces me to articulate the justification, assess the risk, write a test plan, and think through rollback. Working alone, this feels unnecessary — but that's the point. The habit of documenting decisions is not for the current moment; it's for the version of me that comes back six months later and has no idea why a rule exists.

**Logging defaults:**

Every block rule logs. The `log` keyword in pf copies matching packets to the `pflog0` logging interface; pfSense records them in `/var/log/filter.log` and forwards them over syslog to the syslog-ng collector. Passes are not logged by default (too noisy), but specific high-interest passes, particularly anything touching the management VLAN, have explicit `log` on the pass rule.

---

## 4. Ruleset structure

Rules are organized by interface (pfSense evaluates rules per interface, on inbound traffic, and the first matching rule wins). Within each interface block, the order is:

1. Anti-spoofing blocks (WAN only)
2. Specific permits, each labeled with a CR reference
3. Explicit named blocks for high-interest flows (belt-and-suspenders over the default deny)
4. Default deny with log

**Object tables** are defined at the top: `<user_net>`, `<server_net>`, `<dmz_hosts>`, etc. This means adding a new host to the user VLAN doesn't require editing every rule that references the user network — only the table definition.
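The ordering and table conventions above could look like the following for the Users interface. This is a minimal sketch, not the lab's actual `baseline.rules`: the `render_users_rules` helper, the `<internal_nets>` table, and the `users-smb-files` label are illustrative assumptions, while `users-web-egress`, `users-no-ssh-to-servers`, and `users-default-deny` are labels used elsewhere in this writeup. The permits carry `quick` because raw pf is last-match-wins, so without it the trailing block would override them.

```bash
#!/usr/bin/env bash
# Print a pf-syntax skeleton for the Users interface (em2).
# Illustrative only: table contents and the SMB label are assumptions.
render_users_rules() {
  cat <<'EOF'
# Object tables
table <user_net>      { 192.168.10.0/24 }
table <server_net>    { 192.168.20.0/24 }
table <internal_nets> { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 }

# 2. Specific permits (CR reference kept in the comment)
# CR-YYYY-NNN: users to internet, HTTP/HTTPS
pass in quick on em2 proto tcp from <user_net> to ! <internal_nets> port { 80, 443 } label "users-web-egress"
# CR-YYYY-NNN: users to file server, SMB
pass in quick on em2 proto tcp from <user_net> to 192.168.20.20 port 445 label "users-smb-files"

# 3. Explicit named block for a high-interest flow
block in log quick on em2 proto tcp from <user_net> to <server_net> port 22 label "users-no-ssh-to-servers"

# 4. Default deny with log
block in log on em2 all label "users-default-deny"
EOF
}
```

On a host with pf, something like `render_users_rules > /tmp/users.rules && pfctl -nf /tmp/users.rules` should syntax-check the fragment without loading it, since `-n` only parses the file.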

**Naming convention for labels:** the source zone first, then a short description of the service, direction, or target. Examples:
- `users-web-egress` — Users VLAN, outbound to internet, HTTP/HTTPS
- `dmz-no-lateral` — DMZ hosts blocked from reaching LAN
- `mgmt-ssh-fw` — Management SSH to the firewall

Labels appear in firewall log entries and in `pfctl -s rules`, which makes log correlation trivial: grep the label, get the rule. Grep the rule ID (the numeric `1000000NNN` pfSense assigns), cross-reference to the label.
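As a sketch of the label lookup described above, the helper below filters `pfctl -s rules` output (read from standard input) for one exact label. The function name `rule_for_label` is hypothetical, and it assumes the rule listing prints labels in the form `label "name"`.

```bash
# Print the rule line(s) carrying exactly the given label.
# Reads `pfctl -s rules` output on stdin.
rule_for_label() {
  local label="$1"
  # -F: literal match. Including the quotes keeps "users-web"
  # from matching "users-web-egress".
  grep -F -- "label \"${label}\""
}
```

Typical use on the firewall would be `pfctl -s rules | rule_for_label users-default-deny`.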

---

## 5. Change management

Every rule modification — add, edit, or delete — goes through the same four-step process:

```
Request → Review → Implement → Verify
```

**Request:** Fill out `policy/change-request-template.md`. This includes scope (which interface, which hosts, which ports), justification, risk assessment, and rollback plan. The CR gets a sequential ID: CR-YYYY-NNN.
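Sequential IDs are easy to get wrong by hand. A minimal sketch of deriving the next one, assuming existing IDs appear somewhere in a text file that is piped in (the helper name `next_cr_id` and any CR log file are hypothetical, not part of the lab's repository):

```bash
# Print the next CR ID for a given year, based on the IDs found on stdin.
next_cr_id() {
  local year="$1" last
  # Collect every CR-<year>-NNN and keep the highest sequence number.
  # Fixed three-digit width means lexical sort order equals numeric order.
  last="$(grep -oE "CR-${year}-[0-9]{3}" | sort -u | tail -n 1 | cut -d- -f3)"
  # 10# forces base 10 so that "016" is not parsed as octal.
  printf 'CR-%s-%03d\n' "$year" "$(( 10#${last:-0} + 1 ))"
}
```

If no ID for that year exists yet, the sequence starts at `001`.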

**Review:** In a real environment, a second set of eyes. In this lab, I sit on the request for at least an hour and re-read it. The goal is to catch the thing you don't see when you're in the middle of configuring.

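**Implement:** Apply the change in the pfSense GUI. pfSense regenerates its pf ruleset from `config.xml` on every filter reload, so hand edits to a pf rules file do not persist. Back up the config before and after: `Diagnostics → Backup & Restore` produces a dated XML snapshot. Update `baseline.rules` to reflect the new state.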

**Verify:** Test the specific flow (e.g., `nc -zv destination port` from the correct source host). Check the firewall log to confirm the label appears. Check that adjacent flows that should remain blocked are still blocked. Mark the CR closed.
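The verify step can be scripted so that the permitted flow and the adjacent flows that must stay blocked are checked the same way each time. The sketch below is one possible shape, with hypothetical helper names `probe` and `verify_flow`. Note that "blocked" here only means the TCP connection could not be opened within the timeout, so a closed port on the target looks the same as a firewall drop; the firewall log remains the authority.

```bash
# Probe a TCP port; returns 0 if a connection opens within 3 seconds.
probe() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Compare the probe result with the expectation ("open" or "blocked").
# Prints PASS or FAIL and returns non-zero on FAIL.
verify_flow() {
  local expect="$1" host="$2" port="$3" actual="blocked"
  if probe "$host" "$port"; then actual="open"; fi
  if [ "$actual" = "$expect" ]; then
    echo "PASS ${host}:${port} is ${actual}"
  else
    echo "FAIL ${host}:${port} is ${actual}, expected ${expect}"
    return 1
  fi
}
```

Run from a Users-VLAN host, `verify_flow open 192.168.20.20 445` and `verify_flow blocked 192.168.20.10 22` would cover the file-server permit and an SSH path that should stay closed.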

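**Rollback:** pfSense keeps config backups in `/cf/conf/backup/`. Reverting to the pre-change config takes about 90 seconds via the GUI (`Diagnostics → Backup & Restore → Config History`). At the CLI, `pfctl -f <file>` only reloads a ruleset file; it does not restore a saved configuration, so use the console menu's restore option instead.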

---

## 6. Traffic analysis

**What gets logged:**
- All drops (every interface, every direction) — syslog-ng at 192.168.20.30
- Suricata alerts (inline mode via pfSense package) — same syslog collector
- DHCP leases — pfSense DHCP log, also forwarded to syslog-ng

**Toolchain for investigation:**

*For a blocked flow:* Start with pfSense filter log. Filter by source IP or destination port. The label on the blocking rule tells you which policy denied it. This is usually enough to identify the cause.

```bash
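# On the syslog server: find all drops involving a specific host.
# -F treats the address as a literal string (dots are not wildcards);
# -w stops 192.168.10.11 from also matching 192.168.10.110.
grep -wF "192.168.10.11" /var/log/syslog/filter.log | grep -w "block"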
```
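When a host generates many drops, a per-destination summary is quicker to read than raw lines. The sketch below assumes the pfSense `filterlog` CSV layout for IPv4 TCP/UDP entries, where field 7 is the action, field 9 the IP version, fields 19 and 20 the source and destination addresses, and field 22 the destination port. Those positions are an assumption worth checking against a real line from your own `filter.log`; the helper name `summarize_drops` is hypothetical.

```bash
# Count blocked IPv4 flows from one source, grouped by destination:port.
# Reads filter.log lines on stdin; prints "<count> <dst>:<port>", busiest first.
summarize_drops() {
  local src="$1"
  awk -F',' -v src="$src" '
    $7 == "block" && $9 == "4" && $19 == src { count[$20 ":" $22]++ }
    END { for (k in count) print count[k], k }
  ' | sort -k1,1nr -k2,2
}
```

For example: `summarize_drops 192.168.10.11 < /var/log/syslog/filter.log`. Because the source address is compared as a whole field, `192.168.10.110` is not counted.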

*For a Suricata alert:* Suricata writes alerts to `/var/log/suricata/fast.log` and to syslog. The alert includes source, destination, signature ID, and the signature name. Cross-reference the signature name against the Emerging Threats ruleset documentation to understand what traffic pattern triggered it.

```bash
grep "ET SCAN" /var/log/suricata/fast.log | tail -20
```
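To see which hosts are behind a given signature, the alerts can be grouped by source address. This is a minimal sketch that assumes the usual `fast.log` line shape ending in `{PROTO} src:port -> dst:port`; the helper name `alerts_by_source` is hypothetical.

```bash
# Count fast.log alerts whose text contains a pattern, grouped by source IP.
# Reads fast.log lines on stdin; prints "<count> <source-ip>", busiest first.
alerts_by_source() {
  local pattern="$1"
  awk -v pat="$pattern" '
    index($0, pat) {
      for (i = 2; i <= NF; i++) {
        if ($i == "->") {
          src = $(i - 1)
          sub(/:[0-9]+$/, "", src)   # drop the source port
          count[src]++
          break
        }
      }
    }
    END { for (s in count) print count[s], s }
  ' | sort -k1,1nr -k2,2
}
```

For example: `alerts_by_source "ET SCAN" < /var/log/suricata/fast.log`.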

*For an unknown flow:* Run `tcpdump` on the relevant pfSense interface to capture raw packets, write them to a file with `-w`, then open the capture in Wireshark.

```bash
# Capture TCP on the Users interface for roughly 60 seconds:
# -G 60 rotates the output file every 60 seconds, -W 1 exits after the first file.
tcpdump -i em2 -w /tmp/users-cap.pcap -G 60 -W 1 tcp
```

*For log correlation:* syslog-ng writes structured logs per host under `/var/log/syslog/hosts/<ip>/current`. Cross-referencing endpoint process logs with firewall block timestamps is how you go from "what was blocked" to "what was the source process."
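The correlation step usually comes down to "what did this host log in the few minutes before the block?". The sketch below is a simplified way to pull that window from a per-host log. It assumes traditional syslog timestamps (`Mon DD HH:MM:SS` as the first three fields), compares time of day only, and so ignores the date and does not handle windows that cross midnight. The helper name `events_before` is hypothetical.

```bash
# Print log lines stamped within <window> seconds before a given time of day.
# Usage: events_before HH:MM:SS <window-seconds> < host-log
events_before() {
  local at="$1" window="$2"
  awk -v at="$at" -v win="$window" '
    # Convert HH:MM:SS to seconds since midnight.
    function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    BEGIN { target = secs(at) }
    {
      d = target - secs($3)          # $3 is the HH:MM:SS field
      if (d >= 0 && d <= win) print
    }
  '
}
```

For a block seen at 02:11, `events_before 02:11:00 300 < /var/log/syslog/hosts/192.168.10.11/current` would show what that host logged in the preceding five minutes.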

---

## 7. Incident response workflow

The firewall is involved in every IR phase: the four core phases plus a lessons-learned review. Here's how I practice each:

**Triage:** Alert fires (Suricata or a firewall log anomaly). First question: is the traffic being blocked or permitted? If it is blocked, the immediate exposure is contained and severity can provisionally drop from Critical to Low, pending the state check under Containment. Pull the filter log for the source IP to see the full scope of attempts.

**Containment:** If traffic is blocked, containment is the firewall doing its job. Verify that no state was established (`pfctl -s state | grep <source-ip>`) before downgrading severity. If traffic was permitted (false negative), add a temporary block rule immediately, then investigate.
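One caveat with a plain `grep <source-ip>` on the state table is that `192.168.10.11` also matches `192.168.10.110`. A minimal sketch of an exact-address filter follows, assuming `pfctl -s state` prints endpoints as `ip:port` fields; the helper name `states_for_host` is hypothetical.

```bash
# Print state-table lines where any endpoint is exactly the given IP.
# Reads `pfctl -s state` output on stdin.
states_for_host() {
  local ip="$1"
  awk -v ip="$ip" '
    {
      for (i = 1; i <= NF; i++) {
        addr = $i
        sub(/:[0-9]+$/, "", addr)    # strip the port, if any
        if (addr == ip) { print; next }
      }
    }
  '
}
```

Empty output from `pfctl -s state | states_for_host 192.168.10.11` is the "no state established" condition to confirm before downgrading severity.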

**Eradication:** Identify the root cause (compromised host, misconfigured application, test traffic that wasn't coordinated). Remediate at the source, not just at the firewall. A block rule without a root-cause fix is just a bandage.

**Recovery:** Remove temporary rules. If a permanent rule change is needed (e.g., to block a newly-discovered lateral path, or to explicitly permit a legitimate flow that was being blocked), file a CR and go through the normal change process — even if the incident is "closed."

**Lessons learned:** Every IR gets a written report. The value is not the report itself but the habit of asking "what would have caught this earlier?" and "does the detection need tuning?"

---

## 8. Worked example: IR-2026-003

*This is a real example from the lab. The full report is in `policy/incident-report-example.md`.*

**Alert:** Suricata fires `ET SCAN Potential SSH Scan OUTBOUND` at 02:11 UTC. Source: 192.168.10.11 (ws02). Targets: three hosts in VLAN 20, port 22. Traffic is blocked by `users-default-deny`.

**Triage:** Filter log confirms all attempts blocked, no state established. Severity: Low.

**Investigation:** syslog-ng shows the Veeam backup agent on ws02 starting a job at 02:08. The agent's config points to all three server-VLAN hosts and was reconfigured by a software update to try SSH transport instead of the application port (9443). The agent was iterating its configured host list, probing each on port 22 — which Suricata correctly identified as a scan pattern.

**Outcome:** No breach. Reconfigured the agent, scoped its target list to only the backup server (192.168.20.30), corrected the transport port to 9443. Filed CR-2026-016 to explicitly permit that specific flow. Added an explicit `users-no-ssh-to-servers` block rule so future SSH-to-servers blocks are clearly labeled in audit logs rather than falling through to the generic default-deny.

**Time to resolve:** 65 minutes. **Impact:** Zero.

---

## 9. What's next

Honest gaps and the next milestones:

| Gap | Plan |
|-----|------|
| No centralized log search — grepping flat files doesn't scale | Stand up Graylog or Elastic Stack on a separate VM in VLAN 20 |
| Suricata rules are mostly defaults — high false-positive rate | Work through the ET ruleset methodically; suppress known-good; write one custom rule |
| No IDS coverage on inter-VLAN traffic (Suricata is only on WAN) | Enable Suricata on em2 and em3; compare alert volume |
| Change process is documented but not version-controlled | Move baseline.rules and CR log into git; each CR becomes a commit message |
| No network monitoring baseline — anomaly detection requires knowing "normal" | Enable pfSense traffic graphs; capture a week of baselines before tuning Suricata |
| Lab topology is single-firewall — no redundancy or HA testing | Add a second pfSense in CARP/HA mode; practice failover |

The goal is not to finish — it's to keep the process honest. Each gap is a planned lab exercise, not a shortcoming.
