Welcome to the BGP Redundancy and ISP Failover Lab
This lab guides you through implementing enterprise-grade network redundancy using BGP and ISP failover mechanisms on a Palo Alto Networks firewall. You'll learn to handle a real-world scenario in which ISP outages on consecutive days caused critical connectivity loss.
Lab Scenario Context
Your organization recently experienced two separate ISP outages on consecutive days: first ISP-A (AS 65100), then ISP-B (AS 65200). Each outage caused loss of outbound connectivity and of access to internal systems, despite the redundant BGP configuration. The goal is to identify the root cause and implement a reliable failover solution.
What You'll Learn
- Analyze existing BGP configurations and identify failover logic gaps
- Implement advanced BGP tuning for improved redundancy and failover detection
- Configure SLA monitoring for proactive ISP health checks
- Design and validate seamless failover behavior during maintenance windows
- Troubleshoot common BGP path selection and routing issues
- Document network improvements and create maintenance procedures
- Conduct live failover testing with minimal service disruption
Lab Environment
Primary Components: Palo Alto PA-FW-01 firewall, two ISP connections (ISP-A: AS 65100, ISP-B: AS 65200), edge routers (EDGE-RTR-01, EDGE-RTR-02), distribution switch (DIST-SW-01), and VPC test node for validation.
Network Scope: Trust Zone (172.16.1.0/24), Internal Zone, routing between multiple autonomous systems with BGP path selection optimization.
Ready to Begin?
Begin with the Topology section to understand the network architecture, then work through Prerequisites and Configuration for the hands-on implementation.
Network Topology and Architecture
BGP Redundancy and ISP Failover Architecture
         ISP-A                               ISP-B
       (AS 65100)                          (AS 65200)
           |                                   |
      +----------+                        +----------+
      | ISP-A PE |                        | ISP-B PE |
      | 10.1.1.1 |                        | 10.1.1.1 |
      +----+-----+                        +----+-----+
           |                                   |
      BGP Primary                        BGP Secondary
   (201 - AS 65100)                    (202 - AS 65200)
           |                                   |
     +-----+-----+                       +-----+-----+
     |EDGE-RTR-01|----- Trust Zone ------|EDGE-RTR-02|
     | 10.1.1.x  |     172.16.1.0/24     | 10.1.1.x  |
     | Secondary |                       | Secondary |
     |    Rtr    |                       |    Rtr    |
     +-----+-----+                       +-----+-----+
           |                                   |
           |            OSPF to FW             |
           +---------------+---+---------------+
                           |   |
                 +---------+---+---------+
                 |       PA-FW-01        |
                 | Trust: 172.16.1.3/24  |
                 | Internal: 10.0.1.0/24 |
                 +-----------+-----------+
                             |
                             | Internal Zone
                             | 10.0.1.0/24
                      +------+------+
                      | DIST-SW-01  |
                      |10.1.1.100/24|
                      |  L3 Switch  |
                      +------+------+
                             |
                             | Routing Path
                             | 10.0.1.0/24
                      +------+------+
                      | VPC-TEST-01 |
                      | 10.1.1.101  |
                      |  Test Node  |
                      +-------------+
Architecture Key Points
Dual ISP Design: Primary ISP-A (AS 65100) and Secondary ISP-B (AS 65200) provide redundant internet connectivity with BGP path selection based on AS path length and local preference.
Centralized Security: Palo Alto PA-FW-01 serves as the central security enforcement point between Trust and Internal zones, with OSPF routing to edge routers.
Failover Logic: BGP path selection prioritizes ISP-A as primary with automatic failover to ISP-B during outages, complemented by SLA monitoring for enhanced detection.
Configuration Notes
- AS 65001 (Enterprise): Local autonomous system
- Primary: ISP-A (Lower MED priority)
- Secondary: ISP-B (Higher MED backup)
- BGP between edge routers and ISP PE routers: eBGP sessions
- OSPF between routers: Internal routing
- PA Firewall central routing: Trust/Internal security zones
- Path monitoring for failover: Proactive health checks
- Failover detection enhancements: Bidirectional Forwarding Detection (BFD)
- Single distribution layer: Simplified L3 switching
- Centralized security policies: Application-aware filtering
Prerequisites and Planning
Hardware Requirements
- Palo Alto Networks PA-FW-01 (PAN-OS 10.1+)
- Two edge routers supporting BGP (Cisco/Juniper preferred)
- Layer 3 distribution switch
- Test workstation/server for validation
- Console access to all network devices
Network Prerequisites
- Active ISP connections from two different providers
- Assigned BGP autonomous system numbers
- IP address allocations for WAN interfaces
- Trust Zone subnet: 172.16.1.0/24
- Internal Zone subnet: 10.0.1.0/24
Knowledge Prerequisites
- BGP fundamentals and path selection algorithms
- OSPF routing protocol configuration
- Palo Alto PAN-OS command line interface
- Network troubleshooting methodologies
- Understanding of MED, AS-PATH, and Local Preference
Important Planning Considerations
Maintenance Window: Schedule this lab during a planned maintenance window, because BGP changes can temporarily affect routing.
Backup Configuration: Create full configuration backups of all devices before starting.
ISP Coordination: Inform both ISPs about planned failover testing to avoid unnecessary trouble tickets.
Pre-Lab Checklist
- Verify Current Network State: Confirm all ISP links are operational and BGP sessions are established.
- Back Up Configurations: Export running configurations from all network devices.
- Establish Console Access: Ensure out-of-band management access to all devices.
- Document Current Routing Table: Capture baseline routing information and BGP path selection.
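A baseline is only useful if it can be compared with a later capture. The sketch below is a minimal, vendor-neutral illustration of that comparison in Python. It assumes the routing table has already been reduced to a prefix-to-next-hop mapping; the `diff_routes` helper and the next-hop labels are hypothetical and are not PAN-OS output.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: prefix -> next-hop label, filled in by hand
# or by whatever export your devices provide.
RouteTable = Dict[str, str]


def diff_routes(baseline: RouteTable, current: RouteTable) -> List[Tuple[str, str, str]]:
    """Return (prefix, before, after) for every prefix whose next hop changed."""
    changes = []
    for prefix in sorted(set(baseline) | set(current)):
        before = baseline.get(prefix, "<missing>")
        after = current.get(prefix, "<missing>")
        if before != after:
            changes.append((prefix, before, after))
    return changes


# Illustrative snapshots: the default route moves from ISP-A to ISP-B.
baseline = {"0.0.0.0/0": "ISP-A", "172.16.1.0/24": "connected"}
after_failover = {"0.0.0.0/0": "ISP-B", "172.16.1.0/24": "connected"}

changes = diff_routes(baseline, after_failover)
for prefix, before, after in changes:
    print(f"{prefix}: {before} -> {after}")
```

Keeping the baseline in a simple, diffable form makes it easy to confirm after each test that the only routes that moved are the ones you expected to move.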
Configuration and Implementation
Initial BGP Configuration Analysis
Start by analyzing the current BGP configuration to understand existing failover logic:
Key Discovery Points
Check BGP session states and path selection criteria, and identify why failover is not working as expected. Common issues include incorrect MED values, missing BFD configuration, or inadequate SLA monitoring.
Configure Enhanced BGP Settings
Implement improved BGP configuration with better failover detection:
Implement Path Selection Optimization
Configure BGP attributes to ensure proper primary/secondary path selection:
Local Preference Impact
Local Preference is propagated to all iBGP peers within the local AS and is not advertised to eBGP peers. Higher values win: 200 for ISP-A is preferred over 100 for ISP-B, so ISP-A remains the primary path whenever its routes are available.
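To make the preference rule concrete, here is a minimal Python sketch of a simplified best-path decision. It models only two steps, highest local preference and then shortest AS path, and it is not how any particular router implements the full algorithm, which applies further tie-breakers. The `BgpPath` class and `best_path` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class BgpPath:
    peer: str
    local_pref: int
    as_path: Tuple[int, ...]
    reachable: bool = True  # False models a withdrawn route or a failed peer


def best_path(paths: Iterable[BgpPath]) -> Optional[BgpPath]:
    """Pick the path with the highest local preference, then the shortest AS path."""
    candidates = [p for p in paths if p.reachable]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (-p.local_pref, len(p.as_path)))


# Values from this lab: ISP-A gets local preference 200, ISP-B gets 100.
isp_a = BgpPath("ISP-A", 200, (65100,))
isp_b = BgpPath("ISP-B", 100, (65200,))
print(best_path([isp_a, isp_b]).peer)        # ISP-A while both are available

isp_a_down = BgpPath("ISP-A", 200, (65100,), reachable=False)
print(best_path([isp_a_down, isp_b]).peer)   # ISP-B once ISP-A is gone
```

The second call shows why failover depends on detection: ISP-B is selected only after the ISP-A path is actually marked unreachable.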
Configure SLA Monitoring for Enhanced Failover
Implement proactive monitoring to detect ISP health issues:
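As a vendor-neutral illustration of the probe logic behind this kind of monitoring, the Python sketch below marks a path down only after several consecutive failed probes, and up again only after several consecutive successes. It is not PAN-OS configuration; the `PathMonitor` class and its threshold values are assumptions chosen for the example.

```python
class PathMonitor:
    """Track path health with separate failure and recovery thresholds."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 5):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            # Probe agrees with the current state, so reset the counter.
            self._streak = 0
        else:
            self._streak += 1
            limit = self.fail_threshold if self.healthy else self.recover_threshold
            if self._streak >= limit:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy


monitor = PathMonitor(fail_threshold=3, recover_threshold=2)
probes = [True, False, True, False, False, False, True, True]
states = [monitor.record(ok) for ok in probes]
print(states)
```

A single lost probe does not change the state, which is the same trade-off you tune when adjusting monitoring thresholds: lower thresholds react faster but flap more easily.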
Configure BFD for Rapid Failure Detection
Enable Bidirectional Forwarding Detection to reduce convergence time:
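BFD Benefits
BFD detects a failure in roughly the transmit interval multiplied by the detection multiplier, which ranges from under a second with aggressive timers to 3-9 seconds with conservative ones. Relying on BGP timers alone means waiting for the hold timer to expire (180 seconds by default, with 60-second keepalives), so BFD dramatically improves failover performance.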
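The arithmetic behind these figures can be sketched in a few lines of Python. The sketch assumes both peers use the same interval, so detection time is simply interval times multiplier; the interval values are illustrative rather than recommended settings, and `bfd_detection_time_ms` is a hypothetical helper.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate BFD detection time, assuming symmetric timers on both peers."""
    return interval_ms * multiplier


BGP_HOLD_TIME_MS = 180_000  # common default: 60 s keepalive, 180 s hold time

for interval_ms, multiplier in [(300, 3), (1000, 3), (3000, 3)]:
    detect_ms = bfd_detection_time_ms(interval_ms, multiplier)
    speedup = BGP_HOLD_TIME_MS / detect_ms
    print(f"{interval_ms} ms x {multiplier} = {detect_ms} ms "
          f"({speedup:.0f}x faster than the BGP hold timer)")
```

With a multiplier of 3, a 1000 ms interval gives 3 seconds and a 3000 ms interval gives 9 seconds, while sub-second detection needs an interval around 300 ms or lower.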
Configure Security Policies and NAT
Set up security policies to allow traffic through both ISP paths:
Commit Configuration and Validate
Apply the configuration and perform initial validation:
Configuration Validation Checkpoints
BGP Sessions: Both ISP-A and ISP-B sessions should show "Established" state
Route Selection: Default route should prefer ISP-A path (higher local preference)
BFD Status: BFD sessions should be "Up" for rapid failure detection
SLA Monitoring: Interface monitoring should show healthy status
Troubleshooting Guide
Common Issues and Solutions
| Issue | Symptoms | Solution |
|---|---|---|
| BGP Session Not Establishing | show routing protocol bgp peer shows "Connect" or "Active" | Check IP connectivity, firewall rules, and AS number configuration |
| Wrong Path Selection | Traffic routing through secondary ISP when primary is available | Verify local preference settings and BGP attributes |
| Slow Failover | Minutes to detect and failover during ISP outage | Enable BFD and optimize BGP timers |
| NAT Not Working | Internal hosts cannot reach internet | Check NAT policies and security zone assignments |
| Routing Loops | Traceroute shows circular paths | Verify routing policies and redistribution settings |
| BFD Session Down | BFD status shows "Down" despite physical link up | Check BFD configuration parameters and network delays |
| SLA Monitoring False Positives | Interface shows down when connectivity exists | Adjust monitoring thresholds and probe intervals |
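The first row of the table can be turned into a small triage helper. The Python sketch below uses the six standard BGP finite-state-machine state names; the advice for "Connect" and "Active" comes from the table, while the `triage` function and the wording of the remaining hints are assumptions added for illustration.

```python
# The six states of the BGP finite state machine.
BGP_FSM_STATES = ("Idle", "Connect", "Active", "OpenSent", "OpenConfirm", "Established")


def triage(state: str) -> str:
    """Map a BGP peer state to a first troubleshooting hint."""
    if state not in BGP_FSM_STATES:
        raise ValueError(f"unknown BGP state: {state!r}")
    if state == "Established":
        return "Session is up; continue with path selection checks."
    if state in ("Connect", "Active"):
        return "Check IP connectivity, firewall rules, and AS number configuration."
    # Idle, OpenSent, OpenConfirm: hint is an assumption, not from the table.
    return "Session is idle or still negotiating; review peer configuration and logs."


for state in ("Active", "Established"):
    print(f"{state}: {triage(state)}")
```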
Diagnostic Commands
BGP Status and Path Analysis
Network Connectivity Testing
BFD and SLA Monitoring Status
Critical Troubleshooting Steps
Always verify: Physical layer connectivity, IP addressing consistency, BGP AS numbers, and firewall security policies before diving into complex BGP troubleshooting.
Advanced Troubleshooting Tips
Packet Captures: Use debug dataplane packet-diag to capture BGP packets during session establishment issues.
Log Analysis: Monitor system logs for BGP state changes and BFD session transitions.
Route Injection Testing: Temporarily inject specific routes to test path selection behavior.
Verification and Testing
Verification Test Plan
Baseline Connectivity Verification
Establish baseline performance metrics before failover testing:
Success Criteria
Both BGP sessions established, default route preferring ISP-A, and successful internet connectivity from test workstation.
Failover Testing - ISP-A Outage Simulation
Simulate primary ISP failure and verify automatic failover:
Failover Performance Metrics
Target Failover Time: < 10 seconds
Packet Loss: < 3 packets during transition
Automatic Failback: Should occur within 30-60 seconds after ISP-A restoration
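One way to check a test run against these targets is to log timestamped probe results from the test node and summarize them afterwards. The Python sketch below assumes a hypothetical log format of `(timestamp_seconds, ok)` tuples, for example from one ping per second; `outage_stats` and `meets_targets` are illustrative helpers, not part of any tool used in this lab.

```python
from typing import List, Tuple

Probe = Tuple[float, bool]  # (timestamp in seconds, probe succeeded)


def outage_stats(probes: List[Probe]) -> Tuple[int, float]:
    """Return (lost probe count, longest outage in seconds).

    An outage is measured from the last good probe before a loss to the
    first good probe after it. Losses with no good probe on both sides
    are counted as lost but do not contribute to the outage duration.
    """
    lost = sum(1 for _, ok in probes if not ok)
    longest = 0.0
    last_ok = None
    gap_open = False
    for ts, ok in probes:
        if ok:
            if gap_open and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            gap_open = False
        else:
            gap_open = True
    return lost, longest


def meets_targets(lost: int, longest: float) -> bool:
    """Targets from this lab: fewer than 3 lost packets, failover under 10 s."""
    return lost < 3 and longest < 10


# Illustrative run: two probes lost while traffic moves to ISP-B.
probes = [(0, True), (1, True), (2, True), (3, False), (4, False), (5, True), (6, True)]
lost, longest = outage_stats(probes)
print(f"lost={lost}, longest outage={longest}s, pass={meets_targets(lost, longest)}")
```

The probe interval bounds the measurement resolution, so a one-second interval cannot distinguish a 2.1-second failover from a 2.9-second one.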
SLA Monitoring Validation
Test SLA monitoring effectiveness:
End-to-End Application Testing
Perform application-level testing from test workstation:
Load and Stress Testing
Validate performance under load conditions:
Performance Testing Considerations
Coordinate load testing with your ISPs to avoid triggering DDoS protection or rate limiting. Monitor firewall CPU and memory utilization during high-throughput testing.
Verification Checklist
- Both BGP sessions establish successfully
- Primary path selection working correctly (ISP-A preferred)
- BFD sessions operational for rapid failure detection
- Failover completes within 10 seconds
- Automatic failback occurs after primary restoration
- SLA monitoring detects and logs connectivity issues
- NAT policies work correctly for both ISP paths
- No routing loops or suboptimal paths
- Application connectivity maintained during failover
- Performance acceptable under load conditions