Each office will have its own internet connection, but if that connection fails, the branch should backhaul internet access over the MPLS to the data center. Because the failure may be located somewhere other than between the branch ISR and the ISP device, there needs to be a method to verify connectivity. Enter the IP SLA commands.
The following commands will set this up. First, the actual IP SLA command. The SLA is created, a request type, target, and source are specified, the VRF is specified, then threshold and frequency are set. The threshold is the round-trip time in milliseconds above which a reply is flagged as over threshold. Note that with reachability tracking, which is what is used further down, an over-threshold reply still counts as reachable; the probe only counts as down when no reply arrives within the timeout (5000 ms by default, since it is not set here). The frequency is how often, in seconds, the requests are sent.
ip sla 11
icmp-echo 8.8.8.8 source-interface GigabitEthernet0/0/1
vrf IWAN-SECONDARY
threshold 2500
frequency 15
ip sla 12
icmp-echo 8.8.4.4 source-interface GigabitEthernet0/0/1
vrf IWAN-SECONDARY
threshold 2500
frequency 15
ip sla 13
icmp-echo 4.2.2.2 source-interface GigabitEthernet0/0/1
vrf IWAN-SECONDARY
threshold 2500
frequency 15
ip sla 14
icmp-echo 198.41.0.4 source-interface GigabitEthernet0/0/1
vrf IWAN-SECONDARY
threshold 2500
frequency 15
ip sla 15
icmp-echo 198.41.0.4 source-interface GigabitEthernet0/0/1
vrf IWAN-SECONDARY
threshold 2500
frequency 15
As you can see, there are five listed. Two are Google DNS servers, one is a Level3 DNS server, and the last two are root DNS servers. I chose five because I thought it was enough to confirm an actual internet outage, but not so many as to bog the system down with requests.
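To make the threshold setting concrete, here is a rough Python sketch of how a single echo result could be classified and how a track object would read it. It is only a model of my understanding of IOS behavior, not anything that runs on the router. The function names are made up for illustration, and the 5000 ms timeout is an assumption based on the default, since the config above does not set one.

```python
# Illustrative model only; classify_probe and track_is_up are hypothetical names.
# Assumption: timeout_ms is the IOS default of 5000 because the config does not set it.

def classify_probe(rtt_ms, threshold_ms=2500, timeout_ms=5000):
    """Return a return-code label for one echo; rtt_ms is None when no reply arrived."""
    if rtt_ms is None or rtt_ms > timeout_ms:
        return "Timeout"
    if rtt_ms > threshold_ms:
        return "OverThreshold"
    return "OK"


def track_is_up(return_code, mode="reachability"):
    """Reachability treats OK and OverThreshold as up; state treats only OK as up."""
    if mode == "reachability":
        return return_code in ("OK", "OverThreshold")
    return return_code == "OK"


# A fast reply, a slow reply, and no reply at all.
for rtt in (40, 3000, None):
    code = classify_probe(rtt)
    print(rtt, code, track_is_up(code), track_is_up(code, mode="state"))
```

The practical takeaway is that a slow reply (3000 ms here) would still leave a reachability track up, while the same reply would take a state track down.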
The next step is to schedule the SLA commands to run. The commands are pretty self-explanatory. Run forever, start now.
ip sla schedule 11 life forever start-time now
ip sla schedule 12 life forever start-time now
ip sla schedule 13 life forever start-time now
ip sla schedule 14 life forever start-time now
ip sla schedule 15 life forever start-time now
Now the important part comes in. Running the probes by themselves doesn't accomplish much; we want to track the reachability of each target so the router can act on it. The following commands do just that.
track 11 ip sla 11 reachability
track 12 ip sla 12 reachability
track 13 ip sla 13 reachability
track 14 ip sla 14 reachability
track 15 ip sla 15 reachability
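Since the probe, schedule, and track lines repeat for every target, they can be generated rather than typed by hand. The following is a minimal Python sketch, assuming the same interface, VRF, threshold, and frequency as above. The function name build_probe_config is hypothetical, and the output is meant to be reviewed and pasted in, not pushed to a router automatically.

```python
# Hypothetical helper: emits the same three groups of lines shown above
# (ip sla definitions, schedules, track objects) for a list of targets.

def build_probe_config(targets, first_id=11, source="GigabitEthernet0/0/1",
                       vrf="IWAN-SECONDARY", threshold=2500, frequency=15):
    ids = range(first_id, first_id + len(targets))
    lines = []
    # One IP SLA definition per target.
    for sla_id, target in zip(ids, targets):
        lines += [
            f"ip sla {sla_id}",
            f"icmp-echo {target} source-interface {source}",
            f"vrf {vrf}",
            f"threshold {threshold}",
            f"frequency {frequency}",
        ]
    # Schedule every probe to run forever, starting now.
    for sla_id in ids:
        lines.append(f"ip sla schedule {sla_id} life forever start-time now")
    # One track object per probe, reusing the SLA number as the track number.
    for sla_id in ids:
        lines.append(f"track {sla_id} ip sla {sla_id} reachability")
    return "\n".join(lines)


print(build_probe_config(["8.8.8.8", "8.8.4.4", "4.2.2.2"]))
```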
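Next, create a list of these track objects and set a threshold on it. With the values below, the list is marked down when fewer than half of the objects are up. With five objects, that means three or more targets have to be unreachable.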
track 10 list threshold percentage
object 11
object 12
object 13
object 14
object 15
threshold percentage down 49 up 50
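The arithmetic behind the percentage threshold is easy to get backwards, so here is a small Python sketch of it. This models my understanding of how the list is evaluated (up at or above the up value, down at or below the down value, otherwise unchanged). The function name list_state is made up for illustration.

```python
# Illustrative model of "threshold percentage down 49 up 50"; list_state is a hypothetical name.

def list_state(up_count, total, previous_up, up_pct=50, down_pct=49):
    """Return True if the track list would be up, False if down."""
    pct_up = 100 * up_count / total
    if pct_up >= up_pct:
        return True
    if pct_up <= down_pct:
        return False
    # Between the two thresholds the list keeps its previous state.
    return previous_up


# Walk from all five targets reachable down to none.
for up_count in range(5, -1, -1):
    state = "Up" if list_state(up_count, 5, previous_up=True) else "Down"
    print(f"{up_count} of 5 up -> {state}")
```

With five objects, three reachable targets is 60 percent and keeps the list up, while two is 40 percent and takes it down.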
The last step is to actually put this into use. The route from the default VRF needs to be removed and replaced with the same command, but with the track command added.
no ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/0/1
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/0/1 Next_Hop_IP track 10
Now, if the tests to the external sites fail, the tracked default route is removed from the routing table. A default route learned through EIGRP can then take over and send traffic back through the data center to get out to the internet.
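The reason the EIGRP route only takes over after the static one disappears comes down to administrative distance. Here is a rough Python sketch of that selection, assuming the default distances of 1 for a static route and 170 for an EIGRP external route (a redistributed static shows up as external). The function and the route descriptions are hypothetical.

```python
# Illustrative model of default-route selection by administrative distance.
# Assumption: default distances, static = 1 and EIGRP external = 170.

STATIC_AD = 1
EIGRP_EXTERNAL_AD = 170


def active_default_route(track_up, eigrp_default_learned):
    """Return a description of the default route that would be installed, or None."""
    candidates = []
    if track_up:
        # The tracked static route is only a candidate while track 10 is up.
        candidates.append((STATIC_AD, "static via GigabitEthernet0/0/1"))
    if eigrp_default_learned:
        candidates.append((EIGRP_EXTERNAL_AD, "EIGRP external via data center"))
    if not candidates:
        return None
    # Lowest administrative distance wins.
    return min(candidates)[1]


print(active_default_route(track_up=True, eigrp_default_learned=True))
print(active_default_route(track_up=False, eigrp_default_learned=True))
```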
Here are a couple of commands for troubleshooting:
show ip route track-table
show ip sla summary
The other side to this is configuring the data center router to advertise the default route. First, create an ACL to define the default route.
ip access-list standard DEFAULT-ONLY
permit 0.0.0.0
Then create a route map that includes the previously created ACL.
route-map STATIC-IN permit 10
description Redistribute local default route
match ip address DEFAULT-ONLY
Finally, add the route map to the EIGRP redistribution:
router eigrp IWAN-EIGRP
address-family ipv4 unicast autonomous-system 400
topology base
redistribute static route-map STATIC-IN
exit-af-topology
exit-address-family
If all went according to plan, a failure in the branch internet service should remove the tracked default route from the branch router, and the default route advertised by the data center through EIGRP should take its place.