This moody EIGRP on DMVPN tunnels

Published on April 29, 2013

I want to share a problem we ran into last week, which caused us a lot of trouble, and its unexpected solution.
The setup is fairly standard: the company's central office is connected to its remote sites over WAN links. Connectivity (Internet and VPN) is provided by two operators. To minimize downtime for the remote sites when one of the central office's links fails, two DMVPN tunnels were built to each site. Internal dynamic routing is EIGRP. Accordingly, the central office uses two Cisco routers.
There are about 70 remote sites, so each router terminates that many tunnels. The average load on each link is 40-60% of the bandwidth guaranteed by the operators.
The DMVPN setup was fairly standard, as described in the primers:



hub:

interface Tunnel201
 description -=DMVPN_201=-
 ip address 10.10.201.1 255.255.255.0
 no ip redirects
 ip mtu 1416
 ip hold-time eigrp 1 25
 no ip next-hop-self eigrp 1
 ip nhrp authentication 11111
 ip nhrp map multicast dynamic
 ip nhrp network-id 201
 no ip split-horizon eigrp 1
 delay 1000
 cdp enable
 tunnel source GigabitEthernet0/0.2
 tunnel mode gre multipoint
 tunnel key 11111
!
end

router eigrp 1
 network 10.10.201.0 0.0.0.255
 network 192.168.0.0 0.0.1.255

All in all, a classic design that had worked fine for several months in a row.

A similar example from Cisco: www.cisco.com/en/US/tech/tk583/tk372/technologies_configuration_example09186a008014bcd7.shtml

Then one day we ran into a problem: every 30-90 seconds, all internal EIGRP routes started dropping. Meanwhile the tunnels themselves stayed up and the tunnel interfaces answered pings without any loss. The log filled with messages like:

*Apr 23 2013 15:19:47.759 GMT+11: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 10.10.202.9 (Tunnel202) is down: holding time expired
*Apr 23 2013 15:19:52.707 GMT+11: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 10.10.202.9 (Tunnel202) is up: new adjacency
*Apr 23 2013 15:23:56.298 GMT+11: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 10.10.202.57 (Tunnel202) is up: new adjacency
*Apr 23 2013 15:24:43.070 GMT+11: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 10.10.202.9 (Tunnel202) is down: holding time expired
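
For reference, this symptom (tunnel up, pings fine, EIGRP adjacency flapping) can be inspected with standard IOS show commands. This is only a sketch: the neighbor address is taken from the log above, and the post does not say which commands were actually run at the time.

```
! EIGRP neighbor table: look for uptimes that keep resetting and a non-zero Q Cnt
show ip eigrp neighbors
! Per-interface EIGRP details, including pacing times and pending routes
show ip eigrp interfaces detail
! DMVPN/NHRP state of the spokes, to confirm the tunnels themselves are up
show dmvpn
! Reachability of one of the flapping neighbors across the tunnel
ping 10.10.202.9
```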

Moreover, the problem appeared suddenly and on both routers at once. The usual dancing with a tambourine began, along with combing through cisco.com, and so on.
Rebooting each of the routers separately did not solve the problem.
Rebooting both routers at once, taken as a last resort, left the branches without connectivity for a fairly long time (up to 15-20 minutes), but it did clear the problem. We could exhale and calmly start looking for the cause, hoping that everything would work fine for a few more months, as it had before.
However, we rejoiced too early: three days later the problem recurred in exactly the same way. All the recommendations from cisco.com about changing the MTU on the tunnel and on the physical interface, along with other shamanic rituals, brought no results. After a fairly long search, on one very small forum we found a thread about a similar problem, and its last message said something like:
“Thank you all, the problem is fixed. I don’t know how or why, but enabling ip bandwidth-percent eigrp helped.”

Since there was nothing else left to try, and without much faith in success, we entered the command on the tunnel interface with 100 as the percentage of bandwidth EIGRP may use (there was nothing to lose anyway), and, MIRACLE, everything started running like clockwork.
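
A minimal sketch of that change, assuming it is applied to the hub tunnel interface shown earlier and to EIGRP AS 1 from the same config. A likely explanation, which the original post does not confirm: by default EIGRP limits its own traffic to 50% of the bandwidth configured on the interface, and on a multipoint interface that budget is shared among all neighbors. A tunnel interface without an explicit bandwidth statement has a very low default value, so with about 70 spokes the per-neighbor share can become too small for EIGRP packets to be delivered before the hold timer expires.

```
interface Tunnel201
 ! Let EIGRP AS 1 use up to 100% of the bandwidth configured on this
 ! interface (the default is 50%). Syntax: ip bandwidth-percent eigrp <as> <percent>
 ip bandwidth-percent eigrp 1 100
```

The same line would have to be repeated on every DMVPN tunnel interface that shows the problem, for example Tunnel202 from the log.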
While going through a pile of forums, we repeatedly came across colleagues who had hit the same problem, and the solution was never described.
Naturally, we later reduced the percentage.
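
The post does not say which value was finally chosen. One way to arrive at a lower percentage is to also tell the router how much bandwidth the tunnel really has, so that the percentage is taken from a realistic figure. In this sketch both numbers are hypothetical placeholders, not values from the original setup.

```
interface Tunnel201
 ! Hypothetical value in kbit/s; use the rate actually guaranteed by the operator.
 ! Note that bandwidth also feeds into the EIGRP metric for routes via this interface.
 bandwidth 10000
 ! Hypothetical reduced share for EIGRP AS 1
 ip bandwidth-percent eigrp 1 50
```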
Maybe we have reinvented the wheel, but we did not find this information anywhere else. Maybe it will be useful to someone else. Feel free to use it.