Software routing: the story of moving from hub-and-spoke (Vyatta + OpenVPN) to full mesh (Mikrotik + tinc VPN) - part 1

Published on July 04, 2013


    Under the cut, the curious Habr reader will find a description of the suffering of people who once decided to save money instead of buying Cisco gear with DMVPN,
    and what came of it.


    Background


    A few years ago, when the grass was greener and %company_name% was just starting out with 2 branches,
    it was decided to use Vyatta Network OS as the single routing platform, and static OpenVPN site-to-site tunnels as the VPN solution.

    One of the reasons for this decision was the rich customization options of such a software router.
    As one of the redundancy measures, 2 routers were installed in each branch.
    A star (hub-and-spoke) topology was considered justified for that number of peers.

    As hardware appliances we planned to use
    - full-fledged servers with ESXi - for large installations
    - Atom-based servers - for small ones.

    SUDDENLY, within a year the number of branches grew to 10, and after another year and a half - to 28.
    The constant stream of requests from the business drove traffic growth and the appearance of new network and application services.
    Some of them settled on the Linux router virtual machines instead of separate servers.

    Vyatta's functionality started to fall short, and the large number of installed packages affected the overall stability and performance of services.

    There were also a number of problems with the VPN:
    1. By that time about 100 static tunnels were terminated at a single point.
    2. Another 10 branch openings were planned for the near future.

    In addition, OpenVPN in its current implementation
    1. does not allow dynamically established site-to-site tunnels;
    2. is not multithreaded, so on router CPUs carrying 1-2 uplink tunnels encryption overloads 1-2 cores (out of 8-16 in our case), with amusing consequences for the traffic inside the tunnel;
    3. point 2 is especially typical for routers whose CPUs lack AES-NI support.

    We could have used GRE + IPsec, which would have removed the performance problems - but there would still be tunnels, hundreds of them.

    It was very sad; we were tempted to console ourselves with Cisco boxes with hardware encryption and DMVPN.
    Or go straight for a global MPLS VPN.
    So as not to bother with trifles.

    Something had to be done (c)

    - the branches received 1- and 2-socket servers with a tasty amount of RAM and ESXi on board
    - the routers were offloaded: all third-party services moved to dedicated virtual machines running plain Linux
    - after some searching, virtualized x86 RouterOS was chosen as the routing platform.
    We already had operational experience with Mikrotik by then; on x86 there were no performance or stability problems for our tasks.
    On the hardware devices, especially the RB7** and RB2011** series, things were - and still are - more "fun".

    Potentially, this solution promised gains in functionality in the areas of
    - route redistribution (see the illustration right after this list)
    - pbr + vrf
    - mpls
    - pim
    - qos
    and a few other small things.
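
    Just to illustrate the first point: redistribution of connected and static routes into OSPF is a one-liner on RouterOS. This is only a hedged sketch in RouterOS 6.x syntax against the default OSPF instance, not a config taken from our routers:

        /routing ospf instance set [ find name=default ] redistribute-connected=as-type-1 redistribute-static=as-type-1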

    The problem was that RouterOS (like Vyatta) does not support multipoint VPN.
    It got sad again, and we started digging into pure Linux full-mesh solutions.

    Separate thanks, by the way, to Habr user ValdikSS for his article.

    We considered: peervpn, tinc, campagnol, GVPE, Neorouter, OpenNHRP, Cloudvpn, N2N.
    In the end we settled on tinc as a compromise solution.
    I also really liked GVPE with its exotic encapsulation methods, but it did not allow specifying several IPs (from different ISPs) for a remote peer.

    Annoyingly, tinc also turned out to be single-threaded.
    As a workaround, a dirty hack was used (a sketch follows this list):
    1. between every two members of the mesh we bring up not 1 but N tunnels, which are then combined into a bond interface with outgoing traffic balancing.
    2. as a result, we get N tincd processes that can be distributed across different cores.
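
    A minimal sketch of that hack, assuming tinc 1.0 in switch (L2) mode on a Debian-like Linux and two peers named officeA and officeB; the names, interface names and the choice of bonding mode are illustrative assumptions, not our actual configs:

        # /etc/tinc/vpn0/tinc.conf  (vpn1..vpn7 are identical apart from the names;
        # each instance listens on its own UDP port, set in its hosts/ files)
        Name = officeA
        Mode = switch              # L2 mode, so the tap device can be enslaved to a bond
        Interface = tinc_vpn0
        ConnectTo = officeB

        # /etc/tinc/vpn0/tinc-up  (executable; same idea for the other instances)
        #!/bin/sh
        ifenslave bond0 "$INTERFACE"   # add this tunnel to the bond; ifenslave brings it up

        # at boot: create the bond with outgoing-traffic balancing, then start N daemons
        modprobe bonding mode=balance-xor xmit_hash_policy=layer3+4
        ip link set bond0 up
        for n in 0 1 2 3 4 5 6 7; do
            tincd -n vpn$n
        done

    With balance-xor and a layer3+4 hash, different flows land on different tunnels (and hence on different tincd processes), which is what spreads the encryption load.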

    With this scheme, in the lab with
    - 8 tinc interfaces aggregated into one bond as described above
    - processes forcibly pinned to different cores (a pinning sketch follows below)
    - 1 x Xeon 54**/55** (old-generation processors without AES-NI support)
    - aes128 / sha1
    - UDP as the transport protocol
    - no QoS either inside or outside the tunnel

    the VPN throughput was about 550-600 Mbps on a gigabit channel.
    We were saved.
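
    For reference, the pinning itself can be done with taskset against the already running daemons. The PID file path below is tinc's usual default and the core numbering is an assumption, so treat this as a sketch only:

        #!/bin/sh
        # bind each tincd instance to its own CPU core
        for n in 0 1 2 3 4 5 6 7; do
            taskset -c -p "$n" "$(cat /var/run/tinc.vpn$n.pid)"
        done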

    What we ended up with:
    1. The final solution was given the working name "vpn-pair".
    2. It consists of two virtual machines located on the same ESXi host and connected by a virtual interface:
    1 - RouterOS, which deals exclusively with routing;
    2 - plain Linux with a polished and slightly patched tinc.
    It acts as a VPN bridge.
    The tinc configuration is similar to the lab one, and the bond interface is bridged with the virtual link to the Mikrotik (a sketch follows this list).
    As a result, all routers share a common L2 segment, so you can bring up OSPF and enjoy.
    3. Each branch has 2 vpn-pairs.
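
    A rough sketch of that wiring: the bridge name, interface names, addresses and OSPF area below are invented for illustration, and the RouterOS commands use 6.x syntax:

        # Linux VM of the vpn-pair: bridge the tinc bond with the vNIC facing the RouterOS VM
        brctl addbr br-vpn
        brctl addif br-vpn bond0   # aggregated tinc tunnels
        brctl addif br-vpn eth1    # virtual link to the RouterOS VM on the same ESXi host
        ip link set br-vpn up

        # RouterOS VM: an address on the shared L2 segment and OSPF on top of it
        /ip address add address=10.255.0.1/24 interface=ether2
        /routing ospf network add network=10.255.0.0/24 area=backbone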

    ... to be continued ...

    Notes


    1. In principle it is possible to integrate tinc into Vyatta, and a patch was even written for this (for the single-interface option).
    But it is hard to fully predict how such a bundle with a custom patch would behave in an emergency (or after a platform update), and for a large network such experiments are undesirable.

    2. A "matryoshka" option was also considered.
    It had 2 variants:
    - buying hardware x86 Mikrotik boxes and running the Linux VM with the VPN bridge on the built-in virtualization (KVM);
    it did not take off due to poor performance.

    - nested virtualization (esxi -> routeros -> kvm -> linux).
    It did not take off for the same reasons, and also due to the lack of VT-x / EPT emulation support in ESXi 5.1 (needed to start a KVM guest).