Hash collisions in switches, once again

    Prelude


    We were given an Eltex switch, the MES5248, for testing, and we began to torture it in every way we could: configuring VLANs, bolting on MSTP, and generally abusing it. In the lab no visible bugs turned up, so we put it under real live traffic. Then a strange glitch surfaced, about as elusive as Elusive Joe: from time to time, individual entries in the ARP table had no value in the port field (yes, this switch shows the port in the ARP table, which is convenient). Attempts to catch the glitch by the tail or reproduce it were unsuccessful. Technical support puzzled over it together with the developers and eventually observed that entries in the ARP table live longer than entries in the MAC table, and longer than configured. That is of course a bug, but not a terrible one. What remained was to verify that entries in the MAC table die a natural death, by timeout, rather than being evicted early.

    Checking this by hand is, of course, too tedious. It is no joke to look through more than 2000 MAC addresses several times in a row, hunting for holes. Code was needed. Writing it for one specific switch is a thankless task, so the code ended up covering every switch model we run. If you are curious what came of it, read the tale.

    The tale


    The result was a small program that first clears the MAC table and then, about once a minute, pulls the table from the switch and analyzes it. It tracks the following:

    1. Unique MACs observed since the start of the test. (Ever)
    2. Unique MACs currently present on the switch. (Now)
    3. For each MAC, the time it first appeared in the table. If the MAC appears in the table again after less than the aging time, it is assumed to have been knocked out for some reason. (Early deaths)
    4. The total count of such MACs. (The number before the slash in Early deaths)
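    The bookkeeping in the list above can be sketched roughly as follows. This is a minimal illustration, not the program described in the post: the class name `FdbTracker`, its fields, and the idea of feeding it ready-made sets of MAC strings are all assumptions made for the example. Only the number before the slash in Early deaths is modelled, since that is the only part the list defines.

```python
from dataclasses import dataclass, field


@dataclass
class FdbTracker:
    """Hypothetical tracker for Ever / Now / Early deaths over FDB snapshots."""
    aging_time: float                                  # configured aging time, seconds
    first_seen: dict = field(default_factory=dict)     # MAC -> time it first appeared
    now: set = field(default_factory=set)              # MACs in the latest snapshot
    early_macs: set = field(default_factory=set)       # MACs judged to have died early

    def update(self, snapshot, t):
        """Feed one snapshot (iterable of MAC strings) taken at time t (seconds)."""
        snapshot = set(snapshot)
        for mac in snapshot - self.now:                # MACs that appeared or reappeared
            if mac not in self.first_seen:
                self.first_seen[mac] = t               # genuinely new MAC
            elif t - self.first_seen[mac] < self.aging_time:
                # It was in the table, vanished, and came back sooner than
                # it could have aged out: treat it as knocked out early.
                self.early_macs.add(mac)
        self.now = snapshot

    def report(self):
        return {
            "ever": len(self.first_seen),
            "now": len(self.now),
            "early_deaths": len(self.early_macs),
        }
```

    With one snapshot a minute and an aging time of, say, 300 seconds, a MAC that disappears from one snapshot and is back in the next ends up in `early_macs`. The comparison against the first-seen time follows item 3 literally; a stricter tool might compare against the last time the MAC was seen instead.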

    By the way, I pulled the FDB table over telnet, although it can also be done over SNMP, which I have already written about on Habr. Unlike the UserSide results, I came across switches that return the FDB painfully slowly over telnet and instantly over SNMP, for example the DXS-3326GSR.
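    Whichever transport is used, the raw output has to be reduced to a set of MAC addresses before snapshots can be compared, and different vendors print MACs in different notations. A possible helper is sketched below; the function name `extract_macs` is made up for this example, and the sample output lines in the usage note are invented, not captured from any real switch.

```python
import re

# Matches aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff and aabb.ccdd.eeff notations.
_MAC_RE = re.compile(
    r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"
    r"|\b(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}\b"
)


def extract_macs(cli_output):
    """Return the set of MACs found in text, normalized to aa:bb:cc:dd:ee:ff."""
    macs = set()
    for match in _MAC_RE.findall(cli_output):
        digits = re.sub(r"[^0-9A-Fa-f]", "", match).lower()   # keep the 12 hex digits
        macs.add(":".join(digits[i:i + 2] for i in range(0, 12, 2)))
    return macs
```

    For instance, `extract_macs("1  00-1A-2B-3C-4D-5E  5  Dynamic")` and `extract_macs("10  001a.2b3c.4d5e  DYNAMIC  Gi1/0/1")` both yield the same normalized address, so snapshots from different switch families become comparable.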

    Observations are conducted for 20 minutes.

    To my surprise, the switch that started it all did remarkably well: until the aging time expired, Now did not differ from Ever, and Early deaths stayed empty. The Cisco 3750 behaves the same way.

    But in the rest of the zoo, problems did show up. Most of the tested D-Link switches do have a smaller MAC table, but the total number of addresses on them is often much smaller as well. The result is this graph: the number of problems as a function of the total number of MACs in the table.



    So, the conclusions: different switches from the same vendor behave differently, and the number of problems grows very quickly with the number of MACs. The immediate operational conclusions: reduce the number of VLANs on the 3526, and increase the aging time on all equipment.
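    Why the growth should be fast can be seen from a deliberately idealized model. Assume, purely for illustration, that every MAC is hashed uniformly into one of `m` single-entry slots; real switch hardware is not documented here and may well differ. The expected number of MACs that share a slot with at least one other MAC is then n * (1 - (1 - 1/m)^(n-1)), which grows roughly as n^2 / m while the table is sparse. The function name and the table size in the usage note are assumptions.

```python
def expected_colliding(n, m):
    """Expected number of MACs that share a slot with at least one other MAC.

    Idealized model: n MACs hashed independently and uniformly into m slots.
    """
    if n < 2:
        return 0.0
    # Probability that none of the other n - 1 MACs lands in a given MAC's slot.
    alone = (1.0 - 1.0 / m) ** (n - 1)
    return n * (1.0 - alone)
```

    Comparing, for example, `expected_colliding(500, 8192)` with `expected_colliding(2000, 8192)` shows that quadrupling the number of MACs multiplies the expected number of colliding entries by far more than four in this model.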

    Beyond that, switches apparently handle hash collisions in fundamentally different ways. Some throw out old MACs; others probably work around the collision somehow while the table is still sparsely filled. Exactly how they do this is not clear to me: maybe they compute the position with a second hash function, or put the new address in the next cell, or simply flood the traffic destined for the new MAC, or something else. If vendor representatives are reading this and the information is not secret, please share in the comments.
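    Two of the strategies guessed at above can be contrasted with a toy table. This is only a sketch of the general techniques (overwrite on collision versus linear probing), not a description of any vendor's hardware; the CRC32-based hash and all function names are assumptions chosen for the example.

```python
import zlib


def slot(mac, size):
    """Toy hash function: CRC32 of the MAC string modulo the table size."""
    return zlib.crc32(mac.encode()) % size


def learn_evict(table, mac):
    """On collision, overwrite the occupant. Returns the evicted MAC or None."""
    i = slot(mac, len(table))
    evicted = table[i] if table[i] not in (None, mac) else None
    table[i] = mac
    return evicted


def learn_probe(table, mac):
    """On collision, walk to the next free cell (linear probing).

    Returns False when the table is full and the MAC cannot be learned;
    a switch in that state would have to flood traffic for this MAC.
    """
    size = len(table)
    start = slot(mac, size)
    for step in range(size):
        i = (start + step) % size
        if table[i] in (None, mac):
            table[i] = mac
            return True
    return False
```

    Feeding the same list of MACs into `[None] * 8` with each function shows the difference: `learn_evict` loses an older entry on every collision, which an outside observer would see as an early death, while `learn_probe` keeps every MAC until the table is actually full.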

    The plan is to rework the program so that data collection and control go over SNMP, untie it from our internal dependencies, and publish it. Hence a question for the community: is it worth doing?

    Another interesting question: what are your thoughts on how to remotely detect traffic being broadcast toward new MACs and estimate the amount and/or volume of such traffic?


    Poll: should the code be made public?

    • 74.5% Yes, I will run it on my equipment (85 votes)
    • 17.5% Yes, I will test new equipment with it (20 votes)
    • 0.8% No, if I need it I will write my own (1 vote)
    • 6.1% No need, everything works (7 votes)
    • 0.8% No, I already have this and will describe it in a comment (1 vote)
