How not to build networks

    image

    Good day.
    To be honest, I never thought I would write an article about such trivial things, but for the fifth time now I have run into disregard for the simplest rules of building networks. It would be one thing if these were small shops, but this happens at large providers, banks and government agencies, whose names I will not disclose for certain reasons.

    Let me say up front that everything written below is purely my opinion, which I do not impose on anyone. Let me also note that we are talking mainly about IP networks; where would we be without IP in the modern world?
    All the problems of any organization that runs communication networks can be divided into several groups:
    • Physical structure of the SCS (structured cabling system)
    • Logical structure of the SCS
    • Monitoring
    • Access Control, Security, and Remote Access
    • Application Processing System
    • Backup system

    It would seem that all this is so clear, well known and thoroughly chewed over that it is not worth discussing. Reality, however, is harsher.

    Let's start with point number one - Physical structure of SCS
    I do not have to look far for an example: right now I am watching road workers mercilessly mangle a fiber-optic cable that, for reasons I do not understand, has been lying on the ground for several months instead of hanging overhead. The most interesting thing is that the fiber still works ...

    But back to more mundane things: data centers, server rooms, central communication nodes and so on.

    I have changed jobs about five times, from large providers and government agencies to banks, and in every single place it looked like the picture in the header. The only exception was one large provider that had several showcase sites where the bosses were taken. All the other nodes were a sad sight.

    I probably do not need to tell you that you should buy cable organizers, lay cables neatly, label them, avoid kilometer-long dangling "snot", and keep a cross-connect log. These simple steps will turn this:

    image

    into this:

    image

    I remember my colleagues once tidying up such "snot", but since we were not the only ones with access to the site, a month later the web had grown back. The root of the problem is that every task comes with a deadline of "yesterday", the installers do not care much about "beauty" and throw cables in any old way, and because nobody tracks who ran what and where, it is impossible to hold anyone responsible, so the infernal web keeps growing. Documentation for laying and splicing optical cables is usually in reasonable shape, but documentation for indoor cross-connects is a sad story.

    It is also worth periodically blowing the dust out of communication nodes. Things are better here: about half of my employers did this. Still, I have personally seen dust stalactites hanging from routers.

    My sorest point - The logical structure of SCS
    I am speaking here not as an abstract storyteller, but as someone who for more than a year maintained a fairly large network with static routing, no address plan and complete anarchy. Every argument for introducing dynamic routing was shattered by one phrase: "static routing is the most reliable".

    Of course, in a small local network static routing is quite appropriate, but once the geography of your network grows beyond your cozy office, it is time to abandon it.

    There is also the opposite sad example: a network whose IGP was OSPF, with several hundred routers in a single area. The person responsible simply did not suspect that the network should be split into areas, let alone that areas come in different types. The answer to every question was "it works anyway".

    The description command was invented for a reason: label ports, VLANs, subinterfaces and VLAN interfaces, write everything down. The person who comes after you, or comes to help you, has no desire to study kilometers of ARP and MAC tables and walk through a dozen switches and routers just to understand how everything fits together. Also set bandwidth on interfaces. Even if it is not used in calculating the route cost, it lets anyone find out the capacity of a link at any time, and it will be useful for monitoring (more on this below).
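
    Checking for unlabeled interfaces is easy to automate. Below is a minimal sketch, assuming a Cisco IOS-style text config in which each interface block starts with an "interface" line and its settings are indented by one space. The function name and the sample config are made up for illustration.

```python
def interfaces_missing_description(config_text):
    """Return the names of interfaces that have no 'description' line."""
    missing = []
    current = None           # name of the interface block being read
    has_description = False

    for line in config_text.splitlines():
        if line.startswith("interface "):
            # A new block starts: close the previous one first.
            if current is not None and not has_description:
                missing.append(current)
            current = line.split(None, 1)[1].strip()
            has_description = False
        elif current is not None and line.startswith(" "):
            # Indented line: a setting inside the current block.
            if line.strip().startswith("description "):
                has_description = True
        elif current is not None:
            # Any non-indented line (for example "!") ends the block.
            if not has_description:
                missing.append(current)
            current = None

    # The config may end while a block is still open.
    if current is not None and not has_description:
        missing.append(current)
    return missing


# Hypothetical sample config, for illustration only.
SAMPLE_CONFIG = """\
interface GigabitEthernet0/1
 description uplink to core-sw1, 1G
 bandwidth 1000000
!
interface GigabitEthernet0/2
 switchport access vlan 20
!
interface Vlan20
 ip address 10.0.20.1 255.255.255.0
"""

print(interfaces_missing_description(SAMPLE_CONFIG))
# prints ['GigabitEthernet0/2', 'Vlan20']
```

    Run against saved configurations, a check like this gives a list of ports to label before the next person has to guess what is plugged into them.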

    Draw diagrams! Nowhere I worked were there proper diagrams, let alone separate L2 and L3 diagrams. Instead, every company has its most "valuable" employee, valuable because he remembers how that damned link was connected. When you ask for a diagram, at best an ancient folio is pulled out of the safe, showing a cloud with two lines.

    Another example: an operator sent me diagrams for a link in which every device was drawn as a square, and the diagram was completely unreadable. The icons for routers, switches, satellites and so on were invented for a reason. Draw readable diagrams, gentlemen!

    An address plan is something that should exist from the moment the network is born. In reality, either it simply does not exist, or it is "bitten off" in pieces from different ranges, sometimes even with overlaps. Combined with static L3 routing, you are guaranteed a routing loop, and more than one.
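
    Overlaps in an address plan can be caught mechanically. Here is a small sketch using Python's standard ipaddress module; the plan itself (names and prefixes) is invented for the example, and in practice it would be loaded from wherever the plan is kept.

```python
import ipaddress
from itertools import combinations


def find_overlaps(plan):
    """plan maps a segment name to a CIDR string.

    Returns a list of (name_a, name_b) pairs whose prefixes overlap.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical address plan, for illustration only.
ADDRESS_PLAN = {
    "branch-a-users": "10.10.0.0/22",
    "branch-b-users": "10.10.2.0/24",   # sits inside branch-a-users
    "mgmt": "10.255.0.0/24",
}

print(find_overlaps(ADDRESS_PLAN))
# prints [('branch-a-users', 'branch-b-users')]
```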

    In addition to the address plan, it would be nice to keep a table of VLANs, and even better to deploy QinQ, especially if you are a provider selling communication services, since VLANs tend to run out and overlap.

    There is also such a thing as network design, and skimping on it leads to sad consequences. One well-known major operator provides an IPTV service in which multicast runs in a single VLAN and ADSL is used for the last mile. Incorrect VPI/VCI settings on a single subscriber's modem cause L2 loops: television breaks up, the Internet works poorly, and all subscribers suffer.

    Another very sore subject is VLAN 1. Why, oh why, is it still used for management, and sometimes even for data, while people are surprised by L2 loops? Why not pick another VLAN and make it native? It is especially "nice" to hunt for a loop when there is an unmanaged switch at the access layer.

    Next on the agenda - Monitoring
    For some reason, many small providers prefer to react to problems only when subscribers call. At larger ones, at best, all monitoring is based on ICMP alone.

    There are plenty of great free and open source monitoring systems: Zabbix, Cacti, The Dude and so on, and plenty of commercial ones as well. I probably do not need to explain how important monitoring is, or that besides ICMP there is also SNMP.

    Still, I did come across an employer who claimed to have excellent Zabbix monitoring that watched absolutely everything. In fact, the person who set it up had not bothered to read the documentation: the pollers were overloaded and data was lost as a result, very little was actually collected, and all host data had been entered by hand. The MySQL configuration was left at the defaults.

    Now that monitoring is in divine shape: LLD (low-level discovery) is configured and automatically adds and removes interfaces, tunnels, modules, fans, power supplies and so on. Graph labels are taken from the interface description and interface speeds from bandwidth, which is why this information has to be kept up to date. The only thing I find poorly done in Zabbix is housekeeping: large installations need database partitioning.

    Collect syslog from your equipment, if not from everything then at least from critical nodes. This can be done through Zabbix, but it seems to me that the solution turns out too "heavy" (there is a corresponding article on the Zabbix blog).
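
    Even a very light collector can sort messages by importance, because every syslog message starts with a priority value in angle brackets, equal to facility * 8 + severity. The sketch below only decodes that prefix; receiving the datagrams (normally UDP port 514) is left out, and the sample message is simplified, without the sequence number and timestamp a real device would add.

```python
import re

# Severity names in numeric order, 0 (most severe) to 7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_PATTERN = re.compile(r"<(\d{1,3})>")


def parse_pri(message):
    """Split a raw syslog message into (facility, severity_name, text).

    Returns None if the message does not start with a <PRI> prefix
    or the value is outside the valid range 0..191.
    """
    match = PRI_PATTERN.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:
        return None
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity], message[match.end():]


def is_critical(message):
    """True for severity 'err' (3) and anything more severe."""
    parsed = parse_pri(message)
    return parsed is not None and SEVERITIES.index(parsed[1]) <= 3


# 187 = 23 * 8 + 3: facility 23 (local7), severity 3 (err).
sample = "<187>%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down"
print(parse_pri(sample)[:2], is_critical(sample))
# prints (23, 'err') True
```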

    Deploy a few NetFlow collectors and put an analyzer on top, and you will always see who is overloading your links and with what. If you have some kind of billing system, you can use that.
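
    What an analyzer does with the collected flows is, at its core, plain aggregation. A toy sketch, assuming the collector has already decoded the flow records into (source, destination, bytes) tuples; the record format and the addresses are invented for the example.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Sum bytes per source address and return the n heaviest senders."""
    totals = Counter()
    for src, _dst, nbytes in flows:
        totals[src] += nbytes
    return totals.most_common(n)


# Hypothetical decoded flow records: (source, destination, bytes).
FLOWS = [
    ("10.0.20.15", "198.51.100.7", 700_000),
    ("10.0.20.15", "203.0.113.9", 500_000),
    ("10.0.30.4", "198.51.100.7", 300_000),
    ("10.0.20.99", "192.0.2.1", 50_000),
]

print(top_talkers(FLOWS, n=2))
# prints [('10.0.20.15', 1200000), ('10.0.30.4', 300000)]
```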

    No less important - Access Control, Security, and Remote Access
    As a rule, in response to the word AAA I have only heard "what is that?" or "Ah, I know, it's a battery!". So either one local account is shared by everyone, or everyone has their own. Either way, when an employee is dismissed, especially when he did not leave of his own accord, everyone starts running around wide-eyed, trying to find every device where he still has an account. I have an article on how to deploy TACACS+; believe me, it will make your life much easier. In addition, the logs will show you who did what on the equipment.

    Some use RADIUS, some invent something else. Personally, I like TACACS+, and almost all modern equipment supports it.

    Also, put ACLs on management access, at least on equipment with public ("white") addresses, because our friends from China never sleep.

    Do not expose management to the outside, that is, SSH, Telnet, RDP and so on. If you really must, at least restrict it by firewall to specific IP addresses. Personally, I have always deployed OpenVPN for access, generating an individual key for each employee.
    I do not have to look far for an example: at my current job there is a network built on Cisco equipment, with a pile of local accounts and different enable passwords which, of course, nobody remembers. Through inhuman willpower, TACACS+ was deployed on this network: two redundant, geographically separated, synchronized servers. Despite this, when new equipment is set up, people keep creating a million local accounts instead of one. When told that local accounts do not work while TACACS+ is reachable, they answer: "local accounts are reliable, what if something happens?".

    Also, when I launched TACACS+, I saw a pile of entries in the authentication logs: attempts to log in as root, guest and so on from our Chinese friends. When I asked about an ACL on management access, I was met with round eyes. By the way, this was a large bank ...

    I did not plan to include the application processing system in the list, but there was a case at a state telecommunications office with several thousand subscribers where service requests were taken by phone, printed out and passed to the contractor by, drum roll, fax! In general, though, things are fine in this area, except that the "usability" of every system I have had to work with was zero.

    Moving on - Backup system
    I probably do not need to explain that equipment can fail because of power surges, fires, floods, cleaners and other natural disasters. To replace it quickly, we need its configuration. I once watched a man "give birth" from memory to the configuration of a burned-out router, with steam coming out of his ears and other parts of his body. I am not urging you to deploy something large such as RANCID + SVN (as Mr. EvilMause suggested, RANCID can be made to take backups from equipment of any vendor). But anyone can write a simple script in bash or another language that walks through the routers, issues a command to copy the configuration file to FTP / TFTP and sorts the results into folders, and TACACS+ lets you create an account with limited rights for it.
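
    The "sort by folders" half of such a script might look like the sketch below. It assumes the configuration text has already been fetched from the device by whatever means you use; the function name, the directory layout and the ".cfg" extension are my own choices for illustration. A copy is saved only when it differs from the last one, so the folder doubles as a change history.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def store_config(backup_root, hostname, config_text, now=None):
    """Save config_text as <backup_root>/<hostname>/<timestamp>.cfg.

    Returns the path of the new file, or None if the configuration is
    identical to the most recent saved copy.
    """
    now = now or datetime.now()
    device_dir = Path(backup_root) / hostname
    device_dir.mkdir(parents=True, exist_ok=True)

    data = config_text.encode("utf-8")
    new_hash = hashlib.sha256(data).hexdigest()

    # Timestamped names sort chronologically, so the last one is the newest.
    previous = sorted(device_dir.glob("*.cfg"))
    if previous:
        old_hash = hashlib.sha256(previous[-1].read_bytes()).hexdigest()
        if old_hash == new_hash:
            return None

    target = device_dir / (now.strftime("%Y%m%d-%H%M%S") + ".cfg")
    target.write_bytes(data)
    return target


# Example call (hypothetical paths and host name):
#   store_config("/srv/net-backups", "edge-rtr-01", fetched_config_text)
```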

    By the way, Cisco equipment supports changing and copying the configuration over SNMP if there is an RW community, but the community must be bound to an ACL, otherwise ... This is especially true for SNMP v1 or v2c. In that same bank there were many places where an RW SNMP community was configured without an ACL, and the community string was no more complicated than "public". And one day, as the legends say, a certain "hacker" got into the bank's network and turned off IP routing. How on earth did he do that?!
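
    Unprotected RW communities are easy to find in saved configurations. A rough sketch, assuming Cisco IOS-style lines of the form "snmp-server community NAME RW [ACL]", where the ACL, if present, follows the access mode; other syntaxes would need their own parsing. The sample lines are invented.

```python
def rw_communities_without_acl(config_text):
    """Return names of RW SNMP communities that have no ACL after 'RW'."""
    findings = []
    for line in config_text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[:2] != ["snmp-server", "community"]:
            continue
        community = tokens[2]
        options = [t.upper() for t in tokens[3:]]
        if "RW" not in options:
            continue
        # Assumption: the ACL number or name is the token right after RW.
        acl_present = options.index("RW") < len(options) - 1
        if not acl_present:
            findings.append(community)
    return findings


# Hypothetical sample lines, for illustration only.
SNMP_SAMPLE = """\
snmp-server community public RW
snmp-server community noc-read RO 10
snmp-server community noc-write RW 10
"""

print(rw_communities_without_acl(SNMP_SAMPLE))
# prints ['public']
```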

    For servers you can use something like Bacula. If you are really lazy and the server has no RAID controller, at least use software RAID 1, for example Linux mdadm, rather than the built-in FakeRAID; that gives you at least some data redundancy. Best of all, since we live in the age of modern technology, use virtualization. As it happens, the server side at most operators is not that bad, but backups are a disaster. In banks, things are much better in this respect, since all work is tied to the automated systems.

    To summarize
    This conversation was only about IP networks. If you look towards SCPS satellite networks, they suffer from "interference" from an unknown source. In VSAT networks, time slots run out in a "miraculous" way. I personally witnessed how "beautifully" one person configured an iDirect hub. That model has two Protocol Processors that can balance load between themselves, so it needs dynamic routing, and the only protocol the system supports is RIPv2. Instead, this person linked the router and the Protocol Processors with static routes, pointing everything at one Protocol Processor, without even bothering to add a route to the second one with a higher metric. As a result, half of the modems did not work for an "inexplicable" reason, and if the load moved to the second Protocol Processor, nothing worked at all. Or take the designer who tried to assure everyone that his calculation was correct and that he could not understand why the antenna was pointing straight at a pole standing alone in a field (a real case, by the way). Fortunately, I have never touched PDH and SDH, although it seems that things are even more fun there. I cannot speak about telephony; I have never worked with it.

    It would seem that all of these are such obvious things that they are not even worth discussing, but ... we have what we have. Perhaps in other regions the situation is radically different, but that is hard to believe.

    It seems to me that all this happens for the following reasons:
    • Commercial companies are primarily focused on the needs of the business, and rightly so, but "top" managers are focused not on the good of the company but on their own. The result is unrealistic deadlines, unpaid overtime, and then a search for someone to blame. For government organizations, delete the words "business" and "commercial".
    • As a consequence of the first point, employees develop a certain attitude towards their work.
    • The desire to save on staff: instead of a team of competent specialists, hiring one specialist and ten recent school-leavers or humanities graduates whom nobody else would take.
    • The lack of competent line managers who would act as a layer between technical specialists and directors. As a rule, managers are either incompetent, so everything falls on the shoulders of their subordinates, or they stopped caring long ago.
    • The desire to save, or to "skim", money on purchases. This leads to buying equipment that cannot cope with its tasks, is incompatible, or is simply not needed.


    To those who had the patience to read this stream of consciousness to the end, it may seem that I am just whining, and I would agree with you, but after running into the same problems for the fifth time it is hard to restrain myself.

    Once again, everything stated in this article is purely my opinion and is based on personal experience. It is written in the hope that someone will avoid making such mistakes.

    Thanks.
