vakorovin May 31, 2013 at 15:08

A tale of how we made the map and billing friends

Hello again, Habr! Last year I wrote one article; after that I made several attempts at a new one, but nothing came of them. Finally a more or less coherent idea has taken shape, and I will try to turn it into a full article. It is about working with devices, or more precisely about how we linked the database of our equipment and its geographical location with our billing system. If you are interested, read on below the cut.

Unfortunately, I can't remember how it all began. When I joined the company, Quantum GIS (QGIS) was already in use and part of the network was already on the map. I had to get to know QGIS because of a task: map all the equipment in use at a provider we had acquired (the very provider I had been hired into). I must say that at the time I didn't understand what the maps could be used for, but my opinion soon changed to the opposite.

In principle, everything is simple. There are 4 layers:

Devices (switches, media converters, servers, etc.);
Logical connections (where the signal comes from);
Splice closures and splice cassettes;
Cable: fiber and copper (where copper is used for trunk lines).

There are 2 tables in PostgreSQL: the first stores points, the second stores lines.

Whether a point is a splice closure, a switch, or an important client is defined by the type field. At that time there were only a few types, and no one thought about possible problems in the future.

Some time passed, and the devices were put on the map. I can't give a chronology of events, because we returned to each task repeatedly, tightening the screws each time. In any case, the mapped devices did not yet give us much useful information. Devices kept wandering, and port numbers kept changing (I mean installers who plugged the switch in on whichever port was handy, the 25th or the 28th). Still, by then we could already see some problem areas: UTP used for trunk lines, and overgrown segments (in one case up to 40 other switches hung off a single switch, and when the power went out in that building, up to 30 buildings lost internet). Already at this stage, branch managers were tasked with eliminating these defects.

As for working with the map, all we could use was Zabbix, which simply pinged the devices and colored the unreachable ones red.
As I wrote in the previous article, we were working on a new front end with extended functionality for UTM. By splitting subscriber addresses on spaces, we obtained a list of settlements, streets and houses. After cleaning up this list, we had our own address database. We added a house_id field to the standard UTM table users, which let us get a correct subscriber address, without abbreviations or typos, down to the house. In the new front end, the subscriber address became a set of select boxes instead of a single text input. By the way, KLADR does not match what is written in subscribers' passports, so we did not use it.

With the list of addresses in use, we tidied up the addresses of the devices on the map. That gave us a number of subscribers and devices with matching house_id values. That was already something. Analyzing this data, we got a list of houses where subscribers are connected but not a single device is recorded.
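
The house_id comparison described above boils down to a set difference. Here is a minimal sketch in Python; the row layout and the field names other than house_id are assumptions made for illustration, not the actual UTM or map schema.

```python
def houses_without_devices(subscribers, devices):
    """Return house_id values that have subscribers but no recorded device."""
    subscriber_houses = {s["house_id"] for s in subscribers}
    device_houses = {d["house_id"] for d in devices}
    return sorted(subscriber_houses - device_houses)


# Hypothetical sample rows; real data would come from the billing and map databases.
subscribers = [
    {"login": "user1", "house_id": 101},
    {"login": "user2", "house_id": 102},
    {"login": "user3", "house_id": 102},
]
devices = [
    {"ip": "192.168.37.190", "house_id": 101},
]

print(houses_without_devices(subscribers, devices))
```

Every house in the result is a candidate for a site visit: there is either an unrecorded device there or a copper run to a neighboring house.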

I completely forgot to say that in our case internet access was provided over a VPN connection, so there were gaps in our knowledge of the equipment in use, especially when one provider takes over another and part of the acquired provider's staff is let go.

So, we have a list of houses with subscribers but without devices. Each case is unique: in one place a copper run goes to a neighboring house where an unmanaged switch is connected, and so on. All problem areas were identified and resolved. Newly found copper runs between houses were scheduled for replacement with fiber. I will now move on to another topic, but to summarize: we returned to these tasks repeatedly and in the end achieved 100% knowledge of the state of the network.

Next, I think it's worth talking about the logical side of the process. A separate post about configuring switches would probably be worthwhile, but I can't skip some aspects of that topic here, otherwise I won't be able to cover this one.

Since I mentioned that we were able to identify overgrown segments, you can guess that we had already defined some standards. Yes, we had. I will try to list them briefly; if something is unclear, ask in the comments and I will gladly answer.

The equipment lives in the networks 192.168.x.0/24;
Each L3 switch 192.168.x.1 defines as many segments as it has gigabit ports (no more than 24). For example, a D-Link DGS 3627G gives 24 segments. Here x is, let's say, the network number (it grows with each new L3 switch);
Each segment consists of no more than 10 devices with no more than 240 subscriber ports in total;
Each segment begins with a switch whose IP address is formed as 192.168.x.y0, where y is the number of the L3 switch port to which the segment is connected. For example, for an L3 switch with the IP address 192.168.37.1, the segment on port 19 starts from the switch 192.168.37.190 (that is, the last octet of the first device's address is always a multiple of 10);
The remaining switches in the segment (at most 9 more managed switches out of the 10) receive the addresses 192.168.x.y1-192.168.x.y9;
The first switch in the segment determines the IP addresses of the subscribers connected in this segment: 10.x.y0.n, where n is the subscriber's sequence number, growing with the number of subscribers in the segment. We added one restriction: n cannot be less than 10 (.1 is the gateway, and we keep the rest in reserve, just in case). So subscribers connected to the switches 192.168.37.190-192.168.37.199 get the addresses 10.137.190.10-10.137.190.249;
The entire management network is moved to a separate VLAN, and a VLAN is also allocated to each subscriber segment (these are outside the scope of this article, so I will not dwell on them).
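
The addressing rules above are mechanical enough to sketch in code. The Python below is an illustration, not the script used in production, and the function and constant names are invented here. Note that the worked example in the list shows 10.137.190.x for x = 37, while the formula reads 10.x.y0.n; the sketch follows the formula literally, so the second octet would need adjusting if the real scheme offsets it.

```python
import ipaddress

MAX_L3_PORT = 24            # an L3 switch feeds at most 24 segments
DEVICES_PER_SEGMENT = 10    # managed switches per segment
FIRST_SUBSCRIBER = 10       # n below 10 is reserved (.1 is the gateway)
LAST_SUBSCRIBER = 249       # 10..249 gives the 240 subscribers per segment


def _check(x, y):
    if not 0 <= x <= 255:
        raise ValueError(f"network number out of range: {x}")
    if not 1 <= y <= MAX_L3_PORT:
        raise ValueError(f"L3 port out of range: {y}")


def segment_switch_ips(x, y):
    """Management addresses 192.168.x.y0 .. 192.168.x.y9 of one segment."""
    _check(x, y)
    return [ipaddress.IPv4Address(f"192.168.{x}.{y * 10 + i}")
            for i in range(DEVICES_PER_SEGMENT)]


def subscriber_ip(x, y, n):
    """Subscriber address 10.x.y0.n, following the formula literally."""
    _check(x, y)
    if not FIRST_SUBSCRIBER <= n <= LAST_SUBSCRIBER:
        raise ValueError(f"subscriber number out of range: {n}")
    return ipaddress.IPv4Address(f"10.{x}.{y * 10}.{n}")


switches = segment_switch_ips(37, 19)
print(switches[0], switches[-1])
print(subscriber_ip(37, 19, 10))
```

Rejecting out-of-range values up front is the point of the exercise: an address that violates the scheme should never reach a switch config.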

Thus each segment can theoretically hold up to 240 subscribers; in practice we never see such numbers (the maximum hovers around 50-60 subscribers).

Today the network meets the requirements listed above. Yes, these requirements are not without drawbacks, but we get a fully managed and structured network. With the rollout of GPON some adjustments are being made, but I will not dwell on them.

Back when these rules were still taking shape, addresses were assigned to switches at whim. Early on there was a case where one administrator configured a switch at one site and I configured another elsewhere, and both ended up with the same IP address. Something had to be done about that.

Beyond the split into segments, we also had to define a uniform model of how the hardware operates across the whole network. I clearly remember how I was taught to configure switches. There were several text files with example configs for different devices; each device model had several configs, and essentially identical hardware (say, DES 3028 and 3200-28) was configured differently. One config has a time setting, another doesn't; both have an ACL, but the ACLs differ.

In 2011 there were severe thunderstorms in the Tver Region. Some switch ports burned out all over the network, and 30-40 percent of the network went down. It went down because, even though the switches' logic was intact, the faulty ports did their work: 100% CPU load, and the switch no longer responds to us. We had 2 weeks of sleepless nights (no exaggeration). The installers could not haul switches to the office fast enough, so we had to wait for the moments when a switch came back to life on site for a short while, log in and paste the storm control settings straight from the clipboard. This and other incidents did their job: further standards were formed. Scripts pushed the required configs across the whole network, and the problems disappeared. Also, last year we installed UPSes on all transit switches; the difference before and after is beyond words. Zabbix started calling several times less often.

That was sorted out. The next logical step was to collect the MAC addresses on the switch ports. Knowing the subscribers' MAC addresses, we could determine who was plugged into which port. The screw-ups surfaced again. It turned out there would be a managed 24-port switch with 7 subscribers: two on port 1, three on port 7, and a few more scattered across other ports. Yes, five-port hubs again. The installers had put them in the cabinet so as not to pull UTP for a second subscriber on the floor. We eliminated them, put things in order and re-bound the subscribers.

We watched for a while and were about to move on. Bam! Back again. It turned out that in one of the cities the installers on piecework pay, the ones who only do new connections, saved time when connecting a new subscriber by pulling an already working subscriber out of a port (you can see that it works) and plugging the new one into it, leaving the working one wherever. They bore no responsibility for this. Then another installer gets a ticket: the subscriber has lost internet. The reason, I think, is clear. The branch head and his underling got such a dressing-down that some of these entrepreneurs quit. A good lesson for the rest. I saved a screenshot somewhere (too lazy to look for it): over a few days, about 7 subscribers passed through a single port (that is how a new building was being connected).

Looking ahead, I'll say that by then we were already testing a system in which, when you chose a subscriber's address, only certain switches were offered. Once a switch was chosen, its ports were offered with marks (free / burned out / busy / hub). Any port can be selected, but there will be a warning. As soon as a port is selected, and if it is not marked in the system as burned out, a command to enable the port is sent to the switch. There are many details that I won't go into. In short, with the founders' permission, we switched this system on in that city.

The installers ran themselves ragged checking every subscriber who had lost internet. Yes, it was, so to speak, rough on the subscribers, and I am usually a fierce opponent of experiments on subscribers, who for their part meet their monthly payment obligations, but it was all done for their benefit. Within a few days we had a strict binding of subscribers to ports and had forced the installers to work by the rules. To sum up the topic of port bindings: IMPB (IP-MAC-Port Binding) bindings are now created across the whole network, and all unused ports are turned off. For each subscriber, we can show the history of port migrations.
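
The port-selection rule (any port may be chosen, a marked port triggers a warning, and the enable command is sent unless the port is burned out) can be sketched as follows. This is an illustration under assumptions: the class and function names are invented here, and sending the actual command to the switch is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class PortState(Enum):
    FREE = "free"
    BUSY = "busy"
    HUB = "hub"
    BURNED = "burned out"


@dataclass
class Selection:
    warning: Optional[str]   # shown to the operator; None if the port is free
    send_enable: bool        # whether an "enable port" command goes to the switch


def select_port(ports: Dict[int, PortState], port: int) -> Selection:
    """Apply the selection rule to one port of the chosen switch."""
    state = ports[port]
    if state is PortState.FREE:
        warning = None
    else:
        warning = f"port {port} is marked as {state.value}"
    # A burned-out port is never enabled, whatever the operator picks.
    return Selection(warning=warning, send_enable=state is not PortState.BURNED)


# Hypothetical port marks for one switch.
ports = {1: PortState.FREE, 2: PortState.BURNED, 3: PortState.BUSY, 4: PortState.HUB}
for number in ports:
    print(number, select_port(ports, number))
```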

Before moving on, I want to go back over what has been said, to note what has been achieved and identify what remains to be solved.
We have:

A device table;
3 tables with settlements (including regions and districts), streets and houses (or other objects);
A subscriber table.

From the rules above it follows that the IP address of the hardware determines the IP address of the subscriber. Consequently, in the subscriber table we store the device as an IP address. But the IP address is only one of the device parameters stored in the devices table. The primary identifier there has always been gid. But gid can change: a device can be removed and a new one installed in its place with the same IP address. By the way, this is the task I'm working on right now.

We mark ports as burned out, and they will never be enabled again, or rather not until the device is sent for repair and comes back marked "restored". But when sending a device for repair, people can (or rather could) remove it from the map, and all the information would be lost. As of today, we don't know which field is the most important one. Unfortunately, not every device has a serial number (I believe some devices even came back from repair with the SN erased, and I don't mean the sticker, of course), but every device we care about has a MAC address. So, using PostgreSQL triggers, I log all events with all the fields (gid, ip, sn and mac). Burned-port marks are stored in a table keyed to the MAC address.
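
To show why keying the marks to the MAC address helps, here is a small in-memory sketch in Python. It is not the PostgreSQL table or triggers mentioned above; the function names and the sample MAC addresses are made up for illustration.

```python
burned_ports = {}  # normalized MAC -> set of burned-out port numbers


def normalize_mac(mac):
    """Bring a MAC address to one spelling so lookups are consistent."""
    return mac.strip().lower().replace("-", ":")


def mark_burned(mac, port):
    burned_ports.setdefault(normalize_mac(mac), set()).add(port)


def mark_restored(mac):
    """The device came back from repair marked as restored."""
    burned_ports.pop(normalize_mac(mac), None)


def can_enable(mac, port):
    return port not in burned_ports.get(normalize_mac(mac), set())


mark_burned("00-1E-58-AA-BB-CC", 5)
print(can_enable("00:1e:58:aa:bb:cc", 5))   # same device, burned port
print(can_enable("00:1e:58:11:22:33", 5))   # replacement device, same port number
```

Because the key is the MAC rather than gid or the IP address, a replacement switch installed under the same IP does not inherit the old marks, and a repaired switch keeps its history wherever it is installed next.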

Now about the map. Initially, anyone could put a point in QGIS. Today this operation is blocked by database triggers. All work with hardware has moved to the web interface. There is the concept of a "warehouse" (there are several, one per branch office); adding devices is allowed only to certain people, who must specify the MAC address, serial number and model of the device. Every little change, as I said, is written to the log. Moreover, Zabbix regularly polls all parameters from the hardware, so an unauthorized switch replacement means that subscribers will not be able to work and the culprit will get a reprimand. In the comments to the previous article, as far as I remember, someone said that such a system was impossible to build. Maybe. We tightened the screws, punished the culprits, and that was that.

Furthermore, the device's IP address is something of a magic variable. It is tracked in several systems, including when the configuration file is generated. Yes, we handed that work over to a script. The script generates a config from the IP address; for now the config still has to be uploaded by hand, but in the future it will be uploaded automatically, so that people don't see more than they need to. So, if the device has not first been moved on the map to the right point (its future deployment location), the config will not be generated. And that brings me to the next point.

The thing is that initially all these lines, polygons and points on the map were not connected to each other in any way. There was an OSM base layer and the 4 layers I described above, and that was all. After sketching a couple of triggers on changes to point geometry (read: devices) and enabling snapping when drawing, we were able to attach the connection lines to the devices. Now, when a device is moved on the map, the point where the link touches the device moves too. These links are tied to equipment (just as fiber is tied to splice closures) in more than one way. First, geometrically: the start or end point of the line must coincide with the coordinates of the device point. Second, there are the fields dstart, dstartport, dend and dendport, which define from which device and port to which device and port the link runs.
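
The two kinds of binding can be cross-checked against each other. The Python sketch below assumes that the first vertex of the line belongs to the dstart device and the last vertex to the dend device; the data layout (points, xy) is hypothetical, and in the real system this logic lives in database triggers rather than in Python.

```python
import math


def link_problems(link, devices, tolerance=1e-9):
    """List the ends of a link whose geometry disagrees with dstart/dend."""
    problems = []
    ends = (("dstart", link["points"][0]), ("dend", link["points"][-1]))
    for field, point in ends:
        device = devices.get(link[field])
        if device is None:
            problems.append(f"{field}: unknown device {link[field]}")
        elif math.dist(point, device["xy"]) > tolerance:
            problems.append(f"{field}: line end is not on device {link[field]}")
    return problems


# Hypothetical devices keyed by gid, with map coordinates.
devices = {
    1: {"xy": (35.90, 56.85)},
    2: {"xy": (35.91, 56.86)},
}
link = {"dstart": 1, "dstartport": 25, "dend": 2, "dendport": 26,
        "points": [(35.90, 56.85), (35.905, 56.855), (35.91, 56.86)]}

print(link_problems(link, devices))   # consistent link
devices[2]["xy"] = (35.92, 56.87)     # device moved, line not updated
print(link_problems(link, devices))
```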

The picture before changing the coordinates of the device:

and after:

Now, by analyzing the MAC addresses on switch ports and knowing the MAC addresses of devices and subscribers, we can see which information in these links is unreliable. Thanks to this system you can see at a glance where traffic flows between switches, which ports are used for trunks, which VLANs are forwarded, and so on. All this information is also used when generating the device config. Now, to bind a second subscriber to a port of a managed switch, you must put on the map the device through which the subscriber will be connected and draw a link from the managed switch to that dumb box. What can you do, such situations happen: there is a 25th subscriber and only 24 ports, but we need to know about it. Otherwise the subscriber cannot be bound to the port.
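
One way such a reliability check could look: a link claims that a given port of one device leads to another device, so the second device's MAC address should show up among the addresses learned on that port. The sketch below is an assumption about the approach, not the actual implementation; the structure of the MAC table and all sample values are invented.

```python
def unreliable_links(links, mac_table, device_macs):
    """Return ids of links whose far-end device is not seen on the claimed port.

    mac_table maps (device gid, port) to the set of MAC addresses learned there.
    device_macs maps device gid to that device's own MAC address.
    """
    suspicious = []
    for link in links:
        seen = mac_table.get((link["dstart"], link["dstartport"]), set())
        if device_macs[link["dend"]] not in seen:
            suspicious.append(link["id"])
    return suspicious


# Hypothetical data: device 1 really sees device 2 on port 25, but not device 3 on port 26.
device_macs = {1: "00:1e:58:00:00:01", 2: "00:1e:58:00:00:02", 3: "00:1e:58:00:00:03"}
mac_table = {
    (1, 25): {"00:1e:58:00:00:02", "d4:ca:6d:12:34:56"},
    (1, 26): {"d4:ca:6d:65:43:21"},
}
links = [
    {"id": 10, "dstart": 1, "dstartport": 25, "dend": 2, "dendport": 25},
    {"id": 11, "dstart": 1, "dstartport": 26, "dend": 3, "dendport": 25},
]

print(unreliable_links(links, mac_table, device_macs))
```

A link flagged this way is only a suspect: a switch that has been silent for a while may simply have aged out of the MAC table.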

Strictly speaking, this can't be called the end of the article, but what can you do: I have to go pick up my daughter (it's a short day at the kindergarten today), and if I save this as a draft in mid-sentence, then the next time I come back I will delete what I've written.

I have already given several interim summaries, so I won't repeat the details. We managed to join 2 completely unrelated systems into one, and as a result the equipment is now configured automatically (we have three different types of configuration for each switch). Of course, that is not all:

The types of objects in use (splice closures, switches, etc.) were split into two levels (base classes and more detailed implementations of each class);
Using the devices on the map, we determined the house ids from OSM and extended our own house table with osm_id and osm_geometry (so we now know about edits to the houses we care about);
We mapped some of the houses that were missing from the OSM project;
We added the ability for one switch to take part in several segments (for when 2 providers share the same switch);
We moved subscribers from VPN to IPoE (with mandatory MAC address binding).

and many, many other things.

I realize that the article, like last time, turned out somewhat vague, but I can't clearly picture every process that goes on inside the company. If any topic needs a closer look, get in touch and I will answer.
