Everyday technical support: stories about what happens when you can't get through to the user

    Among other channels, we provide support over WhatsApp. It is evening, nothing hints at trouble, when suddenly a video call window opens. Close-up: telecom equipment installed at the customer's site ... And it is burning. Literally. You can see a small flame, apparently the insulation of the wires near the power supply. A man asks what to do. I shout:
    - Put it out!
    He:
    - Can I?
    - You can!
    And only then does he put it out.



    It turned out that not every piece of equipment can be extinguished by conventional means: some can answer with a shock of a few tens of thousands of volts. Or extinguishing it may disrupt the operation of important equipment. So he saw the fire, called support and, while the call was connecting, found and readied a fire extinguisher.

    Anyway, hello, Habr! I am from the remote technical support team, and we often talk to users all over the country and abroad. And they do rather strange things. Below are the stories.

    What we do and what it is


    CROC can take over support for offices, production sites and individual services. We have been doing this for many, many years. There is a call center team that follows standard scripts and helps in typical situations; a second line (me and my colleagues) that handles difficult cases, when you need to go down to the level of the network, the server or the application software configuration; and field engineers who travel to sites and replace hardware. Plus a reboot team in each city, but more on that later. There is a lot of romance in the work, because we often work under very tight SLAs for banks and retailers and support transport infrastructure facilities. For obvious reasons I don't name the customers, and our security people also changed a few unimportant details so that nobody can be clearly recognized.

    Heat


    At peak heat, we lose communication with one of the local servers. There are many such servers at the facilities, they are mounted quite compactly in technical rooms, and cooling is a problem everywhere; on top of that, external forced cooling is often used. Well, that is, a powerful fan aimed directly at the rack. Colleagues call it by the buzzword "freecooling", but it is a fan aimed at the rack.

    But this does not happen on every hot day, only on about every second one. We start investigating, sometimes like in a detective story, and it turns out that two people work in that room. One specialist knows what a rack is, or at least roughly guesses at the mysterious connection between the blinking lights and the fan. The second specialist is an old lady. She does not know. And when the heat reaches its maximum, the old lady hits her own thermal threshold and simply turns the fan toward herself. Because her own little fan is not as powerful.

    The logical consequence: the old lady cools down, the rack overheats. Then, at the temperature threshold, a normal thermal shutdown kicks in. And we get another ticket.

    The case is not uncommon, and we are used to it. We write instructions and train the customer's key people, and they are supposed to train the line staff. But the right thing does not always happen. In another similar room, the rack went down at night for six to eight minutes. Then we found out why: the new watchman had not been warned, so he unplugged the rack from the outlet, plugged in the kettle, and then put everything back as it was.

    Some setups are just strange. One hapless electrician wired the air conditioner's power through the light switch in the technical room. While someone is in there, everything works. People leave, and the rack shuts down. As a result, there is now a sign: "Do not turn off the light!!! I will rip your hands off!!!" It seems the electrician's hands have already been ripped off, so he cannot redo the wiring properly, and the problem has to be solved with this kludge.

    Toilet Permission


    We send a field engineer to service one of the nodes of a large network. The engineer, a young woman, heads to the site. I must say that it is a very peculiar building with high ceilings, built back when the USSR was being born. After several renovations, a space appeared above the stalls of the men's toilet where equipment can be placed. A common situation in the country, by the way: there is not enough room for hardware, so they make a "false ceiling". For some reason, it is usually there. I myself have connected switches a couple of times while standing on a toilet.

    She comes to the head of the facility and asks for permission to visit the men's toilet. At first, people spend a long time not understanding why she needs it. Then the bureaucratic machine turns on: the case is unfamiliar, and nobody knows what to do. In the end, she had a hard time getting everything done properly. For the guys, the toilet was simply closed officially for the duration of the work, and they were allowed to do anything inside.

    In retail chains, for some reason, equipment is often mounted next to water pipes or fans. In a couple of server rooms we watched water flowing right in the room. The latest case we actually saw on the monitoring cameras: it starts to rain. There is a rack of equipment (powered on, naturally), three basins next to it, and water drips steadily and monotonously from the ceiling. Everything turned out fine, and it seems this situation bothered only us. At the customer's site, only our engineers were worried.

    Another time, a pipe burst above a server. Right on video, the engineer takes the switch off its mount and turns it over, and a glass of water pours out of it. Characteristically, the switch keeps working. We took it to our lab and gave the customer a new one in exchange.

    Once, telecom equipment survived the activation of a powder fire suppression system in one of the customer's offices. They simply shook out all the powder (which was quite difficult, the unit had to be disassembled), and the hardware itself still works.

    Drills


    An audit of network equipment at a secure facility. The technical manager is standing before the commission. He has presented his case. At the end, he complains:
    - Our power supply from the city is bad, the voltage is constantly off. Look, if you take a plug and put it into this outlet, it usually goes badly. It knocks out the rack.

    And he plugs it in to demonstrate.

    Not only was the rack knocked out: the gateway went down too, and then the server. The hard drive burned out on the server that ran the applications for managing the facility. Everything came to a dead stop.

    The commission was rescheduled for the next day. And we had to source new equipment overnight and get everything back in place.

    In a similar case (only there it was a real power failure, not a drill like this), the facility was serviced by a large domestic provider. Very large and very domestic. We open a ticket saying that their equipment has burned out. They have an SLA of eight hours. Their support answers:
    - Well, yes, we know the hardware there broke. Can't you see we are at lunch? The installer will arrive tomorrow or the day after.

    It turned out that they have an SLA, but no penalty for violating it.

    The second case with drills went like this. A bank. Two in the morning, a ticket for a critical piece of hardware. Four hours to replace it. Shouting "Colleagues, all is lost!" (only in a single word), we get through to the Americans, they tell us where in Moscow the hardware can be picked up, we go there and collect it, and meanwhile a colleague is crawling on his knees in front of the logistics people. We make it. An hour and a half later we deliver it. They don't even let us into the building:
    - Thank you, but we don't need it anymore.
    - Guys! What was that?
    - A drill!

    Homeless SMS


    We support a foreign mobile operator. One of the services we monitor converts SMS messages along the lines of "The subscriber tried to call you but has no money" into a missed call. That is, instead of the message, a missed call shows up, but the phone does not ring. The operator, by the way, figured that this makes a callback much more likely.

    One fine day, all transactions disappear from the chart. There are simply no no-money calls at all. We start investigating but cannot find any leads. Only an hour later does it dawn on us that there are no calls at all in the country.

    And then they start at night. It is the Muslim holiday of Ramadan, and the call pattern is shifted. We see the same thing at New Year, when there are almost no calls on the morning of January 1, but there it happened in spring.

    Also, with foreign customers you always have to check where exactly their engineers are connecting. One Swedish vendor installs personnel management systems. In Russia there are two installations. At one, they ask for an upgrade to the latest version because they need some new feature. The other has been running stably for almost half a year, with no complaints. The Swedes connect, silently update the second customer, report the update to the first, and close the case.

    We are getting ready to apologize and compensate (because the second customer's system was down for 20 minutes, and a new maintenance window now has to be agreed with the first), when it suddenly turns out that:

    1. The first customer is satisfied and confirms the ticket.
    2. The second did not notice any downtime.

    We didn’t tell anyone then, but it was very strange.

    Shooting yourself in the foot


    When a customer we support is hosted in the cloud and asks for direct access to the machine instead of describing to us what needs to be done, we place bets on how quickly they will shoot themselves in the foot. This is not the first case, or even the hundredth. Customer admins regularly lose remote access to the machine for all sorts of reasons. Here is a fresh one: they set up new authentication there, and it promptly kicked out the current users. And to pass that authentication and get remote access working again, you first have to get inside somehow and configure everything. As the saying goes, configuring a firewall over remote access is a sign of a long journey ahead.

    For such cases, we hire a reboot team. That is, a local admin who can reboot the server or play the part of a remote-controlled robot over WhatsApp. That way, when you misconfigure something in Khabarovsk, you don't have to fly out to Khabarovsk overnight.

    For new network hardware with sane configs, one large vendor has a standard command for rolling back to the previous config. You start a timer for half an hour. If you do not cancel the task within that half hour, the device reboots and restores the previous version. If everything is configured correctly, you check (twice) and cancel the task, but only once you are sure everything works.
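
    The timed rollback described above is easy to sketch. Below is a minimal illustration in Python, not the vendor's actual mechanism: the class name `RollbackGuard`, the dict standing in for a device and the key `remote_access` are all made up for the example, and "rollback" here just restores a saved copy of the config instead of rebooting anything.

```python
import threading


class RollbackGuard:
    """Apply a config change that reverts itself unless confirmed in time.

    Hypothetical illustration: `device` is a plain dict standing in for a
    real network device, and "rollback" just restores the saved dict.
    """

    def __init__(self, device, timeout_s):
        self.device = device
        self.timeout_s = timeout_s
        self.rolled_back = threading.Event()  # set once a rollback has happened
        self._lock = threading.Lock()
        self._finished = False                # True after confirm or rollback
        self._saved = None
        self._timer = None

    def apply(self, new_config):
        """Save the current config, apply the new one, start the timer."""
        self._saved = dict(self.device["config"])
        self.device["config"] = dict(new_config)
        self._timer = threading.Timer(self.timeout_s, self._rollback)
        self._timer.daemon = True
        self._timer.start()

    def confirm(self):
        """Cancel the pending rollback. Returns False if it already fired."""
        with self._lock:
            if self._finished:
                return False
            self._finished = True
        self._timer.cancel()
        return True

    def _rollback(self):
        """Timer callback: restore the saved config unless already confirmed."""
        with self._lock:
            if self._finished:
                return
            self._finished = True
            self.device["config"] = self._saved
        self.rolled_back.set()


if __name__ == "__main__":
    device = {"config": {"remote_access": "on"}}
    guard = RollbackGuard(device, timeout_s=0.5)  # half an hour in real life
    guard.apply({"remote_access": "off"})         # oops, we locked ourselves out
    guard.rolled_back.wait(5.0)                   # nobody was able to confirm
    print(device["config"])                       # the old config is back
```

    The point of the design is that the safe outcome needs no action: if the change cuts off your own access, you simply cannot call `confirm()`, and the old config comes back by itself.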

    Sometimes you have to travel to install equipment. We have a guy nicknamed the 13th. Because when a business trip to Surgut came up, he was already on his way to the airport with the hardware packed when he was told that the same piece of hardware was needed far more by the same customer in Krasnodar. And his ticket was changed. The second time he flew out to do a replacement, everything came back up while he was in the air, and he sent photos of his feet on the beach to the work chat.

    But the best case was this one. Before going home, the customer went and deleted the link between two servers working as a pair. We are sitting there, and a ticket arrives: "Nothing works." We connect and take a look:
    - What did you do?
    - Before leaving home, I deleted the connection between the servers.
    - What for?
    - Why, was I not supposed to?

    Do you have binoculars?


    When we were testing a system for recognizing people climbing over a fence for a transport company (recognition on video surveillance), we went out one morning to mark the spots for installing cameras. It was important to find the fare dodgers (the "hares") without scaring them off, so that cameras could later be placed where people climb most often. We took binoculars, but they were not needed, because the "hares" were neither shy nor afraid.

    Last month, a photo studio opened in the building opposite our office. With large windows and natural light. Nude or very nominally dressed models are regularly photographed there, but from that distance their faces cannot be made out at all. So the binoculars came into demand. On the day of a particularly hot shoot, several tickets requesting them arrived at once from colleagues in the office.

    Under control


    I came to a customer that has many offices across the Russian Federation. There is a main server in Moscow and many servers connected to it from the branch offices. I am digging around in one of the regional boxes. A local manager comes up to me and says:
    - You've been poking around in there for too long.
    - Well, that's the job.
    - Do you understand that this is under the control of the very ...
    - The president of the company?
    - No, the very ...
    - Specifically this server?
    - Yes.
    I laughed. And he goes:
    - You shouldn't be laughing.
    And left.
    And I thought: what a dangerous job we have. Maybe he really does keep it under control. Maybe I could get punched in the face for such insolence. Personally by ...

    Wi-Fi


    A customer keeps opening incidents about Wi-Fi problems, non-stop. I should say that this is a big hangar with a warehouse inside, and because of the shelving full of metal (blanks for the plant), the signal did not always reach the center. We did a quick radio survey for them and recommended what to install and where. They reported that they had done everything accordingly. And now it seems that the central access point will not connect and keeps disappearing. We sent a field engineer. It turned out that at the moment when the placement of the access points was calculated, a crane was standing in the center of the hangar. The customer's installers took a liking to it and mounted the access point right on it. The crane travels around the warehouse, and when it moves to one side, there is no network on the other. For a while they tried to work out on their own why the network kept dropping and coming back, and then they came to us.

    Best case


    A complicated ticket; we have been working through it with the user on the phone for almost half an hour. I am already cursing everything, because this is exactly the case where a person cannot clearly say what he did. And he does not report everything he sees on the screen. And he does not say everything he is doing right now. I can already tell that having to do everything slowly and deliberately infuriates him no less than me. But for a different reason. And then, during yet another explanation that I cannot help him unless he reads out everything he sees on the screen, he suddenly says:
    - Sorry, we have a fire here.

    And hangs up. In the ticket I wrote "the building burned down together with the equipment" and went to check in person - you never know ...



