Hetzner may unexpectedly shut down your server

    We are a small group of web developers. We build websites to order and host them either on our own hardware or with various providers around the world. We have a small technical support department, and we do our best to respond to problems as they arise. This article is written for those who run their own servers at Hetzner, so that they are prepared for certain peculiarities of its support.


    Cases like the one described below do happen. We were not ready for it, either morally or technically. We were confronted with the fact that a server can be disconnected without any quick way to understand or eliminate the reason for the shutdown.

    10:36 GMT+2



    The site stopped responding to ping. Our support team immediately tried to work out why.

    The customer who owns this site is in the US (where it was night and he was asleep), the support team is in Ukraine, and Hetzner is in Germany. The site went down at the most convenient time possible: night in the USA and normal working hours in Ukraine and Germany, which in theory should have given us the chance to restore the site before he woke up. If only...

    When we tried to enter Rescue Mode, we came across a strange message:

    "The Ip is locked" with a link to: wiki.hetzner.de/index.php/Leitfaden_bei_Serversperrung/en

    We read on... Advice like "First of all, please examine the log files of the server" left me somewhat puzzled.

    Q: How are we supposed to check the log files if access is disabled?

    It gets even more interesting:

    "Before the server can go back online, the problem has to be solved. We require a signed statement
    from you via email or fax about the cause of the problem, explaining how you have solved the problem
    and what you have done to avoid the same problem in future."


    Q: There is no access and the server is switched off. It will be switched on only once we fix the problem, but to fix the problem we need to get in?..

    It turned out that, according to Hetzner's rules, this type of problem has to be solved through a KVM console. We requested a KVM, but it was still not clear what to fix. The admin panel gives no specifics, only a link to a general document about possible problems.

    Well, logically, if nothing is clear, you ask support:

    Question to Hetzner 11:36> I can not ping and login to the server
    Reply from Hetzner 12:25> please check your mails, you should have received a mail why the IP is locked.


    Support does respond, but the situation reminds me of the joke about the programmer and the hot-air balloon, where the answer is perfectly correct and absolutely useless. Meanwhile time passes: each question and answer costs about an hour.

    I urgently call our client in the USA and ask him to check and forward all mail from Hetzner. Finally, I get a more or less clear explanation:

    Dear sir or madam
    We have noticed that you have been using other IPs from the same sub net in
    addition to the main IP mentioned in the above subject line.
    As this is not permitted, we regret to inform you that your server has been
    deactivated.
    Guidelines regarding further course of action may be found at
    http://wiki.hetzner.de/index.php/Leitfaden_bei_Serversperrung/en.
    Yours faithfully
    Your hetzner support team
    09:29:55.027863 a8:be:dd:56:e7:15 > cf:40:04:22:32:1f, ethertype IPv4 (0x0800),
    length 66: 188.40.25.34.42709 > 5.9.xx.xx.80: Flags [.], ack 16154, win 661,
    options [nop,nop,TS val 1003012 ecr 2687744519], length 0
    


    13:21



    The KVM (LARA) was provided after another hour. Three hours into the downtime we can finally start doing something, and there is hope of dealing with the problem. We get access to the server through LARA.

    With no network, the rootkit check fails. We check what we can, but even tcpdump fails to start: the port is down and no packets can be sent.

    Our system architecture is as follows:



    The border host runs the virtual machines under KVM, with kernel 3.5.2 and regular GLSA updates; the only open port is SSH (I lie: there was also nrpe, but we checked and found no indication that nagios-nrpe had been compromised).

    Knowing that the web part of the project runs a "hodgepodge" of different technologies, including PHP code, we had physically separated the virtual machines from the outside world, so they cannot possibly take over someone else's IP address.

    Examination of the Gentoo base host system showed no changes. Everything looks fine in messages, dmesg, and the output of last.

    In the end, after a series of checks, we wrote to their support that we could not find anything ourselves and needed their help to identify the problem. We also asked them to confirm that the suspicious traffic really came from our VLAN.

    15:40



    We got the answer:
    Hetzner> Please complete and sign the following statement and return it to us via fax or email:
    Hetzner> www.hetzner.de/pdf/en/Comment_Serversuspension.pdf

    Hmm... At this point I start to lose my temper. Four hours have passed, we have not advanced one iota, morning is approaching in the US, no adequate (in my opinion) help from Hetzner has materialized, and instead of helping they ask us to send a fax. Not a happy moment...

    Q: What are we supposed to sign as a "fix for the problem" if we could not find the problem and therefore fixed nothing (and said so)? I asked that question. The reply was that they are not allowed to switch the server on for any investigation until they receive a fax or scan with a signature.

    By this point we had already decided to restore from backups (the back-and-forth and the attempts to understand the problem had eaten a lot of time), so there was no rush. We calmly filled out the form and waited for results.

    After a while, we get the answer:

    > Dear Client,
    >
    > as requested we've let this issue checked by our network department and it seems
    > that you server answers on each requests even for another MAC's. So please check
    > your server again and solve this issue.


    OK... Since the server had been switched on for only about a minute, we had no time to look at anything.

    We then asked what could be done with a server that we had no network access to. We were advised to format the server and install a new system by mounting a remote ISO image.

    We tried to do this, but we were not technically ready: there was no small ISO at hand, and all the large ones were desktop images that tried to start X and were generally ill-suited to installation over KVM. The story ended at about 10 pm, when the KVM was disconnected in the middle of the installation (as a rule, Hetzner provides a free KVM for no more than 2 hours).

    A few days later





    On Monday morning, in a calm atmosphere, we requested the KVM again, got access, installed a small system on sda1 (splitting up the RAID) from an ISO image mounted through LARA, sent the scanned statement that the problem was resolved, and received a reply that the server had been activated. But for some reason it still did not respond to ping...

    After a second activation request the server finally responded to ping. We logged in over the network, and the first thing I did was take a complete backup of the old main system with tar cjvpf, download it for experiments, and deploy it locally.

    We could not find the problem locally either. I set up a separate machine as the default router, connected it to the machine running the copy, watched the traffic with tcpdump at both ends, and set up NAT from the gateway address for the network. No strange packets were found.
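
    A check like this can be partly automated. The following is a minimal sketch, not the procedure we actually used: it assumes the capture was printed with `tcpdump -e -n` (link-level headers, numeric addresses, one frame per line) and counts frames sent from our MAC address with a source IP other than the ones assigned to us, which is what Hetzner's email accused the server of. The function name and the addresses in the usage note are hypothetical.

```python
import re
from collections import Counter

# One line of `tcpdump -e -n` output for an IPv4 frame, for example:
# 09:29:55.027863 <src mac> > <dst mac>, ethertype IPv4 (0x0800), length 66: <ip>.<port> > <ip>.<port>: ...
FRAME_RE = re.compile(
    r"^\S+ (?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype IPv4 \(0x0800\), length \d+: "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?:\.\d+)? > "
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)(?:\.\d+)?:",
    re.IGNORECASE,
)


def unexpected_source_ips(lines, our_mac, allowed_ips):
    """Count frames sent from our_mac whose source IP is not in allowed_ips."""
    counts = Counter()
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not an IPv4 frame line in the assumed format
        if match["src_mac"].lower() != our_mac.lower():
            continue  # frame was sent by some other machine
        if match["src_ip"] not in allowed_ips:
            counts[match["src_ip"]] += 1
    return dict(counts)
```

    For example, `unexpected_source_ips(open("capture.txt"), "02:00:00:00:00:01", {"192.0.2.10"})` returns an empty dict if every frame from that MAC used the allowed address. An empty result only covers the captured traffic, so it supports the "nothing strange" conclusion without proving it.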

    We checked for rootkits and verified all packages and files: not a single modified MD5, no unexpected processes, and so on.
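
    For readers who want to repeat the file check on a deployed copy, here is a rough sketch of the idea. It assumes a known-good manifest in `md5sum` format (digest, whitespace, path) is available from a trusted source; the function names are made up for illustration, and this is not the tool we used.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified(manifest_lines, root="/"):
    """Compare files under root with a manifest in md5sum format.

    Returns (modified, missing): paths whose digest differs from the
    manifest, and paths that do not exist under root.
    """
    modified, missing = [], []
    for line in manifest_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts[0].lower(), parts[1].lstrip("*")
        target = Path(root) / name.lstrip("/")
        if not target.is_file():
            missing.append(name)
        elif md5_of(target) != expected:
            modified.append(name)
    return modified, missing
```

    Running it against the unpacked backup (passing the directory of the copy as `root`) rather than on the live system avoids trusting tools on a machine that may be compromised.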

    What was the problem? One can only guess. Perhaps there are duplicate MAC addresses on their network, perhaps something else... Who can say?

    In short, even if the problem was on our side (which I doubt), it is impossible to find out what it was. And they, for their part, are unlikely to admit a mistake or to assign a specialist to help.

    Conclusions (by the way, we already / still host many other clients at Hetzner).



    1) Always keep a small ISO image of your system at hand so that, at Hetzner's request, you can quickly reformat your server. It is desirable to keep the main partition separate from services and data.

    2) It is highly advisable to stream NetFlow data to somewhere outside Hetzner, so that accusations can be checked afterwards.

    3) Always be prepared for the server to disappear for good (which, incidentally, is the right strategy anyway).
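
    As an illustration of the second point, here is a minimal sketch of an off-site collector. It rests on several assumptions: the server exports NetFlow version 5 over UDP, the collector listens on port 2055 (a common convention, not something Hetzner prescribes), and flows are simply appended to a text file. The function names and the log format are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct("!HHIIIIBBH")                 # 24-byte NetFlow v5 header
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")   # 48-byte NetFlow v5 flow record


def parse_netflow_v5(datagram):
    """Return (src_ip, dst_ip, protocol, octets) tuples from one v5 datagram."""
    version, count = HEADER.unpack_from(datagram)[:2]
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version {version})")
    flows = []
    for index in range(count):
        fields = RECORD.unpack_from(datagram, HEADER.size + index * RECORD.size)
        # Field order: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets, ...
        src, dst, octets, protocol = fields[0], fields[1], fields[6], fields[13]
        flows.append((socket.inet_ntoa(src), socket.inet_ntoa(dst), protocol, octets))
    return flows


def collect(bind_addr=("0.0.0.0", 2055), log_path="flows.log"):
    """Append every received flow to a log file kept outside the provider."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "a") as log:
        sock.bind(bind_addr)
        while True:
            datagram, exporter = sock.recvfrom(65535)
            try:
                flows = parse_netflow_v5(datagram)
            except (ValueError, struct.error):
                continue  # ignore datagrams that are not valid v5
            for src, dst, protocol, octets in flows:
                log.write(f"{exporter[0]} {src} {dst} {protocol} {octets}\n")
            log.flush()
```

    With such a log kept on a machine elsewhere, a claim that the server used addresses other than its own could be compared against the recorded source addresses even while the server itself is locked.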

    PS Admins tell me this is normal: first cut off the problem, then figure out what it was. But if the first step happens promptly, the investigation should follow just as promptly.
