ikormachev June 19, 2013 at 19:27

Survival Instructions for a Full-time System Administrator

Alone in the field: you are the CIO!

From time to time small companies invite me to "sort things out" with their only system administrator. The complaints about their in-house specialist are almost always the same: tasks get done slowly, "nothing works" far too often, and there was even one case...
Moreover, the volume of complaints often has little to do with the administrator's technical competence. It stems mainly from a lack of the basic administrative skills that are essential when you work in a company where you are the only person who understands IT.

Having seen enough completely undeserved suffering among in-house system administrators, I decided to write this short guide. I hope it will help them avoid conflicts with management and get the phrase "stress resistance" out of sysadmin job postings.

So, let's start at the very beginning:

1. When you take the job, find out what results are expected of your work

Usually an employer who hires a system administrator expects that from now on everything will work "well". But what exactly the employer means by this is often unclear, and in most cases it ends with the system administrator turning from an IT specialist into a whipping boy every time something does not work as well as the boss wanted.
While you are still negotiating the job, agree with your employer on what they expect from your work: starting with things that concern you personally (your work schedule, availability on weekends and evenings, how your vacations will be covered) and ending with business questions (which IT services are critical, what maximum downtime is acceptable for each of them, what maximum data loss is acceptable, etc.). All these parameters should be measurable: work from 9:00 to 18:00 but reachable by phone until 22:00, an Internet outage must be fixed within 2 hours, and so on. Once you have this information, don't be too lazy to put it in writing; it will help you a great deal in the future. For now, once you start work:

2. Set up backups and agree on the backup scheme with management

Set up automatic backups and regularly check that they are being created correctly. Once again, agree on the backup scheme with management, and do it in writing: in the future this will help you answer questions like "why was the database backed up only once a day and not every hour?" If management wants a longer retention period or more frequent backups, state the investment in equipment this will require. And yes, do not store backups on the same disk arrays as the main data, because:

3. Assess risks and do not rely on the reliability of hardware, software and communication channels

It is precisely that expensive branded (and, moreover, only) server that will fail at the most inopportune moment, cost you a couple of days of sleep and leave you frantically trying to figure out what can be done. Carry out a small analysis of operational risks: what will you (the organization) do if a server, computer or printer breaks down, the Internet connection fails, a software error occurs, or there is a fire in the server room? How long will recovery take? Does this time meet the expectations of the business (see point 1)? What can be done to reduce downtime and other negative consequences? After that, once again document your findings and submit them to management for a decision:

4. Agree on operational risks and disaster recovery plans with management

Bring your findings from the previous point to management so that they either accept the operational risks as they are or allocate money to eliminate them or minimize their consequences. Agree not only on the operational risks but also on disaster recovery plans: in the event of an incident this will greatly simplify your life, because there will be no "unexpected situations" for you, and you won't have to wait until "Petr Petrovich is back in touch" to buy a replacement power supply for a failed server. Once the operational risks are agreed, you will most likely be given money for a small upgrade. And here is the most important thing:

5. Plan any changes to the IT infrastructure in advance and, most importantly, prepare both a change plan and a rollback plan in case something goes wrong

In my opinion, the phrase "it's just a five-minute job" in server administration is akin to "hey guys, watch this!" behind the wheel of a car: whoever starts with such a phrase will one day end up with a big fail. Whatever upgrades you are going to carry out, large or small, they must be planned, even if you and only you will carry them out. When planning, consider which services your changes will affect, what steps you need to take during the change, how to verify that the upgrade succeeded and, most importantly, what you will do if something goes wrong after all. Once everything is clear to you, go to management again and:

6. Agree with management on the timing of any work that causes or may cause service downtime, and always justify why the work is needed

The only thing worse than an admin who does harm on purpose is an admin who does harm with the best intentions. On the last day of the reporting period you decide to clean the dust out of the server; in the middle of a crucial project phase you decide to update the system on the servers; you migrate mail to the new mail system at the very moment the whole company is waiting for a very valuable and important letter. The list could go on, but the point is the same: always inform management of your plans so that they at least understand the need for the minor inconveniences that sometimes cannot be avoided. If you lack the experience to make certain changes, do not hesitate to seek professional advice:

7. Clearly define where you are and are not competent, and do not be afraid to discuss this with management

I used to think that a professional is someone who knows and can do everything. With age I came to understand that a professional is someone who knows the boundaries of their competence. There are professional network administrators, professional storage engineers, professional technical support specialists and so on, but there is no one who knows everything about everything. If the company you serve has a whole zoo of technologies, trying to service all of it alone (even at small volumes) will earn you great respect among your fellow professionals, but it is unlikely ever to be appreciated by your employer, who will nevertheless hold you fully accountable for all of it. Tell management which systems you know thoroughly and where your knowledge may fall short, and let them decide whether to leave things as they are, hire another specialist, or use expert technical support from an external IT company. In general, make it a rule to:

8. Always share responsibility for decisions, especially those not made by you alone

Your task is to buy equipment: you have chosen a model and requested an invoice, but at the last moment the purchasing department vetoes it because it's cheaper on Yandex.Market! OK, no problem, but in that case let the purchasing department be responsible for buying this equipment. I know a couple of companies where the procurement department has been waiting two years for delivery of laptops bought at the "lowest price". The same goes for all other decisions: it is best to confirm them with your manager, presenting possible alternatives in advance and getting confirmation that the chosen decision is the right one. And yes, just in case:

9. Agree on all decisions in writing

Get everyone used to the idea that all decisions are agreed in writing: a simple email will help you when the inquiry begins (not "if", but "when"). Every decision must be agreed this way, without exception, so that people do not get defensive when you suddenly, for no apparent reason, send a request for approval after all previous decisions were made verbally. It is very common for colleagues to ignore written requests for approval; in that case, agree on the decision with them verbally, and after the conversation send them a letter like "Dear Pal Palych, here is the list of decisions we agreed on: ...". If an inquiry does begin, Pal Palych will at least have to explain why he never responded to your letter. In general, written approval is one of the procedures you will need to introduce in order to:

10. Create clear and convenient rules for how everyone interacts with you, and do your best to keep users from worrying

In small organizations there is a very common way to solve computer problems: go to the system administrator, take him by the hand, lead him to your computer, show him the problem and don't let go until it is solved. Naturally, when users behave this way there can be no question of any thoughtful, systematic, focused work on routine server maintenance and configuration. But users usually arrive at this method through experience: if you just report a problem, nobody knows when the admin will get to it, you can't work until he does, and you can't afford to wait. All of this makes users anxious and ultimately drives them to behavior that seems strange at first glance. Don't make your users worry: tell them when you can help them and when the problem will be fixed, and when you have a lot of serious work planned for the coming days, email them in advance that on those days you will take requests only during certain hours (stating the reason, of course). Ask them to submit non-urgent tasks in writing and promise to handle them on time. And most importantly, keep your promises, and you will have your users' trust and affection.

Well, after all of the above, do not forget to do the following:

11. Check your backups regularly

You all know the old joke that there are two types of sysadmins: those who don't make backups yet and those who already do. I would amend it by adding a third type: those who also check the backups they make. Don't get me wrong, but in our business losing data is the height of unprofessionalism. No matter how busy you are at work, always take the time to check that your backups are being created, that they are created correctly, and that you can actually restore your data from them.
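The three checks in the last sentence (the backup exists, it was created correctly, the data can be restored from it) can be partly automated. Below is a minimal sketch in Python, assuming backups are stored as `.tar.gz` archives in a single directory; the function name `check_latest_backup`, the directory layout and the list of files that must be present are hypothetical and would need adapting to your own backup tool. Note that a successful extraction only proves the archive is readable; restoring a database dump into a test server is a stronger check.

```python
import tarfile
import tempfile
import time
import zlib
from pathlib import Path


def check_latest_backup(backup_dir: Path, max_age_hours: float,
                        must_contain: list[str]) -> list[str]:
    """Check the newest *.tar.gz in backup_dir and return a list of problems.

    An empty list means the latest backup exists, is fresh, is not empty,
    can be extracted and contains the expected files. The directory layout
    is a hypothetical example.
    """
    archives = sorted(backup_dir.glob("*.tar.gz"), key=lambda p: p.stat().st_mtime)
    if not archives:
        return ["no backups found"]
    latest = archives[-1]
    problems: list[str] = []

    # 1. Is the backup being created at all? A stale file means the job stopped.
    age_hours = (time.time() - latest.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{latest.name} is {age_hours:.1f} h old")

    # 2. Is it created correctly? A zero-byte archive is a common silent failure.
    if latest.stat().st_size == 0:
        problems.append(f"{latest.name} is empty")
        return problems

    # 3. Can the data be restored? Extract into a throwaway directory.
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with tarfile.open(latest, "r:gz") as tar:
                if hasattr(tarfile, "data_filter"):
                    # Safer extraction on Python versions that support filters.
                    tar.extractall(scratch, filter="data")
                else:
                    tar.extractall(scratch)
        except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
            problems.append(f"{latest.name} cannot be extracted: {exc}")
            return problems
        for name in must_contain:
            if not (Path(scratch) / name).exists():
                problems.append(f"{name} is missing from {latest.name}")
    return problems
```

Such a function could be run from a scheduled job that emails you the list of problems, so that a broken backup is noticed the next morning rather than on the day you need it.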

12. Repeat the cycle "gather requirements, analyze risks, coordinate, get approval" at least once a year

Small businesses change very quickly: yesterday the company issued 3 invoices a week, and today it issues 20 a minute. Yesterday a daily data backup was all anyone needed, and today losing even 5 minutes of data is already critical for the business. Your task is to always stay on top of every change in the business and get ahead of it.
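When the acceptable data loss shrinks from a day to 5 minutes, the agreed backup scheme has to be re-checked against it. A minimal sketch in Python of that re-check, assuming every backup is a full copy of constant size with no compression or deduplication; the class `BackupScheme` and the figures used (14 days of retention, 50 GB per backup) are hypothetical illustrations, not measurements.

```python
from dataclasses import dataclass


@dataclass
class BackupScheme:
    """Hypothetical summary of a backup scheme agreed with management."""
    interval_minutes: int    # how often a backup is taken
    retention_days: int      # storage depth: how long each backup is kept
    backup_size_gb: float    # size of one backup, assumed to be a full copy

    def worst_case_loss_minutes(self) -> int:
        # A failure just before the next run loses everything since the last one.
        return self.interval_minutes

    def storage_needed_gb(self) -> float:
        # Full copies only: no compression, deduplication or incrementals assumed.
        backups_kept = self.retention_days * 24 * 60 // self.interval_minutes
        return backups_kept * self.backup_size_gb

    def meets(self, max_loss_minutes: int) -> bool:
        # Does the scheme satisfy the currently agreed maximum data loss?
        return self.worst_case_loss_minutes() <= max_loss_minutes


current = BackupScheme(interval_minutes=24 * 60, retention_days=14, backup_size_gb=50)
print(current.meets(max_loss_minutes=24 * 60))  # True: yesterday's requirement
print(current.meets(max_loss_minutes=5))        # False: today's requirement

every_5_min = BackupScheme(interval_minutes=5, retention_days=14, backup_size_gb=50)
print(every_5_min.storage_needed_gb())          # 201600.0 GB of full copies
```

With these assumed figures the daily scheme needs 700 GB, while full copies every 5 minutes would need about 200 TB, which shows why a stricter requirement usually calls for a different technique (for example, database transaction log backups or replication) and a new budget conversation with management.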

As an afterword, I want to add that in any organization, building a department starts with its head. So if you are the only person in your company who understands IT, then, I assure you, you are already the IT director, regardless of what your job title says: you will be held accountable as a manager, not as an executor. I hope this guide will help you look at your daily work a little differently, make it easier to step into this new role, and make your work as calm and predictable as possible.

Ivan Kormachev
IT Department Company
www.depit.ru
