The evolution of our retail IT infrastructure: molting one stage at a time



    Starting a retail chain or a large online store without strong IT, at least for data analysis, is a very bad idea. But you can't roll out the full functionality right away either: the complexity will bury you alive.

    So here is how we "molted" step by step from simple systems to more complex ones. The general principle: first you do it cheap and dirty, just so that it works. Then you do it properly. Then you do it optimally, which is expensive. And all of it is paid for out of reinvested profit rather than as up-front capital expenditure.

    Before we had a store, our entire IT infrastructure was a cell phone, a personal laptop with Yandex.Maps for planning deliveries, and an Excel spreadsheet for bookkeeping. Our site with 20 games was hosted on Masterhost, and we gave no thought to database replication or anything of the sort: everything was static HTML.

    1C


    By about the second store there were so many of us that purely manual operations were no longer possible. Even then the large call center needed some kind of shared log (first on paper, then in Google Docs), retail required analyzing too much data, and there was a risk of abuse inside the company.

    So we discovered 1C. First the Trade configuration, then all its other varieties. At first glance it looked like a combine harvester: a lot of things we did not need, all at once. Guess what we did? That's right, we had a developer "tailor" it to us, that is, strip out all the functions that "slowed things down".

    The database quickly grew crutches for individual tasks. When there is no code base you have to maintain, it is simple: bang it out and push it to production. These days, introducing something like a gift certificate means working with lawyers and accounting and training the whole network across the country for six months. Back then it took two hours.

    Having hammered in a dozen crutches, we found that some particularly cunning employees had still learned to steal. We found and fired them, but the investigation showed that strict accounting actually needed the functionality we had thrown out as superfluous. Guess what we did next? That's right, we had the developer restore everything to the way it was out of the box.

    Then it got more fun. It turns out that the cumbersome, complex and non-obvious procedures that 1C ships with by default were not invented by sadists to annoy users. (I still miss version 7 a little: building queries in it was clear and straightforward, although its GUI was completely unobvious to, say, sales staff.) It turns out that all of this extra kit was needed.

    A year later we realized that efficiency drops if you do not analyze a huge amount of data about products, sales and much more. Our first mechanisms for planning logistics and store deliveries were empirical, almost done by feel. Then it turned out that the mathematical apparatus for this has existed for a long time; its power simply shows only at large numbers. We grew into those large numbers (there were more products than our purchasing staff could track by hand) and began to use automation in earnest, for example to calculate the composition of store deliveries based on demand, supplier reliability, optimal storage location and so on. Each formula sat on top of the database and required additional data.
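
    To give a feel for the kind of formula that sits on top of the sales data, here is a minimal sketch of a textbook safety-stock replenishment calculation. It is an illustration only, not the formulas actually used in the company: the function name, the parameters and the service factor of 1.65 are all assumptions, and supplier reliability is reduced to a single lead time in days.

```python
import math
from statistics import mean, stdev


def replenishment_quantity(daily_sales, on_hand, lead_time_days, service_z=1.65):
    """Units to ship to a store so stock lasts until the next delivery.

    Textbook rule (an assumption, not the article's method):
    target = average demand over the lead time + safety stock,
    where safety stock = z * sigma * sqrt(lead time).
    """
    avg = mean(daily_sales)
    # stdev needs at least two observations; treat a single day as no spread.
    sigma = stdev(daily_sales) if len(daily_sales) > 1 else 0.0
    safety_stock = service_z * sigma * math.sqrt(lead_time_days)
    target = avg * lead_time_days + safety_stock
    # Never order a negative amount when the shelf is already full.
    return max(0, math.ceil(target - on_hand))
```

    With perfectly steady sales of 4 units a day, 5 units on the shelf and a 3-day lead time, the sketch asks for 7 units; the more sales fluctuate, the larger the safety stock becomes.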

    So we concluded that all company data, without exception, must be stored in a single space where a robot can pick it up. A couple of years later this would be pompously named Big Data and grow into a methodology, but for us it was just an unstructured collection of tables in one large database. Everything was stored there, from purchases and prices to last year's statistics and even the product descriptions for the site. By the way, yes, the site takes all its information from one of the read-only copies of these tables. Among other things, this made it possible to show product availability in stores in real time without any shamanic dancing. Not without mishaps, though. At one point we displayed on the site the reason a product was out of stock. Our purchasing staff wrote that reason in a special field. For about two hours one of the product pages carried the note: "There is no game, because the supplier is a deer."

    Today 1C is the main source of information for everyone in the company, and everything is linked together through it. 1C has grown a pile of our own modules, from call center solutions to managing stock levels on the site and even the sort order of products on the site.

    At first the regions had problems precisely with 1C: in the mode we need, it wants at least half a megabit, which was often unrealistic in a shopping center in a city of 400 thousand people. Now even a cellular connection can sustain that bandwidth, so everything is fine (and we have also started doing asynchronous replication).

    Cash registers


    Initially none of the cash registers were connected to 1C. We used a separate program, the reports were full of errors, someone was always ringing up receipts under the wrong department and then writing explanatory notes, and accounting swore a lot. Now all registers are connected to 1C, and various perks have appeared: a receipt that shows the accumulated discount; a quick view of the cash balance, of how many refunds were made and of how much money went into the safe; fast lookup of a receipt for a same-day refund; and a quick end-of-day report for management, since the system fills in almost all the data itself.

    Updates


    We have branches all over the country, the most distant in Yuzhno-Sakhalinsk. So there is almost no time when all stores are offline at once: given the spread of time zones, only a couple of hours a night are left for rolling out a 1C update quickly. As a result, at least two of our IT people gather for a remote night-time sabbath at least once a week (more often during the off-season development cycle) and check everything. The work has to fit into 1 hour so that there is time to roll back if something goes wrong.
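
    The size of that night window is simple arithmetic over time zones. The sketch below finds the UTC hours during which every store is closed. The UTC offsets and the 10:00 to 22:00 opening hours in the example are assumptions for illustration, not the chain's real schedule, and the function works at whole-hour granularity only.

```python
def closed_utc_hours(stores):
    """Return the UTC hours (0-23) during which every store is closed.

    stores: list of (utc_offset_hours, open_hour_local, close_hour_local),
    with open_hour_local < close_hour_local on the same local day.
    """
    common = set(range(24))
    for offset, open_h, close_h in stores:
        # Convert each local opening hour to UTC and wrap around midnight.
        open_utc = {(h - offset) % 24 for h in range(open_h, close_h)}
        common -= open_utc
    return sorted(common)


# Hypothetical example: one store at UTC+3, one at UTC+11, both open 10-22.
print(closed_utc_hours([(3, 10, 22), (11, 10, 22)]))  # prints [19, 20, 21, 22]
```

    Every additional time zone and every store with longer hours shrinks the result, which is why the update has to be short enough to leave room for a rollback.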

    I remember a case when a large multi-stage reinstallation was needed. First the data center provider added hardware to the server, then the system administrator did his part, then it was the 1C team's turn, all between 2 and 4 a.m. At 3 a.m. the 1C specialists lost contact with the system administrator. Half our lives flashed before our eyes, and our people vividly imagined explaining to the stores why they could not sell anything. Fortunately, the administrator's phone had simply run out of battery, and he quickly got back in touch.

    After that we began backing up the database more than once a day, asynchronously. And it should be noted that, despite all the internal IT work, the company's departments have not once stopped working in the past three years.

    Our 1C specialists share another funny fact: the less users know about changes or updates to the system, the calmer the work. It has been observed many times that if you warn users about an overnight update, in the morning they start paying special attention to the system. And there will be at least 3 calls from users for whom something is broken and who know for certain that the update caused it. Precisely the update.

    Website


    First there was a simple static site on Masterhost. A little later, when there were more games, a more complex setup at the same host. By the following November we first saw traffic that could bring the site down. We moved to another hosting provider and went down anyway. We struggled with various hosts in Russia for a long time, and then simply moved to Amazon. By then we needed several instances: the main site, a backup stub with a ping service, several database copies for different purposes (stores, the site cache, other retail services), plus the database itself.

    The site still went down several times. The second major outage was my first Habr effect, when we discovered that deploying a new instance was being done incorrectly. The third was the lightning strike at the data center in Ireland (many here remember it). Then a RetailRocket script that stopped responding took us down for almost half a day (specifically the order form; the rest worked). Since then we have protected ourselves against that.
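
    The article does not say how that protection was built, and the original problem was most likely a script loaded in the browser. As a general illustration of the idea, here is a sketch of the same guard on the server side: call the external dependency with a hard timeout and fall back to an empty result so that the page still works. The names call_with_fallback and slow_recommendations are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Shared pool; deliberately not used as a context manager, because leaving
# a "with" block would wait for the slow call and defeat the timeout.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(func, timeout_s, fallback):
    """Run func in a worker thread; return fallback if it is slow or fails."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both the timeout and any error raised by func itself.
        return fallback


def slow_recommendations():
    """Hypothetical stand-in for a third-party service that hangs."""
    time.sleep(0.5)
    return ["late answer"]
```

    Calling call_with_fallback(slow_recommendations, 0.05, []) returns the empty list after 50 ms instead of blocking the order form. The hung call still occupies a worker thread until it finishes, so a real setup would also need a limit on how many such calls can pile up.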

    Internal resources


    Here the development was relatively standard. First personal laptops, then the first desktops on a local network, and, as the number of stores grew, a server farm and local terminals. We accumulated a lot of internal data and kept it on our network storage, where a mirrored RAID had been set up for reliability. When the storage died one day, we learned that:
    1. The last backup of heavy data (such as photos) was a month old.
    2. Nobody had checked that the second half of the RAID mirror was working; nothing had been written to it for three months, and nothing indicated a problem.
    3. And our disk with the data we needed had departed for the happy hunting grounds because of a controller failure.

    Fortunately, the patient underwent a controller transplant, and we recovered the data from it.
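
    The first two findings come down to the same missing check: nobody was told when something stopped being written. A minimal sketch of such a check is below. It assumes you can obtain a "last successful write" timestamp for each backup or mirror by some means; the function name and the item names in the example are hypothetical.

```python
from datetime import datetime, timedelta


def stale_items(last_success, now, max_age):
    """Return the names whose last successful write is older than max_age.

    last_success: dict mapping an item name to the datetime of its
    last successful backup or mirror write.
    """
    return sorted(name for name, ts in last_success.items() if now - ts > max_age)


# Hypothetical example: photos were last backed up a month ago.
now = datetime(2014, 1, 31)
status = {
    "photos": datetime(2013, 12, 31),
    "database": datetime(2014, 1, 30, 12, 0),
}
print(stale_items(status, now, timedelta(days=1)))  # prints ['photos']
```

    Run from a scheduler and wired to an alert, a check like this turns "no indication" into a message on the first day, not the ninetieth.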

    Inside the company we communicate mostly by email; messengers did not catch on with everyone. There used to be ICQ, now it is Skype for chats. Long-distance calls: the call center has its own lines, customers are called from unlimited mobile plans, wholesale uses Skype. Announcements go out on the corporate blog. Trackers are up to each department: some have them, some use Google calendars instead, some use 1C functionality for recurring tasks. There is no single tracker.

    Everything important goes by email. We exchange files through Dropbox; many people, especially in development, where there are lots of images, have bought large accounts. At some point the office almost universally adopted tablets in addition to desktops, which led to more attention to Google services and the cloud in general.

    We send transactional emails ("your order has been confirmed") from our Amazon servers; newsletters went the same way at first, then we learned to send them through Unisender. It is important to keep the two separate, because otherwise a person who unsubscribes from the newsletter will not receive the order email either. I know a travel agency that got badly burned on this. We switched to an external service two years ago, when we had to fight for the delivery rate. Before that we sent from our own servers, almost from the console, so there were incidents: for example, "hello, this is a test" sent to 50 thousand people in our first year.
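
    The separation rule fits in a few lines. This is a sketch under assumed names (should_send, the two message kinds, a set of unsubscribed addresses), not the way the mailing is actually implemented: an unsubscribe list applies to newsletters only and is never consulted for transactional mail.

```python
def should_send(kind, email, newsletter_unsubscribed):
    """Decide whether a message may be sent to this address.

    kind: "transactional" (order confirmations and the like) or "newsletter".
    newsletter_unsubscribed: set of addresses that opted out of newsletters.
    """
    if kind == "transactional":
        # Order emails ignore the newsletter unsubscribe list entirely.
        return True
    if kind == "newsletter":
        return email not in newsletter_unsubscribed
    raise ValueError(f"unknown message kind: {kind!r}")
```

    Keeping the two streams on different sending systems, as described above, enforces the same rule structurally: the newsletter service simply has no way to block an order confirmation.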

    Hardware


    At one point our system administrator up and founded his own business. Now it is a company that supports us and several other retailers. My first conversation with a new, unfamiliar and terribly shy admin went like this: "My computer won't turn off... Yes, I tried that. Yes, I unplugged it from the outlet. No, it won't turn off. Yes, Windows was running in a virtual machine. Yes, it's a laptop."

    The most important thing for us was the move to business processes, which gave us an SLA for every kind of failure on the one hand and clear pay-as-you-go billing on the other. In 5 years there have been no issues with the guys: they come in silently, do the job and leave. When our girls caught an encryption trojan on their personal laptops ("Oh, my attachment won't open, I've forwarded it to you, take a look, please"), one of the "dudes in a sweater" amazed me to the core: he disassembled the malware, found a flaw in its random number generator and, within four hours, worked through the decryption keys until one succeeded. In general, there are no impossible tasks.

    I remember calling support early in the morning on System Administrator Day, saying that everything was a total disaster, on fire and needed fixing urgently, opening a connection to my desktop and asking them to put their most experienced person on it. When he logged in, he saw his greeting card.

    General philosophy


    If we had known at the start how many crutches and "glue" would be needed, we would probably have done it properly right away. But I think that would not have been very efficient. As I said, the optimal approach is to work with what you have, however crooked and unscalable. A need appeared and there was profit: we took the profit and invested it in the next level. And so on, turn after turn. Yes, it costs more overall, but all capital costs turn into operating costs, which suits the retail model wonderfully.

    We are often asked how to organize this or that piece of IT properly. My view: make it convenient now, without thinking about cool technologies, the future, or advice from theorists in the spirit of "don't forget to do this, or you'll have to fix it later." When you have a dedicated IT department, its head will think about such things. When starting a business and during the first 1-2 years of growth, the main thing is that everything works here and now. And some rakes are actually useful: if you do not step on them while they are small, the mistake will cost far more later.
