Restarting Media Editions: An Overview



    I worked (fb) at the online publication Lenta.ru, where I went from developer to technical director and successfully carried out a full restart. Along the way I did similar projects on a smaller scale. Now my team and I are preparing to restart the online newspaper Vedomosti (fb).

    I'll talk about developing media projects. This is a whirlwind tour that touches on the main topics. Dear readers, please tell me which questions deserve a more detailed treatment. For example, my colleague plans to write about deploying the system and the site's fault-tolerant setup.



    Technology


    One of the questions that interests the publication's management is which language the project will be developed in. The question arises for purely practical reasons: the cost of developers. The answer is fairly simple.

    Firstly, the choice of language depends on how well the lead developers know it. If your developers know language A, it is foolish to require them to know language B. Replacing lead developers is not an easy task.

    Secondly, there are popular languages on the market (PHP, Python, Ruby) and the rest, which are less popular. There is a misconception that developers in some languages are cheaper than in others, or easier to find. It is a misconception because it is very hard to compare the level of professionalism of developers across languages, and things get even murkier once you weigh each candidate's experience against the salary they ask for. In the end, finding a suitable developer is equally hard in any language: either there are plenty of candidates but they are unskilled, or they ask for a lot, or they specialize in something else, and so on.

    There is another factor in choosing a development platform. Publications are usually part of a fairly large holding, and depending on how developed its corporate culture is, a platform decision may be imposed on the publication.

    Lenta.ru once switched from Perl to Ruby, even though Python was the norm at Rambler. Vedomosti is now moving from PHP to Ruby, and nobody has imposed PHP on us.

    Choosing related tools, such as an operating system or a database, usually doesn't bother anyone.

    Ultimately, it does not matter to the publication which language the project is written in or which libraries and services it uses. What matters more is that the team building the project has a professional command of its tools.

    Formulation of the problem


    A full restart is a long process, but that is no reason to relax at the start and work flat out at the end. Management decides on the goals it wants to achieve and the problems it wants to solve. An editorial team is dozens of people, and the publication's entire staff can easily exceed a hundred. If the restart involves significant changes, a very large number of people will have to change the way they work.

    To do something well, you need to step back from the current state, model an ideal world, and only then build a strategy for moving from one state to the other. Otherwise you risk going in circles around the old setup. This task belongs to the editor-in-chief and the publication's management. By analyzing statistics, studying best practices and setting ambitious goals, they form a concept.

    As a result, designers get the task of visualizing the new face of the publication, and developers work out a strategy for implementing the ideas. The difficulty is that, in addition to building a new-look public website, you need to create a management tool for each department: the editorial office, the commercial department, marketing and the support service.

    The reality is that on the public site every detail will be thought through by the editor-in-chief and drawn by the designer, while the creation of content management systems is often left to the developers. This is not entirely a free-for-all, of course: requirements are collected and work scenarios described, all in the best traditions of the textbooks. But there is still plenty of room for imagination.

    Puzzle


    When building a media project, a service-oriented architecture is convenient: it allows independent parts of the project to be developed in parallel. The main thing is to keep the big picture in mind and pin down the interfaces between services. System components can share code, talk to each other via some protocol over a TCP connection, and use a common data store.

    The public website is almost always worth separating from all internal services. To serve the site you need an application that generates HTML pages, or JSON for the API. Depending on the project's architecture, the application can connect directly to a conventional database (MySQL, PostgreSQL, MongoDB ...) or go through a service layer. Such an application scales easily. All internal services, including content management systems, can be restricted by IP address, which is a plus for security.

    The choice of database is based on what data structures you need to store and retrieve.

    For example, at Lenta.ru we used MongoDB for the public site, because we needed to fetch documents with a complex tree structure. Our main database was MySQL (later migrated to PostgreSQL), where everything was stored in normalized form. In MongoDB the data was stored in a form convenient for the site. To synchronize the two databases we used a background service that tracked changed data, built a new representation of it and wrote it to MongoDB. The service was built around a queue in Redis that received the change messages.

    With Vedomosti we went a different way. Besides the classic data structures, PostgreSQL has built-in types: array, hstore, jsonb. They make it simpler to store relationships between associated entities, dynamic attributes and complex tree structures. So we get normalized data within a single service, while it is still presented in a form convenient for the public site, with some compromise compared to MongoDB, of course.

    The editorial content management system (CMS) is the main editorial tool. It is used to create content, to manage and organize the editorial process, and to shape the picture of the day on the public site. It shares only the database with the public site. It is a standalone application that can be accessed only after authorization; better still if access is limited to a specific list of IP addresses. Project development starts with this system.
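
    Limiting an internal service to a list of IP addresses usually takes only a few lines of web-server configuration. Below is a minimal nginx sketch, assuming the CMS runs as an application on a local port. The host name, address ranges and upstream port are invented for illustration and do not come from either project.

```nginx
server {
    listen 80;
    server_name cms.example.internal;     # hypothetical internal host name

    # Only the office network and one VPN gateway may connect (example ranges)
    allow 192.0.2.0/24;
    allow 198.51.100.17;
    deny  all;

    location / {
        # Hypothetical CMS application; it still performs its own login
        proxy_pass http://127.0.0.1:9000;
    }
}
```

    The address filter complements application-level authorization rather than replacing it: a request has to pass both checks.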

    The statistics collection service has special performance requirements. First, let's answer the question: why is it needed? Editorial offices often need statistics broken down by parameters that third-party services do not have. For example, Vedomosti offers subscription access, so there is a demand for data on a specific group of users, and we do not pass subscription status to third-party services. At the same time, we do not need a complete replacement for public analytics services, only the minimal functionality that covers the analysts' requests.

    Such a service is also split into components: collecting events into an intermediate queue, processing them and writing to a normalized database, and selecting aggregated data. For the first two components we use Go, for the intermediate queue Redis, for normalized data PostgreSQL, and for fetching aggregated data Ruby.

    The service for storing media content is also a separate application. Its tasks are to receive a file, save it to disk, and return a data structure describing the necessary metadata: path, size, type. For images, the service should also generate versions with different dimensions. A similar service is described in an article on Habr. There are several points to make about media content.

    The first is managing the availability of specific files. You may publish a note that uses some image, and at some point need to make the image unavailable by its link without deleting it from the internal photo bank. At Lenta.ru we used symbolic links: if a file was public, a link was created for it; if the file had to be hidden, including when the note using it was hidden, the symbolic link was deleted. At Vedomosti, the separation of public and private images is implemented through file permissions.

    The second is the content delivery network. At Lenta.ru we once served videos to users directly from our own servers. Then came the moment when we saturated our gigabit link by showing an interesting video in good quality, and the site began to slow down. Using a CDN is the right decision, since today it is a very affordable service. As a test, we once used a third-party CDN for static images. In that case, though, the practical benefit did not justify the cost, and the finance people denied us the pleasure.

    The third is image scaling. Depending on where they are used, images are scaled to several versions. A specific feature of media publications is that there is a dedicated person, the photo editor, who checks how the images came out after resizing and cropping. If they do not like something, they should have a tool for replacing a specific version of an image. Otherwise, the preview versions in a Miss World photo gallery may show ladies with their heads cut off.

    Media content also includes audio and video. Transcoding video into different quality levels is best left to a professional video platform.

    It is useful to have a separate authorization service for users of the site. It is often closely tied to the commenting system.

    Development process


    Earlier, I wrote that you need to pay due attention to design. But you should not be afraid to write code. It is impossible to foresee everything and build a good system from scratch in one go; the system has to be developed iteratively. Write the planned functionality, analyze it, and be prepared to rewrite it. This is normal. The time you spend writing code is much less than the time you spend thinking. Your code is not carved in stone.

    A short paragraph: use a version control system. For some reason, this still needs saying.

    The modern approach to development involves writing tests. The difficulty is that in publications with a legacy, managers have a certain idea of how fast development goes. Then you arrive with fashionable technologies and the promise that everything will now be easier, and similar tasks end up taking as long as before, or even longer. You understand that tests are a good thing, and you try to convey this truth to the editor-in-chief. Most likely you will not succeed. The editorial staff needs a product, and developers need tests. That product quality requires time spent on writing tests simply has to be taken as a given.

    Do not change your work environment in the active phase of a development stage. When an operating system update comes out, developers who like to ride the crest of new versions can easily lose a day or more. It is like a soldier's health in the army: you are needed in working order, and your workstation should not fail just because a new version of your favorite OS was released yesterday.

    This may sound a bit strange, but in addition to your main work you need to find time for side projects, with an ulterior motive: to try out new technologies and practices. It is not always possible to use interesting new solutions in a production system, so do it in small side projects.

    Dynamic programming languages make extensive use of metaprogramming. This can make life difficult for you and your colleagues in the future. Whenever possible, prefer simple code to magic spells.

    Monitoring and Profiling


    The editors use various metrics to analyze how well they are doing. Developers must likewise take care of profiling and monitoring tools for their applications in advance. Launching a system without monitoring its state is very dangerous. It is not just about RAM and CPU consumption: to build a good application, you need to trace the entire execution stack of the program. Take the time to set up such an environment.

    Database queries may change as project requirements change. Once again, profiling comes to the rescue: you quickly find out what needs to be optimized.

    For stress testing, prepare data that is as close to real as possible. Run the tests over a long period and look for the weak spots of your application.

    Relocation process


    Moving from the old platform to the new one revolves around the editorial office and the archive. The editors have to restructure their processes and get used to the new content management system. The archive has to be transferred with referential integrity preserved.

    For the editors there are three stages. In the first, they try out the CMS freely; at this point things may break or change on request. The second stage begins after all content has been fully imported into the new platform, while the new site is not yet publicly available. The editors work in two systems, the old and the new, because most often the structures of old and new content are not backward compatible. This period usually lasts a week. The third stage begins at the public restart: everyone is happy, the editors work on the new platform, and the old site is shut down.

    Moving the archive takes quite a while. Each import run reveals some bugs, and after they are fixed the import starts again from the beginning. Notes may change their addresses. It is considered good practice for old links to redirect to the new address, and for that you need to prepare a routing table in advance. It will also be useful later: there are situations when a page address has to change and visits to the old address must still redirect to the new one. You simply give the note a list of associated addresses, one of which is marked as the main one.
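
    The article does not say where the routing table is consulted. One possible way to serve a prepared table at the web-server level is an nginx map from old addresses to the current main address, as in this sketch. It is an assumption rather than the scheme used in either project, and all paths, host names and ports are invented.

```nginx
# http context. Keys are old addresses, values are the current main address.
# All paths below are invented examples.
map $uri $new_uri {
    default                       "";
    /news/2009/03/12/budget/      /news/2009/03/12/federal-budget/;
    /articles/old-interview.html  /articles/2011/06/01/interview/;
}

server {
    listen 80;
    server_name www.example.com;          # hypothetical

    # A non-empty value means the address has moved: answer with a permanent redirect
    if ($new_uri) {
        return 301 $new_uri;
    }

    location / {
        proxy_pass http://127.0.0.1:8080; # hypothetical site application
    }
}
```

    A table generated during the archive import can grow large; in that case the map_hash_max_size and map_hash_bucket_size directives may need tuning, or the lookup can stay in the application next to the note's list of addresses.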

    Mobile version


    We made a mistake, albeit a forced one, when we restarted Lenta.ru without a mobile browser version of the site. It was a deliberate risk, and we quickly corrected it; what is more, we released two versions at once, pda and mobile. The first was intended for older phones and had a minimum of images. We called such phones “Alconokia”. The second was for smartphones with large displays, you know which ones. Over time we began to focus on the mobile version, implementing an automatic redirect from the desktop version.

    Besides the automatic redirect, we implemented the ability to remember the chosen version. That is, if you came from a phone and were sent to the mobile version but did not want it, you chose the desktop one, and on subsequent visits you were no longer redirected anywhere automatically.

    It is also nice to show users a message saying that we automatically redirected them to another version.

    We implemented this logic on the nginx side. A terrible regular expression determined the type of device, mobile or not, and set the flag $ismobile = 1. We also looked at the cookie named view_version, which stored the preferred version. On the first visit to the site this value is not defined. Below is an example of the code that decided whether to redirect:
    if ($ismobile = 1) { set $mobile_rewrite 1; }
    if ($cookie_view_version = 'm') { set $mobile_rewrite 1; }
    if ($cookie_view_version = 'www') { set $mobile_rewrite 0; }
    

    Accordingly, if the variable $mobile_rewrite equals one, we redirect to the mobile version and at the same time set a one-time cookie that triggers the informational message.
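
    Put together, the whole decision might look roughly like the sketch below. It is a reconstruction under assumptions, not the original configuration: the User-Agent pattern is a deliberately short stand-in for the real expression, and the host names, the redirect_notice cookie name, its lifetime and the upstream port are all hypothetical.

```nginx
# http context: classify the client by User-Agent
# (short illustrative pattern, not the original regular expression)
map $http_user_agent $ismobile {
    default 0;
    "~*(android|iphone|ipod|windows phone|opera mini)" 1;
}

server {
    listen 80;
    server_name www.example.com;          # hypothetical desktop host

    # Device detection first, then the remembered choice overrides it
    set $mobile_rewrite 0;
    if ($ismobile = 1) { set $mobile_rewrite 1; }
    if ($cookie_view_version = 'm') { set $mobile_rewrite 1; }
    if ($cookie_view_version = 'www') { set $mobile_rewrite 0; }

    location / {
        if ($mobile_rewrite = 1) {
            # Short-lived cookie that tells the mobile page to show the notice once
            add_header Set-Cookie "redirect_notice=1; Domain=.example.com; Path=/; Max-Age=60";
            return 302 $scheme://m.example.com$request_uri;
        }
        proxy_pass http://127.0.0.1:8080; # hypothetical desktop application
    }
}
```

    The order of the three conditions matters: because the cookie checks come last, a visitor who explicitly chose the desktop version stays there even on a device detected as mobile.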

    Service Setup


    A few more points about configuring the web server. As long as the main protocol for transferring data on the web is HTTP/1.1, it is important to serve static files from several domains. If you use custom fonts on the site or make API calls from the client page, do not forget to set the correct CORS headers in the configuration of the corresponding web server.
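
    For fonts served from a separate static domain, the CORS part can be as small as one header. A minimal sketch, assuming a single site origin; the host names and the file system path are made up for the example.

```nginx
server {
    listen 80;
    server_name static.example.com;       # hypothetical static domain

    root /var/www/static;                 # hypothetical path

    # Web fonts are subject to cross-origin checks in browsers
    location ~* \.(?:woff2?|ttf|otf|eot)$ {
        # Allow only the main site to load fonts from this domain
        add_header Access-Control-Allow-Origin "http://www.example.com";
        expires 30d;
    }
}
```

    If several origins need access, the allowed value has to be chosen per request (for instance with a map on $http_origin) instead of being hard-coded.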

    When building a service-oriented architecture, some of your internal services may turn out not to be protected by application-level authorization. An example is a separate image upload service: only authorized editors should have access to it, and you already have a separate authorization service for those same editors. The authorization service is primitive: it receives headers and answers yes or no. With the ngx_http_auth_request_module module, we can make a subrequest to the authorization service for every request to the image upload service. A live configuration example can be found here.
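
    The general shape of such a configuration is sketched below. It is not the configuration referred to above; the host name, ports, the /check path and the body size limit are assumptions. The module itself is not built into nginx by default and has to be enabled with --with-http_auth_request_module.

```nginx
server {
    listen 80;
    server_name images.example.internal;  # hypothetical host of the image service

    location / {
        # Every request first triggers a subrequest to /_auth:
        # a 2xx answer lets it through, 401 or 403 rejects it
        auth_request /_auth;

        client_max_body_size 50m;         # assumed upload limit
        proxy_pass http://127.0.0.1:9001; # hypothetical image service
    }

    location = /_auth {
        internal;                          # not reachable from outside
        proxy_pass http://127.0.0.1:9002/check;   # hypothetical authorization service
        # The authorization service only needs the headers (cookies), not the file
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
```

    The image service itself stays free of any authorization code, which is the point of the approach.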

    There are only two hard things in Computer Science: cache invalidation and naming things.

    - Phil Karlton


    At Lenta.ru we named server hosts after cigarette brands. For Vedomosti we pick from the names of star systems. Applications were named after bird species: for example, rooster generated static files in the old versions. At Vedomosti the mascot is a big fish: the sharks of business.

    Media projects in Russia are not highly loaded. The last traffic peak at Lenta.ru, in the spring, was about 20 million views per day. It is quite simple to build a system that withstands such loads without caching. We cache for a short interval of a few tens of seconds, which spares us the cache invalidation problem: a typo fix appears on the site with a delay of less than a minute. It also lets us use a wonderful nginx option, proxy_cache_lock. Of ten identical requests to the web server, only one is sent to the backend. This evens out the load on the application.
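
    A short-lived cache with request coalescing can be sketched as follows. The cache path, zone name, the 30-second lifetime and the upstream address are illustrative assumptions, not the values used in production.

```nginx
# http context: where cached responses live (hypothetical path and sizes)
proxy_cache_path /var/cache/nginx/site levels=1:2 keys_zone=site_cache:10m
                 max_size=1g inactive=10m;

server {
    listen 80;
    server_name www.example.com;          # hypothetical

    location / {
        proxy_cache       site_cache;
        # Keep successful pages only briefly, so no explicit invalidation is needed
        proxy_cache_valid 200 30s;

        # While one request is filling a missing cache entry,
        # identical requests wait for it instead of hitting the backend
        proxy_cache_lock         on;
        proxy_cache_lock_timeout 5s;

        # Serve the stale copy while an expired entry is being refreshed
        proxy_cache_use_stale updating error timeout;

        proxy_pass http://127.0.0.1:8080; # hypothetical application
    }
}
```

    proxy_cache_lock covers entries that are not yet in the cache, while proxy_cache_use_stale updating plays the same role for entries that have just expired.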

    Visualization of a small DDoS attack:


    Backup


    Of course, you need to back up your data. It is convenient to have both a hot standby and a periodic full backup.

    A hot standby will save you when you need to restore the most recent version of the data. The easiest way to get one is simple real-time replication.

    If an important piece of data is deleted by mistake, you can recover part of it from the daily backup.

    People


    Be attentive. Over time, some processes become routine, and naturally you want to automate something. At the end of a development stage you look back, and your hands are already itching to do it better. Try to understand each situation patiently. Stop and try to look at your work differently. It may turn out that the tasks that seem important to you are not, and vice versa.

    Plan your work. Spend enough time on design. There is no need to whip up a prototype in your favorite framework just because you enjoy it and know what you will be doing for the next couple of hours or days. Before you know it, you will have done a lot of work that will most likely have to be redone or thrown away.

    Be friendly and patient. When working in a team, it is important to be able to communicate with colleagues. A project is rarely the work of one person, which means there will be relationships between colleagues, both professional and personal, and the result of all your work depends on how well you get along. Tension will grow as the project's completion date approaches. And how well the team works together depends not only on Vasya and Petya, but also on you.

    Your team should work out a format for professional communication. The simplest example is setting and completing a task. You must state the task clearly. It would seem the most obvious point, yet people keep stepping on the same rake. Just because you have a thought in your head does not mean your colleague has it too. A task that takes several minutes to retell is very hard to describe in two words. Try to put your thoughts into words “on paper”. But after a colleague has read your opus, talk! Discuss the task and make sure your idea has been understood correctly. The same principle works in the opposite direction: if you have completed a task, say so. You cannot imagine how much time this will save you.

    Department Positioning


    I would like to say a little about the role of developers in a publication. You need to understand that, in relation to the editorial staff, development is a kind of service staff. Our task is to provide a convenient tool. It is unacceptable to be arrogant towards editors who do not understand why a one-to-one relation cannot be swapped for a one-to-many relation just like that. So they uploaded a video file instead of a picture: the system should not crash on incorrect user input. If you expected a number and a string arrived, then you are to blame for allowing it, not the editor. Try writing a dozen or so news items a day yourself, without copy-paste, analyzing sources to confirm the facts.

    This is not about fulfilling every whim of the editors. You just have to be patient with the wishes of your colleagues: you are working for a common cause.

    Summary


    Restarting a media publication is not much different from other projects. Team cohesion is just as important. You should be one step ahead of the editors, anticipating the new functionality they will want to bring to life. We design and build, then redesign and rebuild. It is risky to underestimate the monitoring of servers, applications and client-side parts. After the restart, you will have to pay a lot of attention to analyzing the behavior of your new platform. See the job through to the end.

    We look forward to your questions; perhaps there will be a follow-up.
