Using Golang to Build Microservices at The Economist: A Retrospective

Original author: https://www.infoq.com/profile/K-Jonas



Key Takeaways

  • The Economist needed more flexibility to distribute content to an increasingly diverse set of digital channels. To achieve this goal while maintaining a high level of performance and reliability, the platform moved from a monolithic to a microservice architecture.
  • Tools written in Go were a key component of the new system, which enabled The Economist to provide scalable, high-performance services and quickly create new products.
  • Go's built-in support for concurrency and APIs, along with its design as a static, compiled language, made it easier to develop distributed event-processing systems that could scale. Testing support was also a plus.
  • In general, The Economist team's experience with Go was positive, and this was one of the decisive factors that helped scale the Content Platform.
  • Go will not always be the right tool, and that is fine. The Economist has a polyglot platform and uses different languages where it makes sense.

I joined The Economist's development team as a Drupal developer. However, my real task was to take part in a project that would fundamentally change The Economist's content delivery technology. I spent the first few months learning Go, then several months working with an external consultant to build an MVP (minimum viable product), and then rejoined the team to guide their immersion in Go.
This shift in technology was driven by The Economist's mission to expand its digital audience as news consumption moved away from print. The Economist needed more flexibility to deliver content to an increasingly diverse set of digital channels. To achieve this goal while maintaining a high level of performance and reliability, the platform moved from a monolithic to a microservice architecture. Tools written in Go were a key component of the new system, which enabled The Economist to provide scalable, high-performance services and quickly create new products.

Implementing Go at The Economist:

  • Allowed engineers to quickly develop and ship new functionality.
  • Reinforced best practices for fast-failing services with intelligent error handling.
  • Provided reliable support for concurrency and networking in a distributed system.
  • Revealed a lack of maturity and support in some areas needed for content and media.
  • Facilitated a platform that could scale for digital publishing.

Why did The Economist choose Go?

To answer this question, it is useful to outline the overall architecture of the new platform. The platform, called the Content Platform, is an event-driven system. It responds to events from different content authoring platforms and triggers a stream of processes that run in separate microservices. These services perform functions such as standardizing data, analyzing semantic tags, indexing in Elasticsearch, and pushing content to external platforms such as Apple News or Facebook. The platform also has a RESTful API, which in combination with GraphQL is the main entry point for front-end clients and products.

While designing the overall architecture, the team investigated which languages would fit the needs of the platform. Go was compared with Python, Ruby, Node, PHP, and Java. While each language has its own strengths, Go was the best fit for the platform architecture. Go's built-in support for concurrency and APIs, along with its design as a static, compiled language, made it easier to develop distributed event-processing systems that could scale. In addition, Go's relatively simple syntax made it easy to get started and write working code, which promised immediate benefits for a team undergoing such a large technological transition. Overall, Go was judged to be the language best suited to usability and efficiency in a distributed cloud system.

Three years later: did Go live up to these ambitious goals?

Some platform design elements were well aligned with the Go language. Failing fast was a critical part of the system because it consisted of distributed, independent services. In accordance with the principles of the Twelve-Factor App, each application had to start quickly and fail fast. Go's design as a static, compiled language provides fast startup times, and compiler performance keeps improving and has never been a problem for development or deployment. In addition, Go's error handling design allowed applications to fail not only faster, but also smarter.

Error handling

A feature that engineers quickly notice in Go is that it has an error type rather than an exception system. In Go, all errors are values. The error type is predeclared and is an interface. An interface in Go is essentially a named collection of methods, and any user-defined type can satisfy an interface if it has the same methods. The error type is an interface for anything that can describe itself as a string.

type error interface {
    Error() string
}

This gives engineers more control and flexibility in error handling. By adding an Error method that returns a string to any custom type, you can create your own errors and generate them, for example with the New function below, which comes from the errors package.

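// errorString is a trivial implementation of error.
type errorString struct {
    s string
}
func (e *errorString) Error() string {
    return e.s
}
// New returns an error that formats as the given text.
func New(text string) error {
    return &errorString{text}
}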

What does this mean in practice? Go functions can return multiple values, so a function that can fail will most likely return an error value. The language encourages you to check for errors explicitly where they occur (as opposed to throwing and catching an exception), so your code will usually include an "if err != nil" check. At first, this frequent error handling may seem monotonous. However, because an error is a value, you can use it to simplify error handling. For example, in a distributed system you can easily implement request retries by wrapping errors.

Network problems will always occur in a system, whether it is sending data to other internal services or passing it to third-party tools. This example from the net package shows how an error can be used as a type to distinguish temporary network errors from permanent ones. The Economist team used a similar error wrapper to build incremental retries when sending content to external APIs.

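// From package net: an error that can describe itself further.
package net

type Error interface {
    error
    Timeout() bool   // Is the error a timeout?
    Temporary() bool // Is the error temporary?
}

// In client code, inside a retry loop:
if nerr, ok := err.(net.Error); ok && nerr.Temporary() {
    time.Sleep(1 * time.Second) // wait, then try again
    continue
}
if err != nil {
    log.Fatal(err) // permanent error: give up
}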

The Go authors take the view that not all exceptions are exceptional. Engineers are encouraged to recover intelligently from errors rather than crash the application. In addition, Go's error handling gives you finer control over errors, which can improve aspects such as debugging or the usability of error messages. Within the Content Platform, this design feature of Go allowed developers to make informed decisions about errors, which made the system as a whole more reliable.

Data consistency

Data consistency is a critical factor in the Content Platform. At The Economist, content is the foundation of the business, and the goal of the Content Platform is to ensure that content can be published once and made available everywhere. It is therefore important that every product and consumer receives consistent data from the Content Platform API. Products mainly use GraphQL for API requests, which requires a static schema that serves as a kind of contract between consumers and the platform. Content processed by the platform must conform to this schema. A statically typed language helped enforce this and made data consistency easier to achieve.

Testing with Go

Another feature that promotes consistency is Go's testing package. Go's fast compile times, combined with testing as a first-class feature of the language, allowed the team to build effective testing practices into development workflows and fast failures into build pipelines. Go's test tooling is easy to set up and run. Running "go test" runs all the tests in the current directory, and the test command has several useful flags. The -cover flag reports code coverage. The -bench flag runs benchmarks, which are identified by function names that begin with "Benchmark" rather than "Test". The TestMain function provides a hook for additional test setup, such as starting a mock authentication server.

In addition, Go supports table-driven tests built from anonymous structs, and stubs built from interfaces, both of which improve test coverage. Although testing is nothing new as a language feature, Go makes it easy to write robust tests and integrate them into workflows. From the very beginning, The Economist engineers were able to run tests as part of the build pipelines without special configuration, and even added Git hooks to run tests before pushing code to GitHub.

However, achieving data consistency was not without its struggles. The first major issue for the platform was managing dynamic content from unpredictable backends. The platform consumes content from the source CMS systems mainly through JSON endpoints, where the structure and data types are not guaranteed. This meant that the platform could not rely on Go's standard encoding/json package, which supports deserializing JSON into structs but returns an error if the types of the struct fields and the incoming data do not match.

To overcome this problem, a custom method was required to map the backend data to a standard format. After several iterations on the chosen approach, the team introduced its own deserialization process. Although this approach felt a bit like reworking a standard library package, it gave engineers complete control over the processing of the source data.

Network support

Scalability was a primary focus of the new platform, and it was supported by Go's standard libraries for networking and APIs. In Go, you can quickly implement scalable HTTP endpoints without the need for frameworks. In the example below, the net/http standard library package is used to set up a handler that accepts a request and a response writer. When the Content Platform API was first implemented, it used an API framework. This was eventually replaced by the standard library, as the team recognized that it met all their networking needs without additional unnecessary trade-offs. Go HTTP handlers scale because each request is handled concurrently in its own goroutine, a lightweight thread, without any extra configuration.

package main
import (
    "fmt"
    "log"
    "net/http"
)
func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello World!")
}
func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Concurrency model

The Go concurrency model has delivered multiple performance benefits across the platform. Working with distributed data means grappling with the guarantees promised to consumers. According to the CAP theorem, a distributed system cannot provide more than two of the following three guarantees at the same time: consistency, availability, and partition tolerance. The Economist's platform adopted eventual consistency, which means that reads from data sources will eventually be consistent, and moderate delays before all data sources reach a consistent state are acceptable. One way to minimize this gap is to use goroutines.

Goroutines are lightweight threads managed by the Go runtime, which multiplexes them onto operating system threads so that the program does not run out of threads. Goroutines made it possible to optimize asynchronous tasks on the platform. For example, one of the platform's data stores is Elasticsearch. When content is updated in the system, the content in Elasticsearch that references that item is updated and reindexed. Thanks to goroutines, processing time was reduced, so items became consistent quickly. This example shows how items eligible for reprocessing are each reprocessed in a goroutine.

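// reprocess re-indexes every search hit concurrently and waits for all
// of them to finish. The response type and reprocessItem are defined
// elsewhere in the service. The parameter type assumes the
// olivere/elastic client, where SearchResult.Hits.Hits is []*SearchHit.
func reprocess(searchResult *elastic.SearchResult) (int, error) {
	hits := searchResult.Hits.Hits
	responses := make([]response, len(hits))
	var wg sync.WaitGroup
	for i, hit := range hits {
		wg.Add(1) // one increment per goroutine started
		go func(i int, item elastic.SearchHit) {
			defer wg.Done()
			code, err := reprocessItem(item)
			// Each goroutine writes only to its own slot, so no mutex is needed.
			responses[i].code = code
			responses[i].err = err
		}(i, *hit)
	}
	wg.Wait() // block until every goroutine has called Done
	return http.StatusOK, nil
}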

Designing systems is more than just programming. Engineers need to understand which tools are appropriate, where, and when. While Go was a powerful tool for most of the needs of The Economist's Content Platform, some limitations required other solutions.

Dependency management

When Go was first released, it had no dependency management system. Within the community, several tools were developed to meet this need. The Economist used Git submodules, which made sense at a time when the community was actively working toward a standard dependency management tool. Today, although the community is much closer to a coherent approach to dependency management, it is not there yet. The Economist's approach using submodules did not cause serious problems, but it has been difficult for other Go developers, and this should be taken into account when switching to Go.

There were also platform requirements for which Go's features or design were not the best solution. When the platform added support for audio processing, Go's tools for extracting metadata were limited at the time, so the team chose the external ExifTool utility instead. Platform services run in Docker containers, which made it possible to install ExifTool and launch it from the Go application.

func runExif(args []string) ([]byte, error) {
	cmdOut, err := exec.Command("exiftool", args...).Output()
	if err != nil {
		return nil, err
	}
	return cmdOut, nil
}

Another common scenario for the platform is receiving broken HTML from the source CMS systems, parsing it to check that it is valid, and sanitizing it. Initially, Go was used for this process, but because Go's standard HTML library expects valid HTML, a large amount of custom code was needed to pre-parse the HTML before processing. This code quickly became fragile and missed edge cases, so a new solution was implemented in JavaScript. JavaScript provided more flexibility and adaptability in managing the process of validating and sanitizing HTML.

JavaScript has also been a common choice for filtering and routing events in the platform. Events are filtered using AWS Lambdas, which are lightweight functions that run only when called. One use case is filtering events into different lanes, such as fast and slow. This filtering is based on a single metadata field in the event's JSON envelope. The filtering implementation used a JavaScript JSON Pointer package to grab a single element from the JSON object. This approach was much more efficient than the full JSON unmarshalling that Go would need. While functionality of this type could be achieved with Go, using JavaScript was easier for engineers and resulted in simpler Lambdas.
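For comparison, here is a rough sketch of what reading that single routing field could look like in Go. The envelope layout, the "lane" field, and the default of "slow" are assumptions made for illustration. The payload is kept as raw bytes with json.RawMessage rather than decoded into Go values, although the decoder still scans the whole document, which is consistent with the trade-off described above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope is a hypothetical event wrapper. Only the routing field is
// decoded into a Go value; the payload stays as raw bytes.
type envelope struct {
	Meta struct {
		Lane string `json:"lane"`
	} `json:"meta"`
	Payload json.RawMessage `json:"payload"`
}

// laneOf returns the routing lane of an event. Defaulting to "slow"
// when the field is absent is an assumption of this sketch.
func laneOf(event []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(event, &env); err != nil {
		return "", err
	}
	if env.Meta.Lane == "" {
		return "slow", nil
	}
	return env.Meta.Lane, nil
}

func main() {
	event := []byte(`{"meta":{"lane":"fast"},"payload":{"title":"Breaking","body":"..."}}`)
	lane, err := laneOf(event)
	if err != nil {
		fmt.Println("bad event:", err)
		return
	}
	fmt.Println("route to:", lane)
}
```

The Go version needs a struct that mirrors the envelope, whereas a JSON Pointer lookup needs only a path string, which illustrates why the team found the JavaScript Lambdas simpler.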

Retrospective of Go

After implementing the Content Platform and supporting it in production, if I were to run a retrospective on Go and the Content Platform, my feedback would be as follows:

What went well?

  • Key language design elements for distributed systems.
  • A concurrency model that is relatively easy to implement.
  • An enjoyable coding experience and a fun community.

What can be improved?

  • Further progress on standards for versioning and vendoring.
  • Lack of maturity in some areas.
  • Verbosity for specific use cases.

Overall, it was a positive experience, and Go is one of the most important elements that allowed the Content Platform to scale. Go will not always be the right tool, and that is fine. The Economist has a polyglot platform and uses different languages where it makes sense. Go will probably never be the best choice when you need to manipulate blobs of text and dynamic content, so JavaScript is still in the toolbox. However, Go's strengths are the foundation that allows the system to scale and grow.
When considering whether Go is right for you, consider the key questions of system design:

  • What are the tasks of your system?
  • What guarantees do you provide to your consumers?
  • What architecture and patterns are appropriate for your system?
  • How should your system scale?

If you are developing a system that addresses the challenges of distributed data, asynchronous workflows, and high performance and scalability, I recommend that you consider Go and what it can do to help you reach your system's goals.
