Evernote Service Architecture Overview

    [Diagram: overview of the Evernote service architecture]
    As promised, we are starting to translate posts from the English-language Evernote Tech Blog, in which our engineers and developers discuss details of the service's technical implementation, share stories, and relate interesting facts from their work.

    Today we would like to begin our review of the Evernote architecture with a general overview of the service's structure. We won't go into detail about each component here, leaving that for future posts.

    Let's start with the diagram above. All statistics shown there are as of May 17, 2011.

    Network: Almost all traffic to and from Evernote goes through www.evernote.com via HTTPS on port 443. This includes all web activity as well as synchronization from all client software through our Thrift-based API. In total, that amounts to about 150 million HTTPS requests per day, with peak traffic of around 250 Mbit/s. Our night shift of administrators is especially "pleased" by the fact that the daily peak hits at about 6:30 am Pacific time.
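    To make the "Thrift over HTTPS" idea concrete, here is a minimal sketch of how a generic Apache Thrift client opens an HTTPS transport to a service endpoint. The endpoint path below is a placeholder chosen for illustration, and the actual generated client stubs are omitted rather than guessed at.

        import org.apache.thrift.protocol.TBinaryProtocol;
        import org.apache.thrift.transport.THttpClient;
        import org.apache.thrift.transport.TTransportException;

        public class ThriftHttpsExample {
            public static void main(String[] args) throws TTransportException {
                // All client traffic goes over HTTPS (port 443); the service path
                // below is a placeholder, not a documented Evernote endpoint.
                THttpClient transport =
                        new THttpClient("https://www.evernote.com/edam/user");
                TBinaryProtocol protocol = new TBinaryProtocol(transport);

                // A generated Thrift stub would be constructed around `protocol`
                // here and used to make synchronization calls; it is omitted to
                // avoid guessing at exact method signatures.
                transport.close();
            }
        }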

    We use BGP to route traffic directly over fully independent links from our primary (NTT) and secondary (Level 3) providers. Traffic is filtered through Vyatta on its way to the A10 load balancers that we installed in January, when we hit the SSL performance limit of our old balancers. We comfortably handle current traffic with a single AX 2500 plus a failover box, but we are preparing to test an N+1 cluster configuration to be ready for future growth.

    Shards: The core of the Evernote service is a farm of servers that we call shards. Each shard handles all data and all traffic (web and API) for a cohort of 100,000 Evernote users. Since we already have more than 9 million users, that works out to about 90 shards.
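    The post doesn't spell out how a user id maps to a shard, so the following is only a sketch under the assumption that users are grouped into fixed cohorts of 100,000 sequential ids; the real assignment logic may differ.

        /**
         * Illustrative sketch only: maps a user id to a shard index, assuming
         * users are assigned in sequential cohorts of 100,000.
         */
        public final class ShardMath {
            private static final int USERS_PER_SHARD = 100000;

            public static int shardFor(long userId) {
                // Users 1..100000 -> shard 1, 100001..200000 -> shard 2, and so on.
                return (int) ((userId - 1) / USERS_PER_SHARD) + 1;
            }

            public static void main(String[] args) {
                // With just over 9 million users this yields roughly 90 shards.
                System.out.println(shardFor(9000000L)); // shard 90
                System.out.println(shardFor(42L));      // shard 1
            }
        }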

    Each shard is a pair of SuperMicro servers with dual-core Intel processors, a large amount of RAM, and a full set of enterprise Seagate drives configured in mirrored RAID arrays. On each physical server, a Debian host manages a pair of Xen virtual machines. The primary virtual machine runs the core application stack: Debian + Java 6 + Tomcat + Hibernate + Ehcache + Stripes + GWT + MySQL (for metadata) + hierarchical local file systems (for file data).

    All user data on the primary virtual machine of one server is synchronously replicated to the secondary virtual machine on another server using DRBD. This means that every byte of user data is stored on at least four different enterprise drives across two different physical servers, on top of the nightly backup. If we have any problem with a server, we can fail over from the primary virtual machine to the secondary one on the other server with minimal downtime, using Heartbeat.

    Since each user's data is stored entirely locally on one (virtual) shard host, we can run each shard as an independent island, with virtually no cross-talk and no dependence on the rest of the farm. It also means that problems on one shard do not affect the others.

    Most of the work of directing users to their shards falls to the load balancers, which have a set of rules for finding the right shard from URLs and/or cookies.
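    As a hedged illustration of that routing step, here is the kind of lookup a balancer (or a fronting servlet) could perform: extract a shard identifier from a URL path or from a cookie. The "/shard/s<N>/" path form and the "shard" cookie name are assumptions made for the example, not Evernote's actual rules.

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        /** Illustrative shard-routing sketch; path and cookie names are assumed. */
        public final class ShardRouter {
            private static final Pattern SHARD_PATH = Pattern.compile("^/shard/s(\\d+)/");

            /** Extracts a shard id from the request path, or -1 if none is present. */
            public static int shardFromPath(String path) {
                Matcher m = SHARD_PATH.matcher(path);
                return m.find() ? Integer.parseInt(m.group(1)) : -1;
            }

            /** Extracts a shard id from a raw Cookie header, or -1 if none is present. */
            public static int shardFromCookies(String cookieHeader) {
                if (cookieHeader == null) return -1;
                for (String pair : cookieHeader.split(";")) {
                    String[] kv = pair.trim().split("=", 2);
                    if (kv.length == 2 && kv[0].equals("shard")) {
                        String digits = kv[1].replaceAll("\\D", "");
                        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
                    }
                }
                return -1;
            }

            public static void main(String[] args) {
                System.out.println(shardFromPath("/shard/s42/notestore"));  // 42
                System.out.println(shardFromCookies("shard=s17; lang=en")); // 17
            }
        }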

    UserStore: Although the vast majority of data lives on the logically independent shards (the "NoteStore"), they are all tied to a single "UserStore" account database (also running on MySQL) that holds a small amount of information about each account, such as the username, the MD5 hash of the password, and the user's shard identifier. This database is small enough to fit in RAM, but we still give it the same level of redundancy, with a similar combination of mirrored RAID, DRBD replication to a secondary, and nightly backups.
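    As a minimal sketch of the role this central account database plays, here is a JDBC lookup of a user's shard id by username. The table and column names ("users", "username", "shard_id") are invented for the example; the real schema is not described in the post.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.sql.SQLException;

        /** Illustrative UserStore lookup; schema names are assumptions. */
        public final class UserStoreLookup {
            public static int shardIdFor(String jdbcUrl, String username) throws SQLException {
                Connection conn = DriverManager.getConnection(jdbcUrl);
                try {
                    PreparedStatement stmt =
                            conn.prepareStatement("SELECT shard_id FROM users WHERE username = ?");
                    stmt.setString(1, username);
                    ResultSet rs = stmt.executeQuery();
                    // Return the user's shard, or -1 if the account does not exist.
                    return rs.next() ? rs.getInt("shard_id") : -1;
                } finally {
                    conn.close();
                }
            }
        }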

    Image Recognition Farm: To make words inside the images in your notes searchable, we have a pool of 28 servers whose 8-core processors chew through new images every day. On busy days the flow reaches 1.3 to 1.4 million individual images. The farm currently runs a mix of Linux and Windows, but we plan to complete a migration to pure Debian in the coming days to get rid of unnecessary dependencies.

    These servers run the recognition process we call AIR ("Advanced Imaging and Recognition"), with software developed by our R&D team. It processes each image, identifies regions containing text, and then tries to compile a weighted list of possible matches for each word, using "recognition engines" that each produce their own sets of guesses. This work relies both on engines built by our own AIR team (for example, handwriting recognition) and on licensed, best-in-class technologies from commercial partners.
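    To illustrate what a "weighted list of possible matches for each word" might look like in code, here is a hedged sketch of a candidate structure. The field names and the 0-100 weight scale are assumptions for the example, not the actual AIR output format.

        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.Comparator;
        import java.util.List;

        /** Sketch of a weighted candidate list for one text region in an image. */
        public final class RecognizedRegion {
            public static final class Candidate {
                final String word;
                final int weight; // assumed 0-100 confidence score

                Candidate(String word, int weight) {
                    this.word = word;
                    this.weight = weight;
                }
            }

            private final List<Candidate> candidates = new ArrayList<Candidate>();

            /** Each recognition engine contributes its own guesses for this region. */
            public void addCandidate(String word, int weight) {
                candidates.add(new Candidate(word, weight));
            }

            /** Candidates ordered from most to least likely, e.g. for search indexing. */
            public List<Candidate> rankedCandidates() {
                List<Candidate> copy = new ArrayList<Candidate>(candidates);
                Collections.sort(copy, new Comparator<Candidate>() {
                    public int compare(Candidate a, Candidate b) {
                        return b.weight - a.weight;
                    }
                });
                return copy;
            }
        }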

    Other services: All of the servers described above are installed in pairs in racks at our data center in Santa Clara, California. Besides the hardware that runs the core service, we also have small groups of servers for simpler tasks that need only one or two Xen servers or virtual machines. For example, our SMTP gateway for incoming mail runs on a pair of Debian servers with Postfix and a custom Java mail processor built on top of Dwarf. Our Twitter gateway, which lets you send notes to your account via @myen, is just a small script using twitter4j.
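    As a hedged sketch of the Twitter-gateway idea (not Evernote's actual script), here is how twitter4j can poll mentions of an account so each mention's text could later be turned into a note. Credentials are assumed to be supplied via twitter4j.properties, and saving the note itself is only stubbed with a print statement.

        import twitter4j.ResponseList;
        import twitter4j.Status;
        import twitter4j.Twitter;
        import twitter4j.TwitterException;
        import twitter4j.TwitterFactory;

        /** Minimal @myen-style gateway sketch using the twitter4j mentions API. */
        public class TwitterGatewaySketch {
            public static void main(String[] args) throws TwitterException {
                Twitter twitter = TwitterFactory.getSingleton();

                // Fetch recent tweets that mention the gateway account.
                ResponseList<Status> mentions = twitter.getMentionsTimeline();
                for (Status status : mentions) {
                    // In the real service this text would be stored as a note in the
                    // sender's account; here we only show what would be captured.
                    System.out.println("@" + status.getUser().getScreenName()
                            + ": " + status.getText());
                }
            }
        }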

    Our corporate website runs on Apache, and the blogs run on WordPress. Most of our internal redundancy topology is built on HP hardware. We use Puppet for configuration management, and Zabbix, Opsview, and AlertSite for monitoring. Nightly backups are handled by a combination of tools that push data over a dedicated gigabit link to a backup data center.

    Wait, why? We realize this post raises plenty of obvious questions about why we chose X rather than Y in various situations. Why run our own servers instead of using a cloud provider? Why stick with "old" software (Java, SQL, local storage, etc.) instead of the latest fashionable recipes?

    We will try to answer these questions in more detail in the next few months.
