Netflix best practices when migrating from Oracle DB to Amazon SimpleDB

Original author: Siddharth “Sid” Anand
  • Transfer
This is a partial translation of a Netflix article that addresses only the problem areas for switching from Oracle DB to Amazon SimpleDB and how the company can solve them.

Around the end of 2008, Netflix had only one data center. And this DC raised several questions for us. As the only point of failure, for example, due to problems with electricity, it could lead to dissatisfaction of our users with the service. In addition, with the simultaneous growth of streaming traffic and subscriptions to services, Netflix would soon outgrow this data center - we saw the inevitable need for electricity, better cooling, we needed more space and more equipment.

Alternatively, it was possible to build new data centers. However, in addition to high costs, this effort would lead to the fact that our technical staff could not deal with new products, so they would be busy expanding DC. In addition, we understood that managing multiple data centers is a difficult task. The construction and support of several data centers seemed to us a dangerous distraction from our core business.

Instead of embarking on this path, we have chosen a more radical one. We switched to the IAAS (infrastructure as a service) solution, which was offered at that time by the Amazon web service. With many data centers already operating, multiple redundancy levels for various services (such as S3 and SimpleDB), AWS promised better availability and scalability in a relatively short time.

Outsourcing various network and background tasks, Netflix focused on its core business, the supply of films and series.

In the transition to AWS, we formulated a set of best practices for working with AP systems such as SimpleDB.

Leaving the DBMS behind


Partial or nonexistent SQL support. In general, SimpleDB supports sub-SQL

  • Use GROUP BY and JOIN operations at the application level.
  • One way to avoid having to use JOIN is to denormalize multiple tables into one SimpleDB logical domain

Lack of connections between domains

  • Implement application level communications

Lack of transactions

  • Use the SimpleDB API: ConditionalPut and Conditional Delete

No triggers

  • It is possible to do without them

There is no circuit support - moreover, this is not obvious. A query with an invalid attribute name does not result in an error

  • Implementation of circuit validation at the application data access level.

Lack of sequence support

Sequences are often used as primary keys.

  • In this case, it is necessary to use a natural unique key, for example, in the customer contact domain, use the client’s mobile phone as the key.
  • If there are no natural keys, you must use the UUID.

Sequences are also often used for order numbers.

  • Use a distributed sequence generator.

No operations to work with time

  • You can do without them

There is no support for restrictions, in particular, there are no restrictions on the uniqueness of a field, there is no control of a foreign key, there are no integrity restrictions.

  • An application can check for limitations while reading data and fix the problem after the fact. This is called read-repair. Reading recovery must use the ConditionalPut or ConditionalDelete API so that the changes are atomic.

New challenges in SimpleDB


In addition, there were features that we encountered that were specific to SimpleDB. Here is some of them:

SimpleDB domains provide maximum write speed if you split the data into multiple domains.

  • All Netflix data with significant write load was distributed across multiple domains.

There is no support for native data types. All data is stored and processed as strings.

All comparisons (i.e. WHERE conditions) and sorting occurs only with strings.

  • Store all dates in ISO 8601
  • Add zeros in front of numbers used in sorting and / or WHERE comparisons.

Two separate API calls for DeleteAtributes and PutAttributes.

How to perform an atomic operation that requires both deleting one attribute and updating another attribute on the same line?

  • The easiest option is to use pseudo zeros (for example, the word NULL) instead of the DeleteAttributes operation.
  • This negates the optimization of the free space of SimpleDB tables and leads to data bloating.

Using case-sensitive domain names and attributes

In many DBMSs, table and column names are case insensitive, while in SimpleDB, on the contrary, put, delete, and select operations may work incorrectly without even reporting an error. The programmer's task is to detect all cases of name case mismatch.

  • Accept an agreement whereby only domain names and uppercase names can be used.
  • In the application at the data access level, it is necessary to ensure automatic casting to upper case.

Typos in the select, put, or delete attribute names may end without error notification.

Unlike case sensitivity, this problem arises from the lack of circuit validation. Simple DB is not only a sparse database, but also without schema support.

  • Implement a single point of access to the layer data in the application and implement verification there.

If you forget to specify LIMIT in select, it may take several queries to get all the data.

Multiple requests reduce the likelihood that the site will receive all the data for the required time interval.

  • Имя единую точку доступа к данным на уровне приложения, можно устанавливать LIMIT непосредственно в нем.
  • Максимальное значение может в любой момент быть изменено, если Amazon увеличит лимит.

Необходимо также иметь ввиду проблемы «целостности в конечном итоге» (eventual consistency).

Необходимо избегать анти-паттерна чтение непосредственно после записи.

  • Избегайте чтения непосредственно после записи.
  • Если это невозможно используйте ConsistentRead

Неиндексированные запросы могут быть очень дорогими

Анти-паттерн: SELECT * FROM MY_DOMAIN WHERE MY_ATTR_1 IS NULL

  • Используйте отдельный флаговый атрибут со строковым значением TRUE или FALSE вместо проверки на NULL. Таким образом, запрос будет использовать индекс: SELECT * FROM MY_DOMAIN WHERE MY_ATTR_1 = ‘FALSE’

Некоторые запросы работают медленно, хотя и проиндексированы.

Index selectivity affects speed just like other SQL engines

  • As in other databases, the performance of queries to select data is determined by the selectivity of the indices specified in WHERE. Make sure that you understand the selectivity of your indexes and that your WHERE expression contains the best of them.

As with any other multi-user system, significant jumps in response time can occur.

  • Protect your application from jumps that occur in SimpleDB or S3 with a caching frontend such as MemCached

Having decided to switch to SimpleDB and S3, Netflix quickly migrated to the cloud infrastructure, fully equipped with new aggressive plans for product launches and increased traffic. There were problems, but we managed with almost everyone.

Also popular now: