TipTop December 29, 2010 at 19:59

Google App Engine Data Storage

This article is based on a post from Nick Johnson's blog. It adds a few figures that are current at the time of writing, along with some notes.

App Engine provides many ways to store information. Some (for example, the datastore) are well known, while others are almost unknown, and all of them have different characteristics. This article lists the options and describes the advantages and disadvantages of each, so that you can make better-informed decisions about data storage.

Datastore

The best-known, most widely used and most flexible storage option. The datastore is App Engine's non-relational database. It provides reliable long-term storage and the greatest flexibility in storing, retrieving and processing data.

Advantages:
- Reliable - data is stored durably and for the long term.
- Read and write - applications can both read and write data in the datastore. The datastore also provides transactions to ensure integrity.
- Consistent - all instances of the application see the same stored data.
- Flexible - queries and indexing provide many ways to query and retrieve data.
Disadvantages:
- Relatively slow - because the datastore stores data on disk and guarantees reliability, a write has to wait for confirmation that the data has been saved, and a read has to fetch the data from disk.
How and where to use:

A dedicated description of the datastore is here.
Use the datastore wherever the application must reliably save data that it will need later.
Where it is better not to use:

Developers often write debugging and technical information that only they need to the datastore. The built-in App Engine logs, discussed below, are much better suited to this.

Memcache

Memcache is known as a "secondary" storage mechanism. The memcache API lets applications optimistically cache data to avoid costly operations. Memcache is often used as a caching layer in front of other APIs, such as the datastore, or for caching the results of calculations.

Advantages:
- Fast - memcache access time is usually a few milliseconds.
- Consistent - all instances of the application see the same data in memcache. Memcache also provides atomic operations, so applications can maintain the integrity of the data stored in it.
Disadvantages:
- Unreliable - data can be deleted from memcache at any time.
- Not always available - memcache is not available during App Engine maintenance periods.
How and where to use:

As a cache for datastore data, urlfetch responses or calculation results.
Where it is better not to use:

For storing important data. Remember that data can disappear from memcache at any time.
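
A minimal sketch of using memcache as a cache in front of a slower source such as the datastore. The names `FakeMemcache`, `get_profile` and `load_from_datastore` are hypothetical and exist only for illustration; the stand-in class is there so the example runs outside App Engine. On App Engine the corresponding calls would be `memcache.get(key)` and `memcache.set(key, value, time=...)` from `google.appengine.api`.

```python
import time


class FakeMemcache(object):
    """Stand-in for a memcache client (an assumption, not the real API)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.time() + ttl
        self._data[key] = (value, expires_at)

    def flush(self):
        # Simulates eviction: cached data may vanish at any time.
        self._data.clear()


def get_profile(cache, user_id, load_from_datastore):
    """Return a profile, reading the slow source only on a cache miss."""
    key = "profile:%s" % user_id
    profile = cache.get(key)
    if profile is None:
        # Miss (or evicted): fall back to the reliable store and re-cache.
        profile = load_from_datastore(user_id)
        cache.set(key, profile, ttl=300)
    return profile
```

Because every read falls back to the reliable store on a miss, losing the cached copy costs only time, never data. Note that this sketch cannot cache a value of `None`, since `None` is also how a miss is reported.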

Instance memory

Application instances can also cache data using global variables or class members. This method provides the highest speed, but has some disadvantages.

Advantages:
- Fast - literally as fast as possible, because the data is stored in the same process that requests it.
- Convenient - there is no need for an API; data is simply stored in global variables or in class members.
- Flexible - data can be stored in any format your program can process. No serialization or deserialization is needed.
Disadvantages:
- Unreliable - instances can start or stop at any time, so applications should only use instance memory as a cache.
- Inconsistent - each instance has its own environment and, therefore, its own global variables. Changes made in one instance are not reflected in other instances.
- Limited capacity - instances have a limit on memory consumption, after which they are destroyed. In practice, the limit for data held in instance memory is about 50 MB; with more than that, instances will be destroyed very often.
How and where to use:

For caching frequently used and rarely changed data: session information, application settings, pages shown to guests, etc. Keeping this data in dict variables is especially convenient, because you can create a separate key-value store for each type of data.
Where it is better not to use:

For caching frequently modified data or data that the user interacts with. Different requests from one user can be handled by different instances, so caching in this case will cause significant confusion.

Blobstore

The blobstore lets you easily and efficiently store and serve large amounts of data uploaded by the user.

Advantages:
- Supports large files - up to 2 GB per blob.
- Eliminates the need to write handlers.
- Provides a mechanism for high-performance serving of blobs, especially images.
- Applications can read blobs as if they were local files.
Disadvantages:
- ~~Read only - the application cannot create blobs or modify already loaded ones.~~ On March 30, 2011, the Files API appeared in App Engine, so applications can now write data to the blobstore.
- To use blobstore, you must enable billing.
How and where to use:

For storing user-uploaded images, files and other large objects.
Where it is better not to use:

For small files that the application itself needs to work with, a BlobProperty in the datastore is a better fit.

Local files

An application can read any file that was uploaded with the application and not marked as static content, using standard file system operations. This provides read-only data that the application might need.

Advantages:
- Fast - reading local files involves only standard disk operations on the machine on which the application instance is running, so the speed is almost the same as memcache.
- Reliable - if the application works, then local files are always available.
- Flexible - you can use any format or mechanism for accessing local files.
Disadvantages:
- Read only - applications cannot modify files.
- Limited size - the limits are 10 MB per file and 150 MB per application.
How and where to use:

Storage of application settings, templates, etc.
Where it is better not to use:

No particular restrictions noted.
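
As an illustration of combining local files with instance memory, here is a hedged sketch that reads a JSON settings file once and then reuses the parsed result. The function `load_settings` and the idea of a JSON settings file are assumptions made for the example; any file format would work the same way.

```python
import json
import os

# Parsed files are kept in instance memory, keyed by absolute path.
_LOADED = {}


def load_settings(path):
    """Read a JSON file shipped with the application, at most once."""
    path = os.path.abspath(path)
    if path not in _LOADED:
        # Local files are read-only, so the parsed copy never goes stale
        # during the lifetime of a deployed version.
        with open(path) as settings_file:
            _LOADED[path] = json.load(settings_file)
    return _LOADED[path]
```

Since the file cannot change while the application is running, there is no need for an expiry time here.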

Task queue payloads

This is not storage in the traditional sense, but data can be attached to taskqueue tasks, which can eliminate the need for other storage systems.

Advantages:
- Fast - data is sent to the task when it starts, so no additional API calls are required to receive the data.
- Used correctly, avoids the need to store data elsewhere.
Disadvantages:
- Only for one task - the payload is only useful as storage for data sent to the taskqueue.
- Limited size - the size of a task, including its payload, must not exceed 10 KB.
How and where to use:

Background processing of data, sending mail, updating the cache: any work that can be moved to the background to speed up the response to the user without changing the content of that response.
Where it is better not to use:

Processing more than 10 KB of data requires other storage methods. Also remember that in some cases taskqueue tasks can run after a significant delay.
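
A sketch of deciding whether data fits into a task payload, using the 10 KB figure quoted above. The names `build_task_payload`, `plan_task` and `save_elsewhere` are hypothetical, as is the 1 KB of headroom reserved for the rest of the task; the code only prepares the payload and does not call any queue API.

```python
import json

# 10 KB is the whole-task limit quoted in the article. The 1 KB of
# headroom for the URL and headers is an assumption for this sketch.
MAX_PAYLOAD_BYTES = 10 * 1024 - 1024


def build_task_payload(data):
    """Return the serialized payload, or None if it is too large."""
    payload = json.dumps(data, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        return None
    return payload


def plan_task(data, save_elsewhere):
    """Send small data with the task; send only a key for large data."""
    payload = build_task_payload(data)
    if payload is not None:
        return {"payload": payload}
    # Too big for a task: store it (for example in the datastore) and
    # pass only the key, so the task can load the data when it runs.
    key = save_elsewhere(data)
    return {"payload": json.dumps({"key": key}).encode("utf-8")}
```

Passing a key instead of the data costs the task one extra read, which is the API call the payload approach otherwise avoids.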

Email

In App Engine, email can be used not only to communicate with users, but also for technical purposes. In this case, the method of transferring data is similar to using taskqueue payload, but using email provides more options, for example, transferring data to another App Engine application.

Advantages:
- Flexible - you can send large volumes by sending "regular" mail or send "admin" mail without affecting mail quotas.
- Convenient - an incoming message arrives as a POST request, and the standard InboundMailHandler makes it easy to process.
- The ability to exchange data between applications.
Disadvantages:
- Spam - unsolicited emails may arrive at the application's address, so incoming data requires additional verification.
- Different sending methods are needed depending on the amount of data - messages to administrators can be no larger than 16 KB, and sending regular messages is relatively expensive.
- To make full use of "admin" mail, it is advisable to enable billing. An application with billing enabled can send 3,492,979 messages per day to administrators, compared with only 5,000 with billing disabled.
- Registering the application's address as an administrator is non-trivial - a temporary handler is required to create a Google account for that address and add it to the list of administrators.
How and where to use:

Transferring small amounts of data (up to 16 KB) between applications.
Where it is better not to use:

Frequent transfer of large amounts of data will quickly use up the mail quota. URLFetch is more suitable for this purpose.

URLfetch

The URL retrieval API allows you to receive information from other hosts using HTTP and HTTPS requests.

Advantages:
- The ability to receive data from other applications / servers.
- Asynchronous - while an asynchronous fetch is in progress, the application can perform other work.
- Size - the application can receive up to 32 MB in one request, but can send no more than 1 MB through this API.
Disadvantages:
- Speed depends on the speed of another host.
- Traffic - URLFetch and user traffic share a single quota. Excessive use of URLFetch can lead to denial of service for users.
How and where to use:

Background downloading and processing of data such as RSS feeds. Interaction with third-party services, for example reCaptcha.
Where it is better not to use:

For fetching data while the user waits, when a faster method is available.

Application Logs

This method is usually undeservedly forgotten, and the datastore is used instead to collect information about how the application is running. However, if you do not want to reduce application performance while collecting technical and debugging information, logs are a much better choice.

Advantages:
- Fast - logging takes a few milliseconds.
- Convenient separation of messages by priority.
Disadvantages:
- Write only - the application does not have access to the logs.
- Text only - and Cyrillic text in the logs can cause errors.
- Parsing required - the request_logs function in the developer tools returns logs only as text, so a separate parser is needed to process them.
How and where to use:

To collect information about how the application is running, measure request execution time, and flag slow functions or emergency situations.
Where it is better not to use:

In some cases it makes more sense to store application statistics in the datastore. In such cases, it is better to pass the data to a taskqueue task and write it to the datastore in the background.
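
A hedged sketch of measuring execution time and flagging slow functions through logging, using message priorities. It relies only on Python's standard `logging` module; the decorator name `log_timing`, the `slow_after` threshold and the `clock` parameter are assumptions introduced for the example.

```python
import functools
import logging
import time


def log_timing(slow_after=1.0, clock=time.time):
    """Log how long a function takes; warn when it exceeds slow_after."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = clock()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = clock() - started
                # Priority separates routine timings from slow calls.
                if elapsed >= slow_after:
                    level = logging.WARNING
                else:
                    level = logging.DEBUG
                logging.log(level, "%s took %.3f s", func.__name__, elapsed)
        return wrapper
    return decorator
```

Applied as `@log_timing(slow_after=0.5)` above a handler function, this writes one log line per call and keeps timing data out of the datastore entirely.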

Conclusion

App Engine provides many more ways to store data than it seems at first glance. Each of them has its own trade-offs, so it is likely that one (or more) of them will suit your application. Often the optimal solution combines several methods, for example, datastore and memcache, or local files and instance memory.
