Zav February 15, 2016 at 17:16

You make money from information (why you need an API and how to design it properly)

Hello, my name is Alexander Zelenin and I am a web developer.
Information is the basis of any application or service.

More than 10 years ago, I talked with the owner of a poker room, and he showed me a page that brought in about $10,000 a day. It was a completely unremarkable page: no styles, no graphics, just plain text broken up by headings, sections, and links. I simply could not understand how it could bring in that kind of money.

The secret was that it was one of the first comprehensive guides to playing poker online. The page had a PageRank of 10/10 (or 9, that's not the point), and it was the first result in search.

The purpose of your application, whatever it is, is to deliver information to the user (and to receive and process it).

Online store: product information, methods of purchase and delivery.

Even if the site is terrible, ugly and inconvenient, users will still find the product they are looking for, especially if you are selling something fairly unique (at least in your area). Search engines will help too, taking the user straight to the right product.

Of course, conversion may be lower, or the user may not enjoy using the site, but if the product is exactly what he was looking for, everything else matters little.

I am not considering stores that sell on impulse, or purchases the user may later regret.

Online multiplayer game: information about the player, friends and the world around him

Examples may vary depending on the genre and other parameters, but in general the user is interested in such things as world history, correspondence / communication with allies, information about current events, information about his character / village / ship / anything else.

Very often, the way to access this information goes beyond the boundaries of the game client itself. Using a mobile application, you can check whether someone is attacking you, or put up some goods at an in-game auction without even entering the game itself.

Music streaming service - meta-information + music files

The user wants to find the music he is interested in. The packaging, smart queues, licensing and other trappings are of little interest to anyone.

Of course, it is good to use licensed content, but if the user cannot find what he was looking for, he will leave and find it elsewhere. On the Internet, people do not remember information as such; they remember where they found it. So if your site does not have songs by band X but does link to the page where band X sells its albums, your service still gains: the user remembers where he got the information about band X and will come back to you to look for band Y.

I have worked on several music projects, and very often everything hinged on having the necessary tracks available, even with dozens of terabytes of data.

Video service - videos

At some point, YouTube gained a critical mass of videos and became the market leader. Its site was not the most convenient and its terms were not the best; a lot was wrong in general, but the sheer abundance of content attracted visitors, which in turn brought in even more content.

I think you already got the idea. Examples can be cited endlessly (here's another: they don’t go to Wikipedia for design. Moreover, some of the information from Wikipedia is displayed immediately in the search results, without even opening the site itself), and if you think this is not applicable in your case, write in the comments (or by mail / in PM), and I will explain why it is still applicable.

So whatever you do, information always comes first. Users will find good, high-quality information, and they will come to you for it.

I'll tell you how to organize work with information so that it is:
1. Scalable: replication, sharding, etc. are configured WITHOUT touching the application.
2. Convenient for users: easy to document and easy to understand how to use.
3. Convenient for your developers: rapid prototyping, with optimization only where it is actually needed.

This approach does not make sense for you if you have a small project with a small number of components and developers.

Table of contents:

Information Consumers
How to work with information (API)
API inner layer
API outer layer
Heavy Duty Optimization
Users
Scaling
Caching
Versioning
Total

Assumptions / Lack of Information

In this article, I make a number of assumptions: for example, that some parts of your system are already implemented or set up.

Practically every topic here could be explored endlessly; a detailed treatment would fill a whole series of articles like this one. Where I left information out, I judged it not essential for grasping the concept.

If you think something is still missing, please let me know, and I will expand the article accordingly.

Information Consumers

Information consumers can be divided into two categories - internal and external.

Internal consumers are your own products and services. The difference is that for your own products the API can provide much broader functionality with fewer restrictions. For example, Google Maps on Google's own domain could use WebGL and was therefore much faster, while the embedded version could not (this may have changed since; I have not checked).

External consumers are end users or products not owned by your company. For example, for Google Maps you are an external user. Access to information from outside is usually severely limited, and special access keys are often required.

How to work with information?

To work with information, we will provide a web API with two layers (external and internal). Each layer is at least a separate application.

Why do I need an API?

An API lets you provide data in a platform-independent form. It is not always known how and where the data will be used, and building an API is a good way to say "we have information, contact us."

This makes it possible to use the data however the final product is implemented (including offline applications, provided they can reach the network at least once). All code examples below are just one of many possible implementations.

First of all, you need to describe the models and data collections. If the application is written in JavaScript (Node.js on the server), the same models can be used on both clients and servers.

A model describes a certain entity (for example, a music track): its fields, their properties, and the ways information is accessed and provided. A model can mirror a database schema, but it can also extend it with additional information. Moreover, a model can combine information from several tables/collections and present it as a single entity. On the server, the model is extended with descriptions of how to work with tables, server access, and so on. On the client, the model is extended with the addresses used to access the data.

When data is accessed, the model may also carry meta-information about the request (execution time, the record's position in the database, relations) and virtual fields. For example, if the database stores path, the relative path to the file, you can add a virtual url field that is computed on the fly.
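A minimal sketch of such a virtual field in plain JavaScript, independent of the Model.extend API used in this article. The base address FILES_BASE_URL and the helper withVirtualUrl are hypothetical, and the record is assumed to store the relative path in a field named path:

```javascript
// Hypothetical base address of the file server (an assumption for illustration).
const FILES_BASE_URL = 'https://files.example.com';

// Returns a copy of a raw database record with a virtual `url` field added.
// The field is not stored anywhere: it is computed from `path` on every read.
function withVirtualUrl(record) {
	return Object.defineProperty({ ...record }, 'url', {
		enumerable: true, // include the field when the object is serialized to JSON
		get() {
			// Drop leading slashes so the result has exactly one separator
			const relative = String(this.path || '').replace(/^\/+/, '');
			return FILES_BASE_URL + '/' + relative;
		}
	});
}

const track = withVirtualUrl({ id: 856, path: 'music/856.mp3' });
console.log(track.url); // https://files.example.com/music/856.mp3
```

Because the value is computed in a getter, changing path immediately changes url, and nothing extra has to be kept in sync in the database.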

As an example, here is code describing a music service.

The examples are in JavaScript, but everything described applies to any language; I have done similar things in PHP, Python, and C++. The details should be adapted to the size of the project.

Model.extend('Track', { // Model name
	attributes: {
		id: 'integer', // Field and its type
		title: 'string',
		url: 'string', // Link/path to the file
		duration: 'integer',
		album: 'app.model.Album.model', // Relation to other models
		artist: 'app.model.Artist.model'
	}
})

Data validation

To avoid cluttering the code, I omit detailed descriptions of the checks in the examples. If desired, you can specify any number of criteria, validation error texts, etc. Validation applies on both clients and servers.

An example:

Model.extend('Track', {
	attributes: {
		...
		title: {
			type: {
				value: 'string',
				errorText: 'Must be of type "String"'
			},
			required: {
				value: true,
				errorText: 'This field is required'
			},
			length: {
				min: 5,
				max: 32,
				errorText: 'Field length must be between 5 and 32 characters'
			}
		}
		...
	}
})
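To show how rules in this shape could be applied, here is a minimal, hypothetical validator (validateField is not part of any library). It handles only type, required and length, and it compares type with JavaScript's typeof, so a type such as 'integer' would need extra handling:

```javascript
// Checks one value against a rule object shaped like the `title` rules above
// and returns the list of error texts (empty if the value is valid).
function validateField(value, rules) {
	const errors = [];
	const isEmpty = value === undefined || value === null || value === '';

	if (isEmpty) {
		// An empty value only fails if the field is required
		if (rules.required && rules.required.value) errors.push(rules.required.errorText);
		return errors;
	}
	// `type.value` is compared with JavaScript's typeof (e.g. 'string', 'number')
	if (rules.type && typeof value !== rules.type.value) {
		errors.push(rules.type.errorText);
		return errors; // further checks make no sense for a wrong type
	}
	if (rules.length) {
		const { min = 0, max = Infinity, errorText } = rules.length;
		if (value.length < min || value.length > max) errors.push(errorText);
	}
	return errors;
}

const titleRules = {
	type: { value: 'string', errorText: 'Must be of type "String"' },
	required: { value: true, errorText: 'This field is required' },
	length: { min: 5, max: 32, errorText: 'Field length must be between 5 and 32 characters' }
};

console.log(validateField('Hi', titleRules)); // [ 'Field length must be between 5 and 32 characters' ]
console.log(validateField('A great track', titleRules)); // []
```

Since the function takes only plain data, the same code can run on the client before sending a request and on the server before writing to the database.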

A collection is a set of entities (usually of the same kind, for example, music tracks). It may also contain additional data about the set itself. Meta-information can include the number of selected tracks, the number of remaining (not selected) tracks, the page number, and the number of pages. A virtual field can be, for example, the total duration of all tracks.

Model.List.extend('Track.List', { // Collection name
	attributes: {
		duration: 'virtual' // Virtual field, computed at request time
	}
}, {
	duration: function(tracks) { // One possible implementation of the virtual field
		// Start the sum at 0: without an initial value, reduce would use
		// the first track object itself as the accumulator
		return _.reduce(tracks, function(totalDuration, track) {
			return totalDuration + (track.duration || 0);
		}, 0);
	}
});
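The meta-information mentioned above could be assembled roughly as follows. This is a sketch under assumptions: buildTrackList and its field names are illustrative, and the slicing done here in memory would normally happen in the database query itself:

```javascript
// Builds a collection response for one page of tracks: the selected items plus
// meta-information about the selection. Field names are illustrative assumptions.
function buildTrackList(allTracks, offset, limit) {
	if (!(limit > 0)) throw new RangeError('limit must be positive');
	const tracks = allTracks.slice(offset, offset + limit);
	const totalCount = allTracks.length;
	return {
		tracks,
		count: tracks.length, // selected in this response
		remaining: totalCount - tracks.length, // found but not selected
		totalCount, // total found
		page: Math.floor(offset / limit) + 1, // 1-based page number
		pages: Math.ceil(totalCount / limit), // number of pages
		// Virtual field: total duration, computed at request time
		duration: tracks.reduce((total, track) => total + (track.duration || 0), 0)
	};
}
```

With 12 matching tracks, an offset of 5 and a limit of 10, this selects 7 tracks and reports totalCount as 12.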

API inner layer

This layer will be available only to our products.

Since our models already contain a lot of information, we can provide access to data using the minimum amount of code.

Extending the model

We extend the model on both the server and the client, specifying the method name and path. Generic implementations of the findOne, update, destroy, and create methods live in the base model abstraction and do not need a separate implementation unless they differ fundamentally.

Model.extend('Track', {
	findOne: 'GET /track/{id}', // Find one track by its unique identifier
	update:  'PUT /track/{id}', // Update the changed fields
	destroy: 'DELETE /track/{id}', // Delete from the database
	create:  'POST /track' // Add a new track
})

We extend the model on the server only:

Model.extend('Track', {
	'GET /track/top/today': function() { // Returns today's top track
		var track = ...;
		...
		return track;
	}
})

We extend the model on the internal client only:

Model.extend('Track', {
	findTodayTop: 'GET /track/top/today'
})

Model.extend('Track.List', {
	findByArtistId: 'GET /track/byArtistId/{artist_id}' // Find all tracks by the given artist
})
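On the client, route strings such as 'GET /track/byArtistId/{artist_id}' have to be turned into concrete requests. A minimal sketch of that step, using a hypothetical resolveRoute helper that substitutes {placeholders} and leaves the remaining parameters for the query string or request body:

```javascript
// Turns a route description such as 'GET /track/{id}' into an HTTP method and a
// concrete path. Placeholders are filled from `params`; unused params are
// returned in `rest`. This is a hypothetical helper, not a framework API.
function resolveRoute(route, params = {}) {
	const [method, template] = route.trim().split(/\s+/);
	const used = new Set();
	const path = template.replace(/\{(\w+)\}/g, (match, name) => {
		if (params[name] === undefined) throw new Error('Missing route parameter: ' + name);
		used.add(name);
		return encodeURIComponent(String(params[name]));
	});
	// Everything not consumed by the path goes elsewhere (query string or body)
	const rest = {};
	for (const [key, value] of Object.entries(params)) {
		if (!used.has(key)) rest[key] = value;
	}
	return { method, path, rest };
}

console.log(resolveRoute('GET /track/byArtistId/{artist_id}', { artist_id: 20974, limit: 10 }));
```

Keeping routes as data like this is what lets the same model description drive both the server handlers and the client calls.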

On this layer, we have maximum query flexibility.

Request and Response Example

app.model.Track.List.findByArtistId({
	format: 'json',
	artist_id: 20974,
	fields: [ // Fields we want to receive
		'track', // All fields of the main model
		'track.artist', // All fields of the related artist
		'track.album.name' // Only the name of the album this track belongs to
	],
	offset: 5, // Skip the first 5 records
	limit: 10, // Return at most 10 records
	sort: [
		{ 'track.title': 'ASC' } // Sort tracks alphabetically by title
	],
	cache: 1800 // Cache on the client side for half an hour
})

In response, we get something like:

{
	"result": {
		"tracks": [
			...,
			{
				"id": 856, // All of the model's own fields
				"title": "Great track",
				"url": "/some/path/to/track",
				"duration": 216,
				"artist": {
					"id": 20974,
					"title": "Great artist",
					... // other fields
				},
				"album": {
					"name": "Great album" // name only
				}
			},
			...
		]
	},
	"offset": 5, // Offset
	"count": 7, // Selected in this response
	"totalCount": 12 // Total found
}
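The fields list in the request decides which parts of the record end up in such a response. A minimal sketch of how the projection could be applied to a fully loaded record; project is a hypothetical helper, and its semantics (a bare root name selects the model's own scalar fields, a dotted path copies the value at that path) are an assumption based on the comments in the request:

```javascript
// Applies a field list such as ['track', 'track.artist', 'track.album.name']
// to a fully loaded record. The first segment names the root model.
function project(record, fields) {
	const result = {};
	for (const field of fields) {
		const keys = field.split('.').slice(1); // drop the root model name
		if (keys.length === 0) {
			// Own fields only: skip nested related objects
			for (const [key, value] of Object.entries(record)) {
				if (value === null || typeof value !== 'object') result[key] = value;
			}
			continue;
		}
		// Walk to the parent of the requested value, mirroring the nesting in the result
		let source = record;
		let target = result;
		for (const key of keys.slice(0, -1)) {
			source = source == null ? undefined : source[key];
			if (source == null) break;
			target[key] = target[key] || {};
			target = target[key];
		}
		const last = keys[keys.length - 1];
		if (source != null && source[last] !== undefined) {
			let value = source[last];
			// Copy shallowly so the result never aliases the source record
			if (Array.isArray(value)) value = value.slice();
			else if (value !== null && typeof value === 'object') value = { ...value };
			target[last] = value;
		}
	}
	return result;
}
```

In a real inner layer the field list would more likely shape the database query itself; filtering after loading is shown here only because it is short.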

Nesting related data like this is optional. If you are worried about duplicates (although, as far as traffic goes, gzip handles them perfectly well), you can collect related entities into separate fields of the top-level result.
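A minimal sketch of that alternative, assuming a hypothetical normalizeTracks helper and the field names artists and artistId, chosen here for illustration: each artist is stored once in a top-level map, and each track keeps only the artist's id.

```javascript
// Moves nested artists out of each track into a top-level map keyed by id,
// leaving only `artistId` on the track. Duplicate artists collapse to one entry.
function normalizeTracks(tracks) {
	const artists = {};
	const flatTracks = tracks.map((track) => {
		const { artist, ...rest } = track;
		if (!artist) return rest; // track without a related artist
		artists[artist.id] = artist;
		return { ...rest, artistId: artist.id };
	});
	return { tracks: flatTracks, artists };
}
```

The client then looks up artists[track.artistId] instead of reading track.artist.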

API outer layer

The outer layer is available directly to end customers. It can be a website or just an API for third-party developers.

On the outer layer, we DO NOT provide the same flexibility as on the inner layer; we give access only to the basic features: the main request parameter, offset, limit, and so on. And all of that with limits.

In part, it acts as a proxy to the internal API, with a number of important additions.

A quick example:

app.get('/api/track/:id', ..., function(req, res) {
	return app.model.Track.findOne({id: req.params.id});
})

In place of “...” we perform the necessary permission checks, modify the request, and determine the format. Data is returned in the same format and over the same channel as it was requested: a JSON request over HTTP gets a JSON response over HTTP, and an XML request over a socket gets an XML response over the socket.

Thus, we can completely control what is accessible from the outside, how, on what conditions and so on.
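What "modify the request" might mean in practice: the outer layer accepts only a few public parameters, caps them, and fixes everything else before calling the inner layer (for example app.model.Track.List.findByArtistId). The helper toInnerRequest, the default of 20 and the cap of 50 are assumptions for illustration:

```javascript
const DEFAULT_LIMIT = 20; // assumed default page size
const MAX_LIMIT = 50; // assumed cap for external clients

// Turns untrusted public query parameters into a restricted inner-layer request.
// Only offset and limit come from the client; any other parameters (such as a
// client-supplied `fields`) are ignored, and the field list is fixed here.
function toInnerRequest(query, artistId) {
	const offset = Math.max(parseInt(query.offset, 10) || 0, 0);
	const requested = parseInt(query.limit, 10) || DEFAULT_LIMIT;
	const limit = Math.min(Math.max(requested, 1), MAX_LIMIT);
	return {
		artist_id: artistId,
		fields: ['track', 'track.artist', 'track.album.name'], // not client-controlled
		offset,
		limit
	};
}
```

Keeping this translation in one place means tightening or loosening the public limits never touches the inner layer.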

Heavy Duty Optimization

So far, we have described working with the database as abstractly as possible, and such requests naturally run much slower than optimized ones. As a first step, we find out (with a profiler or any other method you prefer) which requests are slow.

Suppose we notice that the query selecting a track together with information about its album and artist is slow. To optimize it, we create a new internal method:

Model.extend('Track', {
	'GET /track/{id}/withAdditionalData': function() {
		var track = ...;
		// Select the data here in the most efficient way possible,
		// using the cache wherever we can
		return track;
	}
})

Then we point the outer layer's call to the inner layer at this new method. That's all. For the end client nothing has changed: the cache, the paths, and the returned data are all the same, but everything is now faster.

Users

The main task when working with users is to check their rights.

As soon as the user logs in (the method does not matter: cookies, an access key, or something else), we make one request to the inner layer to confirm their identity and obtain their permissions. All further checks are done on the outer layer.
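A minimal sketch of this flow, assuming a hypothetical fetchPermissions function that stands in for the single inner-layer request; Session, login and requirePermission are illustrative names, not part of any framework:

```javascript
// Holds the identity and permissions obtained once at login.
class Session {
	constructor(userId, permissions) {
		this.userId = userId;
		this.permissions = new Set(permissions);
	}
	can(action) {
		return this.permissions.has(action);
	}
}

// The one request to the inner layer: confirm identity and load permissions.
// `fetchPermissions` is a hypothetical stand-in for that inner-layer call.
async function login(token, fetchPermissions) {
	const { userId, permissions } = await fetchPermissions(token);
	return new Session(userId, permissions);
}

// Outer-layer check: purely local, no further inner-layer requests.
function requirePermission(session, action) {
	if (!session.can(action)) {
		const error = new Error('Forbidden: ' + action);
		error.status = 403;
		throw error;
	}
}
```

The outer layer would call requirePermission in place of the "..." in its route handlers before proxying the request inward.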

Scaling

Scaling is where this design pays off most.

We can run the external and internal API layers in any number of instances, spreading the load with balancers. Thanks to this separation, we can run many external-layer instances as close to end clients as possible, effectively getting our own CDN with data caching.

Databases are scaled in the usual ways, depending on the task.

Caching

For the inner layer, we cache the results of queries to the databases, and on the outer layer, the results obtained from the inner layer. The end client may also have caching.

In one of the earlier examples there was the line “cache: 1800”. It can enable caching both on the external layer, storing the result on the server for half an hour, and on the client, saving the result in, for example, localStorage (or another client-side storage).
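For the server-side half of “cache: 1800”, a minimal in-memory TTL cache is enough to illustrate the idea. TtlCache and cached are hypothetical helpers; a real deployment might use a shared cache instead, and the clock is injectable only so that expiry can be checked without waiting:

```javascript
// A minimal in-memory cache whose entries expire after a TTL given in seconds.
class TtlCache {
	constructor(now = () => Date.now()) {
		this.now = now; // injectable clock, in milliseconds
		this.entries = new Map();
	}
	set(key, value, ttlSeconds) {
		this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
	}
	get(key) {
		const entry = this.entries.get(key);
		if (!entry) return undefined;
		if (this.now() >= entry.expiresAt) {
			this.entries.delete(key); // expired: drop it and report a miss
			return undefined;
		}
		return entry.value;
	}
}

// Returns the cached result if present; otherwise loads it (e.g. from the
// inner layer) and caches it for `ttlSeconds`.
async function cached(cache, key, ttlSeconds, load) {
	const hit = cache.get(key);
	if (hit !== undefined) return hit;
	const value = await load();
	cache.set(key, value, ttlSeconds);
	return value;
}
```

A natural cache key would be a serialized form of the request parameters, so identical requests within half an hour share one inner-layer call.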

Versioning

As your project evolves, new methods will appear and some old ones will go away. For versioning, I strongly recommend Semantic Versioning. We are particularly interested in changes of the API's major version, which break backward compatibility. API paths can be separated simply: /api/{version}

There are several ways to organize files and support different versions, for example:
1. Create folders v1, v2, and so on, and put all the code for each version in its own folder. Modifying one API version does not affect the others.
2. Keep versions in separate repositories and run them as separate applications.
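A minimal sketch of dispatching /api/{version} paths by major version, assuming version segments like v2 or 1.4.0; parseApiPath, dispatch and the handler table are hypothetical. Only the major number selects a handler set, because under Semantic Versioning minor and patch releases stay backward compatible:

```javascript
// Extracts the major version and the remaining path from /api/{version}/...
// Accepts version segments such as "v2", "2" or "1.4.0"; returns null otherwise.
function parseApiPath(path) {
	const match = /^\/api\/v?(\d+)(?:\.\d+){0,2}(\/.*)?$/.exec(path);
	if (!match) return null;
	return { major: Number(match[1]), rest: match[2] || '/' };
}

// One handler set per major version (hypothetical placeholders).
const handlers = {
	1: (rest) => 'v1 handles ' + rest,
	2: (rest) => 'v2 handles ' + rest
};

function dispatch(path) {
	const parsed = parseApiPath(path);
	if (!parsed || !handlers[parsed.major]) return { status: 404 };
	return { status: 200, body: handlers[parsed.major](parsed.rest) };
}

console.log(dispatch('/api/v2/track/5')); // { status: 200, body: 'v2 handles /track/5' }
```

With the folder layout, each entry in handlers would point into its own v1 or v2 folder; with separate repositories, the same routing would typically live in the load balancer instead.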

Total

We have full control over the flow of data at every stage.
Developers are happy: they do not have to wait for the API team to implement a super-smart method for fetching data in a particular format.
Developers are happy: they only need to optimize what is actually slow.
Clients are happy: the API is more stable, and its paths do not change just because some request turned out to be slow.
Clients are happy: access is faster because the servers are located as close to them as possible.

I will be glad to add more sections to the article on request.
Also, if you would like the code for some part spelled out, just ask and I will write and attach it.
Perhaps this is the start of a large series of articles on various topics (or perhaps that start has already been made). A lot of material has accumulated, but it is not yet systematized.

Off-topic: a question about creating a course for aspiring web developers

I want to create a complete course on web project development, from absolute scratch to full-stack.

As planned, it will include video lectures and text lessons, homework, independent projects, work with a mentor, various intensives, plenty of code (both written from scratch and completed online), and so on. As a distinctive feature, I want a "network" format rather than a sequential one: after completing certain topics, others open up, and the student can choose what interests him at the moment. By preliminary estimates, the training will take around six months at a couple of hours a day.

Clearly, a project like this cannot be built quickly. That is why I want to develop it together with a partner on whose platform it can be implemented and sold. If I take on all the related issues myself, the timeline will stretch far beyond reasonable limits. Besides, different platforms have different tools, and I want to take them into account.

I have already written to several Coursera-like platforms describing my venture, but I have not received any replies. What is frustrating is that I did not even get rejections.

I have studied the market and am confident that what I am building is more than competitive.

I would welcome any partnership offers or advice on whom to contact.
