Library development: from API to public release

Let's look at libraries not from the familiar side, the user's, but from the point of view of someone developing a library for mobile platforms. Let's talk about which approaches to follow when developing your own library. We begin, of course, with designing an API that you yourself would want to use, one that is convenient. Then we'll think about what it takes to produce not just working code but a really good library, and finally we'll get to shipping a real, grown-up public release. Asya Sviridenko, who will share her considerable experience of developing the mobile SpeechKit library at Yandex, will help us with this.

The material will be useful not only to those who develop a library or framework, but also to those who want to split part of their application into a separate module and reuse it, or, for example, share their code with the rest of the developer community by making it public.

For everyone else, the talk is full of genuine stories from the life of the mobile SpeechKit team, so it should be fun.

Content

A minute about SpeechKit.
Design: a convenient, understandable API that you want to use.
Development: what to add to the code so that it not only works and does its job, but also helps your users.
Launch: what not to forget when you roll out a release.

A minute about SpeechKit

I won't ask whether you have heard of SpeechKit, because even inside Yandex not everyone knows what it is.

SpeechKit is the door to all Yandex speech technologies. With this library you can integrate speech technologies into your application: speech recognition, speech synthesis, and voice activation.

You have probably heard of the voice assistant Alice; it runs on SpeechKit. SpeechKit itself does not perform recognition or synthesis, that happens on the server. But it is through our library that all of this can be integrated into an application.

Next comes the question: if everything happens on the server, what does the library do? Why is it needed?

The library does a lot:

Synchronization of all processes. For example, when using a voice assistant, the user presses a button, says something, interrupts the assistant, makes requests, and all of it goes through the library. For users of our library this is transparent; they should not have to worry about any of it.
Networking. Since everything happens on the server, you need to fetch data from it, process it, and hand it to the user. SpeechKit can now talk to several different servers over a single network connection: one handles recognition, another extracts meaning, a third recognizes music, and so on. It is all hidden inside the library, and users don't have to worry about it.
Work with audio sources. We deal with human speech, and the work with audio also takes place inside SpeechKit. We can not only record from the standard device but also receive data from anywhere: a file or a stream, we can work with all of it.

SpeechKit is used by internal teams. So far 16 Yandex teams have integrated it, and we even know of several external teams that have done so too.

Design

Let's think about what we mean by a convenient application. Usually it means a thoughtful and understandable UX, solving our problems, stable operation, and so on.

When we say that a library is convenient, we mean first of all that it has an API that is easy to understand and use. How do we achieve this?

Basic principles

These are some points I have taken from my experience with SpeechKit.

First of all, remember that your users are developers.

On the one hand, this is good. You cannot tell ordinary users: “You see, our backend is down, that is why nothing works, but we ourselves are fine!” You can explain that to developers; you can explain a lot to developers!

On the other hand, you get users who will certainly take the opportunity to find a hole and break something if you leave one. You and I all use libraries and try to squeeze the most out of them. They declare that they do only this, this and this, and we think: “No, we'll tweak a little here, pass something in there, and everything will work the way we need.”

In addition, because your users are developers, you will always get plenty of advice and recommendations on how to develop and how to do things better.

The second important point is fully correlated with the first.

Anything that is not allowed in your library should be prohibited, so that there are no unwanted loopholes.

If your users start doing something with the library that you did not expect, that is a direct path to bugs, and to bugs that are hard to debug. Use everything that your language and technology give you: public / private, final, deprecated, readonly. Narrow scopes, forbid inheritance and the use of certain methods, mark properties that must not be changed. Do everything you can to keep users from doing something your library is simply not designed for.
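On the Android side, these language tools might be combined roughly as follows. This is a minimal Java sketch, not SpeechKit code: DemoPhraseList and its methods are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, not part of SpeechKit. `final` forbids subclassing.
final class DemoPhraseList {
    // private + final: callers can neither reach nor reassign the backing list.
    private final List<String> phrases;

    DemoPhraseList(List<String> phrases) {
        // Defensive copy: later changes to the caller's list do not leak in.
        this.phrases = new ArrayList<>(phrases);
    }

    // Read-only view: any attempt to modify it throws UnsupportedOperationException.
    List<String> phrases() {
        return Collections.unmodifiableList(phrases);
    }

    // Kept for old clients, but the compiler warns anyone who still calls it.
    @Deprecated
    String firstPhrase() {
        return phrases.isEmpty() ? "" : phrases.get(0);
    }
}
```

With a class like this, the only things a user can do are the things the class explicitly offers: read the phrases, and (with a warning) call the deprecated accessor.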

Do not allow ambiguity in the interpretation of your library's API.

If a particular class can be created in only one way, forbid all the others. If a property cannot be null, state this explicitly. iOS has nullable / nonnull and designated initializers, and Java and Android have equivalents. Use all of this so that a user can open the file, open your class, skim through it, and immediately understand what can and cannot be done.
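A minimal Java sketch of "one way to create, and null is rejected explicitly" could look like this. DemoAudioFile and fromPath are hypothetical names used only for the example; plain Java has no built-in nonnull keyword, so the sketch assumes a runtime check with Objects.requireNonNull.

```java
import java.util.Objects;

// Hypothetical example class; the names are illustrative, not SpeechKit API.
final class DemoAudioFile {
    private final String path;

    // Private constructor: `new DemoAudioFile(...)` is unavailable outside the class.
    private DemoAudioFile(String path) {
        this.path = path;
    }

    // The only supported way to create an instance; null is rejected immediately.
    static DemoAudioFile fromPath(String path) {
        Objects.requireNonNull(path, "path must not be null");
        return new DemoAudioFile(path);
    }

    String path() {
        return path;
    }
}
```

A reader who opens this class sees a single factory method and a single accessor, so there is little room for a second interpretation.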

Case study: the SpeechKit API

Using SpeechKit as an example, I'll tell you how we refactored version 2 into version 3. We changed the API a lot and tried to apply all of these principles.

The need arose because the API was complex and “theoretical”. It had global components that had to be called first; if you did not call them, nothing worked. Settings were made in very strange ways. The API was “theoretical” because SpeechKit was originally part of Navigator, and that piece was later extracted into a library. The API essentially served the use cases that Navigator had.

Gradually the number of users grew, and we began to understand what they really needed: which methods, callbacks, and parameters. They came to us with requests that the API could not accommodate. This happened time after time, and it became clear that the API was not holding up. So we got into refactoring.

The refactoring was long (six months) and painful (everyone was unhappy). The main difficulty was not taking a mountain of code and rewriting it. We could not simply go off and refactor; we also had to support all the versions that were in active use. We couldn't just say to our users: “Guys, yes, it doesn't work for you, and yes, you need this feature. We'll do everything in version 3, please wait half a year!”

As a result, the refactoring took a lot of time, and the process was painful for users as well, because in the end we changed the API without backward compatibility. We came to them and said: “Here is the new, beautiful SpeechKit, please take it!” In response we heard: “No, we are not planning to switch to your version 3.0 at all.” For example, one team took a year to switch to this version, so we supported the previous version for a whole year.

But the result was worth it. We got simple integration and fewer bugs. This is what I mentioned in the basic principles of API design. If you are sure that your API is used correctly, then there are definitely no problems in that part: all classes are called correctly, all parameters are valid. Finding bugs becomes much easier, and there are fewer cases where something can go wrong.

Below is an example of what the main class that deals with recognition looked like before refactoring.

// SpeechKit v2
@interface YSKRecognizer : NSObject
@property (nonatomic, strong, readonly, getter=getModel) NSString *model;
@property (nonatomic, assign, getter=isVADEnabled) BOOL VADEnabled;
- (instancetype)initWithLanguage:(NSString *)language model:(NSString *)m;
- (void)start;
- (void)cancel;
- (void)cancelSync;
@end

@interface YSKInitializer : NSObject
- (instancetype)init;
- (void)dealloc;
- (void)start;
+ (BOOL)isInitializationCompleted;
@end

extern NSString *const YSKInactiveTimeout;
extern NSString *const YSKVADEnabled;

@interface YSKSpeechKit : NSObject
+ (instancetype)sharedInstance;
- (void)setParameter:(NSString *)name withValue:(NSString *)value;
@end

This is an ordinary class that inherits from NSObject. Let's look at each detail separately. Clearly, we can subclass it and override some of its methods, everything that can be done with an NSObject.

Next, two strings (language and model) are passed to it on creation. What are these strings? If you pass “Hello, world” as the language, will the output be a translation, or what? Not very clear.

In addition, since this is a subclass of NSObject, we can call init, new, and so on. What will happen? Will it work, or will it wait for some parameters?

Of course, I know the answers to these questions; I know this code. But people who look at it for the first time do not understand at all why things are the way they are. Even the setters and getters do not look the way they normally would in iOS. The methods start, cancel, cancelSync (and the plain cancel, is it async?): what happens if you call them together? This code raises a lot of questions.

Next comes the object I was talking about (YSKInitializer), which you absolutely have to start for anything to work; this is some kind of magic altogether. You can see that this code was written by developers who do not write for iOS but work in C++.

Furthermore, the settings for this recognizer were set through global components that were passed to another global object, so in fact it was impossible to create two different recognizers with different sets of parameters. And that was probably one of the most requested cases that the API did not support.

How v3 is better than v2

What did we get when we refactored and moved to version 3?

A fully native API.

Now our iOS API looks like an iOS API, and our Android API looks like an Android API.

An important point that we did not realize right away: platform guidelines matter much more than uniformity of your library's API across platforms.

For example, on Android classes are created with builders, because that pattern is very familiar to Android developers. On iOS it is not as popular, so a different approach is used: we create objects with a dedicated settings class.

I remember how long we argued about this. It seemed important to us that a developer could pick up our code on iOS or on Android and find that 99% of it matched. But that is not so. It is better for the code to look like the platform it is written for.
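For readers less familiar with the Android convention, a builder might look roughly like this in Java. It is a sketch under assumptions: DemoRecognizerSettings, its fields, and the default value are hypothetical and are not taken from the Android SpeechKit API; strings are used for language and model only to keep the example short.

```java
import java.util.Objects;

// Hypothetical settings class with a builder; all names are illustrative.
final class DemoRecognizerSettings {
    private final String language;
    private final String model;
    private final boolean vadEnabled;

    // Only the builder can create settings.
    private DemoRecognizerSettings(Builder b) {
        this.language = b.language;
        this.model = b.model;
        this.vadEnabled = b.vadEnabled;
    }

    String language() { return language; }
    String model() { return model; }
    boolean isVadEnabled() { return vadEnabled; }

    static final class Builder {
        private final String language;
        private final String model;
        private boolean vadEnabled = true; // assumed default, for the sketch only

        // Required parameters go into the builder's constructor.
        Builder(String language, String model) {
            this.language = Objects.requireNonNull(language, "language");
            this.model = Objects.requireNonNull(model, "model");
        }

        // Optional parameters get chainable setters.
        Builder setVadEnabled(boolean enabled) {
            this.vadEnabled = enabled;
            return this;
        }

        DemoRecognizerSettings build() {
            return new DemoRecognizerSettings(this);
        }
    }
}
```

Usage then reads as one chain, for example new DemoRecognizerSettings.Builder("ru-RU", "general").setVadEnabled(false).build(), and the resulting object has no setters at all.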

Simple and clear initialization.

You need this object, here are its settings; you create them, you pass them in, and you're done! That is, there are no hidden global settings that have to be passed somewhere.

No global components.

We threw out the global components that confused and scared everyone and raised many questions, even among the developers of the library, not only among its users.

Now the same class in the new version looks like this (it is still Objective-C; switching to Swift was not possible at the time).

// SpeechKit v3
NS_ASSUME_NONNULL_BEGIN

__attribute__((objc_subclassing_restricted))
@interface YSKOnlineRecognizer : NSObject <YSKRecognizing>
@property (nonatomic, copy, readonly) YSKOnlineRecognizerSettings *settings;
- (instancetype)initWithSettings:(YSKOnlineRecognizerSettings *)s audioSource:(id<YSKAudioSource>)as NS_DESIGNATED_INITIALIZER;
+ (instancetype)new __attribute__((unavailable("Use designated initializer.")));
- (instancetype)init __attribute__((unavailable("Use designated initializer.")));
@end

NS_ASSUME_NONNULL_END

@protocol YSKRecognizing <NSObject>
- (void)prepare;
- (void)startRecording;
- (void)cancel;
@end

@interface YSKOnlineRecognizerSettings : NSObject <NSCopying>
@property (nonatomic, copy, readonly) YSKLanguage *language;
@property (nonatomic, copy, readonly) YSKOnlineModel *model;
@property (nonatomic, assign) BOOL enableVAD;
- (instancetype)initWithLanguage:(YSKLanguage *)l model:(YSKOnlineModel *)m NS_DESIGNATED_INITIALIZER;
@end

@interface YSKLanguage : YSKSetting
+ (instancetype)russian;
+ (instancetype)english;
@end

This is a subclass of NSObject, but now we explicitly state that it cannot be subclassed. All the methods specific to this object have been moved to a separate protocol. The object is created from settings and an audioSource. All the settings are now encapsulated in a single object, which is passed in here to configure one specific recognizer.

Moreover, we moved the work with audio out of this class, so the recognizer is no longer the component that records audio. This component deals with recognition, and any audio source can be passed to it.

Other ways of creating it, through new or init, are prohibited, because this class needs settings. If you want to use it, please create at least some default settings.

The main thing is that the settings passed in here are immutable: you cannot change them while the recognizer is working. Do not try to swap the model or the language while something is being recognized. Accordingly, we do not give users a way to change a settings object that has already been passed in.

The NS_ASSUME_NONNULL_BEGIN / NS_ASSUME_NONNULL_END macros emphasize that these settings cannot be null, and neither can audioSource: everything must have a specific value in order to work.

As I said, the start and cancel methods (cancelSync is gone) moved to a separate protocol. There are places in the library where you can use any recognizer, not just ours. For example, we use Apple's native one, which implements this protocol and can be passed into our components.
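The benefit of moving these methods into a protocol can be sketched in Java with an interface. The names below are hypothetical: DemoRecognizing only mirrors the three methods of YSKRecognizing from the listing above, and DemoVoiceButton stands in for any component that accepts a recognizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java counterpart of the YSKRecognizing protocol.
interface DemoRecognizing {
    void prepare();
    void startRecording();
    void cancel();
}

// A stand-in implementation that only records which calls were made.
final class DemoLoggingRecognizer implements DemoRecognizing {
    final List<String> calls = new ArrayList<>();

    @Override public void prepare() { calls.add("prepare"); }
    @Override public void startRecording() { calls.add("startRecording"); }
    @Override public void cancel() { calls.add("cancel"); }
}

// A component that depends only on the interface, so any recognizer fits.
final class DemoVoiceButton {
    private final DemoRecognizing recognizer;

    DemoVoiceButton(DemoRecognizing recognizer) {
        this.recognizer = recognizer;
    }

    void press() {
        recognizer.prepare();
        recognizer.startRecording();
    }
}
```

Because DemoVoiceButton never names a concrete class, a wrapper around a platform recognizer could be substituted without touching it.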

The settings here conform to NSCopying so that we can copy them and so that they cannot be changed while the recognizer is working. The required parameters in init are language and model, and the initializer is marked NS_DESIGNATED_INITIALIZER. The listing omits the part that disables the other creation methods, which is identical to the one shown for the recognizer, but the idea is clear. These are the required parameters with which settings are created. They must be present and must be non-null.
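The "copy on the way in" idea can be sketched in Java as follows. This is an illustration under assumptions, not SpeechKit code: DemoSettings and DemoRecognizer are hypothetical, and a copy constructor plays the role that NSCopying and the copy property attribute play in the listing above.

```java
// Hypothetical classes; a sketch of "copy on the way in", not SpeechKit code.
final class DemoSettings {
    private boolean vadEnabled;

    DemoSettings(boolean vadEnabled) {
        this.vadEnabled = vadEnabled;
    }

    // Copy constructor: produces an independent snapshot of another instance.
    DemoSettings(DemoSettings other) {
        this.vadEnabled = other.vadEnabled;
    }

    boolean isVadEnabled() { return vadEnabled; }
    void setVadEnabled(boolean enabled) { this.vadEnabled = enabled; }
}

final class DemoRecognizer {
    private final DemoSettings settings;

    DemoRecognizer(DemoSettings settings) {
        // Keep a private snapshot so the caller cannot change it mid-recognition.
        this.settings = new DemoSettings(settings);
    }

    // Hand out a copy as well, so the internal snapshot stays untouched.
    DemoSettings settings() {
        return new DemoSettings(settings);
    }
}
```

The caller can keep mutating its own settings object, but the recognizer continues to work with the values it was created with.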

The rest, about 20 recognizer settings, are set here as well. Even the language and the model are separate classes, which do not let anyone pass in something abstract that we cannot work with. That is, we say clearly: “Please do not give us something we do not know how to work with. The compiler will not let you.”
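To show how a dedicated type lets the compiler do this checking, here is a small Java sketch. DemoLanguage loosely mirrors the shape of YSKLanguage from the listing above, but the class, the locale codes, and DemoTypedSettings are assumptions made for the example.

```java
import java.util.Objects;

// Hypothetical typed setting with a closed set of values.
final class DemoLanguage {
    private static final DemoLanguage RUSSIAN = new DemoLanguage("ru-RU");
    private static final DemoLanguage ENGLISH = new DemoLanguage("en-US");

    private final String code; // the locale codes are assumptions for the sketch

    // Private constructor: users cannot invent their own languages.
    private DemoLanguage(String code) {
        this.code = code;
    }

    static DemoLanguage russian() { return RUSSIAN; }
    static DemoLanguage english() { return ENGLISH; }

    String code() { return code; }
}

final class DemoTypedSettings {
    private final DemoLanguage language;

    // Accepts only a DemoLanguage: passing "Hello, world" does not compile.
    DemoTypedSettings(DemoLanguage language) {
        this.language = Objects.requireNonNull(language, "language");
    }

    DemoLanguage language() { return language; }
}
```

The question "what happens if I pass an arbitrary string as the language?" simply cannot be asked of this API.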

So, we talked about what can be done with the API. The development also has its own nuances.

Development

First of all, the library should do what you wrote it for and perform its function well. But you can also make your code a really good library. Here are several remarks I collected while developing SpeechKit.

Code not only for yourself

Collecting debug information is absolutely necessary, because you do not want users to say that their service does not work because of your library.

iOS has a debug information level that determines how much information is collected. By default it collects absolutely everything it can find: all calls, all values. That is great, but it is a very large amount of data. The -gline-tables-only setting collects only information about function calls. This is more than enough to find a problem and fix it.

It is enabled in the Xcode Build Settings and is called the debug information level. For example, by enabling this setting we reduced the size of the SpeechKit binary from 600 MB to 90 MB. We did not really need the rest of that information, so we simply threw it out.

The second important thing is to hide private symbols. You all know that every time you upload something to iTunes you risk getting a new warning that you are using something incorrectly or have not added something. So if you use anything that Apple considers private, don't forget to hide it. For you it changes nothing, you can keep working with it, but as soon as your users try to upload an application with your library to iTunes, they will get an error. Not everyone will ask you to fix it; most will simply stop using your solution.

Avoid symbol conflicts: add prefixes to everything you have, to your classes and to your categories. If the library contains a UIColor+HEX category, you can be sure your users have exactly the same category, and when they integrate your library they will get symbol conflicts. And again, not everyone will bother to tell you about it.

A separate question is using third-party libraries inside your own library. There are a couple of nuances worth remembering. First, if you use something that appeared in a version newer than the one your library supports, remember to use weak linking (Xcode -> Build Phases -> Link Binary With Libraries -> set Status to Optional). This keeps the application from crashing if the library turns out to be missing.

The Apple documentation describes in detail how this works. But weak linking does not mean that the library will not be loaded if it is not used. That is, if application start-up time matters to your users, and they may not even need the part of your library that uses a third-party library which takes time to load, weak linking will not help. With it the library is loaded anyway, whether it is used or not.

If you want to load the library at runtime and thereby avoid the cost of linking at start-up, you need dlopen and dynamic loading. That takes a lot of fuss, so first work out whether it makes sense. Facebook has published some rather interesting code as an example of how they link dynamically.
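dlopen itself is a C API on the iOS side. On the Android and Java side, a loose analogue of "look the dependency up only when it is needed" is to resolve a class by name at runtime. The sketch below is that swapped-in technique, not dlopen, and DemoOptionalDependency is a hypothetical helper.

```java
// Hypothetical helper: checks at runtime whether an optional class is present.
final class DemoOptionalDependency {
    private DemoOptionalDependency() {}

    static boolean isAvailable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, DemoOptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The dependency is absent or broken; the caller can fall back gracefully.
            return false;
        }
    }
}
```

A library could call such a check right before the first use of an optional feature and disable the feature instead of crashing when the dependency is missing.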

Last thing: try not to use global entities inside the library. Every platform has some global components. It is advisable not to pull them into your library. This seems obvious: the object is global, so users of your library can take it and configure it however they want. If you use it inside your library, you have to save its state somehow, reconfigure it, and then restore the state. There are many nuances and plenty of room for mistakes. Keep this in mind and try to avoid it.

For example, up to the third version SpeechKit worked with audio inside the library, and we explicitly configured and activated the audio session. An audio session in iOS is something every application has; don't say that yours doesn't. It is created at start-up, is responsible for the interaction between the application and the system media daemon, and says what your application wants to do with audio. It is a singleton in the truest sense of the word. We calmly took it and configured it as we needed, but as a result users had small problems such as the sound volume changing. In addition, the audio session method responsible for applying the settings is quite slow. It takes about 200 ms, which is a noticeable slowdown on activation or deactivation.

In the third version I happily moved the audio session out of the library. After that, almost all users from all services that had SpeechKit integrated told us how terribly unhappy they were. Now they had to know that there is some audio session that needs to be configured specifically for our SpeechKit.

The conclusion is this: try not to use global entities anyway, but be prepared for your users not always being happy with your decisions.

Making users comfortable

How else can you help your users?

Add logs: different levels, switchable at runtime.

The easiest way is to drop in a file whose presence turns on a “mega debug” mode. It really helps with debugging when your users have users who hit an error and you need to understand exactly what happened.
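A minimal Java sketch of leveled logging that can be switched at runtime is shown below. DemoLog, its levels, and the in-memory sink are hypothetical; a real library would more likely write to a file or to the system log, and the trigger for raising the level (for example, the presence of a file) is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal logger; names and levels are illustrative.
final class DemoLog {
    enum Level { ERROR, WARNING, INFO, DEBUG }

    // volatile: the level can be switched at runtime from any thread.
    private static volatile Level current = Level.ERROR;
    // Messages are kept in memory here only to keep the sketch self-contained.
    private static final List<String> sink = new ArrayList<>();

    private DemoLog() {}

    static void setLevel(Level level) {
        current = level;
    }

    static synchronized void log(Level level, String message) {
        // ERROR has the lowest ordinal, so raising `current` lets more detail through.
        if (level.ordinal() <= current.ordinal()) {
            sink.add(level + ": " + message);
        }
    }

    static synchronized List<String> messages() {
        return new ArrayList<>(sink);
    }
}
```

By default only errors are recorded; calling DemoLog.setLevel(DemoLog.Level.DEBUG) on a user's device turns on the detailed output without a new build.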

Support all OS versions your users support.

Remember that versioning for a library is not the same as versioning for a regular application. In a typical application we look at the statistics, see that, say, only 2% of our users are on iOS 8, and conclude that we can stop supporting iOS 8. For a library this does not work: dropping an OS version means dropping your user and all of their users. That may be half of your users altogether.

So you need to track which versions are used by the applications that use your library, and decide on that basis whether to keep supporting something. We didn't drop iOS 7 for a long time. It seems to me that there were already people who had abandoned iOS 8 and were ready to abandon iOS 9. We still supported iOS 7 because we had the browser, which held on to all of its users to the last; we worked closely with that team and could not leave them in such a situation.

Again, your users will not say: “Let us turn this functionality off on the version that does not support it.” No, they will simply remove your library and find one that supports the whole range of versions.

Keep the increment between versions minimal.

This is rather unpleasant for library developers. In a release you want to ship everything that is ready. You built the features, fixed the bugs, so now let's bundle it all up and roll it out. After all, a release is a process too. For your users it is not like that. When they are in the middle of testing their product and preparing it for release, they do not want to receive a build from you with new features that require additional testing.

We really did have cases where we rolled back releases, split them into pieces, and rolled them out piece by piece. Then the teams for which we had made the changes could take exactly the version with the small changes, not everything at once.

This really is not very convenient for development, but a minimal increment between versions will make your users a little bit happier.

There are never too many tests

This is true for a regular application and for a library alike. But in the case of a library there are, again, some peculiarities.

Automated tests are needed, of course, but in addition to them it is great to have a test application for your library. It will help you integrate what you yourself wrote and see what problems or pitfalls may arise. You can feel for yourself what it is like to be your user.

If your library interacts with the network in any way, includes encryption, or has anything at all to do with data and security, hand it over for a security check. You absolutely do not want to be the library in which a vulnerability is found; that is a stigma for life. Practically every large company has a whole department that checks products for security, so give it to them. If you do not have one, there are external audits. If you cannot afford an external audit, look for tests online, run them, and make sure that your library does not leak user data.

The last thing that is very important in testing: from the very beginning, try to add measurements of everything you can, such as time, power consumption, and whatever is specific to your particular library. You will have to do it in the end anyway, so why not think about measurements from the start.

This will not protect you from changes or from the need to speed the library up, but it will help you figure out what went wrong. If you have graphs, they will help you monitor in real time which functionality added latency or increased power consumption.
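As a starting point for time measurements, a per-step timer can be very small. The Java sketch below is an assumption about how such instrumentation might look, not a description of the SpeechKit tooling; DemoTimings and the step names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that accumulates wall-clock time per named step.
final class DemoTimings {
    private final Map<String, Long> nanosByStep = new LinkedHashMap<>();

    // Runs the step and adds its duration to the bucket with the given name.
    void measure(String step, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            // Recorded even if the step throws, so failures still show up in the data.
            long elapsed = System.nanoTime() - start;
            nanosByStep.merge(step, elapsed, Long::sum);
        }
    }

    // Copy of the collected data, in the order the steps were first seen.
    Map<String, Long> snapshot() {
        return new LinkedHashMap<>(nanosByStep);
    }
}
```

Wrapping steps such as connecting, sending audio, and waiting for the first result in measure(...) gives per-step numbers that can be logged on every run and later turned into graphs.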

There is almost never time for this, because it is not the library's functionality, not what you are developing it for. But it is what helps you keep the library in good shape and at good quality.

Here you can read how we at Yandex measure the power consumption of mobile devices. And there was a funny story about time measurements. As library developers, we find it hard to measure behavior in specific cases, because not all SpeechKit scenarios are used by all teams. To track time we used our test application. We wrote special use cases, for example for the recognizer or for the speech synthesis components, logged and saved every step, and in the end built nice graphs.

All would have been fine, but we work with audio, and to check everything the audio track actually has to be played in certain cases. Moreover, a lot of measurements are needed, so the test was left running overnight: we set up speakers, put a device next to them, and started the audio files. In the morning everything was turned off; the next night it happened again, and then again. It was not magical creatures wandering around the office; it was simply the cleaners, who were frightened. It really was a very strange text that was read out at intervals.

As a result, we decided to build a local test bench, which we called the Cabinet. It is a real cabinet, only soundproof. It holds a lot of devices, a whole device farm, and each of them can be run many times during the working day, because it will not disturb anyone.

Launch

Finally we come to the last important part, the launch. The code is written, and a good API has been designed so that users are comfortable. Now, how do we release all this?

I will start with internal releases for users inside Yandex. The scheme here is the same as in the development of a regular application: regular releases, monthly or weekly.

The process consists of the usual stages, but when developing a library each of them has its own peculiarities.

Planning

For me, this is the most painful part, because a library has several product teams. In a typical application there is one product manager who sets the tasks; the team prioritizes them and starts doing them one by one.

If there are several product teams, each of them sends requests that have to be handled in real time. My advice: if there is a person who knows how to deal with a multitude of tasks that arrive at the same moment, try to get that person onto your team. There must be someone between all the external managers and the developers, someone who takes on the job of prioritizing tasks.

The second important feature, especially characteristic of SpeechKit, arises when other internal teams, for example the backend, are involved in implementing a task. Keep in mind that there may be delays, that something will not be ready on time or not completely ready. And it is you who must warn users about this, because you, as the library, are the entry point to the technology. Users do not need to know that there are n more teams behind your back. They will not discuss deadlines with those teams; they discuss them with the developers of the library. What you know about the timing and importance of product features you need to convey to the internal teams, and you need to let the customers of the functionality know that deadlines may shift.

Development

Development, as in any application, usually begins the way it does in a startup: we just work day and night, and processes do not matter to us. Then someone remembers the words about Agile methodology, and the building of team processes begins.

After working as a startup, we realized that we had a problem: unpredictability. No one could say exactly which features would be released and when. And that was very important!

Then we decided to try Scrum. It really helped: we began to plan a set of tasks, implement them, and release them. That is, we sort of coped with the task of making releases predictable. I say “sort of” because we should not forget about the problem of several product teams.

Scrum did not last long. We had planning, tasks, a sprint (you know all these words), but during the sprint product tasks and bugs kept arriving. We tried to work with that. We even had a rule: take nothing into the sprint unless it is a bug in some team's production. Guess how many times this rule worked? Approximately zero, because you cannot tell another team: “Yes, you have a regression and you found a bug in our code, but we will fix it when you are already in public with this bug.” That is impossible, so we had to take the bug into work, and also take some features that were high priority at that moment, and Scrum broke down completely. The process was simply terrifying! It happened that the entire scope had been redrawn by the middle of the sprint.

Now we have switched to a kind of kanban. There is a board with tasks arranged in order of priority, and we simply take the top ones. On the one hand, we lost predictability of our releases. We can no longer tell the teams using our library that if a task is on the board, it will definitely be in the next release. On the other hand, we can say for sure that the most important tasks will make it into the release. Right now that matters more to us.

Support

It is worth remembering that when you ship a release, it is not just one version that you sent out and that someone picked up. The changes you made in this release may also be needed in other versions used by other teams. This is what I said about minimal increments between versions. You cannot tell your users: “We fixed the error in version 4, and you have version 3, so just move to the fourth.” Sometimes you can get away with this, but it is better not to abuse it. If a release contains bug fixes or minor additions, look at who is on which version and release the fixes for all versions currently in use.

The next point follows from this: all your releases should be fast. Set up Continuous Integration so that you really can press one big red button and ship to the versions you need, because there will be a lot of releases.

Prioritize it

A little about how we solved the problem of prioritizing tasks. I will single out two types of tasks.

1. Product tasks.

Everything is clear here: first of all, look at the importance for the company. If Arkady had ever come to us and asked for a killer feature for Yandex, we would, of course, have dropped everything and done it. Although he never did.

The release dates of other teams are an important parameter for the priority of product tasks. If one feature is needed in a month and another in a week, it seems obvious what to do. But do not forget to warn the team that is waiting for the first feature that you have started on something with a higher priority.

2. User wishlists.

The situation with users' wishes is a bit more complicated because, as I said, your users are developers: they want to make your library better, they know how to make it better, they know better than you how to make it better!

We proceeded as follows. First we looked at how useful the request would be to other teams. If it is useful not only to the one who proposes it but to someone else as well, we take it on.

Another very important and most hotly debated question is whether the consistency of the library is preserved. What people didn't try to pull into SpeechKit over this time! We defended ourselves as best we could, because we are a library that does one specific thing. Do not try to do everything; remember this even if it would make life easier for some one person.

Next we look at how much it will simplify the user's life. If the work means that the user calls 2 lines of code instead of 4, that is probably not worth prioritizing. If a huge wall of code is replaced by one call, or it becomes possible to do something that could not be done before, then we put it on the board.

The last criterion is how long it takes to implement. When a feature is interesting, that's cool, but if it takes a month to build, you need to weigh everything carefully.

Documentation. It. Seriously

Especially for a library, because it is used by people who did not write the code. So be sure to add documentation in the code. It should be written in the files themselves, so that people can open them, read them, see the help, and see how to use it all.

Add a quick start. We all look for libraries the same way: we find something, take a piece of code from GitHub, paste it in, run it. If it works, hooray; if it doesn't, we keep looking. Having a quick start in the documentation brings you closer to your users: your library will be easier to integrate, and it will be easier to understand what it can do.

After that, provide usage examples, so that people can understand how to do something trickier and more complex with your library and how to set up parameters, calls, and so on.

Public release

The last important things not to forget when you ship a public release:

Server. Be sure to warn the backend team so that the server can withstand the load and the growth in the number of users. If there is anything specific, such as internal information or traffic that must not be exposed externally, tell your backend about it.
License. When we release our code to the outside world, others may use it in ways that are not entirely proper. If your code is open source, add an open-source license; if not, contact a lawyer who will write a good license to protect you from possible claims.
Support. Remember that support will fall entirely on your shoulders. You will not have a first or second line that explains to users what to pass to a function. You do all of it. For me, supporting SpeechKit users sometimes takes more than half of my working time.

Results

Do not let users break you; take this into account in your API.
Write code that makes life easier for your users rather than code that merely performs the required functionality.
The release cycle has to be adapted to several teams.
Remember that your library will definitely change someone's life for the better :) You are doing the work for someone else, and your code can be reused.

Yandex.SpeechKit on GitHub for iOS, for Android, and the Mobile SDK documentation.

AppsConf, the most useful mobile development conference, will become even more useful on April 22 and 23, 2019, and now is the time to book a ticket, or to pull yourself together and submit a talk.

I recently wrote about the Program Committee's plans for the April conference. Asya, for example, promises to prepare a new exciting talk.
