Hello Logify, or monitoring errors in installed applications

    As you know, there are no programs without errors, and there are many tools and approaches designed to improve the quality of released applications, from unit tests to code analyzers. However, even if you use all of them at once, nobody can guarantee that your application is error-free. Problems that arise during development and testing are visible to us immediately, or almost immediately, and we can get detailed information about what happened and fix it quickly. Errors that occur on the user's side after release are more insidious.

    Most importantly, users most likely will not tell you about them at all. How often have you sent an error report to Microsoft when asked to? :) As a rule, users either swear, restart the application, and keep using it, or uninstall it completely. If you are lucky and someone does report the crash, the message often looks something like this:

    and this brings no clarity at all to the problem. As a result, users come away with a negative experience of your program, and you have no way to do anything about it.

    Well, since we cannot rely on users, we have to take matters into our own hands. To begin with, we know that we can handle every exceptional situation in our programs. For example, in .NET we can attach a global handler for all unhandled exceptions. The simplest next step is to write every exception that reaches this handler to a log. That is at least something.
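
    As a rough illustration of the "global handler plus log" idea: in .NET the usual hook is the AppDomain.CurrentDomain.UnhandledException event, while the sketch below shows the same pattern in Python via sys.excepthook. The names make_excepthook and install_global_handler and the file name crash.log are made up for this example.

```python
import logging
import sys
import traceback


def make_excepthook(logger):
    """Return a hook that logs any unhandled exception with its traceback."""
    def hook(exc_type, exc_value, exc_tb):
        text = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
        logger.error("Unhandled exception:\n%s", text)
    return hook


def install_global_handler(log_path="crash.log"):
    """Route every unhandled exception in this process to a log file.

    log_path is a hypothetical default chosen for this sketch.
    """
    logger = logging.getLogger("crash")
    logger.setLevel(logging.ERROR)
    logger.addHandler(logging.FileHandler(log_path))
    sys.excepthook = make_excepthook(logger)
    return logger
```

    Calling install_global_handler() once at startup is enough; after that, any exception that nobody catches ends up in the log file instead of disappearing with the process.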

    However, the log will not send itself. We have to wait for the user to contact us before we even learn about the problem, and then ask them to send the log. That wastes time, and the user may never write at all. Alternatively, we can send the log to ourselves by e-mail, but it is scary to imagine what our mailbox will turn into once the number of notifications gets large, which is quite realistic on big projects. And in any case, digging through a text log of 100,500 lines is not a pleasant way to spend time.

    So we come to the conclusion that the most suitable solution is a service that runs in the cloud, receives information about errors that have occurred, processes and structures it, and shows it to us in a convenient form through a web interface. Well, why not write such a service ourselves?

    Initially, we built a very simple crash collector into our demos so that we could receive data about their crashes and use it to fix problems in our components, thereby improving their quality. At first it sent only the call stack and the name of the demo module. We quickly realized that this was not enough and began to expand it. At the moment we collect very detailed information about the environment in which the error occurred, from the list of loaded libraries and the IDE and framework versions to the OS version and the currently installed culture. Moreover, if needed, we can always attach any additional information through the CustomData mechanism. For an example, you can look at the test report.
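
    To make the shape of such a report more concrete, here is a minimal sketch in Python of collecting the exception, some environment details, and free-form custom data into one structure. The field names and the build_report function are assumptions made for this example and do not reflect Logify's actual report format or its CustomData API.

```python
import locale
import platform
import sys
import traceback


def build_report(exc, app_name, app_version, custom_data=None):
    """Assemble a crash report as a plain dictionary (hypothetical layout)."""
    return {
        "app": app_name,
        "app_version": app_version,
        "exception_type": type(exc).__name__,
        "message": str(exc),
        # Formatted traceback lines of the exception being reported.
        "stack": traceback.format_exception(type(exc), exc, exc.__traceback__),
        # Environment details, analogous to OS / framework / culture.
        "os": platform.platform(),
        "runtime": platform.python_version(),
        "culture": locale.getlocale()[0],
        "loaded_modules": sorted(sys.modules),
        # Anything the application wants to attach on top.
        "custom_data": dict(custom_data or {}),
    }
```

    A caller would typically build the report inside an except block or a global handler and then hand it to whatever transport sends it to the collecting service.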

    Once there was enough information to locate problem areas, we faced a new issue: sometimes there are a great many reports. After all, if a problem is easy to run into, many users are likely to hit it, and we receive a report from each of them. As a result, we end up with a pile of reports that are, if not identical, extremely similar:

    Clearly, working with all of this is difficult, to put it mildly. Worse, unique reports that point to other problems can get lost among the identical ones. To get rid of this, we implemented a search for duplicates based on a specific set of key fields and collapse them into a single report:

    Yes, that’s already much better.
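
    The idea of collapsing duplicates by key fields can be sketched in a few lines of Python. The particular set of key fields below is an assumption for illustration; the article does not say which fields the service actually compares.

```python
# Hypothetical key fields; reports that agree on all of them are duplicates.
KEY_FIELDS = ("app", "app_version", "exception_type", "top_frame")


def collapse_duplicates(reports, key_fields=KEY_FIELDS):
    """Group reports by their key fields.

    Returns one entry per unique key, holding the first report seen
    and the number of reports that matched it, in arrival order.
    """
    groups = {}
    for report in reports:
        key = tuple(report.get(field) for field in key_fields)
        if key not in groups:
            groups[key] = {"report": report, "count": 0}
        groups[key]["count"] += 1
    return list(groups.values())
```

    Keeping a counter next to the representative report preserves the information about how widespread the problem is, which would be lost if duplicates were simply dropped.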

    However, in addition to duplicates, the inbox contained quite a few reports that were simply of no interest to us, for example reports about problems that had already been fixed. This led to the task of automatically ignoring incoming reports under certain conditions, and we were handed a lot of conditions: ignore by version, ignore by keywords in the call stack, ignore by specific lines of the call stack, ignore errors whose call stack contains none of our components, and so on. In the end we identified 4 main types of ignore rules, which covered all our needs. This topic really deserves a separate article, so here I will just link to the documentation so that you can get a general idea of the mechanism: Ignoring Filters
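
    For a feel of how such conditions can be expressed, here is a small Python sketch in which every rule is a predicate over a report. It models only the example conditions mentioned above; it is not the service's actual set of 4 rule types, and all names and the report layout are hypothetical.

```python
def version_tuple(version):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def ignore_by_version(max_version):
    """Ignore reports from this version or older (e.g. already fixed)."""
    limit = version_tuple(max_version)
    return lambda report: version_tuple(report["app_version"]) <= limit


def ignore_by_stack_keyword(keyword):
    """Ignore reports whose call stack mentions the keyword anywhere."""
    return lambda report: any(keyword in frame for frame in report["stack"])


def ignore_by_stack_line(line):
    """Ignore reports whose call stack contains exactly this line."""
    return lambda report: line in report["stack"]


def ignore_without_component(prefix):
    """Ignore reports in which none of our components appear on the stack."""
    return lambda report: not any(
        frame.startswith(prefix) for frame in report["stack"]
    )


def should_ignore(report, rules):
    """A report is ignored as soon as any rule matches it."""
    return any(rule(report) for rule in rules)
```

    Because each rule is just a function, new kinds of conditions can be added without touching should_ignore.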

    While working with the service, we found another, less obvious use for it: it is a great help to the support department. Users often write that they have some problem that reproduces only on their machine, and they have no way to provide detailed information about the error itself. Or, for example, errors occur in programs that run with restricted rights and cannot write a log file. In all these situations Logify helps us, because collecting crash data from deployed applications is its main job.

    As a result, Logify has been integrated into many of our products and is now delivering tangible benefits. Along the way it clearly outgrew its role as a purely internal project, and we decided to release it as a separate service. It was polished and presented to the public with a redesigned UI, first as a beta, and it is now fully launched in production. So if you are interested, I invite you to try it: Logify. The clients are available on GitHub.
