How not to build federal information systems
This article will interest a narrow circle of Habr readers, the developers of federal information systems, and a wide one: everyone who has had to, has to, or will have to interact with such systems.
The story uses the FIS GIA and Admission as its example (the system received this name from D. Medvedev on August 31, 2013; for the preceding year and a half it was known by the name given by V. Putin, FIS Unified State Examination and Admission).

What is it and who needs it?
The federal information system for ensuring the state final certification of students and the admission of citizens to educational organizations is a system created for Rosobrnadzor, into which universities and colleges have been required for 3 years now to enter information on the progress of the admission campaign, including the personal data of all applicants. Before admission begins, they transmit the number of places, the lists of entrance examinations and the allowed benefits; during admission, they transmit applicants' applications, including names and passport details, in a prescribed format almost in real time.
In theory, the supervisory authority uses this data to check whether the admission procedure approved by the Ministry is being violated anywhere. In practice, the only penalties so far have been for failing to transmit data to the system.
Interaction of educational institutions with the FIS
All institutions of higher and secondary vocational education are required to transmit information on the progress of admission to the FIS daily. For this purpose there is a web interface for entering and viewing data, as well as an automated interaction service for batch transmission in XML format. In theory, it all looks fine, but there are serious caveats. The first is the speed of interaction: in manual mode, entering a single application takes up to 20 minutes at peak hours, and in automated mode, packages can wait in the processing queue for days. The second is software errors that cause discrepancies in the data. But first things first.
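Because packages are processed asynchronously and may sit in the queue for days, a client should neither block on one request nor poll at a fixed high rate. Below is a minimal sketch of polling with capped exponential backoff and an overall deadline. The check callable is hypothetical: it stands in for whatever request returns a package's import result and is assumed to return None while the package is still queued; the default delays are arbitrary assumptions. The sleep and clock functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


def poll_import_result(
    check: Callable[[], Optional[str]],
    initial_delay: float = 60.0,
    max_delay: float = 3600.0,
    deadline: float = 3 * 24 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Call `check` until it returns a result or `deadline` seconds pass.

    `check` is a hypothetical callable that returns None while the
    package is still waiting in the remote queue.
    """
    start = clock()
    delay = initial_delay
    while True:
        result = check()
        if result is not None:
            return result
        elapsed = clock() - start
        if elapsed >= deadline:
            return None  # caller keeps the package id and resumes later
        # Double the pause each time, capped by max_delay and the time left.
        sleep(min(delay, max_delay, deadline - elapsed))
        delay = min(delay * 2, max_delay)
```

When the deadline passes, the sketch returns None instead of raising, on the assumption that the caller records the package identifier and resumes polling later rather than resubmitting the same data.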
Data model design
In the admission domain, one applicant can submit up to three applications to a university. When data is transferred as XML, entities are represented as elements of a tree. How would you model the submission of applications in XML? Applications clearly belong to an applicant, so it is logical to put the applicant's personal information in a higher-level element and the applications he has submitted in nested elements. The FIS developers, however, did the opposite: the applicant's information is repeated in every application and may even be contradictory. The web interface then shows several rows with the same name but different passport data, for example. Moreover, the links from all these rows lead to the same card, which displays just one, arbitrarily chosen, of the conflicting passports.
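With the flat layout, nothing in the format prevents two copies of the same applicant from disagreeing, so a client can at least check its own package before sending it. The sketch below uses hypothetical element names (Application, ApplicantUID, FullName, Passport), not the real FIS schema, and reports every applicant whose repeated identity data differs between applications.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical flat layout: identity data is copied into every application.
# (In a nested layout, an <Applicant> element would hold it once and wrap
# its <Application> elements, so this kind of conflict could not arise.)
FLAT_PACKAGE = """
<Applications>
  <Application>
    <ApplicantUID>A-1</ApplicantUID>
    <FullName>Ivanov Ivan</FullName>
    <Passport>4500 123456</Passport>
  </Application>
  <Application>
    <ApplicantUID>A-1</ApplicantUID>
    <FullName>Ivanov Ivan</FullName>
    <Passport>4500 654321</Passport>
  </Application>
  <Application>
    <ApplicantUID>A-2</ApplicantUID>
    <FullName>Petrova Anna</FullName>
    <Passport>4600 111111</Passport>
  </Application>
</Applications>
"""

IDENTITY_FIELDS = ("FullName", "Passport")


def find_identity_conflicts(xml_text: str) -> dict[str, set[tuple[str, ...]]]:
    """Map each applicant with contradictory copies to the distinct versions seen."""
    root = ET.fromstring(xml_text.strip())
    versions: dict[str, set[tuple[str, ...]]] = defaultdict(set)
    for app in root.iter("Application"):
        uid = app.findtext("ApplicantUID", default="").strip()
        identity = tuple(app.findtext(f, default="").strip() for f in IDENTITY_FIELDS)
        versions[uid].add(identity)
    return {uid: seen for uid, seen in versions.items() if len(seen) > 1}


print(find_identity_conflicts(FLAT_PACKAGE))  # only A-1, with two passports
```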
Another gem. Clearly, XML is only the exchange format, while the system's internal representation is still relational and stored in a decent DBMS. Then a seemingly very good idea arises: add to the exchange protocol the primary keys that entities have in the universities' own systems, which are the data sources. After all, this should make it easier to recognize new entities and update existing ones. But can we assume that all client systems have a similar data model, or that it is relational at all? University programmers writing data exchange clients will surely be amused to have to generate unique, immutable identifiers where none ever existed, or to run into an error because the identifier of, say, a certificate and that of an Olympiad diploma must not coincide (apparently,
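If the protocol demands an immutable identifier for an entity that the university's own system never keyed, one option is to derive it deterministically from the entity's natural key with a name-based UUID (uuid.uuid5 from the Python standard library). Hashing the entity type together with the key keeps, say, a certificate and an Olympiad diploma with the same number from colliding. The namespace UUID, the entity type names and the document number below are hypothetical.

```python
import json
import uuid

# Hypothetical namespace for one university's exchange client:
# generate it once (e.g. with uuid.uuid4()) and never change it.
UNIVERSITY_NS = uuid.UUID("6f1c2a7e-0d3b-4c1e-9a52-3b8f4f0e2d11")


def stable_uid(entity_type: str, *natural_key: str) -> str:
    """Derive the same identifier for the same entity on every run.

    The entity type is hashed together with the key, so documents of
    different types with equal numbers still get different identifiers.
    JSON encoding keeps ("a|b", "c") and ("a", "b|c") distinct.
    """
    parts = [entity_type, *(part.strip().upper() for part in natural_key)]
    return str(uuid.uuid5(UNIVERSITY_NS, json.dumps(parts, ensure_ascii=False)))


cert = stable_uid("EgeCertificate", "123456")
diploma = stable_uid("OlympiadDiploma", "123456")
assert cert != diploma
```

One caveat of this design: the identifier changes if the natural key is later corrected (a mistyped document number, for instance), so it is safer to generate it once and store it alongside the record.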
Documentation
Competent documentation for a production system that hundreds or thousands of smaller, heterogeneous systems must integrate with is the key to success. Sadly, the FIS developers have not managed to solve this problem even in 3 years, although there has been some progress. Publishing the XML structure as a PDF document and an XSD schema is certainly necessary. But it is important at least to verify, first, that the XSD is valid and, second, that it does not contradict the reference XML document. Otherwise, hundreds of third-party developers end up fixing clumsy regexes and annoying length="50" facets (which demand exactly 50 characters) that should have been maxLength="50", instead of the people whose job that is.
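In schema syntax the facet is written <xs:length value="50"/> and requires a value of exactly 50 characters, whereas <xs:maxLength value="50"/> only sets an upper bound. A third-party developer stuck with such a schema can at least list every xs:length facet for review and relax only those clearly meant as limits, since some codes may legitimately be fixed-length. A minimal standard-library sketch; the sample type names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the usual "xs:" prefix


def _parents(root: ET.Element) -> dict:
    """ElementTree has no parent links, so build a child -> parent map."""
    return {child: parent for parent in root.iter() for child in parent}


def _owner(elem, parents) -> str:
    """Name of the nearest enclosing declaration that has a name attribute."""
    while elem is not None:
        if elem.get("name"):
            return elem.get("name")
        elem = parents.get(elem)
    return "<anonymous>"


def length_facets(xsd_text: str) -> list[tuple[str, str]]:
    """List (owner, value) for every xs:length facet, for human review."""
    root = ET.fromstring(xsd_text)
    parents = _parents(root)
    return [(_owner(f, parents), f.get("value")) for f in root.iter(f"{{{XS}}}length")]


def relax_length_facets(xsd_text: str, owners: set[str]) -> str:
    """Turn xs:length into xs:maxLength, but only under the listed owners."""
    root = ET.fromstring(xsd_text)
    parents = _parents(root)
    for facet in list(root.iter(f"{{{XS}}}length")):
        if _owner(facet, parents) in owners:
            facet.tag = f"{{{XS}}}maxLength"
    return ET.tostring(root, encoding="unicode")


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="LastName">
    <xs:restriction base="xs:string"><xs:length value="50"/></xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="RegionCode">
    <xs:restriction base="xs:string"><xs:length value="2"/></xs:restriction>
  </xs:simpleType>
</xs:schema>"""

print(length_facets(SAMPLE_XSD))  # [('LastName', '50'), ('RegionCode', '2')]
patched = relax_length_facets(SAMPLE_XSD, {"LastName"})
```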
Moreover, a formal description of the exchange protocol is categorically insufficient: with a complex data structure, the system accepts not every schema-valid package, but only one that also passes a number of additional sanity checks. One example, involving the keys taken from university systems, is given above.
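Since the XSD does not capture these extra conditions, a client can mirror the ones it knows as its own rule set and run them before sending, collecting every problem instead of stopping at the first. The sketch encodes two rules mentioned in this article: document identifiers must not repeat across document types, and an applicant may submit at most three applications. The element and attribute names (Document, uid, Application, ApplicantUID) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Callable

Rule = Callable[[ET.Element], list[str]]


def unique_document_uids(root: ET.Element) -> list[str]:
    """Identifiers must be unique across all document types, not per type."""
    counts = Counter(doc.get("uid") for doc in root.iter("Document"))
    return [f"document uid {uid!r} used {n} times" for uid, n in counts.items() if n > 1]


def at_most_three_applications(root: ET.Element) -> list[str]:
    """One applicant may submit up to three applications to a university."""
    counts = Counter(app.findtext("ApplicantUID") for app in root.iter("Application"))
    return [f"applicant {uid!r} has {n} applications" for uid, n in counts.items() if n > 3]


RULES: list[Rule] = [unique_document_uids, at_most_three_applications]


def check_package(xml_text: str, rules: list[Rule] = RULES) -> list[str]:
    """Meant to run after XSD validation: apply every rule, collect all problems."""
    root = ET.fromstring(xml_text)
    problems: list[str] = []
    for rule in rules:
        problems.extend(rule(root))
    return problems
```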
Limitations and checks when interacting with external clients
In general, staying on the right side of the thin line between necessary checks and excessive restrictions that reject correct data is almost like balancing on a knife's edge. The key here is a thorough understanding of the subject area before development starts. For example, this year the FIS developers began rejecting applications for zero quotas, which are entirely legitimate in some cases. When the goal is to collect information, it is better to accept seemingly incorrect data for later analysis and to reject only data that is clearly incomplete.
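One way to apply this on the receiving side is to split validation into two tiers: missing mandatory data rejects a record, while merely suspicious values, such as a zero quota, are accepted with a warning for later analysis. A minimal sketch; the field names and the choice of mandatory fields are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical field names; which fields are mandatory is a domain decision.
REQUIRED_FIELDS = ("applicant_uid", "competitive_group", "submitted_on")


@dataclass
class Verdict:
    accepted: bool
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def triage(record: dict) -> Verdict:
    """Reject only clearly incomplete records; flag odd-looking ones for analysis."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    warnings = []
    # A zero quota looks wrong but is legitimate in some cases: keep it, flag it.
    if record.get("quota") == 0:
        warnings.append("zero quota")
    return Verdict(accepted=not errors, errors=errors, warnings=warnings)
```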
System errors and recommendations for developers of connected systems
The FIS in particular, and I suspect state systems in general, are a wonderful example of unstable "partners" on which to hone your skills of interacting with remote systems, where absolutely everything has to be checked. For example, you send XML in an HTTP request and expect another XML document in response, but:
1. The network connection may simply drop.
2. A timeout may occur; by the way, it is better to set a reasonable one up front, since otherwise the wait for a response may stretch for hours.
3. The response may not be XML at all, but anything.
4. The XML that arrives may not conform to the schema declared by the developer.
5. XML arrives, but the data in it is inconsistent. Example: a request sends 100 objects for import, and the response is expected to contain the number of successfully imported objects and a list of those rejected due to errors. In fact, the response accounts for only 83 objects, and where to look for the remaining 17, and which objects were actually loaded at all, remains a mystery.
In theory, all of these situations are commonplace, but it is not every system in which all of them occur regularly and with high probability.
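A client that covers all five cases might separate the transport (cases 1 and 2) from the interpretation of the response (cases 3 to 5). The sketch below uses only the standard library; the report format (an ImportResult root with Imported and Failed elements carrying an id attribute) is hypothetical rather than the actual FIS response schema, and the root-element test is only a crude stand-in for full schema validation.

```python
import http.client
import urllib.request
import xml.etree.ElementTree as ET


class ExchangeError(Exception):
    """Any failed or untrustworthy exchange with the remote system."""


def post_package(url: str, body: bytes, timeout: float = 120.0) -> bytes:
    """Cases 1-2: send one package with an explicit timeout."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    # URLError, HTTPError, timeouts and dropped connections are all OSError
    # subclasses; a truncated body raises http.client.HTTPException.
    except (OSError, http.client.HTTPException) as exc:
        raise ExchangeError(f"transport failure: {exc}") from exc


def reconcile(sent_ids: set[str], raw: bytes) -> tuple[set[str], dict[str, str]]:
    """Cases 3-5: parse the report and account for every object sent.

    Hypothetical report format:
    <ImportResult><Imported id="..."/><Failed id="..." error="..."/></ImportResult>
    """
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:  # case 3: not XML at all
        raise ExchangeError(f"response is not XML: {raw[:200]!r}") from exc
    if root.tag != "ImportResult":  # case 4, crudely: wrong document type
        raise ExchangeError(f"unexpected root element <{root.tag}>")
    imported = {e.get("id") for e in root.iter("Imported")}
    failed = {e.get("id"): e.get("error", "") for e in root.iter("Failed")}
    missing = sent_ids - imported - set(failed)
    if missing:  # case 5: e.g. 100 sent, only 83 reported
        raise ExchangeError(f"{len(missing)} objects not in report, e.g. {sorted(missing)[:5]}")
    return imported, failed
```

Raising on unaccounted objects, rather than silently trusting the counts, leaves the decision to re-query or resend with the caller, which at least knows exactly which identifiers are in doubt.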
Connecting to the system and protecting personal data
Now, for those who have read this far, the most interesting part. The FIS GIA and Admission resides in a closed network of the Federal Testing Center, to which universities connect via ViPNET VPN clients. In addition, universities are required to buy, for a decent sum of money, a unique solution from a little-known monopolist firm that filters data on the client side, "so that nothing extra is pumped out of a system holding the personal data of millions". There is no explanation of why this filtering has to be done on every client rather than once on the server side. Indirect evidence suggests that this unique solution is simply a proxy server that lets through only allowed URLs when working with the FIS server.
Recently, however, inquiring minds noticed that if, when viewing package import results in the web interface, you accidentally (or deliberately) specify a different package identifier, that package opens! And it not only opens, but also lets you download an XML file with all the data of all its applicants, including passports, prior education records, and information about benefits, medical ones included. Thus any user with access to the FIS can obtain, by simple enumeration of identifiers, the data of a significant share of applicants for the past 3 years.
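This is a classic insecure direct object reference: the server checks that the user is logged in, but not that the requested package belongs to the user's institution. A minimal sketch of the missing check, with hypothetical types and an in-memory dictionary standing in for the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportPackage:
    package_id: int
    institution_id: int
    xml: str


class PackageNotFound(Exception):
    """Raised for missing packages and for other institutions' packages alike."""


def get_package(store: dict[int, ImportPackage], package_id: int, user_institution_id: int) -> ImportPackage:
    package = store.get(package_id)
    # Authorize every object access: being logged in is not enough.
    # Answering "not found" for foreign packages also avoids confirming
    # which identifiers exist.
    if package is None or package.institution_id != user_institution_id:
        raise PackageNotFound(package_id)
    return package
```

Switching to random, non-sequential package identifiers would make enumeration harder, but it is no substitute for the ownership check.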

Summary
In conclusion, a few platitudes suggest themselves, but since many thousands of IT people still have to deal with this FIS and others like it, I think they are worth writing down.
To those who have to interact with poorly thought-out, poorly documented, idiosyncratic information systems built to government order: be prepared for anything, and build every conceivable and inconceivable error into your exchange algorithm from the start; you cannot rely on anything. Even when development time is short, it is better to put in as many checks as possible.
Good luck!