FIAS or KLADR: choosing an address directory

    On July 1, 2014, one of the most significant events in the history of the Russian state took place: from that moment our country finally has a reference database of addresses for all settlements, even the smallest! The name of this database is FIAS. The FIAS directory itself actually appeared much earlier, but it was on July 1 that Federal Law 443 came into force, under which all state and municipal bodies must now rely on it as the only correct address database. We decided to investigate whether it is worth switching to FIAS, and what pitfalls await those who decide to do so.

    After reading the article, you will learn:
    • What the difference between FIAS and KLADR is
    • Whether FIAS can already be used instead of KLADR
    • Whether those already working with KLADR should worry about switching to FIAS
    • Whether FIAS will solve the current address problems
    • What awaits those who are just starting to use address directories
    • What the most noticeable and important problems are when working with FIAS and KLADR


    What is wrong with KLADR?


    At the moment, the main address directory of Russia is KLADR. Why did it fail to satisfy people, and where did the need for a new one come from?

    Initially, KLADR was most likely conceived as a clear, structured reference containing up-to-date information on addresses across all of Russia. Unfortunately, it is currently far from that. KLADR records have many quirks, and we will now discuss the most interesting of them.

    Hell in house numbers, or a programmer's nightmare

    The house number and its extension (everything that follows the number: block, structure, letter) are stored in KLADR in a single string, separated by commas. Moreover, the general rules for forming the house part, described in the documentation, are not always followed in practice. So if you decide to integrate KLADR down to the house level, you will have to figure out what to do with notation like this:
    Not for the nervous
    1kA, 1_A, 313, 2k1_A, 1p, 21_25, 5/34k1, 21/13/a, 6vld2, 5/2vld2b, 42vld1_4, 21k5/2st2b, 2k6str__7, N(1-700), dvld14_14A, 5k3DVZE, .2, kGruzoruzhenie1, vld22/7construction3EST, construction of IDP_11 ...
    Ignoring the digits themselves, there are 6436 distinct notation patterns for the house part.
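
    A count like this can be reproduced by replacing every run of digits with a placeholder and counting the distinct shapes that remain. The sketch below is only an illustration: it assumes the house field is a comma-separated string, as described above, and the function names and sample values are hypothetical.

```python
import re
from collections import Counter


def house_shape(token: str) -> str:
    """Replace each run of digits with '#', so '2k1_A' and '7k3_A' share one shape."""
    return re.sub(r"\d+", "#", token.strip())


def count_shapes(house_field_values):
    """Count notation shapes across comma-separated house fields."""
    shapes = Counter()
    for value in house_field_values:
        for token in value.split(","):
            if token.strip():  # skip empty fragments such as a trailing comma
                shapes[house_shape(token)] += 1
    return shapes


# Hypothetical sample of house fields, using tokens from the list above.
sample = ["1kA,1_A,313", "2k1_A,21_25,6vld2", "5k7,42vld1_4"]
shapes = count_shapes(sample)
for shape, count in shapes.most_common():
    print(shape, count)
```

    Sorting the shapes by frequency is a quick way to see which notations are worth supporting first and which are one-off oddities.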


    It seems that, with so many spellings, even the directory's creators get confused, since the same street often has several valid records for the same house. For example, for the settlement of Novy (Krasnogorsk district of the Moscow region) KLADR has one record with house 8 and a separate one with dvld8. In theory a household (dvld) and a house are two different things, but in practice few people write “household”, and we can safely assume that a dvld and a plain house are one and the same.

    In theory, KLADR is the address directory that everyone should rely on when compiling any dataset with addresses, so such datasets ought to store some key into this database in order to synchronize with KLADR and receive updates. But the KLADR code, the only identifier in the directory, can change from version to version for the same object. As a result, you will not find it used as a key in other directories: everywhere the address is stored as plain text without any ID. This is bad because such addresses may contain errors, be out of date or not exist at all, and matching them to KLADR takes a lot of effort (or the dadata.ru service).

    Where is this street, where is this lane?

    In KLADR, the address record is divided into levels (region, district, city, settlement and street), and each level has a type and a name. For example, the type is “autonomous okrug” and the name is “Yamalo-Nenets”. Unfortunately, it is not always possible to tell exactly what is the name and what is the type. Nor is it always clear what is a KLADR defect and what is really named that way.

    For example, you can find addresses like these:
    Once again: not for the faint of heart
    Type: “Autonomous Okrug”

    Name: “Khanty-Mansiysk Autonomous Okrug - Yugra”
    According to KLADR, the address should be spelled like this: Russia, Autonomous Okrug Khanty-Mansiysk Autonomous Okrug - Yugra, ...

    Type: “Chuvashia”
    Name: “Chuvash Republic -”
    Yes, that's right, with a hyphen at the end. And the type is a gem too.

    Type: “Street”
    Name: “Quarter Novye Cheryomushki 32A”
    We regularly receive lovely-looking addresses such as: Moscow, quarter Novye Cheryomushki 32A, k8, apartment xxx. Note that, according to KLADR, the house number is part of the street name, and the street type is not “Quarter” but “Street”.

    Type: “Lane”
    Name: “Ul. Sovetskaya”
    In the village of Dosotuy in the Chita region there is Sovetskaya Street and a lane named “Ul. Sovetskaya”. Therefore “Dosotuy, ul. Sovetskaya” and “Dosotuy, lane ul. Sovetskaya” are different addresses.


    Leo or Tolstoy?

    There are a lot of errors in KLADR: five-digit postal codes (Russian postal codes have six digits), duplicate records for houses with double numbering, and more.

    Here are some of them in more detail:
    • Missing streets and even whole settlements. This is especially noticeable for small settlements with a population under 10 thousand. For example, Hospitalnaya Street in the town of Monino, Shchelkovo District, Moscow Region is on the maps but not in KLADR.
    • Duplicates. According to KLADR, Moscow has two different 8 Marta (March 8) streets which, judging by their postal codes, are very far apart, yet have identical houses. There are also two Shosseynaya streets, one of which was renamed Nikolay Sirotkin Street, of which there are also two. My personal favorite is the city of Korenovsk, Krasnodar Territory: according to KLADR they love Tolstoy very much there, since it has Tolstoy Street, Leo Tolstoy Street and Leo Tolstoy Lane.
    • Slow updating of the directory. The Universiade Village, built last year for the Universiade in Kazan, appeared in KLADR six months after construction was completed, even though KLADR is updated every week.
    • Postal codes that are often missing or unreasonably inherited from higher levels. A house in a subordinate settlement may carry the postal code of a post office in the parent city, far away from the house. For example, Lenin Street in Neftekamsk, Bashkortostan has the postal code 450000, that is, the main post office in Ufa. We have our own know-how for such cases: we return two postal codes, the KLADR one for reporting to various authorities and a delivery one so that the letter actually reaches the address.


    The errors probably arise because local authorities are responsible for keeping the directory current, and the information they enter is possibly not verified in any way. Be that as it may, the directory's problems are compounded by the lack of support: we wrote to the Federal Tax Service several times pointing out errors, but none of them were fixed.

    So the presence of an address in KLADR does not guarantee that it exists in real life, and vice versa.

    What about FIAS?


    Let's see what FIAS is and whether it solves the problems of KLADR.

    Data and structure

    The first thing you notice when working with FIAS is that it holds more information than KLADR. But less useful information was added than we would like. The comparison table below highlights the most significant address data.

    Field | KLADR | FIAS
    Regions and cities of federal significance | + | +
    Districts | + | +
    Cities and rural districts | + | +
    City districts | - | -
    Streets | + | +
    Houses and extensions | + | +
    Postal code (index) | + | +
    Center status | + | +
    Action status (what happened to the object: renamed, reassigned, ...) | + (conditionally encoded in the KLADR code, but the codes are very poorly documented) | +
    Relevance status | + | +
    Record start and end date | - | +
    Condition of the house (does it require repair, how much) | - | + (but the reliability of the data is in doubt, since more than 95% of the houses have the same status)
    Coordinates of the object | - | -
    Apartment data (list, quantity or range) | - | -
    Population (at any level) | - | -
    Monotown flag | - | -
    Unique ID for every house | - | +
    Purpose of the building (residential / non-residential) | - | -
    Number of storeys, year of commissioning, wall material | - | -

    Thus, the only genuinely useful additions are the fixed house ID, which is supposed never to change and can serve as a key for external systems, and the start and end dates of each record. The rest of the new information consists of identifiers that sometimes duplicate each other or are parts of other identifiers.
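
    To illustrate why a stable house ID matters for external systems, here is a minimal sketch of a client record that stores the directory ID as its key and refreshes the human-readable address from it. Everything in it is an assumption made for illustration: the `Customer` class, the `refresh_addresses` function, the ID values and the shape of the directory mapping are hypothetical and not part of FIAS.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    house_id: str       # stable directory ID, stored as the key (hypothetical format)
    address_text: str   # cached human-readable address, may become stale


def refresh_addresses(customers, directory):
    """Update cached address text from a mapping {house_id: current address}.

    Returns the customers whose house_id is no longer present in the directory,
    so that they can be reviewed manually.
    """
    missing = []
    for customer in customers:
        current = directory.get(customer.house_id)
        if current is None:
            missing.append(customer)
        else:
            customer.address_text = current
    return missing


# Hypothetical data: the street was renamed, but the house ID stayed the same.
customers = [
    Customer("Ivanov", "id-1", "Shosseynaya st., 5"),
    Customer("Petrov", "id-2", "Lenin st., 1"),
]
directory = {"id-1": "Nikolay Sirotkin st., 5"}
missing = refresh_addresses(customers, directory)
```

    With a key that changes between versions, as the KLADR code does, this kind of refresh would silently lose or mismatch records.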

    Quality of house data

    FIAS has two tables for houses. The data structure in itself is very pleasing: there is a field for everything.

    The first table, HOUSE, contains house numbers, and for each there is the following information:

    • House number
    • Ownership type (property, house, household, land plot)
    • Block (korpus) number
    • Structure (building) number
    • Structure type (building, structure, letter)
    • Postal code (index)
    • Condition of the house


    What are the main differences from the table of houses in KLADR?

    Pros:
    • Structured information about the house number and its extension. Entries such as dvld12str1 are brought to a normal form.
    • Entries such as 11_13 are converted to 11-13. In KLADR, according to the documentation, a hyphen denotes a range of houses (many houses in one record), so a hyphen inside a house number had to be replaced with an underscore. FIAS does not have this problem: one row, one house.
    • A fixed ID for each house.
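
    The kind of normalization described in this list can be sketched as follows. This is only an illustration under stated assumptions: it uses the transliterated markers that appear in this article (`k` for block, `str` for structure, `vld` and `dvld` for ownership types), whereas real KLADR data uses Cyrillic markers; it handles only the simplest patterns; and the `HousePart` class and `parse_house` function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class HousePart:
    ownership: Optional[str]   # "dvld" or "vld"; None for a plain house
    number: str                # house number, with '_' converted to '-'
    block: Optional[str]
    structure: Optional[str]


# Covers only simple tokens such as 'dvld12str1', '2k6str7' or '11_13'.
HOUSE_RE = re.compile(
    r"^(?P<ownership>dvld|vld)?"
    r"(?P<number>\d+(?:_\d+)?)"
    r"(?:k(?P<block>\d+))?"
    r"(?:str(?P<structure>\d+))?$"
)


def parse_house(token: str) -> Optional[HousePart]:
    """Split a KLADR-style house token into fields; None if it is not recognized."""
    match = HOUSE_RE.match(token.strip())
    if match is None:
        return None  # leave unusual notations for manual review
    return HousePart(
        ownership=match.group("ownership"),
        number=match.group("number").replace("_", "-"),
        block=match.group("block"),
        structure=match.group("structure"),
    )


print(parse_house("dvld12str1"))
print(parse_house("11_13"))
print(parse_house("6vld2"))  # not covered by this simple pattern
```

    Returning `None` for unrecognized tokens, rather than guessing, keeps the thousands of irregular notations visible instead of silently mangling them.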


    Cons:
    • The problem of houses with double numbering has not been solved: such a house may have any number of unrelated records, and the house number may be stored either with a fraction, as in 6/9/20, or as a plain number. For example, in Kazan the addresses Kremlyovskaya 23, Kremlyovskaya 23/17 and Kavi Najmi 17/23 all designate the same building.
    • The house number often contains a letter (in theory it should go into the corresponding FIAS field, the structure (building) number). Records like “38/1UCH” still occur from time to time.
    • There are house numbers that look like obvious errors, for example “08a” and “0p”. (Update: I managed to find out that house 08A really exists, thanks bay73)
    • Duplicate houses, houses that do not exist in real life, missing records for existing houses, missing postal codes and their incorrect inheritance: nothing has improved compared to KLADR.


    The second table with houses, HOUSEINT, contains house intervals. In KLADR, the table of houses contains records such as N(1-999), which means all odd-numbered houses in the range 1 to 999. In FIAS they are split into fields: the start of the interval, its end, and its type. Unfortunately, the contents of this table are as far from the truth as in KLADR: for example, Kirov has an unbelievably long Shchorsa Street, which contains every house in the range from 1 to 9999.
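
    An interval record of this kind (start, end, and a parity type) can be expanded or checked with a few lines of code. The sketch below is an assumption-laden illustration: the `Parity` enum, the function names and the `max_houses` threshold are hypothetical and do not reproduce the actual FIAS field names or codes.

```python
from enum import Enum


class Parity(Enum):
    ALL = "all"
    EVEN = "even"
    ODD = "odd"


def expand_interval(start: int, end: int, parity: Parity) -> list:
    """List the house numbers covered by an interval record."""
    if start > end:
        raise ValueError("interval start must not exceed its end")
    numbers = range(start, end + 1)
    if parity is Parity.ALL:
        return list(numbers)
    wanted_remainder = 0 if parity is Parity.EVEN else 1
    return [n for n in numbers if n % 2 == wanted_remainder]


def looks_suspicious(start: int, end: int, parity: Parity, max_houses: int = 500) -> bool:
    """Flag intervals that claim implausibly many houses (threshold is arbitrary)."""
    return len(expand_interval(start, end, parity)) > max_houses


print(expand_interval(1, 9, Parity.ODD))
print(looks_suspicious(1, 9999, Parity.ALL))
```

    A plausibility check like `looks_suspicious` is one cheap way to catch catch-all records such as the 1 to 9999 interval mentioned above before they reach address validation.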

    Quality of everything else

    Let's look one level up, at the address objects down to the street level. They are stored in the ADDROBJ table.

    Pros:
    • GSK (garage cooperatives), SNT (gardening associations) and other objects of this kind have been moved, together with their subordinate streets, to separate levels. In KLADR they were at the settlement level, which created confusion.
    • A LANDMARK table was added, which describes in free form how to find the address (for example, “0.8 km north-east of the village of Lopatino” or “MKAD, 84th km”). Although the table is still small, it seems very promising to me, especially if it were opened up to the community for contributions.


    Cons:
    • Problems with the regions have still not been resolved: in the Chuvash Republic the type is “Chuvashia”, in the Khanty-Mansiysk Autonomous Okrug the type is part of the name, and so on. Of course, this may be how the official documents have it, but it seems to me that one of the main tasks of an address directory is to bring addresses to a standard form.
    • FIAS has a separate level for autonomous okrugs, but there are no objects on it, and all autonomous okrugs sit on level 1. Probably this level is intended for something else.
    • FIAS contains all the same addresses as KLADR, with all its errors.


    Format

    FIAS is available in three forms: KLADR format, dbf and xml. The last seemed the most convenient to me: unlike dbf, the files are not split by region but are bundled together. However, the source directory in this format weighs about 14GB.

    FIAS in dbf format weighs 9GB rather than 14GB, but its structure is less convenient: the tables of houses and regulatory documents are split by region, so in this representation FIAS consists of 187 files.

    FIAS in KLADR format is, in essence and content, the same as KLADR itself, with rare exceptions, and weighs the same 330MB. A line-by-line comparison of the KLADR tables and the FIAS tables in KLADR format revealed less than 0.1% discrepancies, which are probably explained by the two databases having been downloaded at different times.
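
    The article does not say how the comparison was carried out. One simple way to estimate such a discrepancy rate, assuming both tables have already been loaded as sequences of row tuples, is a set-based symmetric difference. The function name and the sample rows below are hypothetical.

```python
def discrepancy_rate(rows_a, rows_b) -> float:
    """Share of distinct rows that appear in only one of the two tables."""
    set_a, set_b = set(rows_a), set(rows_b)
    total = len(set_a | set_b)
    if total == 0:
        return 0.0
    return len(set_a ^ set_b) / total


# Hypothetical rows: (code, name) pairs, with one name differing between the tables.
kladr_rows = [(i, f"name{i}") for i in range(1000)]
fias_rows = list(kladr_rows)
fias_rows[0] = (0, "renamed")

rate = discrepancy_rate(kladr_rows, fias_rows)
print(f"{rate:.4%}")
```

    Note that a changed row counts twice here (once for each version), so this measure slightly overstates the number of affected objects.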

    What business thinks

    How can switching from KLADR to FIAS affect day-to-day work? Is business ready to move to this directory?

    Our colleagues from banks, for whom address information is key at every stage, see no business advantage in switching to FIAS, but plan to do so in order to meet the regulator's requirements. As all federal agencies, ministries and departments move to FIAS, they may in future require FIAS to be used when communicating with them (government services, SMEV, reporting, the Central Bank).

    Conclusions


    The biggest problem of official directories in Russia was, and remains, that the information in them is out of date. Until FIAS has a proper, well-established update process, data quality is checked, and what is already in the directory is cleaned up, we will keep running into the same problems as in KLADR.

    The main advantages of FIAS are initial attempts to standardize addresses and the presence of a stable key for each house.

    In summary:
    • FIAS contains more useful information than KLADR: a stable ID, record start and end dates, and a more detailed action status for each object.
    • FIAS is better organized: the house part is split into components, and a level has been added for additional territories and their streets.
    • FIAS files are much heavier than KLADR in its original form: 9GB (dbf) versus about 330MB.
    • In terms of content and relevance of addresses, KLADR and FIAS are more than 99.9% identical.
    • Connecting FIAS or KLADR to an application from scratch is roughly equal in complexity: with KLADR you have to deal with the house part, while with FIAS you can safely drop unneeded data to reduce the size of the directory. In both cases you will need to get to grips with the quality of the data, which will take the most time.
    • In the future, the FIAS identifier may be required when working with external systems: public services, SMEV, reporting, and the Central Bank.


    So for now, switching to FIAS makes sense only as groundwork for the future. If you already work with KLADR and do not interact with external systems, you need not switch to FIAS and can keep using KLADR. If you are just getting acquainted with addresses and plan to connect them to your product, or you need reporting and integration, then you should choose FIAS.

    PS: All the information in the article is current as of the KLADR version of July 3, 2014 and the FIAS version of June 30, 2014.
