Where is this street, where is this house... or how to determine the district in which an enterprise is located
While developing a new service, we ran into an interesting problem. We needed to determine which administrative-territorial or municipal unit each enterprise belongs to and assign it the district or okrug in which it is located. For the end user, this should appear as a filter in the search form that restricts results to organizations in a given district or okrug of a city. And it had to work for companies throughout Russia.
Initial data: we had an accumulated database containing a fairly large number of organizations throughout Russia. It included the enterprises' addresses, each stored as a plain string. Accordingly, there was no obvious way to tie them to a territory.
Implementing this seemingly simple task took a good deal of head-scratching. Initially, the idea was to outline district boundaries in Google Maps using custom maps and to obtain the organizations' coordinates through the Yandex geocoder. But this idea turned out to be utopian: drawing district maps for the whole of Russia is not something just anyone can do.
A suitable solution came to mind suddenly: use the ready-made KLADR database of administrative-territorial divisions. This database contains a complete list of the settlements, streets and houses of Russia. For each territorial unit, KLADR also stores the OKATO code (All-Russian Classifier of Administrative-Territorial Division Objects). It is worth noting that the OKATO database itself is not part of KLADR and has to be downloaded separately.
So, the database from which the okrug or district can be determined is available. It remains to figure out how to match our existing addresses against it. House data in KLADR is stored in a rather specific way: a house entry can carry many different designations, such as building, structure, and parity, all of which must be taken into account when determining the district. So the existing addresses need to be parsed. This can be done in two ways:
The first is the simplest and the least reliable: feed the addresses of the existing companies to the Yandex geocoder, which will break each address into parts. But this method has a big drawback: if for some reason the address is not in the geocoder's database, it will return the building closest to the specified location. Or maybe nothing at all...
The second way is the way of the Jedi: implement the address parser yourself. Since accurate address resolution was critical for our service, we decided to implement the parser ourselves. The simplest implementation example is here. In the example, the address string is parsed into an array whose keys are the types of territorial units. The example has one caveat: the address must already be in the “correct” format. For example, the house must come after the street in the address, not before it.
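To make the idea of such a parser more concrete, here is a minimal sketch in Python. It is not the linked implementation: it recognizes units by their abbreviation rather than by position, and the `UNIT_PREFIXES` table, the `parse_address` name and the sample address are all assumptions made for illustration. It still expects a well-formed, comma-separated address.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch);
# a real list of unit types would be much longer.
UNIT_PREFIXES = {
    "г": "city",
    "ул": "street",
    "пр-кт": "street",
    "д": "house",
    "корп": "building",
    "стр": "structure",
}

# "<abbreviation>[.] <value>", for example "ул. Тверская" or "д 7".
PART_RE = re.compile(r"^([^\s.]+)\.?\s+(.+)$")


def parse_address(address: str) -> dict[str, str]:
    """Split a comma-separated address into {unit type: value}."""
    result: dict[str, str] = {}
    for part in address.split(","):
        match = PART_RE.match(part.strip())
        if match is None:
            continue  # no recognizable "abbreviation value" pair
        prefix = match.group(1).lower()
        value = match.group(2).strip()
        unit = UNIT_PREFIXES.get(prefix)
        # Keep the first occurrence of each unit type, ignore unknown ones.
        if unit is not None and unit not in result:
            result[unit] = value
    return result


parsed = parse_address("г. Москва, ул. Тверская, д. 7, корп. 2")
```

Parts without a known abbreviation are silently dropped here; a production parser would more likely log them for manual review, since accuracy is the whole point of writing the parser yourself.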
Now that the address has a more understandable structure, it can be matched against the KLADR database to obtain the OKATO code. KLADR by itself does not tell you which district or okrug a territory belongs to. At most, it gives you the OKATO code and the postal code. The OKATO database is what provides the needed information: it contains the data on intra-city districts and on the okrugs of cities of republican, krai, and oblast subordination.
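The matching step could look roughly like the sketch below. Everything in it is an assumption made for illustration: the record layout, the house notation (`Ч(2-10)` for even numbers from 2 to 10, `Н(1-13)` for odd numbers, plain entries such as `15`), the made-up OKATO codes, and the choice to identify an intra-city district by the first eight digits of the code. Real KLADR and OKATO files are laid out differently and need their own loaders.

```python
import re

# Made-up sample data standing in for KLADR house records.
HOUSE_RECORDS = [
    {"street": "Тверская", "houses": "Ч(2-10),15", "okato": "99401365000"},
    {"street": "Тверская", "houses": "Н(1-13)", "okato": "99401370000"},
]

# Made-up OKATO entries keyed by the assumed 8-digit district prefix.
OKATO_NAMES = {
    "99401365": "District A",
    "99401370": "District B",
}

# Optional parity letter followed by a numeric range, e.g. "Ч(2-10)".
RANGE_RE = re.compile(r"^([ЧН])?\((\d+)-(\d+)\)$")


def house_matches(spec: str, house: str) -> bool:
    """Check a house number against a comma-separated house spec."""
    for item in spec.split(","):
        item = item.strip()
        match = RANGE_RE.match(item)
        if match is None:
            if item == house:  # exact entry such as "15"
                return True
            continue
        if not house.isdecimal():
            continue  # ranges only apply to purely numeric houses
        parity = match.group(1)
        low, high = int(match.group(2)), int(match.group(3))
        number = int(house)
        if not low <= number <= high:
            continue
        if parity == "Ч" and number % 2 != 0:
            continue  # range covers even numbers only
        if parity == "Н" and number % 2 != 1:
            continue  # range covers odd numbers only
        return True
    return False


def find_district(street: str, house: str):
    """Return the district name for an address, or None if unknown."""
    for record in HOUSE_RECORDS:
        if record["street"] == street and house_matches(record["houses"], house):
            return OKATO_NAMES.get(record["okato"][:8])
    return None
```

With this sample data, houses 4 and 15 on the same street fall into one district and house 7 into another, which is exactly why parity and ranges cannot be ignored when matching.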
So, the scripts are written and the codes are mapped. The result is the following functionality:
Implemented and described by zdanchik