Creating a directory of address information with blackjack and API

Part 1. Tragic. “Why do I need all this ?!”


We somehow faced the need to enter the correct information about the location (registration, registration) of users, and the fact that this problem is not solved as easily and simply as we would like. At first we tried KLADR, in its free hypostasis. Not that we absolutely didn’t like everything decisively, but there was at least one very annoying circumstance - some addresses were missing from the directory. For example, there is a house 10, but 10k1 - sorry, they did not deliver. In general, KLADR was attractive in that it has a simple API and plug-ins (jQuery in particular) that can be easily integrated into the application, but repelled by content. We thought - if there is no such resource containing the full and most relevant address information, with APIs and plugins, then the only way out is to create such a resource yourself.

Part 2. Searching. “Where is the dog buried ?!”


And we heard about FIAS. About how he is infinitely full and beautiful. And this is the way out! There are bases on the FIAS website, and all you need is a young growing information system! True, the size of the base is more than 4 Gb, but oh well, this is all of Russia! The database is updated regularly, so there is where to roam. The point is small - expand the database, fasten the API and plugins. It should be noted that there are several relevant and important articles. For example, a series of these articles , which initially helped a lot.

Part 3. “How it works”


The problems began with the fact that the FIAS database is delivered in .dbf format, the data from which had to be somehow transferred to PostgreSQL, which it was decided to use as a database.

It should be noted that in addition to the .dbf format, the database is presented in xml format. As you know, a person wants everything at once. To turn it on and working. But to understand immediately what FIAS provides is quite problematic.

So, at fias.nalog.ru in the updates section there is the following:

  • Information about updates: according to official data, they are published at least once a week. The specifics are as follows. Sometimes broken updates come out. Their size is approximately 1.2 Mb and a broken archive is given when downloading, so rushing to tune in only for the latest update raises questions.

  • Updates in the form of fias_delta_. - in .dbf and .xml formats.
  • Actually the fias_ database. - in .dbf and .xml formats.
  • Database in the format KLADR. The name of the files is “Base. *” Arj or 7z. It is distinguished by the fact that many fields are missing (regarding tax data, etc.), the search is performed using the KLADRovsky id. Initially, there is a temptation to use this particular format, since it is easier to understand and implement later. But with a closer examination of the portal, it turns out that this version will be supported only until the end of 2017, and then will be deprecated.

Note: to work with .dbf files, you need to install a third-party extension for php_dbase.dll.

So what kind of version to choose for downloading .xml or .dbf? The differences are as follows:

  • xml - each file in the archive is a separate table. What is the problem? For example, the size of the “HOUSEXX” file where they are stored at home is more than 20 Gb in unzipped form. How to parse it? On the one hand, if you have more than 30 Gb RAM, then this will not be a problem. Also, if you are a connoisseur of competent libraries, such as SAX Parser for Java, which do not load the entire file into memory, but parse in parts, then something may work.

  • dbf - files are not only divided as tables, but also divided by region. It will turn out to be a more acceptable option if you do not need the addresses of the vast Motherland, but only one region (as in our particular case).

So, unpacked the files. What information do we need? We see the following set of files:

  • ADDROBXX.DBF, where XX - the region number - contains data directly about the region, autonomous okrugs, cities and other settlements, streets.
  • HOUSEХХ.DBF, where ХХ - region number - contains information about house numbers.
  • NORDOCXX.DBF, where XX is the region number contains information about the reasons for the changes in the various records.
  • ROOMXX.DBF, where XX - region number - contains information about the premises.
  • STEADXX.DBF, where XX is the region number contains information about land plots.
  • SOCRBASE.DBF - Contains information about abbreviations.
  • STRSTAT.DBF - Contains information about the type of building.

In addition to these tables, there are a number of others - service tables that contain information about abbreviations in other tables.

In most cases, it is enough to form an address right up to the house. Although, if anyone needs it, then you can go further.

Thus, we will create 2 tables in the postgresql database.

Table with addresses:

CREATE TABLE addrs
(
    "ACTSTATUS" integer,
    "AOGUID" character varying(36) COLLATE pg_catalog."default",
    "AOID" character varying(36) COLLATE pg_catalog."default",
    "AOLEVEL" integer,
    "AREACODE" integer,
	"AUTOCODE" integer,
    "CENTSTATUS" integer,
    "CITYCODE" integer,
	"CODE" character varying(20) COLLATE pg_catalog."default",
	"CURRSTATUS" integer,
	"ENDDATE" timestamp,
    "FORMALNAME" character varying(120) COLLATE pg_catalog."default",
	"IFNSFL" integer,
	"IFNSUL" integer,
	"NEXTID" character varying(36) COLLATE pg_catalog."default",
    "OFFNAME" character varying(120) COLLATE pg_catalog."default",
	"OKATO" VARCHAR(11),
	"OKTMO" VARCHAR(11),
	"OPERSTATUS" integer,
	"PARENTGUID" character varying(36) COLLATE pg_catalog."default",
	"PLACECODE" integer,
	"PLAINCODE" character varying(20) COLLATE pg_catalog."default",
    "POSTALCODE" integer,
	"PREVID" character varying(36) COLLATE pg_catalog."default",
	"REGIONCODE" integer,
    "SHORTNAME" character varying(15) COLLATE pg_catalog."default",
	"STARTDATE" timestamp,
	"STREETCODE" integer,
	"TERRIFNSFL" integer,
	"TERRIFNSUL" integer,
	"UPDATEDATE" timestamp,
	"CTARCODE" integer,
	"EXTRCODE" integer,
	"SEXTCODE" integer,
	"LIVESTATUS" integer,
    "NORMDOC" character varying(36) COLLATE pg_catalog."default",
	"PLANCODE" integer,
	"CADNUM" integer,
	"DIVTYPE" integer
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE address
    OWNER to postgres;

Table with house numbers:

CREATE TABLE hous
(
    "AOGUID" character varying(36) COLLATE pg_catalog."default",
    "BUILDNUM" character varying(10) COLLATE pg_catalog."default",
	"ENDDATE" timestamp,
	"ESTSTATUS" integer,
    "HOUSEGUID" character varying(36) COLLATE pg_catalog."default",
    "HOUSEID" character varying(36) COLLATE pg_catalog."default",
    "HOUSENUM" character varying(15) COLLATE pg_catalog."default",
	"STATSTATUS" integer,
	"IFNSFL" integer,
	"IFNSUL" integer,
	"OKATO" VARCHAR(11),
	"OKTMO" VARCHAR(11),
	"POSTALCODE" integer,
	"STARTDATE" timestamp,
	"STRUCNUM" VARCHAR(15),
	"STRSTATUS" integer,
	"TERRIFNSFL" integer,
	"TERRIFNSUL" integer,
	"UPDATEDATE" timestamp,
	"NORMDOC" character varying(36) COLLATE pg_catalog."default",
	"COUNTER" integer,
	"CADNUM" VARCHAR(50),
	"DIVTYPE" integer
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

Importing data is a simple way. Open the files in Excel and save them as csv. In addition, it is recommended to change the encoding, because unlike xml files that are presented in utf-8 encoding, dbf files are in win-866 encoding. Open the files in the editor (notepad ++ is suitable for this purpose) and convert it to utf-8.

Import table with addresses:

COPY addrs FROM 'PathToTheFile\ADDROB01.csv' DELIMITER ';' CSV;

Import table with houses:

COPY addrs FROM 'PathToTheFile\HOUSE30.csv' DELIMITER ';' CSV;

What is the ADDROBXX table made of?

Despite the abundance of fields, only a few of them are needed.

  • ACTSTATUS - the field contains the values ​​0 and 1. Upon careful examination, you will find that the table has many streets with the same name. The fact is that the tax base contains a history of objects, i.e. changes that occurred to any address elements. For example, Lenin streets can be 10 pieces. Of these, 9 will be irrelevant. They could be renamed or changed in another way. ACTSTATUS - 0 - irrelevant addresses, and 1 - will be the only current one.
  • AOGUID - record ID. For different historical records there will be the same AOGUID. They will differ in the AOID field.
  • AOLEVEL - object level. Number 1 is the subject of the federation, 4 is the city, 7 is the street, etc.
  • CENTSTATUS - center status. For example, cities in the regional center.
  • FORMALNAME - the name of the object.
  • PARENTGUID - ID of the parent element. When searching for the city to which the street belongs. PARENTGUID street will match the AOGUID street.
  • POSTALCODE - Postcode.
  • REGIONCODE - code of the subject of the federation.

Note: on fias-nalog.ru you can find a full description of all fields.

Important in the HOUSEXX table:

  • AOGUID - Street ID in ADDROBXX where the house is located.
  • HOUSEGUID - house ID.
  • The difficulty lies in the name of the house. It consists of 4 fields:
  • BUILDNUM - case number.
  • HOUSENUM - house number.
  • STRUCNUM - building number.
  • STRSTATUS is a sign of a structure (from 0 to 4, where 0 is none, 1 is a structure, 2 is a structure, 3 is a letter).

Fields can contain similar or identical information and it is necessary to get out of the situation experimentally. There is a method in the QuerryController class (link to the repository below): chooseBuilding, in which we tried to solve this problem. Maybe someone will do better. Database queries are in the same place.

Further, after the database was imported into Postgres, we started creating an API and a plug-in for our system.

For API, in order not to cast out, used Laravel. The query scheme turned out to be quite simple. The hierarchy of objects is as follows:

  • Subject of the federation
  • Region of the subject of the federation
  • Locality
  • Street
  • House

By the way, when testing the system, we were faced with the fact that not all settlements have streets, and not all streets have houses, which surprised and puzzled us a lot. This circumstance did not give rest, because if you provide the user with the opportunity to enter information on their own, they’ll write one so that the question involuntarily arises - “Did these people study at school ?!”
Therefore, it was decided not to give the user the slightest opportunity for initiative, but for such “missing” addresses to provide the opportunity to apply for inclusion of the missing address in the help system.

The query schema looks simple:


For the last paragraph, a list of houses with letters, buildings, buildings, and so on is formed.
After all the ordeals, we made a plugin on vue.js, to work with the directory, and its alternative on jQuery.

The region and the district can be removed as unnecessary, as they are being pulled together with the cities.
Autocomplete prompts pop up as you type, as in KLADR. True, the difference is that the KLADR plugin is designed for auto-completion, but here only the address selected from the prompts is considered valid.

In the ASPUDcomponent folder - is the VueJs component for working with the address database.
Sources are available in our repository .

Part 4. How to update it all?


The situation with updates is as follows: first you need to get update versions via SOAP. You can see how this is done in the UpdateController class (method: filesVersions ()).

Note: the version that is listed last in the list received does not necessarily match the version that can be downloaded on the main page. But do not rush to download only the latest version, as it may turn out to be a “bat." There have been such precedents. Next, the archive with the latest version is downloaded and unpacked. To work, you must use the extension for php (php_rar.dll).

Well, then the necessary region file (or, if necessary, all files) is selected to update the database.

Also popular now: