smirnovhi August 3, 2017 at 14:50

1C and ETL

From the sandbox

ETL and 1C. Data extraction

First look

If you, as an ETL specialist, are faced with the need to get data from 1C, then this is the first thing you can see when trying to figure out the database structure (this is from the case of MSSQL, the picture is similar for other DBMSs):

Business meaning in the names of tables and fields missing, no foreign keys.

A couple of affectionate ones about 1C. The real DBMS tables in it are hidden behind objects that the developer sees, who often does not know about the real structure of the database. Yes ... And all the code is in Russian. In addition, there are enumerations whose string representations using SQL are almost impossible to obtain. More on this here .

There are cases when there is no database (and 1C in the file version), but this, of course, directs you to integration without using DBMS tools.

However, do not despair, because everything is not as bad as it seems.

Attentive look

To capture data from 1C you have 2 ways:

Implementing a “High Level” Interface

You can use file uploads, web / json services and other features of 1C, which will be compatible with your ETL.

+

You do not have to climb into 1C. Everything on the 1C side should be done by 1Sniki
You do not violate the 1C licensing policy

Another source for errors appears in the form of additional uploads, downloads,
schedules, robots
This will work much slower due to the features of 1C interfaces
With any changes in the captured data, you will have to make changes to the uploads (but this can be circumvented by the configuration system)
This will cause more errors in data integrity than working directly with the DBMS

DBMS implementation

Faster
Allows you to ensure the completeness of data in the warehouse with the right approach

Violates the license agreement with 1C

So, after weighing the pros and cons, you decide to build integration through the DBMS, or at least
think about how you will do it next.

Data mapping

In order to associate business data, as understood on the 1C side, with real database tables, you need to do a little magic in 1C itself, namely, get a description of 1C metadata in a usable form (in connection with business objects and tables).
Again, there are at least as many as 3 approaches:

Using com-connection, web / json service to get a correspondence table from 1C
Do the same on the 1C side, forming a metadata table
Parse a binary file that is stored in the same database

The 3rd way seems to me somewhat risky due to the fact that 1C has the habit of making changes to its insides without warning. And, at the same time, quite complicated.
The choice between 1 and 2 is not so obvious, but for my taste, using a pre-formed table is much more convenient and reliable in daily use and there is no need to use anything other than pure SQL.

It is more convenient to store and maintain the relevance of the table using 1C, updating after each configuration update. At the same time, ETL can use View, which will show the data already in a more digestible form.

Preparing a metadata table

Create an object in 1C that contains configuration metadata (unfortunately, you won’t be able to do this with a script, but you can give instructions to 1C nickname)

Register of Details.StructureConfiguration
Fields:
NameTablesStorage
NameTables
SynonymTables
Purpose
NameFieldsStore
SynonymFields

All lines 150 characters

It turns out to be denormalized, but quite convenient.

Code 1C to fill the structure:

СтруктураБД = ПолучитьСтруктуруХраненияБазыДанных(,истина);
ЗаписиСтруктура = РегистрыСведений.СтруктураКонфигурации.СоздатьНаборЗаписей();
Для каждого СтрокаСтруктуры Из СтруктураБД Цикл
    Для каждого СтрокаПолей Из СтрокаСтруктуры.Поля Цикл
	Запись 					= ЗаписиСтруктура.Добавить();
	Запись.ИмяТаблицыХранения 	= СтрокаСтруктуры.ИмяТаблицыХранения;
	Запись.ИмяТаблицы 		= СтрокаСтруктуры.ИмяТаблицы;
	Запись.СинонимТаблицы 	= Метаданные.НайтиПоПолномуИмени(СтрокаСтруктуры.Метаданные);
	Запись.Назначение 			= СтрокаСтруктуры.Назначение;
	Запись.ИмяПоляХранения  	= СтрокаПолей.ИмяПоляХранения;
	Запись.СинонимПоля 			= Метаданные.НайтиПоПолномуИмени(СтрокаПолей.Метаданные);
	КонецЦикла;
Конеццикла;
ЗаписиСтруктура.Записать(истина);

Again, everything is quite simple and obvious, despite the Russian language. You must execute this code every time you update the configuration. You can do this with your hands in processing or with the help of a scheduled task.

The table can be viewed both in client mode and from the SQL side, knowing their names.

 SELECT * FROM _InfoReg27083 ORDER BY  _Fld27085

(_InfoReg27083 is the name that 1C gave to the register table with the structure, _Fld27085 is the name of the field with the name of the storage table)

You can make the View more convenient.

If it is not possible to make changes to the configuration, you can make the table by connecting via com, or by adding in processing the unloading to the database table that is involved in ETL.

And here about what types of tables are and why they are needed (access to ITS 1C is needed) .

The next step is to map the data and describe the transformation.

Field	Field1c	Transformation	...
_Fld15704	Document. Realization of Goods. Services. Weight	Check> = 0, round (10,2), ...	...

So we got a mapping table that can be used in further work.

Capturing data changes

Now in terms of strategy for capturing data changes. Here again, there are several options. It’s easier to take the whole table, which, of course, can cost the server significant additional costs.

However, there are other ways:

Use Object Versions
Use exchange plan

Use Object Versions

For objects of "reference" type 1C supports versions. The version number of the object is recorded in the _version binary field , which is carefully updated every time the record is updated. In MSSQL, for example, this is a field of type timestamp. Versions are supported for objects of the type “Document”, “Directory”, “Business Process”, “Task”, “Chart of Accounts”, “Plan of Characteristic Types”, “Constants”. Using the version is quite simple, keeping the value of the latest version for the object in the staging area and selecting objects larger in the version field during the next update. Together with the "main" object, you must remember to pick up its tabular parts (see Purpose - "Tabular part") in the structure (field of the form _DocumentXXX_IDRRef or _ReferenceXXX_IDRRef - link to the main table).

Use exchange plan

For non-reference types, this approach is not suitable, but you can use the object "exchange plan". In the structure table, their purpose = 'Registration of Changes'. A separate exchange plan table is created for each configuration object.

At the database level, this is a table with the following structure:

_NodeTRef, - identifier of the type of “node” of the exchange plan. It is not very interesting for
us _NodeRRef, - identifier of the exchange plan node
_MessageNo, - message number

Next are the key fields of the "main" table. They differ depending on the type of table with which the exchange plan table is associated:

_IDRRef - in this case, the directory or document ID
can be like this:
_RecorderTRef
_RecorderRRef

This will be a table of changes in the accumulation register, information register subordinate to the registrar, or the accounting register. There may also be a key in the information register table key if it is not subordinate to the registrar.

In order for such a change registration table to exist, you need to include the object we need in the exchange plan in the 1C configurator. In addition, an exchange plan node needs to be created, whose identifier (_IDRRef) will need to be used.

The exchange plan table can be found in the structure (see above). Because in the exchange plan, changes are recorded for all nodes, and not just for storage, we need to limit the selection to the _NodeRRef we need . An exchange plan can also be used for reference objects, but in my opinion this is a pointless waste of resources.

How to collect data through the exchange plan:

First, we write update to the exchange plan, where we put an arbitrary _MessageNO (always better 1).

For example,

UPDATE _DocumentChangeRec18901 set _MessageNO = 1 WHERE _NodeRRef = @_NodeRRef

Next, we select the data from the data table, linking it by key to the exchange plan table,

SELECT [fieldslist] FROM _Document18891 inner join _DocumentChangeRec18901 ON _Document18891._IDRRef = _DocumentChangeRec18901._IDRRef and _MessageNO = 1 AND _NodeRRef = @_NodeRRef

and confirm the change fence by deleting the change table entries.

DELETE FROM _DocumentChangeRec18901 WHERE _MessageNO = 1 AND _NodeRRef = @_NodeRRef

Total: We learned how to read 1C metadata on the ETL side, learned how to capture data. The remaining steps of the ETL process are well known. For example, you can read here .

Tags: