multik May 12, 2009 at 13:26

Tree-like DBMS

Everyone with experience of using tree-like DBMSs as a data store is invited to join the discussion. It would be useful to share experience of designing tree structures, the specifics of building index trees, and algorithms for full-text search within a data store.

Because a computer system exchanges data between memory and disk in blocks (which makes the exchange efficient), the atomic unit of on-disk storage is the block. It's no secret that many DBMSs (Oracle and MS SQL Server among them) actually store data in B-trees. A B-tree is a set of logically linked blocks arranged in a balanced hierarchy: every path from the root block down to a leaf block passes through the same number of levels. A description of how the B-tree algorithms work is beyond the scope of this blog.
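To make the "same number of levels" property concrete, here is a minimal Python sketch of a lookup in a tiny hand-built tree of blocks. It assumes a B+-tree-style layout (values kept only in leaf blocks) and small integer keys, both chosen purely for illustration; real engines such as Oracle or SQL Server use their own block formats. Each step down reads one block, so every lookup costs as many block reads as the tree has levels.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Block:
    keys: list
    children: list = field(default_factory=list)  # empty in leaf blocks
    values: list = field(default_factory=list)    # used only in leaf blocks (assumed layout)


def search(block, key):
    """Return (value or None, number of blocks read) for `key`."""
    reads = 1
    while block.children:
        # Separator keys: child i covers keys k with keys[i-1] <= k < keys[i].
        block = block.children[bisect_right(block.keys, key)]
        reads += 1
    if key in block.keys:
        return block.values[block.keys.index(key)], reads
    return None, reads


def leaf_depths(block, depth=1):
    """Yield the depth of every leaf block; a balanced tree has only one."""
    if not block.children:
        yield depth
    for child in block.children:
        yield from leaf_depths(child, depth + 1)


# A two-level tree: one root block over three leaf blocks.
root = Block(keys=[5, 9], children=[
    Block(keys=[1, 3], values=["a", "c"]),
    Block(keys=[5, 7], values=["e", "g"]),
    Block(keys=[9, 11], values=["i", "k"]),
])

print(search(root, 7))         # ('g', 2)
print(search(root, 4))         # (None, 2)
print(set(leaf_depths(root)))  # {2}
```

Because all leaves sit at the same depth, the cost of a lookup is predictable: here every search reads exactly two blocks, found or not.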

Relational, object, or direct access to the data is provided through a logical model. I would suggest that sensible use of a logical data model that is as close as possible to the physical storage makes low-level data processing simpler and faster than going through other logical models (SQL, etc.), although it places much higher demands on the data access mechanisms that have to be developed. Direct access, for example, can be represented as a logical tree; the global in the Caché DBMS is an example of such a logical data tree.
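A global can be pictured as a sparse tree whose nodes are addressed by lists of subscripts and kept in sorted order; traversal moves from one sibling subscript to the next, which is what `$ORDER` does in Caché. The Python sketch below models only that idea. The class `Global`, its methods, and the `children` helper are hypothetical names; subscripts are plain strings compared with Python's ordering (Caché's real collation, e.g. numbers before strings, differs), and nothing is persisted.

```python
from bisect import bisect_right, insort


class Global:
    """Toy in-memory model of a global (hypothetical): nodes addressed by
    subscript tuples, kept sorted so siblings can be visited in order.
    Subscripts are plain strings with Python ordering, not Caché collation."""

    def __init__(self):
        self._keys = []   # sorted subscript tuples of nodes that hold a value
        self._data = {}

    def set(self, *args):
        """set("a", "b", value) plays the role of s ^G("a","b")=value."""
        *subs, value = args
        key = tuple(subs)
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def get(self, *subs, default=None):
        return self._data.get(tuple(subs), default)

    def order(self, *subs):
        """Next subscript after subs[-1] at that level, "" if none (roughly
        like $ORDER). Passing "" as the last subscript starts from the first."""
        prefix, last = subs[:-1], subs[-1]
        n = len(prefix)
        # Nodes below `prefix` form one contiguous run right after `prefix`.
        for key in self._keys[bisect_right(self._keys, prefix):]:
            if key[:n] != prefix:
                break
            if len(key) > n and (last == "" or key[n] > last):
                return key[n]
        return ""


def children(glob, *prefix):
    """Iterate the subscripts directly below `prefix`, $ORDER-style."""
    sub = glob.order(*prefix, "")
    while sub != "":
        yield sub
        sub = glob.order(*prefix, sub)


g = Global()
g.set("Classified", "@p", "Price", "enclosure")
g.set("Classified", "@p", "contactList", "list")
g.set("CarClassified", "variationType")
g.set("CarClassified", "@t", "Classified")

print(list(children(g)))                      # ['CarClassified', 'Classified']
print(list(children(g, "Classified", "@p")))  # ['Price', 'contactList']
print(g.get("CarClassified", "@t"))           # Classified
```

Note that a node can hold a value and have children at the same time (`"CarClassified"` does both), which is what lets the schema below attach a marker and a base type to the same name.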

Below are some examples, from my own experience, of using tree-like data structures (globals).

Logical trees are useful for describing an incomplete and fuzzy subject area, knowledge of which is supplemented while the system is in use. An example of such a subject area is a classified-ads newspaper. The subject area can grow both through new categories (initially there were only car ads; later there will be sections for real estate, jobs, dating, etc.) and through refining and expanding knowledge of a category already described (performance characteristics of cars, weight and dimensions, etc.). Assume that different categories may overlap (in ways known or unknown at the time of the initial description).

We describe the data structure as a global. The description below uses no keywords or symbols other than:

s - the set command
^ - global (logical tree) prefix
[] - namespace (may reside on a remote server)
$$$ - user-defined macro (constant)
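Because every schema line in this post follows one pattern (set, namespace, global, a fixed first subscript, then quoted subscripts and a quoted value), the lines can be loaded mechanically into a tree keyed by subscript tuples. The Python sketch below assumes exactly that textual shape: it drops the `$$$mType` macro (whose value is not given in the post) and skips comment lines. `load_schema` and `SET_LINE` are hypothetical names, not part of Caché.

```python
import re

# One schema line as written in this post, e.g.
#   s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","Price")="enclosure"
# Assumption: every subscript after the $$$mType macro is a quoted string and
# the value is a quoted string without embedded quotes.
SET_LINE = re.compile(r'^\s*s\s+\^\[\$\$\$\w+\]\$\$\$\w+\(\$\$\$mType,(.*)\)="([^"]*)"\s*$')


def load_schema(text):
    """Map each set line to {subscript tuple: value}; other lines are skipped."""
    schema = {}
    for line in text.splitlines():
        match = SET_LINE.match(line)
        if match:
            subscripts = tuple(re.findall(r'"([^"]*)"', match.group(1)))
            schema[subscripts] = match.group(2)
    return schema


sample = '''
 //CarClassified
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"CarClassified")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"CarClassified","@t")="Classified"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","Price")="enclosure"
'''

schema = load_schema(sample)
for subscripts, value in schema.items():
    print(subscripts, "=", value)
```

A flat map from subscript tuples to values is close to how the global itself is laid out, so the later sketches reuse this representation.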

Below is the description of the vehicle structure and of the ad structure (automotive topics only):

//---------------------------UniVehicleModel---------------------------------------

 //Vehicle make ID
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","makeId")="id."

 //Vehicle model ID
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","modelId")="id."

 //Vehicle year of manufacture
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","year")="ta."

 //Vehicle body type
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","bodyTypeId")="id."

//-----------------Ad variations------------------------------------

 //CarClassified
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"CarClassified")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"CarClassified","@t")="Classified"

//--------------------------------Classified---------------------------------------------

 //An ad contains a price
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","Price")="enclosure"

 //List of contacts
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","contactList")="list"

 //List of images
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","imageList")="list"

 //Additional text
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","additionalText")="s."

 //Section (rubric) ID
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","rubricId")="id."

 //The passenger car variation contains a vehicle
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@v","CarClassified","@p","UniVehicleModel")="enclosure"

As the description of the ad structure (Classified) shows, an ad contains a list of contacts. Contacts, however, come in several variations:

telephone
e-mail
address
GPS
other contacts

These variations are not shown in the ad structure description; they are described in the contact structure below. I apologize for the abundance of code that may be hard to follow in this blog, but suppose we now need to extend the system (the classified-ads newspaper) to cover the real estate subject area. It is enough to add a description of a new ad variation (RealEstateClassified) and a description of the structure of a real estate object (UniRealEstate); both follow the contact structure. The service flags used in the descriptions are explained after the code; perhaps they are not the most fortunate choice.
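The `"@t"` and `"@v"` nodes let generic code work out which properties an ad variation has without knowing anything about cars. A minimal Python sketch, assuming the schema is held as a dict keyed by subscript tuples with the `$$$mType` first subscript dropped: a variation's properties are its base type's properties plus those declared for it under the base type's `"@v"` node. `direct` and `properties` are hypothetical helper names.

```python
# A subset of the vehicle/ad description, as {subscript tuple: value}
# (the $$$mType first subscript is left out).
SCHEMA = {
    ("UniVehicleModel", "@p", "makeId"): "id.",
    ("UniVehicleModel", "@p", "modelId"): "id.",
    ("UniVehicleModel", "@p", "year"): "ta.",
    ("UniVehicleModel", "@p", "bodyTypeId"): "id.",
    ("CarClassified",): "variationType",
    ("CarClassified", "@t"): "Classified",
    ("Classified", "@p", "Price"): "enclosure",
    ("Classified", "@p", "contactList"): "list",
    ("Classified", "@p", "imageList"): "list",
    ("Classified", "@p", "additionalText"): "s.",
    ("Classified", "@p", "rubricId"): "id.",
    ("Classified", "@v", "CarClassified", "@p", "UniVehicleModel"): "enclosure",
}


def direct(schema, prefix):
    """Subscripts one level below `prefix` that hold a value, with their values."""
    n = len(prefix)
    return {k[n]: v for k, v in schema.items() if len(k) == n + 1 and k[:n] == prefix}


def properties(schema, name):
    """Properties of a type; a variation also gets its base type's properties
    and the ones declared for it under the base type's "@v" node."""
    props = {}
    base = schema.get((name, "@t"))
    if schema.get((name,)) == "variationType" and base is not None:
        props.update(properties(schema, base))                  # inherited
        props.update(direct(schema, (base, "@v", name, "@p")))  # variation-specific
    props.update(direct(schema, (name, "@p")))                  # declared on the name itself
    return props


for prop, kind in sorted(properties(SCHEMA, "CarClassified").items()):
    print(prop, kind)
```

The recursion on the base type means a variation of a variation would also resolve correctly, and none of this code mentions cars.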

//------------------------------------Contact-------------------------------------

 //Contact person's name
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@p","contactPerson")="ta."

 //GPS variation
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","GPSContact","@p","GPS")="enclosure"

 //Phone contact variation
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","PhoneContact","@p","Phone")="enclosure"

 //Address contact variation
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","AddressContact","@p","UniRealEstateAddress")="enclosure"

 //Website contact variation
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","WebSiteContact","@p","WebSite")="enclosure"

 //E-mail contact variation
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","eMailContact","@p","eMail")="enclosure"

//---------------------Contact variations-------------------------------

 //AddressContact
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"AddressContact")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"AddressContact","@t")="Contact"

 //PhoneContact
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"PhoneContact")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"PhoneContact","@t")="Contact"

 //GPSContact
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"GPSContact")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"GPSContact","@t")="Contact"

 //WebSiteContact
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"WebSiteContact")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"WebSiteContact","@t")="Contact"

 //eMailContact
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"eMailContact")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"eMailContact","@t")="Contact"
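Each contact variation is declared twice: under the base type's `"@v"` node, and as a top-level entry marked `"variationType"` whose `"@t"` points back to `Contact`. Generic code can therefore enumerate the variations of any type and check that the two declarations agree. The Python sketch below does both; the schema is held as a dict of subscript tuples, and `variations` and `unregistered` are hypothetical helper names.

```python
# The contact part of the description, as {subscript tuple: value}.
SCHEMA = {("Contact", "@p", "contactPerson"): "ta."}
for variation, prop in [("GPSContact", "GPS"), ("PhoneContact", "Phone"),
                        ("AddressContact", "UniRealEstateAddress"),
                        ("WebSiteContact", "WebSite"), ("eMailContact", "eMail")]:
    SCHEMA[("Contact", "@v", variation, "@p", prop)] = "enclosure"
    SCHEMA[(variation,)] = "variationType"
    SCHEMA[(variation, "@t")] = "Contact"


def variations(schema, base):
    """Variation names declared under the base type's "@v" node, sorted."""
    return sorted({k[2] for k in schema if len(k) > 2 and k[:2] == (base, "@v")})


def unregistered(schema, base):
    """Variations whose top-level "variationType"/"@t" entries are missing or wrong."""
    return [name for name in variations(schema, base)
            if schema.get((name,)) != "variationType"
            or schema.get((name, "@t")) != base]


print(variations(SCHEMA, "Contact"))
print(unregistered(SCHEMA, "Contact"))   # []

broken = dict(SCHEMA)
del broken[("eMailContact", "@t")]
print(unregistered(broken, "Contact"))   # ['eMailContact']
```

A check like this can run whenever the description is edited, which matters once several people extend the subject area independently.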

//-----------------Ad variations--------------------------------------------

 //RealEstateClassified
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"RealEstateClassified")="variationType"
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"RealEstateClassified","@t")="Classified"

 //The real estate variation contains a real estate object
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@v","RealEstateClassified","@p","UniRealEstate")="enclosure"

//---------------------------------------UniRealEstate------------------------------------------

 //Number of floors in the building
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","floorQuanity")="n."

 //Floor on which the property is located
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","floorNumber")="n."

 //Layout type
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","housePlaningType")="ta."

 //Total area
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","totalArea")="ta."

 //Ceiling height
 s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","cellingHight")="ta."

 //------------------------------------------------------------------------------------------------------------------

Service flags used in the descriptions:

"@t" - types (the base type of a variation)
"@p" - properties
"@v" - variations
"list" - lists

In any case, they are easy to replace with more fitting ones.
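The real estate extension is pure data: no processing code changes. In the Python sketch below, `add_type`, `add_variation`, and `variations` are hypothetical helpers that write the same kinds of nodes as the real estate lines of the description (a `"variationType"` marker, an `"@t"` link, properties under the base type's `"@v"` node, and `"@p"` nodes for the new type); the variation list seen by generic code grows accordingly.

```python
def add_type(schema, name, props):
    """Describe a new type: one "@p" node per property."""
    for prop, kind in props.items():
        schema[(name, "@p", prop)] = kind


def add_variation(schema, base, name, props):
    """Register a new variation of `base` and the properties it adds."""
    schema[(name,)] = "variationType"
    schema[(name, "@t")] = base
    for prop, kind in props.items():
        schema[(base, "@v", name, "@p", prop)] = kind


def variations(schema, base):
    """Variation names declared under the base type's "@v" node, sorted."""
    return sorted({k[2] for k in schema if len(k) > 2 and k[:2] == (base, "@v")})


# Starting point: only car ads exist.
schema = {}
add_variation(schema, "Classified", "CarClassified", {"UniVehicleModel": "enclosure"})
print(variations(schema, "Classified"))   # ['CarClassified']

# The real estate extension, written as data only.
add_variation(schema, "Classified", "RealEstateClassified", {"UniRealEstate": "enclosure"})
add_type(schema, "UniRealEstate", {
    "floorQuanity": "n.", "floorNumber": "n.", "housePlaningType": "ta.",
    "totalArea": "ta.", "cellingHight": "ta.",
})
print(variations(schema, "Classified"))   # ['CarClassified', 'RealEstateClassified']
```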

Later on, the description of any required subject area (contacts, cars, real estate, etc.) can easily be refined in greater depth. Of course, a recursive structure-processing mechanism is also needed which, guided by the tree described above, writes, reads, and updates data. That is, a body arrives as input (for example, XML); the mechanism walks the input tree, compares it with the structure-description tree, and performs the processing. Writing such an algorithm should not be difficult for a programmer with some experience, and I will not give my code for this mechanism, as I am sure there are more worthy examples. One advantage of storing the data-structure description as a logical tree is that the data processing mechanisms know nothing about the subject area (the input data), which can evolve as knowledge accumulates. Knowledge of the subject area must, of course, exist in some form at the interface level (similar structures can be used there); however, all mechanisms inside the system (including the CRUD, indexing, and search mechanisms) are independent of the subject area and know nothing about the structure of the data.
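Since the author's processing code is not given, the following is only an independent Python sketch of such a mechanism, not his implementation. All of these are assumptions: the input body has already been parsed from XML into nested dicts and lists; the property name of an `"enclosure"` is also the name of the enclosed type (as with `UniVehicleModel`); the item type of each `"list"` property comes from a hypothetical `LIST_ITEMS` mapping; an item selects its variation with an `"@v"` key; and the `Phone` type has a hypothetical `number` property. The walker writes leaf values into a flat `{subscript path: value}` map, the shape a data global would take, and rejects properties the description does not know.

```python
# Hypothetical subset of the description, as {subscript tuple: value}.
SCHEMA = {
    ("Classified", "@p", "additionalText"): "s.",
    ("Classified", "@p", "contactList"): "list",
    ("CarClassified",): "variationType",
    ("CarClassified", "@t"): "Classified",
    ("Classified", "@v", "CarClassified", "@p", "UniVehicleModel"): "enclosure",
    ("UniVehicleModel", "@p", "makeId"): "id.",
    ("UniVehicleModel", "@p", "year"): "ta.",
    ("Contact", "@p", "contactPerson"): "ta.",
    ("PhoneContact",): "variationType",
    ("PhoneContact", "@t"): "Contact",
    ("Contact", "@v", "PhoneContact", "@p", "Phone"): "enclosure",
    ("Phone", "@p", "number"): "ta.",  # hypothetical: Phone is not described in the post
}
LIST_ITEMS = {"contactList": "Contact"}  # hypothetical item type of each list property


def direct(schema, prefix):
    """Subscripts one level below `prefix` that hold a value, with their values."""
    n = len(prefix)
    return {k[n]: v for k, v in schema.items() if len(k) == n + 1 and k[:n] == prefix}


def properties(schema, name):
    """Own properties plus, for a variation, inherited and variation-specific ones."""
    props = {}
    base = schema.get((name, "@t"))
    if schema.get((name,)) == "variationType" and base is not None:
        props.update(properties(schema, base))
        props.update(direct(schema, (base, "@v", name, "@p")))
    props.update(direct(schema, (name, "@p")))
    return props


def store(schema, type_name, body, path, out):
    """Walk `body` against the description of `type_name`, writing leaf values
    into `out` as {subscript path: value}; unknown properties raise KeyError."""
    variation = body.get("@v")
    if variation is not None:
        if schema.get((variation, "@t")) != type_name:
            raise ValueError(f"{variation!r} is not a variation of {type_name!r}")
        out[path + ("@v",)] = variation
        type_name = variation
    props = properties(schema, type_name)
    for key, value in body.items():
        if key == "@v":
            continue
        if key not in props:
            raise KeyError(f"{type_name} has no property {key!r}")
        kind = props[key]
        if kind == "enclosure":    # nested structure: recurse into it
            store(schema, key, value, path + (key,), out)
        elif kind == "list":       # numbered items, each walked in turn
            for i, item in enumerate(value, start=1):
                store(schema, LIST_ITEMS[key], item, path + (key, i), out)
        else:                      # leaf value
            out[path + (key,)] = value


ad = {
    "additionalText": "One owner",
    "UniVehicleModel": {"makeId": 12, "year": "2005"},
    "contactList": [{"@v": "PhoneContact", "contactPerson": "Ivan",
                     "Phone": {"number": "555-0100"}}],
}
data = {}
store(SCHEMA, "CarClassified", ad, ("ad1",), data)
for node, value in data.items():
    print(node, "=", value)
```

Nothing in `store` mentions cars, phones, or ads: adding the real estate description would let the same function accept real estate ads unchanged.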

Of course, describing the data structure as a tree does not cover the whole topic. In the near future I plan to describe how data and search indexes are stored in trees. It is also very convenient to store rules (function names) in the global structure description; these are called recursively at the data processing stage and can affect the path taken through the structure. I would be grateful for fair criticism and am ready to answer clarifying questions.
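One way to picture the rules idea, as a hypothetical Python sketch: a type's description carries a function name under an invented `"@r"` node, the walker resolves that name through a registry and calls it before descending, and a rule that returns `False` prunes that branch of the traversal. The `"@r"` flag, the `RULES` registry, and the `skip_empty` rule are illustrations only, not part of the original schema.

```python
# Hypothetical: an "@r" node names the rule function attached to a type.
SCHEMA = {("UniVehicleModel", "@r"): "skip_empty"}


def skip_empty(body):
    """Example rule: do not descend into an empty structure."""
    return bool(body)


RULES = {"skip_empty": skip_empty}  # rule names stored in the tree, resolved at run time


def walk(schema, rules, name, body, path=()):
    """Depth-first walk returning the visited paths; a rule that returns
    False prunes the branch it is attached to."""
    rule = schema.get((name, "@r"))
    if rule is not None and rules[rule](body) is False:
        return []
    here = path + (name,)
    visited = [here]
    if isinstance(body, dict):
        for key, value in body.items():
            visited += walk(schema, rules, key, value, here)
    return visited


print(walk(SCHEMA, RULES, "CarClassified", {"UniVehicleModel": {}, "additionalText": "x"}))
# [('CarClassified',), ('CarClassified', 'additionalText')]
print(walk(SCHEMA, RULES, "CarClassified", {"UniVehicleModel": {"year": "2005"}}))
# [('CarClassified',), ('CarClassified', 'UniVehicleModel'), ('CarClassified', 'UniVehicleModel', 'year')]
```

Keeping only the rule's name in the tree, rather than code, means the description stays plain data while the behaviour it triggers can still change the traversal.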

The approach described in this blog is used, to some degree, in a live Internet project.

Tags:

Tree-like DBMS
