CrushBy July 4, 2019 at 17:04

Functional DBMS

Recovery mode

The world of databases has long been captured by relational DBMSs that use the SQL language. So much so that the emerging species are called NoSQL. They managed to recapture a certain place in this market, but relational DBMSs are not going to die, and continue to be actively used for their purposes.

In this article I want to describe the concept of a functional database. For a better understanding, I will do this by comparing with the classical relational model. As examples, tasks from various SQL tests found on the Internet will be used.

Introduction

Relational databases operate on tables and fields. In the functional database, classes and functions will be used instead. A field in a table with N keys will be presented as a function of N parameters. Instead of relationships between tables, functions will be used that return objects of the class that is being linked. Instead of JOIN function composition will be used.

Before proceeding directly to the tasks, I will describe the task of domain logic. For DDL, I will use PostgreSQL syntax. For functional, its own syntax.

Tables and Fields

Simple Sku object with fields name and price:

Relational

CREATE TABLE Sku
(
    id bigint NOT NULL,
    name character varying(100),
    price numeric(10,5),
    CONSTRAINT id_pkey PRIMARY KEY (id)
)

Functional

CLASS Sku;

name = DATA STRING[100] (Sku);

price = DATA NUMERIC[10,5] (Sku);

We declare two functions that take one Sku parameter as input and return a primitive type.

It is assumed that in the functional DBMS, each object will have a certain internal code that is automatically generated, and which can be accessed if necessary.

We set the price for the product / store / supplier. It can change over time, so we add the time field to the table. I will skip the declaration of tables for directories in the relational database in order to shorten the code:

Relational

CREATE TABLE prices
(
    skuId bigint NOT NULL,
    storeId bigint NOT NULL,
    supplierId bigint NOT NULL,
    dateTime timestamp without time zone,
    price numeric(10,5),
    CONSTRAINT prices_pkey PRIMARY KEY (skuId, storeId, supplierId)
)

Functional

CLASS Sku;

CLASS Store;

CLASS Supplier;

dateTime = DATA DATETIME (Sku, Store, Supplier);

price = DATA NUMERIC[10,5] (Sku, Store, Supplier);

Indices

For the last example, we will build an index for all the keys and the date so that you can quickly find the price for a specific time.

Relational

CREATE INDEX prices_date
    ON prices
    (skuId, storeId, supplierId, dateTime)

Functional

INDEX Sku sk, Store st, Supplier sp, dateTime(sk, st, sp);

Tasks

Let's start with relatively simple tasks taken from the corresponding article on Habré.

First, declare the domain logic (for a relational database, this is done directly in the above article).

CLASS Department;

name = DATA STRING[100] (Department);


CLASS Employee;

department = DATA Department (Employee);

chief = DATA Employee (Employee);

name = DATA STRING[100] (Employee);

salary = DATA NUMERIC[14,2] (Employee);

Task 1.1

Display a list of employees receiving a salary greater than that of the immediate supervisor.

Relational

select a.*
from   employee a, employee b
where  b.id = a.chief_id
and    a.salary > b.salary

Functional

SELECT name(Employee a) WHERE salary(a) > salary(chief(a));

Task 1.2

Display a list of employees receiving the maximum salary in their department.

Relational

select a.*
from   employee a
where  a.salary = ( select max(salary) from employee b
                    where  b.department_id = a.department_id )

Functional

maxSalary 'Максимальная зарплата' (Department s) = 

    GROUP MAX salary(Employee e) IF department(e) = s;

SELECT name(Employee a) WHERE salary(a) = maxSalary(department(a));

// или если "заинлайнить"

SELECT name(Employee a) WHERE 

    salary(a) = maxSalary(GROUP MAX salary(Employee e) IF department(e) = department(a));

Both implementations are equivalent. For the first case, you can use CREATE VIEW in the relational database, which in the same way will first calculate the maximum salary for a particular department in it. In the future, for clarity, I will use the first case, since it better reflects the solution.

Task 1.3

Display a list of department IDs, the number of employees in which does not exceed 3 people.

Relational

select department_id
from   employee
group  by department_id
having count(*) <= 3

Functional

countEmployees 'Количество сотрудников' (Department d) = 

    GROUP SUM 1 IF department(Employee e) = d;

SELECT Department d WHERE countEmployees(d) <= 3;

Task 1.4

Display a list of employees who do not have an appointed manager working in the same department.

Relational

select a.*
from   employee a
left   join employee b on (b.id = a.chief_id and b.department_id = a.department_id)
where  b.id is null

Functional

SELECT name(Employee a) WHERE NOT (department(chief(a)) = department(a));

Task 1.5

Find a list of department IDs with the maximum total employee salary.

Relational

with sum_salary as
  ( select department_id, sum(salary) salary
    from   employee
    group  by department_id )
select department_id
from   sum_salary a       
where  a.salary = ( select max(salary) from sum_salary )

Functional

salarySum 'Максимальная зарплата' (Department d) = 

    GROUP SUM salary(Employee e) IF department(e) = d;

maxSalarySum 'Максимальная зарплата отделов' () = 

    GROUP MAX salarySum(Department d);

SELECT Department d WHERE salarySum(d) = maxSalarySum();

Let's move on to more complex tasks from another article . It contains a detailed analysis of how to implement this task in MS SQL.

Task 2.1

Which sellers sold in 1997 more than 30 pieces of product No. 1?

Domain logic (as before, we skip the announcement on the RDBMS):

CLASS Employee 'Продавец';

lastName 'Фамилия' = DATA STRING[100] (Employee);


CLASS Product 'Продукт';

id = DATA INTEGER (Product);

name = DATA STRING[100] (Product);


CLASS Order 'Заказ';

date = DATA DATE (Order);

employee = DATA Employee (Order);


CLASS Detail 'Строка заказа';


order = DATA Order (Detail);

product = DATA Product (Detail);

quantity = DATA NUMERIC[10,5] (Detail);

Relational

select LastName
from Employees as e
where (
  select sum(od.Quantity)
  from [Order Details] as od
  where od.ProductID = 1 and od.OrderID in (
    select o.OrderID
    from Orders as o
    where year(o.OrderDate) = 1997 and e.EmployeeID = o.EmployeeID)
) > 30

Functional

sold (Employee e, INTEGER productId, INTEGER year) = 

    GROUP SUM quantity(OrderDetail d) IF 

        employee(order(d)) = e AND 

        id(product(d)) = productId AND 

        extractYear(date(order(d))) = year;

SELECT lastName(Employee e) WHERE sold(e, 1, 1997) > 30;

Task 2.2

For each buyer (name, surname), find two products (name) for which the buyer spent the most money in 1997.

Extending the domain logic from the previous example:

CLASS Customer 'Клиент';

contactName 'ФИО' = DATA STRING[100] (Customer);


customer = DATA Customer (Order);


unitPrice = DATA NUMERIC[14,2] (Detail);

discount = DATA NUMERIC[6,2] (Detail);

Relational

SELECT ContactName, ProductName FROM (
SELECT c.ContactName, p.ProductName
, ROW_NUMBER() OVER (
    PARTITION BY c.ContactName
    ORDER BY SUM(od.Quantity * od.UnitPrice * (1 - od.Discount)) DESC
) AS RatingByAmt
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID
JOIN [Order Details] od ON od.OrderID = o.OrderID
JOIN Products p ON p.ProductID = od.ProductID
WHERE YEAR(o.OrderDate) = 1997
GROUP BY c.ContactName, p.ProductName
) t
WHERE RatingByAmt < 3

Functional

sum (Detail d) = quantity(d) * unitPrice(d) * (1 - discount(d));

bought 'Купил' (Customer c, Product p, INTEGER y) = 

    GROUP SUM sum(Detail d) IF 

        customer(order(d)) = c AND 

        product(d) = p AND 

        extractYear(date(order(d))) = y;

rating 'Рейтинг' (Customer c, Product p, INTEGER y) = 

    PARTITION SUM 1 ORDER DESC bought(c, p, y), p BY c, y;

SELECT contactName(Customer c), name(Product p) WHERE rating(c, p, 1997) < 3;

The PARTITION operator works according to the following principle: it sums the expression specified after SUM (here 1) inside the specified groups (here Customer and Year, but there can be any expression), sorting inside the groups according to the expressions specified in ORDER (bought here, and if equal then the internal code of the product).

Task 2.3

How many goods need to be ordered from suppliers to fulfill current orders.

Extending domain logic again:

CLASS Supplier 'Поставщик';

companyName = DATA STRING[100] (Supplier);


supplier = DATA Supplier (Product);


unitsInStock 'Остаток на складе' = DATA NUMERIC[10,3] (Product);

reorderLevel 'Норма продажи' = DATA NUMERIC[10,3] (Product);

Relational

select s.CompanyName, p.ProductName, sum(od.Quantity) + p.ReorderLevel — p.UnitsInStock as ToOrder
from Orders o
join [Order Details] od on o.OrderID = od.OrderID
join Products p on od.ProductID = p.ProductID
join Suppliers s on p.SupplierID = s.SupplierID
where o.ShippedDate is null
group by s.CompanyName, p.ProductName, p.UnitsInStock, p.ReorderLevel
having p.UnitsInStock < sum(od.Quantity) + p.ReorderLevel

Functional

orderedNotShipped 'Заказано, но не отгружено' (Product p) = 

    GROUP SUM quantity(OrderDetail d) IF product(d) = p;

toOrder 'К заказу' (Product p) = orderedNotShipped(p) + reorderLevel(p) - unitsInStock(p);

SELECT companyName(supplier(Product p)), name(p), toOrder(p) WHERE toOrder(p) > 0;

Task with an asterisk

And the last example is from me personally. There is the logic of a social network. People can be friends with each other and like each other. From the point of view of a functional database, it will look like this:

CLASS Person;

likes = DATA BOOLEAN (Person, Person);

friends = DATA BOOLEAN (Person, Person);

It is necessary to find possible candidates for friendship. More formalized, you need to find all people A, B, C such that A is friends with B, and B is friends with C, A likes C, but A is not friends with C.
From the point of view of a functional database, the query will look like this:

SELECT Person a, Person b, Person c WHERE 

    likes(a, c) AND NOT friends(a, c) AND 

    friends(a, b) AND friends(b, c);

The reader is invited to independently solve this problem in SQL. It is assumed that friends are much fewer than those who like. Therefore, they are in separate tables. In the case of a successful solution, there is also a problem with two asterisks. In it, friendship is not symmetrical. On a functional database, it will look like this:

SELECT Person a, Person b, Person c WHERE 

    likes(a, c) AND NOT friends(a, c) AND 

    (friends(a, b) OR friends(b, a)) AND 

    (friends(b, c) OR friends(c, b));

UPD: solving the problem with the first and second asterisk from dss_kalika :

SELECT 
   pl.PersonAID
  ,pf.PersonAID
  ,pff.PersonAID
FROM Persons                 AS p
--Лайки                      
JOIN PersonRelationShip      AS pl ON pl.PersonAID = p.PersonID
                                  AND pl.Relation  = 'Like'
--Друзья                     
JOIN PersonRelationShip      AS pf ON pf.PersonAID = p.PersonID 
                                  AND pf.Relation = 'Friend'
--Друзья Друзей              
JOIN PersonRelationShip      AS pff ON pff.PersonAID = pf.PersonBID
                                   AND pff.PersonBID = pl.PersonBID
                                   AND pff.Relation = 'Friend'
--Ещё не дружат         
LEFT JOIN PersonRelationShip AS pnf ON pnf.PersonAID = p.PersonID
                                   AND pnf.PersonBID = pff.PersonBID
                                   AND pnf.Relation = 'Friend'
WHERE pnf.PersonAID IS NULL

;WITH PersonRelationShipCollapsed AS (
  SELECT pl.PersonAID
        ,pl.PersonBID
        ,pl.Relation 
  FROM #PersonRelationShip      AS pl 
  UNION 
  SELECT pl.PersonBID AS PersonAID
        ,pl.PersonAID AS PersonBID
        ,pl.Relation
  FROM #PersonRelationShip      AS pl 
)
SELECT 
   pl.PersonAID
  ,pf.PersonBID
  ,pff.PersonBID
FROM #Persons                      AS p
--Лайки                      
JOIN PersonRelationShipCollapsed  AS pl ON pl.PersonAID = p.PersonID
                                 AND pl.Relation  = 'Like'                                  
--Друзья                          
JOIN PersonRelationShipCollapsed  AS pf ON pf.PersonAID = p.PersonID 
                                 AND pf.Relation = 'Friend'
--Друзья Друзей                   
JOIN PersonRelationShipCollapsed  AS pff ON pff.PersonAID = pf.PersonBID
                                 AND pff.PersonBID = pl.PersonBID
                                 AND pff.Relation = 'Friend'
--Ещё не дружат                   
LEFT JOIN PersonRelationShipCollapsed AS pnf ON pnf.PersonAID = p.PersonID
                                   AND pnf.PersonBID = pff.PersonBID
                                   AND pnf.Relation = 'Friend'
WHERE pnf.[PersonAID] IS NULL

Conclusion

It should be noted that the given syntax of the language is just one of the options for implementing the above concept. It was SQL that was taken as the basis, and the goal was to make it as similar as possible to it. Of course, someone may not like the names of keywords, word registers, and more. The main thing here is the concept itself. If desired, you can make both C ++ and Python a similar syntax.

The described database concept, in my opinion, has the following advantages:

Simplicity . This is a relatively subjective indicator, which is not obvious in simple cases. But if you look at more complex cases (for example, tasks with asterisks), then, in my opinion, writing such queries is much simpler.
Инкапсуляция. В некоторых примерах я объявлял промежуточные функции (например, sold, bought и т.д.), от которых строились последующие функции. Это позволяет при необходимости изменять логику определенных функций без изменения логики зависящих от них. Например, можно сделать, чтобы продажи sold считались от совершенно других объектов, при этом остальная логика не изменится. Да, в РСУБД это можно реализовать при помощи CREATE VIEW. Но если всю логику писать таким образом, то она будет выглядеть не очень читабельной.

Отсутствие семантического разрыва. Такая база данных оперирует функциями и классами (вместо таблиц и полей). Точно также, как и в классическом программировании (если считать, что метод — это функция с первым параметром в виде класса, к которому он относится). Соответственно, «подружить» с универсальными языками программирования должно быть значительно проще. Кроме того, эта концепция позволяет реализовывать гораздо более сложные функции. Например, можно встраивать в базу данных операторы вида:

CONSTRAINT sold(Employee e, 1, 2019) > 100 IF name(e) = 'Петя' MESSAGE  'Что-то Петя продает слишком много одного товара в 2019 году';

Наследование и полиморфизм. В функциональной базе данных можно ввести множественное наследование через конструкции CLASS ClassP: Class1, Class2 и реализовать множественный полиморфизм. Как именно, возможно напишу в следующих статьях.

Despite the fact that this is just a concept, we already have some Java implementation that translates all functional logic into relational logic. Plus, the presentation logic and a lot of other things are beautifully bolted to it, which makes the whole platform . In fact, we use RDBMS (so far only PostgreSQL) as a “virtual machine”. Such translation sometimes causes problems, since the query optimizer of the RDBMS does not know certain statistics that the FSBD knows. In theory, it is possible to implement a database management system that will use some structure adapted specifically for functional logic as a storage.

Tags:

Functional DBMS

Introduction

Tables and Fields

Indices

Tasks

Task 1.1

Task 1.2

Task 1.3

Task 1.4

Task 1.5

Task 2.1

Task 2.2

Task 2.3

Task with an asterisk

Conclusion

Also popular now: