Functional DBMS
The world of databases has long been captured by relational DBMSs that use the SQL language. So much so that the emerging species are called NoSQL. They managed to recapture a certain place in this market, but relational DBMSs are not going to die, and continue to be actively used for their purposes.
In this article I want to describe the concept of a functional database. For a better understanding, I will do this by comparing with the classical relational model. As examples, tasks from various SQL tests found on the Internet will be used.
Relational databases operate on tables and fields. In the functional database, classes and functions will be used instead. A field in a table with N keys will be presented as a function of N parameters. Instead of relationships between tables, functions will be used that return objects of the class that is being linked. Instead of JOIN function composition will be used.
Before proceeding directly to the tasks, I will describe the task of domain logic. For DDL, I will use PostgreSQL syntax. For functional, its own syntax.
Simple Sku object with fields name and price:
Relational
Functional
We declare two functions that take one Sku parameter as input and return a primitive type.
It is assumed that in the functional DBMS, each object will have a certain internal code that is automatically generated, and which can be accessed if necessary.
We set the price for the product / store / supplier. It can change over time, so we add the time field to the table. I will skip the declaration of tables for directories in the relational database in order to shorten the code:
Relational
Functional
For the last example, we will build an index for all the keys and the date so that you can quickly find the price for a specific time.
Relational
Functional
Let's start with relatively simple tasks taken from the corresponding article on Habré.
First, declare the domain logic (for a relational database, this is done directly in the above article).
Display a list of employees receiving a salary greater than that of the immediate supervisor.
Relational
Functional
Display a list of employees receiving the maximum salary in their department.
Relational
Functional
Both implementations are equivalent. For the first case, you can use CREATE VIEW in the relational database, which in the same way will first calculate the maximum salary for a particular department in it. In the future, for clarity, I will use the first case, since it better reflects the solution.
Display a list of department IDs, the number of employees in which does not exceed 3 people.
Relational
Functional
Display a list of employees who do not have an appointed manager working in the same department.
Relational
Functional
Find a list of department IDs with the maximum total employee salary.
Relational
Functional
Let's move on to more complex tasks from another article . It contains a detailed analysis of how to implement this task in MS SQL.
Which sellers sold in 1997 more than 30 pieces of product No. 1?
Domain logic (as before, we skip the announcement on the RDBMS):
Relational
Functional
For each buyer (name, surname), find two products (name) for which the buyer spent the most money in 1997.
Extending the domain logic from the previous example:
Relational
Functional
The PARTITION operator works according to the following principle: it sums the expression specified after SUM (here 1) inside the specified groups (here Customer and Year, but there can be any expression), sorting inside the groups according to the expressions specified in ORDER (bought here, and if equal then the internal code of the product).
How many goods need to be ordered from suppliers to fulfill current orders.
Extending domain logic again:
Relational
Functional
And the last example is from me personally. There is the logic of a social network. People can be friends with each other and like each other. From the point of view of a functional database, it will look like this:
It is necessary to find possible candidates for friendship. More formalized, you need to find all people A, B, C such that A is friends with B, and B is friends with C, A likes C, but A is not friends with C.
From the point of view of a functional database, the query will look like this:
The reader is invited to independently solve this problem in SQL. It is assumed that friends are much fewer than those who like. Therefore, they are in separate tables. In the case of a successful solution, there is also a problem with two asterisks. In it, friendship is not symmetrical. On a functional database, it will look like this:
UPD: solving the problem with the first and second asterisk from dss_kalika :
It should be noted that the given syntax of the language is just one of the options for implementing the above concept. It was SQL that was taken as the basis, and the goal was to make it as similar as possible to it. Of course, someone may not like the names of keywords, word registers, and more. The main thing here is the concept itself. If desired, you can make both C ++ and Python a similar syntax.
The described database concept, in my opinion, has the following advantages:
Despite the fact that this is just a concept, we already have some Java implementation that translates all functional logic into relational logic. Plus, the presentation logic and a lot of other things are beautifully bolted to it, which makes the whole platform . In fact, we use RDBMS (so far only PostgreSQL) as a “virtual machine”. Such translation sometimes causes problems, since the query optimizer of the RDBMS does not know certain statistics that the FSBD knows. In theory, it is possible to implement a database management system that will use some structure adapted specifically for functional logic as a storage.
In this article I want to describe the concept of a functional database. For a better understanding, I will do this by comparing with the classical relational model. As examples, tasks from various SQL tests found on the Internet will be used.
Introduction
Relational databases operate on tables and fields. In the functional database, classes and functions will be used instead. A field in a table with N keys will be presented as a function of N parameters. Instead of relationships between tables, functions will be used that return objects of the class that is being linked. Instead of JOIN function composition will be used.
Before proceeding directly to the tasks, I will describe the task of domain logic. For DDL, I will use PostgreSQL syntax. For functional, its own syntax.
Tables and Fields
Simple Sku object with fields name and price:
Relational
CREATE TABLE Sku
(
id bigint NOT NULL,
name character varying(100),
price numeric(10,5),
CONSTRAINT id_pkey PRIMARY KEY (id)
)
Functional
CLASS Sku; |
We declare two functions that take one Sku parameter as input and return a primitive type.
It is assumed that in the functional DBMS, each object will have a certain internal code that is automatically generated, and which can be accessed if necessary.
We set the price for the product / store / supplier. It can change over time, so we add the time field to the table. I will skip the declaration of tables for directories in the relational database in order to shorten the code:
Relational
CREATE TABLE prices
(
skuId bigint NOT NULL,
storeId bigint NOT NULL,
supplierId bigint NOT NULL,
dateTime timestamp without time zone,
price numeric(10,5),
CONSTRAINT prices_pkey PRIMARY KEY (skuId, storeId, supplierId)
)
Functional
CLASS Sku; |
Indices
For the last example, we will build an index for all the keys and the date so that you can quickly find the price for a specific time.
Relational
CREATE INDEX prices_date
ON prices
(skuId, storeId, supplierId, dateTime)
Functional
INDEX Sku sk, Store st, Supplier sp, dateTime(sk, st, sp); |
Tasks
Let's start with relatively simple tasks taken from the corresponding article on Habré.
First, declare the domain logic (for a relational database, this is done directly in the above article).
CLASS Department; |
Task 1.1
Display a list of employees receiving a salary greater than that of the immediate supervisor.
Relational
select a.*
from employee a, employee b
where b.id = a.chief_id
and a.salary > b.salary
Functional
SELECT name(Employee a) WHERE salary(a) > salary(chief(a)); |
Task 1.2
Display a list of employees receiving the maximum salary in their department.
Relational
select a.*
from employee a
where a.salary = ( select max(salary) from employee b
where b.department_id = a.department_id )
Functional
maxSalary 'Максимальная зарплата' (Department s) = |
Both implementations are equivalent. For the first case, you can use CREATE VIEW in the relational database, which in the same way will first calculate the maximum salary for a particular department in it. In the future, for clarity, I will use the first case, since it better reflects the solution.
Task 1.3
Display a list of department IDs, the number of employees in which does not exceed 3 people.
Relational
select department_id
from employee
group by department_id
having count(*) <= 3
Functional
countEmployees 'Количество сотрудников' (Department d) = |
Task 1.4
Display a list of employees who do not have an appointed manager working in the same department.
Relational
select a.*
from employee a
left join employee b on (b.id = a.chief_id and b.department_id = a.department_id)
where b.id is null
Functional
SELECT name(Employee a) WHERE NOT (department(chief(a)) = department(a)); |
Task 1.5
Find a list of department IDs with the maximum total employee salary.
Relational
with sum_salary as
( select department_id, sum(salary) salary
from employee
group by department_id )
select department_id
from sum_salary a
where a.salary = ( select max(salary) from sum_salary )
Functional
salarySum 'Максимальная зарплата' (Department d) = |
Let's move on to more complex tasks from another article . It contains a detailed analysis of how to implement this task in MS SQL.
Task 2.1
Which sellers sold in 1997 more than 30 pieces of product No. 1?
Domain logic (as before, we skip the announcement on the RDBMS):
CLASS Employee 'Продавец'; |
Relational
select LastName
from Employees as e
where (
select sum(od.Quantity)
from [Order Details] as od
where od.ProductID = 1 and od.OrderID in (
select o.OrderID
from Orders as o
where year(o.OrderDate) = 1997 and e.EmployeeID = o.EmployeeID)
) > 30
Functional
sold (Employee e, INTEGER productId, INTEGER year) = |
Task 2.2
For each buyer (name, surname), find two products (name) for which the buyer spent the most money in 1997.
Extending the domain logic from the previous example:
CLASS Customer 'Клиент'; |
Relational
SELECT ContactName, ProductName FROM (
SELECT c.ContactName, p.ProductName
, ROW_NUMBER() OVER (
PARTITION BY c.ContactName
ORDER BY SUM(od.Quantity * od.UnitPrice * (1 - od.Discount)) DESC
) AS RatingByAmt
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID
JOIN [Order Details] od ON od.OrderID = o.OrderID
JOIN Products p ON p.ProductID = od.ProductID
WHERE YEAR(o.OrderDate) = 1997
GROUP BY c.ContactName, p.ProductName
) t
WHERE RatingByAmt < 3
Functional
sum (Detail d) = quantity(d) * unitPrice(d) * (1 - discount(d)); |
The PARTITION operator works according to the following principle: it sums the expression specified after SUM (here 1) inside the specified groups (here Customer and Year, but there can be any expression), sorting inside the groups according to the expressions specified in ORDER (bought here, and if equal then the internal code of the product).
Task 2.3
How many goods need to be ordered from suppliers to fulfill current orders.
Extending domain logic again:
CLASS Supplier 'Поставщик'; |
Relational
select s.CompanyName, p.ProductName, sum(od.Quantity) + p.ReorderLevel — p.UnitsInStock as ToOrder
from Orders o
join [Order Details] od on o.OrderID = od.OrderID
join Products p on od.ProductID = p.ProductID
join Suppliers s on p.SupplierID = s.SupplierID
where o.ShippedDate is null
group by s.CompanyName, p.ProductName, p.UnitsInStock, p.ReorderLevel
having p.UnitsInStock < sum(od.Quantity) + p.ReorderLevel
Functional
orderedNotShipped 'Заказано, но не отгружено' (Product p) = |
Task with an asterisk
And the last example is from me personally. There is the logic of a social network. People can be friends with each other and like each other. From the point of view of a functional database, it will look like this:
CLASS Person; |
It is necessary to find possible candidates for friendship. More formalized, you need to find all people A, B, C such that A is friends with B, and B is friends with C, A likes C, but A is not friends with C.
From the point of view of a functional database, the query will look like this:
SELECT Person a, Person b, Person c WHERE |
The reader is invited to independently solve this problem in SQL. It is assumed that friends are much fewer than those who like. Therefore, they are in separate tables. In the case of a successful solution, there is also a problem with two asterisks. In it, friendship is not symmetrical. On a functional database, it will look like this:
SELECT Person a, Person b, Person c WHERE |
UPD: solving the problem with the first and second asterisk from dss_kalika :
SELECT
pl.PersonAID
,pf.PersonAID
,pff.PersonAID
FROM Persons AS p
--Лайки
JOIN PersonRelationShip AS pl ON pl.PersonAID = p.PersonID
AND pl.Relation = 'Like'
--Друзья
JOIN PersonRelationShip AS pf ON pf.PersonAID = p.PersonID
AND pf.Relation = 'Friend'
--Друзья Друзей
JOIN PersonRelationShip AS pff ON pff.PersonAID = pf.PersonBID
AND pff.PersonBID = pl.PersonBID
AND pff.Relation = 'Friend'
--Ещё не дружат
LEFT JOIN PersonRelationShip AS pnf ON pnf.PersonAID = p.PersonID
AND pnf.PersonBID = pff.PersonBID
AND pnf.Relation = 'Friend'
WHERE pnf.PersonAID IS NULL
;WITH PersonRelationShipCollapsed AS (
SELECT pl.PersonAID
,pl.PersonBID
,pl.Relation
FROM #PersonRelationShip AS pl
UNION
SELECT pl.PersonBID AS PersonAID
,pl.PersonAID AS PersonBID
,pl.Relation
FROM #PersonRelationShip AS pl
)
SELECT
pl.PersonAID
,pf.PersonBID
,pff.PersonBID
FROM #Persons AS p
--Лайки
JOIN PersonRelationShipCollapsed AS pl ON pl.PersonAID = p.PersonID
AND pl.Relation = 'Like'
--Друзья
JOIN PersonRelationShipCollapsed AS pf ON pf.PersonAID = p.PersonID
AND pf.Relation = 'Friend'
--Друзья Друзей
JOIN PersonRelationShipCollapsed AS pff ON pff.PersonAID = pf.PersonBID
AND pff.PersonBID = pl.PersonBID
AND pff.Relation = 'Friend'
--Ещё не дружат
LEFT JOIN PersonRelationShipCollapsed AS pnf ON pnf.PersonAID = p.PersonID
AND pnf.PersonBID = pff.PersonBID
AND pnf.Relation = 'Friend'
WHERE pnf.[PersonAID] IS NULL
Conclusion
It should be noted that the given syntax of the language is just one of the options for implementing the above concept. It was SQL that was taken as the basis, and the goal was to make it as similar as possible to it. Of course, someone may not like the names of keywords, word registers, and more. The main thing here is the concept itself. If desired, you can make both C ++ and Python a similar syntax.
The described database concept, in my opinion, has the following advantages:
- Simplicity . This is a relatively subjective indicator, which is not obvious in simple cases. But if you look at more complex cases (for example, tasks with asterisks), then, in my opinion, writing such queries is much simpler.
- Инкапсуляция. В некоторых примерах я объявлял промежуточные функции (например, sold, bought и т.д.), от которых строились последующие функции. Это позволяет при необходимости изменять логику определенных функций без изменения логики зависящих от них. Например, можно сделать, чтобы продажи sold считались от совершенно других объектов, при этом остальная логика не изменится. Да, в РСУБД это можно реализовать при помощи CREATE VIEW. Но если всю логику писать таким образом, то она будет выглядеть не очень читабельной.
- Отсутствие семантического разрыва. Такая база данных оперирует функциями и классами (вместо таблиц и полей). Точно также, как и в классическом программировании (если считать, что метод — это функция с первым параметром в виде класса, к которому он относится). Соответственно, «подружить» с универсальными языками программирования должно быть значительно проще. Кроме того, эта концепция позволяет реализовывать гораздо более сложные функции. Например, можно встраивать в базу данных операторы вида:
CONSTRAINT sold(Employee e, 1, 2019) > 100 IF name(e) = 'Петя' MESSAGE 'Что-то Петя продает слишком много одного товара в 2019 году';
- Наследование и полиморфизм. В функциональной базе данных можно ввести множественное наследование через конструкции CLASS ClassP: Class1, Class2 и реализовать множественный полиморфизм. Как именно, возможно напишу в следующих статьях.
Despite the fact that this is just a concept, we already have some Java implementation that translates all functional logic into relational logic. Plus, the presentation logic and a lot of other things are beautifully bolted to it, which makes the whole platform . In fact, we use RDBMS (so far only PostgreSQL) as a “virtual machine”. Such translation sometimes causes problems, since the query optimizer of the RDBMS does not know certain statistics that the FSBD knows. In theory, it is possible to implement a database management system that will use some structure adapted specifically for functional logic as a storage.