MUMPS Globals: Extreme Database Programming. Part 1
Note by the translator.
There is an interesting technology in the world of databases - MUMPS . This programming language and data access has been known for several decades, debugged and is an adult proven technology.
Let me give an analogy: if SQL can be compared with BASIC, then MUMPS is more like C - it provides the highest performance, flexibility and versatility, allowing you to create the most complex data structures.
Here is the translation of the first part of the article "Extreme Database programming with MUMPS Globals". If it seems interesting to the community, then the translation of the second part will follow.
Chapter 1. Globals: an alternative to the relational approach
The true heart of MUMPS technology is the storage engine. It is based on what are called global variables, or (a more popular name) globals. Globals are an incredibly simple concept, however, incredibly powerful.
Many people do not accept globals. These are primitive structures. They do not have controls, safety nets or advanced functionality provided by "decent" databases. As a result, MUMPS are often debited as a useless, inadequate, or generally incorrect system. If you do this too, you ignore the simple, convenient and very flexible data storage mechanism. In skillful hands, the absence of excess baggage and undemanding resources give an incredible sense of freedom. In inept hands, all the same can lead to disaster. This is a bit like an extreme sport, such as free climbing. Few “real” athletes will recommend it - because it is the fastest, most effective and most fun way to climb that mountain if you can handle climbing without insurance.
So if you accept the challenge and are ready to admit that there are other ways to organize the data warehouse besides the snapped-up relational and SQL approach, let's dig deep into it.
This is the first chapter that reveals the basics of globalization. The second chapter concentrates on their application in terms that are understandable to the RDBMS / SQL programmer. If you want, you can go directly to the second chapter and return to this text later to familiarize yourself with the basics.
All MUMPS systems and their modern modifications, for example, Caché, use globals as the basis of the storage mechanism. Many modern MUMPS systems and / or extensions offer a more “familiar” look at key global constructs. For example, on globals, you can create views that will work as an object, relational, or XML database. Moreover, in theory, the same physical database based on globals can be logically represented and used in one of these forms (and even all at once). As a result, many developers appeared who are not aware that the tools are based on globals. They do not know what it is, how it works, and how it is used. This text will help you learn this secret and, we hope that you will agree, a very fascinating world.
So what are globals?
In short, a global is a constant, sparse, dynamic, multidimensional array containing text values. MUMPS permits the use of both persistent disk and multidimensional arrays contained only in RAM, known as “local arrays”.
Unlike many systems, MUMPS allows you to use both alphanumeric indices. So in MUMPS you can work with a similar array:
Employee(company,country,office,employeeNumber) = employeeDetails
and a separate entry:
Employee(“MGW”,”UK”,”London”,1) = “Rob Tweed`Director`020 8404 3054”
In this example, the data elements that make up employee information (name, job title, phone number) are attached to each other in combination with the backslash character as a separator. MUMPS does not control or control how you organize your structures: there is no schema or data dictionary describing your records. This provides incredible flexibility and speeds up development. You can arbitrarily assign one or more data separators that will break the text value of the array record into any number of "fields". The maximum total length of a single array record depends on the implementation of MUMPS, however in Caché it is up to 32k. The length of the string is variable, and, as you understand, based on the presence of the separator character, the length of the individual fields is also variable.
In the example above, the employee record is contained in what is called a "local array". If you exit the MUMPS session, the array would disappear, just like a PHP array after a page or session disappears.
Now let's start the fun! To store an employee record on disk permanently, that is, in the form of a global, simply add "^" before the array name:
^Employee(company,country,office,employeeNumber) = employeeDetails
^Employee(“MGW”,”UK”,”London”,1) = “Rob Tweed`Director`020 8404 3054” Вот и все!
To create such an element in the global, you need to use the MUMPS "set" command:
set ^Employee(“MGW”,”UK”,”London”,1) = “Rob Tweed`Director`020 8404 3054”
What confuses and scares people the most about MUMPS is that the bulk of the commands can be reduced to one letter (lowercase or uppercase), so often you can see the following instead of the command written above:
s ^Employee(“MGW”,”UK”,”London”,1) = “Rob Tweed`Director`020 8404 3054”
Now, when you exit MUMPS, the recording will not disappear, but will remain on the disk. Whenever you return, you can read a disc entry through a global with indexes:
To extract a value from the global and assign it to a variable, they usually use the “set” command, for example:
Set companyRecord= ^Employee(“MGW”,”UK”,”London”,1)
The companyRecord variable now contains a string value: “Rob Tweed`Director`020 8404 3054”
Globals can contain as many indexes as you want, and these indexes can be any combination of text and numbers (real or integer). String indices must be surrounded by double quotation marks, while numeric indices should not.
Most MUMPS implementations in fact have a limit on the total length of the index, so you won’t be able to completely go beyond the bounds. However, you will be sure that there is enough space for any of your requirements.
To remove an entry from the global, you can use the kill command:
Actually, this is all about globals. The real challenge is getting such a primitive data structure to work for you. We will describe this in the rest of the document. We will try to do this in such a way that the SQL programmer of relational databases can understand similar techniques and views that a MUMPS programmer would use.
Remember that nothing in MUMPS supports any database design methodology. You can add control and verification tools that will monitor the logic, uniformity and absence of errors in your database. This means that you will do the work that the usual DBMS would do for you. However, you will quickly realize that you can automate the most common tasks and easily manage your MUMPS database.
Creating a simple multi-level structure
You can use several dimensions in the same global at the same time, for example:
^Employee(“MGW”,”UK”,”London”) = 2 ^Employee(“MGW”,”UK”,”London”,1) = “Rob Tweed`Director`020 8404 3054” ^Employee(“MGW”,”UK”,”London”,2) = “Chris Munt`Director`01737 371457”
Using 3 indices, we indicate the number of employees in a particular office, and four indicate the record of a specific employee.
Links between different globals must be indicated by the programmer. MUMPS, out of the box, does not provide automatic indexing and does not cross-reference.
In order to make a quick selection by phone number, we will create the ^ EmployeeTelephone global, where the phone numbers are the indices and the ^ Employee global indices will indicate the employee record:
^EmployeeTelephone(“020 8404 3054) = “MGW`UK`London`1” ^EmployeeTelephone(“01737 371457”) = “MGW`UK`London`2”
In this example, the value stored in each telephone number also contains a linking index indicating the corresponding employee record. If you know the phone number, then all you need to do is to disconnect the data using the reverse apostrophe (separator) and extract the employee’s master record.
S telNo=”020 8404 3054” S indexData=^EmployeeTelephone(telNo) S company=$piece(indexData,”`”,1) S country=$piece(indexData,”`”,2) S office=$piece(indexData,”`”,3) S employeeNo=$piece(indexData,”`”,4) S record=^Employee(company,country,office,employeeNo) S employee=$piece(record,”`”,1)
Notice the use of the $ piece MUMPS function to split the value into fields.
One of the great features of MUMPS is that you do not need to declare anything in advance. In the SQL world, changing the data structure requires changing the table structure (ALTER TABLE, etc.). In MUMPS you decide when and how to create, modify and delete entries in the global - all this is automatically and dynamically controlled only by you. You can add additional indexes and “fields” (pieces) to your globals at any time without any announcements or anything like that. If you want to start using a different global, just start doing it and it will be created for you dynamically.
Creating, reading and deleting records in the global
In short, in MUMPS, records in the global are created and retrieved with the “Set” command, and deleted with the “Kill” command.
- Creating an entry in the global:
Set ^Employee(“MGW”,”UK”,”London”,1)=“Rob Tweed`Director`020 8404 3054”
This code creates an index in the global and saves the record to disk.
- Reading a record from the global
This code retrieves an element from the global and places the value in a local variable called “data”.
- Removing a record from the global:
This code removes the specified global entry from disk immediately and permanently. Be careful with the Kill team - it's incredibly easy and incredibly dangerous to use. If you specify fewer indices than there are in the global, you will delete all elements indexed by a large number of indices. If you do not specify an index at all, you will delete the entire global, example:
This line will delete all records of all offices in England
And this command will delete the entire ^ Employee global - immediately, forever and irrevocably (if you did not backup it).
One of the most commonly used functions is to bypass some or all of the globals. Suppose you need to take all the records of employees to display a list from which the user can select one record, or count them. To do this, you need to use the MUMPS $ order function. The $ order function is one of the pearls of MUMPS, which allows you to manipulate any data stored in globals with incredible simplicity.
This functionality will not be intuitive to the programmer of "traditional" databases, but it is worth understanding, since it is both very powerful and simple.
The $ order function works at the same index level inside the global and returns the value of the next index in the selection that exists at this global level. You can specify the initial value, and the $ order function will find the next value, which is in the selected sequence. To find the first index at a specified level, use an empty start value (“”). To find the first index at a specified level that starts at “C”, use the initial value, which comes in order immediately before “C”, for example, “B ~”.
So, to find the first employee in the global:
S company=”” S company=$order(^Employee(company))
The company variable now contains the value of the first company index in the first record of the ^ Employee global.
When the last value is found, the next time the $ order function is run, it returns an empty value. So, if there was only one company in our global and we repeated $ order:
Then the company variable would contain an empty value (“”)
To get and process all the companies in the Employee global, you need to create a loop:
S company=”” For s company=$order(^Employee(company)) quit:company=”” do . ; сделать что-нибудь с company
This code demonstrates interesting features of concise coding MUMPS:
- The For command, followed by two spaces, sets up an infinite loop.
- Quit: company = ”” indicates the condition for exiting the loop and uses the construct known as the “post-condition”. This design says "if the company value is empty, exit the For loop." Quit should be followed by two spaces if any other command follows the command.
- “Do” at the end of a line means executing code that goes to the next level of a “dot”. “Do” will be executed at each iteration of the For loop until the company value is empty
- The code to be executed in each loop is written after a single point. Basically, any line starting with a period forms a subroutine that is executed by the “do” command at the end of the second line in the example.
So, we set $ order to an empty value to ensure that the command starts by looking for the first index stored in the global. We go through each value until the stored data ends and get the empty value returned by $ order, after which we end the loop.
You will find that this type of loop is one of the most common things in the MUMPS program.
We can expand this request and process the entire global at once. To do this, we start a cycle at each index level, starting from the first and moving to each subsequent:
s company=”” for s company=$order(^Employee(company)) quit:company=”” do . s country=”” . for s country=$order(^Employee(company,country)) quit:country=”” do . . s office=”” . . for s office=$order(^Employee(company,country,office)) quit:office=”” do . . . s employeeNo=”” . . . for s employeeNo=$order(^Employee(company,country,office,employeeNo)) quit:employeeNo=”” do . . . . s record=^Employee(company,country,office,employeeNo) . . . . ; сделать что-нибудь с записью
Pay attention to how we designated the levels of “points” to create a hierarchy of nested routines. Also, pay attention to how $ order is used to bypass the values of all 4 global indices.
Note Translator : In Caché, you can use curly braces for nested structures.
If we needed to find and process only records of employees of companies starting with “C”, then we use the following popular construction:
s company=”B~” f s company=$o(^Employee(company)) quit:$e(company,1) ’= ”C” do . ; сделать что-нибудь с записью
Pay attention to the function $ extract (or $ e). It extracts the character from the specified position of the string value: in this case, the first letter of the company name. Also note the way you specify “not equal to C”: it uses the MUMPS NOT operator, which is written with a single quote (').
This cycle can be read like this:
- The initial value of $ order is the value before “C” in sequential sampling
- Run an endless loop to search for all company records in the sample
- If the first letter of the company name is not “C”, exit the loop
- Otherwise, continue to process company records.
This ability to start and stop traversing a global from any specific index and at any level of nesting of indices is a unique function of the MUMPS globals.
Checking for a record in the global
You often need to know if a particular record exists in the global. You can use the MUMPS $ data function for this:
if $data(^Employee(company)) do xxx
The line reads as "if ^ Employee (company) exists, then execute routine xxx." The $ data function can be shortened to $ d.
$ data will return a few different values.
- If data exists at the specified index level and there are no subnodes, the value 1 will be returned
- If data exists at the specified index level and there are subnodes, the value 11 will be returned
- If data does not exist at the specified index level, but there are subnodes, the value 10 will be returned
- If data does not exist at the specified index level, the value 0 will be returned.
In MUMPS, any non-zero value when using the logical if statement evaluates to true. Thus, the first three values returned by $ data (1, 10, and 11) will be considered true. The latter situation (no data and subnodes) is evaluated as false.
As an example, consider the following global:
^test=3 ^test(“a”)=1 ^test(“a”,”b”,”c”)=2 ^test(“a”,”b”,”d”)=3 $data(^test) = 11 $data(^test(“a”,”b”)=10 $data(^test(“a”,”b”,”c”)=1 $data(^test(“b”)=0
Preventing errors like “data not defined”
If you try to extract a nonexistent record from the global, MUMPS will generate a run-time error, for example, <UNDEF>. To avoid this, you can either use the $ data function to check for the existence of the node, or use the more convenient $ get function. This will return the value of the global node if it exists, or an empty value if not. The $ get function can be shortened to $ g.
So, based on the example we used in the previous section with $ data:
$get(^test) = 3 $get(^test(“a”,”b”)=”” ; поскольку на этом уровне индекса нет данных $get(^test(“a”,”b”,”c”)=1 $get(^test(“b”)=”” $get(^nonExistentGlobal)=””
Caché and all other MUMPS systems include ways to view globals. The easiest way is the ^% G command for the command line, which you will find in Caché and some other MUMPS implementations. Run the following in a terminal session:
USER> D ^%G Global ^ Employee
Entering the global name will give the command to display the contents of the entire global. However, you can display individual indices, for example:
List all the values of only the first index
List all employee records with the first MGW index value
Lists all the second employee record indexes with the first “MGW” index.
Caché provides a browser application with a graphical interface called Caché Explorer, which allows you to view and edit globals.
We examined the basics of globals, as well as their creation and editing. The next chapter will examine globals from the perspective of a person familiar with relational databases.
You probably already understood that the MUMPS globals have virtually no control and no limit on your actions. This is both wonderful (you can quickly, flexibly design, create and modify the structure of your databases) and dangerous (in inept hands the database turns into an uncontrolled mess).
MUMPS relies on you programmers. There is no insurance, but there are no limits to what can be achieved or how you will achieve it. You'll understand MUMPS - a great and exciting work environment, thanks to the efficiency of programming and command execution.
When you master the MUMPS globals, you will probably be surprised: “Why can't all databases work as well!” This is very simple, intuitive, flexible and superior to any relational database. In addition, MUMPS is available for almost any platform and scales to the size of huge systems - some of the largest interactive systems in the world are based on MUMPS, sometimes with tens of thousands of concurrent users.
However, if you think you need tools and insurance that the relational world believes are necessary, MUMPS is definitely not for you. And if you still intend to do free climbing, go on to the next chapter.
To be continued. Part 2.