Postgres in retrospect
We present a translation of the article "Looking Back at Postgres" by Joseph Hellerstein, published under the international Creative Commons Attribution License version 4.0 (CC-BY 4.0). The authors reserve the right to distribute this work on personal and corporate websites with proper attribution to the source.

Translated by Elena Indrupskaya. For my part, I would add that the "programmer who desperately wanted to build a multi-version system" is apparently Vadim Mikheev, and the "volunteers from Russia" who rewrote GiST are well known to us all.


This is a recollection of the Postgres project, run at UC Berkeley and led by Mike Stonebraker from the mid-1980s to the mid-1990s. As one of many personal and historical recollections, this article was solicited for the book [Bro19] celebrating Stonebraker's Turing Award. The focus of the article is therefore Stonebraker's leadership role and his thinking about the design. But Stonebraker was never a coder, and he stayed out of the way of his development team. The Postgres codebase was the work of a team of brilliant students and occasional full-time university programmers who had a bit more experience (and only a bit more salary) than the students. I was lucky to join that team as a student during the latter years of the project. I received helpful material for this article from some of the more senior students involved in the project, but any errors or omissions are my own. If you notice any, please contact me and I will try to fix them.

1. Introduction

Postgres was Michael Stonebraker's most ambitious project: his grand effort to build a universal database system. Over the course of a decade, the project spawned more papers, PhDs, professors, and companies than anything else Stonebraker did. It also covered more technical ground than any other system he built. Despite the risks inherent in such scope, Postgres became the most successful software artifact to come out of Stonebraker's research groups, and his main contribution to open source. It is an example of a "second system" [Bro75] that succeeded. At the time of this writing, more than thirty years after the project began, open-source PostgreSQL is the most popular independent open-source database system in the world, and the fourth most popular database system overall. Meanwhile, companies built around Postgres have generated a total of over $2.6 billion in acquisition value. By any measure, Stonebraker's Postgres vision has had a huge and lasting impact.

1.1. Prehistory

Stonebraker had enormous success early in his career with the Ingres research project at Berkeley [SHWK76] and the subsequent startup he founded with Larry Rowe and Eugene Wong: Relational Technology, Inc. (RTI).

As RTI was developing in the early 1980s, Stonebraker began working on database support for data types beyond the traditional rows and columns of Codd's original relational model (Edgar Frank Codd). A motivating example at the time was the need to support databases for computer-aided design (CAD) tools in the microelectronics industry. In a 1983 paper, Stonebraker and students Brad Rubenstein and Antonin Guttman explained how that industry needed support for "new data types such as polygons, rectangles, text strings, etc.", "efficient spatial searching", "complex integrity constraints", and "design hierarchies and multiple representations" of the same physical constructs [SRG83]. Motivated by these applications, the group began work on indexing (including Guttman's R-trees for spatial indexing [Gut84]) and on adding abstract data types (ADTs) to a relational database system. ADTs were a popular new programming-language construct at the time, pioneered by subsequent Turing Award winner Barbara Liskov and explored in database application programming by Stonebraker's new collaborator Larry Rowe. In a 1983 SIGMOD Record paper [OFS83], Stonebraker and students James Ong and Dennis Fogg describe an exploration of this idea in an Ingres extension called ADT-Ingres, which included many of the representational ideas that were explored more deeply, and with better system support, in Postgres.

2. Postgres: General Information

As the name suggests, Postgres is "Post-Ingres": a system designed to take what Ingres could do and go beyond it. The signature theme of Postgres was the introduction of what Stonebraker eventually called the object-relational features of a database: support for object-oriented programming ideas within the data model and declarative query language of a database system. But Stonebraker also decided to pursue a number of other technical challenges in Postgres that were independent of object-oriented support, including active database rules, versioned data, tertiary storage, and parallelism.

Two papers were written on the design of Postgres: an early description in SIGMOD 1986 [SR86] and a mid-flight description in CACM 1991 [SK91]. The Postgres research project ramped down in 1992 with the founding of Stonebraker's Illustra startup, which involved Stonebraker, lead graduate student Wei Hong, and chief programmer Jeff Meredith. In the list below, features mentioned in the 1986 paper are marked with an asterisk (*), and features from the 1991 paper that were not in the 1986 paper are marked with a dagger (†). Other tasks listed below are present in the system and the research literature but appear in neither design paper. Many of these topics were addressed in Postgres well before they were studied or reinvented by others; in many cases Postgres was too far ahead of its time, and interest in the topics only picked up later, in a modern context.

  1. ADT support in the database system
    • Complex objects (i.e., nested data, or non-first-normal-form (NF2) data) *
    • User-defined abstract data types and functions *
    • Extensible access methods for new data types *
    • Optimizer handling of queries with expensive user-defined functions †
  2. Active databases and rule systems (triggers, alerts) *
    • Rules implemented as query rewriting †
    • Rules implemented as record-level triggers †
  3. Log-based storage and recovery
    • Reduced-complexity recovery code by treating the log as data *, using non-volatile memory for commit status †
    • No-overwrite storage and time-travel queries *
  4. Support for new deep data storage technologies, especially optical discs *
  5. Support for multiprocessors and specialized processors *
  6. Support for various language models
    • Minimal changes to the relational model and support for declarative queries *
    • "Fast path" access to internal APIs, bypassing the query language
    • Support for multiple programming languages

Below, we briefly discuss Postgres's contributions in each of these areas and their relationship to subsequent work in computing.

2.1. ADT support in the database system

The signature goal of Postgres was to support the new object-relational features: extending database technology to provide the benefits of both relational query processing and object-oriented programming. Over time, the object-relational ideas pioneered in Postgres have become standard features in most modern database systems.

2.1.1. Complex objects

Quite often, data is naturally represented as nested entities or "objects". A classic example is a purchase order, which contains a nested set of products, their quantities, and their prices. The religion of relational modeling dictated that such data be restructured and stored without nesting, using multiple flat entity tables (orders, products) with flat relationship tables (product_in_order) connecting them. The classic reason for this flattening is that it reduces duplication of data (a product being described redundantly in many purchase orders), which in turn avoids the complexity and errors of updating all the redundant copies. But in some cases you want to keep the nested representation, because it is natural for the application (say, a circuit layout engine in a CAD tool) and updates are rare. This data-modeling debate is at least as old as the relational model itself.
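The trade-off above can be made concrete with a small sketch (mine, not from the article): the same purchase-order data in a nested "complex object" form versus the flattened, normalized form. All names and numbers here are illustrative.

```python
# Nested view: natural for the application, but product details are
# duplicated inside every order that references them.
order_nested = {
    "order_id": 1,
    "lines": [
        {"product": {"sku": "A1", "name": "Widget", "price": 9.99}, "qty": 3},
        {"product": {"sku": "B2", "name": "Gadget", "price": 19.99}, "qty": 1},
    ],
}

# Flattened (normalized) view: flat entity tables plus a flat
# relationship table, with no duplication of product data.
products = {"A1": {"name": "Widget", "price": 9.99},
            "B2": {"name": "Gadget", "price": 19.99}}
orders = [{"order_id": 1}]
order_lines = [{"order_id": 1, "sku": "A1", "qty": 3},
               {"order_id": 1, "sku": "B2", "qty": 1}]

def order_total_nested(order):
    # Walk the embedded structure directly.
    return sum(l["product"]["price"] * l["qty"] for l in order["lines"])

def order_total_flat(order_id):
    # Reassemble the nesting via an implicit join on sku/order_id.
    return sum(products[l["sku"]]["price"] * l["qty"]
               for l in order_lines if l["order_id"] == order_id)

# Both representations yield the same answer; they differ in update
# behavior (redundant copies) and in how natural the query code feels.
print(order_total_nested(order_nested), order_total_flat(1))
```

Updating a product's price in the flat form touches one row in `products`; in the nested form it would mean finding and rewriting every embedded copy, which is exactly the update anomaly the relational orthodoxy warned about.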

The signature approach of Postgres was to have it both ways with respect to data modeling: Postgres kept tables as the "outermost" data type, but allowed columns to have "complex" types, including nested tuples or tables. One of its more esoteric implementations, first explored in the ADT-Ingres prototype, allowed a column of table type to be specified declaratively as a query definition: "Quel as a data type" [SAHR84] (Quel being the Ingres query language. - Translator's note).

The "post-relational" theme of supporting both declarative queries and nested data has recurred over the years, often driven by disputes over which is better. During the Postgres era of the 1980s and 1990s, some of the object-oriented database groups picked up this idea and developed it into the OQL standard language, which has since fallen out of use.

At the turn of the millennium, declarative queries over nested objects became an obsession for a segment of the database research community in the form of XML databases. The resulting XQuery language (led by SQL veteran Don Chamberlin) owes its support for complex objects to Postgres's Postquel language. XQuery was widely implemented in industry but never became popular with users. Today these ideas are being revisited in query-language designs for the JSON data model popular in browser-based applications. Like OQL, these languages often arise in groups that initially rejected declarative queries in favor of developer-centric programming (the "NoSQL" movement), as a late addition driven by the desire to add querying back into their systems. Meanwhile, as Postgres evolved over the years (including the shift from the Postquel query language to versions of SQL that accommodate many of these goals), it incorporated support for nested data such as XML and JSON into a general-purpose DBMS without requiring any significant rearchitecting. The debate flares up from time to time, but Postgres's approach of extending the relational framework with nested-data extensions has repeatedly proven to be a natural resting point for all sides once the arguments die down.

2.1.2. User-defined abstract data types and functions

Beyond proposing nested types, Postgres pioneered the idea of opaque, extensible ADTs that are stored in the database but not interpreted by the core system. In principle this was always part of Codd's relational model: integers and strings were traditional, but the relational model actually accommodates any atomic data type with predicates. The challenge was to provide that mathematical flexibility in software. To enable queries that interpret and manipulate these objects, an application programmer needs to be able to register user-defined functions (UDFs) over these types with the system and invoke those functions in queries. User-defined aggregate (UDA) functions are also desirable, to summarize collections of these objects in queries. Postgres was an innovator in fully supporting these features.

Why put this functionality in the DBMS rather than in higher-level applications? The traditional answer is the significant performance benefit of shipping code to the data rather than pulling data out to the code. Postgres showed that this fits quite naturally in a relational framework: it required only modest changes to the relational metadata catalog plus mechanisms for invoking foreign code, while the query syntax, semantics, and system architecture all worked out simply and elegantly.
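The registration idea can be sketched in miniature (a hypothetical toy, not Postgres's actual catalog API): the engine stores opaque values and only calls registered UDFs and UDAs to interpret them in queries.

```python
udfs = {}  # toy catalog of user-defined functions
udas = {}  # toy catalog of user-defined aggregates

def register_udf(name, fn):
    udfs[name] = fn

def register_uda(name, init, step, final):
    udas[name] = (init, step, final)

# An opaque ADT the "kernel" never interprets: a 2-D point.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

register_udf("dist_origin", lambda p: (p.x ** 2 + p.y ** 2) ** 0.5)
register_uda("centroid",
             init=lambda: [0.0, 0.0, 0],
             step=lambda s, p: [s[0] + p.x, s[1] + p.y, s[2] + 1],
             final=lambda s: (s[0] / s[2], s[1] / s[2]))

table = [Point(0, 3), Point(4, 0), Point(2, 3)]

# "SELECT * FROM table WHERE dist_origin(p) < 4":
# the engine evaluates the predicate by calling the registered UDF.
selected = [p for p in table if udfs["dist_origin"](p) < 4]

# "SELECT centroid(p) FROM table": fold the UDA over opaque values.
init, step, final = udas["centroid"]
state = init()
for p in table:
    state = step(state, p)
print(len(selected), final(state))
```

The point of the sketch is the division of labor: the query machinery knows nothing about `Point`; it only needs the catalog entries telling it which code to invoke, which is the essence of the Postgres ADT design.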

Postgres was a bit ahead of its time in exploring this functionality. In particular, the database research community at the time was not much concerned with the security implications of uploading unsafe code to the server. That became recognized as a problem once the technology attracted attention in industry. Stonebraker commercialized Postgres in his Illustra startup, which was acquired by Informix in large part for its ability to support DataBlade extension packages, including UDFs. Informix, with its Postgres-based technology and a strong parallel-database offering, became a significant threat to Oracle. Oracle invested heavily in negative marketing about the risks of Informix running unprotected user-written C code. Some attribute Informix's demise to this campaign, though Informix's financial misdeeds (and the subsequent federal indictment of its then-CEO) were certainly bigger problems. Today, decades later, all the major database vendors support the execution of user-defined functions in one or more languages, using newer technologies to protect against server crashes and data corruption.

Meanwhile, the big-data stacks of the 2000s, including the MapReduce phenomenon that caused Stonebraker and David DeWitt so much grief [DS08], are a re-implementation of the Postgres idea of user-defined code hosted within a query framework. MapReduce essentially combines the software-engineering ideas of Postgres with parallelism ideas from systems like Gamma and Teradata, plus some modest innovation around mid-query restart for workloads at extreme scale. The Postgres-based startups Greenplum and Aster showed around 2007 that parallelized Postgres could be far more functional and practical than MapReduce for most customers, but in 2008 the market was still not ready for that technology. By 2018, virtually every big-data stack primarily serves a workload of parallel SQL with UDFs, much like the design that came out of Postgres.

2.1.3. Extensible access methods for new data types

Relational databases co-evolved with B-trees in the early 1970s, and B-trees helped motivate Codd's notion of "physical data independence": a B-tree index provides a level of indirection that allows physical storage to be reorganized adaptively without requiring changes to applications. The key limitation of B-trees and related structures is that they support only equality lookups and one-dimensional range queries. What if you have the two-dimensional range queries typical of mapping and CAD applications? This problem was well known in the Postgres years, and the R-tree [Gut84], developed by Antonin Guttman in Stonebraker's group, was one of the most successful new indexes designed to address it in practice. Still, inventing an index structure does not solve the systems problem of supporting multidimensional ranges in a DBMS. Many questions arise. Can you add an access method such as R-trees to your DBMS easily? Can you teach the optimizer that this access method is useful for certain queries? Can you get recovery and concurrency control right? This was a signature act of boldness for Postgres: a software architecture problem touching most of the database engine, from the optimizer to the storage layer, including the logging and recovery system. Postgres R-trees became a powerful driver and a prime example of the elegant extensibility of the access-method layer and its integration into the query optimizer. Postgres showed how to register an abstractly described access method (in this case an R-tree), and how a query optimizer could recognize an abstract selection predicate (in this case a range selection) and match it to that abstractly described access method. Concurrency control received less attention in the original work: the lack of a one-dimensional ordering on keys made the locking techniques used in B-trees inapplicable.
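The registration-and-matching idea can be illustrated with a toy sketch (hypothetical names, inspired by but not taken from the Postgres code): an access method is registered abstractly against the predicate class it supports, and the "planner" picks it without knowing the index internals.

```python
# A crude one-level "R-tree": points are grouped into leaves, and each
# leaf is summarized by its bounding box.
class ToyRTree:
    def __init__(self, points, leaf_size=2):
        pts = sorted(points)
        self.leaves = [pts[i:i + leaf_size] for i in range(0, len(pts), leaf_size)]
        self.boxes = [(min(x for x, _ in leaf), min(y for _, y in leaf),
                       max(x for x, _ in leaf), max(y for _, y in leaf))
                      for leaf in self.leaves]

    def search(self, q):  # q = (x1, y1, x2, y2), an abstract range predicate
        out = []
        for box, leaf in zip(self.boxes, self.leaves):
            # "Consistency" check: can this subtree contain any matches?
            if box[0] <= q[2] and q[0] <= box[2] and box[1] <= q[3] and q[1] <= box[3]:
                out += [p for p in leaf
                        if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]]
        return out

# Catalog: access methods keyed by the predicate class they support.
access_methods = {"range2d": ToyRTree}

def plan_scan(predicate_kind, data, q):
    """Use a registered index if one supports the predicate, else scan."""
    if predicate_kind in access_methods:
        return access_methods[predicate_kind](data).search(q)
    return [p for p in data if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]]

pts = [(1, 1), (2, 5), (6, 2), (7, 7), (3, 3)]
print(sorted(plan_scan("range2d", pts, (0, 0, 4, 4))))
```

The planner never inspects `ToyRTree` internals; it only matches the abstract predicate kind to a catalog entry, which is the essence of the extensibility Postgres demonstrated (and which GiST later generalized).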

The promise of Postgres's extensible access methods inspired one of my first research projects at the end of graduate school: Generalized Search Trees (GiST) [HNP95] and the subsequent notion of indexability theory [HKM+02]. I implemented GiST in Postgres during the semester after finishing my doctorate, which made adding new indexing logic to Postgres even easier. In his PhD thesis, Berkeley's Marcel Kornacker solved the difficult recovery and concurrency problems raised by the extensible "template" nature of the GiST index [KMH97].

Today, PostgreSQL combines the original extensible access-method architecture (it ships with B-tree, GiST, SP-GiST, and GIN indexes) with the extensibility and high-concurrency support of the Generalized Search Tree (GiST) interface. GiST indexes power PostGIS, the popular PostgreSQL geographic information system. GIN indexes provide PostgreSQL's built-in text indexing.

2.1.4. Query optimization with expensive UDFs

In traditional query optimization, the goal was to minimize the volume of tuples flowing through the query plan (and hence I/O). This meant that operators that filter tuples (selections) were good to place early in the plan, while operators that can generate new tuples (joins) should run later. As a result, query optimizers would "push" selections below joins and order them arbitrarily, focusing instead on clever optimization of joins and disk accesses. UDFs changed the picture: if selections contain expensive UDFs, the order in which the selections execute can be critical to performance. Moreover, if a UDF in a selection is expensive enough, it may be better to run the selection after the joins (that is, to "pull up" the selection). These considerations complicated the optimizer's search space. I took this on as my first hard problem in graduate school, and it ended up being the subject of my master's work with Stonebraker at Berkeley and my PhD at Wisconsin under Jeff Naughton, with ongoing input from Stonebraker. Postgres was the first DBMS to record the cost and selectivity of user-defined functions in the database catalog. We attacked the optimization problem by deriving the optimal ordering of selections, and then optimally interleaving the selections along the branches of each join tree considered during plan search.
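A toy sketch of the selection-ordering piece of this work (illustrative numbers, not the Postgres code): the classic result from this line of research is to apply selections in ascending order of rank = (selectivity - 1) / cost-per-tuple, which balances how expensive a predicate is against how many tuples it eliminates.

```python
def rank(pred):
    # Ascending rank order minimizes expected total predicate cost.
    return (pred["selectivity"] - 1.0) / pred["cost"]

def expected_cost(preds, n_tuples):
    """Expected total UDF cost when predicates run in the given order."""
    total, survivors = 0.0, float(n_tuples)
    for p in preds:
        total += survivors * p["cost"]   # pay the cost on each surviving tuple
        survivors *= p["selectivity"]    # the selection thins the stream
    return total

# Hypothetical catalog entries: per-tuple cost and selectivity of each UDF.
preds = [
    {"name": "cheap_unselective", "cost": 1.0,   "selectivity": 0.9},
    {"name": "pricey_selective",  "cost": 100.0, "selectivity": 0.01},
    {"name": "medium",            "cost": 10.0,  "selectivity": 0.5},
]

ordered = sorted(preds, key=rank)
print([p["name"] for p in ordered])
print(expected_cost(ordered, 1000), expected_cost(preds, 1000))
```

Note that the expensive-but-selective predicate runs last despite filtering the most: its per-tuple cost dominates, so it is cheapest to apply only to the tuples that survive the cheaper predicates. This is the same intuition that can justify pulling a selection above a join.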

When I arrived in graduate school, this was one of three topics Stonebraker had written on the board in his office as options for my dissertation. I believe the second topic was indexing of functions, and I no longer remember the third.

The expensive-function optimization feature was disabled in the PostgreSQL source tree early on, largely because at the time there were no compelling use cases for expensive user-defined functions. The examples we worked with revolved around image processing, which only became a mainstream data-processing task around 2018. Today, in the era of big data and machine-learning workloads, expensive functions are commonplace, and I expect this problem to return to prominence. Once again, Postgres was well ahead of its time.

In an ironic twist, the code I wrote in graduate school was eventually removed from the PostgreSQL source tree by a young programmer named Neil Conway, who some years later began a PhD under my supervision at UC Berkeley and is now one of Stonebraker's "PhD grandchildren".

2.2. Active databases and rule systems

The Postgres project began at the tail end of the AI community's fascination with rule-based programming as a way to represent knowledge in expert systems. That direction did not pan out: many believe it brought on the much-discussed "AI winter" that lasted through the 1990s.

Rule-based programming nevertheless persisted in the database community, in two forms. The first was theoretical work around declarative logic programming with Datalog. This was a thorn in Stonebraker's side: he seemed to genuinely hate the topic and criticized it bitterly in a number of community reports over the years.

Datalog has survived as a mathematical foundation for declarative languages and has over time found application across computer science, including software-defined networking and compilers. Datalog is declarative querying "on steroids": a fully expressive programming model. I eventually got drawn to it myself as a natural design point, and pursued it in a variety of applied settings outside traditional database systems.

The second body of work on database rules was the practical work on what came to be called active databases and database triggers, which evolved into standard functionality in relational DBMSs. Stonebraker, characteristically, threw himself into the more practical variant.

Stonebraker's work on database rules began with Eric Hanson's PhD work, which started out targeting Ingres but quickly moved to the new Postgres project. It was extended in Spyros Potamianos's PhD work on PRS2: the Postgres Rules System 2. A theme in both implementations was that rules could be realized in two different ways. One was to interpret rules as query rewrites, reminiscent of the view-rewriting work Stonebraker had originally done for Ingres. In this scheme, a rule's "on condition, do action" logic is converted into "when a matching query arrives, rewrite it and run the rewritten query instead of the original". For example, a query like "add a new row to Mike's list of awards" might be rewritten as "raise Mike's salary by 10%". The other way was to implement the more natural "on condition, do action" by checking conditions at the row level, using locks inside the database. When such a lock was encountered, the result was not a wait (as in ordinary concurrency control) but execution of the associated action.
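The two implementation strategies can be contrasted in a minimal sketch (entirely hypothetical machinery, not PRS2's): one path intercepts and rewrites a logical request before execution; the other attaches condition/action pairs that fire per row during execution.

```python
# --- Rules as query rewriting: substitute the query before it runs.
rewrite_rules = []  # list of (match_fn, rewrite_fn)

def submit(query, db):
    for match, rewrite in rewrite_rules:
        if match(query):
            query = rewrite(query)  # run the rewrite instead of the original
    return execute(query, db)

# --- Rules as row-level triggers: conditions checked as rows are touched.
row_triggers = []   # list of (condition_fn, action_fn)

def execute(query, db):
    if query["op"] == "update":
        for row in db:
            if query["where"](row):
                query["set"](row)
                for cond, action in row_triggers:
                    if cond(row):
                        action(row)  # fire the action, don't block
    return db

db = [{"name": "Mike", "salary": 100, "awards": 0}]

# Rewrite rule: "add an award for Mike" becomes "raise Mike's salary 10%".
rewrite_rules.append((
    lambda q: q["op"] == "update" and q.get("tag") == "add_award_mike",
    lambda q: {"op": "update", "where": lambda r: r["name"] == "Mike",
               "set": lambda r: r.update(salary=round(r["salary"] * 1.1))},
))

# Row trigger: flag Mike's row whenever his salary exceeds 105.
row_triggers.append((
    lambda r: r["name"] == "Mike" and r["salary"] > 105,
    lambda r: r.update(flagged=True),
))

submit({"op": "update", "tag": "add_award_mike",
        "where": lambda r: False, "set": lambda r: None}, db)
print(db[0]["salary"], db[0].get("flagged"))
```

Even in this toy, the hazard the text describes is visible: the rewrite and the trigger interact (the rewritten update fires the trigger), and with more rules such interactions compound quickly.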

The row-level rules code in PRS2 was, sadly, hairy. A little digging in the Berkeley Postgres archives turned up the following comment, probably from Spyros Potamianos, in the source code of Postgres version 3.1, circa 1991:

* Take a deep breath & read. If you can avoid hacking the code below
* (i.e. if you have not been "volunteered" by the boss to do this dirty
* job) avoid it at all costs. Try to do something less dangerous for
* your (mental) health. Go home and watch horror movies on TV. Read
* some Lovecraft. Join the army. Go and spend a few nights in Chernobyl.
* Commit suicide...
* Hm, you keep reading, eh? Oh, well, then you deserve what you get.
* Welcome to the gloomy labyrinth of the tuple-level rule system, my
* poor hacker...

In the end, neither the query-rewriting implementation nor the row-level locking implementation of rules was declared the "winner" in Postgres; both were kept in the released system. Eventually all of the rules code was scrapped and rewritten in PostgreSQL, but the current source still retains both the notion of statement-level and row-level triggers.

The Postgres rules systems were quite influential in their day, running neck and neck with research from the IBM Starburst and MCC HiPAC projects. Today, triggers are part of the SQL standard and are implemented in the engines of most major databases. They are used with some care, however. One problem is that the systems mentioned above never overcame the downsides that led to the "AI winter": interactions within a pile of rules can become unmanageably confusing even as the rule set grows only modestly. In addition, trigger execution tends to be relatively slow in practice, so performance-sensitive deployments tend to avoid heavy use of triggers. Still, there has been something of a cottage industry in related areas.

2.3. Log-based storage and recovery

Stonebraker described his Postgres storage design in this way:
When considering the storage system for Postgres, we were guided by a missionary zeal to do something different. All current commercial systems use a storage manager with write-ahead logging (WAL), and we felt that this technology was well understood. Moreover, the original Ingres prototype of the 1970s used a similar storage manager, and we had no desire to do another implementation. [SK91]
Although this sounds like pure intellectual restlessness, there were also technical grounds for the work. Over the years, Stonebraker repeatedly expressed distrust of the complex write-ahead logging schemes, pioneered at IBM and Tandem, for database recovery. One of his core objections was a software engineer's intuition: nobody should rely on anything that complex, particularly for functionality exercised only in rare, critical scenarios after a failure.

The Postgres storage engine combined the notions of primary storage and historical logging in a single, simple disk representation. At its core, the idea was to keep every record in the database as a linked list of versions stamped with transaction IDs: in a sense "the log as data", or "the data as log", depending on your point of view. The only additional metadata needed is a list of committed transaction IDs and their commit times. This greatly simplifies recovery, since there is no "translation" from a log representation back to a primary representation. It also enables time-travel queries: you can run a query "as of" some point in the past and access the record versions committed at that time [Sto87]. The eventual implementation in Postgres was somewhat simpler than the original design.
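The "log as data" idea can be sketched in a few lines (a toy of my own; field names are illustrative, not Postgres's): every record is a chain of versions stamped with transaction IDs, commit times are the only extra metadata, and a time-travel read simply picks the version visible at a given time.

```python
commit_time = {}  # xid -> commit timestamp: the only extra metadata needed
versions = {}     # key -> list of (xid, value) versions, oldest first

def put(key, value, xid):
    # Never overwrite: just append another version to the chain.
    versions.setdefault(key, []).append((xid, value))

def commit(xid, ts):
    commit_time[xid] = ts

def get_as_of(key, ts):
    """Return the newest version written by a transaction committed by ts."""
    visible = [(commit_time[x], v) for x, v in versions.get(key, ())
               if x in commit_time and commit_time[x] <= ts]
    return max(visible)[1] if visible else None

put("mike.salary", 100, xid=1); commit(1, ts=10)
put("mike.salary", 110, xid=2); commit(2, ts=20)
put("mike.salary", 999, xid=3)      # uncommitted: never visible

print(get_as_of("mike.salary", 15))  # the past: version as of ts=15
print(get_as_of("mike.salary", 25))  # the present committed state
```

Crash recovery in this model is trivially simple, as the text notes: uncommitted versions (like `xid=3` above) are invisible by construction, so nothing needs to be undone or replayed; the cost is background reclamation of dead versions.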

Stonebraker's vision of "radical simplicity" for transactional storage ran deeply against the grain of the field at a time when database vendors differentiated themselves by investing heavily in high-performance transaction processing. The benchmark winners of the day achieved both high performance and recoverability through finely optimized write-ahead logging. And once vendors had well-tuned write-ahead logs, they began innovating on next-step features such as transactional replication based on log shipping, which was difficult to achieve in the Postgres scheme. In the end, the Postgres storage system never distinguished itself on performance.

Unfortunately, PostgreSQL is still not particularly fast at transaction processing: its use of write-ahead logging is something of a halfway house. Oddly, the PostgreSQL team kept much of the per-tuple bookkeeping of Postgres in order to support multi-versioning, which was never a goal of the Berkeley Postgres project. The result is a storage system that can emulate Oracle-style snapshot isolation, at the price of considerable extra I/O overhead, but that realizes neither Stonebraker's original vision of time-travel queries nor his vision of simple recovery.

Mike Olson notes that his original intent was to replace the Postgres B-tree implementation with his own B-tree code from the Berkeley DB project, which was developed at Berkeley during the Postgres years; he simply never found the time. When Berkeley DB acquired transaction support years later at Sleepycat Corp., Olson tried to convince the (by then) PostgreSQL community to adopt it for recovery in place of the no-overwrite storage. They declined: there was a programmer on the project who desperately wanted to build a multi-version (MVCC) system, and since he was willing to do the work, he won the argument.

Slow storage is not intrinsic to PostgreSQL. Greenplum, a PostgreSQL fork, integrated a high-performance compressed storage engine as an interesting alternative. It was designed by Matt McCline, a veteran of Jim Gray's team at Tandem. It, too, did not support time-travel queries.

But the possibility of time-travel queries was interesting, and it remains distinctive. Moreover, Stonebraker's credo of designing simple recovery software echoes today both in NoSQL systems (which opt for replication rather than WAL) and in main-memory databases (MMDBs), which often use multi-versioning and compressed logs. Versioned relational databases and time-travel queries are still considered somewhat esoteric, surfacing in occasional research prototypes and small open-source projects. It is an idea that is ripe for a comeback in our era of cheap storage and continuous data streams.

2.4. Queries over new deep storage technologies

Midway through the Postgres project, Stonebraker signed on as a co-principal investigator on a large digital earth-science research grant called Project Sequoia. Part of the grant proposal involved handling unprecedented volumes of digital satellite imagery, requiring up to 100 terabytes of storage: far more data than could reasonably be kept on magnetic disk at the time. The core of the proposed solution was to explore the idea of a DBMS (namely, Postgres) facilitating access to near-line "tertiary" storage, provided by robotic jukeboxes for managing libraries of optical disks or tapes.

This led to several distinct efforts. One was the Inversion file system, an attempt to provide a UNIX file system abstraction on top of a relational DBMS. In a survey paper for Sequoia, Stonebraker described Inversion, in typical fashion, as "a simple exercise" [Sto95]. In reality, Stonebraker's student Mike Olson (later a founder of Cloudera) worked on it for several years; the final result was mixed [Ols93] and did not survive in practice.

Some years after Inversion, Bill Gates tilted at the same windmill with WinFS: an attempt to rebuild the world's most widely used file system on top of a relational database backend. WinFS shipped in developer builds of Windows but never reached the market. Gates later called it his greatest disappointment at Microsoft.

The other main research focus on this front was incorporating tertiary storage into a more conventional relational database stack, which was the subject of Sunita Sarawagi's PhD thesis. The overarching theme was scale: reasoning about space (data in a hierarchy of storage and memory) and time (coordinating query scheduling and caching to minimize wasted I/O). One key problem in this work was storing and retrieving large multidimensional arrays in tertiary storage, which overlaps with work on multidimensional indexing. The main ideas included partitioning an array into chunks, co-locating chunks that tend to be fetched together, and replicating chunks so that a given piece of data can have several physical "neighbors". A second problem was treating the disk as a cache for tertiary storage. Finally, query optimization and scheduling had to account for the long load times from tertiary storage and the value of disk-cache hits. This affects both the plan the query optimizer chooses and the time at which that plan is scheduled for execution.
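The array-chunking idea above can be shown with a tiny sketch (mine; chunk size and array dimensions are illustrative): partitioning a 2-D array into square chunks means a rectangular read touches few chunks, whereas a row-major layout would pay one expensive tertiary-storage fetch per row.

```python
CHUNK = 4  # chunk side length (illustrative)

def chunks_for_read(r1, c1, r2, c2):
    """Chunk coordinates touched when reading rows r1..r2, cols c1..c2 (inclusive)."""
    return {(r // CHUNK, c // CHUNK)
            for r in range(r1, r2 + 1)
            for c in range(c1, c2 + 1)}

# Reading one column (column 3, rows 0..15) of a 16x16 array:
row_major_reads = 16                               # row-major: one fetch per row stripe
chunked_reads = len(chunks_for_read(0, 3, 15, 3))  # chunked: one fetch per 4x4 block

print(row_major_reads, chunked_reads)
```

With fetches from a tape or optical jukebox costing seconds to minutes each, reducing a column read from 16 fetches to 4 is the whole game; replicating chunks in multiple orientations, as the thesis work did, extends the same benefit to differently shaped reads.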

Tape robots and optical-disk jukeboxes are not widely used today. But tertiary storage concerns are very much alive in the cloud, which in 2018 has a deep storage hierarchy: from attached solid-state drives, to reliable disk-like storage services (e.g., AWS EBS), to archival storage (e.g., AWS S3), to deep storage (e.g., AWS Glacier). Today these tiers remain relatively siloed, and databases offer little support for end-to-end reasoning about storage that spans them. I would not be surprised if the questions explored on this front in Postgres were revisited soon.

2.5. Multiprocessor support: XPRS

Stonebraker never built a large parallel database system, but he drove many of the stimulating discussions in this area. His paper "The Case for Shared Nothing" [ Sto86 ] documented the coarse-grained architectural choices in the space. It popularized terminology now standard in industry and championed shared-nothing architectures such as Gamma and Teradata, which were rediscovered by the big data community in the 2000s.

Ironically, Stonebraker's most substantive contribution to parallel databases was a shared-memory architecture called XPRS, which stood for "eXtended Postgres on RAID and Sprite". In the early 1990s, XPRS was the quintessential Berkeley collaboration: it combined Stonebraker's Postgres system, John Ousterhout's Sprite distributed operating system, and the RAID storage architectures of Dave Patterson and Randy Katz. As with many multi-faculty efforts, the actual shape of XPRS was determined by the graduate students who worked on it. The primary contribution turned out to come from Wei Hong, who wrote his dissertation on parallel query optimization in XPRS. Thus the lasting practical impact of XPRS came through parallel query processing.

Of the three projects, Postgres and RAID had enormous impact. Sprite is best remembered for Mendel Rosenblum's PhD thesis on log-structured file systems (LFS), which had little to do with distributed operating systems. All three projects explored new ideas for disk storage that went beyond update-in-place. LFS and the Postgres storage manager are notably similar in rethinking the log as primary storage, with its attendant need for expensive background reorganization. I once gently probed Stonebraker about any rivalry or academic cross-pollination between LFS and Postgres, but never got anything interesting out of him. Perhaps someone at Berkeley at the time had "muddied the waters".

In principle, parallelism "explodes" the optimizer's plan space, multiplying the traditional choices made during query optimization (access methods, join algorithms, join orders) by all possible ways of parallelizing each choice. The key idea of what Stonebraker called "The Wei Hong Optimizer" was to cut the problem in two: run a traditional single-node query optimizer in the spirit of System R, and then "parallelize" the resulting plan by scheduling the degree of parallelism and placement of each operator based on the data layout and system configuration. This approach is heuristic, but it makes parallelism an additive cost on top of traditional query optimization rather than a multiplicative one.

Although the Wei Hong Optimizer was designed in the context of Postgres, it became the standard approach for many parallel query optimizers in industry.

2.6. Support for various language models

Among the interests Stonebraker returned to repeatedly since the days of Ingres was the application programming interface (API) to the database system. In his lectures in the Database Systems series he often included Carlo Zaniolo's GEM language as a topic important for database system builders to understand. This interest in language surely contributed to his partnership with Larry Rowe on Postgres, which in turn deeply influenced the design of the Postgres data model and its object-relational approach. Their work focused mainly on data-intensive applications from the commercial sphere, including both business processing and emerging applications such as CAD/CAM and GIS.

One of the challenges pressed upon Stonebraker at the time was the idea of "hiding" the boundary between programming language constructs and database storage. The competing research projects and companies pursuing object-oriented databases (OODBs) targeted the so-called "impedance mismatch" between imperative object-oriented languages such as Smalltalk, C++, and Java and the declarative relational model. The OODB idea was to let programming language objects optionally be marked "persistent" and handled automatically by an embedded DBMS. Postgres supported storing nested objects and abstract data types, but its interface, based on declarative relational-style queries, made database access unnatural for programmers (they had to issue declarative queries) and expensive (queries required parsing and optimization). To compete with the OODB vendors, Postgres offered a so-called "Fast Path" interface: essentially a C/C++ API to the internals of the database storage engine. This let Postgres achieve middling performance on academic OODB benchmarks, but it never solved the problem of letting programmers in multiple languages avoid the impedance mismatch. Instead, Stonebraker put an "object-relational" label on the Postgres model and simply dismissed the OODB use case as a zero-billion-dollar market.

That proved a sensible decision. Today none of the OODB products survive in their intended form, and the idea of "persistent objects" in programming languages has been largely abandoned. By contrast, object-relational mapping (ORM) layers, exemplified by early efforts such as Java's Hibernate and Ruby on Rails, are in widespread use; they let declarative databases fit relatively smoothly into almost any imperative object-oriented language, as a library. This application-level approach differs from both OODBs and Stonebraker's object-relational databases. In addition, lightweight key-value stores have also succeeded, in both non-transactional and transactional forms.
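The ORM approach that won out can be reduced to a few dozen lines: the object-to-relation mapping lives in an application-level library, not in the database engine. This is my own bare-bones illustration (real ORMs such as Hibernate or SQLAlchemy are far richer); the `User` class and its mapping are hypothetical.

```python
# Minimal ORM sketch: a class declares its table and columns, and a tiny
# library translates attribute access into SQL against an ordinary RDBMS.
import sqlite3

class User:
    table = "users"
    fields = ("id", "name")

    def __init__(self, id, name):
        self.id, self.name = id, name

def save(conn, obj):
    """Map an object's attributes to an INSERT over its declared columns."""
    cols = ", ".join(obj.fields)
    marks = ", ".join("?" for _ in obj.fields)
    conn.execute(f"INSERT INTO {obj.table} ({cols}) VALUES ({marks})",
                 tuple(getattr(obj, f) for f in obj.fields))

def get(conn, cls, id):
    """Map a primary-key lookup back into an object of the declared class."""
    row = conn.execute(
        f"SELECT {', '.join(cls.fields)} FROM {cls.table} WHERE id = ?",
        (id,)).fetchone()
    return cls(*row) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
save(conn, User(1, "ada"))
print(get(conn, User, 1).name)
```

Note that the database underneath remains a plain relational store with declarative queries, which is exactly the division of labor the OODBs tried to erase.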

3. Impact on software

3.1. Open source

Postgres was always an open source project with steady releases, but for a long time it was aimed at research use rather than production.

As the Postgres research project wound down, two of Stonebraker's students, Andrew Yu and Jolly Chen, modified the system's parser to replace the original PostQuel language with an extensible version of SQL. The first Postgres release to support SQL was Postgres95; the next was renamed PostgreSQL.

A team of open source developers took an interest in PostgreSQL and "adopted" it even as the rest of the Berkeley team moved on to other things. The core group of PostgreSQL developers has stayed relatively stable over time, and the open source project has matured enormously. Early efforts focused on code stability and user-facing functionality, but over time the open source community substantially changed and improved the core of the system, from the optimizer to the access methods and much of the underlying transaction and storage machinery. Since the mid-1990s, very little of the internals of PostgreSQL has come from the academic group at Berkeley. Its last contribution was probably my GiST implementation in the second half of the 1990s, and even that was substantially rewritten and cleaned up by volunteers from the open source community (in this case, from Russia). The part of the open source community that works on PostgreSQL deserves enormous credit for running a disciplined process that has sustained a high-quality, long-lived project for decades.

Although much has changed in 25 years, the basic architecture of PostgreSQL remains very similar to that of the university's Postgres releases of the early 1990s, and developers familiar with today's PostgreSQL source code would have little trouble reading the Postgres 3.1 source (1991). Everything from the source directory structure to the process structure to the data structures remains remarkably similar. The code from the Berkeley Postgres team had good bones.

Today PostgreSQL is without question the most feature-rich open source database, supporting functionality often missing from commercial products. It is also (according to one influential ranking site) the most widely used independent open source database in the world, and its influence keeps growing: in 2017 and 2018 it was the fastest-growing database in popularity worldwide [ DE19c ]. PostgreSQL is used across a wide range of industries and applications, which is perhaps not surprising given its emphasis on breadth of functionality.

According to DB-Engines, PostgreSQL is today the fourth most popular DBMS in the world, behind Oracle, MySQL, and MS SQL Server, all three of which are backed by specific companies (MySQL was acquired by Oracle years ago) [ DE19a ]. The scoring rules are discussed in the DB-Engines ranking methodology [ DE19b ].

Heroku is a cloud platform-as-a-service provider, now part of Salesforce. In 2010 Heroku adopted Postgres as the default database for its platform, choosing it for its reliability. With Heroku's support, major application development frameworks such as Ruby on Rails and Python's Django began recommending Postgres as their default database.

Today PostgreSQL has an extension infrastructure that makes it easy to add functionality to the system via user-defined functions and related hooks. There is now an ecosystem of PostgreSQL extensions, akin in concept to the Illustra DataBlades but in open source. Among the more interesting extensions are the Apache MADlib library for machine learning in SQL and the Citus library for parallel query execution.
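For readers unfamiliar with the mechanism: once an extension's packaged files are installed on the server host, loading it into a database is a single SQL statement. A minimal sketch (the extensions shown must already be installed on the server for these statements to succeed):

```sql
-- Load pre-installed extension packages into the current database.
CREATE EXTENSION IF NOT EXISTS postgis;  -- geospatial types and indexes
CREATE EXTENSION IF NOT EXISTS citus;    -- distributed/parallel execution

-- In psql, \dx lists the extensions loaded in the current database.
```

The extension's control file and SQL scripts register its new types, functions, operators, and access methods with the catalog, which is precisely the extensibility hook the Berkeley design anticipated.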

One of the most interesting open source applications built on Postgres is the PostGIS geographic information system, which exercises many of the Postgres features that originally inspired Stonebraker to start the project.

3.2. Commercial introduction

PostgreSQL has long been an attractive starting point for building commercial database systems, given its permissive open source license, robust code, flexibility, and breadth of functionality. Summing the acquisition prices listed below, Postgres has generated over $2.6 billion in acquisition value.

Note that this is a measure in dollars of actual financial transactions, and is far more meaningful than the figures commonly quoted in high tech. Numbers in the billions are often used to describe the assessed valuation of equity, but such valuations are frequently inflated by a factor of 10 or more over present value, in anticipation of future worth. The dollars paid to acquire a company measure its actual market value at the time of acquisition. So it is fair to say that Postgres has created over $2.6 billion in real commercial value.

Many of the commercial efforts around PostgreSQL have focused on what is arguably its key limitation: the ability to scale out to a shared-nothing parallel architecture.

Parallelizing PostgreSQL takes a fair amount of work, but it is eminently doable for a small, experienced team. Today, commercial forks of open source PostgreSQL such as Greenplum and CitusDB provide this capability. It is a shame that PostgreSQL was not properly parallelized in open source much earlier. Had PostgreSQL been extended in open source to a shared-nothing architecture in the early 2000s, the open source big data movement might well have evolved quite differently, and more effectively.

  1. Illustra was Stonebraker's second major startup, founded in 1992 to commercialize Postgres, much as RTI had commercialized Ingres.

    Illustra was actually the third name proposed for the company. Continuing the painters theme set by the name Ingres, Illustra was originally called Miro. Because of trademark issues the name was changed to Montage, which also ran into trademark problems.

    The founding team included some of the core of the Postgres group, including recent PhD graduate Wei Hong and then-chief programmer Jeff Meredith, as well as Ingres alumni Paula Hawthorn and Michael Ubell. Postgres student Mike Olson joined shortly after the founding, and I worked at Illustra on optimizing expensive functions as part of my own PhD work. There were three main efforts at Illustra: extending SQL92 to support user-defined types and functions in the style of PostQuel, hardening the Postgres codebase for commercial use, and seeding the market for extensible database servers with example DataBlades, specialized plug-in packages of data types and functions. Illustra was acquired by Informix in 1996 [ Mon96 ], and its DataBlade architecture was integrated with the more mature Informix query-processing engine as Informix Universal Server.
  2. Netezza was a startup founded in 1999 that forked the PostgreSQL codebase to build a high-performance parallel query-processing engine on custom FPGA-based hardware. Netezza was quite successful as an independent company, with an IPO in 2007. It was eventually acquired by IBM for $1.7 billion [ IBM10 ].
  3. Greenplum was the first effort to offer a shared-nothing, horizontally scalable parallel version of PostgreSQL. Founded in 2003, Greenplum forked the public PostgreSQL distribution but largely preserved the PostgreSQL APIs, including the API for user-defined functions. Beyond parallelization, Greenplum extended PostgreSQL with an alternative high-performance compressed column store and a parallelism-aware query optimizer called Orca. Greenplum was acquired by EMC in 2010 for $300 million [ Mal10 ], and in 2012 EMC folded Greenplum into its subsidiary, Pivotal. In 2015, Pivotal released Greenplum and Orca back into open source. One effort that built on Greenplum's Postgres APIs was the MADlib library for machine learning in SQL [ HRS + 12 ]; MADlib lives on today as an Apache project. Another interesting open source project based on Greenplum is Apache HAWQ, developed at Pivotal, which runs the "top half" of Greenplum (the parallel query processor and the PostgreSQL extensibility APIs) over big data storage such as the Hadoop file system.
  4. EnterpriseDB was founded in 2004 as an open source software business selling PostgreSQL in both a base and an enhanced edition, with related services for enterprise customers. A key feature of the enhanced EnterpriseDB Advanced Server is compatibility with Oracle, easing the migration of applications off Oracle.
  5. Aster Data was founded in 2005 by two Stanford students to build a parallel analytics engine. Its core single-node engine was based on PostgreSQL. Aster focused on graph queries and on analytics packages based on user-defined functions that could be programmed through either SQL or MapReduce interfaces. Aster Data was acquired by Teradata in 2011 for $263 million [ Sho11 ]. Although Teradata never integrated Aster into its core parallel database engine, it still supports Aster as a standalone product for use cases outside Teradata's traditional data warehouse market.
  6. ParAccel was founded in 2006, selling a shared-nothing parallel version of PostgreSQL with column storage. ParAccel extended the Postgres optimizer with new heuristics for queries with many joins. In 2011 Amazon invested in ParAccel, and in 2012 announced AWS Redshift, a hosted data-warehouse-as-a-service in the public cloud based on ParAccel technology. In 2013 ParAccel was acquired by Actian (which had also acquired Ingres) for an undisclosed sum, suggesting the price was not material to Actian. Meanwhile, AWS Redshift has been a runaway success for Amazon: for years it was reported as Amazon's fastest-growing service, and many believe it is poised to drive long-standing data warehouse products such as Teradata and Oracle Exadata out of business. In this sense, Postgres may achieve its ultimate dominance in the cloud.
  7. CitusDB (CitusDB is the name of the DBMS; the company is called Citus Data. - Translator's note.) was founded in 2010 to offer a shared-nothing parallel implementation of PostgreSQL. Although it began as a fork of PostgreSQL, since 2016 CitusDB has been implemented via PostgreSQL's public extension APIs and can be installed into a stock PostgreSQL installation. Also since 2016, the CitusDB extensions have been available in open source.

4. Lessons

There is much to learn from the success of Postgres, some of it challenging conventional wisdom.

The highest-order lesson I draw is that Postgres defied Fred Brooks' "second system effect" [ Bro75 ]. Brooks argued that designers, after a successful first system, tend to build a second one that fails under the weight of too many features and ideas. Postgres was Stonebraker's second system, and it was certainly packed with features and ideas. Yet it succeeded in prototyping many of those ideas, while delivering a software infrastructure that carried several of them to successful conclusions. This was no accident: at its core, Postgres was designed for extensibility, and that design was well considered. With extensibility as the architectural core, it becomes possible to be creative and worry less about scope: you can try many extensions and let the strongest survive. Done well, the "second system" is not doomed; it benefits from the confidence, pet ideas, and ambitions built up during the first. This is an early architectural lesson from the more "server-oriented" school of database software engineering, one that challenges the established wisdom of the more "component-oriented" school of operating systems development.

Another lesson is that an emphasis on generality, "one size fits all", can be a winning approach for both research and practice. Yet the MIT-era Stonebraker (in 2001 Stonebraker took a position as professor of computer science at the Massachusetts Institute of Technology (MIT). - Translator's note.) made waves in the database world of the early 2000s with the thesis that "one size does not fit all". Under that banner he launched a flotilla of significant projects and startups, but none has had the reach of Postgres. It seems the Berkeley-era Stonebraker challenges the later experience of the MIT-era Stonebraker, and I see no problem with that.

As Ralph Waldo Emerson put it, "a foolish consistency is the hobgoblin of little minds."

Of course there is wisdom in the "one size does not fit all" motto (one can always find modest markets for custom solutions), but the success of the system that is Berkeley Stonebraker's calling card, far beyond its original goals, shows that the overwhelming majority of database problems are well solved by a good general-purpose architecture. Moreover, designing that architecture is itself an engineering challenge and achievement. In the end, as in most scientific and engineering debates, there is more than one good way to do things. Both Stonebrakers have much to teach us. But by temperament I remain a fan of the broader agenda the Berkeley-era Stonebraker pursued.

The last lesson I take from Postgres is the unexpected potential that can flow from releasing your research as open source. In his Turing lecture, Stonebraker speaks of the "serendipity" of PostgreSQL thriving in open source, largely thanks to people outside Stonebraker's orbit. The quote is remarkably modest:
A team of volunteers, none of whom have anything to do with me or Berkeley, have shepherded this open source system since 1995. The Postgres system you get off the Internet is the product of that team. This is open source at its best, and I just want to mention that I had nothing to do with it; we all owe that group of people an enormous debt. [ Sto14 ]
I am sure all of us who write open source software would love such "serendipity" to come our way. But the point is not mere luck. Its source surely lies in the ambition, breadth, and vision Stonebraker brought to the project, and in the team whose construction of the Postgres prototype he oversaw. If there is a lesson here, perhaps it is this: "do something important and let it go". It seems to me (learning from Stonebraker) that you cannot skip either part of that lesson.

5. Thanks

I am grateful to my old Postgres buddies Wei Hong, Jeff Meredith, and Mike Olson for their memories and input, and to Craig Kerstiens for his insight into modern PostgreSQL.


  • [Bro75] Frederick P Brooks. The mythical man-month, 1975.
  • [Bro19] Michael L. Brodie, editor. Making Databases Work. Morgan & Claypool, 2019.
  • [DE19a] DB-Engines. DB-Engines ranking, 2019. (Last accessed January 4, 2019).
  • [DE19b] DB-Engines. Method of calculating the scores of the DB-Engines ranking, 2019. (Last accessed January 4, 2019).
  • [DE19c] DB-Engines. PostgreSQL is the DBMS of the year 2018, January 2019. (Last accessed January 4, 2019).
  • [DS08] David DeWitt and Michael Stonebraker. Mapreduce: A major step backwards. The Database Column, 1:23, 2008.
  • [Gut84] Antonin Guttman. R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Data Management, SIGMOD '84, pages 47–57, New York, NY, USA, 1984. ACM.
  • [HKM + 02] Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, and Vasilis Samoladas. Bounds for range queries. J. ACM, 49 (1): 35–55, January 2002.
  • [HNP95] Joseph M. Hellerstein, Jeffrey F. Naughton, and Avi Pfeffer. Generalized search trees for database systems. International Conference on Very Large Data Bases, VLDB '95, pages 562–573, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.
  • [HRS + 12] Joseph M Hellerstein, Christoper Re, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, et al. The MADlib analytics library: or MAD skills, the SQL. Proceedings of the VLDB Endowment, 5 (12): 1700–1711, 2012.
  • [IBM10] IBM to acquire Netezza, September 2010. (Last accessed January 22, 2018).
  • [KMH97] Marcel Kornacker, C. Mohan, and Joseph M. Hellerstein. Concurrency and generalized search trees. ACM SIGMOD International Conference on Data Management, SIGMOD '97, pages 62–72, New York, NY, USA, 1997. ACM.
  • [Mal10] Om Malik. Big Data = Big Money: EMC Buys Greenplum. In GigaOm, July 2010.
  • [Mon96] John Monroe. Informix acquires illustra for complex data management. In Federal Computer Week, January 1996.
  • [OFS83] James Ong, Dennis Fogg, and Michael Stonebraker. Implementation of data abstraction in the relational database system Ingres. ACM SIGMOD Record, 14 (1): 1–14, 1983.
  • [Ols93] Michael A. Olson. The Inversion file system, 1993.
  • [SAHR84] Michael Stonebraker, Erika Anderson, Eric Hanson, and Brad Rubenstein. Quel as a data type. In Proceedings of the 1984 ACM SIGMOD International Conference on Data Management, SIGMOD '84, pages 208–214, New York, NY, USA, 1984. ACM.
  • [Sho11] Erick Shonfeld. Big pay day for big data: Teradata buys Aster Data for $263 million. In TechCrunch, May 2011. (Last accessed January 22, 2018).
  • [SHWK76] Michael Stonebraker, Gerald Held, Eugene Wong, and Peter Kreps. The design and implementation of ingres. ACM Transactions on Database Systems (TODS), 1 (3): 189–222, 1976.
  • [SK91] Michael Stonebraker and Greg Kemnitz. The postgres next generation database management system. Commun. ACM, 34 (10): 78–92, October 1991.
  • [SR86] Michael Stonebraker and Lawrence A. Rowe. The design of postgres. ACM SIGMOD International Conference on Data Management, SIGMOD '86, pages 340–355, New York, NY, USA, 1986. ACM.
  • [SRG83] M Stonebraker, B Rubenstein, and A Guttman. Application of abstract data types. IEEE Trans, on Software Engineering, 1983.
  • [Sto86] Michael Stonebraker. The case for shared nothing. IEEE Database Eng. Bull., 9 (1): 4–9, 1986.
  • [Sto87] Michael Stonebraker. The design of the postgres storage system. International Conference on Very Large Data Bases, VLDB '87, pages 289–300, San Francisco, CA, USA, 1987. Morgan Kaufmann Publishers Inc.
  • [Sto95] Michael Stonebraker. An overview of the sequoia 2000 project. Digital Technical Journal, 7 (3): 39–49, 1995.
  • [Sto14] Michael Stonebraker. The land sharks are on the squawk box, 2014. (Last accessed January 4, 2019).
