Simple user aggregate

Original author: Josh Berkus
  • Transfer
  • Tutorial
Custom aggregates are one of those unique features of PostgreSQL that no one can understand. However, as soon as you create at least one truly working example, the clouds will open up and you will be surprised how you used to live without such a seasoned functionality. So let's create such a simple unit. It will return a state, that is, the most common value, for a logical (Boolean) field.

Who and why might want a weird one? Well, imagine that you monitor several of your web servers day and night, and want to have an hourly uptime picture. Suppose every 30 seconds the server status is entered in the table: TRUE - the server is running, FALSE- the server is down. Then if most of the time the server was working, we want to return TRUE , and if for the most part the server was standing, then FALSE , respectively . And if the monitoring system itself is lying and, therefore, there is no data, we will return NULL .

Of course, all this can be done using many different other mechanisms, for example, using the WINDOW mechanism . However, imagine that in one request you will need to work with other accumulated statistics, for example, downtime or server operation. In this case, PostgreSQL offers you an elegant mechanism.

First, we need a statistical function that will accumulate data about the Boolean field. Typically, such a function has two input parameters:
  • the parameter in which the calculated values ​​will be stored (after all, this function will be called for each row);
  • the type parameter of the column in which the value for the current row will be.


Suppose we want to store the number of UP and DOWN readings for the server. You can use an integer array for this. With the same success, this can be done with the composite type, approx. translator . Such a function is easy to write even in pure SQL:

CREATEORREPLACEfunction mode_bool_state(int[], boolean) RETURNSint[]
LANGUAGEsqlas $body$
SELECTCASE $2WHENTRUETHENarray[ $1[1] + 1, $1[2] ]
WHENFALSETHENarray[ $1[1], $1[2] + 1 ]
ELSE
    $1END;
$body$;


Note that the result of the int [] function will be served as the first parameter to the input of the same function when called for the next line, approx. translator .

To make a decision and output the final result, we create another function:
CREATEORREPLACEFUNCTION mode_bool_final(INT[]) RETURNSbooleanLANGUAGEsqlas $body$
SELECTCASEWHEN ( $1[1] = 0AND $1[2] = 0 )
THENNULLELSE $1[1] >= $1[2]
END;
$body$;


The point is small - declare the unit:
CREATEAGGREGATEmode(boolean) (
    SFUNC = mode_bool_state,
    STYPE = INT[],
    FINALFUNC = mode_bool_final,
    INITCOND = '{0,0}'
);


Here SFUNC and FINALFUNC are the names of our functions, STYPE is the data type for collecting statistics, INITCOND is the initial conditions.

Let's see how it works!

SELECT server_name,
    sum(CASEWHEN server_up THEN0.5ELSE0END) as minutes_up,
    mode(server_up) asmodeFROM servers 
WHERE montime BETWEEN'2013-04-01'and'2013-04-01 01:00:00';


server_name minutes_up mode
web1 56.5 TRUE
web2 0.0 FALSE
web3 48.0 TRUE
web4 11.5 FALSE


PS An article in English by Tom Brown also tells how to create a custom aggregate. In it, the author does not use the final optional FINALFUNC function , since the type for collecting STYPE data in his example is the same as the base type of the aggregate.

Also popular now: