Mysql PARTITION BY YEAR (date) / MONTH (date) / DAYOFWEEK (date)
Often I have to deal with tables that contain rarely or even never updated data. A good example of such data are various logs. Some tables are regularly cleaned of obsolete data, and some have to keep records “forever”. Therefore, such tables “swell” and working with them becomes a difficult operation for the entire system.
To reduce the load on the disk and the FS, partitioning was invented, for the simplest, partitioning. The file with the table data is cut according to some condition into several not large files - partitions. For the case with logs, it is reasonable to partition the tables by the field containing the dates of the event. It is often wise to cut a partition table by year by month or by day of month / week.
Something tells you that you will have to cut along the timestamp field.
Let's make a tablet:
But what if, for example, you need to spread out a table with logs by day of the month? That is, something is written in the table that is stored for a month or two, and then deleted. What if we want to cut like this:
We get:
the explanation for this is this: “TIMESTAMP is internally converted to the local sessions timezone.”
Okay:
Now:
Now, we get:
It is treated:
Once again:
All OK.
We get:
Excellent!
This type of “slicing” is suitable if you need to decompose the archive data into files “per year” or by month.
But what if, for example, you need to sort out a table with logs by the day of the month. that is, something is written in the table that is stored for a month or two and then it is rubbed.
That is, what to do if we want to cut like this:
Or so:
A timestamp field is not suitable.
Googling says that it is necessary to use datetime and a point.
ok, create a table:
Note:
The fact is that CURRENT_TIMESTAMP does not roll as a default value for a field of type datetime, NOW () cannot be specified as a default value because function.
But it is necessary that date_added is set automatically.
There are two ways out:
1. Or, in all queries in INSERT, add NOW ().
2. Either hang a trigger that will be date_added = NOW () at each insert;
When INSERT codes are done in many places and everywhere corrected for INSERT ... NOW () it will be impossible to use a trigger.
Sort of:
Now we have a table with the necessary types, keys and trigger.
And we can easily cut the table by month:
Or even the days of the week:
Or even 2 days on partition:
In general, now everything is in your hands.
To reduce the load on the disk and the FS, partitioning was invented, for the simplest, partitioning. The file with the table data is cut according to some condition into several not large files - partitions. For the case with logs, it is reasonable to partition the tables by the field containing the dates of the event. It is often wise to cut a partition table by year by month or by day of month / week.
Something tells you that you will have to cut along the timestamp field.
Let's make a tablet:
CREATE TABLE `foo` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`name` varchar(30) DEFAULT NULL,
`email` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`)
);
But what if, for example, you need to spread out a table with logs by day of the month? That is, something is written in the table that is stored for a month or two, and then deleted. What if we want to cut like this:
ALTER TABLE foo PARTITION BY RANGE (YEAR(date_added))
(
PARTITION p2011 VALUES LESS THAN (2012) ,
PARTITION p2012 VALUES LESS THAN (2013) ,
PARTITION p2013 VALUES LESS THAN (2014)
);
We get:
ERROR 1486 (HY000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed
the explanation for this is this: “TIMESTAMP is internally converted to the local sessions timezone.”
Okay:
SELECT UNIX_TIMESTAMP('2012-01-01 00:00:00');
+---------------------------------------+
| UNIX_TIMESTAMP('2012-01-01 00:00:00') |
+---------------------------------------+
| 1325361600 |
+---------------------------------------+
SELECT UNIX_TIMESTAMP('2013-01-01 00:00:00');
+---------------------------------------+
| UNIX_TIMESTAMP('2013-01-01 00:00:00') |
+---------------------------------------+
| 1356984000 |
+---------------------------------------+
SELECT UNIX_TIMESTAMP('2014-01-01 00:00:00');
+---------------------------------------+
| UNIX_TIMESTAMP('2014-01-01 00:00:00') |
+---------------------------------------+
| 1388520000 |
+---------------------------------------+
Now:
ALTER TABLE foo PARTITION BY RANGE (UNIX_TIMESTAMP(date_added))
(
PARTITION p2011 VALUES LESS THAN (1325361600) ,
PARTITION p2012 VALUES LESS THAN (1356984000) ,
PARTITION p2013 VALUES LESS THAN (1388520000) ,
PARTITION pMAXVALUE VALUES LESS THAN (MAXVALUE)
);
Now, we get:
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
It is treated:
ALTER table foo DROP PRIMARY KEY, add PRIMARY KEY (`id`,`date_added`);
Once again:
ALTER TABLE foo PARTITION BY RANGE (UNIX_TIMESTAMP(date_added))
(
PARTITION p2011 VALUES LESS THAN (1325361600) ,
PARTITION p2012 VALUES LESS THAN (1356984000) ,
PARTITION p2013 VALUES LESS THAN (1388520000) ,
PARTITION pMAXVALUE VALUES LESS THAN (MAXVALUE)
);
All OK.
We get:
CREATE TABLE `foo` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`name` varchar(30) DEFAULT NULL,
`email` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`,`date_added`)
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(date_added))
(PARTITION p2011 VALUES LESS THAN (1325361600) ENGINE = InnoDB,
PARTITION p2012 VALUES LESS THAN (1356984000) ENGINE = InnoDB,
PARTITION p2013 VALUES LESS THAN (1388520000) ENGINE = InnoDB,
PARTITION pMAXVALUE VALUES LESS THAN MAXVALUE ENGINE = InnoDB
);
Excellent!
This type of “slicing” is suitable if you need to decompose the archive data into files “per year” or by month.
But what if, for example, you need to sort out a table with logs by the day of the month. that is, something is written in the table that is stored for a month or two and then it is rubbed.
That is, what to do if we want to cut like this:
PARTITION BY RANGE (MONTH(date))
Or so:
PARTITION BY RANGE (DAY(date_add))
A timestamp field is not suitable.
Googling says that it is necessary to use datetime and a point.
ok, create a table:
CREATE TABLE `foo` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date_added` datetime DEFAULT NULL,
`name` varchar(30) DEFAULT NULL,
`email` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`,`date_added`)
) ENGINE=InnoDB;
Note:
`date_added` datetime DEFAULT NULL
The fact is that CURRENT_TIMESTAMP does not roll as a default value for a field of type datetime, NOW () cannot be specified as a default value because function.
But it is necessary that date_added is set automatically.
There are two ways out:
1. Or, in all queries in INSERT, add NOW ().
2. Either hang a trigger that will be date_added = NOW () at each insert;
When INSERT codes are done in many places and everywhere corrected for INSERT ... NOW () it will be impossible to use a trigger.
Sort of:
DELIMITER $$
USE `test_db`$$
CREATE
TRIGGER `foo_add` BEFORE INSERT ON `foo`
FOR EACH ROW BEGIN
SET NEW.date_added = IFNULL(NEW.date_added, NOW());
END;
$$
Now we have a table with the necessary types, keys and trigger.
And we can easily cut the table by month:
ALTER TABLE foo PARTITION BY RANGE (MONTH(date_added))
(
PARTITION p01 VALUES LESS THAN (02) ,
PARTITION p02 VALUES LESS THAN (03) ,
PARTITION p03 VALUES LESS THAN (04) ,
PARTITION p04 VALUES LESS THAN (05) ,
PARTITION p05 VALUES LESS THAN (06) ,
PARTITION p06 VALUES LESS THAN (07) ,
PARTITION p07 VALUES LESS THAN (08) ,
PARTITION p08 VALUES LESS THAN (09) ,
PARTITION p09 VALUES LESS THAN (10) ,
PARTITION p10 VALUES LESS THAN (11) ,
PARTITION p11 VALUES LESS THAN (12) ,
PARTITION p12 VALUES LESS THAN (13) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
Or even the days of the week:
ALTER TABLE foo PARTITION BY RANGE (DAYOFWEEK(date_added))
(
PARTITION p01 VALUES LESS THAN (2) ,
PARTITION p02 VALUES LESS THAN (3) ,
PARTITION p03 VALUES LESS THAN (4) ,
PARTITION p04 VALUES LESS THAN (5) ,
PARTITION p05 VALUES LESS THAN (6) ,
PARTITION p06 VALUES LESS THAN (7) ,
PARTITION p07 VALUES LESS THAN (8) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
Or even 2 days on partition:
ALTER TABLE foo PARTITION BY LIST (DAY(date_added))
(
PARTITION p00 VALUES IN (0,1) ,
PARTITION p02 VALUES IN (2,3) ,
PARTITION p04 VALUES IN (4,5) ,
PARTITION p06 VALUES IN (6,7) ,
PARTITION p08 VALUES IN (8,9) ,
PARTITION p10 VALUES IN (10,11),
PARTITION p12 VALUES IN (12,13),
PARTITION p14 VALUES IN (14,15),
PARTITION p16 VALUES IN (16,17),
PARTITION p18 VALUES IN (18,19),
PARTITION p20 VALUES IN (20,21),
PARTITION p22 VALUES IN (22,23),
PARTITION p24 VALUES IN (24,25),
PARTITION p26 VALUES IN (26,27),
PARTITION p28 VALUES IN (28,29),
PARTITION p30 VALUES IN (30,31)
);
In general, now everything is in your hands.