greabock January 16, 2014 at 00:48

Computing Intersections in Linear and Closed Namespaces

From the sandbox

Hello! And at once I apologize for the too sophisticated name, but it most fully reflects the material presented below.

I think many of you are faced with the need to calculate intersecting intervals. But the task I faced the other day was not so trivial. But first things first.

Calculating intersecting intervals in a linear namespace

If you already have an idea of intersecting intervals, then go right here .

The calculation of the intersections of time intervals (time intervals) on a straight line of time is not difficult. We can conditionally have five types of temporary intersections.
We denote one segment of time as "\ \" and the other as "/ /"

Offset along the time axis "/ \ / \"
Offset backward along the time axis "\ / \ /"
Occurrence "/ \ \ /"
Absorption "\ / / \"
Match XX

Thus, we can express each specific intersection using the signs <,> and =. And having the signs <= and> = in the arsenal, we can reduce the number of templates for calculation to four (thus, merging, “entry” and “coincidence” or “absorption” and “coincidence” or even “displacement” and “coincidence”) . In addition, either “entry” or “absorption” (but not both) can also be abolished, considering it a special case of “displacement”.
So, having a table of the form:

user	start	end
user1	2	7
user2	5	9
user3	8	eleven
user4	1	12

To select from the table all users crossing a given interval (say 4-8), we use the query:

SET @start:=4;
SET @end:=8;
SELECT * FROM `table`
WHERE
(`start` >= @start  AND `start` < @end)  /*смещение вперед*/
OR
(`end` <= @end AND `end` > @start)  /*смещение назад*/
OR
(`start` < @end AND `end` > @start) /*вхождение - на самом деле здесь не обязательно оно обработается одним из предыдущих выражений*/
OR
(`start` >= @start AND `end` <= @end)/*поглощение и совпадение*/

This query will return the first, second and third users. It's pretty simple.

Hm. But what if you want to select disjoint intervals?
In fact, everything is even simpler: in contrast to the intersection, there are only two cases of NOT intersection:

Backward offset "\ \ / /"
Forward offset "/ / \ \"

and on a continuous line of time we only need to check whether the end of one interval is less than the beginning of another.
And SQL query comes down to

SET @start:=4;
SET @end:=8;
SELECT * FROM `table`
WHERE
`start` >= @end  OR `end` <= @start  /*оба случая смещения*/

And here we recall the negation of expressions. If calculating non-intersections is much easier than intersecting, then why not just drop all non-intersections?

WHERE 
NOT ( `start` >=  @end  OR `end` <= @start  )

We open the brackets (thanks Yggaz ):

WHERE `start` < @end AND `end` > @start

Woah la! Everything is much more concise!

All this is very beautiful, and works great ... while the time line is straight.

Computing Intervals in Closed Namespaces

We figured out the calculations on the time line. So what is a “closed” namespace?

This is a namespace that, when the first-order names are exhausted, does not switch to a new order, but returns to its beginning.

Linear namespace:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...

Closed namespace:

0, 1, 2, 3, 4, 5 , 6, 7, 8, 9, 0, 1, 2, 3, 4, ...

Suppose we have a table with a schedule of work schedule (let it be a round-the-clock call center):
* minutes “00” are omitted for simplicity of expression

usrer	start	end
usrer1	0	6
usrer2	6	12
usrer3	12	18
usrer4	18	23

I work from 10 to 19 and I want to know which workers will cross my work schedule.
We make the selection given earlier in the scheme:

SET @start:=10;
SET @end:=19;
SELECT * FROM `table`
WHERE 
NOT ( `start` >=  @end  OR `end` <= @start )

Everything is fine! I received data from three workers whose interval overlaps with mine.

Ok, what if I work at night? Let's say from 20 to 6? That is, the beginning of my interval is greater than its end. A selection from the table under these conditions will return complete nonsense. That is, a crash happens when my time interval crosses the “zero” day mark. But after all, intervals intersecting the “zero” mark can also be stored in the database.
I encountered a similar problem two days ago.

The problem of sampling intervals by a linear method from a table with data of a nonlinear structure was on the face.
The first thing that occurred to me was to expand the space of the day to 48 hours, thereby getting rid of the “zero” mark. But this attempt crashed - because the intervals between 1 and 3 could not get into the sample between 22 and 27 (3). Oh! Now, if I had a twenty-four number system, I would simply remove the second-order sign and all things.

I tried to find a solution to this problem on the Internet. Information on the intersection of intervals on the "linear" time was plenty. But either this question did not have a wide discussion, or the SQL gurus held the solution “for themselves” - there was no solution anywhere on a closed system.

In general, asking the gurus on the advice forums, I got a solution: it was necessary to divide the intervals intersecting “zero” into two parts, thereby obtaining two linear intervals, and compare both of them with the intervals in the table (which also had to be divided into two parts if they cross zero). The solution worked and was quite stable, although it was cumbersome. However, the feeling that everything was “much simpler” did not leave me.
And now, having selected a couple of hours for this question, I took a notebook and a pencil ... I give what I got:

The bottom line is that day - there is a closed time line - a circle.
And time intervals are the essence of the arc of this circle.
Thus, to negate non-intersections (see the first part of the post), we can build a pair of non-intersection schemes:

In the first case, we have the usual linear scheme for negating non-intersections. In the second one of the arcs suppresses the “zero” mark. There is a third case, which can be immediately accepted as an intersection (without negation of non-intersection):

Both intervals intersect the “zero” mark, which means they intersect by definition. In addition, there are two more cases where the intervals (entered and taken from the table) are “swapped".
Having conjured a little with the base (somewhere even using the method of highly scientific poking), I managed to collect this request:

SET @start:= X ;
SET @end:= Y;
SELECT * FROM `lparty_ads`
	WHERE
	((`prime_start` < `prime_end` AND @start < @end) 
		AND  NOT (`prime_end`<= @start  OR  @end <=`prime_start` )
 	OR 
	(( (`prime_start` < `prime_end` AND @start > @end) OR (`prime_start` > `prime_end` AND @start < @end))
		AND NOT 
		( `prime_end` <= @start AND @end <= `prime_start` ))
	OR (`prime_start` > `prime_end` AND @start > @end))

And a simplified version from a comment from kirillzorin :

set @start := X;
set @end := Y;
select * from tab
where greatest(start, @start) <= least(end, @end)
      or ((end > @start or @end > start) and sign(start - end) <> sign(@start - @end))
      or (end < start and @end < @start);

The request is fully operational and, most funny, is true for any closed number systems - be it hour, day, week, month, year. And yet, this method does not use “crutches”, such as additional fields.

Most likely, I invented a bicycle. But in view of the fact that I myself did not find such information when I needed it - I assume that this method is not too widely covered. This concludes my story.

Tags:

Computing Intersections in Linear and Closed Namespaces

Calculating intersecting intervals in a linear namespace

Computing Intervals in Closed Namespaces

Also popular now: