Windows: Sleep (0.5)

As many people probably know, the argument passed to the WinAPI Sleep function is the number of milliseconds we want to sleep. So the minimum we can ask for is to sleep for 1 millisecond. But what if we want to sleep even less? If you are interested in how to do this, read on.

First, a reminder: Windows (like any other non-real-time system) does not guarantee that a thread will sleep for exactly the requested time. Starting with Vista, the logic of the OS is simple. A thread is given a certain time quantum for execution (yes, the same 20 ms that everyone heard about in the 2000 / XP days and still hears about on server OSes). Windows reschedules threads (stops some, starts others) only after this quantum expires. I.e. if the quantum in the OS is 20 ms (that was the default in XP, for example), then even if we request Sleep(1), in the worst case control returns to us after those same 20 ms. There are multimedia functions for controlling this time quantum, in particular timeBeginPeriod / timeEndPeriod.
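The rounding effect described above can be sketched with a toy model. This is a deliberate simplification and not the actual Windows scheduler: it assumes timer ticks occur at exact multiples of the tick period and that a sleeping thread wakes on the first tick at or after its deadline. The function name `modelledSleepUs` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (an assumption, not Windows' real scheduler): ticks happen at
// multiples of tickUs, and a sleeping thread wakes on the first tick at or
// after its deadline. startUs is the moment the sleep call is made.
inline std::int64_t modelledSleepUs(std::int64_t startUs,
                                    std::int64_t requestedUs,
                                    std::int64_t tickUs)
{
	const std::int64_t deadline = startUs + requestedUs;
	// Round the deadline up to the next tick boundary.
	const std::int64_t wakeTick = (deadline + tickUs - 1) / tickUs * tickUs;
	return wakeTick - startUs;
}
```

Under this model, a 1 ms request issued right on a tick with a 20 ms period comes back after the full 20 ms, while with a 1 ms period a request issued in the middle of a tick comes back after about 1.5 ms.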

Second, a brief digression on why such accuracy may be needed. Microsoft says that only multimedia applications need it. For example, you are writing a new WinAMP "with blackjack", and it is very important to send each new piece of audio data to the system on time. My need came from another area. We had an H264 stream decompressor. It was built on ffmpeg and had a synchronous interface (Frame* decompressor.Decompress(Frame* compressedFrame)). Everything was fine until we bolted on hardware decompression on the Intel graphics built into the processors. For reasons I no longer remember, we had to work with it through the DXVA2 interface rather than the native Intel Media SDK. And DXVA2 is asynchronous. So the work had to be done like this:

  • Copy data to video memory
  • Sleep, so that the frame has time to decompress
  • Ask whether decompression has finished and, if so, take the decompressed frame from video memory

The problem was in the second step. According to GPUView, frames were decompressed in 50-200 microseconds. With Sleep(1), a Core i5 can decompress at most 1000 * 4 (cores) = 4000 frames per second. At the usual 25 fps, that leaves only 40 * 4 = 160 video streams decompressed simultaneously. The goal was to reach 200. There were really two options: either redo everything to work asynchronously with the hardware decompressor, or reduce the Sleep time.
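The arithmetic above can be written down as a small sketch. It assumes, as the text does, that every frame costs exactly one sleep and that four worker threads decode in parallel; the function names are hypothetical and the real throughput also depends on copy and decode time.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on frames per second when each frame costs one sleep of
// sleepMicros microseconds and `workers` threads decode in parallel.
constexpr std::int64_t maxFramesPerSecond(std::int64_t sleepMicros, std::int64_t workers)
{
	return (1000000 / sleepMicros) * workers;
}

// Number of streams that fit into that budget at the given frame rate.
constexpr std::int64_t maxStreams(std::int64_t sleepMicros, std::int64_t workers, std::int64_t fps)
{
	return maxFramesPerSecond(sleepMicros, workers) / fps;
}

// Sleep(1) on 4 cores at 25 fps: the 160 streams from the text.
static_assert(maxStreams(1000, 4, 25) == 160, "1 ms sleep");
// A 0.5 ms sleep would leave room for the 200-stream goal.
static_assert(maxStreams(500, 4, 25) == 320, "0.5 ms sleep");
```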

First measurements


To roughly estimate the current thread time quantum, let's write a simple program:

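#include <windows.h>
#include <chrono>
#include <cstdint>
#include <iostream>

void test()
{
	std::cout << "Starting test" << std::endl;
	std::int64_t total = 0;
	for (unsigned i = 0; i < 5; ++i)
	{
		auto t1 = std::chrono::high_resolution_clock::now();
		::Sleep(1);
		auto t2 = std::chrono::high_resolution_clock::now();
		// Elapsed time of a single Sleep(1) call, in microseconds.
		auto elapsedMicrosec = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();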
		total += elapsedMicrosec;
		std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
	}
	std::cout << "Finished. average time:" << (total / 5) << std::endl;
}
int main()
{	
	test();
    return 0;
}

Here is a typical output on Win 8.1
Starting test
0: Elapsed 1977
1: Elapsed 1377
2: Elapsed 1409
3: Elapsed 1396
4: Elapsed 1432
Finished. average time: 1518

A warning right away: if you have, say, MSVS 2012, do not rely on std::chrono::high_resolution_clock there. In general, the most reliable way to measure the duration of something is through performance counters. Let's rewrite the code a bit to be sure that we measure time correctly. First, a helper class. I ran the tests on MSVS 2015, where high_resolution_clock is already implemented correctly, through performance counters. I include this step in case someone wants to repeat the tests on an older compiler.

PreciseTimer.h
#pragma once
#include <windows.h>
#include <cstdint>
class PreciseTimer
{
public:
	PreciseTimer();
	std::int64_t Microsec() const;
private:
	LARGE_INTEGER m_freq; // system timer frequency
};
inline PreciseTimer::PreciseTimer()
{
	if (!QueryPerformanceFrequency(&m_freq))
		m_freq.QuadPart = 0;
}
inline std::int64_t PreciseTimer::Microsec() const
{
	LARGE_INTEGER current;
	if (m_freq.QuadPart == 0 || !QueryPerformanceCounter(&current))
		return 0;
	// Convert the number of system ticks to microseconds.
	return current.QuadPart * 1000'000 / m_freq.QuadPart;
}
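One caveat about the conversion in Microsec(): the product of the tick count and 1,000,000 can exceed the 64-bit range once the counter grows large enough (for example, with a 10 MHz counter that happens after roughly ten days' worth of ticks). A minimal sketch of an overflow-safe variant follows; the free function `ticksToMicrosec` is a hypothetical helper, not part of the class above, and it takes plain integers so it does not depend on the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Converts a raw tick count to microseconds without overflowing the
// intermediate product: whole seconds and the sub-second remainder are
// scaled separately.
inline std::int64_t ticksToMicrosec(std::int64_t ticks, std::int64_t frequency)
{
	if (frequency <= 0)
		return 0; // same "no timer available" convention as Microsec()
	const std::int64_t wholeSeconds = ticks / frequency;
	const std::int64_t remainder = ticks % frequency;
	return wholeSeconds * 1000000 + remainder * 1000000 / frequency;
}
```

For the short intervals measured in this article the simple formula is sufficient; the split form only matters for long uptimes or very high counter frequencies.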


Modified test function
void test()
{
	PreciseTimer timer;
	std::cout << "Starting test" << std::endl;
	std::int64_t total = 0;
	for (unsigned i = 0; i < 5; ++i)
	{
		auto t1 = timer.Microsec();
		::Sleep(1);
		auto t2 = timer.Microsec();
		auto elapsedMicrosec = t2 - t1;
		total += elapsedMicrosec;
		std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
	}
	std::cout << "Finished. average time:" << (total / 5) << std::endl;
}

And here is typical output of our program on Windows Server 2008 R2
Starting test
0: Elapsed 10578
1: Elapsed 14519
2: Elapsed 14592
3: Elapsed 14625
4: Elapsed 14354
Finished. average time: 13733

Trying to solve the problem head-on


Let's rewrite our program a bit and try the obvious, std::this_thread::sleep_for(std::chrono::microseconds(500)):

void test(const std::string& description, const std::function<void()>& f)
{
	PreciseTimer timer;
	std::cout << "Starting test: " << description << std::endl;
	std::int64_t total = 0;
	for (unsigned i = 0; i < 5; ++i)
	{
		auto t1 = timer.Microsec();
		f();
		auto t2 = timer.Microsec();
		auto elapsedMicrosec = t2 - t1;
		total += elapsedMicrosec;
		std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
	}
	std::cout << "Finished. average time:" << (total / 5) << std::endl;
}
int main()
{	
	test("Sleep(1)", [] { ::Sleep(1); });
	test("sleep_for(microseconds(500))", [] { std::this_thread::sleep_for(std::chrono::microseconds(500)); });
    return 0;
}

Typical output on Windows 8.1
Starting test: Sleep (1)
0: Elapsed 1187
1: Elapsed 1315
2: Elapsed 1427
3: Elapsed 1432
4: Elapsed 1449
Finished. average time: 1362
Starting test: sleep_for (microseconds (500))
0: Elapsed 1297
1: Elapsed 1434
2: Elapsed 1280
3: Elapsed 1451
4: Elapsed 1459
Finished. average time: 1384

I.e., as we can see, there is no gain out of the box. Take a closer look at this_thread::sleep_for and you will notice that it is implemented through this_thread::sleep_until, i.e. unlike Sleep, it is not even protected against the system clock being changed, for example. Let's try to find a better alternative.

The sleep that could


Searching MSDN and Stack Overflow points us to waitable timers as the only alternative. Well, let's write another helper class.

WaitableTimer.h
#pragma once
#include <windows.h>
#include <stdexcept>
#include <string>
class WaitableTimer
{
public:
	WaitableTimer()
	{
		m_timer = ::CreateWaitableTimer(NULL, FALSE, NULL);
		if (!m_timer)
			throw std::runtime_error("Failed to create waitable time (CreateWaitableTimer), error:" + std::to_string(::GetLastError()));
	}
	~WaitableTimer()
	{
		::CloseHandle(m_timer);
		m_timer = NULL;
	}
	void SetAndWait(unsigned relativeTime100Ns)
	{
		LARGE_INTEGER dueTime = { 0 };
		// A negative due time means "relative to now", in 100-nanosecond units.
		dueTime.QuadPart = static_cast<LONGLONG>(relativeTime100Ns) * -1;
		BOOL res = ::SetWaitableTimer(m_timer, &dueTime, 0, NULL, NULL, FALSE);
		if (!res)
			throw std::runtime_error("SetAndWait: failed to set waitable timer (SetWaitableTimer), error:" + std::to_string(::GetLastError()));
		DWORD waitRes = ::WaitForSingleObject(m_timer, INFINITE);
		if (waitRes == WAIT_FAILED)
			throw std::runtime_error("SetAndWait: failed to wait for waitable timer (WaitForSingleObject), error:" + std::to_string(::GetLastError()));
	}
private:
	HANDLE m_timer;
};
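SetAndWait takes its argument in 100-nanosecond units, which is easy to get wrong by a factor of ten. As a minimal sketch, the conversion from a std::chrono duration to the negative, relative due time that SetWaitableTimer expects could look like this. The helper name `toRelativeDueTime` is hypothetical; it returns a plain 64-bit integer that would be assigned to dueTime.QuadPart.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ratio>

// Converts a positive duration to a relative due time: the number of
// 100-nanosecond intervals, negated (negative means "relative to now").
template <class Rep, class Period>
std::int64_t toRelativeDueTime(std::chrono::duration<Rep, Period> d)
{
	using HundredNs = std::chrono::duration<std::int64_t, std::ratio<1, 10000000>>;
	return -std::chrono::duration_cast<HundredNs>(d).count();
}
```

With such a helper, the 5000 passed to SetAndWait in the tests corresponds to std::chrono::microseconds(500).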


And let's add a new test to the existing ones:

int main()
{	
	test("Sleep(1)", [] { ::Sleep(1); });
	test("sleep_for(microseconds(500))", [] { std::this_thread::sleep_for(std::chrono::microseconds(500)); });
	WaitableTimer timer;
	test("WaitableTimer", [&timer]	{ timer.SetAndWait(5000); });
    return 0;
}

Let's see what has changed.

Typical output on Windows Server 2008 R2
Starting test: Sleep (1)
0: Elapsed 10413
1: Elapsed 8467
2: Elapsed 14365
3: Elapsed 14563
4: Elapsed 14389
Finished. average time: 12439
Starting test: sleep_for (microseconds (500))
0: Elapsed 11771
1: Elapsed 14247
2: Elapsed 14323
3: Elapsed 14426
4: Elapsed 14757
Finished. average time: 13904
Starting test: WaitableTimer
0: Elapsed 12654
1: Elapsed 14700
2: Elapsed 14259
3: Elapsed 14505
4: Elapsed 14493
Finished. average time: 14122

As we can see, nothing changed on the server OS out of the box, because by default the thread time quantum there is usually huge. I will not go looking for virtual machines with XP and Windows 7, but most likely the picture on XP is exactly the same, whereas on Windows 7 the default time quantum seems to be 1 ms. I.e. there the new test should give the same figures as the earlier tests on Windows 8.1.

Now let's look at the output of our program on Windows 8.1
Starting test: Sleep (1)
0: Elapsed 1699
1: Elapsed 1444
2: Elapsed 1493
3: Elapsed 1482
4: Elapsed 1403
Finished. average time: 1504
Starting test: sleep_for (microseconds (500))
0: Elapsed 1259
1: Elapsed 1088
2: Elapsed 1497
3: Elapsed 1497
4: Elapsed 1528
Finished. average time: 1373
Starting test: WaitableTimer
0: Elapsed 643
1: Elapsed 481
2: Elapsed 424
3: Elapsed 330
4: Elapsed 468
Finished. average time: 469

What do we see? Right: our new sleep made it! I.e. on Windows 8.1 we have already solved our problem. Why did this happen? Because on Windows 8.1 the time quantum is just 500 microseconds. Yes, threads run for 500 microseconds (on my system the default resolution is 500.8 microseconds and cannot be set lower, unlike XP / Win7, where exactly 500 microseconds could be set), then they are rescheduled according to their priorities and started for the next run.

Conclusion 1: For Sleep(0.5), the right sleep primitive is necessary but not sufficient. Always use waitable timers for this.

Conclusion 2: If you write only for Win 8.1 / Win 10 and are guaranteed never to run on other OSes, you can stop at waitable timers.

Removing the dependence on circumstances, or how to increase the accuracy of the system timer


I already mentioned the multimedia function timeBeginPeriod. The documentation states that this function lets you set the desired timer accuracy. Let's check. Once again, we modify our program.

program v3
#include "stdafx.h"
#include "PreciseTimer.h"
#include "WaitableTimer.h"
#pragma comment (lib, "Winmm.lib")
void test(const std::string& description, const std::function<void()>& f)
{
	PreciseTimer timer;
	std::cout << "Starting test: " << description << std::endl;
	std::int64_t total = 0;
	for (unsigned i = 0; i < 5; ++i)
	{
		auto t1 = timer.Microsec();
		f();
		auto t2 = timer.Microsec();
		auto elapsedMicrosec = t2 - t1;
		total += elapsedMicrosec;
		std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
	}
	std::cout << "Finished. average time:" << (total / 5) << std::endl;
}
void runTestPack()
{
	test("Sleep(1)", [] { ::Sleep(1); });
	test("sleep_for(microseconds(500))", [] { std::this_thread::sleep_for(std::chrono::microseconds(500)); });
	WaitableTimer timer;
	test("WaitableTimer", [&timer] { timer.SetAndWait(5000); });
}
int main()
{
	runTestPack();
	std::cout << "Timer resolution is set to 1 ms" << std::endl;
	// Strictly speaking, we should first call timeGetDevCaps and check what it returns, but since we will
	// throw this variant away in the end, we won't bother writing correct code.
	timeBeginPeriod(1);
	::Sleep(1); // so that the previously set timers are guaranteed to have fired
	::Sleep(1); // so that the previously set timers are guaranteed to have fired
	runTestPack();
	timeEndPeriod(1);
    return 0;
}
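The program above calls timeBeginPeriod(1) without the range check that timeGetDevCaps makes possible. That function fills a TIMECAPS structure whose wPeriodMin and wPeriodMax fields give the supported range in milliseconds, and the requested period is normally clamped into it. A minimal sketch of just the clamping step, with a hypothetical `PeriodRange` struct standing in for those two fields so that the example does not depend on the Windows headers:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for TIMECAPS::wPeriodMin / wPeriodMax (milliseconds).
struct PeriodRange
{
	unsigned minMs;
	unsigned maxMs;
};

// Clamps the requested timer period into the range the system reports.
inline unsigned clampPeriod(unsigned requestedMs, PeriodRange range)
{
	return std::min(std::max(requestedMs, range.minMs), range.maxMs);
}
```

In real code the clamped value would be passed to both timeBeginPeriod and the matching timeEndPeriod call.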


As usual, here is typical output of our program.

On Windows 8.1
Starting test: Sleep (1)
0: Elapsed 2006
1: Elapsed 1398
2: Elapsed 1390
3: Elapsed 1424
4: Elapsed 1424
Finished. average time: 1528
Starting test: sleep_for (microseconds (500))
0: Elapsed 1348
1: Elapsed 1418
2: Elapsed 1459
3: Elapsed 1475
4: Elapsed 1503
Finished. average time: 1440
Starting test: WaitableTimer
0: Elapsed 200
1: Elapsed 469
2: Elapsed 442
3: Elapsed 456
4: Elapsed 462
Finished. average time: 405
Timer resolution is set to 1 ms
Starting test: Sleep (1)
0: Elapsed 1705
1: Elapsed 1412
2: Elapsed 1411
3: Elapsed 1441
4: Elapsed 1408
Finished. average time: 1475
Starting test: sleep_for (microseconds (500))
0: Elapsed 1916
1: Elapsed 1451
2: Elapsed 1415
3: Elapsed 1429
4: Elapsed 1223
Finished. average time: 1486
Starting test: WaitableTimer
0: Elapsed 602
1: Elapsed 445
2: Elapsed 994
3: Elapsed 347
4: Elapsed 345
Finished. average time: 546

And on Windows Server 2008 R2
Starting test: Sleep (1)
0: Elapsed 10306
1: Elapsed 13799
2: Elapsed 13867
3: Elapsed 13877
4: Elapsed 13869
Finished. average time: 13143
Starting test: sleep_for (microseconds (500))
0: Elapsed 10847
1: Elapsed 13986
2: Elapsed 14000
3: Elapsed 13898
4: Elapsed 13834
Finished. average time: 13313
Starting test: WaitableTimer
0: Elapsed 11454
1: Elapsed 13821
2: Elapsed 14014
3: Elapsed 13852
4: Elapsed 13837
Finished. average time: 13395
Timer resolution is set to 1 ms
Starting test: Sleep (1)
0: Elapsed 940
1: Elapsed 218
2: Elapsed 276
3: Elapsed 352
4: Elapsed 384
Finished. average time: 434
Starting test: sleep_for (microseconds (500))
0: Elapsed 797
1: Elapsed 386
2: Elapsed 371
3: Elapsed 389
4: Elapsed 371
Finished. average time: 462
Starting test: WaitableTimer
0: Elapsed 323
1: Elapsed 338
2: Elapsed 309
3: Elapsed 359
4: Elapsed 391
Finished. average time: 344

Let's analyze the interesting facts that are visible from the results:

  1. On Windows 8.1, nothing changed. We conclude that timeBeginPeriod is smart enough: if N applications request different system timer resolutions, the resolution will not be made coarser than the finest one already in effect. On Windows 7 we would not notice any changes either, since there the timer resolution is already 1 ms.

  2. On the server OS, timeBeginPeriod(1) worked in an unexpected way: it set the system timer resolution to the highest possible value. I.e. somewhere in such OSes there is evidently a hard-wired workaround of the form:

    void timeBeginPeriod(UINT uPeriod)
    {
    	if (uPeriod == 1)
    	{
    		setMaxTimerResolution();
    		return;
    	}
    	...
    }

    Note that this did not happen on Windows Server 2003 R2. It is an innovation of Server 2008.

  3. On the server OS, Sleep(1) also worked in an unexpected way. I.e. on server OSes starting with Server 2008, Sleep(1) is interpreted not as "pause for 1 millisecond" but as "make the shortest possible pause". We will see a case below where this statement does not hold.

Let's continue with our conclusions:

Conclusion 3: If you write only for Win Server 2008/2012/2016 and are guaranteed never to run on other OSes, you don't have to bother at all: timeBeginPeriod(1) followed by Sleep(1) will do everything you need.

Conclusion 4: For our purposes, timeBeginPeriod alone is good only on server OSes. But using it together with waitable timers covers our task on Win Server 2008/2012/2016 as well as on Windows 8.1 / Windows 10.

What if we want everything at once?


Let's think about what to do if we need Sleep(0.5) to work on Win XP / Win Vista / Win 7 / Win Server 2003 as well.

Only the native API can help us here: the undocumented API that is available from user space through ntdll.dll. It has two interesting functions, NtQueryTimerResolution and NtSetTimerResolution.

Let's write the function AdjustSystemTimerResolutionTo500mcs.
ULONG AdjustSystemTimerResolutionTo500mcs()
{
	static const ULONG resolution = 5000; // 0.5 ms in 100-nanosecond intervals
	ULONG sysTimerOrigResolution = 10000;
	ULONG minRes;
	ULONG maxRes;
	NTSTATUS ntRes = NtQueryTimerResolution(&maxRes, &minRes, &sysTimerOrigResolution);
	if (NT_ERROR(ntRes))
	{
		std::cerr << "Failed query system timer resolution: " << ntRes;
	}
	ULONG curRes;
	ntRes = NtSetTimerResolution(resolution, TRUE, &curRes);
	if (NT_ERROR(ntRes))
	{
		std::cerr << "Failed set system timer resolution: " << ntRes;
	}
	else if (curRes != resolution)
	{
		// Ideally we should check not the equality of curRes and resolution but their ratio. E.g. it is possible
		// to request 5000 and have 5008 set.
		std::cerr << "Failed set system timer resolution: req=" << resolution << ", set=" << curRes;
	}
	return sysTimerOrigResolution;
}
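The comment inside the function notes that the requested and actual resolution should be compared by ratio rather than for equality. A minimal sketch of such a check, assuming a percentage tolerance is acceptable; the helper name `resolutionCloseEnough` and the tolerance parameter are hypothetical and not part of the code above:

```cpp
#include <cassert>
#include <cstdint>

// Returns true when `actual` is within tolerancePercent of `requested`.
// Both values are in 100-nanosecond units, as used by NtSetTimerResolution.
inline bool resolutionCloseEnough(std::uint32_t requested,
                                  std::uint32_t actual,
                                  std::uint32_t tolerancePercent)
{
	const std::uint64_t diff = actual > requested ? actual - requested
	                                               : requested - actual;
	// Compare diff / requested with tolerancePercent / 100 without division.
	return diff * 100 <= static_cast<std::uint64_t>(requested) * tolerancePercent;
}
```

With a 1 percent tolerance, a request of 5000 that results in 5008 would be accepted, while a timer left at 10000 would still be reported as a failure.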


To make the code compile, add declarations of the necessary functions.
#include <windows.h>
#include <winternl.h> // provides the NTSTATUS type
#ifndef NT_ERROR
#define NT_ERROR(Status) ((((ULONG)(Status)) >> 30) == 3)
#endif
extern "C"
{
	NTSYSAPI
		NTSTATUS
		NTAPI
		NtSetTimerResolution(
			_In_ ULONG                DesiredResolution,
			_In_ BOOLEAN              SetResolution,
			_Out_ PULONG              CurrentResolution);
	NTSYSAPI
		NTSTATUS
		NTAPI
		NtQueryTimerResolution(
			_Out_ PULONG              MaximumResolution,
			_Out_ PULONG              MinimumResolution,
			_Out_ PULONG              CurrentResolution);
}
#pragma comment (lib, "ntdll.lib")


Typical output with Windows 8.1
Starting test: Sleep (1)
0: Elapsed 13916
1: Elapsed 14995
2: Elapsed 3041
3: Elapsed 2247
4: Elapsed 15141
Finished. average time: 9868
Starting test: sleep_for (microseconds (500))
0: Elapsed 12359
1: Elapsed 14607
2: Elapsed 15019
3: Elapsed 14957
4: Elapsed 14888
Finished. average time: 14366
Starting test: WaitableTimer
0: Elapsed 12783
1: Elapsed 14848
2: Elapsed 14647
3: Elapsed 14550
4: Elapsed 14888
Finished. average time: 14343
Timer resolution is set to 1 ms
Starting test: Sleep (1)
0: Elapsed 1175
1: Elapsed 1501
2: Elapsed 1473
3: Elapsed 1147
4: Elapsed 1462
Finished. average time: 1351
Starting test: sleep_for (microseconds (500))
0: Elapsed 1030
1: Elapsed 1376
2: Elapsed 1452
3: Elapsed 1335
4: Elapsed 1467
Finished. average time: 1332
Starting test: WaitableTimer
0: Elapsed 105
1: Elapsed 394
2: Elapsed 429
3: Elapsed 927
4: Elapsed 505
Finished. average time: 472

Typical output with Windows Server 2008 R2
Starting test: Sleep (1)
0: Elapsed 7364
1: Elapsed 14056
2: Elapsed 14188
3: Elapsed 13910
4: Elapsed 14178
Finished. average time: 12739
Starting test: sleep_for (microseconds (500))
0: Elapsed 11404
1: Elapsed 13745
2: Elapsed 13975
3: Elapsed 14006
4: Elapsed 14037
Finished. average time: 13433
Starting test: WaitableTimer
0: Elapsed 11697
1: Elapsed 14174
2: Elapsed 13808
3: Elapsed 14010
4: Elapsed 14054
Finished. average time: 13548
Timer resolution is set to 1 ms
Starting test: Sleep (1)
0: Elapsed 10690
1: Elapsed 14308
2: Elapsed 768
3: Elapsed 823
4: Elapsed 803
Finished. average time: 5478
Starting test: sleep_for (microseconds (500))
0: Elapsed 983
1: Elapsed 955
2: Elapsed 946
3: Elapsed 937
4: Elapsed 946
Finished. average time: 953
Starting test: WaitableTimer
0: Elapsed 259
1: Elapsed 456
2: Elapsed 453
3: Elapsed 456
4: Elapsed 460
Finished. average time: 416

It remains to record the observations and draw conclusions.

Observations:

  1. On Win8, after the first run of the program, the system timer resolution was reset to a large value. I.e. our conclusion 2 was wrong.

  2. After the resolution was set manually, the spread of actual sleep durations in the WaitableTimer case increased, although on average the sleep still lasts about 500 microseconds.

  3. On the server OS, Sleep(1) (like this_thread::sleep_for) quite unexpectedly stopped behaving the way it did in the timeBeginPeriod case. I.e. Sleep(1) started to work as it should, meaning "pause for 1 millisecond".

Final conclusions


  • Conclusion 1 remains unchanged: for Sleep(0.5), the right sleep primitive is necessary but not sufficient. Always use waitable timers for this.

  • Conclusion 2: The system timer resolution on Windows depends on the edition of Windows, on the version of Windows, on the processes currently running, and on the processes that may have run earlier. I.e. nothing can be asserted or guaranteed! If you need any guarantees, you must always request / set the desired accuracy yourself. For values below 1 millisecond you need the native API. For larger values it is better to use timeBeginPeriod.

  • Conclusion 3: If possible, test the code not only on your own working Win 10 machine but also on the OS the main customer specifies. Remember that server OSes can differ greatly from desktop ones.
