pthread_cond_timedwait: problem, solution, discussion
Hello, dear Habrausers!
Continuing my series of posts on multithreaded programming, I would like to touch on a fundamental problem with condition variables in Linux which, unfortunately, does not yet have an elegant universal solution (or perhaps I simply do not know of one). Many developers, unfortunately, do not even realize that the problem exists.
Consider a simple example of using a condition variable:
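// Requires <pthread.h> and <sys/time.h>; mutex, condition and
// somethingHappens() are defined elsewhere.
struct timeval now;
struct timespec timeout;
gettimeofday(&now, 0);
timeout.tv_sec = now.tv_sec + 2;       // absolute deadline: now + 2 sec
timeout.tv_nsec = now.tv_usec * 1000;  // microseconds -> nanoseconds
int retval = 0;
pthread_mutex_lock(&mutex);
// Leave the loop when the event has happened or the wait has failed (ETIMEDOUT).
while (!somethingHappens() && retval == 0)
{
    retval = pthread_cond_timedwait(&condition, &mutex, &timeout);
}
pthread_mutex_unlock(&mutex);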
The point of pthread_cond_timedwait is that we either wait for a signal (pthread_cond_signal or pthread_cond_broadcast) notifying us that somethingHappens() has become true, or we stop waiting once the timeout we set expires. The potential problem lies in the second half of that sentence. Note that the time passed as the third parameter of pthread_cond_timedwait is absolute! What happens if the system time is moved back (!) after we read the current time (gettimeofday) but before we go to sleep in pthread_cond_timedwait? The deadline then lies further in the future than intended, and the wait lasts for the timeout plus the size of the clock adjustment.
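To make the "absolute form" more tangible, here is a minimal sketch of how such a deadline is usually built from a relative timeout. The helper name makeDeadline is hypothetical and not part of the original code, and clock_gettime(CLOCK_REALTIME, ...) is used in place of gettimeofday. The sketch also shows where the vulnerable window sits: between reading the clock inside the helper and the later call to pthread_cond_timedwait.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical helper (not from the article): turns a relative timeout in
// milliseconds into an absolute CLOCK_REALTIME deadline, the form that
// pthread_cond_timedwait expects for a default condition variable.
timespec makeDeadline(long relativeMs)
{
    assert(relativeMs >= 0);
    timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);      // current wall-clock time
    deadline.tv_sec  += relativeMs / 1000;
    deadline.tv_nsec += (relativeMs % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L)           // keep tv_nsec within [0, 1e9)
    {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    // If the system time is moved back after this point but before the
    // caller enters pthread_cond_timedwait, the deadline is too far away.
    return deadline;
}
```

A caller would write timespec timeout = makeDeadline(2000); and pass &timeout to pthread_cond_timedwait while holding the mutex.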
And how does pthread_cond_timedwait behave if the thread is already sleeping in this call when the time changes? Here everything is fine. On every platform where I experimented with moving the time back, the change was simply ignored, i.e. inside the call the absolute time is evidently still converted to a relative interval. I wonder why this is not exposed in the function's interface. That would solve all the problems!
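For comparison, here is a sketch of what a relative interface could look like on platforms that support pthread_condattr_setclock with CLOCK_MONOTONIC. This is not the approach used in the system described in this article, and whether it is available on a given platform is an assumption to verify. The names initMonotonicCond and condWaitRelative are hypothetical. The idea is that the deadline is measured against a clock that changes to the system time do not affect.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <time.h>

// Initialises a condition variable whose timeouts are measured against
// CLOCK_MONOTONIC instead of the default CLOCK_REALTIME.
int initMonotonicCond(pthread_cond_t *cond)
{
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(cond, &attr);
    pthread_condattr_destroy(&attr);
    return rc;
}

// Hypothetical relative-timeout wrapper. The caller must hold `mutex`, and
// `cond` must have been set up with initMonotonicCond().
// Returns 0 on wakeup or ETIMEDOUT when relativeMs have elapsed.
int condWaitRelative(pthread_cond_t *cond, pthread_mutex_t *mutex, long relativeMs)
{
    assert(relativeMs >= 0);
    timespec deadline;
    clock_gettime(CLOCK_MONOTONIC, &deadline);     // same clock as the condvar
    deadline.tv_sec  += relativeMs / 1000;
    deadline.tv_nsec += (relativeMs % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_cond_timedwait(cond, mutex, &deadline);
}
```

A return value of 0 can also be a spurious wakeup, so real code would still re-check its predicate after the call and decide how much of the timeout is left.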
Critics may argue that this is a negligible case: the change of system time has to land exactly inside this very small piece of code. Let me disagree. On the one hand, if the probability of an event is not zero, it will happen (this is commonly called the “general's effect”); on the other hand, everything depends heavily on the particular program. We ran into this problem while developing a video surveillance system with dozens of threads, each of which calls pthread_cond_timedwait 25 times per second. Moving the time back by an hour meant that, with a probability close to 100%, some thread would fall asleep for that hour plus 1/25 of a second!
What to do?
As I said at the beginning, there is no elegant solution to this problem, but leaving it unsolved is not an option either. In our system we set up a separate thread, let us call it the “system time monitoring thread”, which watches for the system time being moved back and, when it detects that, wakes up all condition variables. In other words, the solution assumes a dedicated manager in the system with which every condition variable in use must be registered. It turned out something like this:
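#include <atomic>
#include <map>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

class SystemTimeManager
{
public:
    SystemTimeManager();
    ~SystemTimeManager();
    void registerCond(pthread_mutex_t *mutex, pthread_cond_t *cond);
    void unregisterCond(pthread_cond_t *cond);
private:
    static void *runnable(void *ptr);
private:
    time_t _prevSystemTime;     // system time seen on the previous check
    pthread_t _thread;          // the monitoring thread
    std::atomic<bool> _finish;  // set by the destructor to stop the thread
    pthread_mutex_t _mutex;     // protects _container
    std::map<pthread_cond_t *, pthread_mutex_t *> _container; // condition -> its mutex
};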
SystemTimeManager::SystemTimeManager():
    _prevSystemTime(time(0)),
    _finish(false)
{
    pthread_mutex_init(&_mutex, 0);
    pthread_create(&_thread, 0, runnable, this);
}
SystemTimeManager::~SystemTimeManager()
{
    _finish = true;
    pthread_join(_thread, 0);
    pthread_mutex_destroy(&_mutex);
}
void SystemTimeManager::registerCond(pthread_mutex_t *mutex, pthread_cond_t *cond)
{
    pthread_mutex_lock(&_mutex);
    _container.insert(std::make_pair(cond, mutex));
    pthread_mutex_unlock(&_mutex);
}
void SystemTimeManager::unregisterCond(pthread_cond_t *cond)
{
    pthread_mutex_lock(&_mutex);
    std::map<pthread_cond_t *, pthread_mutex_t *>::iterator it = _container.find(cond);
    if (it != _container.end())
        _container.erase(it);
    pthread_mutex_unlock(&_mutex);
}
void *SystemTimeManager::runnable(void *ptr)
{
    SystemTimeManager *me = static_cast<SystemTimeManager *>(ptr);
    while (!me->_finish)
    {
        const time_t now = time(0);
        if (now < me->_prevSystemTime) // the system time has been moved back
        {
            pthread_mutex_lock(&me->_mutex);
            for (std::map<pthread_cond_t *, pthread_mutex_t *>::iterator it = me->_container.begin();
                 it != me->_container.end(); ++it)
            {
                // Wake up everyone waiting on this condition variable.
                pthread_mutex_lock(it->second);
                pthread_cond_broadcast(it->first);
                pthread_mutex_unlock(it->second);
            }
            pthread_mutex_unlock(&me->_mutex);
        }
        me->_prevSystemTime = now;
        sleep(1);
    }
    return 0;
}
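Now all that remains is to create an instance of the SystemTimeManager class and remember to register every condition variable we use with it. Keep in mind that a thread woken up this way must recompute its absolute deadline before waiting again, because the old deadline now lies far in the future.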
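On the waiting side, a broadcast from the manager only helps if the woken thread recomputes its deadline instead of going back to sleep with the old one. Below is a minimal sketch of such a wait loop, assuming CLOCK_MONOTONIC is available to track the remaining time. The names monotonicMs, flagIsSet and waitFor are hypothetical and not part of our system's code.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Milliseconds on the monotonic clock, which changes to the system time do
// not affect.
static long long monotonicMs()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

// Example predicate: arg points to a bool protected by the caller's mutex.
static bool flagIsSet(void *arg)
{
    return *static_cast<bool *>(arg);
}

// Waits until pred(arg) is true or timeoutMs have elapsed. The caller must
// hold `mutex`. Returns true if the predicate became true, false on timeout.
bool waitFor(pthread_cond_t *cond, pthread_mutex_t *mutex,
             bool (*pred)(void *), void *arg, long timeoutMs)
{
    assert(timeoutMs >= 0);
    const long long end = monotonicMs() + timeoutMs;
    while (!pred(arg))
    {
        const long long left = end - monotonicMs();
        if (left <= 0)
            return false;
        // Rebuild the absolute wall-clock deadline from the time still left,
        // so a wakeup after a clock change does not reuse a stale deadline.
        timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += left / 1000;
        deadline.tv_nsec += (left % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L)
        {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        // The return value is deliberately ignored: the loop itself decides
        // whether the predicate holds or the time budget is used up.
        pthread_cond_timedwait(cond, mutex, &deadline);
    }
    return true;
}
```

The small window between clock_gettime and pthread_cond_timedwait still exists in this loop, which is why the condition variable would still be registered with SystemTimeManager. After the manager's broadcast, the loop computes a fresh deadline from the time that is actually left.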
In conclusion, I would like to draw the respected community's attention to the title of this article: "problem, solution, discussion." I hope I have described the problem clearly enough. I have also presented a solution, though not the most elegant one, and I hope it will be useful to someone. The last part, however, the discussion, I cannot do without you, dear Habrausers. Perhaps someone knows other, more elegant solutions to this problem?