"Never say never" or Working with timezones correctly

    This article talks about the problems that await a programmer working with time zones. In theory, everything seems to be fine, simple and clear, but life is a complicated thing, and in practice, sometimes, completely unexpected situations arise.

    TL;DR: Working with time zones is pain and humiliation. Never work with time zones!

    Everyone around you keeps repeating that when you receive a time from the user, you must convert it to UTC immediately, work only with UTC, and store time strictly in UTC. At first glance the advice looks reasonable, and following it does make life easier... unless your program does complex work with dates. Recording the date and time a user registered on the site? Saving the time a message was sent or an order was created in an online store? Writing a timestamped message to the log? Use UTC and everything will be fine; you don't even need to read the rest of this article. Any current time can be safely converted to UTC, and you can forget about the problems. But what if we want to work with time in the future? Or in the past? For example, what if we are writing a calendar service, or a service for delayed message delivery?

    UTC is not a panacea

    Let me explain with an example. Say we built that delayed message service. A user visiting our site can create a reminder for any time (in the future, of course), delivered by email or SMS. Our site is extremely simple: the user sets the date and time, enters the reminder text and the delivery channel (email address or phone number); we put this data into the database, then periodically select it and send the messages. That's it: profit and the respect of grateful people!

    No, that's not all. Following the advice to always store everything in UTC, we converted the date and time received from the user to UTC and put them in the database. Suppose a user from Moscow visits our site on March 2, 2014 and creates a reminder for 09:00 on November 3, 2014. We put the value “2014-11-03 05:00:00” into the database, because on that day, March 2, 2014, the offset of the Europe/Moscow time zone for November 3, 2014 was UTC+4.

    See what I'm getting at?

    Yes: on July 21, 2014, the State Duma of the Russian Federation adopted a bill to abolish daylight saving time. Under this law, from October 26, 2014, the offset of the Europe/Moscow time zone became UTC+3 instead of UTC+4 (seasonal clock changes were also cancelled, but that is beside the point here). So if we send the notification on November 3 at 05:00 UTC, the user will receive it at 08:00 Moscow time, and I am sure he will be perplexed, because he asked for the notification to arrive at exactly nine in the morning.
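
    The arithmetic above can be replayed in a few lines of Python. This is only a minimal sketch: the two fixed offsets are stand-ins for the Europe/Moscow rules before and after the law (a real service would take them from the time zone database), and all names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for two versions of the Europe/Moscow rules.
MSK_BEFORE_LAW = timezone(timedelta(hours=4))  # what was known on March 2, 2014
MSK_AFTER_LAW = timezone(timedelta(hours=3))   # in force from October 26, 2014

# The wall-clock time the user asked for.
requested_local = datetime(2014, 11, 3, 9, 0)

# What we stored in March: local time converted with the offset known then.
stored_utc = requested_local.replace(tzinfo=MSK_BEFORE_LAW).astimezone(timezone.utc)

# What the user sees in November when we fire at stored_utc.
delivered_local = stored_utc.astimezone(MSK_AFTER_LAW)

print(stored_utc)       # 2014-11-03 05:00:00+00:00
print(delivered_local)  # 2014-11-03 08:00:00+03:00
```

    The stored UTC value was correct when it was written; it became wrong only because the rules it was derived from changed afterwards.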

    The conclusion is simple: you can store time in UTC, but only for events in the present and the recent past, that is, for dates whose time zone rules will no longer change. Storing time in UTC for dates in the future is dangerous, because nobody knows what laws which governments will pass, or what will happen to time zones in ten years, five years, or even one year.

    On the other hand, if you store the user's local time and time zone in the database, working with such data becomes practically impossible. Back to our notification service: two users each created a notification. The first user, from Moscow, asked us to send him an SMS on December 15, 2014 at 15:00 (we write his local time “2014-12-15 15:00:00” and his time zone “Europe/Moscow” to the database). The second user, from New York, asked us to send him an email on December 15, 2014 at 19:00 (we write his local time “2014-12-15 19:00:00” and his time zone “America/New_York” to the database). So far so good: we have recorded the local time at which each user wants to receive his notification, and he will receive it at exactly that time.

    The problems begin when you write the script that selects notifications from the database for sending. If all dates were stored in UTC, everything would be simple. Every minute we would select the messages to send:
    SELECT * FROM reminders WHERE remind_time < NOW();

    This works provided that "SELECT NOW();" returns the time in UTC. But we stored the user's local time and time zone in the database, so what do we do? Suffer :-) After all, relative to NOW() in UTC, Moscow time is 3 hours ahead (so the message would be late) and New York time is 5 hours behind (so it would be sent too early).
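
    To see the size of the error, compare the moment a naive "remind_time < NOW()" check would fire with the moment the user actually meant. This is a minimal sketch, assuming the fixed offsets quoted above (UTC+3 for Moscow, UTC-5 for New York) instead of a real time zone database; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

def firing_error(local_time, utc_offset_hours):
    """How late (+) or early (-) a reminder fires if its local time is compared to UTC NOW()."""
    user_zone = timezone(timedelta(hours=utc_offset_hours))
    # The naive query treats the stored wall-clock value as if it were UTC.
    naive_fire = local_time.replace(tzinfo=timezone.utc)
    # The moment the user actually asked for, expressed in UTC.
    intended_fire = local_time.replace(tzinfo=user_zone).astimezone(timezone.utc)
    return naive_fire - intended_fire

moscow_error = firing_error(datetime(2014, 12, 15, 15, 0), +3)
new_york_error = firing_error(datetime(2014, 12, 15, 19, 0), -5)

print(moscow_error)    # 3:00:00 (three hours late)
print(new_york_error)  # -1 day, 19:00:00 (that is, five hours early)
```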

    Of course, you can think of many ways to select from the database the notifications that are due, but on any reasonably loaded service all of them will cause performance problems. And in general we want to do things properly, without hacks, right?

    What are the options? There are many, but I see only one that is more or less acceptable: store three values in the database, namely the time in UTC (for selecting by this field), the user's local time, and his time zone. Yes, we will be storing redundant data, but I do not know of a single loaded service that does not resort to denormalization. In the real world this is normal. What do we gain? When time zone rules change, we can run a special script over the records in the affected time zones and update the UTC time wherever it has changed as a result. In my humble opinion, this is a good compromise.
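
    The three-value scheme and the update script might be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the Reminder class, to_utc and refresh_utc are hypothetical names, and the zone resolver is passed in as a parameter (defaulting to the standard zoneinfo.ZoneInfo) so that the demo can use fixed offsets as stand-ins for two versions of the time zone rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable, Iterable, Set
from zoneinfo import ZoneInfo

@dataclass
class Reminder:
    local_time: datetime  # naive wall-clock time the user asked for
    tz_name: str          # e.g. "Europe/Moscow"
    utc_time: datetime    # derived value, used only for selection

def to_utc(local_time: datetime, tz_name: str,
           get_tz: Callable[[str], tzinfo] = ZoneInfo) -> datetime:
    """Interpret a naive local time in the named zone and convert it to UTC."""
    return local_time.replace(tzinfo=get_tz(tz_name)).astimezone(timezone.utc)

def refresh_utc(reminders: Iterable[Reminder], changed_zones: Set[str],
                get_tz: Callable[[str], tzinfo] = ZoneInfo) -> int:
    """Recompute utc_time for reminders in the changed zones; return how many moved."""
    moved = 0
    for reminder in reminders:
        if reminder.tz_name not in changed_zones:
            continue
        new_utc = to_utc(reminder.local_time, reminder.tz_name, get_tz)
        if new_utc != reminder.utc_time:
            reminder.utc_time = new_utc
            moved += 1
    return moved

# Demo: fixed offsets stand in for the rules before and after the change.
rules_in_march = {"Europe/Moscow": timezone(timedelta(hours=4))}
rules_in_november = {"Europe/Moscow": timezone(timedelta(hours=3))}

wanted = datetime(2014, 11, 3, 9, 0)
reminder = Reminder(wanted, "Europe/Moscow",
                    to_utc(wanted, "Europe/Moscow", rules_in_march.__getitem__))
moved = refresh_utc([reminder], {"Europe/Moscow"}, rules_in_november.__getitem__)
print(moved, reminder.utc_time)  # 1 2014-11-03 06:00:00+00:00
```

    The local time and zone name are the source of truth here; the UTC column is a cache that can always be rebuilt from them.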

    Still worse than it sounds

    Is that all? No, we have only just started :-) A government can not only change the rules of existing time zones, but also add new ones and abolish old ones. For example, for residents of the Russian city of Chita (and not only for them, but that is beside the point here), a new time zone, “Asia/Chita”, was introduced on October 26, 2014 in place of “Asia/Yakutsk” (this zone did not exist before). The UTC offset of the previous zone (“Asia/Yakutsk”) is +09:00, while for the new zone (“Asia/Chita”) it is +08:00. The problem is that we store only the user's time and time zone in the database, not his geographical location. For records with the time zone “Asia/Yakutsk” we have no way of knowing whether the user is from Chita or from Yakutsk, so we cannot reliably determine when to send him the message. Checkmate! Don't forget to suffer, friends.

    If you are able to determine the user's geographic location and, on his next visit to the site, detect that he is in a region whose time zone has changed (Chita in the case above), you can ask him for the correct time zone and offer to update the time zone of all his events (recalculating the UTC time for each event). Pitfalls and nuances can arise here too, but they are beyond the scope of this article. By the way, this is partly why the Mail.Ru Calendar settings ask the user to choose a geographical location (a city) rather than a time zone, as other services do :-) And even so, I'll be honest, problems come up from time to time.
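
    The idea of keeping a location rather than a zone name can be sketched like this. It is a hypothetical illustration, not a description of how Mail.Ru Calendar is implemented: the lookup table, the event records and the function name are all assumptions made for the example.

```python
# Hypothetical lookup table: location id -> IANA time zone name.
LOCATION_ZONE = {
    "chita": "Asia/Yakutsk",
    "yakutsk": "Asia/Yakutsk",
}

def reassign_zone(location, new_zone, events):
    """Point a location at a new zone; return the events whose UTC time must be recomputed."""
    LOCATION_ZONE[location] = new_zone
    return [event for event in events if event["location"] == location]

# Events reference a location, never a zone name directly.
events = [
    {"id": 1, "location": "chita", "local_time": "2014-12-15 15:00:00"},
    {"id": 2, "location": "yakutsk", "local_time": "2014-12-15 15:00:00"},
]

# When the time zone database introduces Asia/Chita, only the lookup changes.
stale = reassign_zone("chita", "Asia/Chita", events)
print([event["id"] for event in stale])  # [1]
print(LOCATION_ZONE["yakutsk"])          # Asia/Yakutsk
```

    Because the two cities are distinct keys, splitting a zone no longer leaves us guessing which records belong to which city.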

    Storing time in the past is not so simple either. If that past is relatively recent (say, the twenty-first century), there should be no problems with storing time in UTC (although nobody will guarantee that, of course). If we are talking about the twentieth century or (oh, horror) earlier times, problems are guaranteed. To begin with, for many periods of the last century, information about clock changes is still being revised to this day. For example, the tzdata 2014g update of the time zone database, dated August 30, 2014, changed a number of USSR time zones by several seconds or minutes for dates before 1926. Someone simply noticed a discrepancy and reported it to the tzdata maintainers. Or here's another example from times closer to us:

    The database of time zones is updated several times a year, new time zones appear all over the world, the rules of the existing ones change, information about the past time is updated, some changes are constantly happening, and they must be constantly taken into account.

    So how do you keep time right?

    So, how do you store time in a database properly? It is better not to do it at all, of course, but if you really have to, here are my personal recommendations (I will be glad to hear criticism or suggestions):
    1. If you need to store the time of an event that has just happened (the current time, right after some action), store it in UTC. This covers log entries and the time of a user registration, an order, or a sent email.
    2. If the time is not tied to the user or his time zone, store it in UTC. This could be, for example, the time of the next solar eclipse.
    3. If you need to store a time in the past or in the future, save the user's local time and save his time zone alongside it. Better still, to be safe, store the user's geographic location. If you need to query by this time, also save the time in UTC alongside, and update it whenever the time zone information changes.
    4. If you need to know the exact time for any date at a given geographical location (for example, for astronomical calculations), store the user's exact coordinates rather than his time zone. Then again, if you are facing such a task, you already know how to do it right.

    The first option covers the use cases of 99% of programs and, quite possibly, will be enough for you. However, you need to clearly understand why you are choosing one option over another.

    Working with time

    We seem to have sorted out storage. However, you can often hear the advice "always work with time in UTC". The idea is that as soon as you receive a time from the user, you must immediately convert it to UTC and work only with UTC from then on. Sounds logical, doesn't it?

    Not true. At least not in all cases, and here is a concrete example.

    Let us return to our delayed message service. Everything is fine, the service is growing, users are happy, but they are asking for recurring notifications. And not only simple repetitions (“every day”, “every other day”, “every month”), but also quite complex ones (“every week on Tuesdays”, “every month on the last Friday”, and so on). To avoid reinventing the wheel, we look at ready-made solutions. There is such a thing as "recurring events", and there is a special format for describing recurrence rules. It does not cover every possible option, of course (for example, you cannot specify “two days on, two days off”), but it handles most cases. Examples of this format can be found in the description of the RRULE field in the iCalendar specification and in the documentation of the rrule module of python-dateutil for Python.

    We take the python-dateutil module and use it in our code. Everything seems to be fine, but users complain, and investigating these complaints leads us to rather unexpected results.

    One option for recurring events is to repeat by day of the week. We can describe an event that repeats, for example, at 12:00 every week on Tuesdays and Fridays. Here's how it might look in practice, in real code:
    >>> import datetime
    >>> from dateutil import rrule
    >>> list(rrule.rrule(rrule.WEEKLY, count=4, byweekday=(rrule.TU, rrule.FR),
                         dtstart=datetime.datetime(2014, 11, 3, 12, 0)))
    [datetime.datetime(2014, 11, 4, 12, 0),
     datetime.datetime(2014, 11, 7, 12, 0),
     datetime.datetime(2014, 11, 11, 12, 0),
     datetime.datetime(2014, 11, 14, 12, 0)]

    All seems well. Now imagine that a user from Moscow created a recurring event that occurs at one in the morning. As soon as we receive the time “2014-11-03 01:00:00” from him, we follow the advice of smart people and immediately convert it to UTC (the conversion process itself does not interest us now; it is enough to know that we effectively subtract three hours from the received time), getting the following time in UTC: datetime.datetime(2014, 11, 2, 22, 0). So far so good. Now let's get the repetitions for this time:
    >>> list(rrule.rrule(rrule.WEEKLY, count=4, byweekday=(rrule.TU, rrule.FR),
                         dtstart=datetime.datetime(2014, 11, 2, 22, 0)))
    [datetime.datetime(2014, 11, 4, 22, 0),
     datetime.datetime(2014, 11, 7, 22, 0),
     datetime.datetime(2014, 11, 11, 22, 0),
     datetime.datetime(2014, 11, 14, 22, 0)]

    Something seems to have gone wrong. If we convert the resulting values back to the user's local time (adding three hours to each), we see that the repetitions have shifted: the event still repeats at one in the morning, but on Wednesdays and Saturdays. And this is not a bug in python-dateutil; the code worked correctly. The mistake is ours: in this particular case we needed to work with the user's local time.
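
    One way to avoid the shift is to expand the recurrence in the user's local time first and convert each occurrence to UTC only afterwards. The sketch below is a hand-rolled, standard-library stand-in for a weekly rule with a weekday filter (it is not python-dateutil's rrule), and it assumes a fixed UTC+3 offset for Moscow; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+3 offset stands in for Europe/Moscow.
MSK = timezone(timedelta(hours=3))

def weekly_local(start, weekdays, count):
    """Yield `count` naive local datetimes on the given weekdays (Monday is 0),
    at the same wall-clock time as `start`."""
    day = start
    produced = 0
    while produced < count:
        if day.weekday() in weekdays:
            yield day
            produced += 1
        day += timedelta(days=1)

start_local = datetime(2014, 11, 3, 1, 0)  # Monday, 01:00 Moscow time
TUESDAY, FRIDAY = 1, 4

# Step 1: expand the rule in local time, where "Tuesday" means the user's Tuesday.
local_runs = list(weekly_local(start_local, {TUESDAY, FRIDAY}, 4))

# Step 2: convert each occurrence to UTC for storage and selection.
utc_runs = [d.replace(tzinfo=MSK).astimezone(timezone.utc) for d in local_runs]

print(local_runs[0])  # 2014-11-04 01:00:00
print(utc_runs[0])    # 2014-11-03 22:00:00+00:00
```

    In UTC the occurrences fall on Mondays and Thursdays, which is exactly what is needed for the user to see them on his Tuesdays and Fridays.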

    By the way, many calendar services have this bug. For example, the iCal program in OS X calculates repetitions completely wrong in certain cases.

    Don't forget to suffer

    The conclusion is simple and completely banal: never listen to categorical statements telling you to never do this or that. Always think through and work out all the possible options, design the project architecture with particular care, write good tests, and keep them up to date.

    And working with time zones is pain and suffering, yes. If there is even the slightest opportunity not to work with them - use it, you will not regret it. Finally, I’ll give a couple of examples of incorrect operation of real programs:

    Python 2.7.6
    ➜ date
    воскресенье,  9 ноября 2014 г. 22:44:32 (MSK)
    ➜ python -c "import datetime; print datetime.datetime.now()"
    2014-11-09 22:44:33.310904
    ➜ python -c "import datetime; print datetime.datetime.utcnow()"
    2014-11-09 19:44:34.405287

    Everything seems to be fine. We look further:
    ➜ date +%z
    ➜ python -c "import time; print time.timezone/3600"

    Wat? Supposedly this is not a bug but a feature, but that does not make things easier for anyone. What is the point of code that can break at any moment (and does break!)?
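
    Part of the trouble is that time.timezone is a single constant (seconds west of UTC for the local non-DST zone), so it cannot describe a zone whose rules have changed over time. Below is a minimal Python 3 sketch of asking for the offset at a specific instant instead. The helper name is illustrative, and the result still depends on how up to date the system's time zone data is.

```python
from datetime import datetime, timedelta, timezone

def local_utc_offset(at_utc):
    """UTC offset of the machine's local zone at the given timezone-aware instant."""
    # astimezone() with no argument converts to the system local time zone.
    return at_utc.astimezone().utcoffset()

now_utc = datetime.now(timezone.utc)
offset_now = local_utc_offset(now_utc)

# The value depends on the machine the code runs on.
print(offset_now)
```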

    Firefox 33.0.3
    new Date(2015, 0, 6) 
    "Tue Jan 06 2015 00:00:00 GMT+0300 (Russia TZ 2 Standard Time)" 
    new Date(2015, 0, 7) 
    "Tue Jan 06 2015 23:00:00 GMT+0300 (Russia TZ 2 Standard Time)" 
    new Date(2015, 0, 8) 
    "Thu Jan 08 2015 00:00:00 GMT+0400 (Russia TZ 2 Daylight Time)"

    Wat? Yes, I understand that this issue has been raised many times, but that does not make life any easier.

    In general, what can I say, do not forget to suffer :-)

    And how do you work with dates, times and time zones?

    Vladimir Rudnykh,
    Technical Director of the Mail.Ru Calendar
