All entries for Monday 07 March 2022
March 07, 2022
This is not one of my mistakes, this is one I read about second hand (you'll see the pun in a moment), but it is a great illustration of the importance of writing the code you actually mean to, thinking properly about what it is that you mean, and watching out for weasel words that might confuse you.
Consider these two lines of code and decide for youself whether they have the same effect:
if (DateTime.Now.Second == 0 && DateTime.Now.Minute == 0)
if (DateTime.Now.Minute == 0 && DateTime.Now.Second == 0)
Decided yet? If you think they are the same, take a second now (that's a hint!) to think about the meaning of each part, and what its value will be. Suppose the original author intended to run some process once per hour, so put one of these into a loop. Can you see why one of these might succeed, and the other fail occasionally (assume the rest of the code takes some tiny fraction of a second)?
By the way, if you're worried about not knowing what language this is in, don't be. This is a perfect example of good "self documenting code" which we've written a bit about before. Based on the names, we can infer (correctly) that DateTime is some sort of thing, which can supply the date-and-time Now through the "DateTime.Now" construct, and that we can then look at the Minutes and Seconds value of the time Now. That's all we need to know.
So, back to the question. Are the lines the same? Well, the order is different. There's a double-& AND operation. We don't strictly know whether the left or right condition will be checked first (it might depend on language), and we don't actually know if both will be (if you're not sure why that is, we've talked about short circuiting in logical operations here). But that shouldn't matter unless the DateTime object is having some weird side-effects and I promise it isn't. The problem is much more fundamental than that.
If you haven't spotted it, it's time to write down very clearly what we wanted to check. We want to know if, right now, the value of both the minutes and the seconds entries are 0. It looks like that's what either line does, but there's a nasty weasly word sneaking in here - the word "now". What can "now" actually mean in programming terms, where we know operations ultimately are carried out one by one in sequence? For instance, what would you expect the following code could print?
if (DateTime.Now.Second == 0) print(DateTime.Now.Second)
If you said 0 only, hold on a second (that's a hint, too) and look again, and think about what this code will actually do in terms of basic operations. It will get a value for DateTime.Now.Second. It will compare this to 0. If the comparison is true it will get a value for DateTime.Now.Second, and it will print this value. And there is the problem! The value we check and the value we print are not always the same.
If it still isn't clear to you why the original two lines are not equivalent, reread that last paragraph a few times and apply the same idea to the original question. If it's been obvious to you all along, great. You're unlikely to make the mistake this original programmer did. But why did they make this mistake? There could be several reasons:
- They might simply not have spotted there could be a problem - this is pretty likely, but not very interesting
- They might have thought somehow the compiler/runtime would recognise the values as "the same" - this is a pretty fundamental mistake, perhaps due to thinking of "Now" as a property rather than an operation. In some very imprecise sense, we should think of "Now" as having a side-effect on itself because it takes time, so multiple calls will not have the same result as a single one
- They might have blindly removed a temporary variable containing DateTime.Now, without thinking deeply about the consequences. This is quite likely because it looks so simple, and is why refactoring like this needs, if anything, more attention than writing the code in the first place
- Lastly, they might have thought carefully and precisely, but ultimately wrongly about it. If they confused the idea of seconds as a time point with seconds as a time interval, they may have thought that both the Seconds and the Minutes property would surely be evaluated within the same second, and thus work as expected, where actually even a microsecond between them could be the difference between 1:59.999 and 2:00.000
So, that explains why the lines don't work as intended, but there's one weird thing left - how come one of them (the first one, if evaluation occurs left to right) works, but the other doesn't? Effectively, the minutes value depends on the seconds value - the minute is incremented only when the seconds reads 59. We know the whole operation takes far far less than 1 second (ignoring interrupts or anything else suspending operation), so once we have confirmed the seconds value as 0, we have at least 59-60 seconds to check the minutes value, during which it cannot change. Certainly this will occur within that interval, and the code will work perfectly (again, ignoring external iterrupts, which our code can never account for anyway).
So, does that mean the first option is actually safe, or good code? Well, not without a comment it certainly isn't! If we're relying on this sort of subtlety, somebody is going to be tripped up by it, and it might even be us. It's also a bit delicate - suppose we were to need to use DateTime.Now in another part of the condition, or in the body? We'd have to think very hard about whether we're still "safe" and we simply don't need to. A temporary variable is much clearer, and the possible optimisation here is a) tiny and b) best left to our compiler anyway. Oh and c) possibly a de-optimisation anyway, depending on the cost of the DateTime.Now lookup. Also, it'd be easy to forget that the left-to-right ordering is important, and we might try this in a situation or language where that is not guaranteed.
Ultimately, I simply don't like relying on such a subtlety. It's too clever and it doesn't need to be. It's not that I don't like clever code, because I do. I like clever code that makes one exclaim "oh yes, that's so simple, it's brilliant", and this aint that. This is more "oh blimey, is that how it works". Keep it simple, and if you can't keep it simple, at least make it satisfying.
Before we go, I hope something has been nagging at you throughout this post. I started off by saying we should think hard about what we mean and what we want to actually do. I introduced this idea of minute-0 and second-0 and then proposed it as a solution to doing something once per hour. Did it strike you as a pretty terrible solution to that problem? Because it is. Suppose we start running our code at 9:00:30 AM. Would we expect whatever this block controls to wait an entire hour to fire for the first time? What if the code is never running at the change of the hour? This block would never run. Suppose we start 1000 copies of this code. Do we really want all 1000 to execute this block at the same time? Suppose some other code took longer than expected and this block isn't reached once per second - some hours might be simply skipped. In nearly all cases we'd be better checking whether at least an hour has elapsed since last we did the thing, and we'd avoid all of these problems. We meant once per hour, not on the first second of every hour, and we should have coded that, not this.
This post is inspired by an article I read a while ago, namely https://thedailywtf.com/articles/an-hourly-rate In particular, a comment mentions the different behaviour of the two possible lines, which this post focuses on.