Dovecot: Time moved forwards

That's because I suspect the mechanism dovecot uses is borked and in fact responsible for all those (alleged) time diffs.
As said: this even happens on a perfectly synced host that has very little time deviation according to ntp - but dovecot claims it is constantly off (at *every single mail* that is delivered) by several orders of magnitude more than ntp is stating/adjusting.
Agreed, though I don't see it "at every single mail". I looked at the commits you posted, and at the code around them, but could not find anything obvious.
 

From the first link:
Has nothing to do with epoll. Dovecot uses kqueue on FreeBSD. This indicates
clock issues, you should use ntpd to avoid this, alternatively your VM/machine
is lagging and clock is jumping forward.

Aki

At least we know it uses kqueue on FreeBSD, but that boilerplate "solution" is simply useless. Hosts are perfectly synced, yet the errors still flood the log with wildly varying offsets, even if the messages are only a few seconds apart:
Code:
May  7 06:15:00 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.199006 seconds - adjusting timeouts.
May  7 06:15:07 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.265371 seconds - adjusting timeouts.
[...]
May  7 07:22:08 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.395615 seconds - adjusting timeouts.
May  7 07:22:39 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.409265 seconds - adjusting timeouts.
[...]
May  7 09:33:10 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.155254 seconds - adjusting timeouts.
May  7 09:33:18 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.319233 seconds - adjusting timeouts.
[...]
May  7 11:32:08 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.127303 seconds - adjusting timeouts.
May  7 11:32:12 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.236399 seconds - adjusting timeouts.
[...]
May  7 12:07:12 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.479545 seconds - adjusting timeouts.
May  7 12:07:23 dovecot dovecot[76104]: master: Warning: Time moved forwards by 0.357021 seconds - adjusting timeouts.

I just picked a few examples I saw while scrolling through the log.
I initially thought this occurs at every processed mail, because the number of occurences of those messages in the logfiles roughly matched the number of mails that host processes in the same timeframe (i.e. a single day). But actually those messages seem to occur roughly 30 seconds after every "imap-login: Disconnected:" or "imap(user)<pid><mailid>: Disconnected" log entry. So it seems imap disconnects are triggering that bogus code, not processed mails.

ntpdate always adjusts time in the range of 3-4 orders of magnitudes lower when called manually than dovecot claims it is off:
Code:
# service ntpd stop && ntpdate pool.ntp.org
Stopping ntpd.
Waiting for PIDS: 12290.
 7 May 12:29:15 ntpdate[56703]: adjust time server 131.188.3.223 offset -0.000112 sec

So the mechanism dovecot is using is definitely broken and needs to be fixed/disabled or at least its bogus log messages muted...

I am on leave this week, so I hopefully find the time putting together a bug report with all the details to clearly show the host clock isn't at fault.
 
Still waiting for "moderator approval" of my message to the dovecot mailing list... because apparently they don't accept plaintext messages:
Code:
The message's content type was not explicitly allowed
 
Still waiting for "moderator approval" of my message to the dovecot mailing list... because apparently they don't accept plaintext messages:
Code:
The message's content type was not explicitly allowed
I'm a bit late to this and don't know whether it's significant or not.

I found that the vast majority (but not 100%) of these "Time moved forwards" logs occurred at almost an exact number of minutes after a POP3 (MUA) or LMTP (MTA) interaction. In this case postfix and dovecot are running on the same VM where time is maintained with NTP.

Code:
2024-07-01T02:30:34.410259+10:00 postfix.example.com dovecot 805 - - pop3(user@example.com)<27997><prT4/x0cEPAK1gED>: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0
2024-07-01T02:33:34.436940+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.100146 seconds - adjusting timeouts. [3 mins]

2024-07-01T11:03:21.522368+10:00 postfix.example.com dovecot 805 - - lmtp(29305): Disconnect from local: Logged out (state=READY)
2024-07-01T11:06:21.740665+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.100341 seconds - adjusting timeouts. [3 mins]

2024-07-02T02:38:34.523156+10:00 postfix.example.com dovecot 805 - - pop3(user@example.com)<30613><9PVtOjIcrvoK1gED>: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0
2024-07-02T02:41:34.559409+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.110271 seconds - adjusting timeouts. [3 mins]

2024-07-02T20:35:17.834048+10:00 postfix.example.com dovecot 805 - - pop3(user@example.com)<32614><JX0VRUEcOcQK1gED>: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0
2024-07-02T20:37:17.747916+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.110443 seconds - adjusting timeouts. [2 mins]

2024-07-02T22:05:17.846288+10:00 postfix.example.com dovecot 805 - - pop3(user@example.com)<32754><z8vzhkIcOMYK1gED>: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0
2024-07-02T22:07:17.907880+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.100034 seconds - adjusting timeouts. [2 mins]

2024-07-03T05:04:43.632593+10:00 postfix.example.com dovecot 805 - - lmtp(33749): Disconnect from local: Logged out (state=READY)
2024-07-03T05:08:43.817282+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.100075 seconds - adjusting timeouts. [4 mins]

2024-07-04T10:39:35.751931+10:00 postfix.example.com dovecot 805 - - lmtp(36667): Disconnect from local: Logged out (state=READY)
2024-07-04T10:41:35.854820+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.109982 seconds - adjusting timeouts. [2 mins]

2024-07-04T11:34:39.801657+10:00 postfix.example.com dovecot 805 - - lmtp(36776): Disconnect from local: Logged out (state=READY)
2024-07-04T11:39:40.054802+10:00 postfix.example.com dovecot 799 - - master: Warning: Time moved forwards by 0.106985 seconds - adjusting timeouts. [5 mins]

2024-07-04T22:14:30.398091+10:00 postfix.example.com dovecot 858 - - pop3(user@example.com)<2359><VAGS42oc08sK1gED>: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0
2024-07-04T22:16:30.421074+10:00 postfix.example.com dovecot 854 - - master: Warning: Time moved forwards by 0.100003 seconds - adjusting timeouts. [2 mins]

2024-07-05T07:32:19.165897+10:00 postfix.example.com dovecot 858 - - lmtp(3545): Disconnect from local: Logged out (state=READY)
2024-07-05T07:34:19.210231+10:00 postfix.example.com dovecot 854 - - master: Warning: Time moved forwards by 0.100668 seconds - adjusting timeouts. [2 mins]

2024-07-05T11:02:13.222128+10:00 postfix.example.com dovecot 858 - - lmtp(3866): Disconnect from local: Logged out (state=READY)
2024-07-05T11:07:13.380022+10:00 postfix.example.com dovecot 854 - - master: Warning: Time moved forwards by 0.100777 seconds - adjusting timeouts. [5 mins]
 
I found that the vast majority (but not 100%) of these "Time moved forwards" logs occurred at almost an exact number of minutes after a POP3 (MUA) or LMTP (MTA) interaction. In this case postfix and dovecot are running on the same VM where time is maintained with NTP.
Interesting. I don't have a POP3 listener configured. Maybe this is why I see them less frequently than some?
 
Digging into the src package, the offending line is in src/master/main.c line 625 i_warning("Time moved forwards by %lld.%06lld seconds - adjusting timeouts.",

Commenting out the warning and removing the trailing bracket on the next line to avoid main.c:626:36: error: extraneous ')' before ';'

And lo, all is right with the world again, no noisy Time moved forwards errors any more.

Enjoy your quiet logs again. :D
 
Unless swapping the word Warning for Debug gives you a warm glow, I'll stick with my solution of completely removing a benign message (according to the Dovecot forum) spamming the logs.
 
Turns out all "i_*" messages are sent to the logs; to actually move that to debug it seems the "e_debug" event handler has to be used - which is way out of my league.
Of course I managed to catch a period of ~2 1/2 hours where the bogus message wasn't triggered while testing the patch :rolleyes:
 
The point is that the problem should be solved, and you can't solve it if you can't see it.
In an ideal world the problem should/could be solved, the reality is this has been rumbling on since 2021, the horse is dead Jim, upstream aren't interested, time to move on.
 
So I've been seeing this Dovecot error for a bit and stumbled upon this thread. My delicate sensibilities rebel against something so crass as to just comment out the alerts or try to hide them from view. If there is an issue with time I want to know, and if not I want to know why the error is happening and ideally fix it. I'm fairly certain I have no clock issue and TZ is not an issue. In doing some research I came across the following details:

The Dovecot code is using select() and tracking the time with a timeval struct. It appears (in my arguably abbreviated glance) that Dovecot relies on some Linux type behavior to modify that timeval which is then used to detect time elapsed or whatever. From the FreeBSD man page on select():

Version 2 of the Single UNIX Specification ("SUSv2") allows systems to
modify the original timeout in place. Thus, it is unwise to assume
that the timeout value will be unmodified by the select() system call.
FreeBSD does not modify the return value, which can cause problems for
applications ported from other systems.

Apparently FreeBSD does NOT modify the time passed in to the select but it sure looks like the Dovecot code expects it to. It seems like that is a probable cause for the jumps in time reported by Dovecot. I haven't bothered to look into potential harm from the situation but according to the Dovecot docs there is less concern for jumping into the future than the past. In any case the time jump is not really happening anyway as far as I can tell and is a hallucination caused by expecting some Linux behavior as default.

FWIW
 
  • Thanks
Reactions: sko
Thanks for that insight.

The problem is however, that upstream consensus always has been "I'ts not a problem, ignore the message, we don't care to fix this". So unless someone here is able and willing to fix their broken linuxisms I don't think this bogus messages and underlying code will be addressed by upstream, as the last several years have shown...
 
I have various local patches and commits anyways, so my ports trees are usually rebased and poudriere recognizes them as "dirty". But this never really bothered me TBH...
 
I just keep another poudriere instance running on head and work from there...

Code:
pkg lock -l
Currently locked packages:
dovecot-2.3.21.1_1
opendmarc-1.4.2_3
postfix-current-3.10.20240724,6

Then lock down the packages I want after. Overlays would perhaps be the ideal solution but that silver bullet eludes me atmo.
 
I just built dovecot with that patch (currently only in latest; had to cherry-pick into quarterly) and am monitoring the logs. Looks good so far.
Now merged into quarterly.

Interesting analysis from the main Dovecot author:
Some people were saying it happens mainly on idle hours. Maybe kqueue() is accurate with low timeout values, but not accurate on high timeout values? So if Dovecot asked kqueue() to wait for <100ms, it would be very accurate. But if it asks to wait for 10000ms, kqueue() would think it's okay to return after 10100ms. If that's the case, this check could be changed to allow higher time jumps only on higher timeout waits.

He implemented allowing higher time jumps in his patch.
 
Yes, I've asked in the PR if it will be merged to quarterly and my wish was granted ?

Regarding the analysis: I've usually seen it somewhat regularly after user disconnects (see post #29) and mainly during working hours, but also during the night when those servers are mostly idle. So no, I wouldn't have linked it to system load...

But regardless, I'm happy this got finally fixed in a proper way.
 
Back
Top