
Top of the hour data missing / not processed

Posted: Fri 20 Oct 2023 8:27 pm
by KarlS
A week ago I updated my Raspi4 and rebooted afterwards. For the next two hours everything ran as normal, but ever since then, at the top of every hour, the data in the monthly log file are missing, the extra web files are not processed, and the 3rd party data are not uploaded.

I'm pretty sure this is NOT a CumulusMX problem. I have attached parts of the log and data files to illustrate the problem:
  • 20231013-042556.txt: The start-up process after the reboot completes normally, and the log entries for the top of the hour are written for 5:00 (lines 145-148) and 6:00 (lines 178-181). Starting at 7:00, and every hour since, these 4 log entries have been missing. This morning I activated data and debug logging, but no errors were reported (lines 551-560). Normal processing resumed at 10 minutes past the hour (lines 684-711).
  • FTP-20231020-074152: This is the FTP log showing the transmissions from 7:50 – 8:10. Note there is no entry for 8:00.
  • Oct23log.txt: This is part of this month's log. The reboot was on Oct 13th at 4:25. The data continue at the 10-minute interval until 6:50; the 7:00 data are missing, as are those for every full hour since.
This is a Raspberry Pi 4B running nothing but CumulusMX v3.26.0. I don't think I changed anything in the RaspiOS configuration after the reboot. My understanding of UNIX is only rudimentary, so I wouldn't even know where to start looking for an error. Any help will be highly appreciated.

log&data.zip

Re: Top of the hour data missing / not processed

Posted: Sat 21 Oct 2023 1:46 pm
by mcrossley
OK, that is seriously weird. I'm going to have to dive into the code and see if I can find an explanation....

Re: Top of the hour data missing / not processed

Posted: Sat 21 Oct 2023 4:23 pm
by mcrossley
I need a log file with debug logging enabled that covers several hours, including times when the data log entry is created and times when it isn't, please.

Also, is there a particular reason you are using the HTTP station type with your GW1000 rather than the preferred API type? Especially as you seem to have a WS80 (described as a WH80 on your web site), which updates every 4.75 seconds.

Re: Top of the hour data missing / not processed

Posted: Sat 21 Oct 2023 9:38 pm
by KarlS
Mark, I've changed the protocol to "Ecowitt API" per your suggestion. I've attached the debug log covering about 4 hours' worth of data and the relevant part of the Oct23log.txt file. I'll keep the debug log running in case you need more data.

The error in line 123 of the debug log "GetLiveData: Error - database disk image is malformed" was unexpected, but perhaps caused by the change of the protocol?

log&data.zip

Re: Top of the hour data missing / not processed

Posted: Sat 21 Oct 2023 10:31 pm
by mcrossley
KarlS wrote: Sat 21 Oct 2023 9:38 pm Mark, I've changed the protocol to "Ecowitt API" per your suggestion. I've attached the debug log covering about 4 hours' worth of data and the relevant part of the Oct23log.txt file. I'll keep the debug log running in case you need more data.

The error in line 123 of the debug log "GetLiveData: Error - database disk image is malformed" was unexpected, but perhaps caused by the change of the protocol?


log&data.zip
So the GW1000 is now updating your data every four seconds, and you should be seeing the faster update rate of the wind speeds on the dashboard?

Still no logging on the hour, though.

"GetLiveData: Error - database disk image is malformed" - No idea, the GW1000 code doesn't go anywhere near a database!

BUT it may give a clue: one of the last things done (and only done on the hour) before the log file is written is to purge old data from the recent-data database.
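
To illustrate why that matters (a Python sketch with invented function and column names, not MX's actual C# code): if the purge step hangs or throws on a corrupt database, nothing after it in the hourly sequence runs, which would explain the missing log entry, web files, and uploads all at once.

    import sqlite3

    def write_monthly_log_entry(): print("monthly log entry written")
    def process_extra_web_files(): print("extra web files processed")
    def upload_third_party():      print("3rd party uploads done")

    def on_the_hour(db_path):
        conn = sqlite3.connect(db_path)
        try:
            # Purge old rows from the recent-data table first. "Timestamp" and
            # the retention period are assumed; on a corrupt database this
            # statement can hang, or fail with "database disk image is malformed"...
            conn.execute("DELETE FROM RecentData WHERE Timestamp < datetime('now', '-7 days')")
            conn.commit()
        finally:
            conn.close()
        # ...and when it does, none of the later steps ever run:
        write_monthly_log_entry()
        process_extra_web_files()
        upload_third_party()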

What I'd like you to try is this: stop MX, rename the file "/data/cumulusmx.db" to, say, "/data/cumulusmx-sav.db", and restart MX.
The database file will be recreated. You will lose the one-minute data for the graphs up to the point of restarting MX (they will be at the 10-minute interval instead), and the wind speed average may be a little odd for a few minutes.
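
In case it helps, the rename amounts to no more than this (a minimal sketch using Python's standard library; a plain shell "mv" does the same job):

    from pathlib import Path

    # Path as given above, relative to the CumulusMX install directory
    db = Path("data/cumulusmx.db")
    db.rename(db.with_name("cumulusmx-sav.db"))  # MX recreates cumulusmx.db on restart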

See if that fixes the problem. If it does, could you zip up and attach the cumulusmx-sav.db file? (You can delete it afterwards.)

Re: Top of the hour data missing / not processed

Posted: Sat 21 Oct 2023 11:06 pm
by KarlS
Wow, that did the trick. :clap: Attached is the old cumulusmx.db. Hopefully it will give you a hint as to what happened here.

cumulusmx-sav.zip

Re: Top of the hour data missing / not processed

Posted: Sun 22 Oct 2023 8:10 am
by HansR
mcrossley wrote: Sat 21 Oct 2023 10:31 pm "GetLiveData: Error - database disk image is malformed" - No idea, the GW1000 code doesn't go anywhere near a database!
Interesting.
I am very curious about the cause, also in relation to other uses of databases.
Accidental cosmic-ray damage? An accidental power failure or switch-off?

Re: Top of the hour data missing / not processed

Posted: Sun 22 Oct 2023 9:15 am
by freddie
mcrossley wrote: Sat 21 Oct 2023 10:31 pm "GetLiveData: Error - database disk image is malformed" - No idea, the GW1000 code doesn't go anywhere near a database!
If you can trap that error, can MX then recreate the DB as it does when you delete the DB and restart? Or maybe recreate it from memory, if it is just the disk image that is in error? A bit of self-healing :)

Re: Top of the hour data missing / not processed

Posted: Sun 22 Oct 2023 9:22 am
by mcrossley
I'll test it using the corrupt file above later today. My concern now is that none of the DB open/insert/delete operations are raising an exception to MX; the operations seem to be failing silently. :o
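
A minimal sketch of the expected behaviour (Python with an invented wrapper name; MX itself is C#), where each operation at least logs its failure and re-raises instead of passing silently:

    import logging, sqlite3

    def checked_execute(conn, sql, params=()):
        # Run one statement, surfacing any failure instead of swallowing it.
        try:
            cur = conn.execute(sql, params)
            conn.commit()
            return cur
        except sqlite3.DatabaseError as ex:
            logging.error("SQLite operation failed: %s (%s)", sql, ex)
            raise  # let the caller know the operation did not happen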

Re: Top of the hour data missing / not processed

Posted: Sun 22 Oct 2023 2:14 pm
by mcrossley
It appears the RecentData table in the database is corrupt.

Interestingly, opening the DB in DB Browser does not raise any errors, nor does "browsing" the data in the corrupt table, but after 28 of its 21,983 rows it shows "null" data for three rows, then "loading..." for every row thereafter.

So, it is corrupt in some way that is not detected by the SQLite software in normal operation.

Running an integrity check shows errors for those three "null" rows, then a "database image is malformed" message; that command also hangs at that point and locks up the database.
Screenshot 2023-10-22 145122.png
Running a full database compact removes the null rows and leaves just the "database image is malformed" message. The integrity checks then exit in a timely manner.
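
The same checks can be run outside DB Browser, e.g. with Python's built-in sqlite3 module (a sketch; the filename is the attachment above):

    import sqlite3

    conn = sqlite3.connect("cumulusmx-sav.db")  # the corrupt file attached above
    try:
        # Prints a single "ok" row on a healthy database, otherwise one row per
        # problem found (on this file it reports the damaged rows and then
        # "database disk image is malformed", where it hangs).
        for (line,) in conn.execute("PRAGMA integrity_check"):
            print(line)
        conn.execute("VACUUM")  # the "full compact": rewrites the whole file
    finally:
        conn.close()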

It's not 100% clear to me how to handle this error in MX at the moment. I could run a "quick" integrity check at start-up, but in this case it would never return. I guess I could run it as a separate task, abort it after a period if it hangs, and recreate the database file at that point.
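
That separate-task idea would look roughly like this (a Python sketch; the timeout value, path, and recreate step are all assumptions, and MX itself is C#):

    import os, sqlite3, threading

    DB_PATH = "data/cumulusmx.db"  # assumed location

    def quick_check(results):
        conn = sqlite3.connect(DB_PATH)
        try:
            results.append(conn.execute("PRAGMA quick_check").fetchone()[0])
        finally:
            conn.close()

    results = []
    checker = threading.Thread(target=quick_check, args=(results,), daemon=True)
    checker.start()
    checker.join(timeout=30)  # give the check 30 seconds (an arbitrary choice)

    if checker.is_alive() or results != ["ok"]:
        os.remove(DB_PATH)  # the check hung or found errors: delete the file
                            # so it is recreated on the next open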

Re: Top of the hour data missing / not processed

Posted: Sun 22 Oct 2023 2:39 pm
by freddie
Bug report for SQLite Devs?

Re: Top of the hour data missing / not processed

Posted: Sun 22 Oct 2023 3:16 pm
by HansR
freddie wrote: Sun 22 Oct 2023 2:39 pm Bug report for SQLite Devs?
Question is: where does it originate?

EDIT: SQLite is not very susceptible to this type of blocking lockup, so the question is legitimate and important for working out a solution and the effort involved.

Re: Top of the hour data missing / not processed

Posted: Sun 22 Oct 2023 3:38 pm
by mcrossley
I added some code to MX to perform a quick integrity check on startup, and the SQLite library I'm using does not hang - good news.

So it now performs the check at start-up (sketched below):
  • If the "image malformed" message is found, it deletes the file and recreates it.
  • If any other errors are found, it tries a full compact, then runs the check again. If any errors are found on the second check, the file is deleted and recreated.
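
In outline (a Python sketch of that logic with assumed names; the real implementation is C# inside MX):

    import os, sqlite3

    def startup_check(db_path):
        conn = sqlite3.connect(db_path)
        try:
            try:
                results = [r[0] for r in conn.execute("PRAGMA quick_check")]
            except sqlite3.DatabaseError:  # corruption can also raise directly
                results = ["database disk image is malformed"]
            if results == ["ok"]:
                return  # healthy: nothing to do
            if not any("malformed" in r for r in results):
                conn.execute("VACUUM")  # other errors: try a full compact first
                if [r[0] for r in conn.execute("PRAGMA quick_check")] == ["ok"]:
                    return  # the compact fixed it
        finally:
            conn.close()
        os.remove(db_path)  # still bad: delete the file; it is recreated on next open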

That will do for now.