Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do the climsoft tables recognise invalid dates? #751

Open
rdstern opened this issue Jul 4, 2024 · 9 comments
Open

Do the climsoft tables recognise invalid dates? #751

rdstern opened this issue Jul 4, 2024 · 9 comments

Comments

@rdstern
Copy link

rdstern commented Jul 4, 2024

@mhabimana and @Patowhiz we were sent climatic data by DCCMS. This was daily data for 2 stations, rain, tmax and tmin for many years. One element gave odd errors in R-Instat, for tmin namely that there were 9988 values on February 29th in many non-leap years. One non-leap year (1985) had the value 23.8 on 29 February.

I assumed this could not have been exported from the current climsoft, but I have been assured it did. Am I therefore correct to deduce that the current Climsoft does not detect whether dates are valid?

If this is the case, then I assume it should be a simple correction?

@Patowhiz
Copy link
Collaborator

Patowhiz commented Jul 4, 2024

@rdstern I'm surprised by that. The database itself (leave alone the application) should not accept invalid dates.

@mhabimana @smachua any explanation as to why this may have been possible? Or should we follow up with DCCMS to have a look at the database directly?

@rdstern
Copy link
Author

rdstern commented Jul 4, 2024

@Patowhiz just to clarify, I am not suggesting this is a problem with data entered into Version 4. If it is a problem, it is inherited from earleir versions of Climsoft. So I wonder whether it was a problem earlier. If so, does the importing into Version 4 detect and clean those problems out?

@rdstern
Copy link
Author

rdstern commented Jul 4, 2024

@Patowhiz in addition, if this remains a legacy in these databases, then how do we clean them out now?

@Patowhiz
Copy link
Collaborator

Patowhiz commented Jul 4, 2024

@rdstern I understand. My previous comment referred to the two levels at which data validation typically occurs: Application and Database. If the application fails to detect an incorrect datetime, the database should be able to catch it. In Climsoft, the database stores datetimes in a MariaDB column of the DATETIME type. This means any invalid datetime should be rejected by the database server, and an error should be raised.

I think it would be useful to check with DCCMS on how such invalid datetimes are possible in Climsoft. Then we can think of how to clean and prevent such occurrences.

@mhabimana
Copy link
Contributor

@rdstern, @Patowhiz : It is possible that such values may have sailed in the database through data files import into Climsoft. Since they are discovered, it is just a matter of follow these up in the database and clean them. Correcting wrong values in Climsoft database should not be a problem. It is straight forward.

@Patowhiz
Copy link
Collaborator

Patowhiz commented Jul 4, 2024

@mhabimana thanks for your response. Even if it passed through the Climsoft import feature, I still don't understand how that is possible. The database server should throw a "SQL Error (1292): Incorrect datetime value...". The MariaDB server shouldn't be able to store such a datetime because its execution engine would not permit it.

Have you encountered this kind of invalid date before? I've been unable to reproduce this kind of invalid date in my Climsoft database.

@smachua
Copy link
Contributor

smachua commented Jul 4, 2024

@rdstern we need to confirm the source of that data. Climsoft V4 stores the quality controlled data in MariaDB with the datetime structure of yyyy-mm-dd HH:MM:SS which does not accept invalid dates according to MariaDB rules leave alone Climsoft. Please send us a copy of that we try to deduce its source.

@isedwards
Copy link
Member

Before data is moved to observationinitial it is stored in tables like form_hourly where the date is represented by three separate columns (see below). These individual integer fields will permit invalid dates if the Climsoft application does not catch the problem during key entry.

We've seen in the past some met services keep large volumes of data in the form tables and not move it through QC. It's possible they have exported the data from one of these tables.

CREATE TABLE IF NOT EXISTS `form_hourly` (
  `stationId` varchar(50) NOT NULL,
  `elementId` int(11) NOT NULL,
  `yyyy` int(11) NOT NULL,
  `mm` int(11) NOT NULL,
  `dd` int(11) NOT NULL,
  ...

@smachua
Copy link
Contributor

smachua commented Jul 4, 2024

@isedwards although key entry tables can store invalid dates I still doubt if the data described by @rdstern was exported from any of them for 2 reasons:

  1. Key entry forms are designed with functionalities to block saving of records with invalid or future dates.
  2. Data in key entry tables are without decimal points. So values like 23.8 may not have been from there.

I still think we need to confirm the source of the data sent to @rdstern so that the problem is well addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants