Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] account_bank_statement_import_online_paypal: timezone handling #418

Open
wants to merge 1 commit into
base: 12.0
Choose a base branch
from

Conversation

hbrunn
Copy link
Member

@hbrunn hbrunn commented Sep 27, 2021

I had some issues with daily statements and hourly/daily (doesn't really matter) pulls:

In the original functions, all time information is nixed, so we effectively always request UTC times, which makes the statements imported in Odoo inconsistent with the reports you can look at on the website (you miss whatever is within your UTC+-N), which confuses users a lot, even though in the end, you get the correct balances.

What's the rationale to normalize in these functions at all? Can't we normalize next_run on input? statement_date_since needs to be adapted for the same reason.

@hbrunn hbrunn force-pushed the 12.0-account_bank_statement_import_online_paypal-timezone_handling branch from e3d475d to 4d45d42 Compare September 27, 2021 22:22
Copy link
Contributor

@alexey-pelykh alexey-pelykh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hbrunn I'm not 100% follow the issue, sorry. If you request wide enough period, there won't be missing transactions, yet they may appear on adjacent dates/statements because of way transaction date is computed from transaction date-time https://github.com/OCA/bank-statement-import/blob/12.0/account_bank_statement_import_online/models/online_bank_statement_provider.py#L273

@api.multi
def _get_statement_date_step(self):
delta = super()._get_statement_date_step()
if self.service == 'paypal':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to be minimally invasive here - if those functions work well for other providers, we shouldn't change the logic for them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I've expressed myself poorly: I mean that this specific change to the method shouldn't have any effect since step is either day, week or month - all having no hours minutes etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you handle datetimes (as opposed to dates), then time is very relevant. I'll check out what you write below later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's obvious, yet still - it seems to me that this code part effectively changes nothing.

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

consider we have a setting of GMT+2, we set the pulling to daily, and the next run date initially to 2021-09-19 00:00:00 (CEST) with is 2021-09-18 22:00:00 (UTC).

Then in the cronjob, date_since will be 2021-09-17 22:00:00 and date_until 2021-09-18 22:00:00

Within _pull, we get statement_date_since as 2021-09-17 00:00:00 and then go requesting those times in UTC from paypal.

So in the end, we don't miss transactions, but the statement for the 19th doesn't contain all transactions from 2021-09-19 00:00:00 (CEST) to 2021-09-20 00:00:00 (CEST), it contains the transactions from 2021-09-19 02:00:00 (CEST) to 2021-09-20 02:00:00 (CEST) - so the sums are correct, you don't miss transactions because they'll be in the next (or previous) statement, but when you compare Odoo's daily statements with paypal's, they won't match because they select different transactions for the 19th.

@alexey-pelykh
Copy link
Contributor

Ah, ok. There's a no-code solution for that already, I'll write in details as soon as I'll be near the laptop.

@alexey-pelykh
Copy link
Contributor

we set the pulling to daily

pulling is done hourly by default, setting of daily/weekly/monthly changes only how often separate statements are created. E.g. I'm using weekly statements in 99% cases.

the statement for the 19th doesn't contain all transactions from 2021-09-19 00:00:00 (CEST) to 2021-09-20 00:00:00 (CEST), it contains the transactions from 2021-09-19 02:00:00 (CEST) to 2021-09-20 02:00:00 (CEST)

yes, which is intended behaviour unless this setting is changed. Setting that to CEST should accomplish the result that you're seeking, if memory serves me.

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

well, but it doesn't. I've set the timezone there, and as you see from my debug description above, you will end up with wrong balances per day, because here in my case you'll compare something like 2021-09-18 22:42:00 (00:42 CEST) with 2021-09-19 00:00:00, and throw out the transaction (it will then show up on the statement before).

If we could redesign all of the code, it would simplify stuff a lot to use the date type in the base plugin, and only do timezone conversions/datetime stuff if the provider handles UTC internally like paypal. But as this is a massive change, that's more for the next migration than for a stable release

@alexey-pelykh
Copy link
Contributor

well, but it doesn't. I've set the timezone there, and as you see from my debug description above, you will end up with wrong balances per day, because here in my case you'll compare something like 2021-09-18 22:42:00 (00:42 CEST) with 2021-09-19 00:00:00, and throw out the transaction (it will then show up on the statement before).

Then it sounds like a bug, especially 00:42 CEST, somewhere in code related to datetime conversion, likely in base module itself. And it would be great to start with a test case that exposes this 00:42 shift first.

IIRC, I've seen such 00:42 shift sometime ago and it was due to which function was used to make timezone conversion.

If we could redesign all of the code, it would simplify stuff a lot to use the date type in the base plugin, and only do timezone conversions/datetime stuff if the provider handles UTC internally like paypal.

That would lead to a lot of copypaste between modules, most of them are timezone and datetime aware, so making base module being date-only is not great, imho. This was considered when initial development held place.

@alexey-pelykh
Copy link
Contributor

But at least now I understand why you want minutes to be erased, since 00:42 would screw up the filtering. But 00:42 shouldn't be there in the first place.

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

it is in the transaction date...

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

so how do we continue here? Do you think it's correct behavior that the statements show transactions from the UTC period of its date instead of the localized one?

@alexey-pelykh
Copy link
Contributor

Ah, true, misunderstood you. Well, regardless, it should be something handled by settings since it's quite a breaking change - and not a fix exactly.

@alexey-pelykh
Copy link
Contributor

Do you think it's correct behavior that the statements show transactions from the UTC period of its date instead of the localized one?

I think that the correct way is to let user decide which TZ should be used for statements. That PR that introduced TZ probably didn't cover it fully, yet it was supposed so, IIRC.

Setting TZ to CEST should've done the trick, while keeping it a UTC should've kept things unchanged. So if anything of that is false, I would consider it a bug of base module to be honest vs bug of this exact provider.

I have some companies that do UTC statements due to them being international, others like local timezones, yet since I've never heard of this issue before - I assume nobody checked or cared.

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

you cannot handle this with settings alone. The code as it is normalized every date to UTC, so without changing the code this won't change.

What I propose here is indeed only a workaround for the base module handling timezones incorrectly, but we can't rewrite all of this in v12 now.

I just realize for people who already have wrong intervals imported, this change is indeed a problem. But that can be fixed with a migration script that sets next_run to the force localized version of the date, then they'll keep getting the intervals they used to. Would this migration script make the patch mergeable for you?

@alexey-pelykh
Copy link
Contributor

What I propose here is indeed only a workaround for the base module handling timezones incorrectly, but we can't rewrite all of this in v12 now.

It shouldn't be a rewrite, at least in my mind. By default everything was UTC and that case is handled well, the case with non-UTC setting (that PR specifically) must have something missing in it, so I don't think it would be a massive rewrite. In any case, we can start with a test case in base module that exposes the issue.

I just realize for people who already have wrong intervals imported, this change is indeed a problem

Shouldn't be that dramatic, since data duplication won't occur. But re-pulling data may cause issues, true. But major version bump would do?

But that can be fixed with a migration script that sets next_run to the force localized version of the date,

I'm against anything forced. Ideally, there should be a setting that allows TZ to be set, if user wants to, that would do the amendment to the behaviour.

@alexey-pelykh
Copy link
Contributor

Also, next_run is hourly by default, why would we want any TZ there?

@alexey-pelykh
Copy link
Contributor

If I got that correctly, issue is: "When provider's Timezone is set to CEST, function that computes statement date from transaction produces wrong date (that is still UTC-ish)" - right?

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

the force localization is only for the migration script. There, we'll have something like 2021-09-19 00:00:00, and in order to produce the same intervals as before with the code I propose this needs to become 2021-09-19 02:00:00 if there's a timezone set on the provider. If timezone == utc or null, there's nothing to do. I guess this is the case with your customers.

Then for the test case: I can write code that will prove that with tz CEST set and daily statements, it will keep requesting the UTC interval. But that's not very interesting, right?

My reasoning for fixing this in a hackish way only for paypal is that I think the good way to do this right is:

  • only use fields.Date in account_bank_statement_import_online, change the functions accordingly to operate on datetime.date objects (then we can also remove the correction code)
  • change every provider to consume datetime.date parameters
  • change every provider to return the date field as datetime.date

Given we have quite a bit of providers in v12, I don't think it's feasible to do it here. Would be a good point for the roadmap for the v14 migration, or maybe even something we can slip into v13?

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

the short description is: Statements contain the UTC interval, not the localized interval if provider.tz is set

@alexey-pelykh
Copy link
Contributor

Statements contain the UTC interval, not the localized interval if provider.tz is set

Interval as "statement date is being in UTC"? That's being "since" date?

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

what we request from paypal when the statement says '2021-09-19' with CEST is actually the transactions from '2021-09-19 02:00:00 CEST' to '2021-09-20 02:00:00 CEST' (edit: applied the correction the wrong way)

@alexey-pelykh
Copy link
Contributor

the transactions from '2021-09-19 02:00:00 CEST' to '2021-09-20 02:00:00 CEST'

And you'd expect that to produce two statements (with provider tz being set to CEST)?

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

I'd expect the statement for the 19th to contain the transactions from '2021-09-19 00:00:00 CEST' through '2021-09-20 00:00:00 CEST', with the matching balance.

That's how my customer came to this in the first place, we installed the module, compared Odoo statements with paypal statements, noticed diverging balances. This was actually difficult to debug ;-)

@alexey-pelykh
Copy link
Contributor

Okay, I got your point, yet it's slightly other way round with same result.

Case 1:

User having CEST timezone makes a pull '2021-09-19 02:00:00'-'2021-09-20 02:00:00' via UI from a "daily" provider with CEST timezone. That request produces 1 statement with 2021-09-19 date containing '2021-09-19 00:00:00 CEST' through '2021-09-20 00:00:00 CEST'.

That's a strict no, super confusing.

Case 2:

User having CEST timezone makes a pull '2021-09-19 00:00:00'-'2021-09-20 00:00:00' via UI from a "daily" provider with CEST timezone. That request produces 1 statement with 2021-09-19 date containing '2021-09-19 00:00:00 CEST' through '2021-09-20 00:00:00 CEST'.

That's a yes, I'd expect that to happen as well.

Case 3:

User having CEST timezone makes a pull '2021-09-19 00:00:00'-'2021-09-20 00:00:00' via UI from a "daily" provider without timezone. That request produces 2 statements with 2021-09-18 and 2021-09-19 dates containing '2021-09-19 00:00:00 CEST' through '2021-09-19 02:00:00 CEST' in one and '2021-09-19 02:00:00 CEST' through '2021-09-20 00:00:00 CEST' in second. That's statement date being in UTC as well as transaction dates being in UTC.

That's a yes, I'd expect that to happen as well.

If we're on the same page, for you case 2 fails and you get two statements? This may sound silly, yet I'd prefer to iron out this timezone thing once and for all.

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

as a user you don't have control which intervals are requested exactly, this complicates things. How can you as a user now set the exact interval? It's part of the problem that whatever you fill in for next_date in the UI will be converted to UTC internally, and our code then sees '2021-09-18 22:00:00' with is capped to '2021-09-18 00:00:00' when I fill in '2021-09-19 00:00:00' in the UI with CEST as timezone.

So yes, case 1 we don't want, case 2 we want, and case 3 then yes, it's a choice not to set a timezone on the provider and then you get UTC, so the 19th CEST is split over two days/statements when the provider is UTC.

My patch was designed to not change anything when no timezone is set, so for UTC users things will just stay the way they are.

No worries about being thorough, timezone conversion stuff still gives me several knots in the brain every time I'm busy with them

@alexey-pelykh
Copy link
Contributor

alexey-pelykh commented Sep 28, 2021

as a user you don't have control which intervals are requested exactly, this complicates things.

There's a wizard that allows to pull specific period of time manually, e.g. to pull old data after installing the module. Surely, it's not the approach anyone would want to use regularly.

How can you as a user now set the exact interval?

Please describe in details why anyone would want that? Apart for occasional manual pull described above, online bank statements are supposed to be plug-and-play, configure once and forget.

It's part of the problem that whatever you fill in for next_date in the UI will be converted to UTC internally,

That field was never intended to be changed manually. In my deployments I never ever had to change it, or had any reason to.
Thus again, please describe what exactly you're trying to accomplish by changing that field manually. It's not even exposed to normal user under normal circumstances.

So yes, case 1 we don't want, case 2 we want, and case 3 then yes, it's a choice not to set a timezone on the provider and then you get UTC, so the 19th CEST is split over two days/statements when the provider is UTC.

Awesome, it would be great if you'd take a look as well on test cases that cover self.tz usage in main module, maybe you'll uncover something as well. I'll take a look at those as well tomorrow just to ensure I didn't miss anything. Until today I was under impression that case 2 is working, because I've even covered that with tests.

@hbrunn
Copy link
Member Author

hbrunn commented Sep 28, 2021

How can you as a user now set the exact interval?

Please describe in details why anyone would want that? Apart for occasional manual pull described above, online bank statements are supposed to be plug-and-play, configure once and forget.

that was a reply to your message, so no, I don't think anybody wants this. Misunderstanding

It's part of the problem that whatever you fill in for next_date in the UI will be converted to UTC internally,

That field was never intended to be changed manually. In my deployments I never ever had to change it, or had any reason to. Thus again, please describe what exactly you're trying to accomplish by changing that field manually. It's not even exposed to normal user under normal circumstances.

the same applies to the wizard, any time a user fills in a datetime anywhere in Odoo, it's converted to UTC based on the user's timezone

So yes, case 1 we don't want, case 2 we want, and case 3 then yes, it's a choice not to set a timezone on the provider and then you get UTC, so the 19th CEST is split over two days/statements when the provider is UTC.

Awesome, it would be great if you'd take a look as well on test cases that cover self.tz usage in main module, maybe you'll uncover something as well. I'll take a look at those as well tomorrow just to ensure I didn't miss anything. Until today I was under impression that case 2 is working, because I've even covered that with tests.

ah, and there I see the problem: In https://github.com/OCA/bank-statement-import/blob/12.0/account_bank_statement_import_online/tests/online_bank_statement_provider_dummy.py#L60 you don't convert to the timezone in question, you just declare the time is in this timezone. What we need here is tz.localize(date).astimezone(pytz.utc).replace(tzinfo=None), see http://pytz.sourceforge.net/#localized-times-and-date-arithmetic - you want to generate the interval in the target timezone (what you put in the context), and then pull transactions with provider.tz.

Why do we have all the checks for tzinfo in the code anyways? Internally, Odoo only ever uses naive datetimes expressing UTC times.

@alexey-pelykh
Copy link
Contributor

ah, and there I see the problem: In https://github.com/OCA/bank-statement-import/blob/12.0/account_bank_statement_import_online/tests/online_bank_statement_provider_dummy.py#L60 you don't convert to the timezone in question, you just declare the time is in this timezone. What we need here is tz.localize(date).astimezone(pytz.utc).replace(tzinfo=None), see http://pytz.sourceforge.net/#localized-times-and-date-arithmetic - you want to generate the interval in the target timezone (what you put in the context), and then pull transactions with provider.tz.

Awesome finding! So that means, that tz-related tests would be incorrect due to that.

Why do we have all the checks for tzinfo in the code anyways? Internally, Odoo only ever uses naive datetimes expressing UTC times.

Depends on which checks do you refer to, e.g. arguments to pull method - just in case I ever planned to call it from somewhere else. Defensive programming, nothing more.

Copy link

@thomaspaulb thomaspaulb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge this since it solves an actual bug - but I would prefer:

  • adding a comment to the code referring to this PR, so that our children and grandchildren can understand why it's there
  • adding a point to the ROADMAP of the base module to fix the underlying issue and the test

@alexey-pelykh alexey-pelykh self-requested a review October 14, 2021 14:55
Copy link
Contributor

@alexey-pelykh alexey-pelykh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussions above, I'm putting format request to change since this has to be handled otherwise

@github-actions
Copy link

There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this PR to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions github-actions bot added the stale PR/Issue without recent activity, it'll be soon closed automatically. label Jul 31, 2022
@rafaelbn
Copy link
Member

Hello @hbrunn , do you have this PR in production to solve the bug in your client?

@alexey-pelykh , could you label as "bug"?

@hbrunn
Copy link
Member Author

hbrunn commented Aug 2, 2022

yes, it's in production since last year

@github-actions github-actions bot removed the stale PR/Issue without recent activity, it'll be soon closed automatically. label Aug 7, 2022
@github-actions
Copy link

There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this PR to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions github-actions bot added the stale PR/Issue without recent activity, it'll be soon closed automatically. label Dec 11, 2022
@hbrunn hbrunn added no stale Use this label to prevent the automated stale action from closing this PR/Issue. and removed stale PR/Issue without recent activity, it'll be soon closed automatically. labels Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no stale Use this label to prevent the automated stale action from closing this PR/Issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants