Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All models: Fix logic in users_sessions_this_run to account for sparse data #118

Open
emielver opened this issue Jan 10, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@emielver
Copy link

Issue

In users_sessions_this_run we filter start_tstamp from the sessions on lower_limit and upper_limit from user_limits. By joining on start_tstamp we attempt to pull info from the first session per user:

  FROM {{.output_schema}}.sessions{{.entropy}} a
  INNER JOIN {{.scratch_schema}}.users_userids_this_run{{.entropy}} b
  ON a.domain_userid = b.domain_userid

  WHERE a.start_tstamp >= (SELECT lower_limit FROM {{.scratch_schema}}.users_limits{{.entropy}})
  AND   a.start_tstamp <= (SELECT upper_limit FROM {{.scratch_schema}}.users_limits{{.entropy}})

The lower and upper limits are created by taking the MIN and MAX of start_tstamp from users_userids_this_run, where the start_tstamp is the first timestamp for each user based on all of their sessions (taken from {mobile_}sessions --> {mobile_}sessions_userid_manifest_staged -> users_userids_this_run).

In cases when we have very few users, the upper_limit can have a lower value than the start_tstamp of sessions that we need to process. As a result, the users_sessions_this_run table is artificially truncated, which leads to incorrect information being calculated downstream in users_aggs, users_lasts and users.

Proposed Fix 1 (breaking changes)

  • In {mobile_}sessions_userid_manifest_this_run add a new end_tstamp TIMESTAMP column and fill it as MAX(start_tstamp) AS max_tstamp
  • In {mobile_}users_manifest table, add a new end_tstamp TIMESTAMP column and fill it as {{start_date}}::TIMESTAMP
  • In {mobile_}users_limits change the definition of upper_limit to MAX(end_tstamp) AS upper_limit

Proposed Fix 2 (no breaking changes)

  • In users_sessions_this_run remove start_tstamp filter related to the upper_limit:
    AND a.start_tstamp <= (SELECT upper_limit FROM {{.scratch_schema}}.{mobile_}users_limits{{.entropy}})
@emielver emielver added the bug Something isn't working label Jan 10, 2022
@colmsnowplow
Copy link
Collaborator

Good catch @emielver.

It may well be the case that in v1 the upper limit was included for some eventuality that I thought would matter, but in reality doesn't. If you folks are satisfied that just removing it won't cause any issues then I'm on board.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants