[ENG-6264] add monthly "item usage" report #10760

Closed

@aaxelb aaxelb commented Sep 20, 2024

Purpose

compute monthly "usage" metrics for public items on osf

Changes

  • add osf.metrics.reports.PublicItemUsageReport (elasticsearch metric document model)
  • add osf.metrics.reporters.public_item_usage.PublicItemUsageReporter (monthly reporter)
  • in osf.metrics.counted_usage, add get_item_type and get_provider_id for consistent reuse with the new reporter

QA Notes

Please make verification statements inspired by your code and what your code touches.

  • Verify
  • Verify

What are the areas of risk?

Any concerns/considerations/questions that development raised?

Documentation

Side Effects

Ticket

@aaxelb aaxelb force-pushed the eng-6241 branch 2 times, most recently from b877b5a to 4119193 Compare September 23, 2024 20:11
@aaxelb aaxelb force-pushed the eng-6241 branch 3 times, most recently from 3ca15b1 to 48ee82b Compare September 30, 2024 17:31
@aaxelb aaxelb changed the title [WIP][ENG-6264] add monthly "item usage" report [ENG-6264] add monthly "item usage" report Sep 30, 2024
@aaxelb aaxelb marked this pull request as ready for review September 30, 2024 17:32
@aaxelb aaxelb force-pushed the eng-6241 branch 4 times, most recently from 980523c to cfd914b Compare October 1, 2024 13:13

@Johnetordoff Johnetordoff left a comment

A good start! Just a few questions and odds and ends.

try:
    _next_after = _agg_result.after_key
except AttributeError:
    return  # all done

Consider being more explicit about the error being caught. If the only attribute that can be missing is after_key, the code should check for it explicitly instead of catching a general AttributeError.
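
One way to read this suggestion, as a minimal sketch: look up after_key explicitly with getattr instead of wrapping the access in try/except AttributeError. The helper names and the fetch_page callable below are hypothetical stand-ins, not the reporter's actual code or the elasticsearch client API.

```python
from types import SimpleNamespace


def next_after_key(agg_result):
    """Return the aggregation's after_key, or None when there is no next page."""
    # explicit lookup: only a missing `after_key` means "all done";
    # any other AttributeError elsewhere is no longer silently swallowed
    return getattr(agg_result, 'after_key', None)


def iter_pages(fetch_page):
    """Yield result pages until a page comes back without an after_key.

    `fetch_page(after)` is a hypothetical callable standing in for the
    real paginated aggregation query.
    """
    after = None
    while True:
        result = fetch_page(after)
        yield result
        after = next_after_key(result)
        if after is None:
            return  # all done


# tiny fake: the first page points at a second page, which has no after_key
_pages = {
    None: SimpleNamespace(buckets=['item0'], after_key={'id': 'item0'}),
    'item0': SimpleNamespace(buckets=['item1']),
}


def _fake_fetch(after):
    return _pages[after['id'] if after else None]


all_buckets = [page.buckets for page in iter_pages(_fake_fetch)]
```

The trade-off is that getattr with a default treats an explicit `after_key = None` the same as a missing attribute, which seems acceptable here but is an assumption.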

except _SkipItem:
    pass

def _report_from_buckets(self, exact_bucket, contained_views_bucket):

Nit: This would be a good place for type hinting.
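
A rough sketch of what type hints on such a method could look like. BucketLike and ExampleReport are hypothetical types invented for illustration, and the function body is a placeholder; the PR does not show the real bucket type or what the method computes.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional, Protocol


class BucketLike(Protocol):
    """Hypothetical structural type: any object exposing a doc_count."""
    doc_count: int


@dataclass
class ExampleReport:
    """Hypothetical stand-in for PublicItemUsageReport."""
    direct_count: int
    contained_count: int


def report_from_buckets(
    exact_bucket: Optional[BucketLike],
    contained_views_bucket: Optional[BucketLike],
) -> ExampleReport:
    # placeholder body: the point is the annotated signature,
    # which tells the reader that either bucket may be absent
    return ExampleReport(
        direct_count=exact_bucket.doc_count if exact_bucket is not None else 0,
        contained_count=(
            contained_views_bucket.doc_count
            if contained_views_bucket is not None
            else 0
        ),
    )


example = report_from_buckets(SimpleNamespace(doc_count=3), None)
```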

osf/metrics/reporters/public_item_usage.py (outdated, resolved)
osf/metrics/reports.py (resolved)

def _mocks(self):
    with (
        # set a tiny page size to force aggregation pagination:
        mock.patch('osf.metrics.reporters.public_item_usage._CHUNK_SIZE', 1),

Seems like you might want to vary this; it would make a good benchmarking test. _CHUNK_SIZE is currently hard-coded, so it would be nice either to tune it or to document the reasoning behind it being 500.
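
A sketch of the idea of varying the page size, assuming a simplified in-memory paginator in place of the real aggregation query. iter_chunks, collect, ITEMS and CHUNK_SIZES are hypothetical names; only the value 500 comes from the comment above.

```python
from typing import Iterator, List, Sequence


def iter_chunks(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive pages of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


def collect(items: Sequence[str], chunk_size: int) -> List[str]:
    """Flatten all pages back into one list, as a reporter would consume them."""
    return [item for chunk in iter_chunks(items, chunk_size) for item in chunk]


ITEMS = ['item0', 'item1', 'item2']
# 1 forces a page per item; 500 is the hard-coded value under discussion
CHUNK_SIZES = (1, 2, 3, 500)

# the reported items should not depend on the page size
results = {size: collect(ITEMS, size) for size in CHUNK_SIZES}
```

In a pytest suite the loop over CHUNK_SIZES would more naturally be a `@pytest.mark.parametrize('chunk_size', ...)` decorator on the test, with the patched value taken from the parameter.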

    _empty = list(_reporter.report(ym_empty))
    assert _empty == []

def test_reporter(self, ym_empty, ym_sparse, ym_busy, sparse_month_usage, busy_month_item0, busy_month_item1, busy_month_item2):

You should split this into multiple test cases; as written, I can't tell what it is testing.

for _actionbucket in exact_bucket.agg_action:
    if _actionbucket.key == CountedAuthUsage.ActionLabel.VIEW.value:
        _report.view_count = _actionbucket.doc_count
        # note: view_session_count computed separately to avoid double-counting

Why?

    return _report

def _init_report_from_exact_bucket(self, exact_bucket) -> PublicItemUsageReport:
    # in the (should-be common) case of an item that has been directly viewed in

"should-be common"? Without numbers this is very opaque to me. It would help to have some context about how often this happens; benchmarking would help us decide that.

osf/metrics/reporters/public_item_usage.py (outdated, resolved)

assert _busy_item1.item_osfid == 'item1'
assert _busy_item1.provider_id == ['prov0', 'prov1']
assert _busy_item1.platform_iri == ['http://osf.example']
assert _busy_item1.view_count == 6 * 9 + 11

A lot of magic numbers around here.
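
One possible shape for addressing this, as a sketch only: give each factor in `6 * 9 + 11` a name so the assertion explains itself. The PR does not say what 6, 9 and 11 stand for, so the constant names below are placeholders, not the fixture's real meaning.

```python
# placeholder names: the actual meaning of each number is not given in the PR
VIEWS_PER_GROUP = 6
GROUP_COUNT = 9
EXTRA_VIEWS = 11

# the expected total is derived from named parts instead of inline literals
EXPECTED_ITEM1_VIEW_COUNT = VIEWS_PER_GROUP * GROUP_COUNT + EXTRA_VIEWS
```

The test would then assert `_busy_item1.view_count == EXPECTED_ITEM1_VIEW_COUNT`, and the fixtures that create the usage records could reuse the same constants.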

aaxelb commented Oct 24, 2024

further reviewed and merged with #10764

@aaxelb aaxelb closed this Oct 24, 2024