Feature/named entity recognition #11

Merged
merged 31 commits into from
Jul 13, 2023

ColinDaglish
Collaborator

@ColinDaglish ColinDaglish commented Jul 10, 2023

This PR adds a number of new features:

  • extract feature count (i.e. word count by row), returned as a data frame
  • total feature count (the per-row counts above, summarised across all rows)
  • named entity identification (extracts entity names such as "ONS", "NHS", etc.)

This also restructures the package, splitting out analysis functions and quality check functions from pre-processing functions.

Updated the test suite to accompany the new functions.
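The three features listed in the description can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from this PR: the function names (`extract_feature_count`, `total_feature_count`, `find_acronym_entities`), the sample text, and the tokenisation are all hypothetical. Plain Python counters stand in for the data frame, and a capital-letter regex stands in for whatever named entity method the package actually uses.

```python
import re
from collections import Counter

# Hypothetical sample rows; the PR does not show its input data.
responses = [
    "The ONS publishes statistics",
    "NHS and ONS data were linked",
]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (a simplification of real pre-processing)."""
    return re.findall(r"[a-z']+", text.lower())


def extract_feature_count(rows: list[str]) -> list[Counter]:
    """Return one word-count mapping per row (stand-in for a data frame)."""
    return [Counter(tokenize(row)) for row in rows]


def total_feature_count(per_row: list[Counter]) -> Counter:
    """Sum the per-row counts across all rows."""
    total: Counter = Counter()
    for counts in per_row:
        total.update(counts)
    return total


def find_acronym_entities(text: str) -> list[str]:
    """Crude stand-in for entity recognition: runs of two or more capitals."""
    return re.findall(r"\b[A-Z]{2,}\b", text)


per_row = extract_feature_count(responses)
totals = total_feature_count(per_row)
entities = [find_acronym_entities(row) for row in responses]

print(totals.most_common(3))
print(entities)
```

The regex only catches acronym-style names such as "ONS" and "NHS"; a trained entity recogniser would be needed for names like people or places.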

@ColinDaglish ColinDaglish marked this pull request as draft July 10, 2023 09:58
@codecov

codecov bot commented Jul 11, 2023

Codecov Report

Patch coverage: 57.35% and project coverage change: +4.71 🎉

Comparison is base (9a2fb6a) 64.57% compared to head (6a168c6) 69.28%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
+ Coverage   64.57%   69.28%   +4.71%     
==========================================
  Files           5        9       +4     
  Lines         223      394     +171     
==========================================
+ Hits          144      273     +129     
- Misses         79      121      +42     
Impacted Files                          Coverage Δ
src/modules/visualisation.py            0.00% <0.00%> (ø)
src/run_pipeline.py                     0.00% <0.00%> (ø)
src/modules/quality_checks.py           60.00% <60.00%> (ø)
src/modules/preprocessing.py            87.35% <75.00%> (ø)
src/modules/analysis.py                 81.48% <81.48%> (ø)
tests/modules/test_analysis.py          87.75% <87.75%> (ø)
tests/modules/test_preprocessing.py     100.00% <100.00%> (ø)
tests/modules/test_quality_checks.py    100.00% <100.00%> (ø)
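The project coverage figures in the report follow from the Hits and Lines rows of the coverage diff (hits divided by lines). The sketch below reproduces them. The helper name `coverage_basis_points` is hypothetical, and truncating rather than rounding is an assumption chosen because it matches the displayed 69.28% (273/394 is about 69.289%); it is not a statement about how Codecov computes its output.

```python
def coverage_basis_points(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated rather than rounded."""
    return hits * 10_000 // lines


# Hits and Lines taken from the coverage diff in this report.
base = coverage_basis_points(144, 223)  # main
head = coverage_basis_points(273, 394)  # this PR
delta = head - base

print(f"base {base / 100:.2f}%  head {head / 100:.2f}%  change {delta / 100:+.2f}%")
```

Working in integer hundredths of a percent avoids floating-point noise when subtracting the two percentages.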


@ColinDaglish
Collaborator Author

@brenng1 This PR is ready for the next set of checks. I'm now going to start thinking about presentation, that is, how we are going to package this up into a report or similar.

@ColinDaglish ColinDaglish marked this pull request as ready for review July 11, 2023 15:50
@brenng1 brenng1 closed this Jul 13, 2023
@ColinDaglish ColinDaglish reopened this Jul 13, 2023
@ColinDaglish ColinDaglish merged commit 8951a4c into main Jul 13, 2023
3 of 4 checks passed
@brenng1 brenng1 deleted the feature/named_entity_recognition branch July 13, 2023 12:11