Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

As a dev, I want the information extracted by the performance_assessment.py script to be aggregated in a human readable report. #52

Open
6 tasks
Endlessflow opened this issue Oct 16, 2024 · 0 comments
Assignees

Comments

@Endlessflow
Copy link
Contributor

Description

Context
Our initial performance_assessment script currently generates raw data on the system's performance and compiles it into a CSV report. To be useful to our team, this information needs to be transformed into a human-readable format.

Problem Statement
To derive meaningful insights from the raw data generated by our performance_assessment script, we need to compile the results in the CSV report into an aggregated and understandable format. For now, we will implement a basic aggregation method.

For each label, we will have:

  • The average Levenshtein distance score for each field.
  • The ratio of fields that are considered to have a passing score.
  • The ratio of missing fields from the pipeline output compared to expected results.

Acceptance Criteria
To consider this issue resolved, the following criteria need to be met:

  1. Data Aggregation:

    • The performance_assessment script must aggregate raw data and calculate the average Levenshtein distance score for each field across all labels.
    • The script must calculate and include in the report the ratio of fields that meet or exceed a predefined passing score threshold.
    • The script should also calculate and provide the ratio of missing fields from the pipeline's output compared to expected results.
  2. Report Generation:

    • The aggregated results should include simple visualizations (e.g., bar charts or pie charts) for better comprehension.
    • The aggregated data must be compiled into a user-friendly, human-readable report format (e.g., PDF or web dashboard).
    • Ensure that there is a clear definition at the beginning of the report explaining what constitutes a 'passing score' for field accuracy.
@Endlessflow Endlessflow self-assigned this Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
1 participant