JSON enrollment data importer (v1) #2131

Open
janno42 opened this issue Feb 27, 2024 · 2 comments · May be fixed by #2187
Labels
[C] Backend Focuses on backend implementation [P] Major Major priority [S] Big This issue might require fundamental changes and will probably require a lot of work to solve.

Comments

@janno42
Member

janno42 commented Feb 27, 2024

Edits:
- 2024-03-25: Added "scale" which defines whether the Course is graded or not
- 2024-03-25: Added "relatedevents" which defines that two events belong to the same Course, changed previous logic with "lvnr"
- 2024-05-27: Removed that "All changes should be made by an 'import user' who will be the user for LogEntrys." Other automatic changes have no user set in the LogEntry, so we don't need it here either.


A new enrollment data importer must be implemented. This importer will process JSON files containing account and course/evaluation data.
This is the first version of the importer; it will be extended in future issues. The JSON files will contain additional information that is omitted here (e.g. "students" in "events" will have additional attributes, hence the bulky format).

The importer will be called by a management command (which will be called by a cron job) and will process a single file specified by a parameter. A second parameter specifies the Semester for which the data is to be imported.
Unlike the current importer, there won't be a GUI, so error messages will have to be sent by email.
In addition, a log should be created and also sent by email after a successful import. This log should contain

  • a list of name changes in UserProfiles
  • a list of new Courses/Evaluations created
  • a list of updated Courses/Evaluations
  • a list of removed Courses/Evaluations
  • a list of all attempted changes for evaluations that are in states >= APPROVED
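
One possible shape for that log, as a minimal sketch: the class and field names (`ImportLog`, `name_changes`, `blocked_changes`, ...) are hypothetical and not taken from the code base. The five sections mirror the list above.

```python
from dataclasses import dataclass, field


@dataclass
class ImportLog:
    """Collects everything the post-import email should report (hypothetical structure)."""

    name_changes: list[str] = field(default_factory=list)
    created: list[str] = field(default_factory=list)
    updated: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    blocked_changes: list[str] = field(default_factory=list)  # evaluations in states >= APPROVED

    def render(self) -> str:
        """Render a plain-text email body with one section per category."""
        sections = [
            ("Name changes in UserProfiles", self.name_changes),
            ("New Courses/Evaluations", self.created),
            ("Updated Courses/Evaluations", self.updated),
            ("Removed Courses/Evaluations", self.removed),
            ("Attempted changes to Evaluations in states >= APPROVED", self.blocked_changes),
        ]
        lines = []
        for heading, entries in sections:
            lines.append(f"{heading} ({len(entries)}):")
            lines.extend(f"  - {entry}" for entry in entries)
        return "\n".join(lines)


log = ImportLog()
log.created.append("Process-oriented information systems")
body = log.render()
```

Keeping the log as plain data until the very end would let the same object feed both the success email and an error email if the import aborts halfway.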

The JSON file is formatted as described below.

  • The first part contains information about all the students who are participating in the courses of this semester ("students").
    • Each entry refers to a UserProfile which can be identified by the email.
    • The gguid is a unique ID that will be used to reference the UserProfile in a later step.
    • The data is mapped like this:
      • name -> last_name
      • christianname -> first_name_given
  • The second part of the file contains information about all the contributors of the courses of this semester ("lecturers").
    • Each entry refers to a UserProfile which can be identified by the email.
    • The gguid is a unique ID that will be used to reference the UserProfile in a later step.
    • The data is mapped like this:
      • name -> last_name
      • christianname -> first_name_given
      • titlefront -> title
  • The third part contains information about the courses/evaluations ("events").
    • The gguid should be stored as cms_id (Campus Management System ID) as a unique identifier for a Course to reference later changes. If no Course with this gguid exists, it needs to be created, otherwise it needs to be updated if there are changes in the metadata. The data is mapped like this:
      • title -> name_de
      • title_en -> name_en
      • type -> CourseType (name_de) - If there is no CourseType with this name, it should be created automatically.
      • courses.cprid -> Degree (name_de) - This should be mapped via import_names. If no Degree with a matching import_name exists, it should be created automatically.
      • lecturers -> responsibles - The corresponding user is referenced in lecturers via the above-mentioned gguid.
    • The gguid should also be stored as cms_id as a unique identifier for an Evaluation to reference later changes. The same gguid will be set for the Course and its main Evaluation, so a cms_id is only unique within a model class, not across classes. If no Evaluation with this cms_id exists, it must be created; otherwise it must be updated when the metadata changes. The data is mapped like this:
      • relatedevents.gguid: If the evaluation is not the main evaluation (so if isexam is true), this links the Evaluation to its Course via course.cms_id.
      • isexam: If true, name_de should be "Klausur" and name_en should be "Exam", otherwise both are empty (main Evaluation).
      • appointments.end -> If isexam is true, vote_start_datetime is 08:00 on the day after the given date, vote_end_date is three days after the given date. Otherwise, vote_start_datetime is 08:00 on Monday of the week before the given date, vote_end_date is Sunday of the week of the given date (using Monday as the first day of the week).
      • lecturers -> contributors (referenced by gguid in "lecturers") -- Contributors should only be added by the importer, not removed. In case contributors are removed in a later import, this should be ignored.
      • students -> participants (referenced by gguid in "students")
      • courses.scale: If set, wait_for_grade_upload_before_publishing is True, otherwise it's False
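
The date rule for appointments.end is the part most prone to off-by-one errors, so here is a small sketch of it. The function name `voting_period` is hypothetical, the date format is inferred from the example further down ("15.07.2024 11:45"), and the function takes a single end value because the issue does not say which appointment to use when there are several.

```python
from datetime import date, datetime, time, timedelta

# Assumption: format taken from the example data, not from a specification.
DATETIME_FORMAT = "%d.%m.%Y %H:%M"


def voting_period(appointment_end: str, is_exam: bool) -> tuple[datetime, date]:
    """Return (vote_start_datetime, vote_end_date) for one appointment end."""
    end_date = datetime.strptime(appointment_end, DATETIME_FORMAT).date()
    if is_exam:
        start_day = end_date + timedelta(days=1)
        vote_end_date = end_date + timedelta(days=3)
    else:
        # date.weekday() is 0 for Monday, so this is the Monday of the given week.
        monday_of_week = end_date - timedelta(days=end_date.weekday())
        start_day = monday_of_week - timedelta(weeks=1)
        vote_end_date = monday_of_week + timedelta(days=6)  # Sunday of the given week
    return datetime.combine(start_day, time(8, 0)), vote_end_date


lecture_period = voting_period("15.07.2024 11:45", is_exam=False)
exam_period = voting_period("29.07.2024 11:45", is_exam=True)
```

For the example lecture (ending Monday, 15 July 2024) this yields a start of 8 July 08:00 and an end date of 21 July; for the exam on 29 July it yields 30 July 08:00 and 1 August. The returned datetime is naive; time zone handling is left out of the sketch.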

Changes to existing evaluations should only be made when the evaluation is in a state < APPROVED. Attempted changes in later states should be logged.
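
A rough sketch of that guard, using plain dicts instead of models. The `State` values and the `apply_changes` helper are hypothetical; only the ordering relative to APPROVED matters for the rule.

```python
from enum import IntEnum


class State(IntEnum):
    # Hypothetical values for illustration; the real state enum may differ.
    NEW = 10
    PREPARED = 20
    APPROVED = 40
    IN_EVALUATION = 50


def apply_changes(evaluation: dict, changes: dict, attempted: list) -> bool:
    """Apply `changes` if the state allows it; otherwise record them in `attempted`."""
    # Drop values that are already up to date so that no-ops are never logged.
    changes = {key: value for key, value in changes.items() if evaluation.get(key) != value}
    if not changes:
        return False
    if evaluation["state"] >= State.APPROVED:
        attempted.append((evaluation["cms_id"], changes))
        return False
    evaluation.update(changes)
    return True


attempted = []
main_evaluation = {"cms_id": "0x5", "state": State.NEW, "name_en": ""}
exam_evaluation = {"cms_id": "0x6", "state": State.APPROVED, "name_en": "Exam"}

changed_main = apply_changes(main_evaluation, {"name_en": "Lecture"}, attempted)
changed_exam = apply_changes(exam_evaluation, {"name_en": "Final exam"}, attempted)
```

Filtering out unchanged values first keeps repeated cron runs from filling the log with "attempted changes" that are not changes at all.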


Format

{
    "students": [
        {
            "gguid": 34 character ID String,
            "email": String,
            "name": String,
            "christianname": String
        }
    ],
    "lecturers": [
        {
            "gguid": 34 character ID String,
            "email": String,
            "name": String,
            "christianname": String,
            "titlefront": String
        }
    ],
    "events": [
        {
            "gguid": 34 character ID String,
            "lvnr": double,
            "title": String,
            "title_en": String,
            "type": String,
            "isexam": boolean,
            "courses": [
                {
                    "cprid": String,
                    "scale": String
                },
                {
                    "cprid": String,
                    "scale": String
                }
            ],
            "relatedevents": {
                "gguid": 34 character ID String
            },
            "appointments": [
                {
                    "begin": String (Datetime),
                    "end": String (Datetime)
                }
            ],
            "lecturers": [
                {
                    "gguid": 34 character ID String
                }
            ],
            "students": [
                {
                    "gguid": 34 character ID String
                }
            ]
        }
    ]
}
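
To make the user mapping concrete, here is a minimal sketch that translates "students" and "lecturers" entries into UserProfile field values and detects name changes for the log. The helper names (`to_user_fields`, `name_changes`) and the sample address are hypothetical; the field mapping itself is the one listed above.

```python
# JSON key -> UserProfile field, as listed in the issue text.
STUDENT_FIELDS = {"name": "last_name", "christianname": "first_name_given"}
LECTURER_FIELDS = {**STUDENT_FIELDS, "titlefront": "title"}


def to_user_fields(record: dict, mapping: dict) -> dict:
    """Translate one "students"/"lecturers" entry into UserProfile field values."""
    fields = {target: record[source] for source, target in mapping.items()}
    fields["email"] = record["email"]
    return fields


def name_changes(existing: dict, incoming: dict) -> dict:
    """Return {field: (old, new)} for every non-email field that differs."""
    return {
        key: (existing.get(key), value)
        for key, value in incoming.items()
        if key != "email" and existing.get(key) != value
    }


record = {
    "gguid": "0x3",
    "email": "lecturer3@example.com",  # placeholder address
    "name": "Doe",
    "christianname": "Jane",
    "titlefront": "Prof. Dr.",
}
incoming = to_user_fields(record, LECTURER_FIELDS)
stored = {"email": "lecturer3@example.com", "last_name": "Doe", "first_name_given": "Jane", "title": "Dr."}
changes = name_changes(stored, incoming)
```

The gguid is deliberately not copied into the user fields here, since the issue only uses it to resolve references inside the same file.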


Example

{
    "students": [
        {
            "gguid": "0x1",
            "email": "[email protected]",
            "name": "1",
            "christianname": "1"
        },
        {
            "gguid": "0x2",
            "email": "[email protected]",
            "name": "2",
            "christianname": "2"
        }
    ],
    "lecturers": [
        {
            "gguid": "0x3",
            "email": "[email protected]",
            "name": "3",
            "christianname": "3",
            "titlefront": "Prof. Dr."
        },
        {
            "gguid": "0x4",
            "email": "[email protected]",
            "name": "4",
            "christianname": "4",
            "titlefront": "Dr."
        }
    ],
    "events": [
        {
            "gguid": "0x5",
            "lvnr": "1",
            "title": "Prozessorientierte Informationssysteme",
            "title_en": "Process-oriented information systems",
            "type": "Vorlesung",
            "isexam": false,
            "courses": [
                {
                    "cprid": "BA-Inf",
                    "scale": "GRADE_PARTICIPATION"
                },
                {
                    "cprid": "MA-Inf",
                    "scale": "GRADE_PARTICIPATION"
                }
            ],
            "relatedevents": {
                "gguid": "0x6"
            },
            "appointments": [
                {
                    "begin": "15.04.2024 10:15",
                    "end": "15.07.2024 11:45"
                }
            ],
            "lecturers": [
                {
                    "gguid": "0x3"
                }
            ],
            "students": [
                {
                    "gguid": "0x1"
                },
                {
                    "gguid": "0x2"
                }
            ]
        },
        {
            "gguid": "0x6",
            "lvnr": "2",
            "title": "Prozessorientierte Informationssysteme",
            "title_en": "Process-oriented information systems",
            "type": "Klausur",
            "isexam": true,
            "courses": [
                {
                    "cprid": "BA-Inf",
                    "scale": "GRADE_TO_A_THIRD"
                },
                {
                    "cprid": "MA-Inf",
                    "scale": "GRADE_TO_A_THIRD"
                }
            ],
            "relatedevents": {
                "gguid": "0x5"
            },
            "appointments": [
                {
                    "begin": "29.07.2024 10:15",
                    "end": "29.07.2024 11:45"
                }
            ],
            "lecturers": [
                {
                    "gguid": "0x3"
                },
                {
                    "gguid": "0x4"
                }
            ],
            "students": [
                {
                    "gguid": "0x1"
                },
                {
                    "gguid": "0x2"
                }
            ]
        }
    ]
}
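
As a sketch of how the references in this example could be resolved: the snippet below uses a trimmed-down copy of the data (placeholder email addresses, fewer attributes) and plain dicts in place of models. The helper `emails` and the result structure are hypothetical. Note that the main event also carries a "relatedevents" entry pointing at its exam; the sketch only follows that link for exams.

```python
data = {
    "students": [{"gguid": "0x1", "email": "student1@example.com"}],
    "lecturers": [
        {"gguid": "0x3", "email": "lecturer3@example.com"},
        {"gguid": "0x4", "email": "lecturer4@example.com"},
    ],
    "events": [
        {
            "gguid": "0x5", "isexam": False, "relatedevents": {"gguid": "0x6"},
            "title": "Prozessorientierte Informationssysteme",
            "title_en": "Process-oriented information systems",
            "lecturers": [{"gguid": "0x3"}], "students": [{"gguid": "0x1"}],
        },
        {
            "gguid": "0x6", "isexam": True, "relatedevents": {"gguid": "0x5"},
            "title": "Prozessorientierte Informationssysteme",
            "title_en": "Process-oriented information systems",
            "lecturers": [{"gguid": "0x3"}, {"gguid": "0x4"}], "students": [{"gguid": "0x1"}],
        },
    ],
}

# Index users by gguid so that event references can be resolved in O(1).
students = {entry["gguid"]: entry for entry in data["students"]}
lecturers = {entry["gguid"]: entry for entry in data["lecturers"]}


def emails(refs: list, lookup: dict) -> list:
    """Resolve a list of {"gguid": ...} references to email addresses."""
    return [lookup[ref["gguid"]]["email"] for ref in refs]


courses, evaluations = {}, []
for event in data["events"]:
    is_exam = event["isexam"]
    if not is_exam:
        # A non-exam event defines the Course and its main Evaluation.
        courses[event["gguid"]] = {
            "cms_id": event["gguid"],
            "name_de": event["title"],
            "name_en": event["title_en"],
            "responsibles": emails(event["lecturers"], lecturers),
        }
    evaluations.append({
        "cms_id": event["gguid"],
        "course_cms_id": event["relatedevents"]["gguid"] if is_exam else event["gguid"],
        "name_de": "Klausur" if is_exam else "",
        "name_en": "Exam" if is_exam else "",
        "contributors": emails(event["lecturers"], lecturers),
        "participants": emails(event["students"], students),
    })
```

A dangling gguid raises a `KeyError` in this sketch, which fits the "no sanity checks, but report errors by email" approach discussed in the comments.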
@janno42 janno42 added [C] Backend Focuses on backend implementation [S] Big This issue might require fundamental changes and will probably require a lot of work to solve. [P] Major Major priority labels Feb 27, 2024
@richardebeling
Member

Thoughts / Questions / Ideas

  • Ambiguity in emails: We should probably filter emails through clean_email to have INSTITUTION_EMAIL_REPLACEMENTS applied?

  • Currently only implied; maybe specify more explicitly: We update UserProfiles of contributors and students with the values from the JSON file.

    • I could see conflicts with Update user data on login #2096 if SSO and CMS have different names stored. Maybe we should close it (depending on when we expect the CMS to go up)
  • Removal of Evaluations / Courses: Those will simply not be contained anymore in the JSON, so we would just delete all courses/evaluations of the import semester whose cms_id is not contained in the json file?

  • The logged attempted changes for state >= APPROVED should probably be grouped by course/evaluation.

  • We treat their gguid values as blobs that must bit-match to be identical? Then, for us, 0xf and 0xF would be two different values. Especially asking because I'd expect them to send RFC4122-compliant UUIDs where capitalization doesn't matter. In the worst case, we would delete a course and recreate it with a differently capitalized cms_id.

  • In general: We expect high-quality data (e.g. trimmed attributes). We trust the CMS to not mess up our data and we do not perform sanity checks.

    • The logged actions would allow us to manually undo most damage, except for deletions, where the data would be irrecoverably gone.
    • An import run could possibly delete (almost) all courses/evaluations of the semester during the preparation phase. We accept that? Possible counter-measures could be:
      • Require manual user confirmation if some modification/deletion threshold is reached
      • Make a backup of the courses/evaluations/contributions of the semester before the import

@janno42
Member Author

janno42 commented Feb 27, 2024

  • Ambiguity in emails: We should probably filter emails through clean_email to have INSTITUTION_EMAIL_REPLACEMENTS applied?

yes, this logic needs to be applied here as well
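
For readers unfamiliar with that helper, the following is an approximation of the normalization meant here, not the project's actual code. It assumes INSTITUTION_EMAIL_REPLACEMENTS is a list of (alias domain, canonical domain) pairs; the domains shown are placeholders. The importer itself should call the existing clean_email rather than reimplement it.

```python
# Assumed shape: (alias_domain, canonical_domain) pairs; values are placeholders.
INSTITUTION_EMAIL_REPLACEMENTS = [("@alias.example.com", "@example.com")]


def clean_email(email: str) -> str:
    """Stand-in for the existing helper: normalize, then map alias domains."""
    email = email.strip().lower()
    for original_domain, replaced_domain in INSTITUTION_EMAIL_REPLACEMENTS:
        if email.endswith(original_domain):
            return email[: -len(original_domain)] + replaced_domain
    return email


cleaned = clean_email(" Jane.Doe@Alias.Example.com ")
```

Applying this before looking up a UserProfile by email would avoid creating a second account for someone whose address arrives under an alias domain.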

  • I could see conflicts with Update user data on login #2096 if SSO and CMS have different names stored. Maybe we should close it (depending on when we expect the CMS to go up)

closed

  • Removal of Evaluations / Courses: Those will simply not be contained anymore in the JSON, so we would just delete all courses/evaluations of the import semester whose cms_id is not contained in the json file?

yes, but only if they have a cms_id set (so we won't delete evaluations that have been created manually)

  • The logged attempted changes for state >= APPROVED should probably be grouped by course/evaluation.

yes, please
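
A small sketch of such grouping, assuming attempted changes are recorded as (cms_id, field, old, new) tuples. That tuple layout and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical log entries: (cms_id, field name, old value, new value).
attempted = [
    ("0x6", "vote_end_date", "2024-08-01", "2024-08-02"),
    ("0x5", "name_en", "Old", "New"),
    ("0x6", "participants", 2, 3),
]


def group_by_evaluation(entries: list) -> dict:
    """Group attempted changes by the cms_id of the evaluation they target."""
    grouped = defaultdict(list)
    for cms_id, field_name, old, new in entries:
        grouped[cms_id].append(f"{field_name}: {old!r} -> {new!r}")
    return dict(grouped)


grouped = group_by_evaluation(attempted)
```

Dicts preserve insertion order, so evaluations appear in the order in which their first blocked change was encountered.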

  • We treat their gguid values as blobs that must bit-match to be identical? Then, for us, 0xf and 0xF would be two different values. Especially asking because I'd expect them to send RFC4122-compliant UUIDs where capitalization doesn't matter. In the worst case, we would delete a course and recreate it with a differently capitalized cms_id.

we can assume them to be standard-compliant case-insensitive UUIDs for now
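
Under that assumption, one way to avoid the delete-and-recreate scenario is to normalize the case once, both when storing a cms_id and when looking one up. The helper name `normalize_gguid` is hypothetical, and choosing lowercase over uppercase is arbitrary.

```python
def normalize_gguid(gguid: str) -> str:
    """Return a canonical lowercase form so that 0xf and 0xF compare equal."""
    return gguid.lower()


# cms_ids already stored, normalized at write time.
existing_cms_ids = {normalize_gguid("0xF")}
is_known = normalize_gguid("0xf") in existing_cms_ids
```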

  • In general: We expect high-quality data (e.g. trimmed attributes). We trust the CMS to not mess up our data and we do not perform sanity checks.

yes (for now)

  • An import run could possibly delete (almost) all courses/evaluations of the semester during the preparation phase.

Do you mean if, for some reason, the CMS doesn't export them correctly anymore? Courses/evaluations are only added to the JSON file when there are enrollments for them, so there should be almost no cases where they are manually deleted from that point on. Then let's say that for all deletions we only notify by email and expect the managers to do the deletion manually.
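
Combined with the earlier answer about cms_id, the removal step could then look roughly like this: collect candidates, skip anything created manually, and only report them. Plain dicts stand in for Courses/Evaluations; the function name and sample data are hypothetical.

```python
def removal_candidates(existing: list, imported_cms_ids: set) -> list:
    """Objects of the import semester that have a cms_id which is missing from the file."""
    return [
        obj
        for obj in existing
        if obj["cms_id"] is not None and obj["cms_id"] not in imported_cms_ids
    ]


existing = [
    {"name": "Manually created seminar", "cms_id": None},  # never touched by the importer
    {"name": "Process-oriented information systems", "cms_id": "0x5"},
    {"name": "Dropped lecture", "cms_id": "0x7"},
]
candidates = removal_candidates(existing, imported_cms_ids={"0x5", "0x6"})

# Nothing is deleted here; the lines only go into the notification email.
notification_lines = [
    f"No longer in import file, please check: {obj['name']} ({obj['cms_id']})"
    for obj in candidates
]
```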
