[DEPR]: DraftModuleStore (Old Mongo Modulestore) #62

Closed
6 tasks
feanil opened this issue Mar 10, 2022 · 16 comments
Assignees
Labels
depr Proposal for deprecation & removal per OEP-21

Comments

@feanil

feanil commented Mar 10, 2022

CURRENT STATUS (as of the QUINCE release)

Most functionality has been removed from the long-deprecated Old Mongo Modulestore. This was the old system for storing and accessing course content, used by courses with Org/Course/Run style identifiers, e.g. MITx/6.002x/2012_Fall. All end-user access to this course content was removed with the Nutmeg release, so any course actively being run in the Nutmeg release or later should be unaffected.

To preserve compatibility with old installations, some limited Old Mongo functionality remains:

  • Files that were uploaded to these old courses (e.g. images, PDFs) will remain accessible. This is partly to preserve older certificates which relied on downloading assets from their originating course.
  • The metadata stored at the root CourseBlock is still accessible in a read-only way. This includes things like the course title, start and end dates, and various settings.

More details below:


Original Ticket

The DraftModuleStore (also sometimes referred to as "Old Mongo") is the interface used to store courseware content for courses with course run keys of the format "Org/Course/Run", e.g. "edX/Demo/2012". Newer courses of the format "course-v1:Org+Course+Run" use the DraftVersioningModuleStore (also commonly called "Split Mongo"). Split Mongo has been available since the Birch release, and the vast majority of sites have never used DraftModuleStore. One of the main motivations for the newer modulestore was atomic publishes, as it was possible to get partially applied updates if there was an error during the import process with DraftModuleStore.
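
As a rough illustration of the two key formats described above, the following sketch classifies a course key string by its shape. The helper name `course_key_style` and its return labels are hypothetical and are not part of edx-platform or opaque-keys; real code should parse keys with the opaque-keys library rather than with string matching.

```python
import re

# Hypothetical helper for illustration only. It looks purely at the shape of
# the string and does not validate that the course actually exists.
OLD_STYLE = re.compile(r"^[^/+:\s]+/[^/+:\s]+/[^/+:\s]+$")  # Org/Course/Run
NEW_STYLE_PREFIX = "course-v1:"                              # course-v1:Org+Course+Run


def course_key_style(key: str) -> str:
    """Return 'old_mongo', 'split', or 'unknown' based on the key's format."""
    if key.startswith(NEW_STYLE_PREFIX):
        parts = key[len(NEW_STYLE_PREFIX):].split("+")
        # A new-style key needs exactly three non-empty parts.
        if len(parts) == 3 and all(parts):
            return "split"
        return "unknown"
    if OLD_STYLE.match(key):
        return "old_mongo"
    return "unknown"


if __name__ == "__main__":
    for example in ("edX/Demo/2012", "course-v1:edX+Demo+2012"):
        print(example, "->", course_key_style(example))
```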

Because DraftModuleStore has a very different data structure from DraftVersioningModuleStore, supporting both formats simultaneously has resulted in significant complexity, bugs, and performance issues going as far back as the initial implementation of cohorts. There are also thousands of extra tests run specifically to ensure compatibility across ModuleStores.

Some sites intentionally continued to use DraftModuleStore because of storage-related concerns. DraftVersioningModuleStore did not free up disk space used by old versions of the content. However, this problem has been addressed with tubular's structures.py script, which can be run on a regular basis to prune unused old versions of course content.

Proposal

  • Juniper will display a message in Studio for all courses using DraftModuleStore, saying that this course format will no longer be supported, and urging people to create a re-run of the course (which will make a copy of that course in the DraftVersioningModuleStore).
  • Koa will remove all support for DraftModuleStore. This will involve removing or modifying thousands of tests, as well as removing the MixedModulestore proxy class.
  • Course Overviews using the old course format should still be supported. Old-style courses should not suddenly disappear from your list of enrollments, but any attempt to access courseware content within them (learning sequences, files and uploads) will fail.
  • For the Koa release, there will be no Studio access at all for DraftModuleStore, and it will not be possible to do a data export from Studio for these courses.

Note: In previous conversations, I had discussed the possibility of having a data migration that would convert DraftModuleStore course content into DraftVersioningModuleStore courses while preserving IDs. I created a proof of concept for this approach as a hackathon project. This has the upside of letting us get rid of a chunk of the old code without giving up compatibility, but it also had a number of strong drawbacks, including:

  1. We would have to maintain a large set of tests that used both ID formats for course keys.
  2. It would subtly change opaque keys such that two different keys would serialize identically–we would be adding course-run and version information to "i4x:..." style keys, but for data compatibility reasons we would have to serialize without that information. This has implications for course key caching and would make debugging much trickier–the newer modulestore itself derives its own keys to pass around, and we’d end up in a spaghetti of keys which sometimes have or don’t have run and version information.
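
The second drawback can be shown with a small sketch. The class below is a hypothetical, simplified stand-in for an "i4x:..." style key and is not the opaque-keys implementation; the field names and the serialized layout are assumptions made for illustration. It shows how two keys that differ only in run information compare as different objects but collide once serialized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchUsageKey:
    """Hypothetical, simplified key; not the real opaque-keys class."""
    org: str
    course: str
    category: str
    name: str
    run: Optional[str] = None  # extra information a migration would attach

    def serialize(self) -> str:
        # The run is deliberately left out so the string stays compatible
        # with data that was stored before the migration.
        return f"i4x://{self.org}/{self.course}/{self.category}/{self.name}"


with_run = SketchUsageKey("MITx", "6.002x", "problem", "p1", run="2012_Fall")
without_run = SketchUsageKey("MITx", "6.002x", "problem", "p1")

# A cache keyed by the serialized form cannot tell the two keys apart:
# the second write silently overwrites the first.
cache = {}
cache[with_run.serialize()] = "content for 2012_Fall"
cache[without_run.serialize()] = "content with no run"
```

Here `with_run != without_run`, yet both serialize to the same string, so the cache ends up with a single entry.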

Because usage of the old modulestore outside of edX itself is limited and we did not want to introduce any more complexity to what is already a major source of bugs in edx-platform, this DEPR is going the simpler route of removing support altogether.

Compatibility notes

  1. We won't explicitly delete the old course content. If an Open edX site upgrades to Koa without realizing the implications and rolls back to Juniper, their DraftModuleStore content should still be there.
  2. We won't delete any student course state for these courses, so module state in courseware_studentmodule will be preserved.
  3. The relevant key types (SlashSeparatedCourseKeys and Locations) will not be removed from opaque-keys.
  4. Our goal would be to preserve the functionality of other pages that are not directly courseware, such as the student dashboard. However some functionality is so dependent on the modulestore’s existence that it’s likely they will be disabled for these courses rather than trying to port them to work without a backing modulestore. While more investigation needs to be done, it is likely that most if not all of the Instructor Dashboard for old-style courses will be unavailable in Koa.

Additional Info

Original Jira Issue: https://openedx.atlassian.net/browse/DEPR-58

Useful comments

From Mike Terry

Small update on the slow shuttering of access to these courses. I’ve landed a couple fixes that will slow down incoming enrollments:

  • Tests default to split store, instead of Old Mongo.
  • Old Mongo courses are marked as hidden (they no longer show up in prospectus searches, but they do have a page there still that ends up in search engines)
  • Old Mongo courses are invitation only
  • Shortly, I’m going to also mark Old Mongo courses as non-marketable so that prospectus won’t even generate a page for them.

And after enrollment stops, at some point we’ll turn off all access entirely to learners.

Approach

  • Remove user access
  • Incrementally remove functionality, to reduce rollout risk

In order to preserve CourseOverview metadata and further reduce risk, the end state is not a complete removal of Old Mongo, but instead a really limited version that only implements has_course and the ability to read from the root CourseBlock.

Implementation Tickets

@github-actions github-actions bot added the depr Proposal for deprecation & removal per OEP-21 label Mar 10, 2022
@ormsbee

ormsbee commented Mar 16, 2022

An interesting implementation point came up in a forum discussion related to deleting courses (and how poorly supported that is).

The long-term architectural goal is for CourseOverviews (or something like it) to represent course metadata independent of the content storage represented by ModuleStore. In other words, things like enrollments, student state, certificates, etc. should continue to work, even if there is no data in the ModuleStore for that course content. Right now, that is not the case. CourseOverviews act almost as materialized caches for CourseBlocks, and need to reach back into the root CourseBlock to regenerate whenever there is a version bump to CourseOverview fields. There are certain other fields where CourseOverview isn't even a cache, and always delegates back to the CourseBlock it gets from ModuleStore.

For a while, I assumed that transition would happen as part of this DEPR effort–because this DEPR means that we'd be removing Old Mongo as a storage mechanism. But I've come around to the idea that we could do this DEPR with less risk by lobotomizing Old Mongo, instead of completely killing it. We could leave just enough of it behind to find a course (has_course) and return a CourseBlock object when something calls get_course on it, and then remove everything else–publishing, caching, parent-child relations, old-to-split migration code, the vast majority of tests, etc. That would give us the bare minimum needed to rebuild CourseOverviews when new fields are added to that model, and would decouple this DEPR from the higher risk changes required to change the data lifecycle of how that model is generated.
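
A minimal sketch of what such a reduced store could look like, assuming an in-memory dictionary in place of MongoDB. The class name, constructor, and the choice of which write methods to stub out are hypothetical and are not taken from edx-platform; only `has_course` and `get_course` are named in the discussion above.

```python
from types import MappingProxyType


class MinimalOldMongoSketch:
    """Hypothetical read-only store: find a course and read its root block."""

    def __init__(self, root_blocks):
        # Maps a course key string to a dict of root CourseBlock fields.
        self._root_blocks = {key: dict(fields) for key, fields in root_blocks.items()}

    def has_course(self, course_key):
        return course_key in self._root_blocks

    def get_course(self, course_key):
        fields = self._root_blocks.get(course_key)
        if fields is None:
            return None
        # Hand back a read-only view so callers cannot modify the metadata.
        return MappingProxyType(fields)

    def update_item(self, *args, **kwargs):
        # Stands in for all removed write functionality.
        raise NotImplementedError("Old Mongo content is read-only in this sketch")

    def publish(self, *args, **kwargs):
        raise NotImplementedError("Old Mongo content is read-only in this sketch")


store = MinimalOldMongoSketch(
    {"MITx/6.002x/2012_Fall": {"display_name": "Circuits and Electronics"}}
)
```

With this shape, something like a CourseOverview rebuild could still call `store.get_course(...)` to read the title and settings, while any attempt to write fails loudly.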

It would also potentially let us explore other options for how we want to separate catalog-level metadata from course content data. CourseOverview is the most readily available mechanism, but it has its own issues around regeneration, overrides, locking, data dependency, etc. Decoupling that longer term question from this DEPR may give us more space to design a robust solution.

@kdmccormick
Member

@mikix @ormsbee do you mind being the assignees of this issue?

@ormsbee ormsbee self-assigned this Mar 17, 2022
@jmyatt

jmyatt commented Mar 29, 2022

update: we plan to merge changes to shut off LMS access to old mongo with this PR on Apr 4, 2022 prior to the Nutmeg release branch being cut

@mikix

mikix commented Apr 4, 2022

That PR landed - learner access to old mongo courses should be removed now. Hopefully opening the floodgates to further old mongo deprecation.

@feanil
Author

feanil commented Apr 5, 2022

@mikix thinking about the release notes for Nutmeg: if people do have old-mongo courses, they'll disappear when they start running Nutmeg, is that right? And if they want to recover those courses, they'll have to export them pre-Nutmeg and load them into new split course IDs. The student data won't carry over, so they should probably close the course and start funneling new students to a newer course if they're in this situation?

@mikix

mikix commented Apr 5, 2022

@feanil if they still have runs from 2015 in active use and have been ignoring the banner in several releases of Studio saying we would remove all support on March 1, 2021, then I still have good news! Studio support remains for now. They should still be able to export their course or rerun it.

My change just affects learner access.

The long term plan is to remove Studio access and the data, but I think that approach is in Ormsbee’s hands now.

@ormsbee

ormsbee commented May 11, 2022

Next steps:

  • Remove Studio access for Old Mongo courses.
  • Remove code related to Old -> Split migration + re-running Old Mongo courses as Split (note that split->split reruns will remain, so there are no UI changes).
  • Do an opportunistic removal of ModuleStoreEnum.Type.mongo in cases where both it and ModuleStoreEnum.Type.split are being tested in the same test (e.g. via ddt). Do not remove tests necessary for CourseOverviews compilation, meaning things that would read/write the root CourseBlock, get_course, or has_course.

Again, the goal is to get it down to a point where we can delete most of the tests and functionality but leave just a little bit on the edges so that things wanting to read the root course block and its settings can do so.

@ormsbee

ormsbee commented May 11, 2022

@feanil: After Raccoon Gang gets through their current slate of DEPR work, could we discuss adding Old Mongo in a follow-on phase? Is there a place where we're collecting potential work like that?

@feanil
Author

feanil commented May 12, 2022

@e0d this was not on the initial list, could it be added?

@e0d

e0d commented May 12, 2022

Sure, @feanil and @ormsbee any sense of the t shirt size of this work?

@ormsbee

ormsbee commented May 12, 2022

Weeks worth of work. I can take a stab at the first set of tickets, but there will likely be edge cases that I don't remember, which will necessitate further tickets. How detailed should I get on specifying those?

@e0d

e0d commented May 12, 2022 via email

ormsbee pushed a commit to openedx/edx-platform that referenced this issue Sep 14, 2022
This removes user-facing Studio edit support for Old Mongo courses
(courses that have a CourseKey of the format {org}/{course}/{run}).
This does not affect our normal courses, which have CourseKeys
starting with "course-v1:".

After this commit:

* Old Mongo courses will continue to appear on the Studio course
  listing page, but are not clickable.
* Any attempt to directly access an Old Mongo course in Studio via URL
  will fail with a 404 error.
* Course certificates will still be available for Old Mongo courses.
* Old Mongo courses will continue to be returned by CourseOverviews
  and get_course_summaries() calls.

We decided against removing Old Mongo courses from the listing entirely
because that would require a very expensive CourseOverviews query to
filter them out. Making that query more efficient would involve a
database migration to add appropriate indexing, which is something else
that we are looking to avoid. CourseOverviews are used everywhere in
the system, so we want to avoid changing how they work so that we can
minimize risk.

This is part of the Old Mongo Modulestore deprecation effort:
  openedx/public-engineering#62
ormsbee pushed a commit to openedx/edx-platform that referenced this issue Sep 19, 2022
This commit removes code that was used to copy Old Mongo courses into
new Split Mongo courses. This includes both the migrate_to_split
management command, as well as the backend code that would be invoked
to re-run Old Mongo courses as Split courses using Studio (the UI for
this was already removed in b429e55).

This is a part of the Old Mongo removal effort tracked in:
  openedx/public-engineering#62
@dianakhuang

This is currently being handled as a funded contribution project at tCRIL.

@ormsbee

ormsbee commented Sep 7, 2023

I am marking this work as complete with the merge of openedx/edx-platform#31134. The bulk of Old Mongo code and tests is now gone.

There are three parts that will be left for future cleanup, for various reasons:

Course Static Assets

Reading static assets is still supported, so that we don't break links to certificate-related assets that are stored in contentstore. The long term fix for this is to migrate those files into credentials.

Read access for the root CourseBlock

The root CourseBlock is used in various places to get high level metadata about a course (name, settings, etc.), and is also used when re-building CourseOverviews. See this comment for more details.

MixedModuleStore/DraftModuleStore

Because we still have some access to the old ModuleStore, we need to maintain the MixedModuleStore facade.

@ormsbee ormsbee closed this as completed Sep 7, 2023
@dianakhuang

@ormsbee could you write up the release notes for Quince on this ticket? https://openedx.atlassian.net/wiki/spaces/COMM/pages/3726802953

@ormsbee

ormsbee commented Nov 2, 2023

@dianakhuang: Done.
