
API Loader - Optional deletion of existing molecules during Target upload. #26

Open
duncanpeacock opened this issue Jan 28, 2021 · 0 comments
Labels
enhancement New feature or request


@duncanpeacock
Owner

Source: Frank - Meeting 28/02/2021

Problem:

Currently, when a target set is uploaded, processing upserts the molecules in the target-set file and automatically deletes any existing molecules that are not in the file. There is concern about this automatic deletion of molecules from a usability/tracking perspective.

Proposed Solution:

Diamond would like the deletion to be made optional.

  1. A "delete molecules not in target file" flag could be added to the upload_tset page, unchecked by default (forcing the user to make a decision).
  2. If this flag is unchecked, the molecules would not be deleted.
  3. A list of the affected molecules would be provided in an exception list sent to the email address/upload results page.
  4. Alternatively, the upload results page could provide a link to a list of the molecules affected.
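The flow above could be sketched roughly as below. This is a minimal illustration only, assuming a simple comparison of molecule codes; the names (`process_target_set`, `delete_missing`) are hypothetical and not the actual loader API:

```python
def process_target_set(uploaded_codes, existing_codes, delete_missing=False):
    """Return (codes_to_upsert, codes_to_delete, exception_list).

    uploaded_codes: molecule codes present in the target-set file
    existing_codes: molecule codes currently stored for the target
    delete_missing: the proposed "delete molecules not in target file"
                    checkbox, unchecked (False) by default
    """
    # Molecules in the DB but absent from the uploaded file
    missing = set(existing_codes) - set(uploaded_codes)
    if delete_missing:
        # Current behaviour: delete anything not in the file
        return list(uploaded_codes), sorted(missing), []
    # Proposed default: keep the molecules, report them as exceptions
    return list(uploaded_codes), [], sorted(missing)
```

With the flag unchecked, nothing is deleted and the missing molecules surface in the exception list instead.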

Questions/Thoughts

  1. Concerned about knock-on effects of removing the link between the zip file and targets, as it's a basic relationship in the loading process. Need to check that all the follow-on processing still works and the files in the database are correct - specific concerns:
    a) The metadata.csv provided with the upload is used to create further files: hits_ids, sites and alternate names. What to do with these? metadata.csv will also now be a partial upload, so the other files will need to be upserted rather than deleted and recreated (to maintain the link with the target dataset). This makes it more complicated.
    b) It might be good to check whether we still need these files, or whether the data should now be stored as database tables instead - if so, better to spend the time doing the job properly?
    c) I'm also a bit concerned about the .zip files that were uploaded and will be downloadable at the end of the process. The downloadable file will need to reflect the whole target, and we probably need to store the uploaded file for comparison purposes.
  2. Do we also want the validate option to pick these up as exceptions? I would say yes from a user's perspective, but it's probably a couple of hours' extra work.
  3. The email is currently sent from a configured Gmail account - for Janssen we need to make this configurable for the Janssen email system. For Diamond, it could also optionally be configured to use the STFC mailer.
  4. The link giving a list of molecules that are not in the file could fall out of the .zip processing, i.e. a downloadable link to not-loaded.zip? It won't be stored in the database, so we only know by comparing the uploaded file with what is in the DB.
  5. It also occurs to me that we are only comparing the current file with what's in the DB - so effectively only the last update. If we are making database changes, we could also consider a fuller history - or will this be picked up as part of the data provenance project in the future?
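The not-loaded.zip idea in point 4 could be as simple as bundling the missing molecule codes into an in-memory archive served behind the download link. A minimal sketch; the helper name and the not-loaded.txt layout inside the archive are assumptions, not part of the existing loader:

```python
import io
import zipfile


def build_not_loaded_zip(missing_codes):
    """Bundle the codes of molecules absent from the upload into an
    in-memory zip, suitable for offering as a one-off download link
    (the list is not stored in the database)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # One code per line; sorted for a stable, readable listing
        zf.writestr("not-loaded.txt", "\n".join(sorted(missing_codes)))
    buf.seek(0)
    return buf
```

Because the archive is built on the fly from the file-vs-DB comparison, it sidesteps the need to persist the exception list anywhere.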
@duncanpeacock duncanpeacock added the enhancement New feature or request label Jan 28, 2021