
WP4 - API to upload target dataset - followup actions #21

Open
duncanpeacock opened this issue Nov 13, 2020 · 7 comments
Labels: enhancement (New feature or request), Medium Priority

Comments

@duncanpeacock (Owner)

This is a placeholder for follow-up actions to implement the data loader API now that the first version (Minimum Viable Product) has been merged.

  1. Implementation actions: covalent etc. functionality should be added to the data validation/creation repo.
  2. Are any changes required for the fragalysis loader? (See version 1.)
  3. Final status of authentication/authorisation (addition of owner-id?).
  4. Do we need a status field for the target during upload, so that the React front end "knows" that the target is being updated, or that the target is temporarily removed from the list of targets?
@duncanpeacock
Owner Author

duncanpeacock commented Nov 20, 2020

Initial requests for changes from discussions on the upload target set functionality:

  1. The upload screen should make it clear to the user that they can close the browser window and the upload will continue.
  2. It should also make clear that they can save the link and come back to check progress later.
  3. Add functionality to send an email when upload/validation is complete, with any relevant details.
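For item 3, a minimal sketch of the notification content is below. The function name and parameters (`completion_email`, `target_name`, `status`, `link`) are illustrative, not repo code; actual sending would go through whatever mail backend the stack provides.

```python
def completion_email(target_name, status, link):
    """Compose the subject and body for an upload/validation completion email.

    Hypothetical helper: names and message wording are assumptions, not the
    actual implementation.
    """
    subject = "Target upload %s: %s" % (status, target_name)
    body = ("Your upload of target set '%s' has finished with status '%s'.\n"
            "You can revisit the results at: %s" % (target_name, status, link))
    return subject, body
```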

@duncanpeacock
Owner Author

When running the data loader to load new targets, the inspirations seem to disappear. This is likely because references to primary keys are regenerated during the upload.
Change the new loader API so that the current references for computed sets linked to a target are saved and then restored when the target is uploaded.
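The save-and-restore idea can be sketched without the Django models. The shapes here are illustrative, not the repo's actual schema: `compounds` maps a computed-set entry to the molecules it was inspired by, and each molecule carries a stable `code` alongside its (regenerated) primary key.

```python
def snapshot_inspirations(compounds):
    """Record inspiration links by protein code, not primary key.

    compounds: {compound_name: [molecule dicts with 'id' and 'code' keys]}
    """
    return {name: [m['code'] for m in mols] for name, mols in compounds.items()}

def restore_inspirations(saved, molecules_by_code):
    """Rebuild the links against the freshly uploaded molecules.

    Codes missing after the re-upload are silently dropped here; the real
    implementation would probably want to log them.
    """
    return {name: [molecules_by_code[c] for c in codes if c in molecules_by_code]
            for name, codes in saved.items()}
```

Because the lookup key is the code rather than the id, the restored links point at the re-uploaded rows even though their primary keys changed.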

@duncanpeacock duncanpeacock added enhancement New feature or request Medium Priority labels Nov 27, 2020
@duncanpeacock
Owner Author

duncanpeacock commented Dec 17, 2020

Rachael has successfully tested the load. There is some remaining work and this can be tracked under the follow-up actions.

Agreed roadmap for the remaining target upload tasks following meeting on 17/12/2020:

  • It was confirmed that the current limitation of uploading one target dataset at a time is acceptable for now, as most users of fragalysis will be interested in one particular dataset. There is no pressing need for a mass loader; this can be revisited and budgeted for in future if necessary.
  • Testing will continue at Diamond as part of the preparations for rollout.
  • The current loader should be retained until the new loader code is fully operational and in use. At that point, at a suitable moment, it will be removed from the stack. This will be tracked as a separate issue on the fragalysis backend.

The remaining follow-up tasks on the loader to be tracked by this task are as follows:

  1. The upload screen should make it clear to the user that they can close the browser window and the upload will continue and that they can save the link and come back and look.
  2. Add functionality to send an email when upload/validation is complete with any relevant details.
  3. Fix the compound set problem (see below).

Problem

When a target set is reloaded, links to existing compound sets are broken.

Initial diagnosis is as follows:

Fragalysis backend repo:

In tasks.process_design_compound:

The inspirations field in the compound model is a many-to-many field linking to the molecules model/table.
The likely cause is that when a new target is uploaded, it wipes out the link, so the compound sets aren't visible.

Likely solution:

When a target set is uploaded, examine where molecules are removed/added and make sure that the many-to-many field is retained.

Place to start looking: target_set_upload.analyse_mols

for mol_id in ids:
    if mol_id not in [a['id'] for a in mol_group.mol_id.values()]:
        print(mol_id)
        this_mol = Molecule.objects.get(id=mol_id)
        mol_group.mol_id.add(this_mol)

Is the many-to-many field correct after the reload? Otherwise it needs to be saved and replaced.

@duncanpeacock duncanpeacock self-assigned this Jan 14, 2021
@duncanpeacock
Owner Author

Analysis

When the proteins are loaded, existing proteins with alternate names are actually deleted and recreated rather than updated. This changes the id and breaks links. This has been confirmed by running the Mpro upload multiple times. The number of proteins stays the same, but the auto-incremented id increases each time by 295.

The problem is caused by the update to Protein.code that is made when the Protein has an alternate name.

The processing is as follows (all in target_set_upload.py - but also existing in the current loader):

  1. Protein.code initially comes from the directory in the aligned folder, e.g. Mpro-x1101_0A.
  2. New proteins with these codes are written in the function add_prot
  3. Proteins for the target where the code is not in the list of folders are removed (remove_not_added function).
  4. If the aligned folder also contains a metadata.csv file, this is processed and any alternate names are written to the alternate_names.csv file.
  5. Then later on, in function rename_mol, Protein.code is modified as follows:
    new_name = str(mol_target).replace('_0', '') + ':' + str(alternate_name).strip()

This produces different results depending on whether the folder name has "_0" in it or not:
e.g.
Mpro-x0072A_0A becomes: Mpro-x0072A:AAR-POS-d2a4d1df-1
Mpro-x1101_0A becomes: Mpro-x1101A:AAR-POS-0daf6b7e-40
Mpro-x1101_1A becomes: Mpro-x1101_1A:AAR-POS-0daf6b7e-40
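The inconsistency can be reproduced from the renaming expression quoted above. `rename` here is a stand-in for the logic inside rename_mol, not the actual function name:

```python
def rename(mol_target, alternate_name):
    # Expression taken verbatim from the discussion: the '_0' strip fires for
    # names like Mpro-x1101_0A but not for Mpro-x1101_1A, which is the source
    # of the inconsistent results.
    return str(mol_target).replace('_0', '') + ':' + str(alternate_name).strip()
```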

  6. When the data is loaded again, the codes can't be found because they have been modified.

Our first attempt at a fix failed because we tried to use just the part up to the colon, but that only works when the whole folder name is in the key, not for the ones where the '_0' is stripped off, which is the normal situation.

Solution:

One possible solution is to:

  • Get the bit before the colon.
  • Do the replace '_0'
  • Use this to find initial molecules and update those.

At the moment, however, the remove_not_added function would fail. This can probably be fixed by doing the same thing in the remove_not_added function (or by making a list of the keys already matched and removing all the others).
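The proposed matching from the bullets above can be sketched as follows. `matches` is a hypothetical helper, not repo code: it takes the part of the stored Protein.code before the colon and compares it against the folder name both as-is and with the '_0' stripped.

```python
def matches(folder_name, stored_code):
    """Does a stored Protein.code originate from this aligned folder?

    stored_code looks like 'Mpro-x1101A:AAR-POS-0daf6b7e-40'; folder_name
    looks like 'Mpro-x1101_0A'. Covers both the stripped and unstripped cases.
    """
    base = stored_code.split(':', 1)[0]
    return base in (folder_name, folder_name.replace('_0', ''))
```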

Questions:

  • Is the "_0" replacement working as desired? The result for Mpro-x1101_1A looks a bit odd.
  • Are there knock-on effects?
  • What about the current loader - should I fix that too?

@duncanpeacock
Owner Author

Discussed with Frank:
Decision is to remove the code to replace '_0'.
Protein.code will always be of the form original folder:alternate_name, as in the last example:
Mpro-x1101_1A becomes: Mpro-x1101_1A:AAR-POS-0daf6b7e-40
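With the '_0' replacement removed, the renaming reduces to the sketch below (based on the expression quoted earlier from rename_mol; `rename` is a stand-in name, not the actual function):

```python
def rename(mol_target, alternate_name):
    # No '_0' stripping: the folder name is kept intact, so the same folder
    # always produces the same code on every upload.
    return str(mol_target) + ':' + str(alternate_name).strip()
```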

The current loader also needs to be fixed. Will raise/fix an issue on the fragalysis-loader repo.

@duncanpeacock
Owner Author

The data upload problem is solved as per my previous message.
Currently working on the other changes (screen comms/email notification).

@duncanpeacock
Owner Author

I also needed to change the compound set uploader so that when it checks protein.code, it checks up to ":" rather than "_", so it complies with the new names.
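The check reduces to taking everything before the first colon. `base_code` is a hypothetical helper name for illustration, not the uploader's actual code:

```python
def base_code(code):
    # Compare protein codes on the part before ':' rather than splitting on
    # '_', since the folder name itself may contain underscores.
    return code.split(':', 1)[0]
```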
