Multi-file submissions #1121

Open
rien opened this issue May 11, 2023 · 0 comments
Labels
enhancement New feature or request

rien commented May 11, 2023

We currently support analyzing only one file per submission. For larger projects, it would be useful to allow multiple files per submission.

A temporary workaround is to concatenate all files of a submission into a single file. This is not ideal, since the analysis then cannot show which file a plagiarized fragment belongs to.

Changes needed

A rough overview of the changes needed to implement this feature:

dolos-core / dolos-lib

To make plagiarism detection work over multiple files, we could essentially let Dolos perform the concatenation in a smart way. The changes in the library are mostly bookkeeping:

  • Rename File to Submission, such that Pair, SharedFingerprint, etc. now have references to submissions instead of files
  • Make a new File class that represents a single file; a Submission would then have a list of Files instead of one content string
  • Make the changes necessary to handle winnowing and indexing over multiple files. It does not make sense to build k-grams that span multiple files.
  • Make the changes necessary to handle the aggregation with matches that can occur over multiple files. Probably Region would need to be changed to also include the file that has been matched.
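
To illustrate the point about k-grams not spanning files, here is a minimal sketch. All type and function names are hypothetical and only mirror the proposal above; they are not the current Dolos API. The file type is called SourceFile here to avoid clashing with the global File type, and each k-gram records the file it came from, which is the kind of information Region would also need to carry.

```typescript
// Hypothetical types for illustration; not the existing Dolos classes.
interface SourceFile {
  path: string;
  tokens: string[];
}

interface Submission {
  id: string;
  files: SourceFile[];
}

interface Kgram {
  file: string;     // path of the file this k-gram was taken from
  start: number;    // index of the first token within that file
  tokens: string[]; // the k tokens themselves
}

// Build k-grams file by file, so that no k-gram crosses a file boundary.
function kgrams(submission: Submission, k: number): Kgram[] {
  const result: Kgram[] = [];
  for (const file of submission.files) {
    for (let i = 0; i + k <= file.tokens.length; i++) {
      result.push({
        file: file.path,
        start: i,
        tokens: file.tokens.slice(i, i + k),
      });
    }
  }
  return result;
}
```

Files shorter than k tokens simply contribute no k-grams in this sketch; whether that is the desired behaviour is an open choice.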

dolos-cli

  • Implement options to enable this feature
  • Implement how to detect which files belong together in a submission. Similar to "Support multiple submissions per student" (#1584), this could be achieved using directories or the CSV file (with a submission_id grouping files).
  • Make changes to the output files to be able to communicate the different files to the front-end:
    • Make a submissions.csv that includes the information now contained in files.csv, except file contents
    • Change files.csv to include a reference to submissions.csv
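
As a rough sketch of the grouping step, the function below assigns files to submissions using a submission_id when one is given and the first directory of the path otherwise. The FileRow shape and the fallback rule are assumptions made for illustration; the actual CSV columns and directory convention would still need to be decided.

```typescript
// Hypothetical row shape; `submission_id` is the optional grouping column
// suggested above, the rest is assumed for this example.
interface FileRow {
  path: string;           // e.g. "alice/src/main.py"
  submission_id?: string;
}

// Group file paths per submission. Files without a submission_id fall back
// to their first path component; a file without any directory becomes its
// own submission.
function groupFiles(rows: FileRow[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const row of rows) {
    const slash = row.path.indexOf("/");
    const directory = slash === -1 ? row.path : row.path.slice(0, slash);
    const key = row.submission_id ?? directory;
    const files = groups.get(key) ?? [];
    files.push(row.path);
    groups.set(key, files);
  }
  return groups;
}
```

The resulting keys could serve as the identifiers in submissions.csv, with each row of files.csv referring back to one of them.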

dolos-web

  • Make changes to the API stores to parse and store the new format correctly
  • Edit the submissions page with a way to browse a submission's files

Pairwise Comparison

This part is probably the most complex, as this page has a lot going on already and as there is (to my knowledge) no out-of-the-box support for multiple files with the code editor that we use (Monaco).

I think the easiest way is to sort the files by name for each submission (as we expect similar file names for each submission, and that way these similar files are close together), concatenate them in the browser and add a "file separator" marker between subsequent files.
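
A minimal sketch of that browser-side concatenation could look as follows. The separator format and all names are assumptions for illustration. Besides the combined text, it records the line at which each file starts, so that a region reported relative to a file can be shifted to the right place in the combined view.

```typescript
// Hypothetical shapes for illustration only.
interface NamedFile {
  path: string;
  content: string;
}

interface Combined {
  text: string;
  // 0-based line in `text` where each file's own content starts
  offsets: Map<string, number>;
}

// Sort files by name, then join them with a separator line before each file.
function concatenate(files: NamedFile[]): Combined {
  const sorted = [...files].sort((a, b) => a.path.localeCompare(b.path));
  const lines: string[] = [];
  const offsets = new Map<string, number>();
  for (const file of sorted) {
    lines.push(`===== ${file.path} =====`); // assumed separator format
    offsets.set(file.path, lines.length);
    lines.push(...file.content.split("\n"));
  }
  return { text: lines.join("\n"), offsets };
}
```

A match on line r of a file would then be shown on line offsets.get(path) + r of the combined text.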

If we want to go more advanced, it should be possible to make a small file browser for each side, with a similarity percentage next to each file. This would require calculating the similarity per file, which might have some caveats as well. I do not think we want to implement this advanced version right away, but keeping it in mind while implementing is probably a good idea.
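
One possible way to define such a per-file similarity, purely as an assumption for discussion, is the fraction of a file's distinct fingerprints that also occur anywhere in the other submission. The names below are hypothetical and the definition is not taken from Dolos.

```typescript
// Hypothetical shape: the fingerprint hashes selected for one file.
interface FileFingerprints {
  path: string;
  hashes: number[];
}

// For every file on the left side, compute the share of its distinct
// fingerprints that also appear in any file of the right submission.
function fileSimilarities(
  left: FileFingerprints[],
  right: FileFingerprints[],
): Map<string, number> {
  const other = new Set<number>();
  for (const file of right) {
    for (const hash of file.hashes) {
      other.add(hash);
    }
  }
  const result = new Map<string, number>();
  for (const file of left) {
    const own = new Set(file.hashes);
    let shared = 0;
    own.forEach((hash) => {
      if (other.has(hash)) shared++;
    });
    // Files without fingerprints get similarity 0 to avoid dividing by zero.
    result.set(file.path, own.size === 0 ? 0 : shared / own.size);
  }
  return result;
}
```

This also hints at one caveat: a file can match several files on the other side at once, so the per-file values do not add up to the similarity of the pair as a whole.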
