Harvest 2020 Publications for HCRI #140

Open
5 of 19 tasks
rickjohnson opened this issue Feb 8, 2021 · 0 comments
rickjohnson commented Feb 8, 2021

The Office of Research wants a query and report of the publications produced by faculty members for a given time frame. For this project we're focusing on 2020 publications from the Harper Cancer Research Institute (HCRI), in order to capture and track publications for the next annual review cycle. We're looking at publications in PubMed, Scopus, and Web of Science. One challenge is disambiguation: we need to process a larger set of publications that might include the set of publications by our faculty researchers. The plan is to continue using CrossRef and DOI resolvers to augment the publication information. For each publication we need to establish a confidence rating; a higher rating means the publication is more likely to be from one of our faculty members.
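
As a rough illustration of the augmentation and confidence-rating steps (not the project's actual code; the field names, weights, and faculty-list format are assumptions), a CrossRef lookup plus a simple name/affiliation match might look like this:

```python
# Illustrative only: the field names, weights, and faculty-list format below
# are assumptions, not the project's actual disambiguation logic.
import requests

CROSSREF_WORKS = "https://api.crossref.org/works/{doi}"

def augment_from_crossref(doi):
    """Fetch publication metadata for a DOI from the public CrossRef REST API."""
    resp = requests.get(CROSSREF_WORKS.format(doi=doi), timeout=30)
    resp.raise_for_status()
    return resp.json()["message"]

def confidence_score(record, faculty):
    """Score how likely a harvested record belongs to an HCRI member.

    `record` has 'authors' (list of 'Family, Given' strings) and
    'affiliations' (list of strings); `faculty` is a list of dicts with
    'name_variants' and 'institution' keys. A higher score means a more
    likely match.
    """
    authors = [a.lower() for a in record["authors"]]
    affiliations = [a.lower() for a in record["affiliations"]]
    score = 0.0
    for member in faculty:
        if any(v.lower() in authors for v in member["name_variants"]):
            score += 0.6  # a known name variant appears in the author list
        if any(member["institution"].lower() in aff for aff in affiliations):
            score += 0.4  # an affiliation mentions the member's institution
    return min(score, 1.0)
```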

We then need library members to review these publications to verify that they were in fact written by HCRI faculty. Once a publication is reviewed, it is either added to the report or excluded from it. This task is to queue up publications in the administrative application for review.
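
One minimal way to model that review queue (names here are hypothetical, not the application's actual schema) is a per-publication status:

```python
from enum import Enum

class ReviewStatus(Enum):
    PENDING = "pending"    # harvested and queued for library review
    ACCEPTED = "accepted"  # verified HCRI authorship; included in the report
    EXCLUDED = "excluded"  # not HCRI authorship; omitted from the report
```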

The checklist below outlines the concrete steps to harvest and prepare publications for review. The process also includes updating the center's member list to add new researchers and remove those who are no longer members. As outlined below, the harvest is first performed and tested in a dev environment. Data is not re-fetched in the pilot environment; instead, the CSV files created in dev are used to push metadata into the pilot environment for review by library employees.

  • Rebuild dev environment with pilot env data
  • Load backup from pilot env data
  • In dev environment load updated_person list with newly updated csv files (includes an update to allow reloading data without adding duplicates); also includes running 'make load_authors' and 'make load_name_variances'
  • In dev environment fetch data from scopus to create relevant csv files
  • In dev environment fetch data from pubmed to create relevant csv files (a rough sketch of this step appears after the list)
  • In dev environment fetch data from web of science to create relevant csv files
  • In dev environment load scopus, pubmed, and wos data into DB and review for accuracy in logs
  • In dev environment run other related scripts, such as applying journal names, confidence ratings, etc. (will flesh out more later)
  • In dev environment open the Admin UI and confirm added publications appear correctly
  • In pilot env do git pull and restart services
  • In pilot env run 'make migrate' if needed
  • Copy csv data files to pilot env
  • In pilot env run 'make load_authors'
  • In pilot env run 'make load_name_variances'
  • In pilot environment fetch data from pubmed to create relevant csv files
  • In pilot environment fetch data from web of science to create relevant csv files
  • In pilot environment load scopus, pubmed, and wos data into DB and review for accuracy in logs
  • In pilot environment run other related scripts, such as applying journal names, confidence ratings, etc. (will flesh out more later)
  • In pilot environment open the Admin UI and confirm added publications appear correctly
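
For reference, the PubMed fetch steps above might look roughly like the sketch below, using the NCBI E-utilities. The search term, output columns, and file name are placeholders; the project's own fetch scripts may differ.

```python
# Rough sketch of a PubMed fetch via the NCBI E-utilities; the search term,
# output columns, and file name are placeholders, not the project's scripts.
import csv
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def fetch_pubmed_ids(term, year, retmax=200):
    """Return PubMed IDs for an affiliation search limited to one publication year."""
    params = {
        "db": "pubmed",
        "term": f"{term}[Affiliation] AND {year}[pdat]",
        "retmax": retmax,
        "retmode": "json",
    }
    resp = requests.get(f"{EUTILS}/esearch.fcgi", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

def summaries_to_csv(pmids, path):
    """Write basic metadata for the given PMIDs to a csv file for later loading."""
    if not pmids:
        return
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    resp = requests.get(f"{EUTILS}/esummary.fcgi", params=params, timeout=30)
    resp.raise_for_status()
    results = resp.json()["result"]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["pmid", "title", "journal", "pubdate"])
        for pmid in pmids:
            rec = results[pmid]
            writer.writerow([pmid, rec.get("title", ""),
                             rec.get("fulljournalname", ""),
                             rec.get("pubdate", "")])

if __name__ == "__main__":
    ids = fetch_pubmed_ids("Harper Cancer Research Institute", 2020)
    summaries_to_csv(ids, "pubmed_2020.csv")
```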