Commit

Update ingest test fixtures
yuming-long authored Oct 4, 2023
1 parent 404fb71 commit c61291b
Showing 12 changed files with 3,390 additions and 4,351 deletions.
@@ -57,7 +57,7 @@
"text": "Lisa Federer, MLIS, Data Science Training Coordinator"
},
{
"type": "Title",
"type": "NarrativeText",
"element_id": "7f56b84c46cb41ebdcec2c9ac8673d72",
"metadata": {
"data_source": {
@@ -115,7 +115,7 @@
},
{
"type": "ListItem",
"element_id": "d94c6241299e6eff20ee6499cb9f64de",
"element_id": "8f90f5970c85f335b1bf50af611ce5c5",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
@@ -130,10 +130,86 @@
"filetype": "application/pdf",
"page_number": 1
},
"text": "1. General biomedical subject matter knowledge: biomedical data scientists should have a general working knowledge of the principles of biology, bioinformatics, and basic clinical science; 2. Programming language expertise: biomedical data scientists should be fluent in at least one programming language (typically R and/or Python); 3. Predictive analytics, modeling, and machine learning: while a range of statistical methods may be useful, predictive analytics, modeling, and machine learning emerged as especially important skills in biomedical data science; 4. Team science and scientific communication: “soft” skills, like the ability to work well on teams and communicate effectively in both verbal and written venues, may be as important as the more technical skills typically associated with data science. 5. Responsible data stewardship: a successful data scientist must be able to implement best practices for data management and stewardship, as well as conduct research in an ethical manner that maintains data security and privacy."
"text": "1. General biomedical subject matter knowledge: biomedical data scientists should have a general working knowledge of the principles of biology, bioinformatics, and basic clinical science;"
},
{
"type": "UncategorizedText",
"type": "ListItem",
"element_id": "0b2857001b1a9eba5e46e26cba08e2ac",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
"version": 167189396509615428390709838081557906335,
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
},
"date_created": "2023-03-10T09:32:44+00:00",
"date_modified": "2023-03-10T09:32:44+00:00"
},
"filetype": "application/pdf",
"page_number": 1
},
"text": "2. Programming language expertise: biomedical data scientists should be fluent in at least one programming language (typically R and/or Python);"
},
{
"type": "ListItem",
"element_id": "c6be5389b7bd00746d39b7bac468dea0",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
"version": 167189396509615428390709838081557906335,
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
},
"date_created": "2023-03-10T09:32:44+00:00",
"date_modified": "2023-03-10T09:32:44+00:00"
},
"filetype": "application/pdf",
"page_number": 1
},
"text": "3. Predictive analytics, modeling, and machine learning: while a range of statistical methods may be useful, predictive analytics, modeling, and machine learning emerged as especially important skills in biomedical data science;"
},
{
"type": "ListItem",
"element_id": "1b8039583cbc15f654c89f2141eb6e10",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
"version": 167189396509615428390709838081557906335,
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
},
"date_created": "2023-03-10T09:32:44+00:00",
"date_modified": "2023-03-10T09:32:44+00:00"
},
"filetype": "application/pdf",
"page_number": 1
},
"text": "4. Team science and scientific communication: “soft” skills, like the ability to work well on teams and communicate effectively in both verbal and written venues, may be as important as the more technical skills typically associated with data science."
},
{
"type": "ListItem",
"element_id": "2f87757b1d497a32c077be543632ed7d",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
"version": 167189396509615428390709838081557906335,
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
},
"date_created": "2023-03-10T09:32:44+00:00",
"date_modified": "2023-03-10T09:32:44+00:00"
},
"filetype": "application/pdf",
"page_number": 1
},
"text": "5. Responsible data stewardship: a successful data scientist must be able to implement best practices for data management and stewardship, as well as conduct research in an ethical manner that maintains data security and privacy."
},
{
"type": "NarrativeText",
"element_id": "34b28172088bba51c6764df6d4e87674",
"metadata": {
"data_source": {
@@ -209,7 +285,7 @@
"text": "Core Skills for Biomedical Data Scientists"
},
{
"type": "Title",
"type": "NarrativeText",
"element_id": "4c5f925a7db08289f19dbe8635d8b4cd",
"metadata": {
"data_source": {
@@ -247,7 +323,7 @@
"text": "Methodology"
},
{
"type": "Title",
"type": "NarrativeText",
"element_id": "bcefa2402c4d32dbf76a40451d0fc3dd",
"metadata": {
"data_source": {
@@ -267,7 +343,7 @@
},
{
"type": "ListItem",
"element_id": "fdd38e2d80cc964e9bf3c7e09a760e21",
"element_id": "9e4072125e9465a2ff9f58529ce54428",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
@@ -282,10 +358,29 @@
"filetype": "application/pdf",
"page_number": 2
},
"text": "a) Responses to a 2017 Kaggle' survey’ of over 16,000 self-identified data scientists working across many industries. Analysis of the Kaggle survey responses from the current data science workforce provided insights into the current generation of data scientists, including how they were trained and what programming and analysis skills they use. b) Data science skills taught in BD2K-funded training programs. A qualitative content analysis was applied to the descriptions of required courses offered under the 12 BD2kK-funded training programs. Each course was coded using qualitative data analysis software, with each skill that was present in the description counted once. The coding schema of data science-related skills was inductively developed and was organized into four major categories: (1) statistics and math skills; (2) computer science; (3) subject knowledge; (4) general skills, like communication and teamwork. The coding schema is detailed in Appendix A. c) Desired skills identified from data science-related job ads. 59 job ads from government (8.5%), academia (42.4%), industry (83.9%), and the nonprofit sector (15.3%) were sampled from websites like Glassdoor, Linkedin, and Ziprecruiter. The"
"text": "a) Responses to a 2017 Kaggle' survey’ of over 16,000 self-identified data scientists working across many industries. Analysis of the Kaggle survey responses from the current data science workforce provided insights into the current generation of data scientists, including how they were trained and what programming and analysis skills they use."
},
{
"type": "NarrativeText",
"type": "ListItem",
"element_id": "77162f0e50911686ff277d8f132430b3",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
"version": 167189396509615428390709838081557906335,
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
},
"date_created": "2023-03-10T09:32:44+00:00",
"date_modified": "2023-03-10T09:32:44+00:00"
},
"filetype": "application/pdf",
"page_number": 2
},
"text": "b) Data science skills taught in BD2K-funded training programs. A qualitative content analysis was applied to the descriptions of required courses offered under the 12 BD2kK-funded training programs. Each course was coded using qualitative data analysis software, with each skill that was present in the description counted once. The coding schema of data science-related skills was inductively developed and was organized into four major categories: (1) statistics and math skills; (2) computer science; (3) subject knowledge; (4) general skills, like communication and teamwork. The coding schema is detailed in Appendix A."
},
{
"type": "ListItem",
"element_id": "537553a92c985f257ddf026fb12cc547",
"metadata": {
"data_source": {
@@ -324,7 +419,7 @@
},
{
"type": "NarrativeText",
"element_id": "0d1ffbb776fa283940e40707ea63b72a",
"element_id": "eed435329f99bc2f2a992e48715b19bc",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
@@ -341,6 +436,25 @@
},
"text": "' Kaggle is an online community for data scientists, serving as a platform for collaboration, competition, and learning: http://kaggle.com ? In August 2017, Kaggle conducted an industry-wide survey to gain a clearer picture of the state of data science and machine learning. A standard set of questions were asked of all respondents, with more specific questions related to work for employed data scientists and questions related to learning for data scientists in training. Methodology and results: https://www.kaggle.com/kaggle/kaggle-survey-2017"
},
{
"type": "Footer",
"element_id": "d4735e3a265e16eee03f59718b9b5d03",
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
"version": 167189396509615428390709838081557906335,
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
},
"date_created": "2023-03-10T09:32:44+00:00",
"date_modified": "2023-03-10T09:32:44+00:00"
},
"filetype": "application/pdf",
"page_number": 2
},
"text": "2"
},
{
"type": "UncategorizedText",
"element_id": "d4735e3a265e16eee03f59718b9b5d03",
