# How to Upload Data

- Introduction
- Project Details and Faculty Engagement
- Other Datasets
- Posters and Reports
## Introduction

The upload procedure differs depending on the dataset. Most datasets can be uploaded in a straightforward way, whereas the `project_details` and `faculty_engagement` datasets require a few additional steps.

The `project_details` and `faculty_engagement` datasets are pre-populated by our data cleaning process from the raw Survey Monkey data. You can then complete the generated files by filling in the empty columns, and re-upload them to make them viewable from the web application.
When you upload datasets, make sure the file name follows our naming convention and the column names are consistent with our database schema. The raw Survey Monkey data file should be named `survey_monkey_{year}.xlsx`, where `{year}` is replaced by the current funding year. Please refer to our data lake schema for the naming convention, and to the data catalogue for the column names.

For example, the raw Survey Monkey data file for the year 2030 should be named `survey_monkey_2030.xlsx`.
## Project Details and Faculty Engagement

**Step 1.** Upload `institution_data.csv` to the `INSTITUTION_DATA/` folder.
**Step 2.** Upload the raw Survey Monkey file to the `/raw/survey_monkey` folder within the bucket whose name starts with `databasestack-tlefanalytics`. For details, please refer to our data lake schema.

Click **Upload** to upload the file. You can either drag and drop the file, or choose **Add files** to select it. Click **Upload** at the bottom-right corner to complete the process.
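If you prefer to script the upload instead of using the console, here is a minimal sketch using boto3. The bucket lookup and the local file path are assumptions based on the naming above; adjust them to your own deployment.

```python
import boto3

s3 = boto3.client("s3")

# Find the data lake bucket; its name is prefixed with "databasestack-tlefanalytics".
# The separate image bucket shares that prefix, so it is filtered out here.
bucket = next(
    b["Name"]
    for b in s3.list_buckets()["Buckets"]
    if b["Name"].startswith("databasestack-tlefanalytics")
    and "image" not in b["Name"]
)

# Upload the raw Survey Monkey file for the 2030 funding year (assumed local path).
local_path = "survey_monkey_2030.xlsx"
s3.upload_file(local_path, bucket, "raw/survey_monkey/survey_monkey_2030.xlsx")
print(f"Uploaded to s3://{bucket}/raw/survey_monkey/survey_monkey_2030.xlsx")
```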
**Step 3.** To initiate the data cleaning process for the raw Survey Monkey file you uploaded in step 2, you will first copy the S3 URI of the uploaded file.

**Step 3-1.** Go to the `tlef-analytics/raw/survey_monkey` folder and click on the item you uploaded. From the top menu, press **Copy S3 URI**.
**Step 3-2.** Next, navigate to the AWS Glue console. You can do so by typing `Glue` in the top search bar.
**Step 3-3.** From the left sidebar, select **ETL jobs**. This will direct you to the **Your jobs** section. From the list of jobs, select `tlef-clean-survey-monkey-data`.
**Step 3-4.** Select the **Job details** tab, then click **Advanced properties** at the bottom. When you scroll down, you will see a section called **Job parameters**.
**Step 3-5.** Replace the **Value** of the parameter `--SURVEY_MONKEY_S3URI`. After a fresh deployment, the value is `n/a`; replace it with the S3 URI of the file that you copied in the previous step. After you update the parameter, click **Save** at the top-right corner, then press **Run** to start the job.
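If you would rather not edit the saved job parameters, the same parameter can also be passed at run time, since arguments supplied to a job run override the job's defaults. A sketch with boto3; the URI is a placeholder for the one you copied in step 3-1.

```python
import boto3

glue = boto3.client("glue")

# Start the cleaning job, overriding --SURVEY_MONKEY_S3URI for this run only.
response = glue.start_job_run(
    JobName="tlef-clean-survey-monkey-data",
    Arguments={
        # Placeholder value; paste the S3 URI you copied in step 3-1.
        "--SURVEY_MONKEY_S3URI": "s3://<your-bucket>/raw/survey_monkey/survey_monkey_2030.xlsx",
    },
)
print("Started job run:", response["JobRunId"])
```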
**Step 4.** After the first Glue job completes, you will see two files generated by the data cleaning process: the `/raw/project_details` folder contains a new dataset named `project_details_{year}.xlsx`, and `/raw/faculty_engagement` contains `faculty_engagement_{year}.xlsx`, where `{year}` is the funding year of the dataset you uploaded earlier.
These datasets are formatted and cleaned, but are missing a few columns that must be completed manually. To make changes, first download the datasets onto your local computer: click on the item you would like to download, then click **Download** from the top menu.
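As a scripted alternative, a minimal boto3 sketch that pulls both generated files down for editing; the bucket name and year are assumptions.

```python
import boto3

s3 = boto3.client("s3")
bucket = "databasestack-tlefanalytics-example"  # assumed; use your actual bucket name

# Download both generated datasets for the 2030 funding year (example year).
for key in (
    "raw/project_details/project_details_2030.xlsx",
    "raw/faculty_engagement/faculty_engagement_2030.xlsx",
):
    s3.download_file(bucket, key, key.rsplit("/", 1)[-1])
```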
**Step 5.** Once you finish editing the pre-generated datasets, re-upload the completed datasets back to Amazon S3 at the appropriate locations. Before you upload the files, make sure the file names are the same as the names generated in step 4. Specifically, remove any number that has been appended to the end of the file name, if there is one. For example, if your downloaded file is named `project_details(1).xlsx`, remove `(1)` from the file name and re-upload it.
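If you want to automate that cleanup, a small sketch; the file name is the example from above.

```python
import re
from pathlib import Path

# Strip the "(1)"-style suffix a browser adds when a file of the same name
# already exists in the download folder, e.g.
#   project_details(1).xlsx -> project_details.xlsx
path = Path("project_details(1).xlsx")  # assumed download location
clean = re.sub(r"\s*\(\d+\)(?=\.xlsx$)", "", path.name)
if clean != path.name:
    path.rename(path.with_name(clean))
```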
Upload `project_details_{year}.xlsx` to the `/raw/project_details` folder, and `faculty_engagement_{year}.xlsx` to the `/raw/faculty_engagement` folder.
**Step 6.** `faculty_engagement_{year}.xlsx` will be automatically converted and copied to the `/staging/faculty_engagement` folder. `project_details_{year}.xlsx` requires an additional step to complete the data preparation process.

Navigate to the file location in S3, then click **Copy S3 URI**.
Again, navigate to the AWS Glue console. If you are unsure how to get there, please refer to step 3-2.

Select **ETL jobs** from the left sidebar, then click `tlef-generate-new-grant-ids`.

Select the **Job details** tab, then click **Advanced properties** at the bottom. When you scroll down, you will see a section called **Job parameters**.

Replace the **Value** of the parameter `--PROJECT_DETAILS_S3URI` with the S3 URI you just copied. After that, click **Save**, then **Run** to start the Glue job.
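The run-time parameter approach from step 3 works here as well; this sketch also polls until the run reaches a terminal state. The job name is from this guide, and the URI is a placeholder.

```python
import time

import boto3

glue = boto3.client("glue")
JOB = "tlef-generate-new-grant-ids"

# Start the job, passing the project_details URI at run time (placeholder URI).
run = glue.start_job_run(
    JobName=JOB,
    Arguments={
        "--PROJECT_DETAILS_S3URI": "s3://<your-bucket>/raw/project_details/project_details_2030.xlsx",
    },
)

# Poll until the run finishes.
while True:
    state = glue.get_job_run(JobName=JOB, RunId=run["JobRunId"])["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"):
        print("Job finished with state:", state)
        break
    time.sleep(15)
```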
If you would like to edit the data after step 6, repeat steps 4 and 5 for the changes to take effect.
## Other Datasets

When you upload datasets, make sure the file name follows our naming convention and the column names are consistent with our database schema. Please check the table below, or refer to our data lake schema for the naming convention and to the data catalogue for the column names.

| Data Type | Naming Convention | Example |
|---|---|---|
| Focus Area | `focus_area_{year}.xlsx` | `focus_area_2024.xlsx` |
| Faculty Options | `options_faculty.xlsx` | `options_faculty.xlsx` |
| Focus Area Options | `options_focus_area.xlsx` | `options_focus_area.xlsx` |
| Student Reach | `student_reach_{year}.xlsx` | `student_reach_2024.xlsx` |
| Unique Student | `unique_student_{year}.xlsx` | `unique_student_2024.xlsx` |
| Unsuccessful Projects | `unsuccessful_projects_{year}.xlsx` | `unsuccessful_projects_2024.xlsx` |
| Co-Curricular Reach | `co_curricular_reach_{year}.xlsx` | `co_curricular_reach_2024.xlsx` |
| Project Outcome | `project_outcomes.xlsx` | `project_outcomes.xlsx` |
Once you name each file correctly, upload the files to the appropriate location in Amazon S3. Please check the table below, or refer to our data lake schema for details.

| Data Type | Location |
|---|---|
| Focus Area | `/raw/focus_area/` |
| Faculty Options | `/raw/options/faculties/` |
| Focus Area Options | `/raw/options/focus_area/` |
| Student Reach | `/raw/student_reach/` |
| Unique Student | `/raw/unique_student/` |
| Unsuccessful Projects | `/raw/unsuccessful_projects/` |
| Co-Curricular Reach | `/raw/co_curricular_reach/` |
| Project Outcomes | `/raw/project_outcomes/` |
Navigate to your folder, and click **Upload**. You can either drag and drop the file, or choose **Add files** to select it. Click **Upload** at the bottom-right corner to complete the process.
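To upload several of these datasets in one go, the same boto3 upload call from the earlier sketch can be driven by the tables above. The bucket name is an assumption, and the files are expected in the working directory.

```python
import boto3

# Destination prefixes taken from the location table above (a subset shown).
LOCATIONS = {
    "focus_area_2024.xlsx": "raw/focus_area/",
    "student_reach_2024.xlsx": "raw/student_reach/",
    "unique_student_2024.xlsx": "raw/unique_student/",
    "unsuccessful_projects_2024.xlsx": "raw/unsuccessful_projects/",
    "co_curricular_reach_2024.xlsx": "raw/co_curricular_reach/",
}

s3 = boto3.client("s3")
bucket = "databasestack-tlefanalytics-example"  # assumed; use your actual bucket name

for filename, prefix in LOCATIONS.items():
    s3.upload_file(filename, bucket, prefix + filename)
    print(f"Uploaded s3://{bucket}/{prefix}{filename}")
```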
After you upload each file to the correct location in the S3 bucket, the data preparation process is triggered automatically and the prepared files are stored in the `/staging/` folder. You can then preview the data you uploaded from the web application.
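To confirm the automatic step ran, you can list the staging prefix; a short sketch, with the bucket name assumed as before.

```python
import boto3

s3 = boto3.client("s3")
bucket = "databasestack-tlefanalytics-example"  # assumed

# List the staging zone to confirm the automatic preparation step ran.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="staging/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["LastModified"])
```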
If you would like to edit the contents of the datasets after the initial uploads, follow the procedure described below.

Select the checkbox next to the item you would like to change, then click **Download** from the top menu to download the file onto your computer. You can then edit the file in Microsoft Excel. Before you re-upload, make sure the file names are the same as what you uploaded earlier. Then go to the Amazon S3 console and upload the files by repeating the upload procedure above.
## Posters and Reports

When you upload datasets, make sure the file name follows our naming convention. Please check the table below, or refer to our data lake schema for details.

| Data Type | Naming Convention | Example |
|---|---|---|
| Poster | `<grant id>-Poster.png` | `2030-TLEF-SP1-SCI-Evans-Poster.png` |
| Report | `<project id>-Report.png` | `2030-SP-SCI-001.png` |
Once you name each file correctly, upload the files to the appropriate location in the S3 bucket whose name starts with `databasestack-tlefanalyticsimage`. Please check the table below, or refer to our data lake schema for details.

| Data Type | Location |
|---|---|
| Poster | `poster/` |
| Report | `report/` |
Navigate to your folder, and click **Upload**. You can either drag and drop the file, or choose **Add files** to select it. Click **Upload** at the bottom-right corner to complete the process.
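As a scripted alternative for posters and reports, a sketch using the examples from the naming table above; the local file paths are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Find the image bucket; its name is prefixed with "databasestack-tlefanalyticsimage".
bucket = next(
    b["Name"]
    for b in s3.list_buckets()["Buckets"]
    if b["Name"].startswith("databasestack-tlefanalyticsimage")
)

# Key names follow the naming table above; local paths are assumed.
s3.upload_file("2030-TLEF-SP1-SCI-Evans-Poster.png", bucket, "poster/2030-TLEF-SP1-SCI-Evans-Poster.png")
s3.upload_file("2030-SP-SCI-001.png", bucket, "report/2030-SP-SCI-001.png")
```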