# How to Upload Data

- Introduction
- Project Details and Faculty Engagement
- Other Datasets
- Posters and Reports
## Introduction

The upload procedure differs depending on the dataset. Most datasets can be uploaded in a straightforward way, whereas the `project_details` and `faculty_engagement` datasets require a few additional steps.

The `project_details` and `faculty_engagement` datasets are pre-populated by our data cleaning process from the raw Survey Monkey data. You can then complete the generated files by filling in the empty columns, and re-upload them to make them viewable from the web application.
When you upload datasets, make sure the file name follows our naming convention and the column names are consistent with our database schema. The raw Survey Monkey data file should be named `survey_monkey_{year}.xlsx`, where `{year}` is replaced by the current funding year. Please refer to our data lake schema for the naming convention, and to the data catalogue for the column names.

For example, the raw Survey Monkey data file for the year 2030 should be named `survey_monkey_2030.xlsx`.
## Project Details and Faculty Engagement

**Step 1.** Upload `institution_data.csv` to the `INSTITUTION_DATA/` folder.
**Step 2.** Upload the raw Survey Monkey file to the `/raw/survey_monkey` folder within the bucket whose name starts with `databasestack-tlefanalytics`. For details, please refer to our data lake schema.

Click **Upload** to upload the file. You can either drag and drop the file, or choose **Add files** to select it. Click **Upload** at the bottom-right corner to complete the process.
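If you prefer to script the upload instead of using the console, here is a minimal sketch using boto3. The bucket lookup and the local file path are assumptions based on the naming above; adjust them to your own deployment.

```python
import boto3

s3 = boto3.client("s3")

# Find the data lake bucket; its name is prefixed with "databasestack-tlefanalytics".
# The separate image bucket shares that prefix, so it is filtered out here.
bucket = next(
    b["Name"]
    for b in s3.list_buckets()["Buckets"]
    if b["Name"].startswith("databasestack-tlefanalytics")
    and "image" not in b["Name"]
)

# Upload the raw Survey Monkey file for the 2030 funding year (assumed local path).
local_path = "survey_monkey_2030.xlsx"
s3.upload_file(local_path, bucket, "raw/survey_monkey/survey_monkey_2030.xlsx")
print(f"Uploaded to s3://{bucket}/raw/survey_monkey/survey_monkey_2030.xlsx")
```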
**Step 3.** To initiate the data cleaning process for the raw Survey Monkey file you uploaded in step 2, you will first copy the S3 URI of the uploaded file.

**Step 3-1.** Go to the `tlef-analytics/raw/survey_monkey` folder and click on the item you uploaded. From the top menu, press **Copy S3 URI**.
**Step 3-2.** Next, navigate to the AWS Glue console. You can do so by typing `Glue` in the top search bar.
**Step 3-3.** From the left sidebar, select **ETL jobs**. This will direct you to the **Your jobs** section. From the list of jobs, select `tlef-clean-survey-monkey-data`.
**Step 3-4.** Select the **Job details** tab, then click **Advanced properties** at the bottom. When you scroll down, you will see a section called **Job parameters**.
**Step 3-5.** Replace the **Value** of the parameter `--SURVEY_MONKEY_S3URI`. After a fresh deployment, the value is `n/a`; replace it with the S3 URI of the file that you copied in the previous step. After you update the parameter, click **Save** at the top-right corner, then press **Run** to start the job.
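If you would rather not edit the saved job parameters, the same parameter can also be passed at run time, since arguments supplied to a job run override the job's defaults. A sketch with boto3; the URI is a placeholder for the one you copied in step 3-1.

```python
import boto3

glue = boto3.client("glue")

# Start the cleaning job, overriding --SURVEY_MONKEY_S3URI for this run only.
response = glue.start_job_run(
    JobName="tlef-clean-survey-monkey-data",
    Arguments={
        # Placeholder value; paste the S3 URI you copied in step 3-1.
        "--SURVEY_MONKEY_S3URI": "s3://<your-bucket>/raw/survey_monkey/survey_monkey_2030.xlsx",
    },
)
print("Started job run:", response["JobRunId"])
```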
**Step 4.** After the first Glue job completes, you will see two files generated by the data cleaning process: the `/raw/project_details` folder contains a new dataset named `project_details_{year}.xlsx`, and `/raw/faculty_engagement` contains `faculty_engagement_{year}.xlsx`, where `{year}` is the funding year of the dataset you uploaded earlier.
These datasets are formatted and cleaned, but are missing a few columns that must be completed manually. To make changes, first download the datasets onto your local computer: click on the item you would like to download, then click **Download** from the top menu.
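As a scripted alternative, a minimal boto3 sketch that pulls both generated files down for editing; the bucket name and year are assumptions.

```python
import boto3

s3 = boto3.client("s3")
bucket = "databasestack-tlefanalytics-example"  # assumed; use your actual bucket name

# Download both generated datasets for the 2030 funding year (example year).
for key in (
    "raw/project_details/project_details_2030.xlsx",
    "raw/faculty_engagement/faculty_engagement_2030.xlsx",
):
    s3.download_file(bucket, key, key.rsplit("/", 1)[-1])
```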
**Step 5.** Once you finish editing the pre-generated datasets, re-upload the completed datasets back to Amazon S3 at the appropriate locations. Before you upload the files, make sure the file names are the same as the names generated in step 4. Specifically, remove any number that has been appended to the end of the file name, if there is one. For example, if your downloaded file is named `project_details(1).xlsx`, remove `(1)` from the file name and re-upload it.
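If you want to automate that cleanup, a small sketch; the file name is the example from above.

```python
import re
from pathlib import Path

# Strip the "(1)"-style suffix a browser adds when a file of the same name
# already exists in the download folder, e.g.
#   project_details(1).xlsx -> project_details.xlsx
path = Path("project_details(1).xlsx")  # assumed download location
clean = re.sub(r"\s*\(\d+\)(?=\.xlsx$)", "", path.name)
if clean != path.name:
    path.rename(path.with_name(clean))
```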
Upload `project_details_{year}.xlsx` to the `/raw/project_details` folder, and `faculty_engagement_{year}.xlsx` to the `/raw/faculty_engagement` folder.
**Step 6.** `faculty_engagement_{year}.xlsx` will be automatically converted and copied to the `/staging/faculty_engagement` folder. `project_details_{year}.xlsx` requires an additional step to complete the data preparation process.

Navigate to the file location in S3, then click **Copy S3 URI**.
Again, navigate to the AWS Glue console. If you are unsure how to get there, please refer to step 3-2.

Select **ETL jobs** from the left sidebar, then click `tlef-generate-new-grant-ids`.

Select the **Job details** tab, then click **Advanced properties** at the bottom. When you scroll down, you will see a section called **Job parameters**.

Replace the **Value** of the parameter `--PROJECT_DETAILS_S3URI` with the S3 URI you just copied. After that, click **Save**, then **Run** to start the Glue job.
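The run-time parameter approach from step 3 works here as well; this sketch also polls until the run reaches a terminal state. The job name is from this guide, and the URI is a placeholder.

```python
import time

import boto3

glue = boto3.client("glue")
JOB = "tlef-generate-new-grant-ids"

# Start the job, passing the project_details URI at run time (placeholder URI).
run = glue.start_job_run(
    JobName=JOB,
    Arguments={
        "--PROJECT_DETAILS_S3URI": "s3://<your-bucket>/raw/project_details/project_details_2030.xlsx",
    },
)

# Poll until the run finishes.
while True:
    state = glue.get_job_run(JobName=JOB, RunId=run["JobRunId"])["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"):
        print("Job finished with state:", state)
        break
    time.sleep(15)
```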
If you would like to edit the data after step 6, repeat steps 4 and 5 for the changes to take effect.
## Other Datasets

When you upload datasets, make sure the file name follows our naming convention and the column names are consistent with our database schema. Please check the table below, or refer to our data lake schema for the naming convention and to the data catalogue for the column names.

| Data Type | Naming Convention | Example |
|---|---|---|
| Focus Area | `focus_area_{year}.xlsx` | `focus_area_2024.xlsx` |
| Faculty Options | `options_faculty.xlsx` | `options_faculty.xlsx` |
| Focus Area Options | `options_focus_area.xlsx` | `options_focus_area.xlsx` |
| Student Reach | `student_reach_{year}.xlsx` | `student_reach_2024.xlsx` |
| Unique Student | `unique_student_{year}.xlsx` | `unique_student_2024.xlsx` |
| Unsuccessful Projects | `unsuccessful_projects_{year}.xlsx` | `unsuccessful_projects_2024.xlsx` |
| Co-Curricular Reach | `co_curricular_reach_{year}.xlsx` | `co_curricular_reach_2024.xlsx` |
| Project Outcome | `project_outcomes.xlsx` | `project_outcomes.xlsx` |
Once you name each file correctly, upload the files to the appropriate location in Amazon S3. Please check the table below, or refer to our data lake schema for details.

| Data Type | Location |
|---|---|
| Focus Area | `/raw/focus_area/` |
| Faculty Options | `/raw/options/faculties/` |
| Focus Area Options | `/raw/options/focus_area/` |
| Student Reach | `/raw/student_reach/` |
| Unique Student | `/raw/unique_student/` |
| Unsuccessful Projects | `/raw/unsuccessful_projects/` |
| Co-Curricular Reach | `/raw/co_curricular_reach/` |
| Project Outcomes | `/raw/project_outcomes/` |
Navigate to your folder, and click **Upload**. You can either drag and drop the file, or choose **Add files** to select it. Click **Upload** at the bottom-right corner to complete the process.
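To upload several of these datasets in one go, the same boto3 upload call from the earlier sketch can be driven by the tables above. The bucket name is an assumption, and the files are expected in the working directory.

```python
import boto3

# Destination prefixes taken from the location table above (a subset shown).
LOCATIONS = {
    "focus_area_2024.xlsx": "raw/focus_area/",
    "student_reach_2024.xlsx": "raw/student_reach/",
    "unique_student_2024.xlsx": "raw/unique_student/",
    "unsuccessful_projects_2024.xlsx": "raw/unsuccessful_projects/",
    "co_curricular_reach_2024.xlsx": "raw/co_curricular_reach/",
}

s3 = boto3.client("s3")
bucket = "databasestack-tlefanalytics-example"  # assumed; use your actual bucket name

for filename, prefix in LOCATIONS.items():
    s3.upload_file(filename, bucket, prefix + filename)
    print(f"Uploaded s3://{bucket}/{prefix}{filename}")
```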
After you upload each file to the correct location in the S3 bucket, the data preparation process is triggered automatically and the prepared files are stored in the `/staging/` folder. You can then preview the data you uploaded from the web application.
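To confirm the automatic step ran, you can list the staging prefix; a short sketch, with the bucket name assumed as before.

```python
import boto3

s3 = boto3.client("s3")
bucket = "databasestack-tlefanalytics-example"  # assumed

# List the staging zone to confirm the automatic preparation step ran.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="staging/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["LastModified"])
```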
If you would like to edit the contents of the datasets after the initial uploads, follow the procedure described below.

Select the checkbox next to the item you would like to change, then click **Download** from the top menu to download the file onto your computer. You can then edit the file in Microsoft Excel. Before you re-upload, make sure the file names are the same as what you uploaded earlier. Then go to the Amazon S3 console and upload the files by repeating the upload procedure above.
## Posters and Reports

When you upload datasets, make sure the file name follows our naming convention. Please check the table below, or refer to our data lake schema for details.

| Data Type | Naming Convention | Example |
|---|---|---|
| Poster | `<grant id>-Poster.png` | `2030-TLEF-SP1-SCI-Evans-Poster.png` |
| Report | `<project id>-Report.png` | `2030-SP-SCI-001.png` |
Once you name each file correctly, upload the files to the appropriate location in the S3 bucket whose name starts with `databasestack-tlefanalyticsimage`. Please check the table below, or refer to our data lake schema for details.

| Data Type | Location |
|---|---|
| Poster | `poster/` |
| Report | `report/` |
Navigate to your folder, and click **Upload**. You can either drag and drop the file, or choose **Add files** to select it. Click **Upload** at the bottom-right corner to complete the process.
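As a scripted alternative for posters and reports, a sketch using the examples from the naming table above; the local file paths are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Find the image bucket; its name is prefixed with "databasestack-tlefanalyticsimage".
bucket = next(
    b["Name"]
    for b in s3.list_buckets()["Buckets"]
    if b["Name"].startswith("databasestack-tlefanalyticsimage")
)

# Key names follow the naming table above; local paths are assumed.
s3.upload_file("2030-TLEF-SP1-SCI-Evans-Poster.png", bucket, "poster/2030-TLEF-SP1-SCI-Evans-Poster.png")
s3.upload_file("2030-SP-SCI-001.png", bucket, "report/2030-SP-SCI-001.png")
```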