📖 Update data request documentation, #1038

Open · wants to merge 3 commits into master

Conversation

the-bay-kay

Shankari and I discussed this page of the documentation and found it was out of date -- as such, we're updating it! I'm also adding some documentation on how to load and work with the data, building off of Abby's work with the dashboard (link).

Added details on how to request and load data
TODO: Fill in the extra link, confirm data loading instructions
@the-bay-kay
Author

Once the info in this new README is OK'd, this branch should be merged before PR 41 -- I want to point to some of the instructions in this branch, and will be unable to link to them until the changes are merged.

1. Data formats for the JSON objects are defined in `emission/core/wrapper` (e.g. `emission/core/wrapper/location.py` and `emission/core/wrapper/cleanedtrip.py`)
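For instance, each timeseries entry pairs a `metadata` block with a `data` block whose fields follow the corresponding wrapper definition. The snippet below is an illustrative sketch only: the keys shown are a plausible subset of a `background/location` entry as declared in `location.py`, not an authoritative schema.

```python
# Illustrative only: the authoritative field list lives in
# emission/core/wrapper/location.py; the keys below are a plausible
# subset of a "background/location" timeseries entry.
sample_entry = {
    "metadata": {"key": "background/location", "write_ts": 1700000000.0},
    "data": {
        "ts": 1700000000.0,       # epoch timestamp of the reading
        "latitude": 37.3913,
        "longitude": -122.0439,
        "accuracy": 10.0,         # reported accuracy in meters
    },
}

# Accessing fields mirrors how the wrapper definitions lay them out
print(sample_entry["metadata"]["key"])
print(sample_entry["data"]["latitude"], sample_entry["data"]["longitude"])
```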

## Data analysis ##
## Data Analysis - Server ##
Contributor

This is actually the deprecated method now. We sometimes use the user-specific dumps internally to reproduce errors, but external users either get the mongodump or download CSV files from their admin dashboard.

Author

Made a change to specify this is for internal testing only! Let me know if I should be more specific about it being a deprecated method.

Should I add a footnote about working with CSVs? I've only worked with the mongodump format, but could ask around for help writing a section on that process.

docs/manage/requesting_data_as_a_collaborator.md (outdated)
- More information on this approach can be found in the public dashboard [ReadMe](https://github.com/e-mission/em-public-dashboard/blob/main/README.md#large-dataset-workaround).
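As a quick illustration of this approach, once a dump has been loaded into the local MongoDB used by the dashboard containers, a sanity check from a notebook might look like the sketch below. The database and collection names (`Stage_database`, `Stage_timeseries`) are the e-mission defaults and may differ in your deployment.

```python
# A minimal sketch of verifying that loaded data is visible; the
# database/collection names are the e-mission defaults and may differ.
import pymongo

client = pymongo.MongoClient("localhost", 27017)
db = client["Stage_database"]

print("timeseries entries:", db["Stage_timeseries"].estimated_document_count())

# Peek at a few raw location points
for entry in db["Stage_timeseries"].find({"metadata.key": "background/location"}).limit(3):
    print(entry["data"]["ts"], entry["data"]["latitude"], entry["data"]["longitude"])
```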


In general, it is best to follow the instructions of the repository you are working with. There are subtle differences between them, and these instructions are intended as general guidance only.
Contributor

We should unify these, but obviously we should keep this documentation until we do.

- Made the Docker-style analysis the main data analysis method
- Emphasized that the server method is for internal debugging purposes only

## Working With Data ##

After requesting data from the TSDC, you should receive a "mongodump" file -- a collection of data archived in `.tar.gz` format. Here are the broad steps you need to take in order to work with this data:
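Roughly, you extract the archive and restore its contents into a locally running MongoDB with `mongorestore`, then point the analysis pipeline or dashboard notebooks at that database. The sketch below shows only the restore step; the archive name, extraction directory, and `mongorestore` invocation are assumptions, so follow the instructions that ship with your dump.

```python
# A rough sketch of loading a mongodump archive into a local MongoDB.
# Assumptions: the dump is named "dump.tar.gz" (hypothetical), a mongod
# instance is already running on localhost:27017, and the MongoDB
# database tools (mongorestore) are installed.
import subprocess
import tarfile

ARCHIVE = "dump.tar.gz"       # hypothetical archive name
EXTRACT_DIR = "mongodump"     # directory to unpack into

# Unpack the .tar.gz archive produced by mongodump
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall(EXTRACT_DIR)

# Restore the extracted BSON files into the local MongoDB instance;
# the exact path to the dump directory may differ for your archive.
subprocess.run(["mongorestore", EXTRACT_DIR], check=True)
```

Once the restore completes, the data can be queried the same way as in the notebook sanity check above.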
Contributor

The TSDC will not provide mongodumps. The TSDC will provide access to the data as CSV files or a Postgres database. The mongodump is currently only available for internal use.
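For the CSV route, loading one of the admin-dashboard exports might look like the sketch below; the file name and columns are hypothetical and depend on the export.

```python
# A minimal sketch of inspecting a CSV export from the admin dashboard.
# "trips.csv" and its columns are hypothetical; the actual exports and
# column names depend on the deployment.
import pandas as pd

trips = pd.read_csv("trips.csv")
print(len(trips), "rows")
print(trips.columns.tolist())
print(trips.head())
```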

Status: Review done; Changes requested