
Receive models from emission eval private data #945

Conversation

humbleOldSage
Contributor

Moving all the model-related files from e-mission-eval-private-data to e-mission-server. The following four files are moved from TRB_label_assist to emission/analysis/modelling/trip_model:

  1. models.py
  2. clustering.py
  3. mapping.py
  4. data_wrangling.py

I'll link the PR that handles the changes on the e-mission-eval-private-data side below once it is ready; that will make it easier to track the changes on both sides.
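For illustration, the import rewrite that this move implies could be sketched as below. The regex helper and the `add_loc_clusters` name are hypothetical, not part of the actual migration:

```python
# Sketch: rewriting old TRB_label_assist-style imports to the new
# trip_model package path. Illustrative only, not the actual migration script.
import re

MOVED_MODULES = ["models", "clustering", "mapping", "data_wrangling"]
NEW_PKG = "emission.analysis.modelling.trip_model"

def rewrite_import(line: str) -> str:
    """Rewrite `import models` / `from models import X` style lines to the
    new emission.analysis.modelling.trip_model.* paths."""
    for mod in MOVED_MODULES:
        line = re.sub(rf"^import {mod}\b",
                      f"import {NEW_PKG}.{mod} as {mod}", line)
        line = re.sub(rf"^from {mod} import",
                      f"from {NEW_PKG}.{mod} import", line)
    return line

# `add_loc_clusters` is a hypothetical function name used for illustration.
print(rewrite_import("from clustering import add_loc_clusters"))
```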

shankari and others added 30 commits July 30, 2017 12:51
Evaluation for TRB 2017 paper
Percom analysis + adapt the notebook script to read config
Make the setup and teardown generic and update the README to reflect …
Also, since we are sourcing the base setup/teardown, they already operate in
the current directory.

No need for additional copy/remove
Have the teardown delete current conf, not the e-mission conf!
Create a new directory with new setup and teardown scripts
Add the setup and teardown directories for the tripaware paper
To ensure that people don't check in sensitive material
Currently, this is the same as the list for the e-mission server
* Bulk update of repo policies

* Create a new environment for the eval

which includes the visualization modules from emission
instead of polluting the emission environment directly

* Fix/modify the setup scripts

- add an activate option to quickly activate the environment instead of setting everything up
- change the setup code to install emission and the emission viz/notebook code
- ensure that we install/activate conda as well
This is largely a direct copy of the existing `graph` function from
`emission/analysis/modelling/tour_model/similarity.py` in
https://github.com/e-mission/e-mission-server.git

Minor modifications:
- move the matplotlib imports out
- create a figure (`fig = plt.figure()`) before the plot
- return the figure so it is displayed properly
- add a line indicating the cutoff point in the graph
    - change the color of the existing cutoff to be red
    - notice that it is not visible
    - add a line indicating the cutoff instead

Also add the scaffolding to read and analyse data before generating the graph
- read the data
- create a similarity object
- create the bins

Note that this uses the newly refactored `calc_cutoff_bins` method so we can
plot the graph *before* and after the uncommon trips are removed
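The modifications listed above can be sketched roughly as follows; the `distances` and `cutoff_index` inputs are stand-ins for what the real `graph` reads from the similarity object:

```python
# Sketch of the modified graph(): create the figure up front, draw a red
# cutoff line, and return the figure so notebooks display it properly.
# Input names are illustrative stand-ins for the similarity object's data.
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside notebooks
import matplotlib.pyplot as plt

def graph(distances, cutoff_index):
    fig = plt.figure()                 # create the figure before plotting
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(sorted(distances, reverse=True))
    # The recolored cutoff marker was hard to see, so draw an explicit line.
    ax.axvline(cutoff_index, color="red", label="cutoff")
    ax.legend()
    return fig                         # returning it lets notebooks render it

fig = graph([5.0, 1.2, 9.3, 0.4, 7.1], cutoff_index=3)
```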
* plot graphs for all users

* plot graphs for all users

* update graph() function, adjust subplots size

* remove extraneous lines, only keep the code for plotting graphs for all users
Thanks to @corinne-hcr for finding and reporting them
hlu109 and others added 24 commits December 16, 2022 09:21
- Drop `fixed-width (O-D, destination)` since it is much worse than the others
  and we don't have the time to figure out why
- Add curve fitting for the f-score vs. number of trips, showing a curve that
  plateaus between 125 and 375 trips
…results

Note that these results can take very long (> 2 days) to regenerate
Running them from a notebook will either not print logs, or will print so many
logs that the notebook buffers will be overwhelmed

Moving the computation code out to a separate script allows us to more easily
redirect the output to a file and track the progress of the execution
- Replace PLACEHOLDERS with actual opcodes
- comment out knee detection from the classification performance since it didn't actually work that well
- add similar curve fitting to the cluster performance although we didn't use
  it in the paper
- initialize the predictors correctly (with strings)

Testing done:
- Ran all the notebooks, they ran without errors
* Covering up a possible error

Setting up the repository for the first time might cause this error to pop up. A solution was proposed earlier in the Teams chat; just migrating it here.

* Update README.md

Included a check to ensure that

* Update Clustering.py

Update clustering.py to link it to the main branch's trip_model rather than hlu109's tour_model_extended

* Revert "Update Clustering.py"

This reverts commit e90d5037d73d8504e7429b69fea8af13004c2013.

* Update Readme

Ensuring conf file copied to correct location

* Removed whitespace
* Update clustering.py

Changes in clustering.py to shift the dependency from hlu109's tour_model_extended to the main branch's trip_model. Still need to change the type of data being passed to the fit function for this to work.

* moving clustering_examples.ipynb to trip_model

All dependencies of this notebook on the custom branch are removed. There currently seem to be no errors while generating maps in the clustering_examples notebook.

* Removing changes in builtimeseries.py

With these changes, no change in e-mission-server should be required.

* Changes to support TRB_Label_Assist

Passing the clustering method to e-mission-server. It was 'origin-destination' by default; it can now take one of three values: 'origin', 'destination' or 'origin-destination'.
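As an illustration only (the function and parameter names below are assumptions, not the actual e-mission-server API), validating such a parameter might look like:

```python
# Illustrative validation for a clustering-method parameter with an
# 'origin-destination' default; names are hypothetical.
VALID_CLUSTER_METHODS = ("origin", "destination", "origin-destination")

def check_cluster_method(loc_type: str = "origin-destination") -> str:
    """Accept exactly one of the three supported clustering methods."""
    if loc_type not in VALID_CLUSTER_METHODS:
        raise ValueError(
            f"loc_type must be one of {VALID_CLUSTER_METHODS}, got {loc_type!r}")
    return loc_type

check_cluster_method()          # default stays 'origin-destination'
check_cluster_method("origin")  # explicit override
```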

* suggestions

Applied previous suggestions to improve readability.

* Revert "suggestions"

This reverts commit 3e19b32cd090135b001709cb52da57e6c6a17c1f.

* Improving readability

Suggestions from previous comments to improve readability.

* Making `cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` compatible with the changes in the `clustering.py` and `mapping.py` files, and porting these 3 notebooks to trip_model

`cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` now have no dependence on the custom branch. Plots are attached to show no difference between their previous and current outputs.

* Unified Interface for fit function

Unified interface for the fit function across all models. Passing 'Entry'-type data from the notebooks down to the binning functions. Default set to 'none'.
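One way to picture such a unified interface (a sketch under assumed names; the actual trip_model base class in e-mission-server may differ):

```python
# Sketch of a fit() signature shared across models, taking a list of
# Entry-like dicts (stand-ins for emission's Entry objects) down to binning.
# Class and attribute names here are illustrative assumptions.
from abc import ABC, abstractmethod
from typing import Optional

class TripModel(ABC):
    @abstractmethod
    def fit(self, trips: Optional[list] = None):
        """Every model accepts the same Entry-type trip list; default None."""

class GreedyBinningModel(TripModel):
    def fit(self, trips: Optional[list] = None):
        # Trivial stand-in for the real binning logic.
        self.bins = {} if trips is None else {0: trips}
        return self

model = GreedyBinningModel().fit([{"data": {"purpose": "work"}}])
```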

* Fixing `models.py` to support `regenerate_classification_performance_results.py`

Prior to this update, `NaiveBinningClassifier` in `models.py` had dependencies on both the tour model and the trip model. Now this classifier depends entirely on the trip model. All the other notebooks (except `classification_performance.ipynb`) were tested as well and work as usual.

 Other minor fixes to support previous changes.

* [PARTIALLY TESTED] Single database read and code cleanup

1. Removed mentions of `tour_model` or `tour_model_first_only`.

2. Removed two reads from the database.

3. Removed notebook outputs (this could be the reason a few diffs were too big to view).

* Delete TRB_label_assist/first_trial_results/cv results DBSCAN+SVM (destination).csv

not required.

* Reverting Notebook

Reverting the notebooks to their initial state, since running them in the browser messed up the cell index numbers. This was causing unnecessary git diffs even when no changes were made; running in VS Code should resolve this. Will make the subsequent changes in VS Code and commit again.

* [Partially Tested]Handled Whitespaces

Whitespaces corrected.

* [Partially Tested] Suggested changes implemented

`Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet as they would take too long to run. The itertools removal in these two files is tested in other notebooks and works. Other files, like models.py, will be tested once either of the above two is run.

* Revert "[Partially Tested] Suggested changes implemented"

This reverts commit bb404e989b2826f159e88fa828537b24785508e3.

* [Partially Tested] Suggested changes implemented

`Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet as they would take too long to run. The itertools removal in these two files is tested in other notebooks and it works. Other files, like models.py will be tested once any of the above two are run.

* Minor variable fixes

Fixed names of variables to be more self-explanatory

* [TESTED] All the notebooks and files are tested

1. Changes in the models file according to the changes in greedy_similarity_binning in e-mission-server

2. Minor fixes

* Minor Fixes

Minor Fixes to improve readability.

* Minor Fixes in models.py

Improved readability
…rver' into receive-models-from-emission-eval-private-data
Removing additional files that came with the model.
Removing unnecessary files
Removing files
Updating import paths and dependencies among the four files (mapping.py, clustering.py, models.py, data_wrangling.py) that were recently moved from e-mission-eval-private-data
@humbleOldSage
Contributor Author

corresponding PR on e-mission-eval-private-data is e-mission/e-mission-eval-private-data#40

@shankari
Contributor

shankari commented Dec 2, 2023

@humbleOldSage this has extraneous commits as well, including commits that are completely unrelated to this change (e.g. "check in percom analysis before we forget"). We should copy these files over with only the commit history related to them, not the entire commit history of the repository.
