An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks

richardlin047/modin-spreadsheet

 
 

Modin-spreadsheet

Modin-spreadsheet is the underlying package for the Modin Spreadsheet API. It renders DataFrames within a Jupyter notebook as a spreadsheet and makes them easy to explore with intuitive scrolling, sorting, and filtering controls. The spreadsheet supports click editing, adding and removing rows, and similar operations, and it can also be controlled through the API. Modin-spreadsheet also records the history of changes so that you can share or reproduce your results.

Modin-spreadsheet builds on top of SlickGrid and Modin to provide a highly responsive experience even on DataFrames with 100,000 rows.

Modin-spreadsheet is forked from Qgrid, which was developed by Quantopian. Some documentation will reference Qgrid documentation as we continue to build out our own documentation. To learn more about Qgrid, here is an introduction on YouTube.

Here is an example of the Modin-spreadsheet widget in action.

docs/images/overview_demo.gif

A brief demo showing the common use cases for Modin-spreadsheet: filtering, editing, sorting, generating reproducible code, and exporting the changed dataframe

API Documentation

Full documentation for Modin-spreadsheet is still in progress. Most features are documented on Qgrid's readthedocs: https://qgrid.readthedocs.io/.

Installation

Modin-spreadsheet is intended to be used through the Modin Spreadsheet API (Docs in progress...). Please install Modin and Modin-spreadsheet by running the following:

pip install modin
pip install modin[spreadsheet]

To enable the Modin-spreadsheet widget, you may also need to run:

jupyter nbextension enable --py --sys-prefix modin_spreadsheet

# only required if you have not enabled the ipywidgets nbextension yet
jupyter nbextension enable --py --sys-prefix widgetsnbextension

If needed, Modin-spreadsheet can be installed on its own from PyPI:

pip install modin-spreadsheet
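After installing, it can be handy to confirm from Python that both packages are visible to the interpreter your notebook kernel uses. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for this example, and the check only locates the modules without importing them.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current import path."""
    return importlib.util.find_spec(module_name) is not None


# `modin` and `modin_spreadsheet` are the import names of the two packages
# installed above.
for name in ("modin", "modin_spreadsheet"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports `missing`, the kernel is probably running in a different environment from the one where `pip install` was run.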

Features

Column-specific options: This feature lets you set options on a per-column basis, so you can explicitly specify which columns should be sortable, editable, and so on. For example, to prevent editing on all columns except for a column named 'A', you could do the following:

import modin_spreadsheet

col_opts = { 'editable': False }          # default applied to every column
col_defs = { 'A': { 'editable': True } }  # override for column 'A' only
modin_spreadsheet.show_grid(df, column_options=col_opts, column_definitions=col_defs)

See the show_grid documentation for more information.
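To make the precedence in the example above concrete, here is a small sketch that models it with plain dictionaries. It is not the library's implementation; `effective_options` is a hypothetical helper that only illustrates the described behavior, where an entry in `column_definitions` overrides the grid-wide `column_options` for that one column.

```python
def effective_options(column_names, column_options, column_definitions):
    """Illustrative only: merge grid-wide defaults with per-column overrides."""
    return {
        # Later keys win, so a per-column definition overrides the default.
        name: {**column_options, **column_definitions.get(name, {})}
        for name in column_names
    }


col_opts = {'editable': False}
col_defs = {'A': {'editable': True}}

# Hypothetical column names, chosen only for this illustration.
resolved = effective_options(['A', 'B', 'C'], col_opts, col_defs)
for name, opts in resolved.items():
    print(name, opts)
```

In this model only column 'A' ends up editable, which matches the intent of the `show_grid` call above.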

Disable editing on a per-row basis: This feature allows a user to specify whether or not a particular row should be editable. For example, to make it so only rows in the grid where the 'status' column is set to 'active' are editable, you might use the following code:

def can_edit_row(row):
    return row['status'] == 'active'

modin_spreadsheet.show_grid(df, row_edit_callback=can_edit_row)
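Because the callback is an ordinary function, it can be checked without a running widget. The sketch below assumes only what the example above shows, namely that the row supports lookup by column name; plain dictionaries stand in for whatever row object the widget actually passes, and the sample data is invented.

```python
def can_edit_row(row):
    # Same predicate as above: only rows whose 'status' is 'active' are editable.
    return row['status'] == 'active'


# Hypothetical sample rows; dicts stand in for the rows the widget would pass.
sample_rows = [
    {'id': 1, 'status': 'active'},
    {'id': 2, 'status': 'archived'},
    {'id': 3, 'status': 'active'},
]

editable_ids = [row['id'] for row in sample_rows if can_edit_row(row)]
print(editable_ids)  # [1, 3]
```

Keeping the predicate free of side effects makes it easy to test this way before handing it to `show_grid`.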

Dynamically update an existing spreadsheet widget: These API methods allow users to programmatically update the state of an existing spreadsheet widget.

MultiIndex Support: Modin-spreadsheet displays multi-indexed DataFrames with some of the index cells merged for readability, as is normally done when viewing DataFrames as a static HTML table. The following image shows Modin-spreadsheet displaying a multi-indexed DataFrame:

https://s3.amazonaws.com/quantopian-forums/pipeline_with_qgrid.png

Disclaimer: This is from the Qgrid documentation.

Events API: The Events API provides on and off methods which can be used to attach/detach event handlers. They're available on both the modin_spreadsheet module (see qgrid.on), and on individual SpreadsheetWidget instances (see qgrid.QgridWidget.on).

Having the ability to attach event handlers allows us to do some interesting things in terms of using Modin-spreadsheet in conjunction with other widgets/visualizations. One example is using Modin-spreadsheet to filter a DataFrame that's also being displayed by another visualization.

Here's how you would use the on method to print the DataFrame every time there's a change made:

def handle_json_updated(event, spreadsheet_widget):
    # exclude 'viewport_changed' events since they don't change the DataFrame
    if event['triggered_by'] != 'viewport_changed':
        print(spreadsheet_widget.get_changed_df())

spreadsheet_widget.on('json_updated', handle_json_updated)
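The same filtering pattern can be exercised without a live widget by calling the handler directly. This is a sketch under stated assumptions: the event is a dict with a 'triggered_by' key, as in the handler above; `make_change_recorder` is a hypothetical helper; and the trigger names other than 'viewport_changed' are invented for illustration.

```python
def make_change_recorder(log):
    """Build a handler that appends the trigger of each data-changing event to `log`."""
    def handle_json_updated(event, spreadsheet_widget):
        # Scrolling the viewport does not change the DataFrame, so skip it.
        if event['triggered_by'] != 'viewport_changed':
            log.append(event['triggered_by'])
    return handle_json_updated


change_log = []
handler = make_change_recorder(change_log)

# Simulated events; 'example_edit' and 'example_sort' are made-up trigger names.
for trigger in ('viewport_changed', 'example_edit', 'viewport_changed', 'example_sort'):
    handler({'triggered_by': trigger}, spreadsheet_widget=None)

print(change_log)  # ['example_edit', 'example_sort']
```

With a real widget, the returned handler would be attached the same way as above, with `spreadsheet_widget.on('json_updated', handler)`.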

Here are some examples of how the Events API can be applied.

This shows how you can use Modin-spreadsheet to filter the data that's being shown by a matplotlib scatter plot:

docs/images/linked_to_scatter.gif

Disclaimer: This is from the Qgrid documentation.

This shows how events are recorded in real-time. The demo is recorded on JupyterLab, which is not yet supported, but the functionality is the same on Jupyter Notebook.

docs/images/events_api.gif

Disclaimer: This is from the Qgrid documentation.

Running from source & testing your changes

If you'd like to contribute to Modin-spreadsheet, or just want to be able to modify the source code for your own purposes, you'll want to clone this repository and run Modin-spreadsheet from your local copy of the repository. The following steps explain how to do this.

  1. Clone the repository from GitHub and cd into the top-level directory:

    git clone https://github.com/modin-project/modin-spreadsheet.git
    cd modin-spreadsheet
    
  2. Install the current project in editable mode:

    pip install -e .
    
  3. Install the node packages that Modin-spreadsheet depends on and build Modin-spreadsheet's JavaScript using webpack:

    cd js && npm install .
    
  4. Install and enable Modin-spreadsheet's JavaScript in your local Jupyter notebook environment:

    jupyter nbextension install --py --symlink --sys-prefix modin_spreadsheet && jupyter nbextension enable --py --sys-prefix modin_spreadsheet
    
  5. Run the notebook as you normally would with the following command:

    jupyter notebook
    

Manually testing server-side changes

If the code you need to change is in Modin-spreadsheet's Python code, restart the kernel of the notebook you're in and rerun any Modin-spreadsheet cells to see your changes take effect.

Manually testing client-side changes

If the code you need to change is in Modin-spreadsheet's JavaScript or CSS code, repeat step 3 to rebuild Modin-spreadsheet's npm package, then refresh the browser tab where you're viewing your notebook to see your changes take effect.

Running automated tests

There is a small Python test suite, which you can run locally with the command pytest from the root folder of the repository.

Contributing

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. See the Running from source & testing your changes section above for more details on local Modin-spreadsheet development.

If you are looking to start working with the Modin-spreadsheet codebase, navigate to the GitHub issues tab and start looking through interesting issues.

Feel free to ask questions by submitting an issue with your question.
