Skip to content

Data Visualization with Superset

Matthew Pope edited this page Mar 7, 2021 · 10 revisions

Superset Setup Guide

Apache Superset is a free, kick-ass tool to visualize and explore data. This guide will get you started with a bare-bones setup. Superset is awesome in that it is 'infinity scalable'. However, in this tutorial we're going with local hosting for ease of use. Superset comes with a Postgres database by default, so we're going to lean on that. With that in mind, this tutorial will work for an external Postgres / MySQL database too. We're using version 1.0.1 (released 02/2021), so this might need to be updated for future versions of Superset.

This guide assumes you have Docker and docker-compose installed. I've tested this on Linux and OSX. I could not get this functioning with Windows. From the Superset docs:

Superset is not officially supported on Windows unfortunately. 
The best option for Windows users to try out Superset locally is to install an Ubuntu Desktop VM

Setup Superset

  1. Follow the Superset docker documentation to run the docker-compose locally.
  2. Test everything looks good by navigating to localhost:8088 in your browser. The username / pass is admin.
  3. Create the database. Run this from the terminal.
    1. docker exec -i superset_db psql -U superset -c "CREATE DATABASE nba;"
  4. Create a connection for Superset.
    1. Go to the Databases tab in the browser.
    2. Read these docs.
    3. Use this connection string: postgresql+psycopg2://superset:superset@superset_db:5432/nba.
  5. Load our data.
    1. Clone this repo (or download the .zip).
    2. Modify the scripts/create_postgres.sh file to change the following environment variables.
      1. DB_NAME="nba"
      2. DB_HOST="localhost"
      3. DB_USER=superset
      4. DB_PASSWORD=superset
    3. Run the script! It'll take a while but we've included some progress bars to show that the script isn't being rate limited (a serious issue). If rate limiting is happening to you (it hangs indefinitely) you need to destroy the database and increase the --time_between_requests parameter.
  6. Follow the regular Superset documentation on how to setup databases and datasets, following the schema provided.
  7. Go to the SQL Lab -> SQL Editor tab. Enter the following query:
SELECT season_id, td3
FROM player_game_log 
LEFT JOIN player ON player.player_id = player_game_log.player_id 
WHERE player.player_name = 'Russell Westbrook';
  1. Click Run then Explore Data. You should then be able to chart Russell Westbrook's Triple-Doubles for each season he has played. 🎉
  2. Change Visualization Type from Table to Bar Chart. Metric should be SUM(td3) and SERIES should be season_id. Remove the row limit at the bottom, then hit Run. A bar chart should appear.

Keep in mind, when you build queries in Superset you shouldn't 'pre-aggregate'. Superset basically accepts a query as a view that it saves outside of our Postgres db, then does it's own aggregation. So make general queries that fetch a ton of rows, then do the SUM, AVG, or whatever inside of Superset.

Happy visualizing!

Clone this wiki locally