Matthew Pope edited this page Mar 6, 2021 · 3 revisions

nba-sql

This is the nba-sql database.

The project grew out of the desire for a free NBA dataset that is queryable using SQL. Existing solutions, like nba_api, looked interesting and feature-rich but had several issues. Existing databases are built from similar data but are hidden behind paywalls. Providing the code (and potentially only the code) to build such a database is preferable to a centrally hosted database with paywalled access. Existing websites are rich in features, but doing analysis on them is extremely cumbersome. Their data may go back further, but it is impossible to use that data with tools like Apache Superset or Tableau.

Goals

  • Reduce data duplication.
    • The NBA APIs return some data items excessively. I can only assume this is to reduce the number of requests required to populate their webpage. Fields like player_name, age, and team_name are returned with most API requests. Storing them with every row would waste space, so these values are factored out into general player and team tables.
  • Efficient indexing.
    • We want to be able to query this data fast and, as a side effect of the first goal, store only unique data. We use composite primary keys in several places, which enforce strict uniqueness constraints on the data.
  • Ease of use.
    • If the current schema causes problems, please file an issue. Open discussion of how this data is organized is welcome.
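
To make the first two goals concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (team, player, player_game_log, pts) and the sample rows are hypothetical and chosen only for illustration; they are not the actual nba-sql schema. The sketch keeps names in separate player and team tables and uses a composite primary key so that a player can have at most one row per game.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Names live once in lookup tables instead of on every stat row.
        CREATE TABLE team (
            team_id   INTEGER PRIMARY KEY,
            team_name TEXT NOT NULL
        );
        CREATE TABLE player (
            player_id   INTEGER PRIMARY KEY,
            player_name TEXT NOT NULL
        );
        -- Composite primary key: one row per (player, game) pair.
        CREATE TABLE player_game_log (
            player_id INTEGER NOT NULL REFERENCES player(player_id),
            game_id   INTEGER NOT NULL,
            team_id   INTEGER NOT NULL REFERENCES team(team_id),
            pts       INTEGER NOT NULL,
            PRIMARY KEY (player_id, game_id)
        );
    """)
    # Made-up sample data, not real NBA records.
    conn.execute("INSERT INTO team VALUES (1, 'Example Team')")
    conn.execute("INSERT INTO player VALUES (10, 'Example Player')")
    conn.execute("INSERT INTO player_game_log VALUES (10, 100, 1, 25)")
    conn.execute("INSERT INTO player_game_log VALUES (10, 101, 1, 31)")
    return conn


def insert_duplicate_is_rejected(conn):
    """Try to insert a second row for the same (player_id, game_id)."""
    try:
        conn.execute("INSERT INTO player_game_log VALUES (10, 100, 1, 99)")
    except sqlite3.IntegrityError:
        return True
    return False


def points_by_player(conn):
    """Join back to the player table to recover the name at query time."""
    return conn.execute("""
        SELECT p.player_name, SUM(g.pts)
        FROM player_game_log AS g
        JOIN player AS p ON p.player_id = g.player_id
        GROUP BY p.player_name
    """).fetchall()
```

Calling build_demo_db() and then insert_duplicate_is_rejected(conn) shows the uniqueness constraint at work, while points_by_player(conn) shows that the name is still available through a join even though it is stored only once.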

I'm not very good at organizing wikis, so check the sidebar for available pages.
