From fd91e8c96074c9373354fb02a9cd7b25bfd656b2 Mon Sep 17 00:00:00 2001 From: Emad Rad Date: Mon, 21 Oct 2024 13:55:21 +0330 Subject: [PATCH] chore: various markdown warnings resolved --- RESOURCES/FEATURE_FLAGS.md | 1 + RESOURCES/INTHEWILD.md | 14 +- docs/docs/configuration/alerts-reports.mdx | 5 +- .../configuration/async-queries-celery.mdx | 8 +- docs/docs/configuration/cache.mdx | 2 +- .../configuration/configuring-superset.mdx | 13 +- docs/docs/configuration/databases.mdx | 131 +++++++----------- .../importing-exporting-datasources.mdx | 22 +-- .../configuration/networking-settings.mdx | 9 +- docs/docs/configuration/sql-templating.mdx | 19 +-- docs/docs/configuration/timezones.mdx | 4 +- docs/docs/contributing/development.mdx | 14 +- docs/docs/contributing/misc.mdx | 2 +- docs/docs/contributing/resources.mdx | 2 +- docs/docs/faq.mdx | 18 +-- docs/docs/installation/architecture.mdx | 4 + docs/docs/installation/docker-builds.mdx | 3 +- docs/docs/installation/kubernetes.mdx | 7 +- docs/docs/quickstart.mdx | 4 +- docs/docs/security/security.mdx | 11 +- .../creating-your-first-dashboard.mdx | 6 +- docs/docs/using-superset/exploring-data.mdx | 2 +- 22 files changed, 147 insertions(+), 154 deletions(-) diff --git a/RESOURCES/FEATURE_FLAGS.md b/RESOURCES/FEATURE_FLAGS.md index f985ad7254941..8bc6bf4e9707f 100644 --- a/RESOURCES/FEATURE_FLAGS.md +++ b/RESOURCES/FEATURE_FLAGS.md @@ -63,6 +63,7 @@ These features flags are **safe for production**. 
They have been tested and will [//]: # "PLEASE KEEP THESE LISTS SORTED ALPHABETICALLY" ### Flags on the path to feature launch and flag deprecation/removal + - DASHBOARD_VIRTUALIZATION - DRILL_BY - DISABLE_LEGACY_DATASOURCE_EDITOR diff --git a/RESOURCES/INTHEWILD.md b/RESOURCES/INTHEWILD.md index 4ec5b6ac1e1f9..407e4497a5aae 100644 --- a/RESOURCES/INTHEWILD.md +++ b/RESOURCES/INTHEWILD.md @@ -25,8 +25,8 @@ all you have to do is file a simple PR [like this one](https://github.com/apache the categorization is inaccurate, please file a PR with your correction as well. Join our growing community! - ### Sharing Economy + - [Airbnb](https://github.com/airbnb) - [Faasos](https://faasos.com/) [@shashanksingh] - [Hostnfly](https://www.hostnfly.com/) [@alexisrosuel] @@ -35,6 +35,7 @@ Join our growing community! - [Ontruck](https://www.ontruck.com/) ### Financial Services + - [Aktia Bank plc](https://www.aktia.com) - [American Express](https://www.americanexpress.com) [@TheLastSultan] - [bumper](https://www.bumper.co/) [@vasu-ram, @JamiePercival] @@ -48,9 +49,11 @@ Join our growing community! - [Xendit](https://xendit.co/) [@LieAlbertTriAdrian] ### Gaming + - [Popoko VM Games Studio](https://popoko.live) ### E-Commerce + - [AiHello](https://www.aihello.com) [@ganeshkrishnan1] - [Bazaar Technologies](https://www.bazaartech.com) [@umair-abro] - [Dragonpass](https://www.dragonpass.com.cn/) [@zhxjdwh] @@ -72,6 +75,7 @@ Join our growing community! - [Zalora](https://www.zalora.com) [@ksaagariconic] ### Enterprise Technology + - [A3Data](https://a3data.com.br) [@neylsoncrepalde] - [Analytics Aura](https://analyticsaura.com/) [@Analytics-Aura] - [Apollo GraphQL](https://www.apollographql.com/) [@evans] @@ -121,6 +125,7 @@ Join our growing community! 
- [Zeta](https://www.zeta.tech/) [@shaikidris] ### Media & Entertainment + - [6play](https://www.6play.fr) [@CoryChaplin] - [bilibili](https://www.bilibili.com) [@Moinheart] - [BurdaForward](https://www.burda-forward.de/en/) @@ -133,6 +138,7 @@ Join our growing community! - [Zaihang](https://www.zaih.com/) ### Education + - [Aveti Learning](https://avetilearning.com/) [@TheShubhendra] - [Brilliant.org](https://brilliant.org/) - [Platzi.com](https://platzi.com/) @@ -143,6 +149,7 @@ Join our growing community! - [WikiMedia Foundation](https://wikimediafoundation.org) [@vg] ### Energy + - [Airboxlab](https://foobot.io) [@antoine-galataud] - [DouroECI](https://www.douroeci.com/) [@nunohelibeires] - [Safaricom](https://www.safaricom.co.ke/) [@mmutiso] @@ -150,6 +157,7 @@ Join our growing community! - [Wattbewerb](https://wattbewerb.de/) [@wattbewerb] ### Healthcare + - [Amino](https://amino.com) [@shkr] - [Bluesquare](https://www.bluesquarehub.com/) [@madewulf] - [Care](https://www.getcare.io/) [@alandao2021] @@ -160,19 +168,23 @@ Join our growing community! 
- [WeSure](https://www.wesure.cn/) ### HR / Staffing + - [Swile](https://www.swile.co/) [@PaoloTerzi] - [Symmetrics](https://www.symmetrics.fyi) - [bluquist](https://bluquist.com/) ### Government + - [City of Ann Arbor, MI](https://www.a2gov.org/) [@sfirke] - [RIS3 Strategy of CZ, MIT CR](https://www.ris3.cz/) [@RIS3CZ] ### Travel + - [Agoda](https://www.agoda.com/) [@lostseaway, @maiake, @obombayo] - [Skyscanner](https://www.skyscanner.net/) [@cleslie, @stanhoucke] ### Others + - [10Web](https://10web.io/) - [AI inside](https://inside.ai/en/) - [Automattic](https://automattic.com/) [@Khrol, @Usiel] diff --git a/docs/docs/configuration/alerts-reports.mdx b/docs/docs/configuration/alerts-reports.mdx index 410f5e31a2044..b9bd8945ca1e9 100644 --- a/docs/docs/configuration/alerts-reports.mdx +++ b/docs/docs/configuration/alerts-reports.mdx @@ -86,6 +86,7 @@ You can find documentation about each field in the default `config.py` in the Gi You need to replace default values with your custom Redis, Slack and/or SMTP config. Superset uses Celery beat and Celery worker(s) to send alerts and reports. + - The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report. - The worker will process the tasks that need to be performed when an alert or report is fired. @@ -182,7 +183,6 @@ ALERT_REPORTS_EXECUTE_AS = [ExecutorType.SELENIUM] Please refer to `ExecutorType` in the codebase for other executor types. - **Important notes** - Be mindful of the concurrency setting for celery (using `-c 4`). Selenium/webdriver instances can @@ -194,7 +194,6 @@ Please refer to `ExecutorType` in the codebase for other executor types. - Adjust `WEBDRIVER_BASEURL` in your configuration file if celery workers can’t access Superset via its default value of `http://0.0.0.0:8080/`. 
- It's also possible to specify a minimum interval between each report's execution through the config file: ``` python @@ -300,6 +299,7 @@ One symptom of an invalid connection to an email server is receiving an error of Confirm via testing that your outbound email configuration is correct. Here is the simplest test, for an un-authenticated email SMTP email service running on port 25. If you are sending over SSL, for instance, study how [Superset's codebase sends emails](https://github.com/apache/superset/blob/master/superset/utils/core.py#L818) and then test with those commands and arguments. Start Python in your worker environment, replace all example values, and run: + ```python import smtplib from email.mime.multipart import MIMEMultipart @@ -321,6 +321,7 @@ mailserver.quit() This should send an email. Possible fixes: + - Some cloud hosts disable outgoing unauthenticated SMTP email to prevent spam. For instance, [Azure blocks port 25 by default on some machines](https://learn.microsoft.com/en-us/azure/virtual-network/troubleshoot-outbound-smtp-connectivity). Enable that port or use another sending method. - Use another set of SMTP credentials that you verify works in this setup. diff --git a/docs/docs/configuration/async-queries-celery.mdx b/docs/docs/configuration/async-queries-celery.mdx index b9e2763bc46dd..621d66bd369e5 100644 --- a/docs/docs/configuration/async-queries-celery.mdx +++ b/docs/docs/configuration/async-queries-celery.mdx @@ -42,13 +42,13 @@ CELERY_CONFIG = CeleryConfig To start a Celery worker to leverage the configuration, run the following command: -``` +```bash celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4 ``` To start a job which schedules periodic background jobs, run the following command: -``` +```bash celery --app=superset.tasks.celery_app:app beat ``` @@ -93,12 +93,12 @@ issues arise. 
Please clear your existing results cache store when upgrading an e
Flower is a web based tool for monitoring the Celery cluster which you can install from pip:

-```python
+```bash
pip install flower
```

You can run flower using:

-```
+```bash
celery --app=superset.tasks.celery_app:app flower
```
diff --git a/docs/docs/configuration/cache.mdx b/docs/docs/configuration/cache.mdx
index 6d761c56b7113..3a925ccc4ddc6 100644
--- a/docs/docs/configuration/cache.mdx
+++ b/docs/docs/configuration/cache.mdx
@@ -17,6 +17,7 @@ Caching can be configured by providing dictionaries in
`superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).

The following cache configurations can be customized in this way:
+
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
- Metadata cache (optional): `CACHE_CONFIG`
@@ -100,7 +101,6 @@ THUMBNAIL_SELENIUM_USER = "admin"
THUMBNAIL_EXECUTE_AS = [ExecutorType.SELENIUM]
```

-
For this feature you will need a cache system and celery workers. All thumbnails
are stored on cache and are processed asynchronously by the workers.
diff --git a/docs/docs/configuration/configuring-superset.mdx b/docs/docs/configuration/configuring-superset.mdx
index 4f76f258e62f9..d5924f4128fc5 100644
--- a/docs/docs/configuration/configuring-superset.mdx
+++ b/docs/docs/configuration/configuring-superset.mdx
@@ -117,7 +117,7 @@ Your deployment must use a complex, unique key.

### Rotating to a newer SECRET_KEY

If you wish to change your existing SECRET_KEY, add the existing SECRET_KEY to your `superset_config.py` file as
-`PREVIOUS_SECRET_KEY = `and provide your new key as `SECRET_KEY =`. You can find your current SECRET_KEY with these
+`PREVIOUS_SECRET_KEY =` and provide your new key as `SECRET_KEY =`. 
You can find your current SECRET_KEY with these commands - if running Superset with Docker, execute from within the Superset application container: ```python @@ -300,6 +300,7 @@ CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager - If an OAuth2 authorization server supports OpenID Connect 1.0, you could configure its configuration document URL only without providing `api_base_url`, `access_token_url`, `authorize_url` and other required options like user info endpoint, jwks uri etc. For instance: + ```python OAUTH_PROVIDERS = [ { 'name':'egaSSO', @@ -313,12 +314,15 @@ CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager } ] ``` + ### Keycloak-Specific Configuration using Flask-OIDC + If you are using Keycloak as OpenID Connect 1.0 Provider, the above configuration based on [`Authlib`](https://authlib.org/) might not work. In this case using [`Flask-OIDC`](https://pypi.org/project/flask-oidc/) is a viable option. Make sure the pip package [`Flask-OIDC`](https://pypi.org/project/flask-oidc/) is installed on the webserver. This was succesfully tested using version 2.2.0. This package requires [`Flask-OpenID`](https://pypi.org/project/Flask-OpenID/) as a dependency. The following code defines a new security manager. Add it to a new file named `keycloak_security_manager.py`, placed in the same directory as your `superset_config.py` file. 
+ ```python from flask_appbuilder.security.manager import AUTH_OID from superset.security import SupersetSecurityManager @@ -373,7 +377,9 @@ class AuthOIDCView(AuthOIDView): return redirect( oidc.client_secrets.get('issuer') + '/protocol/openid-connect/logout?redirect_uri=' + quote(redirect_url)) ``` + Then add to your `superset_config.py` file: + ```python from keycloak_security_manager import OIDCSecurityManager from flask_appbuilder.security.manager import AUTH_OID, AUTH_REMOTE_USER, AUTH_DB, AUTH_LDAP, AUTH_OAUTH @@ -393,7 +399,9 @@ AUTH_USER_REGISTRATION = True # The default user self registration role AUTH_USER_REGISTRATION_ROLE = 'Public' ``` + Store your client-specific OpenID information in a file called `client_secret.json`. Create this file in the same directory as `superset_config.py`: + ```json { "": { @@ -410,6 +418,7 @@ Store your client-specific OpenID information in a file called `client_secret.js } } ``` + ## LDAP Authentication FAB supports authenticating user credentials against an LDAP server. @@ -432,6 +441,7 @@ AUTH_ROLES_MAPPING = { "superset_admins": ["Admin"], } ``` + ### Mapping LDAP groups to Superset roles The following `AUTH_ROLES_MAPPING` dictionary would map the LDAP DN "cn=superset_users,ou=groups,dc=example,dc=com" to the Superset roles "Gamma" as well as "Alpha", and the LDAP DN "cn=superset_admins,ou=groups,dc=example,dc=com" to the Superset role "Admin". @@ -442,6 +452,7 @@ AUTH_ROLES_MAPPING = { "cn=superset_admins,ou=groups,dc=example,dc=com": ["Admin"], } ``` + Note: This requires `AUTH_LDAP_SEARCH` to be set. For more details, please see the [FAB Security documentation](https://flask-appbuilder.readthedocs.io/en/latest/security.html). 
### Syncing roles at login diff --git a/docs/docs/configuration/databases.mdx b/docs/docs/configuration/databases.mdx index 8f69cc8d6f670..44b260bee7279 100644 --- a/docs/docs/configuration/databases.mdx +++ b/docs/docs/configuration/databases.mdx @@ -31,16 +31,15 @@ install new database drivers into your Superset configuration. ### Supported Databases and Dependencies - Some of the recommended packages are shown below. Please refer to [pyproject.toml](https://github.com/apache/superset/blob/master/pyproject.toml) for the versions that are compatible with Superset. |
Database
| PyPI package | Connection String | | --------------------------------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | -| [AWS Athena](/docs/configuration/databases#aws-athena) | `pip install pyathena[pandas]` , `pip install PyAthenaJDBC` | `awsathena+rest://{access_key_id}:{access_key}@athena.{region}.amazonaws.com/{schema}?s3_staging_dir={s3_staging_dir}&... ` | +| [AWS Athena](/docs/configuration/databases#aws-athena) | `pip install pyathena[pandas]` , `pip install PyAthenaJDBC` | `awsathena+rest://{access_key_id}:{access_key}@athena.{region}.amazonaws.com/{schema}?s3_staging_dir={s3_staging_dir}&...` | | [AWS DynamoDB](/docs/configuration/databases#aws-dynamodb) | `pip install pydynamodb` | `dynamodb://{access_key_id}:{secret_access_key}@dynamodb.{region_name}.amazonaws.com?connector=superset` | -| [AWS Redshift](/docs/configuration/databases#aws-redshift) | `pip install sqlalchemy-redshift` | ` redshift+psycopg2://:@:5439/` | +| [AWS Redshift](/docs/configuration/databases#aws-redshift) | `pip install sqlalchemy-redshift` | `redshift+psycopg2://:@:5439/` | | [Apache Doris](/docs/configuration/databases#apache-doris) | `pip install pydoris` | `doris://:@:/.` | | [Apache Drill](/docs/configuration/databases#apache-drill) | `pip install sqlalchemy-drill` | `drill+sadrill:// For JDBC drill+jdbc://` | | [Apache Druid](/docs/configuration/databases#apache-druid) | `pip install pydruid` | `druid://:@:/druid/v2/sql` | @@ -81,6 +80,7 @@ are compatible with Superset. 
| [Trino](/docs/configuration/databases#trino) | `pip install trino` | `trino://{username}:{password}@{hostname}:{port}/{catalog}` | | [Vertica](/docs/configuration/databases#vertica) | `pip install sqlalchemy-vertica-python` | `vertica+vertica_python://:@/` | | [YugabyteDB](/docs/configuration/databases#yugabytedb) | `pip install psycopg2` | `postgresql://:@/` | + --- Note that many other databases are supported, the main criteria being the existence of a functional @@ -181,7 +181,6 @@ purposes of isolating the problem. Repeat this process for each type of database you want Superset to connect to. - ### Database-specific Instructions #### Ascend.io @@ -207,14 +206,12 @@ You'll need the following setting values to form the connection string: - **Catalog**: Catalog Name - **Database**: Database Name - Here's what the connection string looks like: ``` doris://:@:/. ``` - #### AWS Athena ##### PyAthenaJDBC @@ -244,6 +241,7 @@ awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name ``` The PyAthena library also allows to assume a specific IAM role which you can define by adding following parameters in Superset's Athena database connection UI under ADVANCED --> Other --> ENGINE PARAMETERS. + ```json { "connect_args": { @@ -266,7 +264,6 @@ dynamodb://{aws_access_key_id}:{aws_secret_access_key}@dynamodb.{region_name}.am To get more documentation, please visit: [PyDynamoDB WIKI](https://github.com/passren/PyDynamoDB/wiki/5.-Superset). 
- #### AWS Redshift The [sqlalchemy-redshift](https://pypi.org/project/sqlalchemy-redshift/) library is the recommended @@ -282,7 +279,6 @@ You'll need to set the following values to form the connection string: - **Database Name**: Database Name - **Port**: default 5439 - ##### psycopg2 Here's what the SQLALCHEMY URI looks like: @@ -291,7 +287,6 @@ Here's what the SQLALCHEMY URI looks like: redshift+psycopg2://:@:5439/ ``` - ##### redshift_connector Here's what the SQLALCHEMY URI looks like: @@ -300,8 +295,7 @@ Here's what the SQLALCHEMY URI looks like: redshift+redshift_connector://:@:5439/ ``` - -###### Using IAM-based credentials with Redshift cluster: +###### Using IAM-based credentials with Redshift cluster [Amazon redshift cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html) also supports generating temporary IAM-based database user credentials. @@ -312,10 +306,10 @@ You have to define the following arguments in Superset's redshift database conne ``` {"connect_args":{"iam":true,"database":"","cluster_identifier":"","db_user":""}} ``` -and SQLALCHEMY URI should be set to `redshift+redshift_connector://` +and SQLALCHEMY URI should be set to `redshift+redshift_connector://` -###### Using IAM-based credentials with Redshift serverless: +###### Using IAM-based credentials with Redshift serverless [Redshift serverless](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-whatis.html) supports connection using IAM roles. 
@@ -327,8 +321,6 @@ You have to define the following arguments in Superset's redshift database conne {"connect_args":{"iam":true,"is_serverless":true,"serverless_acct_id":"","serverless_work_group":"","database":"","user":"IAMR:"}} ``` - - #### ClickHouse To use ClickHouse with Superset, you will need to install the `clickhouse-connect` Python library: @@ -361,8 +353,6 @@ uses the default user without a password (and doesn't encrypt the connection): clickhousedb://localhost/default ``` - - #### CockroachDB The recommended connector library for CockroachDB is @@ -374,13 +364,12 @@ The expected connection string is formatted as follows: cockroachdb://root@{hostname}:{port}/{database}?sslmode=disable ``` - - #### Couchbase The Couchbase's Superset connection is designed to support two services: Couchbase Analytics and Couchbase Columnar. The recommended connector library for couchbase is [couchbase-sqlalchemy](https://github.com/couchbase/couchbase-sqlalchemy). + ``` pip install couchbase-sqlalchemy ``` @@ -391,7 +380,6 @@ The expected connection string is formatted as follows: couchbase://{username}:{password}@{hostname}:{port}?truststorepath={certificate path}?ssl={true/false} ``` - #### CrateDB The recommended connector library for CrateDB is @@ -410,7 +398,6 @@ The expected connection string is formatted as follows: crate://crate@127.0.0.1:4200 ``` - #### Databend The recommended connector library for Databend is [databend-sqlalchemy](https://pypi.org/project/databend-sqlalchemy/). @@ -428,7 +415,6 @@ Here's a connection string example of Superset connecting to a Databend database databend://user:password@localhost:8000/default?secure=false ``` - #### Databricks Databricks now offer a native DB API 2.0 driver, `databricks-sql-connector`, that can be used with the `sqlalchemy-databricks` dialect. 
You can install both with: @@ -512,7 +498,6 @@ For a connection to a SQL endpoint you need to use the HTTP path from the endpoi {"connect_args": {"http_path": "/sql/1.0/endpoints/****", "driver_path": "/path/to/odbc/driver"}} ``` - #### Denodo The recommended connector library for Denodo is @@ -524,7 +509,6 @@ The expected connection string is formatted as follows (default port is 9996): denodo://{username}:{password}@{hostname}:{port}/{database} ``` - #### Dremio The recommended connector library for Dremio is @@ -545,7 +529,6 @@ dremio+flight://{username}:{password}@{host}:{port}/dremio This [blog post by Dremio](https://www.dremio.com/tutorials/dremio-apache-superset/) has some additional helpful instructions on connecting Superset to Dremio. - #### Apache Drill ##### SQLAlchemy @@ -587,8 +570,6 @@ We recommend reading the the [GitHub README](https://github.com/JohnOmernik/sqlalchemy-drill#usage-with-odbc) to learn how to work with Drill through ODBC. - - import useBaseUrl from "@docusaurus/useBaseUrl"; #### Apache Druid @@ -602,6 +583,7 @@ The connection string looks like: ``` druid://:@:/druid/v2/sql ``` + Here's a breakdown of the key components of this connection string: - `User`: username portion of the credentials needed to connect to your database @@ -630,7 +612,7 @@ To disable SSL verification, add the following to the **Extras** field: ``` engine_params: {"connect_args": - {"scheme": "https", "ssl_verify_cert": false}} + {"scheme": "https", "ssl_verify_cert": false}} ``` ##### Aggregations @@ -654,7 +636,6 @@ much like you would create an aggregation manually, but specify `postagg` as a ` then have to provide a valid json post-aggregation definition (as specified in the Druid docs) in the JSON field. - #### Elasticsearch The recommended connector library for Elasticsearch is @@ -703,7 +684,7 @@ Then register your table with the alias name logstash_all By default, Superset uses UTC time zone for elasticsearch query. 
If you need to specify a time zone, please edit your Database and enter the settings of your specified time zone in the Other > ENGINE PARAMETERS: -``` +```json { "connect_args": { "time_zone": "Asia/Shanghai" @@ -725,8 +706,6 @@ To disable SSL verification, add the following to the **SQLALCHEMY URI** field: elasticsearch+https://{user}:{password}@{host}:9200/?verify_certs=False ``` - - #### Exasol The recommended connector library for Exasol is @@ -738,7 +717,6 @@ The connection string for Exasol looks like this: exa+pyodbc://{username}:{password}@{hostname}:{port}/my_schema?CONNECTIONLCALL=en_US.UTF-8&driver=EXAODBC ``` - #### Firebird The recommended connector library for Firebird is [sqlalchemy-firebird](https://pypi.org/project/sqlalchemy-firebird/). @@ -756,7 +734,6 @@ Here's a connection string example of Superset connecting to a local Firebird da firebird+fdb://SYSDBA:masterkey@192.168.86.38:3050//Library/Frameworks/Firebird.framework/Versions/A/Resources/examples/empbuild/employee.fdb ``` - #### Firebolt The recommended connector library for Firebolt is [firebolt-sqlalchemy](https://pypi.org/project/firebolt-sqlalchemy/). @@ -787,7 +764,7 @@ The recommended connector library for BigQuery is Follow the steps [here](/docs/configuration/databases#installing-drivers-in-docker-images) about how to install new database drivers when setting up Superset locally via docker compose. -``` +```bash echo "sqlalchemy-bigquery" >> ./docker/requirements-local.txt ``` @@ -800,7 +777,7 @@ credentials file (as a JSON). appropriate BigQuery datasets, and download the JSON configuration file for the service account. 2. In Superset, you can either upload that JSON or add the JSON blob in the following format (this should be the content of your credential JSON file): -``` +```json { "type": "service_account", "project_id": "...", @@ -828,7 +805,7 @@ credentials file (as a JSON). 
Go to the **Advanced** tab, Add a JSON blob to the **Secure Extra** field in the database configuration form with the following format: - ``` + ```json { "credentials_info": } @@ -836,7 +813,7 @@ credentials file (as a JSON). The resulting file should have this structure: - ``` + ```json { "credentials_info": { "type": "service_account", @@ -863,8 +840,6 @@ To be able to upload CSV or Excel files to BigQuery in Superset, you'll need to Currently, the Google BigQuery Python SDK is not compatible with `gevent`, due to some dynamic monkeypatching on python core library by `gevent`. So, when you deploy Superset with `gunicorn` server, you have to use worker type except `gevent`. - - #### Google Sheets Google Sheets has a very limited @@ -875,7 +850,6 @@ There are a few steps involved in connecting Superset to Google Sheets. This [tutorial](https://preset.io/blog/2020-06-01-connect-superset-google-sheets/) has the most up to date instructions on setting up this connection. - #### Hana The recommended connector library is [sqlalchemy-hana](https://github.com/SAP/sqlalchemy-hana). @@ -886,7 +860,6 @@ The connection string is formatted as follows: hana://{username}:{password}@{host}:{port} ``` - #### Apache Hive The [pyhive](https://pypi.org/project/PyHive/) library is the recommended way to connect to Hive through SQLAlchemy. @@ -897,7 +870,6 @@ The expected connection string is formatted as follows: hive://hive@{hostname}:{port}/{database} ``` - #### Hologres Hologres is a real-time interactive analytics service developed by Alibaba Cloud. It is fully compatible with PostgreSQL 11 and integrates seamlessly with the big data ecosystem. @@ -916,7 +888,6 @@ The connection string looks like: postgresql+psycopg2://{username}:{password}@{host}:{port}/{database} ``` - #### IBM DB2 The [IBM_DB_SA](https://github.com/ibmdb/python-ibmdbsa/tree/master/ibm_db_sa) library provides a @@ -934,7 +905,6 @@ There are two DB2 dialect versions implemented in SQLAlchemy. 
If you are connect ibm_db_sa://{username}:{passport}@{hostname}:{port}/{database} ``` - #### Apache Impala The recommended connector library to Apache Impala is [impyla](https://github.com/cloudera/impyla). @@ -945,7 +915,6 @@ The expected connection string is formatted as follows: impala://{hostname}:{port}/{database} ``` - #### Kusto The recommended connector library for Kusto is @@ -966,7 +935,6 @@ kustokql+https://{cluster_url}/{database}?azure_ad_client_id={azure_ad_client_id Make sure the user has privileges to access and use all required databases/tables/views. - #### Apache Kylin The recommended connector library for Apache Kylin is @@ -978,10 +946,6 @@ The expected connection string is formatted as follows: kylin://:@:/?=&= ``` - - - - #### MySQL The recommended connector library for MySQL is [mysqlclient](https://pypi.org/project/mysqlclient/). @@ -1006,7 +970,6 @@ One problem with `mysqlclient` is that it will fail to connect to newer MySQL da mysql+mysqlconnector://{username}:{password}@{host}/{database} ``` - #### IBM Netezza Performance Server The [nzalchemy](https://pypi.org/project/nzalchemy/) library provides a @@ -1023,21 +986,19 @@ netezza+nzpy://{username}:{password}@{hostname}:{port}/{database} The [sqlalchemy-oceanbase](https://pypi.org/project/oceanbase_py/) library is the recommended way to connect to OceanBase through SQLAlchemy. - The connection string for OceanBase looks like this: ``` oceanbase://:@:/ ``` - #### Ocient DB The recommended connector library for Ocient is [sqlalchemy-ocient](https://pypi.org/project/sqlalchemy-ocient). ##### Install the Ocient Driver -``` +```bash pip install sqlalchemy-ocient ``` @@ -1060,8 +1021,6 @@ The connection string is formatted as follows: oracle://:@: ``` - - #### Apache Pinot The recommended connector library for Apache Pinot is [pinotdb](https://pypi.org/project/pinotdb/). @@ -1080,7 +1039,8 @@ pinot://:@:/query/sql? If you want to use explore view or joins, window functions, etc. 
then enable [multi-stage query engine](https://docs.pinot.apache.org/reference/multi-stage-engine). Add below argument while creating database connection in Advanced -> Other -> ENGINE PARAMETERS -``` + +```json {"connect_args":{"use_multistage_engine":"true"}} ``` @@ -1120,7 +1080,6 @@ More information about PostgreSQL connection options can be found in the and the [PostgreSQL docs](https://www.postgresql.org/docs/9.1/libpq-connect.html#LIBPQ-PQCONNECTDBPARAMS). - #### Presto The [pyhive](https://pypi.org/project/PyHive/) library is the recommended way to connect to Presto through SQLAlchemy. @@ -1146,7 +1105,7 @@ presto://datascientist:securepassword@presto.example.com:8080/hive By default Superset assumes the most recent version of Presto is being used when querying the datasource. If you’re using an older version of Presto, you can configure it in the extra parameter: -``` +```json { "version": "0.123" } @@ -1154,7 +1113,7 @@ datasource. If you’re using an older version of Presto, you can configure it i SSL Secure extra add json config to extra connection information. -``` +```json { "connect_args": {"protocol": "https", @@ -1163,8 +1122,6 @@ SSL Secure extra add json config to extra connection information. } ``` - - #### RisingWave The recommended connector library for RisingWave is @@ -1176,7 +1133,6 @@ The expected connection string is formatted as follows: risingwave://root@{hostname}:{port}/{database}?sslmode=disable ``` - #### Rockset The connection string for Rockset is: @@ -1196,7 +1152,6 @@ rockset://{api key}:@{api server}/{VI ID} For more complete instructions, we recommend the [Rockset documentation](https://docs.rockset.com/apache-superset/). 
- #### Snowflake ##### Install Snowflake Driver @@ -1204,7 +1159,7 @@ For more complete instructions, we recommend the [Rockset documentation](https:/ Follow the steps [here](/docs/configuration/databases#installing-database-drivers) about how to install new database drivers when setting up Superset locally via docker compose. -``` +```bash echo "snowflake-sqlalchemy" >> ./docker/requirements-local.txt ``` @@ -1237,7 +1192,7 @@ To connect Snowflake with Key Pair Authentication, you need to add the following ***Please note that you need to merge multi-line private key content to one line and insert `\n` between each line*** -``` +```json { "auth_method": "keypair", "auth_params": { @@ -1249,7 +1204,7 @@ To connect Snowflake with Key Pair Authentication, you need to add the following If your private key is stored on server, you can replace "privatekey_body" with “privatekey_path” in parameter. -``` +```json { "auth_method": "keypair", "auth_params": { @@ -1270,7 +1225,6 @@ The connection string for Solr looks like this: solr://{username}:{password}@{host}:{port}/{server_path}/{collection}[/?use_ssl=true|false] ``` - #### Apache Spark SQL The recommended connector library for Apache Spark SQL [pyhive](https://pypi.org/project/PyHive/). 
@@ -1294,6 +1248,7 @@ mssql+pymssql://:@:/ It is also possible to connect using [pyodbc](https://pypi.org/project/pyodbc) with the parameter [odbc_connect](https://docs.sqlalchemy.org/en/14/dialects/mssql.html#pass-through-exact-pyodbc-string) The connection string for SQL Server looks like this: + ``` mssql+pyodbc:///?odbc_connect=Driver%3D%7BODBC+Driver+17+for+SQL+Server%7D%3BServer%3Dtcp%3A%3Cmy_server%3E%2C1433%3BDatabase%3Dmy_database%3BUid%3Dmy_user_name%3BPwd%3Dmy_password%3BEncrypt%3Dyes%3BConnection+Timeout%3D30 ``` @@ -1339,7 +1294,7 @@ here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux Here are the required environment variables: -``` +```bash export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini ``` @@ -1348,8 +1303,8 @@ We recommend using the first library because of the lack of requirement around ODBC drivers and because it's more regularly updated. - #### TimescaleDB + [TimescaleDB](https://www.timescale.com) is the open-source relational database for time-series and analytics to build powerful data-intensive applications. TimescaleDB is a PostgreSQL extension, and you can use the standard PostgreSQL connector library, [psycopg2](https://www.psycopg.org/docs/), to connect to the database. @@ -1381,31 +1336,38 @@ postgresql://{username}:{password}@{host}:{port}/{database name}?sslmode=require [Learn more about TimescaleDB!](https://docs.timescale.com/) - #### Trino Supported trino version 352 and higher ##### Connection String + The connection string format is as follows: + ``` trino://{username}:{password}@{hostname}:{port}/{catalog} ``` If you are running Trino with docker on local machine, please use the following connection URL + ``` trino://trino@host.docker.internal:8080 ``` ##### Authentications + ###### 1. 
Basic Authentication + You can provide `username`/`password` in the connection string or in the `Secure Extra` field at `Advanced / Security` -* In Connection String + +- In Connection String + ``` trino://{username}:{password}@{hostname}:{port}/{catalog} ``` -* In `Secure Extra` field +- In `Secure Extra` field + ```json { "auth_method": "basic", @@ -1419,7 +1381,9 @@ You can provide `username`/`password` in the connection string or in the `Secure NOTE: If both are provided, `Secure Extra` always takes priority. ###### 2. Kerberos Authentication + In the `Secure Extra` field, configure as in the following example: + ```json { "auth_method": "kerberos", @@ -1436,7 +1400,9 @@ All fields in `auth_params` are passed directly to the [`KerberosAuthentication` NOTE: Kerberos authentication requires installing the [`trino-python-client`](https://github.com/trinodb/trino-python-client) locally with either the `all` or `kerberos` optional features, i.e., installing `trino[all]` or `trino[kerberos]` respectively. ###### 3. Certificate Authentication + In the `Secure Extra` field, configure as in the following example: + ```json { "auth_method": "certificate", @@ -1450,7 +1416,9 @@ In the `Secure Extra` field, configure as in the following example: All fields in `auth_params` are passed directly to the [`CertificateAuthentication`](https://github.com/trinodb/trino-python-client/blob/0.315.0/trino/auth.py#L416) class. ###### 4. JWT Authentication + Configure `auth_method` and provide the token in the `Secure Extra` field + ```json { "auth_method": "jwt", @@ -1461,8 +1429,10 @@ Configure `auth_method` and provide the token in the `Secure Extra` field ``` ###### 5.
Custom Authentication + To use custom authentication, first you need to add it into `ALLOWED_EXTRA_AUTHENTICATIONS` allow list in Superset config file: + ```python from your.module import AuthClass from another.extra import auth_method @@ -1476,6 +1446,7 @@ ALLOWED_EXTRA_AUTHENTICATIONS: Dict[str, Dict[str, Callable[..., Any]]] = { ``` Then in `Secure Extra` field: + ```json { "auth_method": "custom_auth", @@ -1491,8 +1462,8 @@ or factory function (which returns an `Authentication` instance) to `auth_method All fields in `auth_params` are passed directly to your class/function. **Reference**: -* [Trino-Superset-Podcast](https://trino.io/episodes/12.html) +- [Trino-Superset-Podcast](https://trino.io/episodes/12.html) #### Vertica @@ -1519,7 +1490,6 @@ Other parameters: - Load Balancer - Backup Host - #### YugabyteDB [YugabyteDB](https://www.yugabyte.com/) is a distributed SQL database built on top of PostgreSQL. @@ -1534,8 +1504,6 @@ The connection string looks like: postgresql://{username}:{password}@{host}:{port}/{database} ``` - - ## Connecting through the UI Here is the documentation on how to leverage the new DB Connection UI. This will provide admins the ability to enhance the UX for users who want to connect to new databases. @@ -1608,9 +1576,6 @@ For databases like MySQL and Postgres that use the standard format of `engine+dr For other databases you need to implement these methods yourself. The BigQuery DB engine spec is a good example of how to do that. - - - ### Extra Database Settings ##### Deeper SQLAlchemy Integration @@ -1674,9 +1639,7 @@ You can use the `Extra` field in the **Edit Databases** form to configure SSL: } ``` - - -## Misc. 
+## Misc ### Querying across databases diff --git a/docs/docs/configuration/importing-exporting-datasources.mdx index 83495478ae53a..400d64590adf5 100644 --- a/docs/docs/configuration/importing-exporting-datasources.mdx +++ b/docs/docs/configuration/importing-exporting-datasources.mdx @@ -10,7 +10,7 @@ version: 1 The Superset CLI allows you to import and export datasources from and to YAML. Datasources include databases. The data is expected to be organized in the following hierarchy: -``` +```text ├──databases | ├──database_1 | | ├──table_1 @@ -30,13 +30,13 @@ databases. The data is expected to be organized in the following hierarchy: You can print your current datasources to stdout by running: -``` +```bash superset export_datasources ``` To save your datasources to a ZIP file, run: -``` +```bash superset export_datasources -f ``` @@ -55,7 +55,7 @@ Alternatively, you can export datasources using the UI: To obtain an **exhaustive list of all fields** you can import via the YAML import, run: -``` +```bash superset export_datasource_schema ``` @@ -65,13 +65,13 @@ As a reminder, you can use the `-b` flag to include back references. To import datasources from a ZIP file, run: -``` +```bash superset import_datasources -p ``` The optional username flag **-u** sets the user used for the datasource import. The default is 'admin'. Example: -``` +```bash superset import_datasources -p -u 'admin' ``` @@ -81,7 +81,7 @@ superset import_datasources -p -u 'admin' When using Superset version 4.x.x to import from an older version (2.x.x or 3.x.x), importing is supported via the command `legacy_import_datasources`, which expects a JSON file or a directory of JSON files. The options are `-r` for recursive and `-u` for specifying a user.
Example of legacy import without options: -``` +```bash superset legacy_import_datasources -p ``` @@ -89,21 +89,21 @@ superset legacy_import_datasources -p When using an older version of Superset (2.x.x or 3.x.x), the command is `import_datasources`. ZIP and YAML files are supported; the feature flag `VERSIONED_EXPORT` is used to switch between them. When `VERSIONED_EXPORT` is `True`, `import_datasources` expects a ZIP file, otherwise YAML. Example: -``` +```bash superset import_datasources -p ``` When `VERSIONED_EXPORT` is `False`, if you supply a path, all files ending with **yaml** or **yml** will be parsed. You can apply additional flags (e.g. to search the supplied path recursively): -``` +```bash superset import_datasources -p -r ``` The sync flag **-s** takes parameters in order to sync the supplied elements with your file. Be careful: this can delete the contents of your meta database. Example: -``` +```bash superset import_datasources -p -s columns,metrics ``` @@ -115,7 +115,7 @@ If you don’t supply the sync flag (**-s**) importing will only add and update For example, you can add a `verbose_name` to the column `ds` in the table `random_time_series` from the example datasets by saving the following YAML to a file and then running the **import_datasources** command.
-``` +```yaml databases: - database_name: main tables: diff --git a/docs/docs/configuration/networking-settings.mdx index 0e1f3c969f6d8..da759d5132367 100644 --- a/docs/docs/configuration/networking-settings.mdx +++ b/docs/docs/configuration/networking-settings.mdx @@ -20,14 +20,12 @@ The following keys in `superset_config.py` can be specified to configure CORS: - `CORS_OPTIONS`: options passed to Flask-CORS ([documentation](https://flask-cors.corydolphin.com/en/latest/api.html#extension)) - ## HTTP headers Note that Superset bundles [flask-talisman](https://pypi.org/project/talisman/), self-described as a small Flask extension that handles setting HTTP headers that can help protect against a few common web application security issues. - ## HTML Embedding of Dashboards and Charts There are two ways to embed a dashboard: using the [SDK](https://www.npmjs.com/package/@superset-ui/embedded-sdk) or embedding a direct link. Note that in the latter case everybody who knows the link is able to access the dashboard. @@ -39,14 +37,16 @@ This works by first changing the content security policy (CSP) of [flask-talisma #### Changing flask-talisman CSP Add to `superset_config.py` the entire `TALISMAN_CONFIG` section from `config.py` and include a `frame-ancestors` section: + ```python TALISMAN_ENABLED = True TALISMAN_CONFIG = { "content_security_policy": { ... - "frame-ancestors": ["*.my-domain.com", "*.another-domain.com"], + "frame-ancestors": ["*.my-domain.com", "*.another-domain.com"], ... ``` + Restart Superset for this configuration change to take effect. #### Making a Dashboard Public @@ -69,6 +69,7 @@ Now anybody can directly access the dashboard's URL.
You can embed it in an ifra > ``` + #### Embedding a Chart A chart's embed code can be generated by going to a chart's edit view and then clicking at the top right on `...` > `Share` > `Embed code`. @@ -89,7 +90,6 @@ Similarly, [flask-wtf](https://flask-wtf.readthedocs.io/en/0.15.x/config/) is us some CSRF configurations. If you need to exempt endpoints from CSRF (e.g. if you are running a custom auth postback endpoint), you can add the endpoints to `WTF_CSRF_EXEMPT_LIST`: - ## SSH Tunneling 1. Turn on feature flag @@ -105,7 +105,6 @@ running a custom auth postback endpoint), you can add the endpoints to `WTF_CSRF 3. Verify data is flowing - Once SSH tunneling has been enabled, go to SQL Lab and write a query to verify data is properly flowing. - ## Domain Sharding Chrome allows up to 6 open connections per domain at a time. When there are more than 6 slices in diff --git a/docs/docs/configuration/sql-templating.mdx index 64caea2157579..f6b828ea6d3a3 100644 --- a/docs/docs/configuration/sql-templating.mdx +++ b/docs/docs/configuration/sql-templating.mdx @@ -74,6 +74,7 @@ In the UI you can assign a set of parameters as JSON "my_table": "foo" } ``` + The parameters become available in your SQL (example: `SELECT * FROM {{ my_table }}` ) by using Jinja templating syntax. SQL Lab template parameters are stored with the dataset as `TEMPLATE PARAMETERS`. @@ -100,7 +101,6 @@ GROUP BY action Note `_filters` is not stored with the dataset. It's only used within the SQL Lab UI. - Besides default Jinja templating, SQL Lab also supports self-defined template processors, configured by setting `CUSTOM_TEMPLATE_PROCESSORS` in your Superset configuration. The values in this dictionary overwrite the default Jinja template processors of the specified database engine. The example below @@ -183,7 +183,7 @@ cache hit in the future and Superset can retrieve cached data.
You can disable the inclusion of the `username` value in the calculation of the cache key by adding the following parameter to your Jinja code: -``` +```python {{ current_username(add_to_cache_keys=False) }} ``` @@ -198,7 +198,7 @@ cache hit in the future and Superset can retrieve cached data. You can disable the inclusion of the account `id` value in the calculation of the cache key by adding the following parameter to your Jinja code: -``` +```python {{ current_user_id(add_to_cache_keys=False) }} ``` @@ -213,7 +213,7 @@ cache hit in the future and Superset can retrieve cached data. You can disable the inclusion of the email value in the calculation of the cache key by adding the following parameter to your Jinja code: -``` +```python {{ current_user_email(add_to_cache_keys=False) }} ``` @@ -298,7 +298,7 @@ This is useful if: Here's a concrete example: -``` +```sql WITH RECURSIVE superiors(employee_id, manager_id, full_name, level, lineage) AS ( SELECT @@ -354,6 +354,7 @@ considerably improve performance, as many databases and query engines are able t if the temporal filter is placed on the inner query, as opposed to the outer query. The macro takes the following parameters: + - `column`: Name of the temporal column. Leave undefined to reference the time range from a Dashboard Native Time Range filter (when present). - `default`: The default value to fall back to if the time filter is not present, or has the value `No filter` @@ -367,6 +368,7 @@ The macro takes the following parameters: filter should only apply to the inner query. 
The return type has the following properties: + - `from_expr`: the start of the time filter (if any) - `to_expr`: the end of the time filter (if any) - `time_range`: the applied time range @@ -407,6 +409,7 @@ LIMIT 1000; When using the `default` parameter, the templated query can be simplified, as the endpoints will always be defined (to use a fixed time range, you can also use something like `default="2024-08-27 : 2024-09-03"`): + ```sql {% set time_filter = get_time_filter("dttm", default="Last week", remove_filter=True) %} SELECT @@ -426,19 +429,19 @@ To use the macro, first you need to find the ID of the dataset. This can be done Once you have the ID you can query it as if it were a table: -``` +```sql SELECT * FROM {{ dataset(42) }} LIMIT 10 ``` If you want to select the metric definitions as well, in addition to the columns, you need to pass an additional keyword argument: -``` +```sql SELECT * FROM {{ dataset(42, include_metrics=True) }} LIMIT 10 ``` Since metrics are aggregations, the resulting SQL expression will be grouped by all non-metric columns.
You can specify a subset of columns to group by instead: -``` +```sql SELECT * FROM {{ dataset(42, include_metrics=True, columns=["ds", "category"]) }} LIMIT 10 ``` diff --git a/docs/docs/configuration/timezones.mdx b/docs/docs/configuration/timezones.mdx index 3a23a667dcdfe..233e4786fcd66 100644 --- a/docs/docs/configuration/timezones.mdx +++ b/docs/docs/configuration/timezones.mdx @@ -24,7 +24,7 @@ The challenge however lies with the slew of [database engines](/docs/configurati For example the following is a comparison of MySQL and Presto, -``` +```python import pandas as pd from sqlalchemy import create_engine @@ -41,7 +41,7 @@ pd.read_sql_query( which outputs `{"ts":{"0":1640995200000}}` (which infers the UTC timezone per the Epoch time definition) and `{"ts":{"0":"2022-01-01 00:00:00.000"}}` (without an explicit timezone) respectively and thus are treated differently in JavaScript: -``` +```js new Date(1640995200000) > Sat Jan 01 2022 13:00:00 GMT+1300 (New Zealand Daylight Time) diff --git a/docs/docs/contributing/development.mdx b/docs/docs/contributing/development.mdx index cb48a4c7b29b4..a38fe3fe4ca89 100644 --- a/docs/docs/contributing/development.mdx +++ b/docs/docs/contributing/development.mdx @@ -219,22 +219,22 @@ If you have made changes to the FAB-managed templates, which are not built the s If you add a new requirement or update an existing requirement (per the `install_requires` section in `setup.py`) you must recompile (freeze) the Python dependencies to ensure that for CI, testing, etc. the build is deterministic. 
This can be achieved via: ```bash -$ python3 -m venv venv -$ source venv/bin/activate -$ python3 -m pip install -r requirements/development.txt -$ pip-compile-multi --no-upgrade +python3 -m venv venv +source venv/bin/activate +python3 -m pip install -r requirements/development.txt +pip-compile-multi --no-upgrade ``` When upgrading the version number of a single package, you should run `pip-compile-multi` with the `-P` flag: ```bash -$ pip-compile-multi -P my-package +pip-compile-multi -P my-package ``` To bring all dependencies up to date as per the restrictions defined in `setup.py` and `requirements/*.in`, run `pip-compile-multi` without any flags: ```bash -$ pip-compile-multi +pip-compile-multi ``` This should be done periodically, but it is recommended to do thorough manual testing of the application to ensure no breaking changes have been introduced that aren't caught by the unit and integration tests. @@ -773,7 +773,7 @@ To debug Flask running in POD inside a kubernetes cluster, you'll need to make s add: ["SYS_PTRACE"] ``` -See (set capabilities for a container)[https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container] for more details. +See [set capabilities for a container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container) for more details. Once the pod is running as root and has the SYS_PTRACE capability, it will be able to debug the Flask app. diff --git a/docs/docs/contributing/misc.mdx index bf68ea1edcbae..61c79c0a8dfd1 100644 --- a/docs/docs/contributing/misc.mdx +++ b/docs/docs/contributing/misc.mdx @@ -3,7 +3,7 @@ sidebar_position: 6 version: 1 --- -# Misc.
+# Misc ## Reporting a Security Vulnerability diff --git a/docs/docs/contributing/resources.mdx index f533865b17560..74a48b02117a8 100644 --- a/docs/docs/contributing/resources.mdx +++ b/docs/docs/contributing/resources.mdx @@ -7,7 +7,7 @@ import InteractiveSVG from '../../src/components/InteractiveERDSVG'; # Resources -## Entity-Relationship Diagram +## Entity-Relationship Diagram Here is our interactive ERD: diff --git a/docs/docs/faq.mdx index e7b166bb212fa..a329210ea5503 100644 --- a/docs/docs/faq.mdx +++ b/docs/docs/faq.mdx @@ -66,7 +66,7 @@ For running long query from Sql Lab, by default Superset allows it run as long a being killed by Celery. If you want to increase the time for running queries, you can specify the timeout in the configuration. For example: -``` +```python SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6 ``` @@ -78,7 +78,7 @@ come back within client-side timeout (60 seconds by default), Superset will disp to avoid a gateway timeout message. If you have a longer gateway timeout limit, you can change the timeout settings in **superset_config.py**: -``` +```python SUPERSET_WEBSERVER_TIMEOUT = 60 ``` @@ -87,7 +87,7 @@ You need to register a free account at [Mapbox.com](https://www.mapbox.com), obtain an API key, and add it to **.env** at the key MAPBOX_API_KEY: -``` +```python MAPBOX_API_KEY = "longstringofalphanumer1c" ``` @@ -99,7 +99,7 @@ refreshed - especially if some data is slow moving, or run heavy queries. To exc from the timed refresh process, add the `timed_refresh_immune_slices` key to the dashboard JSON Metadata field: -``` +```json { "filter_immune_slices": [], "expanded_slices": {}, @@ -115,7 +115,7 @@ Slice refresh will also be staggered over the specified period.
You can turn off staggering by setting `stagger_refresh` to `false`, and modify the stagger period by setting `stagger_time` to a value in milliseconds in the JSON Metadata field: -``` +```json { "stagger_refresh": false, "stagger_time": 2500 } @@ -137,7 +137,7 @@ You can override this path using the **SUPERSET_HOME** environment variable. Another workaround is to change where Superset stores the SQLite database by adding the following in `superset_config.py`: -``` +```python SQLALCHEMY_DATABASE_URI = 'sqlite:////new/location/superset.db?check_same_thread=false' ``` @@ -157,12 +157,12 @@ table afterwards to configure the Columns tab, check the appropriate boxes and s To clarify, the database backend is an OLTP database used by Superset to store its internal information like your list of users and dashboard definitions. While Superset supports a -[variety of databases as data *sources*](/docs/configuration/databases#installing-database-drivers), +[variety of databases as data _sources_](/docs/configuration/databases#installing-database-drivers), only a few database engines are supported for use as the OLTP backend / metadata store. Superset is tested using MySQL, PostgreSQL, and SQLite backends. It’s recommended you install Superset on one of these database servers for production. Installation on other OLTP databases -may work but isn’t tested. It has been reported that [Microsoft SQL Server does *not* +may work but isn’t tested. It has been reported that [Microsoft SQL Server does _not_ work as a Superset backend](https://github.com/apache/superset/issues/18961). Column-store, non-OLTP databases are not designed for this type of workload. @@ -236,7 +236,7 @@ made to cover more and more use cases.
The API available is documented using [Swagger](https://swagger.io/) and the documentation can be made available under **/swagger/v1** by enabling the following flag in `superset_config.py`: -``` +```python FAB_API_SWAGGER_UI = True ``` diff --git a/docs/docs/installation/architecture.mdx b/docs/docs/installation/architecture.mdx index 85b42398b8978..a08aa471b4e54 100644 --- a/docs/docs/installation/architecture.mdx +++ b/docs/docs/installation/architecture.mdx @@ -14,6 +14,7 @@ This page is meant to give new administrators an understanding of Superset's com ## Components A Superset installation is made up of these components: + 1. The Superset application itself 2. A metadata database 3. A caching layer (optional, but necessary for some features) @@ -22,6 +23,7 @@ A Superset installation is made up of these components: ### Optional components and associated features The optional components above are necessary to enable these features: + - [Alerts and Reports](/docs/configuration/alerts-reports) - [Caching](/docs/configuration/cache) - [Async Queries](/docs/configuration/async-queries-celery/) @@ -36,6 +38,7 @@ Here are further details on each component. ### The Superset Application This is the core application. Superset operates like this: + - A user visits a chart or dashboard - That triggers a SQL query to the data warehouse holding the underlying dataset - The resulting data is served up in a data visualization @@ -52,6 +55,7 @@ For production, a properly-configured, managed, standalone database is recommend ### Caching Layer The caching layer serves two main functions: + - Store the results of queries to your data warehouse so that when a chart is loaded twice, it pulls from the cache the second time, speeding up the application and reducing load on your data warehouse. - Act as a message broker for the worker, enabling the Alerts & Reports, async queries, and thumbnail caching features. 
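The two caching-layer roles described above are wired up in `superset_config.py`. Here is a minimal, hypothetical sketch (not part of this patch) assuming a single Redis instance on `localhost:6379`; the hostnames, ports, and database indexes are placeholders to adapt to your deployment. `CACHE_CONFIG`, `DATA_CACHE_CONFIG`, and `CELERY_CONFIG` are standard Superset configuration keys:

```python
# Hypothetical superset_config.py fragment -- adjust host/port/db to taste.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",        # flask-caching backend
    "CACHE_DEFAULT_TIMEOUT": 300,      # seconds before a cached entry expires
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_HOST": "localhost",
    "CACHE_REDIS_PORT": 6379,
    "CACHE_REDIS_DB": 1,
}

# A separate cache (different Redis DB) for chart data pulled from the warehouse.
DATA_CACHE_CONFIG = {**CACHE_CONFIG, "CACHE_REDIS_DB": 2}

# The same Redis instance can double as the Celery message broker,
# enabling Alerts & Reports and async queries.
class CeleryConfig:
    broker_url = "redis://localhost:6379/0"
    result_backend = "redis://localhost:6379/0"

CELERY_CONFIG = CeleryConfig
```

Using distinct Redis database indexes for the metadata cache, the data cache, and the broker keeps their keyspaces from colliding while still requiring only one Redis instance.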
diff --git a/docs/docs/installation/docker-builds.mdx index 0cdb8c9990bc6..436e9ed41b07d 100644 --- a/docs/docs/installation/docker-builds.mdx +++ b/docs/docs/installation/docker-builds.mdx @@ -59,14 +59,13 @@ Here are the build presets that are exposed through the `build_docker.py` script this specific SHA, which could be from a `master` merge, or release. - `websocket-latest`: The WebSocket image for use in a Superset cluster. - - For insights or modifications to the build matrix and tagging conventions, check the [build_docker.py](https://github.com/apache/superset/blob/master/scripts/build_docker.py) script and the [docker.yml](https://github.com/apache/superset/blob/master/.github/workflows/docker.yml) GitHub action. ## Key ARGs in Dockerfile + - `BUILD_TRANSLATIONS`: whether to build the translations into the image. For the frontend build this tells webpack to strip out all locales other than `en` from the `moment-timezone` library. For the backend, this skips compiling the diff --git a/docs/docs/installation/kubernetes.mdx index 6cb2096584fec..986d917fbc86d 100644 --- a/docs/docs/installation/kubernetes.mdx +++ b/docs/docs/installation/kubernetes.mdx @@ -35,7 +35,7 @@ helm repo add superset https://apache.github.io/superset "superset" has been added to your repositories ``` -2. View charts in repo +1. View charts in repo ```sh helm search repo superset @@ -43,7 +43,7 @@ NAME CHART VERSION APP VERSION DESCRIPTION superset/superset 0.1.1 1.0 Apache Superset is a modern, enterprise-ready b... ``` -3. Configure your setting overrides +1.
Configure your setting overrides Just like any typical Helm chart, you'll need to craft a `values.yaml` file that defines/overrides any of the values exposed in the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or in any of the charts it depends on: @@ -52,7 +52,7 @@ Just like any typical Helm chart, you'll need to craft a `values.yaml` file that More info below on some important overrides you might need. -4. Install and run +1. Install and run ```sh helm upgrade --install --values my-values.yaml superset superset/superset @@ -154,6 +154,7 @@ See [Install Database Drivers](/docs/configuration/databases) for more informati ::: The following example installs the drivers for BigQuery and Elasticsearch, allowing you to connect to these data sources within your Superset setup: + ```yaml bootstrapScript: | #!/bin/bash diff --git a/docs/docs/quickstart.mdx index be548c65c74e0..f3fba6bfa395c 100644 --- a/docs/docs/quickstart.mdx +++ b/docs/docs/quickstart.mdx @@ -22,7 +22,7 @@ page. ### 1. Get Superset ```bash -$ git clone https://github.com/apache/superset +git clone https://github.com/apache/superset ``` ### 2. Start the latest official release of Superset @@ -58,7 +58,7 @@ password: admin Once you're done with Superset, you can stop and delete it just like any other container environment: ```bash -$ docker compose down +docker compose down ``` :::tip diff --git a/docs/docs/security/security.mdx index 5425e7368c05a..0c60ef229c378 100644 --- a/docs/docs/security/security.mdx +++ b/docs/docs/security/security.mdx @@ -224,17 +224,17 @@ this warning using the `CONTENT_SECURITY_POLICY_WARNING` key in `config.py`. #### CSP Requirements -* Superset needs the `style-src unsafe-inline` CSP directive in order to operate. +- Superset needs the `style-src unsafe-inline` CSP directive in order to operate.
``` style-src 'self' 'unsafe-inline' ``` -* Only scripts marked with a [nonce](https://content-security-policy.com/nonce/) can be loaded and executed. +- Only scripts marked with a [nonce](https://content-security-policy.com/nonce/) can be loaded and executed. A nonce is a random string automatically generated by Talisman on each page load. You can get the current nonce value by calling the Jinja macro `csp_nonce()`. - ``` + ```html @@ -253,17 +253,16 @@ You can get the current nonce value by calling the Jinja macro `csp_nonce()`. connect-src 'self' https://api.mapbox.com https://events.mapbox.com ``` -* Other CSP directives default to `'self'` to limit content to the same origin as the Superset server. +- Other CSP directives default to `'self'` to limit content to the same origin as the Superset server. To adjust the provided CSP configuration to your needs, follow the instructions and examples provided in the [Content Security Policy Reference](https://content-security-policy.com/). - #### Other Talisman security considerations Setting `TALISMAN_ENABLED = True` will invoke Talisman's protection with its default arguments, of which `content_security_policy` is only one. Those can be found in the -[Talisman documentation](https://pypi.org/project/flask-talisman/) under _Options_. +[Talisman documentation](https://pypi.org/project/flask-talisman/) under *Options*. These generally improve security, but administrators should be aware of their existence.
In particular, the option of `force_https = True` (`False` by default) may break Superset's Alerts & Reports diff --git a/docs/docs/using-superset/creating-your-first-dashboard.mdx b/docs/docs/using-superset/creating-your-first-dashboard.mdx index 8a7343d7a8336..8a52258fe7bc5 100644 --- a/docs/docs/using-superset/creating-your-first-dashboard.mdx +++ b/docs/docs/using-superset/creating-your-first-dashboard.mdx @@ -48,7 +48,6 @@ Please note, if you are trying to connect to another locally running database (w Once you've clicked that link you only need to specify two things (the database name and SQLAlchemy URI): - {" "}

As noted in the text below the form, you should refer to the SQLAlchemy documentation on @@ -104,7 +103,7 @@ Aggregate functions are allowed and encouraged for metrics. You can also certify metrics if you'd like for your team in this view. -2. Virtual calculated columns: you can write SQL queries that +1. Virtual calculated columns: you can write SQL queries that customize the appearance and behavior of a specific column (e.g. `CAST(recovery_rate as float)`). Aggregate functions aren't allowed in calculated columns. @@ -179,8 +178,8 @@ of other table configuration and visualization options, so please start explorin slices and dashboards of your own -### Manage access to Dashboards +### Manage access to Dashboards Access to dashboards is managed via owners (users that have edit permissions to the dashboard) @@ -196,6 +195,7 @@ Non-owner user access can be managed in two different ways: ### Customizing dashboard The following URL parameters can be used to modify how the dashboard is rendered: + - `standalone`: - `0` (default): dashboard is displayed normally - `1`: Top Navigation is hidden diff --git a/docs/docs/using-superset/exploring-data.mdx index 90bbf0727aa42..c2adabeae0811 100644 --- a/docs/docs/using-superset/exploring-data.mdx +++ b/docs/docs/using-superset/exploring-data.mdx @@ -13,7 +13,7 @@ In this tutorial, we will introduce key concepts in Apache Superset through the real dataset which contains the flights made by employees of a UK-based organization in 2011. The following information about each flight is given: -- The traveller’s department. For the purposes of this tutorial the departments have been renamed +- The traveler’s department. For the purposes of this tutorial the departments have been renamed Orange, Yellow and Purple. - The cost of the ticket. - The travel class (Economy, Premium Economy, Business and First Class).
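As a small, hypothetical illustration of the `standalone` URL parameter covered in the "Customizing dashboard" section above, a dashboard link can be assembled programmatically; the host and dashboard ID below are made-up placeholders, and the `/superset/dashboard/<id>/` path is Superset's standard dashboard route:

```python
from urllib.parse import urlencode

def dashboard_url(base: str, dashboard_id: int, standalone: int = 1) -> str:
    """Build a dashboard link; standalone=1 hides the top navigation."""
    query = urlencode({"standalone": standalone})
    return f"{base}/superset/dashboard/{dashboard_id}/?{query}"

print(dashboard_url("https://superset.example.com", 42))
# -> https://superset.example.com/superset/dashboard/42/?standalone=1
```

Such a helper is handy when generating embed links in bulk, e.g. for an internal portal that iframes several dashboards.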