Docs > SQL 101
SQL is a declarative language for accessing, updating, and transforming data.
- Table - A dataset defined by one or more columns, having zero or more rows.
- Column - A field or attribute in a dataset which has a specific name and data type.
- Row - A single entry in a table dataset.
- SELECT - The command for pulling one or more rows data out of a data source.
- DELETE - The command to remove one or more rows from an existing table based upon a specific condition.
- UPDATE - The command to modify one or more rows in an existing table.
NOTE: Modern big data applications and analytical use cases almost exclusivily use SELECT statements. For this reason, we will ignore DELETE and UPDATE in this introductory course.
SELECT ...a list of columns and expressions...
FROM ...your primary table...
JOIN ...to any additional tables...
WHERE ...rows meet certain conditions...
GROUP BY ...any summary columns...
HAVING ...met any additional criteria after aggregations are performed...
Can you tell what each of the below statements is returning?
SELECT *
FROM fact_sales
WHERE sales_date = TODAY()
SELECT DISTINCT
customer_id
, product_id
FROM fact_sales
WHERE sales_date = TODAY()
SELECT list_customers.customer_name
FROM fact_sales
JOIN list_customers
ON fact_sales.customer_id = list_customers.customer_id
WHERE fact_sales.sales_date = TODAY()
SELECT list_customers.customer_name
FROM fact_sales
JOIN list_customers
ON fact_sales.customer_id = list_customers.customer_id
WHERE fact_sales.sales_date = TODAY()
SELECT
fact_sales.sales_date
, sum(fact_sales.sales) as sales
FROM fact_sales
GROUP BY fact_sales.sales_date
WHERE fact_sales.sales_date = TODAY()
SELECT
list_regions.region_name
, sum(fact_sales.sales) as sales
FROM fact_sales
LEFT JOIN list_customers
ON fact_sales.customer_id = list_customers.customer_id
LEFT JOIN list_regions
ON list_customers.region_id = list_regions.region_id
WHERE fact_sales.sales_date = TODAY()
GROUP BY list_regions.region_name
- fact_sales
- sales_date
- order_id
- customer_id
- product_id
- sales
- quantity
- list_customers
- customer_id
- customer_name
- region_id
- list_regions
- region_id
- region_name
- country_code
The CTAS command creates a new permanent table from any valid SELECT statement.
Consider the following example.
CREATE TABLE fact_sales_summary_daily AS
SELECT
sales_date
, sum(sales) AS sales
FROM fact_sales
GROUP BY sales_date
The above CTAS command will create a new, smaller, and faster table which only has a single summary row per day.
NOTE: Most databases also support "CREATE OR REPLACE" to replace any existing table at the same time:
CREATE OR REPLACE TABLE ______ AS...
- LIST
- aka: "Type 1 Dimension"
- HIST
- aka: "Type 2 Dimension"
- EVENTS
- aka: "Fact Table"
- FACT
- aka: "Aggregate Table"
- MAP
- aka: "Factless Fact Table"