Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added notebooks for load data #103

Merged
merged 5 commits into from
Jul 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions authors/chetan-thote.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
name="Chetan Thote"
title="Product Team"
image="singlestore"
external=false
11 changes: 11 additions & 0 deletions notebooks/load-csv-data-s3/meta.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
[meta]
authors=["chetan-thote"]
title="Sales Data Analysis Dataset From Amazon S3"
description="""\
The Sales Data Analysis use case demonstrates how to utilize Singlestore's powerful querying capabilities to analyze sales data stored in a CSV file."""
difficulty="beginner"
tags=["starter", "loaddata", "s3"]
lesson_areas=["Ingest"]
icon="database"
destinations=["spaces"]
minimum_tier="free-shared"
360 changes: 360 additions & 0 deletions notebooks/load-csv-data-s3/notebook.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,360 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "97f96c34-81a9-495a-a55d-c565695e87f0",
"metadata": {},
"source": [
"<div id=\"singlestore-header\" style=\"display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;\">\n",
" <div id=\"icon-image\" style=\"width: 90px; height: 90px;\">\n",
" <img width=\"100%\" height=\"100%\" src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/database.png\" />\n",
" </div>\n",
" <div id=\"text\" style=\"padding: 5px; margin-left: 10px;\">\n",
" <div id=\"badge\" style=\"display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%\">SingleStore Notebooks</div>\n",
" <h1 style=\"font-weight: 500; margin: 8px 0 0 4px;\">Sales Data Analysis Dataset From Amazon S3</h1>\n",
" </div>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "612bd378-f145-42f1-b8ce-32557a4c00cd",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Note</b></p>\n",
" <p>This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to <tt>Start</tt> using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.</p>\n",
" </div>\n",
"</div>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "481ce5ae-2ee0-4b63-b3f3-a4b53a5bc381",
"metadata": {},
"source": [
"The Sales Data Analysis use case demonstrates how to utilize Singlestore's powerful querying capabilities to analyze sales data stored in a CSV file. This demo showcases typical operations that businesses perform to gain insights from their sales data, such as calculating total sales, identifying top-selling products, and analyzing sales trends over time. By working through this example, new users will learn how to load CSV data into Singlestore, execute aggregate functions, and perform time-series analysis, which are essential skills for leveraging the full potential of Singlestore in a business intelligence context."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "72fe6854-5b6e-4b79-a2d0-79bda0e18429",
"metadata": {},
"source": [
"<h3>Demo Flow</h3>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5ed26ab8-1217-4fbd-be0c-4e7728314671",
"metadata": {},
"source": [
"<img src=https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataCSV.png width=\"100%\" hight=\"50%\"/>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "46fb95a8-1402-4b97-b04a-560741f96181",
"metadata": {},
"source": [
"## How to use this notebook"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a701cd90-dd42-4a06-b7a1-e0a2132af558",
"metadata": {},
"source": [
"<img src=https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/notebookuse.gif width=\"75%\" hight=\"50%\"/>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2d22fd53-2c18-40e5-bb38-6d8ebc06f1b8",
"metadata": {},
"source": [
"## Create a database\n",
"\n",
"We need to create a database to work with in the following examples."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1624ccea-0c15-4048-ab2a-fe2178e5912a",
"metadata": {},
"outputs": [],
"source": [
"shared_tier_check = %sql show variables like 'is_shared_tier'\n",
"if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n",
" %sql DROP DATABASE IF EXISTS SalesAnalysis;\n",
" %sql CREATE DATABASE SalesAnalysis;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "901e6ec1-2530-497a-857e-7973bb9714f1",
"metadata": {},
"source": [
"<h3>Create Table</h3>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7ac4285d-0d2d-44ec-8b1e-eef7b4f9358c",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE TABLE `SalesData` (\n",
" `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Store_ID` bigint(20) DEFAULT NULL,\n",
" `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Quantity_Sold` bigint(20) DEFAULT NULL,\n",
" `Price` float DEFAULT NULL,\n",
" `Total_Sales` float DEFAULT NULL\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1de959eb-4f17-45d4-af74-42f45684d67b",
"metadata": {},
"source": [
"<h3>Load Data Using Pipelines</h3>"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "84f592b8-a12e-41d8-bff0-fe96175992b9",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE PIPELINE SalesData_Pipeline AS\n",
"LOAD DATA S3 's3://singlestoreloaddata/SalesData/sales_data.csv'\n",
"CONFIG '{ \\\"region\\\": \\\"ap-south-1\\\" }'\n",
"/*\n",
"CREDENTIALS '{\"aws_access_key_id\": \"<access key id>\",\n",
" \"aws_secret_access_key\": \"<access_secret_key>\"}'\n",
" */\n",
"INTO TABLE SalesData\n",
"FIELDS TERMINATED BY ','\n",
"LINES TERMINATED BY '\\r\\n'\n",
"IGNORE 1 lines;\n",
"\n",
"\n",
"START PIPELINE SalesData_Pipeline;"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "352e340a-a613-4ec5-94a5-c4e1f3565757",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM SalesData LIMIT 10"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4508d431-7683-4ac9-a4e8-d939c47dd1fc",
"metadata": {},
"source": [
"<h3>Sample Queries</h3>\n",
"\n",
"We will try to execute some Analytical Queries"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "55ac6134-976c-4f27-bc2b-140835b64f13",
"metadata": {},
"source": [
"<b>Top-Selling Products"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d666c04b-ccb0-47cc-a1e7-efaa7a590d27",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT product_name, SUM(quantity_sold) AS total_quantity_sold FROM SalesData\n",
" GROUP BY product_name ORDER BY total_quantity_sold DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "87c36700-0db8-405f-97c0-e13a6a2ae0cb",
"metadata": {},
"source": [
"<b>Sales Trends Over Time"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b46d72c7-07a3-4e23-8fe4-c238b5517ef6",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT date, SUM(total_sales) AS total_sales FROM SalesData\n",
"GROUP BY date ORDER BY total_sales desc limit 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e6c232a1-acce-4d25-aebd-1a89aafba47d",
"metadata": {},
"source": [
"<b>Total Sales by Store"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "af571f6c-0145-4466-9ed7-000d37e4738f",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT Store_ID, SUM(total_sales) AS total_sales FROM SalesData\n",
"GROUP BY Store_ID ORDER BY total_sales DESC limit 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9bf1d7f3-c636-4ac0-b2be-e48eaca747ef",
"metadata": {},
"source": [
"<b>Sales Contribution by Product (Percentage)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5613b3e8-72d2-48dc-a7ae-47911df24cd2",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT product_name, SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData) AS sales_percentage FROM SalesData\n",
" GROUP BY product_name ORDER BY sales_percentage DESC limit 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "afed201d-d9f2-49cc-8a14-df35103abd4e",
"metadata": {},
"source": [
"<b>Top Days with Highest Sale</b>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7fd8d785-7861-4570-88b3-0185c2c9c298",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT date, SUM(total_sales) AS total_sales FROM SalesData\n",
" GROUP BY date ORDER BY total_sales DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6738b6e4-5e8b-45db-b3dc-ebcb73bcf629",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Action Required</b></p>\n",
" <p> If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI. </p>\n",
" </div>\n",
"</div>\n",
"\n",
"We have shown how to insert data from a Amazon S3 using `Pipelines` to SingleStoreDB. These techniques should enable you to\n",
"integrate your Amazon S3 with SingleStoreDB."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d5053a52-5579-4fea-9594-5250f6fcc289",
"metadata": {},
"outputs": [],
"source": [
"shared_tier_check = %sql show variables like 'is_shared_tier'\n",
"if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n",
" %sql DROP DATABASE IF EXISTS SalesAnalysis;"
]
},
{
"cell_type": "markdown",
"id": "2dcc585a-43c2-4598-93bf-888143dd5e29",
"metadata": {},
"source": [
"<div id=\"singlestore-footer\" style=\"background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px\"></div>\n",
"<div><img src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png\" style=\"padding: 0px; margin: 0px; height: 24px\"/></div>"
]
}
],
"metadata": {
"jupyterlab": {
"notebooks": {
"version_major": 6,
"version_minor": 4
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
12 changes: 12 additions & 0 deletions notebooks/load-data-kakfa/meta.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
[meta]
authors=["chetan-thote"]
title="Real-Time Event Monitoring Dataset From Kafka"
description="""\
The Real-Time Event Monitoring use case illustrates how to leverage Singlestore's capabilities to process and analyze streaming data from a Kafka data source.
"""
difficulty="beginner"
tags=["starter", "loaddata", "kafka"]
lesson_areas=["Ingest"]
icon="database"
destinations=["spaces"]
minimum_tier="free-shared"
Loading
Loading