From 870a1b07c80bdb2de22c09dac5b788e00f2273cc Mon Sep 17 00:00:00 2001 From: arno756 Date: Mon, 16 Oct 2023 12:59:41 -0700 Subject: [PATCH] Add files via upload --- notebooks/network-intrusion-detection-part-1/meta.toml | 8 ++++++++ .../network-intrusion-detection-part-1/notebook.ipynb | 1 + 2 files changed, 9 insertions(+) create mode 100644 notebooks/network-intrusion-detection-part-1/meta.toml create mode 100644 notebooks/network-intrusion-detection-part-1/notebook.ipynb diff --git a/notebooks/network-intrusion-detection-part-1/meta.toml b/notebooks/network-intrusion-detection-part-1/meta.toml new file mode 100644 index 0000000..d5acf21 --- /dev/null +++ b/notebooks/network-intrusion-detection-part-1/meta.toml @@ -0,0 +1,8 @@ +[meta] +title="Part 1 or Real-time threat Detection - This notebook demonstrates the application of SingleStoreDB's similarity search to create a system +for identifying infrequent occurrences, a common requirement in fields such as cybersecurity +and fraud detection where only a small percentage of events are potentially malicious. +" +icon="browser" +tags=["iot", "cybersecurity", "training", "vectordb"] +destinations=["spaces"] \ No newline at end of file diff --git a/notebooks/network-intrusion-detection-part-1/notebook.ipynb b/notebooks/network-intrusion-detection-part-1/notebook.ipynb new file mode 100644 index 0000000..b403636 --- /dev/null +++ b/notebooks/network-intrusion-detection-part-1/notebook.ipynb @@ -0,0 +1 @@ +{"cells":[{"cell_type":"markdown","id":"93ad2bda-e101-4aad-a83b-45f84560597c","metadata":{"language":"python"},"source":"
\n
\n \n
\n
\n
SingleStore Notebooks
\n

IT Threat Detection, Part 1

\n
\n
"},{"cell_type":"markdown","id":"4d343fe9-0c6f-4cf7-bf04-02524cdb5879","metadata":{"language":"python"},"source":"This notebook demonstrates the application of SingleStoreDB's similarity search to create a system for identifying infrequent occurrences, a common requirement in fields such as cybersecurity and fraud detection where only a small percentage of events are potentially malicious.\n\nIn this instance, we aim to construct a network intrusion detection system. These systems continuously monitor incoming and outgoing network traffic, generating alerts when potential threats are detected. We'll utilize a combination of a deep learning model and similarity search to identify and classify network intrusion traffic.\n\nOur initial step involves a dataset of labeled traffic events, distinguishing between benign and malicious events, by transforming them into vector embeddings. These vector embeddings serve as comprehensive mathematical representations of network traffic events. SingleStoreDB's built-in similarity-search algorithms allow us to measure the similarity between different network events. To generate these embeddings, we'll leverage a deep learning model based on recent academic research.\n\nSubsequently, we'll apply this dataset to search for the most similar matches when presented with new, unseen network events. We'll retrieve these matches along with their corresponding labels. This process enables us to classify the unseen events as either **benign** or **malicious** by propagating the labels of the matched events. It's essential to note that intrusion detection is a complex classification task, primarily because malicious events occur infrequently. The similarity search service plays a crucial role in identifying relevant historical labeled events, thus enabling the identification of these rare events while maintaining a low rate of false alarms.\n\n## Install Dependencies"},{"cell_type":"code","execution_count":4,"id":"c649045f-0a53-4c49-88cb-0351e872d68c","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:11:32.025758Z","iopub.status.busy":"2023-10-04T15:11:32.025506Z","iopub.status.idle":"2023-10-04T15:12:02.177962Z","shell.execute_reply":"2023-10-04T15:12:02.177338Z","shell.execute_reply.started":"2023-10-04T15:11:32.025741Z"},"language":"python","tags":[],"trusted":true},"outputs":[],"source":"!pip3 install tensorflow keras pandas --upgrade --quiet"},{"cell_type":"code","execution_count":5,"id":"f3bb35f9-67ea-4a23-b713-888746494baf","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:12:06.378457Z","iopub.status.busy":"2023-10-04T15:12:06.378188Z","iopub.status.idle":"2023-10-04T15:12:07.773000Z","shell.execute_reply":"2023-10-04T15:12:07.772488Z","shell.execute_reply.started":"2023-10-04T15:12:06.378438Z"},"language":"python","scrolled":true,"tags":[],"trusted":true},"outputs":[],"source":"import os\nos.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n\nimport pandas as pd\nimport tensorflow.keras.backend as K\nfrom tensorflow import keras\nfrom tensorflow.keras.models import Model"},{"cell_type":"markdown","id":"01fa2a0b-0213-4399-8481-adc1734cecf0","metadata":{"language":"python"},"source":"We'll define a Python context manager called `clear_memory()` using the **contextlib** module. This context manager will be used to clear memory by running Python's garbage collector (`gc.collect()`) after a block of code is executed."},{"cell_type":"code","execution_count":6,"id":"b7b5a0c6-aef9-4fea-91d9-8777e75d5c1f","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:12:13.931036Z","iopub.status.busy":"2023-10-04T15:12:13.930471Z","iopub.status.idle":"2023-10-04T15:12:13.934758Z","shell.execute_reply":"2023-10-04T15:12:13.933574Z","shell.execute_reply.started":"2023-10-04T15:12:13.931013Z"},"language":"python","trusted":true},"outputs":[],"source":"import contextlib\nimport gc\n\n@contextlib.contextmanager\ndef clear_memory():\n try:\n yield\n finally:\n gc.collect()"},{"cell_type":"markdown","id":"6eaabcc0-3bfa-4c67-86cd-8b3a42150a6f","metadata":{"language":"python"},"source":"We'll will incorporate portions of code from [research work](https://github.com/Colorado-Mesa-University-Cybersecurity/DeepLearning-IDS). To begin, we'll clone the repository required for data preparation."},{"cell_type":"code","execution_count":7,"id":"1d06bd7a-f3ad-4c3a-a085-d8583d20e2bc","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:12:20.727187Z","iopub.status.busy":"2023-10-04T15:12:20.726917Z","iopub.status.idle":"2023-10-04T15:12:23.906216Z","shell.execute_reply":"2023-10-04T15:12:23.905573Z","shell.execute_reply.started":"2023-10-04T15:12:20.727172Z"},"language":"python","tags":[],"trusted":true},"outputs":[],"source":"!git clone -q https://github.com/Colorado-Mesa-University-Cybersecurity/DeepLearning-IDS.git"},{"cell_type":"markdown","id":"8c323aab-899d-4156-8cfe-98f0fdbb2147","metadata":{"language":"python"},"source":"## Data Preparation\n\nThe datasets we'll utilize comprise two types of network traffic:\n\n1. Benign (normal)\n2. Malicious (attack)\n\nstemming from various network attacks. Our focus will be solely on web-based attacks. These web attacks fall into three common categories:\n\n1. Cross-site scripting (BruteForce-XSS)\n2. SQL-Injection (SQL-Injection)\n3. Brute force attempts on administrative and user passwords (BruteForce-Web)\n\nThe original data was collected over a span of two days.\n\n### Download Data\n\nWe'll proceed by downloading data for two specific dates:\n\n1. February 22, 2018\n2. February 23, 2018\n\nThese files will be retrieved and saved to the current directory. Our intention is to use one of these dates for training and generating vectors, while the other will be reserved for testing purposes."},{"cell_type":"code","execution_count":8,"id":"47a639f2-1f03-4972-8212-316887cc1c73","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:12:32.057896Z","iopub.status.busy":"2023-10-04T15:12:32.057596Z","iopub.status.idle":"2023-10-04T15:12:43.448204Z","shell.execute_reply":"2023-10-04T15:12:43.447571Z","shell.execute_reply.started":"2023-10-04T15:12:32.057874Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":"Thursday-22-02-2018 100%[===================>] 364.91M 57.0MB/s in 5.8s \nFriday-23-02-2018_T 100%[===================>] 365.10M 78.5MB/s in 4.7s \n"}],"source":"!wget \"https://cse-cic-ids2018.s3.ca-central-1.amazonaws.com/Processed%20Traffic%20Data%20for%20ML%20Algorithms/Thursday-22-02-2018_TrafficForML_CICFlowMeter.csv\" -q --show-progress\n!wget \"https://cse-cic-ids2018.s3.ca-central-1.amazonaws.com/Processed%20Traffic%20Data%20for%20ML%20Algorithms/Friday-23-02-2018_TrafficForML_CICFlowMeter.csv\" -q --show-progress"},{"cell_type":"markdown","id":"21103ba6-036d-4525-9e5a-073ec515aba9","metadata":{"execution":{"iopub.execute_input":"2023-09-27T15:17:45.604442Z","iopub.status.busy":"2023-09-27T15:17:45.604182Z","iopub.status.idle":"2023-09-27T15:17:45.607166Z","shell.execute_reply":"2023-09-27T15:17:45.606518Z","shell.execute_reply.started":"2023-09-27T15:17:45.604422Z"},"language":"python"},"source":"### Review Data"},{"cell_type":"code","execution_count":14,"id":"c17e283b-8629-483a-9631-69392b3b872e","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:12:49.959451Z","iopub.status.busy":"2023-10-04T15:12:49.959246Z","iopub.status.idle":"2023-10-04T15:12:55.095007Z","shell.execute_reply":"2023-10-04T15:12:55.094492Z","shell.execute_reply.started":"2023-10-04T15:12:49.959434Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"data":{"text/plain":"Label\nBenign 1048009\nBrute Force -Web 362\nBrute Force -XSS 151\nSQL Injection 53\nName: count, dtype: int64"},"execution_count":14,"metadata":{},"output_type":"execute_result"}],"source":"with clear_memory():\n data = pd.read_csv('Friday-23-02-2018_TrafficForML_CICFlowMeter.csv')\n\ndata.Label.value_counts()"},{"cell_type":"markdown","id":"176bc1e1-822e-4e22-a5fa-0dd03d6ecf04","metadata":{"language":"python"},"source":"### Clean Data\n\nWe'll run a cleanup script from the previously downloaded GitHub repo."},{"cell_type":"code","execution_count":17,"id":"af4f1d28-cba5-4547-a2bd-a34b200fc261","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:13:01.860976Z","iopub.status.busy":"2023-10-04T15:13:01.860721Z","iopub.status.idle":"2023-10-04T15:13:47.515247Z","shell.execute_reply":"2023-10-04T15:13:47.514670Z","shell.execute_reply.started":"2023-10-04T15:13:01.860959Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":"cleaning Friday-23-02-2018_TrafficForML_CICFlowMeter.csv\ntotal rows read = 1048576\nall done writing 1042868 rows; dropped 5708 rows\n"}],"source":"!python DeepLearning-IDS/data_cleanup.py \"Friday-23-02-2018_TrafficForML_CICFlowMeter.csv\" \"result23022018\""},{"cell_type":"markdown","id":"15fd1f07-d8a1-4ab5-a36c-5a08f1f043e9","metadata":{"language":"python"},"source":"We'll now review the cleaned data from the previous step."},{"cell_type":"code","execution_count":18,"id":"d893178a-6bcc-4a16-89ef-f3559adad14c","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:13:58.399760Z","iopub.status.busy":"2023-10-04T15:13:58.399474Z","iopub.status.idle":"2023-10-04T15:14:03.115597Z","shell.execute_reply":"2023-10-04T15:14:03.115048Z","shell.execute_reply.started":"2023-10-04T15:13:58.399739Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"data":{"text/html":"
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Dst PortProtocolTimestampFlow DurationTot Fwd PktsTot Bwd PktsTotLen Fwd PktsTotLen Bwd PktsFwd Pkt Len MaxFwd Pkt Len Min...Fwd Seg Size MinActive MeanActive StdActive MaxActive MinIdle MeanIdle StdIdle MaxIdle MinLabel
02261.519374e+0915326981111117919696480...320.00.0000.00.000000e+0000Benign
1500171.519374e+091175738553015000500500...80.00.00058786927.52.375324e+077558300641990849Benign
2500171.519374e+091175738483015000500500...80.00.00058786924.02.375325e+077558300741990841Benign
32261.519374e+0917453921111117919696480...320.00.0000.00.000000e+0000Benign
4500171.519374e+09894834746030000500500...84000364.00.04000364400036421370777.51.528092e+07419895767200485Benign
\n

5 rows × 80 columns

\n
","text/plain":" Dst Port Protocol Timestamp Flow Duration Tot Fwd Pkts \\\n0 22 6 1.519374e+09 1532698 11 \n1 500 17 1.519374e+09 117573855 3 \n2 500 17 1.519374e+09 117573848 3 \n3 22 6 1.519374e+09 1745392 11 \n4 500 17 1.519374e+09 89483474 6 \n\n Tot Bwd Pkts TotLen Fwd Pkts TotLen Bwd Pkts Fwd Pkt Len Max \\\n0 11 1179 1969 648 \n1 0 1500 0 500 \n2 0 1500 0 500 \n3 11 1179 1969 648 \n4 0 3000 0 500 \n\n Fwd Pkt Len Min ... Fwd Seg Size Min Active Mean Active Std \\\n0 0 ... 32 0.0 0.0 \n1 500 ... 8 0.0 0.0 \n2 500 ... 8 0.0 0.0 \n3 0 ... 32 0.0 0.0 \n4 500 ... 8 4000364.0 0.0 \n\n Active Max Active Min Idle Mean Idle Std Idle Max Idle Min \\\n0 0 0 0.0 0.000000e+00 0 0 \n1 0 0 58786927.5 2.375324e+07 75583006 41990849 \n2 0 0 58786924.0 2.375325e+07 75583007 41990841 \n3 0 0 0.0 0.000000e+00 0 0 \n4 4000364 4000364 21370777.5 1.528092e+07 41989576 7200485 \n\n Label \n0 Benign \n1 Benign \n2 Benign \n3 Benign \n4 Benign \n\n[5 rows x 80 columns]"},"execution_count":18,"metadata":{},"output_type":"execute_result"}],"source":"with clear_memory():\n data_23_cleaned = pd.read_csv('result23022018.csv')\n\ndata_23_cleaned.head()"},{"cell_type":"code","execution_count":19,"id":"d98eee6b-6a68-4fb6-b418-1599daa9e4ff","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:14:09.241483Z","iopub.status.busy":"2023-10-04T15:14:09.241237Z","iopub.status.idle":"2023-10-04T15:14:09.283891Z","shell.execute_reply":"2023-10-04T15:14:09.283370Z","shell.execute_reply.started":"2023-10-04T15:14:09.241467Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"data":{"text/plain":"Label\nBenign 1042301\nBrute Force -Web 362\nBrute Force -XSS 151\nSQL Injection 53\nName: count, dtype: int64"},"execution_count":19,"metadata":{},"output_type":"execute_result"}],"source":"data_23_cleaned.Label.value_counts()"},{"cell_type":"markdown","id":"1b466f2f-ee4f-43ba-b1e6-0cb3380c9125","metadata":{"language":"python"},"source":"## Load Model\n\nIn this section, we'll load a pre-trained model that has been trained on data collected from the same date.\n\nThere are slight modifications to the original model, specifically, altering the number of classes. Initially, the model was designed to classify into four categories:\n\n1. Benign\n2. BruteForce-Web\n3. BruteForce-XSS\n4. SQL-Injection\n\nOur modified model has been adjusted to classify into just two categories:\n\n1. Benign\n2. Attack\n\n
\n \n
\n

Action Required

\n

The ZIP file is hosted on a Google Drive.

\n

Using the Edit Firewall button in the top right, add the following to the SingleStoreDB Cloud notebook firewall, one-by-one:\n

    \n
  • drive.google.com
  • \n
  • *.googleapis.com
  • \n
  • *.googleusercontent.com
  • \n
\n

\n
\n
"},{"cell_type":"code","execution_count":20,"id":"4074a769-b16b-4543-b0db-518c7f95f205","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:14:18.247758Z","iopub.status.busy":"2023-10-04T15:14:18.247504Z","iopub.status.idle":"2023-10-04T15:14:20.107308Z","shell.execute_reply":"2023-10-04T15:14:20.106660Z","shell.execute_reply.started":"2023-10-04T15:14:18.247741Z"},"language":"python","tags":[],"trusted":true},"outputs":[],"source":"!wget -q -O it_threat_model.zip \"https://drive.google.com/uc?export=download&id=1ahr5dYlhuxS56M6helUFI0yIxxIoFk9o\" \n!unzip -q it_threat_model.zip"},{"cell_type":"code","execution_count":21,"id":"7c77cf96-e3f7-4bed-9f7e-913f851367d8","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:14:26.353287Z","iopub.status.busy":"2023-10-04T15:14:26.352961Z","iopub.status.idle":"2023-10-04T15:14:26.688942Z","shell.execute_reply":"2023-10-04T15:14:26.688380Z","shell.execute_reply.started":"2023-10-04T15:14:26.353263Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":"Model: \"sequential\"\n_________________________________________________________________\n Layer (type) Output Shape Param # \n=================================================================\n dense (Dense) (None, 128) 10240 \n \n dense_1 (Dense) (None, 64) 8256 \n \n dense_2 (Dense) (None, 1) 65 \n \n=================================================================\nTotal params: 18561 (72.50 KB)\nTrainable params: 18561 (72.50 KB)\nNon-trainable params: 0 (0.00 Byte)\n_________________________________________________________________\n"}],"source":"with clear_memory():\n model = keras.models.load_model('it_threat_model')\n\nmodel.summary()"},{"cell_type":"code","execution_count":22,"id":"ed66c137-1023-4dce-b9ea-5e742099302e","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:14:34.082125Z","iopub.status.busy":"2023-10-04T15:14:34.081856Z","iopub.status.idle":"2023-10-04T15:14:34.224177Z","shell.execute_reply":"2023-10-04T15:14:34.223680Z","shell.execute_reply.started":"2023-10-04T15:14:34.082108Z"},"language":"python","tags":[],"trusted":true},"outputs":[],"source":"with clear_memory():\n # Use the first layer\n layer_name = 'dense'\n intermediate_layer_model = Model(\n inputs = model.input,\n outputs = model.get_layer(layer_name).output\n )"},{"cell_type":"markdown","id":"24d3bfbb-78ee-4a8b-a429-84f988915b6d","metadata":{"language":"python"},"source":"## Upload Data to SingleStoreDB\n\n### Prepare Data\nWe'll use a method for defining item IDs that aligns with the event's label."},{"cell_type":"code","execution_count":23,"id":"ae1fc38e-f532-430a-99cd-f3e154343bd4","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:14:41.159100Z","iopub.status.busy":"2023-10-04T15:14:41.158708Z","iopub.status.idle":"2023-10-04T15:16:14.701123Z","shell.execute_reply":"2023-10-04T15:16:14.700506Z","shell.execute_reply.started":"2023-10-04T15:14:41.159081Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":"32590/32590 [==============================] - 26s 797us/step\n"},{"name":"stderr","output_type":"stream","text":"100%|██████████| 1042867/1042867 [00:50<00:00, 20810.99it/s]\n"}],"source":"from tqdm import tqdm\nitems_to_upload = []\n\nwith clear_memory():\n model_res = intermediate_layer_model.predict(K.constant(data_23_cleaned.iloc[:,:-1]))\n \n for i, res in tqdm(zip(data_23_cleaned.iterrows(), model_res), total = len(model_res)):\n benign_or_attack = i[1]['Label'][:3]\n items_to_upload.append((benign_or_attack + '_' + str(i[0]), res.tolist()))"},{"cell_type":"markdown","id":"e4b3f74e-efea-499e-b25a-74316fcf2395","metadata":{"language":"python"},"source":"We'll store the data in a Pandas DataFrame."},{"cell_type":"code","execution_count":29,"id":"0177714c-62b7-4d3e-a93b-2fd467ff1352","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:17:45.778441Z","iopub.status.busy":"2023-10-04T15:17:45.778183Z","iopub.status.idle":"2023-10-04T15:17:47.525268Z","shell.execute_reply":"2023-10-04T15:17:47.524802Z","shell.execute_reply.started":"2023-10-04T15:17:45.778425Z"},"language":"python","tags":[],"trusted":true},"outputs":[{"data":{"text/html":"
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
IDModel_Results
0Ben_0[0.0, 0.0, 0.0, 125628656.0, 0.0, 0.0, 5421442...
1Ben_1[0.0, 0.0, 0.0, 356751744.0, 1190461440.0, 0.0...
2Ben_2[0.0, 0.0, 0.0, 356751680.0, 1190461440.0, 0.0...
3Ben_3[0.0, 0.0, 0.0, 125515856.0, 0.0, 0.0, 5432884...
4Ben_4[0.0, 0.0, 0.0, 26214912.0, 698683840.0, 0.0, ...
\n
","text/plain":" ID Model_Results\n0 Ben_0 [0.0, 0.0, 0.0, 125628656.0, 0.0, 0.0, 5421442...\n1 Ben_1 [0.0, 0.0, 0.0, 356751744.0, 1190461440.0, 0.0...\n2 Ben_2 [0.0, 0.0, 0.0, 356751680.0, 1190461440.0, 0.0...\n3 Ben_3 [0.0, 0.0, 0.0, 125515856.0, 0.0, 0.0, 5432884...\n4 Ben_4 [0.0, 0.0, 0.0, 26214912.0, 698683840.0, 0.0, ..."},"execution_count":29,"metadata":{},"output_type":"execute_result"}],"source":"with clear_memory():\n df = pd.DataFrame(items_to_upload, columns=['ID', 'Model_Results'])\n\ndf.head()"},{"cell_type":"markdown","id":"f52639f6-82c8-427f-aa7b-9859d6580eae","metadata":{"language":"python"},"source":"Now we'll convert the vectors to a binary format, ready to store in SingleStoreDB."},{"cell_type":"code","execution_count":30,"id":"ec8a2d26-98cf-4f1a-9e13-e197cd65040c","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:17:53.975325Z","iopub.status.busy":"2023-10-04T15:17:53.975066Z","iopub.status.idle":"2023-10-04T15:17:57.811281Z","shell.execute_reply":"2023-10-04T15:17:57.810738Z","shell.execute_reply.started":"2023-10-04T15:17:53.975308Z"},"language":"python","trusted":true},"outputs":[],"source":"import struct\n\ndef data_to_binary(data: list[float]):\n format_string = 'f' * len(data)\n return struct.pack(format_string, *data)\n\nwith clear_memory():\n df['Model_Results'] = df['Model_Results'].apply(data_to_binary)"},{"cell_type":"markdown","id":"e267e6a4-3e70-4411-96cd-94da3f1f6011","metadata":{"language":"python"},"source":"We'll check the DataFrame."},{"cell_type":"code","execution_count":31,"id":"770cfe1e-4d55-43a8-a5e0-36da2525ca25","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:18:02.459555Z","iopub.status.busy":"2023-10-04T15:18:02.459286Z","iopub.status.idle":"2023-10-04T15:18:02.466500Z","shell.execute_reply":"2023-10-04T15:18:02.465962Z","shell.execute_reply.started":"2023-10-04T15:18:02.459538Z"},"language":"python","trusted":true},"outputs":[{"data":{"text/html":"
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
IDModel_Results
0Ben_0b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...
1Ben_1b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...
2Ben_2b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...
3Ben_3b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...
4Ben_4b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...
\n
","text/plain":" ID Model_Results\n0 Ben_0 b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...\n1 Ben_1 b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...\n2 Ben_2 b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...\n3 Ben_3 b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00...\n4 Ben_4 b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00..."},"execution_count":31,"metadata":{},"output_type":"execute_result"}],"source":"df.head()"},{"cell_type":"markdown","id":"8131ad6c-6d50-49db-845c-457e9957ddb9","metadata":{"language":"python"},"source":"### Create Database and Table"},{"cell_type":"code","execution_count":34,"id":"8940168f-072f-42cd-8855-4c6b6829421d","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:18:54.107461Z","iopub.status.busy":"2023-10-04T15:18:54.107206Z","iopub.status.idle":"2023-10-04T15:19:02.426902Z","shell.execute_reply":"2023-10-04T15:19:02.426343Z","shell.execute_reply.started":"2023-10-04T15:18:54.107444Z"},"language":"sql","tags":[],"trusted":true},"outputs":[{"data":{"text/html":"\n \n \n \n \n \n \n
","text/plain":"++\n||\n++\n++"},"execution_count":34,"metadata":{},"output_type":"execute_result"}],"source":"%%sql\n%%sql\nDROP DATABASE IF EXISTS siem_log_kafka_demo;\n\nCREATE DATABASE IF NOT EXISTS siem_log_kafka_demo;\n\nUSE siem_log_kafka_demo;\n\nDROP TABLE IF EXISTS model_results_demo;\n\nCREATE TABLE IF NOT EXISTS model_results (\n id TEXT,\n Model_Results BLOB\n);"},{"cell_type":"markdown","id":"b6390a0c-d60a-4876-85bd-56e17deabeeb","metadata":{"language":"python"},"source":"### Get Connection Details\n\n
\n \n
\n

Action Required

\n

Select the database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by SQLAlchemy to make connections to the selected database.

\n
\n
"},{"cell_type":"code","execution_count":36,"id":"8208b373-1973-4925-8d8a-cd43ba13e7d7","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:19:22.359227Z","iopub.status.busy":"2023-10-04T15:19:22.358972Z","iopub.status.idle":"2023-10-04T15:19:22.363216Z","shell.execute_reply":"2023-10-04T15:19:22.362608Z","shell.execute_reply.started":"2023-10-04T15:19:22.359211Z"},"language":"python","tags":[],"trusted":true},"outputs":[],"source":"from sqlalchemy import *\n\ndb_connection = create_engine(connection_url)"},{"cell_type":"markdown","id":"d97c6886-0ea9-47b5-938b-03ada717b394","metadata":{"execution":{"iopub.execute_input":"2023-09-27T15:37:17.351200Z","iopub.status.busy":"2023-09-27T15:37:17.350962Z","iopub.status.idle":"2023-09-27T15:37:17.354770Z","shell.execute_reply":"2023-09-27T15:37:17.353988Z","shell.execute_reply.started":"2023-09-27T15:37:17.351170Z"},"language":"python"},"source":"### Store DataFrame"},{"cell_type":"code","execution_count":37,"id":"e70b91a0-fdb9-46da-8c0d-5acc947799be","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:19:28.729452Z","iopub.status.busy":"2023-10-04T15:19:28.729169Z","iopub.status.idle":"2023-10-04T15:22:34.401289Z","shell.execute_reply":"2023-10-04T15:22:34.400783Z","shell.execute_reply.started":"2023-10-04T15:19:28.729435Z"},"language":"python","trusted":true},"outputs":[],"source":"with clear_memory():\n df.to_sql(\n 'model_results',\n con = db_connection,\n if_exists = 'append',\n index = False,\n chunksize = 1000\n )"},{"cell_type":"markdown","id":"559928bd-10a0-4880-805a-23444d7d72a3","metadata":{"language":"python"},"source":"### Check Stored Data"},{"cell_type":"code","execution_count":47,"id":"f4cb8e7f-362f-4477-8600-e5f39ec197b8","metadata":{"execution":{"iopub.execute_input":"2023-10-04T15:22:52.838607Z","iopub.status.busy":"2023-10-04T15:22:52.838359Z","iopub.status.idle":"2023-10-04T15:22:53.253409Z","shell.execute_reply":"2023-10-04T15:22:53.252952Z","shell.execute_reply.started":"2023-10-04T15:22:52.838591Z"},"language":"sql","trusted":true},"outputs":[{"data":{"text/html":"\n \n \n \n \n \n \n \n \n \n \n \n \n
IDModel_Results
Ben_764632[0, 0, 0, 161398336, 0, 0, 91465440, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 186428320, 0, 0, 0, 167306864, 0, 0, 277207904, 0, 92328576, 73124928, 0, 0, 0, 95751136, 0, 0, 0, 0, 0, 0, 0, 0, 230162768, 273622432, 511405120, 0, 0, 0, 0, 0, 0, 0, 0, 0, 145775152, 106490400, 373456928, 0, 0, 0, 211604256, 30848250, 0, 0, 0, 0, 326004800, 0, 0, 0, 0, 13625428, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 248507264, 0, 121489904, 196521904, 0, 2331058, 0, 0, 234076784, 247954704, 0, 0, 16321682, 0, 0, 0, 343808992, 0, 0, 0, 74993352, 0, 0, 59710728, 0, 0, 89274704, 0, 174431776, 107296112, 0, 0, 134864096, 0, 0]
","text/plain":"+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| ID | Model_Results |\n+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| Ben_764632 | [0, 0, 0, 161398336, 0, 0, 91465440, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 186428320, 0, 0, 0, 167306864, 0, 0, 277207904, 0, 92328576, 73124928, 0, 0, 0, 95751136, 0, 0, 0, 0, 0, 0, 0, 0, 230162768, 273622432, 511405120, 0, 0, 0, 0, 0, 0, 0, 0, 0, 145775152, 106490400, 373456928, 0, 0, 0, 211604256, 30848250, 0, 0, 0, 0, 326004800, 0, 0, 0, 0, 13625428, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 248507264, 0, 121489904, 196521904, 0, 2331058, 0, 0, 234076784, 247954704, 0, 0, 16321682, 0, 0, 0, 343808992, 0, 0, 0, 74993352, 0, 0, 59710728, 0, 0, 89274704, 0, 174431776, 107296112, 0, 0, 134864096, 0, 0] |\n+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+"},"execution_count":47,"metadata":{},"output_type":"execute_result"}],"source":"%%sql\n%%sql\nUSE siem_log_kafka_demo;\n\nSELECT ID, JSON_ARRAY_UNPACK(Model_Results) AS Model_Results\nFROM model_results\nLIMIT 1;"}],"metadata":{"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.11.4"},"singlestore_connection":{"connectionID":"a98658e6-5e5d-4be9-be2e-9fa993172504","defaultDatabase":""},"singlestore_row_limit":300},"nbformat":4,"nbformat_minor":5} \ No newline at end of file