wip

run-llama · Jul 25, 2024 · 1a4ce17
1 parent a7423fb
commit 1a4ce17
Showing 1 changed file with 155 additions and 5 deletions.
diff --git a/examples/demo_pydantic_model.ipynb b/examples/demo_pydantic_model.ipynb
@@ -1,5 +1,58 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7ec31923-0ac8-4455-b78d-2b6465c93af6",
+   "metadata": {},
+   "source": [
+    "# Using LlamaExtract with Pydantic Models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d159ec4f-7e83-46a9-a8fc-7c69b16b82fb",
+   "metadata": {},
+   "source": [
+    "In this notebook, we show how to define a data schema with `Pydantic` models and extract structured data with `LlamaExtract`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cd78f3f-4d59-4205-ac02-9755af1c2842",
+   "metadata": {},
+   "source": [
+    "### Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e763c385-0daa-43fa-a95f-7c43fda6df1b",
+   "metadata": {},
+   "source": [
+    "Install the `llama-extract` client library."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "28716847-6f47-4b6f-bfd1-17658e218adc",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.1.2\u001b[0m\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
+      "Note: you may need to restart the kernel to use updated packages.\n"
+     ]
+    }
+   ],
+   "source": [
+    "%pip install llama-extract > /dev/null"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -9,7 +62,7 @@
    "source": [
     "import os\n",
     "\n",
-    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-QlZUAXGgDpfavBR40UJp6tvfH9h0fEsvTVk0oR9JzNi5bU9c\""
+    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\""
    ]
   },
   {
@@ -20,6 +73,14 @@
     "### Load data"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "d07e56a1-64c6-4443-bfca-b3799551962e",
+   "metadata": {},
+   "source": [
+    "For this demo, we use 3 sample resumes from the [Resume Dataset](https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset) on Kaggle (the data is included in this repo)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -64,6 +125,14 @@
     "### Define a Pydantic Model"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "10cece12-9199-4a8c-8ea1-45a98abfd730",
+   "metadata": {},
+   "source": [
+    "First, let's define our data model with Pydantic."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -101,6 +170,14 @@
     "### Create schema"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "d279927b-5446-4323-ac9d-b9456abceb0e",
+   "metadata": {},
+   "source": [
+    "Let's use the `Pydantic` model to define an extraction schema in `LlamaExtract`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
@@ -123,6 +200,43 @@
     "schema_response = await extractor.acreate_schema('Resume Schema', data_schema=Resume)"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "d8e38724-22db-4ae6-9e26-a024b963e14a",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'type': 'object',\n",
+       " '$defs': {'Education': {'type': 'object',\n",
+       "   'title': 'Education',\n",
+       "   'required': ['degree',\n",
+       "    'honors',\n",
+       "    'institution',\n",
+       "    'field_of_study',\n",
+       "    'graudation_year'],\n",
+       "   'properties': {'degree': {'type': 'string', 'title': 'Degree'},\n",
+       "    'honors': {'type': 'string', 'title': 'Honors'},\n",
+       "    'institution': {'type': 'string', 'title': 'Institution'},\n",
+       "    'field_of_study': {'type': 'string', 'title': 'Field Of Study'},\n",
+       "    'graudation_year': {'type': 'string', 'title': 'Graudation Year'}}}},\n",
+       " 'title': 'Resume',\n",
+       " 'required': ['education', 'summary'],\n",
+       " 'properties': {'summary': {'type': 'string', 'title': 'Summary'},\n",
+       "  'education': {'$ref': '#/$defs/Education'}}}"
+      ]
+     },
+     "execution_count": 39,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "schema_response.data_schema"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "f27d35ff-d17b-49ca-925c-d49087e1b21b",
@@ -131,6 +245,16 @@
     "### Run extraction"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "3802cc18-83b2-42bc-af46-c068945c2169",
+   "metadata": {},
+   "source": [
+    "Now that we have the schema, we can extract a structured representation of our resume files.\n",
+    "\n",
+    "By specifying `Resume` as the response model, we can directly get validated extraction results."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 10,
@@ -178,13 +302,39 @@
     "    print('Institution:\\t', model.education.institution)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "75eb50f7-484d-4a99-90fa-a0ee2415ad30",
+   "metadata": {},
+   "source": [
+    "You can also work directly with the raw JSON output."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "f1672b11-fa39-42cf-bc47-82e132c21587",
+   "execution_count": 41,
+   "id": "bcf0cf95-29a7-4fc6-945f-3d54c44bba8f",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'summary': 'Degreed accountant with more than 10 years of diversified accounting experience seeking accounting position at a well-established company in Houston',\n",
+       " 'education': {'degree': \"Bachelor's degree\",\n",
+       "  'honors': 'Cum Laude - Graduating With Honors',\n",
+       "  'institution': 'University of Houston',\n",
+       "  'field_of_study': 'accounting',\n",
+       "  'graudation_year': '2005'}}"
+      ]
+     },
+     "execution_count": 41,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "responses[0].data"
+   ]
   }
  ],
  "metadata": {