Commit

wip
Disiok committed Jul 25, 2024
1 parent a7423fb commit 1a4ce17
Showing 1 changed file with 155 additions and 5 deletions.
160 changes: 155 additions & 5 deletions examples/demo_pydantic_model.ipynb
@@ -1,5 +1,58 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7ec31923-0ac8-4455-b78d-2b6465c93af6",
"metadata": {},
"source": [
"# Using LlamaExtract with Pydantic Models"
]
},
{
"cell_type": "markdown",
"id": "d159ec4f-7e83-46a9-a8fc-7c69b16b82fb",
"metadata": {},
"source": [
"In this notebook, we show how to define a data schema with `Pydantic` models and extract structured data with `LlamaExtract`."
]
},
{
"cell_type": "markdown",
"id": "5cd78f3f-4d59-4205-ac02-9755af1c2842",
"metadata": {},
"source": [
"### Setup"
]
},
{
"cell_type": "markdown",
"id": "e763c385-0daa-43fa-a95f-7c43fda6df1b",
"metadata": {},
"source": [
"Install the `llama-extract` client library."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "28716847-6f47-4b6f-bfd1-17658e218adc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.1.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install llama-extract > /dev/null"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -9,7 +62,7 @@
"source": [
"import os\n",
"\n",
"os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-QlZUAXGgDpfavBR40UJp6tvfH9h0fEsvTVk0oR9JzNi5bU9c\""
"os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\""
]
},
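Assigning the key inside a notebook cell makes it easy to commit a real key by accident. A minimal sketch of the alternative, assuming the client reads `LLAMA_CLOUD_API_KEY` from the environment (as the `os.environ` assignment above implies): export the variable in your shell before starting Jupyter and have the notebook fail fast if it is missing. `require_api_key` is a hypothetical helper, not part of `llama-extract`.

```python
import os


def require_api_key(var_name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing.

    Hypothetical helper. Assumption: the client library picks the key up from
    this environment variable, as the notebook's os.environ assignment suggests.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell "
            "instead of pasting the key into the notebook"
        )
    return key
```

With the key exported (for example `export LLAMA_CLOUD_API_KEY=...` before launching Jupyter), the notebook only needs to call `require_api_key()` and never contains the secret itself.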
{
@@ -20,6 +73,14 @@
"### Load data"
]
},
{
"cell_type": "markdown",
"id": "d07e56a1-64c6-4443-bfca-b3799551962e",
"metadata": {},
"source": [
"For this demo, we use 3 sample resumes from the [Resume Dataset](https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset) on Kaggle (the data is included in this repo)."
]
},
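The loading code itself is outside the lines changed by this commit. As a standard-library sketch of gathering the sample files, assuming they sit together in one directory and are PDFs (both are assumptions; the directory name used below is hypothetical):

```python
from pathlib import Path


def list_resume_files(data_dir: str, suffixes=(".pdf",)) -> list:
    """Return sorted paths of files directly under data_dir with a matching suffix.

    Hypothetical helper: data_dir and the default ".pdf" suffix are assumptions;
    point them at wherever the sample resumes live in your checkout.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"no such directory: {root}")
    wanted = {s.lower() for s in suffixes}  # compare suffixes case-insensitively
    return sorted(
        str(p) for p in root.iterdir() if p.is_file() and p.suffix.lower() in wanted
    )
```

For example, `list_resume_files("data/resumes")` would list the sample files if they live in a (hypothetical) `data/resumes` folder; sorting keeps the order stable, so `responses[0]` always refers to the same resume.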
{
"cell_type": "code",
"execution_count": 2,
@@ -64,6 +125,14 @@
"### Define a Pydantic Model"
]
},
{
"cell_type": "markdown",
"id": "10cece12-9199-4a8c-8ea1-45a98abfd730",
"metadata": {},
"source": [
"First, let's define our data model with Pydantic."
]
},
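The cell that defines the model is not among the lines changed by this commit. Working backwards from the `data_schema` printed further down, one plausible reconstruction, assuming Pydantic v2 and plain `str` fields (the schema lists every field as `'type': 'string'`), is:

```python
import json

from pydantic import BaseModel


class Education(BaseModel):
    degree: str
    honors: str
    institution: str
    field_of_study: str
    # Spelling kept to match the field name in the printed schema;
    # renaming it would also rename the key in the extraction output.
    graudation_year: str


class Resume(BaseModel):
    summary: str
    education: Education  # nested model, emitted under "$defs" with a "$ref"


# Pydantic v2 generates the JSON Schema that a schema-based extractor can consume.
schema = Resume.model_json_schema()
print(json.dumps(schema, indent=2))
```

Nesting `Education` as its own model is what produces the `$defs`/`$ref` structure visible in the schema output, rather than a flat list of fields.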
{
"cell_type": "code",
"execution_count": 4,
@@ -101,6 +170,14 @@
"### Create schema"
]
},
{
"cell_type": "markdown",
"id": "d279927b-5446-4323-ac9d-b9456abceb0e",
"metadata": {},
"source": [
"Next, let's use the `Resume` Pydantic model to define an extraction schema in `LlamaExtract`."
]
},
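The schema cell awaits `extractor.acreate_schema(...)` at the top level. That works in Jupyter because the kernel already runs an event loop, but a plain Python script has no running loop and must drive the coroutine with `asyncio.run`. A minimal sketch, using a hypothetical stand-in coroutine `fake_acreate_schema` in place of the real client call:

```python
import asyncio


async def fake_acreate_schema(name: str, data_schema: dict) -> dict:
    """Hypothetical stand-in for an async client call such as extractor.acreate_schema."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return {"name": name, "data_schema": data_schema}


async def main() -> dict:
    # Inside a coroutine, `await` works just as it does at the notebook's top level.
    return await fake_acreate_schema("Resume Schema", data_schema={"type": "object"})


if __name__ == "__main__":
    # Scripts need an explicit event loop; asyncio.run creates and closes one.
    print(asyncio.run(main()))
```

The reverse also holds: calling `asyncio.run` inside a notebook cell raises `RuntimeError` because a loop is already running, which is why the notebook uses a bare `await`.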
{
"cell_type": "code",
"execution_count": 6,
@@ -123,6 +200,43 @@
"schema_response = await extractor.acreate_schema('Resume Schema', data_schema=Resume)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "d8e38724-22db-4ae6-9e26-a024b963e14a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'type': 'object',\n",
" '$defs': {'Education': {'type': 'object',\n",
" 'title': 'Education',\n",
" 'required': ['degree',\n",
" 'honors',\n",
" 'institution',\n",
" 'field_of_study',\n",
" 'graudation_year'],\n",
" 'properties': {'degree': {'type': 'string', 'title': 'Degree'},\n",
" 'honors': {'type': 'string', 'title': 'Honors'},\n",
" 'institution': {'type': 'string', 'title': 'Institution'},\n",
" 'field_of_study': {'type': 'string', 'title': 'Field Of Study'},\n",
" 'graudation_year': {'type': 'string', 'title': 'Graudation Year'}}}},\n",
" 'title': 'Resume',\n",
" 'required': ['education', 'summary'],\n",
" 'properties': {'summary': {'type': 'string', 'title': 'Summary'},\n",
" 'education': {'$ref': '#/$defs/Education'}}}"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema_response.data_schema"
]
},
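The returned `data_schema` nests `Education` under `$defs` and points to it with a local `$ref`. A small standard-library sketch that follows those references and lists every leaf field as a dotted path, which makes it easy to check that the schema covers the fields you expect. `resolve_ref` and `field_paths` are hypothetical helpers; they only handle local `#/...` references, ignore JSON Pointer `~0`/`~1` escaping, and assume the schema has no recursive references.

```python
from typing import Optional


def resolve_ref(schema: dict, ref: str) -> dict:
    """Resolve a local JSON Pointer reference such as '#/$defs/Education'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported: {ref}")
    node = schema
    for part in ref[2:].split("/"):  # note: ~0 / ~1 escapes are not decoded
        node = node[part]
    return node


def field_paths(schema: dict, node: Optional[dict] = None, prefix: str = "") -> list:
    """List dotted paths to every leaf property, following local $ref links."""
    node = schema if node is None else node
    if "$ref" in node:
        node = resolve_ref(schema, node["$ref"])
    props = node.get("properties")
    if not props:
        # A node without properties is a leaf (e.g. a plain string field).
        return [prefix] if prefix else []
    paths = []
    for name, sub in props.items():
        paths.extend(field_paths(schema, sub, f"{prefix}.{name}" if prefix else name))
    return paths
```

Calling `field_paths(schema_response.data_schema)` on the schema above should yield paths such as `summary` and `education.institution`.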
{
"cell_type": "markdown",
"id": "f27d35ff-d17b-49ca-925c-d49087e1b21b",
@@ -131,6 +245,16 @@
"### Run extraction"
]
},
{
"cell_type": "markdown",
"id": "3802cc18-83b2-42bc-af46-c068945c2169",
"metadata": {},
"source": [
"Now that we have the schema, we can extract a structured representation of each resume file.\n",
"\n",
"By specifying `Resume` as the response model, we get extraction results that are already validated against the model."
]
},
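To see what validation against a response model means in practice, here is a minimal Pydantic v2 sketch that validates the raw dict shown in the output further down, then a deliberately incomplete one. It illustrates Pydantic's own behavior only; how `LlamaExtract` applies `Resume` internally is not shown in this commit, and the model classes are reconstructed from the printed schema (an assumption, since the defining cell is not in this diff).

```python
from pydantic import BaseModel, ValidationError


class Education(BaseModel):
    degree: str
    honors: str
    institution: str
    field_of_study: str
    graudation_year: str  # matches the (misspelled) key in the extraction output


class Resume(BaseModel):
    summary: str
    education: Education


raw = {
    "summary": "Degreed accountant with more than 10 years of diversified accounting "
    "experience seeking accounting position at a well-established company in Houston",
    "education": {
        "degree": "Bachelor's degree",
        "honors": "Cum Laude - Graduating With Honors",
        "institution": "University of Houston",
        "field_of_study": "accounting",
        "graudation_year": "2005",
    },
}

resume = Resume.model_validate(raw)  # the nested dict becomes an Education instance
print(resume.education.institution)

# A result missing required fields is rejected instead of silently accepted.
broken = {"summary": raw["summary"], "education": {"degree": "Bachelor's degree"}}
errors = []
try:
    Resume.model_validate(broken)
except ValidationError as err:
    errors = err.errors()  # each entry's "loc" is the path to the offending field
    print(err.error_count(), "validation errors")
```

This is also why the loop over `responses` can use attribute access such as `model.education.institution`: validated results are model instances, not raw dicts.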
{
"cell_type": "code",
"execution_count": 10,
@@ -178,13 +302,39 @@
" print('Institution:\\t', model.education.institution)"
]
},
{
"cell_type": "markdown",
"id": "75eb50f7-484d-4a99-90fa-a0ee2415ad30",
"metadata": {},
"source": [
"You can also work directly with the raw JSON output."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1672b11-fa39-42cf-bc47-82e132c21587",
"execution_count": 41,
"id": "bcf0cf95-29a7-4fc6-945f-3d54c44bba8f",
"metadata": {},
"outputs": [],
"source": []
"outputs": [
{
"data": {
"text/plain": [
"{'summary': 'Degreed accountant with more than 10 years of diversified accounting experience seeking accounting position at a well-established company in Houston',\n",
" 'education': {'degree': \"Bachelor's degree\",\n",
" 'honors': 'Cum Laude - Graduating With Honors',\n",
" 'institution': 'University of Houston',\n",
" 'field_of_study': 'accounting',\n",
" 'graudation_year': '2005'}}"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"responses[0].data"
]
}
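Since `responses[0].data` is a plain nested dict, the standard library is enough to post-process it. A minimal sketch that flattens each result into dotted column names and renders one CSV row per resume; `flatten` and `to_csv` are hypothetical helpers, not part of `llama-extract`:

```python
import csv
import io


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level with dotted keys, e.g. 'education.degree'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_csv(records: list) -> str:
    """Render flattened records as CSV text, with a header built from all keys seen."""
    rows = [flatten(r) for r in records]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:  # keep first-seen column order
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Assuming `responses` is the list returned by the extraction cell, `print(to_csv([r.data for r in responses]))` would print a header such as `summary,education.degree,...` followed by one row per resume.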
],
"metadata": {
