As a dev, I want to explore if annotating Pydantic models can improve GPT performance in our pipeline #49

Open
k-allagbe opened this issue Oct 9, 2024 · 0 comments · May be fixed by #51


Description

Context
Currently, we pass our Pydantic model's JSON schema to GPT, like this:

json_schema = FertilizerInspection.model_json_schema()
signature = dspy.ChainOfThought(ProduceLabelForm)
prediction = signature(text=text, json_schema=json_schema, requirements=REQUIREMENTS)

For reference, here is what model_json_schema() produces for a small example model:

from pydantic import BaseModel

class Address(BaseModel):
    street: str

class User(BaseModel):
    id: int
    email: str | None = None
    address: Address

print(User.model_json_schema())

Results:

{
  '$defs': {
    'Address': {
      'properties': {'street': {'title': 'Street', 'type': 'string'}},
      'required': ['street'],
      'title': 'Address',
      'type': 'object'
    }
  },
  'properties': {
    'id': {'title': 'Id', 'type': 'integer'},
    'email': {
      'anyOf': [{'type': 'string'}, {'type': 'null'}],
      'default': None,
      'title': 'Email'
    },
    'address': {'$ref': '#/$defs/Address'}
  },
  'required': ['id', 'address'],
  'title': 'User',
  'type': 'object'
}

Problem Statement
I want to investigate whether annotating the Pydantic model with additional metadata (like descriptions and examples) could improve GPT's performance and the accuracy of predictions in our pipeline.

For instance, here's how we can annotate the same model:

from pydantic import BaseModel, Field

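# Note: passing extra keyword arguments such as example=... directly to Field is
# deprecated in Pydantic v2; json_schema_extra emits the same 'example' key.
class Address(BaseModel):
    street: str = Field(..., description="Street address of the user", json_schema_extra={"example": "123 Main St"})

class User(BaseModel):
    id: int = Field(..., description="User's unique identifier", json_schema_extra={"example": 1})
    email: str | None = Field(None, description="User's email address, optional", json_schema_extra={"example": "email@somewhere"})
    address: Address = Field(..., description="Address details of the user")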

print(User.model_json_schema())

Results with annotations:

{
  '$defs': {
    'Address': {
      'properties': {
        'street': {
          'description': 'Street address of the user',
          'example': '123 Main St',
          'title': 'Street',
          'type': 'string'
        }
      },
      'required': ['street'],
      'title': 'Address',
      'type': 'object'
    }
  },
  'properties': {
    'id': {
      'description': "User's unique identifier",
      'example': 1,
      'title': 'Id',
      'type': 'integer'
    },
    'email': {
      'anyOf': [{'type': 'string'}, {'type': 'null'}],
      'default': None,
      'description': "User's email address, optional",
      'example': 'email@somewhere',
      'title': 'Email'
    },
    'address': {
      'allOf': [{'$ref': '#/$defs/Address'}],
      'description': 'Address details of the user'
    }
  },
  'required': ['id', 'address'],
  'title': 'User',
  'type': 'object'
}
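
One side effect worth tracking alongside accuracy: the annotated schema is longer, so the prompt we send grows too. Below is a minimal sketch for estimating that growth. The helper names `schema_chars` and `annotation_overhead` are hypothetical (not part of our pipeline), and the character count is only a rough proxy for tokens; a real measurement would need the model's tokenizer.

```python
import json


def schema_chars(schema: dict) -> int:
    """Length of the compact JSON serialization, a rough proxy for prompt size."""
    return len(json.dumps(schema, separators=(",", ":"), sort_keys=True))


def annotation_overhead(plain: dict, annotated: dict) -> float:
    """Relative growth of the annotated schema over the plain one (0.5 means +50%)."""
    base = schema_chars(plain)
    return (schema_chars(annotated) - base) / base


# The 'street' property from the two outputs above.
plain_street = {"title": "Street", "type": "string"}
annotated_street = {
    "description": "Street address of the user",
    "example": "123 Main St",
    "title": "Street",
    "type": "string",
}

print(f"overhead: {annotation_overhead(plain_street, annotated_street):.0%}")
```

In practice the two arguments would be the full dicts returned by model_json_schema() for the plain and annotated models.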

Acceptance Criteria

  • Research the effectiveness of annotated Pydantic models when passed to GPT for predictions.
  • Measure any performance improvements (e.g., more accurate responses, better contextual understanding).
  • If improvements are found, quantify them and document the changes.
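
One possible way to quantify "more accurate responses" is exact-match accuracy per leaf field against a hand-labelled expected output. This is only a sketch under that assumption: `flatten` and `field_accuracy` are hypothetical helpers, exact match may be too strict for free-text fields, and the sample dicts are invented to exercise the function, not measurements.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": leaf} pairs.

    Empty dicts and lists contribute no entries.
    """
    if isinstance(obj, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def field_accuracy(expected: dict, predicted: dict) -> float:
    """Fraction of expected leaf fields that the prediction reproduces exactly."""
    want = flatten(expected)
    got = flatten(predicted)
    if not want:
        return 1.0
    # A missing path counts as a miss, even when the expected value is None.
    hits = sum(1 for path, value in want.items() if path in got and got[path] == value)
    return hits / len(want)


# Invented values, only to show the call shape.
expected = {"id": 1, "email": None, "address": {"street": "123 Main St"}}
prediction_a = {"id": 1, "email": "", "address": {"street": "123 Main Street"}}

print(field_accuracy(expected, prediction_a))
```

Running the same labelled inputs through the pipeline once with the plain schema and once with the annotated schema, then comparing the two averages, would give the "by how much" number.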

Additional Information

  • Consider testing with different levels of model complexity and annotation depth.
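
To compare annotation depths without maintaining several copies of the model, one option is to annotate once and derive the shallower variants by stripping keywords from the generated schema. A minimal sketch, assuming the annotations live under the `description`, `example` and `examples` keys; `strip_annotations` and the `DEPTHS` names are hypothetical.

```python
from typing import Any

# Keywords whose values map names (field or definition names) to subschemas.
NAME_MAPS = frozenset({"properties", "$defs", "definitions"})
# Keywords whose values are literal data and are carried over untouched.
LITERALS = frozenset({"default", "const", "enum", "example", "examples"})

# Which keywords to drop for each annotation depth.
DEPTHS = {
    "none": frozenset({"description", "example", "examples"}),
    "descriptions_only": frozenset({"example", "examples"}),
    "full": frozenset(),
}


def strip_annotations(schema: Any, drop: frozenset[str]) -> Any:
    """Return a new schema without the keywords in `drop` (input is not modified)."""
    if isinstance(schema, list):
        return [strip_annotations(item, drop) for item in schema]
    if not isinstance(schema, dict):
        return schema
    out: dict[str, Any] = {}
    for key, value in schema.items():
        if key in drop:
            continue
        if key in LITERALS:
            out[key] = value
        elif key in NAME_MAPS and isinstance(value, dict):
            # Keys at this level are names, so a field called "description" survives.
            out[key] = {name: strip_annotations(sub, drop) for name, sub in value.items()}
        else:
            out[key] = strip_annotations(value, drop)
    return out


# The annotated Address definition from the output above.
annotated_address = {
    "properties": {
        "street": {
            "description": "Street address of the user",
            "example": "123 Main St",
            "title": "Street",
            "type": "string",
        }
    },
    "required": ["street"],
    "title": "Address",
    "type": "object",
}

for name, drop in DEPTHS.items():
    print(name, strip_annotations(annotated_address, drop)["properties"]["street"])
```

Each variant could then be passed as json_schema in the existing call, keeping everything else fixed.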