Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add avro doc field of a record #90

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

sookyomkim
Copy link

add avro doc field of a record using model's docstring

@sookyomkim
Copy link
Author

@timvancann You merged the branch where sharonyogev worked on to support pydantic v2 and force pushed.
Is that a mistake? Or is there some other intent?

If you were thinking of merging together, please review this pr. 🙇

@timvancann
Copy link
Contributor

@sookyomkim I expect I made a booboo somewhere. Removing the pydantic-v2 changes was not intended. These changes are back so feel free to rebase on main. After that we can approve and merge this one :)

@DaanRademaker
Copy link
Collaborator

So I would like to give my 2 cents here. I am a bit hesitant to automatically add description based on docstring. What would happen for different docstring formats? Some people might add types and param description in docstring as well how would these then translate to decriptions in the avro file?

To still support description I would suggest perhaps:

  1. Add a dunder method to the basemodel for a model specific description ( like the docstring)
  2. For the individual Fields the native pydantic description could be used and supported

@sookyomkim
Copy link
Author

sookyomkim commented Aug 9, 2023

Take a look at this pydantic code.
pydantic takes the docstring of the child class or the docstring of the parent class (if the child class has no docstring), calls cleandoc() to remove the white space, and writes it as a description.
This looks no different than the code I wrote. Except for the part about getting the docstring of the parent class.

Shouldn't that be enough? Do we need to worry about the format of the docstring? Isn't it enough to see it as text and just put it in?

Initially, I implemented it by using pydantic json schema's description, but the reason for using a direct docstring is that pydantic description takes the docstring of the parent class if the child class does not have a docstring.
So, if you don't write a docstring in your model, it will take the docstring from AvroBase. This is it.
pydantic v2 doesn't have this problem.

@BeRT2me
Copy link

BeRT2me commented Aug 28, 2023

Perhaps a good balance would be taking only the first section of a docstring, as the convention of what they should look like is fairly well defined: https://peps.python.org/pep-0257/#handling-docstring-indentation

class Complex(AvroBase):
    """Form a complex number
    and do other stuff too.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """

doc = Complex.model_json_schema()['description']
split = doc.find('\n\n')
doc = doc[:split] if split != -1 else doc

doc == 'Form a complex number\nand do other stuff too.'

Comment on lines +189 to +190
if cls.__doc__:
avro_schema_dict["doc"] = cleandoc(cls.__doc__)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if cls.__doc__ already acts as the necessary check for whether or not schema['description'] both exists and wasn't inherited.

So simply:

Suggested change
if cls.__doc__:
avro_schema_dict["doc"] = cleandoc(cls.__doc__)
if cls.__doc__:
avro_schema_dict["doc"] = schema["description"]

Would work as desired~

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, but if we use schema["description"] directly, it will get the docstring of the parent class if the child class's docstring isn't defined.
pydantic uses inspect.getdoc().
So if we don't define a docstring, we'll get AvroBase's docstring.
That's why i used cls.__doc__ directly instead of using schema["description"].

pydantic V2 doesn't have this problem. It only gets the docstring of the child class.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is,cls.__doc__ will be None for both V1 and V2 if not directly defined.
__doc__ is not inherited in either case, and only the description is inherited in V1.
So if __doc__ is defined, there's no reason to redo what pydantic does, and if __doc__ isn't defined the line gets skipped in either case...

@BeRT2me
Copy link

BeRT2me commented Aug 28, 2023

I hadn't seen this PR until after I made mine, and originally closed it.
But, because it adds the reverse Avro --> Pydantic as well, I've taken some ideas from here and re-opened it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants