How to parse copyright year #31

rabelux · 2021-09-19T13:48:05Z

I'm getting an error when parsing the copyright line of this book:
©Knaus Verlag (P)2002 Mango Studios Köln

The error says AttributeError: 'NoneType' object has no attribute 'group' in line 674 executing helper.date = re.match(".?(\d{4}).*", cstring).group(1)
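A minimal reproduction of the failure, assuming the original logic first keeps only the text before (P) and then applies the regex from line 674 (the variable names `before_p` and `result` are only for illustration):

```python
import re

cstring = u"\u00a9Knaus Verlag (P)2002 Mango Studios K\u00f6ln"

# Step 1 of the original logic: keep only the part before "(P)".
before_p = re.match(r"(.*)\(P\).*", cstring).group(1)

# Step 2: re.match anchors at the start of the string, and ".?" allows
# at most one character before the four digits. The remaining text
# contains no digits at all, so the match fails and returns None.
result = re.match(r".?(\d{4}).*", before_p)

# Calling result.group(1) at this point raises the AttributeError.
print(repr(before_p), result)
```

The year 2002 sits after (P), so it is already cut away before the year regex runs.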

I had a look at the code and wanted to write a fix but don't understand the cases you're trying to catch.
Maybe we could collect different examples and expected output?

As far as I understand you're stripping the string down to the part before (P) and extract the date from that part only.
What speaks against matching the first 4-digit sequence in the whole copyright string?

djdembeck · 2021-09-20T16:26:24Z

This code came from unending's fork: Unending/Audiobooks.bundle@85694cb

I didn't personally test it. I can try and help a bit later. The regex is saying something along the lines of "match 4 digits in a row from the given string". regex101 is a great tool to learn more about regexes. Since copyrights only contain years, all it needs to match is those 4 digits.

rabelux · 2021-09-20T16:45:04Z

My code starting in line 658 currently looks like this:

        if cstring:
            if "Public Domain" in cstring:
                helper.date = re.match(".*\(P\)(\d{4})", cstring).group(1)
            else:
                if cstring.startswith(u'\xA9'):
                    cstring = cstring[1:]
                helper.date = re.search(r'\d{4}', cstring).group()
                #if "(P)" in cstring:
                #    cstring = re.match("(.*)\(P\).*", cstring).group(1)
                #if ";" in cstring:
                #    helper.date = str(
                #        min(
                #            [int(i) for i in cstring.split() if i.isdigit()]
                #        )
                #    )
                #else:
                #    helper.date = re.match(".?(\d{4}).*", cstring).group(1)

It matches the first 4 digits it finds after the (c).
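A None-safe variant of the same idea might look like the sketch below. The helper name `first_year` is hypothetical and not part of the plugin; it returns None instead of raising when no year is present.

```python
import re


def first_year(cstring):
    """Return the first run of four digits in cstring, or None."""
    # re.search scans the whole string, unlike re.match, which only
    # tries to match at the very beginning.
    match = re.search(r"\d{4}", cstring or "")
    return match.group() if match else None


print(first_year(u"\u00a9Knaus Verlag (P)2002 Mango Studios K\u00f6ln"))
```

Note that `\d{4}` also matches the first four digits of a longer number, so this assumes the copyright line contains no other long digit runs before the year.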
But I see what Unending did there: he tried to prioritize some years over others, whereas I don't see any reason to do that at this point.
I think the (P) stands for sound recording copyright and should be equivalent to (c).

I'm just guessing here so everybody is invited to enlighten me.

seanap · 2021-09-21T17:40:02Z

Audible isn't very consistent, but the most common usage I've noticed is that (C) is the original copyright year of the work, and (P) is the copyright year of the specific publication. See here for reference: https://www.audible.com/pd/East-of-Eden-Audiobook/B00546SXO0

I personally prioritize (C) year, as I think that sorting by year, or filtering by decade works better when the original copyright year is used, but the (P) is also important and should be equivalent to the release date. Both dates need to be used, but I don't know of any player that takes advantage of them. Ideally the id3 tags should be:
ORIGYEAR = (C) year
YEAR = (P) year
RELEASETIME = (P) date
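As a rough sketch of that mapping, the two years could be pulled out separately. The function name `split_years`, the two patterns, and the sample string are assumptions for illustration; RELEASETIME is left out because the copyright line carries only years, not a full date.

```python
import re

# Assumed markers: "©" or "(C)" for the work, "(P)" for the recording.
C_YEAR = re.compile(u"(?:\u00a9|\\(C\\))\\s*(\\d{4})", re.IGNORECASE)
P_YEAR = re.compile(r"\(P\)\s*(\d{4})", re.IGNORECASE)


def split_years(cstring):
    """Map the (C) and (P) years to the tag names suggested above."""
    c = C_YEAR.search(cstring)
    p = P_YEAR.search(cstring)
    return {
        "ORIGYEAR": c.group(1) if c else None,
        "YEAR": p.group(1) if p else None,
    }


# Hypothetical copyright line in the usual Audible layout.
print(split_years(u"\u00a91952 Example Author (P)2011 Example Audio"))
```

Both patterns require the year to follow the marker directly, so a line like the Knaus Verlag one yields no (C) year and only a (P) year.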

rabelux · 2021-09-23T14:59:45Z

Regarding the example you posted: Would you prefer to have the year set to 1952, or 1980?

As we only have one year to set in Plex, I'd suggest simplifying the copyright parsing to the following rule:
Take the first year that can be found, unless there is a ; in the string, in which case take the first year after the ;
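The proposed rule could be sketched as follows. The function name `copyright_year` and the sample string with a semicolon are made up for illustration; this is one reading of the rule, not the plugin's actual code.

```python
import re


def copyright_year(cstring):
    """First 4-digit year, or the first one after ';' if a ';' exists."""
    if ";" in cstring:
        # Only consider the text after the first semicolon.
        cstring = cstring.split(";", 1)[1]
    match = re.search(r"\d{4}", cstring)
    return match.group() if match else None


# Hypothetical line with two copyright entries separated by ";".
print(copyright_year(u"\u00a91952 Example Author; 1980 Example Estate (P)2011 Example Audio"))
```

If the part after the semicolon contains no year, this returns None rather than falling back to an earlier year; whether a fallback is wanted is an open design choice.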

seanap · 2021-09-23T15:07:34Z

For (C) it should be the original year, so 1952. The actual Plex tag is "Release Date", so I think (P) 2011 should be the year/date actually imported into Plex.

rabelux · 2021-09-23T15:17:31Z

The part of the code I'm talking about is only called if the preferences are set to "use copyright year instead of date published".
So in that case it should be correct to use the first year found - unless the setting has to be renamed or changed to a dropdown list.
