Reconcile English/British/foreign citations #42

Open
1 of 3 tasks
kfunk074 opened this issue Nov 12, 2021 · 12 comments

@kfunk074
Collaborator

kfunk074 commented Nov 12, 2021

CAP has only U.S. cases and does not detect citations of English (or any foreign) reporters, nor would it help much if it did, as the case text and metadata will not be in CAP.

  • Prepare an Excel file listing foreign reporters and their common abbreviations
  • Construct specific regex detectors for foreign reporters
  • Explore open source corpora of English reports spanning the relevant time period (1800-1920)
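
A minimal sketch of the second task, assuming the reporter list has already been exported from the spreadsheet as plain strings. The abbreviations in `REPORTERS` and the function name `build_reporter_regex` are illustrative only and are not taken from the project's detector.

```python
import re

# Illustrative sample only; the real list would come from the spreadsheet
# of foreign reporters and their abbreviations.
REPORTERS = ["Ves.", "Ves. Jr.", "Atk.", "P. Wms.", "Bro. C. C."]


def build_reporter_regex(abbreviations):
    """Compile a pattern for `<volume> <reporter> <page>` limited to known reporters."""
    # Try longer abbreviations first so "Ves. Jr." is preferred over "Ves.".
    ordered = sorted(abbreviations, key=len, reverse=True)
    # Escape each token and allow any run of whitespace between tokens.
    alternatives = [
        r"\s+".join(re.escape(token) for token in abbr.split())
        for abbr in ordered
    ]
    return re.compile(
        r"\b(?P<volume>\d{1,3})\s+"
        r"(?P<reporter>" + "|".join(alternatives) + r")\s+"
        r"(?P<page>\d{1,4})\b"
    )


def find_citations(text, pattern):
    """Return (volume, reporter, page) tuples for every match in the text."""
    return [
        (m.group("volume"), m.group("reporter"), m.group("page"))
        for m in pattern.finditer(text)
    ]
```

Sorting by length before joining matters because regex alternation takes the first alternative that lets the whole pattern match, not necessarily the longest one.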
kfunk074 self-assigned this Nov 12, 2021
@kfunk074
Collaborator Author

Grajzl and Murrell's helpful guide to the English Reports, and how they constructed their database of pre-1765 case reports: http://www.econweb.umd.edu/~murrell/articles/AppendicesMachineCaselawJOIE.pdf

"The source of our data and the starting point for our corpus construction and processing was
a digitized database of English Reports, obtained from Juta and Company (Pty) Ltd (English
Reports (1260-1865), n.d.). The resultant database consists of 129,042 nominate reports of
decisions rendered in the English courts of law between the early 13th century and the mid-19th
century."

@kfunk074
Collaborator Author

English Case Reports.xlsx

Here's a start on a database of UK law reports, adapted from the second English edition (1892) of Joseph Story's Commentaries on Equity. It's probably incomplete, but hopefully not very.

@kfunk074
Collaborator Author

So with perfect OCR, we can at least use this dataset to match a citation to a UK reporter. To this point I haven't attempted to correct common OCR errors on English reports. To do that, it would be helpful to have the output of our general regex run on the Story volume I mention above.

  • Produce a general regex citations output for Story's English Equity, Gale ID: F0105632267

@kfunk074
Collaborator Author

86 Eng Rep 2

Image of a typical page in the English Reports. The plain text is not expensive to acquire. This page makes clear there are two complications posed by the English Reports that we won't usually encounter with American reports: 1) multiple cases can be reported on a single page, meaning citation "addresses" are not unique; 2) many private reporters had such limited runs that they produced only one volume, so there is no volume signifier in the standard citation form. Neither of these derails the main project. We will either miss citations to the obscure private reporters or write specific regexes to find them.
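
Both complications can be sketched in a few lines. This is only an illustration under stated assumptions: the abbreviations in `SINGLE_VOLUME`, the function names, and the sample rows are hypothetical and do not come from our data.

```python
import re
from collections import defaultdict

# Illustrative abbreviations, assumed here to be cited without a volume number.
SINGLE_VOLUME = ["Hob.", "Yelv."]

_reporters = "|".join(re.escape(r) for r in SINGLE_VOLUME)
# The volume group is optional, so "Hob. 12" matches as well as "1 Hob. 12".
CITE = re.compile(
    r"(?<!\w)(?:(?P<volume>\d{1,3})\s+)?"
    r"(?P<reporter>" + _reporters + r")\s+"
    r"(?P<page>\d{1,4})\b"
)


def find_cites(text):
    """Return (volume or None, reporter, page) for each citation found."""
    cites = []
    for m in CITE.finditer(text):
        volume = int(m.group("volume")) if m.group("volume") else None
        cites.append((volume, m.group("reporter"), int(m.group("page"))))
    return cites


def index_by_address(rows):
    """Group case names by (volume, page); one address may hold several cases."""
    index = defaultdict(list)
    for volume, page, case_name in rows:
        index[(volume, page)].append(case_name)
    return index
```

Because an address can map to more than one case, any lookup by volume and page should return a list of candidates, not a single record.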

So far as I can tell, there is no CAP equivalent for UK case reports. There are things we could do to create more meaningful connections in the data, but these should all be considered back burner to the main project.

  • We could "section" the cases into separate texts as we did with the Field Codes. Each text could retain its "address" in the English Reports and we could try to extract the OCR of the private reporter citations with which each report begins.
  • The English Report volumes are divided up by jurisdiction (King's Bench, Chancery, Exchequer, etc.) and then run chronologically. An RA could prepare a database of court personnel and corresponding dates. We could then track decisional law by court and jurist as we can with CAP.
  • Grajzl and Murrell are trying to topic model this corpus to death. I'll get in touch to see what if anything they've done to think about citations.

@lmullen
Owner

lmullen commented Apr 22, 2022

@kfunk074 Two questions about the status of this one.

  1. Any more (much more?) to be done to create as complete a list of English reporters as reasonable?

  2. Any reason to think these won't be picked up by our general Go cite detector? In other words, the problem isn't detection but analysis?

@kfunk074
Collaborator Author

I don’t know what I don’t know. I think it’s a pretty extensive list, and I don’t know where to look to find more, though there may well be more out there. Many are single-volume, but that’s the only hang-up in finding them with a general regex search.

@kfunk074
Collaborator Author

For future reference, this database might be helpful as a UK CAP alternative. Have yet to suss out how comprehensive it is: https://swarb.co.uk/its-what-we-do/

lmullen changed the title from "Track English/British/Foreign Citations" to "Reconcile English/British/foreign citations" on Aug 26, 2022
@lmullen
Owner

lmullen commented Aug 26, 2022

We have essentially detected the British citations, unless there are some reporters that fall outside the `1 Reporter 123` pattern. What we need is a process to reconcile them to useful information parallel to CAP.

@kfunk074
Collaborator Author

Not sure how I missed this before. A complete database of the English Reports appears to be here: http://www.commonlii.org/uk/cases/EngR/

It appears there are hand-keyed parallel citations that could link to our detected cases and allow us to extract at least the dates of the decisions. I'll see if they can share their datafiles.

@kfunk074
Collaborator Author

Behold, the English Reports. Turns out each case has one and only one parallel cite, so no extra table needed for that. The second table here matches up volume number to court jurisdiction. We have the full text too, just not in table form yet. Low priority to get full text I would think.

Edit: File too big. Download the csv here.

english_reports_courts_by_volume.csv

@kfunk074
Collaborator Author

A few pointers, as I review Phil's data:

  1. The reporter_standard entries in the whitelist now match exactly the reporter abbreviations used in the English Reports. A "raw" MOML citation should match exactly the official or nominate citation from the English Reports.
  2. The English Reports give one and only one nominate citation for each official citation. I don't know if that's historically accurate but I have no evidence to doubt it either, so for now we can just embrace the simplicity.
  3. The English Reports are comprehensive through 1866, sporadic until 1877. An entirely different set of reports, the Law Times Weekly, became the official reporter in the 1870s. I'm working with law librarians to see if a structured database of the Law Times is available, but just to be clear: the English Reports cover UK cases from 1200 to about 1870. They will only account for a fraction (half? a third?) of all cites our whitelist labels "UK." But they're comprehensive, influential, and the metadata is useful, so well worth plugging in now while we wait to see if anything comes of the Law Times.
  4. The good and bad news is that the metadata is far less extensive than CAP's, and the corpus far smaller. Hopefully that helps with linking. There are three tables in the drive folder linked above: The data on each case in the reports, a table of jurisdictions by volume (the printed English Reports are organized chronologically by jurisdiction), and a table of full text reports keyed to each case id (being ironed out by Phil as of 7/23 but nearly complete). We don't need to import the full text if we don't want to burden the server with a bunch of data we're not going to use for the foreseeable future.
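
A rough sketch of how the linking could work, given the one-to-one pairing of official and nominate citations. The column names (`case_id`, `official_cite`, `nominate_cite`, `year`, `volume`, `court`) and every row below are invented placeholders, not the real schema or data, and the function names are hypothetical.

```python
import csv
import io

# Invented placeholder rows; the column names are assumptions, not the real schema.
CASES_CSV = """case_id,official_cite,nominate_cite,year
c1,1 Eng. Rep. 10,1 Xyz. 5,1700
c2,2 Eng. Rep. 20,3 Abc. 7,1750
"""
COURTS_CSV = """volume,court
1,Placeholder Court A
2,Placeholder Court B
"""


def normalize(cite):
    """Collapse whitespace and case so trivially different strings compare equal."""
    return " ".join(cite.split()).lower()


def build_index(cases_csv, courts_csv):
    """Map every official and nominate citation to its case record plus court."""
    courts = {
        row["volume"]: row["court"]
        for row in csv.DictReader(io.StringIO(courts_csv))
    }
    index = {}
    for row in csv.DictReader(io.StringIO(cases_csv)):
        # The leading number of the official citation is the English Reports volume.
        volume = row["official_cite"].split()[0]
        record = dict(row, court=courts.get(volume))
        # One nominate cite per official cite, so both keys point at one record.
        index[normalize(row["official_cite"])] = record
        index[normalize(row["nominate_cite"])] = record
    return index


def reconcile(raw_cite, index):
    """Return the matching case record, or None if the citation is not in the index."""
    return index.get(normalize(raw_cite))
```

With the placeholder rows, `reconcile("1 Xyz. 5", build_index(CASES_CSV, COURTS_CSV))` returns the record for `c1`, including its year and court. Citations outside the English Reports simply return `None`, which fits the expectation that only a fraction of "UK" cites will link.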

@kfunk074
Collaborator Author

The complete, clean, final, and godly English Reports are here: https://drive.google.com/drive/folders/1QpwUQHIxzAJdeUG15CdNPioT5HBilyKY?usp=sharing

The csv file contains everything described above as well as the clean years, titles, and wordcounts from Peter Murrell's data. This is ready to integrate when you're ready to tackle the integration.
