indexing of dates, etc., in multi part manuscripts #180
This would require indexing and displaying parts as separate records, in the same index as the non-composite manuscripts. So instead of "MS. Ashmole 210" being one hit returned for any given search query, there would be potentially up to five hits, one for each of the five parts. That way, none would appear in the example you gave, because they'd each have their own presence in the filters. But each would then have to have its own web page, which the XSL could be modified to build, repeating the common bits in each. That would be better for people looking for Italian decorated 14th century material, but maybe not as good in other scenarios.

Also, it would be a lot of development work. Pretty much every indexing script would need modification, because anything that potentially links to parts would need changing. After Medieval, which has almost 1000 About 10% of Fihrist are probably composite manuscripts, but except for 100 mostly Wellcome Collection ones, they haven't been marked up using So this would be a Medieval-only enhancement for the foreseeable future.

An alternative might be to display multiple lines under composite manuscripts in the search results, where currently there is just the one "Contents:". Possibly something like this: It would still appear in search results where arguably it shouldn't, but at least users could probably figure out why, without having to click the link and read the entire manuscript.
The idea of listing each part separately looks like it could be a good temporary solution. I don't know enough about SOLR / Blacklight, but would we really have to index and display parts as separate records (which would be undesirable)? The ideal solution would be to still have the 'parent' or 'master' record for (say) MS. Ashmole 210, but for some of the information in that record to be in separate 'part' records linked to the parent - something like the following. But maybe this isn't technically possible?
Perhaps it would be helpful for you to say what you would expect to see in a search for, e.g., 17th C and Italy. What would the list of results look like? At the MS level it is certainly true that Ashmole 210 meets those criteria. It may not at the individual part level, but because we don’t index those as separate retrievable records we can’t return them as a result. Even the parent/child separation would have the same result. We could index individual parts (the children) but retrieve and show the MS record (the parent). But the end result would be the same. If one or more children meet one or more criteria, we would have to show Ashmole 210. It’s just a more complicated way of arriving at the same result.
Sorry, didn’t mean to close!
We could index parts alongside manuscripts, moving the relevant index fields for facets such as Origin and Century from the latter to the former. And we could probably find the bit of the Blacklight code that builds the links in search results and modify it so that, while there would be pages such as Then the MS. Ashmole 210 problem wouldn't occur, because the Solr record for that manuscript wouldn't have anything in those facet fields. It would disappear from results as soon as either Italy or 14th Century were selected, and instead the parts which match would be listed, then all of those would be gone when the second filter is applied. Only parts from other manuscripts, plus entire non-composite manuscripts, which are specifically from Italy in the 14th Century, would be returned.

The downside is that multiple parts from the same composite manuscript whose origins are all the same or similar might flood the search results. MS. Ashmole 210 would disappear from the results, but MS. Canon. Ital. 157 would appear 7 times, once for each of its 7 parts, all of which are from 14th Century Italy. You could change the TEI to only have one history section in the msDesc for the entire thing when all parts are the same, but there are other examples which are more complicated. For example, MS. Lat. misc. b. 18 contains one part from 14th Century Italy but also another 54 parts: 3 from Italy but not the 14th Century, 4 French, 25 English, and 22 without an origPlace. 10 are English from the 15th Century, so browsing for that would mean 10 records where there's currently only one for that manuscript alone. Overall, it could add hundreds more results. They're all relevant, and it might be preferable for some users, but for others it would be more work to find what they want.
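The trade-off described above can be sketched in a few lines of Python. This is only an illustration, not the catalogue's actual indexing code: the `Part` class, the part labels and both function names are hypothetical, and only two of the five parts of MS. Ashmole 210 are modelled (the two whose origin and century are stated in this thread).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Part:
    manuscript: str
    label: str  # hypothetical label, for illustration only
    origin: Optional[str]
    century: Optional[str]


# Assumed sample data based on the examples discussed in this thread.
PARTS = [
    Part("MS. Ashmole 210", "Part A", "England", "14th Century"),
    Part("MS. Ashmole 210", "Part B", "Italy", "17th Century"),
] + [
    # All 7 parts of MS. Canon. Ital. 157 are from 14th Century Italy.
    Part("MS. Canon. Ital. 157", f"Part {i}", "Italy", "14th Century")
    for i in range(1, 8)
]


def manuscript_level_hits(parts, origin, century):
    """Pool the facet values of all parts onto one record per manuscript."""
    pooled = {}
    for p in parts:
        origins, centuries = pooled.setdefault(p.manuscript, (set(), set()))
        origins.add(p.origin)
        centuries.add(p.century)
    return sorted(
        ms for ms, (origins, centuries) in pooled.items()
        if origin in origins and century in centuries
    )


def part_level_hits(parts, origin, century):
    """Treat each part as its own record; both filters must match the same part."""
    return [
        (p.manuscript, p.label)
        for p in parts
        if p.origin == origin and p.century == century
    ]


ms_hits = manuscript_level_hits(PARTS, "Italy", "14th Century")
part_hits = part_level_hits(PARTS, "Italy", "14th Century")
print(ms_hits)         # manuscript-level: Ashmole 210 is included
print(len(part_hits))  # part-level: Ashmole 210 is gone, Canon. Ital. 157 appears 7 times
```

With pooled facets, MS. Ashmole 210 matches because "Italy" and "14th Century" each occur somewhere in the manuscript. With one record per part it drops out, at the cost of seven separate hits for MS. Canon. Ital. 157.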
There would also have to be some overlap in fulltext indexing between parts and the parent manuscripts, so that people could still search for, say, a manuscript they remember containing works X and Y, but in different parts. So keyword searching would also return more hits (in some cases many more) than currently.

Potentially these extra hits in search results could be minimized with field collapsing. That uses the same underlying Lucene/Solr feature that allows SOLO to group multiple editions of a book into one, with a link to "See all versions". But that can be tricky to set up. The search engine chooses which record to display as representative of the others based on relevance ranking, so sometimes it would be a part and sometimes the whole manuscript.

All the above would be quite a lot of development work, and I need to concentrate on writing up documentation, so I'll park this for a while. Meanwhile, when I get a chance, I'll set up the QA server to list parts in the search results, so you can see if that does enough to avoid confusion.
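To make the field-collapsing idea concrete, here is a minimal Python simulation of what the search engine would do. It is a sketch under stated assumptions, not Solr itself: the `Hit` type, the record IDs and the relevance scores are all invented for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    record_id: str   # hypothetical record identifier
    manuscript: str  # the field the results are collapsed on
    score: float     # made-up relevance score


def collapse_by_manuscript(hits):
    """Keep the best-scoring hit per manuscript, plus a count of the hidden ones."""
    best = {}
    counts = {}
    for hit in hits:
        counts[hit.manuscript] = counts.get(hit.manuscript, 0) + 1
        current = best.get(hit.manuscript)
        if current is None or hit.score > current.score:
            best[hit.manuscript] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return [(h, counts[h.manuscript] - 1) for h in ranked]


HITS = [
    Hit("ashmole_210", "MS. Ashmole 210", 1.2),
    Hit("ashmole_210_part2", "MS. Ashmole 210", 2.5),
    Hit("canon_ital_157_part1", "MS. Canon. Ital. 157", 1.9),
    Hit("canon_ital_157", "MS. Canon. Ital. 157", 2.1),
]

collapsed = collapse_by_manuscript(HITS)
for hit, hidden in collapsed:
    print(f"{hit.manuscript}: showing {hit.record_id} (+{hidden} more)")
```

In this made-up data the representative record for MS. Ashmole 210 is a part, while for MS. Canon. Ital. 157 it is the whole manuscript, which is the inconsistency mentioned above.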
I've implemented listing of the origins of parts under each manuscript in search results on QA. So filtering on 14th Century and Italy still returns MS. Ashmole 210, but you can see that its only 14th Century part is English and its only Italian part is 17th Century. Listing of parts is only done if the manuscript has no overall You can no longer search for "composite manuscript" as you can on production, but that was never a foolproof way to find multi-part manuscripts, because anything with a @holfordm: Let me know if this looks good, and is a reasonable mitigation of this issue.
I've added "Multiple" options to the Language, Century and Origin facets, plus extended "Mixed" in Materials to include manuscripts with multiple distinct
Looking at how other institutions have dealt with this. Biblissima index entry for a Pseudo-Cicero text has some entries for "manuscripts" and some for "parts of manuscripts". http://beta.biblissima.fr/fr/ark:/43093/oedata6faf100c5a7ac93a73a7cd50662ef5e358ba368f
So that is their equivalent of this: https://medieval.bodleian.ox.ac.uk/catalog/work_3977 We could do similar, something like this mockup: That would be relatively easy, without requiring the major work to set up a separate index for parts needed to list them separately in search results, as discussed previously.
@ahankinson and I had an email conversation about this in March 2017. The question relates to the indexing of multi-part manuscripts where each part may have a different material, century of origin, country of origin, etc. An example is MS. Ashmole 210 https://medieval.bodleian.ox.ac.uk/catalog/manuscript_315
Currently if you filter for manuscripts with decoration produced in Italy in the fourteenth century, Ashmole 210 is one of the results. https://medieval.bodleian.ox.ac.uk/?f%5Bms_date_sm%5D%5B%5D=14th+Century&f%5Bms_deconote_b%5D%5B%5D=true&f%5Bms_origin_sm%5D%5B%5D=Italy&f%5Btype%5D%5B%5D=manuscript
This is because one of its parts is from Italy [but is seventeenth century and does not have decoration], another has decoration and is fourteenth century. The question is how far this is a wrong / misleading result. Andrew wrote at the time:
"Determining whether a MSS meets their needs should be left to the user to decide, which means we should favour recall over precision (i.e., showing more results, even if they are potentially irrelevant to their specific needs). If they perform the type of search you are anticipating then they will still find it; they may just have to sift through a few more MSS than they might otherwise.
This is preferable to the alternative, which is that we do not show potentially relevant results because the user did not know how to express their intentions to our system. This would be favouring precision over recall. This is a much more difficult thing to get right, especially with data that's as multifaceted as this catalogue"
I'd like to reopen this as a question to get the opinion of @andrew-morrison, @eifionjones and other cataloguers about what desirable behaviour would be in such cases.