Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We are not clear on what happens when any given boolean parameter is omitted #115

Closed
martindholmes opened this issue Jan 19, 2021 · 11 comments
Labels
bug Something isn't working documentation Improvements or additions to documentation
Milestone

Comments

@martindholmes
Copy link
Collaborator

We don't have default values in the content models of our booleans parameters, and we are not consistent with our defaults:

<!--Specify whether or not to link to fragments; we default true-->
            <xsl:if test="not($configDoc//params/linkToFragmentId)">
                <xso:param name="linkToFragmentId" select="true()"/>
            </xsl:if>
            
            <!--Turn on experimental scroll-to-text feature: default false.-->
            <xsl:if test="not($configDoc//params/scrollToTextFragment)">
                <xso:param name="scrollToTextFragment" select="false()"/>
            </xsl:if>

(from create_config.xsl).

I think our defaults are common-sensical, generally, but it should be clear what you're getting if you don't include one of these optional elements. Another alternative would be to make all the elements mandatory.

@martindholmes martindholmes added bug Something isn't working documentation Improvements or additions to documentation labels Jan 19, 2021
@martindholmes martindholmes added this to the Release 1.2 milestone Jan 19, 2021
@joeytakeda
Copy link
Contributor

I think there's a few ways to approach this, but I wasn't totally sure what you had in mind.

If we don't make elements in the config mandatory, then the simplest would be to announce whenever we use a default value in the XSLT for the configuration file:

<xsl:if test="not($configDoc//params/linkToFragmentId)">
               <xsl:message>INFO: linkToFragmentId not set. Defaulting to true</xsl:message>
                <xso:param name="linkToFragmentId" select="true()"/>
            </xsl:if>

That won't make it clear to people when they're creating their config what the default option is, but it would at least alert them to what happened after the fact.

However (and I'm not sure if you were suggesting this), we could also add the values as defaults to the schema and use those to populate the values in create_config.xsl. If we do add defaults, I think the XSLT should still output an INFO message when it uses the default value from the ODD.

In either case, we should add a bit of a summary of the configuration options to the documentation. I like the way we have the information now, but I think a summary at the top would be helpful--I'm thinking a table with the columns NAME | REQ | DEFAULT VALUE. How that's generated depends on whether we want to use default values in the ODD or not, of course.

However, I like the idea of making the configuration elements mandatory, but in my mind, it would be really complex since a number of options depend on other options: for instance, if you've set createContexts to false, then you shouldn't need to include any of maxKwicsToShow, maxKwicsToHarvest, phrasalSearch, wildcardSearch, totalKwicLength, or kwicTruncateString. That said, I'm not confident that we aren't doing that already.

So if we were to make configuration options mandatory, I'd say we should do it properly: make sure we have all of the proper constraints in the ODD and ensure that our configuration XSLT implements that properly. That's a major project, of course, but one that will probably be a good thing in the long run.

@martindholmes
Copy link
Collaborator Author

As a general rule, I prefer mandatory elements that you have to pay attention to over mysterious defaults that make things happen in the background without your knowing about them.

We could use Schematron to enforce the co-occurrence constraints, and catch inconsistencies before the build goes ahead.

I've been thinking about how complicated our configuration file is getting these days, and wondering about this:

  1. Include very detailed glosses, descs and explanatory remarks in the schema elementSpecs.
  2. Harvest those using specList/specDesc in the prose, rather than having both prose and spec descriptions that could get out of sync.
  3. Also harvest them to generate an HTML page which is part of the documentation, where you can generate a schema for yourself. You would read an explanation for each setting and then set it with a form control, and a text box below containing a full config file XML serialization changes accordingly. This would be an even more useful version of the tabular layout you're suggesting.

@joeytakeda
Copy link
Contributor

As a general rule, I prefer mandatory elements that you have to pay attention to over mysterious defaults that make things happen in the background without your knowing about them.

Agreed.

We could use Schematron to enforce the co-occurrence constraints, and catch inconsistencies before the build goes ahead.

That seems like a good idea to me--of course, adding schematron validation will add a bit of complexity to the build (generating the XSLT etc etc), but that seems worth it. We should also add the standard processing-instructions to the top of the config file so that people get all the benefits of the schema if/when they write them by hand.

There will be some backwards compatibility issues. For instance, our test config would be invalid since it includes the <maxKwicsToRetrieve> option even though <phrasalSearch> is set to true. So I think for 1.2, the schematron should probably be a WARNING rather than a full-blown error and the XSLT should be able to handle those errors gracefully. So we should retain the default values in the XSLT for 1.2 and output a message with a big WARNING saying that default values for configuration options will be deprecated in 1.3 and that all values be made explicit.

I've been thinking about how complicated our configuration file is getting these days, and wondering about this:

  1. Include very detailed glosses, descs and explanatory remarks in the schema elementSpecs.
  2. Harvest those using specList/specDesc in the prose, rather than having both prose and spec descriptions that could get out of sync.
  3. Also harvest them to generate an HTML page which is part of the documentation, where you can generate a schema for yourself. You would read an explanation for each setting and then set it with a form control, and a text box below containing a full config file XML serialization changes accordingly. This would be an even more useful version of the tabular layout you're suggesting.

That all sounds good to me. When I first saw this ticket, I was thinking a similar thing re: a config helper (though I was thinking of a command line utility rather than via our documentation). But an HTML version will be much more helpful since all of the documentation will be right there; the only advantage of a command line thing is it could catch whether or not your search directory actually exists, but we could conceivably do that with schematron.

@martindholmes
Copy link
Collaborator Author

@joeytakeda I'm not sure that we have a clear idea how to proceed on this, so I'm inclined to kick it to 1.3, where we'll probably also address the issue of a config helper tool. What do you think?

@joeytakeda
Copy link
Contributor

Makes sense to me! I’ll move the milestone now

@joeytakeda joeytakeda modified the milestones: Release 1.2, Release 1.3 Mar 1, 2021
@martindholmes
Copy link
Collaborator Author

I've created branch issue-115-experiment to play around with some options here. My first approach is going to be making all the elements mandatory, and giving them all default values in the schema.

@martindholmes
Copy link
Collaborator Author

@joeytakeda I haven't done a pull request for this yet, but I think in general the approach taken in the issue-115-experiment branch is the only one which will really work; specifying defaults in the ODD is tricky for many of these values, and it might just be easier to give people a template to start from where they're all included with their defaults. What do you think?

@martindholmes martindholmes modified the milestones: Release 1.3, Release 1.4 May 3, 2021
@joeytakeda
Copy link
Contributor

@martindholmes I'm mostly in agreement here with two exceptions:

Making <versionFile> mandatory makes having a VERSION file necessary too. We could make the default in the template something like STATIC_SEARCH_VERSION.txt, which would just be created on the fly from the build. (Though that of course runs the risk of us overwriting someone's pre-existing STATIC_SEARCH_VERSION.txt if they had it it for whatever reason)

As I think I've said before (though I can't find it now), I'm not a fan of requiring configuration options that
don't do anything unless some other condition is met: the main one I'm thinking of is maxKwicsToHarvest, which is only useful in the case where createContexts is true and both phrasalSearch and wildcardSearch are false—otherwise, it's just a random thing that you just have to fill out for no clear reason. (And if createContexts is false, then a half-dozen other options become useless).

So one option is that maxKwicsToHarvest is a single outlier in the schema: it isn't necessary and is a very special thing to be used only when you've done the combination above (and we then enforce its use in schematron). I can't imagine many instances where someone truly wants to use the parameter (I don't think any of our projects have sacrificed the phrasal/wildcard functionality for a smaller index size). Or a more reckless suggestion might be to ditch the ability to turn off wildcards/phrasal and force everyone to have contexts (that would really simplify things, but it would force people to have fairly large indexes, which might be a bad idea —though, I can't imagine the indexes ever getting so large that they'd exceed the limits of, say, GitHub pages or AWS' free tier in terms of storage, and usually the biggest files in the mix are the filters and titles).

@martindholmes
Copy link
Collaborator Author

@joeytakeda The <versionFile> element could be empty; that way you're explicitly saying "I don't want a version extension, or I don't care what it is", and then we just make it up on the fly (maybe two random ascii characters).

I take your point about <maxKwicsToHarvest>; I can imagine collections where it would be worth having (huge collections of very similar documents, for instance, where there are too many for wildcards to be practical, but you still want some richness in the search results). But none of our projects fall into that category yet. We'll see what happens when we get to VI History.

On the issue of options made irrelevant by other options, I take your point, but I don't see that it does any harm. I think if we were starting from scratch, these options would be organized as attributes on a smaller number of elements, giving us better configuration possibilities in terms of default values, but I don't think it's worth changing now. But I wonder if it's possible to re-imagine the config options in such a way that they always make sense. We could do that by figuring out the minimum number of bits of information that the build needs to know, and see if there are more elegant ways to group the data.

Anyway, this branch is not useful at this point, and probably couldn't be merged without a lot of work after the weekend scramble to get 1.3 out. I should probably delete it or rebase it on the current dev.

@joeytakeda
Copy link
Contributor

The <versionFile> element could be empty; that way you're explicitly saying "I don't want a version extension, or I don't care what it is", and then we just make it up on the fly (maybe two random ascii characters).

I think empty <versionFile/> should just be "I don't want a version extension" and <versionFile>VERSION</versionFile> (where VERSION doesn't actually exist) would be "I don't care what it is, so just make it for me"; otherwise, there's no way that someone could decide not to have VERSION'd files. In terms of what we'd generate as the version, we could just do [a-z0-9] (to avoid any potential conflicts). And since we're using Saxon 10, we can use random-number-generator() to generate a unique six character version code (I hadn't had a chance to use it yet, but just played around with it and it's great. Would send a xsltfiddle, but neither xsltfiddle nor xquery fiddle has support for Saxon 10 at the moment apparently).

I take your point about <maxKwicsToHarvest>; I can imagine collections where it would be worth having (huge collections of very similar documents, for instance, where there are too many for wildcards to be practical, but you still want some richness in the search results). But none of our projects fall into that category yet. We'll see what happens when we get to VI History.

LOI would fall into that camp, I think, but wildcards are essential because of variant spellings, of course. I presume the same would go for VI History, but I can see how that might be a case where the "global" search is very basic to keep a reasonably sized index and then the "advanced" sub-collection searches have wildcards enabled, etc.

On the issue of options made irrelevant by other options, I take your point, but I don't see that it does any harm. I think if we were starting from scratch, these options would be organized as attributes on a smaller number of elements, giving us better configuration possibilities in terms of default values, but I don't think it's worth changing now. But I wonder if it's possible to re-imagine the config options in such a way that they always make sense. We could do that by figuring out the minimum number of bits of information that the build needs to know, and see if there are more elegant ways to group the data.

I agree that we should really think about better ways to group things--whether that's elements with attributes or containers around particular types of elements or just co-occurence constraints in Schematron.

@martindholmes
Copy link
Collaborator Author

As discussed in email, I've split this issue out into #194, #195 and #196, which are all assigned to the 1.5 milestone, and I'm now closing this rather sprawling discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants