-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconfigure search parameters in config file #195
Comments
I think I'm convinced that we should make these mandatory. My only thought is whether some of these config options are necessary at all—that said, I don't want to spin this ticket out into a broader discussion of particular elements. So, for now, I'll just ask: if we are to cull config options, should we deprecated them at the same time that we make all options mandatory or should we do it after? I'd say we deprecated them at the same time so that whenever the rule comes into effect, the mandatory elements are ones we actually want. |
I agree: deprecate unnecessary elements and make the others mandatory at the same time. Thinking about interdependent config items: could we unify them into single elements with a more sophisticated and explicit range of options? |
For our reference, here's a table of the child elements of
|
In terms of interdependence, it all comes back to Brainstorming a few possible solutions:
Option 3 is the most disruptive and introduces some nesting, but it is also the most clear in my opinion: those options all have to do with the creation of contexts, so it makes sense to nest them. Option 2 is less disruptive, but introduces an attribute and still has the superfluous option. Option 1 is the least disruptive, but not particularly helpful for those wanting content completion. |
Thanks for this detailed breakdown. I like option 3. I also think that a) we should deprecate scrollToTextFragment and get rid of the related code, because it doesn't show signs of taking hold; b) we could get rid of linkToFragmentId because we should just always do that if createContexts is true. |
I agree re: deprecating scrollToTextFragment and linkToFragmentId. If we are deprecating options, I think indentJSON could be deprecated (mea culpa since I know I advocated for it) and verbose could be deprecated and superseded by a parameter sent to ant since it's annoying to have to change the verbose parameter in the config to get verbose logging when trying to debug something in a single place. |
Attached to this ticket is a zip archive that contains an ODD, a produced RNG, and some sample configuration files that model the changes proposed based on the above discussion. For my own edification, I did this via ODD chaining (using the GitHub repo ODD as the base and modifying) to show exactly what we're changing, which worked precisely as advertised—of course when we actually do this, we'll do this in the real ODD, but it was easier to show the changes up for discussion. Here's a sample of the ODD (with schematron removed for brevity): <elementSpec ident="params" ns="http://hcmc.uvic.ca/ns/staticSearch" module="ss" mode="change">
<content>
<!--Order isn't preserved, but
listed here in logical groups anyway-->
<sequence preserveOrder="false">
<!--Input stuff-->
<elementRef key="searchFile"/>
<elementRef key="recurse"/>
<elementRef key="versionFile"/>
<elementRef key="stemmerFolder"/>
<elementRef key="stopwordsFile"/>
<elementRef key="dictionaryFile"/>
<!--Tokenizing/Parsing-->
<elementRef key="scoringAlgorithm"/>
<elementRef key="createContexts"/>
<!--Output stuff-->
<elementRef key="outputFolder"/>
<elementRef key="resultsPerPage"/>
<elementRef key="resultsLimit"/>
</sequence>
</content>
</elementSpec>
<elementSpec ident="createContexts" ns="http://hcmc.uvic.ca/ns/staticSearch" module="ss" mode="change">
<content>
<!--Could also be <alternate> with <empty/>, I think-->
<sequence minOccurs="0" maxOccurs="1">
<elementRef key="phrasalSearch"/>
<elementRef key="wildcardSearch"/>
<elementRef key="maxKwicsToHarvest"
minOccurs="0" maxOccurs="1"/>
<elementRef key="maxKwicsToShow"/>
<elementRef key="totalKwicLength"/>
<elementRef key="kwicTruncateString"/>
<elementRef key="linkToFragmentId"/>
</sequence>
</content>
<!--snip: bunch of constraintSpecs-->
<attList>
<attDef ident="value" usage="req" mode="add">
<datatype>
<dataRef name="boolean"/>
</datatype>
</attDef>
</attList>
</elementSpec>
<!--DEPRECATIONS-->
<elementSpec ident="verbose" module="ss" ns="http://hcmc.uvic.ca/ns/staticSearch" mode="delete"/>
<elementSpec ident="indentJSON" module="ss" ns="http://hcmc.uvic.ca/ns/staticSearch" mode="delete"/>
<elementSpec ident="scrollToTextFragment" module="ss" ns="http://hcmc.uvic.ca/ns/staticSearch" mode="delete"/>
<elementSpec ident="linkToFragmentId" module="ss" ns="http://hcmc.uvic.ca/ns/staticSearch" mode="delete"/>
|
Thanks for this; I'm slightly surprised that the ODD-chaining worked, but that's good. :-) I suggest we set everything to version 2 right now, and first do the deletions (deleting rather than deprecating; we can handle the mechanics of alerting users by having Ant do a quick check of the file version, and if it's not 2, then run a special XSLT to help users upgrade, then exit. So I don't think we actually need a deprecation period for these four specific things. As they're deleted, we can a) remove any handling for them from the XSLT, and b) add the info to the what's new page. It's refreshing to be able to delete stuff from a project, rather than having all the old cruft just pile up with the new floating on top of it. |
I'm going to do the deletions now in a separate branch--I'll open a ticket for that to keep things clean. |
…licitly set a now-deleted parameter
Arising from #270—we should make sure the documentation is clear about the mandatory elements and resolve the ambiguous language that params "requires two elements" |
I've added in #271 another case where the "default" behaviour of Static Search seems problematic. My two cents worth: version 2 (I think that is what we are talking about) should enact what the current version states: just two |
This is the key deliverable for version 2.0 at this point. Once this is sorted, we can release, assuming no other breaking or blocking things emerge from testing with real sites. |
Following a very productive meeting today (Oct 18), @martindholmes and I have rethought this quite significantly so that each configuration element describes its purpose with properties specified as attributes — this means far fewer elements and more straightforward values with better ways of constraining values (and enforcing co-occurrence constraints). Here's what we're thinking: <params>
<searchPage file="test/search.html"/>
<index recurse="yes"/>
<stopwords
file="test/test_stopwords.txt"/>
<dictionary
file="xsl/english_words.txt"/>
<tokenizer
minWordLength="2"/>
<scoringAlgorithm name="tf-idf"/>
<stemmer
dir="stemmers/en/"/>
<contexts
create="true"
phrasalSearch="true"
wildcardSearch="true"
maxKwicsToHarvest="5"
maxKwicLength="15"
kwicTruncateString="..."
linkToFragmentId="true"
/>
<results
resultsPerPage="5"
maxKwicsToShow="#"/>
<version file="test/VERSION"/>
<outputDir name="ssTest"/>
</params>
This will be a major change, so we'll certainly need to make sure that we have clear guidance and mapping for upgrading from 1.x to 2.0 (and an automated process for doing so). |
If, as argued in #194 and originally arising out of #115, schema-based default values for configuration items are not practical or desirable, another option for making our use of de facto defaults in config processing more explicit would be to make more -- or all -- configuration elements mandatory. I'm actually in favour of this. It would mean that we could provide a single straightforward template for a config file (at least, for the initial option collection -- and we could also provide commented-out examples for all the other components). We could then validate the config file and simply fail the build if anything is missing, forcing users to see all the config they're using. We could have an initial deprecation period in which we would simply warn about missing elements, but then in the subsequent release start failing builds. The interdependence of various config options could also then be tested with Schematron, so incompatibilities would be explicit and easily understood by the user.
For cases where one option makes another redundant, the latter could have an explicit value of "redundant because X".
This would also mean that there would be less pressure on the documentation to explain any mysterious defaulting that might take place in the background to handle a missing element; there could be no missing element, so regular documentation of the values would be sufficient.
The text was updated successfully, but these errors were encountered: