TestSuiteCleanup

EpubCheck Test Suite Cleanup

Why?

EpubCheck’s test suite has grown organically, it consists of both packaged and unpackaged EPUB content, with no naming convention, and barely organized.

The EpubCheck TF of the EPUB 3 CG decided to start cleaning the test suite so that it is easier to use and maintain.

It's issues like #798 which drives the maintainers crazy because they have to change ~140 files in order to change a severity level of a message where there should only be one test for EPUB 2 and maybe another for EPUB 3 but no more...

How?

Every contribution helps… so big thanks are in order for those brave enough to roll up their sleeves and join the effort!

You can find an example of a cleanup in the (already merged) PullRequest #797

If you have questions regarding specific tests or this process in general, create an issue (like #824).

The followings instruction describe how to pick a test from the old test suite and clean it up to port it to the new test suite:

1. Select a test

TBD: describe how to distribute the work among the volunteers

Choose a testcase which produces several non-contiguous messages and identify the original cause for this test (e.g. an issue).
- Study the original cause (e.g. the issue) and then edit the EPUB to remove all unnecessary tests.
Choosing can also happen by accident where you stumble upon such a case whilst developing (e.g. #797)

2. Create a new branch

Whether you’re directly working on a clone of the epubcheck repo or on a private fork, we recommend that you create a new branch, so that it will be easier to submit a pull request of the changes you make. We propose the following naming convention for the branches, so that they’re easier to track:

tests/<your-github-name>-batchN for a batch of tests (for example tests/rdeltour-batch1).
tests/<name-of-the-old-test> for a single test (for example tests/20/valid/lorem-basic)

3. Try to identify what is being tested

In order to port the old test to the new test suite, you should first have a good understanding of what the test is about. Some guess work may be involved here.

Trying to infer a rationale from the test data

sometimes the name of the test sample would be explicit enough. For instance, the (expanded) EPUB named fallbacks-circular in the directory 30/expanded/invalid indicates that the data represent an EPUB with a circular fallback chain (which is invalid).
sometimes the test sample is named after a specific issue. Look up the issue in the tracker to see if the issue’s description gives any clue. For instance, the sample EPUB issue188 in the directory 30/expanded/valid refers to issue #188, which implies that the test was created to assert that EPUB containing a resource with a name including the character '+' will not create a false-positive (i.e. report an error for a valid EPUB).
sometimes the commit history will give you a hint about what the test is about. For instance, you can get the commit history of the sample EPUB lorem-remote-4 in directory 30/expanded/invalid by invoking the command git log src/test/resources/30/expanded/invalid/lorem-remote-4. This will notably list commit 5a68cc8, which says "Fixes Issue 202". And issue #202 will inform you that this is about some case of remote audio/video resources being incorrectly reported as invalid.
other times, you’ll just find an obvious answer by just directly looking at the test data…

Trying to run the test

If the test in question is an "error or warning" test (i.e. a test that asserts that EpubCheck correctly reports errors or warnings), it can be informative to run it and look at the test log.

You can inspect the execution log from different places:

The easiest is simply to directly run EpubCheck with the test data as input.

For instance:

$ epubcheck src/test/resources/20/epub/invalid/issue236.epub

Which will print the following to the standard output:

ERROR(OPF-017): src/test/resources/20/epub/invalid/issue236.epub/META-INF/container.xml(-1,-1): The attribute "full-path" on element "rootfile" must not be empty.
ERROR(OPF-016): src/test/resources/20/epub/invalid/issue236.epub/META-INF/container.xml(-1,-1): The element "rootfile" is missing its required attribute "full-path".

Check finished with errors

From which you can infer what the test is about.

You can also search for the Java test method that uses this test data.

For example, searching the Java test code for the string "issue236.epub" indicates that the test data from the previous example is used in the testMissingFullpathAttributeIssue236 method, at line #236 of the file Epub20CheckTest.java class:

  @Test
  public void testMissingFullpathAttributeIssue236() {
    Collections.addAll(expectedErrors, MessageId.OPF_017, MessageId.OPF_016);
    List<MessageId> expectedFatals = new ArrayList<MessageId>();
    //container.xml missing @full-path attribute or @full-path is empty
    // issue 95 / issue 236
    testValidateDocument("invalid/issue236.epub");
}

The code alredy gives you interesting bits of information, notably the error (or warning) codes that are expected to be raised, and here an inline comment also describes what this test is about.

Having identified the Java test method, you can now run the unit test with Maven’s test command, filtering the tests run only this method with the test property. For example, running:

$ mvn test -Dtest=Epub20CheckTest#testMissingFullpathAttributeIssue236

Would run the test and print (among other Maven output lines):

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.adobe.epubcheck.api.Epub20CheckTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.216 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

This is not very informative however (at least when the test passes), so most of EpubCheck’s test runner methods can be used with a boolean as their last argument to indicate that they be run in verbose mode (i.e. output the execution log).

In the preceding example, you’d enable the verbose mode by replacing line 241 with:

    testValidateDocument("invalid/issue236.epub", true);

and re-running the test would now print EpubCheck’s output:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.adobe.epubcheck.api.Epub20CheckTest
invalid/issue236.epub: Errors: 2; Warnings: 0
ERROR: invalid/issue236.epub:META-INF/container.xml: The attribute "full-path" on element "rootfile" must not be empty.
ERROR: invalid/issue236.epub:META-INF/container.xml: The element "rootfile" is missing its required attribute "full-path".
INFO: invalid/issue236.epub:META-INF/container.xml[creation date] 2013-03-24T13:15:52Z
INFO: invalid/issue236.epub[EPUB renditions count] 2

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.886 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Note that if you feel more familiar using Eclipse –or any other IDE– than the command line, you can very well run all the tests –or single test methods– directly from your IDE.

4. Split the test

Once the scope of the test data has been identified, make sure that the test is atomic enough. In other words, make sure that the data is used to test one and only one reason for failure or sucess.

If the test data is used to test more that one case, split the test in several individual test samples to make sure each covers only one case.

5. Check for duplicates

Now that you have identified the test cases covered by the test data, look up the existing test data that have been already ported to the new test suite, and ensure that the case is not already covered there. If it is already, then the test data can be discarded. Else, move on to the next step.

6. Minimize the test data

Reimplement the test based on one of the minimal EPUB templates. The new refactored test data should only add the bare mininum to these templates, in order to reduce the risk of having to change the test data in the future, due to regression bugs or update of EpubCheck’s logic.

7. Rename the test

Now that the test data has been reduced and optimized, rename it according to the naming conventions (File Naming) and move it to the relevant directory in the new test suite, following the recommended organization (site structure).

8. Create a pull request

Your branch should now contain the changes implementing the port of some test data to the new structure. Submit these changes in a new pull request against the project’s master branch!

After the entire test suite is ported

The instructions above cover the step needed to port the existing test data to the new test suite.

Once done, several additional tasks will need to be achieved to finalize the test suite clean up:

make sure that each error message is tested at least once (this is almost true already).
add some missing tests if required.
refactor the Java runner classes to simplify them and make them easier to maintain.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly