= splix
== Introduction
The splix program splits a large input XML file into smaller ones based
on the file's contents. This can be useful when working with huge XML files.
When running splix, you have to choose an input file, an output directory,
and a split level. The XML file will then be split at the chosen hierarchical
depth of the tags in the XML tree. Splix will output a header named header.xml
and many files with automatically chosen file names and a .xml extension.
The generated header.xml file will contain all tags that are at a lower
hierarchical depth than the split level, and placeholder
<splix ref="filename.xml"/> tags for the split-out xml
elements to enable reconstruction of the original file.
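The reconstruction mentioned above could be sketched roughly as follows. This is not part of splix; it is a minimal Python illustration that assumes each placeholder is an element named splix whose ref attribute is a file name relative to header.xml, and that each split-off file has the split-out element as its root. The function name reconstruct is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def reconstruct(header_path):
    """Return the root of header.xml with splix placeholders expanded."""
    base_dir = os.path.dirname(os.path.abspath(header_path))
    root = ET.parse(header_path).getroot()

    def expand(parent):
        # Iterate over a copy so that replacing children is safe.
        for index, child in enumerate(list(parent)):
            if child.tag == "splix" and "ref" in child.attrib:
                # Assumption: ref names a file next to header.xml.
                part_path = os.path.join(base_dir, child.attrib["ref"])
                part = ET.parse(part_path).getroot()
                part.tail = child.tail  # keep whitespace after the placeholder
                parent[index] = part
            else:
                expand(child)

    expand(root)
    return root
```

Calling ET.ElementTree(reconstruct("out/header.xml")).write("joined.xml") would then write the rejoined tree. The sketch does not handle the case where the document root itself was split off.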
The split-off files will have file names based on the tag names and on any
attributes of that tag that match /[iI][dD]/ or, if those are missing,
any attributes that match /.*[rR][eE][fF].*/.
An underscore in the file name represents a change in depth of the XML tree at
that level. The split-off files will contain all tags that are at a hierarchical
level equal to or higher than the split level. The parts of the file name
are percent-encoded to avoid invalid characters in the file name
and to keep the level structure clear. If a duplicate file name would be
generated, the file name will get a .number.xml suffix.
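To make the naming rules above more concrete, here is a rough Python sketch of one way such names could be built. It is not splix's actual algorithm. The sketch assumes that the patterns are matched against whole attribute names, that a tag name and its attribute values are joined with a dot, and that duplicate numbering starts at 1. The helper names name_part and file_name are hypothetical.

```python
import re
from urllib.parse import quote

ID_PATTERN = re.compile(r"[iI][dD]")
REF_PATTERN = re.compile(r".*[rR][eE][fF].*")


def name_part(tag, attributes):
    """Build the file-name part for one level of the XML tree."""
    # Assumption: the patterns apply to the whole attribute name.
    ids = [v for k, v in attributes.items() if ID_PATTERN.fullmatch(k)]
    refs = [v for k, v in attributes.items() if REF_PATTERN.fullmatch(k)]
    pieces = [tag] + (ids or refs)  # REF attributes are only a fallback
    # quote() never encodes "_", so encode it by hand: the underscore is
    # reserved as the separator between levels.
    return ".".join(quote(p, safe="").replace("_", "%5F") for p in pieces)


def file_name(levels, used):
    """Join the level parts with "_" and avoid names already in `used`."""
    stem = "_".join(name_part(tag, attrs) for tag, attrs in levels)
    candidate = stem + ".xml"
    number = 1
    while candidate in used:
        candidate = f"{stem}.{number}.xml"
        number += 1
    used.add(candidate)
    return candidate
```

For example, file_name([("library", {}), ("book", {"id": "1"})], set()) gives library_book.1.xml under these assumptions; the real tool may combine the pieces differently.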
== Usage
The program splix is a command line tool with the following options:
splix: syntax: splix -o output_dir -i input_file.xml -l split_level [-n] [-q] [-a] [-d]
The -o option is used to specify the output directory.
The output directory must exist and be writable.
The -i option is used to specify the input file.
The input file must exist and be readable.
The -l option is used to specify the split level. It must be a nonnegative integer.
Some trial and error may be needed to find the most useful split level.
It's mandatory to give all three of the -i, -o and -l options.
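When choosing a split level, it can help to see how many elements exist at each depth of the input file. The following Python sketch is not part of splix; it simply counts elements per depth while streaming through the file. It counts the root tag as depth 0, which is an assumption and may not match how splix numbers levels. The function name elements_per_depth is hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def elements_per_depth(path):
    """Return a dict mapping tree depth to the number of elements there."""
    counts = Counter()
    depth = 0
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start":
            counts[depth] += 1
            depth += 1
        else:
            depth -= 1
            # Drop text and children that were already counted
            # to limit memory use on large files.
            elem.clear()
    return dict(counts)
```

A depth with a large element count is often a reasonable first candidate for the split level, since splitting there yields many files of moderate size.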
If the optional -n option is specified, all namespaces in the header part of
input_file.xml will be copied to the root tags of the generated output files.
This differs from the -N option in that the namespaces of all tags at a depth
less than the requested split level will be copied, and not just those of
the root tag.
If the optional -q option is specified, the escaping of double quotes
as &quot; will be suppressed where possible.
If the optional -a option is specified, the escaping of ampersands
as &amp; will be suppressed where possible.
If the optional -v option is specified the version of splix is printed and
splix exits.
If the optional -Q option is specified splix will silence all non-error output.
If the optional -N option is specified, all namespaces of the root tag of
input_file.xml will be copied to the root tags of the generated output files.
This differs from the -n option in that only the namespaces of the
root tag are copied.
If the optional -V option is specified splix will produce more verbose output.
The file names of all files that are processed are output to standard output.
If any errors are encountered, these will be printed to standard error,
and the program will return nonzero. The program will return zero on success.
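For scripted use, the options and exit status described above could be wrapped as in the Python sketch below. It uses only the flags documented in this section and assumes that splix is on the PATH. The function and parameter names are hypothetical and not part of splix.

```python
import subprocess


def splix_command(input_file, output_dir, split_level, *,
                  copy_all_namespaces=False, copy_root_namespaces=False,
                  keep_quotes=False, keep_ampersands=False,
                  quiet=False, verbose=False):
    """Build the argument list for one splix run."""
    if split_level < 0:
        raise ValueError("split level must be a nonnegative integer")
    argv = ["splix", "-o", str(output_dir), "-i", str(input_file),
            "-l", str(split_level)]
    optional = [
        (copy_all_namespaces, "-n"),
        (copy_root_namespaces, "-N"),
        (keep_quotes, "-q"),
        (keep_ampersands, "-a"),
        (quiet, "-Q"),
        (verbose, "-V"),
    ]
    argv += [flag for enabled, flag in optional if enabled]
    return argv


def run_splix(argv):
    """Run splix and return its standard output lines; raise on failure."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # splix reports errors on standard error and returns nonzero.
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

For example, run_splix(splix_command("big.xml", "out", 2)) would run splix on a hypothetical big.xml and return whatever it printed to standard output, which should be the processed file names unless -Q was given.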
== Installation
=== Prerequisites
Splix requires libexpat and a POSIX operating system that supports the
getopt(), mkdir(), and isatty() functions. To compile 32-bit binaries, a
32-bit version of libexpat must also be installed.
=== Installation Procedure
Splix is compiled with Tup. Run tup in the source directory to compile it.
Then copy bin/splix or bin/splix_static to somewhere in the path. To compile
32-bit versions, run tup build-i386 and copy the binaries from build-i386/bin.
== Return Value
The program returns 0 on success, nonzero on error.
== Notes
== Known Bugs
If the XML file has long tag names, long ID or REF values, or a very deep
structure, or if too high a split level is chosen, splix may try to generate
file names that are too long and fail in interesting ways.
== Acknowledgements
This program uses the following open source components:
* klib (kvec.h, kbtree.h) under the MIT/X11 license.
* A modified MIT licensed version of slre.c and slre.h.
* parg.c and parg.h for argument parsing, under the CC0 license.
== Author
Björn De Meyer, 2015.
== License
The Sleepycat License
Splix is copyright (c) 2015 ([email protected]),
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Redistributions in any form must be accompanied by information on how to
obtain complete source code for the Splix software and any accompanying
software that uses the Splix software. The source code must either be
included in the distribution or be available for no more than the cost of
distribution plus a nominal fee, and must be freely redistributable under
reasonable conditions. For an executable file, complete source code means
the source code for all modules it contains. It does not include source code
for modules or files that typically accompany the major components of the
operating system on which the executable file runs.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR(S) ``AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR(S) BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.