diff --git a/docs/LICENSE.html b/docs/LICENSE.html
new file mode 100644
index 00000000..5899a2d2
--- /dev/null
+++ b/docs/LICENSE.html
@@ -0,0 +1,796 @@
+
+
+
+ + + + +GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/> + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. 
+ + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. +States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. 
+ + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. 
+ + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. 
Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. 
+ + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. 
This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. 
+ + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. 
+ + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. 
+ + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. 
If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. + + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. 
If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. 
+ + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. 
For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. 
+ + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. 
You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. 
The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. 
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +state the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. 
+ + {one line to give the program's name and a brief idea of what it does.} + Copyright (C) {year} {name of author} + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + + If the program does terminal interaction, make it output a short +notice like this when it starts in an interactive mode: + + {project} Copyright (C) {year} {fullname} + This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, your program's commands +might be different; for a GUI interface, you would use an "about box". + + You should also get your employer (if you work as a programmer) or school, +if any, to sign a "copyright disclaimer" for the program, if necessary. +For more information on this, and how to apply and follow the GNU GPL, see +<http://www.gnu.org/licenses/>. + + The GNU General Public License does not permit incorporating your program +into proprietary programs. If your program is a subroutine library, you +may consider it more useful to permit linking proprietary applications with +the library. 
If this is what you want to do, use the GNU Lesser General +Public License instead of this License. But first, please read +<http://www.gnu.org/philosophy/why-not-lgpl.html>. ++ +
sperrorest
is a generic framework which aims to work with all R models/packages. In statistical learning, model setups, their formulas and error measures all depend on the family of the response variable. Various families exist (numeric, binary, multiclass) which again include sub-families (e.g. gaussian or poisson distribution of a numeric response).
This detail needs to be specified via the respective function, e.g. when using glm()
with a binary response, one needs to set family = "binomial"
to make sure that the model does something meaningful. Most of the time, the same applies to the generic predict()
function. For the glm()
case, one would need to set type = "response"
if the predicted values should reflect probabilities instead of log-odds.
These settings can be specified using model_args
and pred_args
in sperrorest()
. So fine, “why do we need to write all these wrappers and custom model/predict functions then?!”
model_fun
expects at least a formula argument and a data.frame with the learning sample. All arguments, including the additional ones provided via model_args
, are passed to model_fun
via a do.call()
call. However, if model_fun
does not have an argument named formula
but e.g. fixed
(as is the case for glmmPQL()
) the do.call()
call will fail because sperrorest()
tries to pass an argument named formula
but glmmPQL
expects an argument named fixed
.
In this case, we need to write a wrapper function for glmmPQL
(named glmmPQL_modelfun
here) which accounts for this naming problem. Here, we are passing the formula
argument to our custom model function which then does the actual call to glmmPQL()
using the supplied formula
object as the fixed
argument of glmmPQL
.
+By default, glmmPQL()
has further arguments like family
or random
. If we want to use these, we pass them to model_args
which then appends these to the arguments of glmmPQL_modelfun
.
glmmPQL_modelfun <- function(formula = NULL, data = NULL, random = NULL,
+ family = NULL) {
+ fit <- glmmPQL(fixed = formula, data = data, random = random, family = family)
+ return(fit)
+}
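The naming problem can be reproduced without glmmPQL(). The following is a minimal sketch in base R, assuming a hypothetical stand-in function fit_fixed() whose formula argument is called fixed. Neither fit_fixed() nor fit_fixed_modelfun() is part of sperrorest; they only illustrate why a do.call() with an argument named formula fails and how a thin wrapper resolves it.

```r
# Hypothetical stand-in for a model function whose formula argument is
# named `fixed` instead of `formula` (illustration only)
fit_fixed <- function(fixed, data) {
  lm(fixed, data = data)
}

# Argument list as a caller would build it: the formula is named `formula`
args <- list(formula = mpg ~ wt, data = mtcars)

# Calling the function directly fails with an "unused argument" error
direct <- try(do.call(fit_fixed, args), silent = TRUE)
inherits(direct, "try-error")

# A wrapper accepts `formula` and hands it on under the expected name
fit_fixed_modelfun <- function(formula, data, ...) {
  fit_fixed(fixed = formula, data = data, ...)
}

fit <- do.call(fit_fixed_modelfun, args)
class(fit)
```

The wrapper changes nothing about the fitted model; it only translates the argument name.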
Unless specified explicitly, sperrorest()
tries to use the generic predict()
function. This function works differently depending on the class of the provided fitted model, i.e. many models slightly differ in the naming (and availability) of their arguments. For example, when fitting a Support Vector Machine (SVM) with a binary response variable, package kernlab
expects an argument type = "probabilities"
in its predict()
call to receive predicted probabilities while in package e1071
it is "probability = TRUE"
. Similar to model_args
, this can be accounted for in the pred_args
of sperrorest()
.
However, sperrorest()
expects that the predicted values (of any response type) are stored directly in the returned object of the predict()
function. While this is the case for many models, mainly with a numeric response, classification cases often behave differently. Here, the predicted values (classes in this case) are often stored in a sub-object named class
or predicted
.
Since there is no way to account for this in a general way (when every package may return the predicted values in a different format/column), we need to account for it by providing a custom predict function which returns only the predicted values so that sperrorest()
can continue properly. This time we are showing two examples. The first takes again a binary classification using randomForest
.
When calling predict on a fitted randomForest
model with a binary response variable, the predicted values are actually stored in the resulting object returned by predict()
(here called pred
). So why do we have trouble here then?
+Simply because pred
is a matrix containing both probabilities for the FALSE
(= 0) and TRUE
(= 1) case. sperrorest()
needs a vector containing only the predicted values of the TRUE
case to pass these further onto err_fun()
which then takes care of calculating all the error measures. So the important part is to subset the resulting matrix in the pred
object to TRUE
cases only and return the result.
rf_predfun <- function(object = NULL, newdata = NULL, type = NULL) {
+ pred <- predict(object = object, newdata = newdata, type = type)
+ pred <- pred[, 2]
+}
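The subsetting step can be tried on a mocked probability matrix. This is only a sketch: the matrix below is made-up data standing in for the output of predict(), and selecting the column by its name instead of by position is an assumption that the columns are labelled "FALSE" and "TRUE".

```r
# Made-up class probabilities for three observations (illustration only)
pred <- matrix(c(0.9, 0.2, 0.4,
                 0.1, 0.8, 0.6),
               ncol = 2,
               dimnames = list(NULL, c("FALSE", "TRUE")))

# Keep only the probabilities of the TRUE case; the result is a plain vector
p_true <- pred[, "TRUE"]
p_true
```

Selecting by name guards against a different column order, at the cost of failing if the labels differ.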
The same case (binary response) using svm
from the e1071
package. Here, the predicted probabilities are stored in a sub-object of pred
. We can address it using the attr()
function. Then again, we only need the TRUE
cases for sperrorest()
.
svm_predfun <- function(object = NULL, newdata = NULL, probability = NULL) {
+ pred <- predict(object, newdata = newdata, probability = TRUE)
+ pred <- attr(pred, "probabilities")[, 2]
+}
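How attr() retrieves such a sub-object can be shown without fitting a model. The sketch below mocks a prediction object in base R: a factor of predicted classes carrying a "probabilities" attribute. The data are invented, and the column labels "FALSE" and "TRUE" are an assumption for illustration.

```r
# Mocked prediction object: predicted classes plus a "probabilities" attribute
pred <- structure(
  factor(c("TRUE", "FALSE", "TRUE")),
  probabilities = matrix(c(0.2, 0.7, 0.1,
                           0.8, 0.3, 0.9),
                         ncol = 2,
                         dimnames = list(NULL, c("FALSE", "TRUE")))
)

# Extract the attribute and keep only the TRUE column
p_true <- attr(pred, "probabilities")[, "TRUE"]
p_true
```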
sperrorest
• sperrorestsperrorest
+sperrorest
is parallelized by default from v2.0.0 and higher.
Most users are not familiar with parallelization and have neither the time nor the motivation to dig into it. Instead, they simply accept waiting “a bit” longer until the process finishes.
+While this is no problem for “quick” cross-validation (CV) cases with a low number of repetitions and models which converge quickly, in some cases processing may take up to several months. For example, running a spatial cross-validation using a Generalized Linear Mixed Model (GLMM) with both random effects and a spatial autocorrelation structure on around 1000 observations takes roughly this long if executed sequentially. Most of the fitting time is spent on integrating the spatial autocorrelation structure.
+sperrorest
comes with four different parallelization modes and also offers sequential execution.
Unless specified otherwise, all cores of the machine are used. Limiting the number of cores makes sense when you want to do other work on your machine while a cross-validation is running, so that your system stays responsive. It can also pay off on a server: suppose you have 48 cores available and want to run a CV with 100 repetitions. Since most models take roughly the same time to fit, it would be smart to use 34 cores. Using this number of cores is faster than using 48 because
+Both settings need 3 worker iterations to process all repetitions (with 34 cores: 34 repetitions are done after the first iteration, 68 after the second, and the rest in the third). With 48 cores, 44 cores would do nothing during the third iteration but wait for the remaining 4 repetitions to finish.
The parallelization overhead, which is mainly caused by splitting and combining all jobs to the workers, would be higher for the case with 48 cores than for 34 cores. Hence, 34 cores will finish faster than 48 cores on 100 repetitions. Of course, with 50 cores only 2 worker iterations are needed to process everything, which would again speed up the process.
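The arithmetic behind this comparison can be checked in a few lines. This is a back-of-the-envelope sketch, assuming that every repetition takes equally long and that workers proceed in lockstep rounds; real runs will deviate from this idealization.

```r
n_reps <- 100
cores  <- c(34, 48, 50)

# Number of worker rounds needed to process all repetitions
rounds <- ceiling(n_reps / cores)

# Cores that sit idle during the last round
idle_last <- rounds * cores - n_reps

data.frame(cores, rounds, idle_last)
```

Under these assumptions, 34 and 48 cores both need three rounds, while 50 cores need only two.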
future
backend
+All modes except "apply"
(including the sequential one) are running on the parallel API of the future
package. It offers a unified, cross-platform API combining all other existing parallel approaches of R into one package. Besides the variety of parallel options to choose from (multiprocess
, multisession
, multicore
, cluster
, etc.) it also provides a sequential
option. Every option is initiated in the same way:
library(future)
+library(doFuture) # provides registerDoFuture()
+registerDoFuture()
+
+plan("sequential") # sequential
+plan("multicore") # parallel (Unix only)
+plan("multisession") # parallel
+plan("multiprocess") # parallel
+plan("cluster") # parallel
Every option has its advantages and disadvantages. Check the future
package vignettes for more information.
Unless specified otherwise, the default parallel mode uses foreach
with the "cluster"
option of the future
package. Package doFuture
takes care that foreach
works with the parallel initialization of the future
package.
This option is taken as default because it works cross-platform and provides progress output to the console. Unfortunately, on Windows this output is not shown in the console but needs to be written to a file (by default in the current working directory). Another downside is that the global environment needs to be copied to every worker before processing starts. Workers are started sequentially, and therefore the startup of more than 10 workers may take several seconds.
+This mode is also cross-platform but uses different functions on Unix/non-Unix systems for actual processing. On Unix, it uses the pbmcapply
package which combines the pbapply
package (provides progress bar for ‘apply’ functions) and the future
package to speed up processing. On Windows, pbapply
is used which in the end uses parApply()
to set up a cluster-like parallelization including a progress bar.
This mode relies entirely on the future
package in combination with future_lapply()
as the workhorse. It can be used with any future
plan specified via par_option
. It is the fastest mode but provides no progress output.
This mode executes sperrorest()
sequentially. It also runs on the future
API using foreach
/doFuture
which provide the possibility of sequential execution using plan("sequential")
.
Example setup:
+partition_cv
)glm
+Note that the only argument which needs to be changed is par_mode
here. Subsequently, par_mode = "foreach"
, par_mode = "apply"
and par_mode = "future"
were used.
All default settings of each mode were used. par_mode = "foreach"
runs on plan("cluster")
while par_mode = "future"
runs on plan("multiprocess")
. Mode "apply"
used pbmcapply
in the end since the test was running on a Unix System.
data(ecuador)
+fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope
+
+sperrorest(data = ecuador, formula = fo,
+ model_fun = glm, model_args = list(family = "binomial"),
+ pred_args = list(type = "response"),
+ smp_fun = partition_cv,
+ smp_args = list(repetition = 1:100, nfold = 5),
+ par_args = list(par_mode = "foreach", par_units = 20),
+ benchmark = TRUE, progress = FALSE,
+ importance = TRUE, imp_permutations = 100)
| | foreach | apply | future |
|---|---|---|---|
| runtime (min) | 52.33 | 51.67 | 49.54 |
Geospatial data scientists often make use of a variety of statistical and machine learning techniques for spatial prediction in applications such as landslide susceptibility modeling (Goetz et al. 2015) or habitat modeling (Knudby, Brenning, and LeDrew 2010). Novel and often more flexible techniques promise improved predictive performances as they are better able to represent nonlinear relationships or higher-order interactions between predictors than less flexible linear models.
+Nevertheless, this increased flexibility comes with the risk of possible over-fitting to the training data. Since nearby spatial observations often tend to be more similar than distant ones, traditional random cross-validation is unable to detect this over-fitting whenever spatial observations are close to each other (e.g. Brenning 2005). Spatial cross-validation addresses this by resampling the data not completely randomly, but using larger spatial regions. In some cases, spatial data is grouped: in remotely-sensed land use mapping, for example, grid cells belonging to the same field share the same management procedures and cultivation history, making them more similar to each other than to pixels from other fields with the same crop type.
+This package provides a customizable toolkit for cross-validation (and bootstrap) estimation using a variety of spatial resampling schemes. More so, this toolkit can even be extended to spatio-temporal data or other complex data structures. This vignette will walk you through a simple case study, crop classification in central Chile (Peña and Brenning 2015).
+This vignette is based on code that Alex Brenning developed for his course on ‘Environmental Statistics and GeoComputation’ that he teaches in the Geographic Information Science Master’s program at Friedrich Schiller University Jena, Germany. Please take a look at our program and spread the word!
+As a case study we will carry out a supervised classification analysis using remotely-sensed data to predict fruit-tree crop types in central Chile. This data set is a subsample of data from (Peña and Brenning 2015).
+library(pacman)
+p_load(sperrorest)
data("maipo", package = "sperrorest")
The remote-sensing predictor variables were derived from an image time series consisting of eight Landsat images acquired throughout the (southern hemisphere) growing season. The data set includes the following variables:
+Response
+- croptype
: response variable (factor) with 4 levels: ground truth information
Predictors
+- b
[12-87]: spectral data, e.g. b82 = image date #8, spectral band #2
+- ndvi
[01-08]: Normalized Difference Vegetation Index, e.g. #8 = image date #8
+- ndwi
[01-08]: Normalized Difference Water Index, e.g. #8 = image date #8
Others
+- field
: field identifier (grouping variable - not to be used as predictor)
+- utmx
, utmy
: x/y location; not to be used as predictors
All but the first four variables of the data set are predictors; their names are used to construct a formula object:
+predictors <- colnames(maipo)[5:ncol(maipo)]
+# Construct a formula:
+fo <- as.formula(paste("croptype ~", paste(predictors, collapse = "+")))
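As an alternative to pasting strings, base R's reformulate() builds the same kind of formula from a character vector. The sketch below uses three made-up predictor names as stand-ins for the columns of maipo, so that it runs without the data set.

```r
# Stand-in predictor names (illustration only; maipo has many more)
predictors <- c("b12", "ndvi01", "ndwi01")

# Build `croptype ~ b12 + ndvi01 + ndwi01` without string pasting
fo_demo <- reformulate(predictors, response = "croptype")
fo_demo
```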
Here we will take a look at a few classification methods with varying degrees of computational complexity and flexibility. This should give you an idea of how different models are handled by sperrorest
, depending on the characteristics of their fitting and prediction methods. Please refer to (James et al. 2013) for background information on the models used here.
LDA is simple and fast, and often performs surprisingly well if the problem at hand is ‘linear enough’. As a start, let’s fit a model with all predictors and using all available data:
+p_load(MASS)
+fit <- lda(fo, data = maipo)
Predict the croptype with the fitted model and calculate the misclassification error rate (MER) on the training sample:
+pred <- predict(fit, newdata = maipo)$class
+mean(pred != maipo$croptype)
## [1] 0.0437
+But remember that this result is over-optimistic because we are re-using the training sample for model evaluation. We will soon show you how to do better with cross-validation.
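To make the error measure concrete, here is a toy calculation of the MER on five invented observations (not taken from maipo). The MER is simply the share of predictions that differ from the observed class.

```r
# Invented observed and predicted classes (illustration only)
obs  <- factor(c("crop1", "crop2", "crop2", "crop3", "crop1"))
pred <- factor(c("crop1", "crop2", "crop3", "crop3", "crop2"),
               levels = levels(obs))

# Misclassification error rate: 2 of 5 predictions are wrong
mer <- mean(pred != obs)
mer

# Cross-tabulation of predicted vs. observed classes
cm <- table(pred = pred, obs = obs)
cm
```

The off-diagonal cells of the table hold the misclassified observations, so their sum divided by the total equals the MER.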
+We can also take a look at the confusion matrix but again, this result is overly optimistic:
+table(pred = pred, obs = maipo$croptype)
## obs
+## pred crop1 crop2 crop3 crop4
+## crop1 1294 8 4 37
+## crop2 50 1054 4 44
+## crop3 0 0 1935 6
+## crop4 45 110 29 3093
+Classification and regression trees (CART) take a completely different approach—they are based on yes/no questions in the predictor variables and can be referred to as a binary partitioning technique. Fit a model with all predictors and default settings:
+p_load(rpart)
fit <- rpart(fo, data = maipo)
+
+## optional: view the classification tree
+# par(xpd = TRUE)
+# plot(fit)
+# text(fit, use.n = TRUE)
Again, predict the croptype with the fitted model and calculate the average MER:
+pred <- predict(fit, newdata = maipo, type = "class")
+mean(pred != maipo$croptype)
## [1] 0.113
+Here the predict
call is slightly different. Again, we could calculate a confusion matrix.
table(pred = pred, obs = maipo$croptype)
## obs
+## pred crop1 crop2 crop3 crop4
+## crop1 1204 66 0 54
+## crop2 47 871 38 123
+## crop3 38 8 1818 53
+## crop4 100 227 116 2950
Bagging, bundling and random forests build upon the CART technique by fitting many trees on bootstrap resamples of the original data set (Breiman 1996, 2001; Hothorn and Lausen 2005). They differ in that random forest also samples from the predictors, and bundling adds an ancillary classifier for improved classification. We will use the widely used randomForest()
here.
p_load(randomForest)
fit <- randomForest(fo, data = maipo, coob = TRUE)
+fit
##
+## Call:
+## randomForest(formula = fo, data = maipo, coob = TRUE)
+## Type of random forest: classification
+## Number of trees: 500
+## No. of variables tried at each split: 8
+##
+## OOB estimate of error rate: 0.57%
+## Confusion matrix:
+## crop1 crop2 crop3 crop4 class.error
+## crop1 1382 2 0 5 0.00504
+## crop2 1 1163 0 8 0.00768
+## crop3 0 0 1959 13 0.00659
+## crop4 7 5 3 3165 0.00472
+Let’s take a look at the MER achieved on the training sample:
+pred <- predict(fit, newdata = maipo, type = "class")
+mean(pred != maipo$croptype)
## [1] 0
+table(pred = pred, obs = maipo$croptype)
## obs
+## pred crop1 crop2 crop3 crop4
+## crop1 1389 0 0 0
+## crop2 0 1172 0 0
+## crop3 0 0 1972 0
+## crop4 0 0 0 3180
+Isn’t this amazing? Not a single grid cell of the training sample is misclassified by the random forest classifier! Even the OOB (out-of-bag) estimate of the error rate is < 1%.
+Too good to be true? We’ll see…
Of course we can’t take the MER on the training set too seriously—it is biased. But we’ve heard of cross-validation, in which disjoint subsets are used for model training and testing. Let’s use sperrorest
for cross-validation.
Also, at this point we should highlight that the observations in this data set are pixels, and multiple grid cells belong to the same field. In a predictive situation, and when field boundaries are known (as is the case here), we would want to predict the same class for all grid cells that belong to the same field. Here we will use a majority filter. This filter ensures that the final predicted class type of every field is the most often predicted croptype within one field.
+First, we need to create a wrapper predict method for LDA for sperrorest()
. This is necessary in order to accommodate the majority filter, and also because class predictions from lda
’s predict method are hidden in the $class
component of the returned object.
lda_predfun <- function(object, newdata, fac = NULL) {
+
+ p_load(nnet)
+ majority <- function(x) {
+ levels(x)[which.is.max(table(x))]
+ }
+
+ majority_filter <- function(x, fac) {
+ for (lev in levels(fac)) {
+ x[fac == lev] <- majority(x[fac == lev])
+ }
+ x
+ }
+
+ pred <- predict(object, newdata = newdata)$class
+ if (!is.null(fac)) pred <- majority_filter(pred, newdata[, fac])
+ return(pred)
+}
To ensure that custom predict-functions will work with sperrorest()
, we need to wrap all custom functions in one single function. Otherwise, sperrorest()
might fail during execution.
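The effect of the majority filter can be seen on a toy example. The sketch below is a simplified base-R variant: it uses which.max() instead of nnet's which.is.max(), so ties go to the first class rather than being broken at random. The predictions and field labels are invented.

```r
# Most frequent class in x (ties: first class wins, unlike which.is.max())
majority <- function(x) {
  names(which.max(table(x)))
}

# Replace every prediction by the majority class of its field
majority_filter <- function(x, fac) {
  for (lev in unique(fac)) {
    x[fac == lev] <- majority(x[fac == lev])
  }
  x
}

# Invented pixel-level predictions for two fields "a" and "b"
pred  <- factor(c("crop1", "crop1", "crop2", "crop3", "crop3", "crop1"),
                levels = c("crop1", "crop2", "crop3"))
field <- c("a", "a", "a", "b", "b", "b")

filtered <- majority_filter(pred, field)
filtered
```

After filtering, all pixels of field "a" carry crop1 and all pixels of field "b" carry crop3.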
Finally, we can run sperrorest()
with a non-spatial sampling setting (partition_cv()
). In this example we use a ‘100 repetitions - 5 folds’ setup to reduce the influence of random partitioning.
res_lda_nsp <- sperrorest(fo, data = maipo, coords = c("utmx","utmy"),
+ model_fun = lda,
+ pred_fun = lda_predfun,
+ pred_args = list(fac = "field"),
+ smp_fun = partition_cv,
+ smp_args = list(repetition = 1:100, nfold = 5),
+ error_rep = TRUE, error_fold = TRUE,
+ progress = FALSE)
summary(res_lda_nsp$error_rep)
## mean sd median IQR
+## train_error 3.40e-02 0.001 3.40e-02 0.001
+## train_accuracy 9.66e-01 0.001 9.66e-01 0.001
+## train_events 4.69e+03 0.000 4.69e+03 0.000
+## train_count 3.09e+04 0.000 3.09e+04 0.000
+## test_error 4.00e-02 0.002 4.00e-02 0.002
+## test_accuracy 9.60e-01 0.002 9.60e-01 0.002
+## test_events 1.17e+03 0.000 1.17e+03 0.000
+## test_count 7.71e+03 0.000 7.71e+03 0.000
+To run a spatial cross-validation at the field level, we can use partition_factor_cv()
as the sampling function. Since we are using 5 folds, we get a rough 80/20 split of our data: 80% will be used for training and 20% for testing our trained model.
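The idea of partitioning at the level of a grouping factor can be sketched in base R. This is not the implementation of partition_factor_cv(); it is a simplified illustration with invented field labels, showing only that whole fields, not single pixels, are assigned to folds.

```r
set.seed(1)

# Invented data: 10 fields with 3 pixels each
field <- rep(paste0("f", 1:10), each = 3)
nfold <- 5

# Assign each field (not each pixel) to one of the folds
fields <- unique(field)
fold_of_field <- sample(rep(seq_len(nfold), length.out = length(fields)))
names(fold_of_field) <- fields

# Every pixel inherits the fold of its field
fold <- unname(fold_of_field[field])

# Check: no field is split across folds
all(tapply(fold, field, function(f) length(unique(f))) == 1)
table(fold)
```

Because all pixels of a field share one fold, a field never contributes to both the training and the test set of the same fold.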
To see how the training and test sets are partitioned in each fold, we can plot them. The red points represent the test set of each fold, the black points the training set. Note that because we plot over 7000 points, overplotting occurs: since the red crosses are drawn after the black ones, the share of red points looks much larger than the actual ~20%.
+resamp <- partition_factor_cv(maipo, nfold = 5, repetition = 1:1, fac = "field")
+plot(resamp, maipo, coords = c("utmx","utmy"))
Subsequently, we have to specify the location of the fields (fac = "field"
) in the prediction arguments (pred_args
) and sampling arguments (smp_args
) in sperrorest()
.
res_lda_sp <- sperrorest(fo, data = maipo, coords = c("utmx","utmy"),
+ model_fun = lda,
+ pred_fun = lda_predfun,
+ pred_args = list(fac = "field"),
+ smp_fun = partition_factor_cv,
+ smp_args = list(fac = "field", repetition = 1:50, nfold = 5),
+ error_rep = TRUE, error_fold = TRUE,
+ benchmark = TRUE, progress = FALSE)
+res_lda_sp$benchmark$runtime_performance
summary(res_lda_sp$error_rep)
## mean sd median IQR
+## train_error 2.95e-02 0.00177 2.97e-02 0.00261
+## train_accuracy 9.70e-01 0.00177 9.70e-01 0.00261
+## train_events 4.69e+03 0.00000 4.69e+03 0.00000
+## train_count 3.09e+04 0.00000 3.09e+04 0.00000
+## test_error 6.65e-02 0.00807 6.59e-02 0.01083
+## test_accuracy 9.33e-01 0.00807 9.34e-01 0.01083
+## test_events 1.17e+03 0.00000 1.17e+03 0.00000
+## test_count 7.71e+03 0.00000 7.71e+03 0.00000
+In the case of Random Forest, the customized pred_fun
looks as follows. It is only required because of the majority filter; without it, we could just omit the pred_fun
and pred_args
arguments below.
rf_predfun <- function(object, newdata, fac = NULL) {
+
+ p_load(nnet)
+ majority <- function(x) {
+ levels(x)[which.is.max(table(x))]
+ }
+
+ majority_filter <- function(x, fac) {
+ for (lev in levels(fac)) {
+ x[fac == lev] <- majority(x[fac == lev])
+ }
+ x
+ }
+
+ pred <- predict(object, newdata = newdata)
+ if (!is.null(fac)) pred <- majority_filter(pred, newdata[,fac])
+ return(pred)
+}
Running sperrorest()
takes some time here (test machine: 3.2 GHz Intel i5 with 4 physical cores).
res_rf_sp <- sperrorest(fo, data = maipo, coords = c("utmx","utmy"),
+ model_fun = randomForest,
+ pred_fun = rf_predfun,
+ pred_args = list(fac = "field"),
+ smp_fun = partition_factor_cv,
+ smp_args = list(fac = "field",
+ repetition = 1:50, nfold = 5),
+ error_rep = TRUE, error_fold = TRUE,
+ benchmark = TRUE, progress = 2)
## Mon Feb 27 20:56:01 2017 Repetition 1
+## Mon Feb 27 20:57:12 2017 Repetition 2
+## Mon Feb 27 20:58:20 2017 Repetition 3
+## Mon Feb 27 20:59:29 2017 Repetition 4
+## Mon Feb 27 21:00:36 2017 Repetition 5
+## Mon Feb 27 21:01:46 2017 Repetition 6
+## Mon Feb 27 21:02:55 2017 Repetition 7
+## Mon Feb 27 21:04:01 2017 Repetition 8
+## Mon Feb 27 21:05:07 2017 Repetition 9
+## Mon Feb 27 21:06:16 2017 Repetition 10
+## Mon Feb 27 21:07:23 2017 Repetition 11
+## Mon Feb 27 21:08:30 2017 Repetition 12
+## Mon Feb 27 21:09:38 2017 Repetition 13
+## Mon Feb 27 21:10:45 2017 Repetition 14
+## Mon Feb 27 21:11:53 2017 Repetition 15
+## Mon Feb 27 21:13:01 2017 Repetition 16
+## Mon Feb 27 21:14:09 2017 Repetition 17
+## Mon Feb 27 21:15:16 2017 Repetition 18
+## Mon Feb 27 21:16:23 2017 Repetition 19
+## Mon Feb 27 21:17:31 2017 Repetition 20
+## Mon Feb 27 21:18:39 2017 Repetition 21
+## Mon Feb 27 21:19:46 2017 Repetition 22
+## Mon Feb 27 21:20:53 2017 Repetition 23
+## Mon Feb 27 21:22:03 2017 Repetition 24
+## Mon Feb 27 21:23:13 2017 Repetition 25
+## Mon Feb 27 21:24:23 2017 Repetition 26
+## Mon Feb 27 21:25:32 2017 Repetition 27
+## Mon Feb 27 21:26:39 2017 Repetition 28
+## Mon Feb 27 21:27:47 2017 Repetition 29
+## Mon Feb 27 21:28:55 2017 Repetition 30
+## Mon Feb 27 21:30:03 2017 Repetition 31
+## Mon Feb 27 21:31:11 2017 Repetition 32
+## Mon Feb 27 21:32:18 2017 Repetition 33
+## Mon Feb 27 21:33:25 2017 Repetition 34
+## Mon Feb 27 21:34:33 2017 Repetition 35
+## Mon Feb 27 21:35:40 2017 Repetition 36
+## Mon Feb 27 21:36:47 2017 Repetition 37
+## Mon Feb 27 21:37:54 2017 Repetition 38
+## Mon Feb 27 21:39:02 2017 Repetition 39
+## Mon Feb 27 21:40:09 2017 Repetition 40
+## Mon Feb 27 21:41:17 2017 Repetition 41
+## Mon Feb 27 21:42:24 2017 Repetition 42
+## Mon Feb 27 21:43:31 2017 Repetition 43
+## Mon Feb 27 21:44:38 2017 Repetition 44
+## Mon Feb 27 21:45:46 2017 Repetition 45
+## Mon Feb 27 21:46:54 2017 Repetition 46
+## Mon Feb 27 21:48:01 2017 Repetition 47
+## Mon Feb 27 21:49:07 2017 Repetition 48
+## Mon Feb 27 21:50:15 2017 Repetition 49
+## Mon Feb 27 21:51:21 2017 Repetition 50
+## Mon Feb 27 21:52:27 2017 Done.
+
+res_rf_sp$benchmark$runtime_performance
+## Time difference of 56.4 mins
+summary(res_rf_sp$error_rep$test_error)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
+## 0.0630 0.0827 0.0871 0.0868 0.0928 0.1100
+summary(res_rf_sp$error_rep$test_accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
+## 0.890 0.907 0.913 0.913 0.917 0.937
+What a surprise! RandomForest classification isn’t that good after all, if we acknowledge that in ‘real life’ we wouldn’t be making predictions in situations where the class membership of other grid cells in the same field is known in the training stage. So spatial dependence does matter.
+Given all the different sampling functions and the required custom predict functions (e.g. rf_predfun()
) in this example, you might be a little confused about which function to use for your use case.
+If you want to do a “normal”, i.e. non-spatial, cross-validation, we recommend using partition_cv()
as smp_fun
in sperrorest()
. If you want to perform a spatial cross-validation (and you do not have a grouping structure like fields in this example), partition_kmeans()
takes care of spatial partitioning. In most cases you can simply use the generic predict()
method for your model (= skip this argument in sperrorest()
). Check our “custom model and predict functions” vignette for more information on cases where adjustments are needed.
For further questions/issues, please open an issue at our GitHub repo.
+Breiman, Leo. 1996. “Bagging Predictors.” Machine Learning 24 (2). Springer Nature: 123–40. doi:10.1007/bf00058655.
+———. 2001. “Random Forests.” Machine Learning 45 (1). Springer Nature: 5–32. doi:10.1023/a:1010933404324.
+Brenning, A. 2005. “Spatial Prediction Models for Landslide Hazards: Review, Comparison and Evaluation.” Natural Hazards and Earth System Science 5 (6). Copernicus GmbH: 853–62. doi:10.5194/nhess-5-853-2005.
+Goetz, J.N., A. Brenning, H. Petschko, and P. Leopold. 2015. “Evaluating Machine Learning and Statistical Prediction Techniques for Landslide Susceptibility Modeling.” Computers & Geosciences 81 (August). Elsevier BV: 1–11. doi:10.1016/j.cageo.2015.04.007.
+Hothorn, Torsten, and Berthold Lausen. 2005. “Bundling Classifiers by Bagging Trees.” Computational Statistics & Data Analysis 49 (4). Elsevier BV: 1068–78. doi:10.1016/j.csda.2004.06.019.
+James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Springer New York. doi:10.1007/978-1-4614-7138-7.
+Knudby, Anders, Alexander Brenning, and Ellsworth LeDrew. 2010. “New Approaches to Modelling Fish-Habitat Relationships.” Ecological Modelling 221 (3). Elsevier BV: 503–11. doi:10.1016/j.ecolmodel.2009.11.008.
+Peña, M.A., and A. Brenning. 2015. “Assessing Fruit-Tree Crop Classification from Landsat-8 Time Series for the Maipo Valley, Chile.” Remote Sensing of Environment 171 (December). Elsevier BV: 234–44. doi:10.1016/j.rse.2015.10.029.
+partition_kmeans()
has been integrated into mlr (see e865e4). sperrorest
is currently not actively developed. We recommend using mlr
for all future (spatial) cross-validation work. We will provide a tutorial for spatial data in the mlr-tutorial soon.
Spatial Error Estimation and Variable Importance
+This package implements spatial error estimation and permutation-based spatial variable importance using different spatial cross-validation and spatial block bootstrap methods. To cite sperrorest
in publications, reference the paper by (???). To see the package in action, please check the vignette.
NA
and a message is printed to the console. sperrorest()
will continue normally and uses the successful folds to calculate the repetition error. This helps to run CV with many repetitions using models which do not always converge like maxnet()
, gamm()
or svm()
.ecuador
has been adjusted to avoid exact duplicates of partitions when using partition_kmeans()
.parsperrorest()
into sperrorest()
.sperrorest()
now runs in parallel using all available cores.runfolds()
and runreps()
are now doing the heavy lifting in the background. All modes are now running on the same code base. Before, all parallel modes were running on different code implementations.apply
: calls pbmclapply()
on Unix and pbapply()
on Windows.future
: calls future_lapply()
with various future
options (multiprocess
, multicore
, etc.).foreach
: foreach()
with various future
options (multiprocess
, multicore
, etc.). Default option to cluster
. This is also the overall default mode for sperrorest()
.sequential
: sequential execution using future
backend.repetition
argument of sperrorest()
. Specifying a range like repetition = 1:10
will also stay valid.sperrorest::parallel-modes
comparing the various parallel modes.sperrorest::custom-pred-and-model-functions
explaining why and how custom defined model and predict functions are needed for some model setups.do_try
argument has been removed.error.fold
, error.rep
and err.train
arguments have been removed because they are all calculated by default now.add parsperrorest()
: This function lets you execute sperrorest()
in parallel. It includes two modes (par.mode = 1
and par.mode = 2
) which use different parallelization approaches in the background. See ?parsperrorest()
for more details.
add partition.factor.cv(): This resampling method enables partitioning based on a given factor variable. This can be used, for example, to resample agricultural data, that is grouped by fields, at the agricultural field level in order to preserve spatial autocorrelation within fields.
sperrorest() and parsperrorest(): Add benchmark item to returned object giving information about execution time, used cores and other system details.
Changes to functions:
+sperrorest(): Change argument naming. err.unpooled is now error.fold and err.pooled is now error.rep
sperrorest() and parsperrorest(): Change order and naming of returned object
+sperrorestpoolederror is now sperrorestreperror
+sperrorest list is now ordered as follows:
+Add distance information to resampling objects
+ + +add.distance(object, ...) + +# S3 method for resampling +add.distance(object, data, coords = c("x", "y"), ...) + +# S3 method for represampling +add.distance(object, ...)+ +
object | +resampling or represampling object. |
+
---|---|
... | +Additional arguments to dataset_distance and +add.distance.resampling, respectively. |
+
data | +
|
+
coords | +(ignored by |
+
A resampling or represampling object
+containing an additional $distance component in each resampling object.
+The distance component is a single numeric value indicating, for
+each train / test pair, the (by default, mean)
+nearest-neighbour distance between the two sets.
Nearest-neighbour distances are calculated for each sample in the
+test set. These nrow(???$test)
nearest-neighbour distances are then
+averaged. Aggregation methods other than mean
can be chosen using
+the fun
argument, which will be passed on to
+dataset_distance.
dataset_distance represampling +resampling
+ + ++data(ecuador) # Muenchow et al. (2012), see ?ecuador +nsp.parti <- partition_cv(ecuador) +sp.parti <- partition_kmeans(ecuador) +nsp.parti <- add.distance(nsp.parti, ecuador) +sp.parti <- add.distance(sp.parti, ecuador) +# non-spatial partitioning: very small test-training distance: +nsp.parti[[1]][[1]]$distance#> [1] 53.79223# spatial partitioning: more substantial distance, depending on number of +# folds etc. +sp.parti[[1]][[1]]$distance#> [1] 390.1742+
Functions for handling represampling
objects, i.e. list
s of
+resampling objects.
as.represampling(object, ...) + +# S3 method for list +as.represampling(object, ...) + +# S3 method for represampling +print(x, ...) + +is_represampling(object)+ +
object | +object of class |
+
---|---|
... | +currently not used. |
+
x | +object of class |
+
as.represampling
methods return an object of class
+represampling
with the contents of object
.
represampling
objects are (named) lists of
+resampling objects. Such objects are typically created by
+partition_cv, partition_kmeans,
+represampling_disc_bootstrap and related functions.
In r
-repeated k
-fold cross-validation, for example, the
+corresponding represampling
object has length r
, and each of
+its r
resampling objects has length k
.
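The nesting described above can be sketched in base R. This is only an illustration of the list structure, not the package's own code: the helper `one_resampling()` and the values of `n`, `r` and `k` are hypothetical, and the object is left as a plain list without the `represampling` class attribute.

```r
set.seed(42)
n <- 100  # overall sample size (assumed for illustration)
r <- 3    # number of repetitions
k <- 5    # number of folds per repetition

# Hypothetical helper: one k-fold partitioning as a list of train/test index sets
one_resampling <- function(n, k) {
  fold <- sample(rep(seq_len(k), length.out = n))  # random fold label per sample
  lapply(seq_len(k), function(j) {
    list(train = which(fold != j), test = which(fold == j))
  })
}

# r repetitions, named by repetition index
reps <- replicate(r, one_resampling(n, k), simplify = FALSE)
names(reps) <- as.character(seq_len(r))

length(reps)                   # r repetitions
lengths(reps)                  # k folds in each repetition
length(reps[["1"]][[2]]$test)  # n / k test indices in repetition 1, fold 2
```

Indexing follows the same pattern as in the package examples: first the repetition, then the fold, then `$train` or `$test`.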
+ as.represampling.list
coerces object
to class represampling
+while coercing its elements to resampling objects.
+Some validity checks are performed.
resampling, partition_cv, +partition_kmeans, +represampling_disc_bootstrap, etc.
+ + ++data(ecuador) # Muenchow et al. (2012), see ?ecuador +# Partitioning by elevation classes in 300 m steps: +fac <- factor( as.character( floor( ecuador$dem / 300 ) ) ) +summary(fac)#> 10 5 6 7 8 9 +#> 4 21 246 255 147 78parti <- as.resampling(fac) +# a list of lists specifying sets of training and test sets, +# using each factor level in turn as the test set: +str(parti)#> List of 6 +#> $ 10:List of 2 +#> ..$ train: int [1:747] 1 2 3 4 5 6 7 8 9 10 ... +#> ..$ test : int [1:4] 535 566 684 734 +#> $ 5 :List of 2 +#> ..$ train: int [1:730] 1 2 3 4 5 6 7 8 9 10 ... +#> ..$ test : int [1:21] 42 77 93 106 115 139 250 332 385 405 ... +#> $ 6 :List of 2 +#> ..$ train: int [1:505] 2 4 7 8 9 12 13 14 15 17 ... +#> ..$ test : int [1:246] 1 3 5 6 10 11 16 19 23 29 ... +#> $ 7 :List of 2 +#> ..$ train: int [1:496] 1 3 5 6 7 8 10 11 12 13 ... +#> ..$ test : int [1:255] 2 4 9 18 20 22 24 26 28 30 ... +#> $ 8 :List of 2 +#> ..$ train: int [1:604] 1 2 3 4 5 6 7 9 10 11 ... +#> ..$ test : int [1:147] 8 12 14 15 21 25 27 32 46 54 ... +#> $ 9 :List of 2 +#> ..$ train: int [1:673] 1 2 3 4 5 6 8 9 10 11 ... +#> ..$ test : int [1:78] 7 13 17 35 44 75 78 79 88 97 ... +#> - attr(*, "class")= chr "resampling"summary(parti)#> n.train n.test +#> 10 747 4 +#> 5 730 21 +#> 6 505 246 +#> 7 496 255 +#> 8 604 147 +#> 9 673 78
Create/coerce and print resampling objects, e.g., partitionings or bootstrap +samples derived from a data set.
+ + +as.resampling(object, ...) + +# S3 method for default +as.resampling(object, ...) + +# S3 method for factor +as.resampling(object, ...) + +# S3 method for list +as.resampling(object, ...) + +validate.resampling(object) + +is.resampling(x, ...) + +# S3 method for resampling +print(x, ...)+ +
object | +depending on the function/method, a list or a vector of type +factor defining a partitioning of the dataset. |
+
---|---|
... | +currently not used. |
+
x | +object of class |
+
as.resampling
methods: An object of class resampling
.
A resampling
object is a list of lists defining a set of
+training and test samples.
In the case of k
-fold cross-validation partitioning, for example,
+the corresponding resampling
object would be of length k
,
+i.e. contain k
lists. Each of these k
lists defines a training
+set of size n(k-1)/k
(where n
is the overall sample size), and
+a test set of size n/k
.
+The resampling
object does, however, not contain the data itself, but
+only indices between 1
and n
identifying the selection
+(see Examples).
Another example is bootstrap resampling. represampling_bootstrap
+with argument oob = TRUE
generates rep
resampling
objects
+with indices of a bootstrap sample in the train
component and indices
+of the out-of-bag sample in the test component (see Examples below).
+ as.resampling.factor
: For each factor level of the input variable,
+as.resampling.factor
determines the indices of samples in this level
+(= test samples) and outside this level (= training samples). Empty levels of
+object
are dropped without warning.
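The factor-based logic described above can be approximated in a few lines of base R. This is a minimal sketch under stated assumptions, not the package's implementation: `factor_to_resampling_sketch()` is a hypothetical name, the input is assumed to contain no `NA` values, and the result is a plain named list without the `resampling` class attribute.

```r
# Hypothetical helper mirroring the description above
factor_to_resampling_sketch <- function(f) {
  f <- droplevels(f)   # empty levels are dropped silently
  idx <- seq_along(f)
  out <- lapply(levels(f), function(lev) {
    list(train = idx[f != lev],  # samples outside this level
         test  = idx[f == lev])  # samples in this level
  })
  names(out) <- levels(f)
  out
}

# Level "d" is declared but never observed, so it produces no component
f <- factor(c("a", "b", "a", "c", "b"), levels = c("a", "b", "c", "d"))
smp <- factor_to_resampling_sketch(f)
names(smp)
smp$b
```

Each component uses one factor level as the test set and all remaining samples as the training set, which is why the number of components equals the number of non-empty levels.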
+ as.resampling.list
checks if the list in object
has a valid
+resampling
object structure (with components train
and
+test
etc.) and assigns the class attribute 'resampling'
if
+successful.
represampling, partition_cv, +partition_kmeans, represampling_bootstrap, etc.
+ + ++data(ecuador) # Muenchow et al. (2012), see ?ecuador + +# Partitioning by elevation classes in 200 m steps: +parti <- factor( as.character( floor( ecuador$dem / 200 ) ) ) +smp <- as.resampling(parti) +summary(smp)#> n.train n.test +#> 10 600 151 +#> 11 585 166 +#> 12 660 91 +#> 13 641 110 +#> 14 727 24 +#> 15 747 4 +#> 8 730 21 +#> 9 567 184# Compare: +summary(parti)#> 10 11 12 13 14 15 8 9 +#> 151 166 91 110 24 4 21 184+# k-fold (non-spatial) cross-validation partitioning: +parti <- partition_cv(ecuador) +parti <- parti[[1]] # the first (and only) resampling object in parti +# data corresponding to the test sample of the first fold: +str( ecuador[ parti[[1]]$test , ])#> 'data.frame': 76 obs. of 13 variables: +#> $ x : num 714042 715282 713962 713412 714902 ... +#> $ y : num 9558482 9557602 9561082 9560472 9559262 ... +#> $ dem : num 2408 2837 1839 1869 2363 ... +#> $ slope : num 24.1 34.4 63.4 16.9 50.7 ... +#> $ hcurv : num 0.00659 -0.02191 -0.04951 -0.00156 -0.01407 ... +#> $ vcurv : num 0.01041 -0.00579 -0.00529 0.00406 0.00547 ... +#> $ carea : num 773 2247 3282 2421 519 ... +#> $ cslope : num 27.5 37.7 29.1 31 35.3 ... +#> $ distroad : num 300 300 173 300 300 ... +#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ... +#> $ distdeforest : num 300 300 33.8 52.5 300 ... +#> $ distslidespast: num 6 100 100 39 59 100 37 100 25 0 ... +#> $ log.carea : num 2.89 3.35 3.52 3.38 2.72 ...# the corresponding training sample - larger: +str( ecuador[ parti[[1]]$train , ])#> 'data.frame': 675 obs. of 13 variables: +#> $ x : num 712882 715232 715392 715042 715382 ... +#> $ y : num 9560002 9559582 9560172 9559312 9560142 ... +#> $ dem : num 1912 2199 1989 2320 2021 ... +#> $ slope : num 25.6 23.2 40.5 42.9 42 ... +#> $ hcurv : num -0.00681 -0.00501 -0.01919 -0.01106 0.00958 ... +#> $ vcurv : num -0.00029 -0.00649 -0.04051 -0.04634 0.02642 ... +#> $ carea : num 5577 1399 351155 501 671 ... +#> $ cslope : num 34.4 30.7 32.8 33.9 41.6 ... 
+#> $ distroad : num 300 300 300 300 300 ... +#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ... +#> $ distdeforest : num 15 300 300 300 300 9.15 300 300 300 0 ... +#> $ distslidespast: num 9 21 40 100 21 2 100 100 41 5 ... +#> $ log.carea : num 3.75 3.15 5.55 2.7 2.83 ...+# Bootstrap training sets, out-of-bag test sets: +parti <- represampling_bootstrap(ecuador, oob = TRUE) +parti <- parti[[1]] # the first (and only) resampling object in parti +# out-of-bag test sample: approx. one-third of nrow(ecuador): +str( ecuador[ parti[[1]]$test , ])#> 'data.frame': 290 obs. of 13 variables: +#> $ x : num 715232 715042 715382 712802 714842 ... +#> $ y : num 9559582 9559312 9560142 9559952 9558892 ... +#> $ dem : num 2199 2320 2021 1838 2483 ... +#> $ slope : num 23.2 42.9 42 52.1 68.8 ... +#> $ hcurv : num -0.00501 -0.01106 0.00958 0.00183 -0.04921 ... +#> $ vcurv : num -0.00649 -0.04634 0.02642 -0.09203 -0.12438 ... +#> $ carea : num 1399 501 671 634 754 ... +#> $ cslope : num 30.7 33.9 41.6 30.3 53.7 ... +#> $ distroad : num 300 300 300 300 300 ... +#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ... +#> $ distdeforest : num 300 300 300 9.15 300 ... +#> $ distslidespast: num 21 100 21 2 100 5 20 100 100 100 ... +#> $ log.carea : num 3.15 2.7 2.83 2.8 2.88 ...# bootstrap training sample: same size as nrow(ecuador): +str( ecuador[ parti[[1]]$train , ])#> 'data.frame': 751 obs. of 13 variables: +#> $ x : num 715382 713132 715432 714892 715722 ... +#> $ y : num 9558062 9560672 9558592 9559282 9557532 ... +#> $ dem : num 2799 1897 2622 2374 3097 ... +#> $ slope : num 50.6 24.7 19 42.8 28.8 ... +#> $ hcurv : num -0.00812 0.00306 -0.00301 -0.01057 0.02327 ... +#> $ vcurv : num -0.02208 -0.00436 -0.01099 0.02427 0.01833 ... +#> $ carea : num 951 2603 1146 381 300 ... +#> $ cslope : num 50.9 29 20.1 27.2 25.9 ... +#> $ distroad : num 300 10 300 300 300 300 300 300 300 300 ... 
+#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 2 1 1 2 2 2 1 ... +#> $ distdeforest : num 300 166 300 300 300 ... +#> $ distslidespast: num 100 2 26 46 100 55 100 100 11 45 ... +#> $ log.carea : num 2.98 3.42 3.06 2.58 2.48 ...+
Functions for generating and handling alphanumeric tile names of the
+form 'X2:Y7'
as used by partition_tiles and
+represampling_tile_bootstrap.
as.tilename(x, ...) + +# S3 method for numeric +as.tilename(x, ...) + +# S3 method for tilename +as.character(x, ...) + +# S3 method for tilename +as.numeric(x, ...) + +# S3 method for character +as.tilename(x, ...) + +# S3 method for tilename +print(x, ...)+ +
x | +object of class |
+
---|---|
... | +additional arguments (currently ignored). |
+
object of class tilename
, character
, or numeric
+vector of length 2
partition_tiles, represampling, +represampling_tile_bootstrap
+ + ++tnm <- as.tilename(c(2,3)) +tnm # 'X2:Y3'#> [1] "X2:Y3"as.numeric(tnm) # c(2,3)#> Warning: NAs introduced by coercion#> [1] NA
dataset_distance
calculates Euclidean nearest-neighbour distances
+between two point datasets and summarizes these distances using some
+function, by default the mean.
dataset_distance(d1, d2, x_name = "x", y_name = "y", fun = mean, + method = "euclidean", ...)+ +
d1 | +a |
+
---|---|
d2 | +see |
+
x_name | +name of column in |
+
y_name | +same for y coordinates |
+
fun | +function to be applied to the vector of nearest-neighbor
+distances of |
+
method | +type of distance metric to be used; only |
+
... | +additional arguments to |
+
depends on fun
; typically (e.g., mean
) a numeric vector
+of length 1
Nearest-neighbour distances are calculated for each point in
+d1
, resulting in a vector of length nrow(d1)
, and fun
+is applied to this vector.
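The calculation described above can be sketched in base R as follows. This is an illustrative approximation, not the package's code: `nn_distance_sketch()` is a hypothetical name, only the Euclidean metric is covered, and the brute-force loop is chosen for readability rather than speed.

```r
# Hypothetical helper: for each point in d1, find the Euclidean distance to
# its nearest neighbour in d2, then summarise the distances with `fun`
nn_distance_sketch <- function(d1, d2, x_name = "x", y_name = "y", fun = mean) {
  nn <- vapply(seq_len(nrow(d1)), function(i) {
    dx <- d2[[x_name]] - d1[[x_name]][i]
    dy <- d2[[y_name]] - d1[[y_name]][i]
    min(sqrt(dx^2 + dy^2))
  }, numeric(1))
  fun(nn)  # nn has length nrow(d1)
}

d1 <- data.frame(x = c(0, 1), y = c(0, 0))
d2 <- data.frame(x = c(0, 4), y = c(3, 0))
nn_distance_sketch(d1, d2)             # default aggregation: mean
nn_distance_sketch(d1, d2, fun = max)  # alternative aggregation
nn_distance_sketch(d1, d1)             # a data set compared with itself
```

Passing a different `fun` changes only the final summary step; the vector of nearest-neighbour distances is the same in every case.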
+df <- data.frame(x = rnorm(100), y = rnorm(100)) +dataset_distance(df, df) # == 0#> [1] 0+
Data set created by Jannes Muenchow, University of Erlangen-Nuremberg, +Germany. +These data should be cited as Muenchow et al. (2012) (see reference below). +This publication also contains additional information on data collection and +the geomorphology of the area. The data set provided here is (a subset of) the +one from the 'natural' part of the RBSF area and corresponds to landslide +distribution in the year 2000.
+ + + +a data.frame
with point samples of landslide and
+non-landslide locations in a study area in the Andes of southern Ecuador.
Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process +rates of landslides along a humidity gradient in the tropical Andes. +Geomorphology, 139-140: 271-284.
+Brenning, A., 2005. Spatial prediction models for landslide hazards: +review, comparison and evaluation. +Natural Hazards and Earth System Sciences, 5(6): 853-862.
+ + ++data(ecuador) +str(ecuador)#> 'data.frame': 751 obs. of 13 variables: +#> $ x : num 712882 715232 715392 715042 715382 ... +#> $ y : num 9560002 9559582 9560172 9559312 9560142 ... +#> $ dem : num 1912 2199 1989 2320 2021 ... +#> $ slope : num 25.6 23.2 40.5 42.9 42 ... +#> $ hcurv : num -0.00681 -0.00501 -0.01919 -0.01106 0.00958 ... +#> $ vcurv : num -0.00029 -0.00649 -0.04051 -0.04634 0.02642 ... +#> $ carea : num 5577 1399 351155 501 671 ... +#> $ cslope : num 34.4 30.7 32.8 33.9 41.6 ... +#> $ distroad : num 300 300 300 300 300 ... +#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ... +#> $ distdeforest : num 15 300 300 300 300 9.15 300 300 300 0 ... +#> $ distslidespast: num 9 21 40 100 21 2 100 100 41 5 ... +#> $ log.carea : num 3.75 3.15 5.55 2.7 2.83 ...library(rpart) +ctrl <- rpart.control(cp = 0.02) +fit <- rpart(slides ~ dem + slope + hcurv + vcurv + + log.carea + cslope, data = ecuador, control = ctrl) +par(xpd = TRUE) +plot(fit, compress = TRUE, main = 'Muenchows landslide data set')text(fit, use.n = TRUE)
Calculate a variety of accuracy measures from observations +and predictions of numerical and categorical response variables.
+ + +err_default(obs, pred)+ +
obs | +factor, logical, or numeric vector with observations |
+
---|---|
pred | +factor, logical, or numeric vector with predictions. Must be of
+same type as |
+
A list with (currently) the following components, depending on the +type of prediction problem:
+misclassification error, overall accuracy; +if two classes, sensitivity, specificity, positive predictive value (PPV), +negative predictive value (NPV), kappa
area under the ROC curve, error and accuracy +at a obs>0.5 dichotomization, false-positive rate (FPR; 1-specificity) +at 70, 80 and 90 percent sensitivity, true-positive rate (sensitivity) +at 80, 90 and 95 percent specificity
bias, standard deviation, mean squared error, +MAD (mad), median, interquartile range (IQR) +of residuals
NA
values are currently not handled by this function,
+i.e. they will result in an error.
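For the regression case, the listed residual summaries can be reproduced approximately in base R. This is a sketch under assumptions, not the function's source: `regression_error_sketch()` is a hypothetical name, residuals are taken as `obs - pred` (inferred from the example in which `pred = obs + 1` yields a bias of -1), and the squared-error measure is reported as its square root, matching the `$rmse` component in the example output.

```r
# Hypothetical helper: residual-based accuracy measures for numeric responses
regression_error_sketch <- function(obs, pred) {
  res <- obs - pred  # assumed sign convention, see lead-in
  list(
    bias   = mean(res),
    stddev = sd(res),
    rmse   = sqrt(mean(res^2)),
    mad    = mad(res),
    median = median(res),
    iqr    = IQR(res),
    count  = length(obs)
  )
}

obs <- c(1, 2, 3)
e <- regression_error_sketch(obs, obs + 1)  # perfect correlation, constant bias
e$bias
e$rmse
```

Because none of these summaries remove missing values, an `NA` in `obs` or `pred` would need to be filtered out beforehand, in line with the note above.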
ROCR
+ + ++obs <- rnorm(1000) +# Two mock (soft) classification examples: +err_default( obs > 0, rnorm(1000) ) # just noise#> $auroc +#> [1] 0.4621542 +#> +#> $error +#> [1] 0.513 +#> +#> $accuracy +#> [1] 0.487 +#> +#> $sensitivity +#> [1] 0.2727273 +#> +#> $specificity +#> [1] 0.6970297 +#> +#> $fpr70 +#> [1] 0.7366337 +#> +#> $fpr80 +#> [1] 0.829703 +#> +#> $fpr90 +#> [1] 0.9069307 +#> +#> $tpr80 +#> [1] 0.1515152 +#> +#> $tpr90 +#> [1] 0.06868687 +#> +#> $tpr95 +#> [1] 0.04040404 +#> +#> $events +#> [1] 495 +#> +#> $count +#> [1] 1000 +#>err_default( obs > 0, obs + rnorm(1000) ) # some discrimination#> $auroc +#> [1] 0.8270627 +#> +#> $error +#> [1] 0.259 +#> +#> $accuracy +#> [1] 0.741 +#> +#> $sensitivity +#> [1] 0.6282828 +#> +#> $specificity +#> [1] 0.8514851 +#> +#> $fpr70 +#> [1] 0.219802 +#> +#> $fpr80 +#> [1] 0.3465347 +#> +#> $fpr90 +#> [1] 0.5009901 +#> +#> $tpr80 +#> [1] 0.6888889 +#> +#> $tpr90 +#> [1] 0.5313131 +#> +#> $tpr95 +#> [1] 0.4121212 +#> +#> $events +#> [1] 495 +#> +#> $count +#> [1] 1000 +#># Three mock regression examples: +err_default( obs, rnorm(1000) ) # just noise, but no bias#> $bias +#> [1] 0.01646476 +#> +#> $stddev +#> [1] 1.437289 +#> +#> $rmse +#> [1] 1.436665 +#> +#> $mad +#> [1] 1.486945 +#> +#> $median +#> [1] 0.01263483 +#> +#> $iqr +#> [1] 2.004161 +#> +#> $count +#> [1] 1000 +#>err_default( obs, obs + rnorm(1000) ) # some association, no bias#> $bias +#> [1] -0.05961818 +#> +#> $stddev +#> [1] 1.000054 +#> +#> $rmse +#> [1] 1.00133 +#> +#> $mad +#> [1] 0.9719318 +#> +#> $median +#> [1] -0.05654433 +#> +#> $iqr +#> [1] 1.302193 +#> +#> $count +#> [1] 1000 +#>err_default( obs, obs + 1 ) # perfect correlation, but with bias#> $bias +#> [1] -1 +#> +#> $stddev +#> [1] 6.429096e-17 +#> +#> $rmse +#> [1] 1 +#> +#> $mad +#> [1] 0 +#> +#> $median +#> [1] -1 +#> +#> $iqr +#> [1] 0 +#> +#> $count +#> [1] 1000 +#>+
get_small_tiles
identifies partitions (tiles) that are too small
+according to some defined criterion / criteria (minimum number of samples in
+tile and/or minimum fraction of entire dataset).
get_small_tiles(tile, min_n = NULL, min_frac = 0, ignore = c())+ +
tile | +factor: tile/partition names for all samples; names must be
+coercible to class tilename, i.e. of the form |
+
---|---|
min_n | +integer (optional): minimum number of samples per partition. |
+
min_frac | +numeric >=0, <1: minimum relative size of partition as a +fraction of the sample. |
+
ignore | +character vector: names of tiles to be ignored, i.e. to be +retained even if the inclusion criteria are not met. |
+
character vector: names of tiles that are considered 'small' +according to these criteria
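The selection rule can be illustrated with a short base R sketch. It is an approximation under assumptions rather than the package's code: `small_tiles_sketch()` is a hypothetical name, a tile is treated as small when its count is strictly below `min_n` or its share of all samples is strictly below `min_frac`, and the result is returned in level order.

```r
# Hypothetical helper: names of tiles that fail the size criteria
small_tiles_sketch <- function(tile, min_n = NULL, min_frac = 0,
                               ignore = character(0)) {
  counts <- table(tile)
  n  <- as.vector(counts)  # samples per tile
  nm <- names(counts)      # tile names
  small <- n / length(tile) < min_frac           # relative-size criterion
  if (!is.null(min_n)) small <- small | n < min_n  # absolute-size criterion
  setdiff(nm[small], ignore)  # tiles listed in `ignore` are never reported
}

tile <- factor(c(rep("X1:Y1", 30), rep("X1:Y2", 5), rep("X2:Y1", 15)))
small_tiles_sketch(tile, min_n = 20)
small_tiles_sketch(tile, min_n = 20, ignore = "X1:Y2")
small_tiles_sketch(tile, min_frac = 0.2)
```

With the default `min_frac = 0` and `min_n = NULL`, no tile can be flagged, so at least one criterion has to be set for the function to return anything.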
+ ++data(ecuador) # Muenchow et al. (2012), see ?ecuador +# Rectangular partitioning without removal of small tiles: +parti <- partition_tiles(ecuador, nsplit = c(10,10), reassign = FALSE) +summary(parti)#> $`1` +#> n.train n.test +#> X1:Y7 688 8 +#> X1:Y8 678 18 +#> X10:Y2 685 11 +#> X10:Y3 688 8 +#> X10:Y4 685 11 +#> X2:Y4 683 13 +#> X2:Y5 690 6 +#> X2:Y6 689 7 +#> X2:Y7 670 26 +#> X2:Y8 674 22 +#> X2:Y9 691 5 +#> X3:Y10 689 7 +#> X3:Y5 675 21 +#> X3:Y6 687 9 +#> X3:Y8 691 5 +#> X3:Y9 676 20 +#> X4:Y10 690 6 +#> X4:Y4 686 10 +#> X4:Y5 685 11 +#> X4:Y6 687 9 +#> X4:Y7 684 12 +#> X4:Y8 690 6 +#> X4:Y9 683 13 +#> X5:Y10 687 9 +#> X5:Y2 689 7 +#> X5:Y3 684 12 +#> X5:Y4 676 20 +#> X5:Y5 691 5 +#> X5:Y6 686 10 +#> X5:Y7 690 6 +#> X5:Y9 689 7 +#> X6:Y1 691 5 +#> X6:Y2 689 7 +#> X6:Y3 685 11 +#> X6:Y4 691 5 +#> X6:Y5 681 15 +#> X6:Y7 689 7 +#> X6:Y8 685 11 +#> X6:Y9 691 5 +#> X7:Y1 687 9 +#> X7:Y10 676 20 +#> X7:Y2 686 10 +#> X7:Y3 688 8 +#> X7:Y4 682 14 +#> X7:Y5 688 8 +#> X7:Y6 687 9 +#> X7:Y7 685 11 +#> X7:Y8 685 11 +#> X7:Y9 687 9 +#> X8:Y1 669 27 +#> X8:Y2 683 13 +#> X8:Y3 684 12 +#> X8:Y4 689 7 +#> X8:Y5 673 23 +#> X8:Y6 685 11 +#> X8:Y7 684 12 +#> X9:Y1 687 9 +#> X9:Y2 690 6 +#> X9:Y3 686 10 +#> X9:Y4 678 18 +#> X9:Y6 685 11 +#> X9:Y7 691 5 +#> X9:Y8 686 10 +#> X9:Y9 689 7 +#>length(parti[[1]])#> [1] 64# Same in factor format for the application of get_small_tiles: +parti_fac <- partition_tiles(ecuador, nsplit = c(10, 10), reassign = FALSE, + return_factor = TRUE) +get_small_tiles(parti_fac[[1]], min_n = 20) # tiles with less than 20 samples#> [1] X2:Y9 X3:Y8 X5:Y5 X6:Y1 X6:Y4 X6:Y9 X9:Y7 X2:Y5 X4:Y10 X4:Y8 +#> [11] X5:Y7 X9:Y2 X2:Y6 X3:Y10 X5:Y2 X5:Y9 X6:Y2 X6:Y7 X8:Y4 X9:Y9 +#> [21] X1:Y7 X10:Y3 X7:Y3 X7:Y5 X3:Y6 X4:Y6 X5:Y10 X7:Y1 X7:Y6 X7:Y9 +#> [31] X9:Y1 X4:Y4 X5:Y6 X7:Y2 X9:Y3 X9:Y8 X10:Y2 X10:Y4 X4:Y5 X6:Y3 +#> [41] X6:Y8 X7:Y7 X7:Y8 X8:Y6 X9:Y6 X4:Y7 X5:Y3 X8:Y3 X8:Y7 X2:Y4 +#> [51] X4:Y9 X8:Y2 X7:Y4 X6:Y5 X1:Y8 X9:Y4 +#> 64 Levels: X1:Y7 X1:Y8 X10:Y2 
X10:Y3 X10:Y4 X2:Y4 X2:Y5 X2:Y6 X2:Y7 ... X9:Y9parti2 <- partition_tiles(ecuador, nsplit = c(10, 10), reassign = TRUE, + min_n = 20, min_frac = 0) +length(parti2[[1]]) # < length(parti[[1]])#> [1] 31
Maipo dataset from Marco Pena
+ + + + +partition_cv
creates a represampling object for
+length(repetition)
-repeated nfold
-fold cross-validation.
partition_cv(data, coords = c("x", "y"), nfold = 10, repetition = 1, + seed1 = NULL, return_factor = FALSE)+ +
data | +
|
+
---|---|
coords | +(ignored by |
+
nfold | +number of partitions (folds) in |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
seed1 | +
|
+
return_factor | +if |
+
If return_factor = FALSE
(the default), a
+represampling object. Specifically, this is a (named) list of
+length(repetition)
resampling
objects.
+Each of these resampling objects is a list of length
+nfold
corresponding to the folds.
+Each fold is represented by a list containing the components train
+and test
, specifying the indices of training and test samples
+(row indices for data
).
+If return_factor = TRUE
(mainly used internally), a (named) list of
+length length(repetition)
.
+Each component of this list is a vector of length nrow(data)
of type
+factor
, specifying for each sample the fold to which it belongs.
+The factor levels are factor(1:nfold)
.
This function does not actually perform a cross-validation +or partition the data set itself; it simply creates a data structure +containing the indices of training and test samples.
+ ++data(ecuador) +## non-spatial cross-validation: +resamp <- partition_cv(ecuador, nfold = 5, repetition = 5) +# plot(resamp, ecuador) +# first repetition, second fold, test set indices: +idx <- resamp[['1']][[2]]$test +# test sample used in this particular repetition and fold: +ecuador[idx , ]#> x y dem slope hcurv vcurv carea +#> 4965 715042.5 9559312 2320.49 42.857816 -0.01106 -0.04634 500.5027 +#> 37912 712802.5 9559952 1838.40 52.101344 0.00183 -0.09203 634.3320 +#> 31357 714752.5 9561022 1848.22 33.446411 -0.00347 0.02357 1752.0375 +#> 25090 715362.5 9560102 2059.29 49.119672 0.02059 -0.00628 556.0121 +#> 40756 714022.5 9558862 2331.20 45.085476 -0.00075 0.00475 1001.0861 +#> 14254 714852.5 9557882 2680.00 23.335425 0.00479 0.01261 323.8441 +#> 24072 715282.5 9557602 2837.46 34.394083 -0.02191 -0.00579 2246.7725 +#> 34512 713162.5 9559632 2041.52 46.815236 -0.00857 0.03677 1675.4679 +#> 24129 714602.5 9560542 2038.30 25.584857 0.00065 0.00005 769.3426 +#> 9669 713852.5 9558612 2308.97 52.180412 -0.01059 -0.07431 4079.0366 +#> 39885 715062.5 9561022 1840.94 38.435728 0.00250 -0.01340 604.1032 +#> 32622 714972.5 9557762 2687.38 30.556412 -0.00730 -0.01491 1844.4353 +#> 38993 714202.5 9558402 2475.75 33.159359 -0.01607 0.00677 1343.5006 +#> 22814 713862.5 9558582 2327.48 48.612031 -0.02894 -0.03416 947.4878 +#> 42598 713542.5 9559972 2184.39 35.241488 -0.00841 -0.00010 1527.1028 +#> 38287 714872.5 9561162 1719.41 48.235598 -0.03021 -0.09328 2235.5259 +#> 39061 712922.5 9559912 1931.50 36.315211 0.00414 -0.00104 1641.4587 +#> 30800 715182.5 9557582 2772.37 39.204064 -0.02475 0.00665 25318.3555 +#> 38410 713712.5 9561172 1776.08 36.319222 -0.02678 -0.04272 4955.5586 +#> 48498 714972.5 9557972 2681.66 38.346919 0.00528 0.01392 1255.0884 +#> 35285 714602.5 9560112 2142.75 51.107835 -0.00017 -0.00032 994.6740 +#> 29472 714792.5 9561072 1786.08 36.173117 -0.01029 -0.02312 2657.6936 +#> 22933 712712.5 9560432 1902.17 38.123466 -0.00746 0.00096 3210.5781 +#> 38032 
714762.5 9560442 1988.33 35.169868 -0.00905 -0.01385 674.0721 +#> 14522 715012.5 9557732 2721.34 35.181327 -0.00382 0.00672 1114.8801 +#> 31576 713942.5 9558492 2387.12 46.143156 -0.05634 -0.02626 4316.0977 +#> 20290 714812.5 9561092 1775.73 34.626704 -0.01537 -0.00203 10263.2734 +#> 27978 713552.5 9558532 2373.05 17.044921 0.00832 0.00838 238.6685 +#> 15340 715452.5 9558852 2523.95 34.165473 0.00151 0.00899 1394.2274 +#> 4827 714962.5 9559882 2274.42 29.244912 0.00051 0.00889 362.8363 +#> 6808 715192.5 9559492 2219.03 21.817660 0.00713 -0.00173 8967.1924 +#> 18391 715112.5 9559432 2235.02 22.257691 -0.02651 -0.03629 360379.1875 +#> 8073 712572.5 9560302 1917.62 52.009098 0.01937 0.00083 994.7876 +#> 27957 715012.5 9559332 2306.35 20.480376 0.00225 0.00325 616.8652 +#> 15390 713132.5 9560622 1873.26 38.480418 0.00252 0.01118 2359.5059 +#> 22735 714492.5 9559772 2170.30 39.688786 -0.00440 -0.02651 419.0927 +#> 46756 713372.5 9559192 2120.47 46.005073 0.03696 0.02514 297.7842 +#> 2858 714292.5 9559332 2335.04 42.355905 0.04547 0.03323 183.1836 +#> 34563 715022.5 9558162 2675.70 18.889846 -0.01993 0.01373 16031.7500 +#> 46235 712712.5 9561042 2023.11 41.277025 0.01458 0.00672 565.1107 +#> 27084 713632.5 9558472 2380.37 23.135463 -0.01752 -0.00517 1000.7119 +#> 5578 714962.5 9557612 2751.95 39.191459 0.00288 0.01452 437.3965 +#> 2976 715122.5 9559142 2390.05 33.167954 0.01008 0.00102 483.4184 +#> 34707 713962.5 9561092 1819.49 65.686173 -0.04749 0.04779 2972.0193 +#> 20220 714052.5 9558522 2378.67 39.656128 -0.00855 -0.02175 1122.9120 +#> 21952 715352.5 9560172 2013.44 40.488063 0.02409 0.02771 674.3859 +#> 25244 713472.5 9559092 2137.03 18.348973 -0.10372 -0.00008 825875.6875 +#> 26698 713382.5 9559082 2155.56 34.264595 -0.00211 0.00241 585.6182 +#> 19887 714662.5 9559632 2266.47 45.179441 -0.02797 -0.00063 576.2819 +#> 38862 712952.5 9559892 1950.86 37.721822 -0.00048 -0.00332 1418.6622 +#> 2595 714912.5 9557662 2700.72 24.194862 -0.02681 -0.01319 3457.7519 +#> 32909 
712812.5 9560452 1883.85 38.251235 -0.00221 0.00121 4159.6738 +#> 10507 714892.5 9559282 2374.45 42.842346 -0.01057 0.02427 380.9472 +#> 7668 713372.5 9559062 2164.82 37.934963 -0.01807 -0.01883 616.1993 +#> 44825 714322.5 9557882 2534.80 40.688598 0.00114 0.03026 247.0894 +#> 46065 713072.5 9559162 2109.39 16.264553 -0.00195 -0.00656 3036.7383 +#> 7064 715592.5 9558662 2597.69 34.222196 0.00099 0.00240 1201.4724 +#> 2552 714942.5 9557652 2719.32 34.873649 -0.01238 -0.00851 1012.5963 +#> 45592 714962.5 9559862 2282.08 22.160862 -0.00028 0.01318 289.0063 +#> 41783 715572.5 9558652 2609.51 32.408212 0.00291 -0.00081 940.6002 +#> 34192 712702.5 9560102 1867.01 35.474682 0.00629 -0.00469 1021.5407 +#> 5239 714832.5 9560992 1859.70 19.960131 -0.00320 0.00180 3904.5254 +#> 31834 713492.5 9560682 1800.85 34.237093 -0.00472 -0.00228 2262.3796 +#> 18897 712802.5 9559882 1851.71 50.234648 -0.02627 -0.05393 8301.6094 +#> 39722 715022.5 9557712 2735.53 30.260766 0.00133 0.00367 806.0966 +#> 30268 714172.5 9558612 2328.84 54.950663 0.03743 -0.05242 507.9263 +#> 39166 712832.5 9560142 1922.46 39.660139 0.01469 0.02940 987.8085 +#> 33910 715202.5 9559572 2195.66 13.791667 -0.04376 -0.01644 1057765.6250 +#> 31117 712892.5 9558922 2129.84 30.550683 0.00521 -0.00870 1895.1399 +#> 47702 714692.5 9559712 2200.51 33.098626 -0.13038 -0.04212 46310.0195 +#> 20797 713012.5 9560392 1842.25 36.649818 0.01299 0.00141 4611.0718 +#> 34615 713242.5 9560732 1879.60 27.593647 0.00384 0.00175 1026.6818 +#> 30933 714672.5 9561082 1798.37 54.810289 0.00542 -0.00722 1373.4147 +#> 26428 713042.5 9560332 1895.15 33.959781 -0.01439 -0.01161 3065.3694 +#> 26101 715562.5 9558682 2594.87 32.411077 0.00079 0.00321 1170.5904 +#> 17058 714032.5 9558502 2402.95 31.352824 0.01924 0.02606 564.6876 +#> 28173 714562.5 9560382 2048.26 27.685894 -0.00124 -0.00456 1531.4274 +#> 2801 715252.5 9559612 2199.28 24.403418 0.03051 0.02190 264.6404 +#> 16071 714562.5 9560062 2166.69 28.696018 0.02281 0.04759 158.9370 +#> 
22297 714642.5 9558172 2606.18 59.469008 -0.00495 -0.03756 353.9086 +#> 39624 715222.5 9559552 2205.54 23.897497 0.00401 -0.00391 1794.7964 +#> 750 712812.5 9560032 1847.09 54.959258 -0.00767 -0.03713 6379.7974 +#> 48650 715592.5 9558612 2626.74 31.876507 0.00010 0.00140 1066.3957 +#> 14494 715332.5 9558792 2557.22 20.720446 0.01267 0.00483 343.1384 +#> 44612 714932.5 9558822 2578.80 38.255246 0.01257 0.00044 288.5468 +#> 39405 713122.5 9559052 2169.30 37.646192 -0.02110 -0.02450 2589.9910 +#> 14950 713382.5 9559172 2111.91 48.106682 0.00915 -0.02705 413.5702 +#> 24723 712942.5 9560202 1983.18 33.071697 0.00285 -0.01325 595.0426 +#> 42617 714092.5 9558392 2469.21 42.799947 -0.01397 0.00526 706.1463 +#> 29024 713492.5 9559112 2173.47 53.432325 0.04864 0.04226 338.3474 +#> 28260 715312.5 9558302 2681.75 34.484038 -0.01125 -0.02685 2568.4631 +#> 22847 714582.5 9560382 2040.49 25.336194 0.00181 -0.00341 1363.4000 +#> 46792 713862.5 9559672 2272.03 35.464942 0.00560 0.00610 454.8103 +#> 46627 714852.5 9558932 2459.24 53.445503 0.02451 -0.03921 525.3245 +#> 49459 714692.5 9557342 2639.74 42.047081 0.00216 -0.02606 919.4241 +#> 37115 715612.5 9559202 2358.15 9.727677 -0.01020 0.00279 39495.1797 +#> 17227 714812.5 9558892 2545.43 54.071746 0.00458 0.04291 293.4231 +#> 12074 714562.5 9560372 2045.17 26.287304 -0.00144 -0.00856 1789.1596 +#> 47722 712732.5 9561022 1999.03 37.492639 0.00004 -0.02384 947.2075 +#> 4479 713982.5 9557812 2399.24 41.231189 0.01895 0.04744 226.7911 +#> 10213 715542.5 9558782 2532.10 35.413375 -0.00893 -0.00867 2247.8748 +#> 23832 714252.5 9560182 2188.12 30.647512 0.00427 0.02403 474.8613 +#> 3771 714842.5 9561152 1724.51 45.539831 0.02405 -0.05655 130.9468 +#> 21490 714672.5 9560382 2021.86 32.804126 -0.00783 -0.00367 1966.2112 +#> 39466 714942.5 9557772 2680.35 16.753286 0.00166 -0.01265 1148.6487 +#> 6470 713742.5 9558802 2260.79 30.680171 0.00000 0.00320 911.5911 +#> 10145 713782.5 9560932 1865.62 23.595548 -0.00203 0.00642 1688.1782 +#> 748 
715102.5 9559692 2301.39 27.021836 -0.00759 0.00158 837.3730 +#> 34250 713822.5 9559142 2316.36 33.759819 -0.02284 0.01085 2229.5105 +#> 15558 714732.5 9559502 2283.66 53.431752 0.00390 -0.02831 454.9113 +#> 31071 713472.5 9558462 2298.33 29.153239 0.00156 -0.00116 1817.5222 +#> 18286 714792.5 9557882 2650.57 28.253695 -0.00124 0.00204 853.2902 +#> 26023 715832.5 9557632 3097.56 41.710182 0.00133 -0.00452 578.5847 +#> 20067 714242.5 9560442 2125.23 38.640274 0.00363 -0.00363 389.2691 +#> 26963 715702.5 9557882 2880.32 43.733296 -0.01174 -0.01466 2633.9666 +#> 19493 714252.5 9560372 2156.93 24.879546 0.00825 0.00445 355.5078 +#> 29125 714992.5 9558792 2584.41 24.665833 0.01253 -0.00333 422.8840 +#> 44396 715512.5 9558102 2845.31 26.598420 0.01263 0.03598 186.3527 +#> 20547 713732.5 9560822 1851.67 36.968383 -0.01405 0.00476 11992.5273 +#> 40139 712852.5 9560072 1907.89 43.704075 0.00001 0.00759 1880.4403 +#> 34616 714522.5 9558742 2568.77 34.997981 0.00990 0.02790 209.6952 +#> 32849 714302.5 9558222 2520.01 32.157256 -0.00846 0.00216 1412.2295 +#> 21925 714542.5 9559232 2314.33 50.247826 0.00737 0.01033 3446.5520 +#> 14800 712952.5 9558682 2120.89 38.273581 -0.00022 0.00662 1269.2650 +#> 8895 714132.5 9557692 2487.19 37.364297 0.03423 -0.00983 270.7949 +#> 47059 713822.5 9558272 2413.20 38.175032 -0.00687 -0.00223 1056.3701 +#> 34105 715432.5 9558592 2621.73 18.956309 -0.00301 -0.01099 1146.2140 +#> 29835 715402.5 9558512 2610.14 27.118092 -0.01322 -0.00759 4131.5356 +#> 13908 715022.5 9559372 2290.81 27.242997 -0.03004 0.00714 480535.5000 +#> 35038 715092.5 9558152 2715.69 40.183822 0.00270 -0.00250 2225.4426 +#> 42424 714692.5 9560902 1921.34 25.331037 0.01154 -0.00204 618.2027 +#> 41612 713952.5 9560002 2198.00 28.756179 0.00119 0.00361 1011.0248 +#> 1479 715622.5 9557952 2888.67 34.953863 0.01635 -0.00105 868.7802 +#> 14671 712672.5 9560372 1872.21 39.197189 -0.00873 -0.01417 1520.6630 +#> 32430 715832.5 9558112 2824.39 23.694097 0.00944 0.00836 250.5111 +#> 484 
713812.5 9560592 2016.41 44.301097 -0.00719 0.00079 1647.6451 +#> 44434 713022.5 9558762 2161.07 31.454237 -0.00273 -0.01047 2826.7700 +#> 9007 714952.5 9560902 1863.75 32.770321 0.00074 -0.02904 608.3561 +#> 280 713182.5 9560632 1862.83 39.004102 0.00745 0.00875 2156.9358 +#> 4943 712452.5 9559172 1927.46 34.465130 0.00105 -0.01576 1045.1074 +#> 10981 713832.5 9560022 2125.33 34.848439 -0.06909 -0.01151 143213.1562 +#> 16845 712772.5 9560022 1831.76 21.399974 -0.00865 -0.00805 7489.7417 +#> 14881 715482.5 9559232 2376.03 29.905532 -0.01137 -0.01523 888.2811 +#> 33881 712992.5 9558822 2173.26 33.197748 -0.00595 0.03064 1485.5201 +#> 35257 713822.5 9557982 2350.80 29.307937 0.01549 -0.00169 649.9975 +#> 39883 712732.5 9560842 2082.69 25.090968 0.00732 0.01858 298.4490 +#> 6577 714142.5 9559902 2244.30 57.890510 -0.06986 -0.05013 616.4128 +#> 32342 714862.5 9559622 2294.74 36.967237 0.01021 0.00109 915.4989 +#> 13548 715812.5 9558122 2821.65 15.508249 0.02751 0.00388 180.4414 +#> 6206 714122.5 9560992 1917.28 37.969913 -0.01023 -0.00227 841.2025 +#> cslope distroad slides distdeforest distslidespast log.carea +#> 4965 33.9059234 300.00 TRUE 300.00 100 2.699406 +#> 37912 30.2945705 300.00 TRUE 9.15 2 2.802317 +#> 31357 23.8172826 158.92 TRUE 0.00 5 3.243543 +#> 25090 43.5316144 300.00 TRUE 300.00 26 2.745084 +#> 40756 39.3352715 300.00 TRUE 300.00 100 3.000471 +#> 14254 15.6652391 300.00 TRUE 300.00 10 2.510336 +#> 24072 37.6668184 300.00 TRUE 300.00 100 3.351559 +#> 34512 34.6398824 300.00 TRUE 195.00 2 3.224136 +#> 24129 24.1289716 300.00 TRUE 300.00 89 2.886120 +#> 9669 31.6645125 300.00 TRUE 300.00 1 3.610558 +#> 39885 6.0848118 210.57 TRUE 0.00 100 2.781111 +#> 32622 30.1713845 300.00 TRUE 300.00 100 3.265863 +#> 38993 29.5485794 300.00 TRUE 300.00 2 3.128238 +#> 22814 34.9120373 300.00 TRUE 300.00 6 2.976574 +#> 42598 20.0689927 300.00 TRUE 247.02 100 3.183868 +#> 38287 14.8969027 41.43 TRUE 1.90 100 3.349380 +#> 39061 35.2409151 300.00 TRUE 70.65 56 3.215230 
+#> 30800 33.2481679 300.00 TRUE 300.00 100 4.403435 +#> 38410 8.7610976 69.52 TRUE 47.61 65 3.695093 +#> 48498 33.6182986 300.00 TRUE 300.00 100 3.098674 +#> 35285 42.4664859 300.00 TRUE 300.00 100 2.997681 +#> 29472 18.6486303 111.09 TRUE 0.00 25 3.424505 +#> 22933 33.0631025 87.56 TRUE 115.40 2 3.506583 +#> 38032 35.0082942 300.00 TRUE 300.00 60 2.828706 +#> 14522 28.3986531 300.00 TRUE 300.00 100 3.047228 +#> 31576 26.9353189 300.00 TRUE 300.00 4 3.635091 +#> 20290 15.4566824 95.53 TRUE 0.00 49 4.011286 +#> 27978 14.3285285 300.00 TRUE 300.00 12 2.377795 +#> 15340 27.9752373 300.00 TRUE 300.00 100 3.144334 +#> 4827 21.0739607 300.00 TRUE 300.00 0 2.559711 +#> 6808 28.5659568 300.00 TRUE 300.00 40 3.952656 +#> 18391 34.5075291 300.00 TRUE 300.00 100 5.556760 +#> 8073 36.0757146 138.71 TRUE 76.41 35 2.997730 +#> 27957 30.0602307 300.00 TRUE 300.00 75 2.790190 +#> 15390 29.3222611 60.00 TRUE 118.92 2 3.372821 +#> 22735 36.5168921 300.00 TRUE 300.00 90 2.622310 +#> 46756 38.9617030 300.00 TRUE 300.00 37 2.473902 +#> 2858 35.3194103 300.00 TRUE 300.00 2 2.262887 +#> 34563 38.1566973 300.00 TRUE 300.00 100 4.204981 +#> 46235 30.1083592 300.00 TRUE 41.37 0 2.752133 +#> 27084 21.6727016 300.00 TRUE 300.00 81 3.000309 +#> 5578 25.7418478 300.00 TRUE 300.00 100 2.640875 +#> 2976 26.8820338 300.00 TRUE 300.00 100 2.684323 +#> 34707 27.7025094 165.13 TRUE 39.56 100 3.473052 +#> 20220 32.1755909 300.00 TRUE 300.00 8 3.050346 +#> 21952 40.0915758 300.00 TRUE 300.00 64 2.828908 +#> 25244 33.2189470 300.00 TRUE 300.00 6 5.916915 +#> 26698 34.2496981 300.00 TRUE 300.00 7 2.767615 +#> 19887 38.0621593 300.00 TRUE 300.00 63 2.760635 +#> 38862 34.9000052 300.00 TRUE 104.48 92 3.151879 +#> 2595 30.4395288 300.00 TRUE 300.00 100 3.538794 +#> 32909 28.3791726 118.07 TRUE 98.15 0 3.619059 +#> 10507 27.1622102 300.00 TRUE 300.00 46 2.580865 +#> 7668 36.3203676 300.00 TRUE 300.00 5 2.789721 +#> 44825 29.0214582 300.00 TRUE 300.00 100 2.392854 +#> 46065 29.7708870 300.00 TRUE 251.11 100 
3.482407 +#> 7064 29.1326121 300.00 TRUE 300.00 100 3.079714 +#> 2552 32.2156979 300.00 TRUE 300.00 100 3.005436 +#> 45592 15.6486233 300.00 TRUE 300.00 2 2.460907 +#> 41783 28.1860858 300.00 TRUE 300.00 100 2.973405 +#> 34192 33.8749837 273.94 TRUE 4.48 85 3.009256 +#> 5239 24.6652601 197.29 TRUE 4.67 57 3.591568 +#> 31834 24.6469255 215.68 TRUE 0.00 100 3.354565 +#> 18897 31.7882714 300.00 TRUE 20.00 5 3.919162 +#> 39722 26.5416969 300.00 TRUE 300.00 100 2.906387 +#> 30268 47.2724558 300.00 TRUE 300.00 41 2.705801 +#> 39166 29.8184425 300.00 TRUE 1.90 90 2.994673 +#> 33910 32.6734912 300.00 TRUE 300.00 6 6.024389 +#> 31117 26.6459752 300.00 TRUE 291.23 100 3.277641 +#> 47702 33.6664271 300.00 TRUE 300.00 86 4.665675 +#> 20797 32.0569886 285.07 TRUE 0.00 96 3.663802 +#> 34615 34.1895376 17.57 TRUE 142.95 16 3.011436 +#> 30933 12.8434219 99.29 TRUE 0.00 6 3.137802 +#> 26428 35.4752548 300.00 TRUE 0.00 100 3.486483 +#> 26101 28.6461709 300.00 TRUE 300.00 100 3.068405 +#> 17058 26.3228270 300.00 TRUE 300.00 8 2.751808 +#> 28173 23.8837457 300.00 TRUE 300.00 59 3.185096 +#> 2801 25.7464315 300.00 TRUE 300.00 26 2.422656 +#> 16071 29.6213451 300.00 TRUE 300.00 100 2.201225 +#> 22297 46.6496507 300.00 TRUE 300.00 11 2.548891 +#> 39624 33.2945775 300.00 TRUE 300.00 2 3.254015 +#> 750 34.3471009 300.00 TRUE 0.00 0 3.804807 +#> 48650 27.0522023 300.00 TRUE 300.00 100 3.027918 +#> 14494 21.1851145 300.00 TRUE 300.00 28 2.535469 +#> 44612 34.6914486 300.00 TRUE 300.00 100 2.460216 +#> 39405 27.7202711 300.00 TRUE 300.00 100 3.413298 +#> 14950 41.8740475 300.00 TRUE 300.00 15 2.616549 +#> 24723 28.1734807 300.00 TRUE 36.63 100 2.774548 +#> 42617 31.2319294 300.00 TRUE 300.00 100 2.848895 +#> 29024 42.5730560 300.00 TRUE 300.00 32 2.529363 +#> 28260 31.8335351 300.00 TRUE 300.00 100 3.409673 +#> 22847 24.7477660 300.00 TRUE 300.00 41 3.134623 +#> 46792 27.6010959 300.00 TRUE 300.00 18 2.657830 +#> 46627 53.6832806 300.00 TRUE 300.00 100 2.720428 +#> 49459 41.8568588 300.00 
TRUE 300.00 100 2.963516 +#> 37115 32.1870501 300.00 TRUE 300.00 100 4.596544 +#> 17227 43.5975045 300.00 TRUE 300.00 100 2.467494 +#> 12074 23.2855778 300.00 TRUE 300.00 63 3.252649 +#> 47722 33.1158146 300.00 TRUE 65.02 1 2.976445 +#> 4479 32.0117250 300.00 TRUE 300.00 55 2.355626 +#> 10213 31.5986224 300.00 TRUE 300.00 100 3.351772 +#> 23832 25.4851627 300.00 TRUE 300.00 8 2.676567 +#> 3771 0.3294507 44.23 TRUE 1.67 94 2.117095 +#> 21490 34.0600491 300.00 TRUE 300.00 1 3.293630 +#> 39466 27.2515916 300.00 TRUE 300.00 100 3.060187 +#> 6470 28.0932666 300.00 TRUE 300.00 2 2.959800 +#> 10145 27.4160305 276.16 FALSE 91.33 100 3.227418 +#> 748 38.1326331 300.00 FALSE 300.00 12 2.922919 +#> 34250 30.8927384 300.00 FALSE 300.00 100 3.348210 +#> 15558 50.3440826 300.00 FALSE 300.00 45 2.657927 +#> 31071 31.7378511 300.00 FALSE 300.00 76 3.259480 +#> 18286 24.3507063 300.00 FALSE 300.00 42 2.931097 +#> 26023 45.0728709 300.00 FALSE 300.00 100 2.762367 +#> 20067 29.8235992 300.00 FALSE 268.03 100 2.590250 +#> 26963 42.0631236 300.00 FALSE 300.00 100 3.420610 +#> 19493 18.5724269 300.00 FALSE 291.23 100 2.550849 +#> 29125 23.9450522 300.00 FALSE 300.00 100 2.626221 +#> 44396 21.4423725 300.00 FALSE 300.00 100 2.270336 +#> 20547 34.2611573 300.00 FALSE 42.56 100 4.078911 +#> 40139 33.1433166 300.00 FALSE 1.11 21 3.274260 +#> 34616 31.4995007 300.00 FALSE 300.00 100 2.321589 +#> 32849 29.5520172 300.00 FALSE 300.00 45 3.149905 +#> 21925 31.2485452 300.00 FALSE 300.00 95 3.537385 +#> 14800 31.5229920 300.00 FALSE 300.00 100 3.103552 +#> 8895 38.0776291 300.00 FALSE 300.00 90 2.432641 +#> 47059 25.7596095 300.00 FALSE 300.00 100 3.023816 +#> 34105 20.0770141 300.00 FALSE 300.00 26 3.059266 +#> 29835 28.8530086 300.00 FALSE 300.00 45 3.616112 +#> 13908 34.3774677 300.00 FALSE 300.00 96 5.681725 +#> 35038 40.2038755 300.00 FALSE 300.00 100 3.347416 +#> 42424 22.6071957 275.21 FALSE 93.50 70 2.791131 +#> 41612 25.5871492 300.00 FALSE 300.00 100 3.004762 +#> 1479 43.6765727 300.00 
FALSE 300.00 100 2.938910 +#> 14671 37.8822505 123.04 FALSE 75.11 0 3.182033 +#> 32430 22.8604431 300.00 FALSE 300.00 100 2.398827 +#> 484 32.1457971 300.00 FALSE 55.63 100 3.216864 +#> 44434 36.3197946 300.00 FALSE 300.00 100 3.451290 +#> 9007 30.9643581 300.00 FALSE 14.98 35 2.784158 +#> 280 29.3234070 60.55 FALSE 127.54 29 3.333837 +#> 4943 37.7716697 300.00 FALSE 300.00 100 3.019161 +#> 10981 29.1664166 300.00 FALSE 300.00 100 5.155983 +#> 16845 22.8827884 300.00 FALSE 25.55 33 3.874467 +#> 14881 26.8562507 300.00 FALSE 300.00 100 2.948550 +#> 33881 29.1858971 300.00 FALSE 300.00 100 3.171879 +#> 35257 30.8818522 300.00 FALSE 300.00 79 2.812912 +#> 39883 17.0443485 233.81 FALSE 219.23 100 2.474870 +#> 6577 42.4074712 300.00 FALSE 300.00 70 2.789872 +#> 32342 37.6198359 300.00 FALSE 300.00 81 2.961658 +#> 13548 16.9784583 300.00 FALSE 300.00 100 2.256336 +#> 6206 32.4632157 294.31 FALSE 35.90 100 2.924901
partition_cv_strat
creates a set of sample indices corresponding
+to cross-validation test and training sets.
partition_cv_strat(data, coords = c("x", "y"), nfold = 10, + return_factor = FALSE, repetition = 1, seed1 = NULL, strat)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
nfold | +number of partitions (folds) in |
+
return_factor | +if |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
seed1 | +
|
+
strat | +character: column in |
+
A represampling
object, see also
+partition_cv
. partition_cv_strat
is, however,
+stratified with respect to the variable data[,strat]
;
+i.e., cross-validation partitioning is done within each set
+data[data[,strat]==i,]
(i
in levels(data[, strat])
), and
+the i
th folds of all levels are combined into one cross-validation
+fold.
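The within-stratum fold assignment described above can be sketched in base R. This is only an illustration under stated assumptions: `stratified_folds` is a hypothetical helper, not a sperrorest function, and the toy response `y` stands in for `data[, strat]`.

```r
# Hypothetical helper (not part of sperrorest): assign fold labels
# separately within each stratum, then combine them.
stratified_folds <- function(strat, nfold = 5, seed = 1) {
  set.seed(seed)
  strat <- as.factor(strat)
  fold <- integer(length(strat))
  for (lev in levels(strat)) {
    idx <- which(strat == lev)
    # Shuffled fold labels of near-equal frequency within this stratum
    fold[idx] <- sample(rep(seq_len(nfold), length.out = length(idx)))
  }
  fold
}

# Toy data: 30 positives and 70 negatives
y <- rep(c(TRUE, FALSE), times = c(30, 70))
fold <- stratified_folds(y, nfold = 5)

# Share of positives per fold; each fold mirrors the overall share
tapply(y, fold, mean)
```

Because fold labels are drawn within each level, the i-th fold receives (nearly) the same share of every level, which is what keeps the class ratio stable across folds.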
sperrorest
, as.resampling
,
+resample_strat_uniform
+data(ecuador) +parti <- partition_cv_strat(ecuador, strat = 'slides', nfold = 5, +repetition = 1) +idx <- parti[['1']][[1]]$train +mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')#> [1] 0.9996672# always very close to 1 because of stratification +# Non-stratified cross-validation: +parti <- partition_cv(ecuador, nfold = 5, repetition = 1) +idx <- parti[['1']][[1]]$train +mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')#> [1] 1.002166# close to 1 because of large sample size, but with some random variation +
partition_disc
partitions the sample into training and test sets by
+selecting circular test areas (possibly surrounded by an exclusion buffer)
+and using the remaining samples as training samples (leave-one-disc-out
+cross-validation). partition_loo
creates training and test sets for
+leave-one-out cross-validation with (optional) buffer.
partition_disc(data, coords = c("x", "y"), radius, buffer = NULL, + ndisc = nrow(data), seed1 = NULL, return_train = TRUE, prob = NULL, + replace = FALSE, repetition = 1) + +partition_loo(data, ndisc = nrow(data), replace = FALSE, ...)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
radius | +radius of test area discs; performs leave-one-out resampling +if radius <0. |
+
buffer | +radius of additional 'neutral area' around test area discs +that is excluded from training and test sets; defaults to 0, +i.e. all samples are either in the test area or in the training area. |
+
ndisc | +Number of discs to be randomly selected; each disc constitutes
+a separate test set. Defaults to |
+
seed1 | +
|
+
return_train | +If |
+
prob | +optional argument to sample. |
+
replace | +optional argument to sample: sampling with or +without replacement? |
+
repetition | +see |
+
... | +arguments to be passed to |
+
A represampling object.
+Contains length(repetition)
resampling
objects.
+Each of these contains ndisc
lists with indices of test and
+(if return_train = TRUE
) training sets.
Test area discs are centered at (random) samples, not at general
+random locations. Test area discs may (and likely will) overlap independently
+of the value of replace
. replace
only controls the replacement
+of the center point of discs when drawing center points from the samples.
+ radius < 0
does leave-one-out resampling with an optional buffer.
+radius = 0
is similar except that samples with identical coordinates
+would fall within the test area disc.
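A single leave-one-disc-out split can be sketched as follows. This is a simplified illustration, not the package's implementation: `disc_split` is a hypothetical helper, the coordinates are simulated, and the `radius < 0` leave-one-out case is not handled.

```r
# Hypothetical helper: one test disc centred at a sample, with an
# optional buffer ring that is excluded from both sets.
disc_split <- function(xy, center, radius, buffer = 0) {
  # Euclidean distance of every sample to the disc centre
  d <- sqrt((xy[, 1] - xy[center, 1])^2 + (xy[, 2] - xy[center, 2])^2)
  list(test = which(d <= radius),
       train = which(d > radius + buffer))
}

set.seed(42)
xy <- cbind(x = runif(200, 0, 1000), y = runif(200, 0, 1000))
center <- sample(nrow(xy), 1)  # discs are centred at sample locations
s <- disc_split(xy, center, radius = 200, buffer = 100)

length(s$test)
length(s$train)
# Samples in the buffer ring belong to neither set
nrow(xy) - length(s$test) - length(s$train)
```

Repeating this for several randomly drawn centres gives one test set per disc; since centres are drawn independently, the discs can overlap.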
Brenning, A. 2005. Spatial prediction models for landslide +hazards: review, comparison and evaluation. Natural Hazards and Earth System +Sciences, 5(6): 853-862.
+ +sperrorest, partition_cv, +partition_kmeans
+ + ++data(ecuador) +parti <- partition_disc(ecuador, radius = 200, buffer = 200, + ndisc = 5, repetition = 1:2) +# plot(parti,ecuador) +summary(parti)#> $`1` +#> n.train n.test +#> 635 718 6 +#> 44 727 9 +#> 263 723 24 +#> 28 727 6 +#> 129 708 17 +#> +#> $`2` +#> n.train n.test +#> 70 712 6 +#> 594 708 13 +#> 412 711 13 +#> 250 729 5 +#> 689 725 5 +#>+# leave-one-out with buffer: +parti.loo <- partition_loo(ecuador, buffer = 200) +summary(parti)#> $`1` +#> n.train n.test +#> 635 718 6 +#> 44 727 9 +#> 263 723 24 +#> 28 727 6 +#> 129 708 17 +#> +#> $`2` +#> n.train n.test +#> 70 712 6 +#> 594 708 13 +#> 412 711 13 +#> 250 729 5 +#> 689 725 5 +#>
partition_factor
creates a represampling object, i.e. a set of sample
+indices defining cross-validation test and training sets.
partition_factor(data, coords = c("x", "y"), fac, return_factor = FALSE, + repetition = 1)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
fac | +either the name of a variable (column) in |
+
return_factor | +if |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
A represampling object, +see also partition_cv for details.
+ +In this partitioning approach, all repetition
s are identical and
+therefore pseudo-replications.
sperrorest, partition_cv, +as.resampling.factor
+ + ++data(ecuador) +# I don't recommend using this partitioning for cross-validation, +# this is only for demonstration purposes: +breaks <- quantile(ecuador$dem, seq(0, 1, length = 6)) +ecuador$zclass <- cut(ecuador$dem, breaks, include.lowest = TRUE) +summary(ecuador$zclass)#> [1.72e+03,1.92e+03] (1.92e+03,2.14e+03] (2.14e+03,2.31e+03] (2.31e+03,2.57e+03] +#> 151 150 150 150 +#> (2.57e+03,3.11e+03] +#> 150parti <- partition_factor(ecuador, fac = 'zclass') +# plot(parti,ecuador) +summary(parti)#> $`1` +#> n.train n.test +#> [1.72e+03,1.92e+03] 600 151 +#> (1.92e+03,2.14e+03] 601 150 +#> (2.14e+03,2.31e+03] 601 150 +#> (2.31e+03,2.57e+03] 601 150 +#> (2.57e+03,3.11e+03] 601 150 +#>
partition_factor_cv
creates a represampling object,
+i.e. a set of sample indices defining cross-validation test and
+training sets, where partitions are obtained by resampling at the level of
+groups of observations as defined by a given factor variable.
+This can be used, for example, to resample agricultural data that is grouped
+by fields, at the agricultural field level in order to preserve
+spatial autocorrelation within fields.
partition_factor_cv(data, coords = c("x", "y"), fac, nfold = 10, + repetition = 1, seed1 = NULL, return_factor = FALSE)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
fac | +either the name of a variable (column) in |
+
nfold | +number of partitions (folds) in |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
seed1 | +
|
+
return_factor | +if |
+
A represampling object, +see also partition_cv for details.
+ +In this partitioning approach, the number of factor levels in
+fac
must be large enough for this factor-level resampling to make
+sense.
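Resampling at the level of factor levels can be sketched in base R as shown here. `group_folds` and the toy `field` variable are assumptions made for illustration only; they are not part of sperrorest.

```r
# Hypothetical helper: assign whole groups (factor levels) to folds,
# so that all observations of a group end up in the same fold.
group_folds <- function(fac, nfold = 3, seed = 1) {
  set.seed(seed)
  fac <- as.factor(fac)
  lev <- levels(fac)
  # One shuffled fold label per group
  lev_fold <- sample(rep(seq_len(nfold), length.out = length(lev)))
  names(lev_fold) <- lev
  unname(lev_fold[as.character(fac)])
}

# Six fields with four observations each
field <- rep(paste0("field", 1:6), each = 4)
fold <- group_folds(field, nfold = 3)

# Each field appears in exactly one fold
table(field, fold)
```

With only a handful of levels, folds contain very few groups each, which is why the number of levels needs to be reasonably large relative to `nfold`.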
sperrorest, partition_cv, +partition_factor, as.resampling.factor
+ + +partition_kmeans
divides the study area into irregularly shaped
+spatial partitions based on k-means (kmeans) clustering
+of spatial coordinates.
partition_kmeans(data, coords = c("x", "y"), nfold = 10, repetition = 1, + seed1 = NULL, return_factor = FALSE, balancing_steps = 1, + order_clusters = TRUE, ...)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
nfold | +number of cross-validation folds, i.e. parameter k in +k-means clustering. |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
seed1 | +
|
+
return_factor | +if |
+
balancing_steps | +if |
+
order_clusters | +if |
+
... | +additional arguments to kmeans. |
+
A represampling object, see also +partition_cv for details.
+ +Default parameter settings may change in future releases.
+ +Brenning, A., Long, S., & Fieguth, P. (2012). +Detecting rock glacier flow structures using Gabor filters and IKONOS +imagery. Remote Sensing of Environment, 125, 227-237. +doi:10.1016/j.rse.2012.07.005
+Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: +Management of spatial information. In 13th International Conference on +Information Processing and Management of Uncertainty, +IPMU 2010; Dortmund; 28 June - 2 July 2010. +Lecture Notes in Computer Science, 6178 LNAI: 350-359.
+ +sperrorest, partition_cv, +partition_disc, partition_tiles, +kmeans
+ + ++data(ecuador) +resamp <- partition_kmeans(ecuador, nfold = 5, repetition = 2) +# plot(resamp, ecuador)
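The core idea, clustering the coordinates and using cluster membership as fold labels, can be sketched with `stats::kmeans` alone. This is a rough illustration on simulated coordinates; it does not reproduce the `balancing_steps` or `order_clusters` behaviour of `partition_kmeans`.

```r
set.seed(1)
# Simulated sample locations (assumption: stand-in for data[, coords])
xy <- cbind(x = runif(300), y = runif(300))

# k-means on the coordinates; k plays the role of nfold
km <- kmeans(xy, centers = 5)
fold <- km$cluster

# First fold as test set, all remaining clusters as training set
test_idx <- which(fold == 1)
train_idx <- which(fold != 1)

# Cluster sizes are generally unequal
table(fold)
```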
partition_tiles
divides the study area into a specified number of
+rectangular tiles. Optionally small partitions can be merged with adjacent
+tiles to achieve a minimum number or percentage of samples in each tile.
partition_tiles(data, coords = c("x", "y"), dsplit = NULL, nsplit = NULL, + rotation = c("none", "random", "user"), user_rotation, offset = c("none", + "random", "user"), user_offset, reassign = TRUE, min_frac = 0.025, + min_n = 5, iterate = 1, return_factor = FALSE, repetition = 1, + seed1 = NULL)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
dsplit | +optional vector of length 2: equidistance of splits in
+(possibly rotated) x direction ( |
+
nsplit | +optional vector of length 2: number of splits in
+(possibly rotated) x direction ( |
+
rotation | +indicates whether and how the rectangular grid should
+be rotated; random rotation is only between |
+
user_rotation | +if |
+
offset | +indicates whether and how the rectangular grid should be +shifted by an offset. |
+
user_offset | +if |
+
reassign | +logical (default |
+
min_frac | +numeric >=0, <1: minimum relative size of a partition as a
+fraction of the sample; argument passed to get_small_tiles.
+Will be ignored if |
+
min_n | +integer >=0: minimum number of samples per partition;
+argument passed to get_small_tiles.
+Will be ignored if |
+
iterate | +argument to be passed to tile_neighbors |
+
return_factor | +if |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
seed1 | +
|
+
A represampling object.
+Contains length(repetition)
resampling objects as
+repetitions. The exact number of folds / test-set tiles within each
+resampling object depends on the spatial configuration of
+the data set and possible cleaning steps (see min_frac
, min_n
).
Default parameter settings may change in future releases.
+This function, especially the rotation and shifting part of it and the
+algorithm for cleaning up small tiles, is still somewhat experimental.
+Use with caution.
+For non-zero offsets (offset!='none'
), the number of tiles may
+actually be greater than nsplit[1]*nsplit[2]
because of fractional
+tiles extending into the study region. reassign=TRUE
with suitable
+thresholds is therefore recommended for non-zero (including random) offsets.
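A plain, unrotated tiling without offset can be sketched with `cut()`. This is a simplified illustration on simulated coordinates, not the algorithm used by `partition_tiles`; the threshold of 30 is a hypothetical value chosen only to show which tiles a cleaning step might flag.

```r
set.seed(1)
# Simulated sample locations (assumption: stand-in for data[, coords])
xy <- data.frame(x = runif(500, 0, 4000), y = runif(500, 0, 3000))

nsplit <- c(4, 3)  # number of tiles in x and y direction
# Equal-width bins along each axis; labels = FALSE returns bin indices
ix <- cut(xy$x, breaks = nsplit[1], labels = FALSE)
iy <- cut(xy$y, breaks = nsplit[2], labels = FALSE)
tile <- factor(paste0("X", ix, ":Y", iy))

# Number of samples per tile
n <- table(tile)
n

# Tiles below the hypothetical minimum size of 30 samples
names(n)[n < 30]
```

Small tiles like those flagged in the last line are the ones that a merging step would attach to a neighbouring tile.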
sperrorest, as.resampling.factor, +get_small_tiles, tile_neighbors
+ + ++data(ecuador) +parti <- partition_tiles(ecuador, nsplit = c(4, 3), reassign = FALSE) +# plot(parti,ecuador) +summary(parti) # tile A4 has only 55 samples#> $`1` +#> n.train n.test +#> X1:Y2 686 65 +#> X1:Y3 665 86 +#> X2:Y1 711 40 +#> X2:Y2 666 85 +#> X2:Y3 690 61 +#> X3:Y1 664 87 +#> X3:Y2 661 90 +#> X3:Y3 681 70 +#> X4:Y1 671 80 +#> X4:Y2 692 59 +#> X4:Y3 723 28 +#># same partitioning, but now merge tiles with less than 100 samples to +# adjacent tiles: +parti2 <- partition_tiles(ecuador, nsplit = c(4,3), reassign = TRUE, +min_n = 100) +# plot(parti2,ecuador) +summary(parti2)#> $`1` +#> n.train n.test +#> X1:Y3 600 151 +#> X2:Y2 626 125 +#> X3:Y1 584 167 +#> X3:Y2 574 177 +#> X3:Y3 620 131 +#># tile B4 (in 'parti') was smaller than A3, therefore A4 was merged with B4, +# not with A3 +# now with random rotation and offset, and tiles of 2000 m length: +parti3 <- partition_tiles(ecuador, dsplit = 2000, offset = 'random', +rotation = 'random', reassign = TRUE, min_n = 100) +# plot(parti3, ecuador) +summary(parti3)#> $`1` +#> n.train n.test +#> X1:Y2 584 167 +#> X2:Y1 530 221 +#> X2:Y2 508 243 +#> X3:Y2 631 120 +#>
plot.represampling
displays the partitions or samples
+arising from the resampling of a data set.
# S3 method for represampling +plot(x, data, coords = c("x", "y"), pch = "+", + wiggle_sd = 0, ...) + +# S3 method for resampling +plot(x, ...)+ +
x | +a represampling or resampling object, respectively. |
+
---|---|
data | +a |
+
coords | +vector of length 2 defining the variables in |
+
pch | +point symbol (to be passed to points). |
+
wiggle_sd | +'wiggle' the point locations in x and y direction to avoid +overplotting of samples drawn multiple times by bootstrap methods; +this is a standard deviation (in the units of the x/y coordinates) of a +normal distribution and defaults to 0 (no wiggling). |
+
... | +additional arguments to plot. |
+
This function is not intended for samples obtained by resampling with +replacement (e.g., bootstrap) because training and test points will be +overplotted in that case. The size of the plotting region will also limit +the number of maps that can be displayed at once, i.e., the number of rows +(repetitions) and fields (columns).
+ + ++data(ecuador) +# non-spatial cross-validation: +resamp <- partition_cv(ecuador, nfold = 5, repetition = 1:2) +# plot(resamp, ecuador) +# spatial cross-validation using k-means clustering: +resamp <- partition_kmeans(ecuador, nfold = 5, repetition = 1:2) +# plot(resamp, ecuador)
Accounts for missing factor levels present only in test data +but not in train data by setting values to NA
+ + +remove_missing_levels(fit, test_data)+ +
fit | +fitted model on training data |
+
---|---|
test_data | +data to make predictions for |
+
data.frame with matching factor levels to fitted model
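The effect can be illustrated with a simplified base-R sketch. Note the assumptions: `drop_unseen_levels` is a hypothetical helper that compares against the training data directly, whereas `remove_missing_levels` takes a fitted model as its first argument.

```r
# Hypothetical helper: set factor values in the test data to NA when
# their level does not occur in the training data.
drop_unseen_levels <- function(train_data, test_data) {
  for (v in names(test_data)) {
    if (is.factor(test_data[[v]]) && v %in% names(train_data)) {
      # Levels actually observed in the training data
      train_lev <- levels(droplevels(as.factor(train_data[[v]])))
      x <- as.character(test_data[[v]])
      x[!(x %in% train_lev)] <- NA
      test_data[[v]] <- factor(x, levels = train_lev)
    }
  }
  test_data
}

train <- data.frame(soil = factor(c("clay", "sand", "clay")))
test <- data.frame(soil = factor(c("sand", "loam")))

# "loam" never occurs in the training data and becomes NA
res <- drop_unseen_levels(train, test)
res$soil
```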
+ + +represampling_bootstrap
draws a bootstrap random sample
+(with replacement) from data
.
represampling_bootstrap(data, coords = c("x", "y"), nboot = nrow(data), + repetition = 1, seed1 = NULL, oob = FALSE)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
nboot | +Size of bootstrap sample |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
seed1 | +
|
+
oob | +logical (default |
+
A represampling object. This is a (named) list
+containing length(repetition)
+resampling objects. Each of these contains only one list with
+indices of train
ing and test
samples.
+Indices are row indices for data
.
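One bootstrap repetition can be sketched in base R as shown here. The sketch assumes that an out-of-bag test sample consists of the rows that were never drawn into the training sample; the variable names are illustrative only.

```r
set.seed(1)
n <- 100  # assumption: stand-in for nrow(data)

# Bootstrap training sample: n row indices drawn with replacement
train_idx <- sample(n, size = n, replace = TRUE)

# Out-of-bag rows: never drawn into the training sample
oob_idx <- setdiff(seq_len(n), train_idx)

length(unique(train_idx))  # distinct rows in the training sample
length(oob_idx)            # rows available as out-of-bag test sample
```

If a second bootstrap sample is drawn independently as the test sample instead, training and test indices can share rows, which is why plots of such samples suffer from overplotting.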
+data(ecuador) +# only 10 bootstrap repetitions, normally use >=100: +parti <- represampling_bootstrap(ecuador, repetition = 1:10) +# plot(parti, ecuador) # careful: overplotting occurs +# because some samples are included in both the training and +# the test sample (possibly even multiple times)
represampling_disc_bootstrap
performs a spatial block bootstrap by
+resampling at the level of circular discs generated by
+partition_disc
.
represampling_disc_bootstrap(data, coords = c("x", "y"), nboot, + repetition = 1, seed1 = NULL, oob = FALSE, ...)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
nboot | +number of bootstrap samples; you may specify different values
+for the training sample ( |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
seed1 | +
|
+
oob | +logical (default |
+
... | +additional arguments to be passed to partition_disc;
+note that a |
+
Performs nboot
out of nrow(data)
resampling of circular
+discs. This is an overlapping spatial block bootstrap where the
+blocks are circular.
+data(ecuador) +# Overlapping disc bootstrap: +parti <- represampling_disc_bootstrap(ecuador, radius = 200, nboot = 20, +oob = FALSE) +# plot(parti, ecuador) +# Note that a 'buffer' argument would make no difference because bootstrap +# sets of discs are drawn independently for the training and test sample. +# +# Overlapping disc bootstrap for training sample, out-of-bag sample as test +# sample: +parti <- represampling_disc_bootstrap(ecuador, radius = 200, buffer = 200, + nboot = 10, oob = TRUE) +# plot(parti,ecuador)
represampling_factor_bootstrap
resamples partitions defined by a
+factor variable. This can be used for non-overlapping block bootstraps and
+similar.
represampling_factor_bootstrap(data, fac, repetition = 1, nboot = -1, + seed1 = NULL, oob = FALSE)+ +
data | +
|
+
---|---|
fac | +defines a grouping or partitioning of the samples in |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
nboot | +number of bootstrap replications used for generating the
+bootstrap training sample ( |
+
seed1 | +
|
+
oob | +if |
+
nboot
refers to the number of groups
+(as defined by the factors) to be drawn with replacement from the set of
+groups. I.e., if fac
is a factor variable, nboot
would normally
+not be greater than nlevels(fac)
, nlevels(fac)
being the
+default as per nboot = -1
.
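Drawing groups rather than individual rows can be sketched in base R. The toy factor and variable names are assumptions for illustration; the sketch only shows how a training sample could be assembled from resampled groups.

```r
set.seed(1)
# Toy grouping factor with groups of unequal size
fac <- factor(rep(c("a", "b", "c", "d"), times = c(3, 2, 4, 1)))

# Default as described above: draw as many groups as there are levels
nboot <- nlevels(fac)
drawn <- sample(levels(fac), size = nboot, replace = TRUE)

# A group drawn k times contributes all of its rows k times
train_idx <- unlist(lapply(drawn, function(g) which(fac == g)))

drawn
train_idx
```

Because whole groups are drawn, the size of the resulting training sample varies between repetitions when groups differ in size.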
represampling_disc_bootstrap, +represampling_tile_bootstrap
+ + ++data(ecuador) +# a dummy example for demonstration, performing bootstrap +# at the level of an arbitrary factor variable: +parti <- represampling_factor_bootstrap(ecuador, + factor(floor(ecuador$dem / 100)), + oob = TRUE) +# plot(parti,ecuador) +# using the factor bootstrap for a non-overlapping block bootstrap +# (see also represampling_tile_bootstrap): +fac <- partition_tiles(ecuador, return_factor = TRUE, repetition = c(1:3), + dsplit = 500, min_n = 200, rotation = 'random', + offset = 'random') +parti <- represampling_factor_bootstrap(ecuador, fac, oob = TRUE, +repetition = c(1:3)) +# plot(parti, ecuador)
represampling_kmeans_bootstrap
performs a non-overlapping spatial
+block bootstrap by resampling at the level of irregularly-shaped partitions
+generated by partition_kmeans.
represampling_kmeans_bootstrap(data, coords = c("x", "y"), repetition = 1, + nfold = 10, nboot = nfold, seed1 = NULL, oob = FALSE, ...)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
nfold | +see partition_kmeans |
+
nboot | ++ |
seed1 | +
|
+
oob | ++ |
... | +additional arguments to be passed to partition_kmeans |
+
represampling_tile_bootstrap
performs a non-overlapping spatial
+block bootstrap by resampling at the level of rectangular partitions or
+'tiles' generated by partition_tiles.
represampling_tile_bootstrap(data, coords = c("x", "y"), repetition = 1, + nboot = -1, seed1 = NULL, oob = FALSE, ...)+ +
data | +
|
+
---|---|
coords | +vector of length 2 defining the variables in |
+
repetition | +numeric vector: cross-validation repetitions
+to be generated. Note that this is not the number of repetitions,
+but the indices of these repetitions. E.g., use |
+
nboot | ++ |
seed1 | +
|
+
oob | ++ |
... | +additional arguments to be passed to partition_tiles |
+
resample_factor
draws a random (sub)sample
+(with or without replacement) of the groups or clusters identified by
+the fac
argument.
resample_factor(data, param = list(fac = "class", n = Inf, replace = FALSE))+ +
data | +a |
+
---|---|
param | +a list with the following components: |
+
a data.frame
containing a subset of the rows of data
.
If param$replace=FALSE
, a subsample of
+min(param$n,nlevels(data[,fac]))
groups will be drawn from data
.
+If param$replace=TRUE
, the number of groups to be drawn is param$n
.
resample_strat_uniform()
, sample()
resample_strat_uniform
draws a stratified random sample
+(with or without replacement) from the samples in data
.
+Stratification is over the levels of data[, param$strat]
.
+The same number of samples is drawn within each level.
resample_strat_uniform(data, param = list(strat = "class", nstrat = Inf, + replace = FALSE))+ +
data | +a |
+
---|---|
param | +a list with the following components: |
+
a data.frame
containing a subset of the rows of data
.
If param$replace=FALSE
, a subsample of size
+min(param$n,nrow(data))
will be drawn from data
.
+If param$replace=TRUE
, the size of the subsample is param$n
.
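Drawing the same number of samples within each level can be sketched in base R. `strat_uniform` is a hypothetical helper written for illustration, and the simulated data frame only mimics a binary response; it is not the package's implementation.

```r
# Hypothetical helper: draw up to nstrat rows from each level of the
# stratification variable.
strat_uniform <- function(data, strat, nstrat, replace = FALSE) {
  rows_by_level <- split(seq_len(nrow(data)), data[[strat]])
  idx <- unlist(lapply(rows_by_level, function(i) {
    # Without replacement, a level cannot yield more rows than it has
    size <- if (replace) nstrat else min(nstrat, length(i))
    i[sample.int(length(i), size = size, replace = replace)]
  }), use.names = FALSE)
  data[idx, , drop = FALSE]
}

set.seed(1)
d <- data.frame(slides = rep(c(TRUE, FALSE), times = c(150, 600)),
                z = rnorm(750))
s <- strat_uniform(d, "slides", nstrat = 100)

# 100 rows per level, regardless of the original class imbalance
table(s$slides)
```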
resample_uniform()
, sample()
+data(ecuador) # Muenchow et al. (2012), see ?ecuador +d <- resample_strat_uniform(ecuador, + param = list(strat = 'slides', nstrat = 100)) +nrow(d) # == 200#> [1] 200sum(d$slides == 'TRUE') # == 100#> [1] 100+
resample_uniform
draws a random (sub)sample
+(with or without replacement) from the samples in data
.
resample_uniform(data, param = list(n = Inf, replace = FALSE))+ +
data | +a |
+
---|---|
param | +a list with the following components: |
+
a data.frame
containing a subset of the rows of data
.
If param$replace=FALSE
, a subsample of size
+min(param$n,nrow(data))
will be drawn from data
.
+If param$replace=TRUE
, the size of the subsample is param$n
.
resample_strat_uniform()
, sample()
+data(ecuador) # Muenchow et al. (2012), see ?ecuador +d <- resample_uniform(ecuador, param = list(strat = 'slides', n = 200)) +nrow(d) # == 200#> [1] 200sum(d$slides == 'TRUE')#> [1] 139+
Runs model fitting, error estimation and variable importance +on fold level
+ + +runfolds(j = NULL, current_sample = NULL, data = NULL, i = NULL, + formula = NULL, model_args = NULL, par_cl = NULL, par_mode = NULL, + model_fun = NULL, pred_fun = NULL, imp_variables = NULL, + imp_permutations = NULL, err_fun = NULL, train_fun = NULL, + importance = NULL, current_res = NULL, current_impo = NULL, + pred_args = NULL, pooled_obs_train = NULL, pooled_obs_test = NULL, + pooled_pred_train = NULL, response = NULL, progress = NULL, + is_factor_prediction = NULL, pooled_pred_test = NULL, coords = NULL, + test_fun = NULL, imp_one_rep = NULL, do_gc = NULL, test_param = NULL, + train_param = NULL)+ + +
Runs model fitting, error estimation and variable importance +on fold level
+ + +runreps(current_sample = NULL, data = NULL, formula = NULL, + model_args = NULL, par_cl = NULL, do_gc = NULL, imp_one_rep = NULL, + model_fun = NULL, pred_fun = NULL, imp_variables = NULL, + imp_permutations = NULL, err_fun = NULL, importance = NULL, + current_res = NULL, current_impo = NULL, pred_args = NULL, + progress = NULL, pooled_obs_train = NULL, pooled_obs_test = NULL, + pooled_pred_train = NULL, response = NULL, is_factor_prediction = NULL, + pooled_pred_test = NULL, test_fun = NULL, test_param = NULL, + train_fun = NULL, train_param = NULL, coords = NULL, par_mode = NULL, + i = NULL)+ + +
This package implements spatial error estimation and permutation-based +spatial variable importance using different spatial cross-validation +and spatial block bootstrap methods. To cite 'sperrorest' in publications, +reference the paper by Brenning (2012).
+ + + +Brenning, A. 2012. Spatial cross-validation and bootstrap for the +assessment of prediction rules in remote sensing: the R package 'sperrorest'. +2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), +23-27 July 2012, p. 5372-5375.
+Brenning, A. 2005. Spatial prediction models for landslide hazards: +review, comparison and evaluation. Natural Hazards and Earth System Sciences, +5(6): 853-862.
+Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: +Management of spatial information. In 13th International Conference on +Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; +28 June - 2 July 2010. Lecture Notes in Computer Science, +6178 LNAI: 350-359.
+Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for +yield prediction in Precision Agriculture. In Advances in Intelligent +Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, +Tucson, AZ, USA, 19-21 May 2010. +Lecture Notes in Computer Science, 6065 LNCS: 184-195.
+ + +sperrorest
is a flexible interface for multiple types of
+parallelized spatial and non-spatial cross-validation
+and bootstrap error estimation and parallelized permutation-based
+assessment of spatial variable importance.
sperrorest(formula, data, coords = c("x", "y"), model_fun, + model_args = list(), pred_fun = NULL, pred_args = list(), + smp_fun = partition_cv, smp_args = list(), train_fun = NULL, + train_param = NULL, test_fun = NULL, test_param = NULL, + err_fun = err_default, imp_variables = NULL, imp_permutations = 1000, + importance = !is.null(imp_variables), distance = FALSE, + par_args = list(par_mode = "foreach", par_units = NULL, par_option = NULL), + do_gc = 1, progress = "all", out_progress = "", benchmark = FALSE, + ...)+ +
formula | +A formula specifying the variables used by the |
+
---|---|
data | +a |
+
coords | +vector of length 2 defining the variables in |
+
model_fun | +Function that fits a predictive model, such as |
+
model_args | +Arguments to be passed to |
+
pred_fun | +Prediction function for a fitted model object created
+by |
+
pred_args | +(optional) Arguments to |
+
smp_fun | +A function for sampling training and test sets from
+ |
+
smp_args | +(optional) Arguments to be passed to |
+
train_fun | +(optional) A function for resampling or subsampling the +training sample in order to achieve, e.g., uniform sample sizes on all +training sets, or maintaining a certain ratio of positives and negatives +in training sets. +E.g. resample_uniform or resample_strat_uniform. |
+
train_param | +(optional) Arguments to be passed to |
+
test_fun | +(optional) Like |
+
test_param | +(optional) Arguments to be passed to |
+
err_fun | +A function that calculates selected error measures from the
+known responses in |
+
imp_variables | +(optional; used if |
+
imp_permutations | +(optional; used if |
+
importance | +logical (default: |
+
distance | +logical (default: |
+
par_args | +list of parallelization parameters:
|
+
do_gc | +numeric (default: 1): defines frequency of memory garbage
+collection by calling gc; if |
+
progress | +character (default: |
+
out_progress | +only used if |
+
benchmark | +(optional) logical (default: |
+
... | +Further options passed to makeCluster for
+ |
+
A list (object of class sperrorest
) with (up to) six components:
a sperrorestreperror
object containing
+predictive performances at the repetition level
a sperroresterror
object containing predictive
+performances at the fold level
a represampling()
object
a sperrorestimportance
object containing
+permutation-based variable importances at the fold level
a sperrorestbenchmark
object containing
+information on the system the code is running on, starting and
+finishing times, number of available CPU cores, parallelization mode,
+number of parallel units, and runtime performance
a sperrorestpackageversion
object containing
+information about the sperrorest
package version
By default sperrorest
runs in parallel on all cores using
+foreach
with the future backend. If this is not desired, specify
+par_units
in par_args
or set par_mode = "sequential"
.
Available parallelization modes include par_mode = "apply"
+(calls pbmclapply on Unix, parApply on Windows) and
+future
(future_lapply).
+For the latter and par_mode = "foreach"
, par_option
+(defaults: multiprocess
and
+cluster
, respectively) can be specified. See plan for further details.
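The following is a minimal sketch of how the parallelization settings described above might be passed, assuming the ecuador data set shipped with the package and an rpart model as used in the Examples. The object names res_seq and mypred are illustrative only, and the two-worker variant in the comment is an assumption about a typical setup rather than a recommendation.

```r
library(sperrorest)
library(rpart)

data(ecuador)
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

# Illustrative predict wrapper: probability of the second class
mypred <- function(object, newdata) predict(object, newdata)[, 2]

# Run all repetitions and folds sequentially instead of in parallel
res_seq <- sperrorest(data = ecuador, formula = fo,
                      model_fun = rpart,
                      pred_fun = mypred,
                      smp_fun = partition_cv,
                      smp_args = list(repetition = 1:2, nfold = 5),
                      par_args = list(par_mode = "sequential"))
summary(res_seq$error_rep)

# Alternative (not run here): keep the default mode but limit the
# number of parallel units, e.g.
# par_args = list(par_mode = "foreach", par_units = 2)
```

Sequential mode is mainly useful for debugging a model or predict function before scaling up to parallel runs.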
Custom predict functions passed to pred_fun
, which consist of
+multiple custom-defined child functions, must be defined within a single function, i.e. the child functions must be nested inside it.
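A minimal sketch of such a self-contained predict function is given below. The helper clip01 and the wrapper mypred_nested are hypothetical names introduced only for illustration; the point is that the helper lives inside the body of the function passed to pred_fun, so it does not have to be found in the global environment.

```r
library(sperrorest)
library(rpart)

data(ecuador)
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

# Hypothetical predict wrapper with its helper defined inside the body
mypred_nested <- function(object, newdata) {
  # Hypothetical helper: restrict predictions to the interval [0, 1]
  clip01 <- function(p) pmin(pmax(p, 0), 1)
  clip01(predict(object, newdata)[, 2])
}

res_nested <- sperrorest(data = ecuador, formula = fo,
                         model_fun = rpart,
                         pred_fun = mypred_nested,
                         smp_fun = partition_cv,
                         smp_args = list(repetition = 1, nfold = 5),
                         par_args = list(par_mode = "sequential"))
summary(res_nested$error_rep)
```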
Brenning, A. 2012. Spatial cross-validation and bootstrap for +the assessment of prediction rules in remote sensing: the R package +'sperrorest'. +2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), +23-27 July 2012, p. 5372-5375.
+Brenning, A. 2005. Spatial prediction models for landslide hazards: review, +comparison and evaluation. Natural Hazards and Earth System Sciences, +5(6): 853-862.
+Brenning, A., S. Long & P. Fieguth. Forthcoming. Detecting rock glacier flow +structures using Gabor filters and IKONOS imagery. +Submitted to Remote Sensing of Environment.
+Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: +Management of spatial information. In 13th International Conference on +Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; +28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.
+Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for +yield prediction in Precision Agriculture. In Advances in Intelligent +Data Analysis IX, Proceedings, 9th International Symposium, +IDA 2010, Tucson, AZ, USA, 19-21 May 2010. +Lecture Notes in Computer Science, 6065 LNCS: 184-195.
+
+
+# NOT RUN {
+##------------------------------------------------------------
+## Classification tree example using non-spatial partitioning
+## setup and default parallel mode ("foreach")
+##------------------------------------------------------------
+
+data(ecuador) # Muenchow et al. (2012), see ?ecuador
+fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope
+
+library(rpart)
+# Predict wrapper returning the second column of the rpart predictions
+mypred_part <- function(object, newdata) predict(object, newdata)[, 2]
+ctrl <- rpart.control(cp = 0.005) # show the effects of overfitting
+fit <- rpart(fo, data = ecuador, control = ctrl)
+
+### Non-spatial 5-repeated 10-fold cross-validation:
+par_nsp_res <- sperrorest(data = ecuador, formula = fo,
+                          model_fun = rpart,
+                          model_args = list(control = ctrl),
+                          pred_fun = mypred_part,
+                          progress = TRUE,
+                          smp_fun = partition_cv,
+                          smp_args = list(repetition = 1:5, nfold = 10))
+summary(par_nsp_res$error_rep)
+summary(par_nsp_res$error_fold)
+summary(par_nsp_res$represampling)
+# plot(par_nsp_res$represampling, ecuador)
+
+### Spatial 5-repeated 10-fold cross-validation:
+par_sp_res <- sperrorest(data = ecuador, formula = fo,
+                         model_fun = rpart,
+                         model_args = list(control = ctrl),
+                         pred_fun = mypred_part,
+                         progress = TRUE,
+                         smp_fun = partition_kmeans,
+                         smp_args = list(repetition = 1:5, nfold = 10))
+summary(par_sp_res$error_rep)
+summary(par_sp_res$error_fold)
+summary(par_sp_res$represampling)
+# plot(par_sp_res$represampling, ecuador)
+
+# Compare training and test AUROC of both resampling schemes
+smry <- data.frame(
+  nonspat_training = unlist(summary(par_nsp_res$error_rep,
+                                    level = 1)$train_auroc),
+  nonspat_test     = unlist(summary(par_nsp_res$error_rep,
+                                    level = 1)$test_auroc),
+  spatial_training = unlist(summary(par_sp_res$error_rep,
+                                    level = 1)$train_auroc),
+  spatial_test     = unlist(summary(par_sp_res$error_rep,
+                                    level = 1)$test_auroc))
+boxplot(smry, col = c('red','red','red','green'),
+        main = 'Training vs. test, nonspatial vs. spatial',
+        ylab = 'Area under the ROC curve')
+
+##------------------------------------------------------------
+## Logistic regression example (glm) using partition_cv
+## and computation of permutation based variable importance
+##------------------------------------------------------------
+
+data(ecuador)
+fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope
+
+out <- sperrorest(data = ecuador, formula = fo,
+                  model_fun = glm,
+                  model_args = list(family = "binomial"),
+                  pred_fun = predict,
+                  pred_args = list(type = "response"),
+                  smp_fun = partition_cv,
+                  smp_args = list(repetition = 1:2, nfold = 4),
+                  par_args = list(par_mode = "future"),
+                  importance = TRUE, imp_permutations = 10)
+summary(out$error_rep)
+summary(out$importance)
+# }
+
Calculates sample sizes of training and test sets within repetitions and
+folds of a resampling
or represampling
object.
# S3 method for represampling
+summary(object, ...)
+
+# S3 method for resampling
+summary(object, ...)
+
+
object | +A |
+
---|---|
... | +currently ignored. |
+
A list of data.frame
s summarizing the sample sizes of training
+and test sets in each fold of each repetition.
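As a short, hedged illustration, the sketch below builds a small represampling object with partition_cv and summarizes it. It assumes the ecuador data set shipped with the package; the names resamp and smry_resamp are illustrative only.

```r
library(sperrorest)

data(ecuador)

# 2-repeated 5-fold non-spatial partitioning of the ecuador data
resamp <- partition_cv(ecuador, nfold = 5, repetition = 1:2)

# One summary per repetition, listing training and test set sizes per fold
smry_resamp <- summary(resamp)
smry_resamp[[1]]
```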
Summary methods provide varying levels of detail, while print methods +provide full details.
+
+
+# S3 method for sperrorestreperror
+summary(object, level = 0, na.rm = TRUE, ...)
+
+# S3 method for sperrorest
+summary(object, ...)
+
+# S3 method for sperrorestimportance
+print(x, ...)
+
+# S3 method for sperroresterror
+print(x, ...)
+
+# S3 method for sperrorestreperror
+print(x, ...)
+
+# S3 method for sperrorest
+print(x, ...)
+
+# S3 method for sperrorestbenchmarks
+print(x, ...)
+
+# S3 method for sperrorestpackageversion
+print(x, ...)
+
+
object | +a sperrorest object |
+
---|---|
level | +Level at which errors are summarized: +0: overall; 1: repetition; 2: fold |
+
na.rm | +Remove |
+
... | +additional arguments for summary.sperroresterror +or summary.sperrorestimportance |
+
x | +Depending on method, a sperrorest,
+ |
+
sperrorest, +summary.sperroresterror, +summary.sperrorestimportance
+ + +sperrorest
— summary.sperroresterror • sperrorestsperrorest
summary.sperroresterror
calculates mean, standard deviation,
+median etc. of the calculated error measures at the specified level
+(overall, repetition, or fold).
+summary.sperrorestreperror
does the same with the pooled error,
+at the overall or repetition level.
# S3 method for sperroresterror
+summary(object, level = 0, pooled = TRUE,
+  na.rm = TRUE, ...)
+
+
object | +
|
+
---|---|
level | +Level at which errors are summarized: +0: overall; 1: repetition; 2: fold |
+
pooled | +If |
+
na.rm | +Remove |
+
... | +additional arguments (currently ignored) |
+
Depending on the level of aggregation, a list
or
+data.frame
with mean, and at level 0 also standard deviation,
+median and IQR of the error measures.
Let's use an example to explain the pooled
 argument.
+E.g., assume we are using 100-repeated 10-fold cross-validation.
+If pooled = TRUE
 (default), the mean and standard deviation calculated
+when summarizing at level = 0
+are calculated across the error estimates obtained for
+each of the 100*10 = 1000
 folds.
+If pooled = FALSE
, mean and standard deviation are calculated across
+the 100
 repetitions, using the weighted average of the fold-level
+errors to calculate an error value for the entire sample.
+This essentially does not affect the mean value, but it does change the
+standard deviation of the error. pooled = FALSE
 is not recommended,
+it is mainly for testing purposes; when the test sets are small
+(as in leave-one-out cross-validation, in the extreme case),
+consider running sperrorest with error_rep = TRUE
and
+examine only the error_rep
component of its result.
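The difference between the two ways of summarizing can be illustrated with a small base R simulation. This is only a sketch under stated assumptions: the fold-level errors and test set sizes are simulated, all object names are hypothetical, and using test set sizes as weights is an assumption rather than a description of the package internals.

```r
set.seed(1)
n_rep  <- 100  # repetitions
n_fold <- 10   # folds per repetition

# Hypothetical fold-level error estimates (rows: repetitions, columns: folds)
err <- matrix(rnorm(n_rep * n_fold, mean = 0.3, sd = 0.05), nrow = n_rep)
# Hypothetical test set sizes, used as weights (assumption)
n_test <- matrix(sample(60:80, n_rep * n_fold, replace = TRUE), nrow = n_rep)

# Strategy 1: statistics across all 100 * 10 = 1000 fold-level estimates
fold_stats <- c(mean = mean(err), sd = sd(as.vector(err)))

# Strategy 2: weighted mean within each repetition first,
# then statistics across the 100 repetition-level values
rep_err <- sapply(seq_len(n_rep),
                  function(i) weighted.mean(err[i, ], w = n_test[i, ]))
rep_stats <- c(mean = mean(rep_err), sd = sd(rep_err))

fold_stats
rep_stats
```

In this simulation the two means are close, while the standard deviation across repetitions is smaller than across folds, because averaging within a repetition removes fold-to-fold variability.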
sperrorest
— summary.sperrorestimportance • sperrorestsperrorest
summary.sperrorestimportance
 calculates mean, standard deviation,
+median etc. of the calculated error measures at the specified level
+(overall, repetition, or fold).
# S3 method for sperrorestimportance
+summary(object, level = 0, na.rm = TRUE,
+  which = NULL, ...)
+
+
object | +
|
+
---|---|
level | +Level at which errors are summarized: +0: overall; 1: repetition; 2: fold |
+
na.rm | +Remove |
+
which | +optional character vector specifying selected variables for +which the importances should be summarized (to do: check implementation) |
+
... | +additional arguments (currently ignored) |
+
a list or data.frame, depending on the level
of aggregation
This is based on 'counting' up and down using the tile name.
+ + +tile_neighbors(nm, tileset, iterate = 0, diagonal = FALSE)+ +
nm | +Character string or factor: name of a tile, e.g., |
+
---|---|
tileset | +Admissible tile names; if missing and |
+
iterate | +internal - do not change default: to control behaviour in an +interactive call to this function. |
+
diagonal | +if |
+
Character string.
+ + +transfers output of parallel calls to runreps
+ + +transfer_parallel_output(my_res = NULL, res = NULL, impo = NULL, + pooled_error = NULL)+ + +