diff --git a/docs/LICENSE.html b/docs/LICENSE.html
new file mode 100644
index 00000000..5899a2d2
--- /dev/null
+++ b/docs/LICENSE.html
@@ -0,0 +1,796 @@
+License • sperrorest
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.  We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors.  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights.  Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received.  You must make sure that they, too, receive
+or can get the source code.  And you must show them these terms so they
+know their rights.
+
+  Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+  For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software.  For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+  Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so.  This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software.  The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable.  Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products.  If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+  Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary.  To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Use with the GNU Affero General Public License.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    {one line to give the program's name and a brief idea of what it does.}
+    Copyright (C) {year}  {name of author}
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    {project}  Copyright (C) {year}  {fullname}
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
+
+ +
+ +
+ + + +
+ + + diff --git a/docs/articles/Biblio.bib b/docs/articles/Biblio.bib new file mode 100644 index 00000000..36d60c21 --- /dev/null +++ b/docs/articles/Biblio.bib @@ -0,0 +1,143 @@ +% Encoding: UTF-8 + +@InProceedings{Brenning2012, + author = {A. Brenning}, + title = {Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The {R} package sperrorest}, + booktitle = {2012 {IEEE} {International} {Geoscience} and {Remote} {Sensing} {Symposium}}, + year = {2012}, + pages = {5372-5375}, + doi = {10.1109/IGARSS.2012.6352393}, + issn = {2153-6996}, + keywords = {Gabor filters;computer bootstrapping;data analysis;geographic information systems;geophysics computing;pattern classification;prediction theory;remote sensing;sampling methods;statistical analysis;support vector machines;IKONOS-derived Gabor texture features;R package sperrorest;bootstrap strategies;computational prediction methods;geospatial data;maximum-likelihood classification;nonspatial equivalent;open-source statistical data analysis software R;prediction rules;remote-sensing applications;rock-glacier flow structures;spatial autocorrelation;spatial context;spatial crossvalidation;spatial resampling-based estimation procedures;statistical prediction methods;support vector machine;terrain attribute data;Context;Data analysis;Estimation;Predictive models;Remote sensing;Rocks;Support vector machines;Gabor filters;Spatial cross-validation;classification accuracy;land cover classification;rock glaciers;spatial bootstrap}, +} + +@InBook{Russ2010b, + pages = {350--359}, + title = {Data Mining in Precision Agriculture: Management of Spatial Information}, + publisher = {Springer Berlin Heidelberg}, + year = {2010}, + author = {Russ, Georg and Brenning, Alexander}, + editor = {Hüllermeier, Eyke and Kruse, Rudolf and Hoffmann, Frank}, + address = {Berlin, Heidelberg}, + booktitle = {{Computational Intelligence for Knowledge-Based Systems Design: 13th International Conference on 
Information Processing}and {Management of Uncertainty, IPMU 2010, Dortmund, Germany, June 28 - July 2, 2010. Proceedings}}, + doi = {10.1007/978-3-642-14049-5_36}, + isbn = {978-3-642-14049-5}, + url = {http://dx.doi.org/10.1007/978-3-642-14049-5_36}, +} + +@InCollection{Russ2010a, + author = {Georg Russ and Alexander Brenning}, + title = {Spatial Variable Importance Assessment for Yield Prediction in Precision Agriculture}, + booktitle = {Lecture Notes in Computer Science}, + publisher = {Springer Science + Business Media}, + year = {2010}, + pages = {184--195}, + doi = {10.1007/978-3-642-13062-5_18}, + url = {http://dx.doi.org/10.1007/978-3-642-13062-5_18}, +} + +@Article{Brenning2005, + author = {A. Brenning}, + title = {Spatial prediction models for landslide hazards: review, comparison and evaluation}, + journal = {Natural Hazards and Earth System Science}, + year = {2005}, + volume = {5}, + number = {6}, + pages = {853--862}, + month = {nov}, + doi = {10.5194/nhess-5-853-2005}, + publisher = {Copernicus {GmbH}}, + url = {http://dx.doi.org/10.5194/nhess-5-853-2005}, +} + +@Article{Breiman1996, + author = {Leo Breiman}, + title = {Bagging Predictors}, + journal = {Machine Learning}, + year = {1996}, + volume = {24}, + number = {2}, + pages = {123--140}, + month = {aug}, + doi = {10.1007/bf00058655}, + publisher = {Springer Nature}, + url = {https://doi.org/10.1007%2Fbf00058655}, +} + +@Article{Breiman2001, + author = {Leo Breiman}, + title = {Random Forests}, + journal = {Machine Learning}, + year = {2001}, + volume = {45}, + number = {1}, + pages = {5--32}, + doi = {10.1023/a:1010933404324}, + publisher = {Springer Nature}, + url = {https://doi.org/10.1023%2Fa%3A1010933404324}, +} + +@Article{Pena2015, + author = {M.A. Pe{\~{n}}a and A. 
Brenning}, + title = {Assessing fruit-tree crop classification from Landsat-8 time series for the Maipo Valley, Chile}, + journal = {Remote Sensing of Environment}, + year = {2015}, + volume = {171}, + pages = {234--244}, + month = {dec}, + doi = {10.1016/j.rse.2015.10.029}, + publisher = {Elsevier {BV}}, + url = {https://doi.org/10.1016%2Fj.rse.2015.10.029}, +} + +@Book{James2013, + title = {An Introduction to Statistical Learning}, + publisher = {Springer New York}, + year = {2013}, + author = {Gareth James and Daniela Witten and Trevor Hastie and Robert Tibshirani}, + doi = {10.1007/978-1-4614-7138-7}, + url = {https://doi.org/10.1007%2F978-1-4614-7138-7}, +} + +@Article{Hothorn2005, + author = {Torsten Hothorn and Berthold Lausen}, + title = {Bundling classifiers by bagging trees}, + journal = {Computational Statistics {\&} Data Analysis}, + year = {2005}, + volume = {49}, + number = {4}, + pages = {1068--1078}, + month = {jun}, + doi = {10.1016/j.csda.2004.06.019}, + publisher = {Elsevier {BV}}, + url = {https://doi.org/10.1016%2Fj.csda.2004.06.019}, +} + +@Article{Goetz2015, + author = {J.N. Goetz and A. Brenning and H. Petschko and P. 
Leopold}, + title = {Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling}, + journal = {Computers {\&} Geosciences}, + year = {2015}, + volume = {81}, + pages = {1--11}, + month = {aug}, + doi = {10.1016/j.cageo.2015.04.007}, + publisher = {Elsevier {BV}}, + url = {https://doi.org/10.1016%2Fj.cageo.2015.04.007}, +} + +@Article{Knudby2010, + author = {Anders Knudby and Alexander Brenning and Ellsworth LeDrew}, + title = {New approaches to modelling fish{\textendash}habitat relationships}, + journal = {Ecological Modelling}, + year = {2010}, + volume = {221}, + number = {3}, + pages = {503--511}, + month = {feb}, + doi = {10.1016/j.ecolmodel.2009.11.008}, + publisher = {Elsevier {BV}}, + url = {https://doi.org/10.1016%2Fj.ecolmodel.2009.11.008}, +} + +@Comment{jabref-meta: databaseType:bibtex;} diff --git a/docs/articles/custom-pred-and-model-functions.html b/docs/articles/custom-pred-and-model-functions.html new file mode 100644 index 00000000..9c03ac1a --- /dev/null +++ b/docs/articles/custom-pred-and-model-functions.html @@ -0,0 +1,183 @@ + + + + + + + +Custom Predict and Model Functions • sperrorest + + + + + + +
+
+ + + +
+
+ + + + +
+
+

+Introduction

+

sperrorest is a generic framework which aims to work with all R models/packages. In statistical learning, model setups, their formulas and error measures all depend on the family of the response variable. Various families exist (numeric, binary, multiclass), which in turn include sub-families (e.g. a Gaussian or Poisson distribution for a numeric response).

+

This information has to be passed to the respective function. For example, when using glm() with a binary response, one needs to set family = "binomial" for the model to do something meaningful. The same often applies to the generic predict() function: for glm(), one needs to set type = "response" to obtain predicted probabilities instead of log-odds.

+

These settings can be specified using model_args and pred_args in sperrorest(). So why do we need to write custom model and predict wrapper functions at all?

+
+
+

+User-defined Model Functions

+
+

+Problem

+

model_fun must accept at least a formula argument and a data argument holding the learning sample (a data.frame). All arguments, including the additional ones provided via model_args, are passed to model_fun via do.call(). However, if model_fun has no argument named formula but, e.g., one named fixed (as is the case for glmmPQL()), the do.call() call fails: sperrorest() tries to pass an argument named formula, but glmmPQL() expects an argument named fixed.
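The naming problem can be reproduced with base R alone. The sketch below assumes, as described above, that the formula is passed under the name `formula` and that model_args are appended via do.call(); fit_fixed() and fit_fixed_wrapper() are hypothetical stand-ins for glmmPQL() and a wrapper around it, not sperrorest internals.

```r
# Hypothetical fitting function whose formula argument is called `fixed`
# (like MASS::glmmPQL); internally it just calls glm()
fit_fixed <- function(fixed, data, family) {
  glm(fixed, data = data, family = family)
}

# Hypothetical wrapper that accepts `formula` and renames it to `fixed`
fit_fixed_wrapper <- function(formula = NULL, data = NULL, family = NULL) {
  fit_fixed(fixed = formula, data = data, family = family)
}

set.seed(42)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, size = 1, prob = plogis(d$x))

# Arguments assembled as described above: formula + data + model_args
model_args <- list(family = "binomial")
call_args <- c(list(formula = y ~ x, data = d), model_args)

ok_fit <- do.call(glm, call_args)            # glm() has a `formula` argument
bad_fit <- tryCatch(do.call(fit_fixed, call_args),
                    error = function(e) e)   # fails: unused argument `formula`
good_fit <- do.call(fit_fixed_wrapper, call_args)
```

The wrapper only renames the argument, so it yields the same coefficients as calling glm() directly.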

+
+
+

+Solution

+

In this case, we need to write a wrapper function for glmmPQL() (named glmmPQL_modelfun here) that resolves this naming problem: the wrapper accepts the formula argument and passes the supplied formula object on to glmmPQL() as its fixed argument.
+glmmPQL() also requires further arguments such as family and random. To use them, we pass them via model_args, and sperrorest() appends them to the arguments of glmmPQL_modelfun.

+
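glmmPQL_modelfun <- function(formula = NULL, data = NULL, random = NULL,
+                             family = NULL) {
+  # glmmPQL() names its formula argument `fixed`, so rename it here;
+  # MASS:: avoids relying on library(MASS) having been called beforehand
+  fit <- MASS::glmmPQL(fixed = formula, data = data, random = random,
+                       family = family)
+  return(fit)
+}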
+
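One way to check such a wrapper outside of sperrorest() is to mimic the do.call() invocation described above on simulated grouped data. This sketch assumes that MASS and nlme (both shipped with standard R installations) are available. The simulated data, the column names and the model_args list are illustrative. The wrapper matches the one above, except that it also sets verbose = FALSE to silence glmmPQL()'s iteration log.

```r
library(MASS)  # glmmPQL(); nlme must be installed as well

glmmPQL_modelfun <- function(formula = NULL, data = NULL, random = NULL,
                             family = NULL) {
  glmmPQL(fixed = formula, data = data, random = random, family = family,
          verbose = FALSE)
}

# Simulated binary response with a random intercept per group
set.seed(1)
n_groups <- 20
n_per_group <- 30
d <- data.frame(group = factor(rep(seq_len(n_groups), each = n_per_group)),
                x = rnorm(n_groups * n_per_group))
group_effect <- rnorm(n_groups, sd = 0.5)
eta <- -0.5 + 1.2 * d$x + group_effect[as.integer(d$group)]
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Extra arguments that would be supplied via model_args in sperrorest()
model_args <- list(random = ~ 1 | group, family = binomial)
fit <- do.call(glmmPQL_modelfun,
               c(list(formula = y ~ x, data = d), model_args))
nlme::fixef(fit)  # fixed-effect estimates for (Intercept) and x
```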
+
+
+

+User-defined Predict Functions

+
+

+Problem

+

Unless specified explicitly, sperrorest() uses the generic predict() function. Its behavior depends on the class of the fitted model, and predict methods differ in the naming (and availability) of their arguments. For example, when fitting a Support Vector Machine (SVM) with a binary response variable, package kernlab expects the argument type = "probabilities" in its predict() call to return predicted probabilities, while package e1071 expects probability = TRUE. As with model_args, this can be handled via the pred_args argument of sperrorest().

+

However, sperrorest() expects the object returned by predict() to be the predicted values themselves (for any response type). While this holds for many models, mainly those with a numeric response, classification models often behave differently: the predicted values (classes in this case) are often stored in a component named class or predicted.
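As a concrete instance, predict() for MASS::lda() returns a list rather than a vector. The sketch below uses R's built-in iris data purely for illustration; lda_class_predfun() is a hypothetical name for a minimal wrapper that returns only the class component.

```r
library(MASS)  # lda() and its predict() method

fit <- lda(Species ~ ., data = iris)
pred_obj <- predict(fit, newdata = iris)
names(pred_obj)  # a list with components "class", "posterior" and "x"

# Hypothetical pred_fun-style wrapper returning only the predicted classes
lda_class_predfun <- function(object, newdata) {
  predict(object, newdata = newdata)$class
}
pred <- lda_class_predfun(fit, iris)
mean(pred != iris$Species)  # training-sample misclassification rate
```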

+
+
+

+Solution

+

Since there is no general way to handle this (each package may return the predicted values in a different format or component), we provide a custom predict function that returns only the predicted values, so that sperrorest() can proceed. We show two examples. The first is again a binary classification, this time using randomForest.

+
+

+randomForest

+

When calling predict() with type = "prob" on a fitted randomForest model with a binary response variable, the predicted values are in fact stored directly in the object returned by predict() (here called pred). So why is there a problem?
+Because pred is a matrix containing the probabilities of both the FALSE (= 0) and the TRUE (= 1) class. sperrorest() needs a vector containing only the predicted probabilities of the TRUE class, which it passes on to err_fun() to calculate all the error measures. The important part is therefore to subset the matrix in pred to the TRUE column and return the result.
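The subsetting step can be made independent of column order by selecting the positive class by name. This is a minimal sketch on a hand-made probability matrix. positive_class_prob() is a hypothetical helper, and it assumes the columns are named after the response levels ("FALSE", "TRUE").

```r
# Hand-made stand-in for a predicted probability matrix (rows sum to 1)
prob <- matrix(c(0.9, 0.2, 0.6,
                 0.1, 0.8, 0.4),
               ncol = 2,
               dimnames = list(NULL, c("FALSE", "TRUE")))

# Hypothetical helper: return the probabilities of the positive class
# as a plain vector, selected by column name rather than by position
positive_class_prob <- function(prob, positive = "TRUE") {
  if (!positive %in% colnames(prob)) {
    stop("no column named '", positive, "'")
  }
  unname(prob[, positive])
}

p_true <- positive_class_prob(prob)
# The result is unchanged if the columns come in a different order
p_true_swapped <- positive_class_prob(prob[, c("TRUE", "FALSE")])
```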

+
rf_predfun <- function(object = NULL, newdata = NULL, type = "prob") {
+  pred <- predict(object = object, newdata = newdata, type = type)
+  # keep only the probabilities of the second factor level (TRUE)
+  return(pred[, 2])
+}
+
+
+

+svm

+

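The same case (binary response), this time using svm() from the e1071 package. Here, the predicted probabilities are stored in the "probabilities" attribute of pred, which we can access with attr(). This only works if the model was fitted with probability = TRUE (e.g. via model_args). Again, we only need the probabilities of the TRUE class for sperrorest().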

+
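svm_predfun <- function(object = NULL, newdata = NULL, probability = TRUE) {
+  pred <- predict(object, newdata = newdata, probability = probability)
+  # select the TRUE column by name: the column order of this matrix
+  # need not follow the order of the factor levels
+  return(attr(pred, "probabilities")[, "TRUE"])
+}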
+
+
+
+
+
+ + + +
+ + + +
+ + + diff --git a/docs/articles/index.html b/docs/articles/index.html new file mode 100644 index 00000000..1e9465dd --- /dev/null +++ b/docs/articles/index.html @@ -0,0 +1,128 @@ + + + + + + + + +Articles • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + +
+ + + +
+ +
+ + +
+ + + diff --git a/docs/articles/parallel-modes.html b/docs/articles/parallel-modes.html new file mode 100644 index 00000000..e62b70b8 --- /dev/null +++ b/docs/articles/parallel-modes.html @@ -0,0 +1,209 @@ + + + + + + + +Parallel Modes of <code>sperrorest</code> • sperrorest + + + + + + +
+
+ + + +
+
+ + + + +
+
+

+Introduction

+

Starting with v2.0.0, sperrorest runs in parallel by default.

+

Most users are not familiar with parallelization and have neither the time nor the motivation to wrap their heads around it. Instead, they simply accept waiting “a bit” longer until the process finishes.

+

While this is no problem for “quick” cross-validation (CV) cases with a low number of repetitions and models that converge quickly, in some cases processing may take up to several months. For example, a spatial cross-validation of a Generalized Linear Mixed Model (GLMM) with both random effects and a spatial autocorrelation structure on around 1000 observations takes roughly this long when executed sequentially. Most of the fitting time is spent on the spatial autocorrelation structure.

+

sperrorest comes with four different parallelization modes and also offers sequential execution.

+

Unless specified otherwise, all cores of the machine are used. Limiting the number of cores makes sense if you want to keep working on your machine while a cross-validation runs, so that your system stays responsive. It can also pay off on a server: suppose you have 48 cores available and want to run a CV with 100 repetitions. Since most models take roughly the same time to fit, it would be smart to use 34 cores. This is faster than using 48 cores because

+
    +
  1. Both setups need three iterations to process all repetitions: with 34 cores, 34 repetitions are finished after the first iteration, 68 after the second and all 100 after the third; with 48 cores, it is 48, 96 and 100. The 14 additional cores therefore do not save an iteration, and during the third iteration 44 of the 48 cores would do nothing but wait for the others to finish.

  2. The parallelization overhead, which is mainly caused by distributing the jobs to the workers and combining their results, would be higher with 48 cores than with 34 cores. Hence, 34 cores will finish 100 repetitions faster than 48 cores. Of course, with 50 cores only two iterations would be needed, which would again speed up the process.
+
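The iteration counts in the list above follow from simple arithmetic. This sketch assumes, as stated, that every repetition takes roughly the same time and that each core processes one repetition at a time.

```r
n_jobs <- 100                      # repetitions to process
cores <- c(34, 48, 50)             # candidate numbers of cores

rounds <- ceiling(n_jobs / cores)  # iterations needed per setup
idle_last <- cores * rounds - n_jobs  # cores left idle in the last iteration

data.frame(cores, rounds, idle_in_last_round = idle_last)
```

With 34 or 48 cores, three iterations are needed (2 and 44 idle cores in the last one, respectively), while 50 cores finish in two iterations without idle cores.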
+
+

+The future backend

+

All modes except "apply" (including the sequential one) run on the parallel API of the future package. It offers a unified, cross-platform API that combines many existing parallel approaches of R in one package. Besides a variety of parallel options (multiprocess, multisession, multicore, cluster, etc.), it also provides a sequential option. Every option is initiated in the same way:

+
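library(future)
+library(doFuture) # provides registerDoFuture()
+registerDoFuture()
+
+# choose one of the following plans:
+plan("sequential") # sequential
+plan("multicore") # parallel (Unix only)
+plan("multisession") # parallel
+plan("multiprocess") # parallel (deprecated in recent versions of future)
+plan("cluster") # parallel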
+

Every option has its advantages and disadvantages. Check the future package vignettes for more information.

+
+
+

+Mode “foreach”

+

Unless specified otherwise, the default parallel mode uses foreach with the "cluster" plan of the future package. The doFuture package ensures that foreach uses the parallel backend set up via future.

+

This option is the default because it works cross-platform and provides progress output to the console. Unfortunately, on Windows this output is not shown in the console but is written to a file (by default in the current working directory). Another downside is that the global environment needs to be copied to every worker before processing starts. Workers are started sequentially, so starting more than 10 workers may take several seconds.

+
+
+

+Mode “apply”

+

This mode is also cross-platform but uses different functions on Unix and non-Unix systems for the actual processing. On Unix, it uses the pbmcapply package, which adds a progress bar to the forking-based mc*apply functions of the parallel package. On Windows, pbapply is used, which runs the jobs on a cluster set up with the parallel package and also shows a progress bar.
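For readers new to cluster-based processing, the sketch below shows the underlying idea with base R's parallel package only: one job per repetition is sent to a socket cluster. It is an illustrative analogue without a progress bar, not the code that sperrorest or pbapply runs. one_repetition() is a hypothetical placeholder for fitting and evaluating a model.

```r
library(parallel)

# Hypothetical placeholder for "fit and evaluate the model for one repetition"
one_repetition <- function(rep) {
  set.seed(rep)  # reproducible per repetition, regardless of the worker
  mean(rnorm(100))
}

cl <- makeCluster(2)  # socket cluster: works on Windows and Unix
res_par <- parLapply(cl, 1:4, one_repetition)
stopCluster(cl)

res_seq <- lapply(1:4, one_repetition)  # sequential reference
all.equal(res_par, res_seq)
```

Seeding inside each job keeps the parallel results identical to a sequential run, independent of which worker processes which repetition.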

+
+
+

+Mode “future”

+

This mode relies entirely on the future package, with future_lapply() as the workhorse. It can be used with any future plan specified via par_option. It is the fastest mode but provides no progress output.

+
+
+

+Mode “sequential”

+

This mode executes sperrorest() sequentially. It also runs on the future API, using foreach/doFuture in combination with plan("sequential").

+
+
+

+Performance comparison

+

Example setup:

+
    +
  • Machine: 48 cores, Debian 9 (stretch)
  • 100 repetitions, 5 folds
  • 100 variable importance permutations using all variables
  • non-spatial partitioning (partition_cv)
  • Model: glm
  • Response type: binary
  • Progress: None
+

Note that par_mode is the only argument that was changed between runs: par_mode = "foreach", par_mode = "apply" and par_mode = "future" were used in turn.

+

All modes were run with their default settings: par_mode = "foreach" runs on plan("cluster"), while par_mode = "future" runs on plan("multiprocess"). Mode "apply" used pbmcapply, since the test was run on a Unix system.

+
library(sperrorest)
+data(ecuador)
+fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope
+
+sperrorest(data = ecuador, formula = fo,
+           model_fun = glm, model_args = list(family = "binomial"),
+           pred_args = list(type = "response"),
+           smp_fun = partition_cv,
+           smp_args = list(repetition = 1:100, nfold = 5),
+           par_args = list(par_mode = "foreach", par_units = 20),
+           benchmark = TRUE, progress = FALSE,
+           importance = TRUE, imp_permutations = 100)
               foreach   apply   future
runtime (min)    52.33   51.67    49.54
+
+
+
+ + + +
+ + + +
+ + + diff --git a/docs/articles/spatial-modeling-use-case.html b/docs/articles/spatial-modeling-use-case.html new file mode 100644 index 00000000..69eb4d2d --- /dev/null +++ b/docs/articles/spatial-modeling-use-case.html @@ -0,0 +1,473 @@ + + + + + + + +Spatial Modeling Using Statistical Learning Techniques • sperrorest + + + + + + +
+
+ + + +
+
+ + + + +
+
+

+Introduction

+

Geospatial data scientists often make use of a variety of statistical and machine learning techniques for spatial prediction in applications such as landslide susceptibility modeling (Goetz et al. 2015) or habitat modeling (Knudby, Brenning, and LeDrew 2010). Novel and often more flexible techniques promise improved predictive performance as they are better able to represent nonlinear relationships or higher-order interactions between predictors than less flexible linear models.

+

Nevertheless, this increased flexibility comes with the risk of over-fitting to the training data. Since nearby spatial observations often tend to be more similar than distant ones, traditional random cross-validation is unable to detect this over-fitting whenever spatial observations are close to each other (e.g. Brenning 2005). Spatial cross-validation addresses this by resampling the data not completely at random, but using larger spatial regions. In some cases, spatial data are grouped: in remotely-sensed land use mapping, for example, grid cells belonging to the same field share the same management procedures and cultivation history, making them more similar to each other than to pixels from other fields with the same crop type.

+

This package provides a customizable toolkit for cross-validation (and bootstrap) estimation using a variety of spatial resampling schemes. Moreover, the toolkit can be extended to spatio-temporal data or other complex data structures. This vignette walks you through a simple case study: crop classification in central Chile (Peña and Brenning 2015).

+

This vignette is based on code that Alex Brenning developed for his course on ‘Environmental Statistics and GeoComputation’ that he teaches in the Geographic Information Science Master’s program at Friedrich Schiller University Jena, Germany. Please take a look at our program and spread the word!

+
+
+

+Data and Packages

+

As a case study, we carry out a supervised classification analysis using remotely-sensed data to predict fruit-tree crop types in central Chile. The data set is a subsample of the data used by Peña and Brenning (2015).

+
library(pacman)
+p_load(sperrorest)
+
data("maipo", package = "sperrorest")
+

The remote-sensing predictor variables were derived from an image time series consisting of eight Landsat images acquired throughout the (southern hemisphere) growing season. The data set includes the following variables:

+

Response
+- croptype: response variable (factor) with 4 levels: ground truth information

+

Predictors
+- b[12-87]: spectral data, e.g. b82 = image date #8, spectral band #2
+- ndvi[01-08]: Normalized Difference Vegetation Index, e.g. ndvi08 = image date #8
+- ndwi[01-08]: Normalized Difference Water Index, e.g. ndwi08 = image date #8

+

Others
+- field: field identifier (grouping variable - not to be used as predictor)
+- utmx, utmy: x/y location; not to be used as predictors

+

All but the first four variables of the data set are predictors; their names are used to construct a formula object:

+
predictors <- colnames(maipo)[5:ncol(maipo)]
+# Construct a formula:
+fo <- as.formula(paste("croptype ~", paste(predictors, collapse = "+")))
+
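Two alternatives for building the same formula are shown below on a small hypothetical stand-in for maipo with the same layout (four non-predictor columns first). Selecting predictors by name with setdiff() does not depend on the column order, and reformulate() avoids pasting strings together.

```r
# Hypothetical toy data with the same column layout as maipo
d <- data.frame(croptype = factor(c("crop1", "crop2")), field = factor(1:2),
                utmx = c(0, 1), utmy = c(0, 1),
                b12 = c(0.1, 0.2), ndvi01 = c(0.3, 0.4))

# Positional selection, as above
predictors <- colnames(d)[5:ncol(d)]
fo_paste <- as.formula(paste("croptype ~", paste(predictors, collapse = "+")))

# Name-based selection plus reformulate()
non_predictors <- c("croptype", "field", "utmx", "utmy")
predictors2 <- setdiff(colnames(d), non_predictors)
fo_reform <- reformulate(predictors2, response = "croptype")
fo_reform  # croptype ~ b12 + ndvi01
```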
+
+

+Modeling

+

Here we will take a look at a few classification methods with varying degrees of computational complexity and flexibility. This should give you an idea of how different models are handled by sperrorest, depending on the characteristics of their fitting and prediction methods. Please refer to James et al. (2013) for background information on the models used here.

+
+

+Linear Discriminant Analysis (LDA)

+

LDA is simple and fast, and often performs surprisingly well if the problem at hand is ‘linear enough’. As a start, let’s fit a model with all predictors and using all available data:

+
p_load(MASS)
+fit <- lda(fo, data = maipo)
+

Predict the croptype with the fitted model and calculate the misclassification error rate (MER) on the training sample:

+
pred <- predict(fit, newdata = maipo)$class
+mean(pred != maipo$croptype)
+
## [1] 0.0437
+

But remember that this result is over-optimistic because we are re-using the training sample for model evaluation. We will soon show you how to do better with cross-validation.

+

We can also take a look at the confusion matrix but again, this result is overly optimistic:

+
table(pred = pred, obs = maipo$croptype)
+
##         obs
+##   pred  crop1 crop2 crop3 crop4
+##   crop1  1294     8     4    37
+##   crop2    50  1054     4    44
+##   crop3     0     0  1935     6
+##   crop4    45   110    29  3093
+
+
+

+Classification Tree

+

Classification and regression trees (CART) take a completely different approach: they are based on yes/no questions in the predictor variables and can be referred to as a binary partitioning technique. Fit a model with all predictors and default settings:

+
p_load(rpart)
+
fit <- rpart(fo, data = maipo)
+
+## optional: view the classification tree
+# par(xpd = TRUE)
+# plot(fit)
+# text(fit, use.n = TRUE)
+

Again, predict the croptype with the fitted model and calculate the average MER:

+
pred <- predict(fit, newdata = maipo, type = "class")
+mean(pred != maipo$croptype)
+
## [1] 0.113
+

Note that the predict() call differs slightly here: the predict() method for rpart needs type = "class" to return class labels. Again, we could calculate a confusion matrix:

+
table(pred = pred, obs = maipo$croptype)
+
##         obs
+##   pred  crop1 crop2 crop3 crop4
+##   crop1  1204    66     0    54
+##   crop2    47   871    38   123
+##   crop3    38     8  1818    53
+##   crop4   100   227   116  2950
+
+
+

+RandomForest

+

Bagging, bundling and random forests build upon the CART technique by fitting many trees on bootstrap resamples of the original data set (Breiman 1996; Breiman 2001; Hothorn and Lausen 2005). They differ in that random forests also sample from the predictors, and bundling adds an ancillary classifier for improved classification. Here we use the widely used randomForest() function.

+
p_load(randomForest)
+
fit <- randomForest(fo, data = maipo, coob = TRUE)
+fit
+
## 
+## Call:
+##  randomForest(formula = fo, data = maipo, coob = TRUE) 
+##                Type of random forest: classification
+##                      Number of trees: 500
+## No. of variables tried at each split: 8
+## 
+##         OOB estimate of  error rate: 0.57%
+## Confusion matrix:
+##       crop1 crop2 crop3 crop4 class.error
+## crop1  1382     2     0     5     0.00504
+## crop2     1  1163     0     8     0.00768
+## crop3     0     0  1959    13     0.00659
+## crop4     7     5     3  3165     0.00472
+

Let’s take a look at the MER achieved on the training sample:

+
pred <- predict(fit, newdata = maipo, type = "class")
+mean(pred != maipo$croptype)
+
## [1] 0
+
table(pred = pred, obs = maipo$croptype)
+
##         obs
+##   pred  crop1 crop2 crop3 crop4
+##   crop1  1389     0     0     0
+##   crop2     0  1172     0     0
+##   crop3     0     0  1972     0
+##   crop4     0     0     0  3180
+

Isn’t this amazing? Not a single grid cell of the training sample is misclassified by the random forest, and even the OOB (out-of-bag) estimate of the error rate is below 1%.
+Too good to be true? We’ll see…

+
+
+
+

+Cross-Validation Estimation of Predictive Performance

+

Of course we can’t take the MER on the training set too seriously: it is optimistically biased. But we’ve heard of cross-validation, in which disjoint subsets are used for model training and testing. Let’s use sperrorest for cross-validation.

+

At this point, we should also highlight that the observations in this data set are pixels, and that multiple grid cells belong to the same field. In a prediction setting where field boundaries are known (as is the case here), we would want to predict the same class for all grid cells that belong to the same field. We therefore use a majority filter: it assigns to every grid cell of a field the croptype that is predicted most often within that field.
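The effect of such a filter can be illustrated on a few hand-made predictions. majority_by_field() is a hypothetical helper written for this sketch. Unlike lda_predfun() below, which uses nnet::which.is.max() and therefore breaks ties at random, it uses which.max() and resolves ties in favour of the first factor level.

```r
# Hypothetical helper: replace each prediction by the most frequent
# predicted class within its field (ties go to the first factor level)
majority_by_field <- function(pred, field) {
  stopifnot(is.factor(pred), length(pred) == length(field))
  out <- pred
  for (f in unique(field)) {
    idx <- field == f
    counts <- table(pred[idx])  # counts for all levels, including zeros
    out[idx] <- names(counts)[which.max(counts)]
  }
  out
}

pred <- factor(c("crop1", "crop1", "crop2", "crop3", "crop3", "crop3", "crop2"),
               levels = c("crop1", "crop2", "crop3", "crop4"))
field <- c(1, 1, 1, 2, 2, 2, 2)
filtered <- majority_by_field(pred, field)
filtered  # field 1 becomes all crop1, field 2 all crop3
```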

+
+

+Linear Discriminant Analysis (LDA)

+

First, we need to create a wrapper predict method for LDA for sperrorest(). This is necessary in order to accommodate the majority filter, and also because class predictions from lda’s predict method are hidden in the $class component of the returned object.

+
lda_predfun <- function(object, newdata, fac = NULL) {
+  
+  p_load(nnet)
+  majority <- function(x) {
+    levels(x)[which.is.max(table(x))]
+  }
+  
+  majority_filter <- function(x, fac) {
+    for (lev in levels(fac)) {
+      x[fac == lev] <- majority(x[fac == lev])
+    }
+    x
+  }
+  
+  pred <- predict(object, newdata = newdata)$class
+  if (!is.null(fac)) pred <- majority_filter(pred, newdata[, fac]) 
+  return(pred)
+}
+

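To ensure that custom predict functions work with sperrorest(), all helper functions they rely on need to be defined inside one single function (as majority() and majority_filter() are defined inside lda_predfun() above). Otherwise, sperrorest() might fail during execution, for example when helper functions defined in the global environment are not available on the parallel workers.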
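One plausible way such a failure can arise is sketched below with base R's parallel package. A function shipped to a worker process carries its local definitions with it, but a helper that only exists in the master's global environment may be missing on the worker. Whether this matters inside sperrorest() depends on the parallel mode (the foreach mode, for instance, copies the global environment to the workers). All function names below are hypothetical.

```r
library(parallel)

# Helper defined at top level, i.e. only in the master's global environment
top_level_helper <- function(x) x * 2
uses_global_helper <- function(x) top_level_helper(x)

# Same logic, but the helper is defined inside the function itself
self_contained <- function(x) {
  inner_helper <- function(x) x * 2
  inner_helper(x)
}

cl <- makeCluster(1)
# Typically fails: the worker does not know top_level_helper()
res_global <- tryCatch(parLapply(cl, 1:3, uses_global_helper),
                       error = function(e) conditionMessage(e))
res_self <- parLapply(cl, 1:3, self_contained)  # works on the worker
stopCluster(cl)
```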

+

Finally, we can run sperrorest() with a non-spatial sampling setting (partition_cv()). In this example we use a ‘100 repetitions - 5 folds’ setup to reduce the influence of random partitioning.

+
res_lda_nsp <- sperrorest(fo, data = maipo, coords = c("utmx","utmy"), 
+                          model_fun = lda,
+                          pred_fun = lda_predfun, 
+                          pred_args = list(fac = "field"),
+                          smp_fun = partition_cv, 
+                          smp_args = list(repetition = 1:100, nfold = 5),
+                          error_rep = TRUE, error_fold = TRUE,
+                          progress = FALSE)
+
summary(res_lda_nsp$error_rep)
+
##                    mean    sd   median   IQR
+## train_error    3.40e-02 0.001 3.40e-02 0.001
+## train_accuracy 9.66e-01 0.001 9.66e-01 0.001
+## train_events   4.69e+03 0.000 4.69e+03 0.000
+## train_count    3.09e+04 0.000 3.09e+04 0.000
+## test_error     4.00e-02 0.002 4.00e-02 0.002
+## test_accuracy  9.60e-01 0.002 9.60e-01 0.002
+## test_events    1.17e+03 0.000 1.17e+03 0.000
+## test_count     7.71e+03 0.000 7.71e+03 0.000
+

To run a spatial cross-validation at the field level, we can use partition_factor_cv() as the sampling function. With 5 folds, we get a coarse 80/20 split of the data in each fold: roughly 80% of the observations are used for training and 20% for testing the trained model. Because entire fields are assigned to folds, the exact proportions vary with the field sizes.
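The idea of field-level partitioning can be sketched in a few lines of base R. Whole fields, not individual grid cells, are assigned to folds, so each test fold holds roughly, but not exactly, 20% of the cells. This is an illustration on simulated field sizes, not the actual implementation of partition_factor_cv().

```r
set.seed(1)
# Simulated grid cells: 40 fields with 50 to 250 cells each
field <- factor(rep(seq_len(40), times = sample(50:250, 40, replace = TRUE)))
nfold <- 5

# Assign whole fields (not individual cells) to folds, 8 fields per fold
field_levels <- levels(field)
fold_of_field <- sample(rep(seq_len(nfold), length.out = length(field_levels)))
names(fold_of_field) <- field_levels

# Every cell inherits the fold of its field
cell_fold <- fold_of_field[as.character(field)]

# Share of grid cells per fold: near 0.2, varying with field sizes
fold_share <- prop.table(table(cell_fold))
round(fold_share, 3)
```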

+

To see how the training and test sets are partitioned in each fold, we can plot them. Red points represent the test set of each fold, black points the training set. Because more than 7000 points are plotted, overplotting occurs. The red crosses are drawn after the black ones, so the test set visually appears much larger than the actual ~20% of the points.

+
resamp <- partition_factor_cv(maipo, nfold = 5, repetition = 1:1, fac = "field")
+plot(resamp, maipo, coords = c("utmx","utmy"))
+
+ +
+

Next, we specify the grouping variable that identifies the fields (fac = "field") both in the prediction arguments (pred_args) and in the sampling arguments (smp_args) of sperrorest().

+
res_lda_sp <- sperrorest(fo, data = maipo, coords = c("utmx","utmy"), 
+                         model_fun = lda,
+                         pred_fun = lda_predfun, 
+                         pred_args = list(fac = "field"),
+                         smp_fun = partition_factor_cv,
+                         smp_args = list(fac = "field", repetition = 1:50, nfold = 5),
+                         error_rep = TRUE, error_fold = TRUE, 
+                         benchmark = TRUE, progress = FALSE)
+res_lda_sp$benchmark$runtime_performance
+
summary(res_lda_sp$error_rep)
+
##                    mean      sd   median     IQR
+## train_error    2.95e-02 0.00177 2.97e-02 0.00261
+## train_accuracy 9.70e-01 0.00177 9.70e-01 0.00261
+## train_events   4.69e+03 0.00000 4.69e+03 0.00000
+## train_count    3.09e+04 0.00000 3.09e+04 0.00000
+## test_error     6.65e-02 0.00807 6.59e-02 0.01083
+## test_accuracy  9.33e-01 0.00807 9.34e-01 0.01083
+## test_events    1.17e+03 0.00000 1.17e+03 0.00000
+## test_count     7.71e+03 0.00000 7.71e+03 0.00000
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
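Putting the two LDA summaries side by side makes the effect of spatial partitioning explicit. The snippet below only does arithmetic on the mean test errors printed above: 0.040 with partition_cv() and 0.0665 with partition_factor_cv().

```r
# Mean test errors of LDA, copied from the two error_rep summaries above
err_nonspatial <- 0.0400   # partition_cv()
err_spatial    <- 0.0665   # partition_factor_cv() with fac = "field"
abs_diff <- err_spatial - err_nonspatial       # 0.0265, i.e. 2.65 percentage points
rel_inc  <- err_spatial / err_nonspatial - 1   # about 0.66, i.e. roughly 66 % higher
c(abs_diff = abs_diff, rel_inc = rel_inc)
```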

+RandomForest

+

For Random Forest, the customized pred_fun looks as follows. It is only required because of the majority filter; without the filter, we could simply omit the pred_fun and pred_args arguments below.

+
rf_predfun <- function(object, newdata, fac = NULL) {
+  
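+  # load nnet for which.is.max() (p_load() is from the pacman package)
+  p_load(nnet)
+  # majority(): most frequent class among the predictions x (ties broken at random)
+  majority <- function(x) {
+    levels(x)[which.is.max(table(x))]
+  }
+  
+  # majority_filter(): assign each field's majority class to all its grid cells
+  majority_filter <- function(x, fac) {
+    for (lev in levels(fac)) {
+      x[fac == lev] <- majority(x[fac == lev])
+    }
+    x
+  }
+  
+  # for classification, predict.randomForest() returns the predicted classes as a factor
+  pred <- predict(object, newdata = newdata)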
+  if (!is.null(fac)) pred <- majority_filter(pred, newdata[,fac]) 
+  return(pred)
+}
+
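lda_predfun() and rf_predfun() differ only in how the predicted classes are extracted from the predict() output. For lda they are in $class; for randomForest they are the result itself. As a design alternative, a hypothetical factory function could build both predict functions from one template. This is a sketch only and not part of sperrorest.

The returned closure keeps its helper inside its own body, which follows the recommendation to keep helpers in one function. It also iterates over unique() group values, so a character grouping column works too; levels() returns NULL for character vectors. The toy model class is made up for the demonstration.

```r
# Hypothetical factory: builds a majority-filtered pred_fun for different models.
# `extract` turns the raw predict() output into a factor of classes,
# e.g. function(p) p$class for MASS::lda() or identity for randomForest().
make_filtered_predfun <- function(extract = identity) {
  force(extract)
  function(object, newdata, fac = NULL) {
    # helper lives inside the returned function
    majority <- function(x) levels(x)[which.max(table(x))]
    pred <- extract(predict(object, newdata = newdata))
    if (!is.null(fac)) {
      groups <- as.character(newdata[, fac])
      for (g in unique(groups)) {
        idx <- groups == g
        pred[idx] <- majority(pred[idx])
      }
    }
    pred
  }
}

# Made-up model class for the demo: "predicts" the label column as a factor
predict.toy_model <- function(object, newdata, ...) {
  factor(newdata$label, levels = object$levels)
}
toy <- structure(list(levels = c("apple", "grape")), class = "toy_model")
nd <- data.frame(label = c("apple", "grape", "apple", "grape", "grape"),
                 field = c("f1", "f1", "f1", "f2", "f2"),
                 stringsAsFactors = FALSE)

rf_like_predfun <- make_filtered_predfun(identity)
out <- rf_like_predfun(toy, nd, fac = "field")
print(out)
```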

Running sperrorest() takes some time here (test machine: 3.2 GHz Intel Core i5 with 4 physical cores).

+
res_rf_sp <- sperrorest(fo, data = maipo, coords = c("utmx","utmy"), 
+                        model_fun = randomForest,
+                        pred_fun = rf_predfun,
+                        pred_args = list(fac = "field"),
+                        smp_fun = partition_factor_cv,
+                        smp_args = list(fac = "field",
+                                        repetition = 1:50, nfold = 5),
+                        error_rep = TRUE, error_fold = TRUE,
+                        benchmark = TRUE, progress = 2)
+
## Mon Feb 27 20:56:01 2017 Repetition 1 
+## Mon Feb 27 20:57:12 2017 Repetition 2 
+## Mon Feb 27 20:58:20 2017 Repetition 3 
+## Mon Feb 27 20:59:29 2017 Repetition 4 
+## Mon Feb 27 21:00:36 2017 Repetition 5 
+## Mon Feb 27 21:01:46 2017 Repetition 6 
+## Mon Feb 27 21:02:55 2017 Repetition 7 
+## Mon Feb 27 21:04:01 2017 Repetition 8 
+## Mon Feb 27 21:05:07 2017 Repetition 9 
+## Mon Feb 27 21:06:16 2017 Repetition 10 
+## Mon Feb 27 21:07:23 2017 Repetition 11 
+## Mon Feb 27 21:08:30 2017 Repetition 12 
+## Mon Feb 27 21:09:38 2017 Repetition 13 
+## Mon Feb 27 21:10:45 2017 Repetition 14 
+## Mon Feb 27 21:11:53 2017 Repetition 15 
+## Mon Feb 27 21:13:01 2017 Repetition 16 
+## Mon Feb 27 21:14:09 2017 Repetition 17 
+## Mon Feb 27 21:15:16 2017 Repetition 18 
+## Mon Feb 27 21:16:23 2017 Repetition 19 
+## Mon Feb 27 21:17:31 2017 Repetition 20 
+## Mon Feb 27 21:18:39 2017 Repetition 21 
+## Mon Feb 27 21:19:46 2017 Repetition 22 
+## Mon Feb 27 21:20:53 2017 Repetition 23 
+## Mon Feb 27 21:22:03 2017 Repetition 24 
+## Mon Feb 27 21:23:13 2017 Repetition 25 
+## Mon Feb 27 21:24:23 2017 Repetition 26 
+## Mon Feb 27 21:25:32 2017 Repetition 27 
+## Mon Feb 27 21:26:39 2017 Repetition 28 
+## Mon Feb 27 21:27:47 2017 Repetition 29 
+## Mon Feb 27 21:28:55 2017 Repetition 30 
+## Mon Feb 27 21:30:03 2017 Repetition 31 
+## Mon Feb 27 21:31:11 2017 Repetition 32 
+## Mon Feb 27 21:32:18 2017 Repetition 33 
+## Mon Feb 27 21:33:25 2017 Repetition 34 
+## Mon Feb 27 21:34:33 2017 Repetition 35 
+## Mon Feb 27 21:35:40 2017 Repetition 36 
+## Mon Feb 27 21:36:47 2017 Repetition 37 
+## Mon Feb 27 21:37:54 2017 Repetition 38 
+## Mon Feb 27 21:39:02 2017 Repetition 39 
+## Mon Feb 27 21:40:09 2017 Repetition 40 
+## Mon Feb 27 21:41:17 2017 Repetition 41 
+## Mon Feb 27 21:42:24 2017 Repetition 42 
+## Mon Feb 27 21:43:31 2017 Repetition 43 
+## Mon Feb 27 21:44:38 2017 Repetition 44 
+## Mon Feb 27 21:45:46 2017 Repetition 45 
+## Mon Feb 27 21:46:54 2017 Repetition 46 
+## Mon Feb 27 21:48:01 2017 Repetition 47 
+## Mon Feb 27 21:49:07 2017 Repetition 48 
+## Mon Feb 27 21:50:15 2017 Repetition 49 
+## Mon Feb 27 21:51:21 2017 Repetition 50 
+## Mon Feb 27 21:52:27 2017 Done.
+
+res_rf_sp$benchmark$runtime_performance
+## Time difference of 56.4 mins
+
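As a rough check of the reported runtime: 56.4 minutes of wall-clock time for 50 repetitions is about 68 seconds per repetition. This matches the spacing of the timestamps in the progress log. The per-fold figure below assumes that the five folds of a repetition run one after another.

```r
runtime_s <- 56.4 * 60           # total wall-clock time in seconds
n_rep     <- 50                  # repetitions
n_fold    <- 5                   # folds per repetition
per_rep   <- runtime_s / n_rep   # seconds per repetition
per_fold  <- per_rep / n_fold    # seconds per fold, if folds run sequentially
round(c(per_rep = per_rep, per_fold = per_fold), 1)
```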
summary(res_rf_sp$error_rep$test_error)
+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
+##  0.0630  0.0827  0.0871  0.0868  0.0928  0.1100
+
summary(res_rf_sp$error_rep$test_accuracy)
+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
+##   0.890   0.907   0.913   0.913   0.917   0.937
+

What a surprise! Random Forest classification is not that good after all, once we acknowledge that in ‘real life’ we would not be making predictions in situations where the class membership of other grid cells in the same field is already known at the training stage. So spatial dependence does matter.

+
+
+
+

+Usage Advice

+

Given all the different sampling functions and the custom predict functions required in this example (e.g. rf_predfun()), you might be unsure which function to use for your use case.
+If you want to perform a “normal”, i.e. non-spatial, cross-validation, we recommend using partition_cv() as smp_fun in sperrorest(). If you want to perform a spatial cross-validation (and you do not have a grouping structure like the fields in this example), partition_kmeans() takes care of the spatial partitioning. In most cases, you can simply use the generic predict() method for your model (i.e. omit the pred_fun argument in sperrorest()). Check our “custom model and predict functions” vignette for more information on cases where adjustments are needed.
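For the spatial case without a grouping variable, the idea behind partition_kmeans() is, as we understand it, to cluster the sample coordinates with k-means and use each cluster as one test fold. The base-R sketch below illustrates only that idea, using stats::kmeans() on hypothetical random coordinates. It is not the package's implementation.

```r
# k-means-based spatial partitioning, illustrated with stats::kmeans().
# Hypothetical random coordinates; not partition_kmeans()'s implementation.
set.seed(42)
coords <- data.frame(x = runif(200), y = runif(200))
nfold <- 5
km <- kmeans(coords, centers = nfold, nstart = 10)
# each spatial cluster becomes one test fold
folds <- lapply(seq_len(nfold), function(k) {
  list(train = which(km$cluster != k), test = which(km$cluster == k))
})
sapply(folds, function(f) length(f$test))   # cluster sizes vary
```

Unlike random folds, the cluster sizes are not balanced. Test points lie in spatially compact groups, away from most of the training points.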

+

For further questions or issues, please open an issue in our GitHub repository.

+
+
+

+References

+
+
+

Breiman, Leo. 1996. “Bagging Predictors.” Machine Learning 24 (2). Springer Nature: 123–40. doi:10.1007/bf00058655.

+
+
+

———. 2001. “Random Forests.” Machine Learning 45 (1). Springer Nature: 5–32. doi:10.1023/a:1010933404324.

+
+
+

Brenning, A. 2005. “Spatial Prediction Models for Landslide Hazards: Review, Comparison and Evaluation.” Natural Hazards and Earth System Science 5 (6). Copernicus GmbH: 853–62. doi:10.5194/nhess-5-853-2005.

+
+
+

Goetz, J.N., A. Brenning, H. Petschko, and P. Leopold. 2015. “Evaluating Machine Learning and Statistical Prediction Techniques for Landslide Susceptibility Modeling.” Computers & Geosciences 81 (August). Elsevier BV: 1–11. doi:10.1016/j.cageo.2015.04.007.

+
+
+

Hothorn, Torsten, and Berthold Lausen. 2005. “Bundling Classifiers by Bagging Trees.” Computational Statistics & Data Analysis 49 (4). Elsevier BV: 1068–78. doi:10.1016/j.csda.2004.06.019.

+
+
+

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Springer New York. doi:10.1007/978-1-4614-7138-7.

+
+
+

Knudby, Anders, Alexander Brenning, and Ellsworth LeDrew. 2010. “New Approaches to Modelling Fish-Habitat Relationships.” Ecological Modelling 221 (3). Elsevier BV: 503–11. doi:10.1016/j.ecolmodel.2009.11.008.

+
+
+

Peña, M.A., and A. Brenning. 2015. “Assessing Fruit-Tree Crop Classification from Landsat-8 Time Series for the Maipo Valley, Chile.” Remote Sensing of Environment 171 (December). Elsevier BV: 234–44. doi:10.1016/j.rse.2015.10.029.

+
+
+
+
+
+ + + +
+ + + +
+ + + diff --git a/docs/authors.html b/docs/authors.html new file mode 100644 index 00000000..bb394e60 --- /dev/null +++ b/docs/authors.html @@ -0,0 +1,152 @@ + + + + + + + + +Citation and Authors • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + +
+ +
+
+ + + +

Brenning A (2012). +“Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'.” +In IEEE International Symposium on Geoscience and Remote Sensing IGARSS. +In press. +

+
@InProceedings{,
+  author = {Alexander Brenning},
+  title = {Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'},
+  booktitle = {IEEE International Symposium on Geoscience and Remote Sensing IGARSS},
+  year = {2012},
+  note = {In press},
+}
+ + +
    +
  • Alexander Brenning. Author, maintainer. ORCID: 0000-0001-6640-679X
  • Patrick Schratz. Author. ORCID: 0000-0003-0748-6624
  • Tobias Herrmann. Author. ORCID: 0000-0001-9768-0708
+ +
+ +
+ + + +
+ + + diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 00000000..f24ab283 --- /dev/null +++ b/docs/index.html @@ -0,0 +1,193 @@ + + + + + + + +Perform Spatial Error Estimation and Variable Importance in Parallel • sperrorest + + + + + + +
+
+ + + +
+
+ + + + +
+ +
+ +

partition_kmeans() has been integrated into mlr (see e865e4). sperrorest is currently not actively developed. We recommend using mlr for all future (spatial) cross-validation work. We will provide a tutorial for spatial data in the mlr-tutorial soon.

+
+

+General

+

Project Status: Active – The project has reached a stable, usable state and is being actively developed. DOI

Resource:       CRAN           Travis CI        Appveyor
Platforms:      Multiple       Linux & macOS    Windows
R CMD check:    CRAN version   Build status     Build status
Test coverage:  Coverage Status
+
+
+

+CRAN

+

CRAN_Status_Badge Downloads

+
+
+
+

+Description

+

Spatial Error Estimation and Variable Importance

+

This package implements spatial error estimation and permutation-based spatial variable importance using different spatial cross-validation and spatial block bootstrap methods. To cite sperrorest in publications, reference the paper by Brenning (2012). To see the package in action, please check the vignette.

+
+

+Installation

+

Get the released version from CRAN:

+
install.packages("sperrorest")
+

Or the development version from Github:

+
remotes::install_github("pat-s/sperrorest@dev")
+
+
+
+

+References

+
+ +
+
+
+
+ + + +
+ + + +
+ + + diff --git a/docs/jquery.sticky-kit.min.js b/docs/jquery.sticky-kit.min.js new file mode 100644 index 00000000..e2a3c6de --- /dev/null +++ b/docs/jquery.sticky-kit.min.js @@ -0,0 +1,9 @@ +/* + Sticky-kit v1.1.2 | WTFPL | Leaf Corcoran 2015 | http://leafo.net +*/ +(function(){var b,f;b=this.jQuery||window.jQuery;f=b(window);b.fn.stick_in_parent=function(d){var A,w,J,n,B,K,p,q,k,E,t;null==d&&(d={});t=d.sticky_class;B=d.inner_scrolling;E=d.recalc_every;k=d.parent;q=d.offset_top;p=d.spacer;w=d.bottoming;null==q&&(q=0);null==k&&(k=void 0);null==B&&(B=!0);null==t&&(t="is_stuck");A=b(document);null==w&&(w=!0);J=function(a,d,n,C,F,u,r,G){var v,H,m,D,I,c,g,x,y,z,h,l;if(!a.data("sticky_kit")){a.data("sticky_kit",!0);I=A.height();g=a.parent();null!=k&&(g=g.closest(k)); +if(!g.length)throw"failed to find stick parent";v=m=!1;(h=null!=p?p&&a.closest(p):b("
"))&&h.css("position",a.css("position"));x=function(){var c,f,e;if(!G&&(I=A.height(),c=parseInt(g.css("border-top-width"),10),f=parseInt(g.css("padding-top"),10),d=parseInt(g.css("padding-bottom"),10),n=g.offset().top+c+f,C=g.height(),m&&(v=m=!1,null==p&&(a.insertAfter(h),h.detach()),a.css({position:"",top:"",width:"",bottom:""}).removeClass(t),e=!0),F=a.offset().top-(parseInt(a.css("margin-top"),10)||0)-q, +u=a.outerHeight(!0),r=a.css("float"),h&&h.css({width:a.outerWidth(!0),height:u,display:a.css("display"),"vertical-align":a.css("vertical-align"),"float":r}),e))return l()};x();if(u!==C)return D=void 0,c=q,z=E,l=function(){var b,l,e,k;if(!G&&(e=!1,null!=z&&(--z,0>=z&&(z=E,x(),e=!0)),e||A.height()===I||x(),e=f.scrollTop(),null!=D&&(l=e-D),D=e,m?(w&&(k=e+u+c>C+n,v&&!k&&(v=!1,a.css({position:"fixed",bottom:"",top:c}).trigger("sticky_kit:unbottom"))),eb&&!v&&(c-=l,c=Math.max(b-u,c),c=Math.min(q,c),m&&a.css({top:c+"px"})))):e>F&&(m=!0,b={position:"fixed",top:c},b.width="border-box"===a.css("box-sizing")?a.outerWidth()+"px":a.width()+"px",a.css(b).addClass(t),null==p&&(a.after(h),"left"!==r&&"right"!==r||h.append(a)),a.trigger("sticky_kit:stick")),m&&w&&(null==k&&(k=e+u+c>C+n),!v&&k)))return v=!0,"static"===g.css("position")&&g.css({position:"relative"}), +a.css({position:"absolute",bottom:d,top:"auto"}).trigger("sticky_kit:bottom")},y=function(){x();return l()},H=function(){G=!0;f.off("touchmove",l);f.off("scroll",l);f.off("resize",y);b(document.body).off("sticky_kit:recalc",y);a.off("sticky_kit:detach",H);a.removeData("sticky_kit");a.css({position:"",bottom:"",top:"",width:""});g.position("position","");if(m)return null==p&&("left"!==r&&"right"!==r||a.insertAfter(h),h.remove()),a.removeClass(t)},f.on("touchmove",l),f.on("scroll",l),f.on("resize", +y),b(document.body).on("sticky_kit:recalc",y),a.on("sticky_kit:detach",H),setTimeout(l,0)}};n=0;for(K=this.length;n + + + + + diff --git a/docs/news/index.html b/docs/news/index.html new file mode 100644 index 
00000000..5a5f02b2 --- /dev/null +++ b/docs/news/index.html @@ -0,0 +1,331 @@ + + + + + + + + +All news • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + +
+ +
+ +
+ + +
+
+

+sperrorest 2.1.1 (15-Oct-2017)

+
+

+Bugfixes:

+
    +
  • train_fun and test_fun are now handled correctly, and any sub-sampling is now correctly reflected in the resulting ‘resampling’ object.
+
+
+
+

+sperrorest 2.1.0 (25-Sep-2017)

+
+

+Features:

+
    +
  • error handling during model fitting & performance evaluation: if a model does not converge for some folds or an error occurs during performance calculation, the results of these folds are set to NA and a message is printed to the console. sperrorest() continues normally and uses the successful folds to calculate the repetition error. This helps when running CV with many repetitions using models that do not always converge, such as maxnet(), gamm() or svm().
+
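Base R's tryCatch() can illustrate the error handling described in this entry. In the sketch, fit_fold() is a hypothetical stand-in for fitting a model and predicting one test fold. Fold 3 fails and its result is set to NA. The error is then computed by pooling the predictions of the successful folds, which is our reading of how the repetition error is formed, not the package's code.

```r
# Hypothetical stand-in for fitting a model on one fold and predicting its
# test set: returns observed and predicted classes; fold 3 "fails".
fit_fold <- function(k) {
  if (k == 3) stop("model did not converge")
  obs  <- c("a", "b", "a", "b")
  pred <- if (k == 1) c("a", "a", "a", "b") else obs   # one misclassification in fold 1
  data.frame(obs = obs, pred = pred, stringsAsFactors = FALSE)
}

results <- lapply(1:5, function(k) {
  tryCatch(fit_fold(k), error = function(e) {
    message("Fold ", k, ": ", conditionMessage(e), "; result set to NA")
    NA
  })
})

ok <- vapply(results, is.data.frame, logical(1))   # successful folds
pooled <- do.call(rbind, results[ok])
mean(pooled$obs != pooled$pred)   # pooled error over the 4 successful folds: 1/16
```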
+
+

+Bugfixes:

+
    +
  • The size of the example data set ecuador has been adjusted to avoid exact duplicates of partitions when using partition_kmeans().
+
+
+
+

+sperrorest 2.0.1 (20-Jul-2017)

+
+

+Bugfixes:

+
    +
  • Fixed a bug that caused all predictors to receive equal importance in permutation-based variable importance assessment.
+
+
+
+

+sperrorest 2.0.0 (12-Jun-2017)

+
+

+Major:

+
    +
  • integration of parsperrorest() into sperrorest().
  • by default, sperrorest() now runs in parallel using all available cores.
  • runfolds() and runreps() now do the heavy lifting in the background. All modes now run on the same code base; previously, each parallel mode had its own implementation.
  • function and argument names changed to ‘snake_case’.
+
+
+

+Features:

+
    +
  • new (parallel) modes:
    • apply: calls pbmclapply() on Unix and pbapply() on Windows.
    • future: calls future_lapply() with various future options (multiprocess, multicore, etc.).
    • foreach: calls foreach() with various future options (multiprocess, multicore, etc.). The default option is cluster. This is also the overall default mode for sperrorest().
    • sequential: sequential execution using the future backend.
  • RMSE instead of MSE as error measure
  • You can now also pass single values to the repetition argument of sperrorest(). Specifying a range like repetition = 1:10 remains valid.
  • New vignette sperrorest::parallel-modes comparing the various parallel modes.
  • New vignette sperrorest::custom-pred-and-model-functions explaining why and how custom model and predict functions are needed for some model setups.
+
+
+

+Misc:

+
    +
  • Limit workers to the number of repetitions if the number of cores exceeds the number of repetitions. This ensures that no unnecessary workers are started and increases the robustness of parallel execution.
  • documentation improvements.
  • do_try argument has been removed.
  • error.fold, error.rep and err.train arguments have been removed because they are all calculated by default now.
+
+
+

+Bugfixes:

+
    +
  • partial matching of arguments
  • account for factor levels that are present only in the test data but missing in the training data. Previously, sperrorest errored during the predict step when this case occurred. Now, this is accounted for and an informative message is given.
+
+
+
+

+sperrorest 1.0.0 (08-Mar-2017)

+
+

+New features:

+
    +
  • add parsperrorest(): This function lets you execute sperrorest() in parallel. It includes two modes (par.mode = 1 and par.mode = 2) which use different parallelization approaches in the background. See ?parsperrorest() for more details.
  • add partition.factor.cv(): This resampling method enables partitioning based on a given factor variable. It can be used, for example, to resample agricultural data that is grouped by fields at the field level, in order to preserve spatial autocorrelation within fields.
  • sperrorest() and parsperrorest(): Add benchmark item to returned object giving information about execution time, used cores and other system details.
+

Changes to functions:

+
    +
  • sperrorest(): Change argument naming. err.unpooled is now error.fold and err.pooled is now error.rep
  • sperrorest() and parsperrorest(): Change order and naming of returned object
    • class sperrorestpoolederror is now sperrorestreperror
    • returned sperrorest list is now ordered as follows:
      1. error.rep
      2. error.fold
      3. importance
      4. benchmarks
      5. package.version
+
+ +
+
+

+sperrorest 0.2-1 (19 June 2012)

+
    +
  • First release on CRAN
+
+
+

+sperrorest 0.2-0 (19 June 2012)

+
    +
  • last pre-release version
  • replaced Stoyan’s data set with Jannes Muenchow’s data, adapted examples
+
+
+

+sperrorest 0.1-5 (1 Mar 2012)

+
    +
  • made training set estimation optional
  • robustified code using try()
+
+
+

+sperrorest 0.1-2 (29 Jan 2012)

+
    +
  • internal release 0.1-2
  • some bug fixes, e.g. in err.* functions
  • improved support of pooled versus unpooled error estimation
  • changed some argument names
  • this version was used for Angie’s analyses
+
+
+

+sperrorest 0.1-1 (29 Dec 2011)

+
    +
  • built internal release 0.1-1
+
+
+

+sperrorest 0.1

+
    +
  • general code development (2009 - 2011)
  • package project and documentation created (Oct-Dec 2011)
+
+
+
+ + + +
+ +
+ + +
+

Site built with pkgdown.

+
+ +
+
+ + + diff --git a/docs/pkgdown.css b/docs/pkgdown.css new file mode 100644 index 00000000..209ce57f --- /dev/null +++ b/docs/pkgdown.css @@ -0,0 +1,163 @@ +/* Sticker footer */ +body > .container { + display: flex; + padding-top: 60px; + min-height: calc(100vh); + flex-direction: column; +} + +body > .container .row { + flex: 1; +} + +footer { + margin-top: 45px; + padding: 35px 0 36px; + border-top: 1px solid #e5e5e5; + color: #666; + display: flex; +} +footer p { + margin-bottom: 0; +} +footer div { + flex: 1; +} +footer .pkgdown { + text-align: right; +} +footer p { + margin-bottom: 0; +} + +img.icon { + float: right; +} + +img { + max-width: 100%; +} + +/* Section anchors ---------------------------------*/ + +a.anchor { + margin-left: -30px; + display:inline-block; + width: 30px; + height: 30px; + visibility: hidden; + + background-image: url(./link.svg); + background-repeat: no-repeat; + background-size: 20px 20px; + background-position: center center; +} + +.hasAnchor:hover a.anchor { + visibility: visible; +} + +@media (max-width: 767px) { + .hasAnchor:hover a.anchor { + visibility: hidden; + } +} + + +/* Fixes for fixed navbar --------------------------*/ + +.contents h1, .contents h2, .contents h3, .contents h4 { + padding-top: 60px; + margin-top: -60px; +} + +/* Static header placement on mobile devices */ +@media (max-width: 767px) { + .navbar-fixed-top { + position: absolute; + } + .navbar { + padding: 0; + } +} + + +/* Sidebar --------------------------*/ + +#sidebar { + margin-top: 30px; +} +#sidebar h2 { + font-size: 1.5em; + margin-top: 1em; +} + +#sidebar h2:first-child { + margin-top: 0; +} + +#sidebar .list-unstyled li { + margin-bottom: 0.5em; +} + +/* Reference index & topics ----------------------------------------------- */ + +.ref-index th {font-weight: normal;} +.ref-index h2 {font-size: 20px;} + +.ref-index td {vertical-align: top;} +.ref-index .alias {width: 40%;} +.ref-index .title {width: 60%;} + +.ref-index .alias {width: 40%;} 
+.ref-index .title {width: 60%;} + +.ref-arguments th {text-align: right; padding-right: 10px;} +.ref-arguments th, .ref-arguments td {vertical-align: top;} +.ref-arguments .name {width: 20%;} +.ref-arguments .desc {width: 80%;} + +/* Nice scrolling for wide elements --------------------------------------- */ + +table { + display: block; + overflow: auto; +} + +/* Syntax highlighting ---------------------------------------------------- */ + +pre { + word-wrap: normal; + word-break: normal; + border: 1px solid #eee; +} + +pre, code { + background-color: #f8f8f8; + color: #333; +} + +pre .img { + margin: 5px 0; +} + +pre .img img { + background-color: #fff; + display: block; + height: auto; +} + +code a, pre a { + color: #375f84; +} + +.fl {color: #1514b5;} +.fu {color: #000000;} /* function */ +.ch,.st {color: #036a07;} /* string */ +.kw {color: #264D66;} /* keyword */ +.co {color: #888888;} /* comment */ + +.message { color: black; font-weight: bolder;} +.error { color: orange; font-weight: bolder;} +.warning { color: #6A0366; font-weight: bolder;} + diff --git a/docs/pkgdown.js b/docs/pkgdown.js new file mode 100644 index 00000000..4b817132 --- /dev/null +++ b/docs/pkgdown.js @@ -0,0 +1,45 @@ +$(function() { + $("#sidebar").stick_in_parent({offset_top: 40}); + $('body').scrollspy({ + target: '#sidebar', + offset: 60 + }); + + var cur_path = paths(location.pathname); + $("#navbar ul li a").each(function(index, value) { + if (value.text == "Home") + return; + if (value.getAttribute("href") === "#") + return; + + var path = paths(value.pathname); + if (is_prefix(cur_path, path)) { + // Add class to parent
  • , and enclosing
  • if in dropdown + var menu_anchor = $(value); + menu_anchor.parent().addClass("active"); + menu_anchor.closest("li.dropdown").addClass("active"); + } + }); +}); + +function paths(pathname) { + var pieces = pathname.split("/"); + pieces.shift(); // always starts with / + + var end = pieces[pieces.length - 1]; + if (end === "index.html" || end === "") + pieces.pop(); + return(pieces); +} + +function is_prefix(needle, haystack) { + if (needle.length > haystack.lengh) + return(false); + + for (var i = 0; i < haystack.length; i++) { + if (needle[i] != haystack[i]) + return(false); + } + + return(true); +} diff --git a/docs/reference/add.distance.html b/docs/reference/add.distance.html new file mode 100644 index 00000000..98e7896f --- /dev/null +++ b/docs/reference/add.distance.html @@ -0,0 +1,202 @@ + + + + + + + + +Add distance information to resampling objects — add.distance • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    Add distance information to resampling objects

    + + +
    add.distance(object, ...)
    +
    +# S3 method for resampling
    +add.distance(object, data, coords = c("x", "y"), ...)
    +
    +# S3 method for represampling
    +add.distance(object, ...)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + +
    object

    resampling or represampling object.

    ...

    Additional arguments to dataset_distance and +add.distance.resampling, respectively.

    data

    data.frame containing at least the columns specified +by coords

    coords

    (ignored by partition_cv)

    + +

    Value

    + +

    A resampling or represampling object containing an additional $distance component in each resampling object. The distance component is a single numeric value indicating, for each train/test pair, the (by default, mean) nearest-neighbour distance between the two sets.

    + +

    Details

    + +

    Nearest-neighbour distances are calculated for each sample in the +test set. These nrow(???$test) nearest-neighbour distances are then +averaged. Aggregation methods other than mean can be chosen using +the fun argument, which will be passed on to +dataset_distance.
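The distance described here can be sketched in a few lines of base R. For each test location, take the Euclidean distance to its nearest training location, then aggregate with fun (mean by default). nn_distance() is a hypothetical helper for illustration and does not reproduce dataset_distance() exactly.

```r
# Mean nearest-neighbour distance from test to training locations.
# nn_distance() is a hypothetical helper; it does not reproduce dataset_distance().
nn_distance <- function(train_xy, test_xy, fun = mean) {
  # distance from each test point to its nearest training point
  d <- apply(test_xy, 1, function(p) {
    min(sqrt((train_xy[, 1] - p[1])^2 + (train_xy[, 2] - p[2])^2))
  })
  fun(d)
}

train_xy <- cbind(x = c(0, 10, 0), y = c(0, 0, 10))
test_xy  <- cbind(x = c(3, 10),    y = c(4, 2))
nn_distance(train_xy, test_xy)              # nearest distances 5 and 2, mean 3.5
nn_distance(train_xy, test_xy, fun = max)   # 5
```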

    + +

    See also

    + +

    dataset_distance represampling +resampling

    + + +

    Examples

    +
    data(ecuador) # Muenchow et al. (2012), see ?ecuador
+nsp.parti <- partition_cv(ecuador)
+sp.parti <- partition_kmeans(ecuador)
+nsp.parti <- add.distance(nsp.parti, ecuador)
+sp.parti <- add.distance(sp.parti, ecuador)
+# non-spatial partitioning: very small test-training distance:
+nsp.parti[[1]][[1]]$distance
    #> [1] 53.79223
    # spatial partitioning: more substantial distance, depending on number of
+# folds etc.
+sp.parti[[1]][[1]]$distance
    #> [1] 390.1742
    +
    +
    + +
    + +
    + + +
    +

    Site built with pkgdown.

    +
    + +
    +
    + + + diff --git a/docs/reference/as.represampling.html b/docs/reference/as.represampling.html new file mode 100644 index 00000000..5058569e --- /dev/null +++ b/docs/reference/as.represampling.html @@ -0,0 +1,227 @@ + + + + + + + + +Resampling objects with repetition, i.e. sets of partitionings or boostrap +samples — as.represampling • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    Functions for handling represampling objects, i.e. lists of +resampling objects.

    + + +
    as.represampling(object, ...)
    +
    +# S3 method for list
    +as.represampling(object, ...)
    +
    +# S3 method for represampling
    +print(x, ...)
    +
    +is_represampling(object)
    + +

    Arguments

    + + + + + + + + + + + + + + +
    object

    object of class represampling, or a list to be coerced +to this class.

    ...

    currently not used.

    x

    object of class represampling.

    + +

    Value

    + +

    as.represampling methods return an object of class +represampling with the contents of object.

    + +

    Details

    + +

    represampling objects are (named) lists of resampling objects. Such objects are typically created by partition_cv, partition_kmeans, represampling_disc_bootstrap and related functions.

    +

    In r-repeated k-fold cross-validation, for example, the corresponding represampling object has length r, and each of its r resampling objects has length k. as.represampling.list coerces object to class represampling while coercing its elements to resampling objects. Some validity checks are performed.
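The nested structure described above can be mimicked with plain lists. The sketch builds an r-repeated k-fold partitioning as a list of r elements, each holding k train/test pairs. make_rep_cv() is hypothetical and, unlike the package, does not set the resampling or represampling classes.

```r
# r-repeated k-fold partitioning as nested plain lists:
# a list of r resamplings, each a list of k train/test index pairs.
# make_rep_cv() is hypothetical and does not set sperrorest's classes.
make_rep_cv <- function(n, r = 3, k = 5) {
  reps <- lapply(seq_len(r), function(i) {
    fold_id <- sample(rep_len(seq_len(k), n))   # new random split per repetition
    lapply(seq_len(k), function(j) {
      list(train = which(fold_id != j), test = which(fold_id == j))
    })
  })
  names(reps) <- seq_len(r)
  reps
}

set.seed(1)
rr <- make_rep_cv(n = 50, r = 3, k = 5)
length(rr)    # r = 3 resamplings
lengths(rr)   # each with k = 5 train/test pairs
```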

    + +

    See also

    + +

    resampling, partition_cv, +partition_kmeans, +represampling_disc_bootstrap, etc.

    + + +

    Examples

    +
    data(ecuador) # Muenchow et al. (2012), see ?ecuador
+# Partitioning by elevation classes in 300 m steps:
+fac <- factor( as.character( floor( ecuador$dem / 300 ) ) )
+summary(fac)
    #>  10   5   6   7   8   9
+#>   4  21 246 255 147  78
    parti <- as.resampling(fac)
+# a list of lists specifying sets of training and test sets,
+# using each factor level in turn as the test set:
+str(parti)
    #> List of 6
+#>  $ 10:List of 2
+#>   ..$ train: int [1:747] 1 2 3 4 5 6 7 8 9 10 ...
+#>   ..$ test : int [1:4] 535 566 684 734
+#>  $ 5 :List of 2
+#>   ..$ train: int [1:730] 1 2 3 4 5 6 7 8 9 10 ...
+#>   ..$ test : int [1:21] 42 77 93 106 115 139 250 332 385 405 ...
+#>  $ 6 :List of 2
+#>   ..$ train: int [1:505] 2 4 7 8 9 12 13 14 15 17 ...
+#>   ..$ test : int [1:246] 1 3 5 6 10 11 16 19 23 29 ...
+#>  $ 7 :List of 2
+#>   ..$ train: int [1:496] 1 3 5 6 7 8 10 11 12 13 ...
+#>   ..$ test : int [1:255] 2 4 9 18 20 22 24 26 28 30 ...
+#>  $ 8 :List of 2
+#>   ..$ train: int [1:604] 1 2 3 4 5 6 7 9 10 11 ...
+#>   ..$ test : int [1:147] 8 12 14 15 21 25 27 32 46 54 ...
+#>  $ 9 :List of 2
+#>   ..$ train: int [1:673] 1 2 3 4 5 6 8 9 10 11 ...
+#>   ..$ test : int [1:78] 7 13 17 35 44 75 78 79 88 97 ...
+#>  - attr(*, "class")= chr "resampling"
    summary(parti)
    #>    n.train n.test
+#> 10     747      4
+#> 5      730     21
+#> 6      505    246
+#> 7      496    255
+#> 8      604    147
+#> 9      673     78
    +
    + +
    + +
    + + +
    +

    Site built with pkgdown.

    +
    + +
    +
    + + + diff --git a/docs/reference/as.resampling.html b/docs/reference/as.resampling.html new file mode 100644 index 00000000..55d90ab1 --- /dev/null +++ b/docs/reference/as.resampling.html @@ -0,0 +1,291 @@ + + + + + + + + +Resampling objects such as partitionings or bootstrap samples — as.resampling • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    Create/coerce and print resampling objects, e.g., partitionings or bootstrap samples derived from a data set.

    + + +
    as.resampling(object, ...)
    +
    +# S3 method for default
    +as.resampling(object, ...)
    +
    +# S3 method for factor
    +as.resampling(object, ...)
    +
    +# S3 method for list
    +as.resampling(object, ...)
    +
    +validate.resampling(object)
    +
    +is.resampling(x, ...)
    +
    +# S3 method for resampling
    +print(x, ...)
    + +

    Arguments

    + + + + + + + + + + + + + + +
    object

    depending on the function/method, a list or a vector of type +factor defining a partitioning of the dataset.

    ...

    currently not used.

    x

    object of class resampling.

    + +

    Value

    + +

    as.resampling methods: An object of class resampling.

    + +

    Details

    + +

    A resampling object is a list of lists defining a set of +training and test samples.

    +

    In the case of k-fold cross-validation partitioning, for example, +the corresponding resampling object would be of length k, +i.e. contain k lists. Each of these k lists defines a training +set of size n(k-1)/k (where n is the overall sample size), and +a test set of size n/k. +The resampling object does, however, not contain the data itself, but +only indices between 1 and n identifying the selection +(see Examples).
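As an example, take the n = 751 observations of ecuador (675 training plus 76 test observations in the Examples below) and k = 10 folds, which is consistent with that 76/675 split. The formulas then give the following sizes. Since n is not divisible by k, a balanced partitioning yields folds of 75 or 76 observations.

```r
n <- 751   # observations in ecuador: 675 training + 76 test in the Examples
k <- 10
n / k              # test-set size n/k: 75.1
n * (k - 1) / k    # training-set size n(k-1)/k: 675.9
# balanced fold sizes when n is not divisible by k
fold_sizes <- as.vector(table(rep_len(seq_len(k), n)))
fold_sizes
```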

    +

    Another example is bootstrap resampling. represampling_bootstrap with argument oob = TRUE generates represampling objects with indices of a bootstrap sample in the train component and indices of the out-of-bag sample in the test component (see Examples below). as.resampling.factor: For each factor level of the input variable, as.resampling.factor determines the indices of samples in this level (= test samples) and outside this level (= training samples). Empty levels of object are dropped without warning. as.resampling.list checks whether the list in object has a valid resampling object structure (with components train and test etc.) and assigns the class attribute 'resampling' if successful.
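Both schemes can be sketched on plain index lists in base R. resample_by_factor() and bootstrap_oob() are hypothetical helpers that mirror the described behaviour of as.resampling.factor and represampling_bootstrap(oob = TRUE). They are not the package's code.

```r
# 1) Factor-based resampling: each level in turn is the test set
#    (hypothetical helper mirroring the description of as.resampling.factor)
resample_by_factor <- function(f) {
  f <- droplevels(as.factor(f))   # empty levels are dropped silently
  res <- lapply(levels(f), function(lev) {
    list(train = which(f != lev), test = which(f == lev))
  })
  names(res) <- levels(f)
  res
}

# 2) Bootstrap sample with out-of-bag test set
#    (hypothetical helper mirroring represampling_bootstrap(oob = TRUE))
bootstrap_oob <- function(n) {
  train <- sample.int(n, n, replace = TRUE)   # indices may repeat
  test  <- setdiff(seq_len(n), train)         # observations never drawn
  list(train = train, test = test)
}

elev_class <- factor(c("8", "9", "9", "10", "8", "10", "9"))
by_class <- resample_by_factor(elev_class)
by_class[["9"]]$test

set.seed(3)
b <- bootstrap_oob(20)
length(unique(b$train)) + length(b$test)   # always 20
```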

    See also

    represampling, partition_cv, partition_kmeans, represampling_bootstrap, etc.

    Examples

    data(ecuador) # Muenchow et al. (2012), see ?ecuador

    # Partitioning by elevation classes in 200 m steps:
    parti <- factor(as.character(floor(ecuador$dem / 200)))
    smp <- as.resampling(parti)
    summary(smp)
    #> n.train n.test
    #> 10 600 151
    #> 11 585 166
    #> 12 660 91
    #> 13 641 110
    #> 14 727 24
    #> 15 747 4
    #> 8 730 21
    #> 9 567 184
    # Compare:
    summary(parti)
    #> 10 11 12 13 14 15 8 9
    #> 151 166 91 110 24 4 21 184

    # k-fold (non-spatial) cross-validation partitioning:
    parti <- partition_cv(ecuador)
    parti <- parti[[1]] # the first (and only) resampling object in parti
    # data corresponding to the test sample of the first fold:
    str(ecuador[parti[[1]]$test, ])
    #> 'data.frame': 76 obs. of 13 variables:
    #> $ x : num 714042 715282 713962 713412 714902 ...
    #> $ y : num 9558482 9557602 9561082 9560472 9559262 ...
    #> $ dem : num 2408 2837 1839 1869 2363 ...
    #> $ slope : num 24.1 34.4 63.4 16.9 50.7 ...
    #> $ hcurv : num 0.00659 -0.02191 -0.04951 -0.00156 -0.01407 ...
    #> $ vcurv : num 0.01041 -0.00579 -0.00529 0.00406 0.00547 ...
    #> $ carea : num 773 2247 3282 2421 519 ...
    #> $ cslope : num 27.5 37.7 29.1 31 35.3 ...
    #> $ distroad : num 300 300 173 300 300 ...
    #> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
    #> $ distdeforest : num 300 300 33.8 52.5 300 ...
    #> $ distslidespast: num 6 100 100 39 59 100 37 100 25 0 ...
    #> $ log.carea : num 2.89 3.35 3.52 3.38 2.72 ...
    # the corresponding training sample (larger):
    str(ecuador[parti[[1]]$train, ])
    #> 'data.frame': 675 obs. of 13 variables:
    #> $ x : num 712882 715232 715392 715042 715382 ...
    #> $ y : num 9560002 9559582 9560172 9559312 9560142 ...
    #> $ dem : num 1912 2199 1989 2320 2021 ...
    #> $ slope : num 25.6 23.2 40.5 42.9 42 ...
    #> $ hcurv : num -0.00681 -0.00501 -0.01919 -0.01106 0.00958 ...
    #> $ vcurv : num -0.00029 -0.00649 -0.04051 -0.04634 0.02642 ...
    #> $ carea : num 5577 1399 351155 501 671 ...
    #> $ cslope : num 34.4 30.7 32.8 33.9 41.6 ...
    #> $ distroad : num 300 300 300 300 300 ...
    #> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
    #> $ distdeforest : num 15 300 300 300 300 9.15 300 300 300 0 ...
    #> $ distslidespast: num 9 21 40 100 21 2 100 100 41 5 ...
    #> $ log.carea : num 3.75 3.15 5.55 2.7 2.83 ...

    # Bootstrap training sets, out-of-bag test sets:
    parti <- represampling_bootstrap(ecuador, oob = TRUE)
    parti <- parti[[1]] # the first (and only) resampling object in parti
    # out-of-bag test sample: roughly 37% (about 1/e) of nrow(ecuador):
    str(ecuador[parti[[1]]$test, ])
    #> 'data.frame': 290 obs. of 13 variables:
    #> $ x : num 715232 715042 715382 712802 714842 ...
    #> $ y : num 9559582 9559312 9560142 9559952 9558892 ...
    #> $ dem : num 2199 2320 2021 1838 2483 ...
    #> $ slope : num 23.2 42.9 42 52.1 68.8 ...
    #> $ hcurv : num -0.00501 -0.01106 0.00958 0.00183 -0.04921 ...
    #> $ vcurv : num -0.00649 -0.04634 0.02642 -0.09203 -0.12438 ...
    #> $ carea : num 1399 501 671 634 754 ...
    #> $ cslope : num 30.7 33.9 41.6 30.3 53.7 ...
    #> $ distroad : num 300 300 300 300 300 ...
    #> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
    #> $ distdeforest : num 300 300 300 9.15 300 ...
    #> $ distslidespast: num 21 100 21 2 100 5 20 100 100 100 ...
    #> $ log.carea : num 3.15 2.7 2.83 2.8 2.88 ...
    # bootstrap training sample: same size as nrow(ecuador), with duplicates:
    str(ecuador[parti[[1]]$train, ])
    #> 'data.frame': 751 obs. of 13 variables:
    #> $ x : num 715382 713132 715432 714892 715722 ...
    #> $ y : num 9558062 9560672 9558592 9559282 9557532 ...
    #> $ dem : num 2799 1897 2622 2374 3097 ...
    #> $ slope : num 50.6 24.7 19 42.8 28.8 ...
    #> $ hcurv : num -0.00812 0.00306 -0.00301 -0.01057 0.02327 ...
    #> $ vcurv : num -0.02208 -0.00436 -0.01099 0.02427 0.01833 ...
    #> $ carea : num 951 2603 1146 381 300 ...
    #> $ cslope : num 50.9 29 20.1 27.2 25.9 ...
    #> $ distroad : num 300 10 300 300 300 300 300 300 300 300 ...
    #> $ slides : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 2 1 1 2 2 2 1 ...
    #> $ distdeforest : num 300 166 300 300 300 ...
    #> $ distslidespast: num 100 2 26 46 100 55 100 100 11 45 ...
    #> $ log.carea : num 2.98 3.42 3.06 2.58 2.48 ...
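The size of the out-of-bag sample in the bootstrap example can be anticipated: when n indices are drawn with replacement, each observation is missed with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) for large n. The base-R sketch below builds one such train/test pair; bootstrap_oob is a hypothetical helper for illustration, not the internals of represampling_bootstrap.

```r
# Hypothetical helper (not represampling_bootstrap itself): one bootstrap
# training sample plus its out-of-bag test sample, as row indices.
bootstrap_oob <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  train <- sample.int(n, size = n, replace = TRUE)  # may contain duplicates
  test <- setdiff(seq_len(n), train)                 # never drawn = out-of-bag
  list(train = train, test = test)
}

n <- 751  # nrow(ecuador) in the examples above
b <- bootstrap_oob(n, seed = 1)
length(b$train)     # always n
length(b$test) / n  # random, typically close to the expected value below
(1 - 1 / n)^n       # expected out-of-bag fraction, about 0.368
```

The 290 out-of-bag rows in the example (about 39% of 751) are within the usual random variation around this expectation.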

    Site built with pkgdown.

    diff --git a/docs/reference/as.tilename.html b/docs/reference/as.tilename.html
    new file mode 100644
    index 00000000..69c3c2de
    --- /dev/null
    +++ b/docs/reference/as.tilename.html
    @@ -0,0 +1,182 @@

    Alphanumeric tile names — as.tilename • sperrorest

    Functions for generating and handling alphanumeric tile names of the form 'X2:Y7' as used by partition_tiles and represampling_tile_bootstrap.

    as.tilename(x, ...)

    # S3 method for numeric
    as.tilename(x, ...)

    # S3 method for tilename
    as.character(x, ...)

    # S3 method for tilename
    as.numeric(x, ...)

    # S3 method for character
    as.tilename(x, ...)

    # S3 method for tilename
    print(x, ...)

    Arguments

    x

    object of class tilename, character, or numeric (of length 2).

    ...

    additional arguments (currently ignored).

    Value

    depending on the method, an object of class tilename, a character string, or a numeric vector of length 2

    See also

    partition_tiles, represampling, represampling_tile_bootstrap

    Examples

    tnm <- as.tilename(c(2, 3))
    tnm # 'X2:Y3'
    #> [1] "X2:Y3"
    as.numeric(tnm) # c(2,3)
    #> Warning: NAs introduced by coercion
    #> [1] NA
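In the rendered example, as.numeric(tnm) returns NA with a coercion warning rather than c(2,3). A plausible explanation, not confirmed here, is that R's documentation for as.numeric states that S3 methods must be written for as.double, so a method defined as as.numeric.tilename may never be dispatched. The conversion itself is simple string handling; the sketch below uses two hypothetical helpers, tile_name_from_index and tile_index_from_name, which are not part of sperrorest.

```r
# Hypothetical helpers (not part of sperrorest): convert between numeric tile
# indices c(i, j) and alphanumeric tile names of the form "Xi:Yj".
tile_name_from_index <- function(ij) {
  stopifnot(is.numeric(ij), length(ij) == 2, all(ij == round(ij)))
  sprintf("X%d:Y%d", as.integer(ij[1]), as.integer(ij[2]))
}

tile_index_from_name <- function(name) {
  stopifnot(is.character(name), length(name) == 1,
            grepl("^X[0-9]+:Y[0-9]+$", name))
  # "X2:Y3" -> "2:Y3" -> c("2", "3") -> c(2, 3)
  parts <- strsplit(sub("^X", "", name), ":Y", fixed = TRUE)[[1]]
  as.numeric(parts)
}

nm <- tile_name_from_index(c(2, 3))
nm                        # "X2:Y3"
tile_index_from_name(nm)  # 2 3
```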
    diff --git a/docs/reference/dataset_distance.html b/docs/reference/dataset_distance.html
    new file mode 100644
    index 00000000..9d7beedf
    --- /dev/null
    +++ b/docs/reference/dataset_distance.html
    @@ -0,0 +1,200 @@

    Calculate mean nearest-neighbour distance between point datasets — dataset_distance • sperrorest

    dataset_distance calculates Euclidean nearest-neighbour distances between two point datasets and summarizes these distances using some function, by default the mean.

    dataset_distance(d1, d2, x_name = "x", y_name = "y", fun = mean,
      method = "euclidean", ...)

    Arguments

    d1

    a data.frame with (at least) columns with names given by x_name and y_name; these contain the x and y coordinates, respectively.

    d2

    second set of points; see d1 for the required columns.

    x_name

    name of the column in d1 and d2 containing the x coordinates of points.

    y_name

    name of the column in d1 and d2 containing the y coordinates of points.

    fun

    function to be applied to the vector of nearest-neighbour distances of d1 from d2.

    method

    type of distance metric to be used; only 'euclidean' is currently supported.

    ...

    additional arguments to fun.

    Value

    depends on fun; typically (e.g., mean) a numeric vector of length 1.

    Details

    Nearest-neighbour distances are calculated for each point in d1, resulting in a vector of length nrow(d1), and fun is applied to this vector.
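The computation described above can be sketched in a few lines of base R, assuming Euclidean distance and data frames with x and y columns. nn_distance_summary is a hypothetical stand-in for illustration, not dataset_distance's source; it uses a brute-force loop over all pairs, which is adequate for small data sets.

```r
# Hypothetical sketch of the documented behaviour of dataset_distance:
# for each point in d1, the Euclidean distance to its nearest neighbour in d2,
# summarized with fun (default: mean).
nn_distance_summary <- function(d1, d2, x_name = "x", y_name = "y",
                                fun = mean, ...) {
  x2 <- d2[[x_name]]
  y2 <- d2[[y_name]]
  nn <- vapply(seq_len(nrow(d1)), function(i) {
    min(sqrt((d1[[x_name]][i] - x2)^2 + (d1[[y_name]][i] - y2)^2))
  }, numeric(1))
  fun(nn, ...)  # extra arguments are passed on to fun
}

d1 <- data.frame(x = c(0, 3), y = c(0, 4))
d2 <- data.frame(x = 0, y = 0)
nn_distance_summary(d1, d2)             # mean of c(0, 5) = 2.5
nn_distance_summary(d1, d2, fun = max)  # 5
nn_distance_summary(d1, d1)             # 0: every point is its own neighbour
```

Note that the measure is not symmetric: swapping d1 and d2 summarizes a different vector, of length nrow(d2).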


    See also


    add.distance


    Examples

    df <- data.frame(x = rnorm(100), y = rnorm(100))
    dataset_distance(df, df) # == 0
    #> [1] 0
    diff --git a/docs/reference/ecuador-1.png b/docs/reference/ecuador-1.png
    new file mode 100644
    index 00000000..ae0919ed
    Binary files /dev/null and b/docs/reference/ecuador-1.png differ
    diff --git a/docs/reference/ecuador.html b/docs/reference/ecuador.html
    new file mode 100644
    index 00000000..683d50dd
    --- /dev/null
    +++ b/docs/reference/ecuador.html
    @@ -0,0 +1,177 @@

    J. Muenchow's Ecuador landslide data set — ecuador • sperrorest

    Data set created by Jannes Muenchow, University of Erlangen-Nuremberg, Germany. These data should be cited as Muenchow et al. (2012) (see reference below). This publication also contains additional information on data collection and the geomorphology of the area. The data set provided here is (a subset of) the one from the 'natural' part of the RBSF area and corresponds to the landslide distribution in the year 2000.

    Format

    a data.frame with point samples of landslide and non-landslide locations in a study area in the Andes of southern Ecuador.

    References

    Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process rates of landslides along a humidity gradient in the tropical Andes. Geomorphology, 139-140: 271-284.

    Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.

    Examples

    data(ecuador)
    str(ecuador)
    #> 'data.frame': 751 obs. of 13 variables:
    #> $ x : num 712882 715232 715392 715042 715382 ...
    #> $ y : num 9560002 9559582 9560172 9559312 9560142 ...
    #> $ dem : num 1912 2199 1989 2320 2021 ...
    #> $ slope : num 25.6 23.2 40.5 42.9 42 ...
    #> $ hcurv : num -0.00681 -0.00501 -0.01919 -0.01106 0.00958 ...
    #> $ vcurv : num -0.00029 -0.00649 -0.04051 -0.04634 0.02642 ...
    #> $ carea : num 5577 1399 351155 501 671 ...
    #> $ cslope : num 34.4 30.7 32.8 33.9 41.6 ...
    #> $ distroad : num 300 300 300 300 300 ...
    #> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
    #> $ distdeforest : num 15 300 300 300 300 9.15 300 300 300 0 ...
    #> $ distslidespast: num 9 21 40 100 21 2 100 100 41 5 ...
    #> $ log.carea : num 3.75 3.15 5.55 2.7 2.83 ...
    library(rpart)
    ctrl <- rpart.control(cp = 0.02)
    fit <- rpart(slides ~ dem + slope + hcurv + vcurv +
                   log.carea + cslope, data = ecuador, control = ctrl)
    par(xpd = TRUE)
    plot(fit, compress = TRUE, main = "Muenchow's landslide data set")
    text(fit, use.n = TRUE)
    diff --git a/docs/reference/err_default.html b/docs/reference/err_default.html
    new file mode 100644
    index 00000000..ab1b4f1d
    --- /dev/null
    +++ b/docs/reference/err_default.html
    @@ -0,0 +1,323 @@

    Default error function — err_default • sperrorest

    Calculate a variety of accuracy measures from observations and predictions of numerical and categorical response variables.

    err_default(obs, pred)

    Arguments

    obs

    factor, logical, or numeric vector with observations

    pred

    factor, logical, or numeric vector with predictions. Must be of the same type as obs, with the exception that pred may be numeric if obs is factor or logical ('soft' classification).

    Value

    A list with (currently) the following components, depending on the type of prediction problem:

    'hard' classification

    misclassification error, overall accuracy; if two classes, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), kappa

    'soft' classification

    area under the ROC curve (auroc); error, accuracy, sensitivity and specificity when pred is dichotomized at 0.5; false-positive rate (FPR; 1-specificity) at 70, 80 and 90 percent sensitivity; true-positive rate (sensitivity) at 80, 90 and 95 percent specificity; number of positive observations (events) and sample size (count)

    regression

    bias, standard deviation, root mean squared error (rmse), MAD (mad), median and interquartile range (IQR) of the residuals obs - pred, plus the sample size (count)
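As a rough illustration of the measures listed above, the sketch below recomputes the regression measures from the residuals obs - pred (this sign convention is consistent with the bias of -1 reported for err_default(obs, obs + 1) in the Examples) and a 0.5-threshold error for 'soft' classification. regression_errors and soft_error are hypothetical helpers, not err_default itself; in particular, whether err_default uses > or >= at exactly 0.5 is an assumption here.

```r
# Hypothetical helpers, not err_default itself.
regression_errors <- function(obs, pred) {
  res <- obs - pred  # residuals; positive bias means under-prediction
  list(bias = mean(res), stddev = sd(res), rmse = sqrt(mean(res^2)),
       mad = mad(res), median = median(res), iqr = IQR(res),
       count = length(res))
}

# Misclassification error after dichotomizing predicted scores at 0.5
# (the strict ">" is an assumption).
soft_error <- function(obs, pred, threshold = 0.5) {
  stopifnot(is.logical(obs), is.numeric(pred))
  mean(obs != (pred > threshold))
}

obs <- c(1, 2, 3, 4)
e <- regression_errors(obs, obs + 1)  # perfect correlation, constant bias
e$bias  # -1
e$rmse  # 1
soft_error(c(TRUE, FALSE, TRUE), c(0.9, 0.2, 0.3))  # 1/3
```

The constant-shift case shows why several measures are reported: the residual spread (stddev, mad, iqr) is zero while bias and rmse are not.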

    Note

    NA values are currently not handled by this function, i.e. they will result in an error.

    See also

    ROCR

    Examples

    obs <- rnorm(1000)
    # Two mock (soft) classification examples:
    err_default(obs > 0, rnorm(1000)) # just noise
    #> $auroc
    #> [1] 0.4621542
    #>
    #> $error
    #> [1] 0.513
    #>
    #> $accuracy
    #> [1] 0.487
    #>
    #> $sensitivity
    #> [1] 0.2727273
    #>
    #> $specificity
    #> [1] 0.6970297
    #>
    #> $fpr70
    #> [1] 0.7366337
    #>
    #> $fpr80
    #> [1] 0.829703
    #>
    #> $fpr90
    #> [1] 0.9069307
    #>
    #> $tpr80
    #> [1] 0.1515152
    #>
    #> $tpr90
    #> [1] 0.06868687
    #>
    #> $tpr95
    #> [1] 0.04040404
    #>
    #> $events
    #> [1] 495
    #>
    #> $count
    #> [1] 1000
    #>
    err_default(obs > 0, obs + rnorm(1000)) # some discrimination
    #> $auroc
    #> [1] 0.8270627
    #>
    #> $error
    #> [1] 0.259
    #>
    #> $accuracy
    #> [1] 0.741
    #>
    #> $sensitivity
    #> [1] 0.6282828
    #>
    #> $specificity
    #> [1] 0.8514851
    #>
    #> $fpr70
    #> [1] 0.219802
    #>
    #> $fpr80
    #> [1] 0.3465347
    #>
    #> $fpr90
    #> [1] 0.5009901
    #>
    #> $tpr80
    #> [1] 0.6888889
    #>
    #> $tpr90
    #> [1] 0.5313131
    #>
    #> $tpr95
    #> [1] 0.4121212
    #>
    #> $events
    #> [1] 495
    #>
    #> $count
    #> [1] 1000
    #>
    # Three mock regression examples:
    err_default(obs, rnorm(1000)) # just noise, but no bias
    #> $bias
    #> [1] 0.01646476
    #>
    #> $stddev
    #> [1] 1.437289
    #>
    #> $rmse
    #> [1] 1.436665
    #>
    #> $mad
    #> [1] 1.486945
    #>
    #> $median
    #> [1] 0.01263483
    #>
    #> $iqr
    #> [1] 2.004161
    #>
    #> $count
    #> [1] 1000
    #>
    err_default(obs, obs + rnorm(1000)) # some association, no bias
    #> $bias
    #> [1] -0.05961818
    #>
    #> $stddev
    #> [1] 1.000054
    #>
    #> $rmse
    #> [1] 1.00133
    #>
    #> $mad
    #> [1] 0.9719318
    #>
    #> $median
    #> [1] -0.05654433
    #>
    #> $iqr
    #> [1] 1.302193
    #>
    #> $count
    #> [1] 1000
    #>
    err_default(obs, obs + 1) # perfect correlation, but with bias
    #> $bias
    #> [1] -1
    #>
    #> $stddev
    #> [1] 6.429096e-17
    #>
    #> $rmse
    #> [1] 1
    #>
    #> $mad
    #> [1] 0
    #>
    #> $median
    #> [1] -1
    #>
    #> $iqr
    #> [1] 0
    #>
    #> $count
    #> [1] 1000
    #>
    diff --git a/docs/reference/figures/plotresampling.png b/docs/reference/figures/plotresampling.png
    new file mode 100644
    index 00000000..c7bdd599
    Binary files /dev/null and b/docs/reference/figures/plotresampling.png differ
    diff --git a/docs/reference/get_small_tiles.html b/docs/reference/get_small_tiles.html
    new file mode 100644
    index 00000000..f0de316f
    --- /dev/null
    +++ b/docs/reference/get_small_tiles.html
    @@ -0,0 +1,255 @@

    Identify small partitions that need to be fixed. — get_small_tiles • sperrorest

    get_small_tiles identifies partitions (tiles) that are too small according to one or more criteria (minimum number of samples in a tile and/or minimum fraction of the entire dataset).

    get_small_tiles(tile, min_n = NULL, min_frac = 0, ignore = c())

    Arguments

    tile

    factor: tile/partition names for all samples; names must be coercible to class tilename, i.e. of the form 'X4:Y2' etc.

    min_n

    integer (optional): minimum number of samples per partition.

    min_frac

    numeric >0, <1: minimum relative size of a partition as a fraction of the total sample size.

    ignore

    character vector: names of tiles to be ignored, i.e. to be retained even if the inclusion criteria are not met.

    Value

    character vector: names of tiles that are considered 'small' according to these criteria (printed as a factor in the example below)
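The selection rule can be sketched in base R as follows, assuming strict inequalities: a tile with exactly min_n samples is not small. This is consistent with the example below, where get_small_tiles(..., min_n = 20) does not list the tiles with 20 test samples. small_tiles_sketch is a hypothetical helper, not the package's implementation; it also returns tiles in level order, whereas the example output appears to be sorted by tile size.

```r
# Hypothetical sketch (not sperrorest's get_small_tiles): flag tiles with
# fewer than min_n samples or less than min_frac of all samples.
small_tiles_sketch <- function(tile, min_n = NULL, min_frac = 0,
                               ignore = character(0)) {
  stopifnot(is.factor(tile))
  n_per_tile <- tabulate(tile, nbins = nlevels(tile))  # counts per level
  names(n_per_tile) <- levels(tile)
  n_per_tile <- n_per_tile[n_per_tile > 0]  # skip empty levels (assumption)
  is_small <- n_per_tile < min_frac * length(tile)
  if (!is.null(min_n)) is_small <- is_small | n_per_tile < min_n
  setdiff(names(n_per_tile)[is_small], ignore)  # ignored tiles are retained
}

tile <- factor(c(rep("X1:Y1", 10), rep("X1:Y2", 3), "X2:Y1"))
small_tiles_sketch(tile, min_n = 5)                    # "X1:Y2" "X2:Y1"
small_tiles_sketch(tile, min_n = 5, ignore = "X2:Y1")  # "X1:Y2"
small_tiles_sketch(tile, min_frac = 0.2)               # "X2:Y1" (1 < 0.2 * 14)
```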


    See also


    partition_tiles, tilename


    Examples

    data(ecuador) # Muenchow et al. (2012), see ?ecuador
    # Rectangular partitioning without removal of small tiles:
    parti <- partition_tiles(ecuador, nsplit = c(10, 10), reassign = FALSE)
    summary(parti)
    #> $`1`
    #> n.train n.test
    #> X1:Y7 688 8
    #> X1:Y8 678 18
    #> X10:Y2 685 11
    #> X10:Y3 688 8
    #> X10:Y4 685 11
    #> X2:Y4 683 13
    #> X2:Y5 690 6
    #> X2:Y6 689 7
    #> X2:Y7 670 26
    #> X2:Y8 674 22
    #> X2:Y9 691 5
    #> X3:Y10 689 7
    #> X3:Y5 675 21
    #> X3:Y6 687 9
    #> X3:Y8 691 5
    #> X3:Y9 676 20
    #> X4:Y10 690 6
    #> X4:Y4 686 10
    #> X4:Y5 685 11
    #> X4:Y6 687 9
    #> X4:Y7 684 12
    #> X4:Y8 690 6
    #> X4:Y9 683 13
    #> X5:Y10 687 9
    #> X5:Y2 689 7
    #> X5:Y3 684 12
    #> X5:Y4 676 20
    #> X5:Y5 691 5
    #> X5:Y6 686 10
    #> X5:Y7 690 6
    #> X5:Y9 689 7
    #> X6:Y1 691 5
    #> X6:Y2 689 7
    #> X6:Y3 685 11
    #> X6:Y4 691 5
    #> X6:Y5 681 15
    #> X6:Y7 689 7
    #> X6:Y8 685 11
    #> X6:Y9 691 5
    #> X7:Y1 687 9
    #> X7:Y10 676 20
    #> X7:Y2 686 10
    #> X7:Y3 688 8
    #> X7:Y4 682 14
    #> X7:Y5 688 8
    #> X7:Y6 687 9
    #> X7:Y7 685 11
    #> X7:Y8 685 11
    #> X7:Y9 687 9
    #> X8:Y1 669 27
    #> X8:Y2 683 13
    #> X8:Y3 684 12
    #> X8:Y4 689 7
    #> X8:Y5 673 23
    #> X8:Y6 685 11
    #> X8:Y7 684 12
    #> X9:Y1 687 9
    #> X9:Y2 690 6
    #> X9:Y3 686 10
    #> X9:Y4 678 18
    #> X9:Y6 685 11
    #> X9:Y7 691 5
    #> X9:Y8 686 10
    #> X9:Y9 689 7
    #>
    length(parti[[1]])
    #> [1] 64
    # Same in factor format for the application of get_small_tiles:
    parti_fac <- partition_tiles(ecuador, nsplit = c(10, 10), reassign = FALSE,
                                 return_factor = TRUE)
    get_small_tiles(parti_fac[[1]], min_n = 20) # tiles with fewer than 20 samples
    #> [1] X2:Y9 X3:Y8 X5:Y5 X6:Y1 X6:Y4 X6:Y9 X9:Y7 X2:Y5 X4:Y10 X4:Y8
    #> [11] X5:Y7 X9:Y2 X2:Y6 X3:Y10 X5:Y2 X5:Y9 X6:Y2 X6:Y7 X8:Y4 X9:Y9
    #> [21] X1:Y7 X10:Y3 X7:Y3 X7:Y5 X3:Y6 X4:Y6 X5:Y10 X7:Y1 X7:Y6 X7:Y9
    #> [31] X9:Y1 X4:Y4 X5:Y6 X7:Y2 X9:Y3 X9:Y8 X10:Y2 X10:Y4 X4:Y5 X6:Y3
    #> [41] X6:Y8 X7:Y7 X7:Y8 X8:Y6 X9:Y6 X4:Y7 X5:Y3 X8:Y3 X8:Y7 X2:Y4
    #> [51] X4:Y9 X8:Y2 X7:Y4 X6:Y5 X1:Y8 X9:Y4
    #> 64 Levels: X1:Y7 X1:Y8 X10:Y2 X10:Y3 X10:Y4 X2:Y4 X2:Y5 X2:Y6 X2:Y7 ... X9:Y9
    parti2 <- partition_tiles(ecuador, nsplit = c(10, 10), reassign = TRUE,
                              min_n = 20, min_frac = 0)
    length(parti2[[1]]) # < length(parti[[1]])
    #> [1] 31
    diff --git a/docs/reference/index.html b/docs/reference/index.html
    new file mode 100644
    index 00000000..cad6f7f8
    --- /dev/null
    +++ b/docs/reference/index.html
    @@ -0,0 +1,331 @@

    Function reference • sperrorest

    All functions

    add.distance: Add distance information to resampling objects
    as.represampling, print, is_represampling: Resampling objects with repetition, i.e. sets of partitionings or bootstrap samples
    as.resampling, validate.resampling, is.resampling, print: Resampling objects such as partitionings or bootstrap samples
    as.tilename, as.character, as.numeric, print: Alphanumeric tile names
    dataset_distance: Calculate mean nearest-neighbour distance between point datasets
    err_default: Default error function
    get_small_tiles: Identify small partitions that need to be fixed.
    partition_cv_strat: Partition the data for a stratified (non-spatial) cross-validation
    partition_cv: Partition the data for a (non-spatial) cross-validation
    partition_disc, partition_loo: Leave-one-disc-out cross-validation and leave-one-out cross-validation
    partition_factor_cv: Partition the data for a (non-spatial) k-fold cross-validation at the group level
    partition_factor: Partition the data for a (non-spatial) leave-one-factor-out cross-validation based on a given, fixed partitioning
    partition_kmeans: Partition samples spatially using k-means clustering of the coordinates
    partition_tiles: Partition the study area into rectangular tiles
    plot: Plot spatial resampling objects
    represampling_bootstrap: Non-spatial bootstrap resampling
    represampling_disc_bootstrap: Overlapping spatial block bootstrap using circular blocks
    represampling_factor_bootstrap: Bootstrap at an aggregated level
    represampling_kmeans_bootstrap: Spatial block bootstrap at the level of spatial k-means clusters
    represampling_tile_bootstrap: Spatial block bootstrap using rectangular blocks
    resample_factor: Draw uniform random (sub)sample at the group level
    resample_strat_uniform: Draw stratified random sample
    resample_uniform: Draw uniform random (sub)sample
    sperrorest-package: Spatial Error Estimation and Variable Importance
    sperrorest: Perform spatial error estimation and variable importance assessment in parallel
    summary: Summary statistics for resampling objects
    summary, print: Summary and print methods for sperrorest results
    summary: Summarize error statistics obtained by sperrorest
    summary: Summarize variable importance statistics obtained by sperrorest
    tile_neighbors: Determine the names of neighbouring tiles in a rectangular pattern
    diff --git a/docs/reference/maipo.html b/docs/reference/maipo.html
    new file mode 100644
    index 00000000..c6008ceb
    --- /dev/null
    +++ b/docs/reference/maipo.html
    @@ -0,0 +1,130 @@

    Maipo dataset from Marco Pena — maipo • sperrorest

    Maipo dataset from Marco Pena

    diff --git a/docs/reference/partition_cv.html b/docs/reference/partition_cv.html
    new file mode 100644
    index 00000000..0e7bb4a6
    --- /dev/null
    +++ b/docs/reference/partition_cv.html
    @@ -0,0 +1,518 @@

    Partition the data for a (non-spatial) cross-validation — partition_cv • sperrorest

    partition_cv creates a represampling object for length(repetition)-repeated nfold-fold cross-validation.

    partition_cv(data, coords = c("x", "y"), nfold = 10, repetition = 1,
      seed1 = NULL, return_factor = FALSE)

    Arguments

    data

    data.frame containing at least the columns specified by coords

    coords

    (ignored by partition_cv)

    nfold

    number of partitions (folds) in nfold-fold cross-validation partitioning

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    seed1

    seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    return_factor

    if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

    Value

    If return_factor = FALSE (the default), a represampling object. Specifically, this is a (named) list of length(repetition) resampling objects. Each of these resampling objects is a list of length nfold corresponding to the folds. Each fold is represented by a list containing the components train and test, specifying the indices of training and test samples (row indices for data). If return_factor = TRUE (mainly used internally), a (named) list of length length(repetition). Each component of this list is a vector of length nrow(data) of type factor, specifying for each sample the fold to which it belongs. The factor levels are factor(1:nfold).
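To illustrate the nesting described above and the role of repetition and seed1, here is a minimal base-R sketch. partition_cv_sketch is a hypothetical helper; assigning folds as a random permutation of balanced fold labels is an assumption and need not match sperrorest's sampling scheme, so it will not reproduce the package's partitions.

```r
# Hypothetical sketch, not partition_cv itself: a named list (one element per
# repetition index) of nfold folds, each a list(train, test) of row indices.
partition_cv_sketch <- function(n, nfold = 10, repetition = 1, seed1 = NULL,
                                return_factor = FALSE) {
  reps <- lapply(repetition, function(i) {
    if (!is.null(seed1)) set.seed(seed1 + i)  # seed depends on index i only
    fold <- factor(sample(rep(seq_len(nfold), length.out = n)),
                   levels = seq_len(nfold))
    if (return_factor) return(fold)           # one fold label per sample
    lapply(levels(fold), function(k) {
      list(train = which(fold != k), test = which(fold == k))
    })
  })
  names(reps) <- as.character(repetition)
  reps
}

resamp <- partition_cv_sketch(751, nfold = 5, repetition = 1:3, seed1 = 123)
names(resamp)                                  # "1" "2" "3"
lengths(lapply(resamp[["1"]], `[[`, "test"))   # 151 150 150 150 150
# Because the seed depends only on the index, repetition 2 is reproduced
# even when it is generated on its own:
identical(partition_cv_sketch(751, 5, repetition = 2, seed1 = 123)[["2"]],
          resamp[["2"]])                       # TRUE
```

Seeding each repetition with seed1 + i is what allows repetition = c(101:200) to extend, rather than duplicate, an earlier run with repetition = c(1:100).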

    Details

    This function does not actually perform a cross-validation or partition the data set itself; it simply creates a data structure containing the indices of training and test samples.

    See also

    sperrorest, represampling

    Examples

    data(ecuador)
    ## non-spatial cross-validation:
    resamp <- partition_cv(ecuador, nfold = 5, repetition = 5)
    # plot(resamp, ecuador)
    # first repetition, second fold, test set indices:
    idx <- resamp[['1']][[2]]$test
    # test sample used in this particular repetition and fold:
    ecuador[idx, ]
    #> x y dem slope hcurv vcurv carea +#> 4965 715042.5 9559312 2320.49 42.857816 -0.01106 -0.04634 500.5027 +#> 37912 712802.5 9559952 1838.40 52.101344 0.00183 -0.09203 634.3320 +#> 31357 714752.5 9561022 1848.22 33.446411 -0.00347 0.02357 1752.0375 +#> 25090 715362.5 9560102 2059.29 49.119672 0.02059 -0.00628 556.0121 +#> 40756 714022.5 9558862 2331.20 45.085476 -0.00075 0.00475 1001.0861 +#> 14254 714852.5 9557882 2680.00 23.335425 0.00479 0.01261 323.8441 +#> 24072 715282.5 9557602 2837.46 34.394083 -0.02191 -0.00579 2246.7725 +#> 34512 713162.5 9559632 2041.52 46.815236 -0.00857 0.03677 1675.4679 +#> 24129 714602.5 9560542 2038.30 25.584857 0.00065 0.00005 769.3426 +#> 9669 713852.5 9558612 2308.97 52.180412 -0.01059 -0.07431 4079.0366 +#> 39885 715062.5 9561022 1840.94 38.435728 0.00250 -0.01340 604.1032 +#> 32622 714972.5 9557762 2687.38 30.556412 -0.00730 -0.01491 1844.4353 +#> 38993 714202.5 9558402 2475.75 33.159359 -0.01607 0.00677 1343.5006 +#> 22814 713862.5 9558582 2327.48 48.612031 -0.02894 -0.03416 947.4878 +#> 42598 713542.5 9559972 2184.39 35.241488 -0.00841 -0.00010 1527.1028 +#> 38287 714872.5 9561162 1719.41 48.235598 -0.03021 -0.09328 2235.5259 +#> 39061 712922.5 9559912 1931.50 36.315211 0.00414 -0.00104 1641.4587 +#> 30800 715182.5 9557582 2772.37 39.204064 -0.02475 0.00665 25318.3555 +#> 38410 713712.5 9561172 1776.08 36.319222 -0.02678 -0.04272 4955.5586 +#> 48498 714972.5 9557972 2681.66 38.346919 0.00528 0.01392 1255.0884 +#> 35285 714602.5 9560112 2142.75 51.107835 -0.00017 -0.00032 994.6740 +#> 29472 714792.5 9561072 1786.08 36.173117 -0.01029 -0.02312 2657.6936 +#> 22933 712712.5 9560432 1902.17 38.123466 -0.00746 0.00096 3210.5781 +#> 38032 714762.5 9560442 1988.33 35.169868 -0.00905 -0.01385 674.0721 +#> 14522 715012.5 9557732 2721.34 35.181327 -0.00382 0.00672 1114.8801 +#> 31576 713942.5 9558492 2387.12 46.143156 -0.05634 -0.02626 4316.0977 +#> 20290 714812.5 9561092 1775.73 34.626704 -0.01537 -0.00203 10263.2734 +#> 27978 
713552.5 9558532 2373.05 17.044921 0.00832 0.00838 238.6685 +#> 15340 715452.5 9558852 2523.95 34.165473 0.00151 0.00899 1394.2274 +#> 4827 714962.5 9559882 2274.42 29.244912 0.00051 0.00889 362.8363 +#> 6808 715192.5 9559492 2219.03 21.817660 0.00713 -0.00173 8967.1924 +#> 18391 715112.5 9559432 2235.02 22.257691 -0.02651 -0.03629 360379.1875 +#> 8073 712572.5 9560302 1917.62 52.009098 0.01937 0.00083 994.7876 +#> 27957 715012.5 9559332 2306.35 20.480376 0.00225 0.00325 616.8652 +#> 15390 713132.5 9560622 1873.26 38.480418 0.00252 0.01118 2359.5059 +#> 22735 714492.5 9559772 2170.30 39.688786 -0.00440 -0.02651 419.0927 +#> 46756 713372.5 9559192 2120.47 46.005073 0.03696 0.02514 297.7842 +#> 2858 714292.5 9559332 2335.04 42.355905 0.04547 0.03323 183.1836 +#> 34563 715022.5 9558162 2675.70 18.889846 -0.01993 0.01373 16031.7500 +#> 46235 712712.5 9561042 2023.11 41.277025 0.01458 0.00672 565.1107 +#> 27084 713632.5 9558472 2380.37 23.135463 -0.01752 -0.00517 1000.7119 +#> 5578 714962.5 9557612 2751.95 39.191459 0.00288 0.01452 437.3965 +#> 2976 715122.5 9559142 2390.05 33.167954 0.01008 0.00102 483.4184 +#> 34707 713962.5 9561092 1819.49 65.686173 -0.04749 0.04779 2972.0193 +#> 20220 714052.5 9558522 2378.67 39.656128 -0.00855 -0.02175 1122.9120 +#> 21952 715352.5 9560172 2013.44 40.488063 0.02409 0.02771 674.3859 +#> 25244 713472.5 9559092 2137.03 18.348973 -0.10372 -0.00008 825875.6875 +#> 26698 713382.5 9559082 2155.56 34.264595 -0.00211 0.00241 585.6182 +#> 19887 714662.5 9559632 2266.47 45.179441 -0.02797 -0.00063 576.2819 +#> 38862 712952.5 9559892 1950.86 37.721822 -0.00048 -0.00332 1418.6622 +#> 2595 714912.5 9557662 2700.72 24.194862 -0.02681 -0.01319 3457.7519 +#> 32909 712812.5 9560452 1883.85 38.251235 -0.00221 0.00121 4159.6738 +#> 10507 714892.5 9559282 2374.45 42.842346 -0.01057 0.02427 380.9472 +#> 7668 713372.5 9559062 2164.82 37.934963 -0.01807 -0.01883 616.1993 +#> 44825 714322.5 9557882 2534.80 40.688598 0.00114 0.03026 247.0894 +#> 46065 
713072.5 9559162 2109.39 16.264553 -0.00195 -0.00656 3036.7383
#> 7064 715592.5 9558662 2597.69 34.222196 0.00099 0.00240 1201.4724
#> 2552 714942.5 9557652 2719.32 34.873649 -0.01238 -0.00851 1012.5963
#> 45592 714962.5 9559862 2282.08 22.160862 -0.00028 0.01318 289.0063
#> 41783 715572.5 9558652 2609.51 32.408212 0.00291 -0.00081 940.6002
#> 34192 712702.5 9560102 1867.01 35.474682 0.00629 -0.00469 1021.5407
#> 5239 714832.5 9560992 1859.70 19.960131 -0.00320 0.00180 3904.5254
#> 31834 713492.5 9560682 1800.85 34.237093 -0.00472 -0.00228 2262.3796
#> 18897 712802.5 9559882 1851.71 50.234648 -0.02627 -0.05393 8301.6094
#> 39722 715022.5 9557712 2735.53 30.260766 0.00133 0.00367 806.0966
#> 30268 714172.5 9558612 2328.84 54.950663 0.03743 -0.05242 507.9263
#> 39166 712832.5 9560142 1922.46 39.660139 0.01469 0.02940 987.8085
#> 33910 715202.5 9559572 2195.66 13.791667 -0.04376 -0.01644 1057765.6250
#> 31117 712892.5 9558922 2129.84 30.550683 0.00521 -0.00870 1895.1399
#> 47702 714692.5 9559712 2200.51 33.098626 -0.13038 -0.04212 46310.0195
#> 20797 713012.5 9560392 1842.25 36.649818 0.01299 0.00141 4611.0718
#> 34615 713242.5 9560732 1879.60 27.593647 0.00384 0.00175 1026.6818
#> 30933 714672.5 9561082 1798.37 54.810289 0.00542 -0.00722 1373.4147
#> 26428 713042.5 9560332 1895.15 33.959781 -0.01439 -0.01161 3065.3694
#> 26101 715562.5 9558682 2594.87 32.411077 0.00079 0.00321 1170.5904
#> 17058 714032.5 9558502 2402.95 31.352824 0.01924 0.02606 564.6876
#> 28173 714562.5 9560382 2048.26 27.685894 -0.00124 -0.00456 1531.4274
#> 2801 715252.5 9559612 2199.28 24.403418 0.03051 0.02190 264.6404
#> 16071 714562.5 9560062 2166.69 28.696018 0.02281 0.04759 158.9370
#> 22297 714642.5 9558172 2606.18 59.469008 -0.00495 -0.03756 353.9086
#> 39624 715222.5 9559552 2205.54 23.897497 0.00401 -0.00391 1794.7964
#> 750 712812.5 9560032 1847.09 54.959258 -0.00767 -0.03713 6379.7974
#> 48650 715592.5 9558612 2626.74 31.876507 0.00010 0.00140 1066.3957
#> 14494 715332.5 9558792 2557.22 20.720446 0.01267 0.00483 343.1384
#> 44612 714932.5 9558822 2578.80 38.255246 0.01257 0.00044 288.5468
#> 39405 713122.5 9559052 2169.30 37.646192 -0.02110 -0.02450 2589.9910
#> 14950 713382.5 9559172 2111.91 48.106682 0.00915 -0.02705 413.5702
#> 24723 712942.5 9560202 1983.18 33.071697 0.00285 -0.01325 595.0426
#> 42617 714092.5 9558392 2469.21 42.799947 -0.01397 0.00526 706.1463
#> 29024 713492.5 9559112 2173.47 53.432325 0.04864 0.04226 338.3474
#> 28260 715312.5 9558302 2681.75 34.484038 -0.01125 -0.02685 2568.4631
#> 22847 714582.5 9560382 2040.49 25.336194 0.00181 -0.00341 1363.4000
#> 46792 713862.5 9559672 2272.03 35.464942 0.00560 0.00610 454.8103
#> 46627 714852.5 9558932 2459.24 53.445503 0.02451 -0.03921 525.3245
#> 49459 714692.5 9557342 2639.74 42.047081 0.00216 -0.02606 919.4241
#> 37115 715612.5 9559202 2358.15 9.727677 -0.01020 0.00279 39495.1797
#> 17227 714812.5 9558892 2545.43 54.071746 0.00458 0.04291 293.4231
#> 12074 714562.5 9560372 2045.17 26.287304 -0.00144 -0.00856 1789.1596
#> 47722 712732.5 9561022 1999.03 37.492639 0.00004 -0.02384 947.2075
#> 4479 713982.5 9557812 2399.24 41.231189 0.01895 0.04744 226.7911
#> 10213 715542.5 9558782 2532.10 35.413375 -0.00893 -0.00867 2247.8748
#> 23832 714252.5 9560182 2188.12 30.647512 0.00427 0.02403 474.8613
#> 3771 714842.5 9561152 1724.51 45.539831 0.02405 -0.05655 130.9468
#> 21490 714672.5 9560382 2021.86 32.804126 -0.00783 -0.00367 1966.2112
#> 39466 714942.5 9557772 2680.35 16.753286 0.00166 -0.01265 1148.6487
#> 6470 713742.5 9558802 2260.79 30.680171 0.00000 0.00320 911.5911
#> 10145 713782.5 9560932 1865.62 23.595548 -0.00203 0.00642 1688.1782
#> 748 715102.5 9559692 2301.39 27.021836 -0.00759 0.00158 837.3730
#> 34250 713822.5 9559142 2316.36 33.759819 -0.02284 0.01085 2229.5105
#> 15558 714732.5 9559502 2283.66 53.431752 0.00390 -0.02831 454.9113
#> 31071 713472.5 9558462 2298.33 29.153239 0.00156 -0.00116 1817.5222
#> 18286 714792.5 9557882 2650.57 28.253695 -0.00124 0.00204 853.2902
#> 26023 715832.5 9557632 3097.56 41.710182 0.00133 -0.00452 578.5847
#> 20067 714242.5 9560442 2125.23 38.640274 0.00363 -0.00363 389.2691
#> 26963 715702.5 9557882 2880.32 43.733296 -0.01174 -0.01466 2633.9666
#> 19493 714252.5 9560372 2156.93 24.879546 0.00825 0.00445 355.5078
#> 29125 714992.5 9558792 2584.41 24.665833 0.01253 -0.00333 422.8840
#> 44396 715512.5 9558102 2845.31 26.598420 0.01263 0.03598 186.3527
#> 20547 713732.5 9560822 1851.67 36.968383 -0.01405 0.00476 11992.5273
#> 40139 712852.5 9560072 1907.89 43.704075 0.00001 0.00759 1880.4403
#> 34616 714522.5 9558742 2568.77 34.997981 0.00990 0.02790 209.6952
#> 32849 714302.5 9558222 2520.01 32.157256 -0.00846 0.00216 1412.2295
#> 21925 714542.5 9559232 2314.33 50.247826 0.00737 0.01033 3446.5520
#> 14800 712952.5 9558682 2120.89 38.273581 -0.00022 0.00662 1269.2650
#> 8895 714132.5 9557692 2487.19 37.364297 0.03423 -0.00983 270.7949
#> 47059 713822.5 9558272 2413.20 38.175032 -0.00687 -0.00223 1056.3701
#> 34105 715432.5 9558592 2621.73 18.956309 -0.00301 -0.01099 1146.2140
#> 29835 715402.5 9558512 2610.14 27.118092 -0.01322 -0.00759 4131.5356
#> 13908 715022.5 9559372 2290.81 27.242997 -0.03004 0.00714 480535.5000
#> 35038 715092.5 9558152 2715.69 40.183822 0.00270 -0.00250 2225.4426
#> 42424 714692.5 9560902 1921.34 25.331037 0.01154 -0.00204 618.2027
#> 41612 713952.5 9560002 2198.00 28.756179 0.00119 0.00361 1011.0248
#> 1479 715622.5 9557952 2888.67 34.953863 0.01635 -0.00105 868.7802
#> 14671 712672.5 9560372 1872.21 39.197189 -0.00873 -0.01417 1520.6630
#> 32430 715832.5 9558112 2824.39 23.694097 0.00944 0.00836 250.5111
#> 484 713812.5 9560592 2016.41 44.301097 -0.00719 0.00079 1647.6451
#> 44434 713022.5 9558762 2161.07 31.454237 -0.00273 -0.01047 2826.7700
#> 9007 714952.5 9560902 1863.75 32.770321 0.00074 -0.02904 608.3561
#> 280 713182.5 9560632 1862.83 39.004102 0.00745 0.00875 2156.9358
#> 4943 712452.5 9559172 1927.46 34.465130 0.00105 -0.01576 1045.1074
#> 10981 713832.5 9560022 2125.33 34.848439 -0.06909 -0.01151 143213.1562
#> 16845 712772.5 9560022 1831.76 21.399974 -0.00865 -0.00805 7489.7417
#> 14881 715482.5 9559232 2376.03 29.905532 -0.01137 -0.01523 888.2811
#> 33881 712992.5 9558822 2173.26 33.197748 -0.00595 0.03064 1485.5201
#> 35257 713822.5 9557982 2350.80 29.307937 0.01549 -0.00169 649.9975
#> 39883 712732.5 9560842 2082.69 25.090968 0.00732 0.01858 298.4490
#> 6577 714142.5 9559902 2244.30 57.890510 -0.06986 -0.05013 616.4128
#> 32342 714862.5 9559622 2294.74 36.967237 0.01021 0.00109 915.4989
#> 13548 715812.5 9558122 2821.65 15.508249 0.02751 0.00388 180.4414
#> 6206 714122.5 9560992 1917.28 37.969913 -0.01023 -0.00227 841.2025
#> cslope distroad slides distdeforest distslidespast log.carea
#> 4965 33.9059234 300.00 TRUE 300.00 100 2.699406
#> 37912 30.2945705 300.00 TRUE 9.15 2 2.802317
#> 31357 23.8172826 158.92 TRUE 0.00 5 3.243543
#> 25090 43.5316144 300.00 TRUE 300.00 26 2.745084
#> 40756 39.3352715 300.00 TRUE 300.00 100 3.000471
#> 14254 15.6652391 300.00 TRUE 300.00 10 2.510336
#> 24072 37.6668184 300.00 TRUE 300.00 100 3.351559
#> 34512 34.6398824 300.00 TRUE 195.00 2 3.224136
#> 24129 24.1289716 300.00 TRUE 300.00 89 2.886120
#> 9669 31.6645125 300.00 TRUE 300.00 1 3.610558
#> 39885 6.0848118 210.57 TRUE 0.00 100 2.781111
#> 32622 30.1713845 300.00 TRUE 300.00 100 3.265863
#> 38993 29.5485794 300.00 TRUE 300.00 2 3.128238
#> 22814 34.9120373 300.00 TRUE 300.00 6 2.976574
#> 42598 20.0689927 300.00 TRUE 247.02 100 3.183868
#> 38287 14.8969027 41.43 TRUE 1.90 100 3.349380
#> 39061 35.2409151 300.00 TRUE 70.65 56 3.215230
#> 30800 33.2481679 300.00 TRUE 300.00 100 4.403435
#> 38410 8.7610976 69.52 TRUE 47.61 65 3.695093
#> 48498 33.6182986 300.00 TRUE 300.00 100 3.098674
#> 35285 42.4664859 300.00 TRUE 300.00 100 2.997681
#> 29472 18.6486303 111.09 TRUE 0.00 25 3.424505
#> 22933 33.0631025 87.56 TRUE 115.40 2 3.506583
#> 38032 35.0082942 300.00 TRUE 300.00 60 2.828706
#> 14522 28.3986531 300.00 TRUE 300.00 100 3.047228
#> 31576 26.9353189 300.00 TRUE 300.00 4 3.635091
#> 20290 15.4566824 95.53 TRUE 0.00 49 4.011286
#> 27978 14.3285285 300.00 TRUE 300.00 12 2.377795
#> 15340 27.9752373 300.00 TRUE 300.00 100 3.144334
#> 4827 21.0739607 300.00 TRUE 300.00 0 2.559711
#> 6808 28.5659568 300.00 TRUE 300.00 40 3.952656
#> 18391 34.5075291 300.00 TRUE 300.00 100 5.556760
#> 8073 36.0757146 138.71 TRUE 76.41 35 2.997730
#> 27957 30.0602307 300.00 TRUE 300.00 75 2.790190
#> 15390 29.3222611 60.00 TRUE 118.92 2 3.372821
#> 22735 36.5168921 300.00 TRUE 300.00 90 2.622310
#> 46756 38.9617030 300.00 TRUE 300.00 37 2.473902
#> 2858 35.3194103 300.00 TRUE 300.00 2 2.262887
#> 34563 38.1566973 300.00 TRUE 300.00 100 4.204981
#> 46235 30.1083592 300.00 TRUE 41.37 0 2.752133
#> 27084 21.6727016 300.00 TRUE 300.00 81 3.000309
#> 5578 25.7418478 300.00 TRUE 300.00 100 2.640875
#> 2976 26.8820338 300.00 TRUE 300.00 100 2.684323
#> 34707 27.7025094 165.13 TRUE 39.56 100 3.473052
#> 20220 32.1755909 300.00 TRUE 300.00 8 3.050346
#> 21952 40.0915758 300.00 TRUE 300.00 64 2.828908
#> 25244 33.2189470 300.00 TRUE 300.00 6 5.916915
#> 26698 34.2496981 300.00 TRUE 300.00 7 2.767615
#> 19887 38.0621593 300.00 TRUE 300.00 63 2.760635
#> 38862 34.9000052 300.00 TRUE 104.48 92 3.151879
#> 2595 30.4395288 300.00 TRUE 300.00 100 3.538794
#> 32909 28.3791726 118.07 TRUE 98.15 0 3.619059
#> 10507 27.1622102 300.00 TRUE 300.00 46 2.580865
#> 7668 36.3203676 300.00 TRUE 300.00 5 2.789721
#> 44825 29.0214582 300.00 TRUE 300.00 100 2.392854
#> 46065 29.7708870 300.00 TRUE 251.11 100 3.482407
#> 7064 29.1326121 300.00 TRUE 300.00 100 3.079714
#> 2552 32.2156979 300.00 TRUE 300.00 100 3.005436
#> 45592 15.6486233 300.00 TRUE 300.00 2 2.460907
#> 41783 28.1860858 300.00 TRUE 300.00 100 2.973405
#> 34192 33.8749837 273.94 TRUE 4.48 85 3.009256
#> 5239 24.6652601 197.29 TRUE 4.67 57 3.591568
#> 31834 24.6469255 215.68 TRUE 0.00 100 3.354565
#> 18897 31.7882714 300.00 TRUE 20.00 5 3.919162
#> 39722 26.5416969 300.00 TRUE 300.00 100 2.906387
#> 30268 47.2724558 300.00 TRUE 300.00 41 2.705801
#> 39166 29.8184425 300.00 TRUE 1.90 90 2.994673
#> 33910 32.6734912 300.00 TRUE 300.00 6 6.024389
#> 31117 26.6459752 300.00 TRUE 291.23 100 3.277641
#> 47702 33.6664271 300.00 TRUE 300.00 86 4.665675
#> 20797 32.0569886 285.07 TRUE 0.00 96 3.663802
#> 34615 34.1895376 17.57 TRUE 142.95 16 3.011436
#> 30933 12.8434219 99.29 TRUE 0.00 6 3.137802
#> 26428 35.4752548 300.00 TRUE 0.00 100 3.486483
#> 26101 28.6461709 300.00 TRUE 300.00 100 3.068405
#> 17058 26.3228270 300.00 TRUE 300.00 8 2.751808
#> 28173 23.8837457 300.00 TRUE 300.00 59 3.185096
#> 2801 25.7464315 300.00 TRUE 300.00 26 2.422656
#> 16071 29.6213451 300.00 TRUE 300.00 100 2.201225
#> 22297 46.6496507 300.00 TRUE 300.00 11 2.548891
#> 39624 33.2945775 300.00 TRUE 300.00 2 3.254015
#> 750 34.3471009 300.00 TRUE 0.00 0 3.804807
#> 48650 27.0522023 300.00 TRUE 300.00 100 3.027918
#> 14494 21.1851145 300.00 TRUE 300.00 28 2.535469
#> 44612 34.6914486 300.00 TRUE 300.00 100 2.460216
#> 39405 27.7202711 300.00 TRUE 300.00 100 3.413298
#> 14950 41.8740475 300.00 TRUE 300.00 15 2.616549
#> 24723 28.1734807 300.00 TRUE 36.63 100 2.774548
#> 42617 31.2319294 300.00 TRUE 300.00 100 2.848895
#> 29024 42.5730560 300.00 TRUE 300.00 32 2.529363
#> 28260 31.8335351 300.00 TRUE 300.00 100 3.409673
#> 22847 24.7477660 300.00 TRUE 300.00 41 3.134623
#> 46792 27.6010959 300.00 TRUE 300.00 18 2.657830
#> 46627 53.6832806 300.00 TRUE 300.00 100 2.720428
#> 49459 41.8568588 300.00 TRUE 300.00 100 2.963516
#> 37115 32.1870501 300.00 TRUE 300.00 100 4.596544
#> 17227 43.5975045 300.00 TRUE 300.00 100 2.467494
#> 12074 23.2855778 300.00 TRUE 300.00 63 3.252649
#> 47722 33.1158146 300.00 TRUE 65.02 1 2.976445
#> 4479 32.0117250 300.00 TRUE 300.00 55 2.355626
#> 10213 31.5986224 300.00 TRUE 300.00 100 3.351772
#> 23832 25.4851627 300.00 TRUE 300.00 8 2.676567
#> 3771 0.3294507 44.23 TRUE 1.67 94 2.117095
#> 21490 34.0600491 300.00 TRUE 300.00 1 3.293630
#> 39466 27.2515916 300.00 TRUE 300.00 100 3.060187
#> 6470 28.0932666 300.00 TRUE 300.00 2 2.959800
#> 10145 27.4160305 276.16 FALSE 91.33 100 3.227418
#> 748 38.1326331 300.00 FALSE 300.00 12 2.922919
#> 34250 30.8927384 300.00 FALSE 300.00 100 3.348210
#> 15558 50.3440826 300.00 FALSE 300.00 45 2.657927
#> 31071 31.7378511 300.00 FALSE 300.00 76 3.259480
#> 18286 24.3507063 300.00 FALSE 300.00 42 2.931097
#> 26023 45.0728709 300.00 FALSE 300.00 100 2.762367
#> 20067 29.8235992 300.00 FALSE 268.03 100 2.590250
#> 26963 42.0631236 300.00 FALSE 300.00 100 3.420610
#> 19493 18.5724269 300.00 FALSE 291.23 100 2.550849
#> 29125 23.9450522 300.00 FALSE 300.00 100 2.626221
#> 44396 21.4423725 300.00 FALSE 300.00 100 2.270336
#> 20547 34.2611573 300.00 FALSE 42.56 100 4.078911
#> 40139 33.1433166 300.00 FALSE 1.11 21 3.274260
#> 34616 31.4995007 300.00 FALSE 300.00 100 2.321589
#> 32849 29.5520172 300.00 FALSE 300.00 45 3.149905
#> 21925 31.2485452 300.00 FALSE 300.00 95 3.537385
#> 14800 31.5229920 300.00 FALSE 300.00 100 3.103552
#> 8895 38.0776291 300.00 FALSE 300.00 90 2.432641
#> 47059 25.7596095 300.00 FALSE 300.00 100 3.023816
#> 34105 20.0770141 300.00 FALSE 300.00 26 3.059266
#> 29835 28.8530086 300.00 FALSE 300.00 45 3.616112
#> 13908 34.3774677 300.00 FALSE 300.00 96 5.681725
#> 35038 40.2038755 300.00 FALSE 300.00 100 3.347416
#> 42424 22.6071957 275.21 FALSE 93.50 70 2.791131
#> 41612 25.5871492 300.00 FALSE 300.00 100 3.004762
#> 1479 43.6765727 300.00 FALSE 300.00 100 2.938910
#> 14671 37.8822505 123.04 FALSE 75.11 0 3.182033
#> 32430 22.8604431 300.00 FALSE 300.00 100 2.398827
#> 484 32.1457971 300.00 FALSE 55.63 100 3.216864
#> 44434 36.3197946 300.00 FALSE 300.00 100 3.451290
#> 9007 30.9643581 300.00 FALSE 14.98 35 2.784158
#> 280 29.3234070 60.55 FALSE 127.54 29 3.333837
#> 4943 37.7716697 300.00 FALSE 300.00 100 3.019161
#> 10981 29.1664166 300.00 FALSE 300.00 100 5.155983
#> 16845 22.8827884 300.00 FALSE 25.55 33 3.874467
#> 14881 26.8562507 300.00 FALSE 300.00 100 2.948550
#> 33881 29.1858971 300.00 FALSE 300.00 100 3.171879
#> 35257 30.8818522 300.00 FALSE 300.00 79 2.812912
#> 39883 17.0443485 233.81 FALSE 219.23 100 2.474870
#> 6577 42.4074712 300.00 FALSE 300.00 70 2.789872
#> 32342 37.6198359 300.00 FALSE 300.00 81 2.961658
#> 13548 16.9784583 300.00 FALSE 300.00 100 2.256336
#> 6206 32.4632157 294.31 FALSE 35.90 100 2.924901

    Site built with pkgdown.

diff --git a/docs/reference/partition_cv_strat.html b/docs/reference/partition_cv_strat.html
new file mode 100644
index 00000000..d2de550f
--- /dev/null
+++ b/docs/reference/partition_cv_strat.html
@@ -0,0 +1,213 @@
Partition the data for a stratified (non-spatial) cross-validation — partition_cv_strat • sperrorest

    partition_cv_strat creates a set of sample indices corresponding to cross-validation test and training sets.

    partition_cv_strat(data, coords = c("x", "y"), nfold = 10,
      return_factor = FALSE, repetition = 1, seed1 = NULL, strat)

    Arguments

    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations

    nfold

    number of partitions (folds) in nfold-fold cross-validation partitioning

    return_factor

    if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    seed1

    seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    strat

    character: column in data containing a factor variable over which the partitioning should be stratified; or factor vector of length nrow(data): variable over which to stratify

    Value

    A represampling object (see also partition_cv). Unlike partition_cv, however, partition_cv_strat stratifies with respect to the variable data[,strat]; i.e., cross-validation partitioning is done within each set data[data[,strat]==i,] (i in levels(data[, strat])), and the ith folds of all levels are combined into one cross-validation fold.
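
    The stratified construction described above can be illustrated with a minimal base-R sketch. It is not the sperrorest implementation: the helper strat_folds() is hypothetical, folds are assumed to be dealt out at random and balanced within each stratum, and the seed for repetition i is taken as seed1 + i, as documented for seed1.

```r
# Hypothetical helper (not part of sperrorest): stratified k-fold assignment.
# Within each stratum, balanced fold labels 1..nfold are shuffled; pooling
# the samples labelled i across all strata yields cross-validation fold i.
strat_folds <- function(strat, nfold = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  strat <- as.factor(strat)
  fold <- integer(length(strat))
  for (lev in levels(strat)) {
    idx <- which(strat == lev)
    labels <- rep_len(seq_len(nfold), length(idx))
    fold[idx] <- labels[sample.int(length(idx))]
  }
  factor(fold, levels = seq_len(nfold))
}

# Repetition i uses the seed seed1 + i (illustrative values)
seed1 <- 123
slides <- rep(c(TRUE, FALSE), times = c(40, 60))
folds <- lapply(1:2, function(i) strat_folds(slides, nfold = 5, seed = seed1 + i))
# each of the 5 folds receives 12 FALSE and 8 TRUE samples
table(fold = folds[[1]], slides = slides)
```

    Because the labels are balanced inside each stratum, the class proportions of every fold match the overall proportions up to rounding, which is what keeps the ratio in the example close to 1.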

    See also

    sperrorest, as.resampling, resample_strat_uniform

    Examples

    data(ecuador)
    parti <- partition_cv_strat(ecuador, strat = 'slides', nfold = 5,
      repetition = 1)
    idx <- parti[['1']][[1]]$train
    mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')
    #> [1] 0.9996672
    # close to 1 by construction; tiny deviations can arise because the
    # class sizes are not exact multiples of nfold
    # Non-stratified cross-validation:
    parti <- partition_cv(ecuador, nfold = 5, repetition = 1)
    idx <- parti[['1']][[1]]$train
    mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')
    #> [1] 1.002166
    # close to 1 because of large sample size, but with some random variation

diff --git a/docs/reference/partition_disc.html b/docs/reference/partition_disc.html
new file mode 100644
index 00000000..29e58eed
--- /dev/null
+++ b/docs/reference/partition_disc.html
@@ -0,0 +1,278 @@
Leave-one-disc-out cross-validation and leave-one-out cross-validation — partition_disc • sperrorest

    partition_disc partitions the sample into training and test sets by selecting circular test areas (possibly surrounded by an exclusion buffer) and using the remaining samples as training samples (leave-one-disc-out cross-validation). partition_loo creates training and test sets for leave-one-out cross-validation with an optional buffer.

    partition_disc(data, coords = c("x", "y"), radius, buffer = NULL,
      ndisc = nrow(data), seed1 = NULL, return_train = TRUE, prob = NULL,
      replace = FALSE, repetition = 1)

    partition_loo(data, ndisc = nrow(data), replace = FALSE, ...)

    Arguments

    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    radius

    radius of test area discs; performs leave-one-out resampling if radius < 0.

    buffer

    radius of an additional 'neutral area' around test area discs that is excluded from both training and test sets. By default there is no buffer (equivalent to 0), i.e. all samples are either in the test area or in the training area.

    ndisc

    Number of discs to be randomly selected; each disc constitutes a separate test set. Defaults to nrow(data), i.e. one disc around each sample.

    seed1

    seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    return_train

    If FALSE, return only the test sample; if TRUE, also return the training sample.

    prob

    optional argument to sample.

    replace

    optional argument to sample: sampling with or without replacement?

    repetition

    see partition_cv; however, repetition should normally be 1 in this function.

    ...

    arguments to be passed to partition_disc

    Value

    A represampling object. Contains length(repetition) resampling objects. Each of these contains ndisc lists with indices of test and (if return_train = TRUE) training sets.

    Note

    Test area discs are centered at (random) samples, not at general random locations. Test area discs may (and likely will) overlap regardless of the value of replace; replace only controls whether the center points of discs are drawn from the samples with or without replacement. radius < 0 performs leave-one-out resampling with an optional buffer. radius = 0 is similar, except that samples with coordinates identical to those of the center sample also fall within the test area disc.
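
    The disc geometry described in this Note can be sketched in base R for a single disc. This is an illustration under stated assumptions, not the sperrorest code: disc_split() is a hypothetical helper, distances are Euclidean in the coordinate columns x and y, and the buffer is treated as a ring of width buffer around the disc (or around the left-out sample when radius < 0).

```r
# Hypothetical helper (not part of sperrorest): one leave-one-disc-out split
# around the sample with row index `center`, using Euclidean distances.
disc_split <- function(data, center, radius, buffer = 0, coords = c("x", "y")) {
  x <- data[[coords[1]]]
  y <- data[[coords[2]]]
  d <- sqrt((x - x[center])^2 + (y - y[center])^2)
  if (radius < 0) {
    # leave-one-out: only the center sample is tested
    test <- center
    excl <- buffer
  } else {
    # every sample inside the disc is tested
    test <- which(d <= radius)
    excl <- radius + buffer
  }
  # training samples lie beyond the exclusion distance
  train <- setdiff(which(d > excl), test)
  list(train = train, test = test)
}

pts <- data.frame(x = c(0, 50, 150, 300, 1000), y = 0)
disc_split(pts, center = 1, radius = 100, buffer = 100)
# test: samples 1 and 2; sample 3 lies in the buffer; train: samples 4 and 5
disc_split(pts, center = 1, radius = -1, buffer = 100)
# leave-one-out: test is sample 1; sample 2 is buffered; train: samples 3 to 5
```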

    References

    Brenning, A. (2005). Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6), 853-862.

    See also

    sperrorest, partition_cv, partition_kmeans

    Examples

    data(ecuador)
    parti <- partition_disc(ecuador, radius = 200, buffer = 200,
      ndisc = 5, repetition = 1:2)
    # plot(parti, ecuador)
    summary(parti)
    #> $`1`
    #>     n.train n.test
    #> 635     718      6
    #> 44      727      9
    #> 263     723     24
    #> 28      727      6
    #> 129     708     17
    #>
    #> $`2`
    #>     n.train n.test
    #> 70      712      6
    #> 594     708     13
    #> 412     711     13
    #> 250     729      5
    #> 689     725      5
    #>
    # leave-one-out with buffer:
    parti.loo <- partition_loo(ecuador, buffer = 200)
    summary(parti.loo)

diff --git a/docs/reference/partition_factor.html b/docs/reference/partition_factor.html
new file mode 100644
index 00000000..65ec430a
--- /dev/null
+++ b/docs/reference/partition_factor.html
@@ -0,0 +1,214 @@
Partition the data for a (non-spatial) leave-one-factor-out cross-validation based on a given, fixed partitioning — partition_factor • sperrorest

    partition_factor creates a represampling object, i.e. a set of sample indices defining cross-validation test and training sets.

    partition_factor(data, coords = c("x", "y"), fac, return_factor = FALSE,
      repetition = 1)

    Arguments

    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    fac

    either the name of a variable (column) in data, or a vector of type factor and length nrow(data) that contains the partitions to be used for defining training and test samples.

    return_factor

    if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    Value

    A represampling object, see also partition_cv for details.

    Note

    In this partitioning approach, all repetitions are identical and therefore pseudo-replications.
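
    Because the partitions are taken directly from the levels of fac, no random sampling is involved, which is why repetitions coincide. A minimal base-R sketch of leave-one-factor-out splitting follows; the helper lofo_splits() is hypothetical and only illustrates the idea, it is not the sperrorest implementation.

```r
# Hypothetical helper (not part of sperrorest): leave-one-factor-out splits.
# Each level of `fac` is the test set once; all other levels form the training set.
lofo_splits <- function(fac) {
  fac <- as.factor(fac)
  splits <- lapply(levels(fac), function(lev) {
    list(train = which(fac != lev), test = which(fac == lev))
  })
  names(splits) <- levels(fac)
  splits
}

zclass <- factor(c("low", "low", "mid", "high", "high", "high"))
s <- lofo_splits(zclass)
# one column per level (high, low, mid), rows n.train and n.test
sapply(s, function(p) c(n.train = length(p$train), n.test = length(p$test)))
```

    Calling lofo_splits() again on the same factor returns exactly the same splits, mirroring the pseudo-replication noted above.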

    See also

    sperrorest, partition_cv, as.resampling.factor

    Examples

    data(ecuador)
    # I don't recommend using this partitioning for cross-validation,
    # this is only for demonstration purposes:
    breaks <- quantile(ecuador$dem, seq(0, 1, length.out = 6))
    ecuador$zclass <- cut(ecuador$dem, breaks, include.lowest = TRUE)
    summary(ecuador$zclass)
    #> [1.72e+03,1.92e+03] (1.92e+03,2.14e+03] (2.14e+03,2.31e+03] (2.31e+03,2.57e+03]
    #>                 151                 150                 150                 150
    #> (2.57e+03,3.11e+03]
    #>                 150
    parti <- partition_factor(ecuador, fac = 'zclass')
    # plot(parti, ecuador)
    summary(parti)
    #> $`1`
    #>                     n.train n.test
    #> [1.72e+03,1.92e+03]     600    151
    #> (1.92e+03,2.14e+03]     601    150
    #> (2.14e+03,2.31e+03]     601    150
    #> (2.31e+03,2.57e+03]     601    150
    #> (2.57e+03,3.11e+03]     601    150
    #>

diff --git a/docs/reference/partition_factor_cv.html b/docs/reference/partition_factor_cv.html
new file mode 100644
index 00000000..326ce658
--- /dev/null
+++ b/docs/reference/partition_factor_cv.html
@@ -0,0 +1,210 @@
Partition the data for a (non-spatial) k-fold cross-validation at the group level — partition_factor_cv • sperrorest

    partition_factor_cv creates a represampling object, i.e. a set of sample indices defining cross-validation test and training sets, where partitions are obtained by resampling at the level of groups of observations as defined by a given factor variable. For example, agricultural data that are grouped by fields can be resampled at the field level in order to preserve spatial autocorrelation within fields.
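
    The group-level resampling idea can be sketched in base R. This is an illustration only: group_folds() is hypothetical, and the assumption that groups are dealt into balanced folds at random is ours, not a description of the sperrorest internals.

```r
# Hypothetical helper (not part of sperrorest): k-fold CV at the group level.
# Groups (levels of `fac`, e.g. field IDs) are shuffled into nfold balanced
# folds, and every observation inherits the fold of its group.
group_folds <- function(fac, nfold = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  fac <- as.factor(fac)
  levs <- levels(fac)
  if (length(levs) < nfold) stop("fac needs at least nfold levels")
  labels <- rep_len(seq_len(nfold), length(levs))
  lev_fold <- labels[sample.int(length(levs))]
  names(lev_fold) <- levs
  factor(unname(lev_fold[as.character(fac)]), levels = seq_len(nfold))
}

field <- factor(rep(paste0("field", 1:6), each = 4))
f <- group_folds(field, nfold = 3, seed = 1)
table(field, f)  # each field falls entirely into a single fold
```

    Since whole fields move between training and test sets together, observations from the same field never appear on both sides of a split.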

    partition_factor_cv(data, coords = c("x", "y"), fac, nfold = 10,
      repetition = 1, seed1 = NULL, return_factor = FALSE)

    Arguments

    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    fac

    either the name of a variable (column) in data, or a vector of type factor and length nrow(data) that defines groups or clusters of observations.

    nfold

    number of partitions (folds) in nfold-fold cross-validation partitioning

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    seed1

    seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    return_factor

    if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

    Value

    A represampling object, see also partition_cv for details.

    Note

    In this partitioning approach, the number of factor levels in fac must be large enough for this factor-level resampling to make sense; with fewer levels than nfold, some folds would not receive any group at all.

    See also

    sperrorest, partition_cv, partition_factor, as.resampling.factor


diff --git a/docs/reference/partition_kmeans.html b/docs/reference/partition_kmeans.html
new file mode 100644
index 00000000..fe8b217e
--- /dev/null
+++ b/docs/reference/partition_kmeans.html
@@ -0,0 +1,237 @@
Partition samples spatially using k-means clustering of the coordinates — partition_kmeans • sperrorest

    partition_kmeans divides the study area into irregularly shaped spatial partitions based on k-means (kmeans) clustering of spatial coordinates.

    partition_kmeans(data, coords = c("x", "y"), nfold = 10, repetition = 1,
      seed1 = NULL, return_factor = FALSE, balancing_steps = 1,
      order_clusters = TRUE, ...)

    Arguments

    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    nfold

    number of cross-validation folds, i.e. parameter k in k-means clustering.

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    seed1

    seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    return_factor

    if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

    balancing_steps

    if > 1, perform nfold-means clustering balancing_steps times, and pick the clustering that minimizes the Gini index of the sample size distribution among the partitions. The idea is that 'degenerate' partitions will be avoided, but this also has the side effect of reducing variation among partitioning repetitions. More meaningful constraints (e.g., a minimum number of positive and negative samples within each partition) should be added in the future.

    order_clusters

    if TRUE, clusters are ordered by increasing x coordinate of their center points.

    ...

    additional arguments to kmeans.

    Value

    A represampling object, see also partition_cv for details.

    Note

    Default parameter settings may change in future releases.
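
    The clustering step and the balancing_steps option described above can be sketched in base R with stats::kmeans. This is a sketch under assumptions, not the sperrorest code: gini() and kmeans_partition() are hypothetical helpers, the Gini index is computed with a standard formula on cluster sizes, and neither order_clusters nor multiple repetitions are handled.

```r
# Hypothetical helpers (not part of sperrorest).
# Gini index of non-negative values (0 = all cluster sizes equal).
gini <- function(n) {
  n <- sort(n)
  k <- length(n)
  sum((2 * seq_len(k) - k - 1) * n) / (k * sum(n))
}

# k-means partitioning of coordinates; with balancing_steps > 1, the clustering
# with the most even cluster sizes (smallest Gini index) is kept.
kmeans_partition <- function(xy, nfold = 10, balancing_steps = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  best <- NULL
  best_gini <- Inf
  for (step in seq_len(balancing_steps)) {
    km <- stats::kmeans(xy, centers = nfold)
    g <- gini(tabulate(km$cluster, nbins = nfold))
    if (g < best_gini) {
      best <- km$cluster
      best_gini <- g
    }
  }
  factor(best, levels = seq_len(nfold))
}

set.seed(42)
xy <- cbind(x = runif(200), y = runif(200))
part <- kmeans_partition(xy, nfold = 5, balancing_steps = 10, seed = 1)
table(part)
```

    Keeping only the most balanced of several clusterings also makes different repetitions more alike, which is the side effect mentioned under balancing_steps.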

    References

    Brenning, A., Long, S., & Fieguth, P. (2012). Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Remote Sensing of Environment, 125, 227-237. doi:10.1016/j.rse.2012.07.005

    Russ, G., & Brenning, A. (2010a). Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI, 350-359.

    See also

    sperrorest, partition_cv, partition_disc, partition_tiles, kmeans

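    Examples

    data(ecuador)
    # repetition = 2 generates only the repetition with index 2
    # (use repetition = 1:2 to obtain two repetitions)
    resamp <- partition_kmeans(ecuador, nfold = 5, repetition = 2)
    # plot(resamp, ecuador)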

diff --git a/docs/reference/partition_tiles.html b/docs/reference/partition_tiles.html
new file mode 100644
index 00000000..a72d7c3b
--- /dev/null
+++ b/docs/reference/partition_tiles.html
@@ -0,0 +1,315 @@
Partition the study area into rectangular tiles — partition_tiles • sperrorest

    partition_tiles divides the study area into a specified number of rectangular tiles. Optionally, small partitions can be merged with adjacent tiles to achieve a minimum number or percentage of samples in each tile.
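
    The basic tiling step can be sketched in base R. Assumptions: tile_factor() and small_tiles() are hypothetical helpers; the grid is neither rotated nor shifted, dsplit is not supported, and small tiles are only flagged, not merged with neighbors. Tile labels of the form X1:Y2 merely mimic the tile names shown in the example output.

```r
# Hypothetical helpers (not part of sperrorest): an unrotated, unshifted grid
# of nsplit[1] x nsplit[2] equal rectangles spanning the data extent.
tile_factor <- function(x, y, nsplit = c(4, 3)) {
  nsplit <- rep_len(nsplit, 2)
  xt <- cut(x, breaks = seq(min(x), max(x), length.out = nsplit[1] + 1),
            labels = FALSE, include.lowest = TRUE)
  yt <- cut(y, breaks = seq(min(y), max(y), length.out = nsplit[2] + 1),
            labels = FALSE, include.lowest = TRUE)
  factor(paste0("X", xt, ":Y", yt))
}

# Names of tiles holding fewer than min_n samples (candidates for merging)
small_tiles <- function(tiles, min_n = 5) {
  n <- table(tiles)
  names(n)[n < min_n]
}

set.seed(1)
x <- runif(300, 0, 4000)
y <- runif(300, 0, 3000)
tiles <- tile_factor(x, y, nsplit = c(4, 3))
table(tiles)
small_tiles(tiles, min_n = 20)
```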

    partition_tiles(data, coords = c("x", "y"), dsplit = NULL, nsplit = NULL,
      rotation = c("none", "random", "user"), user_rotation, offset = c("none",
      "random", "user"), user_offset, reassign = TRUE, min_frac = 0.025,
      min_n = 5, iterate = 1, return_factor = FALSE, repetition = 1,
      seed1 = NULL)

    Arguments

    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations

    dsplit

    optional vector of length 2: equidistance of splits in (possibly rotated) x direction (dsplit[1]) and y direction (dsplit[2]) used to define tiles. If dsplit is of length 1, its value is recycled. Either dsplit or nsplit must be specified.

    nsplit

    optional vector of length 2: number of splits in (possibly rotated) x direction (nsplit[1]) and y direction (nsplit[2]) used to define tiles. If nsplit is of length 1, its value is recycled.

    rotation

    indicates whether and how the rectangular grid should be rotated; random rotation is only between -45 and +45 degrees.

    user_rotation

    if rotation='user', angles (in degrees) by which the rectangular grid is to be rotated in each repetition. Either a vector of the same length as repetition, or a single number that will be replicated length(repetition) times.

    offset

    indicates whether and how the rectangular grid should be shifted by an offset.

    user_offset

    if offset='user', a list (or vector) of two components specifying a shift of the rectangular grid in (possibly rotated) x and y direction. The offset values are relative values, a value of 0.5 resulting in a one-half tile shift towards the left, or upward. If this is a list, its first (second) component refers to the rotated x (y) direction, and both components must have the same length as repetition (or length 1). If a vector of length 2 (or list components have length 1), the two values will be interpreted as relative shifts in (rotated) x and y direction, respectively, and will therefore be recycled as needed (length(repetition) times each).

    reassign

    logical (default TRUE): if TRUE, 'small' tiles (as per min_frac and min_n arguments and get_small_tiles) are merged with (smallest) adjacent tiles. If FALSE, small tiles are 'eliminated', i.e. set to NA.

    min_frac

    numeric >=0, <1: minimum relative size of partition as a fraction of the sample size; argument passed to get_small_tiles. Will be ignored if NULL.

    min_n

    integer >=0: minimum number of samples per partition; argument passed to get_small_tiles. Will be ignored if NULL.

    iterate

    argument to be passed to tile_neighbors

    return_factor

    if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    seed1

    seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    Value

    A represampling object. Contains length(repetition) resampling objects as repetitions. The exact number of folds / test-set tiles within each resampling object depends on the spatial configuration of the data set and possible cleaning steps (see min_frac, min_n).

    Note

    Default parameter settings may change in future releases. This function, especially the rotation and shifting part of it and the algorithm for cleaning up small tiles, is still a bit experimental. Use with caution. For non-zero offsets (offset != 'none'), the number of tiles may actually be greater than nsplit[1]*nsplit[2] because fractional tiles extend into the study region. reassign = TRUE with suitable thresholds is therefore recommended for non-zero (including random) offsets.

    + +

    See also

    + +

    sperrorest, as.resampling.factor, get_small_tiles, tile_neighbors

    + + +

    Examples

    +
    data(ecuador)
    parti <- partition_tiles(ecuador, nsplit = c(4, 3), reassign = FALSE)
    # plot(parti, ecuador)
    summary(parti) # e.g. tile X4:Y3 has only 28 samples
    #> $`1`
    #>       n.train n.test
    #> X1:Y2     686     65
    #> X1:Y3     665     86
    #> X2:Y1     711     40
    #> X2:Y2     666     85
    #> X2:Y3     690     61
    #> X3:Y1     664     87
    #> X3:Y2     661     90
    #> X3:Y3     681     70
    #> X4:Y1     671     80
    #> X4:Y2     692     59
    #> X4:Y3     723     28
    #>
    # same partitioning, but now merge tiles with fewer than 100 samples into
    # adjacent tiles:
    parti2 <- partition_tiles(ecuador, nsplit = c(4, 3), reassign = TRUE,
      min_n = 100)
    # plot(parti2, ecuador)
    summary(parti2)
    #> $`1`
    #>       n.train n.test
    #> X1:Y3     600    151
    #> X2:Y2     626    125
    #> X3:Y1     584    167
    #> X3:Y2     574    177
    #> X3:Y3     620    131
    #>
    # tile B4 (in 'parti') was smaller than A3, therefore A4 was merged with B4,
    # not with A3
    # now with random rotation and offset, and tiles of 2000 m length:
    parti3 <- partition_tiles(ecuador, dsplit = 2000, offset = 'random',
      rotation = 'random', reassign = TRUE, min_n = 100)
    # plot(parti3, ecuador)
    summary(parti3)
    #> $`1`
    #>       n.train n.test
    #> X1:Y2     584    167
    #> X2:Y1     530    221
    #> X2:Y2     508    243
    #> X3:Y2     631    120
    #>
    +
    + +
    + +
    + + +
    +

    Site built with pkgdown.

    +
    + +
    +
    + + + diff --git a/docs/reference/plot.represampling.html b/docs/reference/plot.represampling.html new file mode 100644 index 00000000..aaf4738e --- /dev/null +++ b/docs/reference/plot.represampling.html @@ -0,0 +1,192 @@ + + + + + + + + +Plot spatial resampling objects — plot.represampling • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    plot.represampling displays the partitions or samples arising from the resampling of a data set.

    + + +
    # S3 method for represampling
    +plot(x, data, coords = c("x", "y"), pch = "+",
    +  wiggle_sd = 0, ...)
    +
    +# S3 method for resampling
    +plot(x, ...)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    x

    a represampling or resampling object.

    data

    a data.frame of samples containing at least the x and y coordinates of samples as specified by coords.

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    pch

    point symbol (to be passed to points).

    wiggle_sd

    'wiggle' the point locations in x and y direction to avoid overplotting of samples drawn multiple times by bootstrap methods; this is a standard deviation (in the units of the x/y coordinates) of a normal distribution and defaults to 0 (no wiggling).

    ...

    additional arguments to plot.
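    The wiggle_sd idea amounts to adding independent normal noise to the coordinates before plotting. Below is a minimal base-R sketch that assumes plain rnorm() jitter. The helper wiggle() is hypothetical and not the package's internal code.

```r
# Hypothetical helper: jitter coordinates with N(0, wiggle_sd^2) noise so
# that samples drawn several times (e.g. by a bootstrap) do not overplot.
wiggle <- function(x, y, wiggle_sd = 0) {
  if (wiggle_sd > 0) {
    x <- x + rnorm(length(x), mean = 0, sd = wiggle_sd)
    y <- y + rnorm(length(y), mean = 0, sd = wiggle_sd)
  }
  list(x = x, y = y)
}

# three copies of the same sample location, spread out by ~10 coordinate units
pts <- wiggle(c(0, 0, 0), c(5, 5, 5), wiggle_sd = 10)
# plot(pts$x, pts$y, pch = "+")
```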

    + +

    Note

    + +

    This function is not intended for samples obtained by resampling with replacement (e.g., bootstrap) because training and test points will be overplotted in that case. The size of the plotting region also limits the number of maps that can be displayed at once, i.e., the number of rows (repetitions) and columns (folds).

    + + +

    Examples

    +
    data(ecuador)
    # non-spatial cross-validation:
    resamp <- partition_cv(ecuador, nfold = 5, repetition = 1:2)
    # plot(resamp, ecuador)
    # spatial cross-validation using k-means clustering:
    resamp <- partition_kmeans(ecuador, nfold = 5, repetition = 1:2)
    # plot(resamp, ecuador)
    + + + diff --git a/docs/reference/remove_missing_levels.html b/docs/reference/remove_missing_levels.html new file mode 100644 index 00000000..4ed3380d --- /dev/null +++ b/docs/reference/remove_missing_levels.html @@ -0,0 +1,152 @@ + + + + + + + + +remove_missing_levels — remove_missing_levels • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    Accounts for factor levels present only in the test data but not in the training data by setting the corresponding values to NA.

    + + +
    remove_missing_levels(fit, test_data)
    + +

    Arguments

    + + + + + + + + + + +
    fit

    fitted model on training data

    test_data

    data to make predictions for

    + +

    Value

    + +

    data.frame whose factor levels match those of the fitted model
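    The idea can be sketched in base R. set_unseen_levels_na() is a hypothetical helper and not sperrorest's implementation. Unlike remove_missing_levels, it takes the admissible levels from a training data.frame instead of extracting them from the fitted model.

```r
# Hypothetical helper: in every factor column of test_data, set values whose
# level never occurs in train_data to NA and align the level sets.
set_unseen_levels_na <- function(train_data, test_data) {
  for (col in names(test_data)) {
    if (is.factor(test_data[[col]]) && col %in% names(train_data)) {
      train_levels <- levels(droplevels(train_data[[col]]))
      vals <- as.character(test_data[[col]])
      vals[!(vals %in% train_levels)] <- NA   # unseen level -> NA
      test_data[[col]] <- factor(vals, levels = train_levels)
    }
  }
  test_data
}

train_df <- data.frame(y = c(1, 2, 3), g = factor(c("a", "b", "a")))
test_df  <- data.frame(y = c(4, 5), g = factor(c("b", "c")))
out <- set_unseen_levels_na(train_df, test_df)
out$g  # "b" is kept, the unseen level "c" becomes NA
```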

    + + + diff --git a/docs/reference/represampling_bootstrap.html b/docs/reference/represampling_bootstrap.html new file mode 100644 index 00000000..77ac946a --- /dev/null +++ b/docs/reference/represampling_bootstrap.html @@ -0,0 +1,192 @@ + + + + + + + + +Non-spatial bootstrap resampling — represampling_bootstrap • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    represampling_bootstrap draws a bootstrap random sample (with replacement) from data.

    + + +
    represampling_bootstrap(data, coords = c("x", "y"), nboot = nrow(data),
    +  repetition = 1, seed1 = NULL, oob = FALSE)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    nboot

    Size of bootstrap sample

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    seed1

    seed1 + i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    oob

    logical (default FALSE): if TRUE, use the out-of-bag sample as the test sample; if FALSE, draw a second bootstrap sample of size nboot independently to obtain a test sample.
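    The two choices for oob can be illustrated with a minimal base-R sketch of one repetition. bootstrap_indices() is hypothetical and is not sperrorest's code. With oob = TRUE the test sample consists of the rows never drawn into the training sample. For large n this is on average roughly 1 - 1/e (about 37%) of the data.

```r
# Hypothetical sketch of one non-spatial bootstrap repetition: the training
# sample is drawn with replacement; the test sample is either the
# out-of-bag rows (oob = TRUE) or an independent second bootstrap draw.
bootstrap_indices <- function(n, nboot = n, oob = FALSE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  train <- sample.int(n, size = nboot, replace = TRUE)
  test <- if (oob) {
    setdiff(seq_len(n), train)                   # rows never drawn
  } else {
    sample.int(n, size = nboot, replace = TRUE)  # independent second draw
  }
  list(train = train, test = test)
}

res <- bootstrap_indices(751, oob = TRUE, seed = 1)
length(res$test) / 751  # roughly 0.37
```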

    + +

    Value

    + +

    A represampling object. This is a (named) list containing length(repetition) resampling objects. Each of these contains only one list with indices of training and test samples. Indices are row indices for data.

    + + +

    Examples

    +
    data(ecuador)
    # only 10 bootstrap repetitions, normally use >= 100:
    parti <- represampling_bootstrap(ecuador, repetition = 1:10)
    # plot(parti, ecuador) # careful: overplotting occurs
    # because some samples are included in both the training and
    # the test sample (possibly even multiple times)
    + + + diff --git a/docs/reference/represampling_disc_bootstrap.html b/docs/reference/represampling_disc_bootstrap.html new file mode 100644 index 00000000..f7e95276 --- /dev/null +++ b/docs/reference/represampling_disc_bootstrap.html @@ -0,0 +1,209 @@ + + + + + + + + +Overlapping spatial block bootstrap using circular blocks — represampling_disc_bootstrap • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    represampling_disc_bootstrap performs an overlapping spatial block bootstrap by resampling at the level of circular discs generated by partition_disc.

    + + +
    represampling_disc_bootstrap(data, coords = c("x", "y"), nboot,
    +  repetition = 1, seed1 = NULL, oob = FALSE, ...)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    nboot

    number of bootstrap samples; you may specify different values for the training sample (nboot[1]) and for the test sample (nboot[2]).

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    seed1

    seed1 + i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    oob

    logical (default FALSE): if TRUE, use the out-of-bag sample as the test sample (the complement of the nboot[1] test set discs, minus the buffer area as specified in the ... arguments to partition_disc); if FALSE, draw a second bootstrap sample of size nboot independently to obtain a test sample (sets of overlapping discs drawn with replacement).

    ...

    additional arguments to be passed to partition_disc; note that a buffer argument has no effect if oob = FALSE; see the example below

    + +

    Note

    + +

    Performs nboot out of nrow(data) resampling of circular discs. This is an overlapping spatial block bootstrap where the blocks are circular.
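    The overlapping disc bootstrap for the training sample can be sketched in base R. disc_bootstrap_train() is hypothetical. It assumes that disc centres are sample locations drawn with replacement. The way partition_disc actually chooses centres and handles buffers may differ.

```r
# Hypothetical sketch: draw nboot disc centres (assumed: sample locations)
# with replacement and collect every sample within `radius` of each centre.
disc_bootstrap_train <- function(x, y, nboot, radius, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  centres <- sample.int(length(x), size = nboot, replace = TRUE)
  idx <- lapply(centres, function(k) {
    which((x - x[k])^2 + (y - y[k])^2 <= radius^2)
  })
  unlist(idx, use.names = FALSE)  # duplicates kept: discs may overlap
}

set.seed(3)
x <- runif(200, 0, 1000)
y <- runif(200, 0, 1000)
train <- disc_bootstrap_train(x, y, nboot = 20, radius = 200, seed = 4)
```

    Because each disc contains at least its own centre, the training sample has at least nboot entries, and samples lying in overlapping discs appear more than once.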

    + + +

    Examples

    +
    data(ecuador)
    # Overlapping disc bootstrap:
    parti <- represampling_disc_bootstrap(ecuador, radius = 200, nboot = 20,
      oob = FALSE)
    # plot(parti, ecuador)
    # Note that a 'buffer' argument would make no difference because bootstrap
    # sets of discs are drawn independently for the training and test sample.
    #
    # Overlapping disc bootstrap for training sample, out-of-bag sample as test
    # sample:
    parti <- represampling_disc_bootstrap(ecuador, radius = 200, buffer = 200,
      nboot = 10, oob = TRUE)
    # plot(parti, ecuador)
    + + + diff --git a/docs/reference/represampling_factor_bootstrap.html b/docs/reference/represampling_factor_bootstrap.html new file mode 100644 index 00000000..18f308b8 --- /dev/null +++ b/docs/reference/represampling_factor_bootstrap.html @@ -0,0 +1,222 @@ + + + + + + + + +Bootstrap at an aggregated level — represampling_factor_bootstrap • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    represampling_factor_bootstrap resamples partitions defined by a factor variable. This can be used for non-overlapping block bootstraps and similar resampling schemes.

    + + +
    represampling_factor_bootstrap(data, fac, repetition = 1, nboot = -1,
    +  seed1 = NULL, oob = FALSE)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    data

    data.frame containing the samples; if fac is given as the name of a variable, data must contain that variable

    fac

    defines a grouping or partitioning of the samples in data; three possible types: (1) the name of a variable in data (coerced to factor if not already a factor variable); (2) a factor variable (or a vector that can be coerced to factor); (3) a list of factor variables (or vectors that can be coerced to factor); this list must be of length length(repetition), and if it is named, the names must be equal to as.character(repetition); this list will typically be generated by a partition_* function with return_factor = TRUE (see Examples below)

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    nboot

    number of bootstrap replications used for generating the bootstrap training sample (nboot[1]) and the test sample (nboot[2]); nboot[2] is ignored (with a warning) if oob = TRUE. A value of -1 will be substituted with the number of levels of the factor variable, corresponding to an n out of n bootstrap at the grouping level defined by fac.

    seed1

    seed1 + i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    oob

    if TRUE, the test sample will be the out-of-bag sample; if FALSE (default), the test sample is an independently drawn bootstrap sample of size nboot[2].

    + +

    Details

    + +

    nboot refers to the number of groups (as defined by the factors) to be drawn with replacement from the set of groups. I.e., if fac is a factor variable, nboot would normally not be greater than nlevels(fac), nlevels(fac) being the default as per nboot = -1.
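    Resampling at the group level can be sketched in base R. factor_bootstrap() is hypothetical and not the package source. It draws nboot levels with replacement (nboot = -1 meaning nlevels(fac)), keeps all rows of each drawn level in the training sample, and uses the rows of never-drawn levels as the out-of-bag test sample.

```r
# Hypothetical sketch of a group-level (block) bootstrap with an
# out-of-bag test sample.
factor_bootstrap <- function(fac, nboot = -1, seed = NULL) {
  fac <- factor(fac)
  lev <- levels(fac)
  if (nboot == -1) nboot <- length(lev)  # n out of n at the group level
  if (!is.null(seed)) set.seed(seed)
  drawn <- sample(lev, size = nboot, replace = TRUE)
  # a group drawn twice contributes its rows twice
  train <- unlist(lapply(drawn, function(l) which(fac == l)), use.names = FALSE)
  oob <- which(!(fac %in% drawn))  # rows of groups never drawn
  list(train = train, test = oob)
}

fac <- rep(c("a", "b", "c", "d"), each = 5)
res <- factor_bootstrap(fac, seed = 1)
```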

    + +

    See also

    + +

    represampling_disc_bootstrap, represampling_tile_bootstrap

    + + +

    Examples

    +
    data(ecuador)
    # a dummy example for demonstration, performing bootstrap
    # at the level of an arbitrary factor variable:
    parti <- represampling_factor_bootstrap(ecuador,
      factor(floor(ecuador$dem / 100)),
      oob = TRUE)
    # plot(parti, ecuador)
    # using the factor bootstrap for a non-overlapping block bootstrap
    # (see also represampling_tile_bootstrap):
    fac <- partition_tiles(ecuador, return_factor = TRUE, repetition = c(1:3),
      dsplit = 500, min_n = 200, rotation = 'random',
      offset = 'random')
    parti <- represampling_factor_bootstrap(ecuador, fac, oob = TRUE,
      repetition = c(1:3))
    # plot(parti, ecuador)
    + + + diff --git a/docs/reference/represampling_kmeans_bootstrap.html b/docs/reference/represampling_kmeans_bootstrap.html new file mode 100644 index 00000000..340b3e1d --- /dev/null +++ b/docs/reference/represampling_kmeans_bootstrap.html @@ -0,0 +1,180 @@ + + + + + + + + +Spatial block bootstrap at the level of spatial k-means clusters — represampling_kmeans_bootstrap • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    represampling_kmeans_bootstrap performs a non-overlapping spatial block bootstrap by resampling at the level of irregularly-shaped partitions generated by partition_kmeans.
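    A rough base-R sketch of the same idea uses stats::kmeans() on the coordinates as a stand-in for partition_kmeans and then bootstraps whole clusters. kmeans_block_bootstrap() is hypothetical. Its choices (n out of n clusters, out-of-bag clusters as test sample) are assumptions made for illustration.

```r
# Hypothetical sketch: irregular spatial blocks from k-means clustering of
# the coordinates, then a bootstrap of whole clusters with OOB test sample.
kmeans_block_bootstrap <- function(x, y, nfold = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  cl <- kmeans(cbind(x, y), centers = nfold)$cluster
  drawn <- sample.int(nfold, size = nfold, replace = TRUE)
  train <- unlist(lapply(drawn, function(k) which(cl == k)), use.names = FALSE)
  list(train = train, test = which(!(cl %in% drawn)))
}

set.seed(6)
x <- runif(300)
y <- runif(300)
res <- kmeans_block_bootstrap(x, y, nfold = 5, seed = 7)
```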

    + + +
    represampling_kmeans_bootstrap(data, coords = c("x", "y"), repetition = 1,
    +  nfold = 10, nboot = nfold, seed1 = NULL, oob = FALSE, ...)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    nfold

    see partition_kmeans

    nboot

    see represampling_factor_bootstrap

    seed1

    seed1 + i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    oob

    see represampling_factor_bootstrap

    ...

    additional arguments to be passed to partition_kmeans

    + + + diff --git a/docs/reference/represampling_tile_bootstrap.html b/docs/reference/represampling_tile_bootstrap.html new file mode 100644 index 00000000..48aa1f81 --- /dev/null +++ b/docs/reference/represampling_tile_bootstrap.html @@ -0,0 +1,176 @@ + + + + + + + + +Spatial block bootstrap using rectangular blocks — represampling_tile_bootstrap • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    represampling_tile_bootstrap performs a non-overlapping spatial block bootstrap by resampling at the level of rectangular partitions or 'tiles' generated by partition_tiles.

    + + +
    represampling_tile_bootstrap(data, coords = c("x", "y"), repetition = 1,
    +  nboot = -1, seed1 = NULL, oob = FALSE, ...)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    data

    data.frame containing at least the columns specified by coords

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    repetition

    numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

    nboot

    see represampling_factor_bootstrap

    seed1

    seed1 + i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

    oob

    see represampling_factor_bootstrap

    ...

    additional arguments to be passed to partition_tiles

    + + + diff --git a/docs/reference/resample_factor.html b/docs/reference/resample_factor.html new file mode 100644 index 00000000..821d1705 --- /dev/null +++ b/docs/reference/resample_factor.html @@ -0,0 +1,171 @@ + + + + + + + + +Draw uniform random (sub)sample at the group level — resample_factor • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    resample_factor draws a random (sub)sample (with or without replacement) of the groups or clusters identified by the fac argument.

    + + +
    resample_factor(data, param = list(fac = "class", n = Inf, replace = FALSE))
    + +

    Arguments

    + + + + + + + + + + +
    data

    a data.frame, rows represent samples

    param

    a list with the following components: fac is a factor variable of length nrow(data) or the name of a factor variable in data; n is a numeric value specifying the size of the subsample (in terms of groups, not observations); replace determines if resampling of groups is to be done with or without replacement.

    + +

    Value

    + +

    a data.frame containing a subset of the rows of data.

    + +

    Details

    + +

    If param$replace=FALSE, a subsample of min(param$n, nlevels(data[, fac])) groups will be drawn from data. If param$replace=TRUE, the number of groups to be drawn is param$n.
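    The size rule above can be sketched in base R. resample_groups() is a hypothetical stand-in, not the package function. It draws groups rather than rows and then returns every row of each drawn group.

```r
# Hypothetical sketch of group-level subsampling: without replacement,
# min(n, number of groups) groups are drawn; with replacement, exactly n.
resample_groups <- function(data, fac, n = Inf, replace = FALSE) {
  if (is.character(fac) && length(fac) == 1) fac <- data[[fac]]  # variable name
  fac <- factor(fac)
  lev <- levels(fac)
  size <- if (replace) n else min(n, length(lev))
  drawn <- lev[sample.int(length(lev), size = size, replace = replace)]
  rows <- unlist(lapply(drawn, function(l) which(fac == l)), use.names = FALSE)
  data[rows, , drop = FALSE]
}

d <- data.frame(site = rep(c("s1", "s2", "s3"), times = c(2, 3, 4)), v = 1:9)
sub <- resample_groups(d, "site", n = 2)  # two of the three sites, all their rows
```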

    + +

    See also

    + +

    resample_strat_uniform(), sample()

    + + + diff --git a/docs/reference/resample_strat_uniform.html b/docs/reference/resample_strat_uniform.html new file mode 100644 index 00000000..55ab6c5d --- /dev/null +++ b/docs/reference/resample_strat_uniform.html @@ -0,0 +1,181 @@ + + + + + + + + +Draw stratified random sample — resample_strat_uniform • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    resample_strat_uniform draws a stratified random sample (with or without replacement) from the samples in data. Stratification is over the levels of data[, param$strat]. The same number of samples is drawn within each level.

    + + +
    resample_strat_uniform(data, param = list(strat = "class", nstrat = Inf,
    +  replace = FALSE))
    + +

    Arguments

    + + + + + + + + + + +
    data

    a data.frame, rows represent samples

    param

    a list with the following components: strat is either the name of a factor variable in data that defines the stratification levels, or a vector of type factor and length nrow(data); nstrat is a numeric value specifying the number of samples to be drawn within each stratum; replace determines if sampling is with or without replacement

    + +

    Value

    + +

    a data.frame containing a subset of the rows of data.

    + +

    Details

    + +

    If param$replace=FALSE, min(param$nstrat, size of the smallest stratum) samples will be drawn from each stratum. If param$replace=TRUE, param$nstrat samples are drawn from each stratum.
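    A minimal base-R sketch of uniform stratified sampling follows. strat_uniform() is hypothetical. Capping the per-stratum size at the smallest stratum when sampling without replacement is an assumption that follows from drawing the same number of samples in every level. Check the package source before relying on it.

```r
# Hypothetical sketch: draw the same number of rows from every stratum.
strat_uniform <- function(data, strat, nstrat = Inf, replace = FALSE) {
  s <- factor(data[[strat]])
  # without replacement, no stratum can supply more rows than it has,
  # so the common per-stratum size is capped by the smallest stratum
  if (!replace) nstrat <- min(nstrat, min(table(s)))
  rows <- unlist(lapply(levels(s), function(l) {
    idx <- which(s == l)
    idx[sample.int(length(idx), size = nstrat, replace = replace)]
  }), use.names = FALSE)
  data[rows, , drop = FALSE]
}

df <- data.frame(cls = rep(c("a", "b"), c(30, 70)), v = 1:100)
set.seed(1)
d1 <- strat_uniform(df, "cls", nstrat = 20)  # 20 rows per class, 40 in total
```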

    + +

    See also

    + +

    resample_uniform(), sample()

    + + +

    Examples

    +
    data(ecuador) # Muenchow et al. (2012), see ?ecuador
    d <- resample_strat_uniform(ecuador,
      param = list(strat = 'slides', nstrat = 100))
    nrow(d) # == 200
    #> [1] 200
    sum(d$slides == 'TRUE') # == 100
    #> [1] 100
    + + + diff --git a/docs/reference/resample_uniform.html b/docs/reference/resample_uniform.html new file mode 100644 index 00000000..549c0ecd --- /dev/null +++ b/docs/reference/resample_uniform.html @@ -0,0 +1,175 @@ + + + + + + + + +Draw uniform random (sub)sample — resample_uniform • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    resample_uniform draws a random (sub)sample (with or without replacement) from the samples in data.

    + + +
    resample_uniform(data, param = list(n = Inf, replace = FALSE))
    + +

    Arguments

    + + + + + + + + + + +
    data

    a data.frame, rows represent samples

    param

    a list with the following components: n is a numeric value specifying the size of the subsample; replace determines if sampling is with or without replacement

    + +

    Value

    + +

    a data.frame containing a subset of the rows of data.

    + +

    Details

    + +

    If param$replace=FALSE, a subsample of size min(param$n, nrow(data)) will be drawn from data. If param$replace=TRUE, the size of the subsample is param$n.
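    The size rule can be written as a short base-R sketch. uniform_subsample() is a hypothetical helper and not the package source.

```r
# Hypothetical sketch: without replacement the subsample size is capped at
# nrow(data); with replacement exactly n rows are drawn.
uniform_subsample <- function(data, n = Inf, replace = FALSE) {
  size <- if (replace) n else min(n, nrow(data))
  data[sample.int(nrow(data), size = size, replace = replace), , drop = FALSE]
}

d <- data.frame(v = 1:50)
nrow(uniform_subsample(d, n = 200))                  # 50: capped at nrow(data)
nrow(uniform_subsample(d, n = 200, replace = TRUE))  # 200
```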

    + +

    See also

    + +

    resample_strat_uniform(), sample()

    + + +

    Examples

    +
    data(ecuador) # Muenchow et al. (2012), see ?ecuador
    d <- resample_uniform(ecuador, param = list(n = 200))
    nrow(d) # == 200
    #> [1] 200
    sum(d$slides == 'TRUE')
    #> [1] 139
    + + + diff --git a/docs/reference/runfolds.html b/docs/reference/runfolds.html new file mode 100644 index 00000000..5248381e --- /dev/null +++ b/docs/reference/runfolds.html @@ -0,0 +1,141 @@ + + + + + + + + +runfolds — runfolds • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    Runs model fitting, error estimation and variable importance assessment at the fold level

    + + +
    runfolds(j = NULL, current_sample = NULL, data = NULL, i = NULL,
    +  formula = NULL, model_args = NULL, par_cl = NULL, par_mode = NULL,
    +  model_fun = NULL, pred_fun = NULL, imp_variables = NULL,
    +  imp_permutations = NULL, err_fun = NULL, train_fun = NULL,
    +  importance = NULL, current_res = NULL, current_impo = NULL,
    +  pred_args = NULL, pooled_obs_train = NULL, pooled_obs_test = NULL,
    +  pooled_pred_train = NULL, response = NULL, progress = NULL,
    +  is_factor_prediction = NULL, pooled_pred_test = NULL, coords = NULL,
    +  test_fun = NULL, imp_one_rep = NULL, do_gc = NULL, test_param = NULL,
    +  train_param = NULL)
    + + + diff --git a/docs/reference/runreps.html b/docs/reference/runreps.html new file mode 100644 index 00000000..71663181 --- /dev/null +++ b/docs/reference/runreps.html @@ -0,0 +1,141 @@ + + + + + + + + +runreps — runreps • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    Runs model fitting, error estimation and variable importance assessment at the repetition level

    + + +
    runreps(current_sample = NULL, data = NULL, formula = NULL,
    +  model_args = NULL, par_cl = NULL, do_gc = NULL, imp_one_rep = NULL,
    +  model_fun = NULL, pred_fun = NULL, imp_variables = NULL,
    +  imp_permutations = NULL, err_fun = NULL, importance = NULL,
    +  current_res = NULL, current_impo = NULL, pred_args = NULL,
    +  progress = NULL, pooled_obs_train = NULL, pooled_obs_test = NULL,
    +  pooled_pred_train = NULL, response = NULL, is_factor_prediction = NULL,
    +  pooled_pred_test = NULL, test_fun = NULL, test_param = NULL,
    +  train_fun = NULL, train_param = NULL, coords = NULL, par_mode = NULL,
    +  i = NULL)
    + + + diff --git a/docs/reference/sperrorest-package.html b/docs/reference/sperrorest-package.html new file mode 100644 index 00000000..1062e502 --- /dev/null +++ b/docs/reference/sperrorest-package.html @@ -0,0 +1,155 @@ + + + + + + + + +Spatial Error Estimation and Variable Importance — sperrorest-package • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    This package implements spatial error estimation and permutation-based spatial variable importance using different spatial cross-validation and spatial block bootstrap methods. To cite 'sperrorest' in publications, reference the paper by Brenning (2012).

    + + + +

    References

    + +

    Brenning, A. 2012. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23-27 July 2012, pp. 5372-5375.

    +

    Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.

    +

    Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.

    +

    Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for yield prediction in Precision Agriculture. In Advances in Intelligent Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, Tucson, AZ, USA, 19-21 May 2010. Lecture Notes in Computer Science, 6065 LNCS: 184-195.

    + + + diff --git a/docs/reference/sperrorest.html b/docs/reference/sperrorest.html new file mode 100644 index 00000000..34158fe9 --- /dev/null +++ b/docs/reference/sperrorest.html @@ -0,0 +1,440 @@ + + + + + + + + +Perform spatial error estimation and variable importance assessment +in parallel — sperrorest • sperrorest + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + +
    + +
    +
    + + + +

    sperrorest is a flexible interface for multiple types of parallelized spatial and non-spatial cross-validation and bootstrap error estimation and parallelized permutation-based assessment of spatial variable importance.

    + + +
    sperrorest(formula, data, coords = c("x", "y"), model_fun,
    +  model_args = list(), pred_fun = NULL, pred_args = list(),
    +  smp_fun = partition_cv, smp_args = list(), train_fun = NULL,
    +  train_param = NULL, test_fun = NULL, test_param = NULL,
    +  err_fun = err_default, imp_variables = NULL, imp_permutations = 1000,
    +  importance = !is.null(imp_variables), distance = FALSE,
    +  par_args = list(par_mode = "foreach", par_units = NULL, par_option = NULL),
    +  do_gc = 1, progress = "all", out_progress = "", benchmark = FALSE,
    +  ...)
    + +

    Arguments

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    formula

    A formula specifying the variables used by the model. Only simple formulas without interactions or nonlinear terms should be used, e.g. y~x1+x2+x3 but not y~x1*x2+log(x3). Formulas involving interaction and nonlinear terms may work for error estimation but not for variable importance assessment; use them with caution.

    data

    a data.frame with predictor and response variables. Training and test samples are drawn from this data set by smp_fun and can optionally be resampled further by train_fun and test_fun, respectively.

    coords

    vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

    model_fun

    Function that fits a predictive model, such as glm or rpart. The function must accept at least two arguments, the first one being a formula and the second a data.frame with the learning sample.

    model_args

    Arguments to be passed to model_fun (in addition to the formula and data arguments, which are provided by sperrorest).

    pred_fun

    Prediction function for a fitted model object created by model_fun. Must accept at least two arguments: the fitted object and a data.frame newdata with data on which to predict the outcome.

    pred_args

    (optional) Arguments to pred_fun (in addition to the fitted model object and the newdata argument, which are provided by sperrorest).

    smp_fun

    A function for sampling training and test sets from data, e.g. partition_kmeans for spatial cross-validation using spatial k-means clustering.

    smp_args

    (optional) Arguments to be passed to smp_fun.

    train_fun

    (optional) A function for resampling or subsampling the training sample in order to achieve, e.g., uniform sample sizes on all training sets, or to maintain a certain ratio of positives and negatives in training sets. Examples are resample_uniform and resample_strat_uniform.

    train_param

    (optional) Arguments to be passed to train_fun.

    test_fun

    (optional) Like train_fun but for the test set.

    test_param

    (optional) Arguments to be passed to test_fun.

    err_fun

    A function that calculates selected error measures from the known responses in data and the model predictions delivered by pred_fun, e.g. err_default (the default).

    imp_variables

    (optional; used if importance = TRUE) Variables for which permutation-based variable importance assessment is performed. If importance = TRUE and imp_variables is NULL, all variables in formula will be used.

    imp_permutations

    (optional; used if importance = TRUE) Number of permutations used for variable importance assessment.

    importance

    logical (default: !is.null(imp_variables), i.e. TRUE if imp_variables is specified and FALSE otherwise): perform permutation-based variable importance assessment?

    distance

    logical (default: FALSE): if TRUE, calculate mean nearest-neighbour distances from test samples to training samples using add.distance.represampling.

    par_args

    list of parallelization parameters:

      • par_mode: the parallelization mode. See Details.
      • par_units: the number of parallel processing units.
      • par_option: optional future settings for par_mode = "future" or par_mode = "foreach".

    do_gc

    numeric (default: 1): defines the frequency of memory garbage collection by calling gc; if < 1, no garbage collection; if >= 1, run gc after each repetition; if >= 2, after each fold.

    progress

    character (default: "all"): whether to show progress information (if possible). The default shows repetition, fold and (if enabled) variable importance progress for par_mode = "foreach" or par_mode = "sequential". Set to "rep" for repetition information only or to FALSE for no progress information.

    out_progress

    only used if par_mode = "foreach": write progress output to a file instead of the console. The default ('') results in console output on Unix systems and in file output ('sperrorest.progress.txt') in the current working directory on Windows systems, where console output is not possible.

    benchmark

    (optional) logical (default: FALSE): if TRUE, perform benchmarking and return a sperrorestbenchmark object.

    ...

    Further options passed to makeCluster for par_mode = "foreach".

    Value

    A list (object of class sperrorest) with (up to) six components:

    error_rep

    a sperrorestreperror object containing predictive performances at the repetition level

    error_fold

    a sperroresterror object containing predictive performances at the fold level

    represampling

    a represampling() object

    importance

    a sperrorestimportance object containing permutation-based variable importances at the fold level

    benchmark

    a sperrorestbenchmark object containing information on the system the code is running on, starting and finishing times, number of available CPU cores, parallelization mode, number of parallel units, and runtime performance

    package_version

    a sperrorestpackageversion object containing information about the sperrorest package version

    Details

    By default, sperrorest runs in parallel on all cores using foreach with the future backend. If this is not desired, specify par_units in par_args or set par_mode = "sequential".

    Available parallelization modes include par_mode = "apply" (calls pbmclapply on Unix, parApply on Windows) and par_mode = "future" (future_lapply). For the latter and for par_mode = "foreach", par_option (defaulting to multiprocess and cluster, respectively) can be specified. See plan for further details.

    Note

    Custom prediction functions passed to pred_fun that consist of multiple custom-defined child functions must be wrapped in a single function, i.e. the helper functions must be defined inside pred_fun.

    References

    Brenning, A. 2012. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23-27 July 2012, p. 5372-5375.

    Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.

    Brenning, A., S. Long & P. Fieguth. Forthcoming. Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Submitted to Remote Sensing of Environment.

    Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.

    Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for yield prediction in Precision Agriculture. In Advances in Intelligent Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, Tucson, AZ, USA, 19-21 May 2010. Lecture Notes in Computer Science, 6065 LNCS: 184-195.

    Examples

    # NOT RUN {
    ##------------------------------------------------------------
    ## Classification tree example using non-spatial partitioning
    ## setup and default parallel mode ("foreach")
    ##------------------------------------------------------------

    data(ecuador) # Muenchow et al. (2012), see ?ecuador
    fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

    library(rpart)
    mypred_part <- function(object, newdata) predict(object, newdata)[, 2]
    ctrl <- rpart.control(cp = 0.005) # show the effects of overfitting
    fit <- rpart(fo, data = ecuador, control = ctrl)

    ### Non-spatial 5-repeated 10-fold cross-validation:
    par_nsp_res <- sperrorest(data = ecuador, formula = fo,
                              model_fun = rpart,
                              model_args = list(control = ctrl),
                              pred_fun = mypred_part,
                              progress = TRUE,
                              smp_fun = partition_cv,
                              smp_args = list(repetition = 1:5, nfold = 10))
    summary(par_nsp_res$error_rep)
    summary(par_nsp_res$error_fold)
    summary(par_nsp_res$represampling)
    # plot(par_nsp_res$represampling, ecuador)

    ### Spatial 5-repeated 10-fold cross-validation:
    par_sp_res <- sperrorest(data = ecuador, formula = fo,
                             model_fun = rpart,
                             model_args = list(control = ctrl),
                             pred_fun = mypred_part,
                             progress = TRUE,
                             smp_fun = partition_kmeans,
                             smp_args = list(repetition = 1:5, nfold = 10))
    summary(par_sp_res$error_rep)
    summary(par_sp_res$error_fold)
    summary(par_sp_res$represampling)
    # plot(par_sp_res$represampling, ecuador)

    smry <- data.frame(
        nonspat_training = unlist(summary(par_nsp_res$error_rep,
                                          level = 1)$train_auroc),
        nonspat_test     = unlist(summary(par_nsp_res$error_rep,
                                          level = 1)$test_auroc),
        spatial_training = unlist(summary(par_sp_res$error_rep,
                                          level = 1)$train_auroc),
        spatial_test     = unlist(summary(par_sp_res$error_rep,
                                          level = 1)$test_auroc))
    boxplot(smry, col = c('red','red','red','green'),
        main = 'Training vs. test, nonspatial vs. spatial',
        ylab = 'Area under the ROC curve')

    ##------------------------------------------------------------
    ## Logistic regression example (glm) using partition_cv
    ## and computation of permutation-based variable importance
    ##------------------------------------------------------------

    data(ecuador)
    fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

    out <- sperrorest(data = ecuador, formula = fo,
                      model_fun = glm,
                      model_args = list(family = "binomial"),
                      pred_fun = predict,
                      pred_args = list(type = "response"),
                      smp_fun = partition_cv,
                      smp_args = list(repetition = 1:2, nfold = 4),
                      par_args = list(par_mode = "future"),
                      importance = TRUE, imp_permutations = 10)
    summary(out$error_rep)
    summary(out$importance)
    # }

    Site built with pkgdown.

diff --git a/docs/reference/summary.represampling.html b/docs/reference/summary.represampling.html
new file mode 100644
index 00000000..e58e12b9
--- /dev/null
+++ b/docs/reference/summary.represampling.html
@@ -0,0 +1,157 @@

    Summary statistics for resampling objects — summary.represampling • sperrorest

    Calculates sample sizes of training and test sets within repetitions and folds of a resampling or represampling object.

    # S3 method for represampling
    summary(object, ...)

    # S3 method for resampling
    summary(object, ...)

    Arguments

    object

    A resampling or represampling object.

    ...

    currently ignored.

    Value

    A list of data.frames summarizing the sample sizes of training +and test sets in each fold of each repetition.
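    The kind of table this summary returns can be sketched in base R. The sketch assumes that each repetition is a list of folds, each holding integer train and test index vectors (the layout used by sperrorest's resampling objects). The names toy_rep and fold_sizes and the column names n_train and n_test are hypothetical.

```r
# Sketch: training/test set sizes per fold, for each repetition.
# Assumes each repetition is a list of folds with integer $train and $test
# index vectors; toy_rep, fold_sizes, n_train and n_test are illustrative.
fold_sizes <- function(resampling) {
  data.frame(
    n_train = vapply(resampling, function(f) length(f$train), integer(1)),
    n_test  = vapply(resampling, function(f) length(f$test), integer(1))
  )
}

# Toy non-spatial 2-fold partitioning of 10 observations, 2 repetitions
set.seed(42)
toy_rep <- lapply(1:2, function(r) {
  fold_id <- sample(rep(1:2, length.out = 10))
  lapply(1:2, function(k) list(train = which(fold_id != k),
                               test  = which(fold_id == k)))
})

# One data.frame per repetition, one row per fold
lapply(toy_rep, fold_sizes)
```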

diff --git a/docs/reference/summary.sperrorest.html b/docs/reference/summary.sperrorest.html
new file mode 100644
index 00000000..db450c30
--- /dev/null
+++ b/docs/reference/summary.sperrorest.html
@@ -0,0 +1,191 @@

    Summary and print methods for sperrorest results — summary.sperrorestreperror • sperrorest

    Summary methods provide varying levels of detail, while print methods provide full details.

    # S3 method for sperrorestreperror
    summary(object, level = 0, na.rm = TRUE, ...)

    # S3 method for sperrorest
    summary(object, ...)

    # S3 method for sperrorestimportance
    print(x, ...)

    # S3 method for sperroresterror
    print(x, ...)

    # S3 method for sperrorestreperror
    print(x, ...)

    # S3 method for sperrorest
    print(x, ...)

    # S3 method for sperrorestbenchmarks
    print(x, ...)

    # S3 method for sperrorestpackageversion
    print(x, ...)

    Arguments

    object

    a sperrorestreperror or sperrorest object, depending on the method

    level

    Level at which errors are summarized: 0: overall; 1: repetition; 2: fold

    na.rm

    Remove NA values? See mean etc.

    ...

    additional arguments for summary.sperroresterror or summary.sperrorestimportance

    x

    Depending on the method, a sperrorest, sperroresterror or sperrorestimportance object

    See also

    sperrorest, summary.sperroresterror, summary.sperrorestimportance

diff --git a/docs/reference/summary.sperroresterror.html b/docs/reference/summary.sperroresterror.html
new file mode 100644
index 00000000..6cec4f93
--- /dev/null
+++ b/docs/reference/summary.sperroresterror.html
@@ -0,0 +1,202 @@

    Summarize error statistics obtained by sperrorest — summary.sperroresterror • sperrorest

    summary.sperroresterror calculates mean, standard deviation, median etc. of the calculated error measures at the specified level (overall, repetition, or fold). summary.sperrorestreperror does the same with the pooled error, at the overall or repetition level.

    # S3 method for sperroresterror
    summary(object, level = 0, pooled = TRUE,
      na.rm = TRUE, ...)

    Arguments

    object

    sperroresterror or sperrorestcombinederror error object calculated by sperrorest

    level

    Level at which errors are summarized: 0: overall; 1: repetition; 2: fold

    pooled

    If TRUE (default), mean, standard deviation, etc. are calculated across the fold-level error estimates. If FALSE, weighted.mean is first applied across folds, and mean, standard deviation, etc. are then calculated across repetitions. See also Details.

    na.rm

    Remove NA values? See mean etc.

    ...

    additional arguments (currently ignored)

    Value

    Depending on the level of aggregation, a list or data.frame with the mean and, at level 0, also the standard deviation, median and IQR of the error measures.

    Details

    The following example explains the pooled argument. Assume we are using 100-repeated 10-fold cross-validation. If pooled = TRUE (default), the mean and standard deviation calculated when summarizing at level = 0 are calculated across the error estimates obtained for each of the 100*10 = 1000 folds. If pooled = FALSE, mean and standard deviation are calculated across the 100 repetitions, using the weighted average of the fold-level errors to calculate an error value for the entire sample. This essentially does not affect the mean value, but it does change the standard deviation of the error. pooled = FALSE is not recommended; it is mainly intended for testing purposes. When the test sets are small (as in leave-one-out cross-validation, in the extreme case), consider running sperrorest with error_rep = TRUE and examining only the error_rep component of its result.
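    The two summaries controlled by the pooled argument can be contrasted on simulated fold-level errors for 100 repetitions of 10 folds. The error values and test-set sizes are made up, and the sketch only mirrors the description given here. It is not sperrorest's implementation.

```r
# Sketch contrasting the two summaries for r repetitions x k folds.
# err[i, j]: error on fold j of repetition i; n_test[i, j]: its test-set size.
# Simulated values; not sperrorest's implementation.
set.seed(123)
r <- 100; k <- 10
n_test <- matrix(sample(18:22, r * k, replace = TRUE), r, k)
err <- matrix(rnorm(r * k, mean = 0.25, sd = 0.05), r, k)

# pooled = TRUE: mean and sd across all r * k = 1000 fold-level estimates
pooled_true <- c(mean = mean(err), sd = sd(as.vector(err)))

# pooled = FALSE: weighted mean over folds within each repetition first,
# then mean and sd across the r repetition-level estimates
rep_err <- sapply(seq_len(r), function(i) weighted.mean(err[i, ], w = n_test[i, ]))
pooled_false <- c(mean = mean(rep_err), sd = sd(rep_err))

rbind(pooled_true, pooled_false)
```

    The means agree closely, while the standard deviation under pooled = FALSE is much smaller because averaging over folds smooths out fold-to-fold variation.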

    See also

    sperrorest

diff --git a/docs/reference/summary.sperrorestimportance.html b/docs/reference/summary.sperrorestimportance.html
new file mode 100644
index 00000000..9424050f
--- /dev/null
+++ b/docs/reference/summary.sperrorestimportance.html
@@ -0,0 +1,170 @@

    Summarize variable importance statistics obtained by sperrorest — summary.sperrorestimportance • sperrorest

    summary.sperrorestimportance calculates mean, standard deviation, median etc. of the calculated variable importances at the specified level (overall, repetition, or fold).

    # S3 method for sperrorestimportance
    summary(object, level = 0, na.rm = TRUE,
      which = NULL, ...)

    Arguments

    object

    sperrorestimportance object calculated by sperrorest called with argument importance = TRUE

    level

    Level at which errors are summarized: 0: overall; 1: repetition; 2: fold

    na.rm

    Remove NA values? See mean etc.

    which

    optional character vector specifying selected variables for which the importances should be summarized (to do: check implementation)

    ...

    additional arguments (currently ignored)

    Value

    a list or data.frame, depending on the level of aggregation
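    An overall (level 0) summary of fold-level importances can be sketched as one row per variable, with mean, standard deviation, median and IQR computed across folds. The importance table imp_folds is simulated: one row per fold, one column per variable, each entry an increase in test error after permutation. The function summarize_importance is hypothetical and not the package's method.

```r
# Sketch of a level-0 summary of fold-level variable importances.
# imp_folds is simulated (rows = folds, columns = variables);
# summarize_importance is illustrative, not sperrorest's method.
set.seed(7)
imp_folds <- data.frame(
  x1 = rnorm(20, mean = 0.08, sd = 0.02),
  x2 = rnorm(20, mean = 0.03, sd = 0.01),
  x3 = rnorm(20, mean = 0.00, sd = 0.01)
)

summarize_importance <- function(imp, na.rm = TRUE) {
  stats <- sapply(imp, function(v) c(mean   = mean(v, na.rm = na.rm),
                                     sd     = sd(v, na.rm = na.rm),
                                     median = median(v, na.rm = na.rm),
                                     IQR    = IQR(v, na.rm = na.rm)))
  as.data.frame(t(stats))  # one row per variable
}

summarize_importance(imp_folds)
```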

diff --git a/docs/reference/tile_neighbors.html b/docs/reference/tile_neighbors.html
new file mode 100644
index 00000000..ed8831b2
--- /dev/null
+++ b/docs/reference/tile_neighbors.html
@@ -0,0 +1,162 @@

    Determine the names of neighbouring tiles in a rectangular pattern — tile_neighbors • sperrorest

    Neighbouring tiles are determined by 'counting' up and down based on the tile name.

    tile_neighbors(nm, tileset, iterate = 0, diagonal = FALSE)

    Arguments

    nm

    Character string or factor: name of a tile, e.g., 'X4:Y6'

    tileset

    Admissible tile names; if missing and nm is a factor +variable, then levels(nm) is used as a default for tileset.

    iterate

    internal; do not change the default. Controls behaviour in an iterative call to this function.

    diagonal

    if TRUE, diagonal neighbours are also considered neighbours.

    Value

    Character string.
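    The 'counting' idea can be sketched for names of the form "X<i>:Y<j>" (as in the example 'X4:Y6'). The row and column indices are shifted by -1, 0 and +1, and only candidates found in tileset are kept. This is a hypothetical sketch (tile_neighbors_sketch), not the package's tile_neighbors() implementation, and it accepts character names only, not factors.

```r
# Sketch of neighbour lookup by counting up and down in a tile name such as
# "X4:Y6". Illustrative only; not the package's tile_neighbors() code.
tile_neighbors_sketch <- function(nm, tileset, diagonal = FALSE) {
  ij <- as.integer(regmatches(nm, gregexpr("[0-9]+", nm))[[1]])  # c(i, j)
  offsets <- expand.grid(dx = -1:1, dy = -1:1)
  offsets <- offsets[!(offsets$dx == 0 & offsets$dy == 0), ]     # drop the tile itself
  if (!diagonal) offsets <- offsets[offsets$dx == 0 | offsets$dy == 0, ]
  candidates <- paste0("X", ij[1] + offsets$dx, ":Y", ij[2] + offsets$dy)
  candidates[candidates %in% tileset]  # keep only admissible tile names
}

# A 3 x 3 tiling with names "X1:Y1" ... "X3:Y3"
tiles <- as.vector(outer(1:3, 1:3, function(i, j) paste0("X", i, ":Y", j)))
tile_neighbors_sketch("X1:Y1", tiles)                   # returns "X2:Y1" "X1:Y2"
tile_neighbors_sketch("X2:Y2", tiles, diagonal = TRUE)  # all 8 surrounding tiles
```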

diff --git a/docs/reference/transfer_parallel_output.html b/docs/reference/transfer_parallel_output.html
new file mode 100644
index 00000000..a8afc033
--- /dev/null
+++ b/docs/reference/transfer_parallel_output.html
@@ -0,0 +1,132 @@

    transfer_parallel_output — transfer_parallel_output • sperrorest

    Transfers the output of parallel calls to runreps.

    transfer_parallel_output(my_res = NULL, res = NULL, impo = NULL,
      pooled_error = NULL)