diff --git a/_sections/home-english.md b/_sections/home-english.md index 7d96366..d22dfd6 100644 --- a/_sections/home-english.md +++ b/_sections/home-english.md @@ -121,7 +121,9 @@ For example, to automatically capture points with an accuracy of 10m or better w ### Multiple choice -XLSForm supports both **select_one** (select only one answer) and **select_multiple** (select multiple answers) questions. Writing a multiple-choice question requires adding a **choices** worksheet to your Excel workbook. Here is an example of a **select_one** question: +XLSForm supports both **select_one** (select only one answer) and **select_multiple** (select multiple answers) questions. You can define choices for multiple choice questions directly in the form or, for very long choice lists or ones that need to be updated by an external process, [in attached files](#multiple-choice-from-file). + +Defining choices in your form requires adding a **choices** worksheet to your Excel workbook. Here is an example of a **select_one** question: | type | name | label | @@ -171,6 +173,16 @@ In general, choice names should be unique within a single-choice list. If two ch | =========== | | settings | +#### Cascading selects + +If you want question responses to filter the options available in later questions, use a cascading select. For example, if you want to ask which city the respondent lives in, instead of choosing from all cities in the world, you could first ask for their country. Then you could filter the list of states to only show the ones in the selected country, filter districts to only show the ones in the selected state, then filter cities to only show the ones in that district. + +To chain or cascade selects, you will need to create a **choice_filter** column in your survey worksheet. The expression in this column will be used to filter down the list of choices for the corresponding select. Any choice for which the expression is **true** will be included. 
Check out an example XLSForm [here](/assets/xlsx/cascading_select.xlsx). + +#### Look up values in a choice list + +You can add additional columns to the **choices** sheet and then look up values for those columns using the [`instance` function](#look-up-values-in-choice-lists-or-attached-files). + #### Specify other {% include alerts/warning.html content="We generally recommend using [relevance](#relevant) to specify your own **other** choice. The shortcut described in this section only works for selects without translations or **choice_filter**s. It uses English for the \"Specify other\" choice which cannot be customized." %} @@ -188,7 +200,6 @@ For multiple-choice questions, surveys often include an option of marking **othe | list name | name | label | | -------------- | ---------- | ------------------------- | -| list name | name | label | | pizza_toppings | cheese | Cheese | | pizza_toppings | pepperoni | Pepperoni | | pizza_toppings | sausage | Sausage | @@ -197,12 +208,11 @@ For multiple-choice questions, surveys often include an option of marking **othe Click on the link to look at the complete [pizza_questionnaire](https://docs.google.com/spreadsheets/d/1y9LcFUaJ_MDRpqbzHVxkD_k6YzSQCllqh3Excy4iffg/edit?usp=sharing). -#### Location widget -A user may want to select a location from a map view during data collection. To enable this feature, you need to add the **map** or **quick map** appearance attribute to a **select_one** question. The choices sheet will also need a **geometry** column added for the list_name noted in the select_one questions. The geometry must be specified using the [ODK format](https://docs.getodk.org/form-question-types/#location-widgets). This feature is only currently available on ODK Collect. See below: +#### Select from map +To show select choices on a map, add the **map** or **quick map** appearance attribute to a **select_one** question. 
The choices sheet will also need a **geometry** column added for the list_name noted in the select_one questions. The geometry must be specified using the [ODK format](https://docs.getodk.org/form-question-types/#location-widgets). This feature is only currently available on ODK Collect. See below: | list name | name | label | geometry | | -------------- | ---------- | ------------------------- | ------------------------- | -| list name | name | label | geometry | | site | shofco | Shofco | 36.7965483 -1.3182517 0 0 | | site | gemkam | Gemkam Medical Clinic | 36.7967088 -1.3170903 0 0 | | site | silanga | Silanga Pharmacy | 36.7955008 -1.3167834 0 0 | @@ -210,15 +220,15 @@ A user may want to select a location from a map view during data collection. To | ============ | ========== | ========================= | ========================= | | choices | | | | -### Multiple choice from file +#### Multiple choice from file -The options in a multiple-choice question can also be taken from a separate file instead of the choices sheet. This is particularly useful if the options are dynamic or if the list of options is used in multiple surveys. Three types of files are supported: CSV, XML, and GeoJSON files. See usage examples below: +If you want to provide the options for a multiple-choice question in a separate file, use the **select_one_from_file** or **select_multiple_from_file** question type. Using separate files can make it easier to manage longer option lists, and to re-use option lists between surveys. Three types of files are supported: CSV, XML, and GeoJSON files. See usage examples below: | type | name | label | choice_filter | | --------------------------------------- | ---- | ------------------------------ | --------------- | | select_multiple_from_file country.csv | liv | In which countries did you live? | | | select_one_from_file countries.xml | cou | In which country do you live now? | | -| select_one_from_file countries.xml | cit | What is the closest city? 
| name=${cou} | +| select_one_from_file cities.xml | cit | What is the closest city? | country=${cou} | | select_one_from_file households.csv | hh | Select household number | | | ======================================= | ==== | ===============================|=================| | survey | | | | @@ -231,7 +241,9 @@ The options in a multiple-choice question can also be taken from a separate file | ============================================= | ==== | ====================================|==============| | survey | | | | -The files require a specific format. A CSV file requires a `name` and `label` column which represent the value and label of the options. An XML file requires a structure as shown below: +The files require a specific format. A CSV file requires columns which represent the value and label of the options. If you use the column names `name` and `label`, these will be used automatically. You can also [specify the columns to use](#specify-custom-columns-for-label-and-value). + +An XML file requires a structure as shown below: ``` @@ -242,17 +254,21 @@ The files require a specific format. A CSV file requires a `name` and `label` co ``` -A GeoJSON requires each feature, or point, to have an id and title property, or an attribute of the point. The GeoJSON must be defined by a single top-level FeatureCollection, and it currently only works for point geometry, as noted in detail on the [ODK documentation site](https://docs.getodk.org/form-datasets/#selects-from-geojson). +A GeoJSON requires each feature to have an id and a title property. The GeoJSON must be defined by a single top-level FeatureCollection. Learn more from [the ODK documentation](https://docs.getodk.org/form-datasets/#selects-from-geojson). CSV, XML, and GeoJSON files may have additional columns, XML nodes, or features and custom properties as long as the above-mentioned basic requirements are met. 
+This question type is generally the preferred way of building select questions from external data as it is the most versatile and works across applications. However, selects from files with tens of thousands of options can affect the responsiveness of the form. If you have long choice lists, check whether your form is adequately responsive on the lowest performance device that your data collection team will use. If it is too slow, consider using [Dynamic selects from preloaded data](#dynamic-selects-from-pre-loaded-data) if your data collection application supports it. + +#### Specify custom columns for label and value + If the CSV, XML, or GeoJSON files use different names for the choice `name` and `label`, add a column to the survey sheet named `parameters`, and specify the custom names with the `value` and `label` parameters. See usage examples below: | type | name | label | parameters | | --------------------------------------- | ---- | --------------------------------- | --------------- | | select_multiple_from_file country.csv | liv | In which countries did you live? | value=ccode | | select_one_from_file countries.xml | cou | In which country do you live now? 
| label=cname | -| select_one_from_file households.csv | hh | Select household number | value=housenum, label=housename | +| select_one_from_file households.csv | hh | Select household number | value=housenum label=housename | | ======================================= | ==== | ==================================|=================| | survey | | | | @@ -260,12 +276,10 @@ If the CSV, XML, or GeoJSON files use different names for the choice `name` and | type | name | label | appearance | parameters | | --------------------------------------------- | ---- | ----------------------------------- | ------------ | ------------------------ | -| select_one_from_file health_facility.geojson | site | Select the health facility visited | map | value = id, label = name | +| select_one_from_file health_facility.geojson | site | Select the health facility visited | map | value=id label=name | | ============================================= | ==== | ====================================|==============| ======================== | | survey | | | | | -Note that, this question type is generally the preferred way of building select questions from external data as it is the most versatile and works across applications. However, if your external data file consists of many thousands of lines, please test carefully whether the performance is satisfactory on the lowest-spec device you intend to use. If it is too slow, consider using [External Selects](#external-selects) or [Dynamic selects from preloaded data](#dynamic-selects-from-pre-loaded-data) if your data collection application supports it. - ### Rank The rank widget can be used to let respondents order a list of options. The answer is saved as an ordered, space-separated list of option values where all options are always included. The syntax is very similar to multiple-choice questions. 
@@ -383,18 +397,6 @@ This example below would collect the precise GPS location every 180 seconds and See [this page](https://docs.getodk.org/form-audit-log/) in the ODK Collect documentation for full details about the **audit** metaquestion, available location tracking parameters, and the format of the **audit.csv** log file created for each submission. -### External XML data - -For advanced users, who need to perform complex queries on external data without restrictions, an external XML data file can be added with question type **xml-external**. The value in the **name** column can be used to refer to this data in any formula (e.g. for a calculation, constraint, relevant, or choice_filter) using the **instance('name')** function. A file with the same name and the **.xml** extension should be uploaded with the form. See below for an example that requires uploading a file called houses.xml with the form. - -| type | name | label | calculation | -| ------------------------- | ------------ | --------------- | ------------------------------------------------------------ | -| xml-external | houses | | | -| integer | rooms | How many rooms? | | -| calculate | count | | count(instance('houses')/house[rooms = current()/../rooms ]) | -| ========================= | ============ | ========== | ============================================================ | -| survey | | | | - ## Hints ### Regular hints @@ -428,6 +430,62 @@ Formulas are used in the [constraint](#constraints), [relevant](#relevant), [cal Formulas are composed of functions and operators (+,*,div,etc.). A well-documented full list of operators and functions can be found in the [ODK documentation](https://docs.getodk.org/form-operators-functions/). For the technically inclined, the underlying XForms specification is the actual source document for the supported [functions](https://getodk.github.io/xforms-spec/#xpath-functions). 
+### Look up values in choice lists or attached files + +You can look up values from lists defined in the choices sheet, attached CSVs, attached GeoJSON files and attached XML files by using the `instance` function. You will use this general structure: + +`instance('list_name')/root/item[filter expression]/desired_property` + +The instance function needs the name of the list that you want to look up a value in. For lists specified in the choices sheet, this is the `list_name`. For attached files, use the filename without the extension. For example: + +* To look values up in a `fruits` list defined on the choices sheet, use `instance('fruits')` +* To look values up in a `participants.csv` file, use `instance('participants')` +* To look values up in a `places.geojson` file, use `instance('places')` + +The next part of the expression is `/root/item[filter expression]`. `/root/item` says to look at every item in the list; the filter expression then includes items for which it is `true` and excludes items for which it is `false`. This is the same kind of expression used in the [choice_filter](#cascading-selects) column. 
The most common kind of filter expression looks for an exact match on the `name` of an item: + +* To get the item in `participants.csv` with `name` that matches a scanned barcode: `instance('participants')/root/item[name=${barcode_id}]` +* To get the item in `participants.csv` with `name` that matches a value picked from a select: `instance('participants')/root/item[name=${participant}]` + +The last part of the expression is a property or column name to access for the item(s) that match the filter expression: + +* To get the age of a participant whose id card was scanned: `instance('participants')/root/item[name=${barcode_id}]/age` +* To get the first name of a participant selected from a list: `instance('participants')/root/item[name=${participant}]/fname` + +This is generally the preferred way of looking values up in attached files as it is the most versatile and works across applications. However, looking up values in files with many tens of thousands of options can affect the responsiveness of the form. If you have long choice lists, check whether your form is adequately responsive on the lowest performance device that your data collection team will use. If it is too slow, consider using [`pulldata()`](#how-to-pull-data-from-csv) if your data collection application supports it. + +💡 These expressions use a subset of [XPath 1.0](https://developer.mozilla.org/en-US/docs/Web/XPath). The filter expression in square brackets can be any expression that evaluates to true or false, including using functions. + +#### External CSV data + +If you want to attach a CSV file to your form so that you can look values up in it, you have a few options. If you need to build a select from values in that CSV, you can use [select_one_from_file](#multiple-choice-from-file). This will attach the CSV to your form and also allow you to look values up in it as described above. 
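As a rough illustration of what a look-up such as `instance('participants')/root/item[name=${barcode_id}]/age` does, the sketch below filters the rows of a CSV and returns one property of the first match. It is not how any form tool implements `instance`; the file contents, the `lookup` helper and its behaviour for missing matches (an empty string) are assumptions made for this example.

```python
import csv
import io

# Hypothetical contents of participants.csv (made up for illustration).
PARTICIPANTS_CSV = """name,fname,age
p001,Amina,34
p002,Joseph,51
"""


def lookup(csv_text, filter_column, filter_value, desired_property):
    """Return desired_property from the first row whose filter_column equals filter_value.

    Mirrors the shape item[filter_column=filter_value]/desired_property.
    Values are returned as text; an empty string is returned when no row matches
    (an assumption of this sketch).
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[filter_column] == filter_value:
            return row[desired_property]
    return ""


# Stand-in for the answer to a barcode question named barcode_id.
barcode_id = "p002"
age = lookup(PARTICIPANTS_CSV, "name", barcode_id, "age")
print(age)
```

Changing the last argument from `"age"` to `"fname"` corresponds to changing the final step of the expression from `/age` to `/fname`.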
+ +If you don't need to build a select from values in your CSV, you can use the type **csv-external** and specify the name of the file without extension: + +| type | name | label | calculation | +| ------------------------- | ------------ | --------------- | ---------------------------------------------------------------- | +| csv-external | participants | | | +| barcode | id | Scan id | | +| calculate | first_name | | instance('participants')/root/item[participant_id=${id}]/fname | +| ========================= | ============ | ========== | ================================================================ | +| survey | | | | + +The sample form above attaches the `participants.csv` list to the form. It then asks for a barcode scan and uses the scanned value to look up the participant with matching id in the `participant_id` column. The value in that participant's `fname` column is stored in the `first_name` calculate. + +#### External XML data + +For users who need to perform complex queries on external data without restrictions, an external XML data file can be added with question type **xml-external**. The value in the **name** column can be used to refer to this data in any formula (e.g. for a calculation, constraint, relevant, or choice_filter) using the **instance('name')** function. A file with the same name and the **.xml** extension should be uploaded with the form. See below for an example that requires uploading a file called houses.xml with the form. + +| type | name | label | calculation | +| ------------------------- | ------------ | --------------- | ------------------------------------------------------------ | +| xml-external | houses | | | +| integer | rooms | How many rooms? 
| | +| calculate | count | | count(instance('houses')/house[rooms = current()/../rooms ]) | +| ========================= | ============ | ========== | ============================================================ | +| survey | | | | + +Note that XML files can have any structure so `instance` calls to look up values in an XML file may not be followed by `/root/item`. + ## Constraints One way to ensure data quality is to add constraints to the data fields in your form. For example, when asking for a person's age, you want to avoid impossible answers, like -22 or 200. Adding data constraints in your form is easy to do. You simply add a new column, called **constraint**, and type in the formula specifying the limits on the answer. In the example below, the answer for the person's age must be less than or equal to 150. Note how the ``.`` in the formula refers back to the question variable. @@ -826,154 +884,6 @@ Check out the [Birds XLSForm](https://docs.google.com/spreadsheets/d/1Rxft3H3xl3 Media is translatable in the same way as labels and hints as explained in the [languages section](#multiple-language-support). -## Pre-loading CSV data - -Pre-loading data is done when one wants to reference pre-existing data in a survey form. You can be able to reference data in your survey form (the survey you are now authoring), from pre-existing data in a specific survey form or from any other source. For example, if you have pre-existing data from a household survey and you want to collect follow-up data about the household occupants. You can be able to reference the household survey data in your survey form. -To reference pre-existing data in a survey form: - - * Upload one or more .csv files as support files when you upload your form definition (the same way you upload media support files as explained in the [Media](#media) section). 
The first row of each .csv file should be a header that includes short: - * unique names for each column - * subsequent rows which should contain the data itself - -Each csv file should contain at least one column that can be used to uniquely identify each row. Such columns will be used, at survey time, to look up which row's data to pull into the survey. For the columns that will be used for looking up rows add **_key** to the end of the column name in the first row. Any columns with names ending in **_key** will be indexed for faster look-ups on your survey devices. See below an example of the columns on a .csv file: - -| name_key | name | -| -------- | ------ | -| mango | Mango | -| orange | Orange | - -### How to pull data from CSV - -You can be able to pull data from .csv file by including one or more .csv files in your form during the survey time. -For each data field that you want to pull into your survey: - - * Add a **calculate field** to your survey. - * Give that field a **name** - * Then in its **calculation** column, call the **pulldata()** function, indicating which field to pull from which row of which .csv file. - -See below for an example: - -| type | name | label | calculation | -| --------- | ---------- | ----------------------------------- | ----------------------------------------------- | -| calculate | fruit | | pulldata('fruits', 'name', 'name_key', 'mango') | -| note | note_fruit | The fruit ${fruit} pulled from csv. | | -| ========= | ========== | =================================== | =============================================== | -| survey | | | | - -Once you have loaded .csv data into a survey field using the **pulldata()** function, you can reference that field in later relevance conditions, constraints, and labels, just as you would reference any other field that was filled in by the user. 
- -Click on the link to see an example of a [pre-loading sample form ](https://docs.google.com/spreadsheets/d/1evieF8RW8CMlhbhksgfikXAYvK6uXh3DS5c50ejTSEw/edit?usp=sharing) and the .csv file used with form can be found [here](https://docs.google.com/spreadsheets/d/1gprb7ocTYlT_seOBFY5CuoxyodcXwWOuVxmp38OX1dE/edit?usp=sharing) - -**Important notes on usage of pre-loaded data** - - * Compress a large .csv file into a **.zip archive** before uploading it. - * Save .csv file in **UTF-8 format** if pre-loaded data contains non-English fonts or special characters this enables your Android device to render the text correctly. - * Data fields pulled from a .csv file are considered to be text strings therefore use the **int()** or **number()** functions to convert a pre-loaded field into numeric form. - * If the .csv file contains sensitive data that you may not want to upload to the server, upload a blank .csv file as part of your form, then replace it with the real .csv file by hand-copying the file onto each of your devices. - -## Dynamic selects from pre-loaded data - -If the recommended methods described in [Multiple Choice from File](#multiple-choice-from-file) do not meet your requirements you can consider the method below if your data collection application supports it. - -Once your form has one or more pre-loaded .csv files, you can dynamically pull the choice lists for **select_one** and **select_multiple** fields from those .csv files. Multiple-choice fields with dynamic choice lists follow the same general syntax as regular, static select_one and select_multiple fields as previously covered in the [Multiple choice questions](#multiple-choice) section. - -The following should be done: - -* specify **select_one listname** or **select_multiple listname** in the type column (where **listname** is the name of your choice list) -* specify any special **appearance styles** in the appearance column -* include one or more rows for your listname on the choices worksheet. 
- -Below is an example of the **survey worksheet**: - -| type | name | label | appearance | -| ----------------- | ------ | -------------- | ---------------- | -| select_one fruits | fruits | Select a fruit | search('fruits') | -| ================= | ====== | ============== | ================ | -| survey | | | | - -There are three differences when the choice list should be pulled from one of your pre-loaded .csv files: - -* In the appearance column: - * Include a **search() expression** that specifies which .csv rows to include in the choice list. - * If the field should use a non-default appearance style. The non-default appearance style goes into the column first, followed by a **space**, then the **search() expression**. [e.g., **quick search()**] -* On the **choices worksheet**: - * a row should indicate which .csv columns to use for the label and selected value. As follows: - * **list_name** column: specify the name of your choice list as you normally would. - * **name** column: include the name of the .csv column to use for uniquely identifying selected choices. - * **label** column: include the name of the .csv column to use for labeling the choices. -
-**Note**: -
-If you wish to include multiple columns in the labels, include a comma-separated list of all columns to include. The name column will be dynamically populated based on the column name you put there, and the label column will be dynamically populated based on the column name(s) you put there. -* In your choices worksheet row, you may also include a .csv column name in the image column. If you do, the image filename to use will be pulled from the specified .csv column. -
-**Note**: -
-If you refer to image files in this way, you must always upload those image files as media file attachments when you upload your form to the server. -
-See below an example of the choices worksheet: -
- -| list name | name | label | -| ----------------- | -------- | -------------- | -| fruits | name_key | name | -| ================= | ====== | ============== | -| choices | | | - -Click on the link to see an example of a [search-and-select sample form](https://docs.google.com/spreadsheets/d/1Y0vW0cjl1nbkZczXRmcTC71Pso8dRbouPSYWGBdvBWU/edit?usp=sharing) and the .csv file used with form can be found [here](https://docs.google.com/spreadsheets/d/1gprb7ocTYlT_seOBFY5CuoxyodcXwWOuVxmp38OX1dE/edit?usp=sharing). -
- -For the **search() expression**, there are a series of options to indicate which .csv rows to include in the choice list: - - 1. **search(csvName)**: The single-parameter search expression includes all distinct rows as choices (e.g., "search('hhplotdata')"). - All rows in the specified .csv file will be considered as choices, but only distinct rows -- those with unique selection values -- - will be listed for the user. In other words, duplicates will be automatically filtered from the list shown to users. - - 2. **search(csvName, "contains", columnsToSearch, searchText)**: This search expression includes all distinct rows that contain the - specified text in the specified column(s) (e.g., "search('hhplotdata', 'contains', 'respondentname', ${nametofind})"). The third - parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches - in any specified column will be included. - - 3. **search(csvName, "startswith", columnsToSearch, searchText)**: This search expression includes all distinct rows that start with - the specified text in the specified column(s) (e.g., "search('hhplotdata', 'startswith', 'respondentname', ${nameprefix})"). The third - parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches in - any specified column will be included. - - 4. **search(csvName, "endswith", columnsToSearch, searchText)**: This search expression includes all distinct rows that end with the - specified text in the specified column(s) (e.g., "search('hhplotdata', 'endswith', 'respondentname', ${namesuffix})"). The third - parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches - in any specified column will be included. - - 5. 
**search(csvName, "matches", columnsToSearch, searchText)**: This search expression includes all distinct rows that exactly contain the specified text in the specified column(s) (e.g., "search('hhplotdata', 'matches', 'respondentname', ${nametofind})"). The third parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with exact matches in any specified column will be included. - - 6. **search(csvName, searchType, columnsToSearch, searchText, columnToFilter, filterText)**: Finally, any of the four search types above can be further filtered to only include a subset of .csv data. Simply add two extra parameters to any of the search types above, with the first extra parameter being the column name to filter and the second extra parameter being the exact value to filter. For whichever search is specified in the first four parameters, only rows exactly containing the sixth parameter value in the column named by the fifth parameter will be included (e.g., "search('hhplotdata', 'contains', 'respondentname', ${nametofind}, 'villageid', ${villageid})" to list all matching names within a particular village). - -For an example, see the [advanced search-and-select sample form here](https://drive.google.com/drive/folders/1Wx09ZFOJuiWmy0XKKtpxzGsZHfxeTO9c?usp=sharing). - -Additional notes on usage: - -1. Choices will be ordered, by default, in the order that they appear in your .csv file. If you want to specify a different ordering, include a numeric column in your .csv file named sortby; choices will be ordered numerically, according to the sortby column (if present). -2. You can include one or more static choice options, in addition to the dynamic ones loaded from your .csv file. Simply include static -choices, as you normally would, on the choices worksheet. These can appear before and/or after the row that indicates the columns to use for your dynamic choices. 
The one restriction is that the values you specify for your static choices in the name column must be numeric. - -## Cascading selects - -A lot of forms start out by asking the location of the respondent, with each location selection specifying what the subsequent location choices will be (e.g., state >> district >> village). Instead of adding a **select_one** field for each location option, you can use cascade select. In order to use cascade selects, you will need to create a **choice_filter** column in your survey worksheet and add the location attribute columns in your choices worksheet. Check out an example XLSForm [here](/assets/xlsx/cascading_select.xlsx). - -## External selects - -If a form has selects with a large number of choices (e.g., hundreds or thousands), that form can slow down form loading and navigation if [Multiple Choice from File](#multiple-choice-from-file) is used. The best workaround to this issue is to use external selects in those data collection applications (such as ODK Collect) that support it. - -Enabling external selects is straightforward. - - Instead of **select_one** for the prompt type, use **select_one_external**. - - Instead of the **choices** sheet, put external choices in the **external_choices** sheet. - -See [select_one_external](https://docs.google.com/spreadsheets/d/12qZL34kuHSZGWDv0BBJ1qf7dSmml-d2VnMWH0Vtg-O4/edit?usp=sharing) form for an example that uses normal and external choices. - -When an XLSForm with external choices is converted to an XForm, two files will be produced, the **XForm** (e.g., form-filename.xml) with all the normal choices and an **itemsets.csv** with the external choices. - -The **itemsets.csv** file can be uploaded to any ODK-compatible server (e.g., ODK Aggregate) as a media file. It will be downloaded to any ODK-compatible (e.g., ODK Collect) like any other media file and saved to the [form-filename]-media folder. 
Clients like ODK Collect load media files from the SD card and so your form with a large number of choices will now load very quickly. - ## Default Adding a default field means that a question will be pre-populated with an answer when the user first sees the question. This can help save time if the answer is one that is commonly selected or it can serve to show the user what type of answer choice is expected. See the example below. @@ -1217,6 +1127,153 @@ As with the above **survey** columns, the **attribute::** setting can be combine * [Community Health Toolkit](https://communityhealthtoolkit.org) * [CyberTracker](https://cybertrackerwiki.org/xlsform) +## Appendix - loading big CSVs + +{% include alerts/warning.html content="This section describes less general alternatives to [select_one_from_file](#multiple-choice-from-file) and the [`instance` function](#look-up-values-in-choice-lists-or-attached-files). They may be more performant in some tools that allow filling XLSForms but may not work in others." %} + +### Data preloading + +Pre-loading data lets you reference pre-existing data in a survey form. You can reference data in your survey form (the survey you are now authoring), from pre-existing data in a specific survey form or from any other source. For example, if you have pre-existing data from a household survey and you want to collect follow-up data about the household occupants, you can reference the household survey data in your survey form. + +To reference pre-existing data in a survey form: + + * Upload one or more .csv files as support files when you upload your form definition (the same way you upload media support files as explained in the [Media](#media) section). The first row of each .csv file should be a header that includes short: + * unique names for each column + * subsequent rows which should contain the data itself + +Each csv file should contain at least one column that can be used to uniquely identify each row. 
Such columns will be used, at survey time, to look up which row's data to pull into the survey. For columns that will be used to look up rows, add **_key** to the end of the column name in the first row. Any columns with names ending in **_key** will be indexed for faster look-ups on your survey devices. See below for an example of the columns in a .csv file: + +| name_key | name | +| -------- | ------ | +| mango | Mango | +| orange | Orange | + +#### How to pull data from CSV + +{% include alerts/warning.html content="If you use [select_one_from_file](#multiple-choice-from-file) to show select options from a file, you should generally use the [`instance` function](#look-up-values-in-choice-lists-or-attached-files) to look values up in that file rather than `pulldata`." %} + +You can pull data from a .csv file at survey time by including one or more .csv files in your form. +For each data field that you want to pull into your survey: + + * Add a **calculate field** to your survey. + * Give that field a **name**. + * Then in its **calculation** column, call the **pulldata()** function, indicating which field to pull from which row of which .csv file. + +See below for an example: + +| type | name | label | calculation | +| --------- | ---------- | ----------------------------------- | ----------------------------------------------- | +| calculate | fruit | | pulldata('fruits', 'name', 'name_key', 'mango') | +| note | note_fruit | The fruit ${fruit} pulled from csv. | | +| ========= | ========== | =================================== | =============================================== | +| survey | | | | + +Once you have loaded .csv data into a survey field using the **pulldata()** function, you can reference that field in later relevance conditions, constraints, and labels, just as you would reference any other field that was filled in by the user. 
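The lookup performed by **pulldata()** can be pictured with a short Python sketch. It only illustrates the semantics described here and is not how any XLSForm tool implements the function: `pulldata_sketch` is a hypothetical helper, it takes the CSV content as a string rather than a file name, and returning an empty string when no row matches is an assumption. The argument order mirrors the example call: the column to return, then the key column, then the key value.

```python
import csv
import io


def pulldata_sketch(csv_text, column, key_column, key_value):
    """Return `column` from the first row whose `key_column` equals `key_value`.

    Hypothetical helper for illustration only; values come back as text,
    exactly as they appear in the CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row[key_column] == key_value:
            return row[column]
    return ""  # assumption: no matching row yields an empty value


# Same data as the example .csv shown in this section.
FRUITS_CSV = "name_key,name\nmango,Mango\norange,Orange\n"

# Mirrors pulldata('fruits', 'name', 'name_key', 'mango').
fruit = pulldata_sketch(FRUITS_CSV, "name", "name_key", "mango")
print(fruit)  # Mango
```

Because every value is returned as text, a numeric column would still need converting before it is used in arithmetic.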
+ +Click on the link to see an example of a [pre-loading sample form](https://docs.google.com/spreadsheets/d/1evieF8RW8CMlhbhksgfikXAYvK6uXh3DS5c50ejTSEw/edit?usp=sharing). The .csv file used with the form can be found [here](https://docs.google.com/spreadsheets/d/1gprb7ocTYlT_seOBFY5CuoxyodcXwWOuVxmp38OX1dE/edit?usp=sharing). + +**Important notes on usage of pre-loaded data** + + * Save the .csv file in **UTF-8 format** if the pre-loaded data contains non-English text or special characters. This enables your Android device to render the text correctly. + * Data fields pulled from a .csv file are considered to be text strings, so use the **int()** or **number()** functions to convert a pre-loaded field into numeric form. + * If the .csv file contains sensitive data that you may not want to upload to the server, upload a blank .csv file as part of your form, then replace it with the real .csv file by hand-copying the file onto each of your devices. + +#### Dynamic selects from pre-loaded data + +{% include alerts/warning.html content="Use [select_one_from_file](#multiple-choice-from-file) unless you need to use more than 50 thousand options, or will be collecting data on old or low performance devices. This approach is not supported by Enketo web forms." %} + +Once your form has one or more pre-loaded .csv files, you can dynamically pull the choice lists for **select_one** and **select_multiple** fields from those .csv files. Multiple-choice fields with dynamic choice lists follow the same general syntax as regular, static select_one and select_multiple fields as previously covered in the [Multiple choice questions](#multiple-choice) section. + +As with static selects, you need to: + +* specify **select_one listname** or **select_multiple listname** in the type column (where **listname** is the name of your choice list) +* specify any special **appearance styles** in the appearance column +* include one or more rows for your listname on the choices worksheet. 
+ +Below is an example of the **survey worksheet**: + +| type | name | label | appearance | +| ----------------- | ------ | -------------- | ---------------- | +| select_one fruits | fruits | Select a fruit | search('fruits') | +| ================= | ====== | ============== | ================ | +| survey | | | | + +There are three differences when the choice list should be pulled from one of your pre-loaded .csv files: + +* In the appearance column: + * Include a **search() expression** that specifies which .csv rows to include in the choice list. + * If the field should use a non-default appearance style, that style goes into the column first, followed by a **space**, then the **search() expression** [e.g., **quick search()**]. +* On the **choices worksheet**: + * a row should indicate which .csv columns to use for the label and selected value, as follows: + * **list_name** column: specify the name of your choice list as you normally would. + * **name** column: include the name of the .csv column to use for uniquely identifying selected choices. + * **label** column: include the name of the .csv column to use for labeling the choices. +
+**Note**: +
+To build labels from multiple columns, enter a comma-separated list of column names in the label column. The name column will be dynamically populated based on the column name you put there, and the label column will be dynamically populated based on the column name(s) you put there. +* In your choices worksheet row, you may also include a .csv column name in the image column. If you do, the image filename to use will be pulled from the specified .csv column. +
+**Note**: +
+If you refer to image files in this way, you must always upload those image files as media file attachments when you upload your form to the server. +
+See below for an example of the choices worksheet: +
+ +| list name | name | label | +| ----------------- | -------- | -------------- | +| fruits | name_key | name | +| ================= | ====== | ============== | +| choices | | | + +Click on the link to see an example of a [search-and-select sample form](https://docs.google.com/spreadsheets/d/1Y0vW0cjl1nbkZczXRmcTC71Pso8dRbouPSYWGBdvBWU/edit?usp=sharing). The .csv file used with the form can be found [here](https://docs.google.com/spreadsheets/d/1gprb7ocTYlT_seOBFY5CuoxyodcXwWOuVxmp38OX1dE/edit?usp=sharing). +
+ +For the **search() expression**, there are a series of options to indicate which .csv rows to include in the choice list: + + 1. **search(csvName)**: The single-parameter search expression includes all distinct rows as choices (e.g., "search('hhplotdata')"). + All rows in the specified .csv file will be considered as choices, but only distinct rows -- those with unique selection values -- + will be listed for the user. In other words, duplicates will be automatically filtered from the list shown to users. + + 2. **search(csvName, "contains", columnsToSearch, searchText)**: This search expression includes all distinct rows that contain the + specified text in the specified column(s) (e.g., "search('hhplotdata', 'contains', 'respondentname', ${nametofind})"). The third + parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches + in any specified column will be included. + + 3. **search(csvName, "startswith", columnsToSearch, searchText)**: This search expression includes all distinct rows that start with + the specified text in the specified column(s) (e.g., "search('hhplotdata', 'startswith', 'respondentname', ${nameprefix})"). The third + parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches in + any specified column will be included. + + 4. **search(csvName, "endswith", columnsToSearch, searchText)**: This search expression includes all distinct rows that end with the + specified text in the specified column(s) (e.g., "search('hhplotdata', 'endswith', 'respondentname', ${namesuffix})"). The third + parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches + in any specified column will be included. + + 5. 
**search(csvName, "matches", columnsToSearch, searchText)**: This search expression includes all distinct rows whose value in the specified column(s) exactly matches the specified text (e.g., "search('hhplotdata', 'matches', 'respondentname', ${nametofind})"). The third parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with exact matches in any specified column will be included. + + 6. **search(csvName, searchType, columnsToSearch, searchText, columnToFilter, filterText)**: Finally, any of the four search types above can be further filtered to only include a subset of .csv data. Simply add two extra parameters to any of the search types above, with the first extra parameter being the column name to filter and the second extra parameter being the exact value to filter. For whichever search is specified in the first four parameters, only rows whose value in the column named by the fifth parameter exactly matches the sixth parameter will be included (e.g., "search('hhplotdata', 'contains', 'respondentname', ${nametofind}, 'villageid', ${villageid})" to list all matching names within a particular village). + +Additional notes on usage: + +1. Choices will be ordered, by default, in the order that they appear in your .csv file. If you want to specify a different ordering, include a numeric column in your .csv file named sortby; choices will be ordered numerically, according to the sortby column (if present). +2. You can include one or more static choice options, in addition to the dynamic ones loaded from your .csv file. Simply include static choices, as you normally would, on the choices worksheet. These can appear before and/or after the row that indicates the columns to use for your dynamic choices. The one restriction is that the values you specify for your static choices in the name column must be numeric. 
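The row-selection rules for the **search() expression** can be summarised in a short Python sketch. It is an illustration under stated assumptions, not the code any data collection tool runs: `search_sketch`, the sample data and the `hhid_key` column are hypothetical, the CSV is passed in as a string, and matching here is case-sensitive, which may differ from real clients.

```python
import csv
import io

# Hypothetical sample data; the last row duplicates the first on purpose.
HH_CSV = (
    "hhid_key,respondentname,villageid\n"
    "1,Amina Yusuf,v1\n"
    "2,Amir Khan,v2\n"
    "3,Sam Amin,v1\n"
    "1,Amina Yusuf,v1\n"
)

# One predicate per search type described in the list above.
SEARCH_TYPES = {
    "contains": lambda cell, text: text in cell,
    "startswith": lambda cell, text: cell.startswith(text),
    "endswith": lambda cell, text: cell.endswith(text),
    "matches": lambda cell, text: cell == text,
}


def search_sketch(csv_text, name_col, search_type=None, columns=None,
                  text=None, filter_col=None, filter_text=None):
    """Return the distinct rows a search() expression would offer as choices.

    `name_col` stands for the .csv column named in the choices sheet's
    name column; rows are distinct by that selection value.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if search_type is not None:
        test = SEARCH_TYPES[search_type]
        cols = [c.strip() for c in columns.split(",")]
        # A row is kept if ANY of the listed columns matches.
        rows = [r for r in rows if any(test(r[c], text) for c in cols)]
    if filter_col is not None:
        # The optional fifth and sixth parameters require an exact match.
        rows = [r for r in rows if r[filter_col] == filter_text]
    if rows and "sortby" in rows[0]:
        # Optional numeric ordering column.
        rows.sort(key=lambda r: float(r["sortby"]))
    seen, distinct = set(), []
    for r in rows:
        if r[name_col] not in seen:
            seen.add(r[name_col])
            distinct.append(r)
    return distinct


# Similar in spirit to search('hhplotdata', 'contains', 'respondentname', ..., 'villageid', ...)
for row in search_sketch(HH_CSV, "hhid_key", "contains", "respondentname",
                         "Am", "villageid", "v1"):
    print(row["hhid_key"], row["respondentname"])
```

The sketch filters first and removes duplicates last, so a duplicated row never appears twice in the resulting choice list regardless of the search type used.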
+ +### Database-backed "fast external itemsets" + +{% include alerts/warning.html content="We generally recommend using [select_one_from_file](#multiple-choice-from-file) unless you need to use more than 50k rows or very old devices. This approach is not supported by Enketo web forms." %} + +Enabling external selects is straightforward. + - Instead of **select_one** for the prompt type, use **select_one_external**. + - Instead of the **choices** sheet, put external choices in the **external_choices** sheet. + +See the [select_one_external](https://docs.google.com/spreadsheets/d/12qZL34kuHSZGWDv0BBJ1qf7dSmml-d2VnMWH0Vtg-O4/edit?usp=sharing) form for an example that uses normal and external choices. + +When an XLSForm with external choices is converted to an XForm, two files will be produced, the **XForm** (e.g., form-filename.xml) with all the normal choices and an **itemsets.csv** with the external choices. + +The **itemsets.csv** file can be uploaded to any ODK-compatible server (e.g., ODK Aggregate) as a media file. It will be downloaded to any ODK-compatible client (e.g., ODK Collect) like any other media file and saved to the [form-filename]-media folder. Clients like ODK Collect load media files from the SD card and so your form with a large number of choices will now load very quickly. + ## More resources If you want to dig in deeper to understand XForms and go beyond the XLSForms information on this site, here are some resources: @@ -1234,4 +1291,4 @@ If you want to contribute to or improve this documentation, please visit our [pr ## History -The XLSForm was originally developed by Andrew Marder and Alex Dorey of the [Sustainable Engineering Lab at Columbia University](http://sel.columbia.edu). As XLSForms became adopted by the ODK Community, SEL worked with the ODK Team to develop the current specification. 
[PyXForm](https://github.com/XLSForm/pyxform), the library used to convert XLSForms to XForms, is an open-source project supported by members of ODK, SEL, Ona, SurveyCTO, and KoBoToolbox. +XLSForm was originally developed by Andrew Marder and Alex Dorey of the [Sustainable Engineering Lab at Columbia University](http://sel.columbia.edu). As XLSForms became adopted by the ODK Community, SEL worked with the ODK Team to develop the current specification. [PyXForm](https://github.com/XLSForm/pyxform), the library used to convert XLSForms to XForms, is an open-source project supported by members of ODK, Ona, SurveyCTO, and KoBoToolbox.