ICU-22489 Clarify the default setting of Collator
See #2595
FrankYFTang committed Sep 14, 2023
1 parent 9fb9bd4 commit 9e9bc36
Showing 2 changed files with 48 additions and 29 deletions.
44 changes: 27 additions & 17 deletions icu4c/source/i18n/unicode/ucol.h
@@ -251,42 +251,52 @@ typedef enum {
  */
  UCOL_FRENCH_COLLATION,
  /** Attribute for handling variable elements.
- * Acceptable values are UCOL_NON_IGNORABLE (default)
- * which treats all the codepoints with non-ignorable
+ * Acceptable values are UCOL_NON_IGNORABLE
+ * which treats all the codepoints with non-ignorable
  * primary weights in the same way,
- * and UCOL_SHIFTED which causes codepoints with primary
+ * and UCOL_SHIFTED which causes codepoints with primary
  * weights that are equal or below the variable top value
- * to be ignored on primary level and moved to the quaternary
- * level.
+ * to be ignored on primary level and moved to the quaternary
+ * level. The default setting in a Collator object depends on the
+ * locale data loaded from the resources. For most locales, the
+ * default is UCOL_NON_IGNORABLE, but for others, such as "th",
+ * the default could be UCOL_SHIFTED.
  * @stable ICU 2.0
  */
- UCOL_ALTERNATE_HANDLING,
+ UCOL_ALTERNATE_HANDLING,
  /** Controls the ordering of upper and lower case letters.
- * Acceptable values are UCOL_OFF (default), which orders
+ * Acceptable values are UCOL_OFF, which orders
  * upper and lower case letters in accordance to their tertiary
- * weights, UCOL_UPPER_FIRST which forces upper case letters to
- * sort before lower case letters, and UCOL_LOWER_FIRST which does
- * the opposite.
+ * weights, UCOL_UPPER_FIRST which forces upper case letters to
+ * sort before lower case letters, and UCOL_LOWER_FIRST which does
+ * the opposite. The default setting in a Collator object depends on the
+ * locale data loaded from the resources. For most locales, the
+ * default is UCOL_OFF, but for others, such as "da" or "mt",
+ * the default could be UCOL_UPPER_FIRST.
  * @stable ICU 2.0
  */
- UCOL_CASE_FIRST,
+ UCOL_CASE_FIRST,
  /** Controls whether an extra case level (positioned before the third
- * level) is generated or not. Acceptable values are UCOL_OFF (default),
+ * level) is generated or not. Acceptable values are UCOL_OFF,
  * when case level is not generated, and UCOL_ON which causes the case
  * level to be generated. Contents of the case level are affected by
- * the value of UCOL_CASE_FIRST attribute. A simple way to ignore
+ * the value of UCOL_CASE_FIRST attribute. A simple way to ignore
  * accent differences in a string is to set the strength to UCOL_PRIMARY
- * and enable case level.
+ * and enable case level. The default setting in a Collator object depends
+ * on the locale data loaded from the resources.
  * @stable ICU 2.0
  */
  UCOL_CASE_LEVEL,
  /** Controls whether the normalization check and necessary normalizations
- * are performed. When set to UCOL_OFF (default) no normalization check
- * is performed. The correctness of the result is guaranteed only if the
+ * are performed. When set to UCOL_OFF no normalization check
+ * is performed. The correctness of the result is guaranteed only if the
  * input data is in so-called FCD form (see users manual for more info).
  * When set to UCOL_ON, an incremental check is performed to see whether
  * the input data is in the FCD form. If the data is not in the FCD form,
- * incremental NFD normalization is performed.
+ * incremental NFD normalization is performed. The default setting in a
+ * Collator object depends on the locale data loaded from the resources.
+ * For many locales, the default is UCOL_OFF, but for others, such as "hi",
+ * "vi", or "bn", the default could be UCOL_ON.
  * @stable ICU 2.0
  */
  UCOL_NORMALIZATION_MODE,
@@ -404,9 +404,11 @@ public void setHiraganaQuaternaryDefault() {
  }

  /**
- * Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY. The
- * default mode is false, and so lowercase characters sort before uppercase characters. If true, sort upper case
- * characters first.
+ * Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY.
+ * If false, lowercase characters sort before uppercase characters. If true, sort upper case
+ * characters first. The default setting in a Collator object depends on the
+ * locale data loaded from the resources. For most locales, the default is false,
+ * but for others, such as "da" or "mt", the default could be true.
  *
  * @param upperfirst
  * true to sort uppercase characters before lowercase characters, false to sort lowercase characters
@@ -426,9 +428,11 @@ public void setUpperCaseFirst(boolean upperfirst) {
  }

  /**
- * Sets the orders of lower cased characters to sort before upper cased characters, in strength TERTIARY. The
- * default mode is false. If true is set, the RuleBasedCollator will sort lower cased characters before the upper
+ * Sets the orders of lower cased characters to sort before upper cased characters, in strength TERTIARY.
+ * If true is set, the RuleBasedCollator will sort lower cased characters before the upper
  * cased ones. Otherwise, if false is set, the RuleBasedCollator will ignore case preferences.
+ * The default setting in a Collator object depends on the locale data loaded from
+ * the resources.
  *
  * @param lowerfirst
  * true for sorting lower cased characters before upper cased characters, false to ignore case
@@ -568,10 +572,11 @@ public void setNumericCollationDefault() {
  }

  /**
- * Sets the mode for the direction of SECONDARY weights to be used in French collation. The default value is false,
+ * Sets the mode for the direction of SECONDARY weights to be used in French collation. If set to false,
  * which treats SECONDARY weights in the order they appear. If set to true, the SECONDARY weights will be sorted
  * backwards. See the section on <a href="https://unicode-org.github.io/icu/userguide/collation/architecture">
- * French collation</a> for more information.
+ * French collation</a> for more information. The default setting in a Collator object depends on the
+ * locale data loaded from the resources. For example, for "fr_CA" locale, the default is true.
  *
  * @param flag
  * true to set the French collation on, false to set it off
@@ -590,11 +595,14 @@ public void setFrenchCollation(boolean flag) {
  /**
  * Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable. See the UCA definition
  * on <a href="https://www.unicode.org/reports/tr10/#Variable_Weighting">Variable Weighting</a>. This
- * attribute will only be effective when QUATERNARY strength is set. The default value for this mode is false,
- * corresponding to the NON_IGNORABLE mode in UCA. In the NON_IGNORABLE mode, the RuleBasedCollator treats all
+ * attribute will only be effective when QUATERNARY strength is set. If the mode is set to
+ * false, it corresponds to the NON_IGNORABLE mode in UCA. In the NON_IGNORABLE mode, the RuleBasedCollator treats all
  * the code points with non-ignorable primary weights in the same way. If the mode is set to true, the behavior
  * corresponds to SHIFTED defined in UCA, this causes code points with PRIMARY orders that are equal or below the
  * variable top value to be ignored in PRIMARY order and moved to the QUATERNARY order.
+ * The default setting in a Collator object depends on the locale data loaded from the
+ * resources. For most locales, the default is false, but for others, such as "th",
+ * the default could be true.
  *
  * @param shifted
  * true if SHIFTED behavior for alternate handling is desired, false for the NON_IGNORABLE behavior.
@@ -614,10 +622,11 @@ public void setAlternateHandlingShifted(boolean shifted) {
  * <p>
  * When case level is set to true, an additional weight is formed between the SECONDARY and TERTIARY weight, known
  * as the case level. The case level is used to distinguish large and small Japanese Kana characters. Case level
- * could also be used in other situations. For example to distinguish certain Pinyin characters. The default value
- * is false, which means the case level is not generated. The contents of the case level are affected by the case
+ * could also be used in other situations. For example to distinguish certain Pinyin characters. If the value
+ * is false, it means the case level is not generated. The contents of the case level are affected by the case
  * first mode. A simple way to ignore accent differences in a string is to set the strength to PRIMARY and enable
- * case level.
+ * case level. The default setting in a Collator object depends
+ * on the locale data loaded from the resources.
  * <p>
  * See the section on <a href="https://unicode-org.github.io/icu/userguide/collation/architecture">case
  * level</a> for more information.
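
Illustration (not part of the commit): the revised comments say that whether uppercase or lowercase sorts first is a per-locale default, so callers should not assume one order. The sketch below is a toy model of that tie-break using only the JDK. The class `CaseFirstSketch` and its methods are hypothetical names, and the logic is a simplification rather than ICU's collation algorithm. With ICU4J itself, the effective default could instead be read from a locale's collator through `RuleBasedCollator.isUpperCaseFirst()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of a case-first tie-break; NOT ICU's collation algorithm. */
final class CaseFirstSketch {

    /**
     * Hypothetical helper: compares ignoring case first, then lets case
     * break ties at the first position where the two strings differ in case.
     */
    static Comparator<String> caseFirst(boolean upperFirst) {
        return (a, b) -> {
            int primary = a.compareToIgnoreCase(b);
            if (primary != 0) {
                return primary;
            }
            // Equal ignoring case implies equal length, so one index suffices.
            for (int i = 0; i < a.length(); i++) {
                boolean aUpper = Character.isUpperCase(a.charAt(i));
                boolean bUpper = Character.isUpperCase(b.charAt(i));
                if (aUpper != bUpper) {
                    // The string whose case matches the preference sorts first.
                    return (aUpper == upperFirst) ? -1 : 1;
                }
            }
            return 0;
        };
    }

    /** Returns a sorted copy so the caller's list is left untouched. */
    static List<String> sorted(List<String> input, boolean upperFirst) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(caseFirst(upperFirst));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("b", "A", "a", "B");
        System.out.println(sorted(words, true));  // prints [A, a, B, b]
        System.out.println(sorted(words, false)); // prints [a, A, b, B]
    }
}
```

The same input produces two different orders depending on one flag, which is why the commit stops documenting a single fixed default for these setters.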