language codes #32
New enum is here: http://api.gbif-dev.org/v1/enumeration/basic/TranslationLanguage
Java code is here: https://github.com/gbif/gbif-api/blob/master/src/main/java/org/gbif/api/vocabulary/TranslationLanguage.java
It's deployed in DEV.
replaced Language enum with TranslationLanguage
What's this for? (I mean the API change more than the vocabulary using it.) We have language codes for interpreting the language of a vernacular name, which will include minority and dead languages. I don't know if that's the same thing as the languages we translate the portal/registry into.
@MortenHofft requested it to differentiate between language variants like the Chinese ones (our current Language enum doesn't support that). The vocabularies will also be used, for example, to populate dropdowns in the UI, and the UI uses these variants. The only reason I put it in gbif-api is consistency: front-end developers can get this enum from the same endpoint as the others (http://api.gbif-dev.org/v1/enumeration/basic/TranslationLanguage). Should I move it?
I'm not sure we should add a second language vocabulary to the v1 API. We already have one, and should consider how it might be extended. Choosing Crowdin's list of supported languages seems a bit arbitrary: it mixes two- and three-letter codes, includes a few without countries, and has entries like Upside Down English and "Quenya", which is Lord of the Rings Elvish. We'll support these APIs and vocabularies/enumerations for years, so it's worth spending the time to get it right. @timrobertson100, @mdoering, what do you think?
Yes, I feel similarly. There has been a prominent open issue in the GBIF API for some time about extending the existing but limited language enumeration: gbif/gbif-api#29. For CoL we need to support a wide array of languages for vernacular names. We decided to drop the GBIF language enum and instead use a large list (>8000) of three-letter ISO 639-3 codes taken from https://iso639-3.sil.org/code_tables/download_tables. These no longer fit into an enum. This does not solve the problem with simplified and traditional Chinese, though: they are considered the same language written in different scripts, so you need a locale to distinguish them.
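The distinction described above, one language written in two scripts, can be illustrated with the JDK's `java.util.Locale`, which parses a BCP 47 language tag into separate language, script and region fields. This is only an illustration; `ChineseScriptDemo` is a hypothetical name and not part of GBIF or CoL code.

```java
import java.util.Locale;

// Illustration only (hypothetical class): java.util.Locale keeps the language,
// script and region subtags of a BCP 47 tag separate, so simplified and
// traditional Chinese share the language code "zh" but differ in script.
final class ChineseScriptDemo {
    static String describe(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return languageTag + " -> language=" + locale.getLanguage()
                + ", script=" + locale.getScript()
                + ", region=" + locale.getCountry();
    }

    public static void main(String[] args) {
        // The last line printed is "zh-Hant-TW -> language=zh, script=Hant, region=TW"
        for (String tag : new String[] {"zh-Hans", "zh-Hant", "zh-Hant-TW"}) {
            System.out.println(describe(tag));
        }
    }
}
```

Note that a tag such as `zh-TW` names a region, not a script (`Locale.forLanguageTag("zh-TW").getScript()` is empty); the `Hans`/`Hant` script subtag is what actually separates simplified from traditional writing.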
I think it wouldn't be so easy to extend the current One solution could be to rename this new enum to
and this
Since we can't fix If we did this, in deprecating we should state that 3-letter ISO codes look like a repeat of previous mistakes. The
What were those mistakes? Or, what are our requirements?
If, alternatively, this is only for a small number of languages that we choose to support, then an enum with 8-10 values seems reasonable.
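For a small, closed set of supported languages, an enum stays manageable even when it needs regional variants. A minimal sketch, assuming each constant carries a BCP 47 tag; `UiLanguage` and its constants are hypothetical examples, not the TranslationLanguage enum linked earlier.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch: a small, closed list of UI languages, each backed by a
// BCP 47 tag. The constants are examples only, not GBIF's actual list.
enum UiLanguage {
    ENGLISH("en"),
    SPANISH("es"),
    FRENCH("fr"),
    CHINESE_SIMPLIFIED("zh-CN"),
    CHINESE_TRADITIONAL("zh-TW");

    private final String tag;

    UiLanguage(String tag) {
        this.tag = tag;
    }

    String getTag() {
        return tag;
    }

    Locale toLocale() {
        return Locale.forLanguageTag(tag);
    }

    // Case-insensitive lookup, so "zh-tw" and "zh-TW" both resolve; unknown tags give empty.
    static Optional<UiLanguage> fromTag(String languageTag) {
        return Arrays.stream(values())
                .filter(language -> language.tag.equalsIgnoreCase(languageTag))
                .findFirst();
    }
}
```

An enum like this can be serialized by its tag and listed directly in a dropdown; the trade-off is that adding a language means a code change and a release.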
Summarizing and if I understood correctly, looks like the So I think we can do this next:
Please comment if I've missed something or you disagree with something.
This is the kind of thing I had in mind:

```java
public class LanguageCode {
  private final String code2, code3, englishName;
  // Always use three-letter codes to serialize

  public static LanguageCode fromString(String code) ...
  // Validate and cache based on the ISO list Markus posted
}
```

Then we need a "Locale", except since there's

```java
public class LanguageRegion {
  private final LanguageCode languageCode;
  private final Optional<Country> region;
  // Serialize using the IETF form, i.e. prefer the two-letter language code if it exists.
  // It's possible to create "es_JP" or whatever; I don't think deciding what's valid is this class's issue.

  private static final LanguageRegion EN = ...; // if it's useful to have these in code
  private static final LanguageRegion ES = ...;

  ... fromString(String code)
  ... fromString(language, region)
}
```

The current
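A runnable version of the `LanguageCode`/`LanguageRegion` sketch, under explicit assumptions: a four-entry hard-coded table stands in for the ISO 639-3 list, and the region is a plain ISO 3166-1 alpha-2 string instead of GBIF's `Country` enum. Class names follow the sketch; the getters and the parsing rules are illustrative, not an agreed design.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Assumption: a tiny hard-coded table stands in for the full ISO 639-3 list
// that a real implementation would load once and cache.
final class LanguageCode {
    private static final Map<String, LanguageCode> CACHE = new HashMap<>();

    static {
        register("eng", "en", "English");
        register("spa", "es", "Spanish");
        register("zho", "zh", "Chinese");
        register("haw", null, "Hawaiian"); // has no two-letter (ISO 639-1) code
    }

    private final String code2; // null when the language has no two-letter code
    private final String code3;
    private final String englishName;

    private LanguageCode(String code3, String code2, String englishName) {
        this.code3 = code3;
        this.code2 = code2;
        this.englishName = englishName;
    }

    private static void register(String code3, String code2, String englishName) {
        LanguageCode lc = new LanguageCode(code3, code2, englishName);
        CACHE.put(code3, lc);
        if (code2 != null) {
            CACHE.put(code2, lc);
        }
    }

    // Accepts two- or three-letter codes and returns the cached instance; rejects unknown codes.
    static LanguageCode fromString(String code) {
        LanguageCode lc = CACHE.get(code.trim().toLowerCase(Locale.ROOT));
        if (lc == null) {
            throw new IllegalArgumentException("Unknown language code: " + code);
        }
        return lc;
    }

    Optional<String> getCode2() { return Optional.ofNullable(code2); }
    String getCode3() { return code3; }
    String getEnglishName() { return englishName; }

    @Override
    public String toString() {
        return code3; // always serialize with the three-letter code
    }
}

// Assumption: region is an ISO 3166-1 alpha-2 string rather than GBIF's Country enum.
final class LanguageRegion {
    private final LanguageCode languageCode;
    private final String region; // null when no region is given

    LanguageRegion(LanguageCode languageCode, String region) {
        this.languageCode = Objects.requireNonNull(languageCode);
        this.region = region == null ? null : region.toUpperCase(Locale.ROOT);
    }

    // Accepts "es", "es-JP" or "es_JP". Does not judge whether a combination makes sense,
    // and script subtags such as "zh-Hant" are not handled.
    static LanguageRegion fromString(String code) {
        String[] parts = code.split("[-_]", 2);
        return fromString(parts[0], parts.length > 1 ? parts[1] : null);
    }

    static LanguageRegion fromString(String language, String region) {
        return new LanguageRegion(LanguageCode.fromString(language), region);
    }

    Optional<String> getRegion() { return Optional.ofNullable(region); }

    // IETF (BCP 47) form: prefer the two-letter language code if one exists.
    @Override
    public String toString() {
        String language = languageCode.getCode2().orElse(languageCode.getCode3());
        return region == null ? language : language + "-" + region;
    }
}
```

Because `fromString` returns cached instances, two `LanguageCode`s for the same language are the same object; `LanguageRegion` would still need `equals`/`hashCode` before being used as a map key.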
OK, so I understand you mean to load all languages at startup from the file Markus posted (as they do in CoL)? And for the Also, since they are no longer enums, we'd have to make some changes to the http://api.gbif.org/v1/enumeration endpoint to accommodate these classes (probably in agreement with the front-end developers). As a long-term solution it looks good, but it requires some time, especially to come up with the file with all the possible combinations. Since this is blocking the vocabulary from starting the import and curation of vocabularies, I suggest that we move this issue to gbif-api and I move the Does this make sense to you?
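Loading the full ISO 639-3 list once at startup, as discussed above for CoL, might look roughly like the following. It assumes the tab-separated layout of the main code table from the SIL download page (columns `Id`, `Part2B`, `Part2T`, `Part1`, `Scope`, `Language_Type`, `Ref_Name`, `Comment`); check this against the actual file before relying on it. `Iso6393Table` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical startup loader for the SIL ISO 639-3 code table.
// Assumed tab-separated columns: Id, Part2B, Part2T, Part1, Scope, Language_Type, Ref_Name, Comment.
final class Iso6393Table {
    static final class Entry {
        final String code3;   // ISO 639-3 identifier (Id)
        final String code2;   // ISO 639-1 code (Part1), empty if none
        final String refName; // reference name (Ref_Name)

        Entry(String code3, String code2, String refName) {
            this.code3 = code3;
            this.code2 = code2;
            this.refName = refName;
        }
    }

    private final Map<String, Entry> byCode3 = new HashMap<>();
    private final Map<String, Entry> byCode2 = new HashMap<>();

    private Iso6393Table() {}

    // Reads the whole table once; any I/O or format problem fails fast at startup.
    static Iso6393Table load(Reader source) {
        Iso6393Table table = new Iso6393Table();
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // skip the header row
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
                if (cols.length < 7) {
                    throw new IOException("Malformed line: " + line);
                }
                Entry entry = new Entry(cols[0], cols[3], cols[6]);
                table.byCode3.put(entry.code3, entry);
                if (!entry.code2.isEmpty()) {
                    table.byCode2.put(entry.code2, entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    // Looks up by three-letter code first, then by two-letter code; null if unknown.
    Entry lookup(String code) {
        Entry entry = byCode3.get(code);
        return entry != null ? entry : byCode2.get(code);
    }

    int size() {
        return byCode3.size();
    }
}
```

In a service, the `Reader` would typically wrap the downloaded file, for example `Files.newBufferedReader(path, StandardCharsets.UTF_8)`, and the resulting table would stay in memory for lookups.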
Yes, that's fine for the moment. It gives us more time to consider how the API should handle languages.
I moved the enum and renamed it: https://github.com/gbif/vocabulary/blob/master/model/src/main/java/org/gbif/vocabulary/model/enums/LanguageRegion.java
I also created an endpoint in the vocabulary to retrieve the values: http://api.gbif-dev.org/v1/vocabularyLanguage
It's only deployed in DEV for now. Changes can still be made if the UI needs them or if more language cleanup is required.
I'm closing this; the discussion can continue in gbif/gbif-api#51.
We are currently using three-letter language codes. That is not enough to describe all the languages we would like to support and describe. An example is zh-TW (traditional Chinese, as used in Taiwan). We already have the website translated into traditional Chinese, and we do not want to lose this option.
So we need a new enumeration for languages (the existing one is here).
It seems natural to look to Crowdin as they make a living from translations.
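The limitation of three-letter codes can be shown with the JDK's `java.util.Locale`: `getISO3Language()` returns only the three-letter (ISO 639-2/T) language code, so `zh-TW` and `zh-CN` both reduce to `zho` and the traditional/simplified distinction is lost. `ThreeLetterCollapse` is a hypothetical name used only for this illustration.

```java
import java.util.Locale;

// Illustration (hypothetical class): reducing a locale to its three-letter
// language code drops the region, so zh-TW and zh-CN become indistinguishable.
final class ThreeLetterCollapse {
    static String toThreeLetter(String languageTag) {
        return Locale.forLanguageTag(languageTag).getISO3Language();
    }

    public static void main(String[] args) {
        // Prints "zh-TW -> zho", "zh-CN -> zho", "en-GB -> eng"
        for (String tag : new String[] {"zh-TW", "zh-CN", "en-GB"}) {
            System.out.println(tag + " -> " + toThreeLetter(tag));
        }
    }
}
```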