refactor: 💥 ascii -> portable, unicode -> utf8, 'A' -> 'P'
mpusz committed Oct 9, 2024
1 parent cb424a7 commit 4eb6322
Showing 13 changed files with 158 additions and 150 deletions.
6 changes: 3 additions & 3 deletions docs/getting_started/faq.md
@@ -164,19 +164,19 @@ code.
please let us know in the associated [GitHub Issue](https://github.com/mpusz/mp-units/issues/93).
## Why Unicode quantity symbols are used by default instead of ASCII-only characters?
## Why UTF-8 quantity symbols are used by default instead of portable characters?
Both C++ and [ISO 80000](../appendix/references.md#ISO80000) are standardized by the ISO.
[ISO 80000](../appendix/references.md#ISO80000) and the [SI](../appendix/references.md#SIBrochure)
standards specify Unicode symbols as the official unit names for some quantities
standards specify UTF-8 symbols as the official unit names for some quantities
(e.g. `Ω` symbol for the resistance quantity).
As the **mp-units** library will be proposed for standardization as a part of the C++ Standard Library
we have to obey the rules and be consistent with ISO specifications.
!!! note
We do understand engineering reality and the constraints of some environments. This is why the library
has the option of [ASCII-only Quantity Symbols](../users_guide/framework_basics/text_output.md#unit_symbol_formatting).
has the option of [Portable Quantity Symbols](../users_guide/framework_basics/text_output.md#unit_symbol_formatting).
## Why don't we have CMake options to disable the building of tests and examples?
8 changes: 4 additions & 4 deletions docs/users_guide/framework_basics/systems_of_units.md
@@ -240,18 +240,18 @@ are opt-in. A user has to explicitly "import" them from a dedicated `unit_symbol
quantity q2 = 42 * km / h;
```

We also provide alternative object identifiers using Unicode characters in their names for most
unit symbols. The code using Unicode looks nicer, but it is harder to type on the keyboard.
We also provide alternative object identifiers using UTF-8 characters in their names for most
unit symbols. The code using UTF-8 looks nicer, but it is harder to type on the keyboard.
This is why we provide both versions of identifiers for such units.

=== "ASCII only"
=== "Portable"

```cpp
quantity resistance = 60 * kohm;
quantity capacitance = 100 * uF;
```

=== "With Unicode glyphs"
=== "With UTF-8 glyphs"

```cpp
quantity resistance = 60 * kΩ;
38 changes: 19 additions & 19 deletions docs/users_guide/framework_basics/text_output.md
@@ -114,18 +114,18 @@ and units of derived quantities.
### `text_encoding`

[ISQ](../../appendix/glossary.md#isq) and [SI](../../appendix/glossary.md#si) standards always
specify symbols using Unicode encoding. This is why it is a default and primary target for
text output. However, in some applications or environments, a standard ASCII-like text output
specify symbols using UTF-8 encoding. This is why it is a default and primary target for
text output. However, in some applications or environments, a standard portable text output
using only the characters from the [basic literal character set](https://en.cppreference.com/w/cpp/language/charset)
can be preferred by users.

This is why the library provides an option to change the default encoding to the ASCII one with:
This is why the library provides an option to change the default encoding to the portable one with:

```cpp
enum class text_encoding : std::int8_t {
unicode, // µs; m³; L²MT⁻³
ascii, // us; m^3; L^2MT^-3
default_encoding = unicode
utf8, // µs; m³; L²MT⁻³
portable, // us; m^3; L^2MT^-3
default_encoding = utf8
};
```
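As a rough illustration of what the two enumerators select between, the sketch below picks one of two spellings of a symbol. It does not use mp-units itself: `symbol`, `symbol_for`, `microsecond`, and `usage_example` are hypothetical names, and the enum is a simplified stand-in that only borrows the `utf8`, `portable`, and `default_encoding` enumerator names from the listing above.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Simplified stand-in for the documented enum (not the library's definition).
enum class text_encoding : std::int8_t { utf8, portable, default_encoding = utf8 };

// Hypothetical holder for the two spellings of one symbol.
struct symbol {
  std::string_view utf8;      // bytes of the UTF-8 spelling
  std::string_view portable;  // basic literal character set only
};

// Returns the spelling that matches the requested encoding.
constexpr std::string_view symbol_for(const symbol& s, text_encoding enc = text_encoding::default_encoding)
{
  return enc == text_encoding::utf8 ? s.utf8 : s.portable;
}

// "µs" written as explicit UTF-8 bytes so the example does not depend on
// the encoding of the source file.
inline constexpr symbol microsecond{"\xC2\xB5" "s", "us"};

inline void usage_example()
{
  assert(symbol_for(microsecond) == "\xC2\xB5" "s");               // default is utf8
  assert(symbol_for(microsecond, text_encoding::portable) == "us");
}
```

Keeping both spellings side by side means the choice of encoding is only a lookup at output time and never a transliteration step.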
@@ -154,7 +154,7 @@ template<dimension_symbol_formatting fmt = dimension_symbol_formatting{}, typena
For example:
```cpp
static_assert(dimension_symbol<{.encoding = text_encoding::ascii}>(isq::power.dimension) == "L^2MT^-3");
static_assert(dimension_symbol<{.encoding = text_encoding::portable}>(isq::power.dimension) == "L^2MT^-3");
```

!!! note
@@ -175,7 +175,7 @@ For example:
```cpp
std::string txt;
dimension_symbol_to(std::back_inserter(txt), isq::power.dimension, {.encoding = text_encoding::ascii});
dimension_symbol_to(std::back_inserter(txt), isq::power.dimension, {.encoding = text_encoding::portable});
std::cout << txt << "\n";
```

@@ -203,7 +203,7 @@ enum class unit_symbol_solidus : std::int8_t {

enum class unit_symbol_separator : std::int8_t {
space, // kg m²/s²
half_high_dot, // kg⋅m²/s² (valid only for unicode encoding)
half_high_dot, // kg⋅m²/s² (valid only for utf8 encoding)
default_separator = space
};

@@ -455,16 +455,16 @@ as text and, thus, are aligned to the left by default.
```ebnf
dimension-format-spec = [fill-and-align], [width], [dimension-spec];
dimension-spec = [text-encoding];
text-encoding = 'U' | 'A';
text-encoding = 'U' | 'P';
```

In the above grammar:

- `fill-and-align` and `width` tokens are defined in the [format.string.std](https://wg21.link/format.string.std)
chapter of the C++ standard specification,
- `text-encoding` token specifies the symbol text encoding:
- `U` (default) uses the **Unicode** symbols defined by [@ISO80000] (e.g., `LT⁻²`),
- `A` forces non-standard **ASCII**-only output (e.g., `LT^-2`).
- `U` (default) uses the **UTF-8** symbols defined by [@ISO80000] (e.g., `LT⁻²`),
- `P` forces non-standard **portable** output (e.g., `LT^-2`).
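The `text-encoding` token described above could be recognized along the following lines. This is a minimal sketch and not the library's formatter: `parse_text_encoding` and `usage_example` are hypothetical names, the enum is a simplified stand-in, and the handling of the legacy `A` letter as a synonym for `P` is modeled on the `"UAP"` modifier set that this commit introduces in `format.h`.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string_view>

// Simplified stand-in for the documented enum.
enum class text_encoding { utf8, portable };

// Scans a format-spec fragment for at most one text-encoding letter.
// Returns std::nullopt when none is present, so the caller keeps its default.
inline std::optional<text_encoding> parse_text_encoding(std::string_view spec)
{
  std::optional<text_encoding> result;
  for (const char ch : spec) {
    // Letters other than U, P, and the legacy A are left for other parts of a parser.
    if (ch != 'U' && ch != 'P' && ch != 'A') continue;
    if (result) throw std::invalid_argument("only one text-encoding token is allowed");
    result = (ch == 'U') ? text_encoding::utf8 : text_encoding::portable;
  }
  return result;
}

inline void usage_example()
{
  assert(parse_text_encoding("U") == text_encoding::utf8);
  assert(parse_text_encoding("P") == text_encoding::portable);
  assert(parse_text_encoding("A") == text_encoding::portable);  // legacy spelling
  assert(!parse_text_encoding("").has_value());
}
```

Mapping every non-`U` letter to the portable encoding is what lets an old `{:A}` format string keep working while `{:P}` becomes the documented spelling.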

Dimension symbols of some quantities are specified to use Unicode signs by the
[ISQ](../../appendix/glossary.md#isq) (e.g., `Θ` symbol for the _thermodynamic temperature_
@@ -475,9 +475,9 @@ symbol can be forced to be printed using such characters thanks to `text-encodin

```cpp
std::println("{}", isq::dim_thermodynamic_temperature); // Θ
std::println("{:A}", isq::dim_thermodynamic_temperature); // O
std::println("{:P}", isq::dim_thermodynamic_temperature); // O
std::println("{}", isq::power.dimension); // L²MT⁻³
std::println("{:A}", isq::power.dimension); // L^2MT^-3
std::println("{:P}", isq::power.dimension); // L^2MT^-3
```
### Unit formatting
@@ -506,7 +506,7 @@ In the above grammar:
(e.g., `m s⁻¹`, `kg m⁻¹ s⁻¹`)
- `unit-symbol-separator` token specifies how multiplied unit symbols should be separated:
- 's' (default) uses **space** as a separator (e.g., `kg m²/s²`)
- 'd' uses half-high **dot** (`⋅`) as a separator (e.g., `kg⋅m²/s²`) (requires the Unicode encoding)
- 'd' uses half-high **dot** (`⋅`) as a separator (e.g., `kg⋅m²/s²`) (requires the UTF-8 encoding)
- 'L' is reserved for possible future localization use in case the C++ standard library gets access to
the ICU-like database.

@@ -525,11 +525,11 @@ In such a case, the unit symbol can be forced to be printed using such character

```cpp
std::println("{}", si::ohm); // Ω
std::println("{:A}", si::ohm); // ohm
std::println("{:P}", si::ohm); // ohm
std::println("{}", us); // µs
std::println("{:A}", us); // us
std::println("{:P}", us); // us
std::println("{}", m / s2); // m/s²
std::println("{:A}", m / s2); // m/s^2
std::println("{:P}", m / s2); // m/s^2
```
Additionally, both ISO 80000 and [SI](../../appendix/glossary.md#si) leave some freedom on how to
@@ -576,7 +576,7 @@ std::println("{:d}", kg * m2 / s2); // kg⋅m²/s²
!!! note
'd' requires the Unicode encoding to be set.
'd' requires the UTF-8 encoding to be set.
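The constraint in this note can be pictured with a small check like the one below. It is only a sketch under stated assumptions: `separator_text` and `usage_example` are hypothetical names, the enums are simplified stand-ins for the documented ones, and the half-high dot is assumed to be U+22C5 (DOT OPERATOR) encoded as UTF-8.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Simplified stand-ins for the documented enums.
enum class text_encoding { utf8, portable };
enum class unit_symbol_separator { space, half_high_dot };

// Returns the text placed between multiplied unit symbols, rejecting the
// half-high dot when the output is limited to portable characters.
inline std::string separator_text(unit_symbol_separator sep, text_encoding enc)
{
  if (sep == unit_symbol_separator::half_high_dot) {
    if (enc != text_encoding::utf8)
      throw std::invalid_argument("half_high_dot requires the UTF-8 encoding");
    return "\xE2\x8B\x85";  // assumed code point U+22C5, written as UTF-8 bytes
  }
  return " ";
}

inline void usage_example()
{
  assert(separator_text(unit_symbol_separator::space, text_encoding::portable) == " ");
  assert(separator_text(unit_symbol_separator::half_high_dot, text_encoding::utf8) == "\xE2\x8B\x85");
}
```

The dot has no counterpart in the basic literal character set, which is why the combination is rejected instead of being silently replaced with another character.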
### Quantity formatting
2 changes: 1 addition & 1 deletion example/currency.cpp
@@ -77,7 +77,7 @@ template<Unit auto From, Unit auto To>

#else

[[nodiscard]] std::string_view to_string_view(Unit auto u) { return u.symbol.ascii().c_str(); }
[[nodiscard]] std::string_view to_string_view(Unit auto u) { return u.symbol.portable().c_str(); }

template<Unit auto From, Unit auto To>
[[nodiscard]] double exchange_rate(std::chrono::sys_seconds timestamp)
12 changes: 6 additions & 6 deletions src/core/include/mp-units/bits/text_tools.h
@@ -98,19 +98,19 @@ template<std::intmax_t Value>
template<typename CharT, std::size_t N, std::size_t M, std::output_iterator<CharT> Out>
constexpr Out copy(const symbol_text<N, M>& txt, text_encoding encoding, Out out)
{
if (encoding == text_encoding::unicode) {
if (encoding == text_encoding::utf8) {
if constexpr (is_same_v<CharT, char8_t>)
return ::mp_units::detail::copy(txt.unicode().begin(), txt.unicode().end(), out);
return ::mp_units::detail::copy(txt.utf8().begin(), txt.utf8().end(), out);
else if constexpr (is_same_v<CharT, char>) {
for (const char8_t ch : txt.unicode()) *out++ = static_cast<char>(ch);
for (const char8_t ch : txt.utf8()) *out++ = static_cast<char>(ch);
return out;
} else
MP_UNITS_THROW(std::invalid_argument("Unicode text can't be copied to CharT output"));
MP_UNITS_THROW(std::invalid_argument("UTF-8 text can't be copied to CharT output"));
} else {
if constexpr (is_same_v<CharT, char>)
return ::mp_units::detail::copy(txt.ascii().begin(), txt.ascii().end(), out);
return ::mp_units::detail::copy(txt.portable().begin(), txt.portable().end(), out);
else
MP_UNITS_THROW(std::invalid_argument("ASCII text can't be copied to CharT output"));
MP_UNITS_THROW(std::invalid_argument("Portable text can't be copied to CharT output"));
}
}

20 changes: 11 additions & 9 deletions src/core/include/mp-units/format.h
@@ -112,7 +112,7 @@ MP_UNITS_EXPORT_END
//
// dimension-format-spec = [fill-and-align], [width], [dimension-spec];
// dimension-spec = [text-encoding];
// text-encoding = 'U' | 'A';
// text-encoding = 'U' | 'P';
//
template<mp_units::Dimension D, typename Char>
class MP_UNITS_STD_FMT::formatter<D, Char> {
@@ -126,15 +126,16 @@ class MP_UNITS_STD_FMT::formatter<D, Char> {
auto it = begin;
if (it == end || *it == '}') return begin;

constexpr auto valid_modifiers = std::string_view{"UA"};
constexpr auto valid_modifiers = std::string_view{"UP"};
for (; it != end && *it != '}'; ++it) {
if (valid_modifiers.find(*it) == std::string_view::npos)
throw MP_UNITS_STD_FMT::format_error("invalid dimension modifier specified");
}
end = it;

if (it = mp_units::detail::at_most_one_of(begin, end, "UA"); it != end)
specs_.encoding = (*it == 'U') ? mp_units::text_encoding::unicode : mp_units::text_encoding::ascii;
if (it = mp_units::detail::at_most_one_of(begin, end, "UAP"); it != end)
// TODO 'A' stands for an old and deprecated ASCII encoding
specs_.encoding = (*it == 'U') ? mp_units::text_encoding::utf8 : mp_units::text_encoding::portable;

return end;
}
@@ -198,15 +199,16 @@ class MP_UNITS_STD_FMT::formatter<U, Char> {
auto it = begin;
if (it == end || *it == '}') return begin;

constexpr auto valid_modifiers = std::string_view{"UA1ansd"};
constexpr auto valid_modifiers = std::string_view{"UAP1ansd"};
for (; it != end && *it != '}'; ++it) {
if (valid_modifiers.find(*it) == std::string_view::npos)
throw MP_UNITS_STD_FMT::format_error("invalid unit modifier specified");
}
end = it;

if (it = mp_units::detail::at_most_one_of(begin, end, "UA"); it != end)
specs_.encoding = (*it == 'U') ? mp_units::text_encoding::unicode : mp_units::text_encoding::ascii;
if (it = mp_units::detail::at_most_one_of(begin, end, "UAP"); it != end)
// TODO 'A' stands for an old and deprecated ASCII encoding
specs_.encoding = (*it == 'U') ? mp_units::text_encoding::utf8 : mp_units::text_encoding::portable;
if (it = mp_units::detail::at_most_one_of(begin, end, "1an"); it != end) {
switch (*it) {
case '1':
@@ -221,8 +223,8 @@ class MP_UNITS_STD_FMT::formatter<U, Char> {
}
}
if (it = mp_units::detail::at_most_one_of(begin, end, "sd"); it != end) {
if (*it == 'd' && specs_.encoding == mp_units::text_encoding::ascii)
throw MP_UNITS_STD_FMT::format_error("half_high_dot unit separator allowed only for Unicode encoding");
if (*it == 'd' && specs_.encoding == mp_units::text_encoding::portable)
throw MP_UNITS_STD_FMT::format_error("half_high_dot unit separator allowed only for UTF-8 encoding");
specs_.separator =
(*it == 's') ? mp_units::unit_symbol_separator::space : mp_units::unit_symbol_separator::half_high_dot;
}
4 changes: 2 additions & 2 deletions src/core/include/mp-units/framework/magnitude.h
@@ -310,9 +310,9 @@ template<typename CharT, std::output_iterator<CharT> Out>
constexpr Out print_separator(Out out, const unit_symbol_formatting& fmt)
{
if (fmt.separator == unit_symbol_separator::half_high_dot) {
if (fmt.encoding != text_encoding::unicode)
if (fmt.encoding != text_encoding::utf8)
MP_UNITS_THROW(
std::invalid_argument("'unit_symbol_separator::half_high_dot' can be only used with 'text_encoding::unicode'"));
std::invalid_argument("'unit_symbol_separator::half_high_dot' can be only used with 'text_encoding::utf8'"));
const std::string_view dot = "⋅";
out = detail::copy(dot.begin(), dot.end(), out);
} else {
50 changes: 27 additions & 23 deletions src/core/include/mp-units/framework/symbol_text.h
@@ -56,9 +56,11 @@ namespace mp_units {

// NOLINTNEXTLINE(readability-enum-initial-value)
MP_UNITS_EXPORT enum class text_encoding : std::int8_t {
unicode, // µs; m³; L²MT⁻³
ascii, // us; m^3; L^2MT^-3
default_encoding = unicode
utf8, // µs; m³; L²MT⁻³
unicode [[deprecated("Use `utf8` instead")]] = utf8,
portable, // us; m^3; L^2MT^-3
ascii [[deprecated("Use `portable` instead")]] = portable,
default_encoding = utf8
};

namespace detail {
@@ -89,81 +91,83 @@ constexpr fixed_u8string<N> to_u8string(fixed_string<N> txt)
*
* This class template is responsible for definition and handling of a symbol text
* representation. In the libary it is used to define symbols of units and prefixes.
* Each symbol can have two versions: Unicode and ASCI-only.
* Each symbol can have two versions: UTF-8 and portable.
*
* @tparam N The size of a Unicode symbol
* @tparam M The size of the ASCII-only symbol
* @tparam N The size of a UTF-8 symbol
* @tparam M The size of the portable symbol
*/
MP_UNITS_EXPORT template<std::size_t N, std::size_t M>
class symbol_text {
public:
fixed_u8string<N> unicode_;
fixed_string<M> ascii_;
fixed_u8string<N> utf8_;
fixed_string<M> portable_;

// NOLINTNEXTLINE(google-explicit-constructor, hicpp-explicit-conversions)
constexpr explicit(false) symbol_text(char ch) : unicode_(static_cast<char8_t>(ch)), ascii_(ch)
constexpr explicit(false) symbol_text(char ch) : utf8_(static_cast<char8_t>(ch)), portable_(ch)
{
MP_UNITS_EXPECTS(detail::is_basic_literal_character_set_char(ch));
}

// NOLINTNEXTLINE(*-avoid-c-arrays, google-explicit-constructor, hicpp-explicit-conversions)
consteval explicit(false) symbol_text(const char (&txt)[N + 1]) :
unicode_(detail::to_u8string(basic_fixed_string{txt})), ascii_(txt)
utf8_(detail::to_u8string(basic_fixed_string{txt})), portable_(txt)
{
MP_UNITS_EXPECTS(txt[N] == char{});
MP_UNITS_EXPECTS(detail::is_basic_literal_character_set(txt));
}

// NOLINTNEXTLINE(google-explicit-constructor, hicpp-explicit-conversions)
constexpr explicit(false) symbol_text(const fixed_string<N>& txt) : unicode_(detail::to_u8string(txt)), ascii_(txt)
constexpr explicit(false) symbol_text(const fixed_string<N>& txt) : utf8_(detail::to_u8string(txt)), portable_(txt)
{
MP_UNITS_EXPECTS(detail::is_basic_literal_character_set(txt.data_));
}

// NOLINTNEXTLINE(*-avoid-c-arrays)
consteval symbol_text(const char8_t (&u)[N + 1], const char (&a)[M + 1]) : unicode_(u), ascii_(a)
consteval symbol_text(const char8_t (&u)[N + 1], const char (&a)[M + 1]) : utf8_(u), portable_(a)
{
MP_UNITS_EXPECTS(u[N] == char8_t{});
MP_UNITS_EXPECTS(a[M] == char{});
MP_UNITS_EXPECTS(detail::is_basic_literal_character_set(a));
}

constexpr symbol_text(const fixed_u8string<N>& unicode, const fixed_string<M>& ascii) :
unicode_(unicode), ascii_(ascii)
constexpr symbol_text(const fixed_u8string<N>& utf8, const fixed_string<M>& portable) :
utf8_(utf8), portable_(portable)
{
MP_UNITS_EXPECTS(detail::is_basic_literal_character_set(ascii.data_));
MP_UNITS_EXPECTS(detail::is_basic_literal_character_set(portable.data_));
}

[[nodiscard]] constexpr const auto& unicode() const { return unicode_; }
[[nodiscard]] constexpr const auto& ascii() const { return ascii_; }
[[nodiscard]] constexpr const auto& utf8() const { return utf8_; }
[[nodiscard]] constexpr const auto& portable() const { return portable_; }
[[deprecated("Use `utf8()` instead")]] constexpr const auto& unicode() const { return utf8(); }
[[deprecated("Use `portable()` instead")]] constexpr const auto& ascii() const { return portable(); }

[[nodiscard]] constexpr bool empty() const
{
MP_UNITS_ASSERT_DEBUG(unicode().empty() == ascii().empty());
return unicode().empty();
MP_UNITS_ASSERT_DEBUG(utf8().empty() == portable().empty());
return utf8().empty();
}

template<std::size_t N2, std::size_t M2>
[[nodiscard]] constexpr friend symbol_text<N + N2, M + M2> operator+(const symbol_text& lhs,
const symbol_text<N2, M2>& rhs)
{
return symbol_text<N + N2, M + M2>(lhs.unicode() + rhs.unicode(), lhs.ascii() + rhs.ascii());
return symbol_text<N + N2, M + M2>(lhs.utf8() + rhs.utf8(), lhs.portable() + rhs.portable());
}

template<std::size_t N2, std::size_t M2>
[[nodiscard]] friend constexpr auto operator<=>(const symbol_text& lhs, const symbol_text<N2, M2>& rhs) noexcept
{
MP_UNITS_DIAGNOSTIC_PUSH
MP_UNITS_DIAGNOSTIC_IGNORE_ZERO_AS_NULLPOINTER_CONSTANT
if (const auto cmp = lhs.unicode() <=> rhs.unicode(); cmp != 0) return cmp;
if (const auto cmp = lhs.utf8() <=> rhs.utf8(); cmp != 0) return cmp;
MP_UNITS_DIAGNOSTIC_POP
return lhs.ascii() <=> rhs.ascii();
return lhs.portable() <=> rhs.portable();
}

template<std::size_t N2, std::size_t M2>
[[nodiscard]] friend constexpr bool operator==(const symbol_text& lhs, const symbol_text<N2, M2>& rhs) noexcept
{
return lhs.unicode() == rhs.unicode() && lhs.ascii() == rhs.ascii();
return lhs.utf8() == rhs.utf8() && lhs.portable() == rhs.portable();
}
};

@@ -48,7 +48,7 @@ enum class unit_symbol_solidus : std::int8_t {
// NOLINTNEXTLINE(readability-enum-initial-value)
enum class unit_symbol_separator : std::int8_t {
space, // kg m²/s²
half_high_dot, // kg⋅m²/s² (valid only for unicode encoding)
half_high_dot, // kg⋅m²/s² (valid only for utf8 encoding)
default_separator = space
};
