C/C++ standard string conversion functions don't support binary string literals?


Are there any standard C or C++ functions that will auto-detect a binary string, for example "0b1010" and convert it to integer 10?

Just trying to confirm what feels like a bit of an oversight in the C & C++ standards. Recent versions of both support binary integer literals (0b) but the various string conversion functions don't when auto-detecting the base. Although those aren't technically related, the asymmetry feels like an oversight to me.

Since C++14 and C23, both C and C++ support the following common integer literal[1] formats:

| Base | Prefix | Example | Standard |
|---|---|---|---|
| Decimal (base-10) | no prefix | 123 | |
| Octal (base-8) | 0 | 0755 | |
| Hexadecimal (base-16) | 0x or 0X | 0x7F | |
| Binary (base-2) | 0b or 0B | 0b1010 | Since C23 and C++14 |

Therefore in your code you can write something like int x = 0xAB; or int y = 0b11;.
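As a quick sanity check of the literal forms listed above, the following sketch asserts at compile time that each spelling denotes the value you would expect. It assumes a C++14 (or later) compiler; `literal_sum` is a made-up name used only for illustration.

```cpp
// The prefix only changes how the digits are read; the results are plain ints.
static_assert(0755   == 493, "octal: 7*64 + 5*8 + 5");
static_assert(0x7F   == 127, "hexadecimal: 7*16 + 15");
static_assert(0b1010 == 10,  "binary: 8 + 2 (needs C++14 or C23)");

// Hypothetical helper, not from the question: adds one literal of each form.
constexpr int literal_sum() { return 123 + 0755 + 0x7F + 0b1010; }
```

Here `literal_sum()` evaluates to 753, i.e. 123 + 493 + 127 + 10.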

Now, let's say we want to parse several strings into numbers, for example "123", "0755", "0x7F", and "0b1010". Lucky for us, C and C++ provide a host of options such as:

- The strtol family[2] since (at least) C99 and C++98 (or C++11, for the ll variants)
- The std::stol family[2] since C++11
- std::from_chars since at least C++17

Although std::from_chars requires you to specify an explicit base between 2 and 36, both the strtol and std::stol families of functions accept a base of 0 and then auto-detect the base from the prefix. Here's the description from strtol's C documentation:

strtol

long strtol( const char* restrict str, char** restrict str_end, int base ); (since C99)

Interprets an integer value in a byte string pointed to by str.

Discards any whitespace characters (as identified by calling isspace) until the first non-whitespace character is found, then takes as many characters as possible to form a valid base-n (where n=base) integer number representation and converts them to an integer value. The valid integer value consists of the following parts:

- (optional) plus or minus sign
- (optional) prefix (0) indicating octal base (applies only when the base is 8 or 0)
- (optional) prefix (0x or 0X) indicating hexadecimal base (applies only when the base is 16 or 0)
- a sequence of digits

The set of valid values for base is {0, 2, 3, ..., 36}. The set of valid digits for base-2 integers is {0, 1}, for base-3 integers is {0, 1, 2}, and so on. For bases larger than 10, valid digits include alphabetic characters, starting from Aa for base-11 integer, to Zz for base-36 integer. The case of the characters is ignored.

Additional numeric formats may be accepted by the currently installed C locale.

If the value of base is 0, the numeric base is auto-detected: if the prefix is 0, the base is octal, if the prefix is 0x or 0X, the base is hexadecimal, otherwise the base is decimal.
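To make the quoted rules concrete, here is a minimal sketch that calls strtol with base 0 and also reports how many characters were accepted, using the str_end out-parameter. The `Parsed` struct and `parse_auto` function are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical result type for illustration: one base-0 conversion.
struct Parsed {
    long value;            // converted value
    std::size_t consumed;  // number of characters strtol accepted
};

// Calls strtol with base 0 so that the prefix decides the base.
inline Parsed parse_auto(const char* text) {
    char* end = nullptr;
    long value = std::strtol(text, &end, 0);
    return {value, static_cast<std::size_t>(end - text)};
}
```

Under the quoted rules, `parse_auto("0x7F")` yields 127 and `parse_auto("0755")` yields 493, each consuming all four characters. For "0b1010", an implementation that follows only the rules quoted above would read the leading 0, stop at the b, and report a single consumed character, so checking `consumed` is one way to notice that the input was not fully parsed.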

The reference page for std::stol says that it calls strtol internally, so it is effectively identical in behavior.
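One practical difference is error reporting: std::stoul throws instead of returning 0 when nothing can be converted, and it exposes the number of characters consumed through its pos parameter. A rough sketch, where `stoul_consumes_all` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts with std::stoul and base 0, and returns true
// only if a conversion happened and the whole string was consumed.
inline bool stoul_consumes_all(const std::string& text, unsigned long& out) {
    try {
        std::size_t pos = 0;
        out = std::stoul(text, &pos, 0);
        return pos == text.size();
    } catch (const std::invalid_argument&) {
        return false;  // no digits could be converted at all
    } catch (const std::out_of_range&) {
        return false;  // value does not fit in unsigned long
    }
}
```

With a library whose strtoul does not recognize the 0b prefix, "0b1010" would convert only the leading 0, pos would be 1, and the helper would return false rather than silently accepting 0.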

The strtol and std::stol families of functions were part of the C and C++ standards by C99 and C++11, respectively, which is before binary integer literals were added in C23 and C++14.

Also, C++ has had std::format since C++20, and the standard format specification supports the b binary format, for example {:b}. ~~Interestingly, printf does not appear to support a comparable %b format type. (There's none listed in the reference)~~ EDIT: Never mind, it appears that at least as of C23 printf does support %b. See §7.23.6.1 of n3220. I'm not sure when it was added, as I haven't checked prior C or C++ standards.

This means that you can run into the following asymmetry:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <format>

    int main()
    {
        unsigned bin = 0b1010; // 10
        unsigned oct = 0755;   // 493
        unsigned dec = 123;    // 123
        unsigned hex = 0x7F;   // 127

        // Formatting: every base, including binary, round-trips to text.
        auto text = std::format("0b{:b}, 0{:o}, {:d}, 0x{:x}", bin, oct, dec, hex);
        puts(text.c_str());

        // Parsing: let strtoul auto-detect the base of each token (base 0).
        char str[] = "0b1010 0755 123 0x7F";
        char* token = strtok(str, " ");
        while (token) {
            unsigned val = strtoul(token, nullptr, 0);
            printf("parsed string '%s' as integer '%u'\n", token, val);
            token = strtok(nullptr, " ");
        }
        return 0;
    }

Program Output[3] below. Notice that "0b1010" is incorrectly parsed as 0!

    0b1010, 0755, 123, 0x7f
    parsed string '0b1010' as integer '0'
    parsed string '0755' as integer '493'
    parsed string '123' as integer '123'
    parsed string '0x7F' as integer '127'

I know I can write a special case for detecting "0b", but that's annoying and I wanted to see if there's any standard function in C or C++ I am missing?
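For reference, the special case I have in mind looks roughly like the sketch below. `parse_with_binary` is a hypothetical helper, not a standard function; it assumes a NUL-terminated string, does no overflow detection for the binary branch, and defers to strtoul with base 0 for everything else.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical helper, not a standard function: behaves like
// strtoul(text, end, 0), but also accepts "0b"/"0B" followed by binary digits.
inline unsigned long parse_with_binary(const char* text, char** end = nullptr) {
    const char* p = text;
    // Skip the same leading whitespace strtoul would skip.
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
    // Step over an optional sign when looking for the prefix.
    const char* digits = p;
    if (*digits == '+' || *digits == '-') ++digits;

    const bool binary = digits[0] == '0' &&
                        (digits[1] == 'b' || digits[1] == 'B') &&
                        (digits[2] == '0' || digits[2] == '1');
    if (!binary) return std::strtoul(text, end, 0);

    // Accumulate the binary digits by hand (sketch: wraps silently on overflow).
    unsigned long value = 0;
    const char* q = digits + 2;
    for (; *q == '0' || *q == '1'; ++q)
        value = value * 2 + static_cast<unsigned long>(*q - '0');
    if (end) *end = const_cast<char*>(q);

    // Mirror strtoul: a leading minus negates the result in the unsigned type.
    return (*p == '-') ? 0UL - value : value;
}
```

With this, `parse_with_binary("0b1010")` returns 10, and "0755", "123", and "0x7F" still go through the normal strtoul path. It works, but it is exactly the kind of boilerplate I was hoping the standard library already covered.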

[1] Technically they're "integer constants" in C and "integer literals" in C++, but that isn't important for this question.

[2] For the sake of brevity, I am treating strtol, strtoll, strtoul, and strtoull as the same. Except for return type and signed/unsigned differences, they are otherwise identical. Likewise for std::stoi, std::stol, std::stoll, std::stoul, and std::stoull.

[3] https://godbolt.org/z/EETe1eYGz
