Convert char to int in C and C++


How do I convert a char to an int in C and C++?

2/17/2011 2:11:04 PM

In ASCII, the digit characters start at code 48 ('0' is 48, '1' is 49, and so on). All you need to do is:

int x = (int)character - 48;

Or, since the character '0' has the ASCII code of 48, you can just write:

int x = character - '0';  // The (int) cast is not necessary.
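The subtraction only gives a meaningful result when the character really is a digit, so it can help to check first. Here is a minimal sketch of that idea; the name digit_value and the -1 error convention are my own assumptions, not something from the answer above. It relies on the guarantee in both C and C++ that '0' through '9' are contiguous, so it does not depend on ASCII.

```cpp
// Hypothetical helper (name and -1 convention are assumptions):
// returns 0..9 for the characters '0'..'9', or -1 for anything else.
int digit_value(char character) {
    if (character < '0' || character > '9') {
        return -1;  // not a decimal digit
    }
    // Both operands are promoted to int before the subtraction.
    return character - '0';
}
```

Calling digit_value('7') gives 7, while digit_value('x') gives -1 instead of a meaningless number.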

In arithmetic expressions, C and C++ always promote types narrower than int to at least int. Furthermore, character literals have type int in C and type char in C++.

You can convert a char type simply by assigning to an int.

char c = 'a'; // narrowing in C, where 'a' has type int
int a = c;

char is just a 1 byte integer. There is nothing magic with the char type! Just as you can assign a short to an int, or an int to a long, you can assign a char to an int.

Yes, the name of the primitive data type happens to be "char", which suggests that it should only contain characters. But in reality, "char" is just a poor name choice that confuses everyone who tries to learn the language. A more descriptive name is int8_t (or uint8_t, since whether plain char is signed is implementation-defined), and you can use those names instead if your compiler supports C99 or later, which defines them in <stdint.h>.

Though of course you should use the char type when doing string handling, because the index of the classic ASCII table fits in 1 byte. You could however do string handling with regular ints as well, although there is no practical reason in the real world why you would ever want to do that. For example, the following code will work perfectly:

  int str[] = {'h', 'e', 'l', 'l', 'o', '\0' };

  for(int i=0; i<6; i++)  // i must be declared before use
    printf("%c", str[i]);

You have to realize that characters and strings are just numbers, like everything else in the computer. When you write 'a' in the source code, the compiler translates it into the number 97 (on an ASCII system), which is an integer constant.

So if you write an expression like

char ch = '5';
ch = ch - '0';

this is actually equivalent to

char ch = (int)53;
ch = ch - (int)48;

which then goes through the C language integer promotions

ch = (int)ch - (int)48;

and then truncated to a char to fit the result type

ch = (char)( (int)ch - (int)48 );

There are a lot of subtle things like this going on between the lines, where char is implicitly treated as an int.
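If you want to see the promotion for yourself, the compiler can confirm it. This is a small C++ sketch (C++11 or later); the function name digit_char_to_value is made up for illustration. The static_assert checks that subtracting two char values yields an int, and the function spells out the same steps as the expressions above, one per line.

```cpp
#include <type_traits>

// Subtracting two char values produces an int, because both operands
// go through integer promotion before the arithmetic happens.
static_assert(std::is_same<decltype(char{} - '0'), int>::value,
              "char - char should have type int after promotion");

// Hypothetical helper that makes each implicit step explicit.
char digit_char_to_value(char ch) {
    int promoted = ch;                     // integer promotion: char -> int
    int difference = promoted - '0';       // the subtraction is done in int
    return static_cast<char>(difference);  // converted back to char on assignment
}
```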


(This answer addresses the C++ side of things, but the sign extension problem exists in C too.)

Handling all three char types (signed char, unsigned char, and char) is more delicate than it first appears. Values in the range 0 to SCHAR_MAX (which is 127 for an 8-bit char) are easy:

char c = somevalue;
signed char sc = c;
unsigned char uc = c;
int n = c;

But, when somevalue is outside of that range, only going through unsigned char gives you consistent results for the "same" char values in all three types:

char c = somevalue;
signed char sc = c;
unsigned char uc = c;
// Might not be true: int(c) == int(sc) and int(c) == int(uc).
int nc = (unsigned char)c;
int nsc = (unsigned char)sc;
int nuc = (unsigned char)uc;
// Always true: nc == nsc and nc == nuc.
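To make the difference concrete, here is a rough sketch that packages both comparisons as functions. The function names are hypothetical, and the comments about specific values assume an 8-bit char with two's complement representation, which is what you will find on practically every current platform.

```cpp
// Hypothetical helper: true if all three char types agree on the value
// once each one is converted through unsigned char.
bool same_value_in_all_types(unsigned char somevalue) {
    char c = static_cast<char>(somevalue);
    signed char sc = static_cast<signed char>(c);
    unsigned char uc = static_cast<unsigned char>(c);

    int nc = static_cast<unsigned char>(c);
    int nsc = static_cast<unsigned char>(sc);
    int nuc = static_cast<unsigned char>(uc);
    return nc == nsc && nc == nuc;
}

// Hypothetical helper: true if converting straight to int gives
// different results for signed char and unsigned char.
bool plain_conversion_differs(unsigned char somevalue) {
    signed char sc = static_cast<signed char>(somevalue);
    unsigned char uc = somevalue;
    // Assuming 8-bit two's complement: 0xE9 is -23 as signed char
    // but 233 as unsigned char.
    return static_cast<int>(sc) != static_cast<int>(uc);
}
```

For a value such as 0xE9, plain_conversion_differs returns true while same_value_in_all_types still returns true, which is the point of going through unsigned char.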

This is important when using functions from ctype.h, such as isupper or toupper, because of sign extension:

char c = negative_char;  // Assuming CHAR_MIN < 0.
int n = c;
bool b = isupper(n);  // Undefined behavior.

Note the conversion through int is implicit; this has the same UB:

char c = negative_char;
bool b = isupper(c);

To fix this, go through unsigned char, which is easily done by wrapping ctype.h functions through safe_ctype:

template<int (&F)(int)>
int safe_ctype(unsigned char c) { return F(c); }

char c = CHAR_MIN;
bool b = safe_ctype<isupper>(c);  // No UB.

std::string s = "value that may contain negative chars; e.g. user input";
std::transform(s.begin(), s.end(), s.begin(), &safe_ctype<toupper>);
// toupper must be wrapped to eliminate UB in this case: you can't cast
// to unsigned char yourself because the function is called inside transform.
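If you have C++11 or later, a lambda that takes unsigned char is another way to get the same effect without a wrapper template. This is only a sketch; to_upper_copy is a made-up name, and it assumes the default "C" locale is in effect when it says bytes outside the basic character set are left alone.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an upper-cased copy of s without ever
// passing a negative value to std::toupper.
std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) {
                       // ch is in 0..UCHAR_MAX here, so the call is well defined.
                       return static_cast<char>(std::toupper(ch));
                   });
    return s;
}
```

The conversion to unsigned char happens when transform passes each element to the lambda, so the cast that could not be written at the call site is done by the parameter type.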

This works because any function taking any of the three char types can also take the other two char types. It leads to two functions which can handle any of the types:

int ord(char c) { return (unsigned char)c; }
char chr(int n) {
  assert(0 <= n);  // Or other error-/sanity-checking.
  assert(n <= UCHAR_MAX);
  return (unsigned char)n;
}

// Ord and chr are named to match similar functions in other languages
// and libraries.

ord(c) always gives you a non-negative value – even when passed a negative char or negative signed char – and chr takes any value ord produces and gives back the exact same char.
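The round-trip claim is easy to check by brute force over every char value. The sketch below repeats ord and chr so that it compiles on its own, and adds a hypothetical round_trip_holds function. Note that before C++20 the conversion of an out-of-range value back to a signed char is implementation-defined, so treat this as a check of what your platform does rather than a proof.

```cpp
#include <cassert>
#include <climits>

int ord(char c) { return static_cast<unsigned char>(c); }

char chr(int n) {
    assert(0 <= n && n <= UCHAR_MAX);  // reject values ord cannot produce
    return static_cast<char>(static_cast<unsigned char>(n));
}

// Hypothetical check: true if ord is non-negative and chr(ord(c)) == c
// for every value a char can hold.
bool round_trip_holds() {
    for (int v = CHAR_MIN; v <= CHAR_MAX; ++v) {
        char c = static_cast<char>(v);
        if (ord(c) < 0 || chr(ord(c)) != c) {
            return false;
        }
    }
    return true;
}
```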

In practice, I would probably just cast through unsigned char instead of using these, but they do succinctly wrap the cast, provide a convenient place to add error checking for int-to-char, and would be shorter and more clear when you need to use them several times in close proximity.


Use static_cast<int>:

int num = static_cast<int>(letter); // if letter='a', num=97

Edit: You should probably avoid using a C-style (int) cast:

int num = (int) letter;

See "Why use static_cast<int>(x) instead of (int)x?" for more information.
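One place where the cast visibly matters is stream output, because streaming a char prints the character itself while streaming an int prints its numeric code. Here is a small sketch of that; code_as_text is a made-up helper name, and the value 97 for 'a' assumes an ASCII-compatible character set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the numeric code of a char as text.
std::string code_as_text(char letter) {
    std::ostringstream out;
    // Without the cast, the stream would write the character itself.
    out << static_cast<int>(letter);
    return out.str();
}
```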


Licensed under: CC-BY-SA with attribution