CHARACTER SET IN C++

The character set in C++ comprises every character the language recognizes, which covers essentially all the keys we see on our keyboard. A single character written in a program (a character literal) is enclosed in single quotes.

Example: 'A', 'b', '6', '#', etc.

The character set in C++ can be divided into a few subcategories:

Letters : A to Z , a to z

Digits : 0 to 9

Special Symbols : space, +, -, *, /, &, %, #, $, @, ^, !, _(underscore), <, >

White spaces : Horizontal tab (→), Return, Newline

Characters are represented using their ASCII values. ASCII stands for American Standard Code for Information Interchange. It assigns a unique integer value to every character.

For example : 32 for space, 48-57 for 0 to 9, 65-90 for ‘A’ to ‘Z’, 97-122 for ‘a’ to ‘z’, etc.

A character is a 1-byte entity, i.e. it gets 8 bits of storage in memory. In 2's complement form, 8 bits cover the range -128 to 127 (this applies when char is signed, as it is on most desktop compilers), so a char can hold a total of 256 distinct values.

The following tables give the ASCII codes and the corresponding characters:

As shown in the table, there are a total of 256 character codes (0 to 255); standard ASCII covers 0 to 127, and the codes from 128 to 255 belong to extended character sets. But these codes are not always the integer values stored in memory. As a signed char is a 1-byte variable, the maximum number it can store is 127. Codes above 127 are stored as negative numbers in the range -128 to -1. See the later sections of this post for details.

How does the compiler differentiate between a character value and an integer value?

As mentioned earlier, characters in C++ are represented using integers. Then how does the compiler know when to treat an integer as a character and when as an ordinary integer? The answer lies in the type of the variable. The compiler is the brain behind all this storing and converting. Whenever it encounters a character value, it reserves a memory segment of 1 byte and stores the character's ASCII code there in binary form. Because the variable was declared as char, whenever this segment of memory is read the compiler knows that the value is a character and not an integer.

Another thing to note is that a character value takes only 1 byte of memory, while an integer value typically takes 4 bytes (2 bytes on some older compilers).

What happens if a value greater than 127 is stored in a char variable?

Storing a value that is out of a variable's range is a common mistake that programmers make. How such a value is handled is not hard to understand. On typical compilers, the value is converted to its binary form and only the lowest 8 bits (the first 8 bits from the right) are kept as the value of the variable. From then on, that value is used as the value of the variable for the rest of the program.

Let us understand it with the help of an example :

Consider the following code (C++):

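    #include <iostream>
    using namespace std;

    int main()
    {
        // Both values are outside the range of a signed char (-128 to 127)
        char a = 128, b = 257;
        cout << a << " " << b;
        return 0;
    }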

Here two character variables, 'a' and 'b', are created. 'a' is assigned 128 while 'b' is assigned 257.

For a : Binary form of number 128 = (10000000)₂

Now this is an 8-bit number, but note the MSB (most significant bit). The MSB is 1, which means that when the value is read back from a signed char it is treated as a negative number (according to 2's complement rules). So, the value stored in 'a' is actually -128. Nevertheless, it is still in range and everything works fine.

For b: Binary form of number 257 = (100000001)₂

Now this is a 9-bit binary number, but the compiler allocates only 8 bits for a character variable. So, only the lowest 8 bits (starting from the right) are kept and the remaining bit is ignored.

So, the number stored for 'b' is (00000001) and not (100000001). Thus, 'b' holds the character for the ASCII value 1.

