Question
IEEE 754-2008 contains a half precision that is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long. A hidden 1 is assumed. Write down the bit pattern to represent -1.5625 x 10^-2 assuming a version of this format. Calculate the sum of 2.6125 x 10^2 and 4.150390625 x 10^-1 by hand, assuming both numbers are stored in the 16-bit half precision described above. Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps.
Explanation / Answer
ANSWER:
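-1.5625 x 10^-2 = -0.015625 (decimal) = -0.000001 (binary)
-0.000001 = -1.0000000000 x 2^-6
Biased exponent: -6 + 15 = 9
9 (decimal) = 01001 (binary)
Sign = 1, exponent = 01001, mantissa = 0000000000:
1 01001 0000000000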
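As a quick cross-check of the bit pattern above, here is a minimal Python sketch. It relies on the standard `struct` module, whose `e` format code packs a value as an IEEE 754 half; the helper name `half_bits` is made up for this illustration.

```python
import struct

def half_bits(x: float) -> str:
    """Return the half-precision encoding of x as 'sign exponent mantissa'."""
    # 'e' packs x as a 16-bit half; 'H' reads the same two bytes back as an integer.
    (raw,) = struct.unpack(">H", struct.pack(">e", x))
    bits = format(raw, "016b")
    # Split into 1 sign bit, 5 exponent bits and 10 mantissa bits.
    return f"{bits[0]} {bits[1:6]} {bits[6:]}"

print(half_bits(-1.5625e-2))  # 1 01001 0000000000
```

This only confirms the final pattern; it does not replace the hand derivation the question asks for.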
Now to the next exercise.
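2.6125 x 10^2 = 261.25, which in binary is 100000101.01 = 1.0000010101 x 2^8
4.150390625 x 10^-1 = 0.4150390625, which in binary is 0.0110101001 = 1.10101001 x 2^-2
In 16-bit format (biased exponents 8 + 15 = 23 = 10111 and -2 + 15 = 13 = 01101):
0 10111 0000010101
0 01101 1010100100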
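The two operand encodings can be reproduced field by field with a short Python sketch. It assumes the value is a normal number whose fraction fits in 10 bits exactly (true for both operands here), so no rounding is done; `encode_exact`, `BIAS` and `FRAC_BITS` are names invented for this example.

```python
import math

BIAS = 15        # exponent bias of the half format
FRAC_BITS = 10   # stored mantissa bits (the hidden 1 is not stored)

def encode_exact(x: float) -> str:
    """Encode a normal, exactly representable value field by field (no rounding)."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))             # abs(x) = m * 2**e with 0.5 <= m < 1
    significand, exponent = m * 2, e - 1  # renormalise to 1.f * 2**exponent
    frac = (significand - 1) * (1 << FRAC_BITS)
    # Assumption of this sketch: the fraction fits in 10 bits exactly.
    assert frac == int(frac), "value needs rounding; not handled here"
    return f"{sign} {exponent + BIAS:05b} {int(frac):010b}"

print(encode_exact(261.25))        # 0 10111 0000010101
print(encode_exact(0.4150390625))  # 0 01101 1010100100
```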
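1.0000010101 x 2^8 +
1.1010100100 x 2^-2
= (shifting the smaller operand right by 8 - (-2) = 10 places so that both exponents match)
1.0000010101 x 2^8 +
0.0000000001 1010100100 x 2^8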
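Note however that, since we have only 10 bits for the mantissa, we have to truncate the second number and keep the bits that were shifted out as the guard bit, round bit and sticky bit. The first bit shifted out is the guard bit (1), the second is the round bit (0), and the sticky bit is the OR of all the remaining bits (10100100, so 1):
=
1.0000010101 000 x 2^8 +
0.0000000001 101 x 2^8
-------------------------
1.0000010110 101 x 2^8
Now we round the resulting number to 10 bits. The extra bits 101 are more than half a unit in the last place (100), so we round up:
1.0000010111 x 2^8
which in 16-bit format is:
0 10111 0000010111
This is 261.75 in decimal; the exact sum is 261.6650390625.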
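The alignment and rounding steps can be traced with integer arithmetic. The sketch below is only an illustration under simplifying assumptions: both operands are positive normal numbers, the first has the larger exponent, and overflow and subnormals are ignored. `add_half_positive` is a hypothetical helper, not a library function; the last lines compare its result with Python's own half-precision packing in `struct`.

```python
import struct

FRAC_BITS = 10   # stored mantissa bits
BIAS = 15        # exponent bias
EXTRA = 3        # guard, round and sticky bits

def add_half_positive(a_exp, a_frac, b_exp, b_frac):
    """Add two positive normal halves given as (unbiased exponent, 10-bit fraction).

    Hypothetical helper: assumes a_exp >= b_exp; ignores signs, overflow, subnormals.
    """
    # Restore the hidden 1 and append three zero bits for guard/round/sticky.
    a = ((1 << FRAC_BITS) | a_frac) << EXTRA
    b = ((1 << FRAC_BITS) | b_frac) << EXTRA
    shift = a_exp - b_exp
    # Any 1 bits shifted out past the sticky position collapse into the sticky bit.
    sticky = 1 if b & ((1 << shift) - 1) else 0
    b = (b >> shift) | sticky
    total, exp = a + b, a_exp
    if total >> (FRAC_BITS + EXTRA + 1):   # carry out of the hidden bit: renormalise
        total = (total >> 1) | (total & 1)
        exp += 1
    grs = total & 0b111
    sig = total >> EXTRA
    # Round to nearest, ties to even.
    if grs > 0b100 or (grs == 0b100 and sig & 1):
        sig += 1
        if sig >> (FRAC_BITS + 1):         # rounding carried out of the significand
            sig >>= 1
            exp += 1
    frac = sig & ((1 << FRAC_BITS) - 1)
    return f"0 {exp + BIAS:05b} {frac:010b}", grs

result, grs = add_half_positive(8, 0b0000010101, -2, 0b1010100100)
print(result, format(grs, "03b"))  # 0 10111 0000010111 101

# Cross-check: let struct round the exact double-precision sum to a half.
(raw,) = struct.unpack(">H", struct.pack(">e", 261.25 + 0.4150390625))
print(raw == int(result.replace(" ", ""), 2))  # True
```

Both routes give 0 10111 0000010111, i.e. 261.75, with guard/round/sticky bits 101 before rounding.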