The floating-point format to be used in this problem is an 8-bit IEEE 754 normal

Question

The floating-point format to be used in this problem is an 8-bit IEEE 754 normalized format with 1 sign bit, 4 exponent bits, and 3 mantissa bits. It is identical to the 32-bit and 64-bit formats in terms of the meaning of fields and special encodings. The exponent field employs an excess-7 coding. The bit fields in a number are (sign, exponent, mantissa). Assume that we use unbiased rounding to the nearest even, as specified in the IEEE floating-point standard.

(a) Encode the following numbers in the 8-bit IEEE format: (1) 0.0011011 (binary) (2) 16.0 (decimal)

(b) Perform the computation 1.011 (binary) + 0.0011011 (binary), showing the correct state of the guard, round and sticky bits. There are three mantissa bits.

(c) Decode the following 8-bit IEEE number into its decimal value: 1 1010 101

(d) Decide which number in each of the following pairs is greater in value (the numbers are in 8-bit IEEE 754 format): (1) 0 0100 100 and 0 0100 111 (2) 0 1100 100 and 1 1100 101

(e) In the 32-bit IEEE format, what is the encoding for negative zero? (f) In the 32-bit IEEE format, what is the encoding for positive infinity?

Explanation / Answer

The format is an 8-bit floating-point number: 1 sign bit, 4 biased-exponent bits (bias 7), and 3 mantissa bits.

(a) Encode the following numbers in the 8-bit IEEE format: (1) 0.0011011 (binary) (2) 16.0 (decimal)

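Ans a) (1) In normalized scientific notation, 0.0011011 (binary) = 1.1011 x 2^(-3) (exponent and base written in decimal). The biased exponent is -3 + 7 = 4 = 0100. The significand 1.1011 has four fraction bits but only three fit. The dropped bit is 1 with nothing after it, which is an exact tie, so round-to-nearest-even rounds 1.101 up to 1.110.

So, the encoded number is 0 0100 110

(2) 16 in decimal is 10000.0 in binary, which is 1.000 x 2^(4). The biased exponent is 4 + 7 = 11 = 1011.

So, the encoded number is 0 1011 000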
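
The encoding steps for part (a) (normalize, add the bias, round the significand to three bits with ties going to even) can be checked with a short script. This is only a sketch under the question's stated format; the name `encode8` and the use of `Fraction` for exact arithmetic are illustrative choices, and zero, subnormals, infinity and NaN are deliberately left out.

```python
from fractions import Fraction

EXP_BITS, MAN_BITS, BIAS = 4, 3, 7  # field widths and excess-7 bias from the question


def encode8(value):
    """Return 'sign exponent mantissa' bit fields for a nonzero normalized value."""
    x = Fraction(value)
    if x == 0:
        raise ValueError("zero has a special encoding and is not handled here")
    sign = 1 if x < 0 else 0
    x = abs(x)

    # Normalize so that 1 <= x < 2, tracking the power of two.
    exp = 0
    while x >= 2:
        x /= 2
        exp += 1
    while x < 1:
        x *= 2
        exp -= 1

    # Scale so the hidden bit plus the 3 mantissa bits form the integer part.
    scaled = x * 2**MAN_BITS
    kept = scaled.numerator // scaled.denominator
    rest = scaled - kept

    # Round to nearest; on an exact tie, round to the even neighbour.
    half = Fraction(1, 2)
    if rest > half or (rest == half and kept % 2 == 1):
        kept += 1
    if kept == 2 ** (MAN_BITS + 1):  # rounding carried into the next power of two
        kept //= 2
        exp += 1

    biased = exp + BIAS
    if not 1 <= biased <= 2**EXP_BITS - 2:
        raise ValueError("value is outside the normalized range")
    mantissa = kept - 2**MAN_BITS  # drop the hidden leading 1
    return f"{sign} {biased:0{EXP_BITS}b} {mantissa:0{MAN_BITS}b}"


print(encode8(Fraction(0b11011, 2**7)))  # 0.0011011 in binary
print(encode8(16))
```

Under round-to-nearest-even, the two calls print 0 0100 110 and 0 1011 000.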

(b) Perform the computation 1.011 (binary) + 0.0011011 (binary), showing the correct state of the guard, round and sticky bits. There are three mantissa bits.

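Ans b) The second number normalizes to 1.1011 x 2^(-3), which has four fraction bits. Rather than dropping the last bit before adding, align it to the exponent of the first number (2^0) and keep every shifted-out bit for rounding:

1.0110000 + 0.0011011 = 1.1001011

The three mantissa bits are 100. The bits after them are 1011, so guard = 1, round = 0, and sticky = 1 (the OR of the remaining bits 11).

Guard/round/sticky = 101 is more than half a unit in the last place, so the sum rounds up to 1.101 x 2^(0) = 1.625 decimal, encoded as 0 0111 101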
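
The guard, round and sticky bits for part (b) can be read straight off the exact aligned sum. Below is a minimal sketch, assuming both operands are positive and written as fixed-point integers with the same number of fraction bits; the function name `add_with_grs` and its parameters are hypothetical, introduced only for illustration.

```python
def add_with_grs(a, b, frac_bits, man_bits=3):
    """Add two positive fixed-point integers that share frac_bits fraction bits.

    Returns (significand, exponent, guard, round, sticky) after
    round-to-nearest-even to man_bits mantissa bits.
    """
    # Pad with zeros so there are always at least two bits below the mantissa.
    pad = man_bits + 2
    total = (a + b) << pad
    frac_bits += pad

    exponent = total.bit_length() - 1 - frac_bits  # power of two of the leading 1
    extra = total.bit_length() - 1 - man_bits      # bits below the kept mantissa

    kept = total >> extra                          # hidden 1 plus mantissa bits
    guard = (total >> (extra - 1)) & 1
    rnd = (total >> (extra - 2)) & 1
    sticky = int((total & ((1 << (extra - 2)) - 1)) != 0)

    # Round up when above half, or exactly half with an odd kept value.
    if guard and (rnd or sticky or (kept & 1)):
        kept += 1
        if kept >> (man_bits + 1):                 # carry out, e.g. 1.111 -> 10.000
            kept >>= 1
            exponent += 1

    significand = f"1.{kept & ((1 << man_bits) - 1):0{man_bits}b}"
    return significand, exponent, guard, rnd, sticky


# 1.0110000 and 0.0011011, both written with 7 fraction bits
print(add_with_grs(0b10110000, 0b00011011, frac_bits=7))
```

For these operands the sketch reports significand 1.101, exponent 0, guard 1, round 0 and sticky 1.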

(c) Decode the following 8-bit IEEE number into its decimal value: 1 1010 101

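Ans c) Sign = 1 (negative), biased exponent = (1010) binary = 10 decimal, so the true exponent is 10 - 7 = 3. Mantissa = 101, so the significand is 1.101 (binary) = 1.625 decimal.

Value = -1.101 x 2^(3) = -1101.0 (binary) = -13.0 decimal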
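
Decoding for part (c) follows the same field rules as the 32-bit format: subtract the bias, restore the hidden 1, apply the sign. Here is a small sketch; `decode8` is an illustrative name, and the zero/subnormal and infinity/NaN branches follow the question's statement that special encodings match the larger IEEE formats.

```python
def decode8(bits):
    """Decode 'sign exponent mantissa' (1, 4 and 3 bits) into a Python float."""
    sign_bits, exp_bits, man_bits = bits.split()
    sign = -1.0 if sign_bits == "1" else 1.0
    exp = int(exp_bits, 2)
    man = int(man_bits, 2)

    if exp == 0b1111:                      # all-ones exponent: infinity or NaN
        return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:                           # zero or subnormal: no hidden 1
        return sign * (man / 8) * 2.0 ** (1 - 7)
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)


print(decode8("1 1010 101"))
```

The call prints -13.0, since 1.625 x 2^3 = 13.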

(d) Decide which number in each of the following pairs is greater in value (the numbers are in 8-bit IEEE 754 format): (1) 0 0100 100 and 0 0100 111 (2) 0 1100 100 and 1 1100 101

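Ans d) (1) Decoding the numbers 0 0100 100 and 0 0100 111: both are positive with exponent 4 - 7 = -3, so the mantissa decides. 0 0100 100 = 1.100 x 2^(-3) = 0.1875 and 0 0100 111 = 1.111 x 2^(-3) = 0.234375, so 0 0100 111 is greater.

(2) Decoding the numbers 0 1100 100 and 1 1100 101: both have exponent 12 - 7 = 5. 0 1100 100 = +1.100 x 2^(5) = 48 and 1 1100 101 = -1.101 x 2^(5) = -52, so 0 1100 100 is greater (it is positive and the other is negative).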
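
The comparisons in part (d) do not actually need a full decode. For finite, non-NaN values, the exponent and mantissa bits read as one unsigned integer already order the magnitudes, and the sign bit flips that order. The sketch below relies on that property; `order_key` is a hypothetical helper name.

```python
def order_key(bits):
    """Map 'sign exponent mantissa' to an integer that sorts like the value.

    Valid for finite, non-NaN encodings only.
    """
    sign_bits, exp_bits, man_bits = bits.split()
    magnitude = int(exp_bits + man_bits, 2)  # exponent and mantissa as one integer
    return -magnitude if sign_bits == "1" else magnitude


pairs = [
    ("0 0100 100", "0 0100 111"),
    ("0 1100 100", "1 1100 101"),
]
for pair in pairs:
    print(pair, "-> greater:", max(pair, key=order_key))
```

The loop reports 0 0100 111 as the greater number of the first pair and 0 1100 100 as the greater number of the second.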

(e) In the 32-bit IEEE format, what is the encoding for negative zero?

The sign bit is 1 and all other fields are zero:

1 00000000 00000000000000000000000 (23 mantissa zeros; 0x80000000 in hex)

(f) In the 32-bit IEEE format, what is the encoding for positive infinity?

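Sign bit = 0

The exponent field is all 1s and the mantissa field is all 0s (an all-1s exponent with a nonzero mantissa would be NaN, not infinity).

0 11111111 00000000000000000000000 (23 mantissa zeros; 0x7F800000 in hex)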
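
The 32-bit encodings in parts (e) and (f) can be confirmed on any machine by packing a Python float as an IEEE single and printing its bit fields. This sketch uses the standard `struct` module; `float32_fields` is an illustrative helper name.

```python
import struct


def float32_fields(x):
    """Return the 'sign exponent mantissa' bit fields of x as an IEEE 754 single."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an unsigned int
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(float32_fields(-0.0))          # negative zero
print(float32_fields(float("inf")))  # positive infinity
```

Negative zero shows a set sign bit with every other bit clear, and positive infinity shows a clear sign bit, an all-ones exponent, and an all-zeros mantissa.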