The floating-point format to be used in this problem is an 8-bit IEEE 754 normalized format

Question

The floating-point format to be used in this problem is an 8-bit IEEE 754 normalized format with 1 sign bit, 4 exponent bits, and 3 mantissa bits. It is identical to the 32-bit and 64-bit formats in terms of the meaning of fields and special encodings. The exponent field uses excess-7 coding. The bit fields in a number are (sign, exponent, mantissa). Assume unbiased round-to-nearest-even, as specified in the IEEE floating-point standard.

(a) Encode the following numbers in the 8-bit IEEE format: (1) 0.0011011 (binary) (2) 16.0 (decimal)

(b) Perform the computation 1.011 (binary) + 0.0011011 (binary), showing the correct state of the guard, round and sticky bits. There are three mantissa bits.

(c) Decode the following 8-bit IEEE number into its decimal value: 1 1010 101

(d) Decide which number in each of the following pairs is greater in value (the numbers are in 8-bit IEEE 754 format): (1) 0 0100 100 and 0 0100 111 (2) 0 1100 100 and 1 1100 101

(e) In the 32-bit IEEE format, what is the encoding for negative zero?

(f) In the 32-bit IEEE format, what is the encoding for positive infinity?

Explanation / Answer

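As specified, the format is an 8-bit floating-point number (1 bit --> sign, 4 bits --> biased exponent, 3 bits --> mantissa). The exponent is stored in excess-7, so stored exponent = true exponent + 7, and normalized values have an implicit leading 1 that is not stored.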

a) Encode the following numbers in the 8-bit IEEE format: (1) 0.0011011 (binary) (2) 16.0 (decimal)

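Ans a) (1) Expressing 0.0011011 (binary) in normalized scientific notation gives 1.1011 x 2^(-3) (exponent and base written in decimal). The biased exponent is -3 + 7 = 4 = 0100. The fraction 1011 has four bits, but only three fit: keeping 101 leaves a dropped 1 that is exactly half of the last place (a tie). Round-to-nearest-even breaks the tie toward an even last bit, so 101 rounds up to 110.

So, the encoded number is 0 0100 110 (value 1.110 x 2^(-3) = 0.21875 decimal).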
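A minimal Python sketch of these encoding steps, assuming a positive input in the normal range (no zero, subnormal, overflow, or NaN handling). The function name `encode8` and its space-separated string output are illustrative choices, not part of the problem. It uses exact `Fraction` arithmetic so that a tie is detected exactly; for 0.0011011 it returns 0 0100 110, because the dropped bit is exactly half of the last place and the kept last bit is odd.

```python
from fractions import Fraction

BIAS = 7        # excess-7 exponent, per the problem statement
FRAC_BITS = 3   # stored mantissa bits (the leading 1 is implicit)

def encode8(x: Fraction) -> str:
    """Encode a positive, normal-range value as 'S EEEE MMM'.

    Sketch only: assumes x > 0 and that the result stays in the normal range.
    """
    # Normalize: find e such that 1 <= x / 2**e < 2.
    e = 0
    while x >= 2:
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    # Scale so the 3 fraction bits land in the integer part: 1.fff -> 1fff.
    scaled = x * 2**FRAC_BITS
    q = scaled.numerator // scaled.denominator   # truncated significand, hidden 1 included
    rem = scaled - q                             # discarded part, 0 <= rem < 1 ulp
    # Round to nearest, ties to even.
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and q % 2 == 1):
        q += 1
    if q == 2**(FRAC_BITS + 1):                  # rounding carried out, e.g. 1.111 -> 10.000
        q //= 2
        e += 1
    frac = q - 2**FRAC_BITS                      # drop the hidden leading 1
    return f"0 {e + BIAS:04b} {frac:03b}"

print(encode8(Fraction(0b11011, 2**7)))  # 0.0011011 binary -> 0 0100 110
```

The same function returns 0 1011 000 for `Fraction(16)`.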

(2) 16 in decimal is 10000.0 in binary, i.e. 1.000 x 2^(4). The biased exponent is 4 + 7 = 11 = 1011, and the fraction bits are 000.

So, the encoded number is 0 1011 000
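For an exact power of two such as 16, the fields can be read off directly, since the fraction is 0 and no rounding occurs. A small sketch using Python's `math.frexp`, which returns x = m x 2^e with 0.5 <= m < 1; the helper name `fields_for_power_of_two` is hypothetical, and it only handles positive powers of two.

```python
import math

BIAS = 7  # excess-7, as in the problem

def fields_for_power_of_two(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) for a positive power of two.

    Sketch: the fraction field is always 0 here, so no rounding is needed.
    """
    m, e = math.frexp(x)       # x == m * 2**e with 0.5 <= m < 1
    assert m == 0.5, "x must be a power of two"
    unbiased = e - 1           # rewrite as 1.0 * 2**(e - 1)
    return 0, unbiased + BIAS, 0

s, exp_field, frac = fields_for_power_of_two(16.0)
print(s, format(exp_field, "04b"), format(frac, "03b"))  # 0 1011 000
```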

(b) Perform the computation 1.011 (binary) + 0.0011011 (binary), showing the correct state of the guard, round and sticky bits. There are three mantissa bits.

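Ans b) Align the second operand to the exponent of the first (2^0) by shifting it right. The bits that fall beyond the three mantissa bits are kept as guard, round and sticky bits rather than discarded:

0.0011011 --> 0.001 | G = 1, R = 0, S = 1

The guard bit is the first bit past the mantissa (1), the round bit is the second (0), and the sticky bit is the OR of all remaining bits (1 OR 1 = 1).

Add the aligned mantissas: 1.011 + 0.001 = 1.100, carrying G = 1, R = 0, S = 1 along. The sum is already normalized (1 <= 1.100 < 2), so no shift is needed.

Round: G = 1 and (R OR S) = 1, so the discarded part is more than half of the last place and the result rounds up: 1.100 --> 1.101.

Result: 1.101 x 2^0 = 1.625 decimal, encoded as 0 0111 101. (The exact sum is 1.1001011 binary = 1.5859375, and 1.625 is the nearest representable value.) If the second operand is first stored in the 8-bit format (1.110 x 2^(-3)), the aligned bits give G = 1, R = 1, S = 0 instead, and the rounded result is the same.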
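The alignment-and-rounding procedure can be simulated with integer fixed-point arithmetic. This is a sketch under stated assumptions: both operands are positive binary strings, the sum is at least 1 (so normalization only ever shifts right), and `to_fixed` / `grs_add` are hypothetical helper names. It extracts G, R and S from the bits below the kept mantissa and then applies round-to-nearest-even.

```python
def to_fixed(bits: str, width: int) -> int:
    """Parse a binary string such as '0.0011011' into an integer scaled by 2**width."""
    whole, _, frac = bits.partition(".")
    return int(whole + frac.ljust(width, "0"), 2)


def grs_add(a: str, b: str, frac_bits: int = 3):
    """Add two positive binary numbers and round the sum to `frac_bits`
    fraction bits using guard/round/sticky bits and ties-to-even.

    Assumes the sum is >= 1. Returns (significand, exponent, G, R, S).
    """
    # Work at a precision wide enough for every input bit plus G, R, S.
    width = max(frac_bits + 3, *(len(s.partition(".")[2]) for s in (a, b)))
    total = to_fixed(a, width) + to_fixed(b, width)   # exact sum, no bits lost

    shift, exp = width - frac_bits, 0                 # bits below the kept mantissa
    while total >> shift >= 1 << (frac_bits + 1):     # sum >= 2: renormalize to 1.xxx
        shift += 1
        exp += 1

    kept = total >> shift                             # hidden 1 + frac_bits bits
    tail = total & ((1 << shift) - 1)                 # everything shifted out
    g = (tail >> (shift - 1)) & 1                     # first discarded bit
    r = (tail >> (shift - 2)) & 1                     # second discarded bit
    s = int(tail & ((1 << (shift - 2)) - 1) != 0)     # OR of all remaining bits

    # Round to nearest even: up if above half, or exactly half with an odd LSB.
    if g and (r or s or kept & 1):
        kept += 1
        if kept == 1 << (frac_bits + 1):              # carry out, e.g. 1.111 -> 10.000
            kept >>= 1
            exp += 1

    digits = format(kept, "b")
    return f"{digits[0]}.{digits[1:]}", exp, g, r, s


print(grs_add("1.011", "0.0011011"))  # ('1.101', 0, 1, 0, 1)
```

Passing "0.00111" (the second operand after 8-bit encoding) as `b` gives G = 1, R = 1, S = 0 and the same rounded significand 1.101.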

(c) Decode the following 8-bit IEEE number into its decimal value: 1 1010 101

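Ans c) Sign = 1, biased exponent = (1010)binary = 10 (decimal), mantissa = 101. The true exponent is 10 - 7 = 3, so the value is -1.101 x 2^3 = -1101.0 (binary) = -13 (decimal).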
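A decoder sketch for the 8-bit format, assuming (as the problem states) that the special encodings mirror the 32-bit format: an all-1s exponent means infinity or NaN, and an all-0s exponent means zero or a subnormal with no hidden 1. The name `decode8` and the space-separated input string are illustrative.

```python
def decode8(bits: str) -> float:
    """Decode an 8-bit pattern 'S EEEE MMM' (excess-7 exponent, 3 fraction bits)."""
    s, e, m = bits.split()
    sign = -1.0 if s == "1" else 1.0
    exp, frac = int(e, 2), int(m, 2)
    if exp == 0b1111:                                  # all-1s exponent: inf or NaN
        return sign * float("inf") if frac == 0 else float("nan")
    if exp == 0:                                       # zero or subnormal: 0.mmm * 2**(1 - 7)
        return sign * (frac / 8) * 2.0 ** (1 - 7)
    return sign * (1 + frac / 8) * 2.0 ** (exp - 7)    # normal: 1.mmm * 2**(exp - 7)

print(decode8("1 1010 101"))  # -13.0
```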

(d) Decide which number in each of the following pairs is greater in value (the numbers are in 8-bit IEEE 754 format): (1) 0 0100 100 and 0 0100 111 (2) 0 1100 100 and 1 1100 101

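Ans d) (1) Decoding the numbers: both have sign 0 and exponent field 0100 (true exponent 4 - 7 = -3), so the one with the larger mantissa is greater. 0 0100 100 = 1.100 x 2^(-3) = 0.1875 and 0 0100 111 = 1.111 x 2^(-3) = 0.234375, so 0 0100 111 is greater.

(2) Decoding the numbers 0 1100 100 and 1 1100 101: both have exponent field 1100 (true exponent 12 - 7 = 5). 0 1100 100 = +1.100 x 2^5 = +48 and 1 1100 101 = -1.101 x 2^5 = -52. The first is positive and the second is negative, so 0 1100 100 is greater.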
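Because the format is sign-magnitude with the exponent field above the mantissa field, two patterns can also be compared without fully decoding them: for a fixed sign, a larger (exponent, mantissa) bit string means a larger magnitude. A sketch of that idea, with hypothetical helper names `order_key` and `greater`; NaN patterns are not handled.

```python
def order_key(bits: str) -> int:
    """Map an 8-bit 'S EEEE MMM' pattern to an integer that sorts like its value."""
    p = int(bits.replace(" ", ""), 2)
    magnitude = p & 0x7F                 # drop the sign bit
    return -magnitude if p & 0x80 else magnitude   # negatives sort below positives

def greater(a: str, b: str) -> str:
    """Return the larger of two patterns (returns b when they are equal in value)."""
    return a if order_key(a) > order_key(b) else b

print(greater("0 0100 100", "0 0100 111"))  # 0 0100 111
print(greater("0 1100 100", "1 1100 101"))  # 0 1100 100
```

With this key, negative zero and positive zero both map to 0, matching the fact that they compare equal.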

(e) In the 32-bit IEEE format, what is the encoding for negative zero?

Sign bit 1, exponent and mantissa fields all 0:

1 00000000 00000000000000000000000 (23 mantissa zeros; 0x80000000 in hex)
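Python's standard `struct` module can confirm this: packing with the `>f` format writes the big-endian IEEE 754 binary32 bytes. The helper `f32_bits` is a hypothetical name used only in this sketch.

```python
import struct

def f32_bits(x: float) -> str:
    """Return the binary32 encoding of x as 'sign exponent mantissa'."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as uint32
    b = format(n, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(f32_bits(-0.0))                  # 1 00000000 00000000000000000000000
print(struct.pack(">f", -0.0).hex())   # 80000000
print(-0.0 == 0.0)                     # True: the two zeros compare equal
```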

(f) In the 32-bit IEEE format, what is the encoding for positive infinity?

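Sign bit = 0

Exponent field all 1s, mantissa field all 0s. (An all-1s exponent with a nonzero mantissa, such as all 1s, encodes NaN rather than infinity.)

0 11111111 00000000000000000000000 (0x7F800000 in hex)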
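The same `struct` approach shows the positive-infinity pattern and, for contrast, what an all-1s mantissa decodes to. `f32_from_bits` is a hypothetical helper; the sketch assumes Python floats follow IEEE 754, as they do on common platforms.

```python
import math
import struct

def f32_from_bits(n: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 binary32 value."""
    return struct.unpack(">f", struct.pack(">I", n))[0]

print(struct.pack(">f", math.inf).hex())       # 7f800000
print(f32_from_bits(0x7F800000))               # inf
print(math.isnan(f32_from_bits(0x7FFFFFFF)))   # True: all-1s mantissa is NaN, not infinity
```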
