Slide 10.2: Floating-point representation

Floating-Point Representation

The size of a floating-point number is usually a multiple of the word size. A MIPS single-precision floating-point number occupies one 32-bit word with three fields: a 1-bit sign, where 1 means negative; an 8-bit exponent field, which also encodes the sign of the exponent; and a 23-bit fraction field. For example, the bit string with sign 0, exponent field 01111100 (124), and fraction field 0100...0 (2^-2) represents the following number, which will be explained later:

   0.15625 = (1.0 + 2^-2) × 2^(124-127) = 1.25 × 2^-3
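
The decoding above can be checked with a short Python sketch. It assumes the format is IEEE 754 single precision, with the bias of 127 that appears in the equation; the function names `decode_single` and `value_from_fields` are illustrative and not part of the slide.

```python
import struct

def decode_single(x):
    """Split a number into its single-precision sign, exponent, and fraction fields."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with an assumed bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

def value_from_fields(sign, exponent, fraction):
    """Rebuild a normalized value: (-1)^S x (1 + fraction) x 2^(exponent - 127)."""
    significand = 1.0 + fraction / 2**23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

sign, exponent, fraction = decode_single(0.15625)
print(sign, exponent, fraction / 2**23)             # 0 124 0.25
print(value_from_fields(sign, exponent, fraction))  # 0.15625
```

The sketch handles normalized numbers only; zero, denormalized numbers, infinities, and NaN use reserved exponent values and are not covered here.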

In general, floating-point numbers are of the form (-1)^S × F × 2^E, where F involves the value in the fraction field and E involves the value in the exponent field. Two exceptional situations can arise in floating-point arithmetic:

Overflow: A situation in which a positive exponent becomes too large to fit in the exponent field, and

Underflow: A situation in which a negative exponent becomes too large in magnitude to fit in the exponent field, so the nonzero value is too small to represent.
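
Both situations can be observed by forcing values into single precision. The following is a minimal sketch, assuming CPython's `struct` module, which raises `OverflowError` when a value is too large for the `f` format and rounds values that are too small to 0.0; real hardware may instead produce an infinity or raise an exception. The helper name `to_single` is illustrative.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to single precision and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Overflow: 1e39 needs a positive exponent larger than the 8-bit field can hold.
try:
    to_single(1e39)
    overflowed = False
except OverflowError:
    overflowed = True

# Underflow: 1e-50 needs a negative exponent too large in magnitude,
# so the stored single-precision result collapses to 0.0.
underflowed = to_single(1e-50) == 0.0

print(overflowed, underflowed)  # True True
```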

One way to reduce the chance of underflow or overflow is to offer another format with a larger exponent field. This format is called double precision, whereas the format above is called single precision. A double-precision floating-point number takes two MIPS words (64 bits): a 1-bit sign, an 11-bit exponent field, and a 52-bit fraction field.
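
The double-precision layout can be inspected the same way. This sketch assumes IEEE 754 double precision, whose exponent bias of 1023 is not stated on the slide; the function name `decode_double` is illustrative.

```python
import struct

def decode_double(x):
    """Split a number into its double-precision sign, exponent, and fraction fields."""
    # Pack as a big-endian 64-bit float, then reread the bytes as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, assumed bias of 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits
    return sign, exponent, fraction

sign, exponent, fraction = decode_double(0.15625)
print(sign, exponent - 1023, 1 + fraction / 2**52)  # 0 -3 1.25
```

The same value, 1.25 × 2^-3, is recovered, but the 11-bit exponent field covers a much wider range of exponents than the 8-bit field of single precision.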
