Floating-Point Addition
$1.111_2 \times 2^{-1} + 1.011_2 \times 2^{-3} \approx 1.001_2 \times 2^{0}$
The following steps show how to add the above numbers in scientific notation.
For simplicity, we assume 4 bits of precision (or 3 bits of fraction).
- Step 1. Making Exponents Equal
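- We cannot add the significands directly because their exponents are not equal.
To make the exponents equal, shift the significand of the number with the smaller exponent right, incrementing its exponent by 1 for each bit shifted, until its exponent matches that of the larger number:
$1.011_2 \times 2^{-3} = 0.1011_2 \times 2^{-2} = 0.01011_2 \times 2^{-1}$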
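The alignment step can be sketched in Python as follows. This is a minimal illustration, not a hardware-accurate model: the function name `align_exponents` and the `(significand, exponent)` pair representation are assumptions made for this example, and `fractions.Fraction` is used so that no bits are lost when shifting right.

```python
from fractions import Fraction

def align_exponents(a, b):
    """Return both (significand, exponent) pairs rewritten with the larger exponent."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    target = max(exp_a, exp_b)
    # Each 1-bit right shift halves the significand and adds 1 to the exponent.
    sig_a = sig_a / 2 ** (target - exp_a)
    sig_b = sig_b / 2 ** (target - exp_b)
    return (sig_a, target), (sig_b, target)

x = (Fraction(0b1111, 8), -1)  # 1.111 (binary) x 2^-1
y = (Fraction(0b1011, 8), -3)  # 1.011 (binary) x 2^-3
aligned_x, aligned_y = align_exponents(x, y)
print(aligned_x, aligned_y)
```

Here `aligned_y` has significand 11/32, which is 0.01011 in binary, matching the shifted value above.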
- Step 2. Adding the Significands
- Add the significands as shown in the column addition below.
The result of the addition is as follows:
$1.111_2 \times 2^{-1} + 0.01011_2 \times 2^{-1} = 10.00111_2 \times 2^{-1}$
  1.111
+ 0.01011
---------
 10.00111
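Once the exponents match, adding the significands is ordinary integer addition on the bit patterns. The sketch below assumes both significands are stored as integers with 5 fraction bits; the names `FRAC_BITS` and `to_binary_string` are hypothetical helpers introduced for this example.

```python
FRAC_BITS = 5  # assumed width: enough fraction bits to hold 0.01011 exactly

def to_binary_string(value, frac_bits=FRAC_BITS):
    """Format a non-negative fixed-point integer as a binary string with a binary point."""
    bits = format(value, "b").zfill(frac_bits + 1)
    return bits[:-frac_bits] + "." + bits[-frac_bits:]

a = 0b1_11100  # 1.11100 in binary
b = 0b0_01011  # 0.01011 in binary
total = a + b  # plain integer addition of the aligned bit patterns

print(to_binary_string(a))
print(to_binary_string(b))
print(to_binary_string(total))
```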
- Step 3. Normalizing the Sum
- The result $10.00111_2 \times 2^{-1}$ needs to be normalized as follows:
$10.00111_2 \times 2^{-1} = 1.000111_2 \times 2^{0}$
Shifting the significand right by 1 bit must be accompanied by incrementing the exponent by 1.
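Normalization can be sketched as a loop that shifts until the significand lies in the range [1, 2). The function `normalize` below is a hypothetical helper that assumes a non-negative significand held as a `Fraction`. It also handles sums smaller than 1 (shifting left), a case that does not arise in this example.

```python
from fractions import Fraction

def normalize(sig, exp):
    """Shift a non-negative significand into [1, 2), adjusting the exponent to compensate."""
    if sig == 0:
        return sig, exp  # zero has no normalized form
    while sig >= 2:      # too large: shift right, increment exponent
        sig /= 2
        exp += 1
    while sig < 1:       # too small: shift left, decrement exponent
        sig *= 2
        exp -= 1
    return sig, exp

# 10.00111 (binary) x 2^-1 is 71/32 x 2^-1
norm_sig, norm_exp = normalize(Fraction(71, 32), -1)
print(norm_sig, norm_exp)
```

The result is 71/64 with exponent 0, and 71/64 is 1.000111 in binary.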
- Step 4. Rounding the Significand
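- Round the significand to fit in the available number of bits.
We assumed 4 bits of precision (3 bits of fraction), so the bits beyond the third fraction bit must be rounded off.
The discarded bits ($111_2$) amount to more than half of the last kept place, so rounding to the nearest representable value rounds the significand up:
$1.000111_2 \times 2^{0} \approx 1.001_2 \times 2^{0}$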
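A minimal sketch of the rounding step is shown below. The helper `round_significand` is hypothetical. The text only says "nearest", so the tie-breaking rule is an assumption: Python's `round` on a `Fraction` rounds ties to the even neighbor.

```python
from fractions import Fraction

def round_significand(sig, frac_bits=3):
    """Round a significand to the nearest multiple of 2**-frac_bits (ties to even)."""
    scale = 2 ** frac_bits
    # Scale so the kept bits form an integer, round, then scale back.
    return Fraction(round(sig * scale), scale)

# 1.000111 (binary) is 71/64
rounded = round_significand(Fraction(71, 64))
print(rounded)
```

The result is 9/8, which is 1.001 in binary. Note that rounding up can carry all the way into the integer part (for example, 1.1111 rounds to 10.000 with 3 fraction bits), in which case the sum has to be normalized again.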
- Step 5. Checking for Overflow or Underflow
- Check whether the exponent has become too large (overflow) or too small (underflow) to be represented.
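The final check can be sketched as a simple range test on the exponent. The text does not specify an exponent range, so the limits below are an assumption borrowed from IEEE 754 single precision (normalized exponents from -126 to 127); `classify_exponent` is a hypothetical helper name.

```python
E_MIN, E_MAX = -126, 127  # assumed range: IEEE 754 single-precision normalized exponents

def classify_exponent(exp, e_min=E_MIN, e_max=E_MAX):
    """Report whether an exponent is representable in the assumed range."""
    if exp > e_max:
        return "overflow"
    if exp < e_min:
        return "underflow"
    return "ok"

print(classify_exponent(0))     # exponent of the example result
print(classify_exponent(128))
print(classify_exponent(-127))
```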