How is a floating-point number stored?
The way that the exponent is stored is not as straightforward as you might think. The reason for this is to make more efficient use of the 8 bits we have available for storing the exponent.
Thus, we can represent numbers with very large positive exponents or very small negative exponents. Next, we have to subtract the bias, which is 127 for an 8-bit exponent. After subtracting the bias, our exponent is 1.
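The bias subtraction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name unbiased_exponent and the sample values 3.0 and 0.25 are made up for this example and are not the numbers used in the walkthrough; the bias of 127 applies to the 8-bit exponent of a 32-bit float.

```python
import struct

EXPONENT_BIAS = 127  # bias for the 8-bit exponent of a 32-bit float


def unbiased_exponent(value: float) -> int:
    """Return the true exponent of a normal 32-bit float (hypothetical helper)."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    stored = (bits >> 23) & 0xFF  # the 8 stored exponent bits
    return stored - EXPONENT_BIAS


print(unbiased_exponent(3.0))   # 1, since 3.0 = 1.5 * 2**1
print(unbiased_exponent(0.25))  # -2, since 0.25 = 1.0 * 2**-2
```

Storing the exponent with a bias means the field itself is always an unsigned number, so no separate sign bit is needed for the exponent.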
This is also called de-normalizing the number. As we said before, we have a mantissa of 1. If we write this out in scientific notation using a base of 2, the mantissa is multiplied by 2^1, using the exponent which we figured out in step 3.
We can do this by moving the binary point over to the right by one, which will also subtract one from our exponent. Thus, we end up with an exponent of 0. This is the fun part. Just as in decimal we have increasing powers of ten to the left of the decimal point and decreasing powers of ten to the right of it, in binary we have increasing powers of two to the left of the binary point and decreasing powers of two to the right of it. If our sign bit is 0, then our number is positive.
If the sign bit is 1, then our number is negative. A mathematical way to say the exact same thing is that the value is multiplied by (-1)^sign. Instead, consider what happens when you have an exponent of all 1s.
This actually represents not-a-number (NaN). Also note that the mantissa must be non-zero (at least one bit in the mantissa must be set to 1). If we have an exponent filled with 1s but the mantissa is zero, then we have a representation of infinity. More specifically, positive infinity has a sign bit of 0 and negative infinity has a sign bit of 1. But why? For normal 32-bit floating-point values, this corresponds to values in the range from roughly 1.18 × 10^-38 to 3.40 × 10^38.
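The rules for an exponent of all 1s can be checked with a small sketch. The function names classify and float_to_bits are hypothetical, and the sketch assumes the 32-bit layout of 1 sign bit, 8 exponent bits, and 23 stored mantissa bits.

```python
import struct


def float_to_bits(value: float) -> int:
    """Return the 32-bit pattern of value stored as a single-precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits


def classify(bits: int) -> str:
    """Classify a 32-bit pattern as NaN, an infinity, or a finite value."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:       # exponent field is all 1s
        if mantissa != 0:      # non-zero mantissa means NaN
            return "nan"
        return "-inf" if sign else "+inf"
    return "finite"


print(classify(0x7F800000))                   # +inf
print(classify(0xFF800000))                   # -inf
print(classify(0x7FC00000))                   # nan
print(classify(float_to_bits(float("nan"))))  # nan
```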
When it comes to the representation, you can see all normal floating-point numbers as a value in the range 1.0 to (almost) 2.0, scaled by a power of two. In layman's terms, it's essentially scientific notation in binary. The formal standard with the details is IEEE 754. Here is an example of how memory is laid out if the compiler uses IEEE 754 double precision, which is the default for a C double on little-endian systems (e.g. Intel x86). It is shown in C-based binary form; it is better to read the Wikipedia article about double precision to understand it.
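As a rough illustration of that layout, the sketch below prints the bytes of a double in little-endian order and splits the 64 bits into the standard fields (1 sign bit, 11 exponent bits, 52 fraction bits). It uses Python rather than C for brevity; the function names and the sample value 1.0 are assumptions made for this example.

```python
import struct


def double_bytes_little_endian(value: float) -> str:
    """Return the 8 bytes of a double, lowest memory address first."""
    raw = struct.pack("<d", value)
    return " ".join(f"{b:02x}" for b in raw)


def double_fields(value: float) -> tuple:
    """Split a double into (sign, stored exponent, fraction) fields."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF    # 11 exponent bits, bias 1023
    fraction = bits & ((1 << 52) - 1)  # 52 stored fraction bits
    return sign, exponent, fraction


print(double_bytes_little_endian(1.0))  # 00 00 00 00 00 00 f0 3f
print(double_fields(1.0))               # (0, 1023, 0)
```

On a little-endian machine the byte holding the sign and the top exponent bits comes last in memory, which is why 3f appears at the end.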
There are a number of different floating-point formats. Most of them share a few common characteristics: a sign bit, some bits dedicated to storing an exponent, and some bits dedicated to storing the significand (also called the mantissa). The IEEE 754 floating-point standard attempts to define a single format (or rather a set of formats of a few sizes) that can be implemented on a variety of systems.
It also defines the available operations and their semantics. It's caught on quite well, and most systems you're likely to encounter probably use IEEE floating-point. But other formats are still in use, as well as not-quite-complete IEEE implementations.
I suggest you try the Wikipedia version of the explanation. It's quite clear and has various examples. The exponent represents how many shifts are to be performed on the mantissa in order to get the actual value of the number. The encoding specifies how the sign of the mantissa and the sign of the exponent are represented (basically, whether the shift is to the left or to the right).
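The idea of the exponent as a shift count can be sketched with the standard library function math.ldexp, which computes mantissa * 2**exponent. The wrapper name shift_mantissa and the sample values are assumptions for illustration.

```python
import math


def shift_mantissa(mantissa: float, exponent: int) -> float:
    """Scale mantissa by 2**exponent: a positive exponent shifts left,
    a negative exponent shifts right."""
    return math.ldexp(mantissa, exponent)


print(shift_mantissa(1.5, 1))   # 3.0  (binary 1.1 shifted left once is 11)
print(shift_mantissa(1.5, -1))  # 0.75 (binary 1.1 shifted right once is 0.11)

# math.frexp goes the other way, but note that it normalizes the
# mantissa to the range [0.5, 1) rather than [1, 2).
print(math.frexp(3.0))          # (0.75, 2)
```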
How are floating point numbers stored in memory?
How does this encoding work? Significand: the remaining bits are used for the significand (AKA mantissa). We are done with the basics. Let's understand this practically, using the very famous float value pi (3.14...). Sign: zero here, as pi is positive. Exponent calculation: the integer part, 3, is easy to write in binary (11). The rest, the fractional part, is converted to binary separately. If you don't know how to convert a decimal number to binary, refer to a float-to-binary conversion guide. Then add the 3 back in front of the binary point. The format used follows the IEEE 754 standard.
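Converting a fractional part to binary is usually done by repeated doubling: multiply by 2, record the integer part as the next bit, and continue with what is left. The sketch below assumes that method; the function name fraction_to_binary and the sample inputs are illustrative and are not the values from the answer above.

```python
def fraction_to_binary(fraction: float, digits: int = 8) -> str:
    """Return the first `digits` binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(digits):
        fraction *= 2          # shift the next binary digit in front of the point
        if fraction >= 1:
            bits.append("1")
            fraction -= 1      # drop the digit that was just recorded
        else:
            bits.append("0")
    return "".join(bits)


print(fraction_to_binary(0.625, 4))  # 1010, i.e. 0.625 = 1/2 + 1/8
print(fraction_to_binary(0.1, 8))    # 00011001, 0.1 never terminates in binary
```

Many decimal fractions, such as 0.1, have no finite binary expansion, so the stored mantissa is only the nearest value that fits.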
A floating-point number is expressed as the product of two parts: the mantissa and a power of two, that is, mantissa × 2^exponent. The power of two is represented by the exponent. The stored form of the exponent is an 8-bit value from 0 to 255. The mantissa is a 24-bit value (representing about seven decimal digits) whose most significant bit (MSB) is always 1 and is, therefore, not stored.
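Putting the pieces together, here is a minimal sketch that splits a 32-bit float into its stored fields and rebuilds the value from them. It assumes the IEEE 754 single-precision layout (sign in the top bit, then 8 exponent bits with a bias of 127, then 23 stored mantissa bits) and handles normal numbers only; the function names and the sample value -6.5 are made up for this example.

```python
import struct


def decode_float32(value: float) -> tuple:
    """Return (sign, stored exponent, stored mantissa) of a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF
    stored_mantissa = bits & 0x7FFFFF
    return sign, stored_exponent, stored_mantissa


def rebuild_normal(sign: int, stored_exponent: int, stored_mantissa: int) -> float:
    """Rebuild a normal value; the leading 1 of the mantissa is implicit."""
    mantissa = 1 + stored_mantissa / 2**23  # restore the hidden MSB
    return (-1) ** sign * mantissa * 2.0 ** (stored_exponent - 127)


fields = decode_float32(-6.5)
print(fields)                   # (1, 129, 5242880)
print(rebuild_normal(*fields))  # -6.5
```

Here -6.5 is -1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the stored mantissa holds only the .625 part.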
There is also a sign bit that indicates whether the floating-point number is positive or negative. Using the above format, the floating-point number