Difference between revisions of "Floating point"

Latest revision as of 16:34, 17 March 2024

Floating point is a term used to describe computer support for real numbers; originally performed in software, it is now almost always done in hardware, often in a dedicated floating point processor.

Most implementations have fixed precision, and thus may have to truncate or round some results. Irrational numbers can be handled as well, but only as approximations, so the user must be prepared to accept a slight loss of accuracy.
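The rounding described above is easy to observe. The following is a minimal sketch, assuming Python's built-in float is an IEEE double-precision number (true on essentially all current platforms); the function name roundoff_examples is made up for illustration.

```python
import math

def roundoff_examples():
    """Return two results that are slightly off because of rounding."""
    # 0.1 and 0.2 have no exact binary representation, so their sum
    # is rounded and does not compare equal to 0.3.
    decimal_sum = 0.1 + 0.2
    # sqrt(2) is irrational, so only an approximation is stored;
    # squaring it does not give back exactly 2.
    root_squared = math.sqrt(2.0) ** 2
    return decimal_sum, root_squared

decimal_sum, root_squared = roundoff_examples()
print(decimal_sum == 0.3)               # exact comparison fails
print(math.isclose(decimal_sum, 0.3))   # comparison with a tolerance succeeds
print(root_squared)
```

For this reason, floating point results are usually compared with a tolerance (here math.isclose) rather than with ==.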

Most hardware implementations of floating point use three fields: a sign bit, an exponent, and a mantissa (fractional part); i.e. floating point numbers are represented in the form ±.X * 2^Y, where Y can be negative or positive.
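As an illustration of the .X * 2^Y form, the sketch below splits a number into a sign, a fraction and an exponent. It uses math.frexp from the Python standard library, which returns a fraction of at least 1/2 and less than 1 for any non-zero finite input; the helper name decompose is made up for this example.

```python
import math

def decompose(x: float):
    """Split x into (sign, fraction, exponent) so that
    x == sign * fraction * 2**exponent, with 0.5 <= fraction < 1
    for any non-zero finite x."""
    # copysign also gives the right sign for negative zero.
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    fraction, exponent = math.frexp(abs(x))
    return sign, fraction, exponent

sign, fraction, exponent = decompose(-6.5)
print(sign, fraction, exponent)          # -1 0.8125 3
print(sign * fraction * 2 ** exponent)   # -6.5
```

Here 6.5 is 0.8125 * 2^3, i.e. binary .1101 shifted left by three places.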

Historically, different architectures devised their own floating point specifications (e.g. FP11 floating point), but most current hardware follows the IEEE 754 standard for floating point. It specifies, among others, the following binary formats for storing floating point numbers (each with one sign bit, and exponent and mantissa bits as given):

Single-precision: 8+23
Double-precision: 11+52
Quadruple-precision: 15+112
Octuple-precision: 19+236

In all four, there is actually one more bit of mantissa than shown, because numbers are usually stored in normalized form (i.e. the mantissa is constrained to be at least 1/2 and less than 1, without any leading 0's to the right of the 'point'), and so the leading bit is always a '1', which is not stored.
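The field layout and the unstored leading '1' can be inspected directly. The sketch below unpacks a double-precision number into its 1+11+52 bits using the standard struct module; the helper name double_fields is made up for this example. Note two details not covered above: IEEE 754 stores the exponent with a bias (1023 for double precision), and it counts the hidden '1' as sitting to the left of the point, so its exponent is one less than Y in the .X * 2^Y form.

```python
import struct

def double_fields(x: float):
    """Return (sign, stored_exponent, stored_mantissa) of an IEEE double.

    stored_exponent is the raw 11-bit field (biased by 1023);
    stored_mantissa is the raw 52-bit field, without the hidden bit."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    stored_exponent = (bits >> 52) & 0x7FF
    stored_mantissa = bits & ((1 << 52) - 1)
    return sign, stored_exponent, stored_mantissa

# 1.0 is stored with an all-zero mantissa field: its only
# significant bit is the leading '1', which is not stored.
print(double_fields(1.0))    # (0, 1023, 0)

# -6.5 is binary -110.1; the stored mantissa holds only the bits
# after the leading '1', i.e. 101 followed by zeros.
sign, exp, man = double_fields(-6.5)
print(sign, exp, hex(man))   # 1 1025 0xa000000000000
```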

External links

Floating-Point Formats - good overview

@@ Line 12: / Line 12: @@
 * Octuple-precision: 19+236
-In all four, there is actually one more bit of mantissa than shown, because they are usually stored in 'normalized' form (i.e. the mantissa is constrained to be between 1/2 and 1, without any leading 0's to the right of the 'point'), and so there is always a '1' there, which is not stored.
+In all four, there is actually one more bit of mantissa than shown, because they are usually stored in [[normalization|normalized]] form (i.e. the mantissa is constrained to be between 1/2 and 1, without any leading 0's to the right of the 'point'), and so there is always a '1' there, which is not stored.
+==External links==
+* [http://employees.oneonta.edu/zhangs/csci201/general%20Floating%20Point%20Format.htm Floating-Point Formats] - good overview
+[[Category: Theory]]
