'''Floating point''' is a term used to describe computer support for real numbers; floating point arithmetic was originally performed in [[software]], but it is now invariably done in [[hardware]], often in a special [[floating point processor]].

Most implementations have fixed precision, and thus may truncate or round some results. It is possible to use floating point for irrational numbers as well, provided the user is prepared to accept a slight loss of accuracy, since no irrational number can be stored exactly in a finite number of bits.
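As a quick C illustration of that rounding (assuming the usual IEEE double-precision format): neither 0.1 nor 0.2 has an exact binary representation, so their rounded sum differs slightly from the rounded 0.3.

<pre>
#include <stdio.h>

int main(void)
{
    double a = 0.1, b = 0.2;

    /* Both constants are rounded when stored, so the sum picks up
       a tiny error in the last bit of the mantissa. */
    printf("0.1 + 0.2 = %.17g\n", a + b);   /* prints 0.30000000000000004 */
    printf("equal to 0.3? %s\n", (a + b == 0.3) ? "yes" : "no");
    return 0;
}
</pre>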
  
 
Most hardware implementations of floating point use three fields - a sign bit, an exponent, and a mantissa (fractional part); i.e. floating point numbers are represented in the form ±.X * 2^Y, where X is the mantissa and Y is the exponent, which can be negative or positive.
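As a rough sketch of how this looks in practice, here is a C fragment (assuming the IEEE single-precision layout described below) that pulls the three fields out of a 32-bit float:

<pre>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = -6.25f;                 /* -1.1001 (binary) * 2^2 */
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* raw bit pattern of the float */

    unsigned sign     = bits >> 31;           /* 1 bit            */
    unsigned exponent = (bits >> 23) & 0xFF;  /* 8 bits, bias 127 */
    unsigned mantissa = bits & 0x7FFFFF;      /* 23 bits          */

    printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
           sign, exponent, (int)exponent - 127, mantissa);
    return 0;
}
</pre>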
Historically, different architectures devised their own floating point specifications (e.g. [[FP11 floating point]]), but there are now IEEE standards for floating point. The following formats (each with one sign bit, and exponent and mantissa bits as given) for storing floating point numbers exist:

* Single-precision: 8+23
* Double-precision: 11+52
* Quadruple-precision: 15+112
* Octuple-precision: 19+236
In all four, there is actually one more bit of mantissa than shown, because numbers are normally stored in [[normalization|normalized]] form: the mantissa is shifted so that its leading bit is a '1' (in the IEEE formats, a value between 1 and 2; in some older formats, between 1/2 and 1), and since that leading bit is always a '1', it does not have to be stored.
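A short C sketch of the hidden bit (again assuming IEEE single precision, and ignoring the special cases of zeros, denormals, infinities, and NaNs): restoring the implicit '1' as a 24th mantissa bit lets the original value be rebuilt exactly.

<pre>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void)
{
    float f = 6.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    int      exponent = (int)((bits >> 23) & 0xFF) - 127;  /* remove bias      */
    uint32_t stored   = bits & 0x7FFFFF;                   /* 23 stored bits   */
    uint32_t mantissa = stored | (1u << 23);               /* restore hidden 1 */

    /* value = mantissa * 2^(exponent - 23): the mantissa now has 24
       bits, with the binary point just after the restored '1'. */
    printf("reconstructed: %g\n", ldexp((double)mantissa, exponent - 23));
    return 0;
}
</pre>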
==External links==

* [http://employees.oneonta.edu/zhangs/csci201/general%20Floating%20Point%20Format.htm Floating-Point Formats] - good overview

[[Category: Theory]]
