/*
* Copyright (c) 2003, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation. Oracle designates this
* particular file as subject to the "Classpath" exception as provided
* by Oracle in the LICENSE file that accompanied this code.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*/
package sun.misc;
import sun.misc.FloatConsts;
import sun.misc.DoubleConsts;
The class FpUtils
contains static utility methods for
manipulating and inspecting float
and
double
floating-point numbers. These methods include
functionality recommended or required by the IEEE 754
floating-point standard.
Author: Joseph D. Darcy
/**
* The class <code>FpUtils</code> contains static utility methods for
* manipulating and inspecting <code>float</code> and
* <code>double</code> floating-point numbers. These methods include
* functionality recommended or required by the IEEE 754
* floating-point standard.
*
* @author Joseph D. Darcy
*/
public class FpUtils {
/*
* The methods in this class are reasonably implemented using
* direct or indirect bit-level manipulation of floating-point
* values. However, having access to the IEEE 754 recommended
* functions would obviate the need for most programmers to engage
* in floating-point bit-twiddling.
*
* An IEEE 754 number has three fields, from most significant bit
* to to least significant, sign, exponent, and significand.
*
* msb lsb
* [sign|exponent| fractional_significand]
*
* Using some encoding cleverness, explained below, the high order
* bit of the logical significand does not need to be explicitly
* stored, thus "fractional_significand" instead of simply
* "significand" in the figure above.
*
* For finite normal numbers, the numerical value encoded is
*
* (-1)^sign * 2^(exponent)*(1.fractional_significand)
*
* Most finite floating-point numbers are normalized; the exponent
* value is reduced until the leading significand bit is 1.
* Therefore, the leading 1 is redundant and is not explicitly
* stored. If a numerical value is so small it cannot be
* normalized, it has a subnormal representation. Subnormal
* numbers don't have a leading 1 in their significand; subnormals
* are encoding using a special exponent value. In other words,
* the high-order bit of the logical significand can be elided in
* from the representation in either case since the bit's value is
* implicit from the exponent value.
*
* The exponent field uses a biased representation; if the bits of
* the exponent are interpreted as a unsigned integer E, the
* exponent represented is E - E_bias where E_bias depends on the
* floating-point format. E can range between E_min and E_max,
* constants which depend on the floating-point format. E_min and
* E_max are -126 and +127 for float, -1022 and +1023 for double.
*
* The 32-bit float format has 1 sign bit, 8 exponent bits, and 23
* bits for the significand (which is logically 24 bits wide
* because of the implicit bit). The 64-bit double format has 1
* sign bit, 11 exponent bits, and 52 bits for the significand
* (logically 53 bits).
*
* Subnormal numbers and zero have the special exponent value
* E_min -1; the numerical value represented by a subnormal is:
*
* (-1)^sign * 2^(E_min)*(0.fractional_significand)
*
* Zero is represented by all zero bits in the exponent and all
* zero bits in the significand; zero can have either sign.
*
* Infinity and NaN are encoded using the exponent value E_max +
* 1. Signed infinities have all significand bits zero; NaNs have
* at least one non-zero significand bit.
*
* The details of IEEE 754 floating-point encoding will be used in
* the methods below without further comment. For further
* exposition on IEEE 754 numbers, see "IEEE Standard for Binary
* Floating-Point Arithmetic" ANSI/IEEE Std 754-1985 or William
* Kahan's "Lecture Notes on the Status of IEEE Standard 754 for
* Binary Floating-Point Arithmetic",
* http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps.
*
* Many of this class's methods are members of the set of IEEE 754
* recommended functions or similar functions recommended or
* required by IEEE 754R. Discussion of various implementation
* techniques for these functions have occurred in:
*
* W.J. Cody and Jerome T. Coonen, "Algorithm 772 Functions to
* Support the IEEE Standard for Binary Floating-Point
* Arithmetic," ACM Transactions on Mathematical Software,
* vol. 19, no. 4, December 1993, pp. 443-451.
*
* Joseph D. Darcy, "Writing robust IEEE recommended functions in
* ``100% Pure Java''(TM)," University of California, Berkeley
* technical report UCB//CSD-98-1009.
*/
Don't let anyone instantiate this class.
/**
* Don't let anyone instantiate this class.
*/
private FpUtils() {}
// Constants used in scalb
static double twoToTheDoubleScaleUp = powerOfTwoD(512);
static double twoToTheDoubleScaleDown = powerOfTwoD(-512);
// Helper Methods
// The following helper methods are used in the implementation of
// the public recommended functions; they generally omit certain
// tests for exception cases.
Returns unbiased exponent of a double
.
/**
* Returns unbiased exponent of a <code>double</code>.
*/
public static int getExponent(double d){
/*
* Bitwise convert d to long, mask out exponent bits, shift
* to the right and then subtract out double's bias adjust to
* get true exponent value.
*/
return (int)(((Double.doubleToRawLongBits(d) & DoubleConsts.EXP_BIT_MASK) >>
(DoubleConsts.SIGNIFICAND_WIDTH - 1)) - DoubleConsts.EXP_BIAS);
}
Returns unbiased exponent of a float
.
/**
* Returns unbiased exponent of a <code>float</code>.
*/
public static int getExponent(float f){
/*
* Bitwise convert f to integer, mask out exponent bits, shift
* to the right and then subtract out float's bias adjust to
* get true exponent value
*/
return ((Float.floatToRawIntBits(f) & FloatConsts.EXP_BIT_MASK) >>
(FloatConsts.SIGNIFICAND_WIDTH - 1)) - FloatConsts.EXP_BIAS;
}
Returns a floating-point power of two in the normal range.
/**
* Returns a floating-point power of two in the normal range.
*/
static double powerOfTwoD(int n) {
assert(n >= DoubleConsts.MIN_EXPONENT && n <= DoubleConsts.MAX_EXPONENT);
return Double.longBitsToDouble((((long)n + (long)DoubleConsts.EXP_BIAS) <<
(DoubleConsts.SIGNIFICAND_WIDTH-1))
& DoubleConsts.EXP_BIT_MASK);
}
Returns a floating-point power of two in the normal range.
/**
* Returns a floating-point power of two in the normal range.
*/
static float powerOfTwoF(int n) {
assert(n >= FloatConsts.MIN_EXPONENT && n <= FloatConsts.MAX_EXPONENT);
return Float.intBitsToFloat(((n + FloatConsts.EXP_BIAS) <<
(FloatConsts.SIGNIFICAND_WIDTH-1))
& FloatConsts.EXP_BIT_MASK);
}
Returns the first floating-point argument with the sign of the second floating-point argument. Note that unlike the copySign
method, this method does not require NaN sign
arguments to be treated
as positive values; implementations are permitted to treat some
NaN arguments as positive and other NaN arguments as negative
to allow greater performance.
Author: Joseph D. Darcy Params: - magnitude – the parameter providing the magnitude of the result
- sign – the parameter providing the sign of the result
Returns: a value with the magnitude of magnitude
and the sign of sign
.
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. Note that unlike the {@link
* FpUtils#copySign(double, double) copySign} method, this method
* does not require NaN <code>sign</code> arguments to be treated
* as positive values; implementations are permitted to treat some
* NaN arguments as positive and other NaN arguments as negative
* to allow greater performance.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of <code>magnitude</code>
* and the sign of <code>sign</code>.
* @author Joseph D. Darcy
*/
public static double rawCopySign(double magnitude, double sign) {
return Double.longBitsToDouble((Double.doubleToRawLongBits(sign) &
(DoubleConsts.SIGN_BIT_MASK)) |
(Double.doubleToRawLongBits(magnitude) &
(DoubleConsts.EXP_BIT_MASK |
DoubleConsts.SIGNIF_BIT_MASK)));
}
Returns the first floating-point argument with the sign of the second floating-point argument. Note that unlike the copySign
method, this method does not require NaN sign
arguments to be treated
as positive values; implementations are permitted to treat some
NaN arguments as positive and other NaN arguments as negative
to allow greater performance.
Author: Joseph D. Darcy Params: - magnitude – the parameter providing the magnitude of the result
- sign – the parameter providing the sign of the result
Returns: a value with the magnitude of magnitude
and the sign of sign
.
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. Note that unlike the {@link
* FpUtils#copySign(float, float) copySign} method, this method
* does not require NaN <code>sign</code> arguments to be treated
* as positive values; implementations are permitted to treat some
* NaN arguments as positive and other NaN arguments as negative
* to allow greater performance.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of <code>magnitude</code>
* and the sign of <code>sign</code>.
* @author Joseph D. Darcy
*/
public static float rawCopySign(float magnitude, float sign) {
return Float.intBitsToFloat((Float.floatToRawIntBits(sign) &
(FloatConsts.SIGN_BIT_MASK)) |
(Float.floatToRawIntBits(magnitude) &
(FloatConsts.EXP_BIT_MASK |
FloatConsts.SIGNIF_BIT_MASK)));
}
/* ***************************************************************** */
Returns true
if the argument is a finite
floating-point value; returns false
otherwise (for
NaN and infinity arguments).
Params: - d – the
double
value to be tested
Returns: true
if the argument is a finite
floating-point value, false
otherwise.
/**
* Returns <code>true</code> if the argument is a finite
* floating-point value; returns <code>false</code> otherwise (for
* NaN and infinity arguments).
*
* @param d the <code>double</code> value to be tested
* @return <code>true</code> if the argument is a finite
* floating-point value, <code>false</code> otherwise.
*/
public static boolean isFinite(double d) {
return Math.abs(d) <= DoubleConsts.MAX_VALUE;
}
Returns true
if the argument is a finite
floating-point value; returns false
otherwise (for
NaN and infinity arguments).
Params: - f – the
float
value to be tested
Returns: true
if the argument is a finite
floating-point value, false
otherwise.
/**
* Returns <code>true</code> if the argument is a finite
* floating-point value; returns <code>false</code> otherwise (for
* NaN and infinity arguments).
*
* @param f the <code>float</code> value to be tested
* @return <code>true</code> if the argument is a finite
* floating-point value, <code>false</code> otherwise.
*/
public static boolean isFinite(float f) {
return Math.abs(f) <= FloatConsts.MAX_VALUE;
}
Returns true
if the specified number is infinitely
large in magnitude, false
otherwise.
Note that this method is equivalent to the Double.isInfinite
method; the functionality is included in this class for convenience.
Params: - d – the value to be tested.
Returns: true
if the value of the argument is positive
infinity or negative infinity; false
otherwise.
/**
* Returns <code>true</code> if the specified number is infinitely
* large in magnitude, <code>false</code> otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Double#isInfinite(double) Double.isInfinite} method; the
* functionality is included in this class for convenience.
*
* @param d the value to be tested.
* @return <code>true</code> if the value of the argument is positive
* infinity or negative infinity; <code>false</code> otherwise.
*/
public static boolean isInfinite(double d) {
return Double.isInfinite(d);
}
Returns true
if the specified number is infinitely
large in magnitude, false
otherwise.
Note that this method is equivalent to the Float.isInfinite
method; the functionality is included in this class for convenience.
Params: - f – the value to be tested.
Returns: true
if the argument is positive infinity or
negative infinity; false
otherwise.
/**
* Returns <code>true</code> if the specified number is infinitely
* large in magnitude, <code>false</code> otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Float#isInfinite(float) Float.isInfinite} method; the
* functionality is included in this class for convenience.
*
* @param f the value to be tested.
* @return <code>true</code> if the argument is positive infinity or
* negative infinity; <code>false</code> otherwise.
*/
public static boolean isInfinite(float f) {
return Float.isInfinite(f);
}
Returns true
if the specified number is a
Not-a-Number (NaN) value, false
otherwise.
Note that this method is equivalent to the Double.isNaN
method; the functionality is included in this class for convenience.
Params: - d – the value to be tested.
Returns: true
if the value of the argument is NaN;
false
otherwise.
/**
* Returns <code>true</code> if the specified number is a
* Not-a-Number (NaN) value, <code>false</code> otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Double#isNaN(double) Double.isNaN} method; the functionality is
* included in this class for convenience.
*
* @param d the value to be tested.
* @return <code>true</code> if the value of the argument is NaN;
* <code>false</code> otherwise.
*/
public static boolean isNaN(double d) {
return Double.isNaN(d);
}
Returns true
if the specified number is a
Not-a-Number (NaN) value, false
otherwise.
Note that this method is equivalent to the Float.isNaN
method; the functionality is included in this class for convenience.
Params: - f – the value to be tested.
Returns: true
if the argument is NaN;
false
otherwise.
/**
* Returns <code>true</code> if the specified number is a
* Not-a-Number (NaN) value, <code>false</code> otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Float#isNaN(float) Float.isNaN} method; the functionality is
* included in this class for convenience.
*
* @param f the value to be tested.
* @return <code>true</code> if the argument is NaN;
* <code>false</code> otherwise.
*/
public static boolean isNaN(float f) {
return Float.isNaN(f);
}
Returns true
if the unordered relation holds
between the two arguments. When two floating-point values are
unordered, one value is neither less than, equal to, nor
greater than the other. For the unordered relation to be true,
at least one argument must be a NaN
.
Params: - arg1 – the first argument
- arg2 – the second argument
Returns: true
if at least one argument is a NaN,
false
otherwise.
/**
* Returns <code>true</code> if the unordered relation holds
* between the two arguments. When two floating-point values are
* unordered, one value is neither less than, equal to, nor
* greater than the other. For the unordered relation to be true,
* at least one argument must be a <code>NaN</code>.
*
* @param arg1 the first argument
* @param arg2 the second argument
* @return <code>true</code> if at least one argument is a NaN,
* <code>false</code> otherwise.
*/
public static boolean isUnordered(double arg1, double arg2) {
return isNaN(arg1) || isNaN(arg2);
}
Returns true
if the unordered relation holds
between the two arguments. When two floating-point values are
unordered, one value is neither less than, equal to, nor
greater than the other. For the unordered relation to be true,
at least one argument must be a NaN
.
Params: - arg1 – the first argument
- arg2 – the second argument
Returns: true
if at least one argument is a NaN,
false
otherwise.
/**
* Returns <code>true</code> if the unordered relation holds
* between the two arguments. When two floating-point values are
* unordered, one value is neither less than, equal to, nor
* greater than the other. For the unordered relation to be true,
* at least one argument must be a <code>NaN</code>.
*
* @param arg1 the first argument
* @param arg2 the second argument
* @return <code>true</code> if at least one argument is a NaN,
* <code>false</code> otherwise.
*/
public static boolean isUnordered(float arg1, float arg2) {
return isNaN(arg1) || isNaN(arg2);
}
Returns unbiased exponent of a double
; for
subnormal values, the number is treated as if it were
normalized. That is for all finite, non-zero, positive numbers
x, scalb(x, -ilogb(x))
is
always in the range [1, 2).
Special cases:
- If the argument is NaN, then the result is 230.
- If the argument is infinite, then the result is 228.
- If the argument is zero, then the result is -(228).
Author: Joseph D. Darcy Params: - d – floating-point number whose exponent is to be extracted
Returns: unbiased exponent of the argument.
/**
* Returns unbiased exponent of a <code>double</code>; for
* subnormal values, the number is treated as if it were
* normalized. That is for all finite, non-zero, positive numbers
* <i>x</i>, <code>scalb(<i>x</i>, -ilogb(<i>x</i>))</code> is
* always in the range [1, 2).
* <p>
* Special cases:
* <ul>
* <li> If the argument is NaN, then the result is 2<sup>30</sup>.
* <li> If the argument is infinite, then the result is 2<sup>28</sup>.
* <li> If the argument is zero, then the result is -(2<sup>28</sup>).
* </ul>
*
* @param d floating-point number whose exponent is to be extracted
* @return unbiased exponent of the argument.
* @author Joseph D. Darcy
*/
public static int ilogb(double d) {
int exponent = getExponent(d);
switch (exponent) {
case DoubleConsts.MAX_EXPONENT+1: // NaN or infinity
if( isNaN(d) )
return (1<<30); // 2^30
else // infinite value
return (1<<28); // 2^28
// break;
case DoubleConsts.MIN_EXPONENT-1: // zero or subnormal
if(d == 0.0) {
return -(1<<28); // -(2^28)
}
else {
long transducer = Double.doubleToRawLongBits(d);
/*
* To avoid causing slow arithmetic on subnormals,
* the scaling to determine when d's significand
* is normalized is done in integer arithmetic.
* (there must be at least one "1" bit in the
* significand since zero has been screened out.
*/
// isolate significand bits
transducer &= DoubleConsts.SIGNIF_BIT_MASK;
assert(transducer != 0L);
// This loop is simple and functional. We might be
// able to do something more clever that was faster;
// e.g. number of leading zero detection on
// (transducer << (# exponent and sign bits).
while (transducer <
(1L << (DoubleConsts.SIGNIFICAND_WIDTH - 1))) {
transducer *= 2;
exponent--;
}
exponent++;
assert( exponent >=
DoubleConsts.MIN_EXPONENT - (DoubleConsts.SIGNIFICAND_WIDTH-1) &&
exponent < DoubleConsts.MIN_EXPONENT);
return exponent;
}
// break;
default:
assert( exponent >= DoubleConsts.MIN_EXPONENT &&
exponent <= DoubleConsts.MAX_EXPONENT);
return exponent;
// break;
}
}
Returns unbiased exponent of a float
; for
subnormal values, the number is treated as if it were
normalized. That is for all finite, non-zero, positive numbers
x, scalb(x, -ilogb(x))
is
always in the range [1, 2).
Special cases:
- If the argument is NaN, then the result is 230.
- If the argument is infinite, then the result is 228.
- If the argument is zero, then the result is -(228).
Author: Joseph D. Darcy Params: - f – floating-point number whose exponent is to be extracted
Returns: unbiased exponent of the argument.
/**
* Returns unbiased exponent of a <code>float</code>; for
* subnormal values, the number is treated as if it were
* normalized. That is for all finite, non-zero, positive numbers
* <i>x</i>, <code>scalb(<i>x</i>, -ilogb(<i>x</i>))</code> is
* always in the range [1, 2).
* <p>
* Special cases:
* <ul>
* <li> If the argument is NaN, then the result is 2<sup>30</sup>.
* <li> If the argument is infinite, then the result is 2<sup>28</sup>.
* <li> If the argument is zero, then the result is -(2<sup>28</sup>).
* </ul>
*
* @param f floating-point number whose exponent is to be extracted
* @return unbiased exponent of the argument.
* @author Joseph D. Darcy
*/
public static int ilogb(float f) {
int exponent = getExponent(f);
switch (exponent) {
case FloatConsts.MAX_EXPONENT+1: // NaN or infinity
if( isNaN(f) )
return (1<<30); // 2^30
else // infinite value
return (1<<28); // 2^28
// break;
case FloatConsts.MIN_EXPONENT-1: // zero or subnormal
if(f == 0.0f) {
return -(1<<28); // -(2^28)
}
else {
int transducer = Float.floatToRawIntBits(f);
/*
* To avoid causing slow arithmetic on subnormals,
* the scaling to determine when f's significand
* is normalized is done in integer arithmetic.
* (there must be at least one "1" bit in the
* significand since zero has been screened out.
*/
// isolate significand bits
transducer &= FloatConsts.SIGNIF_BIT_MASK;
assert(transducer != 0);
// This loop is simple and functional. We might be
// able to do something more clever that was faster;
// e.g. number of leading zero detection on
// (transducer << (# exponent and sign bits).
while (transducer <
(1 << (FloatConsts.SIGNIFICAND_WIDTH - 1))) {
transducer *= 2;
exponent--;
}
exponent++;
assert( exponent >=
FloatConsts.MIN_EXPONENT - (FloatConsts.SIGNIFICAND_WIDTH-1) &&
exponent < FloatConsts.MIN_EXPONENT);
return exponent;
}
// break;
default:
assert( exponent >= FloatConsts.MIN_EXPONENT &&
exponent <= FloatConsts.MAX_EXPONENT);
return exponent;
// break;
}
}
/*
* The scalb operation should be reasonably fast; however, there
* are tradeoffs in writing a method to minimize the worst case
* performance and writing a method to minimize the time for
* expected common inputs. Some processors operate very slowly on
* subnormal operands, taking hundreds or thousands of cycles for
* one floating-point add or multiply as opposed to, say, four
* cycles for normal operands. For processors with very slow
* subnormal execution, scalb would be fastest if written entirely
* with integer operations; in other words, scalb would need to
* include the logic of performing correct rounding of subnormal
* values. This could be reasonably done in at most a few hundred
* cycles. However, this approach may penalize normal operations
* since at least the exponent of the floating-point argument must
* be examined.
*
* The approach taken in this implementation is a compromise.
* Floating-point multiplication is used to do most of the work;
* but knowingly multiplying by a subnormal scaling factor is
* avoided. However, the floating-point argument is not examined
* to see whether or not it is subnormal since subnormal inputs
* are assumed to be rare. At most three multiplies are needed to
* scale from the largest to smallest exponent ranges (scaling
* down, at most two multiplies are needed if subnormal scaling
* factors are allowed). However, in this implementation an
* expensive integer remainder operation is avoided at the cost of
* requiring five floating-point multiplies in the worst case,
* which should still be a performance win.
*
* If scaling of entire arrays is a concern, it would probably be
* more efficient to provide a double[] scalb(double[], int)
* version of scalb to avoid having to recompute the needed
* scaling factors for each floating-point value.
*/
Return d
×
2scale_factor
rounded as if performed
by a single correctly rounded floating-point multiply to a
member of the double value set. See §4.2.3
of the Java
Language Specification for a discussion of floating-point
value sets. If the exponent of the result is between the
double
's minimum exponent and maximum exponent,
the answer is calculated exactly. If the exponent of the
result would be larger than doubles
's maximum
exponent, an infinity is returned. Note that if the result is
subnormal, precision may be lost; that is, when scalb(x,
n)
is subnormal, scalb(scalb(x, n), -n)
may
not equal x. When the result is non-NaN, the result has
the same sign as d
.
Special cases:
- If the first argument is NaN, NaN is returned.
- If the first argument is infinite, then an infinity of the
same sign is returned.
- If the first argument is zero, then a zero of the same
sign is returned.
Author: Joseph D. Darcy Params: - d – number to be scaled by a power of two.
- scale_factor – power of 2 used to scale
d
Returns: d *
2scale_factor
/**
* Return <code>d</code> ×
* 2<sup><code>scale_factor</code></sup> rounded as if performed
* by a single correctly rounded floating-point multiply to a
* member of the double value set. See <a
* href="http://java.sun.com/docs/books/jls/second_edition/html/typesValues.doc.html#9208">§4.2.3</a>
* of the <a href="http://java.sun.com/docs/books/jls/html/">Java
* Language Specification</a> for a discussion of floating-point
* value sets. If the exponent of the result is between the
* <code>double</code>'s minimum exponent and maximum exponent,
* the answer is calculated exactly. If the exponent of the
* result would be larger than <code>doubles</code>'s maximum
* exponent, an infinity is returned. Note that if the result is
* subnormal, precision may be lost; that is, when <code>scalb(x,
* n)</code> is subnormal, <code>scalb(scalb(x, n), -n)</code> may
* not equal <i>x</i>. When the result is non-NaN, the result has
* the same sign as <code>d</code>.
*
*<p>
* Special cases:
* <ul>
* <li> If the first argument is NaN, NaN is returned.
* <li> If the first argument is infinite, then an infinity of the
* same sign is returned.
* <li> If the first argument is zero, then a zero of the same
* sign is returned.
* </ul>
*
* @param d number to be scaled by a power of two.
* @param scale_factor power of 2 used to scale <code>d</code>
* @return <code>d * </code>2<sup><code>scale_factor</code></sup>
* @author Joseph D. Darcy
*/
public static double scalb(double d, int scale_factor) {
/*
* This method does not need to be declared strictfp to
* compute the same correct result on all platforms. When
* scaling up, it does not matter what order the
* multiply-store operations are done; the result will be
* finite or overflow regardless of the operation ordering.
* However, to get the correct result when scaling down, a
* particular ordering must be used.
*
* When scaling down, the multiply-store operations are
* sequenced so that it is not possible for two consecutive
* multiply-stores to return subnormal results. If one
* multiply-store result is subnormal, the next multiply will
* round it away to zero. This is done by first multiplying
* by 2 ^ (scale_factor % n) and then multiplying several
* times by by 2^n as needed where n is the exponent of number
* that is a covenient power of two. In this way, at most one
* real rounding error occurs. If the double value set is
* being used exclusively, the rounding will occur on a
* multiply. If the double-extended-exponent value set is
* being used, the products will (perhaps) be exact but the
* stores to d are guaranteed to round to the double value
* set.
*
* It is _not_ a valid implementation to first multiply d by
* 2^MIN_EXPONENT and then by 2 ^ (scale_factor %
* MIN_EXPONENT) since even in a strictfp program double
* rounding on underflow could occur; e.g. if the scale_factor
* argument was (MIN_EXPONENT - n) and the exponent of d was a
* little less than -(MIN_EXPONENT - n), meaning the final
* result would be subnormal.
*
* Since exact reproducibility of this method can be achieved
* without any undue performance burden, there is no
* compelling reason to allow double rounding on underflow in
* scalb.
*/
// magnitude of a power of two so large that scaling a finite
// nonzero value by it would be guaranteed to over or
// underflow; due to rounding, scaling down takes takes an
// additional power of two which is reflected here
final int MAX_SCALE = DoubleConsts.MAX_EXPONENT + -DoubleConsts.MIN_EXPONENT +
DoubleConsts.SIGNIFICAND_WIDTH + 1;
int exp_adjust = 0;
int scale_increment = 0;
double exp_delta = Double.NaN;
// Make sure scaling factor is in a reasonable range
if(scale_factor < 0) {
scale_factor = Math.max(scale_factor, -MAX_SCALE);
scale_increment = -512;
exp_delta = twoToTheDoubleScaleDown;
}
else {
scale_factor = Math.min(scale_factor, MAX_SCALE);
scale_increment = 512;
exp_delta = twoToTheDoubleScaleUp;
}
// Calculate (scale_factor % +/-512), 512 = 2^9, using
// technique from "Hacker's Delight" section 10-2.
int t = (scale_factor >> 9-1) >>> 32 - 9;
exp_adjust = ((scale_factor + t) & (512 -1)) - t;
d *= powerOfTwoD(exp_adjust);
scale_factor -= exp_adjust;
while(scale_factor != 0) {
d *= exp_delta;
scale_factor -= scale_increment;
}
return d;
}
Return f
×
2scale_factor
rounded as if performed
by a single correctly rounded floating-point multiply to a
member of the float value set. See §4.2.3
of the Java
Language Specification for a discussion of floating-point
value set. If the exponent of the result is between the
float
's minimum exponent and maximum exponent, the
answer is calculated exactly. If the exponent of the result
would be larger than float
's maximum exponent, an
infinity is returned. Note that if the result is subnormal,
precision may be lost; that is, when scalb(x, n)
is subnormal, scalb(scalb(x, n), -n)
may not equal
x. When the result is non-NaN, the result has the same
sign as f
.
Special cases:
- If the first argument is NaN, NaN is returned.
- If the first argument is infinite, then an infinity of the
same sign is returned.
- If the first argument is zero, then a zero of the same
sign is returned.
Author: Joseph D. Darcy Params: - f – number to be scaled by a power of two.
- scale_factor – power of 2 used to scale
f
Returns: f *
2scale_factor
/**
* Return <code>f </code>×
* 2<sup><code>scale_factor</code></sup> rounded as if performed
* by a single correctly rounded floating-point multiply to a
* member of the float value set. See <a
* href="http://java.sun.com/docs/books/jls/second_edition/html/typesValues.doc.html#9208">§4.2.3</a>
* of the <a href="http://java.sun.com/docs/books/jls/html/">Java
* Language Specification</a> for a discussion of floating-point
* value set. If the exponent of the result is between the
* <code>float</code>'s minimum exponent and maximum exponent, the
* answer is calculated exactly. If the exponent of the result
* would be larger than <code>float</code>'s maximum exponent, an
* infinity is returned. Note that if the result is subnormal,
* precision may be lost; that is, when <code>scalb(x, n)</code>
* is subnormal, <code>scalb(scalb(x, n), -n)</code> may not equal
* <i>x</i>. When the result is non-NaN, the result has the same
* sign as <code>f</code>.
*
*<p>
* Special cases:
* <ul>
* <li> If the first argument is NaN, NaN is returned.
* <li> If the first argument is infinite, then an infinity of the
* same sign is returned.
* <li> If the first argument is zero, then a zero of the same
* sign is returned.
* </ul>
*
* @param f number to be scaled by a power of two.
* @param scale_factor power of 2 used to scale <code>f</code>
* @return <code>f * </code>2<sup><code>scale_factor</code></sup>
* @author Joseph D. Darcy
*/
public static float scalb(float f, int scale_factor) {
// magnitude of a power of two so large that scaling a finite
// nonzero value by it would be guaranteed to over or
// underflow; due to rounding, scaling down takes takes an
// additional power of two which is reflected here
final int MAX_SCALE = FloatConsts.MAX_EXPONENT + -FloatConsts.MIN_EXPONENT +
FloatConsts.SIGNIFICAND_WIDTH + 1;
// Make sure scaling factor is in a reasonable range
scale_factor = Math.max(Math.min(scale_factor, MAX_SCALE), -MAX_SCALE);
/*
* Since + MAX_SCALE for float fits well within the double
* exponent range and + float -> double conversion is exact
* the multiplication below will be exact. Therefore, the
* rounding that occurs when the double product is cast to
* float will be the correctly rounded float result. Since
* all operations other than the final multiply will be exact,
* it is not necessary to declare this method strictfp.
*/
return (float)((double)f*powerOfTwoD(scale_factor));
}
Returns the floating-point number adjacent to the first
argument in the direction of the second argument. If both
arguments compare as equal the second argument is returned.
Special cases:
- If either argument is a NaN, then NaN is returned.
- If both arguments are signed zeros,
direction
is returned unchanged (as implied by the requirement of
returning the second argument if the arguments compare as
equal).
- If
start
is
±Double.MIN_VALUE
and direction
has a value such that the result should have a smaller
magnitude, then a zero with the same sign as start
is returned.
- If
start
is infinite and
direction
has a value such that the result should
have a smaller magnitude, Double.MAX_VALUE
with the
same sign as start
is returned.
- If
start
is equal to ±
Double.MAX_VALUE
and direction
has a
value such that the result should have a larger magnitude, an
infinity with same sign as start
is returned.
Author: Joseph D. Darcy Params: - start – starting floating-point value
- direction – value indicating which of
start
's neighbors or start
should
be returned
Returns: The floating-point number adjacent to start
in the
direction of direction
.
/**
* Returns the floating-point number adjacent to the first
* argument in the direction of the second argument. If both
* arguments compare as equal the second argument is returned.
*
* <p>
* Special cases:
* <ul>
* <li> If either argument is a NaN, then NaN is returned.
*
* <li> If both arguments are signed zeros, <code>direction</code>
* is returned unchanged (as implied by the requirement of
* returning the second argument if the arguments compare as
* equal).
*
* <li> If <code>start</code> is
* ±<code>Double.MIN_VALUE</code> and <code>direction</code>
* has a value such that the result should have a smaller
* magnitude, then a zero with the same sign as <code>start</code>
* is returned.
*
* <li> If <code>start</code> is infinite and
* <code>direction</code> has a value such that the result should
* have a smaller magnitude, <code>Double.MAX_VALUE</code> with the
* same sign as <code>start</code> is returned.
*
* <li> If <code>start</code> is equal to ±
* <code>Double.MAX_VALUE</code> and <code>direction</code> has a
* value such that the result should have a larger magnitude, an
* infinity with same sign as <code>start</code> is returned.
* </ul>
*
* @param start starting floating-point value
* @param direction value indicating which of
* <code>start</code>'s neighbors or <code>start</code> should
* be returned
* @return The floating-point number adjacent to <code>start</code> in the
* direction of <code>direction</code>.
* @author Joseph D. Darcy
*/
public static double nextAfter(double start, double direction) {
/*
* The cases:
*
* nextAfter(+infinity, 0) == MAX_VALUE
* nextAfter(+infinity, +infinity) == +infinity
* nextAfter(-infinity, 0) == -MAX_VALUE
* nextAfter(-infinity, -infinity) == -infinity
*
* are naturally handled without any additional testing
*/
// First check for NaN values
if (isNaN(start) || isNaN(direction)) {
// return a NaN derived from the input NaN(s)
return start + direction;
} else if (start == direction) {
return direction;
} else { // start > direction or start < direction
// Add +0.0 to get rid of a -0.0 (+0.0 + -0.0 => +0.0)
// then bitwise convert start to integer.
long transducer = Double.doubleToRawLongBits(start + 0.0d);
/*
* IEEE 754 floating-point numbers are lexicographically
* ordered if treated as signed- magnitude integers .
* Since Java's integers are two's complement,
* incrementing" the two's complement representation of a
* logically negative floating-point value *decrements*
* the signed-magnitude representation. Therefore, when
* the integer representation of a floating-point values
* is less than zero, the adjustment to the representation
* is in the opposite direction than would be expected at
* first .
*/
if (direction > start) { // Calculate next greater value
transducer = transducer + (transducer >= 0L ? 1L:-1L);
} else { // Calculate next lesser value
assert direction < start;
if (transducer > 0L)
--transducer;
else
if (transducer < 0L )
++transducer;
/*
* transducer==0, the result is -MIN_VALUE
*
* The transition from zero (implicitly
* positive) to the smallest negative
* signed magnitude value must be done
* explicitly.
*/
else
transducer = DoubleConsts.SIGN_BIT_MASK | 1L;
}
return Double.longBitsToDouble(transducer);
}
}
Returns the floating-point number adjacent to the first
argument in the direction of the second argument. If both
arguments compare as equal, the second argument is returned.
Special cases:
- If either argument is a NaN, then NaN is returned.
- If both arguments are signed zeros, a
float
zero with the same sign as direction
is returned
(as implied by the requirement of returning the second argument
if the arguments compare as equal).
- If
start
is
±Float.MIN_VALUE
and direction
has a value such that the result should have a smaller
magnitude, then a zero with the same sign as start
is returned.
- If
start
is infinite and
direction
has a value such that the result should
have a smaller magnitude, Float.MAX_VALUE
with the
same sign as start
is returned.
- If
start
is equal to ±
Float.MAX_VALUE
and direction
has a
value such that the result should have a larger magnitude, an
infinity with same sign as start
is returned.
Author: Joseph D. Darcy Params: - start – starting floating-point value
- direction – value indicating which of
start
's neighbors or start
should
be returned
Returns: The floating-point number adjacent to start
in the
direction of direction
.
/**
* Returns the floating-point number adjacent to the first
* argument in the direction of the second argument. If both
* arguments compare as equal, the second argument is returned.
*
* <p>
* Special cases:
* <ul>
* <li> If either argument is a NaN, then NaN is returned.
*
* <li> If both arguments are signed zeros, a <code>float</code>
* zero with the same sign as <code>direction</code> is returned
* (as implied by the requirement of returning the second argument
* if the arguments compare as equal).
*
* <li> If <code>start</code> is
* ±<code>Float.MIN_VALUE</code> and <code>direction</code>
* has a value such that the result should have a smaller
* magnitude, then a zero with the same sign as <code>start</code>
* is returned.
*
* <li> If <code>start</code> is infinite and
* <code>direction</code> has a value such that the result should
* have a smaller magnitude, <code>Float.MAX_VALUE</code> with the
* same sign as <code>start</code> is returned.
*
* <li> If <code>start</code> is equal to ±
* <code>Float.MAX_VALUE</code> and <code>direction</code> has a
* value such that the result should have a larger magnitude, an
* infinity with same sign as <code>start</code> is returned.
* </ul>
*
* @param start starting floating-point value
* @param direction value indicating which of
* <code>start</code>'s neighbors or <code>start</code> should
* be returned
* @return The floating-point number adjacent to <code>start</code> in the
* direction of <code>direction</code>.
* @author Joseph D. Darcy
*/
public static float nextAfter(float start, double direction) {
/*
* The cases:
*
* nextAfter(+infinity, 0) == MAX_VALUE
* nextAfter(+infinity, +infinity) == +infinity
* nextAfter(-infinity, 0) == -MAX_VALUE
* nextAfter(-infinity, -infinity) == -infinity
*
* are naturally handled without any additional testing
*/
// First check for NaN values
if (isNaN(start) || isNaN(direction)) {
// return a NaN derived from the input NaN(s)
return start + (float)direction;
} else if (start == direction) {
return (float)direction;
} else { // start > direction or start < direction
// Add +0.0 to get rid of a -0.0 (+0.0 + -0.0 => +0.0)
// then bitwise convert start to integer.
int transducer = Float.floatToRawIntBits(start + 0.0f);
/*
* IEEE 754 floating-point numbers are lexicographically
* ordered if treated as signed- magnitude integers .
* Since Java's integers are two's complement,
* incrementing" the two's complement representation of a
* logically negative floating-point value *decrements*
* the signed-magnitude representation. Therefore, when
* the integer representation of a floating-point values
* is less than zero, the adjustment to the representation
* is in the opposite direction than would be expected at
* first.
*/
if (direction > start) {// Calculate next greater value
transducer = transducer + (transducer >= 0 ? 1:-1);
} else { // Calculate next lesser value
assert direction < start;
if (transducer > 0)
--transducer;
else
if (transducer < 0 )
++transducer;
/*
* transducer==0, the result is -MIN_VALUE
*
* The transition from zero (implicitly
* positive) to the smallest negative
* signed magnitude value must be done
* explicitly.
*/
else
transducer = FloatConsts.SIGN_BIT_MASK | 1;
}
return Float.intBitsToFloat(transducer);
}
}
Returns the floating-point value adjacent to d
in
the direction of positive infinity. This method is
semantically equivalent to nextAfter(d,
Double.POSITIVE_INFINITY)
; however, a nextUp
implementation may run faster than its equivalent
nextAfter
call.
Special Cases:
- If the argument is NaN, the result is NaN.
- If the argument is positive infinity, the result is
positive infinity.
- If the argument is zero, the result is
Double.MIN_VALUE
Author: Joseph D. Darcy Params: - d – starting floating-point value
Returns: The adjacent floating-point value closer to positive
infinity.
/**
* Returns the floating-point value adjacent to <code>d</code> in
* the direction of positive infinity. This method is
* semantically equivalent to <code>nextAfter(d,
* Double.POSITIVE_INFINITY)</code>; however, a <code>nextUp</code>
* implementation may run faster than its equivalent
* <code>nextAfter</code> call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is positive infinity, the result is
* positive infinity.
*
* <li> If the argument is zero, the result is
* <code>Double.MIN_VALUE</code>
*
* </ul>
*
* @param d starting floating-point value
* @return The adjacent floating-point value closer to positive
* infinity.
* @author Joseph D. Darcy
*/
public static double nextUp(double d) {
if( isNaN(d) || d == Double.POSITIVE_INFINITY)
return d;
else {
d += 0.0d;
return Double.longBitsToDouble(Double.doubleToRawLongBits(d) +
((d >= 0.0d)?+1L:-1L));
}
}
Returns the floating-point value adjacent to f
in
the direction of positive infinity. This method is
semantically equivalent to nextAfter(f,
Double.POSITIVE_INFINITY)
; however, a nextUp
implementation may run faster than its equivalent
nextAfter
call.
Special Cases:
- If the argument is NaN, the result is NaN.
- If the argument is positive infinity, the result is
positive infinity.
- If the argument is zero, the result is
Float.MIN_VALUE
Author: Joseph D. Darcy Params: - f – starting floating-point value
Returns: The adjacent floating-point value closer to positive
infinity.
/**
* Returns the floating-point value adjacent to <code>f</code> in
* the direction of positive infinity. This method is
* semantically equivalent to <code>nextAfter(f,
* Double.POSITIVE_INFINITY)</code>; however, a <code>nextUp</code>
* implementation may run faster than its equivalent
* <code>nextAfter</code> call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is positive infinity, the result is
* positive infinity.
*
* <li> If the argument is zero, the result is
* <code>Float.MIN_VALUE</code>
*
* </ul>
*
* @param f starting floating-point value
* @return The adjacent floating-point value closer to positive
* infinity.
* @author Joseph D. Darcy
*/
public static float nextUp(float f) {
if( isNaN(f) || f == FloatConsts.POSITIVE_INFINITY)
return f;
else {
f += 0.0f;
return Float.intBitsToFloat(Float.floatToRawIntBits(f) +
((f >= 0.0f)?+1:-1));
}
}
Returns the floating-point value adjacent to d
in
the direction of negative infinity. This method is
semantically equivalent to nextAfter(d,
Double.NEGATIVE_INFINITY)
; however, a
nextDown
implementation may run faster than its
equivalent nextAfter
call.
Special Cases:
- If the argument is NaN, the result is NaN.
- If the argument is negative infinity, the result is
negative infinity.
- If the argument is zero, the result is
-Double.MIN_VALUE
Author: Joseph D. Darcy Params: - d – starting floating-point value
Returns: The adjacent floating-point value closer to negative
infinity.
/**
* Returns the floating-point value adjacent to <code>d</code> in
* the direction of negative infinity. This method is
* semantically equivalent to <code>nextAfter(d,
* Double.NEGATIVE_INFINITY)</code>; however, a
* <code>nextDown</code> implementation may run faster than its
* equivalent <code>nextAfter</code> call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is negative infinity, the result is
* negative infinity.
*
* <li> If the argument is zero, the result is
* <code>-Double.MIN_VALUE</code>
*
* </ul>
*
* @param d starting floating-point value
* @return The adjacent floating-point value closer to negative
* infinity.
* @author Joseph D. Darcy
*/
public static double nextDown(double d) {
if( isNaN(d) || d == Double.NEGATIVE_INFINITY)
return d;
else {
if (d == 0.0)
return -Double.MIN_VALUE;
else
return Double.longBitsToDouble(Double.doubleToRawLongBits(d) +
((d > 0.0d)?-1L:+1L));
}
}
Returns the floating-point value adjacent to f
in
the direction of negative infinity. This method is
semantically equivalent to nextAfter(f,
Float.NEGATIVE_INFINITY)
; however, a
nextDown
implementation may run faster than its
equivalent nextAfter
call.
Special Cases:
- If the argument is NaN, the result is NaN.
- If the argument is negative infinity, the result is
negative infinity.
- If the argument is zero, the result is
-Float.MIN_VALUE
Author: Joseph D. Darcy Params: - f – starting floating-point value
Returns: The adjacent floating-point value closer to negative
infinity.
/**
* Returns the floating-point value adjacent to <code>f</code> in
* the direction of negative infinity. This method is
* semantically equivalent to <code>nextAfter(f,
* Float.NEGATIVE_INFINITY)</code>; however, a
* <code>nextDown</code> implementation may run faster than its
* equivalent <code>nextAfter</code> call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is negative infinity, the result is
* negative infinity.
*
* <li> If the argument is zero, the result is
* <code>-Float.MIN_VALUE</code>
*
* </ul>
*
* @param f starting floating-point value
* @return The adjacent floating-point value closer to negative
* infinity.
* @author Joseph D. Darcy
*/
public static double nextDown(float f) {
if( isNaN(f) || f == Float.NEGATIVE_INFINITY)
return f;
else {
if (f == 0.0f)
return -Float.MIN_VALUE;
else
return Float.intBitsToFloat(Float.floatToRawIntBits(f) +
((f > 0.0f)?-1:+1));
}
}
Returns the first floating-point argument with the sign of the
second floating-point argument. For this method, a NaN
sign
argument is always treated as if it were
positive.
Author: Joseph D. Darcy Params: - magnitude – the parameter providing the magnitude of the result
- sign – the parameter providing the sign of the result
Returns: a value with the magnitude of magnitude
and the sign of sign
. Since: 1.5
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. For this method, a NaN
* <code>sign</code> argument is always treated as if it were
* positive.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of <code>magnitude</code>
* and the sign of <code>sign</code>.
* @author Joseph D. Darcy
* @since 1.5
*/
public static double copySign(double magnitude, double sign) {
return rawCopySign(magnitude, (isNaN(sign)?1.0d:sign));
}
Returns the first floating-point argument with the sign of the
second floating-point argument. For this method, a NaN
sign
argument is always treated as if it were
positive.
Author: Joseph D. Darcy Params: - magnitude – the parameter providing the magnitude of the result
- sign – the parameter providing the sign of the result
Returns: a value with the magnitude of magnitude
and the sign of sign
.
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. For this method, a NaN
* <code>sign</code> argument is always treated as if it were
* positive.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of <code>magnitude</code>
* and the sign of <code>sign</code>.
* @author Joseph D. Darcy
*/
public static float copySign(float magnitude, float sign) {
return rawCopySign(magnitude, (isNaN(sign)?1.0f:sign));
}
Returns the size of an ulp of the argument. An ulp of a
double
value is the positive distance between this
floating-point value and the double
value next
larger in magnitude. Note that for non-NaN x,
ulp(-x) == ulp(x)
.
Special Cases:
- If the argument is NaN, then the result is NaN.
- If the argument is positive or negative infinity, then the
result is positive infinity.
- If the argument is positive or negative zero, then the result is
Double.MIN_VALUE
.
- If the argument is ±
Double.MAX_VALUE
, then
the result is equal to 2971.
Author: Joseph D. Darcy Params: - d – the floating-point value whose ulp is to be returned
Returns: the size of an ulp of the argument Since: 1.5
/**
* Returns the size of an ulp of the argument. An ulp of a
* <code>double</code> value is the positive distance between this
* floating-point value and the <code>double</code> value next
* larger in magnitude. Note that for non-NaN <i>x</i>,
* <code>ulp(-<i>x</i>) == ulp(<i>x</i>)</code>.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive or negative infinity, then the
* result is positive infinity.
* <li> If the argument is positive or negative zero, then the result is
* <code>Double.MIN_VALUE</code>.
* <li> If the argument is ±<code>Double.MAX_VALUE</code>, then
* the result is equal to 2<sup>971</sup>.
* </ul>
*
* @param d the floating-point value whose ulp is to be returned
* @return the size of an ulp of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static double ulp(double d) {
int exp = getExponent(d);
switch(exp) {
case DoubleConsts.MAX_EXPONENT+1: // NaN or infinity
return Math.abs(d);
// break;
case DoubleConsts.MIN_EXPONENT-1: // zero or subnormal
return Double.MIN_VALUE;
// break
default:
assert exp <= DoubleConsts.MAX_EXPONENT && exp >= DoubleConsts.MIN_EXPONENT;
// ulp(x) is usually 2^(SIGNIFICAND_WIDTH-1)*(2^ilogb(x))
exp = exp - (DoubleConsts.SIGNIFICAND_WIDTH-1);
if (exp >= DoubleConsts.MIN_EXPONENT) {
return powerOfTwoD(exp);
}
else {
// return a subnormal result; left shift integer
// representation of Double.MIN_VALUE appropriate
// number of positions
return Double.longBitsToDouble(1L <<
(exp - (DoubleConsts.MIN_EXPONENT - (DoubleConsts.SIGNIFICAND_WIDTH-1)) ));
}
// break
}
}
Returns the size of an ulp of the argument. An ulp of a
float
value is the positive distance between this
floating-point value and the float
value next
larger in magnitude. Note that for non-NaN x,
ulp(-x) == ulp(x)
.
Special Cases:
- If the argument is NaN, then the result is NaN.
- If the argument is positive or negative infinity, then the
result is positive infinity.
- If the argument is positive or negative zero, then the result is
Float.MIN_VALUE
.
- If the argument is ±
Float.MAX_VALUE
, then
the result is equal to 2104.
Author: Joseph D. Darcy Params: - f – the floating-point value whose ulp is to be returned
Returns: the size of an ulp of the argument Since: 1.5
/**
* Returns the size of an ulp of the argument. An ulp of a
* <code>float</code> value is the positive distance between this
* floating-point value and the <code>float</code> value next
* larger in magnitude. Note that for non-NaN <i>x</i>,
* <code>ulp(-<i>x</i>) == ulp(<i>x</i>)</code>.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive or negative infinity, then the
* result is positive infinity.
* <li> If the argument is positive or negative zero, then the result is
* <code>Float.MIN_VALUE</code>.
* <li> If the argument is ±<code>Float.MAX_VALUE</code>, then
* the result is equal to 2<sup>104</sup>.
* </ul>
*
* @param f the floating-point value whose ulp is to be returned
* @return the size of an ulp of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static float ulp(float f) {
int exp = getExponent(f);
switch(exp) {
case FloatConsts.MAX_EXPONENT+1: // NaN or infinity
return Math.abs(f);
// break;
case FloatConsts.MIN_EXPONENT-1: // zero or subnormal
return FloatConsts.MIN_VALUE;
// break
default:
assert exp <= FloatConsts.MAX_EXPONENT && exp >= FloatConsts.MIN_EXPONENT;
// ulp(x) is usually 2^(SIGNIFICAND_WIDTH-1)*(2^ilogb(x))
exp = exp - (FloatConsts.SIGNIFICAND_WIDTH-1);
if (exp >= FloatConsts.MIN_EXPONENT) {
return powerOfTwoF(exp);
}
else {
// return a subnormal result; left shift integer
// representation of FloatConsts.MIN_VALUE appropriate
// number of positions
return Float.intBitsToFloat(1 <<
(exp - (FloatConsts.MIN_EXPONENT - (FloatConsts.SIGNIFICAND_WIDTH-1)) ));
}
// break
}
}
Returns the signum function of the argument; zero if the argument
is zero, 1.0 if the argument is greater than zero, -1.0 if the
argument is less than zero.
Special Cases:
- If the argument is NaN, then the result is NaN.
- If the argument is positive zero or negative zero, then the
result is the same as the argument.
Author: Joseph D. Darcy Params: - d – the floating-point value whose signum is to be returned
Returns: the signum function of the argument Since: 1.5
/**
* Returns the signum function of the argument; zero if the argument
* is zero, 1.0 if the argument is greater than zero, -1.0 if the
* argument is less than zero.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive zero or negative zero, then the
* result is the same as the argument.
* </ul>
*
* @param d the floating-point value whose signum is to be returned
* @return the signum function of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static double signum(double d) {
return (d == 0.0 || isNaN(d))?d:copySign(1.0, d);
}
Returns the signum function of the argument; zero if the argument
is zero, 1.0f if the argument is greater than zero, -1.0f if the
argument is less than zero.
Special Cases:
- If the argument is NaN, then the result is NaN.
- If the argument is positive zero or negative zero, then the
result is the same as the argument.
Author: Joseph D. Darcy Params: - f – the floating-point value whose signum is to be returned
Returns: the signum function of the argument Since: 1.5
/**
* Returns the signum function of the argument; zero if the argument
* is zero, 1.0f if the argument is greater than zero, -1.0f if the
* argument is less than zero.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive zero or negative zero, then the
* result is the same as the argument.
* </ul>
*
* @param f the floating-point value whose signum is to be returned
* @return the signum function of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static float signum(float f) {
return (f == 0.0f || isNaN(f))?f:copySign(1.0f, f);
}
}