/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation. Oracle designates this
* particular file as subject to the "Classpath" exception as provided
* by Oracle in the LICENSE file that accompanied this code.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*/
package sun.misc;
import sun.misc.FloatConsts;
import sun.misc.DoubleConsts;
/**
* The class {@code FpUtils} contains static utility methods for
* manipulating and inspecting {@code float} and
* {@code double} floating-point numbers. These methods include
* functionality recommended or required by the IEEE 754
* floating-point standard.
*
* @author Joseph D. Darcy
*/
public class FpUtils {
/*
* The methods in this class are reasonably implemented using
* direct or indirect bit-level manipulation of floating-point
* values. However, having access to the IEEE 754 recommended
* functions would obviate the need for most programmers to engage
* in floating-point bit-twiddling.
*
* An IEEE 754 number has three fields, from most significant bit
* to least significant: sign, exponent, and significand.
*
* msb lsb
* [sign|exponent| fractional_significand]
*
* Using some encoding cleverness, explained below, the high order
* bit of the logical significand does not need to be explicitly
* stored, thus "fractional_significand" instead of simply
* "significand" in the figure above.
*
* For finite normal numbers, the numerical value encoded is
*
* (-1)^sign * 2^(exponent)*(1.fractional_significand)
*
* Most finite floating-point numbers are normalized; the exponent
* value is reduced until the leading significand bit is 1.
* Therefore, the leading 1 is redundant and is not explicitly
* stored. If a numerical value is so small it cannot be
* normalized, it has a subnormal representation. Subnormal
* numbers don't have a leading 1 in their significand; subnormals
* are encoded using a special exponent value. In other words,
* the high-order bit of the logical significand can be elided
* from the representation in either case since the bit's value is
* implicit from the exponent value.
*
* The exponent field uses a biased representation; if the bits of
* the exponent are interpreted as an unsigned integer E, the
* exponent represented is E - E_bias where E_bias depends on the
* floating-point format. For normal numbers, the exponent
* represented ranges between E_min and E_max, constants which
* depend on the floating-point format. E_min and E_max are -126
* and +127 for float, -1022 and +1023 for double.
*
* The 32-bit float format has 1 sign bit, 8 exponent bits, and 23
* bits for the significand (which is logically 24 bits wide
* because of the implicit bit). The 64-bit double format has 1
* sign bit, 11 exponent bits, and 52 bits for the significand
* (logically 53 bits).
*
* Subnormal numbers and zero have the special exponent value
* E_min -1; the numerical value represented by a subnormal is:
*
* (-1)^sign * 2^(E_min)*(0.fractional_significand)
*
* Zero is represented by all zero bits in the exponent and all
* zero bits in the significand; zero can have either sign.
*
* Infinity and NaN are encoded using the exponent value E_max +
* 1. Signed infinities have all significand bits zero; NaNs have
* at least one non-zero significand bit.
*
* The details of IEEE 754 floating-point encoding will be used in
* the methods below without further comment. For further
* exposition on IEEE 754 numbers, see "IEEE Standard for Binary
* Floating-Point Arithmetic" ANSI/IEEE Std 754-1985 or William
* Kahan's "Lecture Notes on the Status of IEEE Standard 754 for
* Binary Floating-Point Arithmetic",
* http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps.
*
* Many of this class's methods are members of the set of IEEE 754
* recommended functions or similar functions recommended or
* required by IEEE 754R. Discussion of various implementation
* techniques for these functions has occurred in:
*
* W.J. Cody and Jerome T. Coonen, "Algorithm 722: Functions to
* Support the IEEE Standard for Binary Floating-Point
* Arithmetic," ACM Transactions on Mathematical Software,
* vol. 19, no. 4, December 1993, pp. 443-451.
*
* Joseph D. Darcy, "Writing robust IEEE recommended functions in
* ``100% Pure Java''(TM)," University of California, Berkeley
* technical report UCB//CSD-98-1009.
*/
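/*
 * Illustrative sketch, not part of the original source. The field layout
 * described above can be checked by splitting a double into its sign,
 * biased exponent, and fractional significand, then rebuilding the value
 * from the normal and subnormal formulas. The class Ieee754FieldsSketch
 * and its method names are hypothetical, and the literal masks assume the
 * binary64 layout instead of reading DoubleConsts.

```java
final class Ieee754FieldsSketch {
    // Literal masks for binary64: 1 sign bit, 11 exponent bits, 52 stored
    // significand bits. Assumed equivalents of the DoubleConsts values.
    static final long SIGN_MASK = 0x8000000000000000L;
    static final long EXP_MASK = 0x7FF0000000000000L;
    static final long SIGNIF_MASK = 0x000FFFFFFFFFFFFFL;
    static final int EXP_BIAS = 1023;
    static final int E_MIN = -1022;

    // 0 for positive values, 1 for negative values.
    static int signBit(double d) {
        return (int) ((Double.doubleToRawLongBits(d) & SIGN_MASK) >>> 63);
    }

    // Biased exponent field E, read as an unsigned integer in [0, 2047].
    static int biasedExponent(double d) {
        return (int) ((Double.doubleToRawLongBits(d) & EXP_MASK) >>> 52);
    }

    // The 52 explicitly stored significand bits.
    static long fraction(double d) {
        return Double.doubleToRawLongBits(d) & SIGNIF_MASK;
    }

    // Rebuilds a finite double from its three fields.
    static double decode(double d) {
        double sign = (signBit(d) == 0) ? 1.0 : -1.0;
        int e = biasedExponent(d);
        long frac = fraction(d);
        if (e == 0) {
            // Zero or subnormal: (0.fraction) * 2^E_min, no implicit leading 1.
            return sign * Math.scalb((double) frac, E_MIN - 52);
        }
        // Normal: (1.fraction) * 2^(E - bias), with the implicit bit restored.
        return sign * Math.scalb((double) (frac | (1L << 52)), e - EXP_BIAS - 52);
    }

    public static void main(String[] args) {
        double[] samples = { 1.0, -2.5, 0.1, Double.MIN_VALUE, Double.MAX_VALUE };
        for (double d : samples) {
            System.out.println(d + ": sign=" + signBit(d)
                + " E=" + biasedExponent(d)
                + " fraction=0x" + Long.toHexString(fraction(d))
                + " decoded=" + decode(d));
        }
    }
}
```

 * For any finite d, decode(d) should compare equal to d. Infinities and
 * NaNs (exponent field of all ones) are outside what this sketch handles.
 */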
/**
* Don't let anyone instantiate this class.
*/
private FpUtils() {}
// Constants used in scalb
static double twoToTheDoubleScaleUp = powerOfTwoD(512);
static double twoToTheDoubleScaleDown = powerOfTwoD(-512);
// Helper Methods
// The following helper methods are used in the implementation of
// the public recommended functions; they generally omit certain
// tests for exception cases.
/**
* Returns unbiased exponent of a {@code double}.
*/
public static int getExponent(double d){
/*
* Bitwise convert d to long, mask out exponent bits, shift
* to the right and then subtract double's exponent bias to
* get the true exponent value.
*/
return (int)(((Double.doubleToRawLongBits(d) & DoubleConsts.EXP_BIT_MASK) >>
(DoubleConsts.SIGNIFICAND_WIDTH - 1)) - DoubleConsts.EXP_BIAS);
}
/**
* Returns unbiased exponent of a {@code float}.
*/
public static int getExponent(float f){
/*
* Bitwise convert f to integer, mask out exponent bits, shift
* to the right and then subtract float's exponent bias to
* get the true exponent value.
*/
return ((Float.floatToRawIntBits(f) & FloatConsts.EXP_BIT_MASK) >>
(FloatConsts.SIGNIFICAND_WIDTH - 1)) - FloatConsts.EXP_BIAS;
}
/**
* Returns a floating-point power of two in the normal range.
*/
static double powerOfTwoD(int n) {
assert(n >= DoubleConsts.MIN_EXPONENT && n <= DoubleConsts.MAX_EXPONENT);
return Double.longBitsToDouble((((long)n + (long)DoubleConsts.EXP_BIAS) <<
(DoubleConsts.SIGNIFICAND_WIDTH-1))
& DoubleConsts.EXP_BIT_MASK);
}
/**
* Returns a floating-point power of two in the normal range.
*/
static float powerOfTwoF(int n) {
assert(n >= FloatConsts.MIN_EXPONENT && n <= FloatConsts.MAX_EXPONENT);
return Float.intBitsToFloat(((n + FloatConsts.EXP_BIAS) <<
(FloatConsts.SIGNIFICAND_WIDTH-1))
& FloatConsts.EXP_BIT_MASK);
}
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. Note that unlike the {@link
* FpUtils#copySign(double, double) copySign} method, this method
* does not require NaN {@code sign} arguments to be treated
* as positive values; implementations are permitted to treat some
* NaN arguments as positive and other NaN arguments as negative
* to allow greater performance.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of {@code magnitude}
* and the sign of {@code sign}.
* @author Joseph D. Darcy
*/
public static double rawCopySign(double magnitude, double sign) {
return Double.longBitsToDouble((Double.doubleToRawLongBits(sign) &
(DoubleConsts.SIGN_BIT_MASK)) |
(Double.doubleToRawLongBits(magnitude) &
(DoubleConsts.EXP_BIT_MASK |
DoubleConsts.SIGNIF_BIT_MASK)));
}
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. Note that unlike the {@link
* FpUtils#copySign(float, float) copySign} method, this method
* does not require NaN {@code sign} arguments to be treated
* as positive values; implementations are permitted to treat some
* NaN arguments as positive and other NaN arguments as negative
* to allow greater performance.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of {@code magnitude}
* and the sign of {@code sign}.
* @author Joseph D. Darcy
*/
public static float rawCopySign(float magnitude, float sign) {
return Float.intBitsToFloat((Float.floatToRawIntBits(sign) &
(FloatConsts.SIGN_BIT_MASK)) |
(Float.floatToRawIntBits(magnitude) &
(FloatConsts.EXP_BIT_MASK |
FloatConsts.SIGNIF_BIT_MASK)));
}
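/*
 * Illustrative sketch, not part of the original source. Both rawCopySign
 * overloads above combine the sign bit of one argument with the remaining
 * bits of the other. The same idea is shown here for double with a literal
 * sign mask (assumed equal to DoubleConsts.SIGN_BIT_MASK). The class name
 * RawCopySignSketch is hypothetical.

```java
final class RawCopySignSketch {
    // Assumed equal to DoubleConsts.SIGN_BIT_MASK.
    static final long SIGN_MASK = 0x8000000000000000L;

    // Takes the sign bit from 'sign' and every other bit from 'magnitude'.
    static double rawCopySign(double magnitude, double sign) {
        long signBits = Double.doubleToRawLongBits(sign) & SIGN_MASK;
        long magnitudeBits = Double.doubleToRawLongBits(magnitude) & ~SIGN_MASK;
        return Double.longBitsToDouble(signBits | magnitudeBits);
    }

    public static void main(String[] args) {
        System.out.println(rawCopySign(3.0, -0.0));
        System.out.println(rawCopySign(-3.0, 2.0));
        System.out.println(rawCopySign(0.0, -1.0));
    }
}
```

 * Because only bits are moved, a negative zero passed as the sign argument
 * yields a negative result, and no floating-point arithmetic is performed.
 */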
/* ***************************************************************** */
/**
* Returns {@code true} if the argument is a finite
* floating-point value; returns {@code false} otherwise (for
* NaN and infinity arguments).
*
* @param d the {@code double} value to be tested
* @return {@code true} if the argument is a finite
* floating-point value, {@code false} otherwise.
*/
public static boolean isFinite(double d) {
return Math.abs(d) <= DoubleConsts.MAX_VALUE;
}
/**
* Returns {@code true} if the argument is a finite
* floating-point value; returns {@code false} otherwise (for
* NaN and infinity arguments).
*
* @param f the {@code float} value to be tested
* @return {@code true} if the argument is a finite
* floating-point value, {@code false} otherwise.
*/
public static boolean isFinite(float f) {
return Math.abs(f) <= FloatConsts.MAX_VALUE;
}
/**
* Returns {@code true} if the specified number is infinitely
* large in magnitude, {@code false} otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Double#isInfinite(double) Double.isInfinite} method; the
* functionality is included in this class for convenience.
*
* @param d the value to be tested.
* @return {@code true} if the value of the argument is positive
* infinity or negative infinity; {@code false} otherwise.
*/
public static boolean isInfinite(double d) {
return Double.isInfinite(d);
}
/**
* Returns {@code true} if the specified number is infinitely
* large in magnitude, {@code false} otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Float#isInfinite(float) Float.isInfinite} method; the
* functionality is included in this class for convenience.
*
* @param f the value to be tested.
* @return {@code true} if the argument is positive infinity or
* negative infinity; {@code false} otherwise.
*/
public static boolean isInfinite(float f) {
return Float.isInfinite(f);
}
/**
* Returns {@code true} if the specified number is a
* Not-a-Number (NaN) value, {@code false} otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Double#isNaN(double) Double.isNaN} method; the functionality is
* included in this class for convenience.
*
* @param d the value to be tested.
* @return {@code true} if the value of the argument is NaN;
* {@code false} otherwise.
*/
public static boolean isNaN(double d) {
return Double.isNaN(d);
}
/**
* Returns {@code true} if the specified number is a
* Not-a-Number (NaN) value, {@code false} otherwise.
*
* <p>Note that this method is equivalent to the {@link
* Float#isNaN(float) Float.isNaN} method; the functionality is
* included in this class for convenience.
*
* @param f the value to be tested.
* @return {@code true} if the argument is NaN;
* {@code false} otherwise.
*/
public static boolean isNaN(float f) {
return Float.isNaN(f);
}
/**
* Returns {@code true} if the unordered relation holds
* between the two arguments. When two floating-point values are
* unordered, one value is neither less than, equal to, nor
* greater than the other. For the unordered relation to be true,
* at least one argument must be a {@code NaN}.
*
* @param arg1 the first argument
* @param arg2 the second argument
* @return {@code true} if at least one argument is a NaN,
* {@code false} otherwise.
*/
public static boolean isUnordered(double arg1, double arg2) {
return isNaN(arg1) || isNaN(arg2);
}
/**
* Returns {@code true} if the unordered relation holds
* between the two arguments. When two floating-point values are
* unordered, one value is neither less than, equal to, nor
* greater than the other. For the unordered relation to be true,
* at least one argument must be a {@code NaN}.
*
* @param arg1 the first argument
* @param arg2 the second argument
* @return {@code true} if at least one argument is a NaN,
* {@code false} otherwise.
*/
public static boolean isUnordered(float arg1, float arg2) {
return isNaN(arg1) || isNaN(arg2);
}
/**
* Returns unbiased exponent of a {@code double}; for
* subnormal values, the number is treated as if it were
* normalized. That is, for all finite, non-zero, positive numbers
* <i>x</i>, <code>scalb(<i>x</i>, -ilogb(<i>x</i>))</code> is
* always in the range [1, 2).
* <p>
* Special cases:
* <ul>
* <li> If the argument is NaN, then the result is 2<sup>30</sup>.
* <li> If the argument is infinite, then the result is 2<sup>28</sup>.
* <li> If the argument is zero, then the result is -(2<sup>28</sup>).
* </ul>
*
* @param d floating-point number whose exponent is to be extracted
* @return unbiased exponent of the argument.
* @author Joseph D. Darcy
*/
public static int ilogb(double d) {
int exponent = getExponent(d);
switch (exponent) {
case DoubleConsts.MAX_EXPONENT+1: // NaN or infinity
if( isNaN(d) )
return (1<<30); // 2^30
else // infinite value
return (1<<28); // 2^28
case DoubleConsts.MIN_EXPONENT-1: // zero or subnormal
if(d == 0.0) {
return -(1<<28); // -(2^28)
}
else {
long transducer = Double.doubleToRawLongBits(d);
/*
* To avoid causing slow arithmetic on subnormals,
* the scaling to determine when d's significand
* is normalized is done in integer arithmetic.
* (There must be at least one "1" bit in the
* significand since zero has been screened out.)
*/
// isolate significand bits
transducer &= DoubleConsts.SIGNIF_BIT_MASK;
assert(transducer != 0L);
// This loop is simple and functional. We might be
// able to do something more clever that was faster;
// e.g. number of leading zero detection on
// (transducer << (# exponent and sign bits)).
while (transducer <
(1L << (DoubleConsts.SIGNIFICAND_WIDTH - 1))) {
transducer *= 2;
exponent--;
}
exponent++;
assert( exponent >=
DoubleConsts.MIN_EXPONENT - (DoubleConsts.SIGNIFICAND_WIDTH-1) &&
exponent < DoubleConsts.MIN_EXPONENT);
return exponent;
}
default:
assert( exponent >= DoubleConsts.MIN_EXPONENT &&
exponent <= DoubleConsts.MAX_EXPONENT);
return exponent;
}
}
/**
* Returns unbiased exponent of a {@code float}; for
* subnormal values, the number is treated as if it were
* normalized. That is, for all finite, non-zero, positive numbers
* <i>x</i>, <code>scalb(<i>x</i>, -ilogb(<i>x</i>))</code> is
* always in the range [1, 2).
* <p>
* Special cases:
* <ul>
* <li> If the argument is NaN, then the result is 2<sup>30</sup>.
* <li> If the argument is infinite, then the result is 2<sup>28</sup>.
* <li> If the argument is zero, then the result is -(2<sup>28</sup>).
* </ul>
*
* @param f floating-point number whose exponent is to be extracted
* @return unbiased exponent of the argument.
* @author Joseph D. Darcy
*/
public static int ilogb(float f) {
int exponent = getExponent(f);
switch (exponent) {
case FloatConsts.MAX_EXPONENT+1: // NaN or infinity
if( isNaN(f) )
return (1<<30); // 2^30
else // infinite value
return (1<<28); // 2^28
case FloatConsts.MIN_EXPONENT-1: // zero or subnormal
if(f == 0.0f) {
return -(1<<28); // -(2^28)
}
else {
int transducer = Float.floatToRawIntBits(f);
/*
* To avoid causing slow arithmetic on subnormals,
* the scaling to determine when f's significand
* is normalized is done in integer arithmetic.
* (There must be at least one "1" bit in the
* significand since zero has been screened out.)
*/
// isolate significand bits
transducer &= FloatConsts.SIGNIF_BIT_MASK;
assert(transducer != 0);
// This loop is simple and functional. We might be
// able to do something more clever that was faster;
// e.g. number of leading zero detection on
// (transducer << (# exponent and sign bits)).
while (transducer <
(1 << (FloatConsts.SIGNIFICAND_WIDTH - 1))) {
transducer *= 2;
exponent--;
}
exponent++;
assert( exponent >=
FloatConsts.MIN_EXPONENT - (FloatConsts.SIGNIFICAND_WIDTH-1) &&
exponent < FloatConsts.MIN_EXPONENT);
return exponent;
}
default:
assert( exponent >= FloatConsts.MIN_EXPONENT &&
exponent <= FloatConsts.MAX_EXPONENT);
return exponent;
}
}
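/*
 * Illustrative sketch, not part of the original source. The comments in
 * the ilogb methods above suggest that leading-zero detection could
 * replace the normalization loop for subnormals. One possible form for
 * double is shown here, assuming java.lang.Math.getExponent and
 * Long.numberOfLeadingZeros are available. The class name IlogbSketch and
 * the literal significand mask are assumptions, not names from this file.

```java
final class IlogbSketch {
    // Assumed equal to DoubleConsts.SIGNIF_BIT_MASK (52 stored bits).
    static final long SIGNIF_MASK = 0x000FFFFFFFFFFFFFL;

    static int ilogb(double d) {
        int exponent = Math.getExponent(d);
        if (exponent == Double.MAX_EXPONENT + 1) {      // NaN or infinity
            return Double.isNaN(d) ? (1 << 30) : (1 << 28);
        }
        if (exponent == Double.MIN_EXPONENT - 1) {      // zero or subnormal
            if (d == 0.0) {
                return -(1 << 28);
            }
            long signif = Double.doubleToRawLongBits(d) & SIGNIF_MASK;
            // A 52-bit field whose top bit is set has 12 leading zeros in a
            // long; every extra leading zero lowers the exponent by one.
            int shift = Long.numberOfLeadingZeros(signif) - 12;
            return Double.MIN_EXPONENT - 1 - shift;
        }
        return exponent;                                // normal value
    }

    public static void main(String[] args) {
        System.out.println(ilogb(1.0));
        System.out.println(ilogb(Double.MIN_VALUE));
        System.out.println(ilogb(Double.MIN_NORMAL / 2));
    }
}
```

 * The subnormal branch is meant to return the same values as the loop
 * above, from MIN_EXPONENT - 1 for the largest subnormals down to
 * MIN_EXPONENT - 52 for Double.MIN_VALUE.
 */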
/*
* The scalb operation should be reasonably fast; however, there
* are tradeoffs in writing a method to minimize the worst case
* performance and writing a method to minimize the time for
* expected common inputs. Some processors operate very slowly on
* subnormal operands, taking hundreds or thousands of cycles for
* one floating-point add or multiply as opposed to, say, four
* cycles for normal operands. For processors with very slow
* subnormal execution, scalb would be fastest if written entirely
* with integer operations; in other words, scalb would need to
* include the logic of performing correct rounding of subnormal
* values. This could be reasonably done in at most a few hundred
* cycles. However, this approach may penalize normal operations
* since at least the exponent of the floating-point argument must
* be examined.
*
* The approach taken in this implementation is a compromise.
* Floating-point multiplication is used to do most of the work;
* but knowingly multiplying by a subnormal scaling factor is
* avoided. However, the floating-point argument is not examined
* to see whether or not it is subnormal since subnormal inputs
* are assumed to be rare. At most three multiplies are needed to
* scale from the largest to smallest exponent ranges (scaling
* down, at most two multiplies are needed if subnormal scaling
* factors are allowed). However, in this implementation an
* expensive integer remainder operation is avoided at the cost of
* requiring five floating-point multiplies in the worst case,
* which should still be a performance win.
*
* If scaling of entire arrays is a concern, it would probably be
* more efficient to provide a double[] scalb(double[], int)
* version of scalb to avoid having to recompute the needed
* scaling factors for each floating-point value.
*/
/**
* Returns {@code d} ×
* 2<sup>{@code scale_factor}</sup> rounded as if performed
* by a single correctly rounded floating-point multiply to a
* member of the double value set. See section 4.2.3 of
* <cite>The Java™ Language Specification</cite>
* for a discussion of floating-point
* value sets. If the exponent of the result is between the
* {@code double}'s minimum exponent and maximum exponent,
* the answer is calculated exactly. If the exponent of the
* result would be larger than {@code double}'s maximum
* exponent, an infinity is returned. Note that if the result is
* subnormal, precision may be lost; that is, when {@code scalb(x,
* n)} is subnormal, {@code scalb(scalb(x, n), -n)} may
* not equal <i>x</i>. When the result is non-NaN, the result has
* the same sign as {@code d}.
*
*<p>
* Special cases:
* <ul>
* <li> If the first argument is NaN, NaN is returned.
* <li> If the first argument is infinite, then an infinity of the
* same sign is returned.
* <li> If the first argument is zero, then a zero of the same
* sign is returned.
* </ul>
*
* @param d number to be scaled by a power of two.
* @param scale_factor power of 2 used to scale {@code d}
* @return {@code d * }2<sup>{@code scale_factor}</sup>
* @author Joseph D. Darcy
*/
public static double scalb(double d, int scale_factor) {
/*
* This method does not need to be declared strictfp to
* compute the same correct result on all platforms. When
* scaling up, it does not matter what order the
* multiply-store operations are done; the result will be
* finite or overflow regardless of the operation ordering.
* However, to get the correct result when scaling down, a
* particular ordering must be used.
*
* When scaling down, the multiply-store operations are
* sequenced so that it is not possible for two consecutive
* multiply-stores to return subnormal results. If one
* multiply-store result is subnormal, the next multiply will
* round it away to zero. This is done by first multiplying
* by 2 ^ (scale_factor % n) and then multiplying several
* times by 2^n as needed, where n is the exponent of a number
* that is a convenient power of two. In this way, at most one
* real rounding error occurs. If the double value set is
* being used exclusively, the rounding will occur on a
* multiply. If the double-extended-exponent value set is
* being used, the products will (perhaps) be exact but the
* stores to d are guaranteed to round to the double value
* set.
*
* It is _not_ a valid implementation to first multiply d by
* 2^MIN_EXPONENT and then by 2 ^ (scale_factor %
* MIN_EXPONENT) since even in a strictfp program double
* rounding on underflow could occur; e.g. if the scale_factor
* argument was (MIN_EXPONENT - n) and the exponent of d was a
* little less than -(MIN_EXPONENT - n), meaning the final
* result would be subnormal.
*
* Since exact reproducibility of this method can be achieved
* without any undue performance burden, there is no
* compelling reason to allow double rounding on underflow in
* scalb.
*/
// magnitude of a power of two so large that scaling a finite
// nonzero value by it would be guaranteed to overflow or
// underflow; due to rounding, scaling down takes an
// additional power of two which is reflected here
final int MAX_SCALE = DoubleConsts.MAX_EXPONENT + -DoubleConsts.MIN_EXPONENT +
DoubleConsts.SIGNIFICAND_WIDTH + 1;
int exp_adjust = 0;
int scale_increment = 0;
double exp_delta = Double.NaN;
// Make sure scaling factor is in a reasonable range
if(scale_factor < 0) {
scale_factor = Math.max(scale_factor, -MAX_SCALE);
scale_increment = -512;
exp_delta = twoToTheDoubleScaleDown;
}
else {
scale_factor = Math.min(scale_factor, MAX_SCALE);
scale_increment = 512;
exp_delta = twoToTheDoubleScaleUp;
}
// Calculate (scale_factor % +/-512), 512 = 2^9, using
// technique from "Hacker's Delight" section 10-2.
int t = (scale_factor >> 9-1) >>> 32 - 9;
exp_adjust = ((scale_factor + t) & (512 -1)) - t;
d *= powerOfTwoD(exp_adjust);
scale_factor -= exp_adjust;
while(scale_factor != 0) {
d *= exp_delta;
scale_factor -= scale_increment;
}
return d;
}
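/*
 * Illustrative sketch, not part of the original source. Two points from
 * the scalb method above are isolated here: the shift-and-mask remainder
 * used in place of an integer remainder operation, and the documented
 * loss of precision when a result is subnormal. The class and method
 * names (ScalbRemainderSketch, rem512, roundTrips) are hypothetical, and
 * the round-trip check assumes java.lang.Math.scalb is available.

```java
final class ScalbRemainderSketch {
    // Remainder of n divided by 512, carrying the sign of n (like n % 512),
    // computed with shifts and a mask only.
    static int rem512(int n) {
        int t = (n >> (9 - 1)) >>> (32 - 9);  // 511 when n is negative, 0 otherwise
        return ((n + t) & (512 - 1)) - t;
    }

    // True when scaling x by 2^n and back by 2^-n recovers x exactly.
    static boolean roundTrips(double x, int n) {
        return Math.scalb(Math.scalb(x, n), -n) == x;
    }

    public static void main(String[] args) {
        System.out.println(rem512(-1075) + " vs " + (-1075 % 512));
        System.out.println(roundTrips(1.0, -1074));
        System.out.println(roundTrips(1.0 + Math.ulp(1.0), -1074));
    }
}
```

 * In the second round trip the intermediate value is subnormal, so its
 * low-order bits are rounded away and scaling back up cannot restore them.
 */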
/**
* Returns {@code f} ×
* 2<sup>{@code scale_factor}</sup> rounded as if performed
* by a single correctly rounded floating-point multiply to a
* member of the float value set. See section 4.2.3 of
* <cite>The Java™ Language Specification</cite>
* for a discussion of floating-point
* value sets. If the exponent of the result is between the
* {@code float}'s minimum exponent and maximum exponent, the
* answer is calculated exactly. If the exponent of the result
* would be larger than {@code float}'s maximum exponent, an
* infinity is returned. Note that if the result is subnormal,
* precision may be lost; that is, when {@code scalb(x, n)}
* is subnormal, {@code scalb(scalb(x, n), -n)} may not equal
* <i>x</i>. When the result is non-NaN, the result has the same
* sign as {@code f}.
*
*<p>
* Special cases:
* <ul>
* <li> If the first argument is NaN, NaN is returned.
* <li> If the first argument is infinite, then an infinity of the
* same sign is returned.
* <li> If the first argument is zero, then a zero of the same
* sign is returned.
* </ul>
*
* @param f number to be scaled by a power of two.
* @param scale_factor power of 2 used to scale {@code f}
* @return {@code f * }2<sup>{@code scale_factor}</sup>
* @author Joseph D. Darcy
*/
public static float scalb(float f, int scale_factor) {
// magnitude of a power of two so large that scaling a finite
// nonzero value by it would be guaranteed to overflow or
// underflow; due to rounding, scaling down takes an
// additional power of two which is reflected here
final int MAX_SCALE = FloatConsts.MAX_EXPONENT + -FloatConsts.MIN_EXPONENT +
FloatConsts.SIGNIFICAND_WIDTH + 1;
// Make sure scaling factor is in a reasonable range
scale_factor = Math.max(Math.min(scale_factor, MAX_SCALE), -MAX_SCALE);
/*
* Since +/- MAX_SCALE for float fits well within the double
* exponent range and float -> double conversion is exact,
* the multiplication below will be exact. Therefore, the
* rounding that occurs when the double product is cast to
* float will be the correctly rounded float result. Since
* all operations other than the final multiply will be exact,
* it is not necessary to declare this method strictfp.
*/
return (float)((double)f*powerOfTwoD(scale_factor));
}
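/*
 * Illustrative sketch, not part of the original source. It restates the
 * reasoning in scalb above: widen to double, multiply by an exact power of
 * two, and let the single cast back to float do the only rounding. The class
 * name ScalbSketch, the helper powerOfTwo (a hypothetical stand-in for
 * powerOfTwoD, whose body is not shown in this section) and the constant 24
 * for the float significand width are assumptions made for the example.
 *
```java
final class ScalbSketch {
    // Assumed value of FloatConsts.SIGNIFICAND_WIDTH (24 bits, implicit bit included).
    static final int FLOAT_SIGNIFICAND_WIDTH = 24;

    // Hypothetical stand-in for powerOfTwoD: builds 2^n from its bit pattern.
    // Only valid for n in the double normal exponent range [-1022, 1023].
    static double powerOfTwo(int n) {
        return Double.longBitsToDouble((long) (n + 1023) << 52);
    }

    static float scalb(float f, int scaleFactor) {
        // 127 + 126 + 24 + 1 = 278, far inside the double exponent range.
        final int maxScale = Float.MAX_EXPONENT - Float.MIN_EXPONENT
                + FLOAT_SIGNIFICAND_WIDTH + 1;
        int n = Math.max(Math.min(scaleFactor, maxScale), -maxScale);
        // The double product is exact; the only rounding is the cast to float.
        return (float) ((double) f * powerOfTwo(n));
    }

    public static void main(String[] args) {
        System.out.println(scalb(1.5f, 4));             // 1.5 * 2^4
        System.out.println(scalb(Float.MAX_VALUE, 1));  // overflows to infinity
        System.out.println(scalb(Float.MIN_VALUE, -1)); // underflows to zero
    }
}
```
 *
 * Clamping the scale factor first is what keeps the bit-pattern helper inside
 * its valid range, whatever int the caller passes.
 */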
/**
* Returns the floating-point number adjacent to the first
* argument in the direction of the second argument. If both
* arguments compare as equal the second argument is returned.
*
* <p>
* Special cases:
* <ul>
* <li> If either argument is a NaN, then NaN is returned.
*
* <li> If both arguments are signed zeros, {@code direction}
* is returned unchanged (as implied by the requirement of
* returning the second argument if the arguments compare as
* equal).
*
* <li> If {@code start} is
* ±{@code Double.MIN_VALUE} and {@code direction}
* has a value such that the result should have a smaller
* magnitude, then a zero with the same sign as {@code start}
* is returned.
*
* <li> If {@code start} is infinite and
* {@code direction} has a value such that the result should
* have a smaller magnitude, {@code Double.MAX_VALUE} with the
* same sign as {@code start} is returned.
*
* <li> If {@code start} is equal to ±
* {@code Double.MAX_VALUE} and {@code direction} has a
* value such that the result should have a larger magnitude, an
* infinity with same sign as {@code start} is returned.
* </ul>
*
* @param start starting floating-point value
* @param direction value indicating which of
* {@code start}'s neighbors or {@code start} should
* be returned
* @return The floating-point number adjacent to {@code start} in the
* direction of {@code direction}.
* @author Joseph D. Darcy
*/
public static double nextAfter(double start, double direction) {
/*
* The cases:
*
* nextAfter(+infinity, 0) == MAX_VALUE
* nextAfter(+infinity, +infinity) == +infinity
* nextAfter(-infinity, 0) == -MAX_VALUE
* nextAfter(-infinity, -infinity) == -infinity
*
* are naturally handled without any additional testing
*/
// First check for NaN values
if (isNaN(start) || isNaN(direction)) {
// return a NaN derived from the input NaN(s)
return start + direction;
} else if (start == direction) {
return direction;
} else { // start > direction or start < direction
// Add +0.0 to get rid of a -0.0 (+0.0 + -0.0 => +0.0)
// then bitwise convert start to integer.
long transducer = Double.doubleToRawLongBits(start + 0.0d);
/*
* IEEE 754 floating-point numbers are lexicographically
* ordered if treated as signed-magnitude integers.
* Since Java's integers are two's complement,
* "incrementing" the two's complement representation of a
* logically negative floating-point value *decrements*
* the signed-magnitude representation. Therefore, when
* the integer representation of a floating-point value
* is less than zero, the adjustment to the representation
* is in the opposite direction from what would be expected
* at first.
*/
if (direction > start) { // Calculate next greater value
transducer = transducer + (transducer >= 0L ? 1L:-1L);
} else { // Calculate next lesser value
assert direction < start;
if (transducer > 0L)
--transducer;
else
if (transducer < 0L )
++transducer;
/*
* transducer==0, the result is -MIN_VALUE
*
* The transition from zero (implicitly
* positive) to the smallest negative
* signed magnitude value must be done
* explicitly.
*/
else
transducer = DoubleConsts.SIGN_BIT_MASK | 1L;
}
return Double.longBitsToDouble(transducer);
}
}
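/*
 * Illustrative sketch, not part of the original source. It checks the
 * ordering argument used in nextAfter above: adding one to the raw bits moves
 * a positive value up but a negative value down, and zero cannot be crossed
 * by plain integer arithmetic. The class and method names are hypothetical,
 * Long.MIN_VALUE is assumed to equal DoubleConsts.SIGN_BIT_MASK, and the
 * public java.lang.Math methods serve as the reference.
 *
```java
final class BitOrderSketch {
    // Adds delta to the raw two's complement bit pattern of d.
    static double stepBits(double d, long delta) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d) + delta);
    }

    // The explicit zero crossing: sign bit set, lowest significand bit set.
    static double smallestNegative() {
        return Double.longBitsToDouble(Long.MIN_VALUE | 1L);
    }

    public static void main(String[] args) {
        // Positive value: +1 on the bits moves toward +infinity.
        System.out.println(stepBits(1.0, 1L) == Math.nextUp(1.0));
        // Negative value: +1 on the bits moves toward -infinity instead.
        System.out.println(stepBits(-1.0, 1L)
                == Math.nextAfter(-1.0, Double.NEGATIVE_INFINITY));
        // Decrementing the bits of +0.0 yields a NaN pattern, not -MIN_VALUE.
        System.out.println(Double.isNaN(stepBits(0.0, -1L)));
        System.out.println(smallestNegative() == -Double.MIN_VALUE);
    }
}
```
 *
 * The third check is why the transducer == 0 branch has to build
 * SIGN_BIT_MASK | 1 by hand.
 */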
/**
* Returns the floating-point number adjacent to the first
* argument in the direction of the second argument. If both
* arguments compare as equal, the second argument is returned.
*
* <p>
* Special cases:
* <ul>
* <li> If either argument is a NaN, then NaN is returned.
*
* <li> If both arguments are signed zeros, a {@code float}
* zero with the same sign as {@code direction} is returned
* (as implied by the requirement of returning the second argument
* if the arguments compare as equal).
*
* <li> If {@code start} is
* ±{@code Float.MIN_VALUE} and {@code direction}
* has a value such that the result should have a smaller
* magnitude, then a zero with the same sign as {@code start}
* is returned.
*
* <li> If {@code start} is infinite and
* {@code direction} has a value such that the result should
* have a smaller magnitude, {@code Float.MAX_VALUE} with the
* same sign as {@code start} is returned.
*
* <li> If {@code start} is equal to ±
* {@code Float.MAX_VALUE} and {@code direction} has a
* value such that the result should have a larger magnitude, an
* infinity with same sign as {@code start} is returned.
* </ul>
*
* @param start starting floating-point value
* @param direction value indicating which of
* {@code start}'s neighbors or {@code start} should
* be returned
* @return The floating-point number adjacent to {@code start} in the
* direction of {@code direction}.
* @author Joseph D. Darcy
*/
public static float nextAfter(float start, double direction) {
/*
* The cases:
*
* nextAfter(+infinity, 0) == MAX_VALUE
* nextAfter(+infinity, +infinity) == +infinity
* nextAfter(-infinity, 0) == -MAX_VALUE
* nextAfter(-infinity, -infinity) == -infinity
*
* are naturally handled without any additional testing
*/
// First check for NaN values
if (isNaN(start) || isNaN(direction)) {
// return a NaN derived from the input NaN(s)
return start + (float)direction;
} else if (start == direction) {
return (float)direction;
} else { // start > direction or start < direction
// Add +0.0 to get rid of a -0.0 (+0.0 + -0.0 => +0.0)
// then bitwise convert start to integer.
int transducer = Float.floatToRawIntBits(start + 0.0f);
/*
* IEEE 754 floating-point numbers are lexicographically
* ordered if treated as signed-magnitude integers.
* Since Java's integers are two's complement,
* "incrementing" the two's complement representation of a
* logically negative floating-point value *decrements*
* the signed-magnitude representation. Therefore, when
* the integer representation of a floating-point value
* is less than zero, the adjustment to the representation
* is in the opposite direction from what would be expected
* at first.
*/
if (direction > start) {// Calculate next greater value
transducer = transducer + (transducer >= 0 ? 1:-1);
} else { // Calculate next lesser value
assert direction < start;
if (transducer > 0)
--transducer;
else
if (transducer < 0 )
++transducer;
/*
* transducer==0, the result is -MIN_VALUE
*
* The transition from zero (implicitly
* positive) to the smallest negative
* signed magnitude value must be done
* explicitly.
*/
else
transducer = FloatConsts.SIGN_BIT_MASK | 1;
}
return Float.intBitsToFloat(transducer);
}
}
/**
* Returns the floating-point value adjacent to {@code d} in
* the direction of positive infinity. This method is
* semantically equivalent to {@code nextAfter(d,
* Double.POSITIVE_INFINITY)}; however, a {@code nextUp}
* implementation may run faster than its equivalent
* {@code nextAfter} call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is positive infinity, the result is
* positive infinity.
*
* <li> If the argument is zero, the result is
* {@code Double.MIN_VALUE}
*
* </ul>
*
* @param d starting floating-point value
* @return The adjacent floating-point value closer to positive
* infinity.
* @author Joseph D. Darcy
*/
public static double nextUp(double d) {
if( isNaN(d) || d == Double.POSITIVE_INFINITY)
return d;
else {
d += 0.0d;
return Double.longBitsToDouble(Double.doubleToRawLongBits(d) +
((d >= 0.0d)?+1L:-1L));
}
}
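/*
 * Illustrative sketch, not part of the original source. It shows why nextUp
 * above adds 0.0 before reading the bits: -0.0 compares equal to zero, so
 * without the normalization the step would start from the negative-zero bit
 * pattern and return -MIN_VALUE. The class name and the method
 * nextUpWithoutNormalizing are hypothetical and exist only for the comparison.
 *
```java
final class NextUpZeroSketch {
    // Same bit step as nextUp, but deliberately WITHOUT the "d += 0.0d" line.
    static double nextUpWithoutNormalizing(double d) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d)
                + ((d >= 0.0d) ? +1L : -1L));
    }

    static double nextUp(double d) {
        if (Double.isNaN(d) || d == Double.POSITIVE_INFINITY) {
            return d;
        }
        d += 0.0d; // -0.0 + 0.0 == +0.0, so zero always has the all-zero pattern
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d)
                + ((d >= 0.0d) ? +1L : -1L));
    }

    public static void main(String[] args) {
        System.out.println(nextUp(-0.0) == Double.MIN_VALUE);
        System.out.println(nextUpWithoutNormalizing(-0.0) == -Double.MIN_VALUE);
    }
}
```
 */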
/**
* Returns the floating-point value adjacent to {@code f} in
* the direction of positive infinity. This method is
* semantically equivalent to {@code nextAfter(f,
* Double.POSITIVE_INFINITY)}; however, a {@code nextUp}
* implementation may run faster than its equivalent
* {@code nextAfter} call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is positive infinity, the result is
* positive infinity.
*
* <li> If the argument is zero, the result is
* {@code Float.MIN_VALUE}
*
* </ul>
*
* @param f starting floating-point value
* @return The adjacent floating-point value closer to positive
* infinity.
* @author Joseph D. Darcy
*/
public static float nextUp(float f) {
if( isNaN(f) || f == FloatConsts.POSITIVE_INFINITY)
return f;
else {
f += 0.0f;
return Float.intBitsToFloat(Float.floatToRawIntBits(f) +
((f >= 0.0f)?+1:-1));
}
}
/**
* Returns the floating-point value adjacent to {@code d} in
* the direction of negative infinity. This method is
* semantically equivalent to {@code nextAfter(d,
* Double.NEGATIVE_INFINITY)}; however, a
* {@code nextDown} implementation may run faster than its
* equivalent {@code nextAfter} call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is negative infinity, the result is
* negative infinity.
*
* <li> If the argument is zero, the result is
* {@code -Double.MIN_VALUE}
*
* </ul>
*
* @param d starting floating-point value
* @return The adjacent floating-point value closer to negative
* infinity.
* @author Joseph D. Darcy
*/
public static double nextDown(double d) {
if( isNaN(d) || d == Double.NEGATIVE_INFINITY)
return d;
else {
if (d == 0.0)
return -Double.MIN_VALUE;
else
return Double.longBitsToDouble(Double.doubleToRawLongBits(d) +
((d > 0.0d)?-1L:+1L));
}
}
/**
* Returns the floating-point value adjacent to {@code f} in
* the direction of negative infinity. This method is
* semantically equivalent to {@code nextAfter(f,
* Float.NEGATIVE_INFINITY)}; however, a
* {@code nextDown} implementation may run faster than its
* equivalent {@code nextAfter} call.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, the result is NaN.
*
* <li> If the argument is negative infinity, the result is
* negative infinity.
*
* <li> If the argument is zero, the result is
* {@code -Float.MIN_VALUE}
*
* </ul>
*
* @param f starting floating-point value
* @return The adjacent floating-point value closer to negative
* infinity.
* @author Joseph D. Darcy
*/
public static double nextDown(float f) {
if( isNaN(f) || f == Float.NEGATIVE_INFINITY)
return f;
else {
if (f == 0.0f)
return -Float.MIN_VALUE;
else
return Float.intBitsToFloat(Float.floatToRawIntBits(f) +
((f > 0.0f)?-1:+1));
}
}
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. For this method, a NaN
* {@code sign} argument is always treated as if it were
* positive.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of {@code magnitude}
* and the sign of {@code sign}.
* @author Joseph D. Darcy
* @since 1.5
*/
public static double copySign(double magnitude, double sign) {
return rawCopySign(magnitude, (isNaN(sign)?1.0d:sign));
}
/**
* Returns the first floating-point argument with the sign of the
* second floating-point argument. For this method, a NaN
* {@code sign} argument is always treated as if it were
* positive.
*
* @param magnitude the parameter providing the magnitude of the result
* @param sign the parameter providing the sign of the result
* @return a value with the magnitude of {@code magnitude}
* and the sign of {@code sign}.
* @author Joseph D. Darcy
*/
public static float copySign(float magnitude, float sign) {
return rawCopySign(magnitude, (isNaN(sign)?1.0f:sign));
}
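/*
 * Illustrative sketch, not part of the original source. It shows one way the
 * sign transfer in copySign above could be done with bit masks, and how a NaN
 * sign argument is replaced by a positive value first. rawCopySign is named
 * in the source but its body is not shown in this section, so the version
 * here is an assumption; the class name is hypothetical.
 *
```java
final class CopySignSketch {
    // Assumed implementation: take the sign bit of 'sign' and every other
    // bit of 'magnitude'. Long.MIN_VALUE is the sign-bit mask.
    static double rawCopySign(double magnitude, double sign) {
        long signBit = Double.doubleToRawLongBits(sign) & Long.MIN_VALUE;
        long magnitudeBits = Double.doubleToRawLongBits(magnitude) & Long.MAX_VALUE;
        return Double.longBitsToDouble(signBit | magnitudeBits);
    }

    // A NaN sign argument is always treated as positive.
    static double copySign(double magnitude, double sign) {
        return rawCopySign(magnitude, Double.isNaN(sign) ? 1.0d : sign);
    }

    public static void main(String[] args) {
        System.out.println(copySign(3.0, -0.0));       // sign comes from negative zero
        System.out.println(copySign(-3.0, Double.NaN)); // NaN counts as positive
    }
}
```
 */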
/**
* Returns the size of an ulp of the argument. An ulp of a
* {@code double} value is the positive distance between this
* floating-point value and the {@code double} value next
* larger in magnitude. Note that for non-NaN <i>x</i>,
* <code>ulp(-<i>x</i>) == ulp(<i>x</i>)</code>.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive or negative infinity, then the
* result is positive infinity.
* <li> If the argument is positive or negative zero, then the result is
* {@code Double.MIN_VALUE}.
* <li> If the argument is ±{@code Double.MAX_VALUE}, then
* the result is equal to 2<sup>971</sup>.
* </ul>
*
* @param d the floating-point value whose ulp is to be returned
* @return the size of an ulp of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static double ulp(double d) {
int exp = getExponent(d);
switch(exp) {
case DoubleConsts.MAX_EXPONENT+1: // NaN or infinity
return Math.abs(d);
case DoubleConsts.MIN_EXPONENT-1: // zero or subnormal
return Double.MIN_VALUE;
default:
assert exp <= DoubleConsts.MAX_EXPONENT && exp >= DoubleConsts.MIN_EXPONENT;
// ulp(x) is usually 2^-(SIGNIFICAND_WIDTH-1)*(2^ilogb(x))
exp = exp - (DoubleConsts.SIGNIFICAND_WIDTH-1);
if (exp >= DoubleConsts.MIN_EXPONENT) {
return powerOfTwoD(exp);
}
else {
// return a subnormal result; left shift the integer
// representation of Double.MIN_VALUE by the appropriate
// number of positions
return Double.longBitsToDouble(1L <<
(exp - (DoubleConsts.MIN_EXPONENT - (DoubleConsts.SIGNIFICAND_WIDTH-1)) ));
}
}
}
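/*
 * Illustrative sketch, not part of the original source. It works through the
 * default branch of ulp above for finite, normal inputs: the ulp is
 * 2^(exponent - 52), returned as a normal power of two when it fits and
 * otherwise built by shifting the single bit of Double.MIN_VALUE. The class
 * name, the method ulpOfNormal and the constant 53 for the double significand
 * width are assumptions; java.lang.Math supplies the reference values.
 *
```java
final class UlpSketch {
    // Assumed value of DoubleConsts.SIGNIFICAND_WIDTH (53 bits, implicit bit included).
    static final int SIGNIFICAND_WIDTH = 53;

    // Valid for finite, normal d only. NaN, infinity, zero and subnormal
    // inputs belong to the other switch cases and are not handled here.
    static double ulpOfNormal(double d) {
        int exp = Math.getExponent(d) - (SIGNIFICAND_WIDTH - 1);
        if (exp >= Double.MIN_EXPONENT) {
            return Math.scalb(1.0, exp); // normal result: exactly 2^exp
        }
        // Subnormal result: shift the single bit of Double.MIN_VALUE (2^-1074) left.
        return Double.longBitsToDouble(
                1L << (exp - (Double.MIN_EXPONENT - (SIGNIFICAND_WIDTH - 1))));
    }

    public static void main(String[] args) {
        System.out.println(ulpOfNormal(1.0) == Math.nextUp(1.0) - 1.0);
        System.out.println(ulpOfNormal(Double.MAX_VALUE) == Math.scalb(1.0, 971));
        System.out.println(ulpOfNormal(Double.MIN_NORMAL) == Double.MIN_VALUE);
    }
}
```
 */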
/**
* Returns the size of an ulp of the argument. An ulp of a
* {@code float} value is the positive distance between this
* floating-point value and the {@code float} value next
* larger in magnitude. Note that for non-NaN <i>x</i>,
* <code>ulp(-<i>x</i>) == ulp(<i>x</i>)</code>.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive or negative infinity, then the
* result is positive infinity.
* <li> If the argument is positive or negative zero, then the result is
* {@code Float.MIN_VALUE}.
* <li> If the argument is ±{@code Float.MAX_VALUE}, then
* the result is equal to 2<sup>104</sup>.
* </ul>
*
* @param f the floating-point value whose ulp is to be returned
* @return the size of an ulp of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static float ulp(float f) {
int exp = getExponent(f);
switch(exp) {
case FloatConsts.MAX_EXPONENT+1: // NaN or infinity
return Math.abs(f);
case FloatConsts.MIN_EXPONENT-1: // zero or subnormal
return FloatConsts.MIN_VALUE;
default:
assert exp <= FloatConsts.MAX_EXPONENT && exp >= FloatConsts.MIN_EXPONENT;
// ulp(x) is usually 2^-(SIGNIFICAND_WIDTH-1)*(2^ilogb(x))
exp = exp - (FloatConsts.SIGNIFICAND_WIDTH-1);
if (exp >= FloatConsts.MIN_EXPONENT) {
return powerOfTwoF(exp);
}
else {
// return a subnormal result; left shift the integer
// representation of FloatConsts.MIN_VALUE by the appropriate
// number of positions
return Float.intBitsToFloat(1 <<
(exp - (FloatConsts.MIN_EXPONENT - (FloatConsts.SIGNIFICAND_WIDTH-1)) ));
}
}
}
/**
* Returns the signum function of the argument; zero if the argument
* is zero, 1.0 if the argument is greater than zero, -1.0 if the
* argument is less than zero.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive zero or negative zero, then the
* result is the same as the argument.
* </ul>
*
* @param d the floating-point value whose signum is to be returned
* @return the signum function of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static double signum(double d) {
return (d == 0.0 || isNaN(d))?d:copySign(1.0, d);
}
/**
* Returns the signum function of the argument; zero if the argument
* is zero, 1.0f if the argument is greater than zero, -1.0f if the
* argument is less than zero.
*
* <p>Special Cases:
* <ul>
* <li> If the argument is NaN, then the result is NaN.
* <li> If the argument is positive zero or negative zero, then the
* result is the same as the argument.
* </ul>
*
* @param f the floating-point value whose signum is to be returned
* @return the signum function of the argument
* @author Joseph D. Darcy
* @since 1.5
*/
public static float signum(float f) {
return (f == 0.0f || isNaN(f))?f:copySign(1.0f, f);
}
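/*
 * Illustrative sketch, not part of the original source. It exercises the
 * special cases listed for signum above, in particular that a zero argument
 * comes back with its sign intact. The class name is hypothetical, and the
 * public Math.copySign stands in for this class's own copySign, which is
 * safe here only because NaN is filtered out before the call.
 *
```java
final class SignumSketch {
    static double signum(double d) {
        // Zeros and NaN are returned unchanged; everything else maps to +/-1.0.
        return (d == 0.0 || Double.isNaN(d)) ? d : Math.copySign(1.0, d);
    }

    public static void main(String[] args) {
        System.out.println(signum(-2.5));
        System.out.println(signum(7.0));
        // Compare bit patterns, because -0.0 == 0.0 is true under ==.
        System.out.println(Double.doubleToRawLongBits(signum(-0.0))
                == Double.doubleToRawLongBits(-0.0));
        System.out.println(Double.isNaN(signum(Double.NaN)));
    }
}
```
 */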
}