/*
 * Copyright (c) 2009, 2020, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/**
 *******************************************************************************
 * Copyright (C) 1996-2014, International Business Machines Corporation and
 * others. All Rights Reserved.
 *******************************************************************************
 */

package jdk.internal.icu.lang;

import jdk.internal.icu.impl.UBiDiProps;
import jdk.internal.icu.impl.UCharacterProperty;
import jdk.internal.icu.text.Normalizer2;
import jdk.internal.icu.text.UTF16;
import jdk.internal.icu.util.VersionInfo;

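The `punct` recipe in the class comment tests a code point's general category against a set of categories by OR-ing one bit per category into a mask. This internal class is normally not accessible from application code. The following minimal sketch therefore applies the same bitmask technique to the public `java.lang.Character` API. As a result it uses Java's own category constants (`INITIAL_QUOTE_PUNCTUATION`, `FINAL_QUOTE_PUNCTUATION`, and so on), whose numeric values differ from ICU's. It also contrasts Java's `isWhitespace` and `isSpaceChar` on a no-break space and a tab, matching the whitespace comparison in the class comment. The class name `PosixClassSketch` is hypothetical.

```java
// Hypothetical helper (not part of ICU or the JDK) showing the POSIX "punct" bitmask test.
final class PosixClassSketch {
    // One bit per java.lang.Character general-category constant; all are < 32, so an int works.
    private static final int PUNCT_MASK =
            (1 << Character.DASH_PUNCTUATION)
            | (1 << Character.START_PUNCTUATION)
            | (1 << Character.END_PUNCTUATION)
            | (1 << Character.CONNECTOR_PUNCTUATION)
            | (1 << Character.OTHER_PUNCTUATION)
            | (1 << Character.INITIAL_QUOTE_PUNCTUATION)
            | (1 << Character.FINAL_QUOTE_PUNCTUATION);

    // True if the code point's general category is one of the seven P* categories.
    static boolean isPunct(int c) {
        return ((1 << Character.getType(c)) & PUNCT_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isPunct('!'));                   // true  (Po)
        System.out.println(isPunct('$'));                   // false (Sc is a symbol)
        // U+00A0 NO-BREAK SPACE is a space separator (Zs), but Java's isWhitespace excludes it.
        System.out.println(Character.isSpaceChar(0x00A0));  // true
        System.out.println(Character.isWhitespace(0x00A0)); // false
        // TAB is a whitespace ISO control, but not a Z-category separator.
        System.out.println(Character.isWhitespace('\t'));   // true
        System.out.println(Character.isSpaceChar('\t'));    // false
    }
}
```

A single mask test replaces seven separate comparisons, which is why the class comment expresses `punct` this way.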
/**
 * <p>The UCharacter class provides extensions to the
 * <a href="http://java.sun.com/j2se/1.5/docs/api/java/lang/Character.html">
 * java.lang.Character</a> class. These extensions provide support for
 * more Unicode properties and, together with the <a href=../text/UTF16.html>UTF16</a>
 * class, provide support for supplementary characters (those with code
 * points above U+FFFF).
 * Each ICU release supports the latest version of Unicode available at that time.
 *
 * <p>Code points are represented in these APIs using ints. While it would be
 * more convenient in Java to have a separate primitive datatype for them,
 * ints suffice in the meantime.
 *
 * <p>To use this class, add the jar file icu4j.jar to the
 * class path, since it contains the data files that supply the information
 * used by this class.<br>
 * E.g. in Windows: <br>
 * <code>set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar</code>.<br>
 * Alternatively, copy the files uprops.dat and
 * unames.icu from the icu4j source subdirectory
 * <i>$ICU4J_SRC/src/com.ibm.icu.impl.data</i> to your class directory
 * <i>$ICU4J_CLASS/com.ibm.icu.impl.data</i>.
 *
 * <p>Aside from the additions for UTF-16 support and the updated Unicode
 * properties, the main differences between UCharacter and Character are:
 * <ul>
 * <li> UCharacter is not designed to be a char wrapper and does not have
 *      APIs which involve management of that single char.<br>
 *      These include:
 *      <ul>
 *      <li> char charValue(),
 *      <li> int compareTo(java.lang.Character, java.lang.Character), etc.
 *      </ul>
 * <li> UCharacter does not include Character APIs that are deprecated, nor
 *      does it include the Java-specific character information, such as
 *      boolean isJavaIdentifierPart(char ch).
 * <li> Character maps characters 'A' - 'Z' and 'a' - 'z' to the numeric
 *      values '10' - '35'. UCharacter also does this in digit and
 *      getNumericValue, to adhere to the java semantics of these
 *      methods. New methods unicodeDigit and
 *      getUnicodeNumericValue do not treat the above code points
 *      as having numeric values. This is a semantic change from ICU4J 1.3.1.
 * </ul>
 * <p>
 * Further detail on differences can be determined using the program
 * <a href=
 * "http://source.icu-project.org/repos/icu/icu4j/trunk/src/com/ibm/icu/dev/test/lang/UCharacterCompare.java">
 * com.ibm.icu.dev.test.lang.UCharacterCompare</a>
 * </p>
 * <p>
 * In addition to Java compatibility functions, which calculate derived properties,
 * this API provides low-level access to the Unicode Character Database.
 * </p>
 * <p>
 * Unicode assigns each code point (not just assigned characters) values for
 * many properties.
 * Most of them are simple boolean flags, or constants from a small enumerated list.
 * For some properties, values are strings or other relatively more complex types.
 * </p>
 * <p>
 * For more information see
 * <a href="http://www.unicode.org/ucd/">"About the Unicode Character Database"</a>
 * (http://www.unicode.org/ucd/)
 * and the <a href="http://www.icu-project.org/userguide/properties.html">ICU
 * User Guide chapter on Properties</a>
 * (http://www.icu-project.org/userguide/properties.html).
 * </p>
 * <p>
 * There are also functions that provide easy migration from C/POSIX functions
 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 * </p>
 * <p>
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: there is no "istitle()" class for titlecase characters.
 * </p>
 * <p>
 * ICU 3.4 and later provide API access for all twelve C/POSIX character classes.
 * ICU implements them according to the Standard Recommendations in
 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
 * </p>
 * <p>
 * API access for C/POSIX character classes is as follows:
 * </p>
 * <pre>{@code
 * - alpha:     isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
 * - lower:     isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
 * - upper:     isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
 * - punct:     ((1<<getType(c)) & ((1<<DASH_PUNCTUATION)|(1<<START_PUNCTUATION)|
 *               (1<<END_PUNCTUATION)|(1<<CONNECTOR_PUNCTUATION)|(1<<OTHER_PUNCTUATION)|
 *               (1<<INITIAL_PUNCTUATION)|(1<<FINAL_PUNCTUATION)))!=0
 * - digit:     isDigit(c) or getType(c)==DECIMAL_DIGIT_NUMBER
 * - xdigit:    hasBinaryProperty(c, UProperty.POSIX_XDIGIT)
 * - alnum:     hasBinaryProperty(c, UProperty.POSIX_ALNUM)
 * - space:     isUWhiteSpace(c) or hasBinaryProperty(c, UProperty.WHITE_SPACE)
 * - blank:     hasBinaryProperty(c, UProperty.POSIX_BLANK)
 * - cntrl:     getType(c)==CONTROL
 * - graph:     hasBinaryProperty(c, UProperty.POSIX_GRAPH)
 * - print:     hasBinaryProperty(c, UProperty.POSIX_PRINT)
 * }</pre>
 * <p>
 * The C/POSIX character classes are also available in UnicodeSet patterns,
 * using patterns like [:graph:] or \p{graph}.
 * </p>
 * <p>
 * There are several ICU (and Java) whitespace functions.
 * Comparison:</p><ul>
 * <li> isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
 * most of general categories "Z" (separators) + most whitespace ISO controls
 * (including no-break spaces, but excluding IS1..IS4 and ZWSP)
 * <li> isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
 * <li> isSpaceChar: just Z (including no-break spaces)</ul>
 * <p>
 * This class is not subclassable.
 * </p>
 * @author Syn Wee Quek
 * @stable ICU 2.1
 * @see com.ibm.icu.lang.UCharacterEnums
 */
public final class UCharacter {

    /**
     * Joining Group constants.
     * @see UProperty#JOINING_GROUP
     * @stable ICU 2.4
     */
    public static interface JoiningGroup {
        /**
         * @stable ICU 2.4
         */
        public static final int NO_JOINING_GROUP = 0;
    }

    /**
     * Numeric Type constants.
     * @see UProperty#NUMERIC_TYPE
     * @stable ICU 2.4
     */
    public static interface NumericType {
        /**
         * @stable ICU 2.4
         */
        public static final int NONE = 0;
        /**
         * @stable ICU 2.4
         */
        public static final int DECIMAL = 1;
        /**
         * @stable ICU 2.4
         */
        public static final int DIGIT = 2;
        /**
         * @stable ICU 2.4
         */
        public static final int NUMERIC = 3;
        /**
         * @stable ICU 2.4
         */
        public static final int COUNT = 4;
    }

    /**
     * Hangul Syllable Type constants.
     *
     * @see UProperty#HANGUL_SYLLABLE_TYPE
     * @stable ICU 2.6
     */
    public static interface HangulSyllableType {
        /**
         * @stable ICU 2.6
         */
        public static final int NOT_APPLICABLE = 0; /*[NA]*/ /*See note !!*/
        /**
         * @stable ICU 2.6
         */
        public static final int LEADING_JAMO = 1; /*[L]*/
        /**
         * @stable ICU 2.6
         */
        public static final int VOWEL_JAMO = 2; /*[V]*/
        /**
         * @stable ICU 2.6
         */
        public static final int TRAILING_JAMO = 3; /*[T]*/
        /**
         * @stable ICU 2.6
         */
        public static final int LV_SYLLABLE = 4; /*[LV]*/
        /**
         * @stable ICU 2.6
         */
        public static final int LVT_SYLLABLE = 5; /*[LVT]*/
        /**
         * @stable ICU 2.6
         */
        public static final int COUNT = 6;
    }

    // public data members -----------------------------------------------

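The `HangulSyllableType` values can be illustrated with the standard Hangul syllable arithmetic. Precomposed syllables occupy U+AC00..U+D7A3 in runs of 28, and a syllable is LV exactly when its offset from U+AC00 is a multiple of 28, meaning it has no trailing consonant. The following is a simplified sketch, not ICU's data-driven implementation. It covers only the precomposed syllables and the basic Hangul Jamo block, and the class name `HangulTypeSketch` is hypothetical.

```java
// Hypothetical, simplified Hangul_Syllable_Type classifier; constant values mirror HangulSyllableType.
final class HangulTypeSketch {
    static final int NOT_APPLICABLE = 0, LEADING_JAMO = 1, VOWEL_JAMO = 2,
                     TRAILING_JAMO = 3, LV_SYLLABLE = 4, LVT_SYLLABLE = 5;

    private static final int S_BASE = 0xAC00;  // first precomposed syllable, U+AC00
    private static final int T_COUNT = 28;     // 27 trailing consonants + "none"
    private static final int S_COUNT = 11172;  // 19 leading * 21 vowels * 28 trailing slots

    static int type(int c) {
        int sIndex = c - S_BASE;
        if (sIndex >= 0 && sIndex < S_COUNT) {
            // Syllables without a trailing consonant sit at multiples of T_COUNT.
            return (sIndex % T_COUNT == 0) ? LV_SYLLABLE : LVT_SYLLABLE;
        }
        // Only the Hangul Jamo block (U+1100..U+11FF) is covered; extended jamo blocks are omitted.
        if (c >= 0x1100 && c <= 0x115F) return LEADING_JAMO;
        if (c >= 0x1160 && c <= 0x11A7) return VOWEL_JAMO;
        if (c >= 0x11A8 && c <= 0x11FF) return TRAILING_JAMO;
        return NOT_APPLICABLE;
    }

    public static void main(String[] args) {
        System.out.println(type(0xAC00)); // 4: U+AC00 is L+V
        System.out.println(type(0xAC01)); // 5: U+AC01 is L+V+T
        System.out.println(type(0x1100)); // 1: leading consonant jamo
    }
}
```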
    /**
     * The lowest Unicode code point value.
     * @stable ICU 2.1
     */
    public static final int MIN_VALUE = UTF16.CODEPOINT_MIN_VALUE;

    /**
     * The highest Unicode code point value (scalar value) according to the
     * Unicode Standard.
     * This is a 21-bit value (20.1 bits, rounded up).<br>
     * Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE
     * @stable ICU 2.1
     */
    public static final int MAX_VALUE = UTF16.CODEPOINT_MAX_VALUE;

    // public methods ----------------------------------------------------

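The width stated for `MAX_VALUE` can be checked with plain arithmetic. The largest code point, 0x10FFFF, needs 21 bits, while the number of code points, 0x110000, is about 2^20.09. The small sketch below uses only `java.lang`, taking `Character.MAX_CODE_POINT` as Java's counterpart of this class's `MAX_VALUE`. The class and method names are hypothetical.

```java
// Hypothetical helper: checks the bit width of the Unicode code space with java.lang only.
final class CodePointWidth {
    // Number of bits needed to hold a non-negative value.
    static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    // log2 of the number of code points (0 through MAX_CODE_POINT, i.e. 0x110000 values).
    static double log2CodePointCount() {
        return Math.log(Character.MAX_CODE_POINT + 1.0) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));  // 10ffff
        System.out.println(bitsNeeded(Character.MAX_CODE_POINT));           // 21
        System.out.println(Math.round(log2CodePointCount() * 100) / 100.0); // 20.09
    }
}
```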
    /**
     * Returns the numeric value of a decimal digit code point.
     * <br>This method observes the semantics of
     * <code>java.lang.Character.digit()</code>. Note that this
     * will return positive values for code points for which isDigit
     * returns false, just like java.lang.Character.
     * <br><em>Semantic Change:</em> In release 1.3.1 and
     * prior, this did not treat the European letters as having a
     * digit value, and also treated numeric letters and other numbers as
     * digits.
     * This has been changed to conform to the java semantics.
     * <br>A code point is a valid digit if and only if:
     * <ul>
     *   <li>ch is a decimal digit or one of the European letters, and
     *   <li>the value of ch is less than the specified radix.
     * </ul>
     * @param ch the code point to query
     * @param radix the radix
     * @return the numeric value represented by the code point in the
     * specified radix, or -1 if the code point is not a decimal digit
     * or if its value is too large for the radix
     * @stable ICU 2.1
     */
    public static int digit(int ch, int radix) {
        if (2 <= radix && radix <= 36) {
            int value = digit(ch);
            if (value < 0) {
                // ch is not a decimal digit, try latin letters
                value = UCharacterProperty.getEuropeanDigit(ch);
            }
            return (value < radix) ? value : -1;
        } else {
            return -1; // invalid radix
        }
    }

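The control flow of `digit(int, int)` can be sketched with public `java.lang.Character` calls. This is an approximation under stated assumptions. `decimalValue` stands in for `digit(int)` and accepts general category Nd only. `latinLetterValue` is a simplified, hypothetical stand-in for `UCharacterProperty.getEuropeanDigit` that handles ASCII letters only, whereas ICU's version also accepts fullwidth Latin letters.

```java
// Hypothetical sketch of the digit(ch, radix) control flow; not the ICU implementation.
final class RadixDigitSketch {
    // Stand-in for digit(int): decimal digit value for general category Nd, else -1.
    static int decimalValue(int ch) {
        return Character.getType(ch) == Character.DECIMAL_DIGIT_NUMBER ? Character.digit(ch, 10) : -1;
    }

    // Simplified stand-in for getEuropeanDigit: ASCII letters only.
    static int latinLetterValue(int ch) {
        if (ch >= 'A' && ch <= 'Z') return ch - 'A' + 10;
        if (ch >= 'a' && ch <= 'z') return ch - 'a' + 10;
        return -1;
    }

    static int digit(int ch, int radix) {
        if (radix < 2 || radix > 36) {
            return -1;                       // invalid radix
        }
        int value = decimalValue(ch);
        if (value < 0) {
            value = latinLetterValue(ch);    // not a decimal digit, try Latin letters
        }
        return (value < radix) ? value : -1; // -1 stays -1; too-large values become -1
    }

    public static void main(String[] args) {
        System.out.println(digit('7', 10)); // 7
        System.out.println(digit('a', 16)); // 10
        System.out.println(digit('g', 16)); // -1 (16 is not less than the radix)
    }
}
```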
    /**
     * Returns the numeric value of a decimal digit code point.
     * <br>This is a convenience overload of <code>digit(int, int)</code>
     * that provides a decimal radix.
     * <br><em>Semantic Change:</em> In release 1.3.1 and prior, this
     * treated numeric letters and other numbers as digits. This has
     * been changed to conform to the java semantics.
     * @param ch the code point to query
     * @return the numeric value represented by the code point,
     * or -1 if the code point is not a decimal digit or if its
     * value is too large for a decimal radix
     * @stable ICU 2.1
     */
    public static int digit(int ch) {
        return UCharacterProperty.INSTANCE.digit(ch);
    }

    /**
     * Returns a value indicating a code point's Unicode category.
     * Up-to-date Unicode implementation of java.lang.Character.getType()
     * except for the above mentioned code points that had their category
     * changed.<br>
     * Return results are constants from the interface
     * <a href=UCharacterCategory.html>UCharacterCategory</a><br>
     * <em>NOTE:</em> the UCharacterCategory values are <em>not</em> compatible with
     * those returned by java.lang.Character.getType. UCharacterCategory values
     * match the ones used in ICU4C, while java.lang.Character type
     * values, though similar, skip the value 17.
     * @param ch code point whose type is to be determined
     * @return category which is a value of UCharacterCategory
     * @stable ICU 2.1
     */
    public static int getType(int ch) {
        return UCharacterProperty.INSTANCE.getType(ch);
    }

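According to the NOTE, the two numberings agree except that `java.lang.Character` skips 17. Under that assumption, a value from `Character.getType` can be converted to a `UCharacterCategory` value by shifting everything above 17 down by one. A minimal sketch follows. The class and method names are hypothetical.

```java
// Hypothetical conversion from java.lang.Character.getType values to UCharacterCategory values.
final class CategoryMapping {
    // Java's type codes run 0..30 with no 17; ICU's run 0..29 without a gap.
    static int javaTypeToIcu(int javaType) {
        if (javaType < 0 || javaType > 30 || javaType == 17) {
            throw new IllegalArgumentException("not a java.lang.Character type: " + javaType);
        }
        return (javaType < 17) ? javaType : javaType - 1;
    }

    public static void main(String[] args) {
        System.out.println(javaTypeToIcu(Character.getType('A')));     // 1  (uppercase letter)
        System.out.println(javaTypeToIcu(Character.PRIVATE_USE));     // 17 (Java uses 18)
        System.out.println(javaTypeToIcu(Character.CURRENCY_SYMBOL)); // 25 (Java uses 26)
    }
}
```

Values below 17, such as the mark categories 6 to 8 declared at the end of this class, are identical in both schemes. Only the later categories need the shift.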
    /**
     * Returns the bidirectional category (Bidi_Class) of a code point.
     * For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional
     * property.<br>
     * The result is a constant from the interface
     * <a href=UCharacterDirection.html>UCharacterDirection</a>
     * @param ch the code point whose direction is to be determined
     * @return direction constant from UCharacterDirection.
     * @stable ICU 2.1
     */
    public static int getDirection(int ch) {
        return UBiDiProps.INSTANCE.getClass(ch);
    }

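The Bidi class codes are likewise not interchangeable with `java.lang.Character`'s. This class declares `RIGHT_TO_LEFT_ARABIC = 13`, whereas `Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC` is 2; both agree on 1 for right-to-left. Below is a short demo using `Character.getDirectionality`. The helper `isStrongRtl` and the class name are hypothetical.

```java
// Hypothetical demo contrasting java.lang.Character Bidi codes with this class's ICU constants.
final class DirectionalityDemo {
    // Values copied from UCharacter.RIGHT_TO_LEFT and UCharacter.RIGHT_TO_LEFT_ARABIC.
    static final int ICU_RIGHT_TO_LEFT = 1;
    static final int ICU_RIGHT_TO_LEFT_ARABIC = 13;

    // True for strong right-to-left code points (Bidi classes R and AL), using Java's codes.
    static boolean isStrongRtl(int cp) {
        byte d = Character.getDirectionality(cp);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        System.out.println(Character.getDirectionality('A'));    // 0 (L)
        System.out.println(Character.getDirectionality(0x05D0)); // 1 (R): HEBREW LETTER ALEF
        System.out.println(Character.getDirectionality(0x0627)); // 2 (AL): ARABIC LETTER ALEF, 13 in ICU
        System.out.println(isStrongRtl(0x0627));                 // true
    }
}
```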
    /**
     * Maps the specified code point to a "mirror-image" code point.
     * For code points with the "mirrored" property, implementations sometimes
     * need a "poor man's" mapping to another code point such that the default
     * glyph may serve as the mirror-image of the default glyph of the
     * specified code point.<br>
     * This is useful for text conversion to and from codepages with visual
     * order, and for displays without glyph selection capabilities.
     * @param ch code point whose mirror is to be retrieved
     * @return another code point that may serve as a mirror-image substitute,
     * or ch itself if there is no such mapping or ch does not have the
     * "mirrored" property
     * @stable ICU 2.1
     */
    public static int getMirror(int ch) {
        return UBiDiProps.INSTANCE.getMirror(ch);
    }

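`java.lang.Character` exposes the Bidi_Mirrored flag (`isMirrored`) but no mirror mapping. The sketch below therefore hard-codes a tiny, hypothetical table of ASCII pairs in place of the full Unicode BidiMirroring.txt data. It shows the contract described above: return the partner when one is known, otherwise `ch` itself.

```java
// Hypothetical mirror lookup over a tiny ASCII subset of the Unicode mirroring data.
final class MirrorSketch {
    // Characters at positions 2k and 2k+1 mirror each other.
    private static final String PAIRS = "()[]{}<>";

    static int getMirror(int ch) {
        if (!Character.isMirrored(ch)) {
            return ch;                // no "mirrored" property: return ch itself
        }
        int i = PAIRS.indexOf(ch);
        if (i < 0) {
            return ch;                // mirrored, but outside this sketch's table
        }
        return PAIRS.charAt(i ^ 1);   // the other element of the pair
    }

    public static void main(String[] args) {
        System.out.println((char) getMirror('(')); // )
        System.out.println((char) getMirror('<')); // >
        System.out.println((char) getMirror('a')); // a
    }
}
```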
    /**
     * Maps the specified character to its paired bracket character.
     * For Bidi_Paired_Bracket_Type!=None, this is the same as getMirror(int).
     * Otherwise c itself is returned.
     * See http://www.unicode.org/reports/tr9/
     *
     * @param c the code point to be mapped
     * @return the paired bracket code point,
     *         or c itself if there is no such mapping
     *         (Bidi_Paired_Bracket_Type=None)
     *
     * @see UProperty#BIDI_PAIRED_BRACKET
     * @see UProperty#BIDI_PAIRED_BRACKET_TYPE
     * @see #getMirror(int)
     * @stable ICU 52
     */
    public static int getBidiPairedBracket(int c) {
        return UBiDiProps.INSTANCE.getPairedBracket(c);
    }

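Paired brackets are a subset of mirrored characters. For example, `<` and `>` mirror each other but have Bidi_Paired_Bracket_Type=None, so per the contract above, `getBidiPairedBracket('<')` returns `<` itself. Below is a minimal sketch with a hypothetical ASCII-only table in place of the Unicode BidiBrackets.txt data.

```java
// Hypothetical paired-bracket lookup over an ASCII subset of the Unicode bracket data.
final class PairedBracketSketch {
    // '<' and '>' are deliberately absent: they are mirrored but not paired brackets.
    private static final String BRACKETS = "()[]{}";

    static int getBidiPairedBracket(int c) {
        int i = BRACKETS.indexOf(c);
        return (i < 0) ? c : BRACKETS.charAt(i ^ 1); // partner, or c itself (type None)
    }

    public static void main(String[] args) {
        System.out.println((char) getBidiPairedBracket('[')); // ]
        System.out.println((char) getBidiPairedBracket('<')); // <
    }
}
```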
    /**
     * Returns the canonical combining class of the argument code point.
     * @param ch code point whose combining class is to be retrieved
     * @return the combining class of the code point
     * @stable ICU 2.1
     */
    public static int getCombiningClass(int ch) {
        return Normalizer2.getNFDInstance().getCombiningClass(ch);
    }

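Java has no public accessor for the canonical combining class, but its effect is visible through `java.text.Normalizer`: NFD sorts adjacent combining marks by ascending combining class. Given the standard values (220 for U+0323 COMBINING DOT BELOW, 230 for U+0307 COMBINING DOT ABOVE), both mark orders below normalize to the same string. The class name is hypothetical.

```java
// Hypothetical demo: canonical reordering by combining class, observed through NFD.
final class CombiningClassDemo {
    static String nfd(String s) {
        return java.text.Normalizer.normalize(s, java.text.Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // U+0323 (ccc 220) and U+0307 (ccc 230) after the letter a, in both orders.
        String belowThenAbove = "a\u0323\u0307";
        String aboveThenBelow = "a\u0307\u0323";
        // NFD puts the lower combining class first, so both inputs become identical.
        System.out.println(nfd(aboveThenBelow).equals(belowThenAbove)); // true
        System.out.println(nfd(belowThenAbove).equals(belowThenAbove)); // true
    }
}
```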
    /**
     * Returns the version of Unicode data used.
     * @return the unicode version number used
     * @stable ICU 2.1
     */
    public static VersionInfo getUnicodeVersion() {
        return UCharacterProperty.INSTANCE.m_unicodeVersion_;
    }

    /**
     * Returns a code point corresponding to the two UTF16 characters.
     * @param lead the lead char
     * @param trail the trail char
     * @return code point if surrogate characters are valid.
     * @exception IllegalArgumentException thrown when argument characters do
     *            not form a valid codepoint
     * @stable ICU 2.1
     */
    public static int getCodePoint(char lead, char trail) {
        if (UTF16.isLeadSurrogate(lead) && UTF16.isTrailSurrogate(trail)) {
            return UCharacterProperty.getRawSupplementary(lead, trail);
        }
        throw new IllegalArgumentException("Illegal surrogate characters");
    }

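For valid surrogates, the result equals the standard UTF-16 decoding formula, which the sketch below spells out and which `Character.toCodePoint` also computes. It assumes `UTF16.isLeadSurrogate` and `isTrailSurrogate` correspond to `Character.isHighSurrogate` and `isLowSurrogate`. It does not claim to reproduce how `UCharacterProperty.getRawSupplementary` is written internally. The class name is hypothetical.

```java
// Hypothetical re-implementation of getCodePoint(lead, trail) with the standard UTF-16 formula.
final class SurrogateSketch {
    static int getCodePoint(char lead, char trail) {
        if (Character.isHighSurrogate(lead) && Character.isLowSurrogate(trail)) {
            // 10 bits from each surrogate, offset by 0x10000 (the first supplementary code point).
            return ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
        }
        throw new IllegalArgumentException("Illegal surrogate characters");
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(getCodePoint('\uD83D', '\uDE00'))); // 1f600
    }
}
```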
    /**
     * <p>Returns the "age" of the code point.</p>
     * <p>The "age" is the Unicode version when the code point was first
     * designated (as a non-character or for Private Use) or assigned a
     * character.</p>
     * <p>This can be useful to avoid emitting code points to receiving
     * processes that do not accept newer characters.</p>
     * <p>The data is from the UCD file DerivedAge.txt.</p>
     * @param ch The code point.
     * @return the Unicode version number
     * @stable ICU 2.6
     */
    public static VersionInfo getAge(int ch) {
        if (ch < MIN_VALUE || ch > MAX_VALUE) {
            throw new IllegalArgumentException("Codepoint out of bounds");
        }
        return UCharacterProperty.INSTANCE.getAge(ch);
    }

    /**
     * <p>Returns the property value for a Unicode property type of a code point.
     * Also returns binary and mask property values.</p>
     * <p>Unicode, especially in version 3.2, defines many more properties than
     * the original set in UnicodeData.txt.</p>
     * <p>The properties APIs are intended to reflect Unicode properties as
     * defined in the Unicode Character Database (UCD) and Unicode Technical
     * Reports (UTR). For details about the properties see
     * http://www.unicode.org/.</p>
     * <p>For names of Unicode properties see the UCD file PropertyAliases.txt.
     * </p>
     * <pre>
     * Sample usage:
     * int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
     * int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
     * boolean b = (ideo == 1);
     * </pre>
     * @param ch code point to test.
     * @param type UProperty selector constant, identifies which
     *        property to check. Must be
     *        UProperty.BINARY_START &lt;= type &lt; UProperty.BINARY_LIMIT or
     *        UProperty.INT_START &lt;= type &lt; UProperty.INT_LIMIT or
     *        UProperty.MASK_START &lt;= type &lt; UProperty.MASK_LIMIT.
     * @return numeric value that is directly the property value or,
     *         for enumerated properties, corresponds to the numeric value of
     *         the enumerated constant of the respective property value
     *         enumeration type (cast to enum type if necessary).
     *         Returns 0 or 1 (for false / true) for binary Unicode properties.
     *         Returns a bit-mask for mask properties.
     *         Returns 0 if 'type' is out of bounds or if the Unicode version
     *         does not have data for the property at all, or not for this code
     *         point.
     * @see UProperty
     * @see #hasBinaryProperty
     * @see #getIntPropertyMinValue
     * @see #getIntPropertyMaxValue
     * @see #getUnicodeVersion
     * @stable ICU 2.4
     */
    // for BiDiBase.java
    public static int getIntPropertyValue(int ch, int type) {
        return UCharacterProperty.INSTANCE.getIntPropertyValue(ch, type);
    }

    // private constructor -----------------------------------------------

    /**
     * Private constructor to prevent instantiation
     */
    private UCharacter() {
    }

    /*
     * Copied from UCharacterEnums.java
     */

    /**
     * Character type Mn
     * @stable ICU 2.1
     */
    public static final byte NON_SPACING_MARK = 6;

    /**
     * Character type Me
     * @stable ICU 2.1
     */
    public static final byte ENCLOSING_MARK = 7;

    /**
     * Character type Mc
     * @stable ICU 2.1
     */
    public static final byte COMBINING_SPACING_MARK = 8;

    /**
     * Character type count
     * @stable ICU 2.1
     */
    public static final byte CHAR_CATEGORY_COUNT = 30;

    /**
     * Directional type R
     * @stable ICU 2.1
     */
    public static final int RIGHT_TO_LEFT = 1;

    /**
     * Directional type AL
     * @stable ICU 2.1
     */
    public static final int RIGHT_TO_LEFT_ARABIC = 13;
}