UTF-16

UTF-16 (16-bit Unicode Transformation Format) is a standard method of encoding Unicode character data. UTF-16 is part of the Unicode Standard version 3.0 and later versions, and it can encode all currently defined Unicode characters. UTF-16 is specified in Annex Q of the ISO/IEC 10646 standard and in IETF RFC 2781.

Unicode is designed to accommodate all of the world's known writing systems. The standard currently defines three encoding forms for representing Unicode characters: UTF-8, UTF-16 and UTF-32.

Each character is assigned a number called a code point, conventionally written as U+ followed by a hexadecimal value (for example, U+0041 for the letter A). An encoding form defines how each code point is represented in binary form, for example in a file, as a sequence of fixed-size code units. The Unicode code space runs from 0 to 10FFFF in hexadecimal, which gives 1,114,112 possible code points. These code points are divided into 17 planes of 65,536 code points each. Planes 0 through 2 are the most commonly used:

  • Plane 0, known as the Basic Multilingual Plane (BMP), contains characters for almost all modern languages as well as most common special characters.
  • Plane 1, known as the Supplementary Multilingual Plane (SMP), is used primarily for historic scripts such as Linear B and for musical and mathematical symbols.
  • Plane 2, known as the Supplementary Ideographic Plane (SIP), is used for about 40,000 unified Han (CJK) ideographs that are seldom used in daily written communication.
The remaining planes are, as yet, largely unused.
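The code-space and plane arithmetic above can be illustrated with a short Python sketch. It is only an illustration, not part of any standard API. The names `plane_of`, `describe` and `PLANE_NAMES` are hypothetical, and only the three planes listed above are named. Because each plane holds 0x10000 (65,536) code points, a code point's plane number is simply its value shifted right by 16 bits.

```python
# Unicode code space: U+0000 through U+10FFFF inclusive.
MAX_CODE_POINT = 0x10FFFF
CODE_SPACE_SIZE = MAX_CODE_POINT + 1   # 1,114,112 code points
PLANE_SIZE = 0x10000                   # 65,536 code points per plane

# Hypothetical lookup table naming only the three planes described above.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
}


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains code point cp."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    # Each plane spans 0x10000 code points, so the plane is the bits above the low 16.
    return cp >> 16


def describe(cp: int) -> str:
    """Format a code point together with the plane it belongs to."""
    p = plane_of(cp)
    return f"U+{cp:04X} is in plane {p}: {PLANE_NAMES.get(p, 'other plane')}"


print(CODE_SPACE_SIZE, CODE_SPACE_SIZE // PLANE_SIZE)  # 1114112 17
print(describe(0x0041))    # LATIN CAPITAL LETTER A, plane 0
print(describe(0x10000))   # LINEAR B SYLLABLE B008 A, plane 1
print(describe(0x20000))   # first CJK Unified Ideographs Extension B character, plane 2
```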

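UTF-16 encodes each code point as either one or two 16-bit code units. Code points in the BMP (U+0000 to U+FFFF, except the range U+D800 to U+DFFF, which is reserved for surrogates) are encoded as a single code unit with the same value. Code points in Planes 1 through 16 (U+10000 to U+10FFFF) are encoded as a surrogate pair: a high surrogate in the range D800 to DBFF followed by a low surrogate in the range DC00 to DFFF. Each 16-bit code unit must be stored as two 8-bit bytes (octets), and the order of those bytes differs between systems. For that reason there are three UTF-16 encoding schemes:

  • UTF-16BE (big-endian).
  • UTF-16LE (little-endian).
  • UTF-16, in which an optional byte order mark (BOM, U+FEFF) at the start of the data indicates the byte order. When no BOM is present, the data is assumed to be big-endian.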
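The surrogate-pair arithmetic and the three byte-order schemes can be sketched in Python as follows. This is a simplified illustration under stated assumptions: input is a list of integer code points, and error handling is minimal. The helper names (`encode_utf16_units`, `decode_utf16_units`, `serialize`, `deserialize`) are hypothetical. Real Python code would normally use the built-in `utf-16-be`, `utf-16-le` and `utf-16` codecs instead. The example character, U+1D11E (MUSICAL SYMBOL G CLEF), lies in Plane 1.

```python
def encode_utf16_units(cp: int) -> list[int]:
    """Hypothetical helper: encode one code point as 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if cp < 0x10000:
        return [cp]                    # BMP: one code unit with the same value
    v = cp - 0x10000                   # 20-bit value in 0x00000..0xFFFFF
    high = 0xD800 | (v >> 10)          # top 10 bits -> high surrogate (D800..DBFF)
    low = 0xDC00 | (v & 0x3FF)         # low 10 bits -> low surrogate (DC00..DFFF)
    return [high, low]


def decode_utf16_units(units: list[int]) -> list[int]:
    """Hypothetical helper: decode UTF-16 code units back into code points."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            out.append(u)
            i += 1
    return out


def serialize(units: list[int], byte_order: str, bom: bool = False) -> bytes:
    """Write code units as octets; byte_order is "big" or "little"."""
    if bom:
        units = [0xFEFF] + list(units)  # BOM is U+FEFF written in the chosen byte order
    return b"".join(u.to_bytes(2, byte_order) for u in units)


def deserialize(data: bytes) -> list[int]:
    """Read the UTF-16 scheme: honor a BOM if present, else assume big-endian."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    order = "big"
    if data[:2] == b"\xfe\xff":
        data = data[2:]
    elif data[:2] == b"\xff\xfe":
        order, data = "little", data[2:]
    return [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data), 2)]


units = encode_utf16_units(0x1D11E)                 # MUSICAL SYMBOL G CLEF
print([f"{u:04X}" for u in units])                 # ['D834', 'DD1E']
print(serialize(units, "big").hex())               # d834dd1e (UTF-16BE)
print(serialize(units, "little").hex())            # 34d81edd (UTF-16LE)
print(serialize(units, "little", bom=True).hex())  # fffe34d81edd (UTF-16 with BOM)
```

The three `serialize` calls show that the same two code units produce different byte sequences under each scheme. Only the BOM-prefixed form can be read back unambiguously without knowing the byte order in advance.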

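UTF-16 is sometimes used interchangeably with UCS-2, although such use is not strictly correct. UCS-2 is a fixed-width encoding that uses exactly one 16-bit unit per character and has no surrogate mechanism. It can therefore represent only the characters in the BMP. For text that contains only BMP characters, the two encodings produce identical code units.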
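The practical difference between UCS-2 and UTF-16 can be shown in a few lines of Python. This is a minimal sketch. `ucs2_encode` is a hypothetical encoder written for illustration, because Python ships no UCS-2 codec. `utf16_units` is a hypothetical helper built on the standard `utf-16-be` codec. The test character U+10000 (LINEAR B SYLLABLE B008 A) comes from Plane 1.

```python
def ucs2_encode(text: str) -> list[int]:
    """Hypothetical UCS-2 encoder: exactly one 16-bit unit per character."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no surrogate mechanism, so characters outside the BMP fail.
            raise ValueError(f"U+{cp:04X} cannot be represented in UCS-2")
        units.append(cp)
    return units


def utf16_units(text: str) -> list[int]:
    """Hypothetical helper: UTF-16 code units via Python's built-in UTF-16BE codec."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_text = "Größe"                                       # BMP only: both encodings agree
print(ucs2_encode(bmp_text) == utf16_units(bmp_text))   # True

linear_b = "\U00010000"                                  # LINEAR B SYLLABLE B008 A (Plane 1)
print([f"{u:04X}" for u in utf16_units(linear_b)])       # ['D800', 'DC00']
try:
    ucs2_encode(linear_b)
except ValueError as err:
    print(err)                                           # U+10000 cannot be represented in UCS-2
```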

Read more about it at:
> UTF-16 is specified in IETF RFC 2781.
> Jbrowse.com offers a Unicode Tutorial / FAQ / Bluffer's Guide.
> For general information on Unicode, including code charts, see the official Unicode site.
> Wikipedia provides an illustrated explanation of how languages map into the BMP, among lots of other useful information.
Last updated on: Sep 13, 2007

All Rights Reserved, Copyright 2008, TechTarget