Application of IANA Charset Registration for GB18030 ---------------------------------------------------- (last updated 2002-05-23) Charset name: GB18030 Charset aliases: Currently none. Suitability for use in MIME text: Yes Published specification(s): The official GB 18030-2000 standard was created by the Chinese IT Standardization Technical Committee and was published in print by the China Standard Press, Beijing, March 17, 2000: Chinese National Standard GB 18030-2000: Information Technology -- Chinese ideograms coded character set for information interchange -- Extension for the basic set Xinxi Jishu -- Xinxi Jiaohuan Yong Hanzi Bianma Zifuji -- Jibenji de Kuochong) The mapping data was re-released on November 30, 2000, mainly to correct the mapping of the "Euro" character and to exclude the surrogate area. Dirk Meyer (Adobe Systems) has kindly provided an English summary, explanations, and remarks of the GB 18030-2000 standard on-line (February 16, 2001): http://examples.oreilly.com/cjkvinfo/pdf/GB18030_Summary.pdf Markus Scherer (IBM) also published "GB 18030: A mega-codepage: Exploring the history and structure of the new Chinese Unicode standard" on-line (February 2001): http://oss.software.ibm.com/icu/docs/papers/gb18030.html Meyer's presentation at the 18th International Unicode Conference, "Two New Chinese Character Standards: HK SCS / GB 18030-2000" (Hong Kong, April 2001), provides insightful historical background: http://examples.oreilly.com/cjkvinfo/pdf/IUC18B17.pdf ISO 10646 equivalency table: Markus Scherer (IBM) et al. have prepared an authorative GB18030 and ISO 10646 mapping table with the latest revisions in CharMapML (XML) format (ref. Unicode Technical Report #22). It is available on-line at: http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml Additional information: The People's Republic of China has already expressed her fundamental consent to support the combined efforts of the ISO/IEC and the Unicode Consortium through publishing a Chinese National Standard that was code- and character- compatible with ISO 10646 / Unicode. This standard was named GB 13000. Since the legacy GB 2312-1980 standard and GBK (1995) specification is still widely used, it is important to provide a smooth migration path towards GB 13000. GBK was the first step in this direction. The new GB 18030-2000 standard "replaces" GBK: it retains legacy encoding compatibility, and also provides for a complete and final mechanism to include future extensions of Unicode. In a nutshell, it is the Chinese version of UTF-8: whereas UTF-8 maintains compatibility with ASCII, GB18030 maintains compatibility with GB2312/GBK and provides full ISO 10646 compatibility. Part of the mapping data is from a lookup table (similar to GBK). The rest is calculated algorithmically. A brief summary of the GB18030 codepoints is listed below: 1-byte: {00-7F} Same as GB 11383-89 / US-ASCII / ISO 646 IRV (1991) 2-byte: {81-FE}{40-7E,80-FE} A full superset of GBK, but with fallback mappings removed so that only 1-to-1 roundtrip mappings remain 4-byte: {81-FE}{30-39}{81-FE}{30-39} Maps linearly to ISO 10646 starting from GB+81308130 = U+0080 up to U+FFFF, and from GB+90308130 = U+10000 up to U+10FFFF, skipping the mappings already defined in the 1-byte and 2-byte areas. The surrogate area is excluded. The current GB18030 standard specifies the addition of CJK Extension A, and ethnic minority languages Mongolian, Tibetan, Uyghur (Arabic) and Yi. Since GB18030 is fully ISO 10646 compatible, it readily supports CJK Extension B and other languages. GB18030 is a "mandatory" standard: starting September 1, 2001, all operating systems sold in Mainland China must support this standard. (Embedded systems and PDAs are currently exempt.) Eventually, end-user applications must also fully support the GB18030 standard--mere UTF-8 support is not enough. Harsh it may seem, this regulation is a smart move: it ensures that rare Chinese characters found in personal names, geographic names and ancient literature, as well as minority languages, may finally be computerized and exchanged all across the country. Person & email address to contact for further information: CHEN Zhuang chenzh&cesi.ac.cn Chinese IT Standardization Technical Committee Chinese Electronics Standardization Institute Additionally, please Cc: ietf-charsets&iana.org to keep the community informed, as the implementation of the GB18030 standard on operating systems and applications is a community effort. Intended usage: COMMON Acknowledgement: Appreciations and kudos to the Internet community for documenting and explaining the GB 18030-2000 standard to the whole world; for implementing this new standard in software so quickly; and for their comments to this registration. Special thanks to Dirk Meyer for his translation of the GB18030 standard, and to Markus Scherer for his GB18030/Unicode mapping table. This registration also contains excerpts of their writings. Compiled by Anthony Fok , March 15, 2002. (created 2002-05-23)