TLD: xn--mgberp4a5d4ar
Language Tag: AR
Language Description: Arabic
Version: 1.0
Effective Date: 01 Jan 2010

Registry: Saudi Network Information Center
Contact: Abdulaziz Al-Zoman azoman[at]citc.gov.sa
Address: SaudiNIC, General directorate of Internet services, CITC, P.O. Box 75606, Riyadh 11588, Saudi Arabia
Telephone: +966-1-263-9233
Fax: +966-1-263-9393
Website: http://www.nic.net.sa/
Relevant Policy Document URL: http://www.nic.net.sa/docs/ADN/Arabic_Domain_Name_Registration_Regulation_For_Saudi_IDN_ccTLD_V1.pdf
This document describes the IDN (Internationalized Domain Names) Language Table used by SaudiNIC for the registration of Arabic domain names under the xn--mgberp4a5d4ar (السعودية) TLD. The table and the rules that accompany it are based on the recommendations of the Arabic Domain Name Pilot Project (www.arabic-domains.org).
###############################################################################################
#
# This file describes IDN table of Variants for Arabic language that belong to this label
# xn--mgberp4a5d4ar. Each row in this file represents the relation of Arabic character
# (character which is under the domain of Arabic language) with variant characters across
# Arabic script (based on UNICODE standard). Each relation has type of either Exact or Typo
# (Exact for mirror matching of both characters while Typo for Similarity look).
#
# -Structure of this file:
# <CHAR>; <VCHAR1>(<POS>:<REL>), <VCHAR2>(<POS>:<REL>), <VCHAR3>(<POS>:<REL>), ...
#
# <CHAR> a unicode for character which is part of Arabic language domain
# <VCHAR#> a unicode for variant character which is part of Arabic script domain and has relation with <CHAR>
# <POS> a combination of B,M,F and I (B: Beginning, M: Medial, F: Final, I: Isolated)
# <REL> relation type of both <CHAR> and <VCHAR#> either E for Exact or T for Typo
#
# Authors:
# Abdulaziz Al-Zoman (azoman[at]citc.gov.sa)
# Raed Al-Fayez (rfayez[at]citc.gov.sa)
# Abdulrahman I. AL-Ghadir (aghadir[at]citc.gov.sa)
#
# SaudiNic
#
###############################################################################################
0621;
0622; 0671(FI:T)
0623; 0672(FI:T), 0675(FI:T)
0624; 0676(FI:T)
0625; 0673(FI:T)
0626; 06D3(FI:T), 0678(FI:T), 0678(BM:E)
0627;
0628;
0629; 06C3(F:T), 06C3(I:E)
062A; 067A(BMFI:T)
062B; 067D(BMFI:T), 06BD(FI:T), 06BD(BM:E)
062C;
062D;
062E;
062F;
0630;
0631;
0632;
0633;
0634;
0635;
0636;
0637;
0638;
0639;
063A;
0641; 06A7(FI:T), 06A7(BM:E)
0642;
0643; 06A9(FI:T), 06A9(BM:E), 06AA(BMFI:T)
0644;
0645;
0646; 06BA(BM:E)
0647; 06BE(BMI:E), 06C1(I:E), 06C1(MF:T), 06D5(FI:E)
0648;
0649; 06CD(FI:T), 06D2(FI:T), 06CC(FI:E)
064A; 067B(BM:T), 06D0(BMFI:T), 06CC(BM:E)
0660; 0030(BMFI:T), 06F0(BMFI:E)
0661; 0031(BMFI:T), 06F1(BMFI:E)
0662; 0032(BMFI:T), 06F2(BMFI:E)
0663; 0033(BMFI:T), 06F3(BMFI:E)
0664; 0034(BMFI:T), 06F4(BMFI:T)
0665; 0035(BMFI:T), 06F5(BMFI:T)
0666; 0036(BMFI:T), 06F6(BMFI:T)
0667; 0037(BMFI:T), 06F7(BMFI:E)
0668; 0038(BMFI:T), 06F8(BMFI:E)
0669; 0039(BMFI:T), 06F9(BMFI:E)
0030; 0660(BMFI:T), 06F0(BMFI:T)
0031; 0661(BMFI:T), 06F1(BMFI:T)
0032; 0662(BMFI:T), 06F2(BMFI:T)
0033; 0663(BMFI:T), 06F3(BMFI:T)
0034; 0664(BMFI:T), 06F4(BMFI:T)
0035; 0665(BMFI:T), 06F5(BMFI:T)
0036; 0666(BMFI:T), 06F6(BMFI:T)
0037; 0667(BMFI:T), 06F7(BMFI:T)
0038; 0668(BMFI:T), 06F8(BMFI:T)
0039; 0669(BMFI:T), 06F9(BMFI:T)
002D;
002E;
#EOF
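The row structure described in the table header, <CHAR>; <VCHAR>(<POS>:<REL>), ..., can be read mechanically. The following Python sketch is one possible way to do so, assuming the table is stored with one row per line; the names Variant and parse_variant_table are hypothetical and are not part of any SaudiNIC tooling.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    codepoint: int  # the variant character <VCHAR#>
    positions: str  # combination of B, M, F, I
    relation: str   # "E" for Exact, "T" for Typo


# Matches one variant entry such as "06C3(F:T)".
_VARIANT_RE = re.compile(r"([0-9A-Fa-f]{4,6})\(([BMFI]+):([ET])\)")


def parse_variant_table(text: str) -> dict[int, list[Variant]]:
    """Map each <CHAR> code point to its list of variants (possibly empty)."""
    table: dict[int, list[Variant]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and the #EOF marker
        char_hex, _, rest = line.partition(";")
        table[int(char_hex, 16)] = [
            Variant(int(cp, 16), pos, rel)
            for cp, pos, rel in _VARIANT_RE.findall(rest)
        ]
    return table


# A few rows copied from the table above.
sample = """\
# comment line
0621;
0629; 06C3(F:T), 06C3(I:E)
062B; 067D(BMFI:T) , 06BD(FI:T), 06BD(BM:E)
#EOF
"""
table = parse_variant_table(sample)
```

A row with no variants, such as 0621, maps to an empty list, so membership in the returned dictionary also serves as a check that a code point belongs to the permitted repertoire.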
Tashkeel (Diacritics) and Shadda
Tashkeel marks are small signs that are usually placed above or below an Arabic letter to indicate its correct pronunciation, which may lead to a different meaning. Tashkeel is not a letter in itself but a means of pronouncing a letter correctly. It is not widely used except where words that have the same letters but different pronunciations, and hence different meanings, could be mispronounced.
Therefore, Tashkeel and Shadda should not be supported in IDN, yet they can be supported only in the user interface, and stripped off at the preparation of internationalized strings (stringprep) phase.
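The stripping step can be sketched in Python as follows. This is a minimal illustration only: it assumes that the marks in question are the Unicode code points U+064B (Fathatan) through U+0652 (Sukun), a range that includes Shadda (U+0651), and the function name strip_tashkeel is hypothetical.

```python
# Assumption: Tashkeel and Shadda are taken to be U+064B..U+0652.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_tashkeel(label: str) -> str:
    """Return the label with all Tashkeel marks (including Shadda) removed."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


# Meem + Damma + Hah + Meem + Shadda + Dal, as a user might type it.
typed = "\u0645\u064F\u062D\u0645\u0651\u062F"
stripped = strip_tashkeel(typed)
```

Only the base letters remain in the stripped label, so two inputs that differ only in their diacritics resolve to the same domain name.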
Kasheeda or Tatweel (Horizontal Character Size Extension)
Kasheeda is not a letter. It is a horizontal line (like a dash) used to lengthen the connecting line between letters. It is sometimes used to enhance the display of Arabic words on screens or printouts.
Hence, Kasheeda (Tatweel) should not be used in IDN.
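A registration front end could check for this character before accepting a label. The sketch below assumes that Kasheeda corresponds to the Unicode character U+0640 (Arabic Tatweel); the function name contains_tatweel is hypothetical.

```python
# Assumption: Kasheeda (Tatweel) is the single code point U+0640.
TATWEEL = "\u0640"


def contains_tatweel(label: str) -> bool:
    """Return True if the label contains the Tatweel character."""
    return TATWEEL in label


# Seen + Tatweel + Lam versus the same letters without the extension.
extended = "\u0633\u0640\u0644"
plain = "\u0633\u0644"
```

Whether a label containing Tatweel is rejected outright or silently cleaned is a policy choice that this document does not settle; the check above only detects the character.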
Character folding
Character folding is the process where multiple letters (that may have some similarity with respect to their shapes) are folded into one shape. This includes:
With respect to the Arabic language, character folding is not acceptable because it changes the meaning of the words and it is against the simplest spelling rules.
Therefore, character folding should not be allowed.
Numbers
In the Arab world, there are two sets of numerical digits used:
From U+0030 (Digit Zero) to U+0039 (Digit Nine)
Mostly used in the western part of the Arab world (al-maghrib al-arabi).
From U+0660 (Arabic-Indic Digit Zero) to U+0669 (Arabic-Indic Digit Nine),
Mostly used in the eastern part of the Arab world (al-mashriq al-arabi).
Hence, both sets should be supported in the user interface, and both are folded to one set (Set I, the digits U+0030 to U+0039) at the preparation of internationalized strings (e.g., "stringprep") phase.
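The digit folding can be sketched in Python with a translation table. The example assumes that Set I denotes the digits U+0030 to U+0039, the first set listed above, so the Arabic-Indic digits U+0660 to U+0669 are mapped onto them; the function name fold_digits is hypothetical.

```python
# Map each Arabic-Indic digit (U+0660..U+0669) to the matching digit
# in U+0030..U+0039. Assumption: the latter range is "Set I".
ARABIC_INDIC_TO_SET_I = {0x0660 + i: ord("0") + i for i in range(10)}


def fold_digits(label: str) -> str:
    """Fold Arabic-Indic digits to Set I; other characters are unchanged."""
    return label.translate(ARABIC_INDIC_TO_SET_I)


folded = fold_digits("\u0662\u0660\u0661\u0660")  # Arabic-Indic 2, 0, 1, 0
```

Digits that are already in Set I pass through untouched, so the function can be applied to any label regardless of which digit set the user typed.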
Connecting Multiple Words
In the Arabic language, words are separated by spaces. Connecting words without spaces is usually not acceptable. Therefore, a single space would be the best word separator in an Arabic domain name with multiple words.
Since it is technically not feasible to use a space as a word separator, multiple words are separated by the dash character "-".
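A user interface could apply this rule by replacing the spaces a user types with dashes. The following Python sketch shows one way to do that, assuming that runs of whitespace should collapse to a single dash; the function name join_words is hypothetical.

```python
def join_words(name: str) -> str:
    """Join the whitespace-separated words of a name with a single dash."""
    return "-".join(name.split())


# Two Arabic words typed with extra spaces between and around them.
label = join_words("  \u0645\u062F\u064A\u0646\u0629   \u0627\u0644\u0631\u064A\u0627\u0636 ")
```

Because str.split() with no arguments discards leading, trailing, and repeated whitespace, the result never begins or ends with a dash and never contains two dashes in a row as a result of spacing.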