TLD: xn--mgberp4a5d4ar
Language Tag: AR
Language Description: Arabic
Version: 1.0
Effective Date: 01 Jan 2010
Registry: Saudi Network Information Center
Contact: Abdulaziz Al-Zoman azoman[at]citc.gov.sa
Address: SaudiNIC, General directorate of Internet services, CITC, P.O. Box 75606, Riyadh 11588, Saudi Arabia
Telephone: +966-1-263-9233 Fax: +966-1-263-9393
Website: http://www.nic.net.sa/

Relevant Policy Document URL: http://www.nic.net.sa/docs/ADN/Arabic_Domain_Name_Registration_Regulation_For_Saudi_IDN_ccTLD_V1.pdf

This document describes the IDN (Internationalized Domain Names) Language Table to be used by SaudiNIC for the registration of Arabic domain names under the xn--mgberp4a5d4ar (السعودية) TLD. The table is based on the recommendations of the Arabic Domain Name Pilot Project (www.arabic-domains.org).

###############################################################################################
#
#		This file describes the IDN table of variants for the Arabic language under the label
#	xn--mgberp4a5d4ar. Each row in this file relates an Arabic character
#	(a character in the Arabic language repertoire) to its variant characters across
#	the Arabic script (based on the Unicode standard). Each relation is of type Exact or Typo
#	(Exact when both characters look identical, Typo when they look similar).
#
#   -Structure of this file:        
#   <CHAR>; <VCHAR1>(<POS>:<REL>), <VCHAR2>(<POS>:<REL>), <VCHAR3>(<POS>:<REL>), ...
#   
#   <CHAR>   the Unicode code point of a character in the Arabic language repertoire
#   <VCHAR#> the Unicode code point of a variant character in the Arabic script that is related to <CHAR>
#   <POS>    a combination of B, M, F and I (B: Beginning, M: Medial, F: Final, I: Isolated)
#   <REL>    the relation type between <CHAR> and <VCHAR#>: either E for Exact or T for Typo
#
#	Authors: 
#		Abdulaziz Al-Zoman 			(azoman[at]citc.gov.sa)
#		Raed Al-Fayez      			(rfayez[at]citc.gov.sa)
#		Abdulrahman I. AL-Ghadir		(aghadir[at]citc.gov.sa)
#
#		SaudiNIC
#
###############################################################################################


0621; 
0622; 0671(FI:T)  
0623; 0672(FI:T), 0675(FI:T)
0624; 0676(FI:T) 
0625; 0673(FI:T)  
0626; 06D3(FI:T), 0678(FI:T) , 0678(BM:E) 
0627; 
0628; 
0629; 06C3(F:T), 06C3(I:E)  
062A; 067A(BMFI:T)  
062B; 067D(BMFI:T) , 06BD(FI:T), 06BD(BM:E) 
062C; 
062D; 
062E; 
062F; 
0630; 
0631; 
0632; 
0633; 
0634; 
0635; 
0636; 
0637; 
0638; 
0639; 
063A; 
0641; 06A7(FI:T), 06A7(BM:E)  
0642; 
0643; 06A9(FI:T), 06A9(BM:E), 06AA(BMFI:T)  
0644; 
0645; 
0646; 06BA(BM:E)  
0647; 06BE(BMI:E),  06C1(I:E), 06C1(MF:T), 06D5(FI:E)  
0648; 
0649; 06CD(FI:T), 06D2(FI:T), 06CC(FI:E)  
064A; 067B(BM:T), 06D0(BMFI:T), 06CC(BM:E)  

0660; 0030(BMFI:T), 06F0(BMFI:E)  
0661; 0031(BMFI:T), 06F1(BMFI:E)    
0662; 0032(BMFI:T), 06F2(BMFI:E)    
0663; 0033(BMFI:T), 06F3(BMFI:E)    
0664; 0034(BMFI:T), 06F4(BMFI:T)
0665; 0035(BMFI:T), 06F5(BMFI:T)  
0666; 0036(BMFI:T), 06F6(BMFI:T)
0667; 0037(BMFI:T), 06F7(BMFI:E)  
0668; 0038(BMFI:T), 06F8(BMFI:E)  
0669; 0039(BMFI:T), 06F9(BMFI:E)  

0030; 0660(BMFI:T), 06F0(BMFI:T) 
0031; 0661(BMFI:T), 06F1(BMFI:T) 
0032; 0662(BMFI:T), 06F2(BMFI:T) 
0033; 0663(BMFI:T), 06F3(BMFI:T) 
0034; 0664(BMFI:T), 06F4(BMFI:T) 
0035; 0665(BMFI:T), 06F5(BMFI:T) 
0036; 0666(BMFI:T), 06F6(BMFI:T) 
0037; 0667(BMFI:T), 06F7(BMFI:T) 
0038; 0668(BMFI:T), 06F8(BMFI:T) 
0039; 0669(BMFI:T), 06F9(BMFI:T) 

002D; 
002E; 

#EOF
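
    The row structure described in the header comment of the table can be read mechanically. The following is a minimal Python sketch, not part of the SaudiNIC specification; the names ROW_RE, VARIANT_RE and parse_row are hypothetical, and the sketch assumes code points are written as 4 to 6 hexadecimal digits, as they are in the rows above.

```python
import re

# Hypothetical helpers; the grammar follows the table's header comment:
#   <CHAR>; <VCHAR1>(<POS>:<REL>), <VCHAR2>(<POS>:<REL>), ...
ROW_RE = re.compile(r"^([0-9A-Fa-f]{4,6})\s*;(.*)$")
VARIANT_RE = re.compile(r"([0-9A-Fa-f]{4,6})\(([BMFI]+):([ET])\)")


def parse_row(line):
    """Parse one table row into (char, [(variant, positions, relation), ...]).

    Returns None for blank lines, comment lines, and any line that does not
    start with a code point followed by a semicolon.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = ROW_RE.match(line)
    if match is None:
        return None
    char = chr(int(match.group(1), 16))
    variants = [
        (chr(int(cp, 16)), frozenset(pos), rel)
        for cp, pos, rel in VARIANT_RE.findall(match.group(2))
    ]
    return char, variants


# Example: the row for U+0647 (heh) from the table above.
row = parse_row("0647; 06BE(BMI:E),  06C1(I:E), 06C1(MF:T), 06D5(FI:E)  ")
```

    Positions are kept as a set of the letters B, M, F and I so that a caller can test a single position with the "in" operator. Rows without variants, such as "0621;", yield an empty list.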

Some Linguistic Issues

  1. Tashkeel (Diacritics) and Shadda

    They are small signs that are usually placed above or below an Arabic letter to indicate its correct 
pronunciation; a different pronunciation may lead to a different meaning. Tashkeel is not a letter in itself but a 
means of pronouncing a letter correctly. It is not widely used, except where there is a risk of 
mispronouncing words that have the same letters but different pronunciations, 
and hence different meanings.

    Therefore, Tashkeel and Shadda should not be supported in IDN; they can be supported only in 
the user interface, and are stripped off at the preparation of internationalized strings (stringprep) phase.
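
    Stripping these marks before registration can be sketched as follows. This is an illustration only: the source text does not list code points, so the sketch assumes that Tashkeel covers U+064B to U+0652 (Fathatan through Sukun, including Shadda at U+0651), and the name strip_tashkeel is hypothetical.

```python
# Assumption: Tashkeel is taken to be U+064B..U+0652 (Fathatan through Sukun,
# including Shadda, U+0651). The source document does not enumerate them.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_tashkeel(label):
    """Return the label with all assumed Tashkeel marks removed."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


# A vocalized spelling and its bare-letter form (meem, hah, meem, dal).
vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
bare = strip_tashkeel(vocalized)
```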

  2. Kasheeda or Tatweel (Horizontal Character Size Extension)

    Kasheeda is not a letter. It is a horizontal line (like a dash) used to lengthen the connecting line between
letters. It is sometimes used to enhance the display of Arabic words on screens or printouts.

    Hence, Kasheeda (Tatweel) should not be used in IDN.
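
    In the table above, the <CHAR> column runs from U+0621 to U+063A and resumes at U+0641, so Tatweel (U+0640) falls outside the repertoire. A label check along these lines might look like the sketch below; the names ALLOWED and disallowed_chars are hypothetical, and the ranges are copied from the <CHAR> column of the table.

```python
# Repertoire copied from the <CHAR> column of the table in this document.
ALLOWED = (
    set(range(0x0621, 0x063B))    # U+0621 .. U+063A
    | set(range(0x0641, 0x064B))  # U+0641 .. U+064A
    | set(range(0x0660, 0x066A))  # Arabic-Indic digits
    | set(range(0x0030, 0x003A))  # ASCII digits
    | {0x002D, 0x002E}            # hyphen-minus and full stop
)

TATWEEL = "\u0640"  # ARABIC TATWEEL (Kasheeda), absent from the table


def disallowed_chars(label):
    """Return the characters of the label that are not in the table's repertoire."""
    return [ch for ch in label if ord(ch) not in ALLOWED]


# The same word with and without a Tatweel between its first two letters.
with_tatweel = "\u0633\u0640\u0644\u0627\u0645"
without_tatweel = "\u0633\u0644\u0627\u0645"
```

    Whether an implementation rejects such a label or silently removes the Tatweel is a policy choice that this sketch does not settle.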

  3. Character folding

    Character folding is the process where multiple letters (that may have some similarity with respect 
to their shapes) are folded into one shape.

    With respect to the Arabic language, character folding is not acceptable because it changes the meaning
of the words and it is against the simplest spelling rules.

    Therefore, character folding should not be allowed.

  4. Numbers

    In the Arab world, two sets of numerical digits are used:

    1. From U+0030 (Digit Zero) to U+0039 (Digit Nine)

      Mostly used in the western part of the Arab world (al-maghrib al-arabi).

    2. From U+0660 (Arabic-Indic Digit Zero) to U+0669 (Arabic-Indic Digit Nine)

      Mostly used in the eastern part of the Arab world (al-mashriq al-arabi).

    Hence, both sets should be supported in the user interface, and both are folded to one set (set 1 above) 
at the preparation of internationalized strings (e.g., "stringprep") phase.
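
    Folding the digits to a single set can be sketched with a translation table. The sketch assumes that the target set is the first one listed, U+0030 to U+0039; the name fold_digits is hypothetical.

```python
# Assumption: the target set is U+0030..U+0039, the first set listed above.
# Maps U+0660..U+0669 (Arabic-Indic digits) to U+0030..U+0039.
ARABIC_INDIC_TO_ASCII = {0x0660 + i: 0x0030 + i for i in range(10)}


def fold_digits(label):
    """Replace Arabic-Indic digits with the corresponding ASCII digits."""
    return label.translate(ARABIC_INDIC_TO_ASCII)


# The Arabic-Indic digits two, zero, one, zero fold to "2010".
folded = fold_digits("\u0662\u0660\u0661\u0660")
```

    Labels that already use the ASCII digits pass through unchanged, so the function can be applied to every label.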

  5. Connecting Multiple Words

    In the Arabic language, words are separated by spaces. Connecting words without spaces is 
usually not acceptable. Therefore, a single space would be the best word separator in an Arabic 
domain name with multiple words.

    Since it is technically not viable to use a space as a word separator, multiple words are 
separated by the hyphen character "-".
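
    Turning a multi-word name typed with spaces into a hyphen-separated label can be sketched as below. This is only an illustration of the rule; the name join_words is hypothetical.

```python
def join_words(name):
    """Join the whitespace-separated words of a name with a single hyphen.

    str.split() with no argument splits on any run of whitespace, so
    repeated, leading, or trailing spaces do not produce empty words.
    """
    return "-".join(name.split())


# Two Arabic words typed with two spaces between them.
label = join_words("\u0627\u0644\u0645\u0645\u0644\u0643\u0629  \u0627\u0644\u0639\u0631\u0628\u064A\u0629")
```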