
Rendering Recommendations draft

Finally, here is the long-awaited draft:
http://tinyurl.com/34yckl

It addresses some OpenType, Unicode, and font-related issues. Many of the issues discussed there have been a source of conflict, especially for ml_IN, so a detailed analysis like this was badly needed. I hope the illustrations in it provide some common guidelines. There is certainly scope for improvement, and I would like to hear from the various communities whether they want any of the issues I left out to be addressed as well.
The draft is open for discussion and feedback.

Comments

  1. Hi राहुल,
    Posting my comments here, primarily for Devanagari.
    First, with reference to the issues that were raised.
    1. अॅ (Issue 1) - This form is also used in Hindi for transliterating foreign-language words, though it is more common in Marathi.
    The same goes for आॅ, which should be rendered the same way as ऑ, although that would mean promoting two forms of the same text. Perhaps the renderer could automatically convert आॅ to ऑ.
    2. ऱ् - I'm curious whether there are any other instances in Marathi that require ZWJ, i.e. where leaving out the ZWJ would change the word grammatically. I ask because the use of ZWJ for ऱ् has often been cited on the Unicode mailing lists, even though it can be encoded without ZWJ.
    3. Nothing can be done about the "redundant" codepoints. But we could have the rendering engine always produce the most compact encoding, i.e. the rendering engine could automatically convert क़ to the single codepoint.
    4. Lack of encoding for the ङ्क, ङ्ख, ङ्ग, ङ्घ conjuncts - most common fonts lack these conjuncts, so users resort to using anuswar instead. That is fine in itself, but the font glyphs are forcing a particular kind of behavior on the users. Encoding of these conjuncts should be mandatory.
    5. Text editor - could you add a section on text editors as well, alongside font, rendering engine, and Unicode? Text editors should be designed to compact text to the form with the fewest codepoints,
    e.g. क़ gets replaced with the single codepoint, and र + halant + ZWJ gets replaced with ऱ + halant during storage or data transfer.
    This way we save space (which is not really the goal) and also get a consistent codepoint sequence for common words (which is the goal).
    आलोक

  2. Hi Alok,
    Thanks for your comments.
    1. I have received some feedback that the independent vowel for candra E, i.e. अॅ, has already been proposed to Unicode and is currently in the beta version of Unicode 5.1.0. I don't think we need anything special for the आॅ issue, since ऑ is already encoded. But we could handle the sequence in keyboard maps.
    2. Users can always use ZWJ/ZWNJ as guided by the Unicode standard. They are generally used for alternative half forms and are sometimes used in Marathi. ZWNJ is more common on a few websites; that usage may not always follow Unicode, but in a few cases it is required.
    3. Removing redundancy is also useful for ensuring proper combinations. But the actual data size cannot be reduced.
    4. Formation of conjuncts is entirely up to the font; at most, we can make sure these conjuncts are included in the font. Convincing Unicode to add them would be very difficult.
    5. Given the variety and wide spectrum of text editors, not all of them will provide compact encoding, but we can always modify or create a few that do.
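
As a rough illustration of the "consistent codepoint sequence" goal discussed in the comments above, here is a minimal Python sketch. It relies on the standard `unicodedata` module for NFC normalization; the `EDITOR_REWRITES` table and the `to_storage_form` function are hypothetical names made up for this example, not part of Unicode or of any existing editor. One thing worth noting: under Unicode's own NFC rules, क़ (U+0958) is a composition exclusion, so normalization maps it to क + nukta rather than to the single codepoint. The result is still one consistent sequence, just not the shortest one.

```python
import unicodedata

# Hypothetical editor-level rewrite table (an assumption for illustration;
# these rewrites are NOT part of Unicode normalization).
EDITOR_REWRITES = {
    "\u0906\u0945": "\u0911",              # आ + candra E sign -> ऑ
    "\u0930\u094D\u200D": "\u0931\u094D",  # र + halant + ZWJ -> ऱ + halant
}


def to_storage_form(text: str) -> str:
    """Normalize to NFC, then apply the hypothetical rewrite table."""
    # NFC gives canonically equivalent sequences a single representation.
    text = unicodedata.normalize("NFC", text)
    # The rewrites below go beyond canonical equivalence, so they are a
    # policy choice an editor or keyboard map would have to opt into.
    for source, target in EDITOR_REWRITES.items():
        text = text.replace(source, target)
    return text


if __name__ == "__main__":
    for sample in ("\u0958", "\u0915\u093C", "\u0906\u0945", "\u0930\u094D\u200D"):
        result = to_storage_form(sample)
        print([f"U+{ord(ch):04X}" for ch in sample], "->",
              [f"U+{ord(ch):04X}" for ch in result])
```

Both spellings of क़ end up as the same two-codepoint sequence, while the आॅ and eyelash-ra rewrites only happen because the table asks for them.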

