
What is Unicode? Sinhala Unicode Developer Workshop



  1. Sinhala Unicode Developer Workshop
     Muthu Nedumaran (muthu@murasu.com)

     What is Unicode?
     • Universal Character Set
       – All of the major scripts
       – Simple and consistent manner
       – Alphabetic, syllabic and ideographic scripts
     • Version 4.0
       – 50,000 characters
       – Over 90 scripts

     Unicode Implementation
     • All major operating systems
       – Windows, MacOS, Linux, PalmOS, WinCE, Symbian
     • WWW
       – HTML 4.0, XML, Java, JavaScript
     • Applications
       – MS Office, OpenOffice, InDesign, Acrobat, IE and many more

     Inside a Unicode Sinhala Font
     • Glyphs (glyf)
       – OpenType accepts glyphs in TrueType or Type1 format
     • Character to Glyph Mapping Table (cmap)
       – Maps character codes to glyphs: a straight one-to-one mapping
       – For Indic (and Hebrew, Arabic, etc.) scripts, the number of glyphs required is greater than the number of characters defined
     • OpenType Tables (GSUB, GPOS)
       – The GSUB table provides substitution information
       – The GPOS table provides positioning information
       – Can be used to minimise the number of glyphs required, and thus the size of a font

     Input Method Editors
     • Legacy Keyboard Drivers
       – Mapped to ASCII
       – Mapped to 8-bit
     • Sinhala Unicode IMEs
       – Vowels, Consonants, Ligatures
       – Key Layouts
       – FontTester

     Inside a Unicode Text Document
     • Unicode Marker (Text)
       – Byte-order dependent
     • Characters "Only"
     • No Ligatures or "Unencoded" shapes
     • No font information
       – Text is not bound to a font
     • Sinhala and Tamil are each recognised from the characters themselves
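The "Inside a Unicode Text Document" points can be illustrated with a minimal Python sketch. It assumes the "Unicode Marker" is the byte order mark (BOM) at the start of a text file, and that Sinhala and Tamil are told apart by their Unicode blocks (U+0D80..U+0DFF and U+0B80..U+0BFF). The function names `detect_bom` and `script_of` are illustrative and not taken from the workshop material.

```python
import codecs

# Unicode block ranges for the two scripts named on the slide.
SINHALA = range(0x0D80, 0x0E00)  # U+0D80..U+0DFF
TAMIL = range(0x0B80, 0x0C00)    # U+0B80..U+0BFF


def detect_bom(data: bytes) -> str:
    """Return the encoding implied by a leading byte order mark.

    Only UTF-8 and UTF-16 marks are checked; UTF-32 is ignored in this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown"


def script_of(ch: str) -> str:
    """Identify the script of one character from its code point alone."""
    cp = ord(ch)
    if cp in SINHALA:
        return "Sinhala"
    if cp in TAMIL:
        return "Tamil"
    return "Other"


if __name__ == "__main__":
    # A little-endian UTF-16 file: BOM followed by SINHALA LETTER DANTAJA SAYANNA.
    data = codecs.BOM_UTF16_LE + "\u0dc3".encode("utf-16-le")
    print(detect_bom(data))
    print(script_of("\u0dc3"), script_of("\u0ba4"), script_of("A"))
```

No font is consulted at any point: the script is recovered purely from the stored character codes, which is what "text is not bound to a font" implies.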

  2. Unicode Friendly Applications
     • Currently Supported:
       – Text Editors/Word Processors
       – Browsers
       – Databases
     • Possible Expansion:
       – Spell Check/Dictionary
       – Client (Desktop) Applications
       – Other utilities and tools

     DEMO
     • Sinhala Font and Text
     • Legacy Text (7-bit Font)
     • Unicode Text (Unicode Font)

     Unicode Filenames
     • Windows
     • Mac OS X

     BREAK

     Unicode Text Format
     • ANSI, ASCII, UTF-8, UTF-16
     • Windows Notepad
     • Email Messages
     • HTML Documents
     • RTF Format

     Unicode Strings and APIs
     • Windows
     • MacOS
     • Java
     • JavaScript (> 1.3)
     • PHP
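To make the "Unicode Text Format" list concrete, here is a small Python sketch comparing how one Sinhala word is stored under different encodings. It assumes "ANSI" refers to a Windows code page such as cp1252; the helper names `encoded_sizes` and `can_encode` are illustrative.

```python
# The word "Sinhala" written in Sinhala script: 5 code points.
text = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"


def encoded_sizes(s: str) -> dict:
    """Count code points and the bytes needed in UTF-8 and UTF-16."""
    return {
        "code_points": len(s),
        "utf-8": len(s.encode("utf-8")),
        # An explicit byte order is given, so no BOM is prepended.
        "utf-16": len(s.encode("utf-16-le")),
    }


def can_encode(s: str, encoding: str) -> bool:
    """Report whether every character of s exists in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(encoded_sizes(text))  # {'code_points': 5, 'utf-8': 15, 'utf-16': 10}
    for name in ("ascii", "cp1252", "utf-8", "utf-16-le"):
        print(name, can_encode(text, name))
```

Sinhala characters take three bytes each in UTF-8 and two in UTF-16, while ASCII and the cp1252 code page cannot represent them at all. That gap is why legacy fonts had to remap Sinhala shapes onto Latin character codes.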

  3. Parsing Strings
     • Determining if text is Unicode
     • Determining Consonants, Vowels, Marks etc
     • Handling Unicode Strings
     • Converting Legacy Strings
     • Searching

     Demos
     • External Rendering vs Internal Representation
       – FontTesterTool
     • How do I know if the text is Unicode?
     • Byte-Stripping

     Unicode APIs
     • WideStrings
       – Functions
     • Messages
     • ANSI vs Unicode

     A Simple Unicode Application
     • English, Sinhala and Tamil on the same document
     • Display messages in Sinhala/Tamil
     • Text input in Sinhala/Tamil

     DEMO:
     • A Simple Unicode Application

     Unicode Web Applications
     • HTML and JavaScript
     • Header
     • Embedding Fonts
     • Text strings
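One possible way to approach "Determining Consonants, Vowels, Marks" is sketched below in Python. It is an assumption of this sketch, not the workshop's method, that marks (dependent vowel signs, anusvaraya, al-lakuna) are found through the Unicode general category and that independent vowels and consonants are separated by their positions in the Sinhala block. The name `classify` is illustrative.

```python
import unicodedata

# Positions inside the Sinhala block (U+0D80..U+0DFF).
INDEPENDENT_VOWELS = range(0x0D85, 0x0D97)  # U+0D85..U+0D96
CONSONANTS = range(0x0D9A, 0x0DC7)          # U+0D9A..U+0DC6


def classify(ch: str) -> str:
    """Classify one character as 'mark', 'vowel', 'consonant' or 'other'."""
    category = unicodedata.category(ch)
    if category.startswith("M"):
        # Mn/Mc: vowel signs and other combining marks.
        return "mark"
    if category == "Lo":
        # The category check skips unassigned gaps inside the ranges.
        cp = ord(ch)
        if cp in INDEPENDENT_VOWELS:
            return "vowel"
        if cp in CONSONANTS:
            return "consonant"
    return "other"


if __name__ == "__main__":
    word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # "Sinhala" in Sinhala script
    for ch in word:
        print(f"U+{ord(ch):04X}", classify(ch))
```

The same loop shows the difference between external rendering and internal representation: the word is displayed as three visual clusters but stored as five code points, two of which are marks attached to the first consonant.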

  4. Unicode Web Applications
     • Forms and Fields
     • User Input
     • IME Handling

     Server Side
     • Database Support
     • Manipulating Strings
     • Co-existence
       – Traditional/Legacy Text
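For the "Forms and Fields" and "User Input" items, the following Python sketch shows how a server might decode a submitted form field. It assumes the page is served as UTF-8, so the browser percent-encodes the field's UTF-8 bytes; the field name `name` and the helper `decode_form` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def decode_form(body: bytes) -> dict:
    """Decode an application/x-www-form-urlencoded body as UTF-8 text.

    The raw body is plain ASCII (percent escapes); the escapes themselves
    carry the UTF-8 bytes of the user's input.
    """
    fields = parse_qs(body.decode("ascii"), encoding="utf-8", errors="strict")
    # Keep only the first value submitted for each field.
    return {key: values[0] for key, values in fields.items()}


if __name__ == "__main__":
    typed = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # text entered through a Sinhala IME
    # Simulate what a browser sends for a UTF-8 page.
    body = urlencode({"name": typed}).encode("ascii")
    print(body)
    print(decode_form(body)["name"] == typed)
```

Once the value is decoded it is an ordinary Unicode string, so string manipulation and database storage on the server side can treat Sinhala, Tamil and English input in the same way.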
