Sinhala Unicode Developer Workshop
Muthu Nedumaran (muthu@murasu.com)

What is Unicode?
• Universal Character Set
  – All of the major scripts
  – Simple and consistent manner
  – Alphabetic, syllabic and ideographic scripts
• Version 4.0
  – More than 96,000 characters
  – Over 50 scripts

Unicode Implementation
• All major operating systems
  – Windows, MacOS, Linux, PalmOS, WinCE, Symbian
• WWW
  – HTML 4.0, XML, Java, JavaScript
• Applications
  – MS Office, OpenOffice, InDesign, Acrobat, IE and many more

Inside a Unicode Sinhala Font
• Glyphs (glyf)
  – OpenType accepts glyphs in TrueType or Type 1 format
• Character to Glyph Mapping Table (cmap)
  – Maps character codes to glyphs
  – Straight one-to-one mapping
  – For Indic (and Hebrew, Arabic, etc.) scripts, the number of glyphs required is greater than the number of characters defined
• OpenType Tables (GSUB, GPOS)
  – GSUB table provides substitution information
  – GPOS table provides positioning information
  – Can be used to minimise the number of glyphs required, and thus the size of a font

Input Method Editors
• Legacy Keyboard Drivers
  – Mapped to ASCII
  – Mapped to 8-bit
• Sinhala Unicode IMEs
  – Vowels, Consonants, Ligatures
  – Key Layouts
  – FontTester

Inside a Unicode Text Document
• Unicode Marker (Text)
  – Byte ordering dependent
• Characters "Only"
• No Ligatures or "Unencoded" shapes
• No font information
  – Text is not bound to a font
• Sinhala and Tamil recognised respectively
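The Unicode marker mentioned under "Inside a Unicode Text Document" is the byte order mark (BOM) at the start of a text file, and its byte sequence depends on the encoding and byte ordering. The following is a minimal Python sketch of how a program might check for it. The function names `sniff_bom` and `decode_with_bom` are hypothetical, and the UTF-8 fallback for files without a marker is an assumption, not something the workshop prescribes.

```python
import codecs

# Marks are checked longest-first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark, so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no marker is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes using the marker if present, else the assumed fallback."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)
```

A file without a marker cannot be identified this way, so the absence of a BOM does not prove the text is legacy (non-Unicode) text.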
Unicode Friendly Applications
• Currently Supported:
  – Text Editors/Word Processors
  – Browsers
  – Databases
• Possible Expansion:
  – Spell Check/Dictionary
  – Client (Desktop) Applications
  – Other utilities and tools

DEMO
• Sinhala Font and Text
• Legacy Text (7-bit Font)
• Unicode Text (Unicode Font)

Unicode Filenames
• Windows
• Mac OS X

BREAK

Unicode Text Format
• ANSI, ASCII, UTF-8, UTF-16
• Windows Notepad
• Email Messages
• HTML Documents
• RTF Format

Unicode Strings and APIs
• Windows
• MacOS
• Java
• JavaScript (> 1.3)
• PHP
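To make the difference between the text formats listed under "Unicode Text Format" concrete, here is a small Python sketch that counts how many bytes one string occupies in UTF-8 and UTF-16, and checks whether it fits in ASCII at all. The helper names `encoded_sizes` and `fits_ascii` are made up for this illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Count code points and encoded bytes for one string."""
    return {
        "code_points": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # encoded without a marker
    }


def fits_ascii(text: str) -> bool:
    """True if every character can be stored as 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


KA = "\u0d9a"  # SINHALA LETTER ALPAPRAANA KAYANNA

print(encoded_sizes("A"))   # 1 code point, 1 byte in UTF-8, 2 bytes in UTF-16
print(encoded_sizes(KA))    # 1 code point, 3 bytes in UTF-8, 2 bytes in UTF-16
print(fits_ascii(KA))       # False: Sinhala cannot be stored as ASCII
```

The same Sinhala letter is one character in every case; only the number of bytes on disk or on the wire changes with the chosen format.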
Parsing Strings
• Determining if text is Unicode
• Determining Consonants, Vowels, Marks
• Handling Unicode Strings
• Converting Legacy Strings
• Searching

Demos
• External Rendering vs Internal Representation
  – FontTesterTool etc.
• How do I know if the text is Unicode?
• Byte-Stripping

Unicode APIs
• WideStrings
  – Functions
• Messages
• ANSI vs Unicode

A Simple Unicode Application
• English, Sinhala and Tamil on the same document
• Display messages in Sinhala/Tamil
• Text input in Sinhala/Tamil

DEMO:
• A Simple Unicode Application

Unicode Web Applications
• HTML and JavaScript
• Header
• Embedding Fonts
• Text strings
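For the "Determining Consonants, Vowels, Marks" item under "Parsing Strings", one possible approach is to combine the Sinhala block range (U+0D80 to U+0DFF) with the general category that the Unicode character database assigns to each code point. The Python sketch below assumes that approach; the function name `classify_sinhala` and the labels it returns are hypothetical, and it handles Sinhala only.

```python
import unicodedata

SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF
LAST_INDEPENDENT_VOWEL = 0x0D96  # independent vowels run U+0D85..U+0D96


def classify_sinhala(ch: str) -> str:
    """Label one character as 'vowel', 'consonant', 'mark' or 'other'."""
    cp = ord(ch)
    if not SINHALA_START <= cp <= SINHALA_END:
        return "other"
    category = unicodedata.category(ch)
    if category.startswith("M"):
        # Dependent vowel signs, al-lakuna, anusvara and visarga.
        return "mark"
    if category == "Lo":
        return "vowel" if cp <= LAST_INDEPENDENT_VOWEL else "consonant"
    return "other"  # digits, punctuation, unassigned code points


# The word "Sinhala" as stored internally: sa, vowel sign i, anusvara, ha, la.
word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"
labels = [classify_sinhala(ch) for ch in word]
print(labels)
```

The stored sequence contains only characters, in logical order. How the consonants and marks combine on screen is decided by the font and the rendering engine, which is the external rendering versus internal representation distinction listed under "Demos".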
Unicode Web Applications
• Forms and Fields
• User Input
• IME Handling

Server Side
• Database Support
• Manipulating Strings
• Co-existence
  – Traditional/Legacy Text