upTeX: Unicode version of pTeX with CJK extensions, by Takuji Tanaka


  1. upTeX: Unicode version of pTeX with CJK extensions
     Takuji Tanaka (田中 琢爾), upTeX project
     Oct 26, 2013

  2. Outline / 概要
     (1) Introduction
     (2) Unicodization / Unicode 化
         - Japanese / 日本語
         - CJK / 中・日・한
         - with European languages / 欧文との親和性
         - world languages / 世界の言語
     (3) Implementation / 実装
         - Unicodization / Unicode 化
         - \kcatcode
         - set3
     (4) upTeX vs. Ω, XeTeX, ...
     (5) Present & future / 現在と今後

  3. Part I: Introduction

  4. Introduction: pTeX/pLaTeX
     ASCII pTeX/pLaTeX is great:
     - High-quality Japanese typesetting, incl. vertical writing, Japanese hyphenation, ...
     - The Japanese standard TeX/LaTeX
     - Strong support from its environment: DVIware, packages, macros, software, books, ...
     but it has weaknesses:
     - Japanese only: 8-bit Latin, Chinese, and Korean are not available
     - Character set limited by legacy encodings (Shift_JIS, EUC-JP)

  5. Introduction: Motivation
     - Support a wider Japanese character set through Unicode
     - Support babel by switching between Latin and CJK tokens
     - Support Chinese and Korean
     - Keep the quality and environment of pTeX

  6. Introduction: Features of upTeX/upLaTeX
     (1) High-quality CJK typesetting based on pTeX/pLaTeX
     (2) Compatible with pTeX/pLaTeX
     (3) Unicode / UTF-8
     (4) Switching between Latin (12-bit) and CJK (29-bit) tokens
     (5) CJK with Babel (Latin/Cyrillic/Greek, ...)
     (6) Beyond the BMP, incl. the SIP (U+2xxxx)

  7. Part II: Unicodization / Unicode 化

  8. Unicodization / Unicode 化: Strategies of Unicodization
     (1) Unicodize only I/O. Ex: \usepackage[utf8]{inputenc}
     (2) Implement Unicode functions. Ex: XeTeX
     (3) Compromise. upTeX: internally, Unicodize only CJK; I/O is fully Unicodized

  9. Unicodization / Unicode 化: Partial Unicodization / 折衷的 Unicode 化

     Script                         TeX           pTeX      upTeX
     Latin, 7-bit                   azAZ          azAZ      azAZ
     Latin, 8-bit (inputenc)        æœÆŒ гдГД    -         æœÆŒ гдГД
     Japanese, JIS X 0208           -             あア亜    あア亜
     Japanese, Unicode              -             -         髙
     CK (Chinese/Korean), Unicode   -             -         汉字 漢字 한글

     pTeX and upTeX each consist of two parts:
     (1) A part that is the same as the original TeX
     (2) A CJK part: JIS X 0208 in pTeX, Unicode in upTeX

  10. Japanese / 日本語: New JIS / 新 JIS
     New JIS: JIS X 0213
     upTeX handles the new JIS X 0213 (beyond JIS X 0208).
     Sample symbols:
         ␣ ヷ ヸ ヹ ヺ ゖ ⅓ ⅔ ⅕ ✓ ⌘ ⏎ ⓐ ⓑ ⓒ ① ② ③ ❶ ❷ ❸ ⓵ ⓶ ⓷ ⅰ ⅱ ⅲ Ⅰ Ⅱ Ⅲ
         ㋐ ㋑ ㋒ 〼 〽 ♮ ♫ ♬ ♩ ♤ ♠ ♢ ♦ ♡ ♥ ♧ ♣ ☖ ☗ 〠 ☎ ☀ ☁ ☂ ☃ ♨
     Sample words:
         ゔゕ ㈱㈲ 鄧小平 李承燁 里見弴 草彅剛 朴璐美 森鷗外 森雞二 王銘琬 宮﨑あおい 蔣介石
         你好 深圳 東日本旅客鉃道株式会社 尾骶骨 生酛仕込 凮月堂 㐂寿 仐寿 圓壔函數
         啞然 火焰 嚙む 任俠 長身瘦軀 石鹼 屢〻 刺繡 醬油 蟬時雨 隔靴搔痒 奥飛驒 簞笥
         摑む 充塡 顚末 祈禱 瀆職 土囊 潑溂 醱酵 頰紅 素麵 麴町 蓬萊 蠟燭 攢竹

  11. Japanese / 日本語: Characters out of JIS / JIS 外字
     Characters beyond JIS X 0213 (new JIS)
     Source:
         髙島屋、内田百閒、杮落とし、安全㐧一、𠮷野家
     Output:
         髙島屋、内田百閒、杮落とし、安全㐧一、𠮷野家
     Platform-dependent characters are now in Unicode:
         ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯ ⑰ ⑱ ⑲ ⑳
         Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ
         ㍉㌔㌢㍍㌘㌧㌃㌶㍑㍗㌍㌦㌣㌫㍊㌻ ㎜㎝㎞㎎㎏㏄㎡㍻ 〝〟 № ㏍℡
         ㊤㊥㊦㊧㊨㈱㈲㈹㍾㍽㍼
         ≒ ≡ ∫ ∮ √ ⊥ ∠ ∟ ⊿ ∵ ∩ ∪
         髙閒塚 德豐﨑 彅弴燁珉鄧

  12. CJK / 中・日・한: basis
     Chinese/Japanese/Korean 中・日・한
     Source:
         \schrm 简体中文 : 你好
         \tchrm 繁體中文 : 早晨
         \jpnrm 日本語 : こんにちは
         \korrm 한국어 : 안녕하세요
     Output:
         简体中文 : 你好
         繁體中文 : 早晨
         日本語 : こんにちは
         한국어 : 안녕하세요

  13. CJK / 中・日・한: glyphs
     Difference of glyphs among CJK / CJK のグリフの違い
     The same text, set in four regional fonts:
         骨練,平直。神祀,才次.    Simplified Chinese
         骨練,平直。神祀,才次.    Traditional Chinese
         骨練,平直。神祀,才次.    Japanese
         骨練,平直。神祀,才次.    Korean

  14. CJK / 中・日・한: end-of-line
     How the end of a source line is treated (↓ marks the line break in the source):
         Please give ↓ me beer.        -> Please give me beer.       (treated as space)
         请给我 ↓ 啤酒。                -> 请给我啤酒。                (ignored)
         ビールを私に ↓ 下さい。        -> ビールを私に下さい。        (ignored)
         맥주를 나에게 ↓ 주세요.        -> 맥주를 나에게 주세요.       (treated as space)
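
The Latin and Japanese rows of the end-of-line slide can be tried with a small document. This is only a minimal sketch: it assumes compilation with uplatex, a UTF-8 source file, and the ujarticle class, none of which are named on the slide. The Chinese and Korean rows are left out because they need the font switches shown on the "basis" slide.

```latex
% Minimal sketch; assumes uplatex and a UTF-8 encoded source file.
% ujarticle is an assumed class choice, not taken from the slides.
\documentclass{ujarticle}
\begin{document}
% Latin text: the line break in the source becomes an inter-word space.
Please give
me beer.

% Japanese text: the line break after a CJK character is ignored,
% so the two halves are joined without a space.
ビールを私に
下さい。
\end{document}
```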

  15. CJK / 中・日・한: control words
     Control word by CJK characters
     Source:
         \def\오늘{%
           \number\year 연 %
           \number\month 월 %
           \number\day 일 %
         }
         Today: 《\오늘》
     Output:
         Today: 《2013 연 10 월 26 일》
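
The same idea works with Japanese characters in the control word. The sketch below assumes uplatex and the ujarticle class; the macro name \今日 is hypothetical and chosen only for illustration.

```latex
% Minimal sketch; assumes uplatex and a UTF-8 encoded source file.
\documentclass{ujarticle}
% \今日 is a hypothetical macro name made of two kanji.
% The space after \year, \month and \day only ends the control word;
% it does not appear in the output.
\def\今日{\number\year 年\number\month 月\number\day 日}
\begin{document}
% The bracket 》 is not a letter, so it ends the control word \今日.
Today: 《\今日》
\end{document}
```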

  16. CJK / 中・日・한: Japanese-OTF package
     Source:
         \usepackage[uplatex,...]{otf}
         ...
         Adobe-Korea1-1:\\
         \CIDK{8322}\CIDK{8588} ...
         Adobe-Japan1-5:\\
         \◇問\●答 \ajRecycle{10}%
         \ajLig{学校法人}%
         \ajPICT{野球}\\
         \ajMaru{1}...
     Output: the corresponding Adobe-Korea1-1 and Adobe-Japan1-5 glyphs (enclosed characters, recycling symbol, ligature, pictograph, circled number).
     The Japanese-OTF package also supports CK.
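
A smaller, self-contained variant of the Japanese-OTF source is sketched below. It assumes uplatex, the ujarticle class, and that the otf package loads its \aj... macros by default; the slide elides its own option list, so the single option used here is an assumption. Whether the glyphs appear in the final PDF also depends on the DVI driver's font setup.

```latex
% Minimal sketch; assumes uplatex and the japanese-otf package.
\documentclass{ujarticle}
\usepackage[uplatex]{otf}  % assumed option list; the slide shows [uplatex,...]
\begin{document}
\ajMaru{1}\ajMaru{2}  % circled numbers 1 and 2
\UTF{9AD9}島屋        % the character U+9AD9 (髙), selected by code point
\end{document}
```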

  17. CJK / 中・日・한: Unification / 統合

                  standard      full-width
     Cyrillic     Ж U+0416      Ж U+0416
     Latin        W U+0057      Ｗ U+FF37

     Unicode has no separate "full-width" codes for Greek and Cyrillic.
     This is a barrier to Unicodizing Japanese software.
     upTeX can treat full-width Greek and Cyrillic by markup.

  18. with European languages / 欧文との親和性: inputenc & UTF-8
     Source:
         \usepackage[utf8]{inputenc}
         \usepackage[T1]{fontenc}
         \kcatcode`ç=15
         ...
         “¿But aren’t Kafka’s Schloß and Æsop’s Œuvres often naïve vis-à-vis the dæmonic phœnix’s official rôle in fluffy soufflés?”
     Output:
         “¿But aren’t Kafka’s Schloß and Æsop’s Œuvres often naïve vis-à-vis the dæmonic phœnix’s official rôle in fluffy soufflés?”
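
The preamble on the inputenc slide can be completed into a small document. This is a minimal sketch under stated assumptions: uplatex as the engine, ujarticle as the class, and the numeric form \kcatcode"00E7 in place of the slide's \kcatcode`ç (ç is U+00E7). The numeric form is used here because it does not depend on how the literal character is tokenized at that point. The sample text uses only letters from the same Latin-1 range as ç.

```latex
% Minimal sketch; assumes uplatex and a UTF-8 encoded source file.
\documentclass{ujarticle}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% Numeric equivalent of the slide's \kcatcode`ç=15 (ç is U+00E7).
% The value 15 is the one used on the slide; it makes these characters
% go through the Latin (inputenc) path instead of being read as CJK.
\kcatcode"00E7=15
\begin{document}
Kafka's Schloß, Æsop, naïve vis-à-vis, rôle, soufflés
\end{document}
```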

  19. with European languages / 欧文との親和性: Babel
     Source:
         \usepackage[french,...]%
           {babel}
         ...
         \selectlanguage{english}
         English ... \today
         ...
         \selectlanguage{russian}
         Русский ... \today
         \selectlanguage{japanese}
         日本語 ... \today
     Output:
         English     October 26, 2013
         Français    26 octobre 2013
         Deutsch     26. Oktober 2013
         Czech       26. října 2013
         Русский     26 октября 2013 г.
         日本語      2013 年 10 月 26 日
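
A reduced version of the Babel source, with only two languages, is sketched below. The option list is an assumption (the slide elides its own list), as are uplatex and the ujarticle class. Russian and Japanese are left out because they need additional font encodings and language definitions that the slide does not show.

```latex
% Minimal sketch; assumes uplatex with the standard babel package.
\documentclass{ujarticle}
\usepackage[T1]{fontenc}
\usepackage[french,english]{babel}  % assumed option list
\begin{document}
% \today follows the language selected by \selectlanguage.
\selectlanguage{english}
English: \today

\selectlanguage{french}
French: \today
\end{document}
```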

  20. with European languages / 欧文との親和性: It's a small world
     - upTeX can treat CJK, Latin, Cyrillic and Greek.
     - upTeX cannot directly treat Arabic, Brahmic, ...

  21. Part III: Implementation / 実装
