upTeX: Unicode version of pTeX with CJK extensions
Takuji Tanaka (田中琢爾), upTeX project
Oct 26, 2013

Outline
(1) Introduction
(2) Unicodization
    - Japanese
    - CJK
    - with European languages
    - world languages
(3) Implementation
    - Unicodization
    - \kcatcode
    - set3
(4) upTeX vs. Ω, XeTeX, ...
(5) Present & future

Part I: Introduction

Introduction: pTeX/pLaTeX
ASCII pTeX/pLaTeX is great:
    - High-quality Japanese typesetting, incl. vertical writing, Japanese hyphenation, ...
    - The standard TeX/LaTeX in Japan
    - Strong support from its environment: DVIware, packages, macros, software, books, ...
But it has weaknesses:
    - Japanese-local: 8bit Latin, Chinese, and Korean are not available
    - Character set limited by legacy encodings (Shift_JIS, EUC-JP)

Introduction: Motivation
    - Support a wider Japanese character set through Unicode
    - Support Babel by switching between Latin and CJK tokens
    - Support Chinese and Korean
    - Keep the quality and environment of pTeX

Introduction: Features of upTeX/upLaTeX
(1) High-quality CJK typesetting based on pTeX/pLaTeX
(2) Compatible with pTeX/pLaTeX
(3) Unicode / UTF-8
(4) Switching between Latin (12bit) and CJK (29bit) tokens
(5) CJK with Babel (Latin/Cyrillic/Greek...)
(6) Beyond the BMP, incl. SIP (U+2xxxx)

Part II: Unicodization

Unicodization: Strategies of Unicodization
(1) Unicodize only I/O. Ex: \usepackage[utf8]{inputenc}
(2) Implement Unicode functions. Ex: XeTeX
(3) Compromise. upTeX: internally, Unicodize only CJK; I/O is fully Unicodized

Unicodization: Partial Unicodization

                            TeX            pTeX       upTeX
    Latin, 7bit             azAZ           azAZ       azAZ
    Latin, 8bit (inputenc)  æœÆŒ гдГД      -          æœÆŒ гдГД
    Japanese, JIS X 0208    -              あア亜     あア亜
    Japanese, Unicode       -              -          髙
    CK, Unicode             -              -          汉字 漢字 한글

pTeX and upTeX each consist of two parts:
(1) A part that is the same as original TeX
(2) A CJK part: JIS X 0208 in pTeX, Unicode in upTeX

Japanese: New JIS (JIS X 0213)
upTeX handles the new JIS X 0213, which goes beyond JIS X 0208.

Sample symbols:
    ヷ ヸ ヹ ヺ ゖ
    ① ② ③ ❶ ❷ ❸ ⓵ ⓶ ⓷ ⓐ ⓑ ⓒ ㋐ ㋑ ㋒
    ⅰ ⅱ ⅲ Ⅰ Ⅱ Ⅲ
    ⅓ ⅔ ⅕
    ♠ ♤ ♥ ♡ ♦ ♢ ♣ ♧
    ♩ ♫ ♬ ♮
    ☀ ☁ ☂ ☃ ♨ ☎ 〠 ☖ ☗ 〼 〽 ✓ ⌘ ⏎ ␣

Sample words:
    ゔゕ ㈱㈲ 鄧小平 李承燁 里見弴 草彅剛 朴璐美 森鷗外 森雞二 王銘琬 宮﨑あおい 蔣介石 你好 深圳 東日本旅客鉃道株式会社 尾骶骨 生酛仕込 凮月堂 㐂寿 仐寿 圓壔函數 啞然 火焰 嚙む 任俠 長身瘦軀 石鹼 屢〻 刺繡 醬油 蟬時雨 隔靴搔痒 奥飛驒 簞笥 摑む 充塡 顚末 祈禱 瀆職 土囊 潑溂 醱酵 頰紅 素麵 麴町 蓬萊 蠟燭 攢竹

Japanese: Characters outside JIS
Characters beyond JIS X 0213 (new JIS):

    source: 髙島屋、内田百閒、杮落とし、安全㐧一、𠮷野家
    output: 髙島屋、内田百閒、杮落とし、安全㐧一、𠮷野家

Platform-dependent characters are now in Unicode:
    ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯ ⑰ ⑱ ⑲ ⑳
    Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ
    ≒ ≡ ∫ ∮ √ ⊥ ∠ ∟ ⊿ ∵ ∩ ∪ №
    ㍉㌔㌢㍍㌘㌧㌃㌶㍑㍗㌍㌦㌣㌫㍊㌻ ㎜㎝㎞㎎㎏㏄㎡㍻ 〝〟 ㏍℡ ㊤㊥㊦㊧㊨㈱㈲㈹㍾㍽㍼
    髙閒塚 德豐﨑 彅弴燁珉鄧

CJK: Chinese/Japanese/Korean basics

source:
    \schrm 简体中文 : 你好
    \tchrm 繁體中文 : 早晨
    \jpnrm 日本語 : こんにちは
    \korrm 한국어 : 안녕하세요

output:
    简体中文 : 你好
    繁體中文 : 早晨
    日本語 : こんにちは
    한국어 : 안녕하세요

CJK: Difference of glyphs among CJK
The same text set in each regional font (the glyph differences are visible only in the typeset slide):

    Simplified Chinese:   骨練,平直。神祀,才次.
    Traditional Chinese:  骨練,平直。神祀,才次.
    Japanese:             骨練,平直。神祀,才次.
    Korean:               骨練,平直。神祀,才次.

CJK: end-of-line
How a line end (↓) in the source is treated:

    source                            line end            output
    Please give ↓ me beer.            treated as space    Please give me beer.
    请给我 ↓ 啤酒。                   ignored             请给我啤酒。
    ビールを私に ↓ 下さい。           ignored             ビールを私に下さい。
    맥주를 나에게 ↓ 주세요.           treated as space    맥주를 나에게 주세요.

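The end-of-line rule above can be tried with a short upLaTeX file. This is a minimal sketch, not taken from the slides: it assumes a standard upLaTeX installation, the ujarticle class, and compilation with uplatex. Only the Latin and Japanese cases are shown, because the Chinese and Korean lines would also need suitable fonts.

```latex
% Compile with uplatex (assumed setup; not taken from the slides).
\documentclass{ujarticle}
\begin{document}
% Latin text: the line end is turned into an interword space.
Please give
me beer.

% Japanese text: the line end after a kana or kanji character is ignored,
% so no space is inserted between the two source lines.
ビールを私に
下さい。
\end{document}
```

In the typeset result, a space should appear between "give" and "me", and none between に and 下.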
CJK: Control words made of CJK characters

source:
    \def\오늘{%
      \number\year 연 %
      \number\month 월 %
      \number\day 일 %
    }
    Today: 《\오늘》

output:
    Today: 《2013 연 10 월 26 일》

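For readers without Korean fonts, the same idea can be sketched with a Japanese control word. This is an illustrative example rather than the slide's code: the macro name \今日 is hypothetical, and the file assumes the ujarticle class and compilation with uplatex.

```latex
% Compile with uplatex (assumed setup; not taken from the slides).
\documentclass{ujarticle}
\begin{document}
% \今日 is a hypothetical macro name built from kanji only.
% The space after \year, \month, and \day ends each control word
% and is not typeset.
\def\今日{\number\year 年\number\month 月\number\day 日}
Today: 《\今日》
\end{document}
```

The closing bracket 》 is a CJK symbol rather than a kanji or kana character, so it ends the control word \今日 without needing a space or braces.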
CJK: Japanese-OTF package

source:
    \usepackage[uplatex,...]{otf}
    ...
    Adobe-Korea1-1:\\
    \CIDK{8322}\CIDK{8588} ...
    Adobe-Japan1-5:\\
    \ajRecycle{10}%
    \ajLig{学校法人}%
    \ajPICT{野球}\\
    \ajMaru{1}...

output:
    [The typeset Adobe-Korea1-1 and Adobe-Japan1-5 glyphs are not recoverable in plain text.]

The Japanese-OTF package also supports Chinese and Korean (CK).

CJK: Unification

                standard      full-width
    Cyrillic    Ж U+0416      Ж U+0416
    Latin       W U+0057      Ｗ U+FF37

Unicode has no separate "full-width" code points for Greek or Cyrillic.
This is a barrier to Unicodizing Japanese software.
upTeX can handle full-width Greek and Cyrillic through markup.

With European languages: inputenc & UTF-8

source:
    \usepackage[utf8]{inputenc}
    \usepackage[T1]{fontenc}
    \kcatcode`ç=15
    ...
    “¿But aren’t Kafka’s Schloß and Æsop’s Œuvres often naïve vis-à-vis the dæmonic phœnix’s official rôle in fluffy soufflés?”

output:
    “¿But aren’t Kafka’s Schloß and Æsop’s Œuvres often naïve vis-à-vis the dæmonic phœnix’s official rôle in fluffy soufflés?”

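The \kcatcode line is what lets inputenc see the accented letters. The following is a minimal sketch under stated assumptions, not the slide's full source: it assumes the ujarticle class and compilation with uplatex, and it uses numeric character codes in place of the slide's `ç form. As described in upTeX's documentation, \kcatcode is kept per Unicode block, with 15 meaning non-CJK, 16 kanji, 17 kana, 18 other CJK symbols, and 19 hangul.

```latex
% Compile with uplatex (assumed setup; not taken from the slides).
\documentclass{ujarticle}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% "E7 is the code point of ç (U+00E7). The setting applies to its whole
% Unicode block (Latin-1 Supplement): 15 marks the block as non-CJK, so
% its UTF-8 bytes are passed on to inputenc like 8-bit Latin input.
\kcatcode"E7=15
% Œ and œ (U+0152, U+0153) sit in a different block, Latin Extended-A,
% so that block needs its own setting.
\kcatcode"152=15
\begin{document}
Kafka's Schloß and Æsop's Œuvres: naïve vis-à-vis the dæmonic phœnix.
\end{document}
```

ASCII apostrophes are used here on purpose: curly quotation marks belong to yet another Unicode block, which would need its own \kcatcode setting before inputenc could handle them.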
With European languages: Babel

source:
    \usepackage[french,...]%
      {babel}
    ...
    \selectlanguage{english}
    English ... \today
    ...
    \selectlanguage{russian}
    Русский ... \today
    \selectlanguage{japanese}
    日本語 ... \today

output:
    English:   October 26, 2013
    Français:  26 octobre 2013
    Deutsch:   26. Oktober 2013
    Czech:     26. října 2013
    Русский:   26 октября 2013 г.
    日本語:    2013 年 10 月 26 日

With European languages: It's a small world
    - upTeX can handle CJK, Latin, Cyrillic, and Greek.
    - upTeX cannot directly handle Arabic, Brahmic scripts, ...

Part III: Implementation