An end-to end solution for using unicode with Blaise to support any - PowerPoint PPT Presentation

An end-to end solution for using unicode with Blaise to support any language • Alerk Amin (CentERdata) • Richard Boreham (NatCen) • Maurice Martens (CentERdata) • Colin Miceli (NatCen)

Overview • Understanding Society � Language requirements • Solution � Translation of questionnaire � Questionnaire setup � Unicode inside Blaise � UNITIP • Demonstration • Interviewer feedback

Survey Design • Design � 40,000 households per wave � 32.5 min CAPI for ALL adults (16+) � Longitudinal • 1000 adult interviews with each of the 5 groups � Indian, Pakistani, Bangladeshi, Black-African, Black- Caribbean • Also eligible for interview � Sri Lankan, Chinese, Far Eastern, Middle Eastern & Iranian, Turkish, North-African, Asian-African

Language requirements • 9 main languages are translated � Welsh � Bengali, Gujarati, Punjabi (Urdu), Punjabi (Gurmukhi), Urdu � Arabic, Cantonese, Somali • Multiple languages in a household • Right-to-left languages • Showcards

Solution overview Paper based English Qure translations LMU MBG UNITIP

Translation process • Translate • Proof-read • Check • Query & retranslate • Adjudicate • Sign-off

Questionnaire setup • Tags � MvEver (DE_MvEver) • Types � Standard types at datamodel level � Non-translated types in different file • Textfills � Separate procedure for each textfill He/she varies according to context

Textfill Procedure PROCEDURE txtLACal PARAMETERS IMPORT imLACSx: TBoyGirl EXPORT exSFLCal_TFGender2: STRING RULES IF (imLACSx=Boy) THEN exSFLACal_TFGender2:=‘he’ {SFLACal_TFGender[1]} ELSEIF (imLACSx=Girl) THEN exSFLACal_TFGender2:=‘she’ {SFLACal_TFGender[2]} ENDIF ENDPROCEDURE

Translated Textfill Procedure RULES IF (imLACSx=Boy) THEN IF UT.CurrentLanguage=L1 THEN exSFLACal_TFGender2:=‘he’ {SFLACal_TFGender[1]} ELSEIF UT.CurrentLanguage=L2 THEN exSFLACal_TFGender2:=‘ayay’ {SFLACal_TFGender[1]} …

Unicode • UTF-8 � 1-4 bytes for every character � Diacritics are added to base character ় ী ক = ক + ◌ ◌় + ◌ ◌ী = 9 bytes • Backwards compatible with ASCII � 1 byte Latin characters are the same � No special characters codes (space, newline) in extra bytes

Using UTF-8 in Blaise • Variable names are Latin characters • Strings in Blaise are Extended ASCII • Replace the ASCII strings with UTF-8 � Text processing works perfectly BLA -> BMI, fills � Rendering does not work Blaise Editor, DEP • Questionnaire modification � String Length Latin strings are 1 byte per character Other languages can be 1-4 bytes String variables must be 2-4x larger

“No” in Bengali • না = ন + ◌ ◌া = “na” + “aa” • UTF-8 � ন = E0 A6 A9 � ◌ ◌া = E0 A6 BE • Extended ASCII � E0 = à � A6 = ¦ � A9 = ¨ � BE = ¾ • TYesNo = … No (2) "No" "à¦¨à¦¾" � This is how Blaise renders the string

UNITIP • Similar functionality as Blaise DEP • All visual controls support UTF-8 • Blaise API � Blaise processes all texts as though they are ASCII Variable names (for fills) are all Latin characters � UNITIP gets the strings from Blaise, and renders it correctly as UTF-8 Question Text Answer Categories

Demonstration

Interviewer feedback • Translation pilot � Bi-lingual interviewers � Interviewer/translator pairs � Bengali and Punjabi Urdu (RTL) • Timing of translated interview was not much longer than English interview • Data entry was not as fast as Blaise DEP • Increase quality of interview and responses • Toggling between translation and English was helpful and used extensively • Improvement over old paper-based system

Questions

An end-to end solution for using unicode with Blaise to support any - PowerPoint PPT Presentation

An end-to end solution for using unicode with Blaise to support any language Alerk Amin (CentERdata) Richard Boreham (NatCen) Maurice Martens (CentERdata) Colin Miceli (NatCen) Overview Understanding Society Language

Blaise NG Key issues in current system Language enhancements Layout Unicode

Application Development and Tools: Making Your Solution Accessible Blaise Hanagami Overview

What is Unicode? Universal Character Set All of the major scripts Sinhala Unicode

Implementing Computer-Audio Recorded Interviewing (CARI) Using Blaise 4 8 2 Using Blaise 4.8.2

Converting to Blaise 4.8 for CATI and CAWI Surveys Presentation at the 12 th International Blaise

Web Application Stress Testing with Blaise Internet with Blaise Internet Jim OReilly Jim O

Unicode Security Considerations (TR#36) Michel Suignard Senior Program Manager, Microsoft

Blaise Editing Rules + Relational Data Storage = Best of Both Worlds? Presented at the Blaise

Blaise 5 in a Production Environment Authors: Paul Segal, Mangal Subramanian, Ray Snowden,

Specification Guidelines for Blaise Survey Instruments Jim Hagerman The University of Michigan

A New Tool for Visualizing Blaise Logic Jason Ostergren, Rhonda Ash 15th International Blaise

Converting to Blaise 5 Experiences and Lessons Learned Introduction Initial focus for

Application Development Strategies (Building the Solution) Blaise Hanagami Mililani High School

Unicode BCP47 Extensions Mark Davis http://goo.gl/owbBk Unicode Locale/Lang ID BCP47

Using Blaise 4.8 for Census Coverage Measurement 12 th International Blaise Users Conference

Unicode Agenda for Bangla Unicode Agenda for Bangla Unicode Agenda for Bangla Unicode Agenda for

Software for the world: latest developments in Unicode and CLDR Mark Davis President &

Blaise Source Code Blaise Source Code Editing System Presenter: Danilo Gutierrez C Co-author:

Unicode Introduction Ken Zook November, 2006 1 Unicode properties 0041;LATIN CAPITAL LETTER

Unicode Character Code A character is the smallest possible component of a tex t (e.g., A,

Unicode 4.0 In Common Unicode 4.0 In Common Lisp Lisp Adoption of Unicode In CLforJava

Blaise Testing Blaise Testing M Margaret Tang t T Statistics Canada IBUC 2007 IBUC 2007

Blaise Internet 4.8.4 Load and Performance Testing Lane Masterton Assistant Statistician

Dynamic List Building Using Blaise Barbara S. Bibb & Gilbert Rodriguez September 26 28,

An end-to end solution for using unicode with Blaise to support any - PowerPoint PPT Presentation

An end-to end solution for using unicode with Blaise to support any language Alerk Amin (CentERdata) Richard Boreham (NatCen) Maurice Martens (CentERdata) Colin Miceli (NatCen) Overview Understanding Society Language

Blaise NG Key issues in current system Language enhancements Layout Unicode

Application Development and Tools: Making Your Solution Accessible Blaise Hanagami Overview

What is Unicode? Universal Character Set All of the major scripts Sinhala Unicode

Implementing Computer-Audio Recorded Interviewing (CARI) Using Blaise 4 8 2 Using Blaise 4.8.2

Converting to Blaise 4.8 for CATI and CAWI Surveys Presentation at the 12 th International Blaise

Web Application Stress Testing with Blaise Internet with Blaise Internet Jim OReilly Jim O

Unicode Security Considerations (TR#36) Michel Suignard Senior Program Manager, Microsoft

Blaise Editing Rules + Relational Data Storage = Best of Both Worlds? Presented at the Blaise

Blaise 5 in a Production Environment Authors: Paul Segal, Mangal Subramanian, Ray Snowden,

Specification Guidelines for Blaise Survey Instruments Jim Hagerman The University of Michigan

A New Tool for Visualizing Blaise Logic Jason Ostergren, Rhonda Ash 15th International Blaise

Converting to Blaise 5 Experiences and Lessons Learned Introduction Initial focus for

Application Development Strategies (Building the Solution) Blaise Hanagami Mililani High School

Unicode BCP47 Extensions Mark Davis http://goo.gl/owbBk Unicode Locale/Lang ID BCP47

Using Blaise 4.8 for Census Coverage Measurement 12 th International Blaise Users Conference

Unicode Agenda for Bangla Unicode Agenda for Bangla Unicode Agenda for Bangla Unicode Agenda for

Software for the world: latest developments in Unicode and CLDR Mark Davis President &amp;

Blaise Source Code Blaise Source Code Editing System Presenter: Danilo Gutierrez C Co-author:

Unicode Introduction Ken Zook November, 2006 1 Unicode properties 0041;LATIN CAPITAL LETTER

Unicode Character Code A character is the smallest possible component of a tex t (e.g., A,

Unicode 4.0 In Common Unicode 4.0 In Common Lisp Lisp Adoption of Unicode In CLforJava

Blaise Testing Blaise Testing M Margaret Tang t T Statistics Canada IBUC 2007 IBUC 2007

Blaise Internet 4.8.4 Load and Performance Testing Lane Masterton Assistant Statistician

Dynamic List Building Using Blaise Barbara S. Bibb &amp; Gilbert Rodriguez September 26 28,

Software for the world: latest developments in Unicode and CLDR Mark Davis President &

Dynamic List Building Using Blaise Barbara S. Bibb & Gilbert Rodriguez September 26 28,