Strong typing and type conversions


  1. Strong typing
     ● A language is strongly typed if:
       – Every data element has a unique type whose properties are known at compilation
       – Type conversions take place in a controlled manner, by interpreting a value of one type as another
         • Not by (mis)interpreting bits in memory!
     ● Other definitions for "strong typing":
       • Type conversions are checked at compilation
       • All type errors are reported (at compile time or at runtime)
       • Operations that are incompatible are prevented
     ● Static typing
       – Every variable has a definite type at compile time
         • Strong typing does not imply compile-time typing
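
The difference between converting a value and (mis)interpreting bits can be sketched as follows. Python and its struct module are used here only for illustration; the slide itself names no language, and the 32-bit float layout is an assumption.

```python
import struct

value = 1.0

# Controlled conversion: the *value* 1.0 becomes the integer 1.
as_int_value = int(value)

# Bit reinterpretation: the 4 bytes that encode 1.0 as a 32-bit float
# are read back as an unsigned 32-bit integer, giving an unrelated number.
as_int_bits = struct.unpack("<I", struct.pack("<f", value))[0]

print(as_int_value)      # 1
print(hex(as_int_bits))  # 0x3f800000
```

A strongly typed language in the sense of the slide allows the first kind of conversion and prevents (or at least clearly marks) the second.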

  2. Typing in languages
     ● Fortran (weak typing)
       – No type checks in parameter passing
       – EQUIVALENCE statement
     ● Pascal (nearly strong typing)
       – Records that are not type-safe
     ● Ada (strong typing)
       – Records are type-safe
     ● C and C++ (weak typing)
       – union is not type-safe
       – (Implicit type casting)
       – Type conversion between incompatible types
     ● Java and C# (strong typing)

  3. Strong typing: type conversions
     ● Stronger typing (Ada)
       – Explicit conversions allowed only in a limited sense
       – Exception: Unchecked_Conversion
     ● Weaker typing (C)
       – Compiler can make automatic type conversions
         • coercive (implicit) type conversions
       – Lack of consistency
         • When and what kind of automatic conversions take place
         • When an explicit conversion is possible
     ● Terms
       – cast: explicit conversion
       – coercion: implicit conversion
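
A small sketch of the two terms. The slide discusses Ada and C; Python is swapped in here purely as an illustration, and it reports the refused conversion at runtime instead of at compile time.

```python
# Coercion: the int operand is implicitly converted to float.
mixed = 1 + 2.5            # 3.5

# Explicit conversion (cast-like): the programmer requests it by name.
explicit = int("42") + 1   # 43

# No implicit conversion between str and int: a type error is raised.
try:
    "42" + 1
    refused = False
except TypeError:
    refused = True

print(mixed, explicit, refused)  # 3.5 43 True
```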

  4. Types in languages
     ● Scalar types
     ● Enumerated types
     ● Subtypes
     ● Structured types (struct, classes, ...)
       – Formed using type constructors
     ● Pointer/reference types
     ● Set types
     ● Subroutine and function types
       – We will return to this in the section about functions
     ● Task types (Ada)
       – Concurrency issue

  5. Numeric types
     ● Representing numeric types
       – Efficiency mandates internal representation
       – Variations in the representation make portability harder
     ● Two's complement for integers
       – Range -2^(n-1) .. 2^(n-1) - 1, e.g., 16 bits: -32768..32767
     ● Floating point: (s * significand * 2^exponent)
       – Standard for representation
         • bigger significand/mantissa yields more precision
         • bigger exponent yields a larger range
       – Bit layout (32 bits): s (1 bit) | exponent (8 bits) | significand (23 bits)
     ● Decimals (Cobol, C#)
       – Fixed numbers, fixed decimal point
       – Large accuracy, small value range
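
The integer range and the 1/8/23 bit layout can be checked with a short sketch. It assumes that the "standard for representation" is IEEE 754 single precision, which the slide does not state; the function names are made up for this example.

```python
import struct

def twos_complement_range(n_bits):
    """Return (min, max) of an n-bit two's complement integer."""
    return -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1

def float32_fields(x):
    """Split the 32-bit float encoding of x into its three bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    significand = bits & 0x7FFFFF    # 23 bits, the fraction after the implicit leading 1
    return sign, exponent, significand

print(twos_complement_range(16))  # (-32768, 32767)
print(float32_fields(-1.5))       # (1, 127, 4194304)
```

Here -1.5 is -1.1 (binary) * 2^0, so the stored exponent is the bias 127 and only the top fraction bit is set.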

  6. Character types
     ● Representing one character in a language
     ● Common operations:
       – Equality, lexicographical ordering
       – I/O
       – Conversions to string type and back
       – Upper to lower case and back, etc.
     ● Representation in memory
       – ASCII (ISO-646) (7 bits), ISO-latin etc. (8 bits)
       – Unicode code points (32 bits) (previously also a 16-bit version)

  7. Char-type issues
     ● Size and interpretation
       – Traditionally 8 bits (orig. 6-7), max. 256 chars
       – Not enough for non-English languages!
       – Solution 1: several encodings (iso-latin-X)
       – Solution 2: larger type (Unicode, 16- and 32-bit)
     ● What determines the encoding?
       – Source file encoding (char and string literals, identifiers)
       – Memory representation?
     ● Conversions with I/O
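
Why "several encodings" is a fragile solution: the same 8-bit value stands for different characters depending on which iso-latin table is assumed. A minimal sketch using Python's built-in codecs:

```python
raw = bytes([0xA4])

# The same byte, decoded with two different 8-bit encodings.
as_latin1 = raw.decode("iso-8859-1")    # U+00A4, currency sign
as_latin9 = raw.decode("iso-8859-15")   # U+20AC, euro sign

# The same character, written out with two different encodings.
a_umlaut = "\u00e4"
latin1_bytes = a_umlaut.encode("iso-8859-1")  # 1 byte:  E4
utf8_bytes = a_umlaut.encode("utf-8")         # 2 bytes: C3 A4

print(hex(ord(as_latin1)), hex(ord(as_latin9)))  # 0xa4 0x20ac
print(len(latin1_bytes), len(utf8_bytes))        # 1 2
```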

  8. Issues with char types
     ● Comparison
       – What determines alphabetical order, is < the same?
       – When are two characters equal?
     ● One or many chars?
       – Ligatures: ß, ij, Ľ, ŋ, œ, ä
       – Upper/lower case problems: ß → SS
       – Many representations: ä (U+00E4) vs. ä = a¨ (U+0061, U+0308)
     ● Does the programming language support many character encodings?
       – Several types for each encoding: trouble
       – One type: needs to be large enough (Unicode UCS4)
       – Backward compatibility issues...
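
The "many representations" and case problems can be reproduced with Python's standard unicodedata module. This is only a sketch; the choice of NFC as the normal form is an assumption, and other normal forms exist.

```python
import unicodedata

precomposed = "\u00e4"   # a-umlaut as one code point (U+00E4)
combining = "a\u0308"    # 'a' followed by COMBINING DIAERESIS (U+0061, U+0308)

# The two look identical on screen but compare unequal code point by code point.
equal_raw = precomposed == combining

# After normalisation to the same form they compare equal.
equal_nfc = unicodedata.normalize("NFC", combining) == precomposed

# Case conversion can change the number of characters.
sharp_s_upper = "\u00df".upper()   # sharp s becomes "SS"

print(equal_raw, equal_nfc)   # False True
print(sharp_s_upper)          # SS
```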

  9. Source code encoding
     ● Problem: The encoding of a text file is not usually given
     ● Char/string literals vs identifiers
     ● Attempted solutions:
       – Coding is set by the language (Java)
       – Coding determined by source code (Python)
       – Given on the command line (C++ compilers)
       – Not taken into account (C++, others)

  10. String types
     ● Traditional: list/array of characters
     ● Typical operations:
       – Indexing
       – Iteration
       – Splicing
       – Printing etc.
     ● How to encode?
       – Char type problems remain
     ● 32 bits quadruples memory consumption

  11. Unicode and strings
     ● Standard characters: 137 439 (ver 11.0) (encoding allows 1 114 112)
     ● Codes U+000000 – U+10FFFF for chars
     ● Part of the codes reserved for other purposes
     ● String encoding UTF-32 (UCS4)
       – String is a sequence of 32-bit chars (4 294 967 296 possible values)
       – Easy, compare with 8-bit strings
       – Lots of memory consumed, wasted bits
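
The figures on this slide can be checked with a few lines. The sample text "hello" is an arbitrary choice for illustration.

```python
MAX_CODE_POINT = 0x10FFFF
code_space = MAX_CODE_POINT + 1   # 1 114 112 code points, U+0000 .. U+10FFFF

text = "hello"
utf8_size = len(text.encode("utf-8"))        # 1 byte per ASCII character
utf32_size = len(text.encode("utf-32-le"))   # 4 bytes per character, no BOM

# Values above U+10FFFF fit in 32 bits but are not valid code points.
try:
    chr(MAX_CODE_POINT + 1)
    accepted = True
except ValueError:
    accepted = False

print(code_space)             # 1114112
print(utf8_size, utf32_size)  # 5 20
print(accepted)               # False
```

For ASCII-only text UTF-32 uses four times the memory of an 8-bit string, and most of the 32-bit value range is never used.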

  12. Unicode and strings
     ● UTF-16
       – String consists of 16-bit units (pairs of bytes)
       – Unicode symbols ≤ U+FFFF are coded directly
       – The rest are represented by surrogates, two 16-bit byte pairs
       – Surrogates are encoded so that the first and second code cannot be confused
       – Consequence 1: A pair can immediately be checked to see if it is a character, a surrogate beginning or a surrogate ending.
       – Consequence 2: One symbol may be one or two pairs of bytes!
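
A sketch of the surrogate mechanism, using the value ranges defined for UTF-16 (D800–DBFF for the first unit, DC00–DFFF for the second). The function names are made up for this example, and the encoder does not reject code points that fall inside the surrogate range itself.

```python
def to_utf16_units(code_point):
    """Encode one code point as a list of one or two 16-bit units."""
    if code_point <= 0xFFFF:
        return [code_point]              # coded directly
    offset = code_point - 0x10000        # 20 bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return [high, low]

def classify(unit):
    """Tell what a single 16-bit unit is without looking at its neighbours."""
    if 0xD800 <= unit <= 0xDBFF:
        return "surrogate beginning"
    if 0xDC00 <= unit <= 0xDFFF:
        return "surrogate ending"
    return "character"

units = to_utf16_units(0x1F600)
print([hex(u) for u in units])       # ['0xd83d', '0xde00']
print([classify(u) for u in units])  # ['surrogate beginning', 'surrogate ending']
print(to_utf16_units(0x00E4))        # [228]
```

Because the two ranges do not overlap with each other or with directly coded symbols, a single unit can always be classified on its own (Consequence 1).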

  13. Unicode and strings
     ● UTF-8
       – 8-bit bytes form a symbol
       – Unicode symbols ≤ U+7F are coded directly (7-bit ASCII)
       – Rest up to ≤ U+7FF use 2 bytes (European chars)
       – Rest up to ≤ U+FFFF use 3 bytes (Asian chars)
       – Rest use 4 bytes (old languages, math symbols ...)
       – Consequence: One symbol is 1–4 bytes!
       – Multibyte symbols can be separated from 1-byte symbols and so on.
       – Consequence: bytes can be scanned until the beginning of a symbol is found.
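
The size rules and the "scan until the beginning of a symbol" property can be sketched as below. The helper names are hypothetical, the input is assumed to be valid UTF-8, and scanning backwards is one possible direction (the slide does not say which).

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 needs for one code point."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4

def is_continuation(byte):
    """Continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80

def symbol_start(data, i):
    """Scan backwards from index i to the first byte of the symbol containing it."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i

# 'a' (U+0061), a-umlaut (U+00E4), euro sign (U+20AC): 1 + 2 + 3 bytes
data = "a\u00e4\u20ac".encode("utf-8")
print(len(data))              # 6
print(symbol_start(data, 5))  # 3
print(symbol_start(data, 2))  # 1
```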

  14. Unicode challenges and problems
     ● Indexing and other interpretation (UTF-8 and UTF-16)
       – Symbols differ in size, s[i] is not the (i+1)th symbol!
       – String length and table size are not the same!
       – Indexing at the middle of a symbol
       – Replacing a symbol is complicated if the new one is of a different size
       – → Encoding using strings is complicated
       – Unicode iterator: returns Unicode symbols and moves correctly in a string
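
A minimal sketch of such an iterator over UTF-8 bytes. The name iter_symbols is made up for this example, and the input is assumed to be valid UTF-8 that starts at a symbol boundary.

```python
def iter_symbols(data):
    """Yield (byte_offset, symbol) pairs from UTF-8 encoded bytes."""
    i = 0
    while i < len(data):
        first = data[i]
        # The first byte of a symbol tells how many bytes the symbol occupies.
        if first < 0x80:
            size = 1
        elif first < 0xE0:
            size = 2
        elif first < 0xF0:
            size = 3
        else:
            size = 4
        yield i, data[i:i + size].decode("utf-8")
        i += size

# 'h', a-umlaut, sharp s, euro sign: 1 + 2 + 2 + 3 bytes
data = "h\u00e4\u00df\u20ac".encode("utf-8")
symbols = list(iter_symbols(data))

print(len(data), len(symbols))  # 8 4
print(hex(data[2]))             # 0xa4, the middle of a-umlaut, not the third symbol
print([offset for offset, _ in symbols])  # [0, 1, 3, 5]
```

The byte array has 8 elements but the string has 4 symbols, and plain indexing with data[2] lands inside a symbol.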

  15. Unicode challenges
     ● More memory and cache (esp. UTF-32)
     ● Alphabets
       – Number-coded alphabetizing impossible
     ● Equality
       – Strings must be normalised before comparing
     ● Conversions are needed if multiple codings are supported
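
Why sorting by code number fails as alphabetizing, in a short sketch. The folding key below is a crude stand-in invented for this example, not a real collation algorithm; the correct order depends on the language (German dictionaries sort ä with a, while Finnish and Swedish place it after z).

```python
import unicodedata

words = ["zebra", "Zulu", "\u00e4pfel", "apple"]

# Sorting by code point: all uppercase letters come before lowercase ones,
# and a-umlaut (U+00E4) comes after 'z' (U+007A).
by_code_point = sorted(words)

def folded(word):
    """Hypothetical key: drop combining marks and ignore case."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

by_folded_key = sorted(words, key=folded)

print(by_code_point.index("Zulu"), by_code_point.index("zebra"))  # 0 2
print(by_folded_key.index("Zulu"), by_folded_key.index("zebra"))  # 3 2
```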

  16. Unicode in some languages
     ● Java
       – String coding UTF-16
       – Source code coding UTF-8
     ● Python
       – Strings "raw" (Python 2) or Unicode (2 & 3)
       – Source code can declare which encoding is used
     ● C++
       – C++03: not defined (wchar_t = UTF-32?)
       – C++11: support for UTF-8/16/32, but not full
       – Source encoding compiler-dependent
