

  1. “Statistical Identification of Language” – Ted Dunning • Kristinn, Reykjavík University

  2. Languages • 안녕하세요 • Halló • Hello • こんにちは • Hallo • 你好 • Hola • 你好 • Bonjour

  3. Languages • Halló – Íslenska • 안녕하세요 – Korean • Hello – English • こんにちは – Japanese • Hallo – German • 你好 – Chinese (traditional) • Hola – Spanish • Bonjour – French • 你好 – Chinese (simplified)

  4. Introduction • A statistics-based program has been written that learns to distinguish between languages, e.g. Spanish, English, French – About 100 lines of code – Needs only a few thousand words of sample text in order to learn a language – Works very well, with 92%+ accuracy, and becomes more accurate with a larger “learning text” – A learning text is a sample of text which the program can “tokenize”

  5. Bayesian Method with Markov Probability • Bayesian probability: deciding which underlying event most likely caused an observation. • Markov probability: analyzing past events to predict future events, e.g. weather systems.

  6. Previous Work: Unique Letter Combinations • Enumerate a number of short character sequences which are unique to a particular language. • Drawback: languages sometimes adopt words from other languages, e.g. geography, movies, names, etc.
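The unique-sequences idea can be sketched in a few lines of Python. This is not code from the paper or the slides; the character sequences below are illustrative guesses, and a real system would need many more of them (the drawback above — borrowed words — is exactly what breaks this):

```python
# Sketch of the unique-sequences method: certain short character sequences
# occur (almost) only in one language. These sequences are illustrative
# assumptions, not a vetted list.
UNIQUE_SEQS = {
    "German": ["sch", "ß"],
    "Spanish": ["ñ"],
    "French": ["eau", "ç"],
}

def guess_by_unique_seqs(text):
    # Return the first language whose marker sequence appears in the text.
    for lang, seqs in UNIQUE_SEQS.items():
        if any(s in text for s in seqs):
            return lang
    return None  # no marker found: method gives no answer
```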

  7. Previous Work: Common Words • Devise a list of commonly used words in a language. – English: the, of, to, and, a, in, is, it, you, etc. – German: der/die/das, und, sein, in, ein, zu, etc. – Spanish: el/la, de, que, y, a, en, un, ser, se, etc. • Drawback: not all language phrases contain these words, and a language such as Chinese is difficult to tokenize, which makes this method impossible to implement there.
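A minimal sketch of this common-words approach, using the word lists from the slide. The naive whitespace tokenizer is exactly the step that fails for a language like Chinese:

```python
# Common-words method: count how many tokens of a sample appear in each
# language's list of frequent words; pick the language with the most hits.
COMMON_WORDS = {
    "English": {"the", "of", "to", "and", "a", "in", "is", "it", "you"},
    "German": {"der", "die", "das", "und", "sein", "in", "ein", "zu"},
    "Spanish": {"el", "la", "de", "que", "y", "a", "en", "un", "ser", "se"},
}

def guess_by_common_words(text):
    tokens = text.lower().split()  # naive whitespace tokenizer
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in COMMON_WORDS.items()}
    return max(scores, key=scores.get)
```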

  8. Previous Work: N-gram Counting with Rank Order • Ad hoc rank ordering of n-grams in tokenized text, or comparing tokenized text to a large library of text from a source such as network newsgroups. • Drawback: the input had to be tokenized, and the statistical rank order was only reliable for longer texts, i.e. 4 KB or 700 words.
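The rank-order comparison can be sketched as follows. This is a generic out-of-place rank distance, not the exact scheme the slide alludes to, and the `top` and `penalty` values are illustrative assumptions:

```python
from collections import Counter

def ngram_ranks(text, n=3, top=300):
    # Rank the most frequent character n-grams of a text: gram -> rank.
    counts = Counter(text[i:i+n] for i in range(len(text) - n + 1))
    return {g: rank for rank, (g, _) in enumerate(counts.most_common(top))}

def out_of_place(profile, sample_ranks, penalty=300):
    # Sum of rank differences between a sample and a language profile;
    # n-grams absent from the profile get a fixed penalty.
    return sum(abs(r - profile.get(g, penalty))
               for g, r in sample_ranks.items())
```

The language whose profile yields the smallest distance is chosen.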

  9. Markov Method • The Markov model defines a random variable whose values are strings from an alphabet X, where the probability of a particular string S = s1…sn under a model of order k is: P(S) = p(s1…sk) · Π(i=k+1..n) p(si | si−k…si−1) • We are looking at the sequence of characters in a learning text, without considering language structure.
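A sketch of training such a character-level Markov model and scoring a string with it. The order `k`, the add-one smoothing, and the alphabet size are assumptions for illustration, not details taken from the slide:

```python
import math
from collections import Counter

def train_markov(text, k=2):
    # Count each k-character context and the character that follows it.
    ctx_counts, trans_counts = Counter(), Counter()
    for i in range(len(text) - k):
        ctx, nxt = text[i:i+k], text[i+k]
        ctx_counts[ctx] += 1
        trans_counts[(ctx, nxt)] += 1
    return ctx_counts, trans_counts

def log_prob(text, model, k=2, alphabet=27):
    # log P(S) = sum of log P(s_i | previous k chars),
    # with add-one smoothing so unseen transitions get nonzero probability.
    ctx_counts, trans_counts = model
    lp = 0.0
    for i in range(len(text) - k):
        ctx, nxt = text[i:i+k], text[i+k]
        lp += math.log((trans_counts[(ctx, nxt)] + 1) /
                       (ctx_counts[ctx] + alphabet))
    return lp
```

Training one such model per language turns identification into comparing log-probabilities of the test string.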

  10. Bayesian Method • If we are choosing between A and B given an observation X, where we feel that we know how A or B might affect the distribution of X, we can use Bayes’ theorem: P(A|X) = P(X|A)·P(A) / [P(X|A)·P(A) + P(X|B)·P(B)] • We look at what happened before the current character: which cause is most probable given that this event has already occurred.
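A toy illustration of the Bayesian decision between two languages, computed in log-odds form. The letter frequencies below are made-up placeholders, not real measurements, and characters are assumed independent purely for the sketch:

```python
import math

# Hypothetical per-letter frequencies for two languages (illustrative only).
FREQ = {
    "English": {"a": 0.08, "e": 0.13, "o": 0.075, "t": 0.09},
    "Spanish": {"a": 0.12, "e": 0.14, "o": 0.09, "t": 0.046},
}

def posterior_log_odds(text, prior_english=0.5):
    # Bayes' theorem in log-odds form:
    # log P(English|X) - log P(Spanish|X)
    #   = log-prior-odds + sum over characters of log P(x|Eng) - log P(x|Spa).
    lo = math.log(prior_english) - math.log(1 - prior_english)
    for ch in text:
        pa = FREQ["English"].get(ch, 0.01)  # small floor for unseen chars
        pb = FREQ["Spanish"].get(ch, 0.01)
        lo += math.log(pa) - math.log(pb)
    return lo  # positive favors English, negative favors Spanish
```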

  11. Summarised • The method reads from a learning text of a relatively small size. – Test setup • Languages: English and Spanish • Learning texts: 10 training texts of 1000, 2000, 5000, 10,000, and 50,000 bytes in length • Test texts: 100 different tests of 10, 20, 50, 100, and 500 bytes in length

  12. Test Results

  13. Why and Where? • Genetic sequence analysis – determining the species to which a particular animal, plant, etc. belongs. • Identifying which language a text is written in – http://whatlanguageisthis.com/

  14. Questions
