Grey Relational Analysis and Natural Language Processing
Arjab Singh Khuman (1), Yingjie Yang (1), Sifeng Liu (2)
(1) Centre for Computational Intelligence, De Montfort University, Leicester, United Kingdom
(2) College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing, China
September 2015
Outline for the Presentation
1 Introduction
2 Natural Language Processing
3 Grey Relational Analysis
4 Proposal
5 Observations
6 Conclusion
A. S. Khuman (C.C.I.) The Leverhulme Trust September 2015
Introduction
• We investigate the validity of using Grey Relational Analysis for Natural Language Processing
• We provide a theoretical overview from which further research can be undertaken
• We describe what Grey Relational Analysis and Natural Language Processing entail
• We look towards the use of Grey Incidence Analysis for inspection and quantification
• Understanding the traditional use of Grey Incidence allows one to better understand our intended use for Natural Language Processing
• We describe the varying components of our framework, highlighting problem areas and possible solutions
• We conclude and put forward suggestions for possible enhancements
Natural Language Preliminaries
• Natural Language Processing is primarily concerned with the interaction between machines and human (natural) language
• It has been an active topic within Computer Science and Artificial Intelligence since the 1950s
• It is an umbrella term that encompasses many sub-domains, including Natural Language Understanding, which is associated with deriving meaning and sentiment
• There are many examples of experiments and programs associated with Natural Language Processing
• The Georgetown experiment in 1954, in which over 60 Russian sentences were automatically translated into interpretable English equivalents
• The creation of ELIZA, a system which simulated a person-centred (Rogerian) psychotherapist
Natural Language Preliminaries
• The 1970s saw the introduction of conceptual ontologies, which were concerned with structuring real-world information into data that was machine understandable
• MARGIE, SAM, PAM and POLITICS are all examples of conceptual ontology programs
• The same period saw the introduction of chatterbots, programs that could interact with users and engage in menial conversation, at least to some extent
• PARRY, a program written to simulate a paranoid schizophrenic
• Racter, which was supposedly able to generate English language prose: short pieces of grammatically structured work with a rudimentary natural flow
• Jabberwacky, a chatterbot created to synthesize natural human chatter in an interesting, entertaining manner
Natural Language Preliminaries
• Modern Natural Language Processing algorithms are based on machine learning, in particular statistical machine learning
• Prior implementations of language-processing tasks typically involved the hard-coding of a large number of deterministic rules
• Modern machine learning algorithms are still firmly rooted in statistical inference
• There are several different classes of machine learning algorithm, which operate in similar ways: each takes as input large sets of features obtained from the input data
• The current trend is still very much to make use of statistical models, which allow for soft, probabilistic decisions based on attaching a weight to each identified input feature
• These characteristics make Natural Language Processing a good candidate for the application of Grey Theory
Grey Relational Analysis Preliminaries
• Grey Relational Analysis falls under the remit of Grey Incidence Analysis, whose main aim is to understand which factors of a system are more important than others
• It establishes which factors are favourable and, equally, which factors are detrimental
• A characteristic sequence, one that represents an ideal of the system, is compared against behavioural factor sequences to ascertain how alike the sequences are, or how much the behavioural factors impact upon the characteristic sequence itself
• This information can then be used to identify whether more emphasis should be applied to a particular behaviour
• Incidence analysis is mainly used for the inspection of a system, and there is little to no literature regarding its use for Natural Language Processing
Grey Relational Analysis Preliminaries
• The characteristic sequences of a system, $Y_1, Y_2, \ldots, Y_n$, are compared against its behavioural factor sequences, $X_1, X_2, \ldots, X_m$, all of which must be of the same magnitude
• The grey incidence matrix is $\Gamma = [\gamma_{ij}]$, where each entry in the $i$th row is the degree of grey incidence between the characteristic sequence $Y_i$ and the behavioural factors $X_1, X_2, \ldots, X_m$
• Each entry in the $j$th column is the degree of grey incidence between the characteristic sequences $Y_1, Y_2, \ldots, Y_n$ and the behavioural factor $X_j$
• For the inspection and analysis of the sequences, there are several variations of the degree of incidence one could employ
• However, we are concerned only with the absolute degree of grey incidence
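The row and column reading of the incidence matrix described above can be written out explicitly. This is only a layout sketch using the slide's own symbols, with $i = 1, \ldots, n$ indexing the characteristic sequences and $j = 1, \ldots, m$ indexing the behavioural factors:

```latex
\Gamma = [\gamma_{ij}]_{n \times m} =
\begin{pmatrix}
\gamma_{11} & \gamma_{12} & \cdots & \gamma_{1m} \\
\gamma_{21} & \gamma_{22} & \cdots & \gamma_{2m} \\
\vdots      & \vdots      & \ddots & \vdots      \\
\gamma_{n1} & \gamma_{n2} & \cdots & \gamma_{nm}
\end{pmatrix},
\qquad
\gamma_{ij} = \text{degree of grey incidence between } Y_i \text{ and } X_j .
```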
Degrees of Grey Incidence
Absolute degree of grey incidence
Assume that $X_i$ and $X_j \in U$ are two sequences of data with the same magnitude, that are defined as the sum of the distances between two consecutive time points, and whose zero starting points have already been computed:
\[ s_i = \int_1^n (X_i - x_i(1))\,dt, \qquad s_j = \int_1^n (X_j - x_j(1))\,dt \tag{1} \]
\[ s_i - s_j = \int_1^n (X_i^0 - X_j^0)\,dt \tag{2} \]
Here $X_i^0 = X_i - x_i(1)$ denotes the zero-starting-point image of $X_i$, and likewise for $X_j^0$.
• The absolute degree is associated with the absolute relationships that exist between characteristic sequences and their behaviours
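To make the quantities $s_i$, $s_j$ and $s_i - s_j$ concrete, the following is a minimal Python sketch for discrete, equally spaced sequences. It rests on two assumptions that are not stated on the slide: the integrals are evaluated with the trapezoidal rule at unit spacing, and the coefficient is formed with the usual definition from the grey systems literature, $\varepsilon_{ij} = (1 + |s_i| + |s_j|) / (1 + |s_i| + |s_j| + |s_i - s_j|)$. The function names are illustrative only.

```python
def zero_start(seq):
    """Return the zero-starting-point image: every point minus the first point."""
    return [x - seq[0] for x in seq]


def signed_area(seq0):
    """Trapezoidal area under a zero-start sequence with unit spacing.

    Because seq0[0] == 0, the trapezoidal sum reduces to the interior
    points plus half of the last point.
    """
    return sum(seq0[1:-1]) + 0.5 * seq0[-1]


def absolute_degree(x_i, x_j):
    """Absolute degree of grey incidence between two equal-length sequences.

    Assumes the standard formula
        (1 + |s_i| + |s_j|) / (1 + |s_i| + |s_j| + |s_i - s_j|).
    """
    if len(x_i) != len(x_j) or len(x_i) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    s_i = signed_area(zero_start(x_i))
    s_j = signed_area(zero_start(x_j))
    total = 1 + abs(s_i) + abs(s_j)
    return total / (total + abs(s_i - s_j))


a = [1, 2, 3, 4]      # s = 1 + 2 + 1.5 = 4.5
b = [2, 4, 6, 8]      # s = 2 + 4 + 3   = 9.0
c = [11, 12, 13, 14]  # same shape as a, shifted upwards

print(absolute_degree(a, a))  # identical sequences give 1.0
print(absolute_degree(a, c))  # shifting does not matter after zero-starting
print(absolute_degree(a, b))  # 14.5 / 19, roughly 0.763
```

Because both sequences are shifted to a zero starting point first, the coefficient responds to the shape of the curves and not to their offset, which is why `a` and `c` are judged identical.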
The Concept
• We are interested only in the analysis of the sequences
• Assume that a hard-wired linguistic sequence is stored in the system and that it may execute an associated command; this sequence can be treated as a characteristic sequence
• Also assume that a user input stream is presented to the system as a behavioural sequence; incidence analysis can then be carried out to establish how similar or dissimilar the two sequences are
• If the returned coefficient surpasses a threshold value, the associated output command is executed
• This harks back to the fact that the more recent Natural Language Processing algorithms make use of statistically based models
• These allow soft, probabilistic decisions to be made, with the advantage of expressing relative certainty across any number of possible answers rather than just one
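The threshold idea above can be sketched in a few lines of Python. Everything here is hypothetical illustration: the slides do not say how text is turned into a numeric sequence, so the sequences are given directly as numbers, and the threshold of 0.9, the command name `lights_on` and the helper `respond` are invented for the example. The coefficient uses the standard absolute degree formula with unit-spaced trapezoidal areas, which is likewise an assumption.

```python
def absolute_degree(x_i, x_j):
    """Absolute degree of grey incidence (assumed standard formula)."""
    if len(x_i) != len(x_j) or len(x_i) < 2:
        raise ValueError("sequences must have the same length (at least 2)")

    def area(seq):
        # Zero-start the sequence, then take the unit-spaced trapezoidal area.
        seq0 = [x - seq[0] for x in seq]
        return sum(seq0[1:-1]) + 0.5 * seq0[-1]

    s_i, s_j = area(x_i), area(x_j)
    total = 1 + abs(s_i) + abs(s_j)
    return total / (total + abs(s_i - s_j))


def respond(user_input, characteristic, command, threshold=0.9):
    """Return (command, score) if the score surpasses the threshold, else (None, score)."""
    score = absolute_degree(characteristic, user_input)
    return (command if score > threshold else None), score


# Hypothetical numeric encodings of a stored phrase and two user inputs.
stored = [3, 5, 4, 6, 8]
close_input = [3, 5, 4, 6, 7]  # differs only in the last point
far_input = [3, 1, 0, 2, 1]    # curve moves in the opposite direction

print(respond(close_input, stored, "lights_on"))  # score 17.5 / 18, command fires
print(respond(far_input, stored, "lights_on"))    # score 16.5 / 32, no command
```

The score is returned alongside the decision so that a caller can see how far an input was from the threshold, in keeping with the soft, graded decisions the slide describes.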
The Concept
• Multiple input streams could be compared against multiple target streams in a pairwise manner to establish which input is best suited to which output
• This is achieved by measuring the space enclosed between the geometric curves of the sequences being compared
• As the sequences themselves are made up of discretised data points, pointwise comparisons can be made to gauge the relative similarity between sequences
• The absolute degree of grey incidence provides the means of computation, returning a coefficient value of absoluteness
• The value itself falls within the range $[0, 1]$: the more similar the sequences are, the closer to 1 the coefficient will be, and vice versa
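The pairwise comparison of several inputs against several targets could look roughly like the sketch below. The stream names (`A`, `B`, `T1`, `T2`), their numeric values and the helpers `incidence_table` and `best_matches` are all hypothetical, and the coefficient again assumes the standard absolute degree formula with unit-spaced trapezoidal areas.

```python
def absolute_degree(x_i, x_j):
    """Absolute degree of grey incidence (assumed standard formula)."""
    if len(x_i) != len(x_j) or len(x_i) < 2:
        raise ValueError("sequences must have the same length (at least 2)")

    def area(seq):
        # Zero-start the sequence, then take the unit-spaced trapezoidal area.
        seq0 = [x - seq[0] for x in seq]
        return sum(seq0[1:-1]) + 0.5 * seq0[-1]

    s_i, s_j = area(x_i), area(x_j)
    total = 1 + abs(s_i) + abs(s_j)
    return total / (total + abs(s_i - s_j))


def incidence_table(inputs, targets):
    """Score every input stream against every target stream."""
    return {
        in_name: {
            t_name: absolute_degree(t_seq, in_seq)
            for t_name, t_seq in targets.items()
        }
        for in_name, in_seq in inputs.items()
    }


def best_matches(table):
    """For each input, pick the target with the highest coefficient."""
    return {in_name: max(row, key=row.get) for in_name, row in table.items()}


targets = {"T1": [1, 2, 3, 4], "T2": [4, 3, 2, 1]}  # rising and falling targets
inputs = {"A": [1, 2, 3, 5], "B": [5, 4, 2, 1]}     # roughly rising, roughly falling

table = incidence_table(inputs, targets)
matches = best_matches(table)
print(matches)  # A pairs with T1, B pairs with T2
```

Each row of `table` holds the coefficients for one input stream against every target stream, so the highest value in a row identifies the target that input most resembles.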
Observations
• We present some of the core individual aspects that contribute to the framework
• Small examples are given to aid understanding of the approach
• The weak points of the concept, and the assumptions placed upon it, are also identified
• Possible solutions to circumvent these weak areas and unrealistic assumptions are discussed
• Some key application areas are described where real-world applicability is feasible
• The overall evaluation of the framework is also discussed, remarking on its individual aspects