0. MACHINE LEARNING
Liviu Ciortuz, Department of CS, University of Iași, România
1. What is Machine Learning?
• ML studies algorithms that improve with (i.e., learn from) experience.
Tom Mitchell's definition of the [general] learning problem: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance on tasks in T, as measured by P, improves with experience E."
• Examples of [specific] learning problems (see next slide)
• [Liviu Ciortuz:] ML is data-driven programming
• [Liviu Ciortuz:] ML gathers a number of well-defined sub-domains/disciplines, each of them aiming to solve in its own way the above-formulated [general] learning problem.
2. What is Machine Learning good for?
• natural language (text & speech) processing
• genetic sequence analysis
• robotics
• customer (financial risk) evaluation
• terrorist threat detection
• compiler optimisation
• semantic web
• computer security
• software engineering
• computer vision (image processing)
• etc.
3. Related courses at FII
• Genetic Algorithms
• Artificial Neural Networks
• Probabilistic programming
• Special Chapters of Machine Learning
• Special Chapters of Artificial Neural Networks
• Data Mining
• Nature-inspired computing methods
• Big Data Analytics
• Image Processing
• Computer Vision
◦ Bioinformatics
4. A multi-domain view
[Diagram: Machine Learning at the intersection of several fields: Algorithms, Mathematics, Artificial Intelligence (concept learning), Database Systems / Data Mining (Knowledge Discovery in Databases), Pattern Recognition, Statistical Learning, Statistics (model fitting), and Engineering.]
5. The Machine Learning Undergraduate Course: Plan
0. Introduction to Machine Learning (T. Mitchell, ch. 1)
1. Probabilities Revision (Ch. Manning & H. Schütze, ch. 2)
2. Decision Trees (T. Mitchell, ch. 3)
3. Bayesian Learning (T. Mitchell, ch. 6) [and the relationship with Logistic Regression]
4. Instance-based Learning (T. Mitchell, ch. 8)
5. Clustering Algorithms (Ch. Manning & H. Schütze, ch. 14)
6. The Machine Learning Master Course: Tentative Plan
1. Probabilities Revision (Ch. Manning & H. Schütze, ch. 2)
2. Parameter estimation for probabilistic distributions (see Estimating Probabilities, additional chapter to T. Mitchell's book, 2016)
3. Decision Trees: Boosting
4. Gaussian Bayesian Learning
5. The EM algorithmic schemata (T. Mitchell, ch. 6.12)
6. Support Vector Machines (N. Cristianini & J. Shawe-Taylor, 2000)
7. Hidden Markov Models (Ch. Manning & H. Schütze, ch. 9)
8. Computational Learning Theory (T. Mitchell, ch. 7)
7. Bibliography
0. "Exerciții de învățare automată". L. Ciortuz, A. Munteanu, E. Bădărău. Iași, Romania, 2020. www.info.uaic.ro/~ciortuz/ML.ex-book/book.pdf
1. "Machine Learning". Tom Mitchell. McGraw-Hill, 1997
2. "The Elements of Statistical Learning". Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer, 2nd ed., 2009
3. "Machine Learning: A Probabilistic Perspective". Kevin Murphy. MIT Press, 2012
4. "Pattern Recognition and Machine Learning". Christopher Bishop. Springer, 2006
5. "Foundations of Statistical Natural Language Processing". Christopher Manning, Hinrich Schütze. MIT Press, 2002
8. A general schema for machine learning methods
[Diagram: training data is fed to a machine learning algorithm, which produces a model; the model is then applied to test/generalization data to produce the predicted classification.]
"We are drowning in information but starved for knowledge."
John Naisbitt, "Megatrends" book, 1982
9. Basic ML Terminology
1. instance x, instance set X
concept c ⊆ X, or c : X → {0, 1}
example (labeled instance): ⟨x, c(x)⟩; positive examples, negative examples
2. hypothesis h : X → {0, 1}
hypothesis representation language
hypothesis set H
hypotheses consistent with the concept c: h(x) = c(x), ∀ example ⟨x, c(x)⟩
version space
3. learning = train + test
supervised learning (classification), unsupervised learning (clustering)
4. error_h = |{x ∈ X, h(x) ≠ c(x)}|
training error, test error
accuracy, precision, recall
5. validation set, development set
n-fold cross-validation, leave-one-out cross-validation
overfitting
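To make the error notion above concrete, here is a minimal Python sketch (not part of the course materials): a toy instance set, a made-up target concept c and hypothesis h, and the error of h on training and test examples, reported as a fraction of misclassified instances (the slide's definition counts them; dividing by the number of examples gives the usual error rate).

# Illustrative sketch only: the concept, hypothesis, and data below are made up.

def c(x):
    """Target concept: x is positive iff its value exceeds 5."""
    return 1 if x > 5 else 0

def h(x):
    """A candidate hypothesis: x is positive iff its value exceeds 4."""
    return 1 if x > 4 else 0

X = list(range(11))                       # instance set X = {0, 1, ..., 10}
examples = [(x, c(x)) for x in X]         # labeled examples <x, c(x)>
train, test = examples[:7], examples[7:]  # naive train/test split

def error(hypothesis, examples):
    """Fraction of examples on which the hypothesis disagrees with the label."""
    mistakes = sum(1 for x, label in examples if hypothesis(x) != label)
    return mistakes / len(examples)

print("training error:", error(h, train))  # h and c disagree only on x = 5
print("test error:    ", error(h, test))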
10. The Inductive Learning Assumption
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.
11. Inductive Bias
Consider
• a concept learning algorithm L
• the instances X, and the target concept c
• the training examples D_c = {⟨x, c(x)⟩}.
• Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on data D_c.
Definition: The inductive bias of L is any minimal set of assertions B such that
(∀ x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
for any target concept c and corresponding training examples D_c.
(A ⊢ B means A logically entails B.)
For example (cf. T. Mitchell, ch. 2), the inductive bias of the Candidate-Elimination algorithm is the assumption that the target concept is contained in H, while that of ID3 is, roughly, a preference for shorter trees.
12. Inductive systems can be modelled by equivalent deductive systems
13. Evaluation measures in Machine Learning
accuracy: Acc = (tp + tn) / (tp + tn + fp + fn)
precision: P = tp / (tp + fp)
recall (or: sensitivity): R = tp / (tp + fn)
F-measure: F = 2 P × R / (P + R)
specificity: Sp = tn / (tn + fp)
fallout: fp / (fp + tn)
Matthews Correlation Coefficient:
MCC = (tp × tn - fp × fn) / sqrt((tp + fp) × (tn + fn) × (tp + fn) × (tn + fp))
where tp = true positives, fp = false positives, tn = true negatives, fn = false negatives.
[Diagram: Venn-style picture of the hypothesis h against the concept c, showing the tp, fp, fn, tn regions.]
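The formulas above translate directly into code. Below is a minimal Python sketch (not from the course materials), assuming all denominators are non-zero; the counts in the usage line are made up.

import math

def evaluation_measures(tp, fp, tn, fn):
    """Compute the slide's evaluation measures from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                       # a.k.a. sensitivity
    f_measure = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    fallout = fp / (fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return {"Acc": acc, "P": precision, "R": recall, "F": f_measure,
            "Sp": specificity, "fallout": fallout, "MCC": mcc}

# Example with made-up counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
print(evaluation_measures(tp=40, fp=10, tn=45, fn=5))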
14. Lazy learning vs. eager learning algorithms
Eager: generalize before seeing the query
◦ ID3, Backpropagation, Naive Bayes, Radial basis function networks, ...
• Must create a global approximation
Lazy: wait for the query before generalizing
◦ k-Nearest Neighbor, Locally weighted regression, Case-based reasoning
• Can create many local approximations
Does it matter? If they use the same hypothesis space H, lazy learners can represent more complex functions.
E.g., a lazy Backpropagation algorithm can learn a NN which is different for each query point, compared to the eager version of Backpropagation.
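As a purely illustrative sketch of the lazy strategy (not part of the course materials), the following hypothetical 1-nearest-neighbour code stores the training examples verbatim and defers all generalization to query time; the one-dimensional data are made up.

def train(examples):
    """Lazy training step: just memorize the labeled examples."""
    return list(examples)

def predict(memory, query):
    """At query time, return the label of the closest stored instance."""
    closest_x, closest_label = min(memory, key=lambda e: abs(e[0] - query))
    return closest_label

# Made-up one-dimensional instances with 0/1 labels.
data = [(1.0, 0), (2.0, 0), (6.0, 1), (7.5, 1)]
memory = train(data)
print(predict(memory, 5.5))   # -> 1 (the nearest stored instance is 6.0)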
15. Who is Liviu Ciortuz?
• Diploma (maths and CS) from UAIC, Iași, Romania, 1985
PhD in CS from Université de Lille, France, 1996
• programmer: Bacău, Romania (1985-1987)
• full-time researcher: Germany (DFKI, Saarbrücken, 1997-2001), UK (Univ. of York and Univ. of Aberystwyth, 2001-2003), France (INRIA, Rennes, 2012-2013)
• assistant, lecturer and then associate professor: Univ. of Iași, Romania (1990-1997, 2003-2012, 2013-today)
16. ADDENDA
"...colleagues at the Computer Science department at Saarland University have a strong conviction that nothing is as practical as a good theory."
Reinhard Wilhelm, quoted by Cristian Calude, in The Human Face of Computing, Imperial College Press, 2016
17. "Mathematics translates concepts into formalisms and applies those formalisms to derive insights that are usually NOT amenable to a LESS formal analysis."
Jürgen Jost, Mathematical Concepts, Springer, 2015
18. "Mathematics is a journey that must be shared, and by sharing our own journey with others, we, together, can change the world."
"Through the power of mathematics, we can explore the uncertain, the counterintuitive, the invisible; we can reveal order and beauty, and at times transform theories into practical objects, things or solutions that you can feel, touch or use."
Cédric Villani, winner of the Fields Medal, 2010
cf. http://www.bbc.com/future/sponsored/story/20170216-inside-the-mind-of-a-mathematician, 15.03.2017
19. ADMINISTRATIVIA
20. Teaching assistants for the ML undergraduate course 2020 (fall semester)
• Conf. dr. Anca Ignat (... Image Processing)
https://profs.info.uaic.ro/~ancai/ML/
• Conf. dr. Adrian Zălinescu (... Probabilities and Statistics)
https://profs.info.uaic.ro/~adrian.zalinescu/ML.html
• Sebastian Ciobanu (PhD student)
www.seminarul.ml
• Ștefan Panțiru (MSc)
• Ștefan Matcovici (MSc)
• Cosmina Asofiei (MSc)
21. Grading standards for the ML undergraduate course 2020
Objective: learning throughout the whole semester!
Scoring:
Seminars S1, S2: 10 points each, minimum 2 points each
Tests T1, T2: 6 points each, minimum 1.25 points each
Lecture attendance: recommended! Seminar attendance: mandatory!
Penalty: 0.2 points for each absence, from the second one onwards!
Grade = (8 + S1 + S2 + T1 + T2) / 4
To pass: Grade >= 4.5 <=> S1 + S2 + T1 + T2 >= 10