Data Mining and Matrices
01 – Introduction
Rainer Gemulla, Pauli Miettinen
April 18, 2013
Outline
1 What is data mining?
2 What is a matrix?
3 Why data mining and matrices?
4 Summary
What is data mining?
“Data mining is the process of discovering knowledge or patterns from massive amounts of data.” (Encyclopedia of Database Systems)
Mining:       Rock → Gold        (Tools, Miners)
Data mining:  Data → Knowledge   (Software, Analysts)
An estimated $100 billion industry has grown up around managing and analyzing data. (Data, data everywhere. The Economist, 2010.)
Science
◮ The Sloan Digital Sky Survey gathered 140 TB of information
◮ The NASA Center for Climate Simulation stores 32 PB of data
◮ 3B base pairs exist in the human genome
◮ The LHC registers 600M particle collisions per second, 25 PB/year
Social data
◮ 1M customer transactions are performed at Walmart per hour
◮ 25M Netflix customers view and rate hundreds of thousands of movies
◮ 40B photos have been uploaded to Facebook
◮ 200M active Twitter users write 400M tweets per day
◮ 4.6B mobile-phone subscriptions worldwide
Also: government, health care, news, stocks, books, web search, ...
Typical data mining tasks:
◮ Prediction
◮ Outlier detection
◮ Clustering
◮ Pattern mining
“If it rains on Seven Sleepers' Day (Siebenschläfertag), the rain will not go away for seven weeks.” (German folklore)
Knowledge discovery pipeline
Figure: the knowledge discovery pipeline, with the focus of this lecture highlighted.
Womb
mater (Latin) = mother
matrix (Latin) = pregnant animal
matrix (Late Latin) = womb; also source, origin
Since the 1550s: place or medium where something is developed
Since the 1640s: embedding or enclosing mass
(Online Etymology Dictionary)
Rectangular arrays of numbers
“Rectangular arrays” were known in ancient China (rod calculus, estimated to date back as early as 300 BC).
$$\begin{pmatrix} 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1 \end{pmatrix}$$
The term “matrix” was coined by J.J. Sylvester in 1850.
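System of linear equations
Systems of linear equations can be written as matrices
$$\begin{aligned} 3x + 2y + z &= 39\\ 2x + 3y + z &= 34\\ x + 2y + 3z &= 26 \end{aligned} \quad\longrightarrow\quad \left(\begin{array}{ccc|c} 3 & 2 & 1 & 39\\ 2 & 3 & 1 & 34\\ 1 & 2 & 3 & 26 \end{array}\right)$$
and then be solved using linear algebra methods, e.g., by eliminating variables until the system is upper triangular and then substituting back:
$$\begin{pmatrix} 3 & 2 & 1\\ 0 & 5 & 1\\ 0 & 0 & 12 \end{pmatrix}\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 39\\ 24\\ 33 \end{pmatrix} \quad\Rightarrow\quad \begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 9.25\\ 4.25\\ 2.75 \end{pmatrix}$$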
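A minimal sketch of the elimination and back-substitution steps in Python. The function name `solve` is our own, partial pivoting is an addition not shown on the slide, and exact `Fraction` arithmetic is used so the result matches the values above without rounding noise.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting and
    back substitution. Assumes A is square and non-singular."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate all entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    # Back substitution on the resulting upper-triangular system.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

x = solve([[3, 2, 1], [2, 3, 1], [1, 2, 3]], [39, 34, 26])
print([float(v) for v in x])  # [9.25, 4.25, 2.75]
```

For larger systems one would normally call a library routine; this version only illustrates the steps.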
Set of data points
Each row is a data point; the two columns hold its x and y coordinates.
x       y
−3.84   −2.21
−3.33   −2.19
−2.55   −1.47
−2.46   −1.25
−1.49   −0.76
−1.67   −0.39
−1.30   −0.59
⋮       ⋮
1.59    0.78
1.53    1.02
1.45    1.26
1.86    1.18
2.04    0.96
2.42    1.24
2.32    2.03
2.90    1.35
Figure: scatter plot of the points (x and y axes from −4 to 4).
Linear maps
Linear maps from $\mathbb{R}^3$ to $\mathbb{R}^4$:
$f_1(x, y, z) = 3x + 2y + z$
$f_2(x, y, z) = 2x + 3y + z$
$f_3(x, y, z) = x + 2y + 3z$
$f_4(x, y, z) = x$
Linear map $f_1$ written as a matrix:
$$\begin{pmatrix} 3 & 2 & 1 \end{pmatrix}\begin{pmatrix} x\\ y\\ z \end{pmatrix} = f_1(x, y, z)$$
Linear map from $\mathbb{R}^3$ to $\mathbb{R}^4$:
$$\begin{pmatrix} 3 & 2 & 1\\ 2 & 3 & 1\\ 1 & 2 & 3\\ 1 & 0 & 0 \end{pmatrix}\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} f_1(x, y, z)\\ f_2(x, y, z)\\ f_3(x, y, z)\\ f_4(x, y, z) \end{pmatrix}$$
Figure: original data (scatter plot) and the same data after a linear map, rotated and stretched.
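A small sketch of the matrix form of the map: each output coordinate is the dot product of one matrix row with (x, y, z). The helper `apply` is hypothetical. Evaluating the map at (9.25, 4.25, 2.75), the solution of the system of linear equations above, returns the right-hand sides 39, 34, 26, because the first three rows are exactly that system's coefficient matrix.

```python
# The 4 x 3 matrix whose rows hold the coefficients of f_1, ..., f_4.
A = [
    [3, 2, 1],  # f_1(x, y, z) = 3x + 2y + z
    [2, 3, 1],  # f_2(x, y, z) = 2x + 3y + z
    [1, 2, 3],  # f_3(x, y, z) = x + 2y + 3z
    [1, 0, 0],  # f_4(x, y, z) = x
]

def apply(matrix, v):
    """Matrix-vector product: entry i is the dot product of row i with v."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

v = [9.25, 4.25, 2.75]
print(apply(A, v))  # [39.0, 34.0, 26.0, 9.25]
```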
Graphs
Figure: a graph and its adjacency matrix.
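As a sketch, assuming vertices are numbered 0 to n−1 and the graph is given as an edge list (both our assumptions, as is the helper name `adjacency_matrix`), the adjacency matrix has a 1 in row u, column v whenever there is an edge between u and v. The example graph is hypothetical.

```python
def adjacency_matrix(n, edges, directed=False):
    """Return the n x n 0/1 adjacency matrix of a graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1  # an undirected edge is stored in both directions
    return A

# Hypothetical example graph: the path 0 - 1 - 2 - 3.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
for row in A:
    print(row)
```

For an undirected graph the matrix is symmetric, and its row sums are the vertex degrees (here 1, 2, 2, 1).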
Objects and attributes
Anna, Bob, and Charlie went shopping. Anna bought butter and bread; Bob bought butter, bread, and beer; Charlie bought bread and beer.

Customer transactions:
          Butter  Bread  Beer
Anna      1       1      0
Bob       1       1      1
Charlie   0       1      1

Document-term matrix:
          Data  Matrix  Mining
Book 1    5     0       3
Book 2    0     0       7
Book 3    4     6       5

Incomplete rating matrix (movies: Avatar, The Matrix, Up): each user has rated only two of the three movies. Alice: 4, 2; Bob: 3, 2; Charlie: 5, 3.

Cities and monthly temperatures:
              Jan    Jun    Sep
Saarbrücken   1      11     10
Helsinki      6.5    10.9   8.7
Cape Town     15.7   7.8    8.7

Many different kinds of data fit this object-attribute viewpoint.
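The customer-transaction matrix can be built directly from the shopping story. As an illustration that goes slightly beyond the slide, multiplying the transposed matrix by the matrix itself counts, for every pair of items, how many customers bought both; the variable names are hypothetical.

```python
# Purchases from the example, as sets of items per customer.
transactions = {
    "Anna": {"butter", "bread"},
    "Bob": {"butter", "bread", "beer"},
    "Charlie": {"bread", "beer"},
}
items = ["butter", "bread", "beer"]  # column order (attributes)
customers = list(transactions)       # row order (objects)

# Binary object-attribute matrix: D[i][j] = 1 iff customer i bought item j.
D = [[1 if item in transactions[c] else 0 for item in items] for c in customers]

# (D^T D)[a][b] = number of customers who bought both item a and item b.
co = [[sum(D[i][a] * D[i][b] for i in range(len(customers)))
       for b in range(len(items))]
      for a in range(len(items))]

print(D)   # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(co)  # [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
```

The diagonal of `co` gives how often each item was bought on its own count (butter 2, bread 3, beer 2).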
What is a matrix?
A means to describe computation: linear operators such as
◮ Rotation
◮ Rescaling
◮ Permutation
◮ Projection
◮ · · ·
A means to describe data: entry $A_{ij}$ of a matrix $A$ lies in row $i$ (object $i$) and column $j$ (attribute $j$). Depending on the data, rows, columns, and entries play different roles:
Rows          Columns      Entries
Objects       Attributes   Values
Equations     Variables    Coefficients
Data points   Axes         Coordinates
Vertices      Vertices     Edges
In data mining, we make use of both viewpoints simultaneously.
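To illustrate the computation viewpoint, here are 2 × 2 examples of the operator types listed above, each applied to the hypothetical point (2, 1). The specific matrices and the helper `apply` are our own choices, not taken from the slide.

```python
def apply(matrix, v):
    """Matrix-vector product: entry i is the dot product of row i with v."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

v = [2, 1]  # a point in the plane

operators = {
    "rotation":    [[0, -1], [1, 0]],   # counter-clockwise by 90 degrees
    "rescaling":   [[3, 0], [0, 0.5]],  # stretch x by 3, halve y
    "permutation": [[0, 1], [1, 0]],    # swap the two coordinates
    "projection":  [[1, 0], [0, 0]],    # orthogonal projection onto the x-axis
}

for name, op in operators.items():
    print(name, apply(op, v))
# rotation [-1, 2]
# rescaling [6, 0.5]
# permutation [1, 2]
# projection [2, 0]
```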
Key tool: Matrix decompositions
A matrix decomposition of a data matrix $D$ is given by three matrices $L$, $M$, $R$ such that
$$D = LMR, \qquad D_{ij} = \sum_{k}\sum_{k'} L_{ik} M_{kk'} R_{k'j},$$
where $D$ is an $m \times n$ data matrix, $L$ is an $m \times r$ matrix, $M$ is an $r \times r$ matrix, $R$ is an $r \times n$ matrix, and $r$ is an integer $\geq 1$.
Figure: entry $D_{ij}$ is computed from row $L_{i*}$ of $L$, the matrix $M$, and column $R_{*j}$ of $R$.
There are many different kinds of matrix decompositions, each putting certain constraints on the matrices $L$, $M$, $R$; matrices satisfying these constraints may not be easy to find.
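A minimal check of the definition with small hypothetical factors (m = 3, n = 4, r = 2): computing each entry with the double sum over k and k′ gives the same matrix as the product LMR. The helper names `matmul` and `entry` are our own.

```python
def matmul(A, B):
    """Ordinary matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def entry(L, M, R, i, j):
    """D_ij = sum over k and k' of L_ik * M_kk' * R_k'j."""
    r = len(M)
    return sum(L[i][k] * M[k][kp] * R[kp][j] for k in range(r) for kp in range(r))

# Hypothetical factors: L is m x r, M is r x r, R is r x n.
L = [[1, 0], [0, 1], [1, 1]]
M = [[2, 0], [0, 3]]
R = [[1, 0, 1, 0], [0, 1, 1, 1]]

D = matmul(matmul(L, M), R)
print(D)  # [[2, 0, 2, 0], [0, 3, 3, 3], [2, 3, 5, 3]]
assert all(D[i][j] == entry(L, M, R, i, j) for i in range(3) for j in range(4))
```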
Example: Singular value decomposition
Figure: a 50 × 2 data matrix $D$ (scatter plot of its rows) decomposed as $D = LMR$, with $L$ of size 50 × 2 (scatter plot of its rows), $R$ of size 2 × 2 (its rows $R_{1*}$ and $R_{2*}$ drawn as vectors), and
$$M = \begin{pmatrix} 11.73 & 0\\ 0 & 1.71 \end{pmatrix}.$$
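A sketch using NumPy's `numpy.linalg.svd`, with hypothetical random data rather than the 50 points in the figure, so the singular values will not be 11.73 and 1.71. In the SVD, L has orthonormal columns, M is diagonal with non-negative entries in decreasing order, and R is orthogonal.

```python
import numpy as np

# Hypothetical data: 50 points in the plane, stretched along a diagonal direction.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 2)) @ np.array([[1.5, 1.0], [-0.2, 0.3]])

# Thin SVD: D = U diag(s) Vt with U 50 x 2, s of length 2, Vt 2 x 2.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
L, M, R = U, np.diag(s), Vt  # rename to match the D = LMR notation

print(M)  # diagonal matrix of singular values, largest first
```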
Example: Non-negative matrix factorization
Figure: a data column $D_{*j}$, the factor $L$, the coefficients $R_{*j}$, and the reconstruction $LR_{*j}$.
Lee and Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.
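A minimal NMF sketch in NumPy, assuming the squared Frobenius error as the objective and multiplicative update rules of the Lee–Seung type for it; this may differ from the exact update used in the cited paper. The function `nmf`, its parameters, and the synthetic data are hypothetical. Because each update multiplies by a non-negative ratio, L and R stay non-negative, so every column $D_{*j} \approx L R_{*j}$ is an additive combination of the columns of L.

```python
import numpy as np

def nmf(D, r, iters=500, eps=1e-9, seed=0):
    """Approximate a non-negative m x n matrix D by L @ R, with L (m x r) and
    R (r x n) non-negative, via multiplicative updates on ||D - LR||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    L = rng.random((m, r))
    R = rng.random((r, n))
    for _ in range(iters):
        # Multiplying by a non-negative ratio keeps all entries >= 0;
        # eps guards against division by zero.
        R *= (L.T @ D) / (L.T @ L @ R + eps)
        L *= (D @ R.T) / (L @ R @ R.T + eps)
    return L, R

# Hypothetical non-negative data of rank 2.
rng = np.random.default_rng(1)
D = rng.random((20, 2)) @ rng.random((2, 15))

L, R = nmf(D, r=2)
print(np.linalg.norm(D - L @ R) / np.linalg.norm(D))  # relative error
```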
Example: Latent Dirichlet allocation
Figure: example of latent Dirichlet allocation (panel labels R and L).
Blei et al. Latent Dirichlet allocation. JMLR, 2003.
Other matrix decompositions
◮ Singular value decomposition (SVD)
◮ k-means
◮ Non-negative matrix factorization (NMF)
◮ Semi-discrete decomposition (SDD)
◮ Boolean matrix factorization (BMF)
◮ Independent component analysis (ICA)
◮ Matrix completion
◮ Probabilistic matrix factorization
◮ . . .
What can we do with matrix decompositions?
◮ Separate data from multiple processes
◮ Remove noise from the data
◮ Remove redundancy from the data
◮ Reveal latent structure and similarities in the data
◮ Fill in missing entries
◮ Find local patterns
◮ Reduce space consumption
◮ Reduce computational cost
◮ Aid visualization
Matrix decompositions can make data mining algorithms more effective. They may also provide insight into the data by themselves.
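To make “remove noise” and “reduce space consumption” concrete, here is a sketch using a truncated SVD on hypothetical synthetic data (a rank-3 signal plus small noise). The helper `truncated_svd` and all sizes are our own choices. Storing the factors takes far fewer numbers than storing D, and the rank-3 reconstruction is closer to the noise-free signal than D itself.

```python
import numpy as np

def truncated_svd(D, r):
    """Best rank-r approximation of D in Frobenius norm, kept as factors L, M, R."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :r], np.diag(s[:r]), Vt[:r, :]

# Hypothetical data: a rank-3 signal plus a little Gaussian noise.
rng = np.random.default_rng(0)
m, n, r = 1000, 200, 3
signal = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
D = signal + 0.01 * rng.normal(size=(m, n))

L, M, R = truncated_svd(D, r)
approx = L @ M @ R

print("entries in D:", D.size)                           # 200000
print("entries in L, M, R:", L.size + M.size + R.size)   # 3609
print("relative error:", np.linalg.norm(D - approx) / np.linalg.norm(D))
```

Since M is diagonal, only its r diagonal entries actually need to be stored, which saves a little more space.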