
Modeling - PowerPoint PPT Presentation



  1. Modeling (process overview diagram: the stages of a data analysis project and their guiding questions)
     Project understanding: What exactly is the problem, the expected benefit? What would a solution look like? What is known about the domain?
     Data understanding: What data do we have available? Is the data relevant to the problem? Is it valid? Does it reflect our expectations? Is the data quality, quantity, recency sufficient?
     Decision: does the data suit the problem? (yes / partially / no; branch labels: revise objective, cancel project)
     Data preparation: Which data should we concentrate on? How is the data best transformed for modeling? How may we increase the data quality?
     Modeling: What kind of model architecture suits the problem best? What is the best technique/method to get the model? How well does the model perform technically?
     Decision: technical quality improvable? (likely / unlikely; branch label: revise objective)
     Evaluation: How good is the model in terms of project requirements? What have we learned from the project?
     Decision: business objective achieved? (success / partially / no; branch label: close project)
     Deployment: How is the model best deployed? How do we know that the model is still valid?
     Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011. © Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn and Iris Adä

  6. The Four Steps of Modeling
     Select the model class: the general structure of the analysis result (“architecture” or “model class”). Example: linear or quadratic functions for a regression problem.
     Select the score function: evaluate possible “models” using a score function.
     Apply the algorithm: compare models through the score function. But: how do we find the models?
     Validate the results: we know the best model among the chosen ones. But: is this the best among very good or very bad choices?
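
The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the slides: the toy data, the choice of a constant and a linear model class, the closed-form least-squares fit, and the simple hold-out split are all hypothetical choices made for the example.

```python
from statistics import mean

# Hypothetical toy data: y is roughly 2*x + 1 with small deviations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.1, 2.9, 5.2, 7.0, 8.8, 11.1, 13.0, 14.9]

# Keep every second point aside for the validation step.
train_x, train_y = xs[::2], ys[::2]
test_x, test_y = xs[1::2], ys[1::2]


# Step 1: select the model classes (structure only, parameters still open).
def fit_constant(x, y):
    """Constant model class: always predict the mean of the targets."""
    c = mean(y)
    return lambda value: c


def fit_linear(x, y):
    """Linear model class y = a*x + b, fitted by closed-form least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    return lambda value: a * value + b


# Step 2: select the score function (mean squared error, lower is better).
def mse(model, x, y):
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Step 3: apply the algorithm and compare the fitted models by their score.
candidates = {
    "constant": fit_constant(train_x, train_y),
    "linear": fit_linear(train_x, train_y),
}
train_scores = {name: mse(m, train_x, train_y) for name, m in candidates.items()}
best_name = min(train_scores, key=train_scores.get)

# Step 4: validate the winner on data it has not seen during fitting.
test_score = mse(candidates[best_name], test_x, test_y)
print(best_name, train_scores[best_name], test_score)
```

Note that step 4 only tells us how the best of the chosen candidates behaves on held-out data. It cannot tell us whether a model class that was never tried would have done better.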

  7. Model class?
     Model = the form or structure of the analysis result. At this point the parameters are not yet defined; only the type is selected.
     Examples: linear models (y = ax + b); constant values (e.g. the mean); rule-based models (if A buys product one, then weather is sunny).
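
One way to picture the difference between a model class and a concrete model is the following Python sketch. The function names `linear_model_class` and `constant_model_class` are hypothetical and chosen only for this illustration: each function stands for a model class, and calling it with parameters yields one concrete model.

```python
from typing import Callable

# A concrete model maps an input value to a predicted value.
Model = Callable[[float], float]


def linear_model_class(a: float, b: float) -> Model:
    """Model class y = a*x + b; fixing a and b gives one concrete model."""
    return lambda x: a * x + b


def constant_model_class(c: float) -> Model:
    """Model class of constant predictions, e.g. c = mean of the data."""
    return lambda x: c


# Two different models drawn from the same (linear) model class.
m1 = linear_model_class(2.0, 1.0)
m2 = linear_model_class(-0.5, 3.0)
print(m1(2.0), m2(2.0))
```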

  9. Model class - Requirements
     Simplicity: Occam’s razor: choose the simplest model that still “explains” the data. Or: Numquam ponenda est pluralitas sine necessitate [Plurality must never be posited without necessity]. Simpler models are easier to understand, have lower complexity, and help avoid overfitting (see Slide 21 ff.).
     Interpretability: black boxes are mostly not a proper choice. But: they can result in very good accuracy (e.g. neural networks).

  10. Global vs. local models
      Global models provide a (not necessarily good) description of the whole data set. Example: regression line.
      Local models or patterns provide a description of only a part or subset of the data set. Example: association rules.

  14. Fitting Criteria and Score Function
      Find an objective function $f : M \to \mathbb{R}$ that evaluates the quality of a model, in order to detect the “best” model.
      Example. Given: a dataset $D = \{d_1, d_2, \ldots, d_n\} \subset \mathbb{R}^m$ and a “model” $M : \mathbb{R}^m \to \mathbb{R}^m$ ($M$ predicts a value for a given data point).
      Mean squared error: $f(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - M(x_i))^2$
      Mean absolute error: $f(x) = \frac{1}{n} \sum_{i=1}^{n} |x_i - M(x_i)|$
      Here $x_i$ ranges over the $n$ data points in $D$.
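
The two score functions can be written down directly. The sketch below assumes, for illustration only, that each data point is a pair of an input and an observed value, and that the model is a plain Python function; the sample points and the model `2x + 1` are hypothetical.

```python
def mean_squared_error(model, points):
    """Average of the squared differences between observed and predicted values."""
    return sum((y - model(x)) ** 2 for x, y in points) / len(points)


def mean_absolute_error(model, points):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(y - model(x)) for x, y in points) / len(points)


# Hypothetical data: three points lie exactly on y = 2x + 1, the last one is off by 4.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 11.0)]
model = lambda x: 2.0 * x + 1.0

mse_value = mean_squared_error(model, points)   # (0 + 0 + 0 + 16) / 4
mae_value = mean_absolute_error(model, points)  # (0 + 0 + 0 + 4) / 4
print(mse_value, mae_value)
```

Because the error is squared, the single deviating point contributes 16 to the squared-error sum but only 4 to the absolute-error sum, so the mean squared error reacts more strongly to large individual errors.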

  15. Short comment: What is classification?
      Example: imagine a cup factory that wants to classify its cups as good or broken.

  16. Error functions for classification problems
      How do we set up an error function for such classification problems?
      Very common: misclassification rate = (# wrongly classified) / (# total classified)
      A low misclassification rate does not necessarily say anything about the quality of a classifier when classes are unbalanced (e.g. when 99% of the production is ok, a classifier that always predicts ok has a misclassification rate of 1%).
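
The unbalanced-class effect is easy to reproduce. The sketch below uses a hypothetical batch of 100 cups with the labels "ok" and "broken" and a trivial classifier that always answers "ok"; the counts are made up to match the 99% example.

```python
def misclassification_rate(predicted, actual):
    """Number of wrongly classified items divided by the total number classified."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Hypothetical production batch: 99 cups are ok, 1 is broken.
actual = ["ok"] * 99 + ["broken"]

# A trivial classifier that ignores its input and always predicts "ok".
always_ok = ["ok"] * len(actual)

rate = misclassification_rate(always_ok, actual)

# How many of the broken cups did the classifier actually detect?
broken_found = sum(
    p == "broken" and a == "broken" for p, a in zip(always_ok, actual)
)
print(rate, broken_found)
```

The rate is only 1%, yet the classifier never detects a single broken cup, which is exactly the case the factory cares about.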
