Transmogrification: The Magic of Feature Engineering
Leah McGuire and Mayukh Bhaowal
ML algorithms take center stage in AI, but the pipeline runs Raw Data → Feature Engineering → Modeling, and feature engineering is the bottleneck.
Mythical Numeric Matrix

X1 X2 X3 X4 X5 | Y
 0  1  0  0  0 | A
 1  1  1  0  0 | B
 0  0  1  1  0 | B
 1  1  1  1  1 | A
 1  0  1  0  0 | A
Use the data types
Automatic Feature Engineering

Numeric:
- Imputation, track null value
- Log transformation for large range
- Scaling (z-normalize)
- Smart Binning

Categorical:
- Imputation, track null value
- One Hot Encoding
- Dynamic Top K pivot
- Smart Binning
- LabelCount Encoding
- Category Embedding

Text:
- Tokenization
- Hash Encoding
- Tf-Idf
- Word2Vec
- Sentiment Analysis
- Language Detection

Temporal:
- Time difference
- Circular Statistics
- Time extraction (day, week, month, year)
- Closeness to major events

Spatial:
- Augment with external data (e.g. avg income)
- Spatial fraudulent behavior (e.g. impossible travel speed)
- Geo-encoding
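The categorical transforms above (top-K pivot, one-hot encoding, null tracking) can be sketched in a few lines of plain Scala. This is an illustrative toy, not the TransmogrifAI implementation; the function name and output layout are assumptions:

```scala
// Toy sketch: one-hot encode a categorical column, keeping only the top-K
// values, pivoting the rest into an "other" bucket, and tracking nulls as
// their own indicator column. Not the actual TransmogrifAI API.
def oneHotTopK(values: Seq[Option[String]], k: Int): (Seq[String], Seq[Seq[Int]]) = {
  // Find the K most frequent non-null values.
  val topK = values.flatten
    .groupBy(identity).view.mapValues(_.size).toSeq
    .sortBy(-_._2).take(k).map(_._1)
  val columns = topK ++ Seq("other", "null")
  // One indicator row per input value.
  val rows = values.map {
    case Some(v) if topK.contains(v) => columns.map(c => if (c == v) 1 else 0)
    case Some(_)                     => columns.map(c => if (c == "other") 1 else 0)
    case None                        => columns.map(c => if (c == "null") 1 else 0)
  }
  (columns, rows)
}
```

Tracking nulls as a real column (rather than silently imputing) is what lets the downstream model learn that missingness itself is predictive.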
Transmogrification

val featureVector = Seq(age, phone, email, subject, zipCode).transmogrify()
Impact on Feature Engineering

- Email → Top Email Domain, Is Spammy
- Phone → Country Code, Is Valid
- Age → Age [0-15], Age [15-35], Age [>35]
- Subject → Top 10 TF-IDF Terms Vector
- Zipcode → Average Income
The Black Swan of Perfectly Interpretable Models
Leah McGuire, Mayukh Bhaowal
Roadmap for this talk

- Why explain your model?
- What does it mean to explain your model?
- Interpretability vs accuracy tradeoff
- Complications of feature engineering
- How to explain your model?
  - Global (full model) solutions
  - Local (record level) solutions
The Question Why did the machine learning model make the decision that it did?
Translation #1 How do I fix this model? — Data Scientist
Translation #2 Do we have our bases covered, in case of a regulatory audit? — Legal Counsel
Translation #3 Does Einstein know what I know? How do I use this prediction? — Non Technical End User
[Diagram: an ensemble model. Input features f feed models producing P_1(c|f), ..., P_k(c|f), ..., P_n(c|f), which are combined by a sum (Σ) into the output.]
Model Insights Report
Roadmap for this talk (recap)
Debuggability

Top contributing features for surviving the Titanic:
1. Gender
2. pClass
3. Body
Trust How can you trust a man that wears both a belt and suspenders? Man can't even trust his own pants.
[Diagram: 2x2 grid of Human vs Machine decisions, Right vs Wrong.]
Bias
Legal
Black defendant has higher risk scores https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
Actionable
Roadmap for this talk (recap)
It’s complicated
Which explanation technique fits? It depends on:
- Can you use a simple model?
- Does the consumer care about how the features affect the model, or just feature insights?
- Are the raw features fed into the model interpretable?
- Does the consumer care about individual predictions?

The options are Feature Weights / Importance, Feature Impact, Secondary Model, and Model Agnostic, each available in a Global and a Local variant.
Roadmap for this talk (recap)
The best model or the model you can explain?
Roadmap for this talk (recap)
Where did you get the feature matrix?

X1 X2 X3 X4 X5 | Y
 0  1  0  0  0 | A
 1  1  1  0  0 | B
 0  0  1  1  0 | B
 1  1  1  1  1 | A
 1  0  1  0  0 | A
Feature Engineering

- Email → Top Email Domain, Is Spammy
- Phone → Country Code, Is Valid
- Age → Age [0-15], Age [15-35], Age [>35]
- Subject → Top 10 TF-IDF Terms Vector
- Zipcode → Average Income
Metadata!!!
- The name of the feature the column was made from
- The name of the RAW feature(s) the column was made from
- Everything you did to get the column
- Any grouping information across columns
- Description of the value in the column

https://ontotext.com/knowledgehub/fundamentals/metadata-fundamental/
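The bullets above amount to a per-column lineage record. A minimal sketch of such a record in Scala; the field names and example values are illustrative, not the actual TransmogrifAI schema:

```scala
// Hypothetical per-column metadata: everything needed to trace a matrix
// column back to the raw feature(s) it came from. Illustrative only.
case class ColumnMetadata(
  columnName: String,          // name of the generated column
  parentFeature: String,       // feature the column was made from
  rawFeatures: Seq[String],    // RAW feature(s) it traces back to
  stagesApplied: Seq[String],  // everything done to get the column
  grouping: Option[String],    // grouping across columns (e.g. a pivot group)
  description: String          // what the value in the column means
)

// Example: one of the age buckets from the feature engineering slide.
val ageBucket = ColumnMetadata(
  columnName    = "age_15-35",
  parentFeature = "age",
  rawFeatures   = Seq("age"),
  stagesApplied = Seq("Imputation", "SmartBinning"),
  grouping      = Some("age_buckets"),
  description   = "1 if age falls in [15, 35), else 0"
)
```

With this record attached to every column, any weight, importance, or impact computed on the matrix can be rolled back up to the raw feature a user actually recognizes.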
Roadmap for this talk (recap)
Interpretability: Global vs Local
Which explanation technique fits? It depends on:
- Can you use a simple model?
- Does the consumer care about how the features affect the model, or just feature insights?
- Are the raw features fed into the model interpretable?
- Does the consumer care about individual predictions?

Global options: Feature Weights / Importance, Feature Impact, Secondary Model, Model Agnostic.
Feature Weight / Importance (Global)
Predict House Price
Predict Titanic Passenger Survival
[Diagram: an ensemble model. Input features f feed models producing P_1(c|f), ..., P_k(c|f), ..., P_n(c|f), which are combined by a sum (Σ) into the output.]
Feature Impact (Global - the hard way)

Leave one column (e.g. X1) out of the matrix and retrain:

X1 X2 X3 X4 X5 | Y
 0  1  0  0  0 | A
 1  1  1  0  0 | B
 0  0  1  1  0 | B
 1  1  1  1  1 | A
 1  0  1  0  0 | A
Feature Impact (Global - the hard way)
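"The hard way" can be sketched directly: retrain with each column removed and record the change in accuracy. The nearest-centroid classifier below is a toy stand-in for whatever learner you actually use, and the function names are illustrative:

```scala
// Toy classifier: predict the class whose feature centroid is closest.
def nearestCentroidAccuracy(xs: Seq[Seq[Double]], ys: Seq[String]): Double = {
  val centroids = ys.distinct.map { c =>
    val rows = xs.zip(ys).collect { case (x, y) if y == c => x }
    c -> rows.transpose.map(col => col.sum / col.size)
  }
  val preds = xs.map { x =>
    centroids.minBy { case (_, m) =>
      x.zip(m).map { case (a, b) => (a - b) * (a - b) }.sum
    }._1
  }
  preds.zip(ys).count { case (p, y) => p == y }.toDouble / ys.size
}

// Global feature impact, the hard way: drop column i, retrain, and
// report how much accuracy fell (one retraining per column).
def dropColumnImpact(xs: Seq[Seq[Double]], ys: Seq[String]): Seq[Double] = {
  val full = nearestCentroidAccuracy(xs, ys)
  xs.head.indices.map { i =>
    val dropped = xs.map(row => row.patch(i, Nil, 1))
    full - nearestCentroidAccuracy(dropped, ys)
  }
}
```

The cost is one full retraining per column, which is why this is "the hard way": with thousands of engineered columns it quickly becomes impractical.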
Issues with Feature Importance / Weight / Impact (Global) http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_statistics_toolbox/multicollinearity.htm
[Diagram: Secondary Model. Input → model → Prediction; a secondary model trained on the inputs and predictions produces the Explanation.]
Secondary Model (Global)
Secondary Model (Global) https://www.statmethods.net/advgraphs/images/corrgram1.png
What we do: ● All the metadata about how you got the feature ● Correlation ● Mutual information ● Feature weight / importance ● Feature distribution
What we do:

{
  "featureName" : "sex",
  "derivedFeatures" : [ {
    "stagesApplied" : [ "pivotText_OpSetVectorizer" ],
    "derivedFeatureValue" : "Male",
    "corr" : -0.5185045877245239,
    "mutualInformation" : 0.19652543270839468,
    "contribution" : 0.1763534388489181,
    ....
  }, {
    "stagesApplied" : [ "pivotText_OpSetVectorizer" ],
    "derivedFeatureValue" : "Female",
    "corr" : 0.518504587724524,
    "mutualInformation" : 0.19652543270839468,
    "contribution" : 0.18080355705344647,
    ....
  } ]
}
Roadmap for this talk (recap)
Which explanation technique fits? It depends on:
- Can you use a simple model?
- Does the consumer care about how the features affect the model, or just feature insights?
- Are the raw features fed into the model interpretable?
- Does the consumer care about individual predictions?

Local options: Feature Weights / Importance, Feature Impact, Secondary Model, Model Agnostic.
Feature Weight (Local)
Predict House Price (example record with feature values 852, 2, 1, 36)
Feature Weight (Local)
Feature Impact (LOCO)

{"age": 17.0, "embarked": "C", "name": "Attalah, Miss. Malake", "pClass": "3", "parch": "0", "sex": "female", "sibSp": "0", "survived": 0.0, "ticket": "2627"}

Score = 0.62
Why? sex = "female" (+0.13), pClass = 3 (-0.05), ...

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
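LOCO (leave-one-covariate-out) at the record level can be sketched like this: rescore the row with each feature knocked out and report the score deltas, as in the "Why?" line above. Here the knockout is a replacement by the column mean and `score` stands in for the real model; both choices are illustrative assumptions:

```scala
// Record-level LOCO sketch: for each feature, replace its value with the
// column mean, rescore, and report base - knocked-out score. A positive
// delta means the feature pushed the score up for this record.
def loco(row: Seq[Double], colMeans: Seq[Double], score: Seq[Double] => Double): Seq[Double] = {
  val base = score(row)
  row.indices.map { i =>
    base - score(row.updated(i, colMeans(i)))
  }
}
```

Unlike the global drop-column approach, this needs no retraining, only rescoring, so it is cheap enough to run per prediction.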
Secondary Model (LIME) https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
Secondary Model (Correlation)

contribution = Norm(feature) * Corr

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
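The Norm(feature) * Corr heuristic above is a one-liner: normalize each column value for the record and weight it by that column's global correlation with the label. A minimal sketch, with z-score normalization assumed as the "Norm":

```scala
// Correlation-based local contribution: z-normalize each feature value
// for this record and scale by the column's correlation with the label.
def localContribution(row: Seq[Double], means: Seq[Double],
                      stds: Seq[Double], corrs: Seq[Double]): Seq[Double] =
  row.indices.map(i => ((row(i) - means(i)) / stds(i)) * corrs(i))
```

This reuses statistics already computed for the model insights report (means, standard deviations, correlations), which is what makes it essentially free at scoring time.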
What we do: ● Use case determines LOCO or correlation ● Use case determines what level of features we show