Decision-making Bias in Instance Matching Model Selection
Mayank Kejriwal, Daniel P. Miranker
Acknowledgements: US National Science Foundation, Microsoft Research
Instance Matching
- A 50+ year-old Artificial Intelligence problem: when do two descriptions refer to the same underlying entity?
- Numerous surveys, e.g. by Winkler (2006) and Rahm et al. (2010)
"Record linkage: making maximum use of the discriminating power of identifying information." Newcombe and Kennedy (1962)
Machine Learning
- Classifier example: feedforward multilayer perceptron (MLP)
"Machine Learning: an artificial intelligence approach." Michalski, Carbonell and Mitchell (2013)
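A minimal sketch of such a classifier, assuming scikit-learn's MLPClassifier as a stand-in for the feedforward MLP and toy pairwise similarity features (neither is prescribed by the slides):

```python
# Toy sketch of an MLP classifier for instance matching. The feature vectors
# are placeholders: in practice each vector would encode similarity features
# computed over a pair of records (e.g. string similarities per attribute).
from sklearn.neural_network import MLPClassifier

X = [[0.9, 0.8, 0.7],   # pair with high attribute similarities (likely a match)
     [0.1, 0.2, 0.0],   # pair with low similarities (likely a non-match)
     [0.8, 0.9, 0.6],
     [0.0, 0.1, 0.2]]
y = [1, 0, 1, 0]        # 1 = match, 0 = non-match

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.85, 0.75, 0.65]]))  # classify an unseen record pair
```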
Supervised Machine Learning
- Requires a (manually) labeled set for both training and validation, typically acquired by sampling a ground truth
- Training fits classifier parameters (e.g. the edge weights of the MLP)
- Validation tunes classifier hyperparameters (e.g. number of layers, number of nodes, learning rate...)
- Also requires model selection decisions: Which training algorithm? What sampling technique? How should the data be split between training and validation?
- These decisions are not obvious
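A minimal sketch of the parameter/hyperparameter distinction, assuming scikit-learn; the synthetic data, the 50/50 split and the candidate values are illustrative choices, not the paper's settings:

```python
# Parameters (edge weights) are fit on the training portion; a hyperparameter
# (here, the number of hidden nodes) is chosen on the validation portion.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in for a labeled sample of (feature vector, match/non-match) pairs.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

best_score, best_size = -1.0, None
for hidden in (5, 10, 20):                                   # candidate hyperparameter values
    clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=500, random_state=0)
    clf.fit(X_train, y_train)                                # training: learn edge weights
    score = clf.score(X_val, y_val)                          # validation: score this hyperparameter
    if score > best_score:
        best_score, best_size = score, hidden
print("best hidden-layer size:", best_size, "validation accuracy:", best_score)
```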
Model Selection Exercise
- What percentage of the labeled data should I use for training, and what percentage for validation?
What do other people do?
- The most common approach in the literature is a ten-fold split (and, less often, a two-fold split)
- What if I care more about one performance metric (say recall, versus precision), within reasonable constraints?
- What if I have sampled and labeled a lot of data (say 90% of the estimated ground truth)?
- Should the answers to these questions (and others) bias my decision?
"Semi-supervised instance matching using boosted classifiers." Kejriwal and Miranker (2015)
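One way such a comparison could be set up is sketched below, assuming "ten-fold" and "two-fold" refer to k-fold cross-validation over the labeled sample and using synthetic data rather than the benchmarks from the paper:

```python
# Compare a ten-fold and a two-fold split on the same labeled sample,
# reporting precision and recall so the trade-off can be inspected.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

# Imbalanced classes, loosely mimicking the rarity of matching pairs.
X, y = make_classification(n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)

for folds in (10, 2):
    scores = cross_validate(clf, X, y, cv=folds, scoring=("precision", "recall"))
    print("%2d-fold: precision=%.3f recall=%.3f"
          % (folds, scores["test_precision"].mean(), scores["test_recall"].mean()))
```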
Let's do an experiment
Results for the Amazon-GoogleProducts benchmark, using MLP:

Split            Labeled data (% of ground truth)    Precision    Recall
Ten-fold split   10%                                 54.13%       25.77%
Ten-fold split   50%                                 61.51%       28.77%
Ten-fold split   90%                                 73.27%       27.69%
Two-fold split   10%                                 45.47%       35.64%
Two-fold split   50%                                 55.50%       34.92%
Two-fold split   90%                                 66.67%       36.92%

Results were consistent across two other benchmarks and several experimental controls.
Concluding the exercise
- What if I care more about recall than precision? I should choose a two-fold split (unlike what the literature would suggest)
- What if I have sampled and labeled a lot of data (say 90% of the estimated ground truth)? An irrelevant concern, once the metric is specified
- Takeaway: some model selection decisions can bias other model selection decisions, and not always in an obvious way
How do we make informed model selection decisions?
Decision-making and Model Selection
- Cognitive psychology has shown (empirically) that human beings are neither logical nor rational
- Wason Selection Task
"Reasoning about a rule." Wason (1968)
"The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task." Cosmides (1989)
- Prospect Theory (awarded the 2002 Nobel Prize in Economics)
"Prospect theory: an analysis of decision under risk." Kahneman and Tversky (1979)
One systematic method is to start by...
- Visualizing decision-making biases by capturing the influences between decisions
[Diagram: a decision node (Training/Validation split) influenced by Performance metric, Labeling budget, and Computational resources]
Concise approach: bipartite graphs
- The interpretation of the nodes and edges is abstract (we don't impose strict requirements)
[Diagram: bipartite graph with the Training/Validation split decision on one side and nodes of influence (Performance metric, Labeling budget, Computational resources) on the other]
"Bipartite graphs and their applications." Asratian et al. (1998)
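A minimal sketch of how such an influence graph could be represented and drawn, assuming networkx and matplotlib; the partition (decisions vs. nodes of influence) and the edges are an illustrative reading of the diagram, not a fixed specification:

```python
# A bipartite graph capturing influences on a model selection decision.
import networkx as nx
import matplotlib.pyplot as plt

decisions = ["Training/Validation split"]
influences = ["Performance metric", "Labeling budget", "Computational resources"]

G = nx.Graph()
G.add_nodes_from(decisions, bipartite=0)
G.add_nodes_from(influences, bipartite=1)
# Each edge records a hypothesized influence on the decision.
G.add_edges_from((i, "Training/Validation split") for i in influences)

pos = nx.bipartite_layout(G, decisions)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=2500, font_size=8)
plt.show()
```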
Hypothesizing about biases
- The art in model selection: are there edges we should consider removing or adding?
- In the paper, we form at least four hypotheses that directly translate to recommendations
[Diagram: the same bipartite influence graph (Performance metric, Training/Validation split, Labeling budget, Computational resources)]
Experimental Platform
- Collected over 25 GB of data on the Microsoft Azure ML platform
- Used three publicly available benchmarks
Efficiency Recommendation 1
- Validation is usually much faster than training, especially for expressive classifiers
- Run-time reductions of almost 70%, with proportionally less loss in effectiveness
- Recommendation: consider favoring more validation over training if speed is an important concern
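A crude timing sketch of the underlying observation, assuming scikit-learn; the data, split and timings are illustrative and will vary by machine and settings:

```python
# Time fit (training) vs. score (validation) for an expressive classifier,
# to illustrate why shifting labeled data toward validation can cut run time.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300, random_state=0)

t0 = time.time(); clf.fit(X_train, y_train); train_time = time.time() - t0
t0 = time.time(); clf.score(X_val, y_val); val_time = time.time() - t0
print("training: %.2fs, validation: %.2fs" % (train_time, val_time))
```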
Efficiency Recommendation 2
- Grid search is no more effective than random search for default hyperparameter values: the mean difference is less than 0.99% and not statistically significant
- Recommendation: favor random search in your hyperparameter optimization, as it is much faster (over 90% run-time decrease)
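A minimal sketch of the two search strategies, assuming scikit-learn's GridSearchCV and RandomizedSearchCV; the grid, budgets and data are illustrative, and the 0.99% / 90% figures above come from the paper's experiments, not from this toy comparison:

```python
# Grid search tries every combination; random search samples a fixed number
# of configurations from the same space, at a fraction of the cost.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = MLPClassifier(max_iter=500, random_state=0)

params = {"hidden_layer_sizes": [(5,), (10,), (20,)],
          "learning_rate_init": [0.001, 0.01, 0.1]}

grid = GridSearchCV(clf, params, cv=3).fit(X, y)                     # all 9 settings
rand = RandomizedSearchCV(clf, params, n_iter=3, cv=3,
                          random_state=0).fit(X, y)                  # samples only 3 settings
print("grid search:  ", grid.best_score_, grid.best_params_)
print("random search:", rand.best_score_, rand.best_params_)
```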
Concluding Notes
- Hard problems (e.g. instance matching) require an ingenious combination of heuristics, biases and models
- Understanding decision-making biases can help us do better model selection, and can also help identify experimental confounds
- There are many proposals for visualizing decision-making, but not decision-making bias; we proposed a bipartite graph as a good candidate
- The visualization is not just a pedantic exercise: about 25 GB of data shows that it can also be useful
- Many future directions!
https://sites.google.com/a/utexas.edu/mayank-kejriwal/projects/semantics-and-model-selection
kejriwalresearch.azurewebsites.net
What biases go into your model selection process?