1. Behavioral Neural Networks
Shaowei Ke (UMich Econ), Chen Zhao (HKU Econ), Zhaoran Wang (Northwestern IEMS), Sung-Lin Hsieh (UMich Econ)
November 2020

2. Machine Learning
Over the last 15 years, machine-learning models have performed well in many decision problems
- Product recommendation
- Complex games: AlphaGo
2018 Turing Award (Bengio, Hinton, and LeCun): "conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing"

3. Questions
- A statistical model that predicts well is not necessarily a good model of how people make decisions
- E.g., insights about decision making may be lost in the approximation
- But maybe some of these useful machine-learning models are indeed good models of how people make decisions?
- If so, we may better understand them and incorporate them into economics
- With more choice data, such machine-learning models are likely to outperform our traditional models in prediction, and may even help us identify behavioral phenomena

4. A Good Model of Decision-Making?
E.g., the expected utility model:
1. The model is characterized by reasonable axioms imposed directly on choice behavior
2. The model provides a plausible interpretation/story of how people make choices

5. This Paper
1. We provide an axiomatic foundation for a class of neural-network models applied to decision making under risk, called the neural-network expected utility (NEU) models
   - The independence axiom is relaxed in a novel way consistent with experimental findings
   - The model provides a plausible interpretation of people's choice behavior
2. We show that simple neural-network structures, referred to as behavioral neurons, can capture behavioral biases intuitively
3. Using these behavioral neurons, we find that some simple NEU models that are easy to interpret outperform expected utility (EU) and cumulative prospect theory (CPT)

  6. Neural-Network Expected Utility

7. Choice Domain and Primitive
Prizes: Z = {z_1, ..., z_n}
- Generic prizes: x, y, z
The set of lotteries: L = { p ∈ R^n_+ : Σ_{i=1}^n p_i = 1 }
- Generic lotteries: p, q, r, s
- Degenerate lotteries: δ_x
Mixture: for any λ ∈ [0, 1], λp + (1 − λ)q is the lottery such that (λp + (1 − λ)q)_i = λp_i + (1 − λ)q_i
- Notation: λpq := λp + (1 − λ)q
A decision maker has a binary relation/preference ≽ on L
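The lottery domain and the mixture operation are easy to see in code; a minimal sketch (names and the 3-prize example are illustrative, not from the slides):

```python
import numpy as np

def mix(lam, p, q):
    # The lambda-mixture lam*p + (1 - lam)*q, computed coordinatewise
    p, q = np.asarray(p, float), np.asarray(q, float)
    return lam * p + (1 - lam) * q

def delta(x, n):
    # Degenerate lottery delta_x over prizes indexed 0..n-1
    d = np.zeros(n)
    d[x] = 1.0
    return d

p = delta(0, 3)                 # the sure prize z_1
q = np.array([0.5, 0.5, 0.0])
r = mix(0.25, p, q)             # 0.25*p + 0.75*q, still a lottery
```

Since the mixture is coordinatewise, the result sums to 1 whenever p and q do.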

8. Vector-Valued Affine Functions
- τ : R^w → R^ŵ is affine if there exist a ŵ × w matrix β and a ŵ × 1 vector γ such that τ(a) = βa + γ for any a ∈ R^w
- τ = (τ^(1), ..., τ^(ŵ)) is affine ⇒ each τ^(j) is affine
- A real-valued function on L is affine if and only if it is an expected utility function

9. NEU Representation
A function U : L → R is a NEU function if there exist
- h, w_0, w_1, ..., w_{h+1} ∈ N with w_0 = n and w_{h+1} = 1
- θ_i : R^{w_i} → R^{w_i}, i = 1, ..., h, such that for any b ∈ R^{w_i}, θ_i(b) = (max{b_1, 0}, ..., max{b_{w_i}, 0})
- affine τ_i : R^{w_{i−1}} → R^{w_i}, i = 1, ..., h + 1, such that
  U(p) = τ_{h+1} ∘ θ_h ∘ τ_h ∘ ··· ∘ θ_2 ∘ τ_2 ∘ θ_1 ∘ τ_1(p)
We say that ≽ has a NEU representation if it can be represented by a NEU function
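The definition is exactly a forward pass of a ReLU network whose input is the probability vector p. A sketch of the composition, with illustrative weights that are not from the paper:

```python
import numpy as np

def relu(b):
    # theta_i: coordinatewise max{., 0}
    return np.maximum(b, 0.0)

def neu(p, layers):
    """NEU forward pass: U(p) = tau_{h+1} o theta_h o ... o theta_1 o tau_1(p).
    `layers` is a list of (beta, gamma) pairs for the affine maps tau_i;
    ReLU is applied after every layer except the last."""
    a = np.asarray(p, float)
    for i, (beta, gamma) in enumerate(layers):
        a = beta @ a + gamma            # affine tau_i
        if i < len(layers) - 1:         # theta_i on hidden layers only
            a = relu(a)
    return a.item()                     # w_{h+1} = 1, so the output is a scalar

# Illustrative 3-prize example with one hidden layer of two neurons
layers = [(np.array([[1.0, 0.0, -1.0],
                     [0.5, 0.5,  0.5]]), np.zeros(2)),
          (np.array([[1.0, 2.0]]), np.zeros(1))]
u = neu([0.2, 0.5, 0.3], layers)
```

Each row of the first β is an expected utility function of p, matching the interpretation on the later slides.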

10. NEU Representation
[Figure: a network with two hidden layers mapping the input (p_1, p_2, p_3) through neurons max{τ_1^(j)(·), 0} and max{τ_2^(j)(·), 0} to the output U(p)]
- A NEU function: U(p) = τ_3 ∘ θ_2 ∘ τ_2 ∘ θ_1 ∘ τ_1(p)
- i-th hidden layer: θ_i ∘ τ_i
- Activation function: max{·, 0}
- Neuron: max{τ_i^(j), 0}

11. Interpretation
[Figure: the same two-hidden-layer network as on the previous slide]
- The decision maker has multiple considerations toward uncertainty (expected utility functions in the first layer)
- E.g., one for the mean of prizes and one for downside risk
- She considers multiple ways of aggregating those attitudes plausible (affine functions in the second layer)
- Recursively, she may continue to have multiple ways in mind to aggregate the aggregations from the previous layer

  12. Axiomatic Characterization

13. Expected Utility Theory
Axiom (Weak Order): ≽ is complete and transitive.
Axiom (Continuity): For any p, {q : p ≽ q} and {q : q ≽ p} are closed.
Axiom (Independence): For any λ ∈ (0, 1), p ≽ q ⇒ λpr ≽ λqr and p ≻ q ⇒ λpr ≻ λqr.
- There are alternative ways to define independence
Axiom (Bi-Independence): For any λ ∈ (0, 1), if p ≽ q, then r ≽ s ⇒ λpr ≽ λqs and r ≻ s ⇒ λpr ≻ λqs.
- Let p = q: Bi-Independence ⇒ Independence
- Applying Independence twice, we get Bi-Independence

14. Violations of (Bi-)Independence: The Allais Paradox

First Pair:
  0.13pr: 100% $1M
  0.13qr: 3% $0, 87% $1M, 10% $1.5M
Second Pair:
  0.13ps: 87% $0, 13% $1M
  0.13qs: 90% $0, 10% $1.5M

- p = δ_{1M}, q = (3/13)δ_0 + (10/13)δ_{1.5M}, r = δ_{1M}, s = δ_0
1. Bias toward certainty
2. 0.13qr must look sufficiently different from a risk-free lottery
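The decomposition of the Allais lotteries into mixtures is pure arithmetic and can be checked directly; a quick sketch (prize order $0, $1M, $1.5M is an assumed convention):

```python
from fractions import Fraction as F

def mix(lam, p, q):
    # lambda-mixture of two lotteries, coordinatewise
    return [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]

lam = F(13, 100)
p = [F(0), F(1), F(0)]               # delta_{1M}
q = [F(3, 13), F(0), F(10, 13)]      # (3/13) delta_0 + (10/13) delta_{1.5M}
r = [F(0), F(1), F(0)]               # delta_{1M}
s = [F(1), F(0), F(0)]               # delta_0

first_safe   = mix(lam, p, r)        # 0.13pr: 100% $1M
first_risky  = mix(lam, q, r)        # 0.13qr: 3% $0, 87% $1M, 10% $1.5M
second_safe  = mix(lam, p, s)        # 0.13ps: 87% $0, 13% $1M
second_risky = mix(lam, q, s)        # 0.13qs: 90% $0, 10% $1.5M
```

Exact fractions avoid floating-point noise, so the mixtures reproduce the slide's percentages exactly.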

15. The Allais Paradox in a Nutshell (Literally)

First Pair:
  0.013pr: 100% $1M
  0.013q′r: 0.3% $0.5M, 98.7% $1M, 1% $1.5M
Second Pair:
  0.013ps′: 98.7% $0.5M, 1.3% $1M
  0.013q′s′: 99% $0.5M, 1% $1.5M

- p = δ_{1M}, q′ = (3/13)δ_{0.5M} + (10/13)δ_{1.5M}, r = δ_{1M}, s′ = δ_{0.5M}
- It seems much less likely that we would observe significant violations of (Bi-)Independence here

16. Violations of (Bi-)Independence
The difference between lotteries needs to be large enough for psychological effects to apply to lotteries asymmetrically
- We want to stick to (Bi-)Independence as much as possible because of its normative appeal
- But if (Bi-)Independence holds locally everywhere, it holds globally
- Is there a (slightly) relaxed version of (Bi-)Independence that can hold locally everywhere but not globally?

17. Relaxing Independence
A subset L̄ preserves independence with respect to p (written L̄ ⊥ p) if for any q, r ∈ L̄ and λ ∈ (0, 1),
q ≽ r ⇒ λpq ≽ λpr and q ≻ r ⇒ λpq ≻ λpr
- L̄ need not be convex, and p, λpq, λpr may be outside L̄

18. Relaxing Independence
- A subset L̄ preserves independence with respect to p if for any q, r ∈ L̄ and λ ∈ (0, 1), q ≽ r ⇒ λpq ≽ λpr and q ≻ r ⇒ λpq ≻ λpr
- A subset L̄ ⊆ L preserves independence if for any p, q, r ∈ L̄ and λ ∈ (0, 1) such that λpr, λqr ∈ L̄, p ≽ q ⇒ λpr ≽ λqr and p ≻ q ⇒ λpr ≻ λqr

19. Relaxing Independence
- A neighborhood of p: an open convex set that contains p
Axiom (Weak Local Independence): Every p ∈ L has a neighborhood L_p such that L_p ⊥ p.
- Weak Local Independence does not mean that "independence holds locally around every p"

20. Weak Local Independence
Axiom (Weak Local Independence): Every p ∈ L has a neighborhood L_p such that L_p ⊥ p.
- Allows the following type of indifference curves [figure not shown]

21. Relaxing Bi-Independence
- Weak Local Independence only tells us the decision maker's local choice behavior
- Local versions of Bi-Independence can regulate the decision maker's non-local choice behavior

22. Relaxing Bi-Independence
Axiom (Weak Local Bi-Independence): If p ≽ q, then p and q have neighborhoods L_p and L_q such that for any r ∈ L_p, s ∈ L_q, and λ ∈ (0, 1), r ≽ s ⇒ λpr ≽ λqs and r ≻ s ⇒ λpr ≻ λqs.
- When p = q, we obtain Weak Local Independence
- Bi-independence is imposed only when mixing with p and q, respectively
- L_p does not have to be the same for different q's
[Figure: r near p and s near q, with the mixtures λpr and λqs]

23. Main Theorem
Theorem: ≽ has a NEU representation if and only if it satisfies Weak Order, Continuity, and Weak Local Bi-Independence.
- EU characterizes linear functions on L
- NEU characterizes continuous finite piecewise-linear functions on L

  24. Behavioral Neurons and Empirical Analysis

25. NEU and the Certainty Effect
The Allais paradox: the decision maker has a bias toward certainty
[Figure: a network in which an expected utility neuron V(p) and certainty neurons max{p_i − 0.98, 0}, i = 1, 2, 3, feed into the output U(p)]
- V is an expected utility function
- If p_i > 0.98, a neuron that captures the certainty effect will be activated
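A sketch of such a certainty-effect network, with an assumed utility vector, bonus weight, and the 0.98 threshold from the slide (the aggregation of V and the certainty neurons into U is a guess, not the paper's specification):

```python
import numpy as np

def certainty_neu(p, utils, threshold=0.98, bonus=5.0):
    """EU neuron V(p) plus certainty neurons max{p_i - threshold, 0}.
    `utils`, `threshold`, and `bonus` are illustrative parameters."""
    p = np.asarray(p, float)
    v = p @ utils                                      # expected utility neuron
    certainty = np.maximum(p - threshold, 0.0).sum()   # active only near certainty
    return v + bonus * certainty

utils = np.array([0.0, 1.0, 1.3])   # u($0), u($1M), u($1.5M), assumed
safe  = certainty_neu([0.00, 1.00, 0.00], utils)   # certainty neuron active
risky = certainty_neu([0.03, 0.87, 0.10], utils)   # no p_i exceeds 0.98
```

Here the two lotteries have the same expected utility under `utils`, yet the certainty bonus pushes the sure lottery above the risky one, reproducing the Allais-style bias toward certainty.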

  26. NEU and the Certainty E ff ect

27. NEU and Reference Dependence
Kahneman and Tversky (1979): prizes are evaluated relative to a reference point; people treat gains and losses differently
Ert and Erev (2013): the difference becomes insignificant when prizes don't deviate from the reference point by much
- A neuron for expected utility: V(p)
- A neuron for loss aversion relative to $x with threshold ε: λ min{ Σ_i p_i min{z_i − x, 0}, −ε } (loss-aversion coefficient λ > 1; the inner sum is affine in p)
- U(p) is the difference between the two neurons' values
- Violations of expected utility theory in the form of loss aversion only occur when losses (relative to the reference point) are significant
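A sketch of this two-neuron construction with assumed parameters (reference point, λ, ε, and a linear utility are all illustrative, and the sign convention for combining the neurons is a guess): the loss neuron is pinned at −λε whenever the expected loss is smaller than ε, so the model behaves like EU plus a constant there.

```python
import numpy as np

def loss_neuron(p, prizes, ref=0.0, lam=2.25, eps=0.1):
    """lam * min{ sum_i p_i * min{z_i - ref, 0}, -eps }.
    Constant at -lam*eps unless the expected loss relative to `ref` exceeds eps."""
    p, z = np.asarray(p, float), np.asarray(prizes, float)
    expected_loss = (p * np.minimum(z - ref, 0.0)).sum()   # affine in p
    return lam * min(expected_loss, -eps)

def ref_dep_neu(p, prizes, utils):
    # U(p): expected utility neuron combined with the loss-aversion neuron
    return np.dot(p, utils) + loss_neuron(p, prizes)

prizes = np.array([-1.0, 0.0, 2.0])
utils  = prizes.copy()                             # linear utility, assumed
small  = loss_neuron([0.05, 0.95, 0.0], prizes)    # expected loss 0.05 < eps
large  = loss_neuron([0.50, 0.50, 0.0], prizes)    # expected loss 0.50 > eps
```

For the small-loss lottery the neuron equals its value at the riskless lottery, so no EU violation appears; for the large-loss lottery it scales the expected loss by λ, producing loss aversion.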

28. Empirical Analysis
- Can the NEU model explain and predict decision makers' choice behavior well?
- Can we do so with a NEU model that is not too complicated to interpret?
- How does the NEU model compare to other economic models?
