Correspondence Analysis of Surveys with Conditioned and Multiple Response Questions

Amaya Zárraga and Beatriz Goitisolo
Department of Econometrics and Statistics. University of the Basque Country. Spain
Contents

1 Introduction: Surveys with closed questions with a finite number of response categories
2 How to analyze surveys
3 Possible Solution: Creation of the CDT
  3.1 Effects of forcing the creation of a CDT
4 Another possible solution: CA of the PDT
  4.1 Problems Resulting from the Application of CA to the PDT: Effect on Distances
5 Suggested Approach: CA of PDT with a modified marginal
  5.1 Computation of Axes
6 Illustrative Example
7 Conclusions
1. Introduction: Surveys with closed questions with a finite number of response categories

1. Multiple Choice Questions: individuals choose one and only one response category
• Gender
  – Male
  – Female
• Have you ever taken a course on computers?
  – Yes, in the last year
  – Yes, more than a year ago
  – No, never
• Use of computers every day?
  – Yes
  – No
2. Multiple Response Questions: individuals can choose more than one category
• Have you ever, even once, used the following?
  – Tobacco
  – Alcohol
  – Marijuana
  – Cocaine
  – Crack
  – Heroin
  – Hallucinogens
  – Inhalants
  – Pain Relievers
  – Tranquilizers
  – Stimulants
  – Sedatives
3. Conditioned Response Questions: whether individuals must answer a question depends on their answer to a previous one.
• Use of computers every day?
  – Yes
  – No (go to question 16)
• Purpose of computer use: Leisure
  – Yes
  – No
• Purpose of computer use: Music
  – Yes
  – No
• Purpose of computer use: Games
  – Yes
  – No
4. Conditioned Multiple Response Questions:
• Is the number of children you have the desired one?
  – Yes (go to question 26)
  – No
• Which of the following are the reasons for this discrepancy?
  – Desire to continue studying
  – Health problems
  – It would mean a loss of freedom
  . . .
2. How to analyze surveys

⇒ The study and visualization of the relationships among response categories

1. Multiple Choice: Classical analysis: MCA
⇒ Create the Complete Disjunctive Table (CDT), coding each category as 1 if it is the chosen response and 0 otherwise
⇒ Create Burt's Table

i | Gender (M F) | Course (<1 >1 No) | ... | z_i.
1 | 1 0 | 1 0 0 | 1 0 0 | Q
2 | 0 1 | 0 1 0 | 0 1 0 | Q
3 | 1 0 | 0 0 1 | 1 0 0 | Q
⋮ | | | |
n | | | |
Total | n | n | n | nQ

Within each question $q$, every individual has exactly one value 1 and $J_q - 1$ values 0:

\[ z_{ij} = \begin{cases} 1 & \text{(1 value)} \\ 0 & \text{($J_q - 1$ values)} \end{cases} \qquad \forall q \in Q \]

\[ z_{i}^{q} = 1 \quad \forall q \in Q,\ \forall i \in I \qquad z^{q} = n \quad \forall q \in Q \qquad z_{i.} = Q \quad \forall i \in I \qquad z = nQ \]
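The coding rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three respondents, the dictionary layout and the function name `complete_disjunctive_table` are hypothetical and not taken from the slides.

```python
# Hypothetical answers of three respondents to Q = 2 single-choice questions.
questions = {"Gender": ["M", "F"], "Course": ["<1", ">1", "No"]}
answers = [
    {"Gender": "M", "Course": "<1"},
    {"Gender": "F", "Course": ">1"},
    {"Gender": "M", "Course": "No"},
]


def complete_disjunctive_table(questions, answers):
    """Code each answer as 1 for the chosen category and 0 for the others."""
    return [
        [1 if a[q] == c else 0 for q, cats in questions.items() for c in cats]
        for a in answers
    ]


cdt = complete_disjunctive_table(questions, answers)
n, Q = len(answers), len(questions)
row_sums = [sum(row) for row in cdt]        # z_i. for each individual
col_sums = [sum(col) for col in zip(*cdt)]  # z_.j for each category
total = sum(row_sums)                       # z

print(cdt)       # rows of 0/1 indicators, one block of columns per question
print(row_sums)  # every entry equals Q
print(total)     # equals n * Q
```

Because every respondent picks exactly one category per question, the row totals are constant, which is the property the later slides rely on.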
2. Multiple Choice, Multiple Response, Conditioned:

i | Gender (M F) | Course (<1 >1 No) | Drugs (Tobacco ... Sedatives) | Computer (Y N) | Purpose (Games ... Music) | z_i.
1 | 1 0 | 1 0 0 | 1 ... 1 | 1 0 | 1 ... 0 | ? = z_{i.}
2 | 0 1 | 0 1 0 | 1 ... 0 | 0 1 | 0 ... 0 | ? = z_{i'.}
3 | 1 0 | 0 0 1 | 0 ... 1 | 0 1 | 0 ... 0 | ?
⋮ | | | | | |
n | | | | | |
Total | n | n | ? = z^q | n | ? = z^{q'} | ? = z

\[ z_{ij} = \begin{cases} 0 & \text{($J_q$ values) for some $i$ and some $q$: conditioned questions} \\ 1 & \text{($J_q$ values) for some $i$ and some $q$: multiple response questions} \end{cases} \]

\[ z_{i}^{q} \neq 1 \ \text{for some $i$ and some $q$} \qquad z^{q} \neq n \ \text{for some $q$} \qquad z_{i.} \neq Q \ \text{for some $i$} \qquad z \neq nQ \]

⇒ Partial Disjunctive Table (PDT)
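A small numerical check may make the loss of constant margins concrete. The table below is a hypothetical PDT (the values, the two-column blocks and the helper `block_sums` are assumptions made for illustration): Drugs is a multiple response question and Purpose is only asked when Computer = Y.

```python
# Hypothetical PDT. Column blocks: Gender (M, F) | Drugs, multiple response
# (Tobacco, Sedatives) | Computer (Y, N) | Purpose, asked only if Computer = Y.
blocks = {"Gender": 2, "Drugs": 2, "Computer": 2, "Purpose": 2}
pdt = [
    [1, 0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1, 0, 0],
]


def block_sums(row, blocks):
    """Return z_i^q, the number of categories chosen within each question."""
    sums, start = {}, 0
    for name, width in blocks.items():
        sums[name] = sum(row[start:start + width])
        start += width
    return sums


per_question = [block_sums(row, blocks) for row in pdt]
row_sums = [sum(row) for row in pdt]  # z_i., no longer constant
n, Q = len(pdt), len(blocks)
total = sum(row_sums)                 # z, no longer n * Q

print(per_question)
print(row_sums, total, n * Q)
```

Respondent 1 has two choices in the Drugs block and respondents 2 and 3 have none in the Purpose block, so neither the block sums nor the row totals are constant.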
3. Possible Solution: Creation of the CDT

⇒ Advantage: MCA

⇒ For each response category of a multiple response question (MRQ), a new category that negates it (a fictitious or dummy category, D) has to be created.
⇒ m original categories ⇒ m questions ⇒ 2m final categories

i | Tobacco (Y D) | ... | Sedatives (Y D)
1 | 1 0 | ... | 1 0
2 | 1 0 | ... | 0 1
⋮ | | |

⇒ For questions conditioned by a previous one, a new category indicating Not Required to Answer (NRA) is created for each question.

⇒ For conditioned MRQ, both types of artificial category, (D) and (NRA), have to be created for each original category.
⇒ m original categories ⇒ m questions ⇒ 3m final categories

i | Gender (M F) | ... | Children (Yes No) | C. Studying (Yes D NRA) | C. Health (Yes D NRA) | ...
1 | 1 0 | ... | 0 1 | 1 0 0 | 1 0 0 | ...
2 | 0 1 | ... | 0 1 | 1 0 0 | 0 1 0 | ...
⋮ | | | | | |
… | 1 0 | ... | 1 0 | 0 0 1 | 0 0 1 | ...
… | 1 0 | ... | 1 0 | 0 0 1 | 0 0 1 | ...
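The completion rule for a conditioned MRQ can be sketched as follows. The function names `complete_item` and `complete_row` and the three respondents are hypothetical; the sketch only mirrors the (Yes, D, NRA) coding described above.

```python
def complete_item(chosen, required):
    """Return the (Yes, D, NRA) indicators for one original category."""
    if not required:
        return [0, 0, 1]  # NRA: the filter question skipped this item
    return [1, 0, 0] if chosen else [0, 1, 0]  # Yes, or dummy category D


def complete_row(choices, required):
    """Concatenate the (Yes, D, NRA) triples of all original categories."""
    row = []
    for chosen in choices:
        row.extend(complete_item(chosen, required))
    return row


# Reasons (Studying, Health) for three hypothetical respondents; the last one
# answered "Yes" to the filter question and was not asked for reasons.
rows = [
    complete_row([True, True], required=True),
    complete_row([True, False], required=True),
    complete_row([False, False], required=False),
]

for row in rows:
    print(row, sum(row))  # each row sums to m = 2, one per created question
```

Each original category becomes a three-category question, so the completed block always sums to m per respondent, at the price of tripling the number of columns.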
3.1. Effects of forcing the creation of a CDT

• Increase in the number of response categories ⇒
  – Increase in the variability (inertia, in CA terms)
  – All the categories (original + fictitious) contribute to the construction of the factorial axes
  – Planes crowded with points, which complicates interpretation
• A dummy category may genuinely represent the negation of the original category, but it can also hide unwillingness to answer and/or not knowing the answer.

Aim: study of the relationships among the original categories

• In the "pick k/m" case (k < m), (m − k) dummy categories are created that only represent the restriction of choosing k among the original m.
• Dummy categories may have similar response patterns and may even create the first factorial axes (as happens with conditioned questions).
Completed Disjunctive Table with Not Required to Answer (NRA) categories

1. Advantage: MCA
2. Disadvantage: could create the first axes

[Figure: Analysis of the CDT. Factorial plane with Factor 1 (86.13 %) on the horizontal axis and Factor 2 (10.47 %) on the vertical axis, showing the Yes, No and NRA categories of the Computer, Internet and Mobile questions. Plot annotations: 74.37% Factor 1, 73.18% Factor 2.]

• Survey on Equipment and Use of Information and Communication Technologies in the Home (Spanish Institute of Statistics, 2007)
• Block: Use of computers and the Internet by children (aged 10 to 15) (18 questions)
4. Another possible solution: CA of the PDT

Frequencies and Profiles

Relative and marginal frequencies:

\[ p_{ij} = \frac{z_{ij}}{z}, \qquad p_{i.} = \sum_{j \in J} p_{ij} = \frac{z_{i.}}{z}, \qquad p_{.j} = \sum_{i \in I} p_{ij} = \frac{z_{.j}}{z} \]

Row profiles $i$, $i \in I$:

\[ \frac{p_{ij}}{p_{i.}} = \frac{z_{ij}}{z_{i.}} \quad \forall j \in J \;\Rightarrow\; N(I) \subset \mathbb{R}^{J} \]

Column profiles $j$, $j \in J$:

\[ \frac{p_{ij}}{p_{.j}} = \frac{z_{ij}}{z_{.j}} \quad \forall i \in I \;\Rightarrow\; N(J) \subset \mathbb{R}^{n} \]
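As a minimal sketch of these definitions, the margins and profiles of a small table can be computed with exact fractions. The table `Z` is a hypothetical PDT chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical PDT: Gender (M, F) | Drugs, multiple response (Tobacco, Sedatives)
Z = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

z = sum(map(sum, Z))                     # grand total
row_tot = [sum(row) for row in Z]        # z_i.
col_tot = [sum(col) for col in zip(*Z)]  # z_.j

p_row = [Fraction(t, z) for t in row_tot]  # p_i. = z_i. / z
p_col = [Fraction(t, z) for t in col_tot]  # p_.j = z_.j / z

# Row profiles z_ij / z_i. and column profiles z_ij / z_.j.
row_profiles = [[Fraction(v, t) for v in row] for row, t in zip(Z, row_tot)]
col_profiles = [[Fraction(v, t) for v in col] for col, t in zip(zip(*Z), col_tot)]

print(p_row)
print(row_profiles[0])
print(col_profiles[0])
```

Each profile sums to 1, and the row masses `p_row` differ between respondents because the row totals of a PDT are not constant.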
4.1. Problems Resulting from the Application of CA to the PDT: Effect on Distances

In CA, similarity between any pair of row profiles and between any pair of column profiles is calculated by means of the $\chi^2$ distance.

The $\chi^2$ distance between two row profiles $i$ and $i'$:

\[ d^2(i, i') = \sum_{j \in J} \frac{1}{p_{.j}} \left( \frac{p_{ij}}{p_{i.}} - \frac{p_{i'j}}{p_{i'.}} \right)^2 = \sum_{j \in J} \frac{z}{z_{.j}} \left( \frac{z_{ij}}{z_{i.}} - \frac{z_{i'j}}{z_{i'.}} \right)^2 \]

In CDT:

\[ d^2(i = 1, i' = 2) = \underbrace{\frac{nQ}{z_{.M}} \left( \frac{1}{Q} - \frac{0}{Q} \right)^2}_{\neq 0} + \underbrace{\frac{nQ}{z_{.F}} \left( \frac{0}{Q} - \frac{1}{Q} \right)^2}_{\neq 0} + \cdots + \underbrace{\frac{nQ}{z_{.CY}} \left( \frac{1}{Q} - \frac{1}{Q} \right)^2}_{= 0} + \ldots \]

In PDT, $z_{1.} \neq z_{2.}$:

\[ d^2(i = 1, i' = 2) = \frac{z}{z_{.M}} \left( \frac{1}{z_{1.}} - \frac{0}{z_{2.}} \right)^2 + \frac{z}{z_{.F}} \left( \frac{0}{z_{1.}} - \frac{1}{z_{2.}} \right)^2 + \cdots + \underbrace{\frac{z}{z_{.CY}} \left( \frac{1}{z_{1.}} - \frac{1}{z_{2.}} \right)^2}_{\neq 0} + \ldots \]
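The effect on row distances can be checked numerically. The sketch below, with hypothetical tables and a hypothetical helper `chi2_row_terms`, lists the per-category terms of $d^2(i, i')$: in the CDT the category both respondents chose (CY) contributes nothing, whereas in the PDT it contributes a positive amount because the row totals differ.

```python
from fractions import Fraction


def chi2_row_terms(Z, i, k):
    """Per-column terms (z / z_.j) * (z_ij / z_i. - z_kj / z_k.)**2 of d^2(i, k)."""
    z = sum(map(sum, Z))
    col_tot = [sum(col) for col in zip(*Z)]
    zi, zk = sum(Z[i]), sum(Z[k])
    return [
        Fraction(z, col_tot[j]) * (Fraction(Z[i][j], zi) - Fraction(Z[k][j], zk)) ** 2
        for j in range(len(col_tot))
    ]


# Hypothetical CDT: Gender (M, F) | Computer (CY, CN); every row sums to Q = 2.
cdt = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
# Hypothetical PDT: same columns plus one multiple response item (Tobacco).
pdt = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0], [1, 0, 0, 1, 0]]

cdt_terms = chi2_row_terms(cdt, 0, 1)
pdt_terms = chi2_row_terms(pdt, 0, 1)

print(cdt_terms[2])  # CY term in the CDT: both respondents chose it
print(pdt_terms[2])  # CY term in the PDT: same answers, different row totals
```

The two respondents give the same answer to the Computer question in both tables; only the extra Tobacco choice of the first respondent makes the CY term non-zero in the PDT.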
The $\chi^2$ distance between two column profiles $j$ and $j'$:

\[ d^2(j, j') = \sum_{i \in I} \frac{1}{p_{i.}} \left( \frac{p_{ij}}{p_{.j}} - \frac{p_{ij'}}{p_{.j'}} \right)^2 = \sum_{i \in I} \frac{z}{z_{i.}} \left( \frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}} \right)^2 \]

In CDT, $z_{i.} = Q \ \forall i \in I$:

\[ d^2(j, j') = \sum_{i \in I} \frac{nQ}{Q} \left( \frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}} \right)^2 \]

In PDT, $z_{i.} \neq z_{i'.}$:

\[ d^2(j, j') = \sum_{i \in I} \frac{z}{z_{i.}} \left( \frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}} \right)^2 \]
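A short sketch shows how the weights $z / z_{i.}$ behave in each case. The tables and the helper names `row_weights` and `chi2_col_distance` are hypothetical, introduced only to illustrate the formulas above.

```python
from fractions import Fraction


def row_weights(Z):
    """Weights z / z_i. applied to each individual in the column distance."""
    z = sum(map(sum, Z))
    return [Fraction(z, sum(row)) for row in Z]


def chi2_col_distance(Z, j, k):
    """Chi-square distance between the profiles of columns j and k."""
    cj = sum(row[j] for row in Z)
    ck = sum(row[k] for row in Z)
    return sum(
        w * (Fraction(row[j], cj) - Fraction(row[k], ck)) ** 2
        for w, row in zip(row_weights(Z), Z)
    )


# Hypothetical CDT: Gender (M, F) | Computer (CY, CN); every row sums to Q = 2.
cdt = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
# Hypothetical PDT: same columns plus one multiple response item (Tobacco).
pdt = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0], [1, 0, 0, 1, 0]]

print(row_weights(cdt), chi2_col_distance(cdt, 0, 1))
print(row_weights(pdt), chi2_col_distance(pdt, 0, 1))
```

In the CDT every individual gets the same weight $nQ/Q = n$; in the PDT individuals with fewer chosen categories weigh more, so the distance between M and F changes even though the Gender columns are identical in both tables.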
The $\chi^2$ distance between column profile $j$ and the average profile:

\[ d^2(j, G_J) = \sum_{i} \frac{1}{p_{i.}} \left( \frac{p_{ij}}{p_{.j}} - p_{i.} \right)^2 \]

For a column profile with

\[ \frac{p_{ij}}{p_{.j}} = \frac{1}{n} \quad \forall i \in I \]

In CDT, $p_{i.} = \frac{1}{n} \ \forall i$:

\[ d^2(j, G_J) = \sum_{i} n \left( \frac{1}{n} - \frac{1}{n} \right)^2 = 0 \]

In PDT, $p_{i.} = \frac{z_{i.}}{z}$:

\[ d^2(j, G_J) = \sum_{i} \frac{z}{z_{i.}} \left( \frac{1}{n} - \frac{z_{i.}}{z} \right)^2 \neq 0 \]
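The same comparison can be reproduced for a category chosen by every respondent, whose column profile is uniform ($1/n$ for all $i$). The tables and the helper `chi2_to_centroid` below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def chi2_to_centroid(Z, j):
    """Chi-square distance between column profile j and the average profile."""
    z = sum(map(sum, Z))
    col_total = sum(row[j] for row in Z)
    return sum(
        Fraction(z, sum(row))
        * (Fraction(row[j], col_total) - Fraction(sum(row), z)) ** 2
        for row in Z
    )


# Hypothetical CDT: Gender (M, F) | Computer (CY, CN); everyone chose CY.
cdt = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 0]]
# Hypothetical PDT: same columns plus one multiple response item (Tobacco).
pdt = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0], [1, 0, 1, 0, 0]]

d_cdt = chi2_to_centroid(cdt, 2)  # CY column in the CDT
d_pdt = chi2_to_centroid(pdt, 2)  # CY column in the PDT

print(d_cdt, d_pdt)
```

In the CDT the uniform column coincides with the centre of gravity, while in the PDT the unequal row masses move the centre of gravity away from it.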