Stata Bootcamp - STAMP

Denise Laroze
University of Essex, Department of Government
October 10, 2014

Outline: Motivation, Stata Basics, Datasets, Summary Basics, Regression, Final Questions

Motivation

i) What is Stata?
ii) Why do we learn it?
iii) What are students expected to learn from STAMP?

Disclaimer
As with any programming environment, there are multiple ways of producing the same results. The options shown here are just one alternative; you may find that other options suit you better. Please go out and find the best one for you.

Stata Basics

Open Stata and understand the layout

HELP!!
i) ‘Help’ icon in Stata
ii) UCLA Institute for Digital Research and Education
iii) Online tutorials (e.g. Princeton)
iv) Google?

‘log’ and ‘do’ files

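As a rough illustration of the ‘log’ and help tools mentioned above, a session might start like the sketch below. The file name stamp_session.log is a made-up example; a ‘do’ file is simply a plain-text file containing commands like these, which can be run with do followed by the file name.

```stata
* Start recording commands and output in a plain-text log
* (hypothetical file name; replace overwrites an existing log)
log using "stamp_session.log", text replace

* Open the built-in documentation for a command
help summarize

* Stop recording and close the log file
log close
```
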
Creating and Manipulating Data

i) Generate data with certain characteristics using the gen command and a series of functions, qualifiers and operators (including rnormal(), rbinomial(), runiform(), if, =, !=). The command is structured like this: gen variablename = something
ii) Use existing data with the use and insheet commands
iii) Manipulate data with the recode, replace, rename, label define, and label values commands

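The commands listed above could be combined along the lines of the following sketch. It is only one possible illustration: all variable names, the seed, the label names, and the commented-out file names are assumptions, not part of the course files. It is meant to be run as a do-file.

```stata
* Start from an empty dataset with 100 observations
clear
set obs 100
set seed 12345

* gen variablename = something
gen z = rnormal(0, 1)
gen coin = rbinomial(1, 0.5)
gen u = runiform()

* if qualifier and replace: high is 1 when z is positive, 0 otherwise
gen high = 1 if z > 0
replace high = 0 if z <= 0

* != means "not equal to"; the expression evaluates to 1 (true) or 0 (false)
gen nonzero = (coin != 0)

* A categorical variable with values 0, 1, 2, then recode, rename and label it
gen group = floor(3*u)
recode group (2 = 9)
rename group grp
label define grp_lbl 0 "Low" 1 "Middle" 9 "Other"
label values grp grp_lbl
tabulate grp

* Loading existing data (hypothetical file names, so left commented out)
* use "mydata.dta", clear
* insheet using "mydata.csv", comma clear
```

Note that insheet still works in recent Stata versions, although import delimited is its newer replacement.
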
Tables and Graphs

As in any statistical package, Stata allows you to summarise your data by producing graphs and tables. These can be as sophisticated as you are willing to code, or very simple.

Open the ‘STAMP.do’ file

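For a sense of what simple tables and graphs look like in code, here is a small sketch. It does not reproduce the contents of ‘STAMP.do’; it uses the auto dataset that ships with Stata as a stand-in.

```stata
* Load an example dataset that is installed with Stata
sysuse auto, clear

* Tables: summary statistics, a one-way and a two-way frequency table
summarize price mpg weight
tabulate foreign
tabulate foreign rep78, row

* Graphs: a histogram and a scatter plot
histogram mpg, frequency
twoway scatter price mpg
```
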
Exercise 1 - Generating and Using Data I

1 Generate a .do and a .log file
2 Generate a dataset with 1000 observations
3 Generate a variable z1 from a normal distribution with mean=0 and sd=1. Create two other variables (z2 and z3) with different means and standard deviations. Then create a variable z4 that is equal to z2*z3. Create a histogram of each of the variables, summarize the data, and describe the differences and similarities between the variables you have just created.

Exercise 1 - Generating and Using Data II

4 Create a new variable "Fun" with 4 categories (0-3). Recode category 3 and replace it with the value 99. After you have finished, add labels to each of the categories. The first three should be activities that are fun for you and the last is a category for "Don't know".
5 Why is recoding the "Don't know" category to 99 a problem? What should you use instead?
6 Create dummy variables out of the categorical variable and rename each of the new variables according to what they represent.
7 Save your data as a .dta file and as a .csv file.
8 Close the ‘log’ file

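As a hint for steps 6 and 7, dummy creation and saving could look roughly like the sketch below. This is not the content of the course solution file; the toy variable fun, the new name reading, and both file names are invented for illustration.

```stata
* Toy categorical variable with values 0-3
clear
set obs 8
gen fun = mod(_n, 4)

* tabulate with generate() creates one 0/1 dummy per category: fun_1 to fun_4
tabulate fun, generate(fun_)

* Give a dummy a more descriptive (hypothetical) name
rename fun_1 reading

* Save as a Stata file and as a comma-separated file (hypothetical file names)
save "fun_example.dta", replace
outsheet using "fun_example.csv", comma replace
```
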
Solutions Exercise 1

For the solutions to Exercise 1, look at the ‘solutionsEx1.do’ file

Datasets

Quantitative research can take many shapes and forms; the only requirement is that it uses numbers. However, to conduct empirical research you have to understand what kind of data you are working with. There are several different types, for example:

1 Cross-Section
2 Time-Series
3 Time-Series Cross-Sections
4 Panel
5 Panel (wide) or Survival

Cross-Sectional Data

Time-Series

Panel

Survival

Survival data (or panel (wide) in Stata)

Merging Data

For most research projects you will need to combine (merge) datasets. (Sometimes you may even code data yourselves.) You can do this by hand, for example by copying and pasting in an Excel file, but that becomes inefficient very quickly. Stata can do the merging for you, but only if:

i) You have a ‘merging’ or ‘identifying’ variable
ii) The files you are merging have the same shape
iii) The data are in the same format (e.g. a .dta file)

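The three conditions above can be seen in a small sketch with two toy datasets. The identifier id, the variables x and y, and all values are invented; the example should be run as a whole do-file because it relies on a temporary file stored in a local macro.

```stata
* First toy dataset: one row per id
clear
input id x
1 10
2 20
3 30
end
tempfile left
save `left'

* Second toy dataset: same shape, same identifying variable
clear
input id y
1 100
2 200
4 400
end

* Merge on the identifying variable; each id appears once in both files
merge 1:1 id using `left'

* _merge records whether an observation came from one file or both
tabulate _merge
list
```

Here ids 1 and 2 are matched, while ids 3 and 4 each appear in only one of the files, which is what the _merge variable is there to reveal.
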
Stata Example - Datasets and Merging

Let’s see the difference between the datasets in practice and test how the merge command works.

Go on to the ‘STAMP.do’ file

Exercise 2 - Merging Real Datasets I

For this exercise you will have to merge two datasets available on my webpage http://deniselaroze.wordpress.com/ and a third dataset of your choice, either from the World Bank or the IMF.

1 To start, look at the ‘STAMP merge.dta’ dataset. What type of dataset is it? What observations are included? How many observations does it have?
2 Now insheet the ‘Inflation IMF.csv’ dataset. What shape does it have? Can you merge it with ‘STAMP merge.dta’? Why or why not? (Hint: to merge data, both files have to be in the same ‘shape’ and have to merge on the same criteria/variables, for example country id and year.)
3 Once you have reshaped the data, merge both files.

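The reshaping step in the hint could look something like the sketch below. It does not use the exercise files: the country codes, the variable stub infl, and the numbers are all invented to show the mechanics of going from wide to long.

```stata
* Toy wide dataset: one row per country, one column per year
clear
input str3 country infl2010 infl2011 infl2012
"AAA" 1.5 2.0 2.5
"BBB" 3.0 3.5 4.0
end

* Wide to long: one row per country-year, with a new variable year
reshape long infl, i(country) j(year)
list, sepby(country)
```

After a reshape like this, the data have one observation per country and year, so a merge on country and year (for example merge 1:1 country year using another file) becomes possible.
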
Exercise 2 - Merging Real Datasets II

4 Go to the IMF or the World Bank databank and get a variable that you like. Download it, reshape it and merge it with the other datasets.
5 Choose two variables and create a scatter plot with an lfit line.
6 Obtain correlation coefficients of the variables in your new dataset (pwcorr var1 var2 ..., sig).

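For steps 5 and 6, the general pattern might look like this sketch, shown here on the auto dataset that ships with Stata rather than on the merged exercise data.

```stata
* Example dataset installed with Stata, used as a stand-in
sysuse auto, clear

* Scatter plot with a fitted linear regression line overlaid
twoway (scatter price weight) (lfit price weight)

* Pairwise correlations with significance levels
pwcorr price weight mpg, sig
```
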
Solutions Exercise 2

For the solutions to Exercise 2, look at the ‘solutionsEx2.do’ file

Troubleshooting

Once you’ve managed to get all the data into one dataset with the correct shape for your objectives, remember to check whether the process has worked correctly. You may encounter more problems than you originally thought.

Potential problems
You may have used 1:m or m:1 instead of 1:1,
merged a file that uses ‘,’ instead of ‘.’ as the decimal separator, so the variable is now a string,
accidentally included data with letters or symbols (e.g. e-8), so that, again, the variable is a string,
etc.

There are many problems and even more solutions (as long as you check!!!)

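A few checks of this kind can be sketched as follows. The toy data, the variable names id, gdp_str and gdp, and the values are invented; the point is only to show commands that help diagnose and repair string variables and identifier problems.

```stata
* Toy data with a decimal comma and a non-numeric entry
clear
input id str8 gdp_str
1 "1,5"
2 "2,75"
3 "n/a"
end

* Check that id uniquely identifies observations before a 1:1 merge
isid id

* describe shows the storage type; str# means the variable is a string
describe gdp_str

* Convert to numeric: dpcomma reads "," as the decimal mark,
* force turns entries that are still non-numeric into missing values
destring gdp_str, generate(gdp) dpcomma force
list
```

After a merge, tabulating the _merge variable is a similar quick check of how many observations were actually matched.
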
Summary of Basics

Up to this point you have learned how to:
i) Import data (from different formats) into Stata
ii) Reshape and create data
iii) Create some basic graphs and tables

Important Commands

Who can tell me what these commands do?
i) gen
ii) merge
iii) use
iv) tab
v) sum
vi) set obs
vii) replace
viii) if
ix) histogram
x) rnormal(a,b)

Linear Regression - (O)LS

A linear regression is a function of the type:

$y = \beta_0 + \beta_1 X + e$

The function has two components:
1 The systematic component: $\beta_0 + \beta_1 X$
2 The random component: $e$

(O)LS - to Stata I

To implement a function like this in Stata, let’s look at the following example:

$\text{income} = 1000 + 20 \times \text{capabilities} + 500 \times \text{EssexMasters} + \text{luck}$

It is composed of the following elements:
1 a $\beta_0$ constant = 1000
2 a $\beta_1$ coefficient = 20
3 a variable capabilities (our $X$); let’s assume it has mean=10 and standard deviation=5
4 an EssexMasters dummy variable (another $X$), with coefficient 500
5 a luck variable (our $e$), and
6 a predicted income variable (our $y$ variable)

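One way this example could be simulated and estimated is sketched below. The number of observations, the seed, the 50% share of EssexMasters, and the distribution of luck (mean 0, standard deviation 100) are assumptions made for illustration; the slide only fixes the coefficients and the distribution of capabilities.

```stata
* Simulate the data-generating process
clear
set obs 1000
set seed 2014

* capabilities: mean 10, standard deviation 5 (as stated in the example)
gen capabilities = rnormal(10, 5)

* EssexMasters: 0/1 dummy, assumed here to equal 1 with probability 0.5
gen EssexMasters = rbinomial(1, 0.5)

* luck: the random component, assumed here to be normal(0, 100)
gen luck = rnormal(0, 100)

* income follows the equation in the example
gen income = 1000 + 20*capabilities + 500*EssexMasters + luck

* Estimate the coefficients by OLS
regress income capabilities EssexMasters
```

With a simulation like this, the estimated coefficients should land near 20 and 500 and the constant near 1000, with the gap depending on the sample size and on the spread assumed for luck.
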