Outline
1. (Review) Install Python and some libraries
2. Download Template File
3. Create a ‘market simulator’ that builds a portfolio, analyzes it, and computes expected return.
   1. Create an analyzer:
      • Edit the analysis.py file
   2. Create a market simulator on your own
      • Your simulator will use functions from analysis.py, which is [Project 1], a warm-up project.

Installation: Mac Installation
1) Instructions that the instructor used:
• Step 1: Install your Python platform
  a) Install Anaconda (provides the required packages): https://www.continuum.io/downloads (2.7)
  – Includes SciPy, NumPy, and matplotlib
• Step 2 (later): Install Market Simulator Templates.
  – It needs SciPy (included in Anaconda).
Note: The Anaconda Python distribution includes NumPy, Pandas, SciPy, Matplotlib, and Python, and over 250 more packages available via a simple “conda install <packagename>”. It also has an IDE.
The instructor got 2.7 and the Anaconda distribution of Python.
To get the appropriate software you’ll need: Python (scripting ‘programming’ language), SciPy (numerical routines), NumPy (matrices, linear algebra), and matplotlib (enables generating plots of data).
Installing Python (2.7) via Anaconda: the Anaconda instruction site, which bundles lots of libraries with Python.
https://docs.continuum.io/anaconda/install

Tutorial: Market Simulator Fundamentals
• Read Data: Read stock data from a CSV file and input it into a pandas DataFrame
  – pandas.DataFrame
  – pandas.read_csv
• Select Subsets of Data: Select desired rows and columns
  – Indexing and slicing data
  – Gotchas: Label-based slicing convention
• Generate Useful Plots: Visualize data by generating plots
  – Plotting
  – pandas.DataFrame.plot
  – matplotlib.pyplot.plot
Goal
• Go from RAW data (adjusted close prices in a .csv file) all the way to visualization
• Scrape the S&P 500 ticker list and industry sectors from the list of S&P 500 companies on Wikipedia (code provided).
  – https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
• Download daily close data for each industry sector from Yahoo Finance
  – using pandas DataReader.
• Build a sample portfolio (in lecture, by hand)
• Look at measures of the performance of a portfolio (project 1). We will use the first measure for project 1.
  – Sharpe ratio (in class)
  – Treynor ratio
  – Jensen’s alpha

First Something Familiar: Weather Comma Separated Values (.CSV) Data
• .csv Comma Separated Values of weather conditions from Oct 2009 to Aug 2017
• Town of Cary, North Carolina
  – Temperature, pressure, humidity, … let’s see
  – Import as “text data”
• Shift-ctrl-down
• Next … stock data.
• CSV File
  – Header row
  – Lines/rows of dates
  – Each element is separated by commas into columns
https://catalog.data.gov/dataset?res_format=CSV&tags=weather
What is in a Historical Stock Data File?
a) # of employees
b) Date/Time
c) Company Name (does not change over time)
d) Price of the Stock
e) Company’s Hometown (does not change over time)
https://finance.yahoo.com/quote/GOOG/history?ltr=1

Comma Separated Values (.CSV) Stock Data Files
• Stock data from Yahoo Finance
• CSV file pulled by pandas DataReader() (later)
• Date
• Open: price the stock opens at in the morning; it is the first price in the day.
• High: highest price in the day
• Low: lowest price in the day
• Close: closing price at 4 PM.
• Volume: how many shares traded altogether on that day.
• Adjusted Close: accounts for splits and dividends; encapsulates the increase in value if you hold the stock for a long time (later).
http://www.investopedia.com/terms/a/adjusted_closing_price.asp
GOOG.csv (from Yahoo)
• Newer dates on top, older descending.
• Adjusted Close: adjusts / accounts for stock splits and dividend payments.
• On the current day:
  – Adjusted Close and Close are always the same.
• Previous days:
  – As we go back in time they start to differ; they are not always the same.
  – Actual return is not captured by the closing price; need to use adjusted close on historical data.
https://finance.yahoo.com/quote/IBM/history

Pandas: Included in Anaconda
• https://en.wikipedia.org/wiki/Pandas_(software)
• Developed by Wes McKinney while at AQR Capital Management to analyze financial data
  – Open source.
  – Numerical tables and time series
  – A key element: Data Frames
    • NaN
    • Slicing
  – Panel data

Store Portfolio in a Pandas Data Frame
• Want: <Symbols> vs Time
• Includes a set of equities (ownership)
  – Exchange Traded Fund (ETF) symbols
  – SPY
    • Tracks the S&P 500 Index.
  – Russell 1000
  – AAPL: Apple
  – GOOG: Google
    • https://en.wikipedia.org/wiki/Google
    – Initial public offering (IPO) time: August 19, 2004.
  – Other: securities (government)
Warm-up: Reading into a Data Frame
• Interactively
  – Import pandas
  – Rename it to pd
• Read it in.
• First column is the index, helping you to access rows.
• Top 5 lines: .head()
• Bottom 5 lines: .tail()
• def: make it a function

Exercises
Exercise 1.
• Read in the entire CSV file in a function
  – Print it out.
Exercise 2.
• Read in the entire file in a function
  – Print out a selection of the file
• SPY, AAPL, GOOG, GLD

• Only print top 5 lines of the data frame
  – print df.head()
• Only print bottom 5 lines of the data frame
  – print df.tail()
Print out a subset of columns and/or rows:
• Slicing: only print rows at index 10 through 20 (the end of the slice is not inclusive, so use 21)
  – print df[10:21]
  – print df[:21]
  – print df[['Date','High']].values[5]
• Question: Print last 5 lines?

• simple-frame.py
  – Entire frame
  – Try printing: df.head(), df.tail()
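The warm-up and slicing exercises above can be sketched as follows. This is a minimal sketch, not the course's simple-frame.py: the helper names `make_sample_csv` and `read_frame` are hypothetical, the prices are made up, and the CSV lives in memory so the example runs without a data file. It uses the `print()` function form, which assumes a Python 3 style interpreter rather than the 2.7 `print df.head()` form shown on the slides.

```python
import io
import pandas as pd

def make_sample_csv(n_rows=30):
    """Build made-up CSV text with Yahoo-style columns (hypothetical data)."""
    lines = ["Date,Open,High,Low,Close,Volume,Adj Close"]
    dates = pd.date_range("2017-01-02", periods=n_rows, freq="D")
    for i, day in enumerate(dates):
        close = 100.0 + i
        lines.append("{},{},{},{},{},{},{}".format(
            day.strftime("%Y-%m-%d"), close - 1, close + 2, close - 2,
            close, 1000 + i, close))
    return "\n".join(lines)

def read_frame(source):
    """Read an entire CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)

# A real run would pass a path such as "data/SPY.csv" instead.
df = read_frame(io.StringIO(make_sample_csv()))

print(df.head())                       # top 5 rows
print(df.tail())                       # bottom 5 rows
print(df[10:21])                       # rows 10..20; the slice end (21) is excluded
print(df[["Date", "High"]].values[5])  # Date and High of the sixth row
```

Because no `index_col` is given, the frame gets the default integer index, which is why the positional slice `df[10:21]` lines up with the row labels printed on the left.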
Compute Max Closing Price
Computation on a CSV file
• From the file, find out the maximum closing price.
• Now: SPY.csv
• Later: any symbol.
get_max_close(symbol)
1. Read the file into a data frame
2. Process the column ‘Close’
3. Use the pandas function .max() to return the max.
https://pyformat.info/
1a-maxclosingprice.py

Exercises
• Calculate the mean volume.
• Calculate the max adjusted close.
• Challenge: Return the date(s) when:
  – the closing price is different from the adjusted price?
  – IBM
1b-meanvolume-quiz.py

Plotting: matplotlib
2a-1column-plots.py
http://matplotlib.org/users/pyplot_tutorial.html#working-with-text
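A minimal sketch of the three computations above, under stated assumptions: the rows are made up, the CSV is held in memory, and the functions take an already-loaded frame rather than a symbol (the slide's `get_max_close(symbol)` would first read the symbol's file). `get_mean_volume` and `dates_where_adjusted_differs` are hypothetical names introduced here for the exercises.

```python
import io
import pandas as pd

# Made-up rows in Yahoo's column layout, newest date first.
SAMPLE = io.StringIO("""Date,Open,High,Low,Close,Volume,Adj Close
2017-08-04,101.0,103.0,100.0,102.0,1500,102.0
2017-08-03,100.0,102.0,99.0,101.0,1200,101.0
2017-08-02,99.0,101.0,98.0,100.0,1800,99.5
2017-08-01,98.0,100.0,97.0,99.0,1100,98.5
""")

def get_max_close(df):
    """Return the maximum value in the 'Close' column."""
    return df["Close"].max()

def get_mean_volume(df):
    """Return the mean of the 'Volume' column."""
    return df["Volume"].mean()

def dates_where_adjusted_differs(df):
    """Return the dates on which Close and Adj Close are not equal."""
    mask = df["Close"] != df["Adj Close"]
    return list(df.loc[mask, "Date"])

df = pd.read_csv(SAMPLE)
print("Max close: {}".format(get_max_close(df)))
print("Mean volume: {}".format(get_mean_volume(df)))
print("Close != Adj Close on: {}".format(dates_where_adjusted_differs(df)))
```

In the sample, only the two oldest rows differ, mirroring the earlier point that Close and Adjusted Close agree on recent dates and drift apart further back in time.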
Plot 2 Columns in a Single Plot
2b-2column-plots.py

Coming Up
• Restrict data ranges (e.g., a specific date range)? (join)
• Drop missing data rows
• Join data incrementally, column by column
Want a frame with the closing prices of different stocks, indexed by date.

How many days were US stocks traded in 2014 (over an entire year)?
a) 365
b) 260
c) 252
Only on trading days …
How many days were US stocks traded in 2014 (over an entire year)?
a) 365
b) 260 (52x5). But there are also holidays …
c) 252

Steps: Building a DataFrame
1. DF1 = First build a data frame by specifying the date range.
   – Includes weekend dates (markets are not open).
2. DF2 = SPY = Load SPY data (adjusted close) into a separate data frame (all data and prices).
   – Only trading days (market open) in DF2.
3. Join DF2 and DF1, keeping only dates that are present in ‘both’ frames (this eliminates the weekends in data frame 1).
4. Additional joins with the other symbols that we want to add: IBM, GOOG.

Steps 0-2: Specifying the Date Range
• Step 0:
• Step 1: Create a list of date-time index objects
  – dates = pd.date_range(start_date, end_date)
  – Check it out (print).
• List of date-time index objects
  – dates[0] (dates with time stamp)
  – dates[1]
• Step 2: Index it by dates instead of integers by specifying the index and setting it to ‘dates’
  – index = dates
  – NOTE: we have seen the default of integers already …

Step 3: Combine the Data Frames by Joining Frames
a) df2: Create the SPY data frame w/ SPY data
b) Combine the data frames via join.
  – df1: Empty data frame with a date range
  – df2_SPY: Populated data frame (only trading days)
  – Join: left join
    • df1.join(df2_SPY)
    • Only rows with SPY data should be retained (a left join alone keeps every df1 date, with NaN where SPY has no row).
  – ? No values from SPY??
3a-simple-join.py
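The steps above can be sketched as follows. This is a minimal sketch, not the course's 3a-simple-join.py: the SPY prices are made up and the CSV is held in memory. It reads SPY with `index_col="Date"` so both frames share a date index; with the default integer index the join would find no matching labels and every SPY value would come back as NaN.

```python
import io
import pandas as pd

# Made-up adjusted closes for the first trading days of 2014.
SPY_CSV = """Date,Adj Close
2014-01-02,182.9
2014-01-03,182.8
2014-01-06,182.3
2014-01-07,183.4
"""

# Steps 1-2: an empty frame indexed by every calendar day in the range.
start_date, end_date = "2014-01-01", "2014-01-07"
dates = pd.date_range(start_date, end_date)
df1 = pd.DataFrame(index=dates)

# SPY frame indexed by date (trading days only).
df2_spy = pd.read_csv(io.StringIO(SPY_CSV), index_col="Date", parse_dates=True,
                      usecols=["Date", "Adj Close"], na_values=["nan"])

# Step 3: a left join keeps all 7 calendar days; non-trading days are NaN.
left = df1.join(df2_spy)
print(left)

# Keep only dates present in both frames: drop NaN rows, or join with how="inner".
both = left.dropna()
inner = df1.join(df2_spy, how="inner")
print(both)
```

Either `dropna()` after the left join or `how="inner"` gives the same four trading days here; the inner join simply skips the intermediate NaN rows.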
• dfSPY is indexed by integers by default; change the index to dates with index_col
  – index_col="Date"
• Multiple stocks from a list
  – symbols = ['GOOG', 'IBM', 'GLD']
  – For loop iterating through symbols
  – pd.read_csv("data/{}.csv".format(symbol), index_col='Date', parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['nan'])
  – … overlap of the Adj Close column: rename the column to the stock symbol instead.
• Exercise:

Re-Cap: Last Week
• Worked on board … on code.
• Utility functions to read in data with no NaNs.
• Compute / code financial statistics in pandas and NumPy:
  – Global statistics
    • Mean
    • Median
    • Standard deviation
  – Rolling statistics
    • Rolling mean: a representation of the underlying value of a stock
    • Rolling standard deviation: how far prices deviate from the mean (buy and sell signal)
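The multi-symbol join and the global and rolling statistics from the recap can be sketched together. This is a sketch under stated assumptions: `fake_csv` is a hypothetical stand-in for the `data/<symbol>.csv` files, the prices are made-up rising sequences, and the 3-day window is an arbitrary choice for illustration.

```python
import io
import numpy as np
import pandas as pd

def fake_csv(dates, start_price):
    """Hypothetical stand-in for data/<symbol>.csv with made-up prices."""
    prices = start_price + np.arange(len(dates), dtype=float)
    frame = pd.DataFrame({"Date": dates.strftime("%Y-%m-%d"),
                          "Adj Close": prices})
    return io.StringIO(frame.to_csv(index=False))

trading_days = pd.bdate_range("2014-01-02", periods=10)  # weekdays only
symbols = ["GOOG", "IBM", "GLD"]
sources = {s: fake_csv(trading_days, 100.0 * (i + 1))
           for i, s in enumerate(symbols)}

# Start from an empty frame covering every calendar day in the range.
df = pd.DataFrame(index=pd.date_range("2014-01-01", "2014-01-31"))
for symbol in symbols:
    df_temp = pd.read_csv(sources[symbol], index_col="Date", parse_dates=True,
                          usecols=["Date", "Adj Close"], na_values=["nan"])
    # Rename so the joined columns do not all collide on 'Adj Close'.
    df_temp = df_temp.rename(columns={"Adj Close": symbol})
    df = df.join(df_temp)
df = df.dropna()  # keep trading days only

# Global statistics, one value per symbol.
print(df.mean())
print(df.median())
print(df.std())

# Rolling statistics over a 3-day window for one symbol.
rolling_mean = df["GOOG"].rolling(window=3).mean()
rolling_std = df["GOOG"].rolling(window=3).std()
print(rolling_mean)
print(rolling_std)
```

Without the rename, the second `join` would fail because both frames carry a column named `Adj Close`. The first two rolling values are NaN because a full 3-day window is not yet available.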