Outline 1. Install Python and some libraries 2. Use and extend templates Machine Learning for Trading 3. Create a workflow of modules needed for analyzing Software Platform market behavior. From getting data, building a Introduction & Installation portfolio, analyze it, computes expected return. Read a CSV file into python – Create an analyzer that enables you to assess a portfolo – • Edit the analysis.py file – Create an optimizer Installation: Projects (tentative) Mac Installation: 1) Instruction that the instructor used: Step 1: Install your python platform a) installed anaconda (got required packages) a): Install Anaconda https://www.continuum.io/downloads (2.7) Step 2 (later) : Install Market Simulator Templates. includes, sci.py, num.py, and matplotlib • Assess Portfolio (5%) It needs SciPy — so: Alternate Setup (pip)): Note: The Anaconda python distribution includes • Assess Learners & Defeat Learners (25%) http://quantsoftware.gatech.edu/Undergrad_ML4T_Software_Setup * NumPy, Pandas, SciPy, Matplotlib, and Python, http://quantsoftware.gatech.edu/ML4T_Software_Installation and over 250 more packages available via a simple “conda install <packagename>” • Market Simulator (10%) . It also has an IDE. Instructor got 2.7, and the anaconda distribution of • Q-Learning Robot (10%) python To get the appropriate software you’ll need: • Strategy Learner (15%) python (scripting ‘programming’ language) sci.py (numerical routines), num.py (matrices, linear algebra), and matplotlib (enables generating plots of data) Installing Python (2.7) via Anaconda: Anaconda instruction site including lots of libraries with python. https://docs.continuum.io/anaconda/install
Overview Workflow • Read Data : Read Stock Data from a CSV File and input • Scrape S&P 500 ticker list and industry sectors from list of S&P 500 companies on Wikipedia (code provided). it into a pandas DataFrame – https://en.wikipedia.org/wiki/List_of_S%26P_500_companies – Pandas.DataFrame – NOTE : for this class these files will be downloaded for use already – available in a zip file. – Pands.read_csv • Use the daily close data (adjusted close) for each industry sector from • Select Subsets of Data: Select desired rows and Yahoo finance columns – using pandas DataReader (again – you will use cvs files that have already been downloaded). – Indexing and slicing data Build a sample Portfolio (in lecture by hand): • – Gotchas: Label-based slicing convention Look at measure s of the performance of a portfolio (project 1). We will • use the first measure for project 1. • Generate Useful Plots : Visual data by generating plots – Sharp ratio (in class) – Plotting – Treynor ratio – Pandas.DataFrame.Plot – Jensen’s alpha • Visualization via matplotlib – Matplot.pyplot.plot First Something Familiar: Goal Weather Data • Go from RAW data (adjusted close prices in • .csv Comma Separated Values of weather a .csv file) all the way to visualization conditions from Oct 2009 to Aug 2017 • Town of Cary, North Carolina – Temperature, pressure, humidity, … lets see – Import as “text data” • Next … stock data. https://catalog.data.gov/dataset?res_format=CSV&tags=weather rdu-weather-history.csv
Getting Real Data Comma Separated Values (.CSV) • CSV File • Header • Rows – Rows of Dates • Each Element is separated by columns • Shift-ctrl-down In Excel (use Text Import Wizard) Data-> Get External Data->Import Text File … rdu-weather-history.csv What is in a Historical Stock Data File ? What is in a Historical Stock Data File? a) # of employees a) # of employees b) Date/Time b) Date/Time c) Company Name (does not change over time) c) Company Name d) Price of the Stock d) Price of the Stock e) Company’s Hometown (does not change over time) e) Company’s Hometown Does not change over time Is it relevant or expected?
https://finance.yahoo.com/quote/GOOG/history?ltr=1 Comma Separated Values (.CSV) Stock Data Files • Date • Stock Data from • Open – price stock opens at in the morning, it is Yahoo Finance first price in the day. • CSV file pulled by • High – highest price in the day panda’s (later) • Low – lowest price in the day DataReader() • Close – closing price at 4 PM. • Volume – how many shares traded all together on that day. • Adjusted Close – accounts for splits/and dividends – encapsulates the increase in value if you hold stock for a long time (later). http://www.investopedia.com/terms/a/adjusted_closing_price.asp GetWebS.py https://finance.yahoo.com/quote/GOOG/history?ltr=1 https://finance.yahoo.com/quote/IBM/history GOOG.csv (from Yahoo). • Adjusted Close – adjusts / accounts for stocks • Newer dates on top, older descending. splits and dividend payments. • On the Current Day – Adjusted Close and Close are always the same. • Previous Days: – But as we go back in time start they to differ they are not always the same. – Actual Return is not captured by the closing price, need to use adjusted close on historical data.
Goal: Pandas : Included in Anaconda Store Portfolio in a Panda Data Frame • Want: <Symbols> vs Time • https://en.wikipedia.org/wiki/Pandas_(software) Adjusted close Volume • Includes a set of equities Close • Developed by Wes McKinney while at AQR (ownership) – Exchange Traded Fund (ETF) Capital Management to analyze financial data symbols – SPY 500 – Python Spreadsheets • Tracks the index S&P 500 Index. – Russell 1000 – Open Source. – AAPL – apple – GOOG – Google – Numerical Tables and Time Series – Other: securities (government) time – A Key Element : Data Frames • NaN • Slicing • https://en.wikipedia.org/wiki/ Google – Panel Data – Initial public offering (IPO) - August 19, 2004. https://finance.yahoo.com/quote/GOOG/history?ltr=1 Exercises Warm-up : Reading into a Data frame Exercise 1. • Interactively • Read in the entire CSV file in a function – Import pandas – Rename it to pd – print it out. • Read it in. • First column is index helping you to access rows. Exercise 2. • Stocks (get data from zip file) – SPY, • Read in the entire file in a function – AAPL, – Print out a selection of file – GOOG, – GLD • Top 5 lines : .head() • Link data directory. • Bottom 5 lines: .tail()
def -- Make it a function • Only print top 5 line of data frame – print df.head() • Only print bottom 5 lines of data frame – print df.tail() Print out a subset of columns, and/or rows: • Slicing : Only print rows between index 10, 20 (not • If this file is run as the main program the inclusive) – if statement is true and calls test_run • simple-frame.py – print df[10:21] – Entire frame – Try: printing - df.head(), df.tail() – print df[:21] Question : Print last 5 lines? • – print df[['Date','High']].values[5] simple-frame.py Compute Max Closing Price Computation on CVS File get_max_close( symbol ) • From the file, find out maximum closing price. 1. Read the file into a data frame Now - SPY.csv • Later – any symbol. • 2. Process the Column ‘Close’ 3. Use pandas function .max() to return max. https://pyformat.info/ 1a-maxclosingprice.py
Quiz Plotting with matplotlib 1) Calculate the mean volume. 2) Calculate the max adjusted close. 3) Challenge (bonus) : Return date(s) when : – closing price is different from the adjusted price? – IBM (try this stock) 2a-1column-plots.py df = pd.read_csv("data/AAPL.csv", index_col=0 ) # read in data http://matplotlib.org/users/pyplot_tutorial.html#working-with-text 1b-meanvolume-quiz.py Building a Portfolio with Pandas: Plot 2 Columns in a single Plot Create a DataFrame with Closing date of Different Stocks. BUT - à Only on trading days … 2b-2column-plots.py
How many days were US Stocks How many days were US Stocks Traded in 2014 (over an entire year) Traded in 2014 (over an entire year) really any year … really any year … a) 365 a) 365 b) 260 b) 260 (what is this number?) c) 252 c) 252 How many days were US Stocks 3 steps Traded in 2014 (over an entire year) a) 365 • Restrict Data Ranges (e.g., specific date range)? – idea : b) 260 (52 weeks x 5) • Get the intersection by joining with other frames. c) 252 • Skips dates • Drop Missing Data Rows (NaN). • Join Data Incrementally, column by column
Follow Along. Add in Multiple Stocks Iteratively import pandas as pd start_date = '2010-01-22’ • dfSPY = dfSPY.rename(columns={'Adj Close': 'SPY'}) end_date = '2010-01-26’ dates= pd.date_range(start_date, end_date) df1 = pd.DataFrame(index=dates) df2_SPY = pd.read_csv("data/SPY.csv", index_col="Date", parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['NaN'] # NaN should be not a number • Read SPY data into frame with restricted date ) dfR=df1.join(df2_SPY) # Step 1. join with restricted date frame range print dfR # Step 2. NaN are still in there. • Iteratively add additional stocks: GOOG, IBM, GLD dfR = dfR.dropna() and join into frame. print dfR – Trick is to use rename to avoid column name clashes. # dfR = df1.join(df2_SPY,how='inner’) # alternate 2 step in 1 ValueError: columns overlap but no suffix specified: if __name__ == "__main__": test_run() Index([u'Adj Close'], dtype='object') 3a-simple-join.py 3a-simple-joinKEY.py 3b-iterate-multiple.py Utility Function Slicing. 4b-get_data.py • execfile(“4b-get-data.py”) • Copy paste test_run() • df[‘GOOG’], df[[‘GOOG’,’IBM’]] • df1.ix is deprecated.. In favor of iloc and loc indexer
Recommend
More recommend