Welcome
Overview of Predictive Analytics: Claudia Perlich, Chief Scientist, Dstillery
Predictive Modeling: Algorithms that Learn from Data
Example: Micro Loans
Age | Income | Default
35 | 75K | no
68 | 83K | yes
43 | 61K | no
71 | 56K | yes
… | … | …
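Learning to Classify: Classification tree [Figure: a classification tree that splits first on balance (≥ 50K vs. < 50K) and then on age (< 45 vs. ≥ 45), with leaves labeled with estimated probabilities of 12/13, 4/7, and 1. Beside it is the partition this tree induces on the age/balance plane (ticks at age 45 and balance 50K) over 30 cases: 16 bad risks (default) and 14 good risks (not default). One region is annotated "probability of default = 4/7".]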
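A tree is learned by choosing, at each node, the split that best separates defaulters from non-defaulters. Each leaf then reports the share of defaulters among the training cases that land in it, which is presumably how a leaf value such as 4/7 arises. The following is a minimal sketch of one such step: a single split (a decision stump) chosen by training error on just the four fully visible micro-loan rows. Real tree learners usually score splits with an impurity measure such as entropy or Gini and recurse, and the feature and threshold names here are illustrative.

```python
from fractions import Fraction

# The four fully visible rows of the micro-loan example ("…" rows omitted).
# Each row: (age, income in $K, defaulted?)
ROWS = [(35, 75, False), (68, 83, True), (43, 61, False), (71, 56, True)]
FEATURES = {"age": 0, "income": 1}


def leaf_probability(labels):
    """Estimate P(default) in a leaf as the fraction of defaulters it contains."""
    return Fraction(sum(labels), len(labels))


def best_split(rows):
    """Pick the single (feature, threshold) split with the fewest training errors,
    predicting the majority class on each side. Candidate thresholds are midpoints
    between consecutive distinct values, so neither side is ever empty."""
    best = None
    for name, col in FEATURES.items():
        values = sorted({r[col] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r[2] for r in rows if r[col] < threshold]
            right = [r[2] for r in rows if r[col] >= threshold]
            # Errors on a side = size of its minority class.
            errors = sum(min(sum(s), len(s) - sum(s)) for s in (left, right))
            if best is None or errors < best[0]:
                best = (errors, name, threshold, left, right)
    return best


errors, feature, threshold, left, right = best_split(ROWS)
print(feature, threshold, errors)                       # age 55.5 0
print(leaf_probability(left), leaf_probability(right))  # 0 1
```

On these four rows age separates the classes perfectly, while no income threshold does. With more rows the leaves would be mixed, and the same frequency estimate would yield values such as 4/7.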
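Learning to Classify: Logistic Regression $p(+\mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$, with $\beta_0 = 123$ and $\beta_1 = -1.3$. [Figure: the age/balance plane (ticks at age 45 and balance 50K) with 16 bad risks (default) and 14 good risks (not default); one point is annotated p(+|x) = 0.48.]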
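The slide gives the fitted coefficients but does not name the input they apply to. The sketch below therefore assumes a single numeric input x. It evaluates the logistic model and inverts it through the log-odds to find the x at which p(+|x) reaches a given value, such as the 0.5 decision boundary or the 0.48 shown on the slide.

```python
import math


def p_positive(x, beta0, beta1):
    """Single-feature logistic model: p(+|x) = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    z = beta0 + beta1 * x
    # Two algebraically equivalent forms, picked by the sign of z so exp() cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def x_for_probability(p, beta0, beta1):
    """Invert the model: the x where p(+|x) == p, using log-odds log(p / (1 - p))."""
    log_odds = math.log(p / (1.0 - p))
    return (log_odds - beta0) / beta1


BETA0, BETA1 = 123.0, -1.3  # coefficients shown on the slide

boundary = x_for_probability(0.5, BETA0, BETA1)  # p(+|x) = 0.5 here
x_048 = x_for_probability(0.48, BETA0, BETA1)
print(round(boundary, 2), round(p_positive(x_048, BETA0, BETA1), 2))  # 94.62 0.48
```

Because β1 is negative, p(+|x) falls as x grows. The 0.5 boundary sits at x = -β0/β1 = 123/1.3.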
Lending Club Data • Text • Loan Category • Demographic information • Credit Score
Targeted Online Display Advertising
100 million browsers (cookies) • 100 million URLs • Conversion: shopping at one of our campaign sites, with a base rate of 0.0001% to 1% • Billions of auctions per day on the ad exchange. Questions: Who should we target for a product? Does the ad have an effect? Where should we advertise, and at what price? What data should we pay for? Attribution? Which requests are fraudulent?
Agnostic Data
A consumer’s online activity gets recorded like this:
The branded web, purchases (encoded): date 1: 3012L, 20; date 2: 4199L, 30; …; date n: 3075L, 50
The non-branded web, browsing history (hashed URLs): date 1: abkcc; date 2: kkllo; date 3: 88iok; date 4: 7uiol; …
I do not want/need to ‘understand’ who you are …
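The slide does not say how URLs are hashed or how purchases are encoded (for example, 3012L). Below is a minimal sketch of the browsing-history side only. It assumes each URL is replaced by a truncated SHA-256 hex digest, so its tokens will not match the ones shown on the slide (abkcc, kkllo, …). The domains are hypothetical.

```python
import hashlib


def hash_url(url, length=10):
    """Map a URL to a short, stable, opaque token: the first `length` hex
    characters of its SHA-256 digest. The same URL always gives the same token."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:length]


def hashed_history(visits):
    """Turn (date, url) pairs into (date, token) pairs, so a model can tell
    that the same site was visited again without storing the readable URL."""
    return [(date, hash_url(url)) for date, url in visits]


# Hypothetical history; .example domains are reserved for illustration.
history = [("date 1", "https://news.example/"), ("date 2", "https://recipes.example/")]
for date, token in hashed_history(history):
    print(date, token)
```

Truncating the digest trades a small collision risk for compactness. A plain hash can be reversed by anyone who hashes a list of candidate URLs. A keyed hash such as `hmac.new(secret_key, url_bytes, hashlib.sha256)` makes that harder.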
Model in 10 Million Dimensions: Using Naïve Bayes and Stochastic Gradient Descent Logistic Regression, we estimate statistical correlations between 10s of millions of web URLs and 1000s of branded actions. [Figure: non-branded websites ranked on a scale from passion to aversion by likelihood to convert given a visit; the model outputs p(buy | urls).]
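The slide names two learners but gives no implementation details. The following is a minimal sketch of the second, SGD logistic regression. It assumes each browser is represented by the set of URLs it visited (binary features), and it stores weights in a dict so memory grows with the URLs actually seen rather than with all 10 million dimensions. The URLs, labels, and hyperparameters are hypothetical, and this is not Dstillery's system.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_buy(weights, bias, urls):
    """p(buy | urls): logistic model over binary 'visited this URL' features.
    URLs never seen in training contribute weight 0."""
    return sigmoid(bias + sum(weights.get(u, 0.0) for u in urls))


def train_sgd(examples, epochs=100, lr=0.1):
    """Stochastic gradient descent on the logistic (log) loss, one example at a time.
    examples: list of (set_of_urls, label), label 1 = converted, 0 = did not."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for urls, label in examples:
            # The log-loss gradient w.r.t. the linear score is (p - label); step against it.
            # Only the weights of URLs present in this example are touched.
            step = lr * (label - p_buy(weights, bias, urls))
            for u in urls:
                weights[u] = weights.get(u, 0.0) + step
            bias += step
    return weights, bias


# Hypothetical data: visited URLs and whether a branded action followed.
EXAMPLES = [
    ({"running.example", "shoes.example"}, 1),
    ({"gaming.example"}, 0),
    ({"running.example"}, 1),
    ({"gaming.example", "memes.example"}, 0),
]
weights, bias = train_sgd(EXAMPLES)
print(p_buy(weights, bias, {"running.example"}) > 0.5)  # True
print(p_buy(weights, bias, {"gaming.example"}) < 0.5)   # True
```

Each update costs time proportional to the number of URLs in one browser's history, not the size of the vocabulary. That property is what makes SGD workable at this dimensionality.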
Real-time Scoring of a Browser [Figure: a browser's timeline, with observation and engagement phases, showing site visits with positive correlation, site visits with negative correlation, a series of ads, and a purchase; the browser's score p(buy | urls) is compared against a ProspectRank threshold.] Some prospects fall out of favor once their in-market indicators decline.
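Re-scoring a browser as each visit arrives can be sketched as follows. The sketch adds a URL's weight to the running score the first time the URL is seen and checks the result against a threshold. A visit to a negatively correlated site pulls the score back down, which is one way a prospect can "fall out of favor". The weights, bias, and threshold of 0.6 are hypothetical stand-ins for a trained model and the slide's ProspectRank threshold.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_stream(visits, weights, bias, threshold):
    """After each visit, yield (url, rounded p(buy | urls), eligible_for_ad).
    Each distinct URL counts once, matching binary 'visited' features."""
    seen, z = set(), bias
    for url in visits:
        if url not in seen:
            seen.add(url)
            z += weights.get(url, 0.0)
        p = sigmoid(z)
        yield url, round(p, 3), p >= threshold


# Hypothetical model: two positively and one negatively correlated site.
WEIGHTS = {"review.example": 1.5, "shop.example": 2.0, "other.example": -3.0}
BIAS, THRESHOLD = -2.0, 0.6

for row in score_stream(["review.example", "shop.example", "other.example"],
                        WEIGHTS, BIAS, THRESHOLD):
    print(row)
# ('review.example', 0.378, False)
# ('shop.example', 0.818, True)
# ('other.example', 0.182, False)
```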
Models in Our World • Spam Detection • Fraud/Fault Detection • Financial Trading • Medical Diagnosis/Quality control • Sentiment Analysis • Prioritization in General • CRM • Recommender systems • Advertising/Targeting
Important Takeaways • The algorithm is secondary • The data is KEY • Quality control is HARD • Model is only as good as the modeler • Very difficult to really understand the data
Panel Discussion • Pamela Dixon, Founder, World Privacy Forum • Edmund Mierzwinski, Consumer Program Director and Senior Fellow, U.S. Public Interest Research Group • Claudia Perlich, Chief Scientist, Dstillery • Stuart Pratt, President and CEO, Consumer Data Industry Association • Ashkan Soltani, Independent Researcher and Consultant • Rachel Nyswander Thomas, Executive Director of Data-Driven Marketing Institute, and Vice President of Government Affairs, Direct Marketing Association • Joseph Turow, Professor, University of Pennsylvania
Presentation: Ashkan Soltani, Independent Researcher and Consultant
whoami • twitter: @ashk4n • ashkan.soltani@gmail.com • independent researcher & consultant
today: alternative scoring • methodology • findings • data sources
methodology
user-agent
older findings: orbitz
findings: orbitz Some sites, for example, gave discounts based on whether or not a person was using a mobile device. A person searching for hotels from the Web browser of an iPhone or Android phone on travel sites Orbitz and CheapTickets would see discounts of as much as 50% off the list price, Orbitz said. Both sites are run by Orbitz Worldwide Inc., which in fact markets the differences as "mobile steals." Orbitz says the deals are also available on the iPad if a person installs the Orbitz app.
findings: gogo inflight • User-Agent: Desktop, $12.95 • User-Agent: iPhone, $7.95
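The slides do not detail the test procedure. One way to run this kind of test is to request the same page twice, changing only the User-Agent header, and compare the prices returned. The sketch below assumes Python's standard `urllib.request`. The URL and the shortened User-Agent strings are hypothetical, and the requests are only built, never sent. The price comparison uses the two Gogo prices shown on the slide.

```python
import urllib.request

# Shortened, illustrative User-Agent strings (hypothetical; real browsers send longer ones).
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "iphone": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
}


def build_requests(url):
    """One request per device profile; only the User-Agent header differs."""
    return {name: urllib.request.Request(url, headers={"User-Agent": ua})
            for name, ua in USER_AGENTS.items()}


def price_gap(prices, baseline="desktop"):
    """For each non-baseline profile: (dollars below baseline, percent below baseline)."""
    base = prices[baseline]
    return {name: (round(base - p, 2), round(100 * (base - p) / base, 1))
            for name, p in prices.items() if name != baseline}


requests = build_requests("https://inflight.example/pricing")  # hypothetical URL; nothing is sent
print(price_gap({"desktop": 12.95, "iphone": 7.95}))  # {'iphone': (5.0, 38.6)}
```

On the slide's numbers, the iPhone user agent saw a price $5.00, or about 38.6%, below the desktop price.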
location
findings: staples
findings: more geography Home Depot's website offered price variations that appeared to be based on the nearest brick-and-mortar store as well. A 250-foot spool of electrical wiring fell into six pricing groups, including $70.80 in Ashtabula, Ohio; $72.45 in Erie, Pa.; $75.98 in Olean, N.Y.; and $77.87 in Monticello, N.Y. Location also seemed to be important for some international companies. The Journal saw Rosetta Stone, which sells software for learning languages, offering discounts of as much as 20% for people who bought multiple levels of its German lessons from certain locations in the U.S. or Canada, but not others, or from the U.K. or Argentina.
findings: discover In the tests, Discover, for instance, showed a prominent offer for the company's new "it" card to computers connecting from cities including Denver, Kansas City, Mo., and Dallas, Texas. Computers connecting from Scranton, Penn., Kingsport, Tenn., and Los Angeles didn't see the same offer. A Discover spokeswoman said that the company was testing the card, but that for competitive reasons, it wouldn't comment further on its "acquisition strategy" for new customers.
findings: staples higher income = lower price In the Journal's examination of Staples' online pricing, the weighted average income among ZIP Codes that mostly received discount prices was roughly $59,900, based on Internal Revenue Service data. ZIP Codes that saw generally high prices had a lower weighted average income, $48,700.
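The comparison comes down to a weighted average of ZIP-level income within each pricing group. The sketch below assumes each ZIP Code's income is weighted by its number of tax returns, since the slide does not say what the weights were. The ZIP data is made up and does not reproduce the $59,900 and $48,700 figures.

```python
def weighted_average_income(zips):
    """zips: list of (average_income, weight) pairs, one per ZIP Code.
    Returns sum(income * weight) / sum(weight)."""
    total_weight = sum(w for _, w in zips)
    return sum(income * w for income, w in zips) / total_weight


def compare_groups(zips):
    """zips: list of (average_income, weight, mostly_discounted) triples.
    Returns (weighted avg income of mostly-discounted ZIPs, of mostly-high-price ZIPs)."""
    discounted = [(inc, w) for inc, w, disc in zips if disc]
    high = [(inc, w) for inc, w, disc in zips if not disc]
    return weighted_average_income(discounted), weighted_average_income(high)


# Hypothetical ZIP Codes: (average income in $, number of tax returns, mostly discounted?)
ZIPS = [(70_000, 3, True), (50_000, 1, True), (45_000, 2, False), (52_000, 2, False)]
print(compare_groups(ZIPS))  # (65000.0, 48500.0)
```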
profiles*
findings: nextag / shoplet
findings: capital one Capital One was showing different users different cards first, either those for "excellent credit" or "average credit."
data sources
conclusion
conclusion: staples As a final test, the Journal ordered two separate Swingline staplers from Staples.com, from two nearby ZIP Codes—one costing $14.29 and the other one $15.79. The staplers arrived the same day. They appear to be indistinguishable from one another and do an equally thorough job of stapling.
Panel Discussion