2/20/2012

Social Media Analytics: Data Mining Applied to Insurance Twitter Posts
CAS Ratemaking and Product Management Seminar
Roosevelt C. Mosley, Jr., FCAS, MAAA
Pinnacle Actuarial Resources, Inc.
March 21, 2012

Antitrust Notice
- The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings.
- Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding – expressed or implied – that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition.
- It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy.

Social Media Analytics
- The growth in social media
- Background on Twitter
- Data
- General descriptive statistics
- Processing the data
- Analysis – identifying the themes
- Analysis challenges
- Application of social media analytics

The Growth in Social Media

Social Media Defined
- Social media: a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content
- Building blocks: Identity, Conversations, Sharing, Presence, Relationships, Reputation, Groups
Kaplan, Andreas M.; Michael Haenlein (2010). "Users of the world, unite! The challenges and opportunities of Social Media". Business Horizons

Social Media Platforms
[Figure: platform logos, each paired with a one-line caption. The captions are:]
- I need to eat
- I ate
- This is where I ate
- I have five connections that recommend me because I eat so well
- Why am I eating?
- This is a review of where I ate
- Let’s all eat together
- Watch me eat

Social Media – Explosive Growth
- Facebook has 750 million users
- 30 billion pieces of content are shared on Facebook every month
- As of May 2011, there were on average 190 million tweets per day
- Google+ reached 10 million users in 16 days
- People upload 3,000 images to Flickr every minute
Source: http://www.jeffbullas.com/2011/09/02/20-stunning-social-media-statistics/

Business Has Taken Notice
- Two-thirds of comScore’s U.S. Top 100 websites and half of comScore’s Global Top 100 websites have integrated with Facebook
- Many businesses now have established Twitter accounts in an attempt to connect with current and potential customers
- 80% of companies use LinkedIn as a recruitment tool
- Companies spent over $3 billion to advertise on social media sites in 2011, an increase of 55% over 2010

Insurance Facebook Fans

Page                        Category      Fan Count (December, 2011)   Percentage Growth
Flo, The Progressive Girl   Mascot        3,336,486                     1.5
Farmers Insurance           Corporate     2,360,972                    -0.5
State Farm Nation           Demographic   1,353,524                     1.1
Mayhem                      Mascot        1,129,941                     1.7
AFLAC Duck                  Mascot          293,496                     0.6
USAA                        Corporate       208,732                     1.6
The Gecko                   Mascot          204,593                     1.8
GEICO                       Corporate       202,825                     2.2
State Farm Insurance        Corporate       193,864                     8.1
New York Life               Corporate       154,390                    19.6

Source: Customer Respect Group. “Social Eyes: The Insurer’s View of Social Media.”

Insurance Twitter Followers

Company                 Category    Followers (December, 2011)   Percent Change
State Farm Nation       Corporate   28,218                        0
Allstate Insurance      Corporate   25,884                        1
USAA                    Corporate   23,742                        4
New York Life           Corporate   23,344                       31
State Farm Insurance    Corporate   18,856                        5
VPI                     Pet         17,551                        7
Hartford Achieve        Advocacy    17,524                       -2
AFLAC Duck              Mascot      13,709                        2
The Hartford            Corporate   10,913                        2
Progressive Insurance   Corporate   10,531                        3

Are You Taking Advantage of Social Media?
- Insurance companies are investing significant resources in a social media presence
- Current and potential customers are voluntarily sharing intimate details of their lives with the world
- Current and potential customers are interacting with companies on a very personal level
- This information can be applied in different ways (service, marketing, competitive monitoring)

Twitter Background

Data

Data Used for Paper – Dataset #1
- Tweets including #allstate: 68,370
- Dates: July 29, 2010 – August 12, 2011
- Downloaded from twapperkeeper.com (the service no longer exists)
- Data
  - user: the username that sent the tweet
  - tweet: the content of the tweet
  - timestamp: the date and time the tweet was sent (GMT)
  - tweet ID: Twitter identification number of the tweet
  - geo: latitude and longitude of the user

Data Used for Updated Analysis – Dataset #2
- Keyword searches for State Farm, Allstate, Geico, esurance, and #Progressive: 176,694 tweets
- Dates: January 25 – February 12, 2012
- Tracked through hootsuite.com
- Data
  - text: content of the tweet
  - to user id: specific tweet recipient
  - from user: sender of the tweet
  - iso_language_code: language of the tweet
  - source: where the tweet originated
  - profile image: picture of the user
  - geo: latitude and longitude of the user
  - Date and time

Sources of Social Media Data
- Third-party data aggregators (hootsuite, GNIP)
- API
- Company developers
- Screen scraping

General Descriptive Statistics

Tweets Per Month - #allstate
[Figure: Tweets by Month. Bar chart; x-axis: Month / Year (Aug 2010 through Aug 2011, with no Feb 2011 label), y-axis: Number of Tweets. Bar labels range from 271 to 14,984 tweets per month.]

Tweets per Day – Dataset #2
[Figure: Tweets Per Day. Chart; x-axis: Date (1/11/2012 through 2/11/2012), y-axis: Number of Tweets. Average = 5,521 tweets per day.]

Distribution of Tweets by Company
[Figure: Distribution of Tweets by Company. Allstate 23%, GEICO 38%, State Farm 39%.]

Tweets per Hour – Dataset #1
[Figure: Tweets by Hour of the Day. Bar chart; x-axis: Hour of the Day (Eastern Time), 0 through 23, y-axis: Number of Tweets. Bar labels range from 1,090 to 4,051 tweets per hour.]
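
Dataset #1 stores timestamps in GMT, while the hourly chart is labeled in Eastern Time, so an hourly tabulation needs a time zone conversion first. The following is a minimal sketch of that step, not the method used for the slides. The function name `tweets_by_hour`, the timestamp format string, and the use of the `America/New_York` zone are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tz data or the tzdata package


def tweets_by_hour(gmt_timestamps, tz=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tweets by local hour of day (0-23).

    gmt_timestamps: timestamp strings recorded in GMT. The default `fmt`
        is an assumed format, not the actual export format of the data.
    tz: target tzinfo; defaults to US Eastern (an assumption).
    """
    if tz is None:
        tz = ZoneInfo("America/New_York")
    counts = Counter()
    for stamp in gmt_timestamps:
        # Parse as naive, mark as UTC, then shift to the target zone.
        utc_time = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
        counts[utc_time.astimezone(tz).hour] += 1
    # Return a fixed-length list so hours with no tweets show as 0.
    return [counts.get(hour, 0) for hour in range(24)]
```

Calling `tweets_by_hour(stamps)` with no `tz` argument uses US Eastern and respects daylight saving time; passing `timezone.utc` leaves the hours in GMT.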

Tweets per Hour – Dataset #2
[Figure: Tweets Per Hour. Bar chart; x-axis: Hour Range (1-4, 5-8, 9-12, 13-16, 17-20, 21-0), y-axis: Percentage of Tweets, with one series each for allstate, geico, and statefarm. Bar labels range from 3.1% to 39.6%. Callout on the chart: “If I had a nickel for every GEICO commercial I've ever seen, I could buy us all car insurance”]

Data Processing Steps
- Remove punctuation and symbols (retain @ and #)
- Parse the tweet (35 words worked for Twitter; other sources will need many more)
- Change the table structure from tweets in rows to tweets in columns, keeping an indicator of word order
- Correct spelling errors
- Add word indicators

Tweets in rows (one row per tweet):

Tweet ID   User      Tweet           Word1   Word2   …   Word35
1          @mosley   Text of tweet   W1      W2      …   W35

Tweets in columns (one row per word):

Tweet ID   Word Order   Word
1          1            Word1
1          2            Word2
…          …            …
1          35           Word35
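
The processing steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the tooling used for the analysis: the function names, the lowercasing of text, the treatment of apostrophes, and the tiny `SPELLING_FIXES` lookup are all hypothetical. Only the 35-word cap and the rule of keeping @ and # come from the slide.

```python
import re

MAX_WORDS = 35  # cap from the slide; other sources may need many more

# Hypothetical spelling corrections, for illustration only.
SPELLING_FIXES = {"insurence": "insurance"}


def clean_tweet(text):
    """Drop punctuation and symbols, keeping @ and #.

    Lowercasing and deleting apostrophes are assumptions of this sketch.
    """
    text = text.lower().replace("'", "").replace("\u2019", "")
    # Anything that is not a word character, whitespace, @ or # becomes a space.
    return re.sub(r"[^\w\s@#]", " ", text)


def parse_tweet(text, max_words=MAX_WORDS):
    """Split a cleaned tweet into at most max_words spelling-corrected words."""
    words = clean_tweet(text).split()[:max_words]
    return [SPELLING_FIXES.get(word, word) for word in words]


def to_long_format(tweets):
    """Turn (tweet_id, user, text) records into (tweet_id, word_order, word) rows."""
    rows = []
    for tweet_id, _user, text in tweets:
        for order, word in enumerate(parse_tweet(text), start=1):
            rows.append((tweet_id, order, word))
    return rows


def word_indicators(rows, vocabulary):
    """Build a 0/1 indicator per tweet for each word in vocabulary."""
    indicators = {}
    for tweet_id, _order, word in rows:
        flags = indicators.setdefault(tweet_id, {w: 0 for w in vocabulary})
        if word in flags:
            flags[word] = 1
    return indicators


sample = [(1, "@mosley", "Thanks @Allstate, great #claims service!")]
long_rows = to_long_format(sample)
flags = word_indicators(long_rows, ["#claims", "agent"])
```

Keeping the word order in the long table means phrases can still be rebuilt later, while the indicator table gives one row of 0/1 flags per tweet.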