The Duck Test: Leveraging Machine Learning to Remediate Fraud in Huge Datasets - PowerPoint Presentation

  1. The Duck Test: Leveraging Machine Learning to Remediate Fraud in Huge Datasets
     Matthew Harper, Director, Cyber Crime Prevention

  2. Who Is Aflac?
     • Supplemental insurance
     • Significant presence in Japan (two-thirds of corporate revenue)
     • My Special Aflac Duck: a robotic duck for children with cancer

  3. We Do What the Duck Says… "Aflac!"
     When you get hurt or sick, Aflac usually pays you cash within a day.

  4. Spoiler alert: machine learning is not black and white, but boy, are we trying.

  5. Steps of Machine Learning
     Business problem to solve -> Data collection -> Data preparation and exploration -> Model training and evaluation -> Implement results

  6. Business Problem: Why Did Cyber Crime Prevention Start?
     In late 2016, Aflac was the victim of an account takeover (ATO) attack against policyholders.
     • The financial industry has been seeing this for over a decade; it emerged against Aflac because of our direct policyholder payment model
     • Aflac US was not built to detect this type of fraud or attack
     • Identity validation was limited at claim time and at enrollment for new policyholders (workplace-based enrollment model)
     • A mature cyber security program was in place, with traditional internal security and network controls (access management, firewall, IDS, etc.)
     • Those controls were designed on the presumption that hackers would go after core Aflac data (customer master files, etc.)
     • New control infrastructure was needed
     (Diagram labels: Hackers; traditional internal security and network controls: Blocked; Aflac Core; channels: Policy Holders, Technology, Associate Access; policyholders: Easier Target.)

  7. Data Collection
     Policyholder lifecycle: Sell, Enroll, Bill, Service, Claim
     • 1.5% of daily Splunk volume:
        - IDV Service: all identity validation calls
        - Custom-integrated into the identity validation service (shown in green on the slide): Agent/Call Center Enrollment (~10%, in partnership with a third party), D2C Enrollment, Online Risk Engine -> Device Tagging, OTP
        - Client Information File: information on clients (names, addresses, etc.)
        - MyAflac: client online activity
        - Client Central: contact center activity, notes, phone number
        - NoCheck: claim payment data, bank account
        - IVR: voice response system
        - Claims: claims processing system
     • 98.5% of daily Splunk volume: supporting legacy security/IT feeds in Splunk (US & Japan)

  8. One Cool Thing: We Built (and Log) Our Own Middleware
     Key insight: the ability to provide real-time risk insight to all policyholder and associate authentication
     • MFA solution (authentication and multi-factor), used by Field Force Services, login.aflac.com, Office 365 authentication, and the Agent & Remote Employee Sales System
     • A separate policyholder Identity Validation Service, used by MyAflac (Policy Holder Data Services) and the Policy Holder Enrollment System
     • Identity validation vendor: configurable risk engine and authentication solution (Online Device Analytics for device scoring, identity validation, KBA/OTP)
     • Splunk: logs all IDV and MFA solution calls via JSON
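Here is a minimal sketch of what one logged middleware call might look like. It assumes each IDV or MFA request is written to Splunk as a single-line JSON object. The function name `idv_log_event` and every field name below are hypothetical. The slide only says that all IDV and MFA solution calls are logged via JSON.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def idv_log_event(
    service: str,
    channel: str,
    cif_number: str,
    result: str,
    risk_score: Optional[int] = None,
    ts: Optional[datetime] = None,
) -> str:
    """Serialize one IDV/MFA call as a compact, single-line JSON event.

    Hypothetical schema for illustration only; not Aflac's actual log format.
    """
    ts = ts or datetime.now(timezone.utc)
    event: Dict[str, Any] = {
        "time": ts.isoformat(),    # ISO 8601 timestamp (UTC by default)
        "service": service,        # which solution was called, e.g. "IDV" or "MFA"
        "channel": channel,        # calling system, e.g. "MyAflac" or "login.aflac.com"
        "cif_number": cif_number,  # customer identifier used to join across feeds
        "result": result,          # hypothetical outcome label, e.g. "pass" or "step_up"
    }
    if risk_score is not None:
        event["risk_score"] = risk_score
    # One JSON object per line keeps each call a separate Splunk event.
    return json.dumps(event, separators=(",", ":"), sort_keys=True)


if __name__ == "__main__":
    print(idv_log_event("MFA", "login.aflac.com", "C123", "step_up", risk_score=70))
```

Putting the customer identifier in every event is a design choice that makes later cross-channel joins straightforward. Splunk can extract JSON keys as fields at search time, for example with `spath`.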

  9. Data Exploration
     So much spaghetti throwing…
     • Starting from zero… nothing to lose
     • Alerting -> known use cases
     • Investigation -> new dashboards
     • Operational monitoring -> who works the alerts?
     • Drive real-time risk-based controls
     Key insight: don't jump straight into machine learning. Take time to get faster returns from the data and to understand it: stacked use cases, hotlist alerts, research dashboards, etc.
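Hotlist alerts are one of the quick wins the slide recommends before any machine learning. This is a minimal sketch. It assumes a hotlist is a set of known-bad values per field, for example an IP address or bank account number tied to a confirmed ATO, and that every incoming event is checked against it. The `Event` fields and the `hotlist_hits` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    """One cross-channel activity record (hypothetical fields)."""
    cif_customer_id: str
    ip: str
    account_number: str
    action: str  # e.g. "login", "address_update", "claim"


Hotlist = Dict[str, Set[str]]  # Event field name -> known-bad values


def hotlist_hits(events: Iterable[Event], hotlist: Hotlist) -> List[Tuple[Event, List[str]]]:
    """Return each event that matches the hotlist, with the fields that matched."""
    hits: List[Tuple[Event, List[str]]] = []
    for ev in events:
        matched = [field for field, bad in hotlist.items() if getattr(ev, field) in bad]
        if matched:
            hits.append((ev, matched))
    return hits


if __name__ == "__main__":
    hotlist = {"ip": {"203.0.113.7"}, "account_number": {"999001"}}
    events = [
        Event("C1", "198.51.100.2", "111222", "login"),
        Event("C2", "203.0.113.7", "333444", "login"),
        Event("C3", "198.51.100.9", "999001", "claim"),
    ]
    for ev, fields in hotlist_hits(events, hotlist):
        print(ev.cif_customer_id, ev.action, fields)
```

Set membership keeps each lookup cheap. That matters when the check runs in line with real-time controls rather than as a batch search.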

  10. Maturity Levels
      • Level 1: Real-time control monitoring; call center OTP
      • Level 2: Alerts and hotlist tracking
      • Level 3: Investigation and link analysis
      • Level 4: Associate MFA; tune controls

  11. DEFCON 5

  12. Data Preparation: Risk Index
      (Diagram: client information updates, online account activity, IVR/contact center, and claims are summarized in the Risk Index.)

  13. Data Indexes
      • Source indexes: phs, client central, nocheck, idv, ivr, claim
      • Risk Index (use-case events): _time, risk_object_type, risk_object, risk_score, risk_source_rule, guid, cif_number, policy_number
      • Risk Rules -> Suspect Use Cases -> Risk Index (Suspect Policyholder events): _time; risk_object_type: alert; risk_object: cif_number; risk_score: 10*(number of use cases); risk_source_rule: Suspect_Policyholder; alert_type: use case list; guid; cif_number; policy_number
      • Alert Manager: CIF_Customer_ID, Customer_PH_GUID, CIF_Name, risk_source_rules, alert_type, IP, Application_ID, Customer_Email_Address, CIF_State, DeviceID_Print, Account_Number, Hashed_Password, Username, Fraudnet Rules Fired, CIF_Phone, Phone
      • Alert History (alert_history): alert title, date, incident_id, status, owner, comments
      • Assign Alert Results (auto-decision; recorded with risk_source_rule="Alert_Results")
      • First pass outcomes: Suspect ATO, Cleared, Referred ATO, Referred Suspect
      • Suspect ATO: CIF_Customer_ID, Customer_Email_Address, Customer_PH_GUID, IP, Policy_Number, Account_Number, Routing_Number; through the DB: remarks added, password reset
      • Assign Suspect Claim; generate FraudNet (FN) feedback file; FraudNet second pass; create manual suspect monitoring
      • Hotlist: CIF_Customer_ID, Policy_Number, Customer_PH_Guid, Incident_id, Comment, _time
      • Second pass outcome labels: Referred Suspect (SIU), Cleared Suspect, Cleared F/P, Privacy, Reviewed by Trust – ATO, Abuse, Shared Indicator
      • Outputs: Hotlist, Metric DBs/Reporting, Alert
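The slide gives one concrete rule: a Suspect_Policyholder entry in the Risk Index scores 10 × (number of use cases). Below is a minimal sketch of that roll-up. It assumes three things:

- The input is a list of (cif_number, use case) hits from the risk rules.
- A use case that fires repeatedly for the same policyholder counts once.
- At least two distinct use cases are needed to raise an alert.

The last two assumptions, the `min_use_cases` parameter, and the use-case names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

POINTS_PER_USE_CASE = 10  # slide: risk_score = 10 * (number of use cases)


def suspect_policyholders(
    hits: Iterable[Tuple[str, str]], min_use_cases: int = 2
) -> List[dict]:
    """Roll use-case hits up into Suspect_Policyholder risk events, one per cif_number."""
    cases: Dict[str, Set[str]] = defaultdict(set)
    for cif_number, use_case in hits:
        cases[cif_number].add(use_case)  # count each distinct use case once
    events = []
    for cif_number, names in sorted(cases.items()):
        if len(names) < min_use_cases:
            continue  # hypothetical threshold: one use case is not "stacked"
        events.append({
            "risk_object_type": "alert",
            "risk_object": cif_number,
            "risk_score": POINTS_PER_USE_CASE * len(names),
            "risk_source_rule": "Suspect_Policyholder",
            "alert_type": sorted(names),  # the use case list
        })
    return events


if __name__ == "__main__":
    hits = [
        ("C1", "new_device_login"), ("C1", "address_update"), ("C1", "address_update"),
        ("C2", "new_device_login"),
        ("C3", "bank_account_change"), ("C3", "new_device_login"), ("C3", "address_update"),
    ]
    for ev in suspect_policyholders(hits):
        print(ev["risk_object"], ev["risk_score"], ev["alert_type"])
```

Counting distinct use cases rather than raw hits keeps one noisy rule from inflating the score on its own.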

  14. Data Preparation: Feature Creation
      Features:
      • Developed over 30 key cross-channel activities to analyze
      • Leveraged the online risk scoring platform

      Example feature counts (4 features shown; 26 other features not shown):

      | Policyholder Identifier | Online Login | Claim Filed | Address Update | Calls |
      |---|---|---|---|---|
      | 1 | 4 | 8 | 0 | 0 |
      | 2 | 7 | 9 | 1 | 1 |
      | 3 | 5 | 5 | 0 | 0 |
      | 4 | 0 | 0 | 0 | 0 |
      | 5 | 2 | 1 | 0 | 1 |
      | 6 | 14 | 0 | 1 | 0 |

      Modeling:
      • Challenge: no labeled dataset was available, so supervised learning was not an option
      • First attempt: manually assign each feature a weight based on how "risky" the event is
      • Problem: high activity = high score, even if the activity is "normal"
      • Second attempt: use k-means clustering to find outliers based on the features
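The first modeling attempt, and its problem, can be reproduced in a few lines. This is a minimal sketch that uses the example counts shown on this slide with hypothetical weights, since the slide does not give the real ones. Because the score is a linear weighted sum, doubling a policyholder's activity doubles the score, whether or not that activity is normal.

```python
from typing import Dict

# Example counts from the slide (policyholder identifier -> feature counts).
ACTIVITY: Dict[int, Dict[str, int]] = {
    1: {"online_login": 4, "claim_filed": 8, "address_update": 0, "calls": 0},
    2: {"online_login": 7, "claim_filed": 9, "address_update": 1, "calls": 1},
    3: {"online_login": 5, "claim_filed": 5, "address_update": 0, "calls": 0},
    4: {"online_login": 0, "claim_filed": 0, "address_update": 0, "calls": 0},
    5: {"online_login": 2, "claim_filed": 1, "address_update": 0, "calls": 1},
    6: {"online_login": 14, "claim_filed": 0, "address_update": 1, "calls": 0},
}

# Hypothetical "riskiness" weights chosen by hand.
WEIGHTS: Dict[str, float] = {"online_login": 1, "claim_filed": 3, "address_update": 5, "calls": 2}


def weighted_score(counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Manual scoring: sum of weight * count over the weighted features."""
    return sum(w * counts.get(feature, 0) for feature, w in weights.items())


if __name__ == "__main__":
    ranked = sorted(ACTIVITY.items(), key=lambda kv: -weighted_score(kv[1], WEIGHTS))
    for ph, counts in ranked:
        print(ph, weighted_score(counts, WEIGHTS))
```

With these weights, policyholder 2 ranks highest because it is simply the busiest. Nothing in the score asks whether that activity is unusual, which is the gap the clustering attempt targets.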

  15. Homegrown Cross-Channel User Behavior Analytics

  16. Numeric Clustering with MLTK
      (Charts: Policyholder Count by Cluster; 3D Scatterplot of Fields vs. Cluster)
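MLTK provides k-means through its `fit KMeans` command. The sketch below re-implements the same idea in plain Python so it runs on its own. It makes two simplifications:

- It assumes numeric feature vectors per policyholder.
- It uses deterministic farthest-first seeding instead of random initialization, which is not necessarily what MLTK does.

Outliers are read off the policyholder count by cluster: members of very small clusters are flagged. The `max_size` cutoff is a hypothetical tuning parameter.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    centroids = _farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def small_cluster_members(labels: List[int], max_size: int) -> List[int]:
    """Indices of points whose cluster has at most max_size members."""
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] <= max_size]


if __name__ == "__main__":
    pts = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9), (40, 1)]
    labels, centroids = kmeans(pts, k=3)
    print(labels, small_cluster_members(labels, max_size=1))
```

A point alone in its own cluster sits exactly on its centroid, so distance to centroid would give it a score of zero. Looking at cluster sizes, as the count-by-cluster chart does, is what exposes it. In practice, features on very different scales, such as login counts versus address updates, are usually standardized before clustering.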

  17. Lessons Learned

  18. Thank You
