Data Visualizations of HYIP Dataset Quantifying the World April 23, 2012 Jie Han
Financial Cryptography 2012 This could be you!!! http://fc12.ifca.ai/pre-proceedings/paper_27.pdf
Overview 1. What's an HYIP? 2. Dataset 3. Processes 4. R graph examples 5. Google Chart examples 6. Some helpful hints
High Yield Investment Programs (HYIPs) ● Also known as a Ponzi or pyramid scheme ● Promise high returns on investment ● Pay existing investors with revenue from new investors ● Unsustainable in the long run
Why are HYIPs a problem? ● Advertised as legitimate investments ● Sophisticated online ecosystem in support of the schemes
HYIP Website
HYIP Aggregator Websites
HYIP Variables
HYIP Lifetime Typical life cycle of an HYIP:
About the Data ● Since 11/17/2010, still running ● Collected data from nine "aggregator" websites ● Total observations: 141k+ ● Total HYIPs observed: 1,576+
Process 1. Data collection (Python, crontab, mongoDB) 2. Preliminary analysis (Python, R) 3. Continue data collection, work on parsing all aggregators (Python) 4. Look at what we have, decide on what we want (R) 5. Difficulties in analyzing data -> create interactive data visualizations (Python, Google Charts, JS, HTML) 6. Use new tools to look for patterns (browser & eyes)
How an R Chart Gets Generated
Background scripts: ● Data collection (Python) ● Parse data & insert into db (Python, mongoDB)
Back End: ● Fetch & manipulate data (Python, mongoDB, R) ● Output a .pdf image to server
Front End: ● New user input (HTML forms) ● User interacts with data in browser
How Can We Trust Aggregator Data? CDF of Standard Deviations of HYIP Lifetimes ● Aggregators agree 80% of the time
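A minimal sketch of how the agreement check above could be computed, assuming each HYIP has one reported lifetime per aggregator that lists it. The names `lifetime_std_by_hyip` and `empirical_cdf` and all of the data are hypothetical, and the slides do not say whether a population or sample standard deviation was used; this sketch uses the population form.

```python
from statistics import pstdev

def lifetime_std_by_hyip(reports):
    """Map each HYIP to the std. dev. of its lifetimes across aggregators.

    `reports` maps an HYIP id to a list of lifetimes in days, one per
    aggregator. HYIPs seen on a single aggregator are skipped because
    there is nothing to compare.
    """
    return {h: pstdev(v) for h, v in reports.items() if len(v) >= 2}

def empirical_cdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the share of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        if points and points[-1][0] == x:
            points[-1] = (x, i / n)  # collapse ties into one step
        else:
            points.append((x, i / n))
    return points

# Made-up lifetimes, for illustration only.
reports = {
    "hyip_a": [30, 30, 30],
    "hyip_b": [12, 12],
    "hyip_c": [45, 50, 40],
    "hyip_d": [7],  # only one aggregator: skipped
}
stds = lifetime_std_by_hyip(reports)
cdf = empirical_cdf(stds.values())
```

The value of the CDF at a standard deviation of zero is the share of HYIPs on which all aggregators report the same lifetime.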
How Long Do HYIPs Last Before Collapsing? Survival function of HYIP Lifetimes ● Most HYIPs collapse within a few weeks
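A rough sketch of an empirical survival function, assuming every HYIP in the sample has already collapsed. Since data collection was still running, some HYIPs would still have been alive (censored); a Kaplan-Meier estimator is the usual tool for that case, and this sketch deliberately ignores it. The function name and the lifetimes are made up.

```python
def survival_function(lifetimes):
    """Return (t, S(t)) pairs, where S(t) is the share of lifetimes > t.

    Assumes every lifetime is fully observed (no censoring).
    """
    n = len(lifetimes)
    curve = []
    for t in sorted(set(lifetimes)):
        surviving = sum(1 for x in lifetimes if x > t)
        curve.append((t, surviving / n))
    return curve

# Made-up lifetimes in days, for illustration only.
lifetimes_days = [5, 12, 12, 20, 33, 61, 90, 240]
curve = survival_function(lifetimes_days)
```

Reading the curve at a given day gives the share of HYIPs that outlived it.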
What Factors Lead to Collapse? Factors that lead to shorter HYIP lifespans: ● Higher advertised rates of return ● Shorter mandatory investment terms
R vs. Google Charts
R: ● Useful if familiar with the dataset ● Good at presenting aggregate summaries ● Large learning curve, especially when you want to do something specific ● More customizable ● Most analysis techniques are available
Google Charts: ● Anyone can view & interact with the data ● See a complete data distribution ● Learning curve isn't bad ● Not as customizable ● Have to wait for updates for more functionality, or write your own
How a Google Chart Gets Generated
Background scripts: ● Data collection (Python) ● Parse data & insert into db (Python, mongoDB)
Back End: ● Fetch & manipulate data (Python, mongoDB, R) ● Write JS & HTML page (Python, JS, HTML, CSS)
Front End: ● New user input (HTML forms) ● User interacts with data in browser
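The "write JS & HTML page" step might look roughly like the sketch below, in which Python embeds rows as JSON inside a page that draws a Google Charts scatter plot. This is an illustration, not the code behind the slides: `build_chart_page`, the column names, and the data are hypothetical, and it uses the current `loader.js` entry point, which may differ from what was available in 2012.

```python
import json

# Doubled braces are literal braces in the output; {rows} and {title}
# are filled in by str.format.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
  google.charts.load('current', {{packages: ['corechart']}});
  google.charts.setOnLoadCallback(drawChart);
  function drawChart() {{
    var data = google.visualization.arrayToDataTable({rows});
    var options = {{title: {title}}};
    var chart = new google.visualization.ScatterChart(
        document.getElementById('chart'));
    chart.draw(data, options);
  }}
</script>
</head>
<body><div id="chart" style="width: 800px; height: 500px;"></div></body>
</html>
"""

def build_chart_page(header, rows, title):
    """Return an HTML page that plots `rows` with Google Charts."""
    table = [list(header)] + [list(r) for r in rows]
    return PAGE_TEMPLATE.format(rows=json.dumps(table), title=json.dumps(title))

# Made-up data, for illustration only.
page = build_chart_page(
    ["Daily rate (%)", "Lifetime (days)"],
    [(1.5, 120), (3.0, 45), (10.0, 9)],
    "Rate vs. lifetime (made-up data)",
)
```

Serializing with `json.dumps` keeps the Python and JS sides in sync without hand-written string concatenation; in the pipeline above, the rows would come from the database query rather than a literal list.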
Distribution of HYIPs Around the World Link
Motion Charts Link
Variable Changes Over Time cherryshares.com, aggregator rating Link
Relationships Between Two Variables Link
Multi-Dimensional Scatterplot Link
Multi-Dimensional Scatterplot Link
General Programming Tips ● Spend time on data quality ● Organize your code, variable names, and files ● Keep records of working examples ● Plan out your code to maximize pattern capture ● Error-catching, browser consoles, and regexes are friends ● Test out chunks of code before putting them together ● Google Tables take a while to load for large datasets ● Google Charts Playground allows you to test code in their environment
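As an illustration of the error-catching and regex tips, here is a small sketch that parses plan descriptions and sets aside the ones it cannot handle instead of crashing. The text format ("2.5% daily for 30 days"), the pattern, and the function names are assumptions made for this example, not the formats actually found on the aggregator sites.

```python
import re

# Hypothetical plan format: "<rate>% <period> for <term> <unit>".
PLAN_RE = re.compile(
    r"(?P<rate>\d+(?:\.\d+)?)\s*%\s*(?P<period>hourly|daily|weekly)"
    r"\s+for\s+(?P<term>\d+)\s+(?P<unit>hours?|days?|weeks?)",
    re.IGNORECASE,
)

def parse_plan(text):
    """Parse one plan description, raising ValueError if it does not match."""
    m = PLAN_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognized plan description: {text!r}")
    return {
        "rate_pct": float(m.group("rate")),
        "period": m.group("period").lower(),
        "term": int(m.group("term")),
        "unit": m.group("unit").lower().rstrip("s"),  # "days" -> "day"
    }

def parse_all(lines):
    """Parse every line, collecting failures for later inspection."""
    parsed, failures = [], []
    for line in lines:
        try:
            parsed.append(parse_plan(line))
        except ValueError:
            failures.append(line)
    return parsed, failures

parsed, failures = parse_all(
    ["2.5% daily for 30 days", "110% after 1 day", "12 % Weekly for 4 weeks"]
)
```

Keeping the failures list makes it easy to see which patterns the regex still misses and to extend it one case at a time.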
Future Work ● Create an interactive web-based visualization for our dataset (some examples I made) ● Link scams together ● Explore larger dataset
Thanks!