Breaking your proprietary software habit
Best practices for data import into CiviCRM
Young-Jin Kim, Eileen McNaughton, Micah Lee
7 deadly sins of data migration
1. Wrath - Feeling you'll get if you don't plan!
2. Gluttony - Failure to restrict import scope
3. Greed - Failure to get rid of data
4. Sloth - Failure to iterate quickly, work cleanly
5. Pride - Failure to validate the import
6. Lust - Failure to dedupe
7. Envy - Failure to leave behind old ways
liberate your data, set it free
Best practices for data migrations
1. Use a dedicated environment for data imports
2. Automate scripts for the full import early on! Use APIs!
3. Judiciously, with client input, limit data import scope
4. Data import is an iterative process: iterate, iterate!
5. Think about how the current and future workflows affect data mapping into CiviCRM
6. If you can, draw up a time horizon that separates stale data from current data, e.g. 3 years in the past
7. Don't reinvent the wheel: make use of free tools, e.g. migrate, civimigrate, ETL tools, Google Refine, APIs
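The time horizon in point 6 can be scripted so that every trial import applies the same cutoff. This is a minimal sketch, assuming each legacy record carries a last-activity date. The `last_activity` field, the `split_by_horizon` helper and the sample records are hypothetical, not part of CiviCRM or any tool named above.

```python
from datetime import date


def split_by_horizon(records, today, years=3):
    """Partition records into (current, stale) around a cutoff date."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - years, day=28)
    current = [r for r in records if r["last_activity"] >= cutoff]
    stale = [r for r in records if r["last_activity"] < cutoff]
    return current, stale


# Hypothetical legacy export; real field names will differ.
legacy_contacts = [
    {"id": 1, "name": "Ada", "last_activity": date(2012, 5, 1)},
    {"id": 2, "name": "Bob", "last_activity": date(2006, 1, 15)},
]

current, stale = split_by_horizon(legacy_contacts, today=date(2013, 1, 1))
```

Keeping the stale set as a separate list, rather than discarding it, makes it easy to show the client exactly what would be left behind before the scope is agreed.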
Two possible migration workflows
Workflow 1 (diagram): Legacy DB → Export → Google Refine (Cleanse) → Pentaho Kettle (Transform) → Import → CiviCRM DB
Workflow 2 (diagram): Legacy DB, Export DB, Civimigrate Module and CiviCRM DB, connected by Export, Transform and Import steps
Google Refine
● Free Open Source data cleaning tool written in Java, running on a local Jetty web server
● Uber-spreadsheet on "steroids" with GUI
● Reads in many file types and data formats, and also Google Docs spreadsheets
● Many built-in data transformations for merging, clustering, matching, faceting
● Ability to extend capabilities by writing custom transforms in GREL, Python or Clojure
● Cleaning procedure can be saved as JSON and replayed easily
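To give a feel for the clustering mentioned above, here is a rough Python approximation of the "fingerprint" key-collision idea: values that normalize to the same key are proposed as duplicates. It is a sketch of the technique, not Refine's own code, and the function names and sample values are made up for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: trim, lowercase, fold to ASCII,
    drop punctuation, then sort and de-duplicate the tokens."""
    text = value.strip().lower()
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; return groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample values.
names = ["Smith, Jane", "jane smith", "Acme Corp.", "ACME corp", "Jane Smyth"]
candidates = cluster(names)
```

Here `candidates` groups the two spellings of Jane Smith and the two spellings of Acme Corp, while "Jane Smyth" stays on its own. Key collision only catches differences in case, punctuation and word order, so near-misses like that one still need a fuzzier method or a human eye.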
Pentaho Data Integration
● Free Open Source Extract-Transform-Load (ETL) tool written in Java on the Eclipse framework
● Visual programming interface (GUI) for pipelining data and inspecting data streams
● Comes with connectors to many existing data(base) formats for input and output
● Write custom JavaScript and Java steps
● Data streams are routed through the steps of a transformation; transformations can be chained in a job
● Transformations and jobs are stored as XML
● Replay the XML files from the command line
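The step-and-stream model can be illustrated outside Pentaho with plain Python generators: each step consumes a stream of rows and yields a new one, and a "transformation" is just the steps wired together. This is only an analogy under assumed field names (`fname`, `lname`, `email`). It does not use or imitate Pentaho's actual API.

```python
def read_rows(rows):
    """Input step: emit a copy of each source row."""
    for row in rows:
        yield dict(row)


def trim_fields(rows):
    """Strip surrounding whitespace from every string field."""
    for row in rows:
        yield {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def drop_without_email(rows):
    """Filter step: pass on only rows with a non-empty email."""
    for row in rows:
        if row.get("email"):
            yield row


def rename_fields(rows, mapping):
    """Rename legacy field names to their destination names."""
    for row in rows:
        yield {mapping.get(k, k): v for k, v in row.items()}


def run_transformation(source):
    """Chain the steps; rows flow through one at a time."""
    stream = read_rows(source)
    stream = trim_fields(stream)
    stream = drop_without_email(stream)
    stream = rename_fields(stream, {"fname": "first_name", "lname": "last_name"})
    return list(stream)


# Hypothetical legacy rows.
legacy_rows = [
    {"fname": " Ada ", "lname": "Lovelace", "email": "ada@example.org"},
    {"fname": "Bob", "lname": "Nomail", "email": "  "},
]
result = run_transformation(legacy_rows)
```

Because each step only sees the stream, steps can be reordered or swapped between iterations of the import without touching the others, which is the same property that makes saved transformations replayable.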
What is Civimigrate?
It's a bandaid between the Migrate module and the CiviCRM API.
More technically, it exposes the API as a Migrate destination.
What does migrate do?
● Maps source data (CSV, Oracle, XML, MySQL, JSON, ...) to migrate destinations
● Supplies a framework for trial imports, rollbacks and updates, run from Drush or the GUI
● Map tables maintain relationships between source data and the resulting CiviCRM entities
● Allows you to use hooks to manipulate data during the migration (prepareRow + callbacks, e.g. to sanitize data)
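The role of the map table is easiest to see in a toy model. The sketch below is not the Migrate module (which is PHP) and not the CiviCRM API. `FakeDestination`, `Migration` and `sanitize` are hypothetical stand-ins showing how a source-id to destination-id map makes updates and rollbacks possible, and how a prepareRow-style hook can clean or skip rows.

```python
class FakeDestination:
    """Stand-in for a destination API; stores contacts in a dict."""

    def __init__(self):
        self.contacts = {}
        self._next_id = 1

    def create(self, params):
        """Create a contact, or overwrite one when 'id' is supplied."""
        contact_id = params.get("id")
        if contact_id is None:
            contact_id = self._next_id
            self._next_id += 1
        self.contacts[contact_id] = {k: v for k, v in params.items() if k != "id"}
        return contact_id

    def delete(self, contact_id):
        self.contacts.pop(contact_id, None)


class Migration:
    def __init__(self, destination, prepare_row=None):
        self.destination = destination
        self.prepare_row = prepare_row
        self.map_table = {}  # source id -> destination id

    def import_rows(self, rows):
        for row in rows:
            row = dict(row)
            # A hook returning False means "skip this row".
            if self.prepare_row and self.prepare_row(row) is False:
                continue
            source_id = row.pop("source_id")
            existing = self.map_table.get(source_id)
            if existing is not None:
                row["id"] = existing  # already imported: update in place
            self.map_table[source_id] = self.destination.create(row)

    def rollback(self):
        """Delete everything this migration created, using the map table."""
        for dest_id in self.map_table.values():
            self.destination.delete(dest_id)
        self.map_table.clear()


def sanitize(row):
    """Hypothetical hook: normalize the email, skip rows without one."""
    row["email"] = row.get("email", "").strip().lower()
    return bool(row["email"])


destination = FakeDestination()
migration = Migration(destination, prepare_row=sanitize)
migration.import_rows([
    {"source_id": "A1", "email": " Ada@Example.org "},
    {"source_id": "B2", "email": ""},
])
```

Running `import_rows` again with a changed row for `A1` updates the same destination record instead of creating a duplicate, and `rollback()` removes only what this migration created. Those two properties are what make trial imports cheap to repeat.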
You've migrated your data, but what about your donors?
EFF had ~1,000 recurring donors in Convio, bringing in ~$20,000 per month. We spent a long, long time saving them, but in the end succeeded. Probably worth it.
Ways to save your recurring donors:
● Call them on the phone, ask them to re-donate (recommended)
● Get credit card numbers, carefully baby-sit Selenium script
● Keep old payment processor around until all cards expire, write CiviCRM integration code