Laying the Groundwork for Success in the Information Age
Dr. Fran Berman
Vice President for Research, Professor of Computer Science
Rensselaer Polytechnic Institute

Ken Kennedy – Pioneer, Colleague, Inspiration, Friend
• Ken was a stellar example of leadership
  – Clear focus on, and prioritization of, what's important
  – Effective, strategic, pragmatic, high-integrity, respectful of colleagues and collaborators at all levels
• Ken was focused on moving the community forward
  – Through contributions to computer science
  – Through the use of cyberinfrastructure to address major challenges in science and engineering
  – Through the next generation of scholars and leaders
  – Through service at the whole-discipline level

Creating a Successful Future: Science and Engineering Drive Solutions to 21st Century Challenges
"Science is more essential for our prosperity, our security, our health, our environment, and our quality of life than it has ever been before." President Barack Obama
• What is the potential impact of Global Warming?
• How will natural disasters affect urban centers?
• Can we accurately predict market outcomes?
• What plants work best for biofuels?
• What therapies can be used to cure or control cancer?

21st Century Challenges Require 21st Century Tools
Cyberinfrastructure: Sensors, Visualization, Data, Models, Computation
"If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy." The "Atkins Report": Revolutionizing Science and Engineering Through Cyberinfrastructure, 2003
"Science is more essential for our prosperity, our security, our health, our environment, and our quality of life than it has ever been before." President Barack Obama
• What is the potential impact of Global Warming?
• How will natural disasters affect urban centers?
• Can we accurately predict market outcomes?
• What plants work best for biofuels?
• What therapies can be used to cure or control cancer?
Images and movies courtesy of Al Wallace/RPI, Amit Chourasia/SDSC, and JCSG

Data Cyberinfrastructure-Enabled Research
• Which has the greatest impact – nature or nurture?
  – Panel Study of Income Dynamics: longitudinal data on 8000 families over 40 years
• How does disease spread?
  – PDB: worldwide reference collection of protein structure information
• How does the political and cultural life of a society evolve?
  – The U.S. "cyber-election" of 2008
  – Life at the time of the Russian Revolution
Images and movies courtesy of Library of Congress, PDB, ICPSR

How Much Digital Information is There?
• By 2023, the amount of digital data will exceed Avogadro's number (6.02 x 10^23 = number of atoms in 12 grams of carbon).
• U.S. Library of Congress manages 295+ terabytes of digital data, 230+ of which are "born digital"
• SDSC Tape Archives = 36+ petabytes capacity
• Google Earth = 71+ terabytes
• 1 novel = 1 megabyte
• 50,000 Protein Data Bank Structures = 35 terabytes
• Stored data from ENZO cosmological simulations = 500 terabytes
Prefixes: Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15, Exa = 10^18, Zetta = 10^21
Graph Source: "The Diverse and Exploding Digital Universe" IDC Whitepaper, March 2008

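To make the scale of these figures easier to compare, the following is a minimal Python sketch that converts the slide's quantities into a common unit. It assumes decimal prefixes exactly as listed on the slide and assumes the Avogadro comparison counts bytes, which the slide does not state. The names `PREFIX`, `to_bytes`, and `novels_equivalent` are illustrative only.

```python
# Decimal prefixes as listed on the slide (bytes per unit).
PREFIX = {
    "kilo": 10**3, "mega": 10**6, "giga": 10**9, "tera": 10**12,
    "peta": 10**15, "exa": 10**18, "zetta": 10**21,
}

# Avogadro's number as quoted on the slide (6.02 x 10^23), kept as an
# integer so the arithmetic below stays exact.
AVOGADRO = 602 * 10**21


def to_bytes(amount, prefix):
    """Convert an amount with a named prefix into bytes."""
    return amount * PREFIX[prefix]


def novels_equivalent(amount, prefix):
    """Number of 1-megabyte novels that fit in the given amount."""
    return to_bytes(amount, prefix) // PREFIX["mega"]


# Library of Congress holdings (295 terabytes) expressed in novels.
loc_novels = novels_equivalent(295, "tera")

# SDSC tape archive capacity (36 petabytes) expressed in novels.
sdsc_novels = novels_equivalent(36, "peta")

# Assumption: if the data is counted in bytes, Avogadro's number of bytes
# corresponds to this many zettabytes.
avogadro_zettabytes = AVOGADRO // PREFIX["zetta"]

print(loc_novels, sdsc_novels, avogadro_zettabytes)
```

Under these assumptions, the Library of Congress figure corresponds to 295 million one-megabyte novels, and Avogadro's number of bytes corresponds to 602 zettabytes.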
Information from birth to death/immortality: The Digital Data Life Cycle
• Create: data creation / capture / gathering from
  – laboratory experiments
  – fieldwork
  – surveys
  – devices
  – media
  – simulation output ...
• Edit
  – Organize
  – Annotate
  – Clean
  – Filter ....
• Use / Reuse
  – Analyze
  – Mine
  – Model
  – Derive additional data
  – Visualize
  – Input to instruments / computers / devices ....
• Publish
  – Disseminate
  – Create portals / data collections / databases
  – Associate with literature ....
• Preserve / Destroy
  – Store / preserve
  – Store / replicate / preserve
  – Store / ignore
  – Destroy ....
Information adapted from Chris Rusbridge and Liz Lyon

Out of Room
• We may be generating unimaginable amounts of data, but we can't save it all.
• 2007 was the "crossover year" where the amount of digital information became greater than the amount of available storage
• Importance of digital data and the need to make choices mandates a proactive approach to information stewardship
Source: "The Diverse and Exploding Digital Universe" IDC Whitepaper, March 2008

Laying the groundwork for information stewardship: value (to whom and how), regulation, economics
• Key Questions:
  1) What should we save?
  2) How should we save it?
  3) Who should pay for it?
Diagram labels: Value, Cost, Time
Access to information tomorrow requires preservation of information today

What Should We Save?
Digital information we* want to keep over the long term:
– We = "Society" (Societal Value)
  • Official and historically valuable data (Census information, presidential emails, Shoah Collection, etc.)
– We = Research Community (Community Value)
  • Protein Data Bank, National Virtual Observatory, etc.
– We = Me (Personal Value)
  • My medical record, my Quicken data, digital photos of my kids' graduations, etc.
The Data Pyramid (axis label: increasing reliability required, increasing infrastructure expense)

Increasing Policy and Regulation Affecting Digital Information
Crime and Punishment
Regulations | Retention Requirement | Penalty
Sarbanes-Oxley | Auditors must retain relevant data for at least 7 years | Fines to $5M and 20 years in prison
HIPAA | Retain patient data for 6 years | $250K fine and up to 10 years in prison
Gramm-Leach-Bliley | Ensure confidentiality of customer financial information | Up to $500K and 10 years in prison
SEC 17a | Broker data retention for 3-6 years. Some require longer retention | Variable based on violation
OMB Circular A-110 / CFR Part 215 (applies to federally funded research data) | "a three year period is the minimum amount of time that research data should be kept by the grantee" | Penalty structure unclear, likely fines?
Table information partly based on "Data Retention – More Value, Less Filling", John Murphy, http://www.tdan.com/view-articles/5222
Sarbanes-Oxley (Public Company Accounting Reform and Investor Protection Act of 2002)
• Applies to all U.S. public company boards, management, and public accounting firms
• Includes electronic records (correspondence, work papers, memoranda, etc.) that are created, sent, or received in connection with an audit or a review
Kevin Beaver, "Thirteen Data Retention Mistakes to Avoid", http://searchdatamanagement.techtarget.com/news/article/0,289142,sid91_gci1186910,00.html
– 1. "Don't forget that email and instant messaging are business records …"
– 4. "Don't assume that the retention requirement … is … 7 years. … most lawyers that understand information retention agree that business records need to be kept indefinitely."

Increasing Policy and Regulation Affecting Digital Information
HIPAA (Health Insurance Portability and Accountability Act)
• Applies to health information created or maintained by health care providers "who engage in certain electronic transactions, health plans, and health care clearinghouses" [www.hipaa.org]
• Title II: Requires HHS to create rules and standards for the use and dissemination of health care information
• Healthcare providers must retain healthcare records for a period of not less than 6 years.

Increasing Policy and Regulation Affecting Digital Information
• The U.S. Office of Management and Budget requires that federally funded research data, supporting documentation, scientific notebooks, financial records, etc. be maintained by the grantee for 3+ years
• University libraries, federal agencies, and institutional repositories are not currently prepared to address the economic, technological, legal, and social issues associated with widespread compliance with data retention policies

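The retention periods in the "Crime and Punishment" table can be treated as a simple lookup. The sketch below is a simplification for illustration, not compliance guidance. It transcribes only the rows that give a number of years, uses the lower bound of the 3-6 year range for SEC 17a, and omits Gramm-Leach-Bliley because the table lists no retention period for it. The names `MIN_RETENTION_YEARS` and `earliest_disposal_date` are hypothetical.

```python
from datetime import date

# Minimum retention in years, transcribed from the slide's table.
# Assumption: SEC 17a uses the lower bound of its 3-6 year range.
MIN_RETENTION_YEARS = {
    "Sarbanes-Oxley": 7,
    "HIPAA": 6,
    "SEC 17a": 3,
    "OMB Circular A-110": 3,
}


def earliest_disposal_date(created, regulations):
    """Earliest date a record could be discarded under all listed regulations.

    The longest applicable minimum retention period wins.
    """
    years = max(MIN_RETENTION_YEARS[name] for name in regulations)
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, month=3, day=1)


# A record covered by both HIPAA and OMB Circular A-110 follows the
# longer (6-year) minimum.
print(earliest_disposal_date(date(2009, 1, 15), ["HIPAA", "OMB Circular A-110"]))
```

Taking the maximum reflects that a record subject to several rules must satisfy the strictest one. These are minimums only, and the Beaver quotation on the Sarbanes-Oxley slide argues that many business records should be kept indefinitely.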