UW eScience Institute Initiatives. Cecilia Aragon, University of Washington, Seattle, WA, USA, aragon@uw.edu (slides courtesy Bill Howe, Anissa Tanweer, Carole Goble). Dagstuhl EAS plenary talk, Jun 23, 2016.


  1. UW eScience Institute Initiatives Cecilia Aragon University of Washington Seattle, WA, USA aragon@uw.edu (slides courtesy Bill Howe, Anissa Tanweer, Carole Goble) Dagstuhl EAS plenary talk, Jun 23, 2016

  2. 2005-2008
     “All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data… In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields.”
     In other words:
     • Data-intensive science will be ubiquitous
     • It’s about people and software, not only hardware

  3. Long Tail of Research Data [src: Carole Goble]
     Figure labels, head of the distribution: high-throughput experimental methods; industrial scale; commons-based production; public data sets; cherry-picked results; preserved (GenBank, PDB, ChemSpider, UniProt, Pfam, CATH, SCOP (Protein Structure Classification))
     Figure labels, tail of the distribution: spreadsheets, notebooks; local, lost

  4. Wright 2013 How much data do you work with?

  5. Data Science Kickoff Session: 137 posters from 30+ departments and units (slide: Bill Howe, UW)

  6. Broad collaborations: PIs on Moore/Sloan effort + eScience Institute Steering Committee + UW participants in February 7 Data Science poster session

  7. Moore/Sloan Data Science Environment: Impact. $37.8M initiative at UCB, NYU, UW (graphic by Ray Hong and eScience Institute, UW)

  8. Career paths and alternative metrics
     UW flagship activity: Establish two new roles on campus: “Data Science Fellows” and “Data Scientists”
     Recruited / recruiting data scientists
     • Typically Ph.D.-educated; fully supported by DSE; research position with emphasis on taking responsibility for core activities (e.g., incubator projects)
     Recruited / recruiting research scientists
     • Typically Ph.D.-educated; partially supported by DSE; research position with emphasis on specific science goals
     Designated 33 faculty and staff as Data Science Fellows
     • We cribbed Berkeley’s excellent idea
     Recruited 6 “Provost’s Initiative” faculty members
     • Provost provided 6 faculty “half-positions”
     • Individuals with strength and commitment both to advancing data science methodology and to applying it at the forefront of a specific field
     • Astronomy, Biology, Mechanical Engineering, Sociology, Applied Mathematics, Statistics + Computer Science & Engineering
     Recruited 2 cohorts of 6 Data Science Postdoctoral Fellows
     • Each is co-mentored by “methodology” and “applications” faculty

  9. Education and training
     UW flagship activity: Establish new graduate program tracks in data science
     • IGERT Ph.D. program in Big Data / Data Science
     • Seven departments have put in place Big Data Tracks
     • Data science classes count toward Ph.D. (no extra work)
     • Departments: Astronomy, Biology, Chemical Engineering, Computer Science & Engineering, Genome Sciences, Oceanography, and Statistics
     • Undergraduate “transcriptable option”
     • Workshops and Bootcamps
       – Multiple Software Carpentry Bootcamps (Python, R, etc.)
       – AstroData Hack Week
       – Many others
     • Seminar series

  10. New MS in data science at UW
      Interdisciplinary
      • Participation from six departments (HCDE, CSE, Stats, Biostats, iSchool, Applied Math)
      Innovative
      • Rigorous technical program in statistics and computer science
      • Human-centered data science curriculum: ethics, data science and society, ‘big data’ user experience, visualization
      Designed for working professionals
      • Evening courses, full-time or part-time attendance

  11. Software tools, environments, and support
      UW flagship activity: Establish an “incubator” seed grant program
      “Incubator” program
      • Our experiment at achieving scalability
      • A lightweight 2-page proposal process several times each year
      • I have an interesting science problem
      • I’m stumped by the data science aspects
      • If you cracked it, others would benefit
      • I’m going to send you the following person half-time for 3 months to provide the labor; you provide the guidance
      • Preceded by an information session to clarify expectations and commitments
      • Activities take place in the Data Science Studio, staffed by our Data Scientists
      • We coach software hygiene as well as methodology
      • Running two cohorts annually

  12. Drop-in “Office Hours”
      • eScience Institute Data Scientists
      • UW-IT Academic & Collaborative Applications Team, Research Computing Team, Network Design & Architecture Team
      • AWS Scientific Computing Team
      • Center for Statistics and the Social Sciences Statistical Consulting Service
      • UW Libraries Research Data Management Team
      • Google Cloud Platform Team

  13. Reproducibility and open science
      UW flagship activity: Establish a campus-wide community around reproducible research
      UW campus-wide monthly meetings
      May 2014 national workshop
      • More than 80 participants
      • Attendees from NYU, Berkeley, Fred Hutchinson Cancer Research Center, Allen Institute for Brain Science, Sage Bionetworks, Google, …
      Draft guidelines for reproducible research
      Weekly tutorials on “research hygiene” topics
      • E.g., GitHub, knitr, IPython Notebook

  14. Working spaces and culture UW flagship activity: Establish a “Data Science Studio” Washington Research Foundation Data Science Studio

  15. Ethnography and evaluation: data science studies
      UW flagship activity: Establish a research program in “the data science of data science”
      Ethnography and evaluation integrated into a wide range of Data Science Environment activities
      • Project overall (beginning with in-depth baseline interviews with participants from grad students through faculty)
      • IGERT
      • AstroData Hack Week
      • Incubator projects
      Developed ethnography research questions
      • E.g., who does data science, how are they networked, forms of social interaction and organization, intellectual groupings, career reward structures, collaborative tool use in scientific workflows, data science values and ethics, etc.
      Established baseline for evaluation, and determined evaluation questions

  16. Data Science Ethnography
      Qualitative field-based technique originally from anthropology
      • Enables the study of underlying patterns and themes in a sociotechnical system
      • Trends can be analyzed in context without compromising ecological validity
      Ethnographers immerse themselves in a community to discern
      • subtle patterns that may not be self-evident to members of the community
      • how members make sense of the world
      • what motivates them
      • how they work together

  17. Data Science Ethnography
      Ethnography involves
      • Hundreds of pages of field notes, interview transcripts, and artifacts from the field
      • Collected and recorded over a long period of time
      Ethnographic insights emerge as patterns and themes are detected
      Ethnographers work with members of the community to interpret observations
      Analysis
      • Co-occurs with data collection
      • Interactively shapes research strategy
      “Applied ethnography”
      • Goal is to provide ongoing feedback on what works and what doesn’t

  18. Figure: engagement models, with axes labeled “go to them” / “come to us”, “Duration of Engagement”, and “# of engagements”. Engagements per FTE:
      • Joint Research (2010-present): 0-2 annually
      • Embedded (2010-present): 0-2 annually
      • Incubator; DSSG (2014-present): 1-2 annually
      • Door-to-Door; Lab Visits (2011-present): 25-30 annually
      • Office Hours (2015-present): 50+ annually

  19. Data Science for Social Good @ UW

  20. Precursor: Data Science Incubator (http://data.uw.edu/incubator/)
      Spring 2014, Fall 2014, Winter 2016
      Goal: Identify high-impact data-intensive science projects that will benefit from quarter-long sprints of expertise
      Protocol: ~1-2-page proposals, in-studio collaboration two days per week
      Best projects: “I have the questions, I have the data, I need help getting the answers”
      • 4-6 concurrent teams: Network effects among cohort beyond 1:1 interactions
      • Each team is ~50% project lead + ~50% eScience FTE
      • Structured, time-bounded engagement ensures progress (and an exit strategy)
      • Feels like a course: “I have incubator today, so I can’t go do XXXX”

  21. The project outlives the incubator…
      “I talked with Alicia a bit yesterday, and she showed me that her earthquake-repeater-searching implementation is more general, and more powerful than I had thought, and closer to trial by others (and I have a particular use in mind in the ongoing iMUSH experiment on Mount St Helens) <snip>
      “So I'm encouraging her to continue to work on it a day per week or so for the foreseeable future, assuming you have the facilities to continue the incubation.”
      Publications in the works on both the software and the science – from three months of half-time work

  22. Inaugural 2015 program: 16 spots, 140 applicants …from 20+ departments
      • Assessing Community Well-Being (Third-Place Technologies)
      • Optimization of King County Metro Paratransit (Computer Science & Engineering)
      • Predictors of Permanent Housing for Homeless Families (Bill and Melinda Gates Foundation)
      • Open Sidewalk Graph for Accessible Trip Planning (Computer Science & Engineering)
