eScience in the Netherlands Rob van Nieuwpoort R.vanNieuwpoort@esciencecenter.nl
We work demand-driven
Career paths
Scale   Managerial                  Technical                   Research
…
13      eScience manager            eScience top specialist     eScience top researcher
12      eScience coordinator        eScience specialist         eScience researcher
11      eScience research engineer
10      eScience research engineer
Lessons learnt • Demand-driven: start from the science • Collaboration, not competition (connected projects, calls) • Good is good enough • Generalization (10% ring-fenced) • Communication, communication, communication – Hiring people – Internal communication – Project kickoffs • IP, workplace, generalization, co-authorships – coordinators – Websites, demos • Keep challenging the RSEs – Courses, hackathons, sprints, … – Switch disciplines
eStep The eScience technology platform A coherent set of technologies to tackle the grand challenges in eScience
Cross-cutting basic skills • Code quality and best practices • Integration of software • Scaling of software • Analytics and statistics • Visualization
NLeSC eScience competences applied in research (eStep)
1. Optimized data handling: data integration, database optimization, structured & unstructured data, real-time data
– Handling sensor data – Linked data – Information integration – Databases – Data assimilation
2. Big data analytics: statistics, machine learning, visualization, text mining
– Natural language processing – Machine learning – Information visualization – Scientific visualization – Information retrieval – Computer vision
3. Efficient computing: distributed & accelerated computing, efficient algorithms
– Distributed computing – Accelerated computing – Low power computing – Orchestrated computing – High-performance computing
• Key areas of expertise are used in many projects • Projects often draw on several different competences and technologies
eStep Goals • Prevent fragmentation and duplication • Promote the exchange and re-use of best practices • Represent NLeSC’s expertise and knowledge base • Improve the eScience state of the art with a fundamental eScience research line
[Diagram: cycle between eStep and NLeSC projects, with the steps Adopt, Tailor, Develop, Generalize]
• Main criteria for integrating technology in eStep: – State-of-the-art / best-of-breed? – Generic and overarching? – Match with our expertise areas? – Includes externally developed software • Open platform!
[Diagram: software stack, top to bottom: enhanced science; project-specific software; discipline-specific software; overarching software; generic libraries, tools, and algorithms; e-infrastructure. Side labels: NLeSC projects, eStep]
Our sustainability approach • Prevent duplication, fragmentation • Build something that is worth sustaining! – Sufficiently generic – Modular – High quality – Must be taken into account from the start • Enforce software engineering guidelines and best practices • Educate partners with software carpentry and data carpentry • Open source / open access, open standards, unless… • Community coding • Standardization for software and data formats • eStep is an open platform
A Common Workflow @ NLeSC
• GitHub
• Travis CI: "Test and Deploy with Confidence. Easily sync your GitHub projects with Travis CI and you’ll be testing your code in minutes!"
• Jenkins: We run a Jenkins CI instance locally. Used for private repositories and repositories requiring HPC middleware.
• Deploy: open platform for building, shipping and running distributed applications.
• technology.esciencecenter.nl • Non-technical, targets a general audience
• estep.esciencecenter.nl • All eScience software and knowledge you need, in one place • Technical, targets developers, PIs
Knowledge base • knowledge.esciencecenter.nl • training and education • best practices • tutorials • white papers • training resources • Software development Checklist available
More info on eStep technology.esciencecenter.nl estep.esciencecenter.nl R.vanNieuwpoort@esciencecenter.nl
Logo Bingo: CommonSense, xtas, Osmium, NLTK, EDAL, Semanticizer, AHN2 viewer
Optimized data handling
Summer in the city example: human thermal comfort • Three residents of a home for the elderly died due to heat • Elderly make massive use of the heat information call desk • Heat protection plan abandoned • Cities poorly protected against heat • End of 16-day heat wave Courtesy Bert Holtslag
Summer in the city example Novel hourly forecasting system for human thermal comfort in urban areas on street level • Kilometer scale: elevation (AHN2) and land-use data (Kadaster), imagery for assessing the green vegetation coverage and the soil moisture content • Street scale: sky view factor, the building height to street width ratio, the reflectivity and thermal characteristics of buildings and streets, the abundance of vegetation • Network of weather stations and crowd sourcing: wunderground.com • Special measuring campaigns • Social media? • Combine with fine-grained models Courtesy Bert Holtslag
Via Appia
Optimized Data Handling Technology • Distributed sensor networks, multi-model and multi-scale simulations, data assimilation, data integration, multi-scale pattern recognition, geographic information systems, databases, … • Xenon, NetCDF, HDF5, ROOT, XNAT, OpenDA, Hadoop, MapReduce, Oracle, MySQL, Postgres, MonetDB, ElasticSearch, DataVault, JSON, Spark, …
Big Data Analytics
eEcology example Courtesy Willem Bouten
Accelerometer and Behaviour • Axes: heave (vertical), surge (forward), sway (sideward) • Static acceleration: sitting, floating, standing • Dynamic acceleration: flapping flight, gliding, X-flapping Courtesy Willem Bouten
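The split into static and dynamic acceleration can be illustrated with a minimal sketch. It assumes a common convention that is not stated on the slide: the static component is approximated by a centered moving average, and the dynamic component is what remains. The function name `split_acceleration` and the window length are hypothetical, chosen only for illustration.

```python
def split_acceleration(samples, window=5):
    """Split one accelerometer axis into static and dynamic components.

    Assumption (not from the slides): static = centered moving average,
    dynamic = sample minus that average.
    """
    half = window // 2
    static, dynamic = [], []
    for i, value in enumerate(samples):
        # Clip the window at the edges of the recording.
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        baseline = sum(samples[lo:hi]) / (hi - lo)
        static.append(baseline)
        dynamic.append(value - baseline)
    return static, dynamic


# Hypothetical heave samples (in g) for a bird that is mostly at rest.
heave = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
static_part, dynamic_part = split_acceleration(heave, window=3)
```

A signal with little dynamic content would suggest a resting posture, while large dynamic values would suggest active movement; the actual classification in the project is done with machine learning on annotated data.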
Machine learning / annotation interface
Routes and Geology Courtesy Willem Bouten
Detours and Climate • MODIS satellite image of dust Courtesy Willem Bouten
Embodied Emotion Project • Mapping bodily expression of emotions – To be downhearted – Clenching fists – My heart fills with joy – My blood is boiling • Test case: Dutch theatre texts 1600-1830 – Shift in experienced emotions – Shift in embodiment of emotions • Approach – Establishing corpus & standardizing text – Establishing emotional and bodily vocabularies – Emotion mining – Visualizing results Source: Nummenmaa et al., 2013
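The vocabulary-based approach above can be sketched roughly as follows. This is only an illustration, not the project's pipeline: the two tiny vocabularies are hypothetical, built from the English example phrases on the slide, whereas the project itself works on Dutch theatre texts. The sketch counts how often an emotion word and a body word occur in the same sentence.

```python
import re
from collections import Counter

# Hypothetical toy vocabularies (assumptions for illustration only).
EMOTION_WORDS = {"joy": "joy", "boiling": "anger", "downhearted": "sadness"}
BODY_WORDS = {"heart", "blood", "fists"}


def mine_emotions(text):
    """Count (emotion, body part) pairs that co-occur within a sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        emotions = {EMOTION_WORDS[t] for t in tokens if t in EMOTION_WORDS}
        body_parts = {t for t in tokens if t in BODY_WORDS}
        for emotion in emotions:
            for part in body_parts:
                pairs[(emotion, part)] += 1
    return pairs


sample = "My heart fills with joy. My blood is boiling!"
counts = mine_emotions(sample)
```

Aggregating such counts per period would be one way to look for the shifts in embodiment that the project studies; standardizing historical spelling would have to happen before this step.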
eMetabolomics example • Use reaction rules to identify compounds in mass spectrometry datasets • Online at http://www.emetabolomics.org/ • Public and private versions; the private version allows bigger and longer calculations • Lars Ridder, Laboratory of Biochemistry, Wageningen University
Courtesy Lars Ridder
forecast.ewatercycle.org
Big Data Analytics Technology • Natural language processing, machine learning, information and scientific visualization, information retrieval, computer vision • MATLAB, R, NumPy, SciPy, scikit-learn, Pandas, Weka, xtas, Twiqs.nl, D3, ExtJS, Cesium, Leaflet, OpenLayers, GeoExt, X3Dom, X3DomExt, Mapnik, CommonSense, …
Efficient Computing
Efficient Computing • Smart algorithms can improve performance dramatically • Power consumption is becoming the bottleneck • Legacy codes are inefficient on modern architectures – Need completely different optimizations, algorithms
[Figure: arithmetic intensity axis. O(1): SpMV, BLAS1,2; stencils (PDEs); lattice methods. O(log(N)): FFTs. O(N): dense linear algebra (BLAS3); particle methods]
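The arithmetic intensity classes can be made concrete with a small back-of-the-envelope calculation. The sketch below uses idealized textbook operation counts and assumes that every operand moves to or from memory exactly once in double precision; real kernels and caches behave differently, so the numbers only show the scaling trend.

```python
import math

BYTES_PER_DOUBLE = 8


def axpy_intensity(n):
    """BLAS1 y = a*x + y: 2n flops; reads x and y, writes y (3n doubles)."""
    return (2 * n) / (3 * n * BYTES_PER_DOUBLE)


def fft_intensity(n):
    """Complex FFT: roughly 5*n*log2(n) flops; reads and writes n complex doubles."""
    return (5 * n * math.log2(n)) / (2 * n * 2 * BYTES_PER_DOUBLE)


def gemm_intensity(n):
    """BLAS3 C = A*B for n x n matrices: 2n^3 flops; reads A and B, writes C."""
    return (2 * n ** 3) / (3 * n ** 2 * BYTES_PER_DOUBLE)


# Flops per byte for two problem sizes.
for n in (1024, 4096):
    print(n, axpy_intensity(n), fft_intensity(n), gemm_intensity(n))
```

Under these assumptions the BLAS1 kernel stays at a constant number of flops per byte, the FFT grows with log(N), and the dense matrix product grows linearly with N, which is why dense linear algebra benefits most from compute-rich accelerators.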
Efficient Computing Example • Radio Frequency Interference (RFI) is a huge problem for many radio astronomy observations • Caused by – Lightning, vehicles, airplanes, satellites, electrical equipment, GSM, FM radio, fences, reflections from wind turbines, … • Best removed offline – Complete dataset available – Good overview / statistics / model – Can spend compute cycles • Partner: ASTRON
Real-time RFI mitigation • Some pipelines need to run in real time today – Image-based transient detection (LOFAR/AARTFAAC) – Pulsar searching (WSRT/Apertif) • SKA will be entirely real-time – Data rates simply too high to store • Novel algorithms with linear computational complexity – Only very little loss in quality
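To give a feel for what linear computational complexity means here, the following is a minimal sketch of a threshold flagger that makes a fixed number of passes over the samples. It is a generic illustration, not the algorithm used in the LOFAR/AARTFAAC or Apertif pipelines; the function name `flag_rfi` and the 3-sigma threshold are assumptions.

```python
def flag_rfi(samples, n_sigma=3.0):
    """Flag samples whose power exceeds mean + n_sigma * std.

    Three passes over the data (mean, variance, comparison),
    so the cost grows linearly with the number of samples.
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    threshold = mean + n_sigma * variance ** 0.5
    return [s > threshold for s in samples]


# Hypothetical power measurements with one strong interferer at the end.
power = [1.0] * 99 + [50.0]
flags = flag_rfi(power)
```

Strong interference inflates the mean and standard deviation, so a simple flagger like this can miss weaker RFI; that kind of quality loss is the trade-off a real-time method has to keep small.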
RFI mitigation on accelerators • Accelerator-based computing – GPUs, Xeon Phi, … – Astronomy, ocean modeling, digital forensics, radar systems, high-energy physics • Auto-tuning & runtime compilation – Generate many codes at run-time, select most efficient
[Figure: two bar charts, "Performance compared to CPU" and "Power usage compared to CPU", for Xeon Phi, NVIDIA GTX Titan GPU, and AMD HD7990 GPU]
[Background image: Pulsar B1919+21 in the Vulpecula nebula. Pulse profile created with folding and the LOFAR software telescope. Background picture courtesy European Southern Observatory.]
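The auto-tuning idea (generate many variants at run-time, select the most efficient) can be sketched on the CPU in a few lines. This is a simplified analogue, not the GPU code generator from the project: the "variants" are Python closures that differ only in a block size, and the names `make_blocked_sum` and `autotune` are hypothetical.

```python
import time


def make_blocked_sum(block):
    """Build one code variant: sum a list in chunks of `block` elements."""
    def blocked_sum(data):
        total = 0
        for start in range(0, len(data), block):
            total += sum(data[start:start + block])
        return total
    return blocked_sum


def autotune(data, candidates, repeats=3):
    """Time every variant on the actual input and return the fastest block size."""
    timings = {}
    for block in candidates:
        variant = make_blocked_sum(block)
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            variant(data)
            best = min(best, time.perf_counter() - t0)
        timings[block] = best
    best_block = min(timings, key=timings.get)
    return best_block, timings


data = list(range(100_000))
best_block, timings = autotune(data, [16, 256, 4096])
tuned_sum = make_blocked_sum(best_block)
```

Which block size wins depends on the machine and the input, which is the point of tuning at run-time rather than fixing the parameters in advance. On accelerators the same loop would compile and benchmark kernel variants instead of building closures.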