CS 644: Introduction to Big Data, Chapter 1. Introduction (Chase Wu)


  1. CS 644: Introduction to Big Data, Chapter 1. Introduction. Chase Wu • Professor of Computer Science, Director of Center for Big Data, New Jersey Institute of Technology, chase.wu@njit.edu • Collaborative Research Staff, Computer Science and Mathematics Division, Oak Ridge National Laboratory, wuqn@ornl.gov

  2. The 1st Class Attendance Check • Name • Program (MS, Ph.D., etc.) • Year • Why do you take this course? • What is the largest data size you’ve ever personally handled and in what context? - application domain - data type - storage format - processing/analysis purpose - etc.
     Order of Magnitude (power of 2, prefix, power of 10, name): • 2^0, 1, 10^0, One • 2^10, K, 10^3, Thousand • 2^20, M, 10^6, Million • 2^30, G, 10^9, Billion • 2^40, T, 10^12, Trillion • 2^50, P, 10^15, Quadrillion • 2^60, E, 10^18, Quintillion • 2^70, Z, 10^21, Sextillion • 2^80, Y, 10^24 • 2^90, ……
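
The order-of-magnitude table pairs each power of two with a nearby power of ten, and the two drift apart as the exponent grows. The following is a minimal Python sketch of that arithmetic; the names `PREFIXES` and `magnitude_rows` are hypothetical and chosen only for illustration, and the prefix letters are the ones listed on the slide.

```python
# Prefix letters as listed on the slide ("" stands for the unprefixed first row).
PREFIXES = ["", "K", "M", "G", "T", "P", "E", "Z", "Y"]


def magnitude_rows():
    """Return (prefix, 2**(10*i), 10**(3*i), ratio) for row i of the table."""
    rows = []
    for i, prefix in enumerate(PREFIXES):
        binary = 2 ** (10 * i)    # e.g. 2^10 = 1024
        decimal = 10 ** (3 * i)   # e.g. 10^3 = 1000
        rows.append((prefix, binary, decimal, binary / decimal))
    return rows


if __name__ == "__main__":
    for prefix, binary, decimal, ratio in magnitude_rows():
        print(f"{prefix or '-':>2}  binary={binary}  decimal={decimal}  ratio={ratio:.4f}")
```

The ratio is 1.024 at K and grows to roughly 1.21 at Y, so a "tera" quantity counted in powers of ten is about 9% smaller than the same prefix counted in powers of two.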

  3. About this course • Recent Developments and Future Trends on Big Data Computing • Cloud, Supercomputer, Cluster, etc. • Overview of Big Data Analytics • Systems, Platforms, Tools, and Techniques for Big Data Storage, Management, Computing, Processing, and Analytics • Advanced Topics: • Big-Data Visualization • Big-Data Movement • Big-Data Workflows • Big-Data Security

  4. Four V’s of Big Data

  5. Center for Big Data • Director: Chase Wu, chase.wu@njit.edu • Co-Director: Yi Chen, yi.chen@njit.edu • URL: https://centers.njit.edu/bigdata • Location: GITC 4111

  6. Industry Advisory Board • Binay Sugla (Trustee-Advisor, Vestac, LLC) • Ying Wu (China Capital Group) • Kathy Meier-Hellstern (AT&T Labs) • Terry Christiani (Microsoft) • Jianying Hu (IBM)

  7. Mission Statement • Synergize the strong expertise in various disciplines across the NJIT campus • Build a unified platform that embodies a rich set of big data enabling technologies and services with optimized performance to facilitate research collaboration and scientific discovery • Investigate, develop, and apply cutting-edge technologies to address unprecedented challenges in big data with high Volume, high Velocity, high Variety, and high Veracity, in order to create high Value

  8. A Three-layer Structure of the CBD
     - Layer 3: Big Data Applications • Transportation, Solar-Terrestrial, Brain injury, Physics, Healthcare, Business, Smart city, etc. • Goals: Advance sciences in various domains • Tasks: Adapt, customize, and refine application-specific solutions
     - Between Layers 3 and 2: North-bound User Interface
     - Layer 2: Big Data Technological Infrastructure • Systems/Platforms, Tools/Libraries, Services, Algorithms • Goals: Provide generic and special big-data enabling solutions • Tasks: Investigate, design, develop, implement, and test big data-oriented analytics, visualization, computing, networking, workflow, storage, and retrieval solutions
     - Between Layers 2 and 1: Data Access and Retrieval
     - Layer 1: Big Data Repository • Raw data (experimental, simulation, observational) • Metadata, markup data • Analysis results (intermediate, final) • Models, views, tables, forms, animations, etc. • Workflow templates, provenance data • Goals: Share data and analysis results for community building • Tasks: Standardize, categorize and benchmark datasets

  9. - Layer 1: Big Data Repository • Store, manage, and provide a wide variety of data such as raw data (experimental, simulation, observational, and user-generated content), metadata, markup data, analysis results (intermediate and final) in various forms including models, views, tables, images, and videos, and workflow templates with provenance data. • Build a dedicated one-stop portal to share research data and analysis results for community building. - Layer 2: Big Data Technological Infrastructure • Provide generic and domain-specific big data enabling solutions for data management, movement, and analytics. • Host and maintain a set of practical technical resources in the form of systems/platforms, tools/libraries, services, and algorithms in various areas including database management, data mining, machine learning, and parallel and distributed computing, which are needed to compose big data solutions in different application domains.

  10. - Layer 3: Big Data Applications • Present a common portal to big data applications spanning a wide spectrum of research fields, including - transportation - solar-terrestrial - brain injury - physics - healthcare - business - smart city • Provide researchers with powerful and customized big data solutions to advance the frontier of sciences in various application domains.

  11. Core Faculty • Chase Wu: Associate Professor, Dept of Computer Science • Yi Chen: Associate Professor, Leir Chair, School of Management, Dept of Computer Science • Andrew Gerrard: Professor, Dept of Physics, Center for Solar-Terrestrial Research • Lazar Spasovic: Professor, Dept of Civil and Environmental Engineering • Steven Chien: Professor, Dept of Civil and Environmental Engineering • Joyoung Lee: Assistant Professor, Dept of Civil and Environmental Engineering • Namas Chandra: Professor, Dept of Biomedical Engineering, Center for Injury Biomechanics, Materials and Medicine • Jason Wang: Professor, Dept of Computer Science • Usman Roshan: Associate Professor, Dept of Computer Science • Zhi Wei: Associate Professor, Dept of Computer Science • Dimitri Theodoratos: Associate Professor, Dept of Computer Science • Vincent Oria: Professor, Dept of Computer Science • Senjuti Roy: Assistant Professor, Dept of Computer Science • Brook Wu: Associate Professor, Dept of Informatics • Dantong Yu: Associate Professor, School of Management • Yixin Fang: Associate Professor, Dept of Mathematics • Ji Meng Loh: Associate Professor, Dept of Mathematics

  12. Funded Projects • DOE: Technologies and Tools for Synthesis of Source-to-Sink High-Performance Flows, DOE Office of Science, Big Data-Aware Terabits Networking. • NSF: An Integrated Approach to Performance Modeling and Optimization of Big-data Scientific Workflows, Computer and Network Systems. • DOE: Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments, DOE Office of Science, High-Performance Networks for Distributed Petascale Science. • Google Research Award: Understanding and Processing Subjective Queries on Structured Data. • NSF CAREER: Analyzing and Exploiting Meta-information for Keyword Search on Semi-structured Data. • EarthCube IA: Magnetosphere-Ionosphere-Atmosphere Coupling, Abstract #1541009. • Intelligent Transportation Systems Resource Center - Task: Data Acquisition, Integration, Analysis, and Visualization.

  13. Transportation

  14. Solar Terrestrial Research

  15. Classification of Traumatic Brain Injury • Ballistic (bullet, shrapnel) • Blunt (motor vehicle, sports, fall from height): most prevalent; blunt impacts from MVA, falls, and sports injury; concussion • Blast (explosions; military)

  16. Exascale Computing and Big Data, by Daniel A. Reed and Jack Dongarra, Communications of the ACM, July 2015. https://vimeo.com/129742718
