CS 644: Introduction to Big Data. Chapter 1. Introduction. Chase Wu


  1. CS 644: Introduction to Big Data
     Chapter 1. Introduction
     Chase Wu
     • Professor, Associate Chair of Computer Science; Director of Center for Big Data; New Jersey Institute of Technology; chase.wu@njit.edu
     • Collaborative Research Staff; Computer Science & Mathematics Division; Oak Ridge National Laboratory; wuqn@ornl.gov

  2. The 1st Class: Attendance Check
     Verification of presence; teaming for HW1; adjustment of teaching
     • Name
     • Program (BS, MS, Ph.D., etc.)
     • Year
     • Why do you take this course?
     • What is the largest data size you've ever personally handled and in what context?
       - application domain
       - data type
       - storage format
       - processing/analysis purposes
       - etc.
     Order of Magnitude:
       2^0    1   10^0    One
       2^10   K   10^3    Thousand
       2^20   M   10^6    Million
       2^30   G   10^9    Billion
       2^40   T   10^12   Trillion
       2^50   P   10^15   Quadrillion
       2^60   E   10^18   Quintillion
       2^70   Z   10^21   Sextillion
       2^80   Y   10^24
       2^90   ……
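The order-of-magnitude table on slide 2 pairs each power of two with a power of ten (2^10 with 10^3, 2^20 with 10^6, and so on). The two are close but not equal, and the gap widens with each step. The following is a minimal Python sketch of that arithmetic; the function name `binary_vs_decimal` and the `PREFIXES` list are illustrative and do not come from the slides.

```python
# Compare each binary power 2^(10k) with the decimal power 10^(3k)
# that the slide's table places on the same row.
PREFIXES = ["K", "M", "G", "T", "P", "E", "Z", "Y"]  # rows 2^10 .. 2^80


def binary_vs_decimal(k: int) -> float:
    """Return 2**(10*k) / 10**(3*k) for row k of the table (k = 1 is K)."""
    return 2 ** (10 * k) / 10 ** (3 * k)


for k, prefix in enumerate(PREFIXES, start=1):
    ratio = binary_vs_decimal(k)
    print(f"{prefix}: 2^{10 * k} / 10^{3 * k} = {ratio:.4f}")
```

At the K row the binary value is 2.4% larger than the decimal one; by the Y row the difference is roughly 21%, which matters when comparing storage sizes quoted in different conventions.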

  3. About this course
     • Recent Developments and Future Trends on Big Data Computing
       - Cloud computing, supercomputing, cluster computing, etc.
     • Overview of Big Data Analytics
     • Systems, Platforms, Tools, and Techniques for Big Data Storage, Management, Computing, Processing, and Resource Management
     • Big Data Analytics
     • Advanced Big Data Topics:
       - Big-Data Visualization
       - Big-Data Movement
       - Big-Data Workflows
       - Big-Data Security
     Course Website: https://web.njit.edu/~chasewu/Courses/Fall2020/CS644BigData/CS644_BigData_Fall20.html

  4. Textbook and Reference Books
     • MapReduce / Hadoop
     • Machine Learning / Data Mining
     • Overview
     • Data Science
     • Learning Theory
     • Popular Frameworks

  5. Four V’s of Big Data

  6. Center for Big Data
     • Director: Chase Wu (YWCC)
     • Co-Director: Dantong Yu (SOM)
     • URL: https://centers.njit.edu/bigdata
     • Email: chase.wu@njit.edu
     • Location: GITC 4416

  7. Industry Advisory Board
     • Binay Sugla (Trustee-Advisor, Vestac, LLC)
     • Ying Wu (China Capital Group)
     • Kathy Meier-Hellstern (AT&T Labs)
     • Terry Christiani (Microsoft)
     • Jianying Hu (IBM)

  8. Mission Statement
     • Synergize the strong expertise in various disciplines across the NJIT campus
     • Build a unified platform that embodies a rich set of big data enabling technologies and services with optimized performance to facilitate research collaboration and scientific discovery
     • Investigate, develop, and apply cutting-edge technologies to address unprecedented challenges in big data with high Volume, high Velocity, high Variety, and high Veracity, in order to create high Value

  9. A Three-layer Structure of the CBD
     Layer 3: Big Data Applications
     • Transportation, Solar-Terrestrial, Brain injury, Physics, Healthcare, Business, Smart city, etc.
     • Goals: Advance sciences in various domains
     • Tasks: Adapt, customize, and refine application-specific solutions
     (North-bound User Interface)
     Layer 2: Big Data Technological Infrastructure
     • Systems/Platforms, Tools/Libraries, Services, Algorithms
     • Goals: Provide generic and special big-data enabling solutions
     • Tasks: Investigate, design, develop, implement, and test big data-oriented analytics, visualization, computing, networking, workflow, storage, and retrieval solutions
     (Data Access and Retrieval)
     Layer 1: Big Data Repository
     • Raw data (experimental, simulation, observational)
     • Metadata, markup data
     • Analysis results (intermediate, final)
     • Models, views, tables, forms, animations, etc.
     • Workflow templates, provenance data
     • Goals: Share data and analysis results for community building
     • Tasks: Standardize, categorize, and benchmark datasets

  10. - Layer 1: Big Data Repository
      • Store, manage, and provide a wide variety of data such as raw data (experimental, simulation, observational, and user-generated content), metadata, markup data, analysis results (intermediate and final) in various forms including models, views, tables, images, and videos, and workflow templates with provenance data.
      • Build a dedicated one-stop portal to share research data and analysis results for community building.
      - Layer 2: Big Data Technological Infrastructure
      • Provide generic and domain-specific big data enabling solutions for data management, movement, and analytics.
      • Host and maintain a set of practical technical resources in the form of systems/platforms, tools/libraries, services, and algorithms in various areas including database management, data mining, machine learning, and parallel and distributed computing, which are needed to compose big data solutions in different application domains.

  11. - Layer 3: Big Data Applications
      • Present a common portal to big data applications spanning a wide spectrum of research fields, including
        - transportation
        - solar-terrestrial
        - brain injury
        - physics
        - healthcare
        - business
        - smart city
      • Provide researchers with powerful and customized big data solutions to advance the frontier of sciences in various application domains.
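The three layers described on slides 9 to 11 can be read as a simple dependency chain: applications (Layer 3) call generic tools (Layer 2), which fetch data from the repository (Layer 1). The Python sketch below illustrates only that layering. Every class, method, and dataset name is hypothetical, the traffic counts are made-up sample values, and nothing here reflects how the Center's actual platform is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Layer 1 stand-in: an in-memory store of named datasets (hypothetical)."""
    datasets: dict = field(default_factory=dict)

    def retrieve(self, name: str) -> list:
        # Plays the role of the "Data Access and Retrieval" interface.
        return self.datasets[name]


@dataclass
class Infrastructure:
    """Layer 2 stand-in: generic tools that operate on repository data."""
    repository: Repository

    def mean(self, name: str) -> float:
        # A deliberately trivial "analytics" tool: the arithmetic mean.
        values = self.repository.retrieve(name)
        return sum(values) / len(values)


@dataclass
class Application:
    """Layer 3 stand-in: a domain application built on Layer 2 tools."""
    domain: str
    infrastructure: Infrastructure

    def report(self, name: str) -> str:
        return f"{self.domain}: mean of {name} = {self.infrastructure.mean(name):.1f}"


# Made-up sample data, for illustration only.
repo = Repository({"traffic_counts": [120, 135, 150]})
app = Application("transportation", Infrastructure(repo))
print(app.report("traffic_counts"))
```

Keeping each layer behind a narrow interface means a new application domain only needs a new Layer 3 class, while the tools and the repository are reused unchanged.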

  12. Core Faculty
      • Chase Wu: Associate Professor, Dept of Computer Science
      • Yi Chen: Associate Professor, Leir Chair, School of Management, Dept of Computer Science
      • Andrew Gerrard: Professor, Dept of Physics, Center for Solar-Terrestrial Research
      • Lazar Spasovic: Professor, Dept of Civil and Environmental Engineering
      • Steven Chien: Professor, Dept of Civil and Environmental Engineering
      • Joyoung Lee: Assistant Professor, Dept of Civil and Environmental Engineering
      • Namas Chandra: Professor, Dept of Biomedical Engineering, Center for Injury Biomechanics, Materials and Medicine
      • Jason Wang: Professor, Dept of Computer Science
      • Usman Roshan: Associate Professor, Dept of Computer Science
      • Zhi Wei: Associate Professor, Dept of Computer Science
      • Dimitri Theodoratos: Associate Professor, Dept of Computer Science
      • Vincent Oria: Professor, Dept of Computer Science
      • Senjuti Roy: Assistant Professor, Dept of Computer Science
      • Brook Wu: Associate Professor, Dept of Informatics
      • Dantong Yu: Associate Professor, School of Management
      • Ji Meng Loh: Associate Professor, Dept of Mathematics

  13. Funded Projects
      • DOE: Technologies and Tools for Synthesis of Source-to-Sink High-Performance Flows, DOE Office of Science, Big Data-Aware Terabits Networking.
      • NSF: An Integrated Approach to Performance Modeling and Optimization of Big-data Scientific Workflows, Computer and Network Systems.
      • DOE: Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments, DOE Office of Science, High-Performance Networks for Distributed Petascale Science.
      • Google Research Award: Understanding and Processing Subjective Queries on Structured Data.
      • NSF CAREER: Analyzing and Exploiting Meta-information for Keyword Search on Semi-structured Data.
      • EarthCube IA: Magnetosphere-Ionosphere-Atmosphere Coupling, Abstract #1541009.
      • Intelligent Transportation Systems Resource Center, Task: Data Acquisition, Integration, Analysis, and Visualization.

  14. Transportation

  15. Solar Terrestrial Research

  16. Classification of Traumatic Brain Injury
      • Ballistic (bullet, shrapnel)
      • Blunt (motor vehicle, sports, fall from height): most prevalent
        - Blunt impacts: MVA, fall, sports injury; concussion
      • Blast (explosions): military

  17. Exascale Computing and Big Data
      By Daniel A. Reed and Jack Dongarra
      Communications of the ACM, July 2015
      https://vimeo.com/129742718

