CS 644: Introduction to Big Data
Chapter 1. Introduction

Chase Wu
Professor, Associate Chair of Computer Science
Director of Center for Big Data
New Jersey Institute of Technology
chase.wu@njit.edu

Collaborative Research Staff
Computer Science & Mathematics Division
Oak Ridge National Laboratory
wuqn@ornl.gov
The 1st Class Attendance Check
Verification of presence; teaming for HW1; adjustment of teaching
• Name
• Program (BS, MS, Ph.D., etc.)
• Year
• Why do you take this course?
• What is the largest data size you've ever personally handled, and in what context?
  - application domain
  - data type
  - storage format
  - processing/analysis purposes
  - etc.

Order of Magnitude:
2^0    1    10^0    One
2^10   K    10^3    Thousand
2^20   M    10^6    Million
2^30   G    10^9    Billion
2^40   T    10^12   Trillion
2^50   P    10^15   Quadrillion
2^60   E    10^18   Quintillion
2^70   Z    10^21   Sextillion
2^80   Y    10^24
2^90   ……
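The order-of-magnitude table pairs each binary power 2^(10k) with the decimal power 10^(3k). The two are close but not equal, and the gap widens with each prefix. The following is a minimal Python sketch of that comparison. The names `PREFIXES`, `binary_vs_decimal`, and `human_readable` are illustrative assumptions and do not come from the course materials.

```python
# Prefix symbols for k = 0..8, matching the rows 2^0 through 2^80 of the table.
# (Illustrative names; not part of the course materials.)
PREFIXES = ["", "K", "M", "G", "T", "P", "E", "Z", "Y"]


def binary_vs_decimal(k):
    """Return (2**(10*k), 10**(3*k), ratio of the two) for the k-th prefix."""
    binary = 2 ** (10 * k)
    decimal = 10 ** (3 * k)
    return binary, decimal, binary / decimal


def human_readable(num_bytes):
    """Format a byte count with binary prefixes (1 KB taken as 2**10 bytes)."""
    value = float(num_bytes)
    for prefix in PREFIXES:
        # Stop at the first prefix where the value drops below 1024,
        # or at the largest prefix available.
        if value < 1024 or prefix == PREFIXES[-1]:
            return f"{value:.1f} {prefix}B"
        value /= 1024


if __name__ == "__main__":
    for k, prefix in enumerate(PREFIXES):
        binary, decimal, ratio = binary_vs_decimal(k)
        print(f"2^{10 * k:<3} {prefix or '1':<2} 10^{3 * k:<3} ratio = {ratio:.3f}")
    print(human_readable(3 * 2 ** 40))
```

The ratio starts at 1.024 for K and grows with every step, so a "terabyte" counted in powers of two is roughly 10% larger than one counted in powers of ten.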
About this course
• Recent Developments and Future Trends on Big Data Computing
  • Cloud computing, supercomputing, cluster computing, etc.
• Overview of Big Data Analytics
• Systems, Platforms, Tools, and Techniques for Big Data Storage, Management, Computing, Processing, and Resource Management
• Big Data Analytics
• Advanced Big Data Topics:
  • Big-Data Visualization
  • Big-Data Movement
  • Big-Data Workflows
  • Big-Data Security
Course Website: https://web.njit.edu/~chasewu/Courses/Fall2020/CS644BigData/CS644_BigData_Fall20.html
Textbook and Reference Books
• MapReduce / Hadoop
• Machine Learning / Data Mining
• Overview
• Data Science
• Learning Theory
• Popular Frameworks
Four V’s of Big Data
Center for Big Data
Director: Chase Wu (YWCC)
Co-Director: Dantong Yu (SOM)
URL: https://centers.njit.edu/bigdata
Email: chase.wu@njit.edu
Location: GITC 4416
Industry Advisory Board
• Binay Sugla (Trustee-Advisor, Vestac, LLC)
• Ying Wu (China Capital Group)
• Kathy Meier-Hellstern (AT&T Labs)
• Terry Christiani (Microsoft)
• Jianying Hu (IBM)
Mission Statement
• Synergize the strong expertise in various disciplines across the NJIT campus
• Build a unified platform that embodies a rich set of big data enabling technologies and services with optimized performance to facilitate research collaboration and scientific discovery
• Investigate, develop, and apply cutting-edge technologies to address unprecedented challenges in big data with high Volume, high Velocity, high Variety, and high Veracity, in order to create high Value
A Three-layer Structure of the CBD

Layer 3: Big Data Applications
• Domains: transportation, solar-terrestrial, brain injury, physics, healthcare, business, smart city, etc.
• Goals: Advance sciences in various domains
• Tasks: Adapt, customize, and refine application-specific solutions

(Interface: North-bound User Interface)

Layer 2: Big Data Technological Infrastructure
• Components: systems/platforms, tools/libraries, services, algorithms
• Goals: Provide generic and special big-data enabling solutions
• Tasks: Investigate, design, develop, implement, and test big data-oriented analytics, visualization, computing, networking, workflow, storage, and retrieval solutions

(Interface: Data Access and Retrieval)

Layer 1: Big Data Repository
• Contents: raw data (experimental, simulation, observational); metadata, markup data; analysis results (intermediate, final); models, views, tables, forms, animations, etc.; workflow templates, provenance data
• Goals: Share data and analysis results for community building
• Tasks: Standardize, categorize, and benchmark datasets
- Layer 1: Big Data Repository
• Store, manage, and provide a wide variety of data such as raw data (experimental, simulation, observational, and user-generated content), metadata, markup data, analysis results (intermediate and final) in various forms including models, views, tables, images, and videos, and workflow templates with provenance data.
• Build a dedicated one-stop portal to share research data and analysis results for community building.
- Layer 2: Big Data Technological Infrastructure
• Provide generic and domain-specific big data enabling solutions for data management, movement, and analytics.
• Host and maintain a set of practical technical resources in the form of systems/platforms, tools/libraries, services, and algorithms in various areas including database management, data mining, machine learning, and parallel and distributed computing, which are needed to compose big data solutions in different application domains.
- Layer 3: Big Data Applications
• Present a common portal to big data applications spanning a wide spectrum of research fields, including
  - transportation
  - solar-terrestrial
  - brain injury
  - physics
  - healthcare
  - business
  - smart city
• Provide researchers with powerful and customized big data solutions to advance the frontier of sciences in various application domains.
Core Faculty
• Chase Wu: Associate Professor, Dept of Computer Science
• Yi Chen: Associate Professor, Leir Chair, School of Management, Dept of Computer Science
• Andrew Gerrard: Professor, Dept of Physics, Center for Solar-Terrestrial Research
• Lazar Spasovic: Professor, Dept of Civil and Environmental Engineering
• Steven Chien: Professor, Dept of Civil and Environmental Engineering
• Joyoung Lee: Assistant Professor, Dept of Civil and Environmental Engineering
• Namas Chandra: Professor, Dept of Biomedical Engineering, Center for Injury Biomechanics, Materials and Medicine
• Jason Wang: Professor, Dept of Computer Science
• Usman Roshan: Associate Professor, Dept of Computer Science
• Zhi Wei: Associate Professor, Dept of Computer Science
• Dimitri Theodoratos: Associate Professor, Dept of Computer Science
• Vincent Oria: Professor, Dept of Computer Science
• Senjuti Roy: Assistant Professor, Dept of Computer Science
• Brook Wu: Associate Professor, Dept of Informatics
• Dantong Yu: Associate Professor, School of Management
• Ji Meng Loh: Associate Professor, Dept of Mathematics
Funded Projects
• DOE: Technologies and Tools for Synthesis of Source-to-Sink High-Performance Flows, DOE Office of Science, Big Data-Aware Terabits Networking.
• NSF: An Integrated Approach to Performance Modeling and Optimization of Big-data Scientific Workflows, Computer and Network Systems.
• DOE: Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments, DOE Office of Science, High-Performance Networks for Distributed Petascale Science.
• Google Research Award: Understanding and Processing Subjective Queries on Structured Data.
• NSF CAREER: Analyzing and Exploiting Meta-information for Keyword Search on Semi-structured Data.
• EarthCube IA: Magnetosphere-Ionosphere-Atmosphere Coupling, Abstract #1541009.
• Intelligent Transportation Systems Resource Center - Task: Data Acquisition, Integration, Analysis, and Visualization.
Transportation
Solar-Terrestrial Research
Classification of Traumatic Brain Injury
• Ballistics (bullet, shrapnel)
• Blunt (motor vehicle, sports, fall from height)
• Blast (explosions)
[Figure panels: Blunt injury (most prevalent); Blast (military); Ballistic (bullet). Blunt impacts (MVA, fall, sports injury) lead to concussion.]
Exascale Computing and Big Data
By Daniel A. Reed and Jack Dongarra
Communications of the ACM, July 2015
https://vimeo.com/129742718