AI and Embodiment at the Edge: Leveraging Deep Learning, with OSS Project Watson-Intu ("Self")
  1. AI and Embodiment at the Edge: Leveraging Deep Learning, with OSS Project Watson-Intu ("Self"). IBM WATSON CLOUD, MARCH 2018

  2. AI: What are we talking about when we say "AI"? Multiple uses of the term:
◦ AI: "Artificial Intelligence" – replicating the human mind in artificial form
◦ AI: "Augmented Intelligence" – augmenting human cognition with intelligent assistance
Note the distinction between AI and Deep Learning / NNs. AI, in both senses, may be brought about by multiple contributing methods/technologies:
◦ Rule-based or "Expert" systems [1] (a minimal sketch follows this slide)
◦ Knowledge Graphs [2] / semantic representations
◦ Machine Learning, Neural Network-based Deep Learning (differentiable programming [3])
[Slide images: Trea Knowledge Graph / EKG; Rule-Based System for IBM Inventors; Deep Neural Network]
[1] "Rule-Based Expert Systems", Crina Grosan, Ajith Abraham: https://link.springer.com/chapter/10.1007/978-3-642-21004-4_7
[2] "Towards a Definition of Knowledge Graphs", Lisa Ehrlinger and Wolfram Wöß, Institute for Application-Oriented Knowledge Processing, Johannes Kepler University Linz, Austria: http://ceur-ws.org/Vol-1695/paper4.pdf
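To make the first method above concrete, here is a minimal, hypothetical sketch of a rule-based ("expert") system: knowledge is encoded as hand-written if-then rules over symbolic facts, in contrast to a deep neural network, which learns its mapping from data. All rule and fact names are invented for illustration; this is not code from any IBM system.

```python
# Minimal forward-chaining rule engine (illustrative only).
# Facts are strings; a rule fires when all of its premises are in
# working memory, adding its conclusion as a new fact.

RULES = [
    ({"has_camera", "has_microphone"}, "can_sense_world"),
    ({"can_sense_world", "can_act"}, "is_embodied"),
    ({"is_embodied", "can_learn"}, "is_embodied_cognitive"),
]

def forward_chain(facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"has_camera", "has_microphone", "can_act", "can_learn"}))
# The derived closure includes "is_embodied_cognitive".
```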

  3. Purpose / Motivation: Cognitive systems
◦ are able to learn their behavior through education;
◦ support forms of expression that are more natural for human interaction;
◦ have expertise as their primary value;
◦ continue to evolve as they experience new information, new scenarios, and new responses;
◦ and do so at enormous scale.
[Slide images: Human-System Interaction [1]; Data produced per day, projections [1]]
[1] NVIDIA GTC 2016 Keynote, Rob High, CTO IBM Watson Cloud: https://fudzilla.com/news/40407-ibm-cto-shows-off-gpu-accelerated-cognitive-computing

  4. Advancement: rapidly advancing capabilities:
◦ Speech Processing
◦ Natural Language Understanding
◦ Computer Vision
◦ Image and Video Processing
◦ Physical Simulation
◦ Object Detection and Recognition
◦ Event Prediction
[Slide images: IBM Watson Natural Language; Historical Perspective of Speech Processing [1]; Human Detection in Crowded Scenes [2]; Jensen Huang, 2017 NVIDIA Keynote]
[1] "A Historical Perspective of Speech Recognition", Xuedong Huang, James Baker, Raj Reddy; Communications of the ACM, Vol. 57 No. 1, pp. 94-103, 10.1145/2500887
[2] "Fast human detection in crowded scenes by contour integration and local shape estimation", Csaba Beleznai, Horst Bischof; 2009 IEEE Conference on Computer Vision and Pattern Recognition

  5. Project Intu [1]: an open-source project for embodied cognition.
◦ Embodied: an entity is in and of the world, and can sense, react, and act in that world.
◦ Cognitive: an entity can reason and learn.
An embodied cognitive entity has an identity that distinguishes it from all other entities (and is to a degree aware of its own identity).
[Slide images: SoftBank "Pepper"; Nao Robot and Hilton Concierge]
[1] "Project Intu v1.0", Grady Booch, IBM: https://ibm.box.com/v/IBMWatson-Intu-Self-Embodiment

  6. Project Intu [1] is based on a cognitive architecture named "Self". Self is an agent-based architecture that combines connectionist and symbolic models of computation, using blackboards for opportunistic collaboration. Project Intu provides a framework for orchestrating cognitive services in a manner that brings higher-level cognition to an embodied system.
[Slide image credit: Global Digital Citizen [2]]
[1] "Project Intu v1.0", Grady Booch, IBM: https://ibm.box.com/v/IBMWatson-Intu-Self-Embodiment
[2] Image credit: Global Digital Citizen: https://globaldigitalcitizen.org/category/global-digital-citizen

  7. Self Ontology: models of
◦ Self
◦ Others
◦ the World, and within it: People, Places, Things
◦ Domain Knowledge
A knowledge graph is maintained of historical events, interactions, and representations [1] (see the triple-store sketch below).
[1] "Project Intu v1.0", Grady Booch, IBM: https://ibm.box.com/v/IBMWatson-Intu-Self-Embodiment
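As an illustration of the kind of structure such an ontology implies, here is a minimal, hypothetical triple-store sketch. The class, entities, and facts are invented; Self's actual knowledge-graph representation is richer than this.

```python
# Toy knowledge graph as (subject, predicate, object) triples
# (illustrative only; not the Self representation).
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)

    def add(self, subj, pred, obj):
        self.triples.add((subj, pred, obj))
        self.by_subject[subj].add((pred, obj))

    def about(self, subj):
        """Everything recorded about one entity."""
        return sorted(self.by_subject[subj])

kg = KnowledgeGraph()
kg.add("self", "is_a", "embodied_agent")
kg.add("user_42", "is_a", "person")
kg.add("user_42", "expressed_emotion", "happy")  # e.g. a face-classification event
kg.add("lobby", "is_a", "place")

print(kg.about("user_42"))
# [('expressed_emotion', 'happy'), ('is_a', 'person')]
```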

  8. Self [High-Level] Architecture: an agent-based blackboard system. [1]
◦ Sensors contribute data to topics published to the blackboard
◦ Agents subscribe to topics (system health, question, identity, position, weather, ...)
◦ Agents carry out actions according to Goals, fulfilling Plans
◦ Skills are defined to interact with the user, and Gestures are the manifestation of the interaction
(A minimal pub/sub sketch of this pattern follows below.)
[1] "Project Intu v1.0", Grady Booch, IBM: https://ibm.box.com/v/IBMWatson-Intu-Self-Embodiment
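Here is a minimal sketch of the pattern this slide describes: sensors publishing to named blackboard topics, and agents subscribing to them. All class names, topic names, and payloads are invented for illustration; the actual Self framework has a far richer agent/goal/plan/skill model than this.

```python
# Minimal agent-based blackboard: sensors publish to topics, agents subscribe.
# Illustrative sketch only; not the Project Intu/Self API.
from collections import defaultdict

class Blackboard:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, agent):
        self.subscribers[topic].append(agent)

    def publish(self, topic, payload):
        # Deliver a sensor's contribution to every agent on the topic.
        for agent in self.subscribers[topic]:
            agent.on_event(topic, payload)

class GreetingAgent:
    """Reacts to 'person_detected' events by selecting a skill/gesture."""
    def on_event(self, topic, payload):
        print(f"[GreetingAgent] {payload} on '{topic}' -> run 'greet' skill")

class HealthAgent:
    def on_event(self, topic, payload):
        print(f"[HealthAgent] system health update: {payload}")

bb = Blackboard()
bb.subscribe("person_detected", GreetingAgent())
bb.subscribe("system_health", HealthAgent())

# A camera "sensor" contributing data to a topic:
bb.publish("person_detected", {"id": "user_42", "confidence": 0.93})
bb.publish("system_health", {"cpu": 0.41})
```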

  9. Self Architecture: Startup Sequence
[Slide diagram: the Self startup sequence, shown alongside the same agent-based blackboard bullets as the previous slide.]

  10. Self Flow
[Slide diagram: the Self event/agent flow, shown alongside the same agent-based blackboard bullets as slide 8.]

  11. The Confluence of Cloud and Edge: the result of decades of foundational technologies, combined with emerging technologies:
◦ Society of Mind / agent-based blackboard
◦ Knowledge Graphs
◦ Neural Networks / Deep Learning
◦ Edge compute
◦ Distributed systems
◦ Containerized microservices (Docker registry)
◦ GPU-accelerated devices
[Slide images: Docker stack; Watson services. Image credit: https://www.ibm.com/cloud-computing/bluemix/sites/default/files/assets/page/feature-cognitive_0_0.png]

  12. Intu: Embodiment Setup*: Jetson TX2
◦ Hardware: keyboard, monitor, webcam, mic
◦ Set up IBM Edge (see previous WIoTP steps); register the edge device
◦ In IBM Watson Cloud: set up Watson services (Conversation); Conversation setup: Intu Dialog Starter
◦ Copy Watson creds to the local config folder on the Jetson
"Self" embodiment on the Jetson TX2: out-of-the-box integration with Watson services, plus a custom "Edge" plugin with a basic Agent (DL Agent).
Workload group Intu "Self": Watson-Intu; Aural2 (command words); Face-classification (emotion)
[Slide diagram: workloads mapped to devices /dev/video5, /dev/video6, /Mic, /dev/video1; screenshot: "Intu Self on Jetson TX2 (viewed from Mac)"]
* External repo for Self setup on the NVIDIA Jetson TX2: https://github.com/chrod/self-jetsontx2/wiki/Getting-Started

  13. Aural v2: NVIDIA GTC 2018 Session 81037, "Doing What the User Wants, Before They Finish Speaking"
◦ Sound-state classifier for human speech; Long Short-Term Memory (LSTM) model
◦ Training: upload ten-second audio clips for labeling and subsequent training
◦ Written using a TensorFlow compute graph, in golang
◦ ~30 times/second, the model outputs the probabilities for each state of the world
◦ Multiple vocabularies (one model trained per vocabulary):
   - Words the user says: recognizes the word spoken ("play")
   - Intent (the action the user wants performed): "play music" (as opposed to "I play ball")
   - *Emotional state of the user
   - *Person who is currently talking (user or other)
Performance:
◦ Model size: 5.4 MB; ~10% CPU usage (800 MHz x86 laptop); can run on a Pi 3, but slowly
◦ Training time: ~2.5 minutes for 3,000 mini-batches over 1 hr of audio (GTX 1060), i.e. you can train at home
◦ Negative latency: Aural2 predicts the word/intent prior to the end of the word (see the sketch below)
Privacy: all data stays at the edge (inference and training on-prem with consumer-grade equipment)
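A sketch of the model shape this slide describes: an LSTM that emits a probability distribution over vocabulary states for every audio frame, so the most likely word/intent is available before the word ends. Aural2 itself is written in Go against a TensorFlow compute graph (per the slide); for consistency with the face-classification code later in this deck, the sketch below uses Python and tf.keras, and every feature shape, layer size, and vocabulary entry is an assumption.

```python
# Hypothetical per-frame sound-state classifier in the style of Aural2.
# (Not Aural2's actual code; shapes and vocabulary are invented.)
import numpy as np
import tensorflow as tf

FRAMES_PER_SEC = 30          # one prediction per frame -> "negative latency"
N_MEL_FEATURES = 40          # assumed spectrogram features per frame
VOCAB = ["silence", "play", "stop", "louder"]  # one model per vocabulary

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, N_MEL_FEATURES)),      # variable-length stream
    tf.keras.layers.LSTM(128, return_sequences=True),  # state carried frame to frame
    tf.keras.layers.Dense(len(VOCAB), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Training on labeled ten-second clips: 10 s x 30 frames/s = 300 frames,
# each with a state label, as in Aural2's clip-labeling workflow.
clips = np.random.rand(8, 10 * FRAMES_PER_SEC, N_MEL_FEATURES).astype("float32")
labels = np.random.randint(0, len(VOCAB), size=(8, 10 * FRAMES_PER_SEC))
model.fit(clips, labels, epochs=1, verbose=0)

# Streaming inference: probabilities for every frame, so the leading intent
# can be read off before the utterance finishes.
probs = model.predict(clips[:1], verbose=0)   # shape (1, 300, len(VOCAB))
print(VOCAB[int(probs[0, -1].argmax())])
```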

  14. Face-Emotion Classification (repo)
Face-classification workload:
◦ Keras 2.x convolutional neural network; TensorFlow 1.3; computer vision: OpenCV 3.1
◦ Programming languages: Python 3, OpenCV native bindings
◦ Demos: face-emotion classification from a video file, and from a webcam publishing to an Intu topic
Tie-in with Intu:
◦ Service: publishes messages to the blackboard topic Emotion ("Angry", 1)
◦ W.I.P.: an Agent inspects topic messages and persists the user's emotional state
Deep learning models (an inference sketch follows this slide):
◦ Adaptive-boosting frontal-face detector by Rainer Lienhart (OpenCV)
◦ Keras Xception model with 7 classes (angry, disgust, fear, happy, sad, surprise, neutral)
Training:
◦ Model trained on the Kaggle Facial Expression Recognition Challenge dataset (FER2013)
◦ Training time on 2x NVIDIA GRID K2 GPUs: 6-20 hours depending on training parameter settings
◦ For scale: GRID K2: 2 x 2.2 TFLOPS (FP32); GTX 1080: 1 x 9.0 TFLOPS (1080 Ti: 1 x 11.3 TFLOPS)
GitHub repo for the open-source face-classification build: https://github.com/open-horizon/cogwerx-face-classification-tx2/
Face-classification codebase with Intu publish: https://github.com/ig0r/face_classification/tree/intu
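A minimal inference sketch of the pipeline this slide describes: OpenCV's Haar-cascade frontal-face detector feeds grayscale crops into a 7-class Keras emotion model. The slide's repo targets Keras 2.x / TensorFlow 1.3; this sketch uses the modern tf.keras API for brevity, and the model filename and input size are assumptions. See the linked repos for the real preprocessing and the Intu publish step.

```python
# Sketch: detect faces with OpenCV's Haar cascade, classify emotion with a
# Keras model. Model path and input size are assumed, not from the repo.
import cv2
from tensorflow.keras.models import load_model

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = load_model("emotion_xception.h5")   # hypothetical trained FER2013 model
input_size = model.input_shape[1:3]         # e.g. (64, 64) grayscale

cap = cv2.VideoCapture(0)                   # webcam; or pass a video file path
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], input_size)
        face = face.astype("float32") / 255.0
        face = face.reshape(1, *input_size, 1)   # batch, H, W, channels
        label = EMOTIONS[int(model.predict(face, verbose=0).argmax())]
        print(label)  # e.g. publish (label, confidence) to the Emotion topic
cap.release()
```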

  15. Intu: Embodiment Setup*: Jetson TX2 (recap of slide 12, with the same setup steps and workload diagram). Video: see the 2018 NVIDIA/IBM Edge Webinar.
* External repo for Self setup on the NVIDIA Jetson TX2: https://github.com/chrod/self-jetsontx2/wiki/Getting-Started
