Age At Home: Deep Learning & IoT in Elder Care
http://age-at-home.mybluemix.net/ | http://ageathome.slack.com/
David C Martin (dcmartin@us.ibm.com), Dima Rekesh (dima.rekesh@optum.com)
Internet-of-Things growing fast: an estimated 30 billion to 200+ billion connected devices by 2020.
Unspoken challenge (chart): 10B+ connected devices in 2016 growing to 100B+ by 2020; at the current support ratio of 1:1,000 the additional staffing required is +100M, versus 10M+ at a ratio of 1:10,000.
Criteria for success
1. Autonomous transacting agents
2. Decentralized, governed economy
3. Distributed, untrusted environment
"…autonomous software agents outside of human control will participate in 5% of all economic transactions. The result of the prevalence of metacoin platforms will be the emergence of a fully programmable economy operating beyond the control of any single centralized institution or government. It is the metacoin platform that enables automatic enforcement of conditions in a fully distributed and untrusted environment." – Gartner
BlueHorizon: private, secure, opt-in, auditable value-exchange
1. Tax-free economy
2. Open, standards-based community
3. Technology independence
4. Supply-chain transparency
http://bluehorizon.network
Market opportunity
• Aging population (65+): US 44MM, India 65MM, China 119MM (chart: percentage of population 65 years and older for Japan, Italy, Germany, Ireland, China, Brazil, US, India, Egypt, and the world)
• Need for assistance: 70+ > 10%, 75+ > 20%, 80+ > 30%
• IBM addressable market (2015): $10BB by 2025
http://www-03.ibm.com/able/
https://www.census.gov/population/socdemo/statbriefs/agebrief.html
http://www-03.ibm.com/able/news/bolzano_video.html
Minimum viable product (MVP): utilize image analysis to recognize entities (e.g. person) and scenes (e.g. kitchen), determine whether the scene is normal or abnormal (e.g. no person in the kitchen by 10 AM), and trigger an action (e.g. alert that Grandma and Grandpa didn't get up). Pipeline: Analyze Scene → Determine normal vs. abnormal → Trigger Action.
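The MVP trigger rule can be sketched as a simple check. This is a hypothetical Python sketch, not the project's actual code; the cutoff time, event tuple shape, and function names are all assumptions:

```python
from datetime import datetime, time

CUTOFF = time(10, 0)  # hypothetical "normal" cutoff: someone seen by 10 AM

def should_alert(events, now):
    """events: list of (timestamp, scene, entity) tuples observed today.

    Returns True when the cutoff has passed and no person was seen in the
    kitchen beforehand (the abnormal condition from the slide above)."""
    if now.time() < CUTOFF:
        return False  # too early to decide
    seen = any(scene == "kitchen" and entity == "person"
               for ts, scene, entity in events
               if ts.time() <= CUTOFF)
    return not seen

# Example: only the cat was seen in the kitchen all morning -> alert
events = [(datetime(2017, 6, 1, 8, 30), "kitchen", "cat")]
print(should_alert(events, datetime(2017, 6, 1, 10, 5)))  # True
```

A real deployment would evaluate this rule continuously against the event stream and route the alert through a notification channel.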
Components
• Jetson TX2 (1)
• Raspberry Pi 3 (3)
• PlayStation 3 Eye camera: 75° or 56° FOV, multi-array microphone, speaker
• WiFi
• LINUX, Docker, motion package
• DIGITS v5, Caffe v1, JetPack v3
• …and IBM Watson cloud
System overview (diagram): PlayStation 3 Eye USB camera → Raspberry Pi 3 (motion) → Jetson TX2 (DIGITS) → Bluemix (Watson Visual Recognition, Watson Analytics, Cloudant, DIGITS on 2× K80).
System details
• Raspberry Pi 3 & PlayStation 3 Eye
  • LINUX, Docker, motion package
  • Image resolution: 640x480 (8-bit color)
  • Field-of-view: 75° (interior) or 56° (exterior)
• IBM Watson cognitive services
  • Alchemy image recognition (deprecated)
  • VisualInsights (retired)
  • Watson Visual Recognition (current)
• Device installations
  • Locations: kitchen, bathroom, road
  • DB: rough-fog, damp-cloud, quiet-water
• Event identifier (YYMMDDHHMMSS-##-##)
  • Date & time components (Y,M,D,H,M,S)
  • ALCHEMY: top result {entity,model,score}
  • VISUAL: image ID and results array [{entity,model,score},…]
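Parsing the event identifier described above is straightforward. A hypothetical Python sketch (the slide does not say what the two numeric suffixes mean, so they are returned as opaque integers; the function name is an assumption):

```python
import re
from datetime import datetime

# Matches the YYMMDDHHMMSS-##-## event-identifier format from the slide.
EVENT_ID = re.compile(r"^(\d{12})-(\d{2})-(\d{2})$")

def parse_event_id(event_id):
    """Split an event identifier into (timestamp, suffix1, suffix2)."""
    m = EVENT_ID.match(event_id)
    if not m:
        raise ValueError("not a YYMMDDHHMMSS-##-## identifier: %r" % event_id)
    ts = datetime.strptime(m.group(1), "%y%m%d%H%M%S")
    return ts, int(m.group(2)), int(m.group(3))

ts, a, b = parse_event_id("170601093000-01-02")
print(ts.isoformat(), a, b)  # 2017-06-01T09:30:00 1 2
```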
Process overview (diagram): transacting peers (full & lite) exchange value through an escrow under a governor; transactions are recorded on a distributed, shared blockchain ledger and governed by smart contracts.
Public process details (CLIENT / CLOUD, IoTF / MQTT)
1. Developer creates app:
   I. Client (RPi) container
   II. Cloud (x86) container
   III. Bluemix Container service
   IV. Bluemix CF app (e.g. Node.js)
   V. Register in AppStore
2. Public becomes Owner:
   I. Visit public application
   II. Acquire Raspberry Pi (DIY)
   III. Select "app" from Store
3. Owner offers Client run-time (BlueHorizon)
4. Developer accepts (v$/hr)
5. Owner deploys container
A. Developer offers Cloud run-time
B. Owner accepts (v$/hr)
C. Developer deploys container
Jupyter Notebook Exploration By Peter Parente (pparent@us.ibm.com) https://gist.github.com/parente/7db992fae487d6e665e7b7dca841ffa2
Watson Analytics exploration
1. Replication from Cloudant into dashDB
2. SQL select, project, join in dashDB
3. Import into Watson Analytics
4. Refine & visualize
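Step 2 (select, project, join in dashDB) can be illustrated with a small Python sketch; SQLite stands in for dashDB here, and the table and column names are hypothetical, not the project's actual schema:

```python
import sqlite3

# In-memory stand-in for dashDB after replication from Cloudant.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE events (event_id TEXT, device TEXT, entity TEXT, score REAL);
CREATE TABLE devices (device TEXT, location TEXT);
INSERT INTO events VALUES ('170601093000-01-02', 'rough-fog', 'person', 0.93);
INSERT INTO devices VALUES ('rough-fog', 'kitchen');
""")

# Select (filter rows), project (choose columns), join (events to locations).
rows = db.execute("""
SELECT e.event_id, d.location, e.entity, e.score
  FROM events e JOIN devices d ON e.device = d.device
 WHERE e.score > 0.5
""").fetchall()
print(rows)  # [('170601093000-01-02', 'kitchen', 'person', 0.93)]
```

The joined, projected result is what gets imported into Watson Analytics for visualization in steps 3 and 4.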
Signal quality
• Images classified generically (rooms, colors, …)
• Many false negatives (no entity detected)
• Uncontrolled vocabulary and taxonomy
• Need training exemplars
  • Collect in-situ images
  • Build taxonomy, ontology
  • Train recognition algorithm
• Attenuate iteratively
  • Curate training data
  • Improve signal:noise
  • Increase Top-1 accuracy
All vendors fail out-of-the-box: Microsoft, Amazon, Google.
Lesson 1: you must learn
The default (OOTB) model is always insufficient:
• Too noisy and sparse for traditional data cleansing
• Prior examples not specific to application context
• Information content critically dependent on signal quality
Avoid a priori limitations or filters on sensor input:
• Empirical data must be collected, curated, and learned
• Define positive and negative conditions or outcomes
• Manual "human-in-the-loop" curation into positive and negative
• Training, testing, and refinement of learnings (e.g. classifier)
• Deployment and QA/QC metric monitoring
Generic learning cycle
Ingestion and preparation
• Asynchronous ingestion
  • Periodic "changes" from Cloudant
  • Collect (FTP) from local RPi devices
• Collate into hierarchy
  • Store inventory (Cloudant)
• Dynamic presentation
  • Display images from inventory
  • Time limit (YYYYMMDDHHMMSS)
  • Count (1-20)
  • Class (e.g. all, person)
• Classification actions
  • Negative class in red (e.g. kitchen)
  • Positive class in green (e.g. person)
  • Created class in blue (e.g. dog, cat)
  • SKIP – more than one class
• Update inventory & repeat
  • Remove bad examples
  • Squash vs. crop
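The asynchronous "changes" ingestion above can be sketched as follows, assuming the standard CouchDB-style `_changes` feed that Cloudant exposes; the account, database, and function names are hypothetical, and no network call is made here:

```python
import json

def changes_url(account, db, since="0"):
    """Build a Cloudant _changes URL, resuming from a saved sequence token."""
    return ("https://%s.cloudant.com/%s/_changes?include_docs=true&since=%s"
            % (account, db, since))

def apply_changes(inventory, changes):
    """Fold a _changes response into the local inventory; return last_seq
    so the next poll can resume where this one left off."""
    for row in changes.get("results", []):
        doc = row.get("doc")
        if doc and not row.get("deleted"):
            inventory[row["id"]] = doc
        else:
            inventory.pop(row["id"], None)
    return changes.get("last_seq", "0")

# Example payload shaped like a CouchDB/Cloudant _changes response.
payload = json.loads('{"results": [{"id": "170601093000-01-02",'
                     ' "doc": {"class": "person"}}], "last_seq": "5-abc"}')
inventory = {}
since = apply_changes(inventory, payload)
print(since, inventory)  # 5-abc {'170601093000-01-02': {'class': 'person'}}
```

Persisting `last_seq` between polls is what makes the ingestion incremental rather than a full re-scan.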
Curate – Model – Analyze: repeat. Review and label samples to train Watson (e.g. dog, cat, household members).
Summary Activity Analysis
Building DL models with DIGITS
General, built in cloud (2× K80):
• GoogLeNet
• ImageNet
• 1MM images
• 1000 classes
• 224x224 images
• 44 hours (4 GPU)
• 73% Top-1; 91% Top-5
Specific, built on-premise (TX2):
• Base model from cloud
• End-user selected images
• 1033 images (776 / 257)
• 11 classes
• 224x224 samples
• 18 minutes 4 s (1 GPU)
• 90.1% Top-1; 99.6% Top-5
Challenges – Jetsons
• Versioning hell (which version of OpenCV? which version of Caffe?)
• Performance: full-HD and 4K frame processing, anyone?
• Decoding of incoming streams cannot be done on the CPU
• Latency of video streams
• Generation of training sets / effort required from human trainers
Dockerizing Jetson TX1 / TX2
• Docker is the way applications get distributed today
• The developer-productivity increase is immense
• nvidia-docker is not yet supported on the Jetson, so we had to do a little bit of reverse engineering:
  • Install Docker
  • Get the CUDA / cuDNN / driver versions right
  • A USB3 HD or direct-attached SSD is a must for development
Focus on moving objects
• Usually, moving objects are the most interesting
• When your camera is stationary, you can use traditional CV to compute changed pixels with ease
• Must consider both long-term and short-term motion (front of cat vs. behind cat)
• Now you can draw bounding boxes around these moving objects and send them to a classic image-classification network
• The total number of pixels processed is greatly reduced; the image is downscaled for motion detection
• Image-classification networks are fast, lean, very accurate, and support thousands of classes
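The stationary-camera approach above can be sketched with frame differencing. A minimal, hypothetical Python sketch using plain lists in place of real image frames (a real pipeline would use OpenCV on downscaled frames; the threshold is an assumption):

```python
THRESHOLD = 30  # per-pixel intensity change considered "motion" (assumed)

def changed_pixels(prev, curr):
    """Return (row, col) coordinates whose intensity changed significantly."""
    return [(y, x)
            for y, row in enumerate(curr)
            for x, v in enumerate(row)
            if abs(v - prev[y][x]) > THRESHOLD]

def bounding_box(pixels):
    """Smallest (left, top, right, bottom) box enclosing the changed pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

# A "moving object" appears in rows 2-4, cols 3-5 of an 8x8 frame.
prev = [[0] * 8 for _ in range(8)]
curr = [row[:] for row in prev]
for y in range(2, 5):
    for x in range(3, 6):
        curr[y][x] = 255
box = bounding_box(changed_pixels(prev, curr))
print(box)  # (3, 2, 5, 4) -> crop this region and send it to the classifier
```

Only the cropped box goes to the classification network, which is why the total pixel count processed stays small.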
Benefits of working with moving objects
• You get the bounding boxes – and the pixels – for free
• Now you can train a variety of networks: YOLO, DetectNet, FCNs
• The network guesses the label; a human corrects only when needed
• Once the network becomes good at recognizing objects, you no longer need to ask the human to classify them – just focus on more granular classes
• The trained networks can then be placed on moving cameras and learn to recognize immobile objects in frames
Future work
• LSTMs for more accurate motion detection
• FCNs for more accurate labeling
• Multi-label, crowdsourced tags
• Integration with time-series data
• Performance optimizations
DIGITS Caffe training: GENERAL 73% Top-1, 91% Top-5; SPECIFIC 90.1% Top-1, 99.6% Top-5.
Scenarios (example interactions):
• "David, did you just wake up?"
• "Breakfast! How about a yogurt?"
• "David, did you take your medications?"
• "Take your low-dose aspirin; small, round, and yellow, in the yellow and green Bayer bottle."
• "Time to check your blood sugar!"
• "You should eat now that you've taken your medications!"
Lesson 2: context defines success
• Curation affects outcome
  • Variability in human detection, e.g. missed "cat"
  • Defined, discovered, and previously unknown classes (e.g. "dog" or David)
• Optimize objective functions "top-down"
  • Focus curation on utility and expectation (e.g. notification satisfaction)
  • Cascade positives and negatives to subordinate domains (e.g. image-capture interval)
  • Enable dynamic subsystem controls (e.g. sensor trigger)
• Any definition should be fact, not opinion, estimate, or historical statistic
• Consider all non-defined parameters as learning inputs or outputs
Many layers of learning (diagram): detect → locate → identify → describe → understand, illustrated with example classes such as car, horse, dog, bike, cat, bottle, person, and more.