In-Home Daily-Life Captioning Using Radio Signals
Lijie Fan*, Tianhong Li*, Yuan Yuan, Dina Katabi
MIT CSAIL
* denotes equal contribution
How can I make sure grandma is fine?
Daily Life Captioning:
08:30am: Grandma wakes up and leaves the bedroom.
10:30am: Grandma takes medicine and eats breakfast.
02:00pm: Grandma is watching TV.
Cameras are not acceptable: they raise privacy issues in the home.
[Figure: camera]
How can we do daily-life captioning?
What about Radio-Frequency (RF) Signals?
[Figure: RF device]
RF signals are privacy-preserving… but are capable of capturing people's movements and activities.
[Figure: RGB video vs. RF signals]
Challenge I. Object Information
RF signals alone do not reveal the objects a person interacts with.
Solution I. Skeleton + Floormap
[Figure: RF signal → skeleton generation network → skeleton]
[Figure: floormap illustration. The RF device and the furniture and appliances (shelf, table, wardrobe, sink, stove, sofa, dishwasher, TV, bed, fridge, window) are placed on an X-Y plane.]
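The slides do not say how the floormap is represented or how it is combined with the skeleton. A minimal sketch, assuming each object category is rasterized into its own binary channel on an X-Y grid (0.5 m cells, origin at one corner of the room) and that a skeleton is given as its joints projected onto the floor; the names `FloorObject`, `rasterize_floormap`, and `objects_near_skeleton` are hypothetical. It only illustrates why the floormap helps: the skeleton says where the person is, and the floormap says which objects are there.

```python
from dataclasses import dataclass

# Object categories read off the floormap illustration (naming is illustrative).
CATEGORIES = ["shelf", "table", "wardrobe", "sink", "stove", "sofa",
              "dishwasher", "tv", "bed", "fridge", "window"]


@dataclass
class FloorObject:
    """Axis-aligned footprint of one object, in meters (assumed coordinate frame)."""
    category: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def rasterize_floormap(objects, width_m, depth_m, cell_m=0.5):
    """Return one binary grid per category, indexed as grid[category][row][col]."""
    cols = int(width_m / cell_m)
    rows = int(depth_m / cell_m)
    grid = [[[0] * cols for _ in range(rows)] for _ in CATEGORIES]
    for obj in objects:
        channel = grid[CATEGORIES.index(obj.category)]
        for row in range(rows):
            for col in range(cols):
                # Mark the cell if its center lies inside the object's footprint.
                cx, cy = (col + 0.5) * cell_m, (row + 0.5) * cell_m
                if obj.x_min <= cx <= obj.x_max and obj.y_min <= cy <= obj.y_max:
                    channel[row][col] = 1
    return grid


def objects_near_skeleton(grid, joints, cell_m=0.5, radius_cells=1):
    """List categories occupying any cell within radius_cells of the skeleton's center.

    joints: (x, y) positions of the skeleton's joints projected onto the floor.
    """
    cx = sum(x for x, _ in joints) / len(joints)
    cy = sum(y for _, y in joints) / len(joints)
    row0, col0 = int(cy / cell_m), int(cx / cell_m)
    found = []
    for name, channel in zip(CATEGORIES, grid):
        rows = range(max(row0 - radius_cells, 0), min(row0 + radius_cells + 1, len(channel)))
        cols = range(max(col0 - radius_cells, 0), min(col0 + radius_cells + 1, len(channel[0])))
        if any(channel[r][c] for r in rows for c in cols):
            found.append(name)
    return found


# Usage with a made-up room: a table and a bed, and a person standing by the table.
room = [FloorObject("table", 1.0, 1.0, 2.0, 2.0), FloorObject("bed", 4.0, 3.0, 6.0, 5.0)]
grid = rasterize_floormap(room, width_m=6.0, depth_m=5.0)
skeleton = [(1.25, 2.0), (1.75, 2.5)]
print(objects_near_skeleton(grid, skeleton))  # ['table']
```

One channel per category keeps object identity explicit, so a downstream network (or, here, a simple lookup) can tell a table from a bed at the same location.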
Challenge II. No existing RF captioning dataset!
Can we leverage existing RGB video captioning datasets?
Solution II. Multi-modal Feature Alignment
[Figure: An RF+floormap feature extraction network encodes the RF signal together with the floormap, and a video spatial encoder encodes both paired and unpaired videos. For paired data (RF signals with their paired video), the RF+floormap features are aligned with the video features by an L2 loss; for unpaired data (videos without RF signals), a separate alignment loss is used.]
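The slide names an L2 loss for paired data but does not specify the unpaired alignment loss. A minimal sketch, assuming features are fixed-length vectors and that the L2 loss is the mean squared Euclidean distance between the RF+floormap feature and the video feature of the same clip. For unpaired data we swap in a linear-kernel maximum mean discrepancy (the squared distance between the two batches' mean features) purely as an illustrative assumption; it is not necessarily the loss RF-Diary uses. All function names are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def paired_l2_loss(rf_feats, video_feats):
    """Mean squared L2 distance between RF+floormap and video features of the same clips.

    rf_feats[i] and video_feats[i] must come from the same recording (paired data).
    """
    if len(rf_feats) != len(video_feats) or not rf_feats:
        raise ValueError("paired batches must be non-empty and the same size")
    return sum(squared_l2(r, v) for r, v in zip(rf_feats, video_feats)) / len(rf_feats)


def mean_vector(feats):
    """Element-wise mean of a non-empty batch of feature vectors."""
    return [sum(column) / len(feats) for column in zip(*feats)]


def linear_mmd_loss(rf_feats, video_feats):
    """Linear-kernel MMD: squared distance between the two batches' mean features.

    Stand-in for the unpaired alignment loss; the batches need not be paired or equal-sized.
    """
    return squared_l2(mean_vector(rf_feats), mean_vector(video_feats))


rf = [[1.0, 0.0], [0.0, 1.0]]                          # RF+floormap features (made up)
video_paired = [[1.0, 1.0], [0.0, 1.0]]                # features of the paired videos
video_unpaired = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # features of unrelated videos
print(paired_l2_loss(rf, video_paired))     # 0.5
print(linear_mmd_loss(rf, video_unpaired))  # 0.5
```

The paired loss compares clip by clip, which is only possible when RF and video were recorded together; the unpaired stand-in compares batch statistics, so it can use videos from an RGB captioning dataset that have no RF counterpart. How the two terms are weighted against the captioning objective is not given on the slide.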
RF-Diary System Structure
RF-Diary can caption people's daily life at home…
[Figure panels: RF signals, RGB video, floormap]
RF-Caption: "A person enters the kitchen. He takes off his clothes, sits at table and starts playing laptop."
Even when the lights are off…
[Figure panels: RF signals, RGB video (not applicable), floormap]
RF-Caption: "A person walks to the kitchen. He then pours water into a cup and drinks from it."
Quantitative Results
Summary
• RF-Diary enables captioning people's daily life in their homes.
• RF-Diary uses radio signals as input to address the privacy issues of cameras.
• RF-Diary achieves results comparable to camera-based captioning and keeps working under poor lighting or occlusion.
For more information, please visit our webpage: http://rf-diary.csail.mit.edu