Predictive View Generation to Enable Mobile 360-degree and VR Experiences
Xueshi Hou, Sujit Dey (Mobile Systems Design Lab, Center for Wireless Communications, UC San Diego); Jianzhong Zhang, Madhukar Budagavi (Samsung Research America)


  1. Predictive View Generation to Enable Mobile 360-degree and VR Experiences. Xueshi Hou, Sujit Dey (Mobile Systems Design Lab, Center for Wireless Communications, UC San Diego); Jianzhong Zhang, Madhukar Budagavi (Samsung Research America)

  2. Motivation: Towards a Truly Mobile VR Experience
  § Goal: enable a wireless, lightweight VR experience
  § Observation: existing head-mounted displays (HMDs) have limitations
  - Rendering on the mobile device: clunky to wear
  - Rendering with a tethered PC attached to the HMD: not mobile
  § How to make it mobile and portable, i.e. wireless and lighter?
  § Solution: shift computing tasks (e.g. rendering) to the edge/cloud and stream video to the HMD
  § Example cloud-based solution, streaming only the field of view (FOV):
  1. Transmit head motion and control to the cloud
  2. Render the field of view on the cloud
  3. Transmit the rendered video to the VR glasses

  3. Challenges of Cloud/Edge-based Wireless VR
  § VR head-mounted devices make the requirements much steeper than for cloud/edge-based video streaming
  - User experience is much more sensitive to video artifacts, so significantly higher video quality is needed
  - Head motion significantly increases latency sensitivity, so a significantly higher frame rate and bitrate are needed
  § Experiment setup: VR space created using Unity; VR HMD: Oculus Rift DK2; video: H.264, 1080p/4K, GOP=30

  Display Device | Head Motion | Frame Rate & QP | Acceptable Latency | Virtual Classroom Bitrate, 1080p / 4K (Mbps) | Racing Game Bitrate, 1080p / 4K (Mbps)
  PC Monitor     | No          | 45 fps, QP=20   | 100-200 ms (Virtual Classroom), <100 ms (Racing Game) | 5.8 / 14.5 | 16.6 / 41.5
  Oculus         | No          | 45 fps, QP=15   | 28 ms | 10.9 / 27.3 | 33.9 / 84.8
  Oculus         | Yes         | 75 fps, QP=15   | 22 ms | 28.2 / 70.5 | 39.7 / 99.3

  Note: for a Virtual Classroom with 50 students, the aggregate bitrate needed for 4K exceeds 3.5 Gbps.
  § Takeaway: with head motion, cloud/edge-based wireless VR requires a very high frame rate and bitrate, and must also satisfy ultra-low latency.
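A quick arithmetic check of the 3.5 Gbps note above (a minimal sketch; the per-student rate of 70.5 Mbps is the 4K, 75 fps Oculus Virtual Classroom figure from the table, and the assumption that all 50 students stream that encoding simultaneously is ours):

```python
# Rough aggregate-bitrate check for the 50-student Virtual Classroom note.
# Assumption: every student streams the 4K, 75 fps Oculus encoding (~70.5 Mbps each).
per_student_mbps = 70.5
students = 50
total_gbps = per_student_mbps * students / 1000
print(f"Aggregate bitrate: {total_gbps:.2f} Gbps")  # ~3.5 Gbps, consistent with the slide
```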

  4. Solution for Ultra-low Latency: Machine Learning Based Predictive Pre-Rendering
  § Possible Method 1: render the 360-degree video on the cloud, transmit it to the RAN edge, and extract the FOV at the edge depending on head motion
  - Advantage: low computation overhead on the edge device
  - Problem: very high (backhaul) data rate
  § Possible Method 2: render the 360-degree video on the edge device and extract the FOV depending on head motion
  - Advantage: theoretically low (backhaul) data rate
  - Problem: restricted to edge devices with very high computation capability
  [Figure: FOV extraction from a rendered 360-degree view]

  5. Solution for Ultra-low Latency: Machine Learning Based Predictive Pre-Rendering
  § Solution: based on head motion prediction, pre-render and stream the predicted FOV in advance from the edge device
  § Advantages:
  - Latency: no rendering/encoding delay and minimal communication delay, with significantly reduced bandwidth
  - The edge can be at the RAN or local; it can even be a mobile device
  § System overview for the proposed approach:
  [Figure: (a) using a Mobile Edge Computing node (MEC) connected to the glasses over a cellular connection; (b) using a Local Edge Computing node (LEC) connected over WiFi/millimeter wave. In both cases the cloud server and controller exchange data and control with the edge node, which performs predictive FOV generation and streams video to the glasses.]
  § Question: is it possible to predict head motion?

  6. Predictive View Generation to Enable Mobile 360-degree and VR Experiences: Early Experiments with Samsung Dataset
  § Motivation: address both the bandwidth and latency challenges
  § Common approach to reduce bandwidth: streaming only the FOV; this still cannot address the latency problem
  [Figure: equirectangular projection of a 360-degree view in Euler coordinates (x: -180° to 180°, y: -90° to 90°), with an FOV of roughly 90° x 90°]
  § System overview for the proposed approach:
  [Figure: same MEC (a) / LEC (b) architecture as on the previous slide]

  7. § Idea: predictive view generation approach – only the predicted view is extracted (for 360-degree video) or rendered (in the case of VR) and transmitted in advance; the viewpoint refers to the center of the FOV. A sketch of the tile mapping is given below.
  [Figure: equirectangular frame (x: -180° to 180°, y: -90° to 90°) divided into 30° x 30° tiles, showing the viewpoint and its surrounding FOV]
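A minimal sketch of how a viewpoint (yaw, pitch) could be mapped to a tile index on the 12x6 grid of 30° x 30° tiles; the function name, grid ordering, and angle conventions are illustrative assumptions, not the authors' code:

```python
def viewpoint_to_tile(yaw_deg: float, pitch_deg: float,
                      tiles_x: int = 12, tiles_y: int = 6) -> int:
    """Map a viewpoint (center of the FOV) to a tile index on an equirectangular grid.

    Assumes yaw_deg is in [-180, 180) and pitch_deg is in [-90, 90);
    with the default 12x6 grid, each tile spans 30 deg x 30 deg.
    """
    col = int((yaw_deg + 180.0) // (360.0 / tiles_x)) % tiles_x
    row = int((pitch_deg + 90.0) // (180.0 / tiles_y)) % tiles_y
    return row * tiles_x + col  # index in [0, tiles_x * tiles_y)

# Example: a viewpoint slightly right of center and looking upward
print(viewpoint_to_tile(15.0, 40.0))
```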

  8. Predictive View Generation to Enable Mobile 360-degree and VR Experiences: Early Experiments with Samsung Dataset
  § Setup: Samsung Gear VR, sampling frequency f = 5 Hz
  § Dataset: head motion traces from over 36,000 viewers for 19 360-degree/VR videos, collected over 7 days
  § Tiling options: 12x6 tiles (30° x 30°), 18x6 tiles (20° x 30°), etc.
  § VR dataset statistics:
  - Over 80% of videos have a duration longer than 100 s
  - Around 85% of videos have more than 1,000 viewers
  [Figure: CDFs of video duration and number of viewers per video]
  § Head motion speed: a boxplot of head motion speed (°/s) versus time for over 1,500 viewers during the first 60 s of Kong VR shows that viewers may change viewing direction quickly and frequently, which makes head motion prediction challenging
  [Figure: head motion speed versus time in Kong VR (min, 25th percentile, median, 75th percentile, max)]

  9. Predictive View Generation to Enable Mobile 360-degree and VR Experiences
  § An attention heatmap is defined as the series of probabilities that a viewpoint falls within each tile, computed over n viewers during a time period from ts1 to ts2 (a sketch of the computation follows below)
  [Figure: example attention heatmap]
  § Brighter tiles attract more attention: viewers are more likely to look at these areas
  § This suggests the feasibility of viewpoint prediction, since some areas attract more attention than the rest of the 360-degree view
  § However, multiple tiles (as many as 11) can have relatively high probabilities (>5%), indicating the difficulty of predicting the viewpoint accurately
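A minimal sketch of how such an attention heatmap could be computed from the head-motion traces; the trace format (per-viewer timestamp-to-tile mappings) is an assumption for illustration, not the dataset's actual layout:

```python
import numpy as np

def attention_heatmap(tile_traces, ts1, ts2, n_tiles=72):
    """Per-tile probability that a viewpoint falls within the tile during [ts1, ts2).

    tile_traces: list of per-viewer dicts mapping timestamp (s) -> tile index
                 (assumed format for this sketch).
    """
    counts = np.zeros(n_tiles)
    total = 0
    for trace in tile_traces:
        for t, tile in trace.items():
            if ts1 <= t < ts2:
                counts[tile] += 1
                total += 1
    return counts / total if total else counts
```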

  10. Viewpoint Prediction using Deep Learning
  § Goal: predict the viewpoint position (tile) 200 ms in advance
  § Model: multi-layer long short-term memory (LSTM) network
  § Input features: tile-based one-hot encoding of the viewpoint trace as a 72x10 matrix (72 tiles, 10 timestamps over 2 s); see the encoding sketch below
  § Label for training: whether the viewpoint belongs to each tile, as a 72x1 matrix
  § Output: probability of the viewpoint belonging to each of the 72 tiles
  § Example: given the viewpoint trace during t ∈ (3, 5] s, where is the viewpoint at t = 5.2 s (200 ms afterwards)?
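A minimal sketch of the tile-based one-hot input encoding described above (a 72x10 matrix for 10 samples over 2 s at 5 Hz); the function name and trace representation are illustrative assumptions:

```python
import numpy as np

def encode_trace(tile_indices, n_tiles=72):
    """One-hot encode a 2-second viewpoint trace sampled at 5 Hz.

    tile_indices: sequence of 10 tile indices, one per 200 ms sample.
    Returns an n_tiles x 10 matrix with one column per timestamp.
    """
    x = np.zeros((n_tiles, len(tile_indices)), dtype=np.float32)
    for t, tile in enumerate(tile_indices):
        x[tile, t] = 1.0
    return x

# Example: a viewer dwelling on tile 41, then drifting to tile 42
print(encode_trace([41] * 7 + [42] * 3).shape)  # (72, 10)
```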

  11. Viewpoint Prediction using Deep Learning (continued)
  § Same goal, model, input features, label, and output as on the previous slide
  § Example: for viewpoint traces during t ∈ (3, 5) s, the model outputs per-tile probabilities such as 0.37, 0.21, 0.11, 0.10, 0.05, 0.04, 0.03, with the remaining tiles below 0.01

  12. Viewpoint Prediction using Deep Learning
  § Dataset: head motion traces of 36,000 viewers over 7 days for 19 360-degree/VR videos
  § Training data: 45,000 head motion traces (each 2 s long)
  § Test data: 5,000 head motion traces (from viewers not in the training data)
  § LSTM parameters: first layer with 128 LSTM units, second layer with 128 LSTM units, fully connected layer with 72 nodes, softmax output (a sketch of this architecture is given below)
  [Figure: two stacked LSTM layers over the viewpoint features (one trace point every 200 ms), followed by a fully connected layer, a softmax layer, and the predicted viewpoint]
  § We explore four deep learning and classical machine learning models for viewpoint prediction: LSTM, stacked sparse autoencoders (SAE), bootstrap-aggregated decision trees (BT), and weighted k-nearest neighbors (kNN)
  - SAE: two fully connected layers with 100 and 80 nodes, respectively
  - BT: ensembles of 30 bagged decision trees
  - kNN: 100 nearest neighbors
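A minimal Keras sketch matching the architecture listed above (two LSTM layers of 128 units, a 72-node fully connected layer, and a softmax output); the framework choice, input orientation (time-major sequence of one-hot vectors), and training settings are assumptions, not the authors' implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Input: 10 timestamps (2 s at 5 Hz), each a 72-dimensional one-hot tile vector.
model = models.Sequential([
    layers.Input(shape=(10, 72)),
    layers.LSTM(128, return_sequences=True),  # first LSTM layer, 128 units
    layers.LSTM(128),                         # second LSTM layer, 128 units
    layers.Dense(72),                         # fully connected layer, 72 nodes
    layers.Softmax(),                         # per-tile probability 200 ms ahead
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```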

  13. Predictive View Generation: Accuracy and Bitrate
  § FOV generation method (a sketch follows below):
  - Select the m tiles with the highest probabilities predicted by the LSTM model
  - Compose the predicted FOV as the combination of the FOVs for each selected tile
  - Transmit the predicted FOV in high quality while leaving the rest of the tiles blank
  § FOV prediction accuracy: the probability that the actual user view falls within the predicted FOV
  - It depends on both the LSTM model accuracy and the FOV generation method, and thus reflects the performance of both
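A minimal sketch of the top-m tile selection and the FOV prediction accuracy metric described above; the helper names, and the simplification of representing the actual user view by the tile containing its viewpoint, are assumptions for illustration:

```python
import numpy as np

def select_fov_tiles(tile_probs, m):
    """Return the indices of the m tiles with the highest predicted probabilities."""
    return set(np.argsort(tile_probs)[::-1][:m])

def fov_prediction_accuracy(predicted_fovs, actual_tiles):
    """Fraction of samples where the actual viewpoint tile lies inside the predicted FOV.

    predicted_fovs: list of sets of tile indices covered by each predicted FOV.
    actual_tiles:   list of tile indices the viewer actually looked at (one per sample).
    """
    hits = sum(actual in fov for fov, actual in zip(predicted_fovs, actual_tiles))
    return hits / len(actual_tiles)
```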
