A 3D-Advert Creation System for Product Placements
ADAPT SFI Research Centre, Trinity College Dublin, Ireland
Introduction
• Common contextual advertising platforms use information provided by users to integrate 2D visual ads into videos.
• Existing platforms face many technical challenges, such as ad integration with respect to occluding objects and 3D ad placement.
• Growing video demand and the increase in user-generated videos create additional challenges for advertisement and marketing agencies.
Contribution
• A 3D-advertisement creation system that can automatically analyze different depth layers in a video sequence and seamlessly integrate new 3D objects with proper occlusion handling.
Advert’s Workflow
[Pipeline diagram] Stages: Input Video; Interactive Rough Segmentation; Rough Segmentation Mask; Mask Propagation; Mono Depth Estimation; Plane Localization; Plane & Camera Tracking; Background Plate Reconstruction; Foreground Matting; Foreground Colours; 3D Model Render. These stages produce an object/advert layer, an occluding foreground layer, and a background layer, which are combined into the Composite Output.
Monocular Depth Estimation
* Aim: Monocular depth estimation is used to understand the 3D geometry of the scene and to anchor the 3D plane on which the object will be placed.
* Hu, J., Ozay, M., Zhang, Y. and Okatani, T., 2019. Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1043-1051.
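The slides cite Hu et al. (WACV 2019) as the depth estimator; as an off-the-shelf illustration only, the sketch below uses the MiDaS model published on PyTorch Hub (a swapped-in alternative, not the system's model) to get a per-pixel depth map for one frame.

```python
# A minimal sketch, assuming the MiDaS model from PyTorch Hub as a
# stand-in for the depth estimator cited above.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")  # downloads pretrained weights
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(img)  # resize + normalise to the model's input

with torch.no_grad():
    pred = midas(batch)
    # Upsample the prediction back to the original frame resolution.
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()
# 'depth' is a relative (inverse) depth map: enough to separate depth
# layers and to anchor a planar surface for the advert.
```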
Camera Tracking
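The slides do not detail which tracker is used; purely as a generic illustration, the sketch below estimates the frame-to-frame homography of a planar region with ORB features and RANSAC (OpenCV), one common way to keep a placed object locked to a plane as the camera moves.

```python
# A generic plane-tracking sketch (not the system's tracker): ORB feature
# matching plus a RANSAC homography between consecutive frames.
import cv2
import numpy as np

def track_plane(prev_gray, next_gray):
    """Estimate the homography mapping the plane from one frame to the next."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(next_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects matches on moving foreground objects.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```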
Interactive Segmentation
* Aim: Allow users to decide which part of the scene is causing the occlusion, and provide broader control over tracking the occluding object across the entire video.
* Oh, S.W., Lee, J.Y., Xu, N. and Kim, S.J., 2019. Fast user-guided video object segmentation by interaction-and-propagation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5247-5256.
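The system uses Oh et al.'s interaction-and-propagation networks; the sketch below only illustrates the propagation idea with classical dense optical flow (OpenCV), a simple stand-in rather than the cited method.

```python
# A toy mask-propagation sketch using Farneback optical flow, standing in
# for the learned propagation network cited above.
import cv2
import numpy as np

def propagate_mask(prev_gray, next_gray, prev_mask):
    """Warp a binary occluder mask from the previous frame onto the next one."""
    # Flow from next to prev lets us backward-warp: for each pixel of the
    # new frame, look up where it came from in the previous frame.
    flow = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_mask, map_x, map_y, cv2.INTER_NEAREST)
```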
Background Reconstruction
* Aim: Background reconstruction is used to recover the foreground layer and to produce the final composite image.
* Kim, D., Woo, S., Lee, J.Y. and Kweon, I.S., 2019. Deep Video Inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5792-5801.
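The cited method is Deep Video Inpainting (Kim et al., 2019); as a much simpler per-frame stand-in, the sketch below recovers a background plate with OpenCV's classical inpainting.

```python
# A per-frame background-plate sketch using classical inpainting, in place
# of the deep video inpainting method cited above.
import cv2

frame = cv2.imread("frame.png")
mask = cv2.imread("occluder_mask.png", cv2.IMREAD_GRAYSCALE)  # 255 where the occluder is

# Fill the occluded region from surrounding pixels to get a clean plate.
background_plate = cv2.inpaint(frame, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
cv2.imwrite("background_plate.png", background_plate)
```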
Foreground Matting
Aim: Reconstruct the transparency mask of the foreground occlusion layer. This is used to seamlessly recomposite fine details such as hair and effects such as motion blur.
[Figure: Frame, Rough Mask, Detailed Mask, Background]
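The quantity being recovered is the per-pixel alpha in the standard compositing (matting) equation, a textbook identity rather than anything specific to this system:

```latex
% Observed colour I_p at pixel p as a blend of foreground F_p and
% background B_p, weighted by the transparency \alpha_p \in [0, 1]:
I_p = \alpha_p F_p + (1 - \alpha_p) B_p
```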
Foreground Matting
• We introduce a Background-Aware Generative Adversarial Network to estimate alpha channels.
• Unlike conventional methods, this architecture is designed to accept a 7-channel volume: the first 3 channels contain the RGB image, the next 3 channels contain the RGB background information, and the last channel contains the trimap (a sketch of this input layout follows below).
• Preliminary experiments using the trained model indicate a significant improvement in the accuracy of the alpha mattes compared to the state of the art.
* For more information please refer to: Javidnia, H. and Pitié, F., 2020. Background Matting. arXiv preprint arXiv:2002.04433.
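The sketch below is a minimal toy generator showing only the 7-channel input layout described above; the layer sizes and depths are illustrative assumptions, not the authors' architecture.

```python
# A toy encoder-decoder illustrating the 7-channel input volume
# (RGB frame + RGB background + trimap) -> 1-channel alpha matte.
import torch
import torch.nn as nn

class BackgroundAwareGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 64, kernel_size=3, stride=2, padding=1),  # 7 input channels
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # alpha values in [0, 1]
        )

    def forward(self, frame, background, trimap):
        # frame, background: (B, 3, H, W); trimap: (B, 1, H, W)
        x = torch.cat([frame, background, trimap], dim=1)  # (B, 7, H, W)
        return self.decoder(self.encoder(x))
```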
Foreground Matting Evaluation
Test Set: 40 images from our synthetic and Adobe Matting datasets

| Method              | MSE         | SAD        | Gradient   | Connectivity |
| Our Model           | 0.027169554 | 7.30600487 | 10.572783  | 6.52185248   |
| Deep Matting [1]    | 0.045339207 | 16.3519901 | 22.7727221 | 16.4433491   |
| DCNN [4]            | 0.078826554 | 19.6619429 | 28.9060183 | 20.469791    |
| IndexNet [3]        | 0.050014134 | 14.0422988 | 20.0859051 | 13.2744296   |
| InformationFlow [2] | 0.060489726 | 16.9163196 | 23.1831811 | 17.0637133   |
| ClosedForm [5]      | 0.068729347 | 17.7465063 | 28.2863385 | 18.1190624   |

References
[1] N. Xu, B. Price, S. Cohen, and T. Huang, “Deep image matting,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2970–2979.
[2] Y. Aksoy, T. Ozan Aydin, and M. Pollefeys, “Designing effective inter-pixel information flow for natural image matting,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 29–37.
[3] H. Lu, Y. Dai, C. Shen, and S. Xu, “Indices Matter: Learning to Index for Deep Image Matting,” arXiv preprint arXiv:1908.00672, 2019.
[4] D. Cho, Y.-W. Tai, and I. Kweon, “Natural image matting using deep convolutional neural networks,” in European Conference on Computer Vision, 2016, pp. 626–643.
[5] A. Levin, D. Lischinski, and Y. Weiss, “A closed-form solution to natural image matting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 2, pp. 228–242, 2007.
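For reference, the first two columns can be computed as below. This is a hedged sketch of the standard definitions (mean squared error of the alpha values; SAD conventionally reported divided by 1000), not the exact evaluation script behind the table.

```python
# Standard matte error metrics between predicted and ground-truth alphas.
import numpy as np

def matte_errors(pred_alpha, gt_alpha):
    """MSE and SAD between alpha mattes with values in [0, 1]."""
    diff = pred_alpha.astype(np.float64) - gt_alpha.astype(np.float64)
    mse = np.mean(diff ** 2)
    sad = np.sum(np.abs(diff)) / 1000.0  # SAD is conventionally reported /1000
    return mse, sad
```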
Foreground Matting Evaluation
[Figure: qualitative comparison of alpha mattes from methods [2], [3], [4], [1], and ours]
Demo System
[Architecture diagram]
Front-end (User Interface): Aurelia.js, WebGL, Three.js
Back-end: Flask, serving the Occlusion Service, Tracking Service, and User Service
Languages: Python, JS, HTML, CSS
Misc: D3.js, Bootstrap, Font Awesome, Mousetrap, Popper.js, Shepherd.js, Whammy.js, ...
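As a minimal sketch of how such a back-end service could be exposed with the Flask stack named above: the route name and payload fields here are hypothetical, not the demo's actual API.

```python
# A hypothetical Flask endpoint sketch for one back-end service.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/occlusion/propagate", methods=["POST"])  # hypothetical route
def propagate():
    payload = request.get_json()
    frame_ids = payload.get("frames", [])  # hypothetical field: frames to process
    # ... run mask propagation on the requested frames ...
    return jsonify({"status": "ok", "frames": frame_ids})

if __name__ == "__main__":
    app.run(port=5000)
```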
UI
[Figure: user interface screenshot with panels A, B, C]
UI – Camera Tracking
Processing Speed

| Component                                                    | Time per frame | Processed frames |
| Depth Estimation                                             | 155ms          | one frame        |
| Camera Tracking                                              | 500-4000ms     | all frames       |
| Interactive Segmentation Edit                                | 10ms           | selected frames  |
| Interactive Segmentation Propagation                         | 23ms           | occluded frames  |
| Foreground/Background Layers (total incl. IO, networking, …) | ~1000ms        | occluded frames  |
| > Background Reconstruction                                  | 70ms           | occluded frames  |
| > Foreground Matting (new)                                   | 50ms           | occluded frames  |
| > Foreground Colours                                         | 50ms           | occluded frames  |
| Final Compositing                                            | IO bounded     | all frames       |
Results
Thank You!
www.adaptcentre.ie