IT²EC 2020 Digital Twins to Computer Vision Presentation

Digital twins to computer vision: A rapid path to augmented reality object detection on the battlefield

C. Wythe¹, N. Fisher², S. Bobrov³, B. Russell⁴, J. Alessi⁵, C. Gallagher⁶ and J. Throckmorton⁷

¹ Chief Revenue Officer / Principal Investigator, Cape Henry Associates, Virginia Beach, USA
² Systems Engineer, Cape Henry Associates, Virginia Beach, USA
³ Principal Architect, Independent, San Francisco, USA
⁴ Software Engineer, Cape Henry Associates, Virginia Beach, USA
⁵ Technologist, Independent, Virginia Beach, USA
⁶ Sr. Multimedia Developer, Cape Henry Associates, Virginia Beach, USA
⁷ Technology Solutions Architect, KOVA Global, Virginia Beach, USA

Abstract: Acquiring real-life training data for object identification and training on the battlefield is a costly and time-consuming task that requires human intervention. We lay the hypothetical foundation for rapidly developing Artificial Intelligence (AI) object recognition models based solely on available 3-Dimensional (3D) models to provide rapid and accurate battlefield object detection.

1 Introduction

Several studies have shown the efficacy of leveraging 3D models for object identification in AI model training when used in conjunction with real-life imagery. We expand upon these studies and explore the possibility of leveraging only 3D models and synthetic images to train AI models for object recognition on the battlefield. Successful object detection and classification using AI algorithms is highly dependent on the availability of training data (e.g. labelled images).

Although large repositories of labelled images exist and continue to be generated for research purposes, most labelling is generalized to object types and not to the level of specificity that would produce useful object detection algorithms for battlefield applications. Real-world training imagery is scarce, and labelling is a time-intensive human-in-the-loop event. In practice, thousands of images of two similar but distinct items of interest (e.g. M1A1 Abrams Tank vs. Panther Tank) are required to efficiently train an AI model to a high level of confidence. We explore the practicality of using existing high-fidelity gaming objects and future digital twins for rapid, automated generation of high-volume AI model training data, so that AI-powered applications that improve battlefield situational awareness can be deployed sooner.

1.2 Problem Statement

Sufficient quantities of military-specific labelled images for AI algorithm training do not currently exist, are difficult to obtain, and are time-consuming to label. Two questions arise. Can high-fidelity 3D rendered objects, which exist in large quantity for virtual training environments, be used for the automated development of AI training data? Can they be used successfully to train AI models to identify different types of objects with a high degree of specificity and discrimination between similar object classes?

3 Approach

We would select high-fidelity 3D models of multiple battlefield vehicles found in online gaming object repositories and generate several thousand synthetic images to serve as the training data set for our AI models. We would automate image generation and labelling through custom scripts within the 3D environment to reduce human interaction, performance errors and cost. Upon completion of AI model training, we would test and document algorithm performance against video footage of battlefield operations.

3.1 Training Image Generation

We first developed a 3D scene in Unity in which to place our target 3D objects. We then scripted an animation to modify environmental elements of the scene such as weather conditions and lighting. Computer code was developed to automate the export of synthetic imagery together with the corresponding labels and XML annotations of object coordinates. The AI algorithms were trained with 10,000 synthesized images from each vehicle 3D model. The synthesized images of the target objects inherited object bounding boxes, XML annotations and class labels automatically via programmed code during the image generation process. The image capture process produced a variety of backgrounds, camera angles, weather conditions, light conditions (obscuration), and obstructions (occlusion).

3.2 AI Model Training Process
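The training process consumes the image and annotation pairs produced in Section 3.1. As a rough illustration of that automated labelling step, the Python sketch below samples randomized scene settings, derives a 2D bounding box from projected object corners, and writes an XML annotation. It is not the authors' Unity script: the function names, the parameter ranges, the example class label and the Pascal VOC-like XML layout are all assumptions made for illustration, since the source does not specify the annotation schema.

```python
import random
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class BoundingBox:
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def sample_scene(rng):
    """Pick randomized scene settings (hypothetical names and ranges)."""
    return {
        "weather": rng.choice(["clear", "rain", "fog", "snow"]),
        "sun_elevation_deg": rng.uniform(5.0, 85.0),
        "camera_azimuth_deg": rng.uniform(0.0, 360.0),
    }


def project_box(corners, width, height):
    """Axis-aligned box around projected (x, y) corners, clamped to the image."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return BoundingBox(
        xmin=max(0, int(min(xs))),
        ymin=max(0, int(min(ys))),
        xmax=min(width - 1, int(max(xs))),
        ymax=min(height - 1, int(max(ys))),
    )


def annotation_xml(filename, width, height, label, box):
    """Serialize one labelled object in an assumed VOC-like XML layout."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    obj = ET.SubElement(root, "object")
    ET.SubElement(obj, "name").text = label
    bnd = ET.SubElement(obj, "bndbox")
    for field in ("xmin", "ymin", "xmax", "ymax"):
        ET.SubElement(bnd, field).text = str(getattr(box, field))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    scene = sample_scene(rng)
    # Corner pixels would come from the renderer; these values are made up.
    corners = [(412.3, 220.8), (655.1, 218.4), (640.9, 391.0), (-12.0, 402.7)]
    box = project_box(corners, width=640, height=480)
    print(scene)
    print(annotation_xml("tank_000001.png", 640, 480, "m1a1_abrams", box))
```

Because the label and box are computed from the scene itself, no human needs to draw or verify boxes, which is the property the approach relies on to reach thousands of images per vehicle.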
Fig. 1. AI Pipeline

For our AI pipeline, we used AI models from the NVIDIA NGC Transfer Learning Toolkit (TLT) and processed them on on-premise NVIDIA DGX hardware.

3.3 Experimentation Hardware Utilized

Model Training: NVIDIA DGX-1 Deep Learning Server with eight Tesla V100 GPUs.
Synthetic Image Production: Custom workstation with NVIDIA GTX GPU.

4 Future Work

Further experimentation is needed to add object classes and to continue studying the efficiency and confidence levels obtained with additional neural network types. Planned experimentation includes testing the limits of discrete variance identification in objects like those from additive crew-served weapons. We will also work toward successful deployment of the developed object detection algorithms on devices for practical application (e.g. live video feeds, aerial drone footage, Microsoft HoloLens, and mobile phones).

5 Conclusions

At the time of abstract submission, experiments are ongoing. Preliminary results show promise for the viability of this approach.

Acknowledgements

The authors would like to thank the NVIDIA DGX team for their support during this effort.

References

[1] Su, Hao, et al. "Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[2] Peng, Xingchao, et al. "Learning deep object detectors from 3D models." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[3] Xiang, Yu, et al. "ObjectNet3D: A large scale database for 3D object recognition." European Conference on Computer Vision. Springer, Cham, 2016.

Author/Speaker Biographies

Chuck Wythe
Chuck is the Chief Revenue Officer for Cape Henry Associates and actively leads teams as a principal investigator on R&D efforts. Chuck has extensive technical experience in the fields of manpower analysis, training content development, artificial intelligence, machine learning, augmented reality, training devices, and simulation.

Nosika Fisher
Nosika's primary background is in SaaS systems integrations and data science. Currently she is a Systems Engineer at Cape Henry Associates, where she leads artificial intelligence and machine learning projects, including the productization of FogLifter™, a stand-alone mobile artificial intelligence framework for high-volume machine learning computations.

Sergey Bobrov
Sergey's main focus areas are machine learning, data ingestion platforms, and building infrastructure and models for rapid data analysis. Sergey previously worked on the IoT platform Xively (acquired by Google), architecting and building REST API back-ends for identity management, authorization of publish/subscribe messages, and IoT domain description and management.

Brandon Russell
Brandon is a software engineer at Cape Henry Associates who is heavily involved in the field of artificial intelligence. His main areas of work involve building end-to-end pipelines that leverage machine learning and deep learning to provide intelligent insights on data.

Jeremy Alessi
Jeremy is a technologist with 25 years of experience architecting full-stack software solutions in many fields including gaming, simulation, AR, VR, AI, mobile, streaming, fin-tech, med-tech, blockchain, and transportation. Jeremy has written software that has been used by tens of millions of end-users and has written and spoken extensively on various subjects in the field of software technology.

Chris Gallagher
Chris has spent the last 12 years working in the field of high-end computer-generated imagery. Recently, Chris has been developing augmented, virtual, and mixed reality using Unity's core platform and its High Definition Render Pipeline.

Joel Throckmorton
Joel is a Technology Solutions Architect for KOVA Global and has worked in the defense industry for over 16 years. His most recent work has been in the development of the Lighthouse™ and FogLifter™ platforms, technology stack integrations and AI platform implementations for the US Navy and Defense Intelligence Agency.