The BaBL Project
Real-Time Closed-Captioning for WebRTC
Luis Villaseñor Muñoz (lvillase@hawk.iit.edu)
30th April 2014
BaBL, version 1.0: Project Goal
To develop a proof-of-concept WebRTC conference application that uses WebRTC's data channel to transmit real-time captions.
BaBL, version 1.0: Final result
BaBL, version 2.0: Project Goal
To develop a WebRTC multiconference application with extra features built on speech recognition: real-time closed-captioning, instant translation, and transcription storage.
BaBL, version 2.0: Milestones
• Multiconference WebRTC application
• Real-time closed-captioning
• Instant translation
• Transcription storage
Multiconference WebRTC application
• WebRTC, what is it? WebRTC is a free, open project that enables web browsers with Real-Time Communications capabilities.
• Its goal: to enable rich, high-quality RTC applications to be developed in the browser via simple JavaScript APIs and HTML5.
[1] As stated on WebRTC.org.
WebRTC APIs
• MediaStream: for acquiring audio and video.
• RTCPeerConnection: for transmitting audio and video.
• RTCDataChannel: for transmitting data.
MediaStream
• navigator.getUserMedia(constraints, successCallback, errorCallback);
[2] Figure by Justin Uberti and Sam Dutton.
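A minimal sketch of the call above, assuming the prefixed navigator.getUserMedia of the time (modern browsers use navigator.mediaDevices.getUserMedia). The "localVideo" element id is an illustrative assumption, not from the project; the browser call is guarded so the sketch is inert outside a browser.

```javascript
// Pure helper: build the constraints object for an audio/video call.
function buildConstraints(wantAudio, wantVideo) {
  return { audio: wantAudio, video: wantVideo };
}

// Browser-only part, guarded so the sketch does nothing outside a browser.
if (typeof navigator !== "undefined" && navigator.getUserMedia) {
  navigator.getUserMedia(
    buildConstraints(true, true),
    function (stream) {
      // Attach the local stream to a <video> element (id is hypothetical).
      document.getElementById("localVideo").src = URL.createObjectURL(stream);
    },
    function (err) {
      console.error("getUserMedia error:", err);
    }
  );
}
```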
RTCPeerConnection
• Signaling: session description, ICE, STUN, TURN…
• Media engines: codecs, echo cancellation, noise reduction, jitter buffering…
• Security: HTTPS, SRTP, DTLS…
WebRTC architecture
[1] Figure from WebRTC.org.
Signaling server
• Node.js: web server and signaling server, fully implemented in JavaScript.
• Socket.IO: Node.js module that enables WebSockets between the clients and the server.
Calling: The establishment
1. User A downloads the webpage (HTTP/HTTPS) and creates a room (WebSocket).
2. User B downloads the webpage, calls getUserMedia, and joins the room (WebSocket).
3. The server notifies User A that a new user has joined (WebSocket), and User A calls getUserMedia.
4. Both users create a PeerConnection; User A calls createOffer and the offer is relayed to User B through the server (WebSocket).
5. User B calls createAnswer and the answer is relayed back (WebSocket).
6. ICE candidates are exchanged through the server (WebSocket).
7. Media streams flow directly between the peers (SRTP).
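The exchange above can be sketched as a toy, in-memory simulation: the WebSocket signaling server is replaced by a plain array that relays SDP-like messages, since real RTCPeerConnection objects are browser-only. The message shapes are illustrative assumptions.

```javascript
// Toy simulation of the offer/answer exchange. The "server" is just an
// array recording the messages it would relay between the two browsers.
function simulateCall() {
  const log = []; // messages relayed by the signaling "server"
  const relay = (msg) => log.push(msg);

  // User A creates an offer and sends it through the signaling server.
  relay({ from: "A", type: "offer", sdp: "offer-sdp" });
  // User B answers through the same channel.
  relay({ from: "B", type: "answer", sdp: "answer-sdp" });
  // Both sides trickle ICE candidates as they are gathered.
  relay({ from: "A", type: "candidate", candidate: "a-candidate" });
  relay({ from: "B", type: "candidate", candidate: "b-candidate" });

  return log.map((m) => m.type);
}

console.log(simulateCall()); // ["offer", "answer", "candidate", "candidate"]
```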
Calling: The mesh (Room A, Room B)
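In a full mesh, every participant keeps a direct peer connection to every other participant in the room, so an n-person room needs n*(n-1)/2 connections in total, and each browser encodes and uploads its media n-1 times. A one-line sketch of that cost:

```javascript
// Number of peer connections in a full mesh of n participants.
function meshConnections(n) {
  return (n * (n - 1)) / 2;
}

console.log(meshConnections(4)); // 6
```

This quadratic growth is why mesh topologies only scale to small rooms.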
Calling: ICE/STUN/TURN
• Interactive Connectivity Establishment (ICE): RFC 5245; gathers candidate IP addresses and ports. [3]
• Session Traversal Utilities for NAT (STUN): request and response.
• Traversal Using Relays around NAT (TURN): a STUN extension that relays media; useful but resource-intensive.
Multiconference accomplished!
Real-time closed-captioning
• Web Speech API (SpeechRecognition interface): for converting the voice into text.
• WebRTC data channel: for sending the text to the other peers.
Web Speech API
• Another HTML5 API: specification by the W3C.
• Only implemented in Chrome: the voice is sent to Google's speech recognition web service, which returns a JSON object with a list of possible matches. Google uses it for voice search: https://www.google.com/
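A sketch of continuous recognition with the Chrome-prefixed SpeechRecognition interface of the time. The recognition part is guarded because it only exists in the browser; the helper that flattens a results list into one caption string is pure and shown separately.

```javascript
// Pure helper: join the top alternative of each result into one string.
function captionFrom(results) {
  return results.map((r) => r[0].transcript).join(" ");
}

// Browser-only: Chrome exposed the interface as webkitSpeechRecognition.
if (typeof window !== "undefined" && window.webkitSpeechRecognition) {
  const rec = new webkitSpeechRecognition();
  rec.continuous = true;     // keep listening across utterances
  rec.interimResults = true; // emit partial hypotheses for low latency
  rec.onresult = function (event) {
    // event.results is array-like; Array.from makes it mappable.
    console.log(captionFrom(Array.from(event.results)));
  };
  rec.start();
}
```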
RTCDataChannel
• Bidirectional, peer-to-peer: very low latency.
• Secure: Datagram Transport Layer Security (DTLS).
• Unreliable or reliable: a trade-off between latency and accuracy.
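A sketch of that latency/accuracy trade-off when creating a channel for captions. The option names follow the RTCDataChannel init dictionary: an unordered, zero-retransmit channel gives up reliability for latency, which suits interim captions that are immediately superseded anyway. The "captions" label is an illustrative assumption.

```javascript
// Pure helper: pick data-channel options for the desired trade-off.
function channelOptions(reliable) {
  return reliable
    ? { ordered: true }                      // reliable, ordered (the default)
    : { ordered: false, maxRetransmits: 0 }; // lossy but low-latency
}

// Browser-only: create the channel on a peer connection.
if (typeof RTCPeerConnection !== "undefined") {
  const pc = new RTCPeerConnection();
  const captions = pc.createDataChannel("captions", channelOptions(false));
  captions.onopen = () => captions.send("subtitles ready");
}
```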
Challenges
• Subtitles should be switched on/off by the remote user: we send the remote user's requests through the signaling server.
• Continuous recognition: we keep a list of users requesting subtitles.
• Microphone permission: we use HTTPS.
Architecture
1. User B requests subtitles (signaling server).
2. The signaling server forwards the request to User A.
3. User A's voice is sent to the Google server.
4. The Google server returns the subtitles.
5. User A sends the subtitles to User B.
Real-time closed-captioning accomplished!
Transcription storage
• Keeping a record of our conversations: text is much lighter than audio or video, and easier to search!
• IndexedDB: one more HTML5 API; local storage on the client side.
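A sketch of saving caption lines in IndexedDB. The database and object-store names ("babl", "transcripts") and the record shape are illustrative assumptions, not from the project; the IndexedDB calls are guarded because the API only exists in the browser.

```javascript
// Pure helper: shape one transcript entry.
function transcriptRecord(room, speaker, text) {
  return { room: room, speaker: speaker, text: text, at: Date.now() };
}

// Browser-only: open (and on first run, create) the local database.
if (typeof indexedDB !== "undefined") {
  const open = indexedDB.open("babl", 1);
  open.onupgradeneeded = function () {
    open.result.createObjectStore("transcripts", { autoIncrement: true });
  };
  open.onsuccess = function () {
    const tx = open.result.transaction("transcripts", "readwrite");
    tx.objectStore("transcripts").add(transcriptRecord("room1", "A", "hello"));
  };
}
```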
Transcription storage accomplished!
Instant translation
• Online translation services: they are not free.
• Microsoft Translator API: 2 million characters per month for free.
Challenges
• Requests should go through the server: my private developer key can't be in the client side.
• When to request the translation? On the isFinal flag. Not quite real-time, but much cheaper!
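The isFinal gating above can be sketched as a pure filter: interim hypotheses are shown locally as captions, but only results flagged as final are sent off for the per-character, paid translation. The result shape here is an illustrative assumption.

```javascript
// Keep only recognition results worth paying to translate.
function resultsToTranslate(results) {
  return results.filter((r) => r.isFinal).map((r) => r.text);
}

console.log(
  resultsToTranslate([
    { text: "hel", isFinal: false },
    { text: "hello wor", isFinal: false },
    { text: "hello world", isFinal: true },
  ])
); // ["hello world"]
```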
Architecture
1. User B requests subtitles (signaling server).
2. The signaling server forwards the request to User A.
3. User A's voice is sent to the Google server.
4. The Google server returns the subtitles.
5. User A sends the subtitles to the signaling server.
6. The server forwards the subtitles to the Microsoft Translator server.
7. The Translator server returns the translated subtitles.
8. The server delivers the translated subtitles to User B.
“Real-time” translation accomplished!
Wait! Last-minute add-on!
Spoken translated subtitles
• Speech Synthesis API: the other interface included in the Web Speech API. Chrome has some built-in speech engines.
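A sketch of speaking a translated caption with the Speech Synthesis API. Picking a voice by language prefix is a pure, testable helper; the speechSynthesis calls are guarded because they only exist in the browser, and the sample text is illustrative.

```javascript
// Pure helper: pick the first voice whose language starts with the prefix.
function pickVoice(voices, lang) {
  return voices.find((v) => v.lang.indexOf(lang) === 0) || null;
}

// Browser-only: speak a translated caption with a matching voice.
if (typeof speechSynthesis !== "undefined") {
  const utter = new SpeechSynthesisUtterance("Hola mundo");
  const voice = pickVoice(speechSynthesis.getVoices(), "es");
  if (voice) utter.voice = voice;
  speechSynthesis.speak(utter);
}
```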
Spoken translated subtitles accomplished!
Conclusion
• Not perfect: programmed by just one person, using free resources, and these technologies are still under development.
• With a little more time and a few more resources, sci-fi won't be sci-fi anymore!
References
• [1] Google Chrome team. WebRTC.org. http://www.webrtc.org/ [Online; accessed 30-April-2014]
• [2] Justin Uberti and Sam Dutton. WebRTC. http://io13webrtc.appspot.com/ [Online; accessed 30-April-2014]
• [3] J. Rosenberg. Interactive Connectivity Establishment (ICE). RFC 5245. https://tools.ietf.org/html/rfc5245 [Online; accessed 30-April-2014]
Questions?
Acknowledgements
• Don Monte and Nishant Agrawal
• Elias Yousef
• Javier Monte Condeoliva and Miguel Camacho Ruiz
• Tania Arenas de la Rubia
• Carol Davids
Thank you