SLIDE 1

The BaBL project

Real-Time Closed-Captioning for WebRTC

Luis Villaseñor Muñoz lvillase@hawk.iit.edu 30th April 2014

SLIDE 2

BaBL, version 1.0: Project Goal

To develop a proof-of-concept WebRTC conference application that uses the WebRTC data channel to transmit real-time captions.

SLIDE 3

BaBL, version 1.0: Final result

SLIDE 4

BaBL, version 2.0: Project Goal

To develop a WebRTC multiconference application with extra features based on speech recognition, such as real-time closed-captioning, instant translation, and transcription storage.

SLIDE 5

BaBL, version 2.0: Milestones

  • Multiconference WebRTC application
  • Real-Time Closed-Captioning
  • Instant translation
  • Transcription storage

SLIDE 6

Multiconference WebRTC application

  • WebRTC, what is it?

WebRTC is a free, open project that enables web browsers with Real-Time Communications capabilities.

  • Its goal:

To enable rich, high quality, RTC applications to be developed in the browser via simple Javascript APIs and HTML5.

[1] As stated in WebRTC.org.

SLIDE 7

WebRTC APIs

  • MediaStream:

For acquiring audio and video.

  • RTCPeerConnection:

For transmitting audio and video.

  • RTCDataChannel:

For transmitting data.

SLIDE 8

MediaStream

  • navigator.getUserMedia(constraints, successCallback, errorCallback);

[2] Figure by Justin Uberti and Sam Dutton.
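
As a minimal sketch of the getUserMedia call above: the helper names `buildConstraints` and `acquireMedia` are hypothetical, and the `nav` parameter is an assumption that lets the code be exercised outside a browser. It tries the promise-based `navigator.mediaDevices.getUserMedia` first and falls back to the callback form shown on the slide.

```javascript
// Build the constraints object that getUserMedia expects.
function buildConstraints(wantAudio, wantVideo) {
  return { audio: Boolean(wantAudio), video: Boolean(wantVideo) };
}

// Request media through whichever getUserMedia form is available.
// `nav` is injected (hypothetical parameter); in a page, pass window.navigator.
function acquireMedia(nav, constraints, onStream, onError) {
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function") {
    // Promise-based form found in current browsers.
    nav.mediaDevices.getUserMedia(constraints).then(onStream, onError);
    return;
  }
  // Callback form from the slide, including vendor-prefixed variants.
  const legacy = nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;
  if (typeof legacy !== "function") {
    onError(new Error("getUserMedia is not available"));
    return;
  }
  legacy.call(nav, constraints, onStream, onError);
}

// Browser usage (not executed here):
//   acquireMedia(navigator, buildConstraints(true, true),
//     (stream) => { videoElement.srcObject = stream; },
//     (error) => console.error(error));
```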

SLIDE 9

RTCPeerConnection

  • Signaling:

Session description, ICE, STUN, TURN…

  • Media engines:

Codecs, echo cancelation, noise reduction, jitter buffering…

  • Security:

HTTPS, SRTP, DTLS…

SLIDE 10

WebRTC architecture

[1] Figure from WebRTC.org.

SLIDE 11

Signaling server

  • Node.js:

Web server and signaling server. Fully implemented in JavaScript.

  • Socket.IO:

Node.js module that enables WebSocket connections between clients and server.
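
The room bookkeeping a signaling server of this kind needs can be sketched in memory, without any network code. This is an illustration under assumptions, not the project's implementation: the class `SignalingRelay`, its methods, and the message type `"new-user"` are hypothetical names, and the WebSocket wiring is replaced by plain `deliver` callbacks.

```javascript
// In-memory model of signaling rooms. Each peer registers a `deliver`
// callback that stands in for its WebSocket connection (an assumption).
class SignalingRelay {
  constructor() {
    this.rooms = new Map(); // roomId -> Map(peerId -> deliver callback)
  }

  // Add a peer to a room (creating the room if needed) and notify the
  // members already present. Returns the ids of those existing members.
  join(roomId, peerId, deliver) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    const room = this.rooms.get(roomId);
    const existing = Array.from(room.keys());
    for (const notify of room.values()) {
      notify({ type: "new-user", from: peerId });
    }
    room.set(peerId, deliver);
    return existing;
  }

  // Forward an offer, answer, or ICE candidate to one peer in the same room.
  relay(roomId, fromId, toId, message) {
    const room = this.rooms.get(roomId);
    if (!room || !room.has(fromId) || !room.has(toId)) return false;
    room.get(toId)(Object.assign({}, message, { from: fromId }));
    return true;
  }

  // Remove a peer and drop the room once it is empty.
  leave(roomId, peerId) {
    const room = this.rooms.get(roomId);
    if (!room) return;
    room.delete(peerId);
    if (room.size === 0) this.rooms.delete(roomId);
  }
}
```

The server only forwards messages; it never inspects session descriptions or candidates, which keeps it independent of the media path.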

SLIDE 12

Calling: The establishment

[Sequence diagram between User A, User B, and the Server. Messages shown: Download webpage (HTTP/HTTPS), getUserMedia, Create room (websocket), Join room (websocket), New user joined (websocket), PeerConnection, createOffer, Offer (websocket), createAnswer, Answer (websocket), ICE candidates (websocket), Media streams (SRTP).]
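
On the client side, the messages in this exchange can be dispatched by type. The sketch below is hypothetical: the message type names and `createSignalRouter` are invented for illustration, and it assumes the peer already in the room creates the offer when a new user joins. Each handler only records the WebRTC call it would make (`createOffer`, `setRemoteDescription`, `createAnswer`, `addIceCandidate`) rather than touching a real RTCPeerConnection.

```javascript
// Return a function that dispatches signaling messages by their `type`.
function createSignalRouter(handlers) {
  return function route(message) {
    const known = Object.prototype.hasOwnProperty.call(handlers, message.type);
    if (!known) return false; // ignore message types we do not handle
    handlers[message.type](message);
    return true;
  };
}

// Example wiring: record the step each message would trigger.
const steps = [];
const route = createSignalRouter({
  "new-user": (m) => { steps.push(`createOffer for ${m.from}`); },
  offer: (m) => { steps.push(`setRemoteDescription + createAnswer for ${m.from}`); },
  answer: (m) => { steps.push(`setRemoteDescription for ${m.from}`); },
  candidate: (m) => { steps.push(`addIceCandidate from ${m.from}`); },
});
```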

SLIDE 13

Calling: The mesh

[Figure: peers connected in a mesh within Room A and within Room B.]
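
Assuming each room is a full mesh, where every participant holds a direct peer connection to every other participant, the cost of adding users can be worked out as below. The function names are hypothetical.

```javascript
// Total peer connections in a full mesh of `participants` users: n(n-1)/2.
function meshConnectionCount(participants) {
  if (!Number.isInteger(participants) || participants < 0) {
    throw new RangeError("participants must be a non-negative integer");
  }
  if (participants < 2) return 0;
  return (participants * (participants - 1)) / 2;
}

// Connections (and outgoing media streams) each single participant maintains.
function connectionsPerPeer(participants) {
  return Math.max(participants - 1, 0);
}
```

With 4 participants the room holds 6 connections and each browser sends its media 3 times, which is why a mesh suits small rooms.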

SLIDE 14

Calling: ICE/STUN/TURN

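  • Interactive Connectivity Establishment (ICE):

RFC 5245 [3]. Gathers candidate addresses (IP and port) and tests them to find a working path between peers.

  • Session Traversal Utilities for NAT (STUN):

Request and response. Lets a peer learn its public address as seen from outside the NAT.

  • Traversal Using Relays around NAT (TURN):

STUN extension. Relays the media through a server when no direct path works. Useful but resource-intensive.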
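
A short sketch of how these pieces show up in code: the configuration object passed to `RTCPeerConnection` lists the STUN and TURN servers, and each gathered candidate line carries a `typ` field saying how it was obtained. The helper names and the server addresses are placeholders, not servers used by the project.

```javascript
// Build the configuration object for `new RTCPeerConnection(config)`.
// `turn` is optional: { url, username, credential }.
function buildIceConfig(stunUrl, turn) {
  const iceServers = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Read the candidate type from an ICE candidate line:
//   host  = local address, srflx = address learned through STUN,
//   prflx = address discovered during connectivity checks, relay = TURN.
function candidateType(candidateLine) {
  const match = /\styp\s(host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? match[1] : null;
}

// Placeholder addresses for illustration only.
const exampleConfig = buildIceConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-secret",
});
```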

SLIDE 15

Multiconference accomplished!

SLIDE 16

Real-time closed-captioning

  • Web speech API:

SpeechRecognition interface: For converting the voice into text.

  • WebRTC data channel:

For sending the text to the other peers.

SLIDE 17

Web Speech API

  • Another HTML5 API:

Specification by W3C.

  • Only implemented on Chrome:

The voice is sent to Google’s speech recognition web service. A JSON object with a list of possible matches is returned. They use it for voice searches: https://www.google.com/
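
A minimal sketch of reading captions from the SpeechRecognition interface. The function names are hypothetical, and the recognizer constructor is passed in as a parameter (an assumption made so the code can run outside a browser); in Chrome it would be `window.SpeechRecognition || window.webkitSpeechRecognition`.

```javascript
// Collect the results that changed in a SpeechRecognition "result" event.
// Each result is a list of alternatives; index 0 is the top-ranked match.
function changedCaptions(event) {
  const captions = [];
  for (let i = event.resultIndex; i < event.results.length; i += 1) {
    const result = event.results[i];
    captions.push({ text: result[0].transcript, isFinal: result.isFinal });
  }
  return captions;
}

// Configure a recognizer for continuous captioning and start it.
function startCaptioning(RecognitionCtor, lang, onCaption) {
  const recognition = new RecognitionCtor();
  recognition.continuous = true;      // keep listening after each phrase
  recognition.interimResults = true;  // deliver partial results as they form
  recognition.lang = lang;
  recognition.onresult = (event) => {
    changedCaptions(event).forEach((caption) => onCaption(caption));
  };
  recognition.start();
  return recognition;
}

// Browser usage (not executed here):
//   startCaptioning(window.SpeechRecognition || window.webkitSpeechRecognition,
//     "en-US", (caption) => console.log(caption.text, caption.isFinal));
```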

SLIDE 18

RTCDataChannel

  • Bidirectional peer to peer:

Really low latency.

  • Secure:

Datagram Transport Layer Security (DTLS).

  • Unreliable or reliable:

Choose lower latency (unreliable) or guaranteed delivery (reliable).
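
The reliable/unreliable choice is made through the options passed to `createDataChannel`. The sketch below shows one plausible pair of settings plus a JSON format for caption messages; the helper names, the channel label, and the message fields are assumptions, not the project's wire format.

```javascript
// Options for pc.createDataChannel(label, options).
// Unreliable: no ordering and no retransmission, so a late caption is dropped
// rather than delayed. Reliable: ordered delivery with retransmission.
function captionChannelOptions(reliable) {
  return reliable
    ? { ordered: true }
    : { ordered: false, maxRetransmits: 0 };
}

// Serialize a caption for channel.send().
function encodeCaption(speakerId, text, isFinal) {
  return JSON.stringify({ speakerId, text, isFinal });
}

// Parse a received caption; returns null for anything malformed.
function decodeCaption(raw) {
  try {
    const message = JSON.parse(raw);
    if (message === null || typeof message !== "object") return null;
    if (typeof message.text !== "string") return null;
    return {
      speakerId: message.speakerId,
      text: message.text,
      isFinal: Boolean(message.isFinal),
    };
  } catch (error) {
    return null;
  }
}

// Browser usage (not executed here):
//   const channel = pc.createDataChannel("captions", captionChannelOptions(false));
//   channel.onmessage = (e) => console.log(decodeCaption(e.data));
//   channel.send(encodeCaption("alice", "hello", true));
```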

SLIDE 19

Challenges

  • Subtitles should be switched on/off by the remote user

We send the remote user’s requests using the signaling server.

  • Continuous recognition

We keep a list of users requesting subtitles.

  • Microphone permission

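We use HTTPS, so Chrome remembers the microphone permission instead of asking again each time recognition restarts.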
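
The list of users requesting subtitles can be sketched as a set that starts recognition when the first request arrives and stops it when the last one is cancelled. The class and callback names are hypothetical; `onStart` and `onStop` stand in for starting and stopping the speech recognizer.

```javascript
// Track which remote peers currently want subtitles from the local speaker.
class SubtitleSubscribers {
  constructor(onStart, onStop) {
    this.peers = new Set();
    this.onStart = onStart; // called when the first peer subscribes
    this.onStop = onStop;   // called when the last peer unsubscribes
  }

  // A remote peer switched subtitles on.
  request(peerId) {
    const wasEmpty = this.peers.size === 0;
    this.peers.add(peerId);
    if (wasEmpty) this.onStart();
  }

  // A remote peer switched subtitles off (or left the room).
  cancel(peerId) {
    const removed = this.peers.delete(peerId);
    if (removed && this.peers.size === 0) this.onStop();
  }

  // Peers that should receive each caption over their data channel.
  recipients() {
    return Array.from(this.peers);
  }
}
```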

SLIDE 20

Architecture

  • 1. Subtitles request
  • 2. Subtitles request
  • 3. Voice
  • 4. Subtitles
  • 5. Subtitles

[Diagram: the numbered messages above flow between User A, User B, the Google server, and the signaling server.]

SLIDE 21

Real-time closed-captioning accomplished!

SLIDE 22

Transcription storage

  • Keeping record of our conversations:

Text is much lighter than audio or video. And easier to search!

  • IndexedDB:

One more HTML5 API. Local storage on the client side.
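
Storing one caption line in IndexedDB could look like the sketch below. The object store name `transcripts`, the record fields, and the function names are assumptions; the database handle is passed in so the function can be tried with a stand-in object outside a browser.

```javascript
// Shape of one stored caption line (hypothetical fields).
function makeTranscriptRecord(roomId, speakerId, text, timestamp) {
  return { roomId, speakerId, text, timestamp };
}

// Append a record to the "transcripts" object store of an open IDBDatabase.
function saveTranscriptLine(db, record, onDone, onError) {
  const tx = db.transaction("transcripts", "readwrite");
  tx.oncomplete = () => onDone(record);
  tx.onerror = () => onError(tx.error);
  tx.objectStore("transcripts").add(record);
}

// Browser usage (not executed here; database name "babl" is a placeholder):
//   const request = indexedDB.open("babl", 1);
//   request.onupgradeneeded = () =>
//     request.result.createObjectStore("transcripts", { autoIncrement: true });
//   request.onsuccess = () =>
//     saveTranscriptLine(request.result,
//       makeTranscriptRecord("room-a", "alice", "hello", Date.now()),
//       () => console.log("saved"), (e) => console.error(e));
```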

SLIDE 23

Transcription storage accomplished!

SLIDE 24

Instant translation

  • Translation services online:

They are not free.

  • Microsoft Translator API:

Free for up to 2 million characters per month.

SLIDE 25

Challenges

  • Should go through the server

My private developer key can’t be on the client side.

  • When to request the translation?

Only when the recognition result’s isFinal flag is set. Not so real-time, but much cheaper!
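
Both decisions can be sketched as a small gate on the client: only final results are forwarded, and the forwarding goes to the project's own server, which holds the developer key. The names `createTranslationGate` and `requestTranslation` are hypothetical, and the character counter is a simplification that does not reset each month.

```javascript
const MONTHLY_CHAR_QUOTA = 2000000; // free tier quoted on the slides

// `requestTranslation` is a hypothetical callback that sends the text to our
// own server (for example over the signaling socket); the server, not the
// browser, calls the translation service with the private key.
function createTranslationGate(requestTranslation, quota = MONTHLY_CHAR_QUOTA) {
  let used = 0;
  return {
    // Returns true when the caption was forwarded for translation.
    submit(caption, targetLang) {
      if (!caption.isFinal) return false;                   // skip interim results
      if (used + caption.text.length > quota) return false; // stay within quota
      used += caption.text.length;
      requestTranslation({ text: caption.text, to: targetLang });
      return true;
    },
    charactersUsed() {
      return used;
    },
  };
}
```

Waiting for final results means each phrase is translated once instead of once per interim update, which is where the saving comes from.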

SLIDE 26

Architecture

  • 1. Subtitles request
  • 2. Subtitles request
  • 3. Voice
  • 4. Subtitles
  • 5. Subtitles
  • 6. Subtitles
  • 7. Translated subtitles
  • 8. Translated subtitles

[Diagram: the numbered messages above flow between User A, User B, the Google server, the signaling server, and the Microsoft Translator server.]

SLIDE 27

“Real-time” translation accomplished!

SLIDE 28

Wait! Last-minute add-on!

SLIDE 29

Spoken translated subtitles

  • Speech Synthesis API:

The other interface included in the Web Speech API. Chrome has some built-in speech engines.
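
Speaking a translated caption with the speech synthesis interface takes only a few calls. In this sketch the function name is hypothetical, and the synthesizer and utterance constructor are passed in as parameters (an assumption that allows running it outside a browser); in a page they are `window.speechSynthesis` and `window.SpeechSynthesisUtterance`.

```javascript
// Queue a caption to be read aloud in the given language.
function speakCaption(synth, UtteranceCtor, text, lang) {
  if (!text) return null;          // nothing to say
  const utterance = new UtteranceCtor(text);
  utterance.lang = lang;           // e.g. "es-ES"; selects a matching voice
  synth.speak(utterance);
  return utterance;
}

// Browser usage (not executed here):
//   speakCaption(window.speechSynthesis, window.SpeechSynthesisUtterance,
//     "Hola a todos", "es-ES");
```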

SLIDE 30

Spoken translated subtitles accomplished!

SLIDE 31

Conclusion

  • Not perfect:

Programmed by just one person. Using free resources. These technologies are still under development.

  • A little more time, a few more resources:

And Sci-Fi won’t be Sci-Fi anymore!

SLIDE 32

References

  • [1] Google Chrome team. WebRTC.org. http://www.webrtc.org/ [Online; accessed 30-April-2014]
  • [2] Justin Uberti and Sam Dutton. WebRTC. http://io13webrtc.appspot.com/ [Online; accessed 30-April-2014]
  • [3] J. Rosenberg. Interactive Connectivity Establishment (ICE). https://tools.ietf.org/html/rfc5245 [Online; accessed 30-April-2014]

SLIDE 33

Questions?

SLIDE 34

Acknowledgements

  • Don Monte and Nishant Agrawal
  • Elias Yousef
  • Javier Monte Condeoliva and Miguel Camacho Ruiz
  • Tania Arenas de la Rubia
  • Carol Davids

SLIDE 35

Thank you