capturing natural interactions
Nick Campbell, Trinity College, Dublin
Clarin/FLaReNet Workshop@KTH, November 26th, 2009
introduction
• Speech recognition and synthesis technologies can now be considered mature, but their simple incorporation into speech-based human-computer interfaces reveals shortcomings in their capabilities.
• Perhaps the biggest reason for this is that each technology was designed explicitly to convert between text and spoken modalities, without taking into consideration the complexities of human spoken interaction as the joint creation of mutually understood meaning.
• If the goal of this meeting is to facilitate data collection to improve these interfaces and make them more intelligent and human-like (through a better understanding of human interaction and communication), then we might make a start by designing improved techniques for efficiently capturing, storing, annotating, and distributing large corpora of natural spoken interactions.
Overview of the Talk
• Speech & Multimodal Databases
• Annotating, Viewing & Distributing Data
• Two-Way Dissemination (crowd-sourcing)
• plus, because I am now at Trinity, working again in the Humanities, a selection of 18th Century poetic thought (but with an engineering bias)!
Speech & Multimodal Databases
• my primary interest: collecting natural speech, modelling human conversational interactions
• age 12: Wordsworth, 1798: “we murder to dissect” . . . “quit your books; let Nature be your teacher”
• important design notes for corpus gatherers!
THE TABLES TURNED, 1798
William Wordsworth, Complete Poetical Works

UP! up! my Friend, and quit your books;
Or surely you'll grow double:
Up! up! my Friend, and clear your looks;
Why all this toil and trouble?

The sun, above the mountain's head,
A freshening lustre mellow
Through all the long green fields has spread,
His first sweet evening yellow.

Books! 'tis a dull and endless strife:
Come, hear the woodland linnet,
How sweet his music! on my life,
There's more of wisdom in it.

And hark! how blithe the throstle sings!
He, too, is no mean preacher:
Come forth into the light of things,
Let Nature be your teacher.

She has a world of ready wealth,
Our minds and hearts to bless--
Spontaneous wisdom breathed by health,
Truth breathed by cheerfulness.

One impulse from a vernal wood
May teach you more of man,
Of moral evil and of good,
Than all the sages can.

Sweet is the lore which Nature brings;
Our meddling intellect
Mis-shapes the beauteous forms of things:--
We murder to dissect.

Enough of Science and of Art;
Close up those barren leaves;
Come forth, and bring with you a heart
That watches and receives.
multifaceted behaviour
• by constraining a corpus, we limit the types of interaction that it can illustrate
• only by releasing these constraints on participant behaviour can we gather a corpus that will teach us something new about human conversational interaction
dimensions of speech
example: what is a “turn”?
contact management?
bias in corpora
• the proposed ISO standard also illustrates the inherent bias in existing corpora:
• e.g., Tables 1 & 2 in Annex F show considerable differences in “Contact Management” between corpora
• “Our conclusion is that Contact Management could be considered as an ‘optional’ dimension, since this aspect of communication is not reflected in most existing dialogue act annotation schemes (6 out of 18). It was noticed, however, that for some types of dialogues, e.g. phone conversations or teleconferences (as in the OVIS corpus), this aspect may be important.”
• only 0.1% in AMI, vs 12.3% in OVIS ...
• Results from survey of dimensions and communicative functions in existing annotation schemas
Annotating, Viewing and Distributing New Data
There are presently several tools for the manual annotation of data, each storing its results in a prescribed format that is easy to disseminate, but my experience of working with these tools, and of talking with people who use them regularly, is that the task is tedious and the framework often restrictive. Rather than prescribe a standard at this time, we might benefit more from creating a support group through which people who annotate data regularly can communicate and share samples, tools, and formats for rapid, assisted evolution. My LREC 2010 paper ("A Software Toolkit for Viewing Annotated Multimodal Data Interactively over the Web") may be relevant here.
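As an illustration of the kind of exchange I have in mind, the sketch below reads and writes a simple time-aligned annotation tier as a tab-separated file (start time, end time, label). The file format, class, and function names are hypothetical examples for discussion, not the format used by the LREC 2010 toolkit or by any existing annotation tool.

```python
# A minimal sketch of a hypothetical exchange format for annotation tiers:
# one segment per line as "start<TAB>end<TAB>label". Names and format are
# illustrative only, not a proposed standard.
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    label: str


def read_tier(path: str) -> List[Segment]:
    """Read one annotation tier from a tab-separated file."""
    segments = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            start, end, label = line.split("\t", 2)
            segments.append(Segment(float(start), float(end), label))
    return segments


def write_tier(path: str, segments: List[Segment]) -> None:
    """Write a tier back out in the same exchange format."""
    with open(path, "w", encoding="utf-8") as f:
        for s in segments:
            f.write(f"{s.start:.3f}\t{s.end:.3f}\t{s.label}\n")
```

Something this small is easy for different annotators to import and export, whatever tool they actually work in, which is the point of sharing formats rather than prescribing them.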
A Software Toolkit for Viewing Annotated Multimodal Data Interactively over the Web
section headings (LREC 2010):
• introduction
• the FreeTalk multimodal corpus
• assembling complex data
• viewing complex data interactively
• details of the software
• downloading & use
• summary & conclusion
flash-based data interface
flash movies & dataplots
• we archive ALL originals, and link the various derived annotations, data streams, and compressed video versions; the flash movie format (xxx.flv) appears to offer the most efficient service and access software
• interactive pages at www.speech-data.jp
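Purely as an illustration of how originals and derived material might be linked, the sketch below writes a small per-session manifest in JSON; the directory layout, file names, and fields are invented for this example and are not the actual organisation used at www.speech-data.jp.

```python
# A sketch of one possible way to link an archived original recording to its
# derived annotations and compressed web copies: a small JSON manifest per
# session. All paths and field names here are hypothetical.
import json

manifest = {
    "session": "freetalk_session1",              # hypothetical session id
    "original": "raw/session1_cam1.dv",          # archived master, never modified
    "derived": {
        "video_flv": "web/session1_cam1.flv",    # compressed copy for web playback
        "audio_wav": "audio/session1_mix.wav",
        "annotation_tiers": [
            "annotations/session1_turns.tsv",
            "annotations/session1_laughter.tsv",
        ],
    },
}

with open("session1_manifest.json", "w", encoding="utf-8") as f:
    json.dump(manifest, f, indent=2)
```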
Two-Way Dissemination
By sharing a corpus, we stand to gain added annotation levels. We should also examine crowd-sourcing in this respect. As with our own FreeTalk corpus (www.speech-data.jp), by making the initial data public and co-operating worldwide with interested partners, the annotations can be grown as researchers with different interests contribute their own layers of knowledge. Since the world of multimodal corpora is still young, perhaps the most we might expect from this initial meeting is the opening up of channels whereby the exchange of sources and resources might take place.
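One hedged sketch of the mechanics of such two-way dissemination: if contributors were to return annotation layers as files named <session>.<layer>.tsv (a naming convention invented here purely for illustration), the contributed layers could be collected back into a single index per session.

```python
# A sketch of collecting independently contributed annotation layers.
# The "<session>.<layer>.tsv" naming convention is hypothetical.
import os
from collections import defaultdict


def index_contributions(root: str) -> dict:
    """Map each session id to the annotation layers contributed for it."""
    index = defaultdict(dict)
    for fname in sorted(os.listdir(root)):
        parts = fname.split(".")
        if len(parts) != 3 or parts[2] != "tsv":
            continue  # ignore files that do not follow the naming convention
        session, layer = parts[0], parts[1]
        index[session][layer] = os.path.join(root, fname)
    return dict(index)


if __name__ == "__main__":
    # e.g. a directory containing session1.turns.tsv, session1.gaze.tsv, ...
    for session, layers in index_contributions("contributed_tiers").items():
        print(session, "->", ", ".join(sorted(layers)))
```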
a growing community
• we don't yet have clearly defined "interface standards", but we try to keep a flexible, open-minded approach
• different people are working on the corpus, each from their own viewpoint, using different software and both 'top-down' (theory-driven) and 'bottom-up' (data-driven) approaches
• we are hoping for a happy marriage of both
summary
• we do not yet know how to properly create a 'balanced' and 'representative' speech corpus
• we do not yet know how to integrate & manage complex multimodal data packages
• we do not yet know the best ways to disseminate and share these types of data
• so maybe it is a bit early to propose standards
• but we can gain a lot by encouraging exchange and interchange of related annotations & data
• thank you ...