Multimedia Indexation Titus ZAHARIA, Pr. Titus.Zaharia@telecom-sudparis.eu
Interactive multimedia
• Still image
• Video
• Audio
• Graphics (2D/3D, static or animated)
• Terabytes of digital AV data
Interactive multimedia
• Terabytes of digital AV data [source: TREC, trec.nist.gov]
• Huge multimedia databases are useless without the necessary information search and retrieval tools
Indexation: definition
• Associate pertinent descriptions (metadata) with multimedia content, so that the desired information can be retrieved from large databases
• Typical example: textual indexing with keywords (the web and existing search engines)
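Keyword-based textual indexing can be sketched as an inverted index that maps each keyword to the documents annotated with it. This is a minimal illustration only: the function names and the sample annotations are assumptions, not part of the course material.

```python
from collections import defaultdict


def build_inverted_index(annotations):
    """Map each lowercase keyword to the set of document ids annotated with it."""
    index = defaultdict(set)
    for doc_id, keywords in annotations.items():
        for word in keywords:
            index[word.lower()].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents annotated with every keyword of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical annotations: document id -> manually chosen keywords.
annotations = {
    "img1.jpg": ["beach", "sunset"],
    "img2.jpg": ["beach", "football"],
    "clip1.mp4": ["football", "stadium"],
}
index = build_inverted_index(annotations)
print(search(index, "beach football"))
```

Retrieval quality here depends entirely on the keywords chosen by the annotator, which is where the limitations of textual indexation come from.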
Textual indexation: limitations
• Difficulty of finding appropriate words to describe an image or video: subjectivity
• Complex multimedia content: polysemous character
• Linguistic barriers
Content-based indexation
• Define descriptions intrinsically related to the content and to its perceptual characteristics
• Descriptor: mathematical representation of an audio or visual feature
• Syntax and semantics of its components defined in a data description language (e.g., XML, RDF, OWL…)
Content-based indexation
Audio attributes (primitives)
• Speech
• Music
• Melody
• Timbre…
Content-based indexation
Visual attributes (primitives)
• Color [figure: color histogram plot]
• Shape
• Texture
• Motion
Content-based indexation
Example: color histogram of a given image [figure: histogram values $p_i$ plotted against bin index $i$]
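A color histogram of this kind could be computed as in the sketch below. The uniform RGB quantization, the number of bins, the function name `color_histogram` and the sample pixels are all assumptions chosen for illustration; the slides do not specify a color space or a quantization.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return the normalized color histogram p of a list of (r, g, b) pixels.

    Each 8-bit channel is quantized uniformly into `bins_per_channel` levels,
    giving bins_per_channel**3 bins; p[i] is the fraction of pixels in bin i.
    """
    n_bins = bins_per_channel ** 3
    counts = [0] * n_bins
    step = 256 // bins_per_channel

    def quantize(value):
        # Clamp so that 255 stays in the last level when 256 is not
        # divisible by bins_per_channel.
        return min(value // step, bins_per_channel - 1)

    for r, g, b in pixels:
        i = (quantize(r) * bins_per_channel + quantize(g)) * bins_per_channel + quantize(b)
        counts[i] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Hypothetical 4-pixel image: three reddish pixels and one blue pixel.
image = [(255, 0, 0), (250, 10, 5), (240, 0, 0), (0, 0, 255)]
p = color_histogram(image)
```

Because the histogram is normalized by the pixel count, images of different sizes yield comparable descriptors.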
Content-based indexation
• Define descriptions intrinsically related to the content and to its perceptual characteristics
• Descriptor: mathematical representation of an audio or visual feature
• Compare images: define a similarity measure in the descriptor space
Content-based indexation
Example: $L_p$ distances between histograms [figure: two histograms $p^1_i$ and $p^2_i$ plotted against bin index $i$]
$$d(p^1, p^2) = \sum_i \left| p^1_i - p^2_i \right|$$
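The $L_p$ family of distances between two histograms can be sketched as follows. To avoid a clash with the histogram symbol, the histograms are named `h1` and `h2` in the code and `p` denotes the order of the distance; these names and the sample values are illustrative assumptions.

```python
def lp_distance(h1, h2, p=1):
    """L_p (Minkowski) distance between two histograms defined on the same bins."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(h1, h2)) ** (1.0 / p)


# Two hypothetical normalized 3-bin histograms.
h_a = [0.5, 0.5, 0.0]
h_b = [0.0, 0.5, 0.5]

d1 = lp_distance(h_a, h_b, p=1)  # sum of absolute bin differences
d2 = lp_distance(h_a, h_b, p=2)  # Euclidean distance
```

A small distance means the two images have similar color distributions. It says nothing about where the colors are located in the image.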
Content-based indexation
• Define descriptions intrinsically related to the content and to its perceptual characteristics
• Description scheme: more complex metadata structure, integrating descriptors and other description schemes
• Signature combination
• Multimodal descriptions
• Multi-granular/hierarchical descriptions
Content-based indexation
What to describe?
• The whole image
• Objects of interest (e.g., faces, characters with yellow clothes, world cup)
• Spatial segmentation into objects of interest
• Adapted descriptions associated with each object
Video documents
• Huge and complex amount of information
• Set of scenes and shots with heterogeneous content
• Multiple objects
Need for structuring
• Temporal segmentation into shots/scenes
• Spatio-temporal segmentation (objects)
Video structuring
• Shot: temporal segment corresponding to a single continuous camera take
• Scene: set of shots that are homogeneous w.r.t. a certain criterion, generally semantic
• Video object: spatial or spatio-temporal region of arbitrary shape corresponding to a semantically coherent entity
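Temporal segmentation into shots could be sketched by thresholding the distance between the color histograms of consecutive frames. This is one common heuristic, swapped in here for illustration; the slides do not prescribe a segmentation method, and the threshold value, function names and sample frames below are assumptions.

```python
def l1(h1, h2):
    """L1 distance between two histograms defined on the same bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shots(frame_histograms, threshold=0.5):
    """Split a frame sequence into shots, returned as (first_frame, last_frame) pairs.

    A shot boundary is declared between frames t-1 and t when the L1
    distance between their histograms exceeds `threshold`.
    """
    if not frame_histograms:
        return []
    shots, start = [], 0
    for t in range(1, len(frame_histograms)):
        if l1(frame_histograms[t - 1], frame_histograms[t]) > threshold:
            shots.append((start, t - 1))
            start = t
    shots.append((start, len(frame_histograms) - 1))
    return shots


# Hypothetical 2-bin histograms for five frames; the content changes at frame 2.
frames = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.12, 0.88], [0.1, 0.9]]
shots = detect_shots(frames)
```

A fixed threshold handles abrupt cuts only. Gradual transitions such as fades or dissolves would need a different criterion.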
Video structuring: example
[figure: hierarchical decomposition of an audio-visual segment]
• Audio-visual segment
• Temporal decomposition: two audio-visual segments; descriptors: textual annotation, mosaic, dominant motion
• Media decomposition: one video segment, two audio segments (AS1, AS2)
• Spatio-temporal decomposition: two moving regions; descriptors: textual annotation, color/texture/shape visual descriptors
Video structuring
Scene description: elements of spatio-temporal localization
• Time stamps
• Support region descriptors
• Hierarchical description schemes
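A hierarchical description with time stamps and support regions could be represented by a simple tree of segments, as in the sketch below. The `Segment` class, its fields and the sample values are hypothetical and only illustrate the idea; they are not the MPEG-7 data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    name: str
    start: float  # time stamp (seconds) where the segment begins
    end: float    # time stamp (seconds) where the segment ends (exclusive)
    region: Optional[List[Tuple[int, int]]] = None  # polygon of the support region
    descriptors: Dict[str, object] = field(default_factory=dict)
    children: List["Segment"] = field(default_factory=list)


def segments_at(segment, t):
    """Return, depth-first, the names of all segments active at time t."""
    if not (segment.start <= t < segment.end):
        return []
    names = [segment.name]
    for child in segment.children:
        names.extend(segments_at(child, t))
    return names


# Hypothetical video: two shots, the first containing one moving region.
region = Segment("moving region 1", 5.0, 20.0,
                 region=[(10, 10), (60, 10), (60, 80), (10, 80)],
                 descriptors={"annotation": "player"})
shot1 = Segment("shot 1", 0.0, 25.0, children=[region])
shot2 = Segment("shot 2", 25.0, 60.0, descriptors={"dominant_motion": "pan"})
video = Segment("video", 0.0, 60.0, children=[shot1, shot2])

print(segments_at(video, 10.0))
```

Each node carries its own descriptors, so a query can be answered at the level of the whole video, a shot, or a single object.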
Video structuring
• MPEG-4 standard: first to take into account scene descriptions with compositing of video objects of arbitrary shape, natural and synthetic (2D/3D graphics)
• MPEG-4 scene: tree-based representation, each node corresponding to an object
• BIFS language (BInary Format for Scenes): binary version of VRML (Virtual Reality Modeling Language)
Video structuring
• MPEG-4 standard: first to take into account scene descriptions with compositing of video objects of arbitrary shape, natural and synthetic (graphics)
• MPEG-4 scene description: adapted to the objectives of video composition and transmission, but too elementary as a description (temporal and spatio-temporal locators only)
• MPEG-7 standard: indexation of multimedia content
Description of video documents
• Music
• Speech
• Text
• Key-frames (still images)
• Shots
• Spatial object
• Spatio-temporal object
Description of video documents
• Multiple, parallel decompositions corresponding to different criteria and media
• Multiple descriptions associated with the same AV document
• Make the various indexations interoperable and reusable
• Support a large range of media in different formats
• Offer dedicated tools for visualization/navigation/annotation
How to exchange?
• Proprietary environments [figure: separate sets of AV objects]
• Interoperability
• Standardization
MPEG-7: Multimedia Content Description Interface
Objectives (N2861)
• Offer a standard for multimedia content description
• Support a large range of potential applications
• Elaborate the standard: ISO/IEC JTC1/SC29/WG11 - 15938
[figure: Feature extraction, Description (standard), Search engine]
MPEG-7: standardized items
A set of descriptors (D)
• A D is a representation of a feature (color, shape, motion, texture, audio...)
• A D defines the syntax and the semantics of this representation
A set of description schemes (DS)
• A DS specifies the structure and the semantics of the relations between its components (Ds or DSs)
MPEG-7: standardized items
A description language (DDL)
• Express existing DSs and Ds
• Create new DSs and Ds
• Extend/modify existing DSs
Encoding schemes
• A description is encoded in order to satisfy MPEG-7 requirements related to compression efficiency, transmission, error resilience, scalability, universal access
[figure: Descriptors, Description schemes, Description language, Coded description]
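Since the MPEG-7 DDL is based on XML Schema, a description is ultimately an XML document. The sketch below only illustrates the idea of serializing descriptors into XML: the element and attribute names (`Description`, `Image`, `TextAnnotation`, `ColorHistogram`, `uri`, `bins`) are hypothetical and do not follow the actual MPEG-7 schema.

```python
import xml.etree.ElementTree as ET


def describe_image(uri, histogram, annotation):
    """Build a toy XML description of an image (hypothetical element names)."""
    root = ET.Element("Description")
    image = ET.SubElement(root, "Image", {"uri": uri})
    text = ET.SubElement(image, "TextAnnotation")
    text.text = annotation
    hist = ET.SubElement(image, "ColorHistogram", {"bins": str(len(histogram))})
    hist.text = " ".join(f"{value:.3f}" for value in histogram)
    return root


# Hypothetical 3-bin histogram and annotation.
root = describe_image("img1.jpg", [0.75, 0.0, 0.25], "red car")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A textual XML form like this is verbose, which is one reason a separate encoding step is standardized for compression and transmission.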
Shape descriptors
MPEG-7 Visual descriptors
MPEG-7: visual descriptors
Color
• Color space
• Color quantization
• Scalable color histogram (Haar transform representation)
• Color structure (histogram of structuring elements)
• Group-of-frames color (mean, median or intersection histogram)
• Dominant colors
• Spatial distribution of color (based on the DCT)
Texture
• Edge orientation histogram
• Homogeneous texture (energy representation by Gabor filtering)
• Texture browsing (Tamura features)
Motion
• Parametric motion
• Trajectory
• Camera motion (complete modeling of a 3D camera)
• Motion activity
Localization
• Spatial localization (polygonal region)
• Spatio-temporal localization (set of polygonal regions)
Shape
• Region shape (2D) (angular radial transform)
• Contour shape (2D) (contour scale space)
• 3D shape (3D shape spectrum)
Other
• Face recognition (eigenfaces)
MPEG-7: visual descriptors
MPEG-7 shape descriptors
• Region shape
• Contour shape
• Multiview DS
• 3D shape
MPEG-7: shape descriptors
Shape: intuitive geometric properties
• Independence w.r.t. pose
• Scalability / geometry & size
A shape descriptor needs to be invariant w.r.t. similarity transforms (Euclidean transforms & isotropic scaling)
MPEG-7: shape descriptors
Region-based shape
• Angular Radial Transform (ART) descriptor
• Describes complex shapes, with multiple regions / holes (arbitrary topologies)
• Decomposition of the shape region within a family of harmonic functions $\{f_{m,n}\}$ defined over the unit disc, in polar coordinates $(\rho, \theta)$:
$$f_{m,n}(\rho, \theta) = \frac{1}{\pi} \cos(\pi n \rho)\, e^{jm\theta}$$
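The decomposition can be sketched numerically as below, assuming basis functions of the form $f_{m,n}(\rho,\theta) = \frac{1}{\pi}\cos(\pi n \rho)\,e^{jm\theta}$ and a binary shape image mapped onto the square $[-1,1]^2$. The function names, the number of coefficients, the normalization by $|F_{0,0}|$ and the test shape are assumptions made for illustration; this is not the normative MPEG-7 extraction procedure.

```python
import cmath
import math


def art_coefficient(image, m, n):
    """Approximate the projection F_mn of a square binary image onto f_{m,n}.

    Pixel centers are mapped onto [-1, 1] x [-1, 1]; pixels outside the unit
    disc are ignored. The integral is approximated by a sum over pixels.
    """
    size = len(image)
    pixel_area = (2.0 / size) ** 2
    total = 0j
    for row in range(size):
        for col in range(size):
            if not image[row][col]:
                continue
            x = (2 * col + 1) / size - 1.0  # pixel center, horizontal
            y = (2 * row + 1) / size - 1.0  # pixel center, vertical
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue
            theta = math.atan2(y, x)
            basis = math.cos(math.pi * n * rho) * cmath.exp(1j * m * theta) / math.pi
            total += basis.conjugate() * pixel_area
    return total


def art_descriptor(image, max_m=4, max_n=2):
    """Magnitudes |F_mn| divided by |F_00| (assumed non-zero), n-major order."""
    f00 = abs(art_coefficient(image, 0, 0))
    return [abs(art_coefficient(image, m, n)) / f00
            for n in range(max_n + 1) for m in range(max_m + 1)]


# Hypothetical 16x16 binary shape (an off-center rectangle) and its
# rotation by 90 degrees, which is exact on the pixel grid.
shape = [[1 if 3 <= r < 9 and 5 <= c < 12 else 0 for c in range(16)]
         for r in range(16)]
rotated = [list(r) for r in zip(*shape[::-1])]

d_shape = art_descriptor(shape)
d_rotated = art_descriptor(rotated)
```

Rotating the shape only changes the phase of each coefficient, so keeping the magnitudes gives a rotation-invariant signature; the two descriptors above agree up to floating-point error.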