The Future of Data: A Smorgasbord

  1. The Future of Data: A Smorgasbord
     Guy M. Lohman, IBM Almaden Research Center

     Myth #1: XML will "solve" the data format problem
     - Heterogeneity will always reign!
     - Not everything will be XMLized! Legacy systems, flat files, the next "great thing", ...
     - Who's going to control the semantics of all those XML tags? Remember, the "X" stands for "extensible"! Everyone and his mother will be coming up with new tags, and who knows what they mean when you're searching the web.

  2. XML is NOT a Panacea
     - EXAMPLE 1: What does tag <salary> mean?
       - What currency?
       - What frequency? (annual, monthly, hourly, ...?)
     - EXAMPLE 2: What does value "order" mean when its tag is <type>?
       - Type of what?
       - "Order" of what? Purchase? Sequence?
     - Gives some increased context...
     - But only a slight improvement over Google search!

     Myth #2: Relational is Dead -- Native XML repositories are the future
     - Relational DBMSs are hugely successful, with a complete array of utilities, features, and performance honing.
     - Evolutionary rather than revolutionary changes are the only way that change will happen
     - Remember how object-oriented systems, which surely subsumed relational systems, were going to replace relational?
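
The `<salary>` ambiguity in Example 1 can be made concrete with a small sketch. The two documents, the source names `A` and `B`, and the `SOURCE_SEMANTICS` table are all hypothetical, invented here for illustration. The point is only that currency and frequency have to come from somewhere outside the tag.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents from different sources; both use <salary>.
DOC_A = "<employee><name>Ann</name><salary>60000</salary></employee>"
DOC_B = "<employee><name>Bob</name><salary>5000</salary></employee>"


def read_salary(xml_text):
    """Return the raw number inside <salary>; the tag itself says nothing more."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("salary"))


# Out-of-band knowledge that the tags do not carry (assumed for illustration):
# source A reports annual USD, source B reports monthly EUR.
SOURCE_SEMANTICS = {
    "A": {"currency": "USD", "periods_per_year": 1},
    "B": {"currency": "EUR", "periods_per_year": 12},
}


def annualized(xml_text, source):
    """Interpret the raw number using per-source semantics, not the tag."""
    meta = SOURCE_SEMANTICS[source]
    return read_salary(xml_text) * meta["periods_per_year"], meta["currency"]


# A naive comparison of the raw numbers suggests Ann earns far more than Bob.
print(read_salary(DOC_A) > read_salary(DOC_B))

# With the side information, the annual figures are equal in magnitude but
# in different currencies, so they still cannot be compared directly.
print(annualized(DOC_A, "A"), annualized(DOC_B, "B"))
```

Nothing in either document distinguishes the two readings; the parser succeeds on both and the error only shows up in the interpretation.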

  3. Myth #3: Just shred everything into relational tables!
     - Boy, that's a LOT of work for all documents, few of which will ever be retrieved by queries
     - Many documents won't even be searched!
     - This won't exploit the nesting structure that XML provides -- a lost opportunity

     Myth #4: Everything's off the Web as Data Streams
     - SOMEONE has to store the stuff!
     - Companies won't store their corporate jewels on the Web, except possibly in an Intranet inside the firewall
     - Caching will become even more commonplace, for performance
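
To illustrate what "shredding" costs, here is a minimal sketch using one common scheme, a generic node table with a parent pointer. The slides do not say which shredding scheme they have in mind, so the table layout, the `shred` helper, and the sample document are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = "<order><customer>Ann</customer><item><sku>A1</sku><qty>2</qty></item></order>"


def shred(conn, xml_text):
    """Store every element as one row: (id, parent_id, tag, text)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )

    def walk(elem, parent_id):
        cur = conn.execute(
            "INSERT INTO node (parent_id, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)


conn = sqlite3.connect(":memory:")
shred(conn, DOC)

# Every element became a row, whether or not anyone will ever query it.
row_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]

# The path order/item/qty needs one self-join per nesting level.
qty = conn.execute(
    """
    SELECT q.text
    FROM node o
    JOIN node i ON i.parent_id = o.id AND i.tag = 'item'
    JOIN node q ON q.parent_id = i.id AND q.tag = 'qty'
    WHERE o.tag = 'order' AND o.parent_id IS NULL
    """
).fetchone()[0]

print(row_count, qty)
```

In this layout the nesting survives only as parent pointers, so a three-step path turns into a three-way self-join. That is one way to read the "lost opportunity" on the slide.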

  4. Myth #5: There's just one copy of the data I'm interested in
     - Multiple levels of caching are now commonplace
       - Edge servers
       - Mobile clients that are periodically detached
       - Multiple tiers
       - Multiple components within a server
     - Different degrees of synchronization
     - Synchronizing is a major headache!

     Cache Write-Through Dilemma
     [Diagram: incoming updates Guido = 'nice' and Guido = 'jerk'; Replica 1: Guido = 'smart'; Replica 2: Guido = 'smart'; Master: Guido = 'smart']

  5. Cache Write-Through Dilemma
     [Diagram: Replica 1: Guido = 'jerk'; Replica 2: Guido = 'nice'; Master: Guido = 'smart']

     Cache Write-Through Dilemma
     [Diagram: Replica 1: Guido = 'jerk'; Replica 2: Guido = 'nice'; Master: Guido = ???]
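
The dilemma in these diagrams can be reproduced in a few lines. The sketch below is not the slides' answer (they leave the master's value as `???`). It uses a simple version check, which is one possible policy among several, to show that the second write-through has to be either rejected or allowed to overwrite the first. The `Replica` and `Master` classes and their fields are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: str
    base_version: int  # master version this replica was last synced from


class Master:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def write_through(self, replica):
        """Accept the write only if the replica was based on the current version."""
        if replica.base_version != self.version:
            return False  # conflict: the master changed since this replica synced
        self.value = replica.value
        self.version += 1
        replica.base_version = self.version
        return True


master = Master("smart")
replica1 = Replica(master.value, master.version)
replica2 = Replica(master.value, master.version)

# Both replicas are updated independently while detached from the master.
replica1.value = "jerk"
replica2.value = "nice"

first = master.write_through(replica1)   # accepted: based on version 0
second = master.write_through(replica2)  # rejected: master is now at version 1

print(first, second, master.value)
```

Rejecting the second write keeps the master consistent but leaves Replica 2 holding a value that nobody else has. Accepting it instead would silently discard Replica 1's update. Either way, some party has to reconcile the copies.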

  6. Myth #6: Don't need to integrate data -- use Web Services
     - Back to the future!
     - Return to the "Balkanization" of data silos!
     - Encapsulating data within an app makes sense for security but not within an enterprise!

     App Silos vs. Integration
     [Diagram labels: Customers App, Orders App, Web Service, DBMS 1, DBMS 2; Integration DB, Customers Database, Orders Database]

  7. Who's REALLY Doing These?
     - Stock quotes
     - Searching Shakespeare's plays
     - Most XPath examples

     More Realistic Examples
     - Everything on IBM stock: price +
       - Analysts' opinions
       - News items
     - A great statistic I saw a while ago (when?)...
       - In an article on the web?
       - In an e-mail from someone? Who? Folder?
       - In my Palm? Where?
       - In a presentation someone sent me?
       - In a paper I read?
       - In a file (which directory?) on my development machine? laptop?

  8. My Position
     - Heterogeneity will always reign
       - Format (structured, semi-structured, unstructured)
       - Schema chaos, even for structured data!
       - Schema and data are interchangeable
     - A "Data Smorgasbord"
       - Deal with it!
     - Databases (not apps) are still the best hope for integrating data (richer modeling)

     Consequences
     - Will see:
       - Ad hoc "communities" for standardizing semantics of tags (like e-marketplaces)
       - Products promising integration
     - Need:
       - Richer semantic models (yes, even for XML!)
       - More robust/adaptive query processing
       - Better tools for managing diversity
