Apache Solr: the enterprise search platform. Luca Bonesini | Titulus User Group, Kion – Bologna, 4 Dec 2013
Who am I: Luca Bonesini
Computer scientist, javelin thrower, programmer, bass guitar player, systems administrator, entrepreneur, IT manager, husband, pre-sales engineer, mountain biker, webmaster, father, salesman, chorister, marketing guy.
http://www.lucabonesini.it | @lbonesini | http://it.linkedin.com/in/lucabonesini/ | l.bonesini@sourcesense.com | +39 366 688 7125
Sourcesense: Making sense of Open Source
Contributors: Lucene/Solr, Apache Chemistry, Apache Jackrabbit, OpenSSO-Alfresco
Committers and lead developers: Hibernate Search, Lucene Project, Infinispan integration, Apache/UIMA project, JBoss GateIn Portal
Lucene and Solr: what are they?
Apache Lucene (core), search by ASF
“Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform”. http://lucene.apache.org/core/
● Fast and efficient scoring and indexing algorithms
● Lots of contributions to make common tasks easier: highlighting, spatial, query parsers, benchmarking tools, etc.
● Most widely deployed search library on the planet
Apache Solr, search by ASF
“Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search”.
Highly reliable, scalable, fault tolerant, distributed indexing, replication, load-balanced querying, automated failover and recovery, centralized configuration.
Apache Solr, search by ASF
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. http://lucene.apache.org/solr
● Access Lucene over HTTP: Java, XML, Ruby, Python, .NET, JSON, PHP, etc.
● Most programming tasks in Lucene are configuration tasks in Solr
● Faceting (guided navigation, filters, etc.)
● Replication and distributed search support
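As a rough illustration of the REST-like HTTP API and faceting mentioned on this slide, the sketch below only builds the URL of a search request; it does not contact a server. The host, port, core name (`collection1`), field names, and the helper `build_select_url` are assumptions made for the example, not something taken from the talk. The parameters `q`, `wt`, `rows`, `facet`, and `facet.field` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumed for illustration only: a local Solr instance with a core named
# "collection1" and a "category" field to facet on.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_select_url(query, facet_field, rows=10):
    """Return the URL of a JSON search request with one facet field."""
    params = [
        ("q", query),                  # the query string
        ("wt", "json"),                # ask for a JSON response
        ("rows", rows),                # page size
        ("facet", "true"),             # switch faceting on
        ("facet.field", facet_field),  # field to count values for
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_select_url("title:solr", "category")
print(url)
```

Any HTTP client in any language could then fetch this URL, which is the point the slide makes about using Solr from virtually any programming language.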
Enterprise Search: search wearing a tie
Enterprise Search: what and how.
“Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience”. [Wikipedia]
Ingestion → Processing and analysis → Indexing → Query parsing → Matching
● Ingestion: pull; integration API; push; crawler; connector.
● Processing and analysis: document types and formats (XML, HTML, Office, etc.) to plain text; stemming, lemmatization, synonym expansion, entity extraction, part-of-speech tagging, tokenization.
● Indexing: dictionary of all unique words in the corpus; ranking; term frequency.
● Query parsing: user query; faceting; paging.
● Matching: query-index comparison; references to source documents.
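The five stages above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Lucene or Solr implement them: the corpus is a hypothetical in-memory dictionary, analysis is reduced to lowercasing and word splitting, and ranking is a plain sum of term frequencies. The function names are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Processing and analysis: lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Indexing: map each unique term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Query parsing and matching: return the ids of documents that contain
    every query term, ranked by summed term frequency (highest first)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Keep only the documents that contain all query terms.
    matching = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matching &= set(index.get(term, {}))
    scores = {d: sum(index[t][d] for t in terms) for d in matching}
    # Ties are broken by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Ingestion: a hypothetical corpus stands in for crawled or pushed content.
documents = {
    "doc1": "Solr is an enterprise search platform",
    "doc2": "Lucene is a search library; Solr builds on Lucene",
    "doc3": "Jetty is a servlet container",
}
index = build_index(documents)
results = search(index, "solr search")
print(results)
```

The returned ids play the role of the "references to source documents" in the matching stage; a real engine would add stemming, synonyms, and a proper relevance model on top.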
Enterprise Search: what and how.
● Crawler: an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (also called Web spider, ant, automatic indexer, or Web scutter).
● Precision/Recall: in pattern recognition and information retrieval, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved.
● Stemming: the process of reducing inflected (or sometimes derived) words to their stem, base, or root form (e.g. "fishing", "fished", and "fisher" to the root word "fish").
● Lemmatization: in linguistics, the process of grouping together the different inflected forms of a word so they can be analysed as a single item (e.g. the word "better" has "good" as its lemma).
● Named-entity recognition (entity extraction): a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
● Part of speech: a linguistic category of words (or, more precisely, lexical items), generally defined by the syntactic or morphological behaviour of the lexical item in question (e.g. noun and verb).
● Tokenization: the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.
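The precision and recall definitions above translate directly into a short calculation. The sketch below uses made-up document ids, and returning 0.0 when a set is empty is a convention chosen for the example, not part of the definition.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved and relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved documents that are relevant
    # Assumption: report 0.0 instead of dividing by zero on an empty set.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 documents actually relevant,
# 2 of them in common.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc1", "doc3", "doc7"]
p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here precision is 2/4 and recall is 2/3: half of what was returned is relevant, and two thirds of what is relevant was returned.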
Search and Open Source
Enterprise Search: products and vendors
Vendors of proprietary enterprise search software: AskMeNow, Attivio, Concept Searching Limited, Content Analyst Company LLC, Coveo, Dassault Systèmes (acquired Exalead), Denodo, Dieselpoint, Inc., dtSearch Corp., EMC Corp., Exorbyte GmbH, Expert System S.p.A., Exterro, Inc., Fabasoft, Funnelback, Google Search Appliance, HP (acquired Autonomy Corporation, which in turn acquired Verity K2 and Ultraseek), IBM (acquired Vivisimo), Inbenta, inter:gator Enterprise Search, ISYS Search Software, MarkLogic, Microsoft (includes Microsoft Search Server, Fast Search & Transfer), Mindbreeze, Neofonie (includes WeFind), Omniture (acquired by Adobe Systems), Open Text Corporation, Oracle Corporation (includes Secure Enterprise Search and Endeca Technologies Inc.), Perception Software, PolySpot, Q-go, Q-Sensei, Recommind, SAP (includes SAP NetWeaver Enterprise Search, Search Services in SAP NetWeaver AS ABAP, and Search and Classification TREX), Sinequa, SLI Systems, Sophia Search Limited, TeraText, X1 Technologies, Inc., ZyLAB Technologies, ZL Technologies
Free and open source enterprise search software: Apache Solr, DataparkSearch, ElasticSearch, ht://Dig, Jumper 2.0, mnoGoSearch, OpenSearchServer, Searchdaimon, Sphinx
Vendors of open source enterprise search software: 30 Digits, Apache Software Foundation, LucidWorks, Sematext, Flax
Open Source: they do it too.
Why: Innovation = Bu$ine$$
Innovation; Open Source; Open Standard (OAGi, OASIS, W3C, IETF, IEEE, ETSI, Ecma, OGF, IEC, ISO, ITU, CENELEC, CEN, BSI, UNI, CEI, DKE, DIN, AFNOR, GIETS); Interoperability; LDTI
Solr and Business