When is a Clone not a Clone? (and vice-versa)
Contextualized Analysis of Web Services
Douglas Martin, James R. Cordy, Scott Grant, David B. Skillicorn
School of Computing, Kingston, Canada
Motivation: The Personal Web
The rapidly growing number of web services makes it increasingly difficult to find and choose the right ones.
We need a quick and convenient way to find alternatives.
Hand tagging is impractical; automation is needed.
Motivation: Automation
Similarity detection techniques offer solutions.
Code clone detection from software engineering research can find similar code fragments, so why not similar services?
Topic models from data mining research can find text documents with similar semantics, so why not similar services?
Web Service Similarity
Web services are stored in service registries, which contain WSDL service description files.
We could apply clone detection to entire service descriptions.
But what we really want to find are similar service operations.
Let’s try it!
First service:
  <operation name="GetStock">
    <input message="tns:GetStockRequest"/>
    <output message="tns:GetStockResponse"/>
  </operation>
  <complexType name="Stock">
    <sequence>
      <element name="Supplier" type="xsd:string"/>
      <element name="Warehouse" type="xsd:string"/>
      <element name="OnHand" type="xsd:string"/>
      <element name="OnOrder" type="xsd:string"/>
      <element name="Demand" type="xsd:string"/>
    </sequence>
  </complexType>
Second service:
  <operation name="GetStock">
    <input message="tns:GetStockRequest"/>
    <output message="tns:GetStockResponse"/>
  </operation>
  <complexType name="Stock">
    <sequence>
      <element name="date" type="xsd:string"/>
      <element name="open" type="xsd:float"/>
      <element name="high" type="xsd:float"/>
      <element name="low" type="xsd:float"/>
      <element name="close" type="xsd:float"/>
      <element name="volume" type="xsd:float"/>
    </sequence>
  </complexType>
How about these?
  <operation name="DrawRateChartCustom">
    <input message="DrawRateChartCustomIn"/>
    <output message="DrawRateChartCustomOut"/>
  </operation>
  <operation name="GetTopicBinaryChartCustom">
    <input message="GetTopicBinaryChartCustomSoapIn"/>
    <output message="GetTopicBinaryChartCustomSoapOut"/>
  </operation>
So what went wrong?
At this point we thought that maybe our idea wasn’t going to work.
Maybe clone detection can’t help with web service discovery?
But why? What’s so special about WSDL?
Web Service Description Language (WSDL)
A WSDL service description has 3 main parts:
a <portType> element where the operations are declared;
<message> elements corresponding to the inputs, outputs and faults of the operations;
and a <types> element containing an XML Schema that defines the data and structure types used in the messages.
Web Service Description Language (WSDL)
This simple example service has two operations: ReserveRoom and GetAvailableRooms.
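The three parts of a WSDL description can be seen by parsing a small file. The sketch below is illustrative only: the WSDL text, its namespace `urn:example:hotel`, and the message and type names are hypothetical; only the two operation names come from the example above. It assumes WSDL 1.1 and uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical WSDL 1.1 fragment. Only the operation names are taken from
# the slides; every message, type and namespace name is an assumption.
WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="urn:example:hotel">
  <types>
    <xsd:schema>
      <xsd:element name="RoomRequest" type="xsd:string"/>
      <xsd:element name="RoomList" type="xsd:string"/>
    </xsd:schema>
  </types>
  <message name="ReserveRoomIn"><part name="body" element="tns:RoomRequest"/></message>
  <message name="ReserveRoomOut"><part name="body" element="tns:RoomList"/></message>
  <message name="GetAvailableRoomsIn"><part name="body" element="tns:RoomRequest"/></message>
  <message name="GetAvailableRoomsOut"><part name="body" element="tns:RoomList"/></message>
  <portType name="HotelPortType">
    <operation name="ReserveRoom">
      <input message="tns:ReserveRoomIn"/>
      <output message="tns:ReserveRoomOut"/>
    </operation>
    <operation name="GetAvailableRooms">
      <input message="tns:GetAvailableRoomsIn"/>
      <output message="tns:GetAvailableRoomsOut"/>
    </operation>
  </portType>
</definitions>
"""

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "xsd": "http://www.w3.org/2001/XMLSchema",
}


def summarize(wsdl_text):
    """Return the names found in each of the three main parts of a WSDL file."""
    root = ET.fromstring(wsdl_text.strip())
    operations = [op.get("name")
                  for op in root.findall("wsdl:portType/wsdl:operation", NS)]
    messages = [m.get("name") for m in root.findall("wsdl:message", NS)]
    types = [e.get("name")
             for e in root.findall("wsdl:types/xsd:schema/xsd:element", NS)]
    return operations, messages, types


operations, messages, types = summarize(WSDL)
print(operations)
print(messages)
print(types)
```

Note that each operation only names its messages, and each message only names its types: the three lists come from three separate regions of the file.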
Web Service Description Language (WSDL)
WSDL service description files contain descriptions of the operations that a web service has to offer.
But the pieces of each operation’s own description are scattered over different parts of the WSDL file.
This makes it difficult to identify complete units to analyze and compare.
The Problem
This poses a problem for analysis techniques.
Operations cannot easily be compared for similarity using clone detectors, because there are no contiguous fragments to compare.
And they cannot be analyzed using data mining topic models, because there are no separate, complete documents to generate a model from.
Our Solution
Our solution is to contextualize the original <operation> elements to create self-contained operation descriptions.
We use source transformation to inline remote information from the context into the elements that reference or depend on it.
We call these contextualized WSDL operations Web Service Cells, or WSCells.
This is the first example of a new kind of clone detection: contextual clones.
Contextualizing WSDL Operations
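A minimal sketch of the inlining idea, not the authors' source transformation: it assumes WSDL 1.1, assumes each message part refers to a complex type through a `type` attribute, and resolves references by local name only. The helper names (`make_wsdl`, `raw_operations`, `contextualize`), the message parts and the cell layout are hypothetical; the two `Stock` types reuse the fields from the earlier GetStock example.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"
NS = {"wsdl": WSDL_NS, "xsd": XSD_NS}


def make_wsdl(fields):
    """Build a hypothetical WSDL whose Stock type has the given (name, type) fields."""
    elements = "".join('<xsd:element name="%s" type="%s"/>' % f for f in fields)
    return """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:tns="urn:example">
  <types><xsd:schema><xsd:complexType name="Stock">
    <xsd:sequence>%s</xsd:sequence>
  </xsd:complexType></xsd:schema></types>
  <message name="GetStockRequest"><part name="symbol" type="xsd:string"/></message>
  <message name="GetStockResponse"><part name="result" type="tns:Stock"/></message>
  <portType name="StockPort">
    <operation name="GetStock">
      <input message="tns:GetStockRequest"/>
      <output message="tns:GetStockResponse"/>
    </operation>
  </portType>
</definitions>""" % elements


def local(qname):
    """Drop a namespace prefix such as 'tns:' from a reference."""
    return qname.split(":")[-1]


def raw_operations(wsdl_text):
    """Serialize each <operation> element exactly as it appears in the portType."""
    root = ET.fromstring(wsdl_text.strip())
    return {op.get("name"): ET.tostring(op, encoding="unicode")
            for op in root.findall("wsdl:portType/wsdl:operation", NS)}


def contextualize(wsdl_text):
    """Inline the referenced messages and types into each operation."""
    root = ET.fromstring(wsdl_text.strip())
    messages = {m.get("name"): m for m in root.findall("wsdl:message", NS)}
    types = {t.get("name"): t
             for t in root.findall("wsdl:types/xsd:schema/xsd:complexType", NS)}
    cells = {}
    for op in root.findall("wsdl:portType/wsdl:operation", NS):
        cell = ET.Element("operation", name=op.get("name"))
        for io in op:  # input, output or fault
            slot = ET.SubElement(cell, io.tag.split("}")[-1])
            message = messages.get(local(io.get("message", "")))
            if message is None:
                continue
            for part in message.findall("wsdl:part", NS):
                referenced = types.get(local(part.get("type", "")))
                if referenced is None:
                    continue  # built-in type such as xsd:string: nothing to inline
                for el in referenced.iter("{%s}element" % XSD_NS):
                    ET.SubElement(slot, "element",
                                  name=el.get("name"), type=el.get("type", ""))
        cells[op.get("name")] = ET.tostring(cell, encoding="unicode")
    return cells


inventory = make_wsdl([("Supplier", "xsd:string"), ("Warehouse", "xsd:string"),
                       ("OnHand", "xsd:string"), ("OnOrder", "xsd:string"),
                       ("Demand", "xsd:string")])
quotes = make_wsdl([("date", "xsd:string"), ("open", "xsd:float"),
                    ("high", "xsd:float"), ("low", "xsd:float"),
                    ("close", "xsd:float"), ("volume", "xsd:float")])

print(raw_operations(inventory) == raw_operations(quotes))  # raw operations
print(contextualize(inventory) == contextualize(quotes))    # contextualized cells
```

In this sketch the two raw GetStock operations serialize identically, while their contextualized forms differ because the inlined `Stock` fields differ. A real transformation would also need to handle element references, nested and imported types, and faults.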
Contextual Clone Detection
An Experiment
We ran an experiment to investigate the difference between clone detection on WSCells and on the original raw operations.
We used two sets of WSDL service description files, containing 1,100 operations and 7,500 operations.
We compared NICAD clone detector results for each set at various near-miss difference thresholds: 0% = exact clone, 10% = 1 line in 10 different, and so on.
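The meaning of a near-miss difference threshold can be illustrated with a small line-based comparison. This is a sketch only and does not reproduce NICAD's own normalization or comparison; it uses Python's `difflib.SequenceMatcher` as a stand-in, and the function names `difference` and `is_clone` are hypothetical.

```python
from difflib import SequenceMatcher


def difference(lines_a, lines_b):
    """Fraction of lines in the longer fragment with no match in the other."""
    matcher = SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(lines_a), len(lines_b))
    if longest == 0:
        return 0.0
    return (longest - matched) / longest


def is_clone(lines_a, lines_b, threshold):
    """Treat two fragments as clones if their difference is within the threshold."""
    return difference(lines_a, lines_b) <= threshold


# Ten-line fragments that differ in exactly one line.
fragment_a = ["line %d" % i for i in range(10)]
fragment_b = list(fragment_a)
fragment_b[5] = "changed"

print(difference(fragment_a, fragment_b))
print(is_clone(fragment_a, fragment_b, 0.0))
print(is_clone(fragment_a, fragment_b, 0.1))
```

With one line in ten different, the pair is rejected at a threshold of 0.0 (exact clones only) and accepted at 0.1.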
An Experiment
Number of clones decreases with WSCells
Clone pairs:
  Difference   Set 1       Set 1     Set 2       Set 2
  threshold    Originals   WSCells   Originals   WSCells
  0.0          852         705       1434        1066
  0.1          852         734       1434        1228
  0.2          879         775       1438        1637
  0.3          884         813       1469        1637
Reduction in false positives, illustrated by the two GetStock operations from the earlier example: their <operation> elements are identical, but their Stock types differ.
An Experiment
Number of clone classes can increase with WSCells
Clone classes:
  Difference   Set 1       Set 1     Set 2       Set 2
  threshold    Originals   WSCells   Originals   WSCells
  0.0          169         187       587         433
  0.1          169         139       587         499
  0.2          172         142       589         631
  0.3          171         136       591         631
Splits by deeper differences give more precision, illustrated by the same two GetStock operations: their <operation> elements are identical, but their Stock types differ.