
TERMINOLOGY SYSTEMATIZATION FOR CYBERSECURITY DOMAIN IN ITALIAN



  1. TERMINOLOGY SYSTEMATIZATION FOR CYBERSECURITY DOMAIN IN ITALIAN LANGUAGE
     Claudia Lanza – PhD student, University of Calabria and visiting at Université de Nantes (c.lanza@dimes.unical.it)
     Béatrice Daille – Full Professor, Université de Nantes

  2. OVERVIEW
     - Introduction
     - Objectives
     - Methodology outline
     - Domain
     - Corpus
     - First phase:
       - Mapping with standards
       - Experts of the domain
       - First Italian thesaurus draft for Cybersecurity
     - Second phase:
       - Term extraction software comparison
       - Candidate terms observation
     - Future works
     Claudia Lanza – PhD student, University of Calabria and visiting at Université de Nantes c.lanza@dimes.unical.it

  3. INTRODUCTION
     Starting elements and the problems they raise:
     - Language of the thesaurus: Italian. Problem: standardizing the terminology in Italian (there are few Italian standards for Cybersecurity).
     - Terminology extraction tools for Italian. Problem: the term extraction tools offer a weak level of granularity for hierarchy and synonymy detection.
     - Support of the domain experts. Problem: contrasting perspectives in the adjustment of the terminological assets.

  4. INTRODUCTION
     Translation from standards (EN-ITA)
     Some terms appear in their English form both in the English standards and in the official Italian glossaries, either because they cannot be translated into Italian or because the English form has become common usage:
     • Phishing
     • Spam
     • Smishing
     • Cyber trolling
     • Cyber-defence
     • Cyber intelligence

  5. OBJECTIVES
     - Build the Italian thesaurus of Cybersecurity;
     - Compare the software employed to configure a system of NLP rules, in order to improve the selection of candidate terms;
     - Set out a methodology that can automate, starting from the source corpus, the semantic updating of this field of knowledge, adjusting the terminological assets of the thesaurus to achieve adequate coverage of the Cybersecurity domain.

  6. METHODOLOGY OUTLINE
     - Mapping with the ICT Security vocabularies contained in the standards
     - Creation of the thesaurus as a means of semantic control
     - Document selection and analysis
     - Software comparison: T2K, TermSuite, Pke
     - Establishment of the semantic relationships between terms
     - Terminological extraction from documents (T2K)
     - First terminological list (datasets filtered by frequency (TF/IDF) and accuracy)
     - Collaboration with the experts of the Cybersecurity domain
     - Terms observation
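
The first terminological list is described as filtered by frequency (TF/IDF). A minimal Python sketch of that filtering step follows, assuming each document has already been reduced to the list of candidate terms extracted from it (for example by T2K). The function names, the max-over-documents aggregation and the threshold are illustrative assumptions, not the settings used in this work, and the accuracy filter is not modelled.

```python
import math
from collections import Counter


def tfidf_scores(docs: list[list[str]]) -> dict[str, float]:
    """Best TF-IDF score of each candidate term across all documents.

    Each document is given as the list of candidate terms extracted from it
    (hypothetical input format; single- or multi-word terms).
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores: dict[str, float] = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)              # relative frequency in this document
            idf = math.log(n_docs / df[term])  # 0 for terms found in every document
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores


def filter_candidates(docs: list[list[str]], threshold: float) -> list[str]:
    """Candidate terms whose best TF-IDF reaches `threshold`, highest first."""
    scores = tfidf_scores(docs)
    kept = [term for term, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda term: (-scores[term], term))


docs = [
    ["phishing", "phishing", "dati personali"],
    ["spam", "dati personali"],
    ["phishing", "malware"],
]
print(filter_candidates(docs, threshold=0.25))  # ['malware', 'spam', 'phishing']
```

Terms that occur in every document get an IDF of zero and are never kept, which is the intended effect of TF-IDF on generic vocabulary; the threshold would have to be tuned on the real corpus.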

  7. DOMAIN
     - Multidisciplinarity: ICT and its sub-areas (audiovisual techniques, computer software, electronics, etc.);
     - Specificity: technicisms and standardized terms;
     - Cross-field: computer science, legislative systems, regulations.

  8. CORPUS
     The overall size of the corpus is 563 documents, containing 11,806,558 terms.
     Portals: AltaLex, EUR-Lex, Guidelines (CERT), ENISA, etc.
     Criteria of selection:
     - time range: documents were taken from recent years, covering at least the latest seven;
     - language: only Italian documents were considered for the analysis;
     - contexts: national, European and regional laws were analysed.

  9. FIRST (AND SECOND) PHASE: MAPPING WITH STANDARDS
     Manually translated into Italian using the IATE database:
     - NISTIR 7298 Rev. 2 (2013)
     - ISO/IEC 27000:2016
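
One way to read the mapping step is as a coverage check: each English term of a standard glossary is looked up, through its manually produced Italian equivalent, among the candidate terms extracted from the corpus, while terms that stay in English (such as phishing) are looked up in their English form. The Python sketch below assumes a small hand-made dictionary standing in for the IATE-assisted translations; the function name, the entries and the exact-match rule are illustrative assumptions.

```python
from typing import Optional


def map_to_standard(
    candidates: list[str], glossary: dict[str, Optional[str]]
) -> tuple[dict[str, str], list[str]]:
    """Match standard glossary entries against corpus candidate terms.

    `glossary` maps each English standard term to its Italian equivalent,
    or to None when the term is kept in its English form.
    Returns (matched entries, unmatched English terms).
    """
    corpus_terms = {c.lower() for c in candidates}
    matched: dict[str, str] = {}
    unmatched: list[str] = []
    for english, italian in glossary.items():
        # Search the form actually used in Italian texts.
        form = (italian or english).lower()
        if form in corpus_terms:
            matched[english] = form
        else:
            unmatched.append(english)
    return matched, unmatched


# Hypothetical sample entries, not taken from the standards themselves.
glossary = {
    "brute force attack": "attacco a forza bruta",
    "phishing": None,  # kept in English form in Italian texts
    "trojan horse": "cavallo di Troia",
}
candidates = ["Phishing", "attacco a forza bruta", "spam"]
matched, unmatched = map_to_standard(candidates, glossary)
coverage = len(matched) / len(glossary)
print(matched, unmatched, f"{coverage:.0%}")
```

Exact matching is the simplest possible rule; lemmatization or the variant handling discussed for TermSuite would be needed to catch inflected or reworded forms.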

  10. In the thesaurus: https://www.cybersecurityosservatorio.it/it/Services/thesaurus.jsp

  11. FIRST PHASE: EXPERTS OF THE DOMAIN
      Informatics and Telematics Institute of CNR in Pisa (Tuscany)

  12. FIRST PHASE: FIRST ITALIAN THESAURUS DRAFT FOR CYBERSECURITY
      - Four main categories, decided together with the experts through an information-abstraction process, according to their presence in the Cybersecurity taxonomies included in the gold standards and their frequency in the terminological lists: Cybersecurity, Cybercriminality, Cyberbullism, Cyber Defence;
      - 245 candidate terms;
      - Semantic relationships decided according to the head terms derived with T2K and approved by the experts;
      - Scope notes (definitions of the terms) added according to the co-occurrences in the official sources of the corpus.
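
The slide states that semantic relationships were derived from head terms with T2K and then approved by the experts. The Python sketch below illustrates the general head-term idea with a much simpler heuristic, not T2K's own procedure: it assumes Italian multiword terms are head-initial, so the first token is taken as the head, and it proposes a broader term whenever that head is itself in the term list. The function names and sample terms are hypothetical.

```python
from collections import defaultdict


def head_of(term: str) -> str:
    """Head word of a term, ASSUMING Italian head-initial noun phrases
    (e.g. 'attacco' in 'attacco a forza bruta')."""
    return term.split()[0]


def propose_broader_terms(terms: list[str]) -> dict[str, list[str]]:
    """Propose broader/narrower pairs: a multiword term becomes a narrower
    term of its head when the head is itself in the term list.
    The output is a draft for expert validation, not a final hierarchy."""
    vocabulary = set(terms)
    hierarchy: dict[str, list[str]] = defaultdict(list)
    for term in terms:
        head = head_of(term)
        if term != head and head in vocabulary:
            hierarchy[head].append(term)
    return {broader: sorted(narrower) for broader, narrower in hierarchy.items()}


terms = [
    "attacco", "attacco a forza bruta", "attacco informatico",
    "sicurezza", "sicurezza informatica", "messaggio di phishing",
]
print(propose_broader_terms(terms))
# {'attacco': ['attacco a forza bruta', 'attacco informatico'], 'sicurezza': ['sicurezza informatica']}
```

English borrowings such as spear phishing are head-final, so this heuristic would misread them; POS information or a separate rule would be needed, and every proposed pair remains a candidate for expert approval.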

  13. FIRST PHASE: FIRST ITALIAN THESAURUS DRAFT FOR CYBERSECURITY
      https://www.cybersecurityosservatorio.it/it/Services/thesaurus.jsp

  14. SECOND PHASE: SOFTWARE COMPARISON
      Term-oriented tools:
      - T2K (a native Italian tool) was used to extract the first terminological dataset;
      - TermSuite helped improve the structuring of the semantic relations by using variants (denominative, conceptual, linguistic).
      Document-oriented tool:
      - Pke, the library whose models TopicRank and MultipartiteRank supported the definition of topical information about the domain. For MultipartiteRank in particular, the first runs were tested on the four main macro-categories of the Italian thesaurus for Cybersecurity validated by the group of experts: Cybersecurity, Cybercriminality, Cyberbullism, Cyber Defence.
      Sample keyphrases for the four macro-categories, by source type:
      - Legal/technical documents: Security; Cyber Information; fight; / On-line information; attacks; personal networks; security; cyber data; system cybercrime attacks; defence
      - Divulgative documents: Cyber security; Information; Prevention; Attacks; cyber security; networks; security; cyber educational system; attacks; threats; smart cars criminality; networks schools: education networks

  15. CANDIDATE TERMS OBSERVATION
      Brute Force Attack
      - T2K: 2 occurrences. Terms: attacco a forza bruto; forza bruta
      - TermSuite: 1 occurrence. Term: attacco bruto. Denominative variants: npna: attacco a forza bruto; npna: attacco di forza bruto
      - Pke: not present, but see the term "attack". Legal document: security; systems; cyber attacks; cybersecurity; personal data. Divulgative document: website; system; e-mail; virus; hacker ethics
      Phishing
      - T2K: 176 occurrences. Terms: phishing techniques, phishing mail, spear phishing, phishing attacks, phishing website
      - TermSuite: 2 occurrences. Term: phishing. Conceptual variant (expansion): npn: messaggio di phishing
      - Pke: not present
      Trojan Horses
      - T2K: not present
      - TermSuite: 1 occurrence. Term: Trojan
      - Pke: not present
      New information from Pke: Threat Intelligence, clustered as Cyber counter-espionage; informative systems, actions, intelligence
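
TermSuite groups npna forms such as attacco a forza bruto and attacco di forza bruto as denominative variants. The Python sketch below reproduces only that one effect with a simple stand-in, not TermSuite's rule engine: it assumes two terms are variants when their content words coincide once a small, hypothetical list of Italian prepositions and articles is removed.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small list of Italian prepositions and articles
# that are ignored when comparing two term forms.
FUNCTION_WORDS = {"a", "di", "da", "in", "con", "su", "per", "il", "la", "del", "della"}


def variant_key(term: str) -> tuple[str, ...]:
    """Content words of a term, lower-cased and in their original order."""
    tokens = re.findall(r"\w+", term.lower())
    return tuple(t for t in tokens if t not in FUNCTION_WORDS)


def group_variants(terms: list[str]) -> list[list[str]]:
    """Groups of two or more terms that differ only in function words."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for term in terms:
        groups[variant_key(term)].append(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]


candidates = [
    "attacco a forza bruto", "attacco di forza bruto",
    "attacco bruto", "messaggio di phishing", "phishing",
]
print(group_variants(candidates))  # [['attacco a forza bruto', 'attacco di forza bruto']]
```

Conceptual variants by expansion, such as phishing and messaggio di phishing, are deliberately not grouped here: their content-word keys differ, so they would need an additional containment rule.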

  16. FUTURE WORKS
      - Enhancement of the category organization through Pke;
      - Detection of semantic relationships starting from corpus processing;
      - Checking the terminological coverage with respect to term variation over time.
      TIA 2019 – 1 July

  17. THANK YOU FOR YOUR ATTENTION
