Data mining, management and visualization in large scientific corpus

Dec 16, 2022

Data mining, management and visualization in large scientific corpus HUI WEI
Data collection: Some digital libraries did not supply APIs, so we use raw PDF documents as input.
Data collection
1. Extract basic information about a paper, such as authors, title, abstract sentences and DOI.
2. Extract references.
3. Extract standard keywords and their frequency from each paper.
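
The DOI and keyword-frequency steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the slides: it assumes the PDF has already been converted to plain text, and the `KEYWORDS` list, the DOI pattern and both function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword list; the slides do not give the actual vocabulary.
KEYWORDS = ["ray tracing", "rendering", "mesh"]

# Simplified DOI pattern (an assumption; real DOIs can be more varied).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi(text):
    """Return the first DOI-like string in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop sentence punctuation that may trail the DOI.
    return match.group(0).rstrip(".,;")


def keyword_frequencies(text, keywords=KEYWORDS):
    """Count case-insensitive whole-phrase occurrences of each keyword."""
    lowered = text.lower()
    counts = Counter()
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[keyword] = hits
    return counts
```

Calling `keyword_frequencies(paper_text)` returns a `Counter` that maps each keyword found to its number of occurrences, which could then be stored per paper.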
Text mining
1. Use JAPE rules to define “Macros” that find important markers, such as “DOI”, “year” and “abstract” tags.
2. Use the ANNIE NE Transducer and Gazetteer to look up person names such as “author”.
3. Use the GATE ontology Gazetteer and JAPE rules to look up computer graphics terms in the content.
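
As a rough analogue of the marker and gazetteer steps above, the following Python sketch uses regular expressions and a plain list lookup. It does not use GATE, JAPE or ANNIE; the `MARKERS` patterns, the function names and the list of entries are assumptions made for illustration only.

```python
import re

# Hypothetical marker patterns standing in for the JAPE "Macros".
MARKERS = {
    "doi": re.compile(r"\bdoi\b\s*:?", re.IGNORECASE),
    "abstract": re.compile(r"^\s*abstract\b", re.IGNORECASE | re.MULTILINE),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def find_markers(text):
    """Return, per marker name, the (start, end) spans where it matches."""
    return {
        name: [match.span() for match in pattern.finditer(text)]
        for name, pattern in MARKERS.items()
    }


def gazetteer_lookup(text, entries):
    """Return (entry, start) pairs for each list entry found in the text.

    This is a plain case-insensitive phrase match, far simpler than a
    real gazetteer, which also handles tokenisation and annotation types.
    """
    found = []
    for entry in entries:
        pattern = re.compile(r"\b" + re.escape(entry) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            found.append((entry, match.start()))
    return found
```

The same `gazetteer_lookup` function could be called once with a list of person names and once with a list of domain terms, mirroring the two lookups listed above.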
Text mining
Keywords onto
Data repositories Graph repository
Data repositories: Data is managed in 4 NoSQL repositories
Data repositories Data distribution and system workflow
Data visualization
Topic river visualization
Thanks hui.wei@beds.ac.uk
