
Author / Researcher Databases – Benefits and Challenges
CESSE Annual Meeting, July 18, 2013
Quatrro Confidential · www.Quatrro.com

Trends in STM Research Publishing
• Exponential growth of scholarly output
• Evolution of social networks and topical communities
• Authors seeking more visibility and recognition for their contributions
• Evolving user expectations of online content (functional efficiency, accuracy)
• Increasing emphasis on data mining, analysis and integration
• Governments, institutions and funding agencies evaluating their "investments" (faculty, departments, grants, collaborations) for productivity and ROI
• Increased interest in the "who" of STM research: the producers of the research, not just the research itself

Benefits of Clean, Aggregated Author Data
• More efficient, enhanced editorial workflow (peer review)
  – Simpler, faster, higher-quality review process (identifying the best reviewers)
• Improved online performance and search results
  – Enhanced discovery and more accurate retrieval of author information and content
• More robust and accurate bibliometric analyses
  – Research productivity of institutions, departments and individuals
  – Indicators such as citations, downloads, articles published and patents
  – Support for decisions such as funding, promotion and reappointment
  – Better assessment of the return on money invested in research
• Increased exposure to and support of the author
  – Visibility, tracking, collaboration
• Support for the broader community
  – Analysis, networking, productivity, efficiency

ResearcherID – Thomson Reuters


Elsevier Scopus ID and Author Profile
[Screenshot: Author Profile Page]

ORCID
• ORCID's prime aim is to improve the overall research ecosystem by assigning unique identifiers to researchers and scholars and linking those identifiers to their publications, grants and patents.
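The unique identifiers mentioned above carry a built-in integrity check. As a minimal sketch (an illustration, not part of the original slide), the Python below validates the final check character of an ORCID iD using the ISO/IEC 7064 MOD 11-2 scheme that ORCID documents. The function names are hypothetical; 0000-0002-1825-0097 is the sample iD ORCID uses in its own documentation.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO/IEC 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if, after removing hyphens, there are 15 digits plus a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdecimal():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: check character altered
```

A passing checksum only shows that an iD is well formed; whether it belongs to the author in question still has to be confirmed against the ORCID registry.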

Society/Association Initiatives

ACM Author-Izer

IEEE Xplore Author Search
• An author profile user interface is expected before the end of the year.
• Authors will be asked to QC (quality-check) the data.

AIP Publishing
• One of the world's largest physical science publishers
• Overview:
  – 5.5 million potential author names
  – 6,000 authors with the surname "Wang"
  – 800,000 articles dating back to the early 20th century
  – Subject areas and keywords
• Outcome:
  – 980,000 academic authors
  – 33,000 institutions
  – A database of publishing physicists, complete with a record of affiliations, areas of expertise, papers published and co-authors
• Next step:
  – Gather feedback from users and explore additional refinements

Why Create an Author Database?
• Support for authors and researchers
  – Create individual author profiles and provide new value-added services.
  – Enhance the author's experience with your publications (service).
• Support for the specialty/domain the society serves
  – Having an accurate record of who authored your publications is important.
  – Enhance interconnectivity and networking within a specific publishing community.
• Societies also need and want to respond to market needs, trends and expectations.
• Author data is important, valuable information that societies want to own, maintain and develop proactively.
  – Complementary to similar, broader initiatives (ORCID, etc.)
• Societies believe it is a service their members and community want from them.
• ACM: "…emphasizing its continuing commitment to the interests of its authors and to the computing community."

The Bigger Association/Society Picture
[Diagram: roles an individual may hold in relation to the society: Member, Committee Member, Editor, Reviewer, Author, Meeting Attendee, Subscriber, Marketer, Donor]

Practical Considerations

The Grunt Work
• Extracting, cleansing and disambiguating the author data is an arduous but essential process: garbage in, garbage out.
  – Automated tools using an algorithm and scoring mechanism can be used (e.g. to judge whether records for John Smith and J L Smith are likely to be the same person).
  – Fully automated solutions are prone to problems (data glitches and missing information result in mapping errors).
  – Expert human intervention is required to achieve a desirable level of quality:
    - at the front end, to analyze the data and establish the rule set for the automation;
    - in the processing phase, to ensure the data is validated and standardized;
    - during disambiguation, for “hands-on analysis and processing” when necessary.
• Find a partner with sophisticated data cleansing and disambiguation capabilities and experience to help with analysis, strategy and execution.
• Once this work is complete, profiles including papers authored, affiliations and other information can be created in a largely automated fashion, using existing bibliographic metadata from the publisher and from the public domain (e.g. CrossRef).
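The "algorithm and scoring mechanism" for the John Smith / J L Smith case can be sketched as follows. This is a hypothetical rule set with made-up weights, not Quatrro's method: surnames must match, a first name that agrees only by initial earns fewer points than a full match, and a middle name present on only one side is treated as an omission rather than a conflict. Scores run from 0 to 100.

```python
def split_western(name: str) -> tuple[list[str], str]:
    """Split 'Given [Middle] Surname' into (lowercased given-name tokens, surname)."""
    parts = name.replace(".", " ").lower().split()
    if not parts:
        return [], ""
    return parts[:-1], parts[-1]


def compatible(x: str, y: str) -> bool:
    """Given-name tokens agree if identical, or if one is an initial of the other."""
    if len(x) == 1 or len(y) == 1:
        return x[0] == y[0]
    return x == y


def name_match_score(a: str, b: str) -> int:
    """Hypothetical 0-100 score that two name strings refer to the same person."""
    given_a, sur_a = split_western(a)
    given_b, sur_b = split_western(b)
    if sur_a != sur_b or not given_a or not given_b:
        return 0
    if not compatible(given_a[0], given_b[0]):
        return 0                                 # conflicting first names
    score = 50                                   # surnames agree
    both_full = len(given_a[0]) > 1 and len(given_b[0]) > 1
    score += 30 if both_full else 15             # full first names are stronger evidence than initials
    mid_a, mid_b = given_a[1:], given_b[1:]
    if mid_a and mid_b:
        if not compatible(mid_a[0], mid_b[0]):
            return 0                             # conflicting middle names
        score += 20
    elif mid_a or mid_b:
        score += 10                              # one record simply omits the middle name
    return score


print(name_match_score("John Smith", "J L Smith"))               # 75
print(name_match_score("John Smith", "Jane Smith"))              # 0
print(name_match_score("Tom Scullion", "Thomas Hyun Scullion"))  # 0
```

The last case shows why fully automated matching falls short: under these rules "Tom" and "Thomas" conflict unless a human-curated nickname list is added to the rule set.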

Sourcing and Extracting Author Data
• Multiple input formats: PDF, TIFF, XML and HTML (OCR may be needed)
• Inconsistent representation of author data across documents
• Author data represented in unstructured formats
[Figure: sample documents with the author Name and Affiliation fields labelled]
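For XML input, author data is often already tagged. A minimal sketch, assuming the files follow the JATS article tag set (`contrib`, `name`, `surname`, `given-names`, `email`); real files often link affiliations through `xref`/`aff` elements, which this sketch ignores, and PDF, TIFF or HTML inputs would need separate extraction (with OCR for images) before a step like this. The sample record is invented.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<article><front><article-meta><contrib-group>
  <contrib contrib-type="author">
    <name><surname>Doe</surname><given-names>Jane A.</given-names></name>
    <email>j.doe@example.edu</email>
  </contrib>
  <contrib contrib-type="editor">
    <name><surname>Roe</surname><given-names>Richard</given-names></name>
  </contrib>
</contrib-group></article-meta></front></article>"""


def extract_authors(xml_text: str) -> list[dict]:
    """Return surname, given names and email for each contributor tagged as an author."""
    root = ET.fromstring(xml_text)
    authors = []
    for contrib in root.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue                              # skip editors, translators, etc.
        authors.append({
            "surname": (contrib.findtext("name/surname") or "").strip(),
            "given_names": (contrib.findtext("name/given-names") or "").strip(),
            "email": (contrib.findtext("email") or "").strip(),
        })
    return authors


print(extract_authors(SAMPLE))
```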

Issues with Names
• The same author appears under multiple name variants (first name, middle name, last name)
  – Journals use different naming styles: T Scullion, Tom Scullion, Thomas Hyun Scullion
• Name changes due to marriage
  – e.g. if Adela LANDOVÁ marries Jakub ŠTYCHKOV, she may be known as Adela ŠTYCHKOVÁ or Adela LANDOVÁ-ŠTYCHKOVÁ
• International naming conventions
  – Eastern order: family name (surname) followed by forename (given name)
  – Western order: forename (given name) followed by family name (surname)
  – Surname prefixes: Abdel, Abdul, Abu, Af, Akhu, Al, Ben, De, Della, Des, Du, El, Ibn, La, Le, On, Op
  – Multiple family names: María-Jose Carreño Quiñones
  – Brazilians may have three or four family names
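Surname prefixes and name order both affect how a raw name string is split into fields. The sketch below is hypothetical: it assumes Western order unless told otherwise and keeps a small, illustrative prefix list (partly from this slide, plus van/von/der/den) attached to the surname. It does not handle multiple family names such as María-Jose Carreño Quiñones, which need rules of their own.

```python
# Illustrative, incomplete prefix list (hypothetical); compared case-insensitively.
SURNAME_PREFIXES = {"abdel", "abdul", "abu", "al", "ben", "de", "della", "des",
                    "du", "el", "ibn", "la", "le", "van", "von", "der", "den"}


def split_name(full_name: str, family_name_first: bool = False) -> tuple[str, str]:
    """Split a personal name into (given names, family name)."""
    tokens = full_name.split()
    if len(tokens) < 2:
        return "", full_name.strip()
    if family_name_first:                      # Eastern order: family name comes first
        return " ".join(tokens[1:]), tokens[0]
    start = len(tokens) - 1                    # Western order: the surname is last...
    while start > 1 and tokens[start - 1].lower() in SURNAME_PREFIXES:
        start -= 1                             # ...together with any prefixes before it
    return " ".join(tokens[:start]), " ".join(tokens[start:])


print(split_name("Jan van der Berg"))                 # ('Jan', 'van der Berg')
print(split_name("Thomas Hyun Scullion"))             # ('Thomas Hyun', 'Scullion')
print(split_name("Wang Li", family_name_first=True))  # ('Li', 'Wang')
```

The names in the examples are hypothetical. Whether a record is in Eastern or Western order usually cannot be inferred from the string alone, which is one reason the slides call for expert rule-setting up front.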

Issues with Institutional/Affiliation Data
• Lack of standardization in affiliation names:
  – University of California at Davis
  – University of California Davis
  – University of California at Davis School of Medicine
  – University of California, Davis
• Authors migrating from one affiliation to another:
  First Name | Last Name | Department | Organization | E-mail
  Abdurrahman | Sahin | Department of Civil Engineering | Karadeniz Technical University | abdurrahmansahin@hotmail.com
  Abdurrahman | Sahin | Department of Earthquake Engineering | Bogazici University | abdurrahman.sahin@boun.edu.tr
• Data represented in multiple languages:
  – Institut für Klinische Pharmakologie und Toxikologie, Charité Campus Benjamin Franklin, Garystr. 5, 14195 Berlin
  – Institut für Arbeitsphysiologie an der Universität Dortmund
  – Institut für Theoretische Physik der Universität Heidelberg
  – Laboratoire d’Electrochimie et des Procédés Membranaires
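One way to fold the University of California variants listed above into a single entry is to normalize the text and then look for a known institution name inside it. In this sketch the thesaurus contents and the noise-word list are hypothetical; matching is on whole words and the longest matching key wins, so a department-level string still maps to its parent institution.

```python
import re
from typing import Optional

# Hypothetical thesaurus: normalized key -> canonical institution name.
CANONICAL = {
    "university of california davis": "University of California, Davis",
    "karadeniz technical university": "Karadeniz Technical University",
    "bogazici university": "Bogazici University",
}
NOISE_WORDS = {"at", "the"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop noise words."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(w for w in words if w not in NOISE_WORDS)


def canonical_institution(affiliation: str) -> Optional[str]:
    """Return the canonical name whose key appears as whole words in the affiliation, else None."""
    padded = f" {normalize(affiliation)} "
    hits = [key for key in CANONICAL if f" {key} " in padded]
    return CANONICAL[max(hits, key=len)] if hits else None


for variant in ["University of California at Davis",
                "University of California Davis",
                "University of California at Davis School of Medicine",
                "University of California, Davis"]:
    print(canonical_institution(variant))     # University of California, Davis (all four)
```

Dropping the noise word "at" is what lets the first two variants collapse; whether "School of Medicine" records should be grouped with the campus or kept as a separate unit is still a decision for a human reviewer.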

Other Data-Related Issues
• Accented characters (require conversion into Unicode)
• Surname prefixes (van, von, de, ...)
• Names of cities and states that are the same in different countries
• Authors represented by generic email addresses (Yahoo or Gmail) that carry no organizational identifier
• Email addresses that do not follow standard formats
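Two of these issues lend themselves to small automated checks. The sketch below uses only the Python standard library; the generic-domain list and the deliberately loose email pattern are assumptions. Even once text is in Unicode, the same accented letter can be stored precomposed or decomposed, so the sketch normalizes to NFC for storage, builds an accent-folded key for matching, and classifies email addresses. NFD-based folding strips most Latin diacritics but leaves letters such as ø or ł unchanged.

```python
import re
import unicodedata

GENERIC_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}  # hypothetical, incomplete
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")         # loose shape check only


def nfc(text: str) -> str:
    """Compose accented characters so 'n' + combining tilde and 'ñ' compare equal."""
    return unicodedata.normalize("NFC", text)


def match_key(text: str) -> str:
    """Accent-folded, case-folded key for fuzzy comparison (not for display)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def classify_email(email: str) -> str:
    """Return 'invalid', 'generic' or 'institutional'."""
    email = email.strip()
    if not EMAIL_RE.match(email):
        return "invalid"
    domain = email.rsplit("@", 1)[1].lower()
    return "generic" if domain in GENERIC_EMAIL_DOMAINS else "institutional"


print(nfc("Quin\u0303ones") == "Qui\u00f1ones")          # True
print(match_key("LANDOVÁ-ŠTYCHKOVÁ"))                    # landova-stychkova
print(classify_email("abdurrahmansahin@hotmail.com"))    # generic
print(classify_email("abdurrahman.sahin@boun.edu.tr"))   # institutional
```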

Modular Approach to Data Preparation
Data Preparation and Enhancement
• Data Parsing
  – Source input documents
  – Identify author data
  – SME (subject-matter expert) verification of identified author data against the input document
• Data Validation
  – Error identification using global validation checks across author names, affiliation and email data
• Data Standardization
  – Standardization of author names and affiliation data using predefined rules and knowledge repositories
Disambiguation and Visualization
• Disambiguation by email and affiliation mapping
• Disambiguation by co-author analysis
• Manual validation of ID if required
• Creation of unique author profiles
• Author data clustering and visualization
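The two automated disambiguation steps (by email and by co-author analysis) can be sketched together. In this hypothetical version, records are first blocked by a normalized name key, so only same-name records are compared (which matters when, as in the AIP example, thousands of authors share a surname). Records in a block are then merged with a union-find structure when they share an email address or when their co-author sets overlap by at least a Jaccard threshold. The record fields and the 0.3 threshold are assumptions; uncertain clusters would go on to manual ID validation.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records: list[dict], threshold: float = 0.3) -> list[list[int]]:
    """Group record indices that share an email or enough co-authors.

    Each record is a dict with 'name_key' (str), 'email' (str or None)
    and 'coauthors' (set of str); all field names are hypothetical.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving keeps trees shallow
            i = parent[i]
        return i

    blocks = defaultdict(list)                # blocking: compare only same-name records
    for idx, rec in enumerate(records):
        blocks[rec["name_key"]].append(idx)

    for members in blocks.values():
        for i, j in combinations(members, 2):
            a, b = records[i], records[j]
            same_email = bool(a["email"]) and a["email"] == b["email"]
            if same_email or jaccard(a["coauthors"], b["coauthors"]) >= threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


records = [
    {"name_key": "wang l", "email": "l.wang@a.example", "coauthors": {"chen y", "li x"}},
    {"name_key": "wang l", "email": None, "coauthors": {"chen y", "li x", "zhao q"}},
    {"name_key": "wang l", "email": None, "coauthors": {"smith j"}},
]
print(cluster_records(records))  # [[0, 1], [2]]
```

The example records are invented. Record 0 and record 1 share two of three co-authors (Jaccard 2/3), so they merge; record 2 shares nothing and stays separate.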

Parsing Module
• Author records need to be split into their constituent data fields: surname, first name, email, division, organization, city, state, country, etc.
[Figure: source document alongside the parsed output data]
The data parsing module extracts author data from input documents, parses it and populates the relevant fields in a predefined template.
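A toy version of this parsing step is shown below. The template fields follow the slide, but the keyword lists and the rule that the last unclassified segments are the city and then the country are hypothetical heuristics; the state field is left for a later rule, and real affiliation strings need far more patterns plus human review. The sample author and affiliation are invented.

```python
import re
from dataclasses import dataclass


@dataclass
class AuthorRecord:
    """Hypothetical predefined template; field names follow the slide."""
    surname: str = ""
    first_name: str = ""
    email: str = ""
    division: str = ""
    organization: str = ""
    city: str = ""
    state: str = ""        # not filled by this sketch
    country: str = ""


DIVISION_WORDS = ("department", "school", "faculty", "division")
ORG_WORDS = ("university", "universität", "institute", "institut", "college", "laboratoire")
EMAIL_RE = re.compile(r"[^@\s,]+@[^@\s,]+")


def parse_author(name: str, affiliation: str) -> AuthorRecord:
    """Fill the template from a 'Given Surname' string and a comma-separated affiliation."""
    rec = AuthorRecord()
    rec.first_name, _, rec.surname = name.strip().rpartition(" ")
    match = EMAIL_RE.search(affiliation)
    if match:                                  # pull the email out before splitting on commas
        rec.email = match.group()
        affiliation = affiliation.replace(rec.email, "")
    parts = [p.strip() for p in affiliation.split(",") if p.strip()]
    for part in parts:
        low = part.lower()
        if not rec.division and any(w in low for w in DIVISION_WORDS):
            rec.division = part
        elif not rec.organization and any(w in low for w in ORG_WORDS):
            rec.organization = part
    rest = [p for p in parts if p not in (rec.division, rec.organization)]
    if rest:
        rec.country = rest[-1]                 # heuristic: last leftover segment is the country
    if len(rest) >= 2:
        rec.city = rest[-2]                    # ...and the one before it is the city
    return rec


print(parse_author("Jane Doe",
                   "Department of Civil Engineering, Example University, Springfield, USA, j.doe@example.edu"))
```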

Validation Module
• Parsed data needs to be validated for accuracy. Automation based on predefined rules, built-in databases and other knowledge repositories can help, but manual intervention is typically required to achieve a desirable level of accuracy.
[Figure: output validation using predefined rules]
The data validation module identifies formatting and parsing errors and flags them for human validation and correction.
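Rule-based validation can be as simple as a function that returns a list of problems per record, with any non-empty result routed to a human reviewer. The rules, field names and country list below are hypothetical stand-ins for the predefined rules and knowledge repositories the slide mentions, and the sample records are invented.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
KNOWN_COUNTRIES = {"France", "Germany", "Turkey", "USA"}   # stand-in for a full reference list


def validate(record: dict) -> list[str]:
    """Return human-readable issues for one parsed record; [] means all rules passed."""
    issues = []
    surname = record.get("surname", "")
    if not surname:
        issues.append("missing surname")
    elif any(ch.isdigit() for ch in surname):
        issues.append("digits in surname (possible OCR or parsing error)")
    email = record.get("email", "")
    if email and not EMAIL_RE.match(email):
        issues.append("malformed email")
    country = record.get("country", "")
    if country and country not in KNOWN_COUNTRIES:
        issues.append(f"unknown country: {country}")
    if "@" in record.get("organization", ""):
        issues.append("email text found in organization field")
    return issues


records = [
    {"surname": "Doe", "email": "j.doe@example.edu", "country": "USA"},
    {"surname": "D0e", "email": "j.doe at example.edu", "country": "Atlantis"},
]
review_queue = []
for rec in records:
    problems = validate(rec)
    if problems:                               # only flagged records go to a human
        review_queue.append((rec["surname"], problems))
print(review_queue)
```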

Standardization Module
• This process isolates incorrect entries by comparing them with the standard names held in pre-built databases. It can run partial or complete sets of standardization rules, with manual validation for errors that cannot be corrected automatically.
The self-learning standardization module has built-in thesauri that are continuously updated based on automatic and manual corrections.
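The self-learning thesaurus idea can be illustrated with a small class: known values are standardized automatically, unknown values are returned unchanged and flagged for manual review, and each correction is written back so the same variant is fixed automatically next time. The class, its methods and the lookup-key rule are hypothetical; a production system would also need fuzzier matching than exact key lookup.

```python
from typing import Optional


class Thesaurus:
    """Hypothetical variant -> canonical map that grows with every correction."""

    def __init__(self, entries: Optional[dict] = None):
        self._map: dict[str, str] = {}
        for variant, canonical in (entries or {}).items():
            self.learn(variant, canonical)

    @staticmethod
    def _key(text: str) -> str:
        """Case- and comma-insensitive lookup key with collapsed whitespace."""
        return " ".join(text.lower().replace(",", " ").split())

    def standardize(self, value: str) -> tuple[str, bool]:
        """Return (canonical, True) if known, else (value, False) for manual review."""
        hit = self._map.get(self._key(value))
        return (hit, True) if hit is not None else (value, False)

    def learn(self, variant: str, canonical: str) -> None:
        """Store an automatic or manual correction for future runs."""
        self._map[self._key(variant)] = canonical


th = Thesaurus({"University of California, Davis": "University of California, Davis"})
print(th.standardize("university of california  davis"))   # ('University of California, Davis', True)
print(th.standardize("Univ. of California at Davis"))       # ('Univ. of California at Davis', False)
th.learn("Univ. of California at Davis", "University of California, Davis")  # reviewer's correction
print(th.standardize("Univ. of California at Davis"))       # ('University of California, Davis', True)
```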

Disambiguation Process
