Thomas Wood
NLP/data science consultant

Past projects
● Boehringer Ingelheim - pharma
● CV-Library:
  ○ Predict industries/salaries from CV - word2vec + CNN
  ○ Predict search terms from CV - LSTM
● Forensic stylometry demo
  ○ Identifying author
● Chatbots
  ○ Intelligent home etc
  ○ Question answering about products
● Document clustering, classification, trend detection, sentiment analysis
● Cambridge Masters: anaphora resolution ("it’s raining")

Boehringer Ingelheim
● Before running a clinical trial, a pharma company writes a 200-page PDF called a protocol.
● I developed an ML model which extracts important data from the protocol: type of treatment, toxicity, number of subjects, etc.

Boehringer Ingelheim (2)
● The company has factories all over the world. Most medicines go through multiple facilities and countries before going to market.
● When a manufacturing defect occurs, it is written up in free text in the local language by a factory worker, e.g. "temperature deviation of 5 degrees due to crack in vial probably occurring in transit".
● I ran unsupervised topic detection to identify the commonest problems in various categories of products from the unstructured text data.

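The slide does not say which topic detection method was used. As a rough illustration of the idea only, the sketch below groups a few invented English defect reports with TF-IDF vectors and a small spherical k-means, then prints the heaviest terms per cluster. The reports, the stopword list, and all function names are assumptions made for this example; the real data was multilingual and much larger.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {"a", "the", "in", "of", "to", "due", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopwords."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (dict) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors, one dict per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def spherical_kmeans(vectors, k, iterations=10):
    """Cluster unit vectors by cosine similarity, deterministically."""
    # Farthest-first initialisation: start from the first document, then
    # repeatedly add the document least similar to the chosen centroids.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[j] = normalize(dict(total))
    return labels, centroids


def top_terms(centroid, n=3):
    """Return the n heaviest terms of a cluster centroid."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]


# Invented reports for illustration only.
reports = [
    "temperature deviation in cold storage",
    "cold storage temperature too high",
    "temperature deviation during cold transport",
    "crack in glass vial",
    "glass vial broken crack visible",
    "broken vial glass fragments",
]

vectors = tfidf_vectors(reports)
labels, centroids = spherical_kmeans(vectors, k=2)
for j, centroid in enumerate(centroids):
    print(j, top_terms(centroid))
```

On this toy data the temperature reports and the vial-damage reports end up in separate clusters, and the top terms act as a crude label for each recurring problem.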
CV-Library
This was 2.5 years ago, before ELMo/BERT
● Upload CV: when you upload a CV, it gets converted to TXT
● Goes through word2vec and is passed through a deep NN, which recommends an industry
● Then some fields which the candidate previously filled out get autofilled!
● Use TensorFlow NMT to recommend search terms
  ○ Repurposed Viet translator
● Trained on 12 million CVs
● Deployed on GCP
● Result: more engagement, fewer dropouts
● 7% increase in signups - £££

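A toy sketch of the general shape of the industry recommender: CV text, then word vectors, then a classifier. The production system used learned word2vec vectors and a deep NN. Here hand-made 3-dimensional vectors and a nearest-prototype rule stand in for both, so every name and number below is hypothetical.

```python
import math

# Hypothetical 3-d "embeddings"; real word2vec vectors are learned from
# data and typically have hundreds of dimensions.
EMBEDDINGS = {
    "python": (0.9, 0.1, 0.0),
    "software": (0.8, 0.2, 0.0),
    "developer": (0.7, 0.2, 0.1),
    "nurse": (0.0, 0.9, 0.1),
    "patient": (0.1, 0.8, 0.1),
    "hospital": (0.0, 0.9, 0.2),
    "sales": (0.1, 0.1, 0.9),
    "customer": (0.1, 0.2, 0.8),
}

# Hypothetical industry prototypes, standing in for the trained network.
INDUSTRY_PROTOTYPES = {
    "IT": (1.0, 0.0, 0.0),
    "Healthcare": (0.0, 1.0, 0.0),
    "Sales": (0.0, 0.0, 1.0),
}


def embed(text):
    """Average the vectors of all known words; None if no word is known."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return None
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(a, b):
    """Cosine similarity of two equal-length non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend_industry(cv_text):
    """Return the industry whose prototype is closest to the CV vector."""
    vec = embed(cv_text)
    if vec is None:
        return None
    return max(
        INDUSTRY_PROTOTYPES,
        key=lambda name: cosine(vec, INDUSTRY_PROTOTYPES[name]),
    )


print(recommend_industry("Senior Python software developer"))
print(recommend_industry("Registered nurse with hospital experience"))
```

Averaging word vectors discards word order. A CNN over the vector sequence, as in the real system, can pick up short phrases instead.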
Chatbots
Artificial Solutions
● Worked building chatbots for mobile and web: Shell, AT&T, IKEA, Samsung, HTC, Rightmove
● Integrated smart home with voice commands
  ○ "turn on the coffee machine every Tuesday when I open the downstairs front door"

Forensic stylometry
● https://www.fastdatascience.com/author-prediction-demo
● Oxford University workshop on NLP every summer

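How the linked demo works is not described on this slide. One classic stylometric idea, shown here purely as an assumption-laden sketch, is to compare how often each text uses common function words and attribute the unknown text to the closest known author. The word list, the sample texts, and the author labels are all invented.

```python
import math
from collections import Counter

# A short, hand-picked function word list (an assumption for this sketch).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it",
                  "is", "was", "i", "but")


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in FUNCTION_WORDS]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_author(unknown_text, known_texts):
    """Return the author whose function word profile is most similar."""
    target = profile(unknown_text)
    return max(
        known_texts,
        key=lambda author: cosine(target, profile(known_texts[author])),
    )


# Invented writing samples for two hypothetical authors.
known = {
    "author_a": "the history of the city is the history of the river "
                "and of the trade that it carried",
    "author_b": "i went to a market and i saw a friend but i was late "
                "and it was cold",
}
unknown = "the story of the bridge is the story of the people that built it"

print(predict_author(unknown, known))
```

Function words are useful here because writers use them largely unconsciously and they appear regardless of topic. Real attribution needs far longer samples than these.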
Document analysis, trend detection
● Developed NLP pipeline for English and German at Pattern Science AG, near Frankfurt
● Used for document classification
● Trend detection
● Emerging topics

Masters Cambridge
● Unsupervised learning for identifying pleonastic pronouns
  ○ It seemed that things would never get any better
  ○ It surprised me to hear him say that
● Download available
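To make the task concrete: a pleonastic "it" refers to nothing, unlike the "it" in "It fell off the table". The Masters work used unsupervised learning. The snippet below is not that method, just a hand-written pattern baseline with an invented rule list, included to show what the classifier has to decide.

```python
import re

# Hand-written patterns for a few pleonastic constructions. This list is
# an assumption for illustration and is nowhere near complete.
PLEONASTIC_PATTERNS = [
    # "it seemed that ...", "it appears that ..."
    re.compile(r"\bit\s+(?:seems|seemed|appears|appeared)\s+that\b", re.I),
    # "it surprised me to ...", "it surprises her that ..."
    re.compile(r"\bit\s+(?:surprised|surprises)\s+\w+\s+(?:to|that)\b", re.I),
    # weather: "it is raining", "it's raining"
    re.compile(r"\bit(?:\s+is|\s+was|['’]s)\s+(?:raining|snowing)\b", re.I),
]


def is_pleonastic(sentence):
    """True if any hand-written pleonastic pattern matches the sentence."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)


examples = [
    "It seemed that things would never get any better",
    "It surprised me to hear him say that",
    "it’s raining",
    "It fell off the table",
]
for sentence in examples:
    print(is_pleonastic(sentence), sentence)
```

A rule list like this misses every construction nobody thought to write down, such as "It is clear that ...", which is one reason to learn the distinction from data instead.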