Extractive Evidence Based Medicine Summarisation Based on Sentence-Specific Statistics
Abeed Sarker (1), Diego Mollá (1), Cécile Paris (2)
(1) Centre for Language Technology, Macquarie University, Sydney
(2) CSIRO ICT Centre, Sydney
CBMS 2012, Rome
Contents
Background
  Evidence Based Medicine
Method
  Corpus
  Generation of Statistics
Evaluation
EBM Summarisation (Abeed Sarker, Diego Mollá, Cécile Paris)
Evidence Based Medicine
Source: http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/
EBM and Natural Language Processing
Source: http://hlwiki.slais.ubc.ca/index.php?title=Five_steps_of_EBM
NLP tasks
◮ Question analysis and classification
◮ Information retrieval
◮ Classification and re-ranking
◮ Information extraction
◮ Question answering
◮ Summarisation
General Approach
In a Nutshell
1. Gather statistics from the best 3-sentence extracts.
   ◮ Exhaustive search to find these best extracts.
2. Build three classifiers, one per sentence in the final extract.
   ◮ Classifier 1 based on statistics from best 1st sentence.
   ◮ Classifier 2 based on statistics from best 2nd sentence.
   ◮ Classifier 3 based on statistics from best 3rd sentence.
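The exhaustive search in step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the stand-in `unigram_f1` scorer, and the choice to keep extract sentences in source order are all assumptions made for the example.

```python
from itertools import combinations


def unigram_f1(candidate, reference):
    """Stand-in scorer (an assumption): F1 over distinct lowercase tokens."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_extract(sentences, target, scorer=unigram_f1, size=3):
    """Try every `size`-sentence combination and keep the highest-scoring one.

    Sentences stay in source order inside each extract (an assumption).
    Returns (tuple of sentence indices, score).
    """
    k = min(size, len(sentences))
    best_indices, best_score = (), -1.0
    for indices in combinations(range(len(sentences)), k):
        extract = " ".join(sentences[i] for i in indices)
        score = scorer(extract, target)
        if score > best_score:
            best_indices, best_score = indices, score
    return best_indices, best_score
```

An abstract with n sentences has n(n-1)(n-2)/6 candidate extracts, so trying all of them stays cheap for abstract-length inputs.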
Journal of Family Practice's "Clinical Inquiries"
The XML Contents
<record id="7843">
  <url>http://www.jfponline.com/Pages.asp?AID=7843&amp;issue=September 2009&amp;UID=</url>
  <question>Which treatments work best for hemorrhoids?</question>
  <answer>
    <snip id="1">
      <sniptext>Excision is the most effective treatment for thrombosed external hemorrhoids.</sniptext>
      <sor type="B">retrospective studies</sor>
      <long id="1 1">
        <longtext>A retrospective study of 231 patients treated conservatively or surgically found that the 48.5% of patients treated surgically had a lower recurrence rate than the conservative group (number needed to treat [NNT]=2 for recurrence at mean follow-up of 7.6 months) and earlier resolution of symptoms (average 3.9 days compared with 24 days for conservative treatment).</longtext>
        <ref id="15486746" abstract="Abstracts/15486746.xml">Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47:1493-1498.</ref>
      </long>
      <long id="1 2">
        <longtext>A retrospective analysis of 340 patients who underwent outpatient excision of thrombosed external hemorrhoids under local anesthesia reported a low recurrence rate of 6.5% at a mean follow-up of 17.3 months.</longtext>
        <ref id="12972967" abstract="Abstracts/12972967.xml">Jongen J, Bach S, Stubinger SH, et al. Excision of thrombosed external hemorrhoids under local anesthesia: a retrospective evaluation of 340 patients. Dis Colon Rectum. 2003; 46:1226-1231.</ref>
      </long>
      <long id="1 3">
        <longtext>A prospective, randomized controlled trial (RCT) of 98 patients treated nonsurgically found improved pain relief with a combination of topical nifedipine 0.3% and lidocaine 1.5% compared with lidocaine alone. The NNT for complete pain relief at 7 days was 3.</longtext>
        <ref id="11289288" abstract="Abstracts/11289288.xml">Perrotti P, Antropoli C, Molino D, et al. Conservative treatment of acute thrombosed external hemorrhoids with topical nifedipine. Dis Colon Rectum. 2001; 44:405-409.</ref>
      </long>
    </snip>
  </answer>
</record>
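A record with this layout can be read using Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the `read_record` helper and the dictionary layout it returns are assumptions, and `SAMPLE` is an abridged copy of the record (one "long" only) so that the block is self-contained.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record above, kept short for the example.
SAMPLE = """<record id="7843">
  <question>Which treatments work best for hemorrhoids?</question>
  <answer>
    <snip id="1">
      <sniptext>Excision is the most effective treatment for thrombosed external hemorrhoids.</sniptext>
      <sor type="B">retrospective studies</sor>
      <long id="1 2">
        <longtext>A retrospective analysis of 340 patients who underwent outpatient excision of thrombosed external hemorrhoids under local anesthesia reported a low recurrence rate of 6.5% at a mean follow-up of 17.3 months.</longtext>
        <ref id="12972967" abstract="Abstracts/12972967.xml">Jongen J, Bach S, Stubinger SH, et al.</ref>
      </long>
    </snip>
  </answer>
</record>"""


def read_record(xml_text):
    """Return the question plus each snip with its evidence texts and reference ids."""
    record = ET.fromstring(xml_text)
    snips = []
    for snip in record.iter("snip"):
        longs = [
            {
                "id": long_el.get("id"),
                "text": long_el.findtext("longtext", default="").strip(),
                "ref_ids": [ref.get("id") for ref in long_el.findall("ref")],
            }
            for long_el in snip.findall("long")
        ]
        sor = snip.find("sor")
        snips.append({
            "id": snip.get("id"),
            "text": snip.findtext("sniptext", default="").strip(),
            "sor_type": sor.get("type") if sor is not None else None,
            "longs": longs,
        })
    return {
        "id": record.get("id"),
        "question": record.findtext("question", default="").strip(),
        "snips": snips,
    }
```

Each "long" keeps the ids of its references, which is what links an evidence text to the abstract file it was written from.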
Corpus Statistics
Size
◮ 456 questions ("records").
◮ Over 1,100 distinct answers ("snips").
◮ 3,036 text explanations ("longs").
◮ 2,707 references.
Summarisation Using This Corpus
Input
◮ Question.
◮ Document abstract.
Output
◮ Extractive summary that answers the question.
◮ Target summary is the annotated evidence text ("long").
◮ Evaluated using ROUGE-L.
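ROUGE-L compares a candidate summary with the target through their longest common subsequence (LCS). The sketch below is a simplified illustration rather than the official ROUGE toolkit: it treats each text as a single whitespace-tokenised, lowercased sequence, and the function names and the default `beta = 1.0` are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """LCS-based F-measure between a candidate extract and the target text."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

For example, "a b c d" against "a c d e" shares the subsequence "a c d", so precision and recall are both 3/4.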
The Statistics Gathered
1. Source sentence position.
2. Sentence length.
3. Sentence similarity.
4. Sentence type.
1. Source Sentence Position
◮ Compute relative positions.
◮ Create normalised frequency histograms $f_1, f_2, \ldots, f_{10}$.
◮ Score all relative positions of bin $i$ with its bin frequency: $S_{pos}(i) = f_{bin(i)}$.
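The position statistic can be sketched as below. The slide does not define relative position or the binning, so this assumes relative position is the sentence index divided by the number of sentences in the abstract, split into ten equal-width bins. Bins are 0-indexed in the code, so `histogram[0]` plays the role of $f_1$; all function names are illustrative.

```python
def position_bin(index, n_sentences, n_bins=10):
    """Bin of a 0-based sentence index, using integer arithmetic to avoid rounding."""
    return min((index * n_bins) // n_sentences, n_bins - 1)


def position_histogram(best_positions, n_bins=10):
    """Normalised histogram over bins.

    best_positions: (sentence index, sentences in abstract) pairs, one per
    best-extract sentence found in the training data.
    """
    counts = [0] * n_bins
    for index, n_sentences in best_positions:
        counts[position_bin(index, n_sentences, n_bins)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def position_score(index, n_sentences, histogram):
    """S_pos: the frequency of the bin that the sentence falls into."""
    return histogram[position_bin(index, n_sentences, len(histogram))]
```

Following the three-classifier design, one such histogram would be built per target position (first, second and third extract sentence).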
2. Sentence Length
Reward longer sentences and penalise shorter sentences.
Normalised sentence length: $S_{len}(i) = \frac{l_s - l_{avg}}{l_d}$
◮ $l_s$: sentence length
◮ $l_{avg}$: average sentence length in the corpus
◮ $l_d$: document length
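A small worked example of the length score follows. The slide does not state the unit of length; the sketch assumes all three lengths are counted in words, and the function name is illustrative.

```python
def length_score(sentence_len, avg_sentence_len, document_len):
    """S_len = (l_s - l_avg) / l_d, with all lengths in words (an assumption)."""
    if document_len <= 0:
        raise ValueError("document_len must be positive")
    return (sentence_len - avg_sentence_len) / document_len


# With a corpus average of 20 words and a 200-word abstract, a 30-word
# sentence scores above zero and a 10-word sentence scores below zero.
long_sentence = length_score(30, 20, 200)
short_sentence = length_score(10, 20, 200)
```

Dividing by the document length keeps the score small and comparable across abstracts of different sizes.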
3. Sentence Similarity
Sentence similarity
◮ Lowercase, stem, remove stop words.
◮ Build vector of tf.idf with remaining words and UMLS semantic types.
◮ $CosSim(X, Y) = \frac{X \cdot Y}{|X||Y|}$
Maximal Marginal Relevance (Carbonell & Goldstein, 1998)
Reward sentences similar to the query and penalise those similar to other summary sentences.
$MMR = \lambda \, CosSim(S_i, Q) - (1 - \lambda) \max_{S_j \in S} CosSim(S_i, S_j)$
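The two formulas can be sketched over sparse vectors stored as dictionaries that map a term (or a UMLS semantic type) to its tf.idf weight. Building those vectors is outside this sketch. The function names are illustrative, and returning a redundancy of 0 when no sentence has been selected yet is an assumption of the example.

```python
import math


def cos_sim(x, y):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(weight * y.get(term, 0.0) for term, weight in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def mmr(candidate, query, selected, lam):
    """lam * relevance to the query minus (1 - lam) * worst-case redundancy."""
    relevance = cos_sim(candidate, query)
    # Assumption: with nothing selected yet, the redundancy term is 0.
    redundancy = max((cos_sim(candidate, s) for s in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy
```

A small `lam` makes the score mostly a redundancy penalty, while `lam = 1` reduces it to plain query similarity.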
4. PIBOSO (Kim et al. 2011) I
1. Classify all sentences into PIBOSO types (a variant of PICO).
2. Generate normalised frequency histograms of resulting PIBOSO types.
4. PIBOSO (Kim et al. 2011) II
Position independent: $S_{PIPS}(i) = \frac{P_{best}}{P_{all}}$
◮ $P_{best}$: proportion of this PIBOSO type among all best summary sentences.
◮ $P_{all}$: proportion of this PIBOSO type among all sentences.
Position dependent: $S_{PDPS}(i) = \frac{P_{pos}}{P_{best}}$
◮ $P_{pos}$: proportion of this PIBOSO type among best summary sentences at this position.
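Both ratios can be computed from lists of sentence-type labels. The sketch below assumes the labels have already been assigned by a PIBOSO classifier. The helper names and the label counts in the example are made up for illustration, and returning 0 for a type that never occurs is an assumption.

```python
from collections import Counter


def proportions(labels):
    """Map each label to its share of the list."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def pips(label, best_labels, all_labels):
    """Position-independent score: P_best / P_all."""
    p_all = proportions(all_labels).get(label, 0.0)
    p_best = proportions(best_labels).get(label, 0.0)
    return p_best / p_all if p_all else 0.0


def pdps(label, best_labels_at_position, best_labels):
    """Position-dependent score: P_pos / P_best."""
    p_best = proportions(best_labels).get(label, 0.0)
    p_pos = proportions(best_labels_at_position).get(label, 0.0)
    return p_pos / p_best if p_best else 0.0


# Made-up counts: "Outcome" is 20% of all sentences but 75% of best sentences.
all_labels = ["Outcome"] * 4 + ["Background"] * 12 + ["Other"] * 4
best_labels = ["Outcome"] * 3 + ["Background"]
best_third = ["Outcome", "Outcome"]
outcome_pips = pips("Outcome", best_labels, all_labels)
outcome_pdps = pdps("Outcome", best_third, best_labels)
```

A PIPS value above 1 means the type is over-represented among best summary sentences. A PDPS value above 1 means it is over-represented at that particular position relative to best sentences overall.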
Classification
Edmundsonian formula: $S_{S_i} = \alpha S_{rpos_i} + \beta S_{len_i} + \gamma S_{PIPS_i} + \delta S_{PDPS_i} + \epsilon S_{MMR_i}$
◮ MMR is replaced with cosine similarity for the first sentence.
◮ In case of ties, the sentence with the greatest length is chosen.
◮ Parameters are fine-tuned through exhaustive search using the training set.
$\alpha = 1.0$, $\beta = 0.8$, $\gamma = 0.1$, $\delta = 0.8$, $\epsilon = 0.1$, $\lambda = 0.1$.
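The weighted combination and the tie-break rule can be sketched as follows. The weights are the tuned values reported on the slide; the feature names, the candidate dictionary layout and the function names are assumptions made for the example.

```python
# Tuned weights from the slide: alpha, beta, gamma, delta, epsilon.
WEIGHTS = {"rpos": 1.0, "len": 0.8, "pips": 0.1, "pdps": 0.8, "mmr": 0.1}


def combined_score(features, weights=WEIGHTS):
    """Weighted sum of the per-sentence statistics."""
    return sum(weights[name] * features[name] for name in weights)


def pick_sentence(candidates, weights=WEIGHTS):
    """Pick the top-scoring candidate; on a tie, the longest sentence wins.

    Each candidate is a dict with "features" (name -> value) and "length".
    """
    return max(
        candidates,
        key=lambda c: (combined_score(c["features"], weights), c["length"]),
    )


same = {"rpos": 0.3, "len": 0.05, "pips": 1.2, "pdps": 1.1, "mmr": 0.2}
lower = {"rpos": 0.1, "len": 0.05, "pips": 1.2, "pdps": 1.1, "mmr": 0.2}
candidates = [
    {"id": "a", "length": 12, "features": same},
    {"id": "b", "length": 20, "features": same},
    {"id": "c", "length": 30, "features": lower},
]
chosen = pick_sentence(candidates)
```

Here "a" and "b" tie on score, so the longer sentence "b" is chosen even though "c" is longer still, because "c" scores lower.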