2019/7/19. 23 AI Summer Program: Hadoop Map&Reduce Programming for Big Traffic Data Management Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data. Dr. Jing-Doo Wang (王經篤 博士), Asia University (亞洲大學)
Chinese proverb: “Lao Wang selling melons” (『老王』賣瓜). Is it sweet and juicy? http://www.pxmart.com.tw/px/ingredients.px?id=2592 http://www.9ht.com/xue/44228.html
Outline
• Introduction
  – What is “Sequential Data”?
  – A scalable approach to Maximal Repeat Extraction
• Applications with Tagged Sequential Data
  – Analyzing Text Archaeology
  – Extracting Significant Travel Time Intervals
  – Mining for Biomarkers
  – Improving Quality Control
• Future Work
What is “Sequential Data”? • Textual Data: News, Journal Articles, etc. http://edition.cnn.com/2017/11/22/health/jfk-assassination-back-pain/index.html From: https://www.udn.com/news/story/7266/2834500 https://www.ncbi.nlm.nih.gov/pubmed/24372032
What is “Sequential Data”? • Genomic Sequences From: http://blogs.nature.com/naturejobs/2015/10/08/big-data-the-impact-of-the-human-genome-project/
What is “Sequential Data”? • Traffic Transportation https://tptis2015.blogspot.tw/2015/07/300-brt.html https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg https://tptis2015.blogspot.tw/2017/10/blog-post.html
Product Traceability **************************************** http://www.slideshare.net/5045033/ss-1002323 7 http://technews.tw/2016/04/11/tsmc-and-largan/ www.iconarchive.com
It's a big data problem!
How can we mine these “sequential data”? http://clipart-library.com/clipart/6Tr5BGG7c.htm http://clipart-library.com/clipart/kiKB8qLRT.htm
How can we mine these “sequential data”? From: http://globe-views.com/dcim/dreams/mine/mine-03.jpg
It's a Big Data problem! http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg http://www.mining.com/wp-content/uploads/2015/06/Veladero-Mine.jpg
What kind of “features” can be extracted from sequential data? http://www.quickanddirtytips.com/sites/default/files/images/2499/question-mark2.jpg http://images.slideplayer.com/16/5176005/slides/slide_2.jpg
What kind of “mineral” do you want to mine? https://www.popsci.com/features/how-to-be-an-expert-in-anything/images/feature_video.jpg https://media1.britannica.com/eb-media/71/143171-049-53725C29.jpg
Journal of Supercomputing, April 2016 https://link.springer.com/article/10.1007/s11227-016-1713z?wt_mc=internal.event.1.SEM.ArticleAuthorOnlineFirst
Why use “Maximal Repeats” as features?
• Dictionary
  – How can new words or phrases be identified?
  – e.g. “just do it”, “洪荒之力” (a Chinese buzzword, roughly “primordial power”)
• N-gram (K-mers)
  – 2-grams, 3-grams, …, 5-grams (Google Ngram Viewer)
  – The value of “N” is limited.
• Maximal Repeat
  – The length of a maximal repeat is variable.
Example: Maximal Repeat Patterns in “xabcyiiizabcqabcyrxar”
• Not maximal repeats: ab, bc (every occurrence of either one extends to abc)
• Maximal repeats: abc, abcy
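The example above can be checked with a brute-force sketch. It assumes the usual definition: a substring that occurs at least twice, whose occurrences are neither all preceded by the same character nor all followed by the same character. The function name `maximal_repeats` and its parameters are illustrative, and the quadratic enumeration is only meant for short strings; it is not the scalable Hadoop approach of the talk.

```python
from collections import defaultdict

def maximal_repeats(text, min_len=1, min_count=2):
    """Return {substring: occurrence count} for the maximal repeats of text."""
    n = len(text)
    # Record the start position of every occurrence of every substring.
    occurrences = defaultdict(list)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            occurrences[text[start:end]].append(start)

    result = {}
    for sub, starts in occurrences.items():
        if len(starts) < min_count:
            continue
        # Characters just before / after each occurrence (None at a boundary).
        left = {text[s - 1] if s > 0 else None for s in starts}
        right = {text[s + len(sub)] if s + len(sub) < n else None for s in starts}
        # Maximal: cannot be extended on either side without losing occurrences.
        if len(left) > 1 and len(right) > 1:
            result[sub] = len(starts)
    return result

repeats = maximal_repeats("xabcyiiizabcqabcyrxar", min_len=2)
```

Here `abc` (3 occurrences) and `abcy` (2 occurrences) are reported, while `ab` and `bc` are rejected. Under this definition a few other short strings such as `xa` also qualify, so a minimum length or frequency threshold may be useful.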
Distinctive Pattern Mining (1)
Sequences (classes are labeled by domain experts):
S1:********************************
S2:*********#****?***********@*****
S3:********************$***********
S4:*****&*******%******************
S5:********************$***********
S6:*********#****?************@****
S7:*****&*******%******************
S8:********************************
S9:*****&*******%******************
S10:*********#****?************@****
S11:********************************
jdwang@asia.edu.tw
Distinctive Pattern Mining (2)
Sequences grouped by class:
********************************
********************************
********************************
*********#****?************@****
*********#****?************@****
*********#****?************@****
*****&*******%******************
*****&*******%******************
*****&*******%******************
********************$***********
********************$***********
Distinctive Pattern Mining (3)
Maximal Repeats: #****? @**** &*******% $********** *****
Class Frequency Distribution
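One way to read the class frequency distribution is as a count, per pattern, of how many sequences in each expert-assigned class contain it. The sketch below assumes that reading; the class labels, toy sequences, and function name are hypothetical and not taken from the talk.

```python
from collections import Counter, defaultdict

def class_frequency_distribution(tagged_sequences, patterns):
    """Count, for each pattern, the sequences per class that contain it."""
    distribution = defaultdict(Counter)
    for label, sequence in tagged_sequences:
        for pattern in patterns:
            if pattern in sequence:
                distribution[pattern][label] += 1
    return {pattern: dict(counts) for pattern, counts in distribution.items()}

# Hypothetical toy data: (class label, sequence) pairs.
tagged = [
    ("A", "xxabcxx"),
    ("A", "yabcyyy"),
    ("B", "zzzdefz"),
    ("B", "abczdef"),
]
dist = class_frequency_distribution(tagged, ["abc", "def"])
```

A pattern whose counts concentrate in one class (here `def`, which appears only in class B) is a candidate distinctive pattern for that class.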
U.S. Patent Application From: https://www.google.com/patents/US20170255634
Patent Publication Date: Sep. 7, 2017 http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg
Applications with Tagged Sequential Data
• Trend Analysis via Text Archaeology.
• Extracting Significant Travel Time Intervals from Gantry Timestamped Sequences.
• Mining for Biomarkers from Genomic Sequences.
• Improving Quality Control via Product Traceability.
From: http://www.mdpi.com/2076-3417/7/9/878
Superhighway From: http://chiangchiafeng.tian.yam.com/posts/70456997
e-Tag http://news.u-car.com.tw/article/16077
Electronic Toll Collection (ETC) system of the national freeways of the Republic of China (Taiwan) From: https://i.ytimg.com/vi/1ML2FFS2dJg/maxresdefault.jpg
https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg
Gantry Sequences of Different Vehicle Types (VT)
Gantry Sequences with Timestamps
Gantry Sequences with Timestamps for Different Vehicle Types
Significant Time Intervals of Vehicles
http://www.7car.tw/articles/read/25927
https://buzzorange.com/wp-content/uploads/2015/04/640_4a486dc48d6f1414404627e1c45f1cf9.jpg http://news.ltn.com.tw/photo/society/breakingnews/1088361_1
05F0055N,13:33
05F0287N,13:15
05F0309N,13:13
05F0438N,13:06
05F0528N,13:00
Significant Time Intervals of Vehicles
Pattern (the trailing two digits of each token are the significant time intervals): 05F0528N_13_M1_00 05F0438N_13_M1_06 05F0309N_13_M1_13 05F0287N_13_M1_15 05F0055N_13_M1_33 ##4 ##5 ##
Class Frequency Distribution: (2016-11-15_Mon_41#1#1) (2016-11-29_Mon_41#1#1) (2016-12-09_Thu_31#1#1) (2016-12-20_Mon_31#1#1)
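The tokens above appear to combine a gantry ID with the hour and minute at which a vehicle passed it. A minimal sketch of that encoding, together with the elapsed minutes between consecutive gantries, is given here. The token layout is inferred from the slide, the `M1` tag is copied verbatim as an opaque label whose meaning is not stated, all passages are assumed to fall on the same day, and the function names are hypothetical.

```python
def parse_record(record):
    """Split 'GANTRY,HH:MM' into (gantry, hour, minute)."""
    gantry, clock = record.split(",")
    hour, minute = clock.split(":")
    return gantry.strip(), int(hour), int(minute)

def to_token(gantry, hour, minute, tag="M1"):
    """Build a token shaped like the slide's, e.g. 05F0528N_13_M1_00.

    The tag is treated as an opaque label (assumption)."""
    return f"{gantry}_{hour:02d}_{tag}_{minute:02d}"

def travel_minutes(passages):
    """Minutes elapsed between consecutive passages (same day assumed)."""
    times = [hour * 60 + minute for _, hour, minute in passages]
    return [later - earlier for earlier, later in zip(times, times[1:])]

records = [
    "05F0055N,13:33",
    "05F0287N,13:15",
    "05F0309N,13:13",
    "05F0438N,13:06",
    "05F0528N,13:00",
]
# Order the passages by time of day before encoding them.
passages = sorted((parse_record(r) for r in records), key=lambda p: (p[1], p[2]))
tokens = [to_token(*p) for p in passages]
intervals = travel_minutes(passages)
```

With the five records from the slide, `tokens` starts with `05F0528N_13_M1_00` and `intervals` holds the per-segment travel times in minutes.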
Weekday vs. 24 Hours per Day
Vehicle Types vs. 24 Hours per Day
Significant Patterns of Travel Time Intervals of Vehicles
1+5 cluster nodes
2+8 cluster nodes
Cloud Computing Environment
Artificial Intelligence: Cloud Computing, Machine Learning, Big Data
Ancient Greek science: the lever principle (槓桿原理). The fulcrum on which Archimedes lifts the Earth (Maximal Repeat Extraction with Class Frequency Distribution) Domain Expert Knowledge ? ? Labels (Tags) Relationship? Sequential Data ? Infrastructure (Cloud Computing) From: https://phycat.files.wordpress.com/2015/03/leverbigcorners.gif?w=810
Illustration: 紀玲玉
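Acknowledgements (Precision Medicine) • Jeffrey J.P. Tsai (蔡進發), President of Asia University. Project title: Precision Cancer Medicine Research Based on Biomedical Big Data Analysis (2/3). Project number: MOST 106-2632-E-468-002. Project period: 106/08/01~107/07/31 (ROC calendar; 2017/08/01 to 2018/07/31)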
Acknowledgements ( Bioinformatics ) • Charles C.N. Wang • Tsung-Chi Chen • Wen-Ling Chan • Rouh-Mei Hu • Jan-Gowth Chang • Yi-Chun Wang
Acknowledgements (Traffic Information Analysis) • Director 黃銘崇 • Prof. 連耀南 • Prof. 潘信宏 • Prof. 何承遠
Acknowledgements (Big Data: Hadoop Computing) • Jazz Wang (王耀聰) • Philip Lin (林奇暻) • Wei-Chiu Chuang (莊偉赳), Apache Hadoop Committer/PMC member
Acknowledgements • Hadoop Cluster Set Up and Consulting – SYSTEX (精誠資訊), 2017 • Herb Hsu (徐啟超) – Athemaster (炬識科技股份有限公司), 2018 • Ferrari • Mr. 黃仁德, Information Development Office (資訊發展處), Asia University
『老王』賣瓜,自賣自誇 Lao Wang selling melons praises his own goods http://www.pxmart.com.tw/px/ingredients.px?id=2592 http://www.9ht.com/xue/44228.html
Thank you for listening! http://www.pptschool.com/250.html www.flickr.com www.slideshare.net