
2019 10 16 Outline - PowerPoint PPT Presentation



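  1. A Brief Discussion of Text Recognition: New Observations, New Thinking, New Opportunities. Lianwen Jin, South China University of Technology, October 16, 2019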

  2. Outline • Introduction • Recent Progress and Trends • Scene Text Detection • Scene Text Recognition • End2End Scene Text Recognition • Discussion and Future Prospects

  3. Text is the most important carrier for exchanging information and perceiving the world. Text images are everywhere in daily life.

  4. Without text, it is hard for us to understand the world.

  5. Without text, it is sometimes hard for us to understand images.

  6. Text makes our lives rich and colorful …

  7. The importance of text: given an image that contains text, in the vast majority of cases the text is the most informative part of the image!

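  8. DAR, OCR, STR • Document Analysis & Recognition (DAR) • Optical Character Recognition (OCR) – Scene text detection and recognition (Scene Text Recognition, STR) • Online Handwritten Character Recognition (Online HCR). [Figure: taxonomy. OCR covers printed text recognition and handwritten text recognition; its sources include newspapers and books, scanned documents, ID cards and license plates, forms and business cards, …, and scene text; its granularity ranges over characters, text lines, full pages, …, and complex layouts. Online HCR covers online handwritten text recognition, online mathematical formula recognition, and online signature and handwriting recognition.]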

  9. Scene text detection and recognition (STR) • Text detection • Text recognition – character / word / text line • End-to-end recognition

  10. [Slide contains no extractable text]

  11. Challenges of Scene Text Detection: 1. Arbitrarily oriented 2. Irregular text, perspective distortion 3. Scale diversity 4. Ambiguity of annotation (char, word, text; label sequence order) 5. Completeness and tightness (IoU >= 0.5?) 6. Arbitrary variation of text appearances 7. Different types of imaging artifacts 8. Complicated image background 9. Uneven lighting 10. Low resolution 11. Heavy overlay 12. Long text detection
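The completeness and tightness criterion in item 5 is usually checked with intersection over union (IoU). The following is a minimal sketch for axis-aligned boxes, assuming boxes are given as `(x1, y1, x2, y2)` tuples; the function name `iou` and the sample boxes are illustrative and do not come from the slides.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


gt = (0, 0, 100, 20)   # hypothetical ground truth for a long text line
det = (0, 0, 60, 20)   # hypothetical detection covering only 60% of the line
print(iou(gt, det))    # 0.6
```

In this example the detection misses 40% of the text line yet still passes an IoU >= 0.5 threshold, which is one way to read the question mark on the slide.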

  12. Scene Text Detection • Examples of scene text detection methods: – Regression-based methods: Gupta et al., CVPR 2016; Tian et al., ECCV 2016; Shi, Bai et al., CVPR 2017; Liu et al., CVPR 2017; Liao et al., AAAI 2017; Hu et al., ICCV 2017; … – Segmentation-based methods: Zhong et al., CVPR 2016; Zhou et al., CVPR 2017; Wu et al., ICCV 2017; Deng et al., AAAI 2018; X Li, CVPR 2019; W Wang, et al., CVPR 2019 – Hybrid methods (segmentation + regression): He et al., ICCV 2017; Lyu et al., CVPR 2018; Liao et al., CVPR 2018; Long et al., ECCV 2018; …; Liu et al., IJCAI 2019; … • Trend: horizontal rectangular boxes → multi-oriented rectangular boxes → multi-oriented quadrilaterals → curved text → arbitrary shapes • Segmentation-based methods have difficulty accurately separating adjacent or overlapping text • Regression-based methods tend to detect long text incompletely • Bounding box regression methods require reasonable anchor parameters

  13. Direct Regression Net. C. He, et al., Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression, TIP 2018.

  14. TextField • Text Directional Field – a two-dimensional unit vector that points away from its nearest text boundary pixel • Instance-level representation • Easy to separate adjacent text instances • Post-processing is applied to produce the final detection result. YC Xu, et al., TextField: Learning A Deep Direction Field for Irregular Scene Text Detection, IEEE TIP 2019.
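The direction field described on this slide can be illustrated on a toy binary mask. The sketch below is a brute-force illustration, not the TextField network or its post-processing: it assumes a small mask with at least one background pixel, and the function name `direction_field` and the sample mask are hypothetical.

```python
import math


def direction_field(mask):
    """For each text pixel (value 1), return the unit vector (dx, dy) pointing
    from its nearest non-text pixel toward it; background gets (0.0, 0.0)."""
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    field = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Brute-force nearest background pixel (fine for a toy mask).
            ny, nx = min(background, key=lambda p: (p[0] - y) ** 2 + (p[1] - x) ** 2)
            dx, dy = x - nx, y - ny
            norm = math.hypot(dx, dy)  # > 0 because text and background differ
            field[y][x] = (dx / norm, dy / norm)
    return field


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
field = direction_field(mask)
print(field[1][2])  # (0.0, 1.0): points away from the top boundary
print(field[2][2])  # (0.0, -1.0): points away from the bottom boundary
```

Pixels on opposite sides of a text region receive opposite directions, which is the property that makes adjacent instances separable.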

  15. Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation • Adaptive representation of text region • CNN + LSTM. X. Wang, et al., Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation, CVPR 2019 (Oral).

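  16. LOMO: Look More Than Once • Addresses long text detection and curved text detection • Introduces new modules, including IRM (for long text detection) and SEM (for curved text detection) • No complex post-processing required; end-to-end trainable. C. Zhang, et al., Look More Than Once: An Accurate Detector for Text of Arbitrary Shape, CVPR 2019.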

  17. PAN • New work from the PSENet (CVPR 2019) team • Fast, with good performance • Handles dense long text and arbitrarily curved text • Semantic Segmentation (text region, kernel, similarity vectors) • Text Instance Rebuilding with Learnable Pixel Aggregation method. W Wang, E Xie, et al., Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network, ICCV 2019

  18. Other detailed trends (1) • Multi-scale problem (large scale variations) • "The scale ratio between the largest and the smallest texts is up to 230 times for images in the MSRA-TD500" (C Xue, et al., IJCAI 2019) • E Richardson, et al., It's All About The Scale! arXiv 2019. • C Xue, et al., MSR: Multi-Scale Shape Regression for STD, IJCAI 2019 • YX Wang, et al., DSRN: A deep scale relationship network for …, IJCAI 2019. • W Wang, et al., PSENet, CVPR 2019 • W He, Realtime multi-scale scene text detection…, PR 2019.

  19. Other trends (2) • Learning geometric properties of text/char/pixel regions, e.g.: char/text center line; char/text border offset; char/text center offset; char/text vertex offset; character affinity; corner point; visual relationship … • E.g. ECCV 2018 (TextSnake), CVPR 2019 (LOMO, CRAFT), IJCAI 2019 (MSR), TIP 2019 (TextField), ACM MM 2019 (SAST), ICCV 2019 (ScRN) • C Ma, Z Zhong, ICDAR 2019 • …

  20. Applications of text geometric properties: S Long, et al., TextSnake, ECCV 2018; Y. Baek, et al., CRAFT, CVPR 2019; M Yang, ScRN, ICCV 2019 (recognition); P Wang, et al., SAST, MM 2019

  21. (3) Anchor & RPN parameter tuning • Examples of anchor-free regression methods: – Segmentation-based methods – C. He, et al., Direct Regression…, ICCV 2017, TIP 2018. – Z Zhong et al., An Anchor-Free Region Proposal Network …, IJDAR 2019. – Zhi Tian, Chunhua Shen, et al., FCOS, CVPR 2019 – Chenchen Zhu, Yihui He, et al., FSAF, CVPR 2019 – Tao Kong, Fuchun Sun, et al., FoveaBox, arXiv 2019 • Why anchor free? – Most RPN regression methods require setting reasonable anchor parameters – E.g.: SSD → TextBox (AAAI 2017) • Alternative anchor design? – Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

  22. DeRPN • Dimension-decomposition region proposal network (DeRPN). • DeRPN utilizes an anchor string mechanism: objects are matched with anchor strings based on length instead of IoU. • DeRPN can be employed directly on different models, tasks, and datasets without any modification of hyperparameters or specialized optimization. Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019.
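Length-based matching with anchor strings can be sketched as follows. This is an illustration under stated assumptions rather than the DeRPN implementation: the anchor values in `ANCHOR_STRINGS` are an assumed geometric sequence, the nearest-in-log-space rule is a simplification, and the names `match_edge` and `match_box` are hypothetical.

```python
import math

# Assumed anchor strings: a geometric sequence of edge lengths (illustrative).
ANCHOR_STRINGS = [16, 32, 64, 128, 256, 512, 1024]


def match_edge(length, anchors=ANCHOR_STRINGS):
    """Return the anchor string closest to `length` in log space."""
    return min(anchors, key=lambda a: abs(math.log(length) - math.log(a)))


def match_box(width, height):
    """Match width and height independently, one anchor string per dimension."""
    return match_edge(width), match_edge(height)


# A long, thin text line: each dimension finds its own anchor string,
# so no 2-D anchor box with a matching aspect ratio has to be designed.
print(match_box(300, 20))  # (256, 16)
```

Because width and height are matched separately, extreme aspect ratios such as long text lines do not require a dedicated anchor box.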

  23. Experiments – Experimental Data • PASCAL VOC 2007 • PASCAL VOC 2012 • MS COCO • ICDAR 2013 • MS COCO Text – To verify robustness and adaptivity, we kept the same hyperparameters for DeRPN throughout all of our experiments without any modification • The same model handles different tasks and different datasets without any hyperparameter tuning

  24. Experiments

  25. Summary of DeRPN • Good adaptivity and generalization ability • Able to detect objects of varying size – range of [8√2, 1024√2] • Regression loss of DeRPN is bounded – The largest deviation (ratio) between the anchor string and the object edge is at most √2 • Better performance – Higher recall rate – Tighter bounding boxes (better performance at high IoU) • Limitations – Can only deal with rectangular bounding boxes – For two-stage frameworks only. Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019. Code: https://github.com/HCIILAB/DeRPN

  26. (4) Annotation ambiguity: char, word, or line; label sequence order

  27. Sensitive to Label Sequence (SLS) issue • Existing methods have addressed this problem but have not solved it completely: Solution A (CVPR17), Solution B (TIP18), Solution C (ACM MM18). https://mp.weixin.qq.com/s/pxLR0R7tT7Rbhu-NFfv_aA

  28. Our Approach: Sequential-invariant Box Discretization (SBD). Yuliang Liu, et al., IJCAI 2019. https://mp.weixin.qq.com/s/pxLR0R7tT7Rbhu-NFfv_aA

  29. Box Discretization Network for Omni-directional Object Detection. Y. Liu, S. Zhang, L. Jin, et al., Omnidirectional Scene Text Detection with Sequential-free Box Discretization, IJCAI 2019.
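The idea of a representation that does not depend on label sequence order can be illustrated with a small sketch. It assumes the box is discretized into its sorted x coordinates and sorted y coordinates; the function name `key_edges` and the sample quadrilateral are hypothetical, and the extra information needed to rebuild the quadrilateral from these values is deliberately omitted.

```python
def key_edges(quad):
    """Discretize a quadrilateral into order-independent values:
    its sorted x coordinates and its sorted y coordinates."""
    xs = sorted(x for x, _ in quad)
    ys = sorted(y for _, y in quad)
    return xs, ys


quad = [(10, 5), (90, 15), (88, 40), (8, 30)]
rotated = quad[1:] + quad[:1]  # same box, annotation starts at another vertex

# Regressing the four points directly would see two different targets,
# while the discretized form is identical for both annotation orders.
assert quad != rotated
assert key_edges(quad) == key_edges(rotated)
print(key_edges(quad))  # ([8, 10, 88, 90], [5, 15, 30, 40])
```

The regression target stays the same regardless of which vertex the annotator labeled first, which is the property the SLS discussion calls for.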

  30. Experimental Results • Our results were produced through single-scale testing

  31. Generalization Ability of BDN • Ship detection using the same model without modification of any hyperparameter (no tuning, about 3 hours of training, reaches SOTA). Yuliang Liu, et al., Omnidirectional Scene Text Detection with Sequential-free Box Discretization, IJCAI 2019. Code: https://github.com/Yuliang-Liu/Box_Discretization_Network • Built on this model, the team won first place in the ICDAR 2019 ReCTS detection task

  32. [Slide contains no extractable text]
