Harris Corner Detection on a NUMA Manycore


Harris Corner Detection on a NUMA Manycore. Claude TADONKI, Centre de Recherche en Informatique (CRI). Joint work with Olfa HAGGUI and Lionel LACASSAGNE, Sousse National School of Engineering and Laboratoire d'Informatique de Paris 6 (LIP6).


  1. Harris Corner Detection on a NUMA Manycore. Claude TADONKI, Centre de Recherche en Informatique (CRI), Mines ParisTech - PSL. Joint work with Olfa HAGGUI and Lionel LACASSAGNE, Sousse National School of Engineering and Laboratoire d'Informatique de Paris 6 (LIP6).

  2. Corner points are used for motion detection, for instance. From the intensity I (color is not needed), we need to compute (approximate) derivatives and combine them. Claude TADONKI, Harris Corner Detection on a NUMA Manycore, seminar at Centre de Recherche en Informatique, April 16, 2018, Fontainebleau, France.
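To make "compute derivatives and combine them" concrete, here is a minimal sketch of a Harris-style response map in plain Python. It is an illustration only: the central-difference derivatives, the unweighted 3x3 summation window, the constant `k = 0.04`, and the function name `harris_response` are all assumptions and are not taken from the slides.

```python
def harris_response(img, k=0.04):
    """Harris-style response for a 2-D list of intensities.

    Assumptions (not from the slides): central differences for the
    derivatives, an unweighted 3x3 window, and k = 0.04.
    A two-pixel border of the result is left at 0.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Step 1: approximate the derivatives of the intensity.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            # Step 2: combine them by summing derivative products locally.
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            # Step 3: determinant minus k times the squared trace.
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


# A 10x10 image with a bright square in the lower-right quadrant.
square = [[1.0 if (y >= 5 and x >= 5) else 0.0 for x in range(10)]
          for y in range(10)]
response = harris_response(square)
```

On this test image the response is positive at the corner of the square, negative along its straight edge, and zero in the flat region, which is the behavior a corner detector relies on.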

  3. The procedure applies a series of convolution kernels to the input intensity matrix. Each convolution is a stencil computation. The whole computation can be fully serial, fully pipelined, or hybrid. Memory access patterns are the main focus with respect to performance.
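The view of a convolution as a stencil can be sketched as follows. This is a hedged, plain-Python illustration: the function name `stencil3x3` is hypothetical, the kernel is applied without flipping (correlation form), and the horizontal Sobel kernel is used only as a familiar example.

```python
# Horizontal Sobel kernel, used here only as an example of a 3x3 stencil.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def stencil3x3(img, kernel):
    """Apply a 3x3 kernel at every interior pixel (borders stay 0).

    The kernel is not flipped, so strictly speaking this is a correlation.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            # Each output reads its 9 neighbours; adjacent outputs re-read
            # 6 of the same inputs, hence the redundant memory accesses.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += kernel[dy + 1][dx + 1] * img[y + dy][x + dx]
            out[y][x] = acc
    return out


# A horizontal ramp: intensity equals the column index.
ramp = [[x for x in range(5)] for _ in range(5)]
print(stencil3x3(ramp, SOBEL_X)[2])  # prints [0, 8, 8, 8, 0]
```

Chaining several such passes, each writing a full intermediate image, corresponds to the fully serial organization; the access pattern of the inner two loops is what the later optimizations target.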

  4. Stencil computation: redundant memory accesses, cache misses, and misalignment. Scheduling of the convolutions: intermediate reads/writes (space and access time). SIMD: not efficient in its standard form (what we get from the compiler). Shared-memory (SM) parallelism: bus contention, thread synchronization, NUMA. We are going to explain our approach for each of the aforementioned aspects.

  5. Separability Half-Pipe Clustering
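The slide gives only keywords, so the following is an assumption about what "separability" refers to: the usual factorization of a 3x3 kernel into a column pass and a row pass. The sketch checks, for the horizontal Sobel kernel, that two 3-tap passes give the same result as the 9-tap stencil. Function names are hypothetical.

```python
def sobel_x_full(img):
    """9-tap horizontal Sobel on interior pixels (borders stay 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = ((img[y - 1][x + 1] - img[y - 1][x - 1])
                         + 2 * (img[y][x + 1] - img[y][x - 1])
                         + (img[y + 1][x + 1] - img[y + 1][x - 1]))
    return out


def sobel_x_separable(img):
    """Same filter as two 1-D passes: row [-1, 0, 1], then column [1, 2, 1]."""
    h, w = len(img), len(img[0])
    # Pass 1: horizontal difference on every row, including border rows,
    # because pass 2 needs the rows above and below each interior row.
    diff = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(1, w - 1):
            diff[y][x] = img[y][x + 1] - img[y][x - 1]
    # Pass 2: vertical [1, 2, 1] smoothing of the differences.
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = diff[y - 1][x] + 2 * diff[y][x] + diff[y + 1][x]
    return out


# Deterministic test image with no particular structure.
image = [[(7 * x * x + 3 * x * y + 5 * y) % 11 for x in range(7)]
         for y in range(6)]
same = sobel_x_full(image) == sobel_x_separable(image)
```

The separable form trades one 9-tap pass for two 3-tap passes, at the cost of an intermediate array; that intermediate storage is what the scheduling choices on the other slides try to reduce.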

  6. Vectorization with the original memory access pattern leads to unaligned accesses. We propose a diagonal shift to keep all accesses aligned. The last vector register contains 2 dirty values, but the whole vector is stored (4-component vector).

  7. The goal here is to load data into vector registers once, and then perform all dependent calculations (optimal data consumption and memory access savings).

  8. Thus, for the computation of an entire row, the typical steps at each iteration are:


  10. We consider the half-pipe clustering. We pipeline the two cluster steps (SOBEL+MUL and GAUSS+COARSITY) through a loop fusion. We apply an array contraction (mod 3) for the intermediate storage.
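The combination of loop fusion and mod-3 array contraction can be sketched as below. This is an illustration under stated assumptions: `stage1_row` and `stage2_row` are simplified stand-ins for the SOBEL+MUL and GAUSS+COARSITY clusters (a squared horizontal Sobel gradient and a 3x3 binomial smoothing), and all names are hypothetical. The point is only the storage pattern: the fused version keeps three intermediate rows, indexed by row number modulo 3, instead of a full intermediate image.

```python
def stage1_row(img, y):
    """Stand-in for SOBEL+MUL: squared horizontal Sobel gradient of row y."""
    w = len(img[0])
    row = [0] * w
    for x in range(1, w - 1):
        gx = sum(c * (img[y + dy][x + 1] - img[y + dy][x - 1])
                 for dy, c in ((-1, 1), (0, 2), (1, 1)))
        row[x] = gx * gx
    return row


def stage2_row(above, mid, below):
    """Stand-in for GAUSS+COARSITY: 3x3 binomial smoothing of three rows."""
    w = len(mid)
    out = [0] * w
    for x in range(1, w - 1):
        out[x] = sum(c * (r[x - 1] + 2 * r[x] + r[x + 1])
                     for r, c in ((above, 1), (mid, 2), (below, 1)))
    return out


def unfused(img):
    """Two separate loops with a full-size intermediate image."""
    h, w = len(img), len(img[0])
    tmp = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        tmp[y] = stage1_row(img, y)
    out = [[0] * w for _ in range(h)]
    for y in range(2, h - 2):
        out[y] = stage2_row(tmp[y - 1], tmp[y], tmp[y + 1])
    return out


def fused(img):
    """One loop; the intermediate is contracted to 3 rows (index mod 3)."""
    h, w = len(img), len(img[0])
    ring = [None] * 3
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        ring[y % 3] = stage1_row(img, y)      # produce intermediate row y
        if y >= 3:                            # rows y-2, y-1, y are ready
            c = y - 1                         # output row centred on y-1
            out[c] = stage2_row(ring[(c - 1) % 3], ring[c % 3],
                                ring[(c + 1) % 3])
    return out


image = [[(5 * x * x + 3 * x * y + 7 * y) % 13 for x in range(9)]
         for y in range(8)]
identical = fused(image) == unfused(image)
```

Because the second stage only ever needs three consecutive intermediate rows, overwriting slot `y % 3` is safe: the row it replaces (row `y - 3`) is no longer read.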

  11. Both input and output images are stored on NUMA node 0. Each NUMA node locally computes its chunk (block of lines) of the final result. Within each NUMA node, the work is equally distributed by block to its threads. The expected memory allocation on the NUMA nodes is enforced by explicit binding routines.
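The two-level block distribution can be sketched as plain index arithmetic. This is only a hypothetical illustration of how row ranges might be assigned: the names `split_range` and `numa_row_plan` are invented here, halo rows needed by the stencils are ignored, and no actual memory or thread binding is performed.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(stop - start, parts)
    blocks = []
    lo = start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def numa_row_plan(height, nodes, threads_per_node):
    """Map (node, thread) to the half-open range of image rows it computes."""
    plan = {}
    # Level 1: one chunk of rows per NUMA node.
    for node, (n_lo, n_hi) in enumerate(split_range(0, height, nodes)):
        # Level 2: the node's chunk is split evenly among its threads.
        for t, block in enumerate(split_range(n_lo, n_hi, threads_per_node)):
            plan[(node, t)] = block
    return plan


plan = numa_row_plan(height=10, nodes=2, threads_per_node=2)
print(plan[(0, 0)], plan[(1, 1)])  # prints (0, 3) (8, 10)
```

In a real implementation each node would additionally copy or bind its chunk to local memory before computing, which is the role of the binding routines mentioned on the slide.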

  12. (1) Original SIMD without the in-register strategy. (2) Optimized SIMD with the in-register strategy. The in-register strategy doubles the overall performance, and our sequential implementation outperforms the state of the art in absolute performance.

