AutoTVM & Device Fleet: Learning to Optimize Tensor Programs

  1. AutoTVM & Device Fleet

  2. Learning to Optimize Tensor Programs: Frameworks; High-level data flow graph and optimizations; Hardware

  4. Learning to Optimize Tensor Programs: Frameworks; High-level data flow graph and optimizations; Machine Learning based Program Optimizer; Hardware

  5. Learning to Optimize Tensor Programs: Frameworks; High-level data flow graph and optimizations; Machine Learning based Program Optimizer (learning to generate optimized programs for new operator workloads and hardware); Hardware

  6. Search over Possible Program Transformations
     Compute Description: C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
     Transformations: Loop Transformations; Thread Bindings; Cache Locality; Thread Cooperation; Tensorization; Latency Hiding
     Hardware

  8. Search over Possible Program Transformations
     Compute Description: C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
     Transformations: Loop Transformations; Thread Bindings; Cache Locality; Thread Cooperation; Tensorization; Latency Hiding
     Billions of possible optimization choices
     Hardware
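To make the size of this search space concrete, the sketch below first spells out what the compute description means in plain Python, then counts one family of choices only: multi-level loop tiling. The loop length of 1024 and the 4-way split are illustrative assumptions, not values from the slides, and `matmul_reference` and `ordered_factorizations` are hypothetical helper names.

```python
def matmul_reference(A, B):
    """Plain-Python meaning of the compute description:
    C[y][x] = sum over k of A[k][y] * B[k][x]."""
    K, M, N = len(A), len(A[0]), len(B[0])
    return [[sum(A[k][y] * B[k][x] for k in range(K)) for x in range(N)]
            for y in range(M)]


def ordered_factorizations(extent, parts):
    """Count the ways to split a loop of length `extent` into `parts`
    nested loops whose lengths multiply back to `extent`."""
    if parts == 1:
        return 1
    return sum(ordered_factorizations(extent // f, parts - 1)
               for f in range(1, extent + 1) if extent % f == 0)


# Assumed example: three loops (y, x, k) of length 1024, each tiled 4 ways.
per_loop = ordered_factorizations(1024, 4)
tiling_choices = per_loop ** 3
print(per_loop, tiling_choices)
```

Under these assumptions, tiling alone already yields tens of millions of candidates; combining it with the other transformation families listed on the slide multiplies the count further.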

  9. Learning-based Program Optimizer: Program; Program Optimizer; Code Generator

  10. Learning-based Program Optimizer: Program; Program Optimizer; Code Generator; Runtime Measurements

  11. Learning-based Program Optimizer: Program; Program Optimizer; Code Generator; Runtime Measurements
      High experiment cost: each trial costs ~1 second
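A rough back-of-the-envelope sketch of why measuring every candidate is impractical. The one-second trial cost comes from the slide; the candidate count of one billion is an assumed stand-in for "billions", and `measure_seconds` is a hypothetical helper that times a Python callable rather than generated device code.

```python
import time


def measure_seconds(fn, repeats=3):
    """Run `fn` several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def exhaustive_search_years(num_candidates, seconds_per_trial=1.0):
    """Time needed to measure every candidate once, in years."""
    return num_candidates * seconds_per_trial / (3600 * 24 * 365)


# Toy workload standing in for one compiled candidate program.
elapsed = measure_seconds(lambda: sum(i * i for i in range(10_000)))
years = exhaustive_search_years(1_000_000_000)
print(f"one trial of the toy workload: {elapsed:.6f} s")
print(f"measuring 1e9 candidates at 1 s each: about {years:.1f} years")
```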

  12. Learning-based Program Optimizer: Program; Program Optimizer; Code Generator

  13. Learning-based Program Optimizer: Program; Program Optimizer; Code Generator; Cost Model

  14. Learning-based Program Optimizer: Program; Program Optimizer; Code Generator; Cost Model
      Need reliable cost model per hardware

  15. Learning-based Program Optimizer: Program; Code Generator; Program Optimizer

  16. Learning-based Program Optimizer: Program; Code Generator; Program Optimizer; Training data D

  17. Learning-based Program Optimizer: Program; Code Generator; Program Optimizer; Learning; Statistical Cost Model; Training data D

  18. Learning-based Program Optimizer: Program; Code Generator; Program Optimizer; Learning; Statistical Cost Model; Training data D
      Unique Problem Characteristics:
      • Relatively low experiment cost
      • Domain-specific problem structure
      • Large quantity of similar tasks
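The loop these slides build up (the optimizer proposes programs, measurements become training data D, and a statistical cost model learned from D guides the next proposals) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AutoTVM implementation: `measure` is a synthetic stand-in for a hardware measurement, the cost model is a toy nearest-neighbour average swapped in for the learned model, and all names and the two-parameter tiling space are hypothetical.

```python
import random


def measure(config):
    """Hypothetical stand-in for running a program on hardware.
    Returns a synthetic cost (lower is better)."""
    tile_y, tile_x = config
    return abs(tile_y - 8) + abs(tile_x - 16) + 1.0


def predict(data, config, k=3):
    """Toy cost model: mean measured cost of the k nearest configs in D."""
    if not data:
        return 0.0
    nearest = sorted(
        data,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], config)),
    )[:k]
    return sum(cost for _, cost in nearest) / len(nearest)


def tune(space, rounds=5, batch=4, seed=0):
    """Alternate between ranking candidates with the model and measuring them."""
    rng = random.Random(seed)
    data = []      # D: list of (config, measured cost)
    seen = set()
    for _ in range(rounds):
        candidates = [c for c in space if c not in seen]
        if not candidates:
            break
        rng.shuffle(candidates)                           # random tie-breaking
        candidates.sort(key=lambda c: predict(data, c))   # model picks promising configs
        for config in candidates[:batch]:                 # only these are measured
            data.append((config, measure(config)))
            seen.add(config)
    return min(data, key=lambda item: item[1]), data


# Assumed search space: two tile sizes, each a power of two up to 32.
space = [(ty, tx) for ty in (1, 2, 4, 8, 16, 32) for tx in (1, 2, 4, 8, 16, 32)]
best, data = tune(space)
print("best config and cost:", best, "after", len(data), "measurements")
```

The point of the design is that the model is queried for every unseen candidate while only a small batch per round is actually measured, so the expensive step scales with the number of rounds rather than with the size of the space.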

  19. Program-aware Cost Modeling: High-Level Configuration

  20. Program-aware Cost Modeling: High-Level Configuration
      Low-level Abstract Syntax Tree (shared between tasks):
          for y in range(8):
              for x in range(8):
                  C[y][x] = 0
                  for k in range(8):
                      C[y][x] += A[k][y] * B[k][x]

  21. Program-aware Cost Modeling: High-Level Configuration
      Low-level Abstract Syntax Tree (shared between tasks):
          for y in range(8):
              for x in range(8):
                  C[y][x] = 0
                  for k in range(8):
                      C[y][x] += A[k][y] * B[k][x]
      Statistical features -> Boosted Tree Ensembles:
          loop   outer loop length   touched memory: C    A    B
          y      1                                   64   64   64
          x      8                                   8    8    64
          k      64                                  1    8    8
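The per-loop statistics on this slide (outer loop length and touched memory for C, A and B) can be derived mechanically from the loop nest. The sketch below is one simplified way to do it, assuming every array index is a single loop variable and that the touched element count is the product of the lengths of the indexing loops at or below the current level; `loop_features` is a hypothetical helper, not a TVM function.

```python
from math import prod


def loop_features(loops, accesses):
    """loops: list of (variable, length), outermost first.
    accesses: {buffer name: tuple of index variables}.
    Returns one (variable, outer loop length, touched memory) entry per loop."""
    features = []
    for depth, (var, _) in enumerate(loops):
        # Product of the lengths of all enclosing loops (1 for the outermost).
        outer_length = prod(length for _, length in loops[:depth])
        # Loops at this level and below decide how many elements are touched.
        inner = dict(loops[depth:])
        touched = {
            buf: prod(inner[v] for v in index if v in inner)
            for buf, index in accesses.items()
        }
        features.append((var, outer_length, touched))
    return features


# The 8 x 8 x 8 loop nest from the slide: C[y][x] += A[k][y] * B[k][x]
loops = [("y", 8), ("x", 8), ("k", 8)]
accesses = {"C": ("y", "x"), "A": ("k", "y"), "B": ("k", "x")}
features = loop_features(loops, accesses)
for var, outer_length, touched in features:
    print(var, outer_length, touched)
```

Because these features are computed from the low-level loop structure rather than from task-specific configuration knobs, the same extractor applies to any task that lowers to a loop nest, which is what allows the representation to be shared between tasks.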
