Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
Charith Mendis, Alex Renda, Saman Amarasinghe, Michael Carbin
Compilers need to search through code sequences
High-level code → Optimizing Compiler → basic block:
lea r14, [rbx-0x40]
…
lea rdx, [rbp+0x38]
cmp rdi, rax
How many cycles does it take to run? Basic block throughput: 40 cycles
Compilers need to search through code sequences
High-level code → Optimizing Compiler → candidate basic blocks Code 1, Code 2, …, Code n
All candidates start with lea r14, [rbx-0x40] … and end with cmp rdi, rax, but differ in between (lea rdx, [rbp+0x38] / sub rbp, 0x60 / mov rbp, rbx), giving throughputs of 44, 40, and 36 cycles.
• Ground truth: slow
• Analytical model: fast
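To make the search concrete, here is a minimal Python sketch of how a compiler might use a throughput estimator to rank candidate sequences. The `pick_fastest` function and the toy cost table are hypothetical stand-ins: in practice the estimator would be either the slow ground truth or a fast model, and the pairing of cycle counts with candidates below is illustrative only.

```python
from typing import Callable, Sequence

# A basic block is represented here simply as a list of assembly strings.
BasicBlock = Sequence[str]


def pick_fastest(candidates: Sequence[BasicBlock],
                 estimate_throughput: Callable[[BasicBlock], float]) -> int:
    """Return the index of the candidate with the lowest estimated cycle count.

    `estimate_throughput` is a hypothetical cost model: any function that maps
    a basic block to a predicted number of cycles.
    """
    costs = [estimate_throughput(block) for block in candidates]
    return min(range(len(candidates)), key=lambda i: costs[i])


# Three candidates that share a prefix and suffix but differ in the middle.
candidates = [
    ["lea r14, [rbx-0x40]", "lea rdx, [rbp+0x38]", "cmp rdi, rax"],
    ["lea r14, [rbx-0x40]", "sub rbp, 0x60", "cmp rdi, rax"],
    ["lea r14, [rbx-0x40]", "mov rbp, rbx", "cmp rdi, rax"],
]
# Hypothetical toy cost model keyed on the differing instruction
# (illustrative pairing, not measured data).
toy_costs = {"lea rdx, [rbp+0x38]": 44.0, "sub rbp, 0x60": 40.0, "mov rbp, rbx": 36.0}
best = pick_fastest(candidates, lambda block: toy_costs[block[1]])
print(best)  # 2
```

The quality of the chosen candidate is only as good as the estimator passed in, which is why both its speed and its accuracy matter.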
Analytical models are inaccurate: ~20% error
Modern processors are:
• out-of-order
• pipelined
• superscalar
• bypassed
• built with stateful components
And the available information is limited:
• complicated and inaccurate manuals
• opaque implementations (vendor specific)
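An error figure such as "~20%" is typically a relative error averaged over many basic blocks. A minimal sketch, assuming the metric is the mean of |predicted - measured| / measured (the slide does not state the exact metric, and the sample numbers are hypothetical):

```python
from typing import Sequence


def mean_relative_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average of |predicted - measured| / measured over all basic blocks."""
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical predicted vs. measured throughputs (cycles) for three blocks.
err = mean_relative_error([12.0, 30.0, 9.0], [10.0, 25.0, 10.0])
print(f"{err:.1%}")
```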
Analytical models are inaccurate: an analytical model only approximates the processor, and the prediction problem is highly non-linear.
Motivating Example - Zero Idioms
• vxorps xmm1, xmm2, xmm3: throughput 1 clock cycle (Intel Architecture Optimization Reference Manual, p. 662 of 672)
• vxorps xmm0, xmm0, xmm0 (a register XORed with itself, which always yields zero): special-case throughput 0.33 clock cycles (Intel Architecture Optimization Reference Manual, p. 51 of 672)
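A throughput model has to recognize this special case before falling back to the generic table entry. The hypothetical helper below (not the logic of any particular tool) treats `vxorps` as a zero idiom when its two source registers are identical and returns the two throughput values quoted from the manual above.

```python
from typing import Sequence


def vxorps_throughput(operands: Sequence[str]) -> float:
    """Throughput in cycles for `vxorps dst, src1, src2`.

    Assumption: the processor treats the instruction as a zero idiom whenever
    src1 and src2 name the same register, because x XOR x is always 0 and the
    result does not depend on the register's previous value.
    """
    if len(operands) != 3:
        raise ValueError("expected three operands: dst, src1, src2")
    _dst, src1, src2 = (op.strip().lower() for op in operands)
    if src1 == src2:
        return 0.33  # special case (zero idiom), per the manual excerpt above
    return 1.0       # generic vxorps throughput, per the manual excerpt above


print(vxorps_throughput(["xmm0", "xmm0", "xmm0"]))  # 0.33
print(vxorps_throughput(["xmm1", "xmm2", "xmm3"]))  # 1.0
```

A per-instruction lookup that ignores operands would return 1.0 for both calls, which is exactly the kind of special case that makes hand-written models hard to get right.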
Motivating Example - Zero Idioms
vxorps xmm0, xmm0, xmm0, 100 iterations

Method      Estimate (cycles, 100 iterations)
Measured    32
llvm-mca    100
IACA        24

llvm-mca
• Part of the LLVM compiler infrastructure
• Uses industry-standard compiler (LLVM) scheduling models
• e.g., more than 230 commits spread over 2 years for the x86 Haswell scheduling model
IACA
• Intel Architecture Code Analyzer
• Developed in-house at Intel
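Because the estimates are totals for 100 iterations, dividing by 100 gives cycles per iteration, which can be compared directly with the manual's 0.33-cycle figure. A small worked calculation using only the numbers in the table; the relative-error metric is an illustrative choice.

```python
ITERATIONS = 100
# Total-cycle estimates for 100 iterations of `vxorps xmm0, xmm0, xmm0` (from the table above).
estimates = {"Measured": 32, "llvm-mca": 100, "IACA": 24}
measured = estimates["Measured"]

results = {}
for method, cycles in estimates.items():
    per_iter = cycles / ITERATIONS               # cycles per iteration
    rel_err = abs(cycles - measured) / measured  # relative error vs. the measurement
    results[method] = (per_iter, rel_err)
    print(f"{method:<9} {per_iter:.2f} cycles/iter, error vs. measured {rel_err:.1%}")
```

This prints 0.32, 1.00, and 0.24 cycles per iteration, with errors of 0.0%, 212.5%, and 25.0%. llvm-mca's 1.00 cycle per iteration equals the generic vxorps throughput, consistent with the zero idiom not being recognized, while IACA underestimates the measured value.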