C++ support for better hardware/software co-design in C# with SME



  1. C++ support for better hardware/software co-design in C# with SME. Kenneth Skovhede, Niels Bohr Institute, University of Copenhagen. FSP 2017, Belgium, 2017-09-07.

  2. Compared to the current solutions, I want something that is: [ ✔ ] Faster [ ✔ ] Fewer bugs [ ✔ ] Easy to use

  3. Image from: http://theembeddedguy.com/2016/05/15/layers-of-abstraction/

  4. Control Abstraction

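  5. Bubble sort in C++, annotated with the obstacles to mapping it onto hardware: static loop bounds, mapping the array to HW, and loop dependencies.

      int temp;
      for (int i2 = 0; i2 < length; i2++)          // Static loop bounds
      {
          // j stops at length - 1 so array[j + 1] stays in bounds
          for (int j = 0; j < length - 1; j++)     // Map array to HW
          {
              if (array[j] > array[j + 1])         // Loop dependencies
              {
                  temp = array[j];
                  array[j] = array[j + 1];
                  array[j + 1] = temp;
              }
          }
      }

      Image from: https://commons.wikimedia.org/wiki/File%3AVon_Neumann_Architecture.svg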
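The slide's annotations can be illustrated with a short C++ sketch (an assumption, not code from the talk): if the array is a `std::array` whose size `N` is a template parameter, every loop bound becomes a compile-time constant, while the swap in the inner loop remains the loop-carried dependency that prevents iterations from running in parallel. The function name `bubble_sort` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch: bubble sort whose loop bounds depend only on the compile-time
// size N, so each loop has a fixed trip count.
template <std::size_t N>
void bubble_sort(std::array<std::int32_t, N>& array) {
    // Outer loop: N - 1 passes are enough to sort N elements.
    for (std::size_t pass = 0; pass + 1 < N; ++pass) {
        // Inner loop stops one short of the end so array[j + 1] is in bounds.
        for (std::size_t j = 0; j + 1 < N; ++j) {
            if (array[j] > array[j + 1]) {
                // This swap feeds the next comparison: a loop-carried dependency.
                std::int32_t temp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = temp;
            }
        }
    }
}
```

Because `N` is fixed per instantiation, each call has a known number of iterations, which is the property a hardware tool needs before it can unroll or pipeline the loops.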

  6. "If I had asked people what they wanted, they would have said faster horses." (Henry Ford, maybe)

  7. Control Abstraction

  8. SME: Synchronous Message Exchange. Processes are sequential; busses connect them and provide concurrency. Example process using the IMemoryInterface bus:

      public class SimpleMockMemory : SimpleProcess
      {
          [InputBus, OutputBus]
          IMemoryInterface Interface;

          ulong[] m_data = new ulong[1024];

          protected override void OnTick()
          {
              if (Interface.ReadEnabled)
                  Interface.ReadValue = m_data[Interface.ReadAddr];
              if (Interface.WriteEnabled)
                  m_data[Interface.WriteAddr] = Interface.WriteValue;
          }
      }
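As a rough C++ analogue of the process above (a sketch; `MemoryBus`, `SimpleMockMemory::on_tick` and the field layout are hypothetical and are not SME's C++ output), the bus becomes a plain struct and the sequential `OnTick` body becomes a member function. This sketch does not model SME's delayed bus propagation: values written to the struct are visible immediately.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the IMemoryInterface bus: plain fields, with the
// enable flags starting at false like [InitialValue(false)].
struct MemoryBus {
    bool WriteEnabled = false;
    bool ReadEnabled = false;
    std::uint32_t ReadAddr = 0;
    std::uint32_t WriteAddr = 0;
    std::uint64_t WriteValue = 0;
    std::uint64_t ReadValue = 0;
};

// Hypothetical mirror of SimpleMockMemory: a fixed-size array and a
// sequential tick body that services one read and one write per tick.
class SimpleMockMemory {
public:
    explicit SimpleMockMemory(MemoryBus& bus) : bus_(bus) {}

    void on_tick() {
        if (bus_.ReadEnabled) {
            assert(bus_.ReadAddr < m_data.size());
            bus_.ReadValue = m_data[bus_.ReadAddr];
        }
        if (bus_.WriteEnabled) {
            assert(bus_.WriteAddr < m_data.size());
            m_data[bus_.WriteAddr] = bus_.WriteValue;
        }
    }

private:
    MemoryBus& bus_;
    std::array<std::uint64_t, 1024> m_data{};  // zero-initialised, like new ulong[1024]
};
```

Because the read is performed before the write, a tick that reads and writes the same address returns the old contents, matching the order of the two `if` statements in `OnTick`.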

  9. The IMemoryInterface bus:

      public interface IMemoryInterface : IBus
      {
          [InitialValue(false)]
          bool WriteEnabled { get; set; }
          [InitialValue(false)]
          bool ReadEnabled { get; set; }
          uint ReadAddr { get; set; }
          uint WriteAddr { get; set; }
          ulong WriteValue { get; set; }
          ulong ReadValue { get; set; }
      }

  10. The TickCounterMemory process:

      public class TickCounterMemory : SimpleProcess
      {
          [InputBus]
          IInputBus Input;
          [OutputBus]
          IOutputBus Output;

          protected override void OnTick()
          {
              var before = Output.Ticks;
              if (Input.Reset)
              {
                  Output.Ticks = 0;
                  Output.LastTicks = Output.Ticks;
              }
              else
              {
                  Output.Ticks++;
              }
              // before is always the same as after,
              // because the output value is not propagated
              // immediately, but waits for a tick
              var after = Output.Ticks;
          }
      }
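The delayed propagation described in the comment can be modelled with a double-buffered signal. The C++ sketch below is an assumption about the semantics, not SME's implementation: `Signal<T>`, `commit()`, `TickCounter` and `clock_edge()` are hypothetical names. Reads return the value latched at the last tick, writes go to a staging slot, and `commit()` plays the role of the clock edge.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical double-buffered bus signal: reads see the value latched at
// the last clock edge; writes only become visible after commit().
template <typename T>
class Signal {
public:
    explicit Signal(T initial = T{}) : current_(initial), next_(initial) {}
    T read() const { return current_; }
    void write(T value) { next_ = value; }
    void commit() { current_ = next_; }  // called once per clock tick

private:
    T current_;
    T next_;
};

// Hypothetical mirror of TickCounterMemory's OnTick.
struct TickCounter {
    Signal<bool> reset{false};            // input bus
    Signal<std::uint32_t> ticks{0};       // output bus
    Signal<std::uint32_t> last_ticks{0};  // output bus

    // Returns whether the value read before the writes equals the one read after.
    bool on_tick() {
        std::uint32_t before = ticks.read();
        if (reset.read()) {
            ticks.write(0);
            last_ticks.write(ticks.read());  // still the pre-tick value
        } else {
            ticks.write(ticks.read() + 1);
        }
        std::uint32_t after = ticks.read();
        return before == after;
    }

    // The tick: every signal takes its staged value at the same time.
    void clock_edge() {
        reset.commit();
        ticks.commit();
        last_ticks.commit();
    }
};
```

Inside `on_tick`, `before` and `after` are always equal, and in the reset branch `last_ticks` receives the count from before the reset, as in the C# version.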

  11. The same TickCounterMemory process, annotated: it is like CSP, in that the collection of busses is communicated as a single channel action, and like a KPN, because there is no blocking.
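To illustrate why the absence of blocking still gives deterministic results, here is a hypothetical two-phase simulator in C++ (not SME's scheduler): every process runs against the values latched at the previous tick, and then all buses are committed together. `Wire`, `Network` and `tick()` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// One bus value with a latched (current) and a staged (next) copy.
struct Wire {
    std::uint32_t current = 0;
    std::uint32_t next = 0;
    void commit() { current = next; }
};

// Two processes connected by wires; the execution order is configurable.
struct Network {
    Wire a;  // producer -> doubler
    Wire b;  // doubler output
    std::vector<std::function<void()>> processes;

    explicit Network(bool reverse_order) {
        auto producer = [this] { a.next = a.current + 1; };  // counts 1, 2, 3, ...
        auto doubler = [this] { b.next = a.current * 2; };   // sees last tick's a
        if (reverse_order) {
            processes = {doubler, producer};
        } else {
            processes = {producer, doubler};
        }
    }
    // The lambdas capture `this`, so copying would leave dangling pointers.
    Network(const Network&) = delete;
    Network& operator=(const Network&) = delete;

    void tick() {
        for (auto& p : processes) p();  // phase 1: run every process, no blocking
        a.commit();                     // phase 2: all buses update at once
        b.commit();
    }
};
```

Running the two processes in either order produces the same bus values after every tick, because no process can observe another process's writes until the commit phase.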

  12. Slide with dependency tree

  13. Stencil network example

  14. Stencil filter process (excerpt):

      [InputBus]
      ImageFragment Input;
      [OutputBus]
      ImageOutputLine Output;

      static readonly byte[] FILTER = new byte[] {
          1,1,1, 1,1,1, 1,1,1,
          1,1,1, 1,1,1, 1,1,1,
          1,1,1, 1,1,1, 1,1,1
      };

      // m_buffer, COLOR_WIDTH, FILTER_SUMS and Internal are declared
      // elsewhere in the class (not shown on the slide)
      protected override void OnTick()
      {
          Output.IsValid = false;
          for (var i = 0; i < COLOR_WIDTH; i++)
              Output.Color[i] = 0;
          for (var i = 0; i < m_buffer.Length; i++)
              m_buffer[i] = 0;

          if (Input.IsValid)
          {
              // Accumulate the weighted window, one color channel per m_buffer slot
              for (var i = 0; i < Input.Data.Length; i += COLOR_WIDTH)
                  for (var j = 0; j < m_buffer.Length; j++)
                      m_buffer[j] += FILTER[i + j] * Input.Data[i + j];

              // Normalise by the sum of the weights for each channel
              for (var i = 0; i < m_buffer.Length; i++)
                  Output.Color[i] = (byte)(m_buffer[i] / FILTER_SUMS[i]);

              Internal.Index++;
              Output.IsValid = true;
          }
      }
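The arithmetic in `OnTick` can be checked in isolation with a C++ sketch, assuming a 3x3 window of interleaved RGB pixels (`COLOR_WIDTH = 3`, 27 input bytes) and the all-ones `FILTER`, which makes the stencil a box blur. The bus handshaking (`IsValid`, `Internal.Index`) is left out, `apply_stencil` is a hypothetical name, and for brevity the per-channel weight sums that play the role of `FILTER_SUMS` are recomputed on every call instead of being stored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout: 3x3 window, interleaved R, G, B bytes.
constexpr std::size_t COLOR_WIDTH = 3;
constexpr std::size_t FRAGMENT = 9 * COLOR_WIDTH;

// All-ones kernel, as on the slide.
constexpr std::array<std::uint8_t, FRAGMENT> FILTER = {
    1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1};

// Returns one output pixel: the per-channel weighted sum divided by the
// per-channel sum of the weights.
std::array<std::uint8_t, COLOR_WIDTH> apply_stencil(
        const std::array<std::uint8_t, FRAGMENT>& data) {
    std::array<std::uint32_t, COLOR_WIDTH> buffer{};  // plays the role of m_buffer
    std::array<std::uint32_t, COLOR_WIDTH> sums{};    // plays the role of FILTER_SUMS
    for (std::size_t i = 0; i < FRAGMENT; i += COLOR_WIDTH) {
        for (std::size_t j = 0; j < COLOR_WIDTH; ++j) {
            buffer[j] += FILTER[i + j] * data[i + j];
            sums[j] += FILTER[i + j];
        }
    }
    std::array<std::uint8_t, COLOR_WIDTH> color{};
    for (std::size_t c = 0; c < COLOR_WIDTH; ++c) {
        assert(sums[c] != 0);
        color[c] = static_cast<std::uint8_t>(buffer[c] / sums[c]);
    }
    return color;
}
```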

  15. Control Abstraction

  16. Supported:
      Control: if, switch, fixed-iteration loops
      Structure: functions
      Data: anything static, 2-bit ... n-bit
      Boolean logic: and, or, xor, etc.
      Bitwise: shifts, and, or, xor
      Integer arithmetic: add, sub, mul, div
      Arrays: fixed length
    Not supported (supported in modelling, but not in the transpiler):
      Anything dynamic: strings, lists, objects
      Floating point: single, double, decimal
    IP needs a simulation implementation.
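The "n-bit" data types in the supported subset can be approximated in C++ with a width-parameterised integer that masks every result back to `N` bits, the way an N-bit register wraps. This is a hypothetical sketch (`UInt<N>` is not part of SME or its transpiler) covering a few of the listed operations: add, sub, mul, div, xor and left shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical N-bit unsigned integer (1 <= N <= 32). Every result is masked
// back to N bits, so arithmetic wraps modulo 2^N like an N-bit register.
template <unsigned N>
class UInt {
    static_assert(N >= 1 && N <= 32, "width must be between 1 and 32 bits");

public:
    static constexpr std::uint64_t MASK = (std::uint64_t{1} << N) - 1;

    UInt(std::uint64_t v = 0) : value_(v & MASK) {}  // implicit, like a literal
    std::uint64_t value() const { return value_; }

    friend UInt operator+(UInt a, UInt b) { return UInt(a.value_ + b.value_); }
    // uint64_t subtraction wraps modulo 2^64; the mask reduces it modulo 2^N.
    friend UInt operator-(UInt a, UInt b) { return UInt(a.value_ - b.value_); }
    // Both operands are below 2^32, so the product fits in 64 bits.
    friend UInt operator*(UInt a, UInt b) { return UInt(a.value_ * b.value_); }
    friend UInt operator/(UInt a, UInt b) {
        assert(b.value_ != 0 && "division by zero");
        return UInt(a.value_ / b.value_);
    }
    friend UInt operator^(UInt a, UInt b) { return UInt(a.value_ ^ b.value_); }
    // Bits shifted past position N - 1 are dropped by the mask.
    friend UInt operator<<(UInt a, unsigned s) {
        assert(s < 32);
        return UInt(a.value_ << s);
    }

private:
    std::uint64_t value_;
};
```

For example, `UInt<4>(15) + UInt<4>(1)` wraps to 0, just as a 4-bit adder would overflow.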

  17. SubByte from an AES-256-CBC core in three forms.

    C# version:

      private static uint SubByte(uint a)
      {
          uint value = 0xff & a;
          uint result = SBox[value];
          value = 0xff & (a >> 8);
          result |= (uint)SBox[value] << 8;
          value = 0xff & (a >> 16);
          result |= (uint)SBox[value] << 16;
          value = 0xff & (a >> 24);
          return result | (uint)(SBox[value] << 24);
      }

    C++ version:

      system_uint32 AESCore::SubByte(system_uint32 a)
      {
          system_uint32 num = 0;
          system_uint32 num2 = 0;
          num = 255 & a;
          num2 = (system_uint32)AES256CBC_AESCore_SBox[(system_int32)num];
          num = 255 & (a >> 8);
          num2 |= (system_uint32)((system_uint32)AES256CBC_AESCore_SBox[(system_int32)num] << 8);
          num = 255 & (a >> 16);
          num2 |= (system_uint32)((system_uint32)AES256CBC_AESCore_SBox[(system_int32)num] << 16);
          num = 255 & (a >> 24);
          return num2 | (system_uint32)((system_uint32)AES256CBC_AESCore_SBox[(system_int32)num] << 24);
      }

    VHDL version:

      pure function SubByte(constant a: in T_SYSTEM_UINT32) return T_SYSTEM_UINT32 is
          variable tmpvar_1: T_SYSTEM_UINT32;
          variable num: T_SYSTEM_UINT32;
          variable num2: T_SYSTEM_UINT32;
      begin
          num := STD_LOGIC_VECTOR(TO_UNSIGNED(255, T_SYSTEM_UINT32'length)) and a;
          num2 := STD_LOGIC_VECTOR(resize(UNSIGNED(AES256CBC_AESCore_SBox(TO_INTEGER(SIGNED(num)))), T_SYSTEM_UINT32'length));
          num := STD_LOGIC_VECTOR(TO_UNSIGNED(255, T_SYSTEM_UINT32'length)) and STD_LOGIC_VECTOR((shift_right(UNSIGNED(a), 8)));
          num2 := num2 or STD_LOGIC_VECTOR((shift_left(UNSIGNED(STD_LOGIC_VECTOR(resize(UNSIGNED(AES256CBC_AESCore_SBox(TO_INTEGER(SIGNED(num)))), T_SYSTEM_UINT32'length))), 8)));
          num := STD_LOGIC_VECTOR(TO_UNSIGNED(255, T_SYSTEM_UINT32'length)) and STD_LOGIC_VECTOR((shift_right(UNSIGNED(a), 16)));
          num2 := num2 or STD_LOGIC_VECTOR((shift_left(UNSIGNED(STD_LOGIC_VECTOR(resize(UNSIGNED(AES256CBC_AESCore_SBox(TO_INTEGER(SIGNED(num)))), T_SYSTEM_UINT32'length))), 16)));
          num := STD_LOGIC_VECTOR(TO_UNSIGNED(255, T_SYSTEM_UINT32'length)) and STD_LOGIC_VECTOR((shift_right(UNSIGNED(a), 24)));
          tmpvar_1 := num2 or STD_LOGIC_VECTOR((shift_left(UNSIGNED(STD_LOGIC_VECTOR(resize(UNSIGNED(AES256CBC_AESCore_SBox(TO_INTEGER(SIGNED(num)))), T_SYSTEM_UINT32'length))), 24)));
          return tmpvar_1;
      end SubByte;
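The byte-split-and-lookup pattern shared by the three versions can be written compactly in C++ with fixed-width types. In this sketch the substitution table is passed in as a parameter (`SBoxTable` and `sub_byte` are hypothetical names); in AES it would be the fixed 256-entry S-box, which is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 256-entry byte substitution table (the AES S-box in the real cipher).
using SBoxTable = std::array<std::uint8_t, 256>;

// Split a 32-bit word into four bytes, substitute each through the table,
// and reassemble them in their original positions.
std::uint32_t sub_byte(std::uint32_t a, const SBoxTable& sbox) {
    std::uint32_t result = sbox[a & 0xffu];
    result |= static_cast<std::uint32_t>(sbox[(a >> 8) & 0xffu]) << 8;
    result |= static_cast<std::uint32_t>(sbox[(a >> 16) & 0xffu]) << 16;
    result |= static_cast<std::uint32_t>(sbox[(a >> 24) & 0xffu]) << 24;
    return result;
}
```

Casting each looked-up byte to `std::uint32_t` before shifting keeps all arithmetic unsigned, instead of shifting a byte that has been promoted to a signed `int` into its sign bit.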

  18. Shared SME source code: one code base targeting C#, C++ and FPGA.

  19. Charts: ColorBin execution times; Stencil execution times; 10,000 AES CBC rounds.

  20. Planned work:
      Communication links: C# <-> C++ via memory or pipes; C# <-> FPGA via AXI, DRAM or ACP
      Components: VGA driver, Block RAM, DSP
      PySME equivalence
      Shared transpiler
