Exploring Benefits and Designs of Optically Connected Disintegrated Processor Architecture

Yan Pan, Yigit Demir, Nikos Hardavellas, John Kim†, Gokhan Memik

EECS Department, Northwestern University, Evanston, IL, USA
† CS Department, KAIST, Daejeon, Korea

Outline: Motivation | OCDP Architecture | Power Comparison | Conclusion
Nikos Hardavellas, WINDS 2010 (in conj. with MICRO 43)

Motivation
• Transistor density grows exponentially
• But processors are physically constrained
  – Low yield, bandwidth wall, power wall
  – Dark silicon: we can build dense devices that we cannot afford to power
• Optically-Connected Disintegrated Processor (OCDP)
  – Divide the (impractical) monolithic processor into chiplets
  – Improves yield
  – Breaks the bandwidth wall
  – Breaks the power wall: spread out chiplets, cheaper cooling

Motivation
• Advantages of nanophotonics
  – Latency
  – Bandwidth density
• Using nanophotonics for the inter-chip interconnect
  – Reduced memory latency
  – Increased off-chip bandwidth
  – Increased total chip area
  – Increased power budget
• Analytical model* for performance estimation
* N. Hardavellas et al., Tech Report NWU-EECS-10-05, Mar. 2010.

Memory Latency
[Figure: Speedup (y-axis, 0.8 to 1.8) vs. Memory Latency in ns (x-axis, 0 to 30), with the baseline marked.]

Off-chip Bandwidth
[Figure: Speedup (y-axis, 0.8 to 1.8) vs. Off-chip Bandwidth in GB/s (x-axis, 0 to 250), with the baseline marked.]

Scaling Power, Chip Area
[Figure: Speedup (y-axis, 0.8 to 1.8) vs. Total Die Area in mm² (x-axis, 0 to 1400), with the baseline marked. Series: Scalable Power Budget, 4x Off-chip BW; Scalable Power Budget, 2x Off-chip BW; Scalable Power Budget, 1x Off-chip BW; Fixed Power Budget, 1x Off-chip BW.]

Motivation
• Performance impact
  – Reduced memory latency → minimal
  – Improved off-chip bandwidth → small
  – Total chip area → small
  – Power budget → big
• Power budget scalability is critical
  – Spread out chiplets
  – Cheaper cooling
• Optically-Connected Disintegrated Processor (OCDP)

Off-chip Optical Channels

Material             Optical Loss   Propagation Speed   Pitch
Silicon Waveguide*   0.3 dB/cm      0.286c              20 µm
Optic Fiber          0.2 dB/km      0.676c              250 µm

• Optical fiber is low-loss and high-speed
  – Enables spreading the chiplets further apart
  – Bandwidth density was a challenge
* J. Cardenas et al., Optics Express 2009
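
The comparison on this slide can be made concrete with a small calculation. The sketch below is illustrative only: it takes the loss and propagation-speed figures as read from the table above, and the 10 cm link length is a hypothetical value chosen for the example, not one given in the talk.

```python
# Speed of light in vacuum, expressed in cm per nanosecond.
C_CM_PER_NS = 29.9792458

# Figures as read from the slide's table; fiber loss is converted
# from dB/km to dB/cm (1 km = 1e5 cm).
MEDIA = {
    "silicon waveguide": {"loss_db_per_cm": 0.3, "speed_c": 0.286},
    "optic fiber": {"loss_db_per_cm": 0.2 / 1e5, "speed_c": 0.676},
}


def link_cost(medium, length_cm):
    """Return (loss in dB, propagation latency in ns) for one link."""
    params = MEDIA[medium]
    loss_db = params["loss_db_per_cm"] * length_cm
    latency_ns = length_cm / (params["speed_c"] * C_CM_PER_NS)
    return loss_db, latency_ns


if __name__ == "__main__":
    LENGTH_CM = 10.0  # hypothetical inter-chiplet distance
    for name in MEDIA:
        loss, latency = link_cost(name, LENGTH_CM)
        print(f"{name}: {loss:.6f} dB, {latency:.3f} ns over {LENGTH_CM} cm")
```

Under these assumptions the fiber link loses several orders of magnitude less power than an on-chip waveguide of the same length and is also faster, which is the property that lets chiplets be placed further apart.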

Dense Off-chip Coupling
• Dense optical fiber array [Lee et al., OSA/OFC/NFOEC 2010]
• <1 dB loss, 8 Tbps/mm demonstrated
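
As a rough feel for what 8 Tbps/mm means, the sketch below converts an off-chip bandwidth target into the chip-edge length a fiber array of that density would occupy. The function name and the 250 GB/s target are illustrative assumptions, not figures from the talk.

```python
# Coupling density quoted on the slide, in Tbps per mm of chip edge.
COUPLING_DENSITY_TBPS_PER_MM = 8.0


def edge_length_mm(bandwidth_gbytes_per_s):
    """Chip-edge length (mm) needed to carry the given bandwidth.

    Converts GB/s to Tbps (x8 bits per byte, /1000 for giga to tera)
    and divides by the coupling density.
    """
    bandwidth_tbps = bandwidth_gbytes_per_s * 8.0 / 1000.0
    return bandwidth_tbps / COUPLING_DENSITY_TBPS_PER_MM


if __name__ == "__main__":
    target = 250.0  # hypothetical off-chip bandwidth target, GB/s
    print(f"{target} GB/s needs {edge_length_mm(target):.3f} mm of edge")
```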

OCDP Design Considerations
• Inter-chiplet optical channel technology
  – Optic fiber for low loss
• Inter-chiplet optical channel organization
  – Point-to-point [Koka et al., ISCA 2010]
  – Minimize waveguide and coupler loss
• On-chip topology
  – Scalable chiplet size
• On-chip / off-chip bandwidth interfacing
  – Distributed bandwidth, seamless integration

OCDP Architecture
[Figure: OCDP layout with Chiplets 0 to 4. Labels: electrical cluster, laser source, optical fiber couplers, src, dst.]
• Cross-chiplet assemblies share an optical bus, forming optical crossbars (FlexiShare)

Firefly On-chip Topology
[Figure: Chiplet 0 with clusters C0 to C3 (routers C0R0 to C0R3, C1R0, C2R0, C3R0; processors P), assemblies A0 to A3, and a FlexiShare crossbar with channels CH 0 to CH M-1 connecting routers R0 to Rk-1.]
• Firefly on-chip topology [Pan et al., ISCA 2009]
  – Flexible chiplet sizing, optical on-chip communication
• FlexiShare optical crossbars [Pan et al., HPCA 2010]
  – Flexible bandwidth provisioning
  – Needs light-weight optical arbitration, which FlexiShare proposes

Extending across chiplets
[Figure: Chiplet 0 and Chiplet 1.]
• Distributed bandwidth across chiplets
• Flexible inter-chiplet bandwidth provisioning
• Minimal number of couplers
• Seamless on-chip/off-chip interfacing

Technology Assumptions

Loss Parameter          Value
Coupler                 1 dB
Splitters               1 dB
Non-linear              1 dB
Modulator Insertion     0.1 dB
Waveguide               0.3 dB/cm
Ring Through            0.001 dB
Filter Drop             1.5 dB
PhotoDetector           0.1 dB

Parameter               Value
Detector Sensitivity    0.01 mW
DWDM                    16-way
Fiber coupler loss      0.1
Fiber loss              2.00E-06 dB/cm
Ring heating power      40 uW/ring
Modulation Power        80 fJ/bit
Demodulation Power      40 fJ/bit

• Moderate DWDM (16-way)
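
These parameters feed the power comparison that follows. The talk does not spell out its power model, so the sketch below is only one plausible way to combine them: it assumes each fixed-loss component is crossed exactly once per path, and that the laser must deliver the detector sensitivity after all losses. The fiber coupler loss is left out because the slide gives no unit for it. The path lengths and ring counts in the usage section are hypothetical.

```python
# Fixed per-path losses (dB), copied from the Technology Assumptions slide.
FIXED_LOSS_DB = {
    "coupler": 1.0,
    "splitter": 1.0,
    "non_linear": 1.0,
    "modulator_insertion": 0.1,
    "filter_drop": 1.5,
    "photodetector": 0.1,
}
WAVEGUIDE_DB_PER_CM = 0.3
RING_THROUGH_DB = 0.001
FIBER_DB_PER_CM = 2.0e-6
DETECTOR_SENSITIVITY_MW = 0.01
RING_HEATING_UW = 40.0


def path_loss_db(waveguide_cm, rings_passed, fiber_cm=0.0):
    """Total loss in dB, assuming each fixed component is crossed once."""
    return (
        sum(FIXED_LOSS_DB.values())
        + WAVEGUIDE_DB_PER_CM * waveguide_cm
        + RING_THROUGH_DB * rings_passed
        + FIBER_DB_PER_CM * fiber_cm
    )


def laser_power_mw(loss_db):
    """Laser power per wavelength so the detector still sees its sensitivity."""
    return DETECTOR_SENSITIVITY_MW * 10 ** (loss_db / 10.0)


def ring_heating_mw(num_rings):
    """Static ring-heating power in mW for the given number of rings."""
    return RING_HEATING_UW * num_rings / 1000.0


if __name__ == "__main__":
    # Hypothetical path: 4 cm of waveguide, 256 rings passed, 10 cm of fiber.
    loss = path_loss_db(waveguide_cm=4.0, rings_passed=256, fiber_cm=10.0)
    print(f"path loss: {loss:.3f} dB")
    print(f"laser power per wavelength: {laser_power_mw(loss):.4f} mW")
    print(f"heating for 10000 rings: {ring_heating_mw(10000):.1f} mW")
```

Because required laser power grows exponentially with loss in dB, every centimetre of on-chip waveguide removed from the path pays off disproportionately, which is consistent with the emphasis on minimizing optical loss.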

Optical Power (320-core)
[Figure: two bar charts, Total Optical Loss (dB) and Total Number of Ring Resonators (Thousands), for OCDP (5-chiplet), Firefly (CL4), Firefly (CL8), and FlexiShare.]
• 5-chiplet OCDP vs. single-chip topologies
• Total number of optical channels (wavelengths) held constant

Per-Core Network Static Power
[Figure: bar chart of Total Static Power Per Core (mW), y-axis 0 to 60, for OCDP (5-chiplet), Firefly (CL4), Firefly (CL8), and FlexiShare.]
• ~30% power reduction compared to the best alternative

Scaling Up
[Figure: bar chart of Total Optical Loss (dB), y-axis 0 to 45, for OCDP (P5, 320), OCDP (P5, 1280), Firefly (CL8, 1280), FlexiShare (1280), OCDP (P9, 2304), and OCDP (P17, 4352).]
• OCDP limits the total on-chip waveguide length
• Better optical scalability

Scaling Up
[Figure: bar chart of Total Static Power Per Core (mW), log-scale y-axis 1 to 10000, for OCDP (P5, 320), OCDP (P5, 1280), Firefly (CL8, 1280), FlexiShare (1280), OCDP (P9, 2304), and OCDP (P17, 4352).]
• OCDP shows very good power scalability
• A single chip is impractical for a 1280-core processor

Conclusion
• OCDP leverages
  – Low latency / high bandwidth density
  – Low-loss optic fibers
• Power scalability is critical
  – Minimize optical loss on the path
• Seamless on-chip / off-chip interfacing
  – Firefly intra-chiplet (distributed off-chiplet bandwidth)
  – Point-to-point (Dragonfly) inter-chiplet
• Performance evaluation needed
• Chiplet composition to be explored

Questions?