Out-of-band Flow Control for Reliable Multicast
00S-SIW-071
Harry Wolfson <HarryWolfson@LL.MIT.EDU>
MIT Lincoln Laboratory
March 30, 2000
Outline
• Introduction
• Time Management Issues
• Review of RTI 1.3 Multicast
• Throughput Processing Imbalance
• Flow Control Design in RTI 1.3
• Summary
Introduction
• RTI Reliable Multicast began with the STOW Program
  – Design emphasis on low latency, high throughput performance
  – Large buffers accommodated bursty traffic
  – “Last resort” / anomalous behavior
    » Drop messages instead of locking up
Introduction (cont)
• RTI 1.3 Reliable Multicast required:
  – 100% reliability
    » No dropped messages
  – Adherence to tick(min, max)
    » Limited time permitted for processing messages
    » Receive Queues added for temporary storage
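A minimal sketch of the tick(min, max) idea in Python, using hypothetical names rather than the actual RTI 1.3 API: message processing is bounded by a time budget, and anything left over is parked on a Receive Queue instead of being dropped.

    import time
    from collections import deque

    receive_queue = deque()   # messages held over between ticks
    incoming = deque()        # messages newly arrived from the network

    def tick(min_time, max_time, handle):
        """Process messages for at least min_time and at most max_time seconds."""
        start = time.monotonic()
        pending = deque(receive_queue)   # drain held-over messages first
        receive_queue.clear()
        pending.extend(incoming)
        incoming.clear()
        while pending:
            if time.monotonic() - start >= max_time:
                # Time budget exhausted: keep the remainder for the next tick
                # (no dropped messages, per the 100% reliability requirement).
                receive_queue.extend(pending)
                return
            handle(pending.popleft())
        # Nothing left to process: honor the minimum tick duration.
        remaining = min_time - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)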
Time Management Issues
• Flow Control has no impact on Time Managed Federations
  – Wall clock time might slow down / pause
• Real Time Federations (Hardware in the Loop)
  – People responsible for planning the Execution need to provide adequate communication / computational resources to match the Scenario
Review of RTI 1.3 Multicast
• Multicast based on Reliable Distributor
  – Client / Server
  – Similar to an “Exploder”
• No “production / release” quality Reliable Multicast available during STOW and 1.3 development
  – Many-to-many multicast
  – Dynamic Join / Leave
  – Support for DDM Interest Management
Review of RTI 1.3 Multicast (cont)
• Built on top of TCP/IP
  – Sequential message delivery
  – Reliable point-to-point message transfer
• TCP does NOT provide Application-to-Application Flow Control
• Robust, fault tolerant
• Interest Filtering (DDM) implemented at sender, exploder, and receiver
Review of RTI 1.3 Multicast (cont)
[Figure, repeated over several build slides: Federates on LAN #1, LAN #2, and LAN #3, each with an RTI and a Reliable Distributor Client, connected by TCP to Reliable Distributor Servers (i.e. Exploders) that are in turn interconnected by TCP across the LANs. Legend: RD = Reliable Distributor, tcp = TCP Connection.]
TCP Alone is NOT Sufficient
• TCP/IP does NOT provide true Application-to-Application Flow Control
  – A single connected pair would require a “Blocking Send”
  – The Exploder breaks any end-to-end flow control provided by TCP
[Figure: a directly connected Federate pair versus a pair connected through an Exploder.]
Throughput Imbalance (a)
• Many Federation Scenarios can lead to an Overloaded Network
[Figure: a high-speed processor Federate sending high-volume data to a slow-processor Federate.]
Throughput Imbalance (b)
• Many Federation Scenarios can lead to an Overloaded Network
[Figure: a Federate that receives data from numerous high-volume Federates.]
Throughput Imbalance (c)
• Many Federation Scenarios can lead to an Overloaded Network
[Figure: a receiving Federate on LAN #2 at the end of a slow network link from LAN #1.]
Throughput Imbalance (d)
• Overloaded Execution: typical scenario without “Application to Application” Flow Control
  – Receiver can’t process incoming msgs
  – Buffers in LRCs and kernel begin to fill up
  – Federates, or the entire Execution, slow to a crawl as Federates try to clear buffers
  – Deadlock / deadly embrace
Flow Control Design in RTI 1.3
• Regulate message throughput level
  – Prevent Federates from sending new data based on RTI internal state
    » Squelch / ClearToSend
    » LRC grabs control of tick from the Federate
  – Hysteresis in the system prevents “thrashing”
• Out-of-Band handshake protocol
  – Full TCP buffers do not impede the protocol
Monitor Internal Queues
• Each Federate’s LRC has two message queues
  – Receive Queue stores messages:
    » After tick(min, max) expires
    » During save / restore
  – Send Queue stores messages:
    » After tick(min, max) expires
    » As remote LRCs’ buffers fill across the Execution
Monitor Internal Queues (cont)
• Reliable Distributor message queues
  – One Send Queue for each Federate Client
  – One Send Queue for each remote Reliable Distributor
  – Stores messages if clients’ buffers are full
• Squelch Mode activated when any Queue exceeds its threshold
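A sketch of that trigger, again with hypothetical names and an assumed per-queue depth threshold (the real thresholds are internal to RTI 1.3): the Reliable Distributor keeps one Send Queue per Federate Client and per remote Reliable Distributor, and Squelch Mode is entered as soon as any one of them exceeds its threshold.

    from collections import deque

    SQUELCH_THRESHOLD = 1000   # assumed per-queue depth threshold (messages)

    class ReliableDistributor:
        def __init__(self, clients, remote_rds):
            # One Send Queue per Federate Client and per remote RD.
            self.send_queues = {name: deque() for name in clients + remote_rds}
            self.squelched = False

        def enqueue(self, dest, msg):
            # Messages are stored here if the destination's buffers are full.
            self.send_queues[dest].append(msg)
            self.check_squelch()

        def check_squelch(self):
            # Enter Squelch Mode when ANY queue exceeds the threshold.
            if any(len(q) > SQUELCH_THRESHOLD for q in self.send_queues.values()):
                self.squelched = True

    rd = ReliableDistributor(clients=["fed_a", "fed_b"], remote_rds=["rd_lan2"])
    rd.enqueue("fed_b", b"attribute update")   # held if fed_b's buffers are full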
Out-of-Band Messaging Link
• Squelch / ClearToSend internal state communicated between RTI pairs
  – Federate to Reliable Distributor
  – Reliable Distributor to Reliable Distributor
• UDP “point to point” link between pairs
  – Best Effort
  – Not subject to TCP congestion
  – Heartbeat Status (fail-safe time out)
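A sketch of what such a link could look like, with an assumed one-byte message encoding and hypothetical class and parameter names (the actual RTI 1.3 wire format, ports, and timeout values are not given in the slides). Because the link is UDP, Squelch / ClearToSend notifications and heartbeats still get through when the paired TCP connection's buffers are full; the fail-safe timeout here treats a silent peer as squelched, one interpretation consistent with the "ClearToSend times out" condition on the following slides.

    import socket
    import time

    # Hypothetical one-byte codes for the out-of-band messages.
    SQUELCH, CLEAR_TO_SEND, HEARTBEAT = b"S", b"C", b"H"

    class OutOfBandLink:
        """Best-effort UDP link run in parallel with one TCP connection."""
        def __init__(self, local_port, peer_addr, fail_safe_timeout=5.0):
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.sock.bind(("", local_port))
            self.sock.setblocking(False)
            self.peer = peer_addr              # (host, port) of the paired RTI
            self.timeout = fail_safe_timeout
            self.last_heard = time.monotonic()
            self.peer_squelched = False

        def send(self, code):
            # Not subject to TCP congestion: this goes out even when the
            # paired TCP connection's buffers are full.
            self.sock.sendto(code, self.peer)

        def poll(self):
            try:
                while True:
                    code, _ = self.sock.recvfrom(1)
                    self.last_heard = time.monotonic()
                    if code == SQUELCH:
                        self.peer_squelched = True
                    elif code == CLEAR_TO_SEND:
                        self.peer_squelched = False
                    # HEARTBEAT only refreshes last_heard.
            except BlockingIOError:
                pass
            # Fail-safe: if the peer goes silent, behave as if squelched.
            if time.monotonic() - self.last_heard > self.timeout:
                self.peer_squelched = True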
Out-of-Band Link (cont)
• UDP link in parallel with every TCP connection
[Figure: Federate RTIs connected to a Reliable Distributor by TCP, with a UDP link alongside each TCP connection.]
Squelch Mode
• Federate’s LRC enters Squelch Mode if:
  – Rcv Msg Queue over threshold, or
  – Snd Msg Queue over threshold, or
  – Received Remote Squelch message
    » or: ClearToSend times out
[Figure: a Federate’s RTI with its Rcv Msg Queue, Snd Msg Queue, and Remote ClearToSend state, attached to a TCP connection.]
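Those conditions combine into a single OR, sketched below with hypothetical names and assumed threshold values (the actual thresholds are internal to the RTI).

    RCV_THRESHOLD = 1000   # assumed Rcv Msg Queue depth threshold
    SND_THRESHOLD = 1000   # assumed Snd Msg Queue depth threshold

    def lrc_should_squelch(rcv_depth, snd_depth,
                           remote_squelch_received, clear_to_send_timed_out):
        """Federate's LRC enters Squelch Mode if any condition holds."""
        return (rcv_depth > RCV_THRESHOLD
                or snd_depth > SND_THRESHOLD
                or remote_squelch_received
                or clear_to_send_timed_out)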
Squelch Mode (cont)
• Reliable Distributor enters Squelch Mode if any of its Clients:
  – Snd Msg Queue over threshold
  – Received Remote Squelch message
    » or: ClearToSend times out
[Figure: a Reliable Distributor with multiple TCP connections, each with its Snd Msg Queue and Remote ClearToSend state.]
Squelch Mode (cont)
• Federate prevented from sending new messages when LRC is in Squelch Mode
  – LRC grabs control of tick until ClearToSend
• Built-in hysteresis / latency prevents “thrashing”
[Figure: squelch state turns on when the Receive Message Queue or Send Message Queue(s) cross their Squelch Threshold, or when a Remote RTI Squelch Message is received; it turns off on CTS (ClearToSend).]
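The slides state only that hysteresis is built in; one common way to realize it, sketched here with assumed high / low water marks (not values from RTI 1.3), is to raise Squelch above one queue depth and issue ClearToSend only after the queue drains well below it, so the state does not oscillate while a queue hovers near a single threshold.

    HIGH_WATER = 1000   # assumed: enter Squelch Mode above this queue depth
    LOW_WATER = 250     # assumed: issue ClearToSend only below this depth

    class SquelchState:
        def __init__(self):
            self.squelched = False

        def update(self, queue_depth):
            """Return 'SQUELCH', 'CLEAR_TO_SEND', or None on a state change."""
            if not self.squelched and queue_depth > HIGH_WATER:
                self.squelched = True
                return "SQUELCH"
            if self.squelched and queue_depth < LOW_WATER:
                self.squelched = False
                return "CLEAR_TO_SEND"
            return None

    # While squelched, the LRC keeps control of tick, so the Federate cannot
    # inject new messages until ClearToSend is issued.
    state = SquelchState()
    assert state.update(1200) == "SQUELCH"        # queue overflows: squelch
    assert state.update(800) is None              # above low water: stay squelched
    assert state.update(100) == "CLEAR_TO_SEND"   # drained: allow sending again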
Summary
• End-to-end, “Application to Application”, coordinated Flow Control implemented in RTI 1.3v7
• Prevents new messages from being sent when slower Federates can’t keep up
  – Allows sustainable message throughput
Summary (cont)
• Demonstrated system-wide improvement in large Simulations
  – JTC, JTLS
    » 9 - 11 Federates, ~15,000 objects
  – “Reduced” Load Test
    » 5 Federates; 10,000 objects; updated every tick
• Time Managed Simulations not impacted
• Real Time requires adequate resources