
SQL Server on Linux, will it perform? Slava Oks (PowerPoint presentation)



  1. SQL Server on Linux, will it perform? Slava Oks

  2. Thank you! Microsoft Research, Windows team, Midori

  3. Our goal is to make SQL Server perform and scale on any platform or hardware of the customer's choice

  4. Prologue: Meet the PALs

  5. Intro to Drawbridge: a container technology to achieve isolation, security and density in the cloud
     • Modified Windows kernel that runs in user mode, aka Library OS or LibOS
     • Designed for running on Windows; leverages the picoprocess feature
     • A picoprocess is an NT process with an empty address space
     • All 1200+ system calls (NTOS and win32k) are blocked from user mode
     • Enforced by a 35-line change to KiSystemServiceHandler
     • No perf impact on other processes: leverages the "slow path" used by UMS
     • 45 new system calls added at the process boundary (Drawbridge system calls)
     • Even hard-coded traps can't break out
     Diagram: NT process vs. picoprocess. Labels: shared vs. isolated address space; user32, gdi32, ntdll; 800+ Win32 calls, 400+ NT calls, 45 ABI calls; PAL; host OS security monitor; host ntoskrnl and win32k.
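
     A minimal sketch of the idea behind the picoprocess boundary, assuming a hypothetical numbering in which the 45 Drawbridge ABI calls occupy their own range: every trap is checked against that allow-list, and anything else (an NT or win32k call, or a hard-coded trap) is rejected before it can reach the host kernel. `ABI_CALL_BASE`, `picoprocess_syscall` and the handler table are illustrative names, not Drawbridge's actual interface.

     ```c
     #include <assert.h>
     #include <stddef.h>

     /* Hypothetical numbering: the 45 Drawbridge ABI calls get their own range,
        separate from the 1200+ NT and win32k system call numbers. */
     #define ABI_CALL_BASE  0x10000u
     #define ABI_CALL_COUNT 45u

     typedef enum { CALL_BLOCKED = -1, CALL_OK = 0 } call_status;
     typedef call_status (*abi_handler)(void *args);

     /* Placeholder handler; a real PAL would forward to a host primitive. */
     static call_status abi_stub(void *args) { (void)args; return CALL_OK; }

     static abi_handler abi_table[ABI_CALL_COUNT];

     void init_abi_table(void) {
         for (unsigned i = 0; i < ABI_CALL_COUNT; i++)
             abi_table[i] = abi_stub;
     }

     /* Every trap from the picoprocess lands here. Anything outside the ABI
        range (NT, win32k, or a hard-coded trap number) is rejected. */
     call_status picoprocess_syscall(unsigned number, void *args) {
         if (number < ABI_CALL_BASE || number >= ABI_CALL_BASE + ABI_CALL_COUNT)
             return CALL_BLOCKED;
         abi_handler h = abi_table[number - ABI_CALL_BASE];
         assert(h != NULL && "init_abi_table() must run first");
         return h(args);
     }
     ```

     After `init_abi_table()`, `picoprocess_syscall(0x42u, NULL)` returns `CALL_BLOCKED`, while `picoprocess_syscall(ABI_CALL_BASE + 3u, NULL)` reaches the stub and returns `CALL_OK`.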

  6. LibOS: a user-mode runtime library exposing the semantics of the Windows kernel
     Diagram components: NT UM; Network Stack (AFD); Object Manager; Process Manager; Memory Manager; Threads; I/O; Wait; Loader; APC; Pool; DRTL; Union FS; PEB/TEB; Simple Heap; PAL ABI Handler; Streams; Sync Objects; Threads; Memory Manager

  7. • Storage Manager
       - Asynchronous I/O is submitted to the host and registered with WaitPool threads
       - On completion, WaitPool threads deliver I/Os to the original thread through an APC
       - Original threads deliver I/Os to their final destination
     • Network Manager
       - Custom version of AFD (WinSock semantics) with a thread pool
       - AFD submits asynchronous I/O to the host and registers it with WaitPool threads
       - On completion, WaitPool threads deliver I/Os to AFD threads through an APC
       - AFD threads deliver network requests, through an APC, to the original threads that initiated the I/O
       - Original threads deliver I/Os to their final destination
     • I/O in general
       - No proper support for scatter/gather
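
     The storage path above (host completion, WaitPool thread, APC to the issuing thread) can be sketched with POSIX threads. This is a simplified model under stated assumptions: `apc_queue`, `queue_apc`, `drain_apcs` and the simulated 4 KB completion are hypothetical, and the issuing thread simply polls its own queue instead of entering an alertable wait.

     ```c
     #include <assert.h>
     #include <pthread.h>
     #include <stdlib.h>

     /* Hypothetical per-thread APC queue: other threads push routines,
        and they run only when the owning thread polls. */
     typedef struct apc {
         void (*routine)(void *ctx);
         void *ctx;
         struct apc *next;
     } apc;

     typedef struct {
         pthread_mutex_t lock;
         apc *head, *tail;
     } apc_queue;

     void apc_queue_init(apc_queue *q) {
         pthread_mutex_init(&q->lock, NULL);
         q->head = q->tail = NULL;
     }

     /* Called from a WaitPool thread: queue a routine for the target thread. */
     void queue_apc(apc_queue *q, void (*routine)(void *), void *ctx) {
         apc *a = malloc(sizeof *a);
         assert(a != NULL);
         a->routine = routine; a->ctx = ctx; a->next = NULL;
         pthread_mutex_lock(&q->lock);
         if (q->tail) q->tail->next = a; else q->head = a;
         q->tail = a;
         pthread_mutex_unlock(&q->lock);
     }

     /* Called by the owning thread: run every pending APC, return the count. */
     int drain_apcs(apc_queue *q) {
         pthread_mutex_lock(&q->lock);
         apc *a = q->head;
         q->head = q->tail = NULL;
         pthread_mutex_unlock(&q->lock);
         int n = 0;
         while (a) {
             apc *next = a->next;
             a->routine(a->ctx);
             free(a);
             a = next;
             n++;
         }
         return n;
     }

     /* A simulated I/O request that a WaitPool thread "completes". */
     typedef struct {
         apc_queue *origin;   /* APC queue of the thread that issued the I/O */
         int bytes;           /* filled in on completion */
         int delivered;       /* set when the issuing thread processes it */
     } io_request;

     static void deliver_io(void *ctx) {
         io_request *r = ctx;
         r->delivered = 1;    /* "final destination": here, just a flag */
     }

     static void *wait_pool_thread(void *arg) {
         io_request *r = arg;
         r->bytes = 4096;     /* pretend the host reported 4 KB transferred */
         queue_apc(r->origin, deliver_io, r);
         return NULL;
     }

     /* Issue one I/O from the calling thread; returns 1 if it was delivered via APC. */
     int run_one_io(void) {
         apc_queue mine;
         apc_queue_init(&mine);
         io_request req = { &mine, 0, 0 };
         pthread_t waiter;
         if (pthread_create(&waiter, NULL, wait_pool_thread, &req) != 0) return 0;
         pthread_join(waiter, NULL);      /* stand-in for "wait until alertable" */
         int ran = drain_apcs(&mine);
         pthread_mutex_destroy(&mine.lock);
         return ran == 1 && req.delivered && req.bytes == 4096;
     }
     ```

     The two hops are what the slide's network path multiplies: there, an AFD thread sits between the WaitPool thread and the issuing thread, adding one more queue-and-poll step per I/O.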

  8. • Memory Manager
       - Global Virtual Address Descriptor (VAD) list
       - Global heap
     • Object Manager
       - Global directory
     • Process Manager
       - Per-process runtime libraries: no image sharing
     • Threads
       - APC "injection" through polling

  9. SQL OS (SOS): a user-mode runtime library providing the performance, scalability and diagnostic foundation for SQL Server
     Diagram: a Memory Node containing CPU Nodes; each CPU Node has several Schedulers (…) and a Network Manager; each Scheduler has its own Storage Manager.

  10. • Network Manager
        - I/O completion port/thread per CPU Node
        - Asynchronous delivery
      • Storage Manager
        - I/O queue per scheduler
        - Synchronous delivery through periodic polling
      • Memory Manager / Object Manager / Scheduling Manager
        - NUMA awareness
        - Partitioned heaps
        - Non-preemptive scheduling & user-mode threads
        - Synchronization primitives
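
      SQLOS storage delivery (an I/O queue per scheduler, drained by periodic polling) might look roughly like this sketch. It assumes an OVERLAPPED-like status word that the I/O layer writes when the request finishes; `io_req`, `sched_poll_io` and `IO_PENDING` are hypothetical names. Because scheduling is non-preemptive and each queue belongs to one scheduler, the list itself needs no lock; only the status word is shared with the I/O layer, so it is read atomically.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stddef.h>

      #define IO_PENDING (-1)   /* hypothetical "still in flight" status value */

      typedef struct io_req {
          atomic_int status;                      /* written by the I/O layer on completion */
          void (*on_complete)(struct io_req *);   /* runs on the scheduler's own thread */
          int delivered;
          struct io_req *next;
      } io_req;

      typedef struct {
          io_req *pending;    /* owned by one scheduler: no lock needed */
          int completed;
      } scheduler;

      /* Example completion callback: mark the request as handed to its consumer. */
      void mark_delivered(io_req *r) { r->delivered = 1; }

      /* Queue a request before handing it to the I/O layer. */
      void sched_submit(scheduler *s, io_req *r) {
          assert(r->on_complete != NULL);
          atomic_store_explicit(&r->status, IO_PENDING, memory_order_relaxed);
          r->next = s->pending;
          s->pending = r;
      }

      /* Called at each voluntary yield point: complete finished requests, keep the rest. */
      int sched_poll_io(scheduler *s) {
          int done = 0;
          io_req **link = &s->pending;
          while (*link != NULL) {
              io_req *r = *link;
              if (atomic_load_explicit(&r->status, memory_order_acquire) != IO_PENDING) {
                  *link = r->next;            /* unlink before the callback runs */
                  r->on_complete(r);
                  done++;
              } else {
                  link = &r->next;
              }
          }
          s->completed += done;
          return done;
      }
      ```

      Polling at yield points means a completion is delivered on the thread that issued it with no cross-thread wakeup, at the cost of latency bounded by how often workers yield.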

  11. Chapter 1: SQL & PALs. A marriage made in heaven, or…

  12. SQL Server on top of PALs
      Stack diagram, from Ring 3 down to Ring 0: SQL Server, SQLOS, Win32, Drawbridge LibOS, PAL, Linux kernel.

      Technologies            SQL    LibOS    Host Extension
      Object Management        ✔       ✔            ✔
      Memory Management        ✔       ✔            ✔
      Threading/Scheduling     ✔       ✔            ✔
      Synchronization          ✔       ✔            ✔
      I/O (Disk, Network)      ✔       ✔            ✔

  13. Chapter 2: The sign is on the wall. Introducing intelligent hacks

  14. • Kernel aio
      • Pump threads vs WaitPool threads
      • Fast I/O

      // We can do Fast I/O if and only if it follows the rules employed by SQL Server
      // disk I/O, which is delivered nonpreemptively through polling an overlapped
      // data structure:
      // - I/O is asynchronous
      // - No user mode APC required
      // - No I/O completion port specified
      // - Contains an event to be signaled (leveraged by SQL Server to wake up an idle scheduler)
      // - Disk I/O
      //
      canDoFastIO = WaitForCompletion == FALSE;
      canDoFastIO = canDoFastIO && (ApcRoutine == NULL && FileObject != NULL);
      canDoFastIO = canDoFastIO && (Args->SkipCompletionPort ||
                    NtpGetCompletionPortObject(FileObject, &CompletionKey) == NULL);
      canDoFastIO = canDoFastIO && (Args->EventObject != NULL && IoStatusBlock != NULL);
      canDoFastIO = canDoFastIO && (NtpGetObjectType(Args->Object) == NTUM_FILE &&
                    NtpIsIoAsynchronous(Args->Object));
      canDoFastIO = canDoFastIO && ((FileObject->Type & NtpSeekableFile) &&
                    (Type == NTUM_IO_READ || Type == NTUM_IO_WRITE ||
                     Type == NTUM_IO_WRITE_GATHER || Type == NTUM_IO_READ_SCATTER));

      // If it is Gather/Scatter I/O then length can't exceed DK_UIO_MAXIOV supported by the Host
      //
      canDoFastIO = canDoFastIO &&
                    (!(Type == NTUM_IO_WRITE_GATHER || Type == NTUM_IO_READ_SCATTER) ||
                     Length <= DK_UIO_MAXIOV);
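
      The eligibility test above reduces to one predicate over the request. The sketch below restates it with a hypothetical, simplified descriptor (`io_args`, `can_do_fast_io`); `HOST_MAX_IOV` is an assumed stand-in for `DK_UIO_MAXIOV` (1024 matches Linux's `UIO_MAXIOV`), and the `FileObject != NULL` check is folded away.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical, simplified request descriptor; fields echo the slide's variables. */
      typedef enum { IO_READ, IO_WRITE, IO_READ_SCATTER, IO_WRITE_GATHER, IO_OTHER } io_type;

      typedef struct {
          bool wait_for_completion;   /* caller blocks until done (synchronous) */
          bool has_apc_routine;       /* completion wants a user-mode APC */
          bool has_completion_port;   /* file is bound to an I/O completion port */
          bool skip_completion_port;
          bool has_event;             /* event used to wake an idle scheduler */
          bool has_status_block;      /* IoStatusBlock that the caller polls */
          bool is_async_file;
          bool is_seekable;           /* disk file rather than socket or pipe */
          io_type type;
          size_t segments;            /* iovec count for scatter/gather */
      } io_args;

      #define HOST_MAX_IOV 1024       /* assumption: stands in for DK_UIO_MAXIOV */

      /* True only for the SQL Server disk I/O pattern: async, polled via the
         status block, no APC, no completion port, within the host's iovec limit. */
      bool can_do_fast_io(const io_args *a) {
          assert(a != NULL);
          bool scatter_gather = a->type == IO_READ_SCATTER || a->type == IO_WRITE_GATHER;
          return !a->wait_for_completion
              && !a->has_apc_routine
              && (a->skip_completion_port || !a->has_completion_port)
              && a->has_event && a->has_status_block
              && a->is_async_file
              && a->is_seekable && a->type != IO_OTHER
              && (!scatter_gather || a->segments <= HOST_MAX_IOV);
      }
      ```

      Writing the rules as a single predicate keeps the fast path an all-or-nothing decision: any request that misses one condition falls back to the general WaitPool path.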

  15. • Pump threads vs WaitPool
      • Fast I/O ~ AFD pass-through
      • SQLOS completion threads are pump threads ~ no context switch on completion

      // Completed I/Os received via the IOPort are submitted to the I/O
      // completion port queue
      //
      Status = NtpTryToProcessIoCompletion(IoCompletionPort, IoCompletionInformation);

      // Process any APCs or interruptions for this thread.
      //
      NtpProcessKernelApc(threadObject);

      Request.IOPort = IoCompletionPort->IOPort;
      Request.PendingIOs = &PendingIOs;
      Status = DrtlReadStreamSync(IoCompletionPort->Stream, 0, 0, (PVOID)&Request, NULL);

      while (PendingIOs != NULL)
      {
          // Remember the I/O to complete and move to the next I/O before
          // completing the current one: by the time the completion routine
          // returns, the completed I/O will have been freed.
          //
          CompletedIO = PendingIOs;
          PendingIOs = (PDK_ASYNC_RESULTS_LINKED)PendingIOs->Next;

          //
          // Complete I/O
          //
          NtpCompleteNetworkIoRequest((PNTUM_IO_REQUEST)CompletedIO->Request);
      }
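
      The loop's comment describes a common pattern: when a completion routine frees its node, the next pointer must be read first. A self-contained sketch of that pattern, with a hypothetical `completed_io` record and a `complete_io` stand-in for `NtpCompleteNetworkIoRequest`:

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical completed-I/O record; the completion routine owns and frees it. */
      typedef struct completed_io {
          struct completed_io *next;
          int request_id;
      } completed_io;

      static int completions;   /* counts completion routine invocations */

      /* Stand-in for the real completion routine: after it returns,
         the record must not be touched again. */
      static void complete_io(completed_io *io) {
          completions++;
          free(io);
      }

      /* Drain one batch of completions. `next` is saved before completing,
         because the completion frees the current node. */
      int drain_completions(completed_io *pending) {
          int start = completions;
          while (pending != NULL) {
              completed_io *current = pending;
              pending = pending->next;     /* read before current is freed */
              complete_io(current);
          }
          return completions - start;
      }

      /* Helper: build a batch of n heap-allocated records (newest first). */
      completed_io *make_batch(int n) {
          completed_io *head = NULL;
          for (int i = 0; i < n; i++) {
              completed_io *io = malloc(sizeof *io);
              assert(io != NULL);
              io->request_id = i;
              io->next = head;
              head = io;
          }
          return head;
      }
      ```

      Reading `current->next` after `complete_io(current)` would be a use-after-free; tools such as AddressSanitizer flag exactly that ordering mistake.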

  16. • Multiple heaps
      • I/O request free list per thread
      • Per-process Virtual Address Space Manager
      • NUMA support
      • Processor affinity

      PVOID DrtlAllocate(
          __in ULONG Flags,
          __in SIZE_T Size,
          __in ULONG Tag)
      {
          ULONG heapIdx;

          //
          // Early boot we might not have a thread
          //
          heapIdx = DrtlGetCurrentThreadId() % g_DrtlNumberHeaps;

          return DrtlpAllocate(&g_DrtlHeaps[heapIdx], Flags, Size, Tag);
      }

      NtpAllocateIORequestRaw(
          __in NTUM_IO_TYPE Type)
      {
          //
          // Use the cache if we have an I/O request
          //
          LocalRequest = (PNTUM_IO_REQUEST)ExpInterlockedPopEntrySList(
              &RequestingThread->IORequestsCache);

          //
          // If the cache was empty allocate a new request structure.
          //
          if (LocalRequest == NULL) {
              LocalRequest = (PNTUM_IO_REQUEST)ExAllocatePoolWithTag(
                  PagedPool, sizeof(*LocalRequest), ' PRI');
          }
          ...
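
      The per-thread I/O request free list can be sketched with C11 thread-local storage. This is a simplification under stated assumptions: the slide's code uses an interlocked SList (`ExpInterlockedPopEntrySList`), which also lets other threads push onto a cache, whereas this version is strictly owner-only. `alloc_io_request`, `free_io_request` and `pick_heap` are hypothetical names; `pick_heap` mirrors the `DrtlGetCurrentThreadId() % g_DrtlNumberHeaps` selection.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical I/O request; `next` links it into a free list while unused. */
      typedef struct io_request {
          struct io_request *next;
          int type;
      } io_request;

      /* One cache per thread (C11 _Thread_local), so pop and push need no lock. */
      static _Thread_local io_request *request_cache;
      static _Thread_local unsigned long fresh_allocations;

      io_request *alloc_io_request(int type) {
          io_request *r = request_cache;
          if (r != NULL) {
              request_cache = r->next;       /* reuse a cached request */
          } else {
              r = malloc(sizeof *r);         /* cache empty: fall back to the heap */
              assert(r != NULL);
              fresh_allocations++;
          }
          r->next = NULL;
          r->type = type;
          return r;
      }

      void free_io_request(io_request *r) {
          r->next = request_cache;           /* keep it for this thread's next request */
          request_cache = r;
      }

      unsigned long io_request_heap_allocations(void) { return fresh_allocations; }

      /* Heap selection as on the slide: spread threads across heap_count heaps. */
      unsigned pick_heap(unsigned long thread_id, unsigned heap_count) {
          assert(heap_count > 0);
          return (unsigned)(thread_id % heap_count);
      }
      ```

      In steady state each thread recycles the same few requests, so the shared allocator (and its lock) is touched only when a thread's cache runs dry.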

  17. Chapter 3: Pressure is On

  18. Hardware Configuration
      • Power settings: OS Control power option, High Performance in OS, HT OFF, Turbo Boost OFF
      • Network: 1x 10 Gb network connection per machine
      • Machine configuration (server and client): Gen3 systems
      • Model/Processors: Intel Xeon CPU E5-2660 0 @ 2.20 GHz (2S/16C)
      • Memory: 128 GB
      • Storage: 4x 447.13 GB SSDs, all striped together and mounted as 1 volume; both data and log are stored on this volume

  19. Hardware Configuration
      • Power settings: OS Control power option, High Performance in OS, HT OFF, Turbo Boost OFF
      • Network: 1x 10 Gb network connection per machine
      • Machine configuration (server and client): 4S systems (for the TPC-C test)
      • Model/Processors: Intel Xeon CPU E7-4850 0 @ 2.00 GHz (4S/40C)
      • Memory: 768 GB
      • Data storage: 2x 1.46 TB Fusion-io disks, all striped together and mounted as 1 volume
      • Log storage: 1x 5.54 TB HDD

  20. Chapter 4: The ultimate PAL

  21. Introducing SQLPAL
      Principles:
      • Remove redundancy
      • Optimize performance-critical paths (I/O)
      • Shrink code path length
      Stack diagram, from Ring 3 down to Ring 0: SQL Server, Win32, LibOS, SOSv2, SQLPAL (LibOS and Win32, Host Extension), Linux kernel.

      Technologies            SQL    SOSv2    Host Extension
      Object Management        ❌       ✔       ❌
      Memory Management        ❌       ✔       ✔ Host translation (jemalloc)
      Threading/Scheduling     ❌       ✔       ✔ Host translation (pthreads)
      Synchronization          ❌       ✔       ✔ Host translation (condition variables)
      I/O (Disk, Network)      ❌       ✔       ✔ Host translation (kaio)
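
      The "Host translation (condition variables)" row can be illustrated with a Win32-style auto-reset event built on a pthread mutex and condition variable. This is a minimal sketch, not SQLPAL's host extension: `host_event` and its functions are hypothetical, and timeouts, manual-reset events and multi-object waits are left out.

      ```c
      #include <assert.h>
      #include <pthread.h>
      #include <stdbool.h>

      /* Hypothetical host-side event: a Win32-style auto-reset event
         expressed with a pthread mutex and condition variable. */
      typedef struct {
          pthread_mutex_t lock;
          pthread_cond_t cond;
          bool signaled;
      } host_event;

      void host_event_init(host_event *e, bool initially_signaled) {
          pthread_mutex_init(&e->lock, NULL);
          pthread_cond_init(&e->cond, NULL);
          e->signaled = initially_signaled;
      }

      void host_event_set(host_event *e) {
          pthread_mutex_lock(&e->lock);
          e->signaled = true;
          pthread_cond_signal(&e->cond);     /* auto-reset: wake at most one waiter */
          pthread_mutex_unlock(&e->lock);
      }

      /* Block until signaled, then consume the signal (auto-reset). */
      void host_event_wait(host_event *e) {
          pthread_mutex_lock(&e->lock);
          while (!e->signaled)               /* loop guards against spurious wakeups */
              pthread_cond_wait(&e->cond, &e->lock);
          e->signaled = false;
          pthread_mutex_unlock(&e->lock);
      }

      /* Non-blocking probe: consume the signal if present. */
      bool host_event_try_wait(host_event *e) {
          pthread_mutex_lock(&e->lock);
          bool was = e->signaled;
          e->signaled = false;
          pthread_mutex_unlock(&e->lock);
          return was;
      }

      void host_event_destroy(host_event *e) {
          pthread_cond_destroy(&e->cond);
          pthread_mutex_destroy(&e->lock);
      }

      static void *setter(void *arg) { host_event_set(arg); return NULL; }

      /* Demo: another thread sets the event that the caller waits on. */
      bool demo_cross_thread_signal(void) {
          host_event e;
          host_event_init(&e, false);
          pthread_t t;
          if (pthread_create(&t, NULL, setter, &e) != 0) return false;
          host_event_wait(&e);
          pthread_join(t, NULL);
          bool reset = !host_event_try_wait(&e);   /* the wait consumed the signal */
          host_event_destroy(&e);
          return reset;
      }
      ```

      The `signaled` flag is what makes this an event rather than a bare condition variable: a `set` that happens before anyone waits is remembered, so the waiter does not miss it.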
