Duke Systems: Servers and a Little Bit of Networking. Jeff Chase, Duke University
Unix process view: data
A process has multiple I/O channels (“file descriptors”) for data movement in and out of the process.
The channels are typed: tty, pipe, socket, file.
Each channel is named by a file descriptor.
The parent process and parent program set up and control the channels for a child (until exec).
[Figure: a process (thread, program) with stdin, stdout, and stderr channels connected to a tty, a pipe, a socket, and files.]
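The parent's control over a child's channels can be sketched in a few lines of C. This is a minimal illustration, not code from the slides: the helper name capture_stdout is hypothetical, and it assumes a POSIX system. The parent creates a pipe, and the child installs the pipe's write end as its stdout before calling exec.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] as a child whose stdout is a pipe,
 * and copy up to cap-1 bytes of its output into buf (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t capture_stdout(char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make descriptor 1 (stdout) name the pipe's write end. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);   /* the new program inherits the channels */
        _exit(127);              /* reached only if exec failed */
    }

    /* Parent: keep only the read end and drain the child's output. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n = 0;
    while (total < cap - 1 &&
           (n = read(fds[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Calling capture_stdout with an argv of { "echo", "hello", NULL } should leave the child's output in buf. The child program itself needs no changes: it writes to descriptor 1 as usual.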
Servers and the cloud
Where is your application? Where is your data? Where is your OS?
Cloud and Software-as-a-Service (SaaS): rapid evolution, no user upgrade, no user data management.
Agile/elastic deployment on clusters and virtual cloud utility infrastructure.
[Figure: networked server “cloud”.]
Networked services: big picture
Data is sent on the network as messages called packets.
[Figure: a client host (client applications, kernel network software, NIC device) connected through the Internet “cloud” to server hosts running server applications.]
Sockets
A socket is a buffered channel for passing data between processes over a network.
Client:
    int sd = socket(<internet stream>);
    gethostbyname("www.cs.duke.edu");
    <make a sockaddr_in struct>
    <install host IP address and port>
    connect(sd, <sockaddr_in>);
    write(sd, "abcdefg", 7);
    read(sd, ….);
• The socket() system call creates a socket object.
• Other calls establish connections between socket pairs (e.g., connect).
• A file descriptor for a connected socket is bidirectional.
• Write bytes at one end; read returns them at the other end.
• The read syscall blocks if the (stream) socket is “empty”.
• The write syscall blocks if the (stream) socket is “full”.
• Both read and write fail if there is no valid connection.
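The bidirectional stream behavior described above can be tried without a network. The sketch below is an illustration only: it assumes socketpair(AF_UNIX, SOCK_STREAM, ...) as a local stand-in for a connected Internet stream socket, and the function name stream_roundtrip is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write bytes at each end of a connected stream socket pair and read them
 * back at the other end. Returns 0 if both directions work, -1 otherwise.
 * Assumption: the short local writes arrive in a single read, which holds
 * in practice for a local pair but is not guaranteed for streams in general. */
int stream_roundtrip(void)
{
    int sv[2];
    char buf[8];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], "abcdefg", 7) == 7 &&        /* one direction ... */
        read(sv[1], buf, sizeof buf) == 7 &&
        memcmp(buf, "abcdefg", 7) == 0 &&
        write(sv[1], "ok", 2) == 2 &&             /* ... and the other */
        read(sv[0], buf, sizeof buf) == 2 &&
        memcmp(buf, "ok", 2) == 0)
        ok = 0;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

A read on either descriptor before the matching write would block, because the socket would be “empty”.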
Sockets: client/server example
Request: “GET /images/fish.gif HTTP/1.1”
Client (initiator):
    sd = socket(…);
    connect(sd, name);
    write(sd, request…);
    read(sd, reply…);
    close(sd);
Server:
    s = socket(…);
    bind(s, name);
    listen(s, 10);
    sd = accept(s);
    read(sd, request…);
    write(sd, reply…);
    close(sd);
Socket syscalls
connect(csd, <IP address and port>). For a client: connect the socket named by descriptor csd to a server at the specified IP address and port. Block until the connection is established.
bind(sd, <…port>). For a server: associate the socket named by descriptor sd with a port number reachable at an IP address of the host machine. Does not block, but may fail, e.g., if some other process is already bound to the port.
listen(sd, qsize). For a server: indicate that the socket named by descriptor sd is a server socket. When a connect request arrives for its port, establish the connection and place it on the accept queue (unless the accept queue is full). Listen does not block: it merely sets some parameters on the socket.
accept(sd, …). For a server: accept a connection from the accept queue for the server socket named by descriptor sd. Block if the accept queue is empty. Returns the IP address and port of the client for this connection, and a new socket descriptor csd for the connection.
Given a socket descriptor csd for an established connection (from a completed connect or accept), a process may use write (or send) to send bytes to the connection peer, and may use read (or recv) to receive bytes sent by the peer.
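The four calls can be exercised together inside a single process over the loopback interface. The following is a sketch under stated assumptions, not the lecture's code: the function names are hypothetical, the server binds to port 0 so the kernel picks a free port, and the client's connect is expected to complete before accept is called because the kernel places the established connection on the accept queue.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket, bind, listen on 127.0.0.1 with a kernel-chosen port.
 * Stores the port (host byte order) in *port; returns the descriptor or -1. */
int listen_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* 0 = let the kernel choose */
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(s, 10) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &len) < 0) {
        close(s);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return s;
}

/* Client side: socket, then connect to 127.0.0.1:port. */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(sd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* One request/reply exchange; returns 0 on success, -1 on any failure. */
int exchange_once(void)
{
    unsigned short port;
    char buf[64];
    int ok = -1;
    int s = listen_loopback(&port);
    if (s < 0)
        return -1;
    int sd = connect_loopback(port);          /* waits on the accept queue */
    int csd = (sd < 0) ? -1 : accept(s, NULL, NULL);
    if (csd >= 0 &&
        write(sd, "GET", 3) == 3 &&
        read(csd, buf, sizeof buf) == 3 && memcmp(buf, "GET", 3) == 0 &&
        write(csd, "reply", 5) == 5 &&
        read(sd, buf, sizeof buf) == 5 && memcmp(buf, "reply", 5) == 0)
        ok = 0;
    if (csd >= 0)
        close(csd);
    if (sd >= 0)
        close(sd);
    close(s);
    return ok;
}
```

Note that accept returns a new descriptor (csd here) for the connection, while the listening descriptor s stays open for further clients.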
catserver
    ...
    struct sockaddr_in socket_addr;

    /* Create a TCP (stream) socket. */
    sock = socket(PF_INET, SOCK_STREAM, 0);

    /* Name the socket: the given port, on any local IP address. */
    memset(&socket_addr, 0, sizeof socket_addr);
    socket_addr.sin_family = AF_INET;
    socket_addr.sin_port = htons(port);
    socket_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(sock, (struct sockaddr *) &socket_addr, sizeof socket_addr) < 0) {
        perror("bind failed");
        exit(1);
    }
    listen(sock, 10);

    /* Serve forever: hand each accepted connection to a child process. */
    while (1) {
        int acceptsock = accept(sock, NULL, NULL);
        forkme(acceptsock, prog, argv);   /* fork/exec cat */
        close(acceptsock);                /* parent drops its copy of the descriptor */
    }
}
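The slides do not show the body of forkme. One plausible sketch, offered as an assumption rather than the actual catserver code, forks a child, maps the connected socket onto the child's stdin and stdout, and execs the program. The pid_t return type and the forkme_demo helper are hypothetical additions for illustration; the demo uses a local socketpair in place of an accepted connection.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of forkme: run prog in a child with the socket as
 * both stdin and stdout. Returns the child's pid, or -1 if fork failed. */
pid_t forkme(int sock, const char *prog, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        dup2(sock, STDIN_FILENO);
        dup2(sock, STDOUT_FILENO);
        if (sock > STDERR_FILENO)
            close(sock);          /* the original descriptor is no longer needed */
        execvp(prog, argv);
        _exit(127);               /* reached only if exec failed */
    }
    return pid;
}

/* Demo: send "hi\n" to a child cat over a socket pair and read the echo.
 * Returns the number of bytes echoed into out, or -1 on error. */
int forkme_demo(char *out, size_t cap)
{
    int sv[2];
    char *const argv[] = { "cat", NULL };
    if (cap == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = forkme(sv[1], "cat", argv);
    close(sv[1]);                 /* like the server closing acceptsock */
    if (pid < 0 || write(sv[0], "hi\n", 3) != 3) {
        close(sv[0]);
        return -1;
    }
    shutdown(sv[0], SHUT_WR);     /* cat sees end-of-file on its stdin */

    size_t total = 0;
    ssize_t n = 0;
    while (total < cap - 1 &&
           (n = read(sv[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return (int)total;
}
```

Because cat only reads stdin and writes stdout, it becomes an echo service without knowing that a socket is involved.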
Web Server
Steps in handling one request, across the server (request buffer, reply buffer), the kernel, and the hardware (network interface, disk interface):
1. network socket read
2. copy arriving packet (DMA)
3. kernel copy
4. parse request
5. file read
6. disk request
7. disk data (DMA)
8. kernel copy
9. format reply
10. network socket write
11. kernel copy from user buffer into network buffer
12. format outgoing packet and DMA
Inside your Web server
Server operations:
• create socket(s)
• bind to port number(s)
• listen to advertise port
• wait for client to arrive on port (select/poll/epoll of ports)
• accept client connection
• read or recv request
• write or send response
• close client socket
[Figure: server application (Apache, Tomcat/Java, etc.) above the kernel's accept queue, listen queue, packet queues, and disk queue.]
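The “wait for client to arrive” step can be sketched with poll. This is a minimal illustration with hypothetical function names: a server would normally pass its listening socket (which becomes readable when a connection is waiting to be accepted), but the demo below uses a local socketpair as a stand-in so it runs without a network.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    int r = poll(&p, 1, timeout_ms);
    if (r < 0)
        return -1;
    return (r == 1 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Demo: a socket is not readable until its peer writes something.
 * Returns 0 if poll behaved as expected, -1 otherwise. */
int poll_demo(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int before = wait_readable(sv[1], 0);      /* nothing sent yet */
    int wrote = (write(sv[0], "x", 1) == 1);
    int after = wait_readable(sv[1], 1000);    /* data is now queued */
    close(sv[0]);
    close(sv[1]);
    return (before == 0 && wrote && after == 1) ? 0 : -1;
}
```

With several descriptors in the pollfd array, one thread can wait on many ports and connections at once instead of blocking in a single accept or read.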
Socket descriptors in Unix
There’s no magic here: processes use read/write (and other syscalls) to operate on sockets, just like any Unix I/O object (“file”).
A socket can even be mapped onto stdin or stdout.
Deeper in the kernel, sockets are handled differently from files, pipes, etc. Sockets are the entry/exit point for the network protocol stack.
[Figure (disclaimer: this drawing is oversimplified): in user space, an int fd; in kernel space, a per-process descriptor table pointing into the global “open file table”, whose entries refer to a file, pipe, tty, or socket; inbound traffic reaches sockets through the port table.]
Ports
• Each IP transport endpoint on a host has a logical port number (16-bit integer) that is unique on that host.
• This port abstraction is an Internet protocol suite concept.
  - Source and destination ports are named in the transport (TCP/UDP) header of every packet.
  - The kernel looks at the port to demultiplex incoming traffic.
• What port number to connect to?
  - We have to agree on well-known ports for common services.
  - Look at /etc/services.
  - Ports 1023 and below are “reserved” and privileged: generally you must be root/admin/superuser to bind to them.
• Clients need a return port, but it can be an ephemeral port assigned dynamically by the kernel.
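Ephemeral port assignment can be observed directly. The sketch below is an illustration with a hypothetical function name: it binds a UDP socket to port 0 on the loopback address, which asks the kernel to choose a port, and then reads the chosen port back with getsockname. The actual number depends on the operating system's configured ephemeral range.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to port 0 and report the port the kernel assigned,
 * in host byte order. Returns 0 on failure. */
unsigned short pick_ephemeral_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    unsigned short port = 0;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return 0;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);     /* 0 = kernel picks an ephemeral port */

    if (bind(s, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(s, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);   /* ports travel in network byte order */

    close(s);
    return port;
}
```

The htons and ntohs conversions matter because the 16-bit port is stored in network (big-endian) byte order inside the sockaddr_in structure.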
Ports and packet demultiplexing
The IP network carries data packets addressed to a destination node (host named by IP address) and port.
The kernel network stack demultiplexes incoming network traffic: it chooses the process/socket to receive each packet based on the destination port.
[Figure: incoming network packets arrive at the network adapter hardware, also known as the network interface controller (“NIC”), and are delivered to apps with open sockets.]
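Demultiplexing by destination port can be modeled with a simple lookup table. This is a toy model only: the structure and function names are hypothetical, and a real kernel uses more elaborate data structures and matches on more than the destination port for connected sockets.

```c
#include <stddef.h>
#include <stdint.h>

#define NPORTS 65536              /* ports are 16-bit integers */

/* Toy port table: owner[p] is the id of the socket bound to port p,
 * or -1 if the port is unbound. */
struct port_table {
    int owner[NPORTS];
};

void port_table_init(struct port_table *t)
{
    for (size_t i = 0; i < NPORTS; i++)
        t->owner[i] = -1;
}

/* Bind sock_id to port. Fails with -1 if the port is already taken,
 * mirroring how bind may fail when another process holds the port. */
int port_bind(struct port_table *t, uint16_t port, int sock_id)
{
    if (t->owner[port] != -1)
        return -1;
    t->owner[port] = sock_id;
    return 0;
}

/* Choose the receiving socket for a packet by its destination port.
 * Returns the socket id, or -1 if no socket is bound to that port. */
int port_demux(const struct port_table *t, uint16_t dst_port)
{
    return t->owner[dst_port];
}
```

For example, after binding socket 3 to port 80, a packet with destination port 80 is delivered to socket 3, and a packet for an unbound port has no receiver.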
Wakeup from interrupt handler
Example 1: NIC interrupt wakes thread to receive incoming packets.
Example 2: disk interrupt wakes thread when disk I/O completes.
Example 3: clock interrupt wakes thread after N ms have elapsed.
Note: it isn’t actually the interrupt itself that wakes the thread, but the interrupt handler (software). The awakened thread must have registered for the wakeup before sleeping (e.g., by placing its TCB on some sleep queue for the event).
[Figure: thread state diagram with trap or fault, sleep, sleep queue, interrupt, wakeup, ready queue, switch, and return to user mode.]
The network stack, simplified
Note: the “protocol stack” should not be confused with a thread stack. It’s a layering of software modules that implement network protocols: standard formats and rules for communicating with peers over a network.
[Figure: an Internet client host and an Internet server host, each layered as user code (client or server), sockets interface (system calls), kernel code (TCP/IP), hardware interface (interrupts), and hardware and firmware (network adapter), joined by the global IP Internet.]
The Internet concept wasn’t obviously compelling, at least not to everyone. It had to be marketed, even within the tech community.
In 1986, the US National Science Foundation (NSF) opened the door to a commercial Internet (then NSFNET). IP support in sockets (Berkeley Unix) was widely used among academics.
The driving force for adopting TCP/IP was a collection of Unix-oriented startups and upstarts arrayed against a few large companies with their own proprietary network standards.
[Insert “Power of TCP/IP” slide, /usr/net/87. (The poster in my office)]
Stream sockets with Transmission Control Protocol (TCP)
Integrity: packets are covered by a checksum to detect errors.
Reliability: receiver acks received data, sender retransmits if needed.
Ordering: packets/bytes have sequence numbers, and receiver reassembles.
Flow control: receiver tells sender how much / how fast to send (window).
Congestion control: sender “guesses” current network capacity on path.
[Figure: user (application) transmit and receive buffers; optional TCP send and receive buffers; a TCP implementation with transmit and receive queues and a TCB; data flows from sender to receiver and acks flow back; checksums cover outbound and inbound packets on the network path.]
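The integrity check can be made concrete with the 16-bit ones'-complement Internet checksum described in RFC 1071, the algorithm TCP uses. The sketch below is a simplified illustration with a hypothetical function name: it checksums a plain byte buffer and omits the pseudo-header that real TCP also covers.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071 style): sum the data as big-endian 16-bit
 * words using ones'-complement addition, then invert the result.
 * Intended for packet-sized buffers; the 32-bit accumulator could
 * overflow on inputs much larger than 128 KB. */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                          /* odd length: pad with a zero byte */
        sum += (uint32_t)data[i] << 8;

    while (sum >> 16)                     /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the eight bytes 00 01 f2 03 f4 f5 f6 f7, the example used in RFC 1071, the function returns 0x220d. A receiver can run the same function over the data with the checksum appended: a result of 0 means no error was detected.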
Illustration only
Who governs the Internet? IANA : a department of ICANN. ICANN : a US nonprofit organization that is responsible for the coordination of … unique identifiers related to the namespaces of the Internet, and ensuring the network's stable and secure operation.