C.12 Protocols, Ports, and Sockets
Once data is routed through the network and delivered to a specific host, it must be delivered to the correct user or process. As the data moves up or down the layers of TCP/IP, a mechanism is needed to deliver data to the correct protocols in each layer. The system must be able to combine data from many applications into a few transport protocols, and from the transport protocols into IP. Combining many sources of data into a single data stream is called multiplexing. Data arriving from the network must be demultiplexed, that is, divided for delivery to multiple processes. To accomplish this, IP uses protocol numbers to identify transport protocols, and the transport protocols use port numbers to identify applications.
Some protocol and port numbers are reserved to identify well-known services. Well-known services are standard network protocols, such as FTP and Telnet, that are commonly used throughout the network. The protocol numbers and port numbers allocated to well-known services are documented in the Assigned Numbers RFC. UNIX systems define protocol and port numbers in two simple text files, /etc/protocols and /etc/services.
C.12.1 Protocol Numbers
The protocol number is a single byte in the third word of the datagram header. The value identifies the protocol in the layer above IP to which the data should be passed.
On a UNIX system, the protocol numbers are defined in the /etc/protocols file. This file is a simple table containing the protocol name and the protocol number associated with that name. The table has one entry per line. Each entry consists of the official protocol name, the protocol number, and an optional alias for the protocol name, separated from one another by white space. Comments in the table begin with #. An /etc/protocols file is shown below.
% cat /etc/protocols
#
# @(#)protocols 1.8 88/02/07 SMI
#
# Internet (IP) protocols
#
ip      0     IP      # internet protocol, pseudo protocol number
icmp    1     ICMP    # internet control message protocol
igmp    2     IGMP    # internet group multicast protocol
ggp     3     GGP     # gateway-gateway protocol
tcp     6     TCP     # transmission control protocol
pup     12    PUP     # PARC universal packet protocol
udp     17    UDP     # user datagram protocol
The listing shown above is the contents of the /etc/protocols file from an actual workstation. This list of numbers is by no means complete. If you refer to the Protocol Numbers section of the Assigned Numbers RFC (which itself gets a new RFC number every time it is updated; that's why we don't give you the RFC number for it here), you'll see many more protocol numbers. However, a system only needs to include the numbers of the protocols it actually uses. Even the list shown above is more than this specific workstation needed, but the additional entries do no harm.
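The table format described above is simple enough to read with a few lines of code. The following is a minimal Python sketch, not how any particular system reads the file; the function name parse_protocols and the embedded sample text are assumptions made for illustration, with entries copied from the listing.

```python
def parse_protocols(text):
    """Parse /etc/protocols-style text into {name: (number, aliases)}."""
    table = {}
    for line in text.splitlines():
        # Everything after '#' is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank or comment-only line
        name, number, aliases = fields[0], int(fields[1]), fields[2:]
        table[name] = (number, aliases)
    return table


# A few entries copied from the listing above.
SAMPLE = """\
#
# Internet (IP) protocols
#
ip      0     IP      # internet protocol, pseudo protocol number
icmp    1     ICMP    # internet control message protocol
tcp     6     TCP     # transmission control protocol
udp     17    UDP     # user datagram protocol
"""

protocols = parse_protocols(SAMPLE)
print(protocols["tcp"])
```

On a real system the same function could be given the contents of /etc/protocols instead of the sample string.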
What exactly does this table mean? When a datagram arrives and its destination address matches the local IP address, the IP layer knows the datagram has to be delivered to one of the transport protocols above it. To decide which protocol should receive the datagram, IP looks at the datagram's protocol number. Using this table you can see that, if the datagram's protocol number is 6, IP delivers the datagram to TCP. If the protocol number is 17, IP delivers the datagram to UDP. TCP and UDP are the two transport layer services we are concerned with, but all of the protocols listed in the table use IP datagram delivery service directly. Some, such as ICMP and GGP, have already been mentioned. You don't need to be concerned with these minor protocols, but IGMP is an extension to IP for multicasting explained in RFC 988, and PUP is a packet protocol similar to UDP.
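The lookup that IP performs can be sketched in Python. This is an illustration under stated assumptions, not an actual IP implementation: the names HANDLERS, protocol_number, and deliver are hypothetical, the header is built by hand with no options and a zero checksum, and the addresses 10.0.0.1 and 10.0.0.2 are placeholders.

```python
import struct

# Protocol numbers taken from the /etc/protocols listing.
HANDLERS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def protocol_number(datagram):
    """Return the protocol byte from an IPv4 header.

    The third 32-bit word starts at byte offset 8 and holds the
    time-to-live (1 byte), the protocol (1 byte), and the header
    checksum (2 bytes).
    """
    ttl, proto, checksum = struct.unpack_from("!BBH", datagram, 8)
    return proto


def deliver(datagram):
    """Name the protocol that should receive this datagram's data."""
    return HANDLERS.get(protocol_number(datagram), "unknown")


# A hand-built 20-byte header: version/IHL, type of service, total
# length, identification, flags/fragment offset, TTL, protocol,
# checksum, source address, destination address.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
print(deliver(header))
```

Changing the protocol byte from 6 to 17 in the packed header would cause the same function to select UDP instead.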
C.12.2 Port Numbers
After IP passes incoming data to the transport protocol (TCP or UDP), the transport protocol passes the data to the correct application process. Application processes (also called network services) are identified by port numbers, which are 16-bit values. The source port number, which identifies the process that sent the data, and the destination port number, which identifies the process that is to receive the data, are contained in the first header word of each TCP segment and UDP packet.
On UNIX systems, port numbers are defined in the /etc/services file. There are many more network applications than there are transport layer protocols, as the size of the table shows. Port numbers below 256 are reserved for well-known services (like FTP and Telnet) and are defined in the Assigned Numbers RFC. Ports numbered from 256 to 1023 are used for UNIX-specific services, which are services, like rlogin, that were originally developed for UNIX systems. However, most of them are no longer UNIX-specific, and current port assignments treat the whole range from 0 to 1023 as well-known ports.
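Because both port numbers sit in the first 32-bit word of the transport header, the same two-field read works for TCP and UDP. A minimal Python sketch follows; the function name ports and the sample bytes are assumptions for illustration.

```python
import struct


def ports(transport_data):
    """Return (source_port, destination_port) from a TCP or UDP header.

    Both are 16-bit values in network byte order, and together they
    occupy the first header word.
    """
    return struct.unpack_from("!HH", transport_data, 0)


# Source port 3044, destination port 23, followed by placeholder bytes
# standing in for the rest of the header and the data.
segment = struct.pack("!HH", 3044, 23) + b"rest of header and data"
src, dst = ports(segment)
print(src, dst)
```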
Port numbers are not unique between transport layer protocols; the numbers are only unique within a specific transport protocol. In other words, TCP and UDP can, and do, both assign the same port numbers. It is the combination of protocol and port numbers that uniquely identifies the specific process the data should be delivered to.
A partial /etc/services file is shown below. The format of this file is very similar to the /etc/protocols file. Each single-line entry starts with the official name of the service, separated by white space from the port number/protocol pairing associated with that service. The port numbers are paired with transport protocol names, because different transport protocols may use the same port number. An optional list of aliases for the official service name may be provided after the port number/protocol pair.
peanut% cat /etc/services
#
# @(#)services 1.12 88/02/07 SMI
#
# Network services, Internet style
#
echo        7/udp
echo        7/tcp
ftp-data    20/tcp
ftp         21/tcp
telnet      23/tcp
smtp        25/tcp      mail
time        37/tcp      timserver
time        37/udp      timserver
domain      53/udp
domain      53/tcp
#
# Host specific functions
#
finger      79/tcp
nntp        119/tcp     usenet     # Network News Transfer
ntp         123/tcp                # Network Time Protocol
#
# UNIX specific services
#
exec        512/tcp
login       513/tcp
shell       514/tcp     cmd        # no passwords used
biff        512/udp     comsat
who         513/udp     whod
syslog      514/udp
talk        517/udp
route       520/udp     router routed
This table, combined with the /etc/protocols table, provides all of the information necessary to deliver data to the correct application. A datagram arrives at its destination based on the destination address in the fifth word of the datagram header. IP uses the protocol number in the third word of the datagram header to deliver the data from the datagram to the proper transport layer protocol. The first word of the data delivered to the transport protocol contains the destination port number that tells the transport protocol to pass the data up to a specific application. Figure 13.21 shows this delivery process.
Figure 13.21: Protocol and port numbers
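The last step of this delivery process, from a port number and transport protocol to a service, can be sketched in Python. The function name parse_services and the sample text are assumptions for illustration; the entries are copied from the /etc/services listing. Keying the table on the port/protocol pair shows why the protocol name must accompany the port number.

```python
def parse_services(text):
    """Parse /etc/services-style text into {(port, protocol): name}."""
    table = {}
    for line in text.splitlines():
        # Everything after '#' is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank or comment-only line
        name, pair = fields[0], fields[1]
        port, protocol = pair.split("/")
        table[(int(port), protocol)] = name
    return table


# A few entries copied from the listing.
SAMPLE = """\
telnet      23/tcp
smtp        25/tcp      mail
login       513/tcp
who         513/udp     whod
"""

services = parse_services(SAMPLE)
# Port 513 names a different service under each transport protocol.
print(services[(513, "tcp")], services[(513, "udp")])
```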
Well-known ports are standardized port numbers that enable remote computers to know which port to connect to for a particular network service. This simplifies the connection process because both the sender and receiver know in advance that data bound for a specific process will use a specific port. For example, all systems that offer Telnet offer it on port 23.
There is a second type of port number called a dynamically allocated port. As the name implies, dynamically allocated ports are not preassigned. They are assigned to processes when needed. The system ensures that it does not assign the same port number to two processes, and that the numbers assigned are above the range of standard port numbers.
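The two guarantees just described, no duplicates and nothing in the standard range, can be illustrated with a toy allocator in Python. This is only a sketch: the class PortAllocator is hypothetical, the lower bound of 1024 is an assumption, and real systems choose ports by their own policies rather than always taking the lowest free number.

```python
class PortAllocator:
    """Toy allocator: hands out unused ports above the standard range."""

    def __init__(self, first=1024, last=65535):
        # 'first' is an assumed lower bound, not a value from the text.
        self.first, self.last = first, last
        self.in_use = set()

    def allocate(self):
        """Return the lowest port not currently assigned."""
        for port in range(self.first, self.last + 1):
            if port not in self.in_use:
                self.in_use.add(port)
                return port
        raise RuntimeError("no free ports")

    def release(self, port):
        """Make a port available again."""
        self.in_use.discard(port)


alloc = PortAllocator()
a, b = alloc.allocate(), alloc.allocate()
print(a, b)
```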
Dynamically assigned ports provide the flexibility needed to support multiple users. If a Telnet user is assigned port number 23 for both the source and destination ports, what port numbers are assigned to the second concurrent Telnet user? To uniquely identify every connection, the source port is assigned a dynamically allocated port number, and the well-known port number is used for the destination port.
In the Telnet example, the first user is given a random source port number and a destination port number of 23 (Telnet). The second user is given a different random source port number and the same destination port. It is the pair of port numbers, source and destination, that uniquely identifies each network connection. The destination host knows the source port, because it is provided in both the TCP segment header and the UDP packet header. Both hosts know the destination port because it is a well-known port.
Figure 13.22 shows the exchange of port numbers during the TCP handshake. The source host randomly generates a source port, in this example 3044. It sends out a segment with a source port of 3044 and a destination port of 23. The destination host receives the segment and responds using 23 as its source port and 3044 as its destination port.
Figure 13.22: Passing port numbers
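The same exchange can be observed with a real connection over the loopback interface. The Python sketch below makes some assumptions: it uses 127.0.0.1 for both ends, and it lets the system choose the server's port (by binding to port 0) because listening on well-known port 23 usually requires privileges. The port numbers printed will differ from run to run.

```python
import socket

# Listen on the loopback interface. Port 0 asks the system for any
# free port; a real Telnet server would listen on well-known port 23.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# The client does not choose a source port; the system assigns one
# when the connection is made.
client = socket.create_connection(("127.0.0.1", server_port))
conn, peer = server.accept()

client_src = client.getsockname()[1]   # dynamically allocated port
client_dst = client.getpeername()[1]   # the server's listening port
reply_src = conn.getsockname()[1]      # the server replies from its own port
reply_dst = peer[1]                    # to the client's source port

print("client -> server:", client_src, "->", client_dst)
print("server -> client:", reply_src, "->", reply_dst)

conn.close()
client.close()
server.close()
```

The two printed lines show the same pair of port numbers with source and destination reversed, which is the pattern the figure describes.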
The combination of an IP address and a port number is called a socket. A socket uniquely identifies a single network process within the entire Internet. Sometimes the terms "socket" and "port number" are used interchangeably. In fact, well-known services are frequently referred to as "well-known sockets." In the context of this discussion, a "socket" is the combination of an IP address and a port number. A pair of sockets, one socket for the receiving host and one for the sending host, defines the connection for connection-oriented protocols such as TCP.
Let's build on the example of dynamically assigned ports and well-known ports. Assume a user on host 18.104.22.168 uses Telnet to connect to host 22.214.171.124. Host 18.104.22.168 is the source host. The user is dynamically assigned a unique port number, 3382. The connection is made to the Telnet service on the remote host which is, according to the standard, assigned well-known port 23. The socket for the source side of the connection is 18.104.22.168.3382 (IP address 18.104.22.168 plus port number 3382). For the destination side of the connection, the socket is 22.214.171.124.23 (address 22.214.171.124 plus port 23). The port of the destination socket is known by both systems because it is a well-known port. The port of the source socket is known because the source host informed the destination host of the source socket when the connection request was made. The socket pair is therefore known by both the source and destination computers. The combination of the two sockets uniquely identifies this connection; no other connection in the Internet has this socket pair.
Figure 13.23 shows how clients on multiple machines can all connect to the same port on a single server. The server can tell the difference between the connections because they each involve different remote IP addresses. Even if the connections are all coming from a single remote machine, as shown in Figure 13.24, the server can still tell them apart because each connection uses a different port number on the remote machine.
Figure 13.23: Clients on multiple hosts connecting to the same port on a server
Figure 13.24: Multiple clients on a single host connecting to the same port on a server
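Both cases come down to keying each connection on its socket pair. The following Python sketch models that idea with a dictionary; it is not how any particular TCP implementation stores its connections. The names Socket, register, and demultiplex are hypothetical, and the addresses are placeholders from the ranges reserved for documentation.

```python
from collections import namedtuple

# A socket is an IP address plus a port number.
Socket = namedtuple("Socket", ["address", "port"])

# Connection table keyed on the (local socket, remote socket) pair.
connections = {}


def register(local, remote, label):
    """Record a connection under its socket pair."""
    connections[(local, remote)] = label


def demultiplex(local, remote):
    """Find the connection an arriving segment belongs to."""
    return connections.get((local, remote))


server = Socket("192.0.2.10", 23)  # placeholder server address

# Clients on different hosts may happen to use the same source port.
register(server, Socket("198.51.100.1", 3044), "client A")
register(server, Socket("198.51.100.2", 3044), "client B")
# Two clients on one host must use different source ports.
register(server, Socket("198.51.100.2", 3045), "client C")

print(demultiplex(server, Socket("198.51.100.2", 3045)))
```

All three connections share the server socket, yet each lookup resolves to exactly one entry because the remote socket differs in address, port, or both.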