TCP
1. Introduction
At the Transport Layer (equivalent to Layer 4 in the OSI model), two protocols exist:
- TCP (Transmission Control Protocol) - breaks information into segments and sends them, carrying
out resends if required, and reassembles received segments in order. It gives 'reliable' delivery, a connection-oriented
service between applications.
- UDP (User Datagram Protocol) - also carries data between applications but it does not carry out any acknowledgement
or resending of datagrams, so it is described as 'unreliable', a connectionless service
(See UDP).
IP datagrams are 'connectionless', whereas the TCP segments carried inside them are 'connection-oriented'.
2. TCP Header
TCP allows clients to run concurrent applications using different port numbers,
each at full-duplex, thereby giving a multiplexing ability. TCP labels each octet of data with a Sequence
Number and a series of octets forms a Segment. The sequence number of the first octet in the
segment is called the Segment Sequence Number.
TCP provides reliability with ACK packets and Flow Control using the technique of
a Sliding Window. During the setup of a TCP connection each end announces the maximum segment size
that it can accept, which is normally derived from its MTU.
The TCP header looks like this:
It is worth noting the following fields:
- Source and Destination ports - this identifies the upper layer applications
using the connection.
- Sequence Number - this 32-bit number ensures that data is correctly sequenced. Each byte of data
is assigned a sequence number. The sequence number of the first byte of data in a particular segment
is placed in this field, say 58000. If this packet has 700 bytes of
data in it (numbered 58000 to 58699) then the next packet sent by this station will have the sequence number 58000 + 700
= 58700.
- Acknowledgment Number - this 32-bit number indicates the next sequence number that the sending
device is expecting from the other station.
- HLEN - gives the number of 32 bit words in the header. Sometimes called
the Data Offset field.
- Reserved - always set to 0.
- Code bits - these are flags that indicate the nature of the header. They
are:
- URG - Urgent Pointer
- ACK - Acknowledgement
- PSH - Push function, causes the TCP sender to push all unsent data to the receiver
rather than sending segments when it gets around to them, i.e. when the buffer is full.
- RST - Reset the connection
- SYN - Synchronise sequence numbers
- FIN - End of data
- Window - indicates the range of acceptable sequence numbers beyond the last segment
that was successfully received. It is the allowed number of octets that
the sender of the ACK is willing to accept before an acknowledgement.
- Urgent Pointer - shows the end of the urgent data so that interrupted
data streams can continue. When the URG bit is set, the data is given priority over
other data streams.
- Option - mainly only the TCP Maximum Segment Size (MSS), sometimes called
the Send Maximum Segment Size (SMSS).
A segment is a TCP header followed by a series of data bytes.
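As an illustration of the fields listed above, the following is a minimal sketch that decodes the fixed 20-byte part of a TCP header. The function name `parse_tcp_header` and the sample values are hypothetical, the Checksum field is read although it is not discussed above, and any options are ignored.

```python
import struct

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    flag_names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 upwards
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "hlen_words": offset_flags >> 12,  # HLEN / Data Offset, in 32-bit words
        "flags": [n for i, n in enumerate(flag_names) if offset_flags & (1 << i)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }

# Hypothetical SYN segment: port 1025 to port 23, HLEN of 5 words, SYN bit set.
sample = struct.pack("!HHIIHHHH", 1025, 23, 58000, 0, (5 << 12) | 0x02, 8192, 0, 0)
header = parse_tcp_header(sample)
print(header["flags"], header["hlen_words"] * 4, "byte header")
```

Multiplying HLEN by 4 gives the header length in bytes, which is where the data begins.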
3. Port Numbers
Applications open port numbers, which TCP and UDP use to keep track of the different communications occurring
around the network. Generally, port numbers below 255 were originally for public applications
(Assigned Internet protocol numbers); 255 is reserved.
Port numbers 256 to 1023 are for saleable applications by various manufacturers and are considered as 'Privileged', 'Well-Known' or
Extended Assigned port numbers.
Port numbers from 1024 upwards (1024 itself is reserved)
are not regulated, are considered as Unprivileged, or Registered,
and these ports are commonly free to be used by clients talking to Well-Known port numbers.
Applications open port numbers (the TCP/IP model differs from the OSI model in that the Application
layer sits straight on top of layer 4) and communicate to each other via these port numbers.
A telnet server with IP address 10.1.1.1 uses port number 23. However, if two clients operating from IP address
10.1.1.2 attach themselves to the server then the server needs to distinguish between the two conversations.
This is achieved by the clients randomly picking two port numbers above 1024, say 1025 and 1026.
Each client connection is referenced as a Socket, which is defined as the IP address plus the
port number, e.g. 10.1.1.2.TCP.1025 and 10.1.1.2.TCP.1026. The server socket is 10.1.1.1.TCP.23.
This is how TCP multiplexes different connections.
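The multiplexing described above can be sketched as a lookup table keyed by the pair of sockets. This is only an illustration: the `connections` table and the `deliver` function are hypothetical names, and a real stack keeps far more state per connection.

```python
# Connection table keyed by (client IP, client port, server IP, server port).
connections = {}

def deliver(src_ip, src_port, dst_ip, dst_port, data):
    """Append data to the conversation identified by the two sockets."""
    key = (src_ip, src_port, dst_ip, dst_port)
    connections.setdefault(key, []).append(data)
    return key

# Two telnet clients on 10.1.1.2 talking to the same server socket 10.1.1.1:23.
deliver("10.1.1.2", 1025, "10.1.1.1", 23, "ls")
deliver("10.1.1.2", 1026, "10.1.1.1", 23, "pwd")
deliver("10.1.1.2", 1025, "10.1.1.1", 23, "exit")

for key, received in connections.items():
    print(key, received)
```

Although both clients share an IP address and talk to the same server port, the differing client port keeps the two conversations apart.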
The following table lists some commonly used port numbers:
| Protocol | Application | Port Number |
| TCP | FTP | 20 (Data), 21 (Control, or Program) |
| TCP | Telnet | 23 |
| TCP | SMTP | 25 |
| TCP | HTTP | 80 |
| UDP | DNS | 53 |
| UDP | Bootp | 67/68 |
| UDP | TFTP | 69 |
| UDP | NTP | 123 |
| UDP | SNMP | 161 |
RFC 793 (TCP) and RFC 1323 (TCP Extensions) describe TCP in detail, whilst both RFC 1500 and RFC 1700
define the Well-Known port numbers for both TCP and UDP.
A good list can also be found at
https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml.
4. Sequence Numbers
Each octet has its own sequence number so that each one can be acknowledged if necessary. In practice octets
are acknowledged in batches, the size of which is determined by the window size (see below). The sequence number is
a 32-bit binary number. Although this is very large, the range is finite (0 to 2^32 - 1), after which it cycles back to
zero. In order to track the sequence numbers required for checking, the arithmetic has to be performed modulo
2^32.
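A minimal sketch of this wrap-around arithmetic follows. The helper names are hypothetical, and the 'earlier than' test, which treats anything less than half the number space ahead as later, is a common convention assumed here rather than something stated above.

```python
MOD = 2 ** 32  # sequence numbers run from 0 to 2^32 - 1

def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n octets, wrapping back to zero."""
    return (seq + n) % MOD

def seq_before(a: int, b: int) -> bool:
    """True if a comes before b, allowing for wrap-around (assumed convention)."""
    return a != b and ((b - a) % MOD) < MOD // 2

near_end = 2 ** 32 - 100
wrapped = seq_add(near_end, 700)
print(wrapped, seq_before(near_end, wrapped))
```

Here 700 octets sent from a point 100 short of the top of the range end at sequence number 600, which still counts as later than the starting point.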
5. TCP Operation
5.1 Three-way Handshake
If a source host wishes to use an IP application such as active FTP for instance, it selects a port number which
is greater than 1023 and connects to the destination station on port 21. The TCP connection
is set up via three-way handshaking:
- This begins with a SYN (Synchronise) segment (as indicated by the code bit) containing a
32-bit Sequence number A, called the
Initial Send Sequence (ISS), being chosen by, and
sent from, host 1. The SYN itself uses up sequence number A, so the first octet of data that
host 1 sends afterwards is numbered A+1, and the number then increments by 1 for every byte of data sent,
i.e. there is a sequence number for each octet sent.
- Host 2 receives the SYN with the Sequence number A and sends a SYN
segment with its own totally independent ISS number B in the Sequence number field.
In addition, it places A+1 in its Acknowledgment field. In general the Acknowledgment number is the
Sequence number of the last received segment plus x, where x is the number of octets that make up
the data in that segment; a SYN carries no data but counts as one.
This Acknowledgment number informs the recipient that its SYN was
received at the other end and that the next segment of data bytes to be sent is expected
to start at sequence number A+1. This stage is often called the
SYN-ACK. The MSS values are exchanged in the SYN and SYN-ACK segments.
- Host 1 receives this SYN-ACK segment and sends an ACK segment containing the
Acknowledgment number B+1. This is called
Forward Acknowledgement and is received by Host 2. The ACK segment is identified by the
fact that the ACK bit is set.
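The numbers exchanged during the handshake can be sketched as follows, on the basis that a SYN uses up one sequence number so that each side acknowledges the other's ISS plus 1, as in the trace in section 7. The function name is hypothetical and the ISS values are simply those from that trace.

```python
def three_way_handshake(iss_a: int, iss_b: int):
    """Return (type, sequence number, acknowledgment number) for the three segments."""
    mod = 2 ** 32
    syn = ("SYN", iss_a, 0)                                # host 1 sends its ISS
    syn_ack = ("SYN-ACK", iss_b, (iss_a + 1) % mod)        # host 2 sends its ISS, acknowledges A+1
    ack = ("ACK", (iss_a + 1) % mod, (iss_b + 1) % mod)    # host 1 acknowledges B+1
    return [syn, syn_ack, ack]

for segment in three_way_handshake(17768656, 82980009):
    print(segment)
```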
Segments that are not acknowledged within a certain time span, are retransmitted.
TCP peers must not only keep track of their own initiated Sequence numbers but also those Acknowledgment numbers of their peers.
Closing a TCP connection is achieved by the initiator sending a FIN packet. The connection only closes
when an ACK has been sent by the other end and received by the initiator.
Maintaining a TCP connection requires the stations to remember a number of different parameters
such as port numbers and sequence numbers. Each connection has this set of variables located in
a Transmission Control Block (TCB).
5.2 Piggybacking ACKs
When the receiver receives a data segment, it checks the sequence number and if it matches
the next sequence number that the receiver expected, then the data has been received in order.
Often the Acknowledgement can be piggy-backed on to normal traffic travelling in the opposite direction rather than
being sent in a segment of its own every time. In fact TCP may be set up to wait 200ms just to see if any data is required
to be sent, just so that it can piggyback ACKs. If the receiver does not
receive a data segment in order, e.g. a packet was dropped, then the receiver sends an ACK
that repeats the sequence number it is still expecting, which prompts the sender to retransmit the missing segment.
5.3 Transmission Timeout
Because every network path has its own characteristics, the delay between sending a segment and receiving
an acknowledgement varies. Different methods are available for calculating this Transmission Timeout,
depending on the stack. TCP maintains a retransmission timer for each connection. This retransmission timer is used
when TCP expects to receive an acknowledgment from the other end. Once data is sent, TCP monitors this Retransmission Time-Out
(RTO) and also a Round Trip Time (RTT). If an ACK is not received by the time the RTO expires,
TCP retransmits the data using an exponentially increasing value for the RTO. This doubling is called an Exponential Back-Off.
The RTO is calculated as a linear function of the RTT and its value changes
over time with changes in routing and traffic load. Typically it is the smoothed RTT + 4 * the mean deviation of the RTT.
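The calculation above can be sketched as a small estimator. This is an illustration, not the method of any particular stack: the class name is hypothetical, the smoothing gains of 1/8 and 1/4 and the first-sample initialisation are assumed from RFC 6298, and the minimum RTO and clock granularity that real stacks apply are left out.

```python
class RtoEstimator:
    """RTO as smoothed RTT plus four times the mean deviation, with back-off."""
    ALPHA = 1 / 8   # gain for the smoothed RTT (assumed, from RFC 6298)
    BETA = 1 / 4    # gain for the mean deviation (assumed, from RFC 6298)

    def __init__(self):
        self.srtt = None      # smoothed round trip time, in seconds
        self.rttvar = None    # mean deviation of the round trip time
        self.backoff = 1      # multiplier applied after timeouts

    def sample(self, rtt: float) -> None:
        """Feed in one measured RTT taken from an acknowledged segment."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.backoff = 1      # a fresh measurement cancels any back-off

    def rto(self) -> float:
        return (self.srtt + 4 * self.rttvar) * self.backoff

    def timeout(self) -> None:
        """No ACK arrived in time: double the RTO (Exponential Back-Off)."""
        self.backoff *= 2

est = RtoEstimator()
est.sample(0.100)               # first measurement of 100ms
print(round(est.rto(), 3))
est.timeout()                   # the retransmission timer expired
print(round(est.rto(), 3))
```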
6. Sliding Window
6.1 Buffers
Buffers are used at each end of the TCP connection to speed up data flow when the network is busy.
Flow Control is managed using the concept of a Sliding Window. A Window is the maximum number of
unacknowledged bytes that are allowed in any one transmission sequence,
or to put it another way, it is the range of sequence numbers across the whole chunk of data
that the receiver (the sender of the window size) is prepared to accept in its buffer.
The receiver specifies the current Receive Window size in every packet sent to the sender.
The sender can send up to this amount of data before it has to wait for an update on the Receive
Window size from the receiver.
The sender has to buffer all its own sent data until it receives ACKs for that data.
The Send Window size is determined by whatever is the smallest between the Receive Window and
the sender's buffer. When TCP transmits a segment, it places a copy of the data in a retransmission
queue and starts a timer. If an acknowledgment is not received for that segment (or a part of that
segment) before the timer runs out, then the segment (or the part of the segment that was not
acknowledged) is retransmitted.
6.2 Sliding Window Operation
- The current sequence number of the TCP sender is y.
- The TCP receiver advertises its current window size in every packet.
This is often specified by the operating system or the application.
- The TCP sender begins by sending x bytes of data, where x is no larger than the receiver's window
(if no MSS has been exchanged, the segment size defaults to 536 bytes), and waits for an ACK
from the receiver. The receiver's window can be many thousands of bytes!
- The receiver sends an ACK with the value y + x i.e. acknowledging that the last
x bytes have been received OK
and the receiver is expecting another transmission of bytes starting at byte y + x.
- After a successful receipt, the amount that the sender will transmit before waiting increases
by an additional x. This is called Slow Start for new connections.
- The sender sends 2x bytes, then 3x bytes and so on, up to the
window size advertised by the receiver.
- If the receiver has a full buffer, then the window size is reduced to zero. In this state, the
window is said to be
Frozen and the sender cannot send any more bytes until it receives a segment from the receiver
with a window size greater than zero.
- If the data fails to be received, as determined by the timer which is set as soon as data is sent
and runs until receipt
of an ACK, then the amount sent is cut by half e.g. from 4x to 2x. Failure could be
due to congestion e.g. a full buffer on the receiver, or faults on the media.
- On the next successful transmission, the slow ramp up starts again.
RFC 813 describes strategies for TCP windows.
6.3 Window Size
The window size could be used up in one go if a segment was large enough, however
normally the window is used up by several segments of hundreds of bytes each.
A Window size of one means that each byte of data is required to be acknowledged before the next
one is sent. This is inefficient and therefore the window size is often much larger and is
normally a Sliding Window (as described earlier) which is dynamically negotiated during a TCP session depending on the
number of errors that occur in a connection. The 'sliding' element describes the octets that are allowed
to be transmitted from a stream of octets that form a chunk of data. As the transmission of this
chunk of data progresses, the window slides along the octets as octets are transmitted and acknowledged
i.e. as data is acknowledged the window advances along the data octets.
When the sender receives an ACK, this determines where the trailing edge of the window sits.
The Receive Window size determines where the leading edge of the window sits.
As the window slides along, any unsent data can be sent immediately as this implies
that there is room in the receiver buffer.
If the window size is slowly decreasing then it shows that the application
is slow to take the data off the TCP stack. If the receiver indicates a window size of 0, then the
sender cannot send any more bytes until the receiver sends a packet with a window size greater than 0.
Take the scenario where the sender has a sequence of bytes to send, say numbered 1 to 20, to a
receiver who has a window size of ten. The sender then would place a window around the first ten bytes and transmit them
in one go. It would then wait for an acknowledgment. The receiver then sends an ACK of 11 meaning that it successfully
received the first 10 bytes, and is now expecting byte 11. At this point, the sender moves the sliding window (of size 10)
10 bytes along to cover bytes 11 to 20. The sender then transmits these 10 bytes in one go.
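The scenario above can be sketched as a short simulation. It assumes, as the scenario does, that every burst arrives intact and is acknowledged in one go; the function name is hypothetical and the bytes are numbered from 1 to match the example.

```python
def send_with_window(data: list, window: int):
    """Yield (bytes sent, ACK number) for each window-sized burst."""
    trailing_edge = 1                          # first unacknowledged byte
    while trailing_edge <= len(data):
        leading_edge = min(trailing_edge + window - 1, len(data))
        burst = data[trailing_edge - 1:leading_edge]
        ack = leading_edge + 1                 # next byte the receiver expects
        yield burst, ack
        trailing_edge = ack                    # the ACK slides the window along

rounds = list(send_with_window(list(range(1, 21)), window=10))
for burst, ack in rounds:
    print("sent bytes", burst[0], "to", burst[-1], "ACK", ack)
```

The first burst covers bytes 1 to 10 and is answered with an ACK of 11, after which the window moves on to bytes 11 to 20.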
Applications determine the initial window size and you can see this size for each
device at the initial synchronisation (the three-way handshake). Windows uses 8760 bytes for Ethernet
by default, although this can be changed in the registry. The number 8760 is 6 x 1460 which is the amount
of data a full Ethernet frame can carry and is the MSS for Ethernet by default, which is shared during the synchronisation.
When sizing a window, 6-8 times the packet size is considered the most
efficient. In the old days of the Internet (early 1980s) when protocols such as X.25 were prevalent, users were often advised
to assume a much smaller datagram size of 576 (from RFC 791).
Although this is no longer necessary, you may come across smaller MSS and window size settings as a result.
The fewer errors that occur on the network, the larger the window is allowed to get and the more bandwidth is used for data.
The only problem with a large window size is that if there is a transmission failure at any point, all the
unacknowledged data in the window may have to be resent, thereby taking up bandwidth anyway.
One thing to be aware of with TCP protocols is the slow ramping up of the window size.
For instance, if you are sending a 10MB file using FTP, it may take 1MB of transfer
before the transfer occurs at optimum speed. This is because the window size starts off
small so that much of the initial traffic is header rather than data. Downloading small
files using FTP does not reach the optimum data download speed; downloading large files
is more efficient. This mechanism is called Slow Start and is outlined in
RFC 2001.
The window size is the maximum number of bytes of data that can be transmitted without acknowledgement.
Another way of looking at this is that the window size decides the amount of data that can be sent within the RTT.
Here are some examples:
- An 8KB window size would take 32ms to be transmitted on a 2Mbps serial link ((8192 * 8)/2048000 = 0.032s).
Allowing the same time for the ACK to return, the RTT is 64ms. So for every 64ms, 8KB is transmitted because packets can only be sent for 32ms of that time
as we have to wait for ACKs,
i.e. we are not able to use the full capability of the bandwidth. Multiply this up and we find that
an 8KB window gives us a maximum data throughput of 8192 * 8 * 1000/64 = 1024000bps (1Mbps), irrespective
of the potential speed of the link.
- An 8KB window size would take 400ms on a satellite link one way.
The RTT is therefore 800ms. So for every 800ms, 8KB is transmitted because packets can only be sent for 400ms of that time
as we have to wait for ACKs,
i.e. again we are not able to use the full capability of the bandwidth. Multiply this up and we find that
an 8KB window gives us a maximum data throughput of 8192 * 8 * 1000/800 = 81920bps or about 82kbps, irrespective
of the potential speed of the link. This is because of the enormous delay.
- An 8KB window size would take about 7s to be transmitted on a 9600bps serial link ((8192 * 8)/9600 = 6.83s). Most
of the 8KB window will be buffered because of the serialisation delay as bits are sent much more slowly.
The TCP 16-bit window size field allows a maximum size of 65535 bytes for the window size so 64KB can
be sent every RTT. For a satellite link with 800ms RTT the maximum throughput with this maximum
sized window is given by 65535 * 8 * 1000/800 = 655350bps or about 655Kbps. An expensive 2Mbps
satellite link would not be fully utilised. One of the TCP Options allows you to scale the window
size up to a 30-bit field; this is the Window Scale Option described in
RFC 1323.
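The throughput figures above all come from the same formula, window size in bits divided by the RTT. A small sketch, with hypothetical function names, reproduces them and also shows the largest window that the Window Scale Option allows, assuming the maximum shift of 14 bits that gives the 30-bit field mentioned above.

```python
def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Most data that one window per round trip can carry, in bits per second."""
    return window_bytes * 8 * 1000 / rtt_ms

def scaled_window(window_field: int, shift: int) -> int:
    """Window size once the 16-bit field is shifted left by the scale factor."""
    return window_field << shift

print(max_throughput_bps(8192, 64))     # 8KB window, 2Mbps serial link
print(max_throughput_bps(8192, 800))    # 8KB window, satellite link
print(max_throughput_bps(65535, 800))   # largest unscaled window, satellite link
print(scaled_window(65535, 14))         # largest scaled window, in bytes
```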
It would be preferable to have a window size appropriate to the size of the link. There would
be less buffering, the ACKs would return more quickly and more of the bandwidth would be used.
Ideally you are looking for a Window Size >= Bandwidth * RTT. So a 128Kbps serial line
with a RTT of 40ms would require a Window size of at least 128000/8 * 0.04 = 640 bytes.
Similarly, a 2Mbps link with a 20ms RTT would need a window size of at least
2000000/8 * 0.02 = 5000 bytes, and a 128Kbps satellite link
with a RTT of 800ms would require a Window size of at least 128000/8 * 0.8 = 12800 bytes.
A technique such as this (although more complex)
is used by the Packeteer product that spoofs the TCP connections between client and server and modifies the window sizes
according to the characteristics of the links between them.
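The rule Window Size >= Bandwidth * RTT can be checked with a few lines of code. The function name is hypothetical, and the RTT is passed in milliseconds purely to keep the arithmetic exact.

```python
def min_window_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Smallest window, in bytes, that keeps a link of this speed and delay full."""
    return bandwidth_bps / 8 * rtt_ms / 1000

print(min_window_bytes(128000, 40))     # 128Kbps serial line, 40ms RTT
print(min_window_bytes(2000000, 20))    # 2Mbps link, 20ms RTT
print(min_window_bytes(128000, 800))    # 128Kbps satellite link, 800ms RTT
```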
7. TCP Segment Transfer Example
Consider the following TCP segment transfer. This has been laid out in a similar format to that which you would see
from a Network trace displayed in two-station format. We are just concentrating on the TCP sequence numbers and window
sizes:
| Type of segment | 160.221.172.250 | 160.221.73.26 |
| SYN | Seq.no. 17768656 (next seq.no. 17768657), Ack.no. 0, Window 8192, LEN = 0 bytes | |
| SYN-ACK | | Seq.no. 82980009 (next seq.no. 82980010), Ack.no. 17768657, Window 8760, LEN = 0 bytes |
| ACK | Seq.no. 17768657 (next seq.no. 17768657), Ack.no. 82980010, Window 8760, LEN = 0 bytes | |
| | Seq.no. 17768657 (next seq.no. 17768729), Ack.no. 82980010, Window 8760, LEN = 72 bytes of data | |
| | | Seq.no. 82980010 (next seq.no. 82980070), Ack.no. 17768729, Window 8688, LEN = 60 bytes of data |
| | Seq.no. 17768729 (next seq.no. 17768885), Ack.no. 82980070, Window 8700, LEN = 156 bytes of data | |
| | | Seq.no. 82980070 (next seq.no. 82980222), Ack.no. 17768885, Window 8532, LEN = 152 bytes of data |
| FIN | Seq.no. 17768885 (next seq.no. 17768886), Ack.no. 82980222, Window 8548, LEN = 0 bytes | |
| FIN-ACK | | Seq.no. 82980222 (next seq.no. 82980223), Ack.no. 17768886, Window 8532, LEN = 0 bytes |
| ACK | Seq.no. 17768886 (next seq.no. 17768886), Ack.no. 82980223, Window 8548, LEN = 0 bytes | |
The value of LEN is the length of the TCP data which is calculated by subtracting the IP and TCP header
sizes from the IP datagram size.
- The session begins with station 160.221.172.250 initiating a SYN containing the sequence number
17768656 which is the ISS. The SYN uses up this sequence number, so
the first octet of data will carry the next sequence number, 17768657. There are only zeros in the
Acknowledgement number field as this is
not used in the SYN segment. The window size of the sender starts off as 8192 octets, which it assumes to be acceptable to the receiver.
- The receiving station sends both its own ISS
(82980009) in the sequence number field and acknowledges the sender's sequence number by incrementing
it by 1 (17768657) expecting this to be the starting sequence number of the data bytes that will be sent next by the sender. This is called
the SYN-ACK segment. The receiver's window size starts off as 8760.
- Once the SYN-ACK has been received, the sender issues an
ACK that acknowledges the receiver's ISS by incrementing it by 1 and placing it
in the acknowledgement field (82980010). The sender also sends the same sequence number that it sent previously (17768657). This
segment is empty of data and we don't want the session just to keep ramping up the sequence numbers unnecessarily. The window size of 8760
is acknowledged by the sender.
- From now on ACKs are used until just before the end of the session.
The sender now starts sending data by stating the sequence number 17768657 again since this is the sequence number of the first byte of the data
that it is sending. Again the acknowledgement number 82980010 is sent which is the expected sequence number of the first byte of data
that the receiver will send. In the above scenario, the sender is initially sending 72 bytes of data in one segment. The network analyser may indicate
the next expected sequence number in the trace; in this case this will be 17768657 + 72 = 17768729. The sender has now agreed the window
size of 8760 and uses it itself.
- The receiver acknowledges the receipt of the data by sending back the number 17768729 in the acknowledgement number field thereby acknowledging
that the next byte of data to be sent will begin with sequence number 17768729 (implicit in this is the understanding that sequence numbers
up to and including 17768728 have been successfully received).
Notice that not every byte needs to be acknowledged. The receiver
also sends back the sequence number of the first byte of data in its own segment (82980010) that is to be sent.
The receiver is sending 60 bytes of data. The receiver subtracts 72 bytes from its previous window size of 8760 and sends 8688
as its new window size.
- The sender acknowledges the receipt of the data with the number 82980070 (82980010 + 60) in the acknowledgement number field, this being
the sequence number of the next data byte expected to be received from the receiver. The sender sends 156 bytes of data starting at sequence number
17768729. The sender subtracts 60 bytes from its previous window size of 8760 and sends the new size of 8700.
- The receiver acknowledges receipt of this data with the number 17768885 (17768729 + 156)
since it was expecting it, and sends 152 bytes of data beginning with the sequence number 82980070.
The receiver subtracts 156 bytes from the previous window size of 8688 and sends the new window size of 8532.
- The sender acknowledges this with the next expected sequence number 82980070 + 152 = 82980222 and sends the expected sequence number
17768885 in a FIN because at this point the application wants to close the session.
The sender subtracts 152 bytes from its previous window size of 8700 and sends the new size of 8548.
- The receiver sends a FIN-ACK acknowledging the FIN and increments the acknowledgement number by 1 to
17768886, which is the
sequence number it will expect on the final ACK. In addition the receiver sends its own sequence number 82980222, so that 82980223 is expected next.
The window size remains at 8532 as no data was received from the sender's FIN.
- The final ACK is sent by the sender confirming the sequence number 17768886 and acknowledges the receiver's FIN,
which counts as one sequence number, with the acknowledgement
number 82980223. The window size finishes at 8548 and the TCP connection is now closed.
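The walk-through above can be replayed in a few lines to confirm that the sequence and acknowledgement numbers in the trace are consistent. This is a sketch only: station "A" stands for 160.221.172.250 and "B" for 160.221.73.26, the data segments are labelled ACK, the window sizes are left out, and the function names are hypothetical.

```python
# (station, type, sequence number, acknowledgment number, data length) per segment.
trace = [
    ("A", "SYN",     17768656, 0,        0),
    ("B", "SYN-ACK", 82980009, 17768657, 0),
    ("A", "ACK",     17768657, 82980010, 0),
    ("A", "ACK",     17768657, 82980010, 72),
    ("B", "ACK",     82980010, 17768729, 60),
    ("A", "ACK",     17768729, 82980070, 156),
    ("B", "ACK",     82980070, 17768885, 152),
    ("A", "FIN",     17768885, 82980222, 0),
    ("B", "FIN-ACK", 82980222, 17768886, 0),
    ("A", "ACK",     17768886, 82980223, 0),
]

def next_seq(kind: str, seq: int, length: int) -> int:
    """SYN and FIN each use up one sequence number; data uses one per byte."""
    used = length + (1 if "SYN" in kind or "FIN" in kind else 0)
    return (seq + used) % 2 ** 32

def check_trace(segments) -> bool:
    """Check each segment against the next sequence number expected from each station."""
    expected = {}
    for station, kind, seq, ack, length in segments:
        peer = "B" if station == "A" else "A"
        if station in expected and seq != expected[station]:
            return False      # sequence number does not follow on
        if peer in expected and ack != expected[peer]:
            return False      # acknowledgement does not match the peer's data
        expected[station] = next_seq(kind, seq, length)
    return True

print(check_trace(trace))
```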
From the above you can see that if you have applications where data flow is largely unidirectional, you can have a scenario where there could
be a long series of ACKs where the sequence numbers are the same as far as the data receiver is concerned.
Also, you may have a frozen window whilst the application catches up which means that in the meantime
acknowledgements are sent by the receiver with a window size of 0 until buffer space is freed up and an acknowledgement
is sent with the window size ramped up again, thereby allowing the sender to send data again and the sequence numbers
start increasing again.
The above example is a clean straightforward bi-directional data transfer session, however you often have multiple TCP sessions to
sort through using different ports and sequence numbers, plus in any one session segments could be resent,
sent in a row or the window is frozen due to the stack buffer being full all of which can make it interesting tracking sequence numbers.
Be aware that the ACK only has to acknowledge the last sequence number received, so if four segments have
been sent in a row, only one ACK is required. If a segment does not arrive
then all the bytes of data within it are lost and, because acknowledgements are cumulative, any segments
sent in a row after the lost segment cannot be acknowledged until it has been resent.
You will notice in the above example that the window size steadily decreased. This indicates that no data had been processed off
the TCP stack by the time the session had finished. On a longer session you should see the window size creep up again
as the buffer is emptied by the application. In the example the window sizes could easily be followed because the
segments followed each other. More often, however, acknowledgements do not follow immediately and may be acknowledging
more than one segment, which makes the trace more tricky to follow.
The above description details the simplest case of TCP connections, however
you can get more complex scenarios where simultaneous connections are set up, or segments get lost or resent. The judicious use
of RST (Reset) helps clean these connections up. You can follow step by step these different scenarios in
RFC 793.
8. TCP Header Compression
TCP header compression reduces the combined TCP/IP header from 40 bytes to as few as 5 bytes.
This compression was devised by Van Jacobson and is described in
RFC 1144.
Only a few bytes in the header change from one packet to another so the VJ algorithm
only transfers the bytes that have changed.
This should be used for protocols
such as HTTP where large numbers of small packets are used
(e.g. keystrokes and button clicks), and therefore many headers. Protocols
such as FTP normally use large packet sizes and so TCP header compression is not going
to have a significant benefit. Whenever TCP header compression is used make sure that
it is configured at both ends, otherwise protocols that use TCP, such as Telnet, will not
operate.