TCP/IP Introduction Part - 1

Started by VelMurugan, Jul 14, 2008, 05:44 PM

VelMurugan

TCP/IP Introduction

I keep seeing requests on various newsgroups for an introduction to TCP/IP. I also get such requests locally. I believe that the only appropriate description of TCP/IP is the RFCs. However, I also think a brief introduction is likely to be helpful before plowing right into them. The following document is an attempt to do that. It also recommends some RFCs to look at and tells you how to get them.

This document is a brief introduction to TCP/IP, followed by advice on what to read for more information. This is not intended to be a complete description, but merely enough of an introduction to allow you to start reading the RFCs. At the end of the document there will be a list of the RFCs that we recommend reading.

TCP/IP is a set of protocols developed to allow cooperating computers to share resources across a network. It was developed by a community of researchers centered around the ARPAnet. Certainly the ARPAnet is the best-known TCP/IP network. However, as of June 1987, at least 130 different vendors had products that support TCP/IP, and thousands of networks of all kinds use it.

First, some basic definitions. Although TCP/IP (or IP/TCP) seems to be the most common term these days, most of the documentation refers to the Internet protocols. The Internet is a collection of networks, including the Arpanet, NSFnet, regional networks such as NYsernet, local networks at a number of university and research institutions, and a number of military networks. The term Internet applies to this entire set of networks. The subset of them which is managed by the Department of Defense is referred to as the DDN (Defense Data Network).

This includes some research-oriented networks, such as the Arpanet, as well as more strictly military ones. (Because much of the funding for Internet protocol developments is done via the DDN organization, the terms Internet and DDN can sometimes seem equivalent.) All of these networks are connected to each other, and users can send messages from any of them to any other (except where security or other policy restrictions control access). Officially speaking, the Internet protocol documents are simply standards adopted by the Internet community for its own use. More recently, the Department of Defense issued a MILSPEC definition of TCP/IP. This was intended to be a more formal definition, appropriate for use in purchasing specifications. However most of the TCP/IP community continues to use the Internet standards. The MILSPEC version is intended to be consistent with it.

Whatever it is called, TCP/IP is a family of protocols. A few are basic ones used for many applications. These include IP, TCP, and UDP. Others are protocols for doing specific tasks, e.g. transferring files between computers, sending mail, or finding out who is logged in on another computer. Any real application will use several of these protocols. A typical situation is sending mail. First, there is a protocol for mail. This defines a set of commands which one machine sends to another, e.g. commands to specify who the sender of the message is, who it is being sent to, and then the text of the message.

However, this protocol assumes that there is a way to communicate reliably between the two computers. Mail, like other application protocols, simply defines a set of commands and messages to be sent. It is designed to be used together with TCP and IP. TCP is responsible for making sure that the commands get through to the other end. It keeps track of what is sent, and retransmits anything that did not get through. If any message is too large for one packet, e.g. the text of the mail, TCP will split it up into several packets, and make sure that they all arrive correctly. Since these functions are needed for many applications, they are put together into a separate protocol, rather than being part of the specifications for sending mail. You can think of TCP as forming a library of routines that applications can use when they need reliable network communications with another computer.

Similarly, TCP calls on the services of IP. Although the services that TCP supplies are needed by many applications, there are still some kinds of applications that don't need them. However there are some services that every application needs. So these services are put together into IP. As with TCP, you can think of IP as a library of routines that TCP calls on, but which is also available to applications that don't use TCP. This strategy of building several levels of protocol is called layering. We think of the applications programs such as mail, TCP, and IP, as being separate layers, each of which calls on the services of the layer below it. Generally, TCP/IP applications use 4 layers:

    * an application protocol such as mail
    * a protocol such as TCP that provides services needed by many applications
    * IP, which provides the basic service of getting packets to their destination
    * the protocols needed to manage a specific physical medium, such as Ethernet or a point-to-point line.
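
To make the idea of layering concrete, here is a minimal sketch in Python. It only illustrates the "each layer calls on the layer below it" structure. The function names and the bracketed text labels are made up for this example; real protocols add binary headers, not text tags.

```python
# Toy model of the four layers. Every name and label here is hypothetical;
# the point is only that each layer hands its output to the layer below.

def link_send(packet: bytes) -> bytes:
    """Lowest layer: manage a specific physical medium, e.g. Ethernet."""
    return b"[ETH]" + packet

def ip_send(segment: bytes, destination: str) -> bytes:
    """IP: the basic service of getting a packet to its destination."""
    return link_send(b"[IP " + destination.encode() + b"]" + segment)

def tcp_send(data: bytes, destination: str) -> bytes:
    """TCP: services needed by many applications (reliability omitted here)."""
    return ip_send(b"[TCP]" + data, destination)

def mail_send(text: str, destination: str) -> bytes:
    """Application protocol: a stand-in for mail."""
    return tcp_send(b"[MAIL]" + text.encode(), destination)

frame = mail_send("hello", "128.6.4.194")
print(frame)  # b'[ETH][IP 128.6.4.194][TCP][MAIL]hello'
```

Note that the mail layer never mentions Ethernet, and the link layer never looks at the mail text. That separation is what lets the same TCP and IP code serve many different applications.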

TCP/IP is based on the catenet model. (This is described in more detail in ien-48.txt.) This model assumes that there are a large number of independent networks connected together by gateways. The user should be able to access computers or other resources on any of these networks. Packets will often pass through a dozen different networks before getting to their final destination. The routing needed to accomplish this should be completely invisible to the user. As far as the user is concerned, all he needs to know in order to access another system is an Internet address. This is an address that looks like 128.6.4.194. It is actually a 32-bit number.

However it is normally written as 4 decimal numbers, each representing 8 bits of the address. (The term octet is used by Internet documentation for such 8-bit chunks. The term byte is not used, because TCP/IP is supported by some computers that have byte sizes other than 8 bits.) Generally the structure of the address gives you some information about how to get to the system. For example, 128.6 is a network number assigned by a central authority to Rutgers University. Rutgers uses the next octet to indicate which of the campus Ethernets is involved. 128.6.4 happens to be an Ethernet used by the Computer Science Department. The last octet allows for up to 254 systems on each Ethernet. Note that 128.6.4.194 and 128.6.5.194 would be different systems. (The structure of an Internet address is described in a bit more detail later.)
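
The relationship between the dotted form and the 32-bit number can be sketched in a few lines of Python. The helper names `to_octets` and `to_int` are our own, chosen for illustration. The split into network, subnet, and host follows the Rutgers example above and is not a general rule.

```python
def to_octets(address: str) -> list:
    """Split a dotted-decimal address into its four octets."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-decimal address: %r" % address)
    return octets

def to_int(address: str) -> int:
    """Pack the four octets into one 32-bit number, first octet highest."""
    value = 0
    for octet in to_octets(address):
        value = (value << 8) | octet
    return value

addr = "128.6.4.194"
octets = to_octets(addr)
print(octets)        # [128, 6, 4, 194]
print(to_int(addr))  # 2147878082

# Rutgers-style reading of the address (an assumption from the text above):
network, subnet, host = octets[:2], octets[2], octets[3]
print(network, subnet, host)  # [128, 6] 4 194

# Same last octet, different Ethernet, so a different system.
print(to_int("128.6.4.194") != to_int("128.6.5.194"))  # True
```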

Of course we normally refer to systems by name, rather than by Internet address. When we specify a name, the network software looks it up in a database, and comes up with the corresponding Internet address. Most of the network software deals strictly in terms of the address. (rfc-882.txt describes the database used to look up names.)

TCP/IP is built on connectionless technology. Information is transferred in packets. Each of these packets is sent through the network individually. There are provisions to open connections to systems. However, at some level, information is put into packets, and those packets are treated by the network as completely separate. For example, suppose you want to transfer a 15000-octet file. Most networks can't handle a 15000-octet packet. So the protocols will break this up into something like 30 500-octet packets. Each of these packets will be sent to the other end. At that point, they will be put back together into the 15000-octet file. However, while those packets are in transit, the network doesn't know that there is any connection between them. It is perfectly possible that packet 14 will actually arrive before packet 13. It is also possible that somewhere in the network, an error will occur, and a packet won't get through at all. In that case, that packet has to be sent again. In fact, there are two separate protocols involved in doing this.

TCP (the transmission control protocol) is responsible for breaking up the message into packets, reassembling them at the other end, resending anything that gets lost, and putting things back in the right order. IP (the internet protocol) is responsible for routing individual packets. It may seem like TCP is doing all the work, and in small networks that is true. However, in the Internet, simply getting a packet to its destination can be a complex job. A connection may require the packet to go through several networks at Rutgers, a serial line to the John von Neumann Supercomputer Center, a couple of Ethernets there, a series of 56Kbaud phone lines to another NSFnet site, and more Ethernets on another campus. Keeping track of the routes to all of the destinations and handling incompatibilities among different transport media turns out to be a complex job. Note that the interface between TCP and IP is fairly simple. TCP simply hands IP a packet with a destination. IP doesn't know how this packet relates to any packet before it or after it.
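
The 15000-octet example can be played out in a short Python sketch. This is a toy model, not how TCP is actually implemented: the sequence numbers, the `unreliable_network` function, and the way the lost packet is detected are all simplifications invented for illustration (real TCP uses acknowledgments and timers).

```python
import random

PACKET_SIZE = 500  # octets per packet, as in the example above

def split_into_packets(data: bytes, size: int = PACKET_SIZE) -> list:
    """Return (sequence_number, chunk) pairs that together cover the data."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def unreliable_network(packets: list, rng: random.Random) -> list:
    """Hypothetical network: delivers packets out of order and loses one."""
    delivered = list(packets)
    rng.shuffle(delivered)
    delivered.pop()  # one packet never arrives
    return delivered

def reassemble(packets: list) -> bytes:
    """Put chunks back in sequence order and join them."""
    return b"".join(chunk for _, chunk in sorted(packets))

data = bytes(i % 256 for i in range(15000))
packets = split_into_packets(data)
print(len(packets))  # 30

received = unreliable_network(packets, random.Random(0))

# Work out which sequence numbers are missing and send those again.
missing = {seq for seq, _ in packets} - {seq for seq, _ in received}
received += [p for p in packets if p[0] in missing]

print(reassemble(received) == data)  # True
```

Everything above is the job of the TCP layer. The stand-in network, like IP, treats each packet as completely separate and knows nothing about the sequence numbers.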

It may have occurred to you that something is missing here. We have talked about Internet addresses, but not about how you keep track of multiple connections to a given system. Clearly it isn't enough to get a packet to the right destination. TCP has to know which connection this packet is part of. This task is referred to as demultiplexing. In fact, there are several levels of demultiplexing going on in TCP/IP. The information needed to do this demultiplexing is contained in a series of headers. A header is simply a few extra octets tacked onto the beginning of a packet by some protocol in order to keep track of it. It's a lot like putting a letter into an envelope and putting an address on the outside of the envelope. Except with modern networks it happens several times. It's like you put the letter into a little envelope, your secretary puts that into a somewhat bigger envelope, the campus mail center puts that envelope into a still bigger one, etc. Here is an overview of the headers that get stuck on a message that passes through a typical TCP/IP network:

We start with a single data stream, say a file you are trying to send to some other computer:

   ......................................................

TCP breaks it up into manageable chunks. (In order to do this, TCP has to know how large a packet your network can handle. Actually, the TCPs at each end say how big a packet they can handle, and then they pick the smallest size.)

   ....   ....   ....   ....   ....   ....   ....   ....
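
As a rough sketch of that size agreement in Python: each end announces the largest packet it can handle, and the smaller of the two values is used to cut up the stream. The function names and the limits of 1500 and 500 octets are assumptions made up for this example.

```python
def negotiate_packet_size(local_max: int, remote_max: int) -> int:
    """Each end says how big a packet it can handle; use the smaller."""
    return min(local_max, remote_max)

def break_up(stream: bytes, size: int) -> list:
    """Cut the data stream into chunks of at most `size` octets."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

# Hypothetical limits for the two ends of the connection.
size = negotiate_packet_size(local_max=1500, remote_max=500)
chunks = break_up(b"." * 1800, size)

print(size)                      # 500
print([len(c) for c in chunks])  # [500, 500, 500, 300]
```

The last chunk is simply whatever is left over, so it may be shorter than the others.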