What Is Usenet, Anyway?
How Does Usenet Handle News?
Netnews, or Usenet news, remains one of the most important and highly valued services on computer networks today. Dismissed by some as a mire of unsolicited commercial email and pornography, Netnews still hosts a number of the high-quality discussion groups that made it a critical resource in pre-web days. Even in these times of a billion web pages, Netnews is still a source of online help and community on many topics.
The idea of network news was born in 1979 when two graduate students, Tom Truscott and Jim Ellis, thought of using UUCP to connect machines for information exchange among Unix users. They set up a small network of three machines in North Carolina.
Initially, traffic was handled by a number of shell scripts (later rewritten in C), but they were never released to the public. They were quickly replaced by "A News," the first public release of news software.
A News was not designed to handle more than a few articles per group per day. When the volume continued to grow, it was rewritten by Mark Horton and Matt Glickman, who called it the "B" release (a.k.a. B News). The first public release of B News was version 2.1 in 1982. It was expanded continuously, with several new features added. Its current version is B News 2.11. It is slowly becoming obsolete; its last official maintainer switched to INN.
Geoff Collyer and Henry Spencer rewrote B News and released it in 1987; this is release "C," or C News. Since its release, there have been a number of patches to C News, the most prominent being the C News Performance Release. On sites that carry a large number of groups, the overhead involved in frequently invoking relaynews, which is responsible for dispatching incoming articles to other hosts, is significant. The Performance Release adds an option to relaynews that allows it to run in daemon mode, through which the program puts itself in the background. The Performance Release is the C News version currently included in most Linux releases. We describe C News in detail in Chapter 21, C News.
All news releases up to C were primarily targeted at UUCP networks, although they could be used in other environments as well. Efficient news transfer over networks like TCP/IP or DECnet required a new scheme. So in 1986, the Network News Transfer Protocol (NNTP) was introduced. It is based on network connections and specifies a number of commands to interactively transfer and retrieve articles.
There are a number of NNTP-based applications available from the Net. One of them is the nntpd package by Brian Barber and Phil Lapsley, which you can use to provide newsreading service to a number of hosts inside a local network. nntpd was designed to complement news packages, such as B News or C News, to give them NNTP features. If you want to use NNTP with the C News server, you should read Chapter 22, NNTP and the nntpd Daemon, which explains how to configure the nntpd daemon and run it with C News.
An alternative package supporting NNTP is INN, or Internet News. It is not just a frontend, but a news system in its own right. It comprises a sophisticated news relay daemon that can maintain several concurrent NNTP links efficiently, and is therefore the news server of choice for many Internet sites. We discuss it in detail in Chapter 23, Internet News.
One of the most astounding facts about Usenet is that it isn't part of any organization, nor does it have any sort of centralized network management authority. In fact, it's part of Usenet lore that except for a technical description, you cannot define what it is; at the risk of sounding stupid, one might define Usenet as a collaboration of separate sites that exchange Usenet news. To be a Usenet site, all you have to do is find another Usenet site and strike an agreement with its owners and maintainers to exchange news with you. Providing another site with news is called feeding it, whence another common axiom of Usenet philosophy originates: "Get a feed, and you're on it."
The basic unit of Usenet news is the article. This is a message a user writes and "posts" to the net. To enable news systems to deal with it, administrative information, the so-called article header, is prepended to it. The header is very similar to the mail header format laid down in the Internet mail standard RFC-822, in that it consists of several lines of text, each beginning with a field name terminated by a colon, which is followed by the field's value.
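The header layout described above can be illustrated with a minimal Python sketch. The sample article, its field values, and the function name parse_header are made up for illustration; real news software handles many more cases (for instance, case-insensitive field names).

```python
def parse_header(article: str) -> tuple[dict[str, str], str]:
    """Split an article into a dict of header fields and the body text.

    Simplified sketch: field names are treated as case-sensitive and a
    repeated field overwrites the earlier value.
    """
    # The header ends at the first empty line; the rest is the body.
    head, _, body = article.partition("\n\n")
    fields: dict[str, str] = {}
    last_name = None
    for line in head.splitlines():
        if line[:1] in (" ", "\t") and last_name is not None:
            # A line starting with whitespace continues the previous field.
            fields[last_name] += " " + line.strip()
            continue
        name, colon, value = line.partition(":")
        if not colon:
            continue  # not a "Name: value" line; ignore it in this sketch
        last_name = name.strip()
        fields[last_name] = value.strip()
    return fields, body


# Hypothetical article, for illustration only.
SAMPLE = (
    "Path: groucho.edu!brewhq!user\n"
    "From: user@brewhq\n"
    "Newsgroups: comp.os.linux.announce\n"
    "Subject: Test posting\n"
    "Message-ID: <1234@brewhq>\n"
    "\n"
    "Hello, world.\n"
)

fields, body = parse_header(SAMPLE)
print(fields["Newsgroups"])  # comp.os.linux.announce
```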
Articles are submitted to one or more newsgroups. One may consider a newsgroup a forum for articles relating to a common topic. All newsgroups are organized in a hierarchy, with each group's name indicating its place in the hierarchy. This often makes it easy to see what a group is all about. For example, anybody can see from the newsgroup name that comp.os.linux.announce is used for announcements concerning a computer operating system named Linux.
These articles are then exchanged between all Usenet sites that are willing to carry news from this group. When two sites agree to exchange news, they are free to exchange whatever newsgroups they like, and may even add their own local news hierarchies. For example, groucho.edu might have a news link to barnyard.edu, which is a major news feed, and several links to minor sites to which it feeds news. Now Barnyard College might receive all Usenet groups, while GMU (the site behind groucho.edu) only wants to carry a few major hierarchies like sci, comp, or rec. Some of the downstream sites, say a UUCP site called brewhq, will want to carry even fewer groups, because they don't have the network or hardware resources. On the other hand, brewhq might want to receive newsgroups from the fj hierarchy, which GMU doesn't carry. It therefore maintains another link with gargleblaster.com, which carries all fj groups and feeds them to brewhq. The news flow is shown in Figure 20.1.
The labels on the arrows originating from brewhq may require some explanation, though. By default, it wants all locally generated news to be sent to groucho.edu. However, as groucho.edu does not carry the fj groups, there's no point in sending it any messages from those groups. Therefore, the feed from brewhq to GMU is labeled all,!fj, meaning that all groups except those below fj are sent to it.
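To make a label such as all,!fj concrete, here is a simplified Python sketch of how such a list could be evaluated. It is an assumption-laden illustration, not the matching algorithm of any particular news package: the function names are hypothetical, and the rule used here is simply that any matching negated entry excludes the group.

```python
def pattern_matches(pattern: str, group: str) -> bool:
    """True if the group is the pattern itself or lies below it in the hierarchy."""
    if pattern == "all":
        return True
    return group == pattern or group.startswith(pattern + ".")


def feed_accepts(feed_list: str, group: str) -> bool:
    """Decide whether a group is sent over a feed labeled e.g. 'all,!fj'.

    Simplified rule (an assumption of this sketch): a matching '!' entry
    always excludes the group; otherwise one positive match is enough.
    """
    wanted = False
    for item in feed_list.split(","):
        item = item.strip()
        if item.startswith("!"):
            if pattern_matches(item[1:], group):
                return False
        elif pattern_matches(item, group):
            wanted = True
    return wanted


print(feed_accepts("all,!fj", "comp.os.linux.announce"))  # True
print(feed_accepts("all,!fj", "fj.misc"))                 # False
```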
Today, Usenet has grown to enormous proportions. Sites that carry the whole of Netnews usually transfer something like a paltry 60 MB a day. Of course, this requires much more than pushing files around. So let's take a look at the way most Unix systems handle Usenet news.
News begins when users create and post articles. Each user enters a message into a special application called a newsreader, which formats it appropriately for transmission to the local news server. In Unix environments, the newsreader commonly uses the inews command to transmit articles to the news server over TCP/IP. But it's also possible to write the article directly into a file in a special directory called the news spool. Once the posting is delivered to the local news server, the server takes responsibility for delivering the article to other news users.
News is distributed through the net by various transports. The medium used to be UUCP, but today the main traffic is carried by Internet sites. The routing algorithm used is called flooding. Each site maintains a number of links (news feeds) to other sites. Any article generated or received by the local news system is forwarded to each of them, unless it has already been seen at that site, in which case it is discarded. A site may find out about all other sites the article has already traversed by looking at the Path: header field. This header contains a list of all systems through which the article has been forwarded, in bang path notation.
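The forwarding decision can be sketched in a few lines of Python. This is only an illustration of the idea: the function name forward is hypothetical, and the sketch treats every component of the bang path as a site name, although the last component is often a user name.

```python
def forward(path: str, local_site: str, feeds: list[str]) -> tuple[str, list[str]]:
    """Return the updated Path value and the feeds that should get the article.

    Feeds whose names already appear in the Path have seen the article,
    so it is not sent to them again.
    """
    seen = set(path.split("!"))
    # Each relaying site prepends its own name to the bang path.
    new_path = f"{local_site}!{path}"
    targets = [site for site in feeds if site not in seen]
    return new_path, targets


# An article posted at brewhq arrives at groucho.edu, which feeds
# barnyard.edu and brewhq (site names taken from the example above).
new_path, targets = forward("brewhq!user", "groucho.edu", ["barnyard.edu", "brewhq"])
print(new_path)  # groucho.edu!brewhq!user
print(targets)   # ['barnyard.edu']
```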
To distinguish articles and recognize duplicates, Usenet articles have to carry a message ID (specified in the Message-Id: header field), which combines the posting site's name and a serial number into <serial@site>. For each article processed, the news system logs this ID into a history file, against which all newly arrived articles are checked.
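A rough Python sketch of the duplicate check follows. The History class and make_message_id function are hypothetical names, and the sketch keeps the IDs in an in-memory set, whereas a real news system records them in a history file on disk.

```python
def make_message_id(serial: int, site: str) -> str:
    """Combine a serial number and the posting site's name into <serial@site>."""
    return f"<{serial}@{site}>"


class History:
    """In-memory stand-in for the history file (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def offer(self, message_id: str) -> bool:
        """Record the ID and return True if it is new, False if it is a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


history = History()
mid = make_message_id(1234, "brewhq")
print(history.offer(mid))  # True: first arrival, article is accepted
print(history.offer(mid))  # False: same ID again, article is discarded
```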
The flow between any two sites may be limited by two criteria. For one, an article is assigned a distribution (in the Distribution: header field), which may be used to confine it to a certain group of sites. On the other hand, the newsgroups exchanged may be limited by both the sending and receiving systems. The set of newsgroups and distributions allowed to be transmitted to a site is usually kept in the sys file.
The sheer number of articles usually requires that improvements be made to the above scheme. On UUCP networks, systems collect articles over a period of time and combine them into a single file, which is compressed and sent to the remote site. This is called batching.
An alternative technique is the ihave/sendme protocol that prevents duplicate articles from being transferred, thus saving net bandwidth. Instead of putting all articles in batch files and sending them along, only the message IDs of articles are combined into a giant "ihave" message and sent to the remote site. The remote site reads this message, compares it to its history file, and returns the list of articles it wants in a "sendme" message. Only the requested articles are sent.
Of course, ihave/sendme makes sense only if it involves two big sites that receive news from several independent feeds each, and that poll each other often enough for an efficient flow of news.
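The ihave/sendme exchange can be sketched as three small Python functions. All names are hypothetical and the "messages" are plain lists of message IDs; the sketch shows only the bookkeeping, not how the control messages are actually formatted or transported.

```python
def build_ihave(outgoing: dict[str, str]) -> list[str]:
    """Sender: list the message IDs of all articles queued for the remote site."""
    return list(outgoing)


def build_sendme(ihave_ids: list[str], history: set[str]) -> list[str]:
    """Receiver: request only the articles not yet recorded in its history."""
    return [mid for mid in ihave_ids if mid not in history]


def send_requested(outgoing: dict[str, str], sendme_ids: list[str]) -> dict[str, str]:
    """Sender: transfer only the articles that were asked for."""
    return {mid: outgoing[mid] for mid in sendme_ids if mid in outgoing}


# Hypothetical queue on the sending site and history on the receiving site.
outgoing = {"<1@a>": "article one", "<2@a>": "article two", "<3@b>": "article three"}
remote_history = {"<2@a>"}  # the receiver already got this one from another feed

ihave = build_ihave(outgoing)
sendme = build_sendme(ihave, remote_history)
print(sendme)  # ['<1@a>', '<3@b>']
print(send_requested(outgoing, sendme))
```

The saving comes from the second step: the article the receiver already holds is never transferred.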
Sites that are on the Internet generally rely on TCP/IP-based software that uses the Network News Transfer Protocol (NNTP). NNTP is described in RFC-977; it is responsible for the transfer of news between news servers and provides Usenet access to single users on remote hosts.
NNTP offers three different ways to transfer news. One is a real-time version of ihave/sendme, also referred to as pushing news. The second technique is called pulling news, in which the client requests a list of articles in a given newsgroup or hierarchy that have arrived at the server's site after a specified date, and chooses those it cannot find in its history file. The third technique is for interactive newsreading and allows you or your newsreader to retrieve articles from specified newsgroups, as well as post articles with incomplete header information.
At each site, news is kept in a directory hierarchy below /var/spool/news, each article in a separate file, and each newsgroup in a separate directory. The directory name is derived from the newsgroup name, with each dot-separated component of the name becoming one path component. Thus, comp.os.linux.misc articles are kept in /var/spool/news/comp/os/linux/misc. The articles in a newsgroup are assigned numbers in the order they arrive. This number serves as the file's name. The range of numbers of articles currently online is kept in a file called active, which at the same time serves as a list of newsgroups your site knows.
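The mapping from newsgroup name and article number to a file in the spool can be written down directly. The following Python sketch uses hypothetical function names and assumes the spool directory given above.

```python
from pathlib import PurePosixPath

SPOOL = PurePosixPath("/var/spool/news")


def group_dir(group: str) -> PurePosixPath:
    """Turn each dot-separated component of the group name into a directory."""
    return SPOOL.joinpath(*group.split("."))


def article_path(group: str, number: int) -> PurePosixPath:
    """The article number within the group serves as the file name."""
    return group_dir(group) / str(number)


print(group_dir("comp.os.linux.misc"))         # /var/spool/news/comp/os/linux/misc
print(article_path("comp.os.linux.misc", 42))  # /var/spool/news/comp/os/linux/misc/42
```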
Since disk space is a finite resource, you have to start throwing away articles after some time. This is called expiring. Usually, articles from certain groups and hierarchies are expired at a fixed number of days after they arrive. This may be overridden by the poster by specifying a date of expiration in the Expires: field of the article header.
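The expiry rule can be summarized in a short Python sketch. The function name is_expired and the keep_days parameter are hypothetical, and the sketch assumes the Expires: date has already been parsed into a datetime; actual expiry programs apply further per-group policies.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_expired(arrived: datetime, now: datetime, keep_days: int,
               expires: Optional[datetime] = None) -> bool:
    """Decide whether an article should be removed from the spool.

    An explicit expiration date set by the poster takes precedence over
    the site's default retention period for the group.
    """
    if expires is not None:
        return now >= expires
    return now - arrived >= timedelta(days=keep_days)


arrived = datetime(2000, 1, 1)
print(is_expired(arrived, datetime(2000, 1, 5), keep_days=7))  # False
print(is_expired(arrived, datetime(2000, 1, 9), keep_days=7))  # True
# The poster asked for the article to be kept until February 1:
print(is_expired(arrived, datetime(2000, 1, 9), keep_days=7,
                 expires=datetime(2000, 2, 1)))                # False
```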
You now have enough information to choose what to read next. UUCP users should read about C News in Chapter 21. If you're using a TCP/IP network, read about NNTP in Chapter 22. If you need to transfer moderate amounts of news over TCP/IP, the server described in that chapter may be enough for you. To install a heavy-duty news server that can handle huge volumes of material, go on to read about Internet News (INN) in Chapter 23.