

C.2 A Webget Client

Here's a simple client that contacts a remote server and fetches a list of documents from it. This is a more interesting client than the previous one because it sends a line of data to the server before fetching that server's response.

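use strict;
use warnings;
use IO::Socket;

unless (@ARGV > 1) { die "usage: $0 host document ...\n" }
my $host = shift(@ARGV);
my $EOL  = "\015\012";                  # HTTP lines end with CR LF
foreach my $document ( @ARGV ) {
    my $remote = IO::Socket::INET->new( Proto    => "tcp",
                                        PeerAddr => $host,
                                        PeerPort => "http(80)",
                                      );
    unless ($remote) { die "cannot connect to http daemon on $host: $@\n" }
    $remote->autoflush(1);
    # Request line, a Host header (needed by name-based virtual hosts), then a blank line
    print $remote "GET $document HTTP/1.0$EOL", "Host: $host$EOL", $EOL;
    while ( <$remote> ) { print }       # copy the server's response to STDOUT
    close $remote;
}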

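The web server handling the http service is assumed to be at its standard port, number 80; the string "http(80)" asks for the port registered for the http service and falls back to 80 if that lookup fails. If the server you're trying to connect to is at a different port (say, 8080), change the PeerPort argument to new() to PeerPort => 8080. The autoflush method is used on the socket because otherwise the system could buffer the request we print, and the server would never see it while we sat waiting for its reply. (IO::Socket version 1.18 and later turns autoflush on by default, so on current systems the call is harmless rather than essential.)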
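If you often talk to servers on nonstandard ports, it can be convenient to accept the port on the command line as part of the host argument. The following is a minimal sketch built around a hypothetical helper named split_host_port (it is not part of IO::Socket); it assumes plain hostnames or IPv4 addresses and defaults to port 80. IO::Socket::INET's own documentation also describes a hostname:port form for PeerAddr, so treat this only as one way to supply the default yourself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IO::Socket): split a "host" or
# "host:port" argument, falling back to the standard http port 80.
# It does not handle bracketed IPv6 literals such as "[::1]:8080".
sub split_host_port {
    my ($arg) = @_;
    my ($host, $port) = $arg =~ /^([^:]+)(?::(\d+))?$/
        or die "bad host argument: $arg\n";
    return ($host, defined $port ? $port : 80);
}
```

In webget.plx you could then write my ($host, $port) = split_host_port(shift @ARGV); and pass PeerPort => $port to new().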

Connecting to the server is only the first part of the process: after you have the connection, you have to speak the server's language. Each kind of server on the network has its own little command language (its protocol) that it expects as input. The string we send to the server, starting with GET, is in HTTP syntax: a request line naming the document and the protocol version, optionally followed by header lines, and then a blank line that ends the request. In this case, we simply request each specified document. Yes, we really are making a new connection for each document, even though it's the same host. That's the way HTTP/1.0 works. (Recent web browsers may ask the remote server to leave the connection open for a little while, but the server doesn't have to honor such a request.)
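Because the request is just text, it can be built separately from the socket code. As a minimal sketch, assuming HTTP/1.0 and the CR LF line endings the HTTP specification calls for, a hypothetical helper named build_request (not part of IO::Socket or any other library) could assemble it, including a Host header so that a server hosting several sites by name knows which one you mean:

```perl
use strict;
use warnings;

my $CRLF = "\015\012";    # HTTP lines end in carriage return + line feed

# Hypothetical helper: build an HTTP/1.0 GET request for one document.
# The final empty line (CRLF on its own) tells the server the request is complete.
sub build_request {
    my ($host, $document) = @_;
    return join $CRLF,
        "GET $document HTTP/1.0",
        "Host: $host",     # lets name-based virtual hosts pick the right site
        "",                # empty line that terminates the headers
        "";
}
```

The returned string can be printed to the socket in place of the literal request, for example print $remote build_request($host, $document);.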

We'll call our program webget.plx. Here's how it might execute:


command_prompt> perl webget.plx www.perl.com /guanaco.html

HTTP/1.1 404 File Not Found
Date: Thu, 08 May 1997 18:02:32 GMT
Server: Apache/1.2b6
Connection: close
Content-type: text/html

<HEAD><TITLE>404 File Not Found</TITLE></HEAD>
<BODY><H1>File Not Found</H1>
The requested URL /guanaco.html was not found on this server.<P>
</BODY>

OK, so the output is not very interesting, because the server didn't find that particular document. But a long response wouldn't have fit on this page.
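Even this short reply shows the shape of every HTTP response: a status line such as HTTP/1.1 404 File Not Found, then header lines, a blank line, and the document body. The sketch below, built around a hypothetical parse_response helper, splits a response that has already been read into a string. It accepts CR LF or bare LF line endings and keeps only the last value of any repeated header, which is a simplification rather than full HTTP parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw HTTP response into status code,
# reason phrase, headers (keyed by lowercased name), and body.
sub parse_response {
    my ($raw) = @_;
    # Headers end at the first empty line; everything after it is the body.
    my ($head, $body) = split /\015?\012\015?\012/, $raw, 2;
    my ($status, @lines) = split /\015?\012/, $head;
    my ($code, $reason) = $status =~ m{^HTTP/\d+\.\d+\s+(\d{3})\s*(.*)$}
        or die "not an HTTP status line: $status\n";
    my %headers;
    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value;    # header names are case-insensitive
    }
    return ($code, $reason, \%headers, defined $body ? $body : "");
}
```

To use it with webget.plx, you would first collect the server's reply into one string (for example with join '', <$remote>) instead of printing it line by line.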

For a more full-featured version of this program, look at the lwp-request program included with the LWP (libwww-perl) modules from CPAN.

You might also want to investigate the Win32::Internet extension module, which provides easy access to the HTTP and FTP protocols. Win32::Internet is bundled with libwin32, or is available separately for those using the ActiveState distribution.