[Bruce Barnett introduces this topic in article 20.5. -JP]
In news posting <5932@tahoe.unr.edu>, malc@equinox.unr.edu (Malcolm Carlock) asked how to make tar write to a remote tape drive via rsh (1.33) and dd (35.6). Here's the answer:
% tar cf - . | rsh foo dd of=/dev/device obs=20b
Be forewarned that most incarnations of dd
are extremely slow at handling this.
What is going on? This answer requires some background:
Tapes have "block sizes." Not all tapes, mind you: most SCSI tapes have a fixed block size that can, for the most part, be ignored. Nine-track tapes, however, typically record data in "records" separated by "gaps," and only whole records can be reread later.
In order to accommodate this, UNIX tape drivers generally translate each read() or write() system call into a single record transfer. The size of a written record is the number of bytes passed to write(). (There may be some additional constraints, such as "the size must be even" or "the size must be no more than 32768 bytes." Note that phase-encoded (1600-bpi) blocks should be no longer than 10240 bytes, and GCR (6250-bpi) blocks should be no longer than 32768 bytes, to reduce the chance of an unrecoverable error.) Each read() call must ask for at least one whole record (many drivers get this wrong and silently drop trailing portions of a record that was longer than the byte count given to read()); each read() returns the actual number of bytes in the record.
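The record rules can be made concrete with a toy in-memory model, sketched below in Python. The TapeModel class and its 32768-byte cap are hypothetical illustrations and do not represent any real driver interface. Each write() becomes exactly one record, and each read() returns at most one record. When the caller's byte count is too small, the model truncates the record, which is the tail-dropping behavior described above.

```python
class TapeModel:
    """Toy model of a record-oriented (nine-track style) tape.

    Hypothetical illustration only: each write() stores exactly one record,
    and each read() returns at most one record.
    """

    def __init__(self, max_record=32768):
        self.max_record = max_record  # example driver constraint
        self.records = []             # one bytes object per tape record
        self.pos = 0                  # index of the next record to read

    def write(self, data):
        if len(data) > self.max_record:
            raise ValueError("record larger than the drive allows")
        self.records.append(bytes(data))
        return len(data)

    def read(self, count):
        if self.pos >= len(self.records):
            return b""                # no more records
        record = self.records[self.pos]
        self.pos += 1
        # A too-small count silently loses the tail of the record:
        # the next read() starts at the next record, not the leftover bytes.
        return record[:count]


tape = TapeModel()
for size in (10240, 10240, 4096):
    tape.write(b"x" * size)

print(len(tape.read(32768)))  # 10240: one whole record, never more
print(len(tape.read(512)))    # 512: the rest of record 2 is gone
print(len(tape.read(32768)))  # 4096: reading resumes at record 3
```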
Network connections are generally "byte streams": the two host "peers" (above, the machine running tar and the machine with the tape drive) will exchange data but will drop any notion of "record boundaries" at the protocol-interface level. If record boundaries are to be preserved, this must be done in a layer above the network protocol itself. (Not all network protocols are stream-oriented, not even all of the flow-controlled, error-recovering ones: Internet RDP and XNS SPP are two examples of reliable record-oriented protocols. Many of these, however, impose fairly small record sizes.)
rsh simply opens a stream (TCP) connection and does nothing to preserve "packet boundaries."
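The Python sketch below shows the lost boundaries, assuming a local pipe behaves like the TCP connection rsh opens, since both are byte streams. The helper name read_sizes_through_pipe is made up for this example. Three separate write() calls go in. On typical systems a single read() comes back with all 600 bytes, so the original 100/200/300 division is gone.

```python
import os


def read_sizes_through_pipe(write_sizes, read_count=10240):
    """Write one chunk per size into a pipe, then report the size of each read()."""
    r, w = os.pipe()
    try:
        for size in write_sizes:   # keep the total well under the pipe's capacity
            os.write(w, b"a" * size)
    finally:
        os.close(w)                # EOF lets the read loop below terminate
    sizes = []
    try:
        while True:
            chunk = os.read(r, read_count)
            if not chunk:
                break
            sizes.append(len(chunk))
    finally:
        os.close(r)
    return sizes


print(read_sizes_through_pipe([100, 200, 300]))  # typically [600]
```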
dd works in mysterious ways:
dd if=x of=y
is the same as:
dd if=x of=y ibs=512 obs=512
which means: open files x and y, then loop doing read(fd_x) with a byte count of 512, take whatever you got, copy it into an output buffer for file y, and each time that buffer reaches 512 bytes, do a single write(fd_y) with 512 bytes.
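The ibs/obs behavior can be sketched in Python. This is a simplified model and is not taken from dd's source. The function lumpy_stream is a hypothetical stand-in for a network connection that hands back data in arbitrary lumps. The function dd_reblock collects whatever each read() returns and issues a write() only when a full obs-sized block is ready. Like real dd, the sketch also writes out any final short block at end of input.

```python
def lumpy_stream(lump_sizes):
    """Hypothetical stand-in for a network connection: each read() returns
    at most the current lump, however many bytes the caller asked for."""
    lumps = [b"t" * n for n in lump_sizes]

    def read(count):
        if not lumps:
            return b""                # end of input
        lump = lumps[0]
        if len(lump) <= count:
            lumps.pop(0)
            return lump
        lumps[0] = lump[count:]       # leave the remainder for the next read()
        return lump[:count]

    return read


def dd_reblock(read, write, ibs=512, obs=512):
    """Simplified model of `dd ibs=... obs=...`: buffer the input and issue
    one write() of exactly obs bytes each time the buffer fills."""
    buf = bytearray()
    while True:
        chunk = read(ibs)             # take whatever comes back, even if short
        if not chunk:
            break
        buf += chunk
        while len(buf) >= obs:
            write(bytes(buf[:obs]))
            del buf[:obs]
    if buf:                           # final short block at end of input
        write(bytes(buf))


records = []
dd_reblock(lumpy_stream([1460, 4000, 9000, 10540]), records.append,
           ibs=512, obs=10240)
print([len(r) for r in records])      # [10240, 10240, 4520]
```

With obs=10240 (the obs=20b of the answer above), the 25000 bytes come out as two full 10240-byte records and one final 4520-byte record, however lumpy the input was.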
On the other hand:
dd if=x of=y bs=512
means something completely different: open files x and y, then loop doing read(fd_x) with a byte count of 512, take whatever you got, and do a single write(fd_y) with that count.
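For contrast, here is a sketch of the bs= behavior using the same hypothetical lumpy_stream model, repeated so the example stands alone. Every read() is followed by one write() of exactly the size read, so the lumps in the input become the record sizes on tape.

```python
def lumpy_stream(lump_sizes):
    """Hypothetical stand-in for a network connection: each read() returns
    at most the current lump, however many bytes the caller asked for."""
    lumps = [b"t" * n for n in lump_sizes]

    def read(count):
        if not lumps:
            return b""
        lump = lumps[0]
        if len(lump) <= count:
            lumps.pop(0)
            return lump
        lumps[0] = lump[count:]
        return lump[:count]

    return read


def dd_bs(read, write, bs=512):
    """Simplified model of `dd bs=...`: one write() per read(),
    of exactly the number of bytes that read() returned."""
    while True:
        chunk = read(bs)
        if not chunk:
            break
        write(chunk)                  # short read -> short record


records = []
dd_bs(lumpy_stream([1460, 4000, 9000, 10540]), records.append, bs=10240)
print([len(r) for r in records])      # [1460, 4000, 9000, 10240, 300]
```

Given input like this, a bs=20b run produces five records of 1460, 4000, 9000, 10240, and 300 bytes. Record sizes like these, which only reflect how the data happened to arrive, are what make bs=20b a bad choice here.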
All of this means that:
% tar cf - . | rsh otherhost dd of=/dev/device
will write 512-byte blocks (not what you wanted), while:
% tar cf - . | rsh otherhost dd of=/dev/device bs=20b
will be even worse: it will take whatever it gets from stdin (which, being a TCP connection, will be arbitrarily lumpy depending on the underlying network parameters and the particular TCP implementation) and write essentially random-sized records. On purely "local" (Ethernet) connections, with typical implementations, you will wind up with 1024-byte blocks (a tar "blocking factor" of 2).
If a blocking factor of 2 is acceptable, and if cat forces 1024-byte blocks (both true in some cases), you can use:
% tar cf - . | rsh otherhost "cat >/dev/device"
but this depends on undocumented features in cat.
In any case, on nine-track tapes each gap occupies approximately 0.7 inches of otherwise useful tape space. A block size of 1024 has ten times as many gaps as a block size of 10240. For every 10240 bytes of data, the nine extra gaps waste 9x1600x0.7 = 10 kbytes of tape at 1600 bpi. Compared with a block size of 32768, it has 32 times as many gaps, wasting 31x6250x0.7 = 136 kbytes of tape at 6250 bpi for every 32768 bytes of data.
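The gap arithmetic above can be checked with a small Python function. It assumes the 0.7-inch gap from the text and measures the waste for each large record's worth of data. The name gap_waste_bytes is invented for this sketch.

```python
def gap_waste_bytes(small, large, bpi, gap_inches=0.7):
    """Tape capacity (in bytes) lost to the extra inter-record gaps when
    `large` bytes of data are written as `small`-byte records instead of
    one `large`-byte record."""
    extra_gaps = large // small - 1       # e.g. 10240 // 1024 - 1 = 9
    return extra_gaps * gap_inches * bpi  # inches of gap x bytes per inch


print(round(gap_waste_bytes(1024, 10240, 1600)))  # 10080, about 10 kbytes
print(round(gap_waste_bytes(1024, 32768, 6250)))  # 135625, about 136 kbytes
```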
I say "approximately" because actual gap sizes vary. In particular, certain "streaming" drives (all too often so called because they do not stream; in some cases the controller is too "smart" to keep up with the required data rate, even when fed back-to-back DMA requests) have been known to stretch the gaps to 0.9 inches.
In general, because of tape gaps, you should use the largest record size that permits error recovery. Note, however, that some olid [2] hardware (such as that found on certain AT&T 3B systems) puts a ridiculous upper limit (5K) on tape blocks.
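A hypothetical helper can combine the limits quoted in this article to suggest an obs= value for the dd command in the answer at the top. It assumes the error-recovery limits given earlier: 10240 bytes at 1600 bpi and 32768 bytes at 6250 bpi. It also accepts an optional hardware cap, such as the 5K limit just mentioned. The result uses dd's "b" suffix for 512-byte units, so obs=20b means 10240 bytes.

```python
# Error-recovery limits quoted in the article, by recording density (bpi).
MAX_SAFE_RECORD = {1600: 10240, 6250: 32768}


def dd_obs_arg(bpi, hardware_limit=None):
    """Hypothetical helper: the largest record size the article deems safe
    for this density, capped by any hardware limit, as a dd obs= argument
    in 512-byte ("b") units."""
    size = MAX_SAFE_RECORD[bpi]
    if hardware_limit is not None:
        size = min(size, hardware_limit)
    return "obs=%db" % (size // 512)   # rounds down to whole 512-byte units


print(dd_obs_arg(1600))        # obs=20b
print(dd_obs_arg(6250))        # obs=64b
print(dd_obs_arg(6250, 5120))  # obs=10b, e.g. for a 5K-limited drive
```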