

Perl & LWP

10.3. Detaching and Reattaching

Suppose that the output of the rewriter above is not satisfactory. Its output contains an apparently harmless one-cell, one-row table, but that table causes trouble when the president of the company tries viewing the web page on his cellphone/PDA, which, as is typical of such devices, has a limited understanding of HTML. Some experimentation shows that any web page with a table in it deeply confuses the boss's PDA.

So your task changes to this: find the one interesting cell in the table (the td with class="story"), detach it, replace the table with the td, and delete the table. This is a fairly involved series of actions, but luckily every one of them translates directly into an HTML::Element method. The result is Example 10-2.

Example 10-2. Detaching and reattaching nodes

use strict;
use HTML::TreeBuilder;
my $root = HTML::TreeBuilder->new;
$root->parse_file('rewriters1/in002.html') || die $!;

my $good_td = $root->look_down( '_tag', 'td',  'class', 'story', );
die "No good td?!" unless $good_td;      # sanity checking
my $big_table = $root->look_down( '_tag', 'table' );
die "No big table?!" unless $big_table;  # sanity checking

$good_td->detach;
$big_table->replace_with($good_td);
  # Yes, there's even a method for replacing one node with another!
$big_table->delete;  # the replaced table is now detached, so delete it too

open(my $out, '>', 'rewriters1/out002b.html') || die "Can't write: $!";
print $out $root->as_HTML(undef, '  '); # two-space indent in output
close($out) || die "Can't close: $!";
$root->delete; # done with it, so delete it

The resulting document looks like this:

<html>
  <head>
    <title>Shatner and Kunis Sweep the Oscars</title>
  </head>
  <body>
    <td class="story">
      <h1>Shatner and Kunis Sweep the Oscars</h1>
      <p>Stars of <cite>American Psycho II</cite> walked [...] </td>
    <hr>Copyright 2002, United Lies Syndicate </body>
</html>

One problem, though: we now have a td outside of a table. The fix is simple: change it from a td element into something innocuous, such as a div, and, while we're at it, delete its class attribute:

$good_td->tag('div'); 
$good_td->attr('class', undef);

That makes the output look like this:

<html>
  <head>
    <title>Shatner and Kunis Sweep the Oscars</title>
  </head>
  <body>
    <div>
      <h1>Shatner and Kunis Sweep the Oscars</h1>
      <p>Stars of <cite>American Psycho II</cite> walked [...] </div>
    <hr>Copyright 2002, United Lies Syndicate </body>
</html>
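
To check a rewrite like this without the sample files, the whole sequence can be tried on a small inline document. The following is only a sketch: the HTML string and the variable names ($tree, $cell, $table) are made up for illustration and stand in for rewriters1/in002.html. It counts any td elements left in the tree and tests whether the class attribute is really gone.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical inline document standing in for rewriters1/in002.html.
my $tree = HTML::TreeBuilder->new_from_content(
  '<html><body><table><tr><td class="story"><h1>Headline</h1>'
  . '<p>Body text</p></td></tr></table></body></html>'
);

my $cell  = $tree->look_down('_tag', 'td', 'class', 'story') or die "No td?!";
my $table = $tree->look_down('_tag', 'table')                 or die "No table?!";

# Same moves as in Example 10-2: lift the cell out, put it where the table was.
$cell->detach;
$table->replace_with($cell);
$table->delete;

$cell->tag('div');            # rename the element in place
$cell->attr('class', undef);  # an undef value removes the attribute

# In list context, look_down returns every matching element.
my @leftover_cells = $tree->look_down('_tag', 'td');
my $has_class      = defined $cell->attr('class');
my $new_tag        = $cell->tag;

printf "tag is now %s; stray td elements: %d; class still set: %s\n",
  $new_tag, scalar(@leftover_cells), $has_class ? 'yes' : 'no';

$tree->delete;  # done with it, so delete it
```

Because tag and attr change the element in place, the node keeps its position and its children. Nothing has to be detached or reattached for the rename.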

An alternative is not to detach and save the td in the first place, but to detach and save only its content. That's simple enough:

my @good_content = $good_td->content_list;
foreach my $c (@good_content) {
  $c->detach if ref $c;
    # text nodes aren't objects, so aren't really "attached" anyhow
}
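
The fragment above stops once the content has been detached. One way to finish the job, sketched here under stated assumptions, is to hand the whole list to the same replace_with method used in Example 10-2, which accepts a list of elements and text segments. The inline HTML string is a hypothetical stand-in for rewriters1/in002.html, and capturing the output in $html is just a convenience for inspecting it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for rewriters1/in002.html, inlined to keep the sketch self-contained.
my $root = HTML::TreeBuilder->new_from_content(<<'END_HTML');
<html><head><title>Demo</title></head>
<body>
<table><tr><td class="story"><h1>Headline</h1><p>Body text</p></td></tr></table>
<hr>Copyright notice
</body></html>
END_HTML

my $good_td = $root->look_down('_tag', 'td', 'class', 'story')
  or die "No good td?!";
my $big_table = $root->look_down('_tag', 'table')
  or die "No big table?!";

# Detach each child element of the td; plain-text children are just strings.
my @good_content = $good_td->content_list;
foreach my $c (@good_content) {
  $c->detach if ref $c;
}

# Put the td's former children where the table used to be.
$big_table->replace_with(@good_content);
$big_table->delete;  # the table and its now-empty td are no longer needed

my $html = $root->as_HTML(undef, '  ');  # two-space indent in output
print $html;
$root->delete;  # done with it, so delete it
```

With this approach no td ever ends up in the body, so there is nothing to rename afterward. The trade-off is that the content is no longer wrapped in a single container element.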



Copyright © 2002 O'Reilly & Associates. All rights reserved.