Relative URLs (Perl & LWP)

4.2. Relative URLs

URLs are either absolute or relative. An absolute URL starts with a scheme, followed by whatever data that scheme requires. For an HTTP URL, this means a hostname and a path:

http://phee.phye.phoe.fm/thingamajig/stuff.html

Any URL that doesn't start with a scheme is relative. To interpret a relative URL, you need a base URL that is absolute (just as you don't know the GPS coordinates of "800 miles west of here" unless you know the GPS coordinates of "here").
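In Perl, one way to tell the two apart is to build a URI object and ask for its scheme. For a relative URL, `scheme` returns undef. This assumes the URI module from CPAN (which LWP depends on) is installed. The `is_relative` helper is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: a URL is relative if it has no scheme.
sub is_relative {
    my ($str) = @_;
    return !defined URI->new($str)->scheme;
}

for my $u ('http://phee.phye.phoe.fm/thingamajig/stuff.html',
           '/also.html', 'zing.xml', '../hi_there.jpg', '#endnotes') {
    print "$u: ", (is_relative($u) ? 'relative' : 'absolute'), "\n";
}
```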

A relative URL leaves some information implicit, and you get that information from its base URL. For example, if your base URL is http://phee.phye.phoe.fm/thingamajig/stuff.html and you see a relative URL of /also.html, then the implicit information is "with the same scheme (http)" and "on the same host (phee.phye.phoe.fm)," and the explicit information is "with the path /also.html." So this is equivalent to an absolute URL of:

http://phee.phye.phoe.fm/also.html
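With the URI module, `URI->new_abs` takes a relative URL and a base URL and returns the equivalent absolute URL. A minimal sketch of the /also.html example, assuming URI is installed:

```perl
use strict;
use warnings;
use URI;

my $base = 'http://phee.phye.phoe.fm/thingamajig/stuff.html';

# new_abs(relative, base) takes the scheme and host from the base
# and the path from the relative URL.
my $abs = URI->new_abs('/also.html', $base);
print "$abs\n";    # http://phee.phye.phoe.fm/also.html
```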

Some kinds of relative URLs take information from the base URL's path, in a way that closely mirrors relative filespecs in Unix filesystems, where ".." means "up one level," "." means "in this level," and anything else means "in this directory." So a relative URL of just zing.xml, interpreted relative to http://phee.phye.phoe.fm/thingamajig/stuff.html, yields this absolute URL:

http://phee.phye.phoe.fm/thingamajig/zing.xml

That is, we keep the base URL's path up to and including its last slash (dropping the final segment, stuff.html), then append the new component.
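The rule just described can be sketched in a few lines of core Perl. The `merge_path` name is hypothetical, and the sketch deliberately ignores ".", "..", queries, and fragments; it only shows the "drop the last segment, then append" step and is no substitute for URI.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the last segment of the base path
# (everything after the final "/") and append the relative path.
sub merge_path {
    my ($base_path, $rel_path) = @_;
    (my $dir = $base_path) =~ s{[^/]*\z}{};   # "/thingamajig/stuff.html" -> "/thingamajig/"
    return $dir . $rel_path;
}

print merge_path('/thingamajig/stuff.html', 'zing.xml'), "\n";
# prints /thingamajig/zing.xml
```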

Similarly, a relative URL of ../hi_there.jpg interpreted against the absolute URL http://phee.phye.phoe.fm/thingamajig/stuff.html gives us this URL:

http://phee.phye.phoe.fm/hi_there.jpg

To work this out, start with http://phee.phye.phoe.fm/thingamajig/; the ".." tells us to go up one level, giving http://phee.phye.phoe.fm/. Appending hi_there.jpg gives the URL you see above.
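The ".." step can be sketched the same way. The hypothetical `remove_dots` helper below walks a merged path segment by segment, dropping the previous segment for each "..", skipping each ".", and never climbing above the root. It is a simplified take on the dot-segment removal described in RFC 3986 and assumes its input starts with "/"; in real code, URI does this for you.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve "." and ".." in a path that starts with "/".
sub remove_dots {
    my ($path) = @_;
    die "expects a path starting with /\n" unless $path =~ m{^/};
    my @in = split m{/}, $path, -1;    # -1 keeps a trailing empty segment
    my @out;
    for my $i (0 .. $#in) {
        my $seg = $in[$i];
        if ($seg eq '..') {
            pop @out if @out > 1;          # never pop the leading "" (the root)
            push @out, '' if $i == $#in;   # "/a/b/.." ends in a directory: "/a/"
        }
        elsif ($seg eq '.') {
            push @out, '' if $i == $#in;   # "/a/." ends in a directory: "/a/"
        }
        else {
            push @out, $seg;
        }
    }
    return join '/', @out;
}

# Merge "../hi_there.jpg" onto the base's directory, then resolve the dots.
(my $dir = '/thingamajig/stuff.html') =~ s{[^/]*\z}{};   # "/thingamajig/"
print remove_dots($dir . '../hi_there.jpg'), "\n";       # /hi_there.jpg
```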

There's a third kind of relative URL, which consists entirely of a fragment, such as #endnotes. This is common in HTML documents, in code like this:

<a href="#endnotes">See the endnotes for the full citation</a>

Interpreting a fragment-only relative URL involves taking the base URL, stripping off any fragment that's already there, and adding the new one. So if the base URL is this:

http://phee.phye.phoe.fm/thingamajig/stuff.html

and the relative URL is #endnotes, then the new absolute URL is this:

http://phee.phye.phoe.fm/thingamajig/stuff.html#endnotes
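URI handles the fragment-only case too. This sketch, again assuming the URI module is installed, resolves #endnotes against the base. It also resolves it against a base that already carries a fragment (the #intro fragment is made up for the example) to show that the old fragment is replaced, not appended to:

```perl
use strict;
use warnings;
use URI;

my $base = 'http://phee.phye.phoe.fm/thingamajig/stuff.html';

# A fragment-only URL keeps everything from the base except the fragment.
my $abs = URI->new_abs('#endnotes', $base);
print "$abs\n";    # http://phee.phye.phoe.fm/thingamajig/stuff.html#endnotes

# A fragment already on the base is replaced by the new one.
my $abs2 = URI->new_abs('#endnotes', "$base#intro");
print "$abs2\n";   # http://phee.phye.phoe.fm/thingamajig/stuff.html#endnotes
```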

We've looked at relative URLs from the perspective of starting with a relative URL and an absolute base, and getting the equivalent absolute URL. But you can also look at it the other way: starting with an absolute URL and asking, "What relative URL gets me there from a given absolute base URL?" This is best explained by putting the URLs one above the other:

Base: http://phee.phye.phoe.fm/thingamajig/stuff.xml
Goal: http://phee.phye.phoe.fm/thingamajig/zing.xml

To get from the base to the goal, the shortest relative URL is simply zing.xml. However, if the goal is one directory higher:

Base: http://phee.phye.phoe.fm/thingamajig/stuff.xml
Goal: http://phee.phye.phoe.fm/hi_there.jpg

then one relative URL is ../hi_there.jpg. In this case, the root-relative URL /hi_there.jpg, which starts from the document root, would also get you there.
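Going in this direction is the job of URI's `rel` method, which takes the base URL and returns a relative URL for the goal. A sketch of the two examples above, assuming URI is installed. Note that `rel` produces the dotted form ../hi_there.jpg rather than the root-relative /hi_there.jpg; both resolve to the same place.

```perl
use strict;
use warnings;
use URI;

my $base = 'http://phee.phye.phoe.fm/thingamajig/stuff.xml';

# rel($base) strips whatever the goal shares with the base.
my $same_dir = URI->new('http://phee.phye.phoe.fm/thingamajig/zing.xml')->rel($base);
my $up_one   = URI->new('http://phee.phye.phoe.fm/hi_there.jpg')->rel($base);

print "$same_dir\n";   # zing.xml
print "$up_one\n";     # ../hi_there.jpg
```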

The logic behind parsing relative URLs and converting between them and absolute URLs is not simple and is very easy to get wrong. That the URI class provides methods to do all of this for us is one of its greatest benefits. You are likely to deal with relative URLs in two ways: turning an absolute URL into a relative URL, and turning a relative URL into an absolute URL.
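As a quick illustration of both directions together, this sketch (assuming the URI module is installed) turns a few absolute goals into relative URLs with `rel`, then back into absolute URLs with `abs`, which should reproduce the original goals:

```perl
use strict;
use warnings;
use URI;

my $base  = 'http://phee.phye.phoe.fm/thingamajig/stuff.html';
my @goals = ('http://phee.phye.phoe.fm/thingamajig/zing.xml',
             'http://phee.phye.phoe.fm/hi_there.jpg',
             'http://phee.phye.phoe.fm/thingamajig/stuff.html#endnotes');

# Absolute -> relative with rel(), then relative -> absolute with abs().
for my $goal (@goals) {
    my $rel  = URI->new($goal)->rel($base);
    my $back = $rel->abs($base);
    printf "%s -> %s -> %s\n", $goal, $rel, $back;
}
```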