6 - Networking



6.1 - Before we go any further

For the bulk of this document, it helps if you have read and at least partially understood the Kernel Configuration and Setup section of the FAQ, and the ifconfig(8) and netstat(1) man pages.

If you are a network administrator setting up routing protocols, if you are using your OpenBSD box as a router, or if you need to go in depth into IP networking, you really need to read Understanding IP Addressing. This is an excellent document. "Understanding IP Addressing" contains fundamental knowledge to build upon when working with IP networks, especially when you deal with or are responsible for more than one network.

If you are working with applications such as web servers, ftp servers, and mail servers, you may benefit greatly by reading the RFCs. Most likely, you can't read all of them. Pick some topics that you are interested in, or that you use in your network environment. Look them up, find out how they are intended to work. The RFCs define many (thousands of) standards for protocols on the Internet and how they are supposed to work.

6.2 - Network configuration

Normally, OpenBSD is initially configured by the installation process. However, it is good to understand what is happening in this process and how it works. All network configuration is done using simple text files in the /etc directory.

6.2.1 - Identifying and setting up your network interfaces

In OpenBSD, interfaces are named for the type of card, not for the type of connection. You can see your network card get initialized during the booting process, or after the booting process using the dmesg(8) command. You can also see your network interface using the ifconfig(8) command. For example, here is the output of dmesg for an Intel Fast Ethernet network card, which uses the device name fxp.

fxp0 at pci0 dev 10 function 0 "Intel 82557" rev 0x0c: irq 5, address 00:02:b3:2b:10:f7
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4

If you don't know what your device name is, please look at the supported hardware list for your platform. You will find a list of many common card names and their OpenBSD device names here. Combine the short alphabetical device name (such as fxp) with a number assigned by the kernel and you have an interface name (such as fxp0). The number is assigned based on various criteria, depending upon the card and other details of the system. Some cards are assigned by the order they are found during bus probing. Others may be by hardware resource settings or MAC address.
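The naming rule above (driver name plus kernel-assigned number) can be sketched in a few lines of shell. The helper name split_ifname is hypothetical and is not part of OpenBSD; it only illustrates how an interface name such as fxp0 breaks down.

```sh
#!/bin/sh
# split_ifname: hypothetical helper, for illustration only.
# Splits an interface name such as "fxp0" into the driver name ("fxp")
# and the unit number assigned by the kernel ("0").
split_ifname() {
    name=$1
    # Strip the trailing digits to get the driver name.
    driver=$(printf '%s\n' "$name" | sed 's/[0-9]*$//')
    # Whatever was stripped is the unit number.
    unit=${name#"$driver"}
    printf '%s %s\n' "$driver" "$unit"
}

split_ifname fxp0     # prints "fxp 0"
```

The driver part is also the name of the manual page to read, for example fxp(4).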

You can find out what network interfaces have been identified by using the ifconfig(8) utility. The following command will show all network interfaces on a system. This sample output shows us only one physical Ethernet interface, an fxp(4).

$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33224
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
lo1: flags=8008<LOOPBACK,MULTICAST> mtu 33224
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        address: 00:04:ac:dd:39:6a
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 10.0.0.38 netmask 0xffffff00 broadcast 10.0.0.255
        inet6 fe80::204:acff:fedd:396a%fxp0 prefixlen 64 scopeid 0x1
pflog0: flags=0<> mtu 33224
pfsync0: flags=0<> mtu 2020
sl0: flags=c010<POINTOPOINT,LINK2,MULTICAST> mtu 296
sl1: flags=c010<POINTOPOINT,LINK2,MULTICAST> mtu 296
ppp0: flags=8010<POINTOPOINT,MULTICAST> mtu 1500
ppp1: flags=8010<POINTOPOINT,MULTICAST> mtu 1500
tun0: flags=10<POINTOPOINT> mtu 3000
tun1: flags=10<POINTOPOINT> mtu 3000
enc0: flags=0<> mtu 1536
bridge0: flags=0<> mtu 1500
bridge1: flags=0<> mtu 1500
vlan0: flags=0<> mtu 1500
        address: 00:00:00:00:00:00
vlan1: flags=0<> mtu 1500
        address: 00:00:00:00:00:00
gre0: flags=9010<POINTOPOINT,LINK0,MULTICAST> mtu 1450
carp0: flags=0<> mtu 1500
carp1: flags=0<> mtu 1500
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
gif1: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
gif2: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
gif3: flags=8010<POINTOPOINT,MULTICAST> mtu 1280

As you can see here, ifconfig(8) gives us a lot more information than we need at this point. But, it still allows us to see our interface. In the above example, the interface card is already configured. This is obvious because an IP network is already configured on fxp0, hence the values "inet 10.0.0.38 netmask 0xffffff00 broadcast 10.0.0.255". Also, the UP and RUNNING flags are set.

Finally, you will notice several other interfaces come enabled by default. These are virtual interfaces that serve various functions. The following manual pages describe them:

The interface is configured at boot time using the /etc/hostname.if(5) files, where if will be replaced by the full name of your interface, for the example above, /etc/hostname.fxp0.

The layout of this file is simple:

address_family address netmask broadcast [other options]
Much more detail about the format of this file can be found in the hostname.if(5) man page. You will need to read this for less trivial configurations.

A typical interface configuration file, configured for an IPv4 address, would look like this:

$ cat /etc/hostname.fxp0
inet 10.0.0.38 255.255.255.0 NONE

In this case, we have defined an IPv4 (inet) address, with an IP address of 10.0.0.38, a subnet mask of 255.255.255.0 and no specific broadcast address (which will default to 10.0.0.255 in this case).
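To see where the default broadcast address of 10.0.0.255 comes from, here is a minimal sketch of the arithmetic: keep the network bits of the address and set every host bit to one. The function names are hypothetical helpers written for this example, and the sketch assumes plain dotted-quad input without leading zeros.

```sh
#!/bin/sh
# Hypothetical helpers, for illustration only.

# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    set -f
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# int_to_ip: convert a 32-bit integer back to dotted-quad form.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# default_broadcast ADDRESS NETMASK: network bits of the address,
# with all host bits set to one.
default_broadcast() {
    addr=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    # Host bits are the bits NOT covered by the netmask.
    host_bits=$(( mask ^ 4294967295 ))
    int_to_ip $(( (addr & mask) | host_bits ))
}

default_broadcast 10.0.0.38 255.255.255.0    # prints 10.0.0.255
```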

You could also specify media types for Ethernet, say, if you wanted to force 100baseTX full-duplex mode.

inet 10.0.0.38 255.255.255.0 NONE media 100baseTX mediaopt full-duplex

(Of course, you should never force full duplex mode unless both sides of the connection are set to do this! In the absence of special needs, media settings should be excluded. A more likely case might be to force 10base-T or half duplex when your infrastructure requires it.)

Or, you may want to use special flags specific to a certain interface. The format of the hostname file doesn't change much!

$ cat /etc/hostname.vlan0
inet 172.21.0.31 255.255.255.0 NONE vlan 2 vlandev fxp1

6.2.2 - Default gateway

Put the IP address of your gateway in the file /etc/mygate. This allows your gateway to be set at boot. The file consists of one line containing just the address of this machine's gateway:
10.0.0.1
It is possible to use a symbolic name there, but be careful: you can't assume things like the resolver are fully configured or even reachable until AFTER the default gateway is configured. In other words, it had better be an IP address or something that is defined in the /etc/hosts file.

6.2.3 - DNS Resolution

DNS resolution is controlled by the file /etc/resolv.conf. Here is an example of a /etc/resolv.conf file:
search example.com
nameserver 125.2.3.4
nameserver 125.2.3.5
lookup file bind
In this case, the default domain name will be example.com, there are two DNS resolvers, 125.2.3.4 and 125.2.3.5 specified, and the /etc/hosts file will be consulted before the DNS resolvers are.
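When debugging name resolution it can help to pull just the resolver addresses out of the file. The following is a small sketch; list_nameservers is a hypothetical helper name, and it reads either the files named on its command line or standard input.

```sh
#!/bin/sh
# list_nameservers: hypothetical helper, for illustration only.
# Prints the address from every "nameserver" line, in file order.
# With no arguments it reads standard input.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$@"
}

# Sample input matching the resolv.conf shown above.
list_nameservers <<'EOF'
search example.com
nameserver 125.2.3.4
nameserver 125.2.3.5
lookup file bind
EOF
# prints 125.2.3.4 and 125.2.3.5, one per line
```

On a live system this would be run as list_nameservers /etc/resolv.conf.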

As with virtually all Unix (and many non-Unix) systems, there is an /etc/hosts file which can be used to specify systems that are not in (or if used with the above "lookup" priority, not as desired in) the formal DNS system.

If you are using DHCP, you'll want to read 6.4 - DHCP taking note of resolv.conf.tail(5).

6.2.4 - Host name

Every Unix machine has a name. In OpenBSD, the name is specified as a "Fully Qualified Domain Name" (FQDN) in one line in the file /etc/myname. If this machine is named "puffy" and in the domain "example.com", the file would contain the one line:
puffy.example.com

6.2.5 - Activating the changes

From here, you can either reboot or run the /etc/netstart script. You can do this by simply typing (as root):
# sh /etc/netstart
writing to routing socket: File exists
add net 127: gateway 127.0.0.1: File exists
writing to routing socket: File exists
add net 224.0.0.0: gateway 127.0.0.1: File exists

Notice that a few errors were produced. By running this script, you are reconfiguring things which are already configured. As such, some routes already exist in the kernel routing table. From here your system should be up and running. Again, you can check to make sure that your interface was set up correctly with ifconfig(8).

Even though you can completely reconfigure networking on an OpenBSD system without rebooting, a reboot is HIGHLY recommended after any significant reconfiguration. The reason for this is the environment at boot is somewhat different than it is when the system is completely up and running. For example, if you had specified a DNS-resolved symbolic name in any of the files, you would probably find it worked as expected after reconfigure, but on initial boot, your external resolver may not be available, so the configuration will fail.

6.2.6 - Checking routes

You can check your routes via netstat(1) or route(8). If you are having routing problems, you may want to use the -n flag to route(8) which prints the IP addresses rather than doing a DNS lookup and displaying the hostname. Here is an example of viewing your routing tables using both programs.
$ netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Refs     Use    Mtu  Interface
default            10.0.0.1           UGS         0       86      -   fxp0
127/8              127.0.0.1          UGRS        0        0      -   lo0
127.0.0.1          127.0.0.1          UH          0        0      -   lo0
10.0.0/24          link#1             UC          0        0      -   fxp0
10.0.0.1           aa:0:4:0:81:d      UHL         1        0      -   fxp0
10.0.0.38          127.0.0.1          UGHS        0        0      -   lo0
224/4              127.0.0.1          URS         0        0      -   lo0

Encap:
Source             Port  Destination        Port  Proto SA(Address/SPI/Proto)

$ route show
Routing tables

Internet:
Destination      Gateway            Flags
default          10.0.0.1           UG
127.0.0.0        LOCALHOST          UG
localhost        LOCALHOST          UH
10.0.0.0         link#1             U
10.0.0.1         aa:0:4:0:81:d      UH
10.0.0.38        LOCALHOST          UGH
BASE-ADDRESS.MCA LOCALHOST          U
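If all you want from the routing table is the default gateway, a short filter is enough. This is only a sketch: default_gateway is a hypothetical helper name, and it assumes the column layout of the netstat -rn output shown above (destination in the first column, gateway in the second).

```sh
#!/bin/sh
# default_gateway: hypothetical helper, for illustration only.
# Reads "netstat -rn" style output on standard input and prints the
# gateway of the first row whose destination is "default".
default_gateway() {
    awk '$1 == "default" { print $2; exit }'
}

# Sample rows taken from the netstat -rn output above.
default_gateway <<'EOF'
Destination        Gateway            Flags     Refs     Use    Mtu  Interface
default            10.0.0.1           UGS         0       86      -   fxp0
127/8              127.0.0.1          UGRS        0        0      -   lo0
EOF
# prints 10.0.0.1
```

On a live system this would be run as netstat -rn | default_gateway.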

6.2.7 - Setting up your OpenBSD box as a forwarding gateway

This is the basic information you need to set up your OpenBSD box as a gateway (also called a router). If you are using OpenBSD as a router on the Internet, we suggest that you also read the Packet Filter setup instructions below to block potentially malicious traffic. Also, due to the low availability of IPv4 addresses from network service providers and regional registries, you may want to look at Network Address Translation for information on conserving your IP address space.

The GENERIC kernel already has the ability to forward IP packets, but it needs to be turned on. You should do this using the sysctl(8) utility. To change this permanently, edit the file /etc/sysctl.conf and add this line:

net.inet.ip.forwarding=1

To make this change without rebooting you would use the sysctl(8) utility directly. Remember though that this change will no longer exist after a reboot, and needs to be run as root.

# sysctl net.inet.ip.forwarding=1
net.inet.ip.forwarding: 0 -> 1
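Because the sysctl(8) command only changes the running kernel, it is easy to forget the permanent setting. Here is a minimal sketch of a check for it; forwarding_configured is a hypothetical helper name, and the sketch assumes that a line starting with "#" in sysctl.conf is a comment and therefore has no effect.

```sh
#!/bin/sh
# forwarding_configured: hypothetical helper, for illustration only.
# Succeeds if the input contains an active (uncommented)
# net.inet.ip.forwarding=1 line. With no arguments it reads stdin.
forwarding_configured() {
    grep -Eq '^[[:space:]]*net\.inet\.ip\.forwarding[[:space:]]*=[[:space:]]*1([[:space:]]|#|$)' "$@"
}

# A commented-out line does not count.
if printf '%s\n' '#net.inet.ip.forwarding=1' | forwarding_configured; then
    echo "forwarding will be enabled at boot"
else
    echo "forwarding line is missing or commented out"
fi
# prints "forwarding line is missing or commented out"
```

On a live system this would be run as forwarding_configured /etc/sysctl.conf.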

Now modify the routes on the other hosts on both sides. There are many possible uses of OpenBSD as a router by using software such as OpenBSD's own OpenBGPD, routed(8), mrtd, zebra, and quagga. OpenBSD has support in the ports collection for zebra, quagga, and mrtd. OpenBGPD and routed are installed as part of the base system. OpenBSD supports several T1, HSSI, ATM, FDDI, Ethernet, and serial (PPP/SLIP) interfaces.

6.2.8 - Setting up aliases on an interface

OpenBSD has a simple mechanism for setting up IP aliases on an interface. To do this simply edit the file /etc/hostname.<if>. This file is read upon boot by the /etc/netstart(8) script, which is part of the rc startup hierarchy. For the example, we assume that the user has an interface dc0 and is on the network 192.168.0.0. Other important information:

A few side notes about aliases. In OpenBSD you use the interface name only. There is no difference between the first alias and the second alias. Unlike some other operating systems, OpenBSD doesn't refer to them as dc0:0, dc0:1. If you are referring to a specific aliased IP address with ifconfig, or adding an alias, be sure to say "ifconfig int alias" instead of just "ifconfig int" at the command line. You can delete aliases with "ifconfig int delete".

Assuming you are using multiple IP addresses which are in the same IP subnet with aliases, your netmask setting for each alias becomes 255.255.255.255. They do not need to follow the netmask of the first IP bound to the interface. In this example, /etc/hostname.dc0, two aliases are added to the device dc0, which, by the way, was configured as 192.168.0.2 netmask 255.255.255.0.

# cat /etc/hostname.dc0
inet 192.168.0.2 255.255.255.0 NONE media 100baseTX
inet alias 192.168.0.3 255.255.255.255
inet alias 192.168.0.4 255.255.255.255
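When there are many aliases in the same subnet, the alias lines can be generated rather than typed. This is a sketch only; alias_lines is a hypothetical helper name, and it always uses the 255.255.255.255 netmask described above for same-subnet aliases.

```sh
#!/bin/sh
# alias_lines: hypothetical helper, for illustration only.
# Prints one hostname.if(5) alias line per address given, using the
# 255.255.255.255 netmask for aliases in the primary address's subnet.
alias_lines() {
    for addr in "$@"; do
        printf 'inet alias %s 255.255.255.255\n' "$addr"
    done
}

alias_lines 192.168.0.3 192.168.0.4
# prints:
# inet alias 192.168.0.3 255.255.255.255
# inet alias 192.168.0.4 255.255.255.255
```

The output would be appended to the interface's hostname file after the line for the primary address.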

Once you've made this file, it just takes a reboot for it to take effect. You can, however, bring up the aliases by hand using the ifconfig(8) utility. To bring up the first alias you would use the command:

# ifconfig dc0 inet alias 192.168.0.3 netmask 255.255.255.255
(but again, a reboot is recommended to make sure you entered everything as you expected it to be!)

To view these aliases you must use the command:

$ ifconfig -A
dc0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST>
        media: Ethernet manual
        inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.0.255
        inet 192.168.0.3 netmask 0xffffffff broadcast 192.168.0.3

6.3 - How do I filter and firewall with OpenBSD?

Packet Filter (from here on referred to as PF) is OpenBSD's system for filtering IP traffic and doing Network Address Translation. PF is also capable of normalizing and conditioning IP traffic and providing bandwidth control and packet prioritization, and can be used to create powerful and flexible firewalls. It is described in the PF User's Guide.

6.4 - Dynamic Host Configuration Protocol (DHCP)

Dynamic Host Configuration Protocol is a way to configure network interfaces "automatically". OpenBSD can be a DHCP server (configuring other machines), a DHCP client (configured by another machine), and in some cases, can be both.

6.4.1 - DHCP Client

To use the DHCP client dhclient(8) included with OpenBSD, edit /etc/hostname.xl0 (this assumes your main Ethernet interface is xl0; yours might be ep0, fxp0, or something else). All you need to put in this hostname file is 'dhcp':

# echo dhcp > /etc/hostname.xl0

This will cause OpenBSD to automatically start the DHCP client on boot. OpenBSD will gather its IP address, default gateway, and DNS servers from the DHCP server.

If you want to start a DHCP client from the command line, make sure /etc/dhclient.conf exists, then try:

# dhclient fxp0

Where fxp0 is the interface on which you want to receive DHCP.

No matter how you start the DHCP client, you can edit the /etc/dhclient.conf file so that your DNS settings are not updated according to the DHCP server's idea of DNS. First uncomment the 'request' lines in it (they are examples of the default settings, but you need to uncomment them to override dhclient's defaults):

request subnet-mask, broadcast-address, time-offset, routers, domain-name, domain-name-servers, host-name, lpr-servers, ntp-servers;

and then remove domain-name-servers. Of course, you may want to remove host-name or other settings too.

By changing options in your dhclient.conf(5) file, you're telling the DHCP client how to build your resolv.conf(5) file. The DHCP client overrides any information you already have in resolv.conf(5) with the information it retrieves from the DHCP server. Therefore, you'll lose any changes you made manually to resolv.conf.

There are two mechanisms available to prevent this:

An example would be if you're using DHCP but you want to append lookup file bind to the resolv.conf(5) created by dhclient(8). There is no option for this in dhclient.conf so you must use resolv.conf.tail to preserve this.

# echo "lookup file bind" > /etc/resolv.conf.tail
Now your resolv.conf(5) should include "lookup file bind" at the end.
nameserver 192.168.1.1
nameserver 192.168.1.2
lookup file bind

6.4.2 - DHCP Server

If you want to use OpenBSD as a DHCP server dhcpd(8), edit /etc/rc.conf.local so that it contains the line dhcpd_flags="". Put the interfaces that you want dhcpd to listen on in /etc/dhcpd.interfaces.
# echo xl1 xl2 xl3 >/etc/dhcpd.interfaces

Then, edit /etc/dhcpd.conf. The options are pretty self-explanatory.
option domain-name "example.com";
option domain-name-servers 192.168.1.3, 192.168.1.5;

subnet 192.168.1.0 netmask 255.255.255.0 {
        option routers 192.168.1.1;

        range 192.168.1.32 192.168.1.127;
}

This will tell your DHCP clients that the domain to append to DNS requests is example.com (so, if the user types in 'telnet joe' then it will send them to joe.example.com). It will point them to DNS servers 192.168.1.3 and 192.168.1.5. For hosts that are on the same network as an Ethernet interface on the OpenBSD machine, which is in the 192.168.1.0/24 range, it will assign them an IP address between 192.168.1.32 and 192.168.1.127. It will set their default gateway as 192.168.1.1.
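Before settling on a range it is worth checking how many leases it actually provides. The following is a minimal sketch of that arithmetic; ip_to_int and range_size are hypothetical helper names, and the sketch assumes both ends of the range are inclusive and written as plain dotted quads.

```sh
#!/bin/sh
# Hypothetical helpers, for illustration only.

# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    set -f
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# range_size FIRST LAST: number of addresses from FIRST to LAST,
# counting both ends.
range_size() {
    first=$(ip_to_int "$1")
    last=$(ip_to_int "$2")
    echo $(( last - first + 1 ))
}

range_size 192.168.1.32 192.168.1.127    # prints 96
```

So the example range above can serve at most 96 clients at a time.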

If you want to start dhcpd(8) from the command line, after editing /etc/dhcpd.conf, try:
# touch /var/db/dhcpd.leases
# dhcpd fxp0

The touch line is needed to create an empty dhcpd.leases file before dhcpd(8) can start. The OpenBSD startup scripts will create this file if needed on boot, but if you are starting dhcpd(8) manually, you must create it first. fxp0 is an interface that you want to start serving DHCP on.

If you are serving DHCP to a Windows box, you may want dhcpd(8) to give the client a 'WINS' server address. To make this happen, just add the following line to your /etc/dhcpd.conf:
option netbios-name-servers 192.168.92.55;

(where 192.168.92.55 is the IP of your Windows or Samba server.) See dhcp-options(5) for more options that your DHCP clients may want.

6.5 - PPP

The Point to Point Protocol (PPP) is generally what is used to create a connection to your ISP via a dial-up modem. OpenBSD has 2 ways of doing this:

Both ppp and pppd perform similar functions, in different ways. pppd works with the kernel ppp(4) driver, whereas ppp works in userland with tun(4). This document will cover only the userland PPP daemon, since it is easier to debug and to interact with. To start off you will need some simple information about your ISP. Here is a list of helpful information that you will need.

Some of these you can do without, but would be helpful in setting up ppp. The userland PPP daemon uses the file /etc/ppp/ppp.conf as its configuration file. There are many helpful files in /etc/ppp that can have different setups for many different situations. You should take a browse through that directory.

Initial Setup - for PPP(8)

Initial Setup for the userland PPP daemon consists of editing your /etc/ppp/ppp.conf file. This file doesn't exist by default, but there is a file /etc/ppp/ppp.conf.sample which you can simply edit to create your own ppp.conf file. Here I will start with the simplest and probably most used setup. Here is a quick ppp.conf file that simply sets some defaults:

default:
 set log Phase Chat LCP IPCP CCP tun command
 set device /dev/cua01
 set speed 115200
 set dial "ABORT BUSY ABORT NO\\sCARRIER TIMEOUT 5 \"\" AT OK-AT-OK ATE1Q0 OK \\dATDT\\T TIMEOUT 40 CONNECT"

The section under the default: tag gets executed each time. Here we set up all our critical information. With "set log" we set our logging levels. This can be changed: refer to ppp(8) for more info on setting up logging levels. Our device gets set with "set device". This is the device that the modem is on. In this example the modem is on com port 2. Therefore com port 1 would be /dev/cua00. With "set speed" we set the speed of our dial-up connection and with "set dial" we set our dial-up parameters. With this we can change our timeout time, etc. This line should stay pretty much as it is though.

Now we can move on and set up information specific to our ISP. We do this by adding another tag under our default: section. This tag can be called anything you want - easiest to just use the name of your ISP. Here I will use myisp: as our tag referring to our ISP. Here is a simple setup incorporating all we need to get ourselves connected:

myisp:
 set phone 1234567
 set login "ABORT NO\\sCARRIER TIMEOUT 5 ogin:--ogin: ppp word: ppp"
 set timeout 120
 set ifaddr 10.0.0.1/0 10.0.0.2/0 255.255.255.0 0.0.0.0
 add default HISADDR
 enable dns

Here we have set up essential info for that specific ISP. The first option "set phone" sets your ISP's dial-up number. The "set login" sets our login options. Here we have the timeout set to 5; this means that we will abort our login attempt after 5 seconds if no carrier is found. Otherwise it will wait for "login:" to be sent and send in your username and password.

In this example our Username = ppp and Password = ppp. These values will need to be changed. The line "set timeout" sets the idle timeout for the entire connection duration to 120 seconds. The "set ifaddr" line is a little tricky. Here is a more extensive explanation.

set ifaddr 10.0.0.1/0 10.0.0.2/0 255.255.255.0 0.0.0.0

In the above line, we have it set in the format of "set ifaddr [myaddr[/nn] [hisaddr[/nn] [netmask [triggeraddr]]]]". So the first IP specified is what we want as our IP. If you have a static IP address, you set it here. In our example we use /0 which says that no bits of this IP address need to match and the whole thing can be replaced. The second IP specified is what we expect as their IP. If you know this you can specify it. Again in our line we don't know what will be assigned, so we let them tell us. The third option is our netmask, here set to 255.255.255.0. If triggeraddr is specified, it is used in place of myaddr in the initial IPCP negotiation. However, only an address in the myaddr range will be accepted. This is useful when negotiating with some PPP implementations that will not assign an IP number unless their peer requests ``0.0.0.0''.

The next option used, "add default HISADDR", sets our default route to their IP. This is 'sticky', meaning that if their IP should change, our route will automatically be updated. With "enable dns" we are asking our ISP to confirm our nameserver addresses. Do NOT do this if you are running a local DNS, as ppp will simply circumvent its use by entering some nameserver lines in /etc/resolv.conf.

Instead of traditional login methods, many ISPs now use either CHAP or PAP authentication. If this is the case, our configuration will look slightly different:

myisp:
 set phone 1234567
 set authname ppp
 set authkey ppp
 set login
 set timeout 120
 set ifaddr 10.0.0.1/0 10.0.0.2/0 255.255.255.0 0.0.0.0
 add default HISADDR
 enable dns

In the above example, we specify our username (ppp) and password (ppp) using authname and authkey, respectively. There is no need to specify whether CHAP or PAP authentication is used - it will be negotiated automatically. "set login" merely specifies to attempt to log in, with the username and password previously specified.

Using PPP(8)

Now that we have our ppp.conf file set up we can start trying to make a connection to our ISP. I will detail some commonly used arguments with ppp:

If the above fails, try running /usr/sbin/ppp with no options - it will run ppp in interactive mode. The options can be specified one by one to check for error or other problems. Using the setup specified above, ppp will log to /var/log/ppp.log. That log, as well as the man page, all contain helpful information.

ppp(8) extras

In some situations you might want commands executed as your connection is made or dropped. There are two files you can create for just these situations: /etc/ppp/ppp.linkup and /etc/ppp/ppp.linkdown. Sample configurations can be viewed here:

ppp(8) variations

Many ISPs now offer xDSL services, which are faster than traditional dial-up methods. This includes variants such as ADSL and SDSL. Although no physical dialling takes place, connection is still based on the Point to Point Protocol. Examples include:

PPPoE/PPPoA

The Point to Point Protocol over Ethernet (PPPoE) is a method for sending PPP packets in Ethernet frames. The Point to Point Protocol over ATM (PPPoA) is typically run on ATM networks, such as those found in the UK and Belgium.

Typically this means you can establish a connection with your ISP using just a standard Ethernet card and Ethernet-based DSL modem (as opposed to a USB-only modem).

If you have a modem which speaks PPPoE/PPPoA, it is possible to configure the modem to do the connecting. Alternatively, if the modem has a `bridge' mode, it is possible to enable this and have the modem "pass through" the packets to a machine running PPPoE software (see below).

The main software interface to PPPoE/PPPoA on OpenBSD is pppoe(8), which is a userland implementation (in much the same way that we described ppp(8), above). A kernel PPPoE implementation, pppoe(4), has been incorporated into OpenBSD.

PPTP

The Point to Point Tunneling Protocol (PPTP) is a proprietary Microsoft protocol. A pptp client is available which interfaces with pppd(8) and is capable of connecting to the PPTP-based Virtual Private Networks (VPN) used by some cable and xDSL providers. pptp itself must be installed from packages or ports. Further instructions on setting up and using pptp are available in the man page which is installed with the pptp package.

6.6 - Tuning networking parameters

6.6.1 - How can I tweak the kernel so that there are a higher number of retries and longer timeouts for TCP sessions?

You would normally use this to allow for routing or connection problems. Of course, for it to be most effective, both sides of the connection need to use similar values.

To tweak this, use sysctl and increase the values of:
net.inet.tcp.keepinittime
net.inet.tcp.keepidle
net.inet.tcp.keepintvl

Using sysctl -a, you can see the current values of these (and many other) parameters. To change one, do something like sysctl net.inet.tcp.keepidle=28800.

6.6.2 - How can I turn on directed broadcasts?

Normally, you don't want to do this. This allows someone to send traffic to the broadcast address(es) of your connected network(s) if you are using your OpenBSD box as a router.

There are some instances, in closed networks, where this may be useful, particularly when using older implementations of the NetBIOS protocol. This is another sysctl. sysctl net.inet.ip.directed-broadcast=1 turns this on. Read about smurf attacks if you want to know why it is off by default.

6.6.3 - I don't want the kernel to dynamically allocate a certain port

There is a sysctl for this also. From sysctl(8):
Set the list of reserved TCP ports that should not be allocated by the kernel dynamically. This can be used to keep daemons from stealing a specific port that another program needs to function. List elements may be separated by commas and/or whitespace.
# sysctl net.inet.tcp.baddynamic=749,750,751,760,761,871
It is also possible to add or remove ports from the current list.
# sysctl net.inet.tcp.baddynamic=+748
# sysctl net.inet.tcp.baddynamic=-871
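The three forms of the baddynamic argument (a full list, +port, -port) can be tried out without touching the kernel. The sketch below only simulates the list semantics described above; apply_port_change is a hypothetical helper name, it does not call sysctl(8), and printing the result in sorted order is an assumption of this example.

```sh
#!/bin/sh
# apply_port_change: hypothetical helper, for illustration only.
# Simulates the effect of a baddynamic argument on a comma-separated
# port list: "+N" adds a port, "-N" removes one, anything else
# replaces the whole list. Does not call sysctl(8).
apply_port_change() {
    current=$1
    change=$2
    port=${change#[+-]}
    case $change in
        +*) ports=$(printf '%s\n' "$current,$port" | tr ',' '\n') ;;
        -*) ports=$(printf '%s\n' "$current" | tr ',' '\n' | grep -vx "$port" || true) ;;
        *)  ports=$(printf '%s\n' "$change" | tr ',' '\n') ;;
    esac
    # Sort numerically, drop duplicates, and join with commas again.
    printf '%s\n' "$ports" | sort -n -u | paste -s -d, -
}

apply_port_change 749,750,751,760,761,871 +748
# prints 748,749,750,751,760,761,871
apply_port_change 749,750,751,760,761,871 -871
# prints 749,750,751,760,761
```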

6.6.4 - How can I increase performance on really high-speed, high traffic links?

If you are seeing performance limitations when using a high-speed WAN connection transferring lots of data, you may see a performance gain by altering the following sysctls:
net.inet.tcp.recvspace
net.inet.tcp.sendspace
Try a value like 65536 instead of the default of 16384. Note that very few will see any benefit from this. Don't adjust this unless you are actually seeing performance below what you expect.

6.7 - Simple NFS usage

NFS, or Network File System, is used to share a filesystem over the network. A few choice man pages to read before trying to set up an NFS server are:

This section will go through the steps for a simple setup of NFS. This example details a server on a LAN, with clients accessing NFS on the LAN. It does not talk about securing NFS. We presume you have already set up packet filtering or other firewalling protection, to prevent outside access. If you are allowing outside access to your NFS server, and you have any kind of sensitive data stored on it, we strongly recommend that you employ IPsec. Otherwise, people can potentially see your NFS traffic. Someone could also pretend to be the IP address which you are allowing into your NFS server. There are several attacks that can result. When properly configured, IPsec protects against these types of attacks.

Another important security note. Don't just add a filesystem to /etc/exports without some kind of list of allowed host(s). Without a list of hosts which can mount a particular directory, anyone who can reach your host will be able to mount your NFS exports.

portmap(8) must be running for NFS to operate. Portmap(8) is off by default on OpenBSD, so you must add the line

portmap=YES
to rc.conf.local(8) to start it on boot. It can also be started manually:
# /usr/sbin/portmap

The setup consists of a server with the IP address 10.0.0.1. This server will be serving NFS only to clients within that network. The first step to setting up NFS is to set up your /etc/exports file. This file lists which filesystems you wish to have accessible via NFS and defines who is able to access them. There are many options that you can use in your /etc/exports file, and it is best that you read the exports(5) man page. For this example we have an /etc/exports that looks like this:

#
# NFS exports Database
# See exports(5) for more information.  Be very careful, misconfiguration
# of this file can result in your filesystems being readable by the world.
/work -alldirs -ro -network=10.0.0 -mask=255.255.255.0

This means that the local filesystem /work will be made available via NFS. -alldirs specifies that clients will be able to mount at any point under the /work mount point. -ro specifies that it will only be allowed to be mounted read-only. The last two arguments specify that only clients within the 10.0.0.0 network using a netmask of 255.255.255.0 will be authorized to mount this filesystem. This is important for some servers that are accessible by different networks.
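The -network and -mask pair amounts to a simple test: a client is allowed when its address and the network address agree on every bit the mask covers. Here is a minimal sketch of that test; ip_to_int and in_network are hypothetical helper names, and the network is written out in full as 10.0.0.0 rather than the abbreviated 10.0.0 used in the exports file.

```sh
#!/bin/sh
# Hypothetical helpers, for illustration only.

# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    set -f
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_network CLIENT NETWORK MASK: succeeds if CLIENT and NETWORK
# agree on every bit covered by MASK.
in_network() {
    client=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( client & mask )) -eq $(( net & mask )) ]
}

if in_network 10.0.0.37 10.0.0.0 255.255.255.0; then
    echo "10.0.0.37 may mount /work"
fi
# prints "10.0.0.37 may mount /work"
```

A client at 10.0.1.37 would fail the same test, because its third octet differs inside the masked bits.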

Once your /etc/exports file is set up, you can go ahead and set up your NFS server. You should first make sure that options NFSSERVER & NFSCLIENT are in your kernel configuration. (GENERIC kernel has these options included.) Next, you should add the line

nfs_server=YES
to /etc/rc.conf.local. This will bring up both nfsd(8) and mountd(8) when you reboot. Now, you can go ahead and start the daemons yourself. These daemons need to be started as root, and you need to make sure that portmap(8) is running on your system. Here is an example of starting nfsd(8) which serves on both TCP and UDP using 4 daemons. You should set an appropriate number of NFS server daemons to handle the maximum number of concurrent client requests that you want to service.
# /sbin/nfsd -tun 4

Not only do you have to start the nfsd(8) server, but you need to start mountd(8). This is the daemon that actually services the mount requests on NFS. To start mountd(8), make sure an empty mountdtab file exists, and run the daemon:

# echo -n >/var/db/mountdtab
# /sbin/mountd

If you make changes to /etc/exports while NFS is already running, you need to make mountd aware of this! Just HUP it:

# kill -HUP `cat /var/run/mountd.pid`

Checking Stats on NFS

From here, you can check to make sure that all these daemons are up and registered with RPC. To do this, use rpcinfo(8).

$ rpcinfo -p 10.0.0.1
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp    633  mountd
    100005    3   udp    633  mountd
    100005    1   tcp    916  mountd
    100005    3   tcp    916  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
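Reading the rpcinfo listing by eye gets tedious, so here is a sketch of a filter that reports which of the three services NFS needs are not registered. missing_services is a hypothetical helper name, and it assumes the five-column layout shown above, with the service name in the last column.

```sh
#!/bin/sh
# missing_services: hypothetical helper, for illustration only.
# Reads "rpcinfo -p" output on standard input and prints the name of
# each required service (portmapper, mountd, nfs) that is not listed.
missing_services() {
    awk 'NR > 1 { seen[$5] = 1 }
         END {
             n = split("portmapper mountd nfs", want, " ")
             for (i = 1; i <= n; i++)
                 if (!(want[i] in seen))
                     print want[i]
         }'
}

# Sample input in which mountd has not been started yet.
missing_services <<'EOF'
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100003    2   udp   2049  nfs
    100003    3   tcp   2049  nfs
EOF
# prints mountd
```

On a live system this would be run as rpcinfo -p 10.0.0.1 | missing_services; no output means all three are registered.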

During normal usage, there are a few other utilities that allow you to see what is happening with NFS. One is showmount(8), which allows you to view what is currently mounted and who is mounting it. There is also nfsstat(1) which shows much more verbose statistics. To use showmount(8), try /usr/bin/showmount -a host. For example:

$ /usr/bin/showmount -a 10.0.0.1
All mount points on 10.0.0.1:
10.0.0.37:/work

Mounting NFS Filesystems

NFS filesystems should be mounted via mount(8), or more specifically, mount_nfs(8). To mount a filesystem /work on host 10.0.0.1 to local filesystem /mnt, do this (note that you don't need to use an IP address; mount will resolve host names):

# mount -o ro -t nfs 10.0.0.1:/work /mnt

To have your system mount upon boot, add something like this to your /etc/fstab:

10.0.0.1:/work /mnt nfs ro 0 0

It is important that you use 0 0 at the end of this line so that your computer does not try to fsck the NFS filesystem on boot! The other standard security options, such as noexec, nodev, and nosuid, should also be used where applicable. For example:

10.0.0.1:/work /mnt nfs ro,nodev,nosuid 0 0

This way, no devices or setuid programs on the NFS server can subvert security measures on the NFS client. If you are not mounting programs which you expect to run on the NFS client, add noexec to this list.
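As an illustration, the same /etc/fstab entry with noexec added might look like this. It assumes nothing under /work needs to be executed on the client:

```
10.0.0.1:/work /mnt nfs ro,nodev,nosuid,noexec 0 0
```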

6.9 - Setting up a network bridge in OpenBSD

A bridge is a link between two or more separate networks. Unlike a router, packets transfer through the bridge "invisibly" -- logically, the two network segments appear to be one segment to nodes on either side of the bridge. The bridge will only forward packets that have to pass from one segment to the other, so among other things, they provide an easy way to reduce traffic in a complex network and yet allow any node to access any other node when needed.

Note that because of this "invisible" nature, an interface in a bridge may or may not have an IP address of its own. If it does, the interface has effectively two modes of operation, one as part of a bridge, the other as a normal, stand-alone NIC. If neither interface has an IP address, the bridge will pass network data, but will not be externally maintainable (which can be a feature).

An example of a bridge application

One of my computer racks has a number of older systems, none of which have a built-in 10BASE-T NIC. While they all have an AUI or AAUI connector, my supply of transceivers is limited to coax. One of the machines on this rack is an OpenBSD-based terminal server which is always on and connected to the high-speed network. Adding a second NIC with a coax port will allow me to use this machine as a bridge to the coax network.

This system has two NICs in it now, an Intel EtherExpress/100 (fxp0) and a 3c590-Combo card (ep0) for the coax port. fxp0 is the link to the rest of my network and will thus have an IP address; ep0 is going to be for bridging only and will have no IP address. Machines attached to the coax segment will communicate as if they were on the rest of my network. So, how do we make this happen?

The file hostname.fxp0 contains the configuration info for the fxp0 card. This machine is set up using DHCP, so its file looks like this:

$ cat /etc/hostname.fxp0
dhcp NONE NONE NONE

No surprises here.

The ep0 card is a bit different, as you might guess:

$ cat /etc/hostname.ep0
up media 10base2

Here, we are instructing the system to activate this interface using ifconfig(8) and set it to 10BASE2 (coax). No IP address or similar information needs to be specified for this interface. The options the ep card accepts are detailed in its man page.

Now, we need to set up the bridge. Bridges are initialized by the existence of a file named something like bridgename.bridge0. Here is an example for my situation here:

$ cat /etc/bridgename.bridge0
add fxp0
add ep0
up

This says to set up a bridge consisting of the two NICs, fxp0 and ep0, and activate it. Does it matter in which order the cards are listed? No. Remember that a bridge is symmetrical: packets flow in and out in both directions.

That's it! Reboot, and you now have a functioning bridge.

Filtering on a bridge

While there are certainly uses for a simple bridge like this, it is likely you might want to DO something with the packets as they go through your bridge. As you might expect, Packet Filter can be used to restrict what traffic goes through your bridge.

Keep in mind, by the nature of a bridge, the same data flows through both interfaces, so you only need to filter on one interface. Your default "Pass all" statements would look something like this:

pass in on ep0 all
pass out on ep0 all
pass in on fxp0 all
pass out on fxp0 all

Now, let's say I wish to filter traffic hitting these old machines so that only Web and SSH traffic reaches them. In this case, we are going to let all traffic in and out of the ep0 interface, but filter on the fxp0 interface, using keep state to handle the reply data:

# Pass all traffic through ep0
pass in quick on ep0 all
pass out quick on ep0 all

# Block fxp0 traffic
block in on fxp0 all
block out on fxp0 all

pass in quick on fxp0 proto tcp from any to any port {22, 80} \
     flags S/SA keep state

Note that this rule set will prevent anything but incoming HTTP and SSH traffic from reaching either the bridge machine or any of the other nodes "behind" it. Other results could be had by filtering the other interface.

To monitor and control the bridge you have created, use the brconfig(8) command, which can also be used to create a bridge after boot.
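As a rough sketch, creating the same bridge by hand after boot and then inspecting it might look like the following. This assumes the fxp0 and ep0 interfaces from the example above, and that your release needs the bridge interface created with ifconfig(8) first. Check brconfig(8) on your system before relying on it.

```
# ifconfig bridge0 create
# brconfig bridge0 add fxp0 add ep0 up
# brconfig bridge0
```

The last command, given no other arguments, should print the bridge's current members and settings.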

Tips on bridging

6.10 - How do I boot using PXE? (i386, amd64)

The Preboot Execution Environment, or PXE, is a way to boot a computer from the network, rather than from a hard disk, a floppy or a CD-ROM. The technology was originally developed by Intel, but is supported by most major network card and computer manufacturers now. Note that there are several different network boot protocols, of which PXE is a relatively recent one. Traditionally, PXE booting is done using ROMs on the NIC or mainboard of the system, but boot floppies are available from various sources that will permit PXE booting as well. Many ROMs on older NICs support network booting but do NOT support PXE; OpenBSD/i386 or amd64 cannot currently be booted across the network by these.

How does PXE booting work?

First, it is wise to understand how OpenBSD boots on i386 and amd64 platforms. Upon starting the boot process, the PXE-capable NIC broadcasts a DHCP request over the network. The DHCP server assigns the adapter an IP address and gives it the name of a file to be retrieved from a tftp(1) server and executed. This file then conducts the rest of the boot process. For OpenBSD, the file is pxeboot, which takes the place of the standard boot(8) file. pxeboot(8) is then able to load and execute a kernel (such as bsd or bsd.rd) from the same tftp(1) server.

How do I do it?

The first and obvious step is you must have a PXE-boot capable computer or network adapter. Some documentation will indicate all modern NICs and computers are PXE capable, but this is clearly not true -- many low cost systems do not include PXE ROMs or use an older network boot protocol. You also need a properly configured DHCP and TFTP server.

Assuming an OpenBSD machine is the source of the boot files (this is NOT required), your DHCP server dhcpd.conf file will need to have the following line:

filename "pxeboot";

to have the DHCP server offer that file to the booting workstation. For example:

shared-network LOCAL-NET {
        option domain-name "example.com";
        option domain-name-servers 192.168.1.3, 192.168.1.5;

        subnet 192.168.1.0 netmask 255.255.255.0 {
                option routers 192.168.1.1;
                filename "pxeboot";
                range 192.168.1.32 192.168.1.127;
                default-lease-time 86400;
                max-lease-time 90000;
        }
}

You will also have to activate the tftpd(8) daemon. This is typically done through inetd(8). The standard OpenBSD install has a sample line in inetd.conf which will do nicely for you:

#tftp dgram udp wait root /usr/libexec/tftpd tftpd -s /tftpboot

Remove the '#' character from this line, then send inetd(8) a -HUP signal to get it to reload /etc/inetd.conf. tftpd(8) serves files from a particular directory. In the case of this line, that directory is /tftpboot, which we will use for this example. Obviously, this directory needs to be created and populated. Typically, you will have only a few files here for PXE booting:

Note that /etc/boot.conf is only needed if the kernel you wish to boot from is not named bsd, or other pxeboot defaults are not as you need them (for example, you wish to use a serial console). You can test your tftpd(8) server using a tftp(1) client, making sure you can fetch the needed files.
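For illustration only, a boot.conf that selects a serial console and then boots the install kernel might look like the following. The choice of com0 and bsd.rd are assumptions for this example, and the file is assumed to live relative to the tftpd(8) root (so /tftpboot/etc/boot.conf with the inetd.conf line above). See boot(8) for the commands your version accepts.

```
set tty com0
boot bsd.rd
```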

When your DHCP and TFTP servers are running, you are ready to try it. You will have to activate the PXE boot on your system or network card; consult your system documentation. Once you have it set, you should see something similar to the following:

Intel UNDI, PXE-2.0 (build 067)
Copyright (C) 1997,1998 Intel Corporation

For Realtek RTL 8139(X) PCI Fast Ethernet Controller v1.00 (990420)

DHCP MAC ADDR: 00 E0 C5 C8 CF E1
CLIENT IP: 192.168.1.76  MASK: 255.255.255.0  DHCP IP: 192.168.1.252
GATEWAY IP: 192.168.1.1
probing: pc0 com0 com1 apm pxe![2.1] mem[540k 28m a20=on]
disk: hd0*
net: mac 00:e0:c5:c8:cf:e1, ip 192.168.1.76, server 192.168.1.252
>> OpenBSD/i386 PXEBOOT 1.00
boot>

At this point, you have the standard OpenBSD boot prompt. If you simply type "bsd.rd" here, you will then fetch the file bsd.rd from the TFTP server.

>> OpenBSD/i386 PXEBOOT 1.00
boot> bsd.rd
booting tftp:bsd.rd: 4375152+733120 [58+122112+105468]=0x516d04
entry point at 0x100120

Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.

Copyright (c) 1995-2007 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 4.2 (RAMDISK_CD) #468: Tue Aug 28 11:02:17 MDT 2007
...

The bsd.rd install kernel will now boot.

Can I boot other kinds of kernels using PXE other than bsd.rd?

Yes, although with the tools currently in OpenBSD, PXE booting is primarily intended for installing the OS.

6.11 - The Common Address Redundancy Protocol (CARP)

6.11.1 - What is CARP and how does it work?

CARP is a tool to help achieve system redundancy, by having multiple computers creating a single, virtual network interface between them, so that if any machine fails, another can respond instead, and/or allowing a degree of load sharing between systems. CARP is an improvement over the Virtual Router Redundancy Protocol (VRRP) standard. It was developed after VRRP was deemed to be not free enough because of a possibly-overlapping Cisco patent. For more information on CARP's origins and the legal issues surrounding VRRP, please visit this page.

To avoid legal conflicts, Ryan McBride (with help from Michael Shalayeff, Marco Pfatschbacher and Markus Friedl) designed CARP to be fundamentally different. The inclusion of cryptography is the most prominent change, but still only one of many.

How it works: CARP is a multicast protocol. It groups several physical computers together under one or more virtual addresses. Of these, one system is the master and responds to all packets destined for the group, while the other systems act as hot spares. No matter what the IP and MAC address of the local physical interface, packets sent to the CARP address are returned with the CARP information.

At configurable intervals, the master advertises its operation on IP protocol number 112. If the master goes offline, the other systems in the CARP group begin to advertise. The host that's able to advertise most frequently becomes the new master. When the main system comes back up, it becomes a backup host by default, although if it's more desirable for one host to be master whenever possible (e.g. one host is a fast Sun Fire V120 and the others are comparatively slow SPARCstation IPCs), you can configure them that way.

While highly redundant and fault-tolerant hardware minimizes the need for CARP, it doesn't erase it. There's no hardware fault tolerance that's capable of helping if someone knocks out a power cord, or if your system administrator types reboot in the wrong window. CARP also makes it easier to make the patch and reboot cycle transparent to users, and easier to test a software or hardware upgrade--if it doesn't work, you can fall back to your spare until fixed.

There are, however, situations in which CARP won't help. CARP's design does require that the members of a group be on the same physical subnet with a static IP address, although with the introduction of the carpdev directive, there is no more need for IP addresses on the physical interfaces. Similarly, services that require a constant connection to the server (such as SSH or IRC) will not be transparently transferred to the other system, though in this case CARP can help with minimizing downtime. CARP by itself does not synchronize data between applications. This has to be done through "alternative channels" such as pfsync(4) (for redundant filtering), manually duplicating data between boxes with rsync, or whatever is appropriate for your application.

6.11.2 - Configuration

CARP's controls are located in two places: sysctl(8) and ifconfig(8). Let's look at the sysctls first.

The first sysctl, net.inet.carp.allow, defines whether the host handles CARP packets at all. Clearly, this is necessary to use CARP. This sysctl is enabled by default.

The second, net.inet.carp.arpbalance, is used for load balancing. If this feature is enabled, CARP source-hashes the originating IP of a request. The hash is then used to select a virtual host from the available pool to handle the request. This is disabled by default.

The third, net.inet.carp.log, logs CARP errors. Disabled by default.

Fourth, net.inet.carp.preempt enables natural selection among CARP hosts. The most fit for the job (that is to say, able to advertise most frequently) will become master. Disabled by default, meaning a system that is not a master will not attempt to (re)gain master status.

All these sysctl variables are documented in sysctl(3).

For the remainder of CARP's configuration, we rely on ifconfig(8). The CARP-specific commands advbase and advskew deal with the interval between CARP advertisements. The formula (in seconds) is advskew divided by 256, then added to advbase. advbase can be used to decrease network traffic or allow longer latency before a backup host takes over; advskew lets you control which host will be master without much delaying failover (should that be required).
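To make the formula concrete, here is a small sketch that computes the interval with awk. The function name carp_interval is made up for this example, and the advbase of 1 and advskew of 0 used in the first call are assumed to be the defaults; check ifconfig(8) for the values on your system.

```shell
# Advertisement interval in seconds: advbase + advskew / 256.
# carp_interval is a hypothetical helper, not an OpenBSD command.
carp_interval() {
    awk -v base="$1" -v skew="$2" 'BEGIN { printf "%.3f\n", base + skew / 256 }'
}

carp_interval 1 0      # assumed defaults: prints 1.000
carp_interval 1 100    # advskew raised to 100: prints 1.391
```

With the same advbase, a host with the larger advskew advertises slightly less often, which is why it tends to lose the election for master.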

Next, pass sets a password, and vhid sets the virtual host identifier number of the CARP group. You need to assign a unique number for each CARP group, even if (for load balancing purposes) they share the same IP address. CARP is limited to 255 groups.

Finally, carpdev specifies which physical interface belongs to this particular CARP group. By default, whichever interface has an IP address in the same subnet assigned to the CARP interface will be used.

Let's put all these settings together in a basic configuration. Let's say you're deploying two identically configured Web servers, rachael (192.168.0.5) and pris (192.168.0.6), to replace an older system that was at 192.168.0.7. The commands:

rachael# ifconfig carp0 create
rachael# ifconfig carp0 vhid 1 pass tyrell carpdev fxp0 \
    192.168.0.7 netmask 255.255.255.0

create the carp0 interface and give it a vhid of 1, a password of tyrell, and the IP address 192.168.0.7 with mask 255.255.255.0. Assign fxp0 as the member interface. To make it permanent across reboots, you can create an /etc/hostname.carp0 file that looks like this:

inet 192.168.0.7 255.255.255.0 192.168.0.255 vhid 1 pass tyrell carpdev fxp0
Note that the broadcast address is specified in that line, in addition to the vhid and the password. Failing to do this is a common cause of errors, as it is needed as a place holder.

Do the same on pris. Whichever system brings the CARP interface up first will be master (assuming that preempt is disabled; the opposite is true when preempt is enabled).

But let's say you're not deploying from scratch. Rachael was already in place at the address 192.168.0.7. How do you work around that? Fortunately, CARP can deal with this situation. You simply assign the address to the CARP interface and leave the physical interface specified by the `carpdev' keyword without an IP address. However, it tends to be cleaner to have an IP for each system--it makes individual monitoring and access much simpler.

Let's add another layer of complexity; we want rachael to stay master when possible. There are several reasons we might want this: hardware differences, simple prejudice, "if this system isn't master, there's a problem," or knowing the default master without doing scripting to parse and email the output of ifconfig.

On rachael, we'll set the preempt sysctl we discussed above, then edit /etc/sysctl.conf to make it permanent.

rachael# sysctl net.inet.carp.preempt=1
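For reference, the matching line in /etc/sysctl.conf on rachael would presumably be the same setting, without the sysctl command in front:

```
net.inet.carp.preempt=1
```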

We'll do configuration on pris, too:

pris# ifconfig carp0 advskew 100

This slightly delays pris's advertisements, meaning rachael will be master when alive.
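To keep this setting across reboots on pris, the advskew keyword could probably be added to its /etc/hostname.carp0 alongside the other options. This is a sketch based on the file shown earlier for rachael, not a tested configuration:

```
inet 192.168.0.7 255.255.255.0 192.168.0.255 vhid 1 advskew 100 pass tyrell carpdev fxp0
```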

Note that if you are using PF on a CARP'd computer, you must pass "proto carp" on all involved interfaces, with a line similar to:

pass on fxp0 proto carp keep state

6.11.3 - Load balancing

Flash forward a few months. Our company of the previous example has grown to the point where a single internal Web server is just barely managing the load. What to do? CARP to the rescue. It's time to try load balancing. Create a new CARP interface and group on rachael:

rachael# ifconfig carp1 create
rachael# ifconfig carp1 vhid 2 advskew 100 pass bryant carpdev fxp0 \
    192.168.0.7 netmask 255.255.255.0

On pris, we'll create the new group and interface as well, then set the "preempt" sysctl:

pris# ifconfig carp1 create
pris# ifconfig carp1 vhid 2 pass bryant carpdev fxp0 \
    192.168.0.7 netmask 255.255.255.0
pris# sysctl net.inet.carp.preempt=1

Now we have two CARP groups with the same IP address. Each group is skewed toward a different host, which means rachael will stay master of the original group, but pris will take over the new one.

All we have to do now is enable the load balancing sysctl we discussed previously on both machines:

# sysctl net.inet.carp.arpbalance=1

While these examples are for a two-machine cluster, the same principles apply to more systems. Please note, however, that it's not expected that you will achieve perfect 50/50 distribution between the two machines--CARP uses a hash of the originating IP address to determine which system handles the request, rather than by load.

6.11.4 - More Information on CARP

6.12 - Using OpenNTPD

Accurate time is important for many computer applications. However, many people have noticed that their $5 watch can keep better time than their $2000 computer. In addition to knowing what time it is, it is also often important to synchronize computers so that they all agree on what time it is. For some time, ntp.org has produced a Network Time Protocol (RFC1305, RFC2030) application, available through ports, which can be used to synchronize clocks on computers over the Internet. However, it is a nontrivial program to set up, difficult code to audit, and has a large memory requirement. In short, it fills an important role for some people, but it is far from a solution for all.

OpenNTPD is an attempt to resolve some of these problems, making a trivial-to-administer, safe and simple NTP compatible way to have accurate time on your computer. OpenBSD's ntpd(8) is controlled with an easy to understand configuration file, /etc/ntpd.conf.

Simply activating ntpd(8) through rc.conf.local will result in your computer's clock slowly moving towards, then keeping itself synchronized to, the pool.ntp.org servers, a collection of publicly available time servers. Once your clock is accurately set, ntpd will hold it at a high degree of accuracy. However, if your clock is more than a few minutes off, it is highly recommended that you bring it close to accurate initially, as it may take days or weeks to bring a very-off clock to sync. You can do this using the "-s" option of ntpd(8) or any other way to accurately set your system clock.
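As an illustration, the /etc/rc.conf.local entry that activates ntpd(8) is assumed here to be the ntpd_flags variable, as in the stock /etc/rc.conf of this era. Check your own /etc/rc.conf for the exact name. Use one of the following two lines, not both:

```
ntpd_flags=""      # start ntpd with no extra options
ntpd_flags="-s"    # start ntpd and set the clock immediately at startup
```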

6.12.1 - "But OpenNTPD isn't as accurate as the ntp.org daemon!"

That may be true. That is not OpenNTPD's design goal; it is intended to be free, simple, reliable and secure. If you really need microsecond precision more than the benefits of OpenNTPD, feel free to use ntp.org's ntpd, as it will remain available through ports and packages. There is no plan or desire to have OpenNTPD bloated with every imaginable feature.

6.12.2 - "Someone has claimed that OpenNTPD is 'harmful'!"

Some people have not understood the goals of OpenNTPD -- a simple, secure and easy to maintain way to keep your computer's clock accurate. If accurate time keeping is important, a number of users have reported better results from OpenNTPD than from ntp.org's ntpd. If security is important, OpenNTPD's code is much more readable (and thus, auditable) and was written using native OpenBSD function calls like strlcpy, rather than more portable functions like strcpy, and written to be secure from the beginning, not "made secure later". If having more people using time synchronization is valuable, OpenNTPD makes it much easier for larger numbers of people to use it. If this is "harmful", we are all for it.

There are applications where the ntp.org ntpd is more appropriate; however it is felt that for a large majority of the users, OpenNTPD is more than sufficient.

A more complete response to this by one of the maintainers of OpenNTPD can be read here.

6.12.3 - Why can't my other machines synchronize to OpenNTPD?

ntpd(8) does not listen on any address by default. So in order to use it as a server, you have to uncomment the "#listen on *" line in /etc/ntpd.conf and restart the ntpd(8) daemon. Of course, if you wish it to listen on a particular IP address rather than all available addresses and interfaces, replace the "*" with the desired address.
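For example, to serve time on a single address only, the relevant line in /etc/ntpd.conf might look like this. The address 192.168.1.1 is made up for the example; substitute one that is configured on your machine:

```
listen on 192.168.1.1
```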

When you have ntpd(8) listening, it may happen that other machines still can't synchronize to it! A freshly started ntpd(8) daemon (for example, if you just restarted it after modifying ntpd.conf) refuses to serve time information to other clients until it adjusts its own clock to a reasonable level of stability first. When ntpd(8) considers its own time information stable, it announces it by a "clock now synced" message in /var/log/daemon. Even if the system clock is pretty accurate in the beginning, it can take up to 10 minutes to get in sync, and hours or days if the clock is not accurately set at the start.

6.13 - What are my wireless networking options?

OpenBSD has support for a number of wireless chipsets: (AP) indicates card can be used as an access point.
(NFF) indicates chip requires a non-free firmware which can not be included with OpenBSD.

Adapters based on these chips can be used much like any other network adapter to connect an OpenBSD system to an existing wireless network, configured using the standard ifconfig(8) (please see the manual pages for precise details). Some of these cards can also be used in the "Host-Based Access Point" mode, permitting them to be made into the wireless access point for your network as part of your firewall.

Note that in order to use some of these cards, you will need to acquire the firmware files, which the manufacturers refuse to allow free distribution of, so they can not be included with OpenBSD. When possible, the man pages linked above include contact information so you can contact the right people at the manufacturers to let them know what you feel about this, or to let them know what other product you have purchased instead.

Another option to consider for using your OpenBSD-based firewall to provide wireless access is to use a conventional NIC and an external bridging Access Point. This has the added advantage of letting you easily position the antenna where it is most effective, which is often not directly on the back of your firewall.

6.14 - How can I do equal-cost multipath routing?

Equal-cost multipath routing refers to having multiple routes in the routing table for the same network, such as the default route, 0.0.0.0/0. When the kernel is doing a route lookup to determine where to send packets destined to that network, it can choose from any of the equal-cost routes. In most scenarios, multipath routing is used to provide redundant uplink connections, e.g., redundant connections to the Internet.

The route(8) command is used to add/change/delete routes in the routing table. The -mpath argument is used when adding multipath routes.

# route add -mpath default 10.130.128.1
# route add -mpath default 10.132.0.1

Verify the routes:

# netstat -rnf inet | grep default
default            10.130.128.1       UGS         2      134      -   fxp1
default            10.132.0.1         UGS         0      172      -   fxp2

In this example we can see that one default route points to 10.130.128.1 which is accessible via the fxp1 interface, and the other points to 10.132.0.1 which is accessible via fxp2.

Since the mygate(5) file does not yet support multipath default routes, the above commands should be added to the bottom of the hostname.if(5) files for the fxp1 and fxp2 interfaces. The /etc/mygate file should then be deleted.

/etc/hostname.fxp1
!route add -mpath default 10.130.128.1
/etc/hostname.fxp2
!route add -mpath default 10.132.0.1

Lastly, don't forget to activate the use of multipath routes by enabling the proper sysctl(3) variable.

# sysctl net.inet.ip.multipath=1
# sysctl net.inet6.ip6.multipath=1

Be sure to edit sysctl.conf(5) to make the changes permanent.
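The corresponding /etc/sysctl.conf entries would presumably be the two settings shown above, without the sysctl command in front:

```
net.inet.ip.multipath=1
net.inet6.ip6.multipath=1
```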

Now try a traceroute to different destinations. The kernel will load balance the traffic over each multipath route.

# traceroute -n 154.11.0.4
traceroute to 154.11.0.4 (154.11.0.4), 64 hops max, 60 byte packets
 1  10.130.128.1  19.337 ms  18.194 ms  18.849 ms
 2  154.11.95.170  17.642 ms  18.176 ms  17.731 ms
 3  154.11.5.33  110.486 ms  19.478 ms  100.949 ms
 4  154.11.0.4  32.772 ms  33.534 ms  32.835 ms

# traceroute -n 154.11.0.5
traceroute to 154.11.0.5 (154.11.0.5), 64 hops max, 60 byte packets
 1  10.132.0.1  14.175 ms  14.503 ms  14.58 ms
 2  154.11.95.38  13.664 ms  13.962 ms  13.445 ms
 3  208.38.16.151  13.964 ms  13.347 ms  13.788 ms
 4  154.11.0.5  30.177 ms  30.95 ms  30.593 ms

For more information about how the route is chosen, please refer to RFC2992, "Analysis of an Equal-Cost Multi-Path Algorithm".

It's worth noting that if an interface used by a multipath route goes down (i.e., loses carrier), the kernel will still try to forward packets using the route that points to that interface. This traffic will of course be blackholed and end up going nowhere. It's highly recommended to use ifstated(8) to check for unavailable interfaces and adjust the routing table accordingly.



www@openbsd.org
$OpenBSD: faq6.html,v 1.260 2008/01/05 17:13:04 joel Exp $