Next: Memory management Up: Developer's Guide Previous: Compilation Configuration Process

Protocol Scheduling

In the abstract, we can model the operation of a participant in routing protocol exchange and computation as a loop consisting of the following primitive operations: read a protocol message, perform a route computation (this computation may trigger other sub-computations or cause a computation to be scheduled for later execution), write a protocol message.

Gated provides a common substrate, which includes abstractions such as tasks, timers and jobs, for implementing these primitive operations efficiently. A task represents a separately schedulable thread of protocol computation. A timer represents an event scheduled for a future instant, which causes some non-preemptable computation to be performed at that instant. This computation is associated with a specific task. A job (which can either be foreground or background) represents some non-preemptable protocol computation that can be scheduled anytime in the future.

Tasks in Gated

Since Gated provides a platform for prototyping and operating multiple routing protocol instances in a single entity (e.g. a router), it must be able to harmoniously manage primitive operations belonging to different routing protocol instances (e.g. a BGP instance and an OSPF instance). To do this, Gated provides a task, which is the thread of execution of a sequence of related primitive operations. These related operations can be scheduled independently of (i.e. arbitrarily interleaved with) the operation of other protocol instances or other operations within the same protocol instance. Tasks are roughly equivalent to co-routines (tasks are scheduled non-preemptively).

Different protocol implementations use tasks differently. The simplest way to use tasks is to allocate a single task to all computation that happens within a protocol. OSPF and RIP have a single task in whose context all protocol computation is performed. BGP, which maintains a transport connection to each individual peer, assigns a task to all protocol activity (message processing, route computation etc.) associated with the peer (or a peer group).

In the current Gated implementation, tasks do not correspond to separate UNIX processes (with the exception of the system trace dump task).

Task Data Structures

The central data structure in the task module is the task struct. It is a descriptor for all the processing state associated with a particular sequence of primitive operations. Such state includes pointers to protocol specific information (such as the protocol implementation state block, e.g. the bgpPeer struct associated with this block). A task is usually associated with a UNIX socket; the task structure contains the socket descriptor.

In addition, the task structure contains pointers to protocol specific routines for performing different primitive operations that the task module supports. These primitive operations are:

Reading one or more routing protocol messages.
Writing one or more routing protocol messages.
Flashing route changes to the forwarding table.
Scheduling a computation for a specified time in the future.
Scheduling a computation for some time in the future.

For example, when it is ready to read update messages from a peer, the BGP protocol implementation registers a pointer to bgp_recv_v4_update(). with the corresponding peer's task. Then, when data actually arrives on a socket, the task module simply calls this handler to perform the protocol specific read.

To notify protocols of Gated reconfiguration, tasks also maintain pointers to protocol specific notification handlers (e.g., the protocol specific handler to be called before reconfiguration, after reconfiguration and so on).

Gated individually prioritizes each of these operations. At any given instant, pending timers have highest priority, followed by foreground jobs, reads from sockets, protocol specific exception handling, writes to sockets, and finally background jobs. Gated maintains a global list of tasks ( task_read_queue, task_write_queue etc.) waiting to perform these operations; the task structure also contains linked list pointers for placement on each of these queues. (Actually, jobs and timers are queued in separate data structures. That's because a task may have more than one timer or job pending).

Task Module Entry Points

Protocol instances normally create a task by first allocating a task data structure for the task (by calling task_alloc()), instantiating the various field of the task structure, and then calling task_create() to insert the task structure into a global list of tasks. This insertion takes place in priority order (tasks are assigned a static priority). This latter function also creates a socket for use by the task for reads and writes.

Operations on the socket associated with the task include task_close() (to close the socket associated with the task, and remove it from the global read and write queues), task_set_socket() and task_reset_socket().

Storage associated with the task data structure is removed when task_delete() is called. This frees up any other resources (timers, jobs) associated with the task and removes the task from any other lists to which it may belong.

Some task module functions are used to notify each individual task of events of global interest. Some of these are processed in task priority order. Thus, task_reconfigure() is called to force all protocol instances to re-read the configuration files. Similarly, task_ifachange() and task_iflchange() are used to notify protocol instances of changes in the status of one or more network interfaces. And, task_terminate() is used to notify a protocol instance of GateD's intention to wind up operations (e.g. in response to operator action). Also, task_newpolicy() is used to inform interested protocol modules of routing policy configuration changes. Finally, task_flash() calls each protocol instance's flash handler when any change happens to GateD's global route database.

Once a task has been created, individual protocol instances may schedule reads on the task socket using task_set_recv(), schedule blocked writes to the socket using task_set_write() (when a protocol instance has some data to send on a socket, it attempts to write it directly; if that blocks, it calls this function). No protocol implementation seems to use the socket exception feature.

From the operating system, task reads go directly into a global task_recv_buffer (Gated can do this because tasks are non-preemptive). It is then the protocol instance's responsibility to consume all the data in the buffer. Similarly, in preparing to write data, protocol instances formulate messages in a single global task_send_buffer; if the write blocks, it is the protocol instance's responsibility to spool the data elsewhere.

Task Processing and the Gated Main Loop

All protocol processing in Gated is scheduled in its main() loop. As described above, the main loop prioritizes different computations and repeatedly loops to discover which computations have been enabled since the last time the loop was traversed.

Basically, the main loop does a select() on all socket descriptors used for reading, writing, or exception handling. It falls into the select() when no timers are scheduled and the job queues are empty. Gated stays in the select() until the next scheduled timer or 8 seconds, whichever is earlier. After coming out of the select, it processes "high priority" timer events (see the timer section). If any operator actions occurred (in the form of UNIX signals), it processes these (from task_process_signals()) and causes any queued foreground jobs to run before returning to the select(). Otherwise, it performs socket reads and writes (from task_process_sockets()), pending timers and background jobs in that order.

Thus, a protocol's message read and write operations, as well as timers and jobs get scheduled from GateD's main loop. Only the flash routines are performed whenever (as a result of a read) GateD's routing database gets updated.

Timers in Gated

A timer is Gated schedules a particular computation for a specified future instant. The Gated timer module allows specification of both one-shot timers and interval timers. Time specification granularity is 1 second. Timer expiry is governed by system load; thus, a timer may not expire at exactly the scheduled interval since its last expiry.

Gated allows protocols to specify higher priority timers (these have a greater probability of correct expiry compared to "normal" timers). Gated also allows specification of drifting timers: that is, depending on how the timer is initialized, the next timer expiry is calculated based on when the timer actually expired last and the interval. It is also possible to configure the timer to be non-drifting: that is, timer expiry is scheduled at exactly some integral multiple intervals after the timer was first scheduled.

Protocols use timers for periodic tasks such as monitoring connection status, timing out connection opens and so on.

In the current Gated implementation, timer expiry is checked by polling all active timers in the main loop; operating system facilities for timer expiry notification are not used.

Timer data structures

The central data structure is the timer descriptor block task_timer. This includes timer parameters (interval, start time) as well as state describing the action to perform when timer expiry happens.

These task_timer data structures are queued into one of three global queues maintained by the task module: a list of high priority timers, a list of active timers and a list of inactive timers (the last is used when protocols temporarily need to idle timer expiry). The first two lists are sorted in increasing order of next expiry, to ease processing.

High priority timers are used by some protocols like OSPF for timing their hello deliveries. BGP uses "normal" timers for all its timing needs: keepalives, connection retries and so on.

Timer Module Entry Points

The function task_timer_create() creates a timer with the specified parameters and inserts it into the appropriate global list of timers. Conversely, task_timer_delete() removes a timer from its queue and frees any resources that the timer holds.

Also, task_timer_reset() is used to idle the timer (i.e. move it into the inactive list). task_timer_uset() is used to reschedule an idle timer or a timer already in progress.

Timer Module Processing

The primary timer handling is performed in the task_timer_dispatch() and task_timer_hiprio_dispatch() routines. These simply traverse the list of timers, call the appropriate handlers, recompute the next interval at which the timer should fire and reschedule the timer for then. These are called from the main loop.

Jobs in Gated

A Gated job represents a unit of computation that can be scheduled for a time later than the present. Jobs are either foreground or background. Foreground jobs are time-limited computations run when it is safe to modify the Gated routing database and are typically called from high priority timer handlers or flash handlers. Background jobs are longer running computations scheduled when no timers or I/O is pending. Each background job has a priority between zero and seven; background jobs are scheduled in priority order.

For example, BGP uses a foreground job to close a peer connection if an error is observed when flushing routes to that peer. This cannot be performed correctly from the task's write routine since there may be reads queued up or timers scheduled for the task. Similarly, the OSPF implementation sets a timer after an SPF computation to ensure that the SPF computation is not done within that time period; at the expiry of the timer, it sets up a background task to run the next SPF computation.

Job data structures

The descriptor block for a single job, task_job, contains pointers to the task for which this job is scheduled, the routine encompassing the computation to be performed and a single argument to pass to that routine.

The Gated module maintains a global list of foreground jobs (ordered FIFO) and a list of background jobs (ordered by priority). In addition, there exists a single array of pointers to the locations in the background queue marking ends of blocks of task_job structs with the same priority.

Job Entry Points

Protocol instances call task_job_create() to allocate a task_job structure and insert it into the appropriate list. They call task_job_delete() to do the opposite. Foreground jobs are automatically deleted after completion. Background jobs must be explicitly deleted.

Job Processing

The Gated main loop calls task_job_fg_dispatch() to schedule computation of foreground jobs. This runs all foreground jobs to completion. When possible, the Gated main loop calls task_job_bg_dispatch() to schedule computation of foreground jobs. Unlike the former, this function simply removes the first, highest-priority background job from the list of background jobs, schedules its execution and returns.

Next: Memory management Up: Developer's Guide Previous: Compilation Configuration Process

Laurent Joncheray
Wed Jun 12 15:35:22 EDT 1996