System¶

Process Name¶

Every EXOS process has a name that uniquely identifies it.

exos.api.get_process_name()[source]¶: Get the name of this process using exosGetProcessName().

exos.api.set_process_name(process_name)[source]¶

Set the name of this process using exosSetProcessName().

If running in daemon mode, the process name is taken from epmrc and this call is ignored.

If running in non-daemon mode, the process name defaults to expy and can be overridden with this function.
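As a rough sketch of how the two calls combine, the helper below requests a name and then reads back whichever name is actually in effect, since the request is ignored in daemon mode. FakeApi and the names "myapp" and "from-epmrc" are hypothetical stand-ins so the snippet can run off-switch; on a switch, exos.api would presumably be passed instead.

```python
class FakeApi(object):
    """Hypothetical off-switch stand-in for exos.api (not part of EXOS)."""

    def __init__(self, daemon_name=None):
        # daemon_name mimics a name taken from epmrc in daemon mode.
        self._daemon = daemon_name is not None
        self._name = daemon_name or "expy"

    def set_process_name(self, process_name):
        if not self._daemon:  # the call is ignored in daemon mode
            self._name = process_name

    def get_process_name(self):
        return self._name


def claim_process_name(api, wanted):
    """Request a process name and return the name actually in effect."""
    api.set_process_name(wanted)
    return api.get_process_name()
```

Reading the name back, rather than assuming the request succeeded, keeps log messages and lookups consistent in both modes.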

Health Monitoring¶

EXOS can monitor processes and restart them upon failure. Additionally, it can monitor threads within a process. Threads must first be registered with EXOS; they must then periodically send a heartbeat, indicating that they are still functioning properly.

Thread Monitoring¶

The following methods have been added to the threading.Thread class to support thread monitoring.

class threading.Thread¶

Thread.register(period)¶: Register this thread for monitoring by the EXOS process manager. The keepalive() method must be called at least once every period seconds to indicate that the thread is still alive. Otherwise, the process will be considered failed and recovery will be attempted.

Thread.deregister()¶: Deregister this thread from monitoring.

Thread.keepalive()¶: Indicate that this thread is still alive. Should be called at least as often as the registered period. It is typical to call it at the top of a processing loop.
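The register, keepalive, and deregister pattern might look like the following minimal sketch. The Worker class, its jobs argument, and the _monitor helper are hypothetical. The helper looks the three methods up on threading.Thread and silently skips them when they are absent, which is an assumption made only so the sketch also runs on a stock Python interpreter.

```python
import threading


class Worker(threading.Thread):
    """Runs a list of callables, sending a heartbeat before each one."""

    def __init__(self, jobs, period=5):
        super(Worker, self).__init__()
        self.jobs = list(jobs)
        self.period = period
        self.results = []

    def _monitor(self, name, *args):
        # register/keepalive/deregister exist only on the EXOS Thread class;
        # on a stock interpreter the lookup fails and the call is skipped.
        fn = getattr(threading.Thread, name, None)
        if fn is not None:
            fn(self, *args)

    def run(self):
        self._monitor("register", self.period)
        try:
            for job in self.jobs:
                self._monitor("keepalive")  # heartbeat at the top of the loop
                self.results.append(job())
        finally:
            # Stop monitoring even if a job raised.
            self._monitor("deregister")
```

Each job should finish well within period seconds; a longer job would need its own keepalive() calls or a larger period.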

Pool Monitoring¶

The MonitoredThreadPool class inherits from futures.ThreadPoolExecutor to implement a monitored pool. It automatically registers a pool monitoring thread with EXOS and periodically sends heartbeats. As long as the pool is still processing jobs, the monitoring thread continues to send heartbeats. If the pool gets stuck, EXOS will note the failure and can recover the process.

class exos.api.MonitoredThreadPool(*args, **kwds)[source]¶

A ThreadPoolExecutor that is monitored by the EXOS process manager. If the thread pool stops processing jobs, the process manager will fail the process and attempt to recover it.

__init__(self, max_workers, name=None, period=15, *args, **kwargs)[source]¶

Create a new MonitoredThreadPool named name. The thread pool will send a keepalive message to the process manager every period seconds. If the pool is stuck for more than 3 periods, the process will be failed and recovered.

Remaining arguments are passed to futures.ThreadPoolExecutor.
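A minimal usage sketch follows. The pool name "squares" and the square_all helper are hypothetical, and the fallback class is only an off-switch stand-in that accepts and ignores name and period; it performs no monitoring.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from exos.api import MonitoredThreadPool
except ImportError:
    class MonitoredThreadPool(ThreadPoolExecutor):
        """Off-switch stand-in: same constructor shape, no monitoring."""

        def __init__(self, max_workers, name=None, period=15, *args, **kwargs):
            super(MonitoredThreadPool, self).__init__(
                max_workers, *args, **kwargs)


def square(value):
    return value * value


def square_all(values):
    """Run square() over values on a monitored pool and collect the results."""
    pool = MonitoredThreadPool(4, name="squares", period=15)
    try:
        return list(pool.map(square, values))
    finally:
        pool.shutdown(wait=True)
```

With period=15, the "3 periods" rule above means a pool stuck for more than 45 seconds would cause the process to be failed and recovered.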

Process State¶

Every EXOS process has a state, indicating whether it is READY, STOPPED, etc. When a process is started, it will transition from BOOTING to LOADCFG and finally to READY. A process must call ready() to enter the READY state.

exos.api.ready()[source]¶: Declare this process ready for clients. This will update the process state to READY.

exos.api.get_process_state(process_name)[source]¶: Get the ProcessState of process_name.

Process States:

ProcessState.FAIL¶

ProcessState.STOPPED¶

ProcessState.STARTED¶

ProcessState.BOOTING¶

ProcessState.LOADCFG¶

ProcessState.READY¶
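One possible use of get_process_state() is to wait for another process to become READY before talking to it. The polling helper below is a sketch, not part of the EXOS API: wait_for_state, its timeout, and its interval are hypothetical, and it takes a plain callable so that it can be exercised off-switch. On a switch, the callable would presumably wrap exos.api.get_process_state() for the process of interest and compare against ProcessState.READY.

```python
import time


def wait_for_state(get_state, wanted, timeout=30.0, interval=0.5):
    """Poll get_state() until it returns wanted.

    Returns True once the state matches, or False if timeout seconds
    pass first. The state is always checked at least once.
    """
    deadline = time.time() + timeout
    while True:
        if get_state() == wanted:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

A bounded timeout keeps the caller from blocking forever if the other process ends up in FAIL or STOPPED instead.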

Stacking¶

EXOS switches can be “stacked” for redundancy and simplified management.

Not all switches can be stacked. Use the is_stackable() function to determine if the current switch supports stacking.

exos.api.is_stackable()[source]¶

Return True if we are running on a stackable switch. Stacking may not be enabled, however.

Slots¶

The slot collection provides information about the slot numbers used in the stack.

exos.api.slot¶

class exos.api.SlotProperties[source]¶

self¶: The current switch’s slot number.

first¶: The first valid slot number.

last¶: The last valid slot number.
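For illustration, the three properties can be combined to enumerate the other slots in the stack. This sketch assumes slot numbers are integers and that every number from first to last is valid. FakeSlots and peer_slots are hypothetical names, and FakeSlots only mimics exos.api.slot so the snippet runs off-switch.

```python
class FakeSlots(object):
    """Hypothetical stand-in for exos.api.slot, for running off-switch."""

    def __init__(self, current, first, last):
        self.self = current  # mirrors the "self" property documented above
        self.first = first
        self.last = last


def peer_slots(slot):
    """Return every slot number in first..last except the current switch's."""
    return [n for n in range(slot.first, slot.last + 1) if n != slot.self]
```

On a stackable switch, peer_slots(exos.api.slot) would then list the candidate slot numbers for the other switches in the stack, whether or not a switch is actually present in each.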

Primary, Backup, and Standby¶

Within each stack, one switch will be the primary, another may be a backup, and the rest are standbys. The following functions allow a process to determine its switch's current state.

exos.api.is_primary()[source]¶: Return True if this switch is the primary.

exos.api.is_backup()[source]¶: Return True if this switch is the backup.

exos.api.is_standby()[source]¶: Return True if this switch is a standby.

Processes can “checkpoint” state so that failover is seamless. For example, as a routing protocol learns about neighbors, it may checkpoint that list to the backup so that the list does not need to be re-learned after a failover.

Checkpointing¶

Under Python, checkpointing is implemented as “call this, but over there.” For example, if the primary learned about a new neighbor, it may call an add_neighbor() locally and then use call_on_backup() to make the same call, but on the backup switch:

add_neighbor(new_ip)
api.call_on_backup(add_neighbor, new_ip)

The following checkpointing functions are available.

exos.api.is_checkpointing()[source]¶: Return True if this switch is ready to checkpoint data.

exos.api.call_on_primary(fn, *args, **kwds)[source]¶: Call fn on the primary with args and kwds. fn, args, kwds must be pickle-able. If not, a PicklingError is raised. True is returned if the message was sent.

exos.api.call_on_backup(fn, *args, **kwds)[source]¶: Call fn on the backup with args and kwds. fn, args, kwds must be pickle-able. If not, a PicklingError is raised. True is returned if the message was sent.

exos.api.call_on_standby(slot, fn, *args, **kwds)[source]¶: Call fn on the standby in slot with args and kwds. fn, args, kwds must be pickle-able. If not, a PicklingError is raised. True is returned if the message was sent.

exos.api.call_on_standbys(fn, *args, **kwds)[source]¶: Call fn on all standbys with args and kwds. fn, args, kwds must be pickle-able. If not, a PicklingError is raised. True is returned if the message was sent.
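The neighbor example can be fleshed out into a sketch that checkpoints only when it makes sense to. Because fn must be pickle-able, add_neighbor is defined at module level; pickle stores functions by name, so a lambda or nested function would not qualify. Gating on is_primary() and is_checkpointing() is an assumption about sensible usage, not documented behavior. RecordingApi is a hypothetical off-switch stand-in that records calls instead of sending them.

```python
NEIGHBORS = set()


def add_neighbor(ip):
    """Module-level function, so pickle can reference it by name."""
    NEIGHBORS.add(ip)


class RecordingApi(object):
    """Hypothetical stand-in for exos.api that records instead of sending."""

    def __init__(self, primary=True, checkpointing=True):
        self._primary = primary
        self._checkpointing = checkpointing
        self.sent = []

    def is_primary(self):
        return self._primary

    def is_checkpointing(self):
        return self._checkpointing

    def call_on_backup(self, fn, *args, **kwds):
        self.sent.append((fn.__name__, args, kwds))
        return True


def learn_neighbor(api, ip):
    """Apply the update locally, then mirror it to the backup if possible."""
    add_neighbor(ip)
    if api.is_primary() and api.is_checkpointing():
        return api.call_on_backup(add_neighbor, ip)
    return False
```

The return value reports only whether the message was sent, so learn_neighbor() cannot confirm that the backup actually applied the update.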

Miscellaneous¶

exos.api.get_system_mac()[source]¶: Get the MAC address of the system. Returns the MAC as a tuple of bytes, or None if it is not available.
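Since the MAC comes back as a tuple rather than a string, a small formatter is often handy. The sketch below assumes each element of the tuple is an integer in the range 0 to 255; format_mac is a hypothetical helper, not part of exos.api.

```python
def format_mac(mac):
    """Format a tuple of byte values as colon-separated hex, or pass None on."""
    if mac is None:
        return None
    return ":".join("{:02x}".format(b) for b in mac)
```

For example, format_mac((0x00, 0x04, 0x96, 0xab, 0xcd, 0xef)) gives "00:04:96:ab:cd:ef".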

exos.api.get_sysname()[source]¶: Return the switch name.

exos.api.is_capability_supported(capability)[source]¶: Return True if this switch supports the given capability, otherwise False. capability is a string. If it is not recognized, a ValueError is raised.
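Callers that would rather treat an unrecognized capability as unsupported can wrap the call, as in this sketch. The capability strings shown are made up for illustration, since the recognized names are not listed here, and FakeCapabilityApi is a hypothetical off-switch stand-in for exos.api.

```python
class FakeCapabilityApi(object):
    """Hypothetical stand-in for exos.api with made-up capability names."""

    KNOWN = {"example-feature": True, "other-feature": False}

    def is_capability_supported(self, capability):
        if capability not in self.KNOWN:
            raise ValueError("unrecognized capability: %r" % (capability,))
        return self.KNOWN[capability]


def supports(api, capability):
    """Return True only if capability is both recognized and supported."""
    try:
        return api.is_capability_supported(capability)
    except ValueError:
        return False
```

Swallowing the ValueError also hides misspelled capability names, so letting it propagate may be the better choice during development.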

Exceptions¶

exception exos.api.ExosError[source]¶

exception exos.api.DeadlockDetectedError[source]¶: An imminent deadlock was detected, typically because a synchronous call was made from an asynchronous context.