Monitoring Basics¶
This part of the Icinga 2 documentation provides an overview of all the basic monitoring concepts you need to know to run Icinga 2. Keep in mind these examples are made with a Linux server. If you are using Windows, you will need to change the services accordingly. See the ITL reference for further information.
Attribute Value Types¶
The Icinga 2 configuration uses different value types for attributes.
Type | Example |
---|---|
Number | 5 |
Duration | 1m |
String | "These are notes" |
Boolean | true |
Array | [ "value1", "value2" ] |
Dictionary | { "key1" = "value1", "key2" = false } |
It is important to use the correct value type for object attributes as otherwise the configuration validation will fail.
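As a minimal sketch, the host object below uses one attribute of each value type from the table. The object name, the address and the custom variable names `example_array` and `example_dict` are hypothetical and only chosen for illustration.

```
object Host "value-types-demo" {
  check_command = "hostalive"
  address = "192.0.2.10"           // String
  max_check_attempts = 5           // Number
  check_interval = 1m              // Duration
  enable_notifications = true      // Boolean

  // Array (hypothetical custom variable)
  vars.example_array = [ "value1", "value2" ]
  // Dictionary (hypothetical custom variable)
  vars.example_dict = { "key1" = "value1", "key2" = false }
}
```

Assigning a value of the wrong type, for example a string that is not a number to `max_check_attempts`, is the kind of mistake the configuration validation is expected to reject.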
Hosts and Services¶
Icinga 2 can be used to monitor the availability of hosts and services. Hosts and services can be virtually anything which can be checked in some way:
- Network services (HTTP, SMTP, SNMP, SSH, etc.)
- Printers
- Switches or routers
- Temperature sensors
- Other local or network-accessible services
Host objects provide a mechanism to group services that are running on the same physical device.
Here is an example of a host object which defines two child services:
    object Host "my-server1" {
      address = "10.0.0.1"
      check_command = "hostalive"
    }
    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }
    object Service "http" {
      host_name = "my-server1"
      check_command = "http"
    }
The example creates two services `ping4` and `http` which belong to the host `my-server1`.
It also specifies that the host should perform its own check using the `hostalive` check command.
The `address` attribute is used by check commands to determine which network address is associated with the host object.
Details on troubleshooting check problems can be found here.
Host States¶
Hosts can be in any one of the following states:
Name | Description |
---|---|
UP | The host is available. |
DOWN | The host is unavailable. |
Service States¶
Services can be in any one of the following states:
Name | Description |
---|---|
OK | The service is working properly. |
WARNING | The service is experiencing some problems but is still considered to be in working condition. |
CRITICAL | The check successfully determined that the service is in a critical state. |
UNKNOWN | The check could not determine the service’s state. |
Check Result State Mapping¶
Check plugins return with an exit code which is converted into a state number. Services map the states directly while hosts will treat `0` or `1` as `UP` for example.
Value | Host State | Service State |
---|---|---|
0 | Up | OK |
1 | Up | Warning |
2 | Down | Critical |
3 | Down | Unknown |
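To illustrate the mapping, the following sketch uses the `dummy` check command, whose `dummy_state` custom variable sets the returned state number. The object names are hypothetical.

```
// State number 1 on a host check: the host is treated as UP.
object Host "mapping-demo" {
  check_command = "dummy"
  vars.dummy_state = 1
  vars.dummy_text = "Returned state 1"
}

// The same state number on a service check maps directly to WARNING.
object Service "mapping-demo-service" {
  host_name = "mapping-demo"
  check_command = "dummy"
  vars.dummy_state = 1
  vars.dummy_text = "Returned state 1"
}
```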
Hard and Soft States¶
When detecting a problem with a host/service, Icinga re-checks the object a number of times (based on the `max_check_attempts` and `retry_interval` settings) before sending notifications. This ensures that no unnecessary notifications are sent for transient failures. During this time the object is in a `SOFT` state.
After all re-checks have been executed and the object is still in a non-OK state, the host/service switches to a `HARD` state and notifications are sent.
Name | Description |
---|---|
HARD | The host/service’s state hasn’t recently changed. `check_interval` applies here. |
SOFT | The host/service has recently changed state and is being re-checked with `retry_interval`. |
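The interplay of the three settings can be sketched as follows. The service name is hypothetical, and the interval values are simply the ones used in the template example of this chapter.

```
object Service "http-soft-hard-demo" {
  host_name = "my-server1"
  check_command = "http"

  check_interval = 5m      // used while the service is in a HARD state
  retry_interval = 1m      // used while the service is in a SOFT state
  max_check_attempts = 3   // the third consecutive failed attempt results in a HARD state
}
```

With these values, the first failing check puts the service into a `SOFT` problem state, re-checks follow at one minute intervals, and if the third check attempt still fails the state becomes `HARD` and notifications are sent.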
Host and Service Checks¶
Hosts and services determine their state by running checks in a regular interval.
    object Host "router" {
      check_command = "hostalive"
      address = "10.0.0.1"
    }
The `hostalive` command is one of several built-in check commands. It sends ICMP echo requests to the IP address specified in the `address` attribute to determine whether a host is online.
Tip
`hostalive` is the same as `ping` but with different default thresholds. Both use the `ping` CLI command to execute sequential checks. If you need faster ICMP checks, look into the icmp CheckCommand.
A number of other built-in check commands are also available. In addition to these commands the next few chapters will explain in detail how to set up your own check commands.
Host Check Alternatives¶
If the host is not reachable with ICMP, HTTP, etc. you can also use the dummy CheckCommand to set a default state.
    object Host "dummy-host" {
      check_command = "dummy"
      vars.dummy_state = 0 //Up
      vars.dummy_text = "Everything OK."
    }
This method is also used when you send in external check results.
A more advanced technique is to calculate an overall state based on all services. This is described here.
Templates¶
Templates may be used to apply a set of identical attributes to more than one object:
    template Service "generic-service" {
      max_check_attempts = 3
      check_interval = 5m
      retry_interval = 1m
      enable_perfdata = true
    }
    apply Service "ping4" {
      import "generic-service"
      check_command = "ping4"
      assign where host.address
    }
    apply Service "ping6" {
      import "generic-service"
      check_command = "ping6"
      assign where host.address6
    }
In this example the `ping4` and `ping6` services inherit properties from the template `generic-service`.
Objects as well as templates themselves can import an arbitrary number of other templates. Attributes inherited from a template can be overridden in the object if necessary.
You can also import existing non-template objects.
Note
Templates and objects share the same namespace, i.e. you can’t define a template that has the same name as an object.
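Overriding an inherited attribute can be sketched like this. The service name `ping4-frequent` is hypothetical; the example assumes the `generic-service` template and the host `my-server1` from the earlier examples exist.

```
object Service "ping4-frequent" {
  // Inherits max_check_attempts, check_interval, retry_interval and enable_perfdata.
  import "generic-service"

  host_name = "my-server1"
  check_command = "ping4"

  // Overrides the 5m inherited from the template.
  check_interval = 1m
}
```

Attributes set after the `import` line take precedence over the values provided by the template.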
Multiple Templates¶
The following example uses custom variables which are provided in each template. The `web-server` template is used as the base template for any host providing web services. In addition to that it specifies the custom variable `webserver_type`, e.g. `apache`. Since this template is also the base template, we import the `generic-host` template here. This provides the `check_command` attribute by default and we don’t need to set it anywhere later on.
    template Host "web-server" {
      import "generic-host"
      vars = {
        webserver_type = "apache"
      }
    }
The `wp-server` host template specifies a WordPress instance and sets the `application_type` custom variable. Please note the `+=` operator which adds dictionary items, but does not override any previous `vars` attribute.
    template Host "wp-server" {
      vars += {
        application_type = "wordpress"
      }
    }
The final host object imports both templates. The order is important here: First the base template `web-server` is added to the object, then additional attributes are imported from the `wp-server` template.
    object Host "wp.example.com" {
      import "web-server"
      import "wp-server"
      address = "192.168.56.200"
    }
If you want to override specific attributes inherited from templates, you can specify them on the host object.
    object Host "wp1.example.com" {
      import "web-server"
      import "wp-server"
      vars.webserver_type = "nginx" //overrides attribute from base template
      address = "192.168.56.201"
    }
Custom Variables¶
In addition to built-in object attributes you can define your own custom attributes inside the `vars` attribute.
Tip
This is called custom variables throughout the documentation, backends and web interfaces. Older documentation versions referred to this as custom attribute.
The following example specifies the key `ssh_port` as custom variable and assigns an integer value.
    object Host "localhost" {
      check_command = "ssh"
      vars.ssh_port = 2222
    }
`vars` is a dictionary where you can set specific keys to values. The example above uses the shorter indexer syntax.
An alternative representation can be written like this:
vars = { ssh_port = 2222 }
or
vars["ssh_port"] = 2222
Custom Variable Values¶
Valid values for custom variables include:
- Strings, numbers and booleans
- Arrays and dictionaries
- Functions
You can also define nested values such as dictionaries in dictionaries.
This example defines the custom variable `disks` as dictionary. The first key, `disk /`, is itself set to a dictionary with one key-value pair.
vars.disks["disk /"] = { disk_partitions = "/" }
This can be written as a resolved structure like this:
    vars = {
      disks = {
        "disk /" = {
          disk_partitions = "/"
        }
      }
    }
Keep this in mind when trying to access specific sub-keys in apply rules or functions.
Another example which is shown in the example configuration:
vars.notification["mail"] = { groups = [ "icingaadmins" ] }
This defines the `notification` custom variable as dictionary with the key `mail`. Its value is a dictionary with the key `groups` which itself has an array as value. Note: This array has exactly the form that the `user_groups` attribute for notification apply rules expects.
vars.notification = { mail = { groups = [ "icingaadmins" ] } }
Functions as Custom Variables¶
Icinga 2 lets you specify functions for custom variables. The special case here is that whenever Icinga 2 needs the value for such a custom variable it runs the function and uses whatever value the function returns:
    object CheckCommand "random-value" {
      command = [ PluginDir + "/check_dummy", "0", "$text$" ]
      vars.text = {{ Math.random() * 100 }}
    }
This example uses the abbreviated lambda syntax.
These functions have access to a number of variables:
Variable | Description |
---|---|
user | The User object (for notifications). |
service | The Service object (for service checks/notifications/event handlers). |
host | The Host object. |
command | The command object (e.g. a CheckCommand object for checks). |
Here’s an example:
vars.text = {{ host.check_interval }}
In addition to these variables the `macro` function can be used to retrieve the value of arbitrary macro expressions:
    vars.text = {{
      if (macro("$address$") == "127.0.0.1") {
        log("Running a check for localhost!")
      }
      return "Some text"
    }}
The `resolve_arguments` function can be used to resolve a command and its arguments much in the same fashion Icinga does this for the `command` and `arguments` attributes for commands. The `by_ssh` command uses this functionality to let users specify a command and arguments that should be executed via SSH:
    arguments = {
      "-C" = {{
        var command = macro("$by_ssh_command$")
        var arguments = macro("$by_ssh_arguments$")
        if (typeof(command) == String && !arguments) {
          return command
        }
        var escaped_args = []
        for (arg in resolve_arguments(command, arguments)) {
          escaped_args.add(escape_shell_arg(arg))
        }
        return escaped_args.join(" ")
      }}
      ...
    }
Accessing object attributes at runtime inside these functions is described in the advanced topics chapter.
Runtime Macros¶
Macros can be used to access other objects’ attributes and custom variables at runtime. For example they are used in command definitions to figure out which IP address a check should be run against:
    object CheckCommand "my-ping" {
      command = [ PluginDir + "/check_ping" ]
      arguments = {
        "-H" = "$ping_address$"
        "-w" = "$ping_wrta$,$ping_wpl$%"
        "-c" = "$ping_crta$,$ping_cpl$%"
        "-p" = "$ping_packets$"
      }
      // Resolve from a host attribute, or custom variable.
      vars.ping_address = "$address$"
      // Default values
      vars.ping_wrta = 100
      vars.ping_wpl = 5
      vars.ping_crta = 250
      vars.ping_cpl = 10
      vars.ping_packets = 5
    }
    object Host "router" {
      check_command = "my-ping"
      address = "10.0.0.1"
    }
In this example we are using the `$address$` macro to refer to the host’s `address` attribute.
We can also directly refer to custom variables, e.g. by using `$ping_wrta$`. Icinga automatically tries to find the closest match for the attribute you specified. The exact rules for this are explained in the next section.
Note
When using the `$` sign as single character you must escape it with an additional dollar character (`$$`).
Evaluation Order¶
When executing commands Icinga 2 checks the following objects in this order to look up macros and their respective values:
- User object (only for notifications)
- Service object
- Host object
- Command object
- Global custom variables in the `Vars` constant
This execution order allows you to define default values for custom variables in your command objects.
Here’s how you can override the custom variable `ping_packets` from the previous example:
    object Service "ping" {
      host_name = "localhost"
      check_command = "my-ping"
      vars.ping_packets = 10 // Overrides the default value of 5 given in the command
    }
If a custom variable isn’t defined anywhere, an empty value is used and a warning is written to the Icinga 2 log.
You can also directly refer to a specific attribute – thereby ignoring these evaluation rules – by specifying the full attribute name:
$service.vars.ping_wrta$
This retrieves the value of the `ping_wrta` custom variable for the service. This returns an empty value if the service does not have such a custom variable, no matter whether another object such as the host has this attribute.
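The lookup order for `$ping_packets$` can be sketched across all three levels. The host and service names below are hypothetical; the example assumes the `my-ping` CheckCommand defined earlier, which sets a default of `5`.

```
// Host-level value: used by the host check and by every "my-ping"
// service on this host that does not set its own value.
object Host "branch-router" {
  check_command = "my-ping"
  address = "10.0.0.2"
  vars.ping_packets = 3
}

// No service-level value: $ping_packets$ is taken from the host (3).
object Service "ping-default" {
  host_name = "branch-router"
  check_command = "my-ping"
}

// Service-level value is found first: $ping_packets$ is 10.
object Service "ping-thorough" {
  host_name = "branch-router"
  check_command = "my-ping"
  vars.ping_packets = 10
}
```

A host that sets no `ping_packets` at all would fall back to the default of `5` from the command object.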
Host Runtime Macros¶
The following host macros are available in all commands that are executed for hosts or services:
Name | Description |
---|---|
host.name | The name of the host object. |
host.display_name | The value of the `display_name` attribute. |
host.state | The host’s current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`. |
host.state_id | The host’s current state. Can be one of `0` (up), `1` (down) and `2` (unreachable). |
host.state_type | The host’s current state type. Can be one of `SOFT` and `HARD`. |
host.check_attempt | The current check attempt number. |
host.max_check_attempts | The maximum number of checks which are executed before changing to a hard state. |
host.last_state | The host’s previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`. |
host.last_state_id | The host’s previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable). |
host.last_state_type | The host’s previous state type. Can be one of `SOFT` and `HARD`. |
host.last_state_change | The timestamp of the last state change. |
host.downtime_depth | The number of active downtimes. |
host.duration_sec | The time since the last state change. |
host.latency | The host’s check latency. |
host.execution_time | The host’s check execution time. |
host.output | The last check’s output. |
host.perfdata | The last check’s performance data. |
host.last_check | The timestamp when the last check was executed. |
host.check_source | The monitoring instance that performed the last check. |
host.num_services | Number of services associated with the host. |
host.num_services_ok | Number of services associated with the host which are in an `OK` state. |
host.num_services_warning | Number of services associated with the host which are in a `WARNING` state. |
host.num_services_unknown | Number of services associated with the host which are in an `UNKNOWN` state. |
host.num_services_critical | Number of services associated with the host which are in a `CRITICAL` state. |
In addition to these specific runtime macros, host object attributes can be accessed too.
Service Runtime Macros¶
The following service macros are available in all commands that are executed for services:
Name | Description |
---|---|
service.name | The short name of the service object. |
service.display_name | The value of the `display_name` attribute. |
service.check_command | The short name of the command along with any arguments to be used for the check. |
service.state | The service’s current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`. |
service.state_id | The service’s current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown). |
service.state_type | The service’s current state type. Can be one of `SOFT` and `HARD`. |
service.check_attempt | The current check attempt number. |
service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state. |
service.last_state | The service’s previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`. |
service.last_state_id | The service’s previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown). |
service.last_state_type | The service’s previous state type. Can be one of `SOFT` and `HARD`. |
service.last_state_change | The timestamp of the last state change. |
service.downtime_depth | The number of active downtimes. |
service.duration_sec | The time since the last state change. |
service.latency | The service’s check latency. |
service.execution_time | The service’s check execution time. |
service.output | The last check’s output. |
service.perfdata | The last check’s performance data. |
service.last_check | The timestamp when the last check was executed. |
service.check_source | The monitoring instance that performed the last check. |
In addition to these specific runtime macros, service object attributes can be accessed too.
Command Runtime Macros¶
The following macros are available in all commands:
Name | Description |
---|---|
command.name | The name of the command object. |
User Runtime Macros¶
The following macros are available in all commands that are executed for users:
Name | Description |
---|---|
user.name | The name of the user object. |
user.display_name | The value of the `display_name` attribute. |
In addition to these specific runtime macros, user object attributes can be accessed too.
Notification Runtime Macros¶
Name | Description |
---|---|
notification.type | The type of the notification. |
notification.author | The author of the notification comment if existing. |
notification.comment | The comment of the notification if existing. |
In addition to these specific runtime macros, notification object attributes can be accessed too.
Global Runtime Macros¶
The following macros are available in all executed commands:
Name | Description |
---|---|
icinga.timet | Current UNIX timestamp. |
icinga.long_date_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000` |
icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08` |
icinga.date | Current date. Example: `2014-01-03` |
icinga.time | Current time including timezone information. Example: `11:23:08 +0000` |
icinga.uptime | Current uptime of the Icinga 2 process. |
The following macros provide global statistics:
Name | Description |
---|---|
icinga.num_services_ok | Current number of services in state ‘OK’. |
icinga.num_services_warning | Current number of services in state ‘Warning’. |
icinga.num_services_critical | Current number of services in state ‘Critical’. |
icinga.num_services_unknown | Current number of services in state ‘Unknown’. |
icinga.num_services_pending | Current number of pending services. |
icinga.num_services_unreachable | Current number of unreachable services. |
icinga.num_services_flapping | Current number of flapping services. |
icinga.num_services_in_downtime | Current number of services in downtime. |
icinga.num_services_acknowledged | Current number of acknowledged service problems. |
icinga.num_hosts_up | Current number of hosts in state ‘Up’. |
icinga.num_hosts_down | Current number of hosts in state ‘Down’. |
icinga.num_hosts_unreachable | Current number of unreachable hosts. |
icinga.num_hosts_pending | Current number of pending hosts. |
icinga.num_hosts_flapping | Current number of flapping hosts. |
icinga.num_hosts_in_downtime | Current number of hosts in downtime. |
icinga.num_hosts_acknowledged | Current number of acknowledged host problems. |
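As a rough sketch of how several of these runtime macros can be combined, the notification command below writes one line per service notification to syslog. The command name, the tag `icinga2-notify` and the path to the `logger` binary are assumptions made for this illustration.

```
object NotificationCommand "log-service-notification" {
  command = [
    "/usr/bin/logger",            // assumed location of the logger binary
    "-t", "icinga2-notify",
    // Notification, host, service and global macros resolved at runtime.
    "$notification.type$: $host.name$/$service.name$ is $service.state$ ($service.output$) at $icinga.long_date_time$"
  ]
}
```

Because the text refers to `service.*` macros, the sketch is only meaningful for notifications that are applied to services.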
Apply Rules¶
Several object types require an object relation, e.g. Service, Notification, Dependency, ScheduledDowntime objects. The object relations are documented in the linked chapters.
If, for example, you create a service object, you have to specify the `host_name` attribute and reference an existing host object.
    object Service "ping4" {
      check_command = "ping4"
      host_name = "icinga2-agent1.localdomain"
    }
This isn’t comfortable when managing a huge set of configuration objects which could match on a common pattern.
Instead you want to use apply rules.
If you want basic monitoring for all your hosts, add a `ping4` service apply rule for all hosts which have the `address` attribute specified. Just one rule for 1000 hosts instead of 1000 service objects. Apply rules will automatically generate them for you.
    apply Service "ping4" {
      check_command = "ping4"
      assign where host.address
    }
More explanations on assign where expressions can be found here.
Apply Rules: Prerequisites¶
Before you start with apply rules keep the following in mind:
- Define the best match.
  - A set of unique custom variables for these hosts/services?
  - Or group memberships, e.g. a host being a member of a hostgroup which should have a service set?
  - A generic pattern match on the host/service name?
  - Multiple expressions combined with `&&` or `||` operators
- All expressions must return a boolean value (e.g. an empty string is equal to `false`)
More specific object type requirements are described in these chapters:
- Apply services to hosts
- Apply notifications to hosts and services
- Apply dependencies to hosts and services
- Apply scheduled downtimes to hosts and services
Apply Rules: Usage Examples¶
You can set/override object attributes in apply rules using the respectively available objects in that scope (host and/or service objects).
vars.application_type = host.vars.application_type
Custom variables can also store nested dictionaries and arrays. That way you can use them not only for matching on their existence or values in apply expressions, but also to assign (“inherit”) their values to the objects generated from apply rules.
Remember the examples shown for custom variable values:
vars.notification["mail"] = { groups = [ "icingaadmins" ] }
You can do two things here:
- Check for the existence of the `notification` custom variable and its nested dictionary key `mail`. If this is boolean true, the notification object will be generated.
- Assign the value of the `groups` key to the `user_groups` attribute.
    apply Notification "mail-icingaadmin" to Host {
      [...]
      user_groups = host.vars.notification.mail.groups
      assign where host.vars.notification.mail
    }
A more advanced example is to use apply rules with for loops on arrays or dictionaries provided by custom variables or groups.
Remember the examples shown for custom variable values:
    vars.disks["disk /"] = { disk_partitions = "/" }
You can iterate over all dictionary keys defined in `disks`. You can optionally use the value to specify additional object attributes.
    apply Service for (disk => config in host.vars.disks) {
      [...]
      vars.disk_partitions = config.disk_partitions
    }
Please read the apply for chapter for more specific insights.
Tip
Building configuration in that dynamic way requires detailed information of the generated objects. Use the `object list` CLI command after successful configuration validation.
Apply Rules Expressions¶
You can use simple or advanced combinations of apply rule expressions. For an object to match, an expression must evaluate to the boolean `true` value. An empty string, for instance, will be interpreted as `false`. In a similar fashion undefined attributes will return `false`.
Returns `false`:
    assign where host.vars.attribute_does_not_exist
Multiple `assign where` condition rows are evaluated as `OR` condition.
You can combine multiple expressions to match only a subset of objects. In some cases, you want to be able to add more than one assign/ignore where expression which matches a specific condition. To achieve this you can use the logical `and` and `or` operators.
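The difference between several condition rows and one combined expression can be sketched as follows. The rule names and the custom variables `os` and `no_ssh_check` are hypothetical.

```
// Two assign rows: a host matches if either expression is true (OR).
// A matching ignore row excludes the host again.
apply Service "ssh-by-rows" {
  check_command = "ssh"
  assign where host.vars.os == "Linux"
  assign where host.vars.os == "BSD"
  ignore where host.vars.no_ssh_check == true
}

// Roughly the same selection written as one combined expression.
apply Service "ssh-combined" {
  check_command = "ssh"
  assign where (host.vars.os == "Linux" || host.vars.os == "BSD") && host.vars.no_ssh_check != true
}
```

Separate rows tend to be easier to read and extend, while a single expression keeps the whole condition in one place.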
Apply Rules Expressions Examples¶
Assign a service to a specific host in a host group array using the `in` operator:
    assign where "hostgroup-dev" in host.groups
Assign an object when a custom variable is equal to a value:
    assign where host.vars.application_type == "database"
    assign where service.vars.sms_notify == true
Assign an object if a dictionary contains a given key:
    assign where host.vars.app_dict.contains("app")
Match the host name by using a case insensitive match:
    assign where match("webserver*", host.name)
Match the host name by using a regular expression. Please note the escaped backslash character:
    assign where regex("^webserver-[\\d+]", host.name)
Match all hosts whose name matches the `*mysql*` pattern and (`&&`) whose custom variable `prod_mysql_db` matches the `db-*` pattern. All hosts with the custom variable `test_server` set to `true` should be ignored, as well as any host whose name matches the `*internal` pattern.
    object HostGroup "mysql-server" {
      display_name = "MySQL Server"
      assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }
Similar example for advanced notification apply rule filters: The notification is assigned if the service attribute `notes` matches the `has gold support 24x7` string `AND` one of two conditions passes: either the `customer` host custom variable is set to `customer-xy` `OR` the host custom variable `always_notify` is set to `true`.
The notification is ignored for services whose host name ends with `*internal` `OR` whose `priority` custom variable is less than `2` while the host custom variable `is_clustered` is set to `true`.
    template Notification "cust-xy-notification" {
      users = [ "noc-xy", "mgmt-xy" ]
      command = "mail-service-notification"
    }
    apply Notification "notify-cust-xy-mysql" to Service {
      import "cust-xy-notification"
      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }
More advanced examples are covered here.
Apply Services to Hosts¶
The sample configuration already includes a detailed example in hosts.conf and services.conf for this use case.
The example for `ssh` applies a service object to all hosts with the `address` attribute being defined and the custom variable `os` set to the string `Linux` in `vars`.
    apply Service "ssh" {
      import "generic-service"
      check_command = "ssh"
      assign where host.address && host.vars.os == "Linux"
    }
Other detailed examples are used in their respective chapters, for example apply services with custom command arguments.
Apply Notifications to Hosts and Services¶
Notifications are applied to specific targets (`Host` or `Service`) and work in a similar manner:
    apply Notification "mail-noc" to Service {
      import "mail-service-notification"
      user_groups = [ "noc" ]
      assign where host.vars.notification.mail
    }
In this example the `mail-noc` notification will be created as object for all services whose host has the `notification.mail` custom variable defined. The notification command is set to `mail-service-notification` and all members of the user group `noc` will get notified.
It is also possible to generally apply a notification template and dynamically overwrite values from the template by checking for custom variables. This can be achieved by using conditional statements:
    apply Notification "host-mail-noc" to Host {
      import "mail-host-notification"
      // replace interval inherited from `mail-host-notification` template with new notification interval set by a host custom variable
      if (host.vars.notification_interval) {
        interval = host.vars.notification_interval
      }
      // same with notification period
      if (host.vars.notification_period) {
        period = host.vars.notification_period
      }
      // Send SMS instead of email if the host's custom variable `notification_type` is set to `sms`
      if (host.vars.notification_type == "sms") {
        command = "sms-host-notification"
      } else {
        command = "mail-host-notification"
      }
      user_groups = [ "noc" ]
      assign where host.address
    }
In the example above the notification template `mail-host-notification` contains all relevant notification settings. The apply rule is applied on all host objects where the `host.address` is defined.
If the host object has a specific custom variable set, its value is inherited into the local notification object scope, e.g. `host.vars.notification_interval`, `host.vars.notification_period` and `host.vars.notification_type`. This overwrites attributes already specified in the imported `mail-host-notification` template.
The corresponding host object could look like this:
    object Host "host1" {
      import "host-linux-prod"
      display_name = "host1"
      address = "192.168.1.50"
      vars.notification_interval = 1h
      vars.notification_period = "24x7"
      vars.notification_type = "sms"
    }
Apply Dependencies to Hosts and Services¶
Detailed examples can be found in the dependencies chapter.
Apply Recurring Downtimes to Hosts and Services¶
The sample configuration includes an example in downtimes.conf.
Detailed examples can be found in the recurring downtimes chapter.
Using Apply For Rules¶
In addition to the standard way of using apply rules, you may need to apply objects based on a set (array or dictionary) using apply for expressions.
The sample configuration already includes a detailed example in hosts.conf and services.conf for this use case.
Take the following example: A host provides the SNMP OIDs for different service check types. This could look like the following example:
    object Host "router-v6" {
      check_command = "hostalive"
      address6 = "2001:db8:1234::42"
      vars.oids["if01"] = "1.1.1.1.1"
      vars.oids["temp"] = "1.1.1.1.2"
      vars.oids["bgp"] = "1.1.1.1.5"
    }
The idea is to create service objects for `if01` and `temp` but not `bgp`. The OID value should also be used as service custom variable `snmp_oid`. This is the command argument required by the snmp check command. The service’s `display_name` should be set to the identifier inside the dictionary, e.g. `if01`.
    apply Service for (identifier => oid in host.vars.oids) {
      check_command = "snmp"
      display_name = identifier
      vars.snmp_oid = oid
      ignore where identifier == "bgp" //don't generate service for bgp checks
    }
Icinga 2 evaluates the `apply for` rule for all objects with the custom variable `oids` set. It iterates over all dictionary items inside the `for` loop and evaluates the `assign/ignore where` expressions. You can access the loop variable in these expressions, e.g. to ignore specific values.
In this example the `bgp` identifier is ignored. This avoids generating unwanted services. A different approach would be to match the `oid` value with a regex/wildcard match pattern for example.
    ignore where regex("^\\d\\.\\d\\.\\d\\.\\d\\.5$", oid)
Note
You don’t need an `assign where` expression which checks for the existence of the `oids` custom variable.
This method saves you from creating multiple apply rules. It also moves the attribute specification logic from the service to the host.
Apply For and Custom Variable Override¶
Imagine a different, more advanced example: You are monitoring your network device (host) with many interfaces (services). The following requirements/problems apply:
- Each interface service should be named with a prefix and a name defined in your host object (which could be generated from your CMDB, etc.)
- Each interface has its own VLAN tag
- Some interfaces have QoS enabled
- Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be dynamically generated.
Tip
Define the SNMP community as global constant in your constants.conf file.
    const IftrafficSnmpCommunity = "public"
Define the `interfaces` custom variable on the `cisco-catalyst-6509-34` host object and add three example interfaces as dictionary keys.
Specify additional attributes inside the nested dictionary as learned with custom variable values:
    object Host "cisco-catalyst-6509-34" {
      import "generic-host"
      display_name = "Catalyst 6509 #34 VIE21"
      address = "127.0.1.4"
      /* "GigabitEthernet0/2" is the interface name,
       * and key name in service apply for later on
       */
      vars.interfaces["GigabitEthernet0/2"] = {
        /* define all custom variables with the
         * same name required for command parameters/arguments
         * in service apply (look into your CheckCommand definition)
         */
        iftraffic_units = "g"
        iftraffic_community = IftrafficSnmpCommunity
        iftraffic_bandwidth = 1
        vlan = "internal"
        qos = "disabled"
      }
      vars.interfaces["GigabitEthernet0/4"] = {
        iftraffic_units = "g"
        //iftraffic_community = IftrafficSnmpCommunity
        iftraffic_bandwidth = 1
        vlan = "remote"
        qos = "enabled"
      }
      vars.interfaces["MgmtInterface1"] = {
        iftraffic_community = IftrafficSnmpCommunity
        vlan = "mgmt"
        interface_address = "127.99.0.100" #special management ip
      }
    }
Start with the apply for definition and iterate over `host.vars.interfaces`. This is a dictionary and should use the variables `interface_name` as key and `interface_config` as value for each generated object scope.
`"if-"` specifies the object name prefix for each service which results in `if-<interface_name>` for each iteration.
    /* loop over the host.vars.interfaces dictionary
     * for (key => value in dict) means `interface_name` as key
     * and `interface_config` as value. Access config attributes
     * with the indexer (`.`) character.
     */
    apply Service "if-" for (interface_name => interface_config in host.vars.interfaces) {
Import the `generic-service` template, assign the iftraffic `check_command`. Use the dictionary key `interface_name` to set a proper `display_name` string for external interfaces.
      import "generic-service"
      check_command = "iftraffic"
      display_name = "IF-" + interface_name
The `interface_name` key’s value is the same string used as command parameter for `iftraffic`:
      /* use the key as command argument (no duplication of values in host.vars.interfaces) */
      vars.iftraffic_interface = interface_name
Remember that interface_config is a nested dictionary. In the first iteration it looks like this:

    interface_config = {
      iftraffic_units = "g"
      iftraffic_community = IftrafficSnmpCommunity
      iftraffic_bandwidth = 1
      vlan = "internal"
      qos = "disabled"
    }

Access the dictionary keys with the indexer syntax and assign them to custom variables used as command parameters for the iftraffic check command.

      /* map the custom variables as command arguments */
      vars.iftraffic_units = interface_config.iftraffic_units
      vars.iftraffic_community = interface_config.iftraffic_community

If you just want to inherit all attributes specified inside the interface_config dictionary, add it to the generated service custom variables like this:

      /* the above can be achieved in a shorter fashion if the names inside host.vars.interfaces
       * are the _exact_ same as required as command parameter by the check command
       * definition.
       */
      vars += interface_config

If the user did not specify default values for required service custom variables, add them here. This also helps to avoid unwanted configuration validation errors or runtime failures. Please read more about conditional statements here.

      /* set a default value for units and bandwidth */
      if (interface_config.iftraffic_units == "") {
        vars.iftraffic_units = "m"
      }
      if (interface_config.iftraffic_bandwidth == "") {
        vars.iftraffic_bandwidth = 1
      }
      if (interface_config.vlan == "") {
        vars.vlan = "not set"
      }
      if (interface_config.qos == "") {
        vars.qos = "not set"
      }
If the host object did not specify a custom SNMP community, set a default value specified by the global constant IftrafficSnmpCommunity.

      /* set the global constant if not explicitly
       * provided by the `interfaces` dictionary on the host
       */
      if (len(interface_config.iftraffic_community) == 0 || len(vars.iftraffic_community) == 0) {
        vars.iftraffic_community = IftrafficSnmpCommunity
      }
Use the provided values to calculate more object attributes, which can be seen in external interfaces, for example.

      /* Calculate some additional object attributes after populating the `vars` dictionary */
      notes = "Interface check for " + interface_name + " (units: '" + interface_config.iftraffic_units + "') in VLAN '" + vars.vlan + "' with QoS '" + vars.qos + "'"
      notes_url = "https://foreman.company.com/hosts/" + host.name
      action_url = "https://snmp.checker.company.com/" + host.name + "/if-" + interface_name
    }
Tip
Building configuration in that dynamic way requires detailed information of the generated objects. Use the object list CLI command after successful configuration validation.
Verify that the apply-for-rule successfully created the service objects with the inherited custom variables:
    # icinga2 daemon -C
    # icinga2 object list --type Service --name *catalyst*

    Object 'cisco-catalyst-6509-34!if-GigabitEthernet0/2' of type 'Service':
    ......
      * vars
        % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26
        * iftraffic_bandwidth = 1
        * iftraffic_community = "public"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65
        * iftraffic_interface = "GigabitEthernet0/2"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43
        * iftraffic_units = "g"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57
        * qos = "disabled"
        * vlan = "internal"

    Object 'cisco-catalyst-6509-34!if-GigabitEthernet0/4' of type 'Service':
    ...
      * vars
        % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26
        * iftraffic_bandwidth = 1
        * iftraffic_community = "public"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 79:5-79:53
        * iftraffic_interface = "GigabitEthernet0/4"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43
        * iftraffic_units = "g"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57
        * qos = "enabled"
        * vlan = "remote"

    Object 'cisco-catalyst-6509-34!if-MgmtInterface1' of type 'Service':
    ...
      * vars
        % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26
        * iftraffic_bandwidth = 1
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 66:5-66:32
        * iftraffic_community = "public"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65
        * iftraffic_interface = "MgmtInterface1"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43
        * iftraffic_units = "m"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 63:5-63:30
        * interface_address = "127.99.0.100"
        * qos = "not set"
          % = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 72:5-72:24
        * vlan = "mgmt"
Use Object Attributes in Apply Rules¶
Since apply rules are evaluated after the generic objects, you can reference existing host and/or service object attributes as values for any object attribute specified in that apply rule.
    object Host "opennebula-host" {
      import "generic-host"
      address = "10.1.1.2"

      vars.hosting["cust1"] = {
        http_uri = "/shop"
        customer_name = "Customer 1"
        customer_id = "7568"
        support_contract = "gold"
      }
      vars.hosting["cust2"] = {
        http_uri = "/"
        customer_name = "Customer 2"
        customer_id = "7569"
        support_contract = "silver"
      }
    }

hosting is a custom variable with the Dictionary value type. This is mandatory to iterate with the key => value notation in the apply for rule below.

    apply Service for (customer => config in host.vars.hosting) {
      import "generic-service"
      check_command = "ping4"

      vars.qos = "disabled"

      vars += config

      vars.http_uri = "/" + customer + "/" + config.http_uri

      display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id

      notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."

      notes_url = "https://foreman.company.com/hosts/" + host.name
      action_url = "https://snmp.checker.company.com/" + host.name + "/" + vars.customer_id
    }
Each loop iteration has different values for customer and config in the local scope.
1.

    customer = "cust1"
    config = {
      http_uri = "/shop"
      customer_name = "Customer 1"
      customer_id = "7568"
      support_contract = "gold"
    }

2.

    customer = "cust2"
    config = {
      http_uri = "/"
      customer_name = "Customer 2"
      customer_id = "7569"
      support_contract = "silver"
    }
You can now add the config dictionary into vars.
vars += config
Now it looks like the following in the first iteration:
    customer = "cust1"
    vars = {
      http_uri = "/shop"
      customer_name = "Customer 1"
      customer_id = "7568"
      support_contract = "gold"
    }

Remember, you know this structure already. Custom attributes can also be accessed by using the indexer syntax.
vars.http_uri = ... + config.http_uri
can also be written as
    vars += config
    vars.http_uri = ... + vars.http_uri
Groups¶
A group is a collection of similar objects. Groups are primarily used as a visualization aid in web interfaces.
Group membership is defined at the respective object itself. If you have a host group named windows, for example, and want to assign specific hosts to this group so you can view the group on your alert dashboard later, first create a HostGroup object:

    object HostGroup "windows" {
      display_name = "Windows Servers"
    }

Then add your hosts to this group:

    template Host "windows-server" {
      groups += [ "windows" ]
    }

    object Host "mssql-srv1" {
      import "windows-server"

      vars.mssql_port = 1433
    }

    object Host "mssql-srv2" {
      import "windows-server"

      vars.mssql_port = 1433
    }
This can be done for service and user groups the same way:
    object UserGroup "windows-mssql-admins" {
      display_name = "Windows MSSQL Admins"
    }

    template User "generic-windows-mssql-users" {
      groups += [ "windows-mssql-admins" ]
    }

    object User "win-mssql-noc" {
      import "generic-windows-mssql-users"

      email = "noc@example.com"
    }

    object User "win-mssql-ops" {
      import "generic-windows-mssql-users"

      email = "ops@example.com"
    }
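Service groups work the same way. The following is a minimal sketch: the group name mssql-checks, the template mssql-service and the service mssql-port are hypothetical names chosen for illustration, and the tcp check command with its tcp_port parameter is assumed to be available from the ITL.

```
object ServiceGroup "mssql-checks" {
  display_name = "MSSQL Checks"
}

/* Hypothetical template: every service importing it joins the group. */
template Service "mssql-service" {
  groups += [ "mssql-checks" ]
}

/* Hypothetical service: checks the port stored on the host object. */
apply Service "mssql-port" {
  import "generic-service"
  import "mssql-service"

  check_command = "tcp"
  vars.tcp_port = host.vars.mssql_port

  assign where host.vars.mssql_port
}
```

Using a template keeps the group membership in one place, so renaming the group only requires a single change.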
Group Membership Assign¶
Instead of manually assigning each object to a group you can also assign objects to a group based on their attributes:
    object HostGroup "prod-mssql" {
      display_name = "Production MSSQL Servers"

      assign where host.vars.mssql_port && host.vars.prod_mysql_db
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }
In this example all hosts with the vars attributes mssql_port and prod_mysql_db set will be added as members to the host group prod-mssql. However, all hosts matching the pattern *internal or with the test_server attribute set to true are not added to this group.
Details on the assign where syntax can be found in the Language Reference.
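The same mechanism applies to service groups, where both the host and the service scope are available in the expression. A short sketch, assuming hypothetical group names and the custom variable host.vars.os as used elsewhere in this chapter:

```
/* All services whose name starts with "http". */
object ServiceGroup "http" {
  display_name = "HTTP Checks"

  assign where match("http*", service.name)
}

/* Hypothetical group combining a service and a host condition. */
object ServiceGroup "linux-ssh" {
  display_name = "SSH on Linux"

  assign where service.check_command == "ssh" && host.vars.os == "Linux"
}
```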
Notifications¶
Notifications for service and host problems are an integral part of your monitoring setup.
When a host or service is in a downtime, a problem has been acknowledged or the dependency logic determined that the host/service is unreachable, no notifications are sent. You can configure additional type and state filters refining the notifications being actually sent.
There are many ways of sending notifications, e.g. by email, XMPP, IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications. Instead it relies on external mechanisms such as shell scripts to notify users. More notification methods are listed in the addons and plugins chapter.
A notification specification requires one or more users (and/or user groups) who will be notified in case of problems. These users must have all custom attributes defined which will be used in the NotificationCommand on execution.
The user icingaadmin in the example below will get notified only on Warning and Critical problems. In addition to that, Recovery notifications are sent (they require the OK state).

    object User "icingaadmin" {
      display_name = "Icinga 2 Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }
If you don’t set the states and types configuration attributes for the User object, notifications for all states and types will be sent.
Details on troubleshooting notification problems can be found here.
Note
Make sure that the notification feature is enabled in order to execute notification commands.
You should choose which information you (and your notified users) are interested in in case of emergency, and also which information does not provide any value to you and your environment.
An example notification command is explained here.
You can add all shared attributes to a Notification template which is inherited by the defined notifications. That way you avoid duplicating attributes in each Notification object. Attributes can be overridden locally.

    template Notification "generic-notification" {
      interval = 15m

      command = "mail-service-notification"

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]

      period = "24x7"
    }
The time period 24x7 is included as example configuration with Icinga 2.
Use the apply keyword to create Notification objects for your services:

    apply Notification "notify-cust-xy-mysql" to Service {
      import "generic-notification"

      users = [ "noc-xy", "mgmt-xy" ]

      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }
Instead of assigning users to notifications, you can also add the user_groups attribute with a list of user groups to the Notification object. Icinga 2 will send notifications to all group members.
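As a sketch of the user_groups attribute, the following rule notifies a whole group instead of individual users. The group name noc-xy-oncall and the notification name are hypothetical; the generic-notification template and the host.vars.customer custom variable are taken from the examples above.

```
/* Hypothetical user group; members join via their own `groups` attribute. */
object UserGroup "noc-xy-oncall" {
  display_name = "NOC XY On-Call"
}

apply Notification "notify-cust-xy-oncall" to Service {
  import "generic-notification"

  /* Every member of the group receives the notification. */
  user_groups = [ "noc-xy-oncall" ]

  assign where host.vars.customer == "customer-xy"
}
```

Both users and user_groups can be set on the same Notification object if needed.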
Note
Only users who have been notified of a problem before (Warning, Critical, Unknown states for services, Down for hosts) will receive Recovery notifications.
Icinga 2 v2.10 allows you to configure a User object with Acknowledgement and/or Recovery without a Problem notification. These notifications will be sent without any problem notifications beforehand, and can be used for e.g. ticket systems.

    object User "ticketadmin" {
      display_name = "Ticket Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Acknowledgement, Recovery ]
      email = "ticket@localhost"
    }
Notifications: Users from Host/Service¶
A common pattern is to store the users and user groups on the host or service objects instead of the notification object itself.
The sample configuration provided in hosts.conf and notifications.conf already provides an example for this pattern.
Tip
Please make sure to read the apply and custom variable values chapters to fully understand these examples.
Specify the users and groups as nested custom variables on the host object:

    object Host "icinga2-agent1.localdomain" {
      [...]

      vars.notification["mail"] = {
        groups = [ "icingaadmins" ]
        users = [ "icingaadmin" ]
      }
      vars.notification["sms"] = {
        users = [ "icingaadmin" ]
      }
    }

As you can see, there is the option to use two different notification apply rules here: one for mail and one for sms.
This example assigns the users and groups nested keys from the notification custom variable to the actual notification object attributes.
Since errors are hard to debug if host objects don’t specify the required configuration attributes, you can add a safety condition which logs which host object is affected.
critical/config: Host 'icinga2-client3.localdomain' does not specify required user/user_groups configuration attributes for notification 'mail-icingaadmin'.
You can also use the script debugger for more advanced insights.

    apply Notification "mail-host-notification" to Host {
      [...]

      /* Log which host does not specify required user/user_groups attributes. This will fail immediately during config validation and help a lot. */
      if (len(host.vars.notification.mail.users) == 0 && len(host.vars.notification.mail.groups) == 0) {
        log(LogCritical, "config", "Host '" + host.name + "' does not specify required user/user_groups configuration attributes for notification '" + name + "'.")
      }

      users = host.vars.notification.mail.users
      user_groups = host.vars.notification.mail.groups

      assign where host.vars.notification.mail && typeof(host.vars.notification.mail) == Dictionary
    }

    apply Notification "sms-host-notification" to Host {
      [...]

      /* Log which host does not specify required user/user_groups attributes. This will fail immediately during config validation and help a lot. */
      if (len(host.vars.notification.sms.users) == 0 && len(host.vars.notification.sms.groups) == 0) {
        log(LogCritical, "config", "Host '" + host.name + "' does not specify required user/user_groups configuration attributes for notification '" + name + "'.")
      }

      users = host.vars.notification.sms.users
      user_groups = host.vars.notification.sms.groups

      assign where host.vars.notification.sms && typeof(host.vars.notification.sms) == Dictionary
    }
The example above uses typeof as a safety function to ensure that the mail key really provides a dictionary as value. Otherwise the configuration validation could fail if an admin adds something like this on another host:

    vars.notification.mail = "yes"

You can also do a more fine-grained assignment on the service object:

    apply Service "http" {
      [...]

      vars.notification["mail"] = {
        groups = [ "icingaadmins" ]
        users = [ "icingaadmin" ]
      }

      [...]
    }
This notification apply rule is different to the one above. The service notification users and groups are inherited from the service and if not set, from the host object. A default user is set too.
    apply Notification "mail-service-notification" to Service {
      [...]

      if (service.vars.notification.mail.users) {
        users = service.vars.notification.mail.users
      } else if (host.vars.notification.mail.users) {
        users = host.vars.notification.mail.users
      } else {
        /* Default user who receives everything. */
        users = [ "icingaadmin" ]
      }

      if (service.vars.notification.mail.groups) {
        user_groups = service.vars.notification.mail.groups
      } else if (host.vars.notification.mail.groups) {
        user_groups = host.vars.notification.mail.groups
      }

      assign where ( host.vars.notification.mail && typeof(host.vars.notification.mail) == Dictionary ) || ( service.vars.notification.mail && typeof(service.vars.notification.mail) == Dictionary )
    }
Notification Escalations¶
When a problem notification is sent and the problem still exists at the time of re-notification, you may want to escalate the problem to the next support level. A different approach is to configure the default notification by email, and escalate the problem via SMS if not already solved.
You can define notification start and end times as additional configuration attributes, making the Notification object a so-called notification escalation. Using templates you can share the basic notification attributes such as users or the interval (and override them for the escalation then).
Using the example from above, you can define additional users being escalated for SMS notifications between start and end time.
    object User "icinga-oncall-2nd-level" {
      display_name = "Icinga 2nd Level"

      vars.mobile = "+1 555 424642"
    }

    object User "icinga-oncall-1st-level" {
      display_name = "Icinga 1st Level"

      vars.mobile = "+1 555 424642"
    }

Define an additional NotificationCommand for SMS notifications.
Note
The example is not complete as there are many different SMS providers. Please note that sending SMS notifications will require an SMS provider or local hardware with an active SIM card.
    object NotificationCommand "sms-notification" {
      command = [
        PluginDir + "/send_sms_notification",
        "$mobile$",
        "..."
      ]
    }

The two new notification escalations are added onto the local host and its service ping4 using the generic-notification template. The user icinga-oncall-2nd-level will get notified by SMS (sms-notification command) after 30m until 1h.
Note
The interval was set to 15m in the generic-notification template example. Lower that value in your escalations by using a secondary template or by overriding the attribute directly in the escalation-sms-2nd-level notification.
If the problem is neither resolved nor acknowledged (either of which would prevent further notifications), the escalation-sms-1st-level notification notifies the icinga-oncall-1st-level user 1h after the initial problem was notified, but only for one hour (2h as end key for the times dictionary).

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-2nd-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-2nd-level" ]

      times = {
        begin = 30m
        end = 1h
      }

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-1st-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-1st-level" ]

      times = {
        begin = 1h
        end = 2h
      }

      assign where service.name == "ping4"
    }
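To lower the re-notification interval for escalations only, a secondary template can inherit from generic-notification and override interval. This is a sketch under assumptions: the template name generic-escalation, the rule name escalation-sms-2nd-level-fast and the value 5m are hypothetical and only illustrate the override.

```
/* Hypothetical secondary template: inherits everything from
 * generic-notification, but re-notifies every 5 minutes.
 */
template Notification "generic-escalation" {
  import "generic-notification"

  interval = 5m
}

apply Notification "escalation-sms-2nd-level-fast" to Service {
  import "generic-escalation"

  command = "sms-notification"
  users = [ "icinga-oncall-2nd-level" ]

  /* Escalation window: from 30 minutes to 1 hour after the problem. */
  times = {
    begin = 30m
    end = 1h
  }

  assign where service.name == "ping4"
}
```

With the 15m interval from generic-notification, a 30 minute escalation window leaves room for very few notifications; a shorter interval makes the escalation more likely to be noticed.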
Notification Delay¶
Sometimes the problem in question should not be announced when the notification is due (the object reaching the HARD state), but after a certain period. In Icinga 2 you can use the times dictionary and set begin = 15m as key and value if you want to postpone the notification window for 15 minutes. Leave out the end key; if it is not set, Icinga 2 will not check against any end time for this notification.
Note
Setting the end key to 0 will stop sending notifications immediately when a problem occurs, effectively disabling the notification.
Make sure to specify a relatively low notification interval to get notified soon enough again.

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      interval = 5m

      times.begin = 15m // delay notification window

      assign where service.name == "ping4"
    }

Also note that this mechanism doesn’t take downtimes etc. into account; only the HARD state change time matters. E.g. for a problem which occurred in the middle of a downtime from 2 PM to 4 PM, times.begin = 2h means 5 PM, not 6 PM.
Disable Re-notifications¶
If you prefer to be notified only once, you can disable re-notifications by setting the interval attribute to 0.

    apply Notification "notify-once" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      interval = 0 // disable re-notification

      assign where service.name == "ping4"
    }
Notification Filters by State and Type¶
If there are no notification state and type filter attributes defined at the Notification or User object, Icinga 2 assumes that all states and types are being notified.
Available state and type filters for notifications are:

    template Notification "generic-notification" {

      states = [ OK, Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
    }
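The template above lists the service states. For host notifications the state filter uses Up and Down instead. A sketch along the lines of the sample configuration, with a hypothetical template name:

```
/* Hypothetical template name; host notifications filter on Up/Down. */
template Notification "generic-host-notification" {
  states = [ Up, Down ]
  types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
            FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
}
```

Narrow either list to reduce noise, for example by dropping the flapping and downtime types for recipients who only care about problems and recoveries.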
Commands¶
Icinga 2 uses three different command object types to specify how checks should be performed, notifications should be sent, and events should be handled.
Check Commands¶
CheckCommand objects define the command line used to call a check.
CheckCommand objects are referenced by Host and Service objects using the check_command attribute.
Note
Make sure that the checker feature is enabled in order to execute checks.
Integrate the Plugin with a CheckCommand Definition¶
Unless you have done so already, download your check plugin and put it into the PluginDir directory. The following example uses the check_mysql plugin contained in the Monitoring Plugins package.
The plugin path and all command arguments are specified as a list of double-quoted string arguments for proper shell escaping.
Call the plugin with the --help parameter to see all available options. The check_disk plugin, for example, defines warning (-w) and critical (-c) thresholds for the disk usage; without any partition defined (-p), it will check all local partitions. For check_mysql, the help output looks like this:

    icinga@icinga2 $ /usr/lib64/nagios/plugins/check_mysql --help
    ...
    This program tests connections to a MySQL server

    Usage:
    check_mysql [-d database] [-H host] [-P port] [-s socket]
    [-u user] [-p password] [-S] [-l] [-a cert] [-k key]
    [-C ca-cert] [-D ca-dir] [-L ciphers] [-f optfile] [-g group]
The next step is to understand how command parameters are passed from a host or service object, and to add a CheckCommand definition based on these required parameters and/or default values.
Please continue reading in the plugins section for additional integration examples.
Passing Check Command Parameters from Host or Service¶
Check command parameters are defined as custom variables which can be accessed as runtime macros by the executed check command.
The check command parameters for ITL provided plugin check command definitions are documented here, for example disk.
In order to practice passing command parameters you should integrate your own plugin.
The following example will use check_mysql provided by the Monitoring Plugins.
Define the default check command custom variables, for example mysql_user and mysql_password (freely definable naming schema), and optionally their default threshold values. You can then use these custom variables as runtime macros for command arguments on the command line.
Tip
Use a common command type as prefix for your command arguments to increase readability. mysql_user helps understanding the context better than just user as argument.
The default custom variables can be overridden by the custom variables defined in the host or service using the check command my-mysql. The custom variables can also be inherited from a parent template using additive inheritance (+=).

    # vim /etc/icinga2/conf.d/commands.conf

    object CheckCommand "my-mysql" {
      command = [ PluginDir + "/check_mysql" ] //constants.conf -> const PluginDir

      arguments = {
        "-H" = "$mysql_host$"
        "-u" = {
          required = true
          value = "$mysql_user$"
        }
        "-p" = "$mysql_password$"
        "-P" = "$mysql_port$"
        "-s" = "$mysql_socket$"
        "-a" = "$mysql_cert$"
        "-d" = "$mysql_database$"
        "-k" = "$mysql_key$"
        "-C" = "$mysql_ca_cert$"
        "-D" = "$mysql_ca_dir$"
        "-L" = "$mysql_ciphers$"
        "-f" = "$mysql_optfile$"
        "-g" = "$mysql_group$"
        "-S" = {
          set_if = "$mysql_check_slave$"
          description = "Check if the slave thread is running properly."
        }
        "-l" = {
          set_if = "$mysql_ssl$"
          description = "Use ssl encryption"
        }
      }

      vars.mysql_check_slave = false
      vars.mysql_ssl = false
      vars.mysql_host = "$address$"
    }
The check command definition also sets mysql_host to the $address$ default value. You can override this command parameter if, for example, your MySQL host is not running on the same server’s IP address.
Make sure to pass all required command parameters, such as mysql_user, mysql_password and mysql_database. MysqlUsername and MysqlPassword are specified as global constants in this example.

    # vim /etc/icinga2/conf.d/services.conf

    apply Service "mysql-icinga-db-health" {
      import "generic-service"

      check_command = "my-mysql"

      vars.mysql_user = MysqlUsername
      vars.mysql_password = MysqlPassword

      vars.mysql_database = "icinga"
      vars.mysql_host = "192.168.33.11"

      assign where match("icinga2*", host.name)
      ignore where host.vars.no_health_check == true
    }
Take a different example: The example host configuration in hosts.conf also applies an ssh service check. Your host’s ssh port is not the default 22, but set to 2022. You can pass the command parameter as custom variable ssh_port directly inside the service apply rule inside services.conf:

    apply Service "ssh" {
      import "generic-service"

      check_command = "ssh"
      vars.ssh_port = 2022 //custom command parameter

      assign where (host.address || host.address6) && host.vars.os == "Linux"
    }

If you prefer this being configured at the host instead of the service, modify the host configuration object instead. The runtime macro resolving order is described here.

    object Host "icinga2-agent1.localdomain" {
      ...
      vars.ssh_port = 2022
    }
Passing Check Command Parameters Using Apply For¶
The host my-server with the generated services from the basic-partitions dictionary (see apply for for details) checks a basic set of disk partitions with modified custom variables (warning thresholds at 10%, critical thresholds at 5% free disk space).
The custom variable disk_partitions can either hold a single string or an array of string values for passing multiple partitions to the check_disk check plugin.

    object Host "my-server" {
      import "generic-host"
      address = "127.0.0.1"
      address6 = "::1"

      vars.local_disks["basic-partitions"] = {
        disk_partitions = [ "/", "/tmp", "/var", "/home" ]
      }
    }

    apply Service for (disk => config in host.vars.local_disks) {
      import "generic-service"
      check_command = "my-disk"

      vars += config

      vars.disk_wfree = "10%"
      vars.disk_cfree = "5%"
    }

More details on using arrays in custom variables can be found in this chapter.
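The apply for rule above references a my-disk check command that is not shown here. The following is a minimal sketch of what such a definition could look like, assuming the check_disk plugin from the Monitoring Plugins is installed in PluginDir; the description texts and the default thresholds are illustrative, not copied from the plugin.

```
/* Sketch only: a possible "my-disk" CheckCommand for the apply for rule. */
object CheckCommand "my-disk" {
  command = [ PluginDir + "/check_disk" ]

  arguments = {
    "-w" = {
      value = "$disk_wfree$"
      description = "Warning threshold for free disk space"
    }
    "-c" = {
      value = "$disk_cfree$"
      description = "Critical threshold for free disk space"
    }
    "-p" = {
      value = "$disk_partitions$"
      description = "Path or partition to check"
      /* An array value repeats the key: -p / -p /tmp -p /var ... */
      repeat_key = true
      order = 1
    }
  }

  /* Illustrative defaults, overridden by the service custom variables. */
  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"
}
```

Because disk_partitions is an array, Icinga 2 passes one -p argument per partition, which is why the custom variable may hold either a string or an array.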
Command Arguments¶
Next to the short command array specified in the command object, it is advised to define plugin/script parameters in the arguments dictionary attribute.
The value of the --parameter key itself is a dictionary with additional keys. They allow you to create generic command objects and also serve documentation purposes, e.g. with the description field copying the plugin’s help text. The Icinga Director uses this field to show the argument’s purpose when selecting it.

    arguments = {
      "--parameter" = {
        description = "..."
        value = "..."
      }
    }
Each argument is optional by default and is omitted if the value is not set.
Learn more about integrating plugins with CheckCommand objects in this chapter.
There are additional possibilities for creating a command only once, with different parameters and arguments, shown below.
Command Arguments: Value¶
In order to find out about the command argument, call the plugin’s help or consult the README.
    ./check_systemd.py --help

    ...

      -u UNIT, --unit UNIT  Name of the systemd unit that is beeing tested.
Whenever the long parameter name is available, prefer this over the short one.
arguments = { "--unit" = { } }
Define a unique prefix for the command’s specific arguments. Best practice is to follow this schema:

    <command name>_<parameter name>

Therefore use systemd_ as prefix, and use the long plugin parameter name unit inside the runtime macro syntax.
arguments = { "--unit" = { value = "$systemd_unit$" } }
In order to specify a default value, specify a custom variable inside the CheckCommand object.
vars.systemd_unit = "icinga2"
This value can be overridden from the host/service object as command parameters.
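Putting the pieces together, a complete definition and an overriding service could look like the sketch below. The command name my-systemd, the service name systemd-sshd, the unit name sshd and the plugin location under PluginContribDir are assumptions made for illustration.

```
/* Sketch: command name and plugin location are assumptions. */
object CheckCommand "my-systemd" {
  command = [ PluginContribDir + "/check_systemd.py" ]

  arguments = {
    "--unit" = {
      value = "$systemd_unit$"
    }
  }

  /* Default value, used when the host/service sets nothing. */
  vars.systemd_unit = "icinga2"
}

/* Hypothetical service overriding the default unit. */
apply Service "systemd-sshd" {
  import "generic-service"

  check_command = "my-systemd"
  vars.systemd_unit = "sshd"

  assign where host.vars.os == "Linux"
}
```

The service-level custom variable wins over the command default, so the same command object can be reused for any number of units.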
Command Arguments: Description¶
Best practice, also inside the ITL, is to always copy the command parameter help output into the description field of your check command.
Learn more about integrating plugins with CheckCommand objects in this chapter.
With the example above, inspect the parameter’s help text.

    ./check_systemd.py --help

    ...

      -u UNIT, --unit UNIT  Name of the systemd unit that is beeing tested.

Copy this into the command arguments description entry.

    arguments = {
      "--unit" = {
        value = "$systemd_unit$"
        description = "Name of the systemd unit that is beeing tested."
      }
    }
Command Arguments: Required¶
Specifies whether this command argument is required, or not. By default all arguments are optional.
Tip
Good plugins provide optional parameters in square brackets, e.g. [-w SECONDS].
The required field can be toggled with a boolean value.

    arguments = {
      "--host" = {
        value = "..."
        description = "..."
        required = true
      }
    }

Whenever the check is executed and the argument is missing, Icinga logs an error. This makes configuration errors easier to debug than the sometimes unreadable plugin errors that occur when parameters are missing.
Command Arguments: Skip Key¶
The arguments attribute requires a key; empty values are not allowed. To overcome this for parameters which don’t need the name in front of the value, use the skip_key boolean toggle.

    command = [ PrefixDir + "/bin/icingacli", "businessprocess", "process", "check" ]

    arguments = {
      "--process" = {
        value = "$icingacli_businessprocess_process$"
        description = "Business process to monitor"
        skip_key = true
        required = true
        order = -1
      }
    }

The service specifies the custom variable icingacli_businessprocess_process.

    vars.icingacli_businessprocess_process = "bp-shop-web"

This results in this command line without the --process parameter:

    '/bin/icingacli' 'businessprocess' 'process' 'check' 'bp-shop-web'

You can use this method to put everything into the arguments attribute in a defined order and without keys. This avoids entries in the command attribute too.
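As a sketch of that idea, the sub-commands from the command array can be moved into arguments as well, each with skip_key and an explicit order. The command name and the argument keys below are hypothetical; the keys only need to be unique because they never reach the command line.

```
/* Sketch: all positional parameters expressed as ordered arguments. */
object CheckCommand "my-businessprocess" {
  command = [ PrefixDir + "/bin/icingacli" ]

  arguments = {
    "--module" = {
      value = "businessprocess"
      skip_key = true
      order = -4
    }
    "--controller" = {
      value = "process"
      skip_key = true
      order = -3
    }
    "--action" = {
      value = "check"
      skip_key = true
      order = -2
    }
    "--process" = {
      value = "$icingacli_businessprocess_process$"
      description = "Business process to monitor"
      skip_key = true
      required = true
      order = -1
    }
  }
}
```

The order values sort ascending, so the three fixed values are emitted before the business process name.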
Command Arguments: Set If¶
This can be used for the following scenarios:
Parameters without value, e.g. --sni.

    command = [ PluginDir + "/check_http" ]

    arguments = {
      "--sni" = {
        set_if = "$http_sni$"
      }
    }

Whenever a host/service object sets the http_sni custom variable to true, the parameter is added to the command line.

    '/usr/lib64/nagios/plugins/check_http' '--sni'

Numeric values are allowed too.
Parameters with value, but additionally controlled with an extra custom variable boolean flag.
The following example is taken from the postgres CheckCommand. The host parameter should use a value, but only whenever the postgres_unixsocket custom variable is set to false.
Note: set_if is using a runtime lambda function because the value is evaluated at runtime. This is explained in this chapter.

    command = [ PluginContribDir + "/check_postgres.pl" ]

    arguments = {
      "-H" = {
        value = "$postgres_host$"
        set_if = {{ macro("$postgres_unixsocket$") == false }}
        description = "hostname(s) to connect to; defaults to none (Unix socket)"
      }
    }

An executed check for this host and services …

    object Host "postgresql-cluster" {
      // ...

      vars.postgres_host = "192.168.56.200"
      vars.postgres_unixsocket = false
    }

… use the following command line:

    '/usr/lib64/nagios/plugins/check_postgres.pl' '-H' '192.168.56.200'
Host/service objects which set postgres_unixsocket to true don’t add the -H parameter and its value to the command line.
References: abbreviated lambda syntax, macro.
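For contrast, here is a sketch of a host that connects via the Unix socket. The host name postgresql-local is hypothetical; with postgres_unixsocket set to true, the set_if lambda shown above evaluates to false, so -H and its value are left out even though postgres_host is set.

```
/* Hypothetical host: local PostgreSQL reached via Unix socket. */
object Host "postgresql-local" {
  import "generic-host"
  address = "127.0.0.1"

  /* Ignored for -H because the socket flag below is true. */
  vars.postgres_host = "127.0.0.1"
  vars.postgres_unixsocket = true
}
```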
Command Arguments: Order¶
Plugins may require parameters in a special order: one after the other, or, for example, one parameter always in the first position.

    arguments = {
      "--first" = {
        value = "..."
        description = "..."
        order = -5
      }
      "--second" = {
        value = "..."
        description = "..."
        order = -4
      }
      "--last" = {
        value = "..."
        description = "..."
        order = 99
      }
    }
Keep in mind that positional arguments need to be tested thoroughly.
Command Arguments: Repeat Key¶
Parameters can useArrayas value type. Whenever Icinga encounters an array, it repeats the parameter key and each value element by default.
command = [ NscpPath + "\\nscp.exe", "client" ] arguments = { "-a" = { value = "$nscp_arguments$" description = "..." repeat_key = true } }
On a host/service object, specify the nscp_arguments custom variable as an array.
vars.nscp_arguments = [ "exclude=sppsvc", "exclude=ShellHWDetection" ]
This translates into the following command line:
nscp.exe 'client' '-a' 'exclude=sppsvc' '-a' 'exclude=ShellHWDetection'
If the plugin requires you to pass the list without repeating the key, set repeat_key = false in the argument definition.
command = [ NscpPath + "\\nscp.exe", "client" ] arguments = { "-a" = { value = "$nscp_arguments$" description = "..." repeat_key = false } }
This translates into the following command line:
nscp.exe 'client' '-a' 'exclude=sppsvc' 'exclude=ShellHWDetection'
Command Arguments: Key¶
The arguments attribute requires unique keys. Sometimes the resulting command line needs the same key name more than once. In that case, override the emitted key with the key attribute of each argument.
arguments = { "--key1" = { value = "..." key = "-specialkey" } "--key2" = { value = "..." key = "-specialkey" } }
This results in the following command line:
'-specialkey' '...' '-specialkey' '...'
Environment Variables¶
The env command object attribute specifies a list of environment variables with values calculated from custom variables which should be exported as environment variables prior to executing the command.
This is useful, for example, for hiding sensitive information from the command line output when passing credentials to database checks:
object CheckCommand "mysql" { command = [ PluginDir + "/check_mysql" ] arguments = { "-H" = "$mysql_address$" "-d" = "$mysql_database$" } vars.mysql_address = "$address$" vars.mysql_database = "icinga" vars.mysql_user = "icinga_check" vars.mysql_pass = "password" env.MYSQLUSER = "$mysql_user$" env.MYSQLPASS = "$mysql_pass$" }
The executed command line visible with ps or top looks like this and hides the database credentials in the user’s environment.
/usr/lib/nagios/plugins/check_mysql -H 192.168.56.101 -d icinga
Note
If the CheckCommand also supports setting the parameter in the command line, make sure to use a different name for the custom variable. Otherwise Icinga 2 adds the command line parameter as well.
If a specific CheckCommand object provided with the Icinga Template Library needs additional environment variables, you can import it into a new custom CheckCommand object and add additional env keys. Example for the mysql_health CheckCommand:
object CheckCommand "mysql_health_env" {
  import "mysql_health"

  // https://labs.consol.de/nagios/check_mysql_health/
  env.NAGIOS__SERVICEMYSQL_USER = "$mysql_health_env_username$"
  env.NAGIOS__SERVICEMYSQL_PASS = "$mysql_health_env_password$"
}
Then specify the custom variables mysql_health_env_username and mysql_health_env_password in the service object.
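A service using the mysql_health_env CheckCommand could look like the following sketch. The service name, the my-server1 host and the credential values are placeholders chosen for illustration, not values from the ITL.

```icinga2
// Hypothetical service; assumes the host "my-server1" exists and the
// "mysql_health_env" CheckCommand from above is loaded.
object Service "mysql-health" {
  host_name = "my-server1"
  check_command = "mysql_health_env"

  // Exported to the plugin as NAGIOS__SERVICEMYSQL_USER / NAGIOS__SERVICEMYSQL_PASS
  // rather than being passed on the command line.
  vars.mysql_health_env_username = "icinga_check"
  vars.mysql_health_env_password = "password"
}
```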
Note
Keep in mind that the values are still visible with the debug console and the inspect mode in the Icinga Director.
You can also set global environment variables in the application’s sysconfig configuration file, e.g. HOME or specific library paths for Oracle. Beware that these environment variables can be used by any CheckCommand object and executed plugin and can leak sensitive information.
Notification Commands¶
NotificationCommand objects define how notifications are delivered to external interfaces (email, XMPP, IRC, Twitter, etc.). NotificationCommand objects are referenced by Notification objects using the command attribute.
Note
Make sure that the notification feature is enabled in order to execute notification commands.
While it’s possible to specify an entire notification command right in the NotificationCommand object, it is generally advisable to create a shell script in the /etc/icinga2/scripts directory and have the NotificationCommand object refer to that.
A fresh Icinga 2 install comes with two example scripts for host and service notifications by email. Based on the Icinga 2 runtime macros (such as $service.output$ for the current check output) it’s possible to send email to the user(s) associated with the notification itself ($user.email$). Feel free to take these scripts as a starting point for your own individual notification solution, and keep in mind that nearly everything is technically possible.
Information needed to generate notifications is passed to the scripts as arguments. The NotificationCommand objects mail-host-notification and mail-service-notification correspond to the shell scripts mail-host-notification.sh and mail-service-notification.sh in /etc/icinga2/scripts and define default values for arguments. These defaults can always be overwritten locally.
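Overriding a default locally could look like the sketch below. It sets custom variables on a Notification object. The notification name, the icingaadmin user and the assign expression are assumptions for illustration. The variable names are the ones listed in the argument tables of this section.

```icinga2
// Hypothetical apply rule; assumes a User object "icingaadmin" exists.
apply Notification "mail-host-admins" to Host {
  command = "mail-host-notification"
  users = [ "icingaadmin" ]

  // Local overrides for defaults defined by the NotificationCommand.
  vars.notification_logtosyslog = true
  vars.notification_icingaweb2url = "https://www.example.com/icingaweb2"

  assign where host.address
}
```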
Note
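This example requires the mail binary installed on the Icinga 2 master.
Depending on the distribution, you need a local mail transfer agent (MTA) such as Postfix, Exim or Sendmail in order to send emails.
These tools virtually provide the mail binary executed by the notification scripts below.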
mail-host-notification¶
The mail-host-notification NotificationCommand object uses the example notification script located in /etc/icinga2/scripts/mail-host-notification.sh.
Here is a quick overview of the arguments that can be used. See also host runtime macros for further information.
Name | Description |
---|---|
notification_date | Required. Date and time. Defaults to $icinga.long_date_time$. |
notification_hostname | Required. The host’s FQDN. Defaults to $host.name$. |
notification_hostdisplayname | Required. The host’s display name. Defaults to $host.display_name$. |
notification_hostoutput | Required. Output from host check. Defaults to $host.output$. |
notification_useremail | Required. The notification’s recipient(s). Defaults to $user.email$. |
notification_hoststate | Required. Current state of host. Defaults to $host.state$. |
notification_type | Required. Type of notification. Defaults to $notification.type$. |
notification_address | Optional. The host’s IPv4 address. Defaults to $address$. |
notification_address6 | Optional. The host’s IPv6 address. Defaults to $address6$. |
notification_author | Optional. Comment author. Defaults to $notification.author$. |
notification_comment | Optional. Comment text. Defaults to $notification.comment$. |
notification_from | Optional. Define a valid From: string (e.g. "Icinga 2 Host Monitoring ). Requires GNU mailutils (Debian/Ubuntu) or mailx (RHEL/SUSE). |
notification_icingaweb2url | Optional. Define URL to your Icinga Web 2 (e.g. "https://www.example.com/icingaweb2"). |
notification_logtosyslog | Optional. Set true to log notification events to syslog; useful for debugging. Defaults to false. |
mail-service-notification¶
The mail-service-notification NotificationCommand object uses the example notification script located in /etc/icinga2/scripts/mail-service-notification.sh.
Here is a quick overview of the arguments that can be used. See also service runtime macros for further information.
Name | Description |
---|---|
notification_date | Required. Date and time. Defaults to $icinga.long_date_time$. |
notification_hostname | Required. The host’s FQDN. Defaults to $host.name$. |
notification_servicename | Required. The service name. Defaults to $service.name$. |
notification_hostdisplayname | Required. Host display name. Defaults to $host.display_name$. |
notification_servicedisplayname | Required. Service display name. Defaults to $service.display_name$. |
notification_serviceoutput | Required. Output from service check. Defaults to $service.output$. |
notification_useremail | Required. The notification’s recipient(s). Defaults to $user.email$. |
notification_servicestate | Required. Current state of service. Defaults to $service.state$. |
notification_type | Required. Type of notification. Defaults to $notification.type$. |
notification_address | Optional. The host’s IPv4 address. Defaults to $address$. |
notification_address6 | Optional. The host’s IPv6 address. Defaults to $address6$. |
notification_author | Optional. Comment author. Defaults to $notification.author$. |
notification_comment | Optional. Comment text. Defaults to $notification.comment$. |
notification_from | Optional. Define a valid From: string (e.g. "Icinga 2 Host Monitoring ). Requires GNU mailutils (Debian/Ubuntu) or mailx (RHEL/SUSE). |
notification_icingaweb2url | Optional. Define URL to your Icinga Web 2 (e.g. "https://www.example.com/icingaweb2"). |
notification_logtosyslog | Optional. Set true to log notification events to syslog; useful for debugging. Defaults to false. |
Dependencies¶
Icinga 2 uses host and service Dependency objects for determining their network reachability.
A service can depend on a host, and vice versa. A service has an implicit dependency (parent) to its host. A host to host dependency acts implicitly as host parent relation. When dependencies are calculated, not only the immediate parent is taken into account but all parents are inherited.
The parent_host_name and parent_service_name attributes are mandatory for service dependencies; parent_host_name is required for host dependencies. Apply rules will allow you to determine these attributes in a more dynamic fashion if required.
parent_host_name = "core-router" parent_service_name = "uplink-port"
Notifications are suppressed by default if a host or service becomes unreachable. You can control that option by defining the disable_notifications attribute.
disable_notifications = false
If the dependency should be triggered in the parent object’s soft state, you need to set ignore_soft_states to false.
The dependency state filter must be defined based on the parent object being either a host (Up, Down) or a service (OK, Warning, Critical, Unknown).
The following example will make the dependency fail and trigger it if the parent object is not in one of these states:
states = [ OK, Critical, Unknown ]
In other words: if the parent service object changes into the Warning state, this dependency will fail and render all child objects (hosts or services) unreachable.
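Put together, the attributes discussed above could be combined in a single object, as in this sketch. The parent names are taken from the snippet above. The dependency name and the child host and service (app-server1, http) are hypothetical and must exist in your configuration.

```icinga2
// Hypothetical object and child names; only the parent_* values appear above.
object Dependency "uplink-reachability" {
  parent_host_name = "core-router"
  parent_service_name = "uplink-port"

  child_host_name = "app-server1"
  child_service_name = "http"

  // The dependency fails when the parent service is in a state not listed here (Warning).
  states = [ OK, Critical, Unknown ]

  // Also evaluate the parent's soft states.
  ignore_soft_states = false

  // Suppress child notifications while the dependency has failed (the default).
  disable_notifications = true
}
```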
You can determine the child’s reachability by querying the last_reachable attribute via the REST API.
Note
Reachability calculation depends on fresh and processed check results. If dependencies disable checks for child objects, this won’t work reliably.
Implicit Dependencies for Services on Host¶
Icinga 2 automatically adds an implicit dependency for services on their host. That way service notifications are suppressed when a host is DOWN or UNREACHABLE. This dependency does not overwrite other dependencies and implicitly sets disable_notifications = true and states = [ Up ] for all service objects.
Service checks are still executed. If you want to prevent them from happening, you can apply the following dependency to all services setting their host as parent_host_name and disabling the checks. assign where true matches on all Service objects.
apply Dependency "disable-host-service-checks" to Service { disable_checks = true assign where true }
Dependencies for Network Reachability¶
A common scenario is the Icinga 2 server behind a router. Checking internet access by pinging the Google DNS server google-dns is a common method, but will fail in case the dsl-router host is down. Therefore the example below defines a host dependency which acts implicitly as parent relation too.
Furthermore the host may be reachable but ping probes are dropped by the router’s firewall. In case the dsl-router’s ping4 service check fails, all further checks for the ping4 service on host google-dns should be suppressed. This is achieved by setting the disable_checks attribute to true.
object Host "dsl-router" { import "generic-host" address = "192.168.1.1" } object Host "google-dns" { import "generic-host" address = "8.8.8.8" } apply Service "ping4" { import "generic-service" check_command = "ping4" assign where host.address } apply Dependency "internet" to Host { parent_host_name = "dsl-router" disable_checks = true disable_notifications = true assign where host.name != "dsl-router" } apply Dependency "internet" to Service { parent_host_name = "dsl-router" parent_service_name = "ping4" disable_checks = true assign where host.name != "dsl-router" }
Apply Dependencies based on Custom Variables¶
You can use apply rules to set parent or child attributes, e.g. parent_host_name, to other objects’ attributes.
A common example are virtual machines hosted on a master. The object name of that master is auto-generated from your CMDB or VMware inventory into the host’s custom variables (or a generic template for your cloud).
Define your master host object:
/* your master */ object Host "master.example.com" { import "generic-host" }
Add a generic template defining all common host attributes:
/* generic template for your virtual machines */ template Host "generic-vm" { import "generic-host" }
Add a template for all hosts on your example.com cloud setting custom variable vm_parent to master.example.com:
template Host "generic-vm-example.com" { import "generic-vm" vars.vm_parent = "master.example.com" }
Define your guest hosts:
object Host "www.example1.com" { import "generic-vm-example.com" } object Host "www.example2.com" { import "generic-vm-example.com" }
Apply the host dependency to all child hosts importing the generic-vm template and set the parent_host_name to the previously defined custom variable host.vars.vm_parent.
apply Dependency "vm-host-to-parent-master" to Host { parent_host_name = host.vars.vm_parent assign where "generic-vm" in host.templates }
You can extend this example, and make your services depend on the master.example.com host too. Their local scope allows you to use host.vars.vm_parent similar to the example above.
apply Dependency "vm-service-to-parent-master" to Service { parent_host_name = host.vars.vm_parent assign where "generic-vm" in host.templates }
That way you don’t need to wait for your guest hosts to become unreachable when the master host goes down. Instead the services will detect their reachability immediately when executing checks.
Note
This method with setting locally scoped variables only works in apply rules, but not in object definitions.
Dependencies for Agent Checks¶
Another good example are agent based checks. You would define a health check for the agent daemon responding to your requests, and make all other services querying that daemon depend on that health check.
apply Service "agent-health" { check_command = "cluster-zone" display_name = "cluster-health-" + host.name /* This follows the convention that the agent zone name is the FQDN which is the same as the host object name. */ vars.cluster_zone = host.name assign where host.vars.agent_endpoint }
Now, make all other agent based checks dependent on the OK state of the agent-health service.
apply Dependency "agent-health-check" to Service {
  parent_service_name = "agent-health"

  states = [ OK ] // Fail if the parent service state switches to NOT-OK
  disable_notifications = true

  assign where host.vars.agent_endpoint // Automatically assigns all agent endpoint checks as child services on the matched host
  ignore where service.name == "agent-health" // Avoid a self reference from child to parent
}
This is described in detail in this chapter.
Event Commands¶
Unlike notifications, event commands for hosts/services are called on every check execution if one of these conditions matches:
- The host/service is in a soft state
- The host/service state changes into a hard state
- The host/service state recovers from a soft or hard state to OK/Up
EventCommand objects are referenced by Host and Service objects with the event_command attribute.
Therefore the EventCommand object should define a command line evaluating the current service state and other service runtime attributes available through runtime variables. Runtime macros such as $service.state_type$ and $service.state$ will be processed by Icinga 2 and help with fine-granular triggered events.
If the host/service is located on a client as command endpoint the event command will be executed on the client itself (similar to the check command).
Common use case scenarios are a failing HTTP check which requires an immediate restart via event command. Another example would be an application that is not responding and therefore requires a restart. You can also use event handlers to forward more details on state changes and events than the typical notification alerts provide.
Use Event Commands to Send Information from the Master¶
This example sends a web request from the master node to an external tool for every event triggered on a businessprocess service.
Define an EventCommand object send_to_businesstool which sends state changes to the external tool.
object EventCommand "send_to_businesstool" { command = [ "/usr/bin/curl", "-s", "-X", "PUT" ] arguments = { "-H" = { value ="$businesstool_url$" skip_key = true } "-d" = "$businesstool_message$" } vars.businesstool_url = "http://localhost:8080/businesstool" vars.businesstool_message = "$host.name$ $service.name$ $service.state$ $service.state_type$ $service.check_attempt$" }
Set the event_command attribute to send_to_businesstool on the Service.
object Service "businessprocess" { host_name = "businessprocess" check_command = "icingacli-businessprocess" vars.icingacli_businessprocess_process = "icinga" vars.icingacli_businessprocess_config = "training" event_command = "send_to_businesstool" }
In order to test this scenario you can run:
nc -l 8080
This allows you to catch the web request. You can also enable the debug log and search for the event command execution log message.
tail -f /var/log/icinga2/debug.log | grep EventCommand
Feed in a check result via REST API action process-check-result or via Icinga Web 2.
Expected Result:
# nc -l 8080
PUT /businesstool HTTP/1.1
User-Agent: curl/7.29.0
Host: localhost:8080
Accept: */*
Content-Length: 47
Content-Type: application/x-www-form-urlencoded

businessprocess businessprocess CRITICAL SOFT 1
Use Event Commands to Restart Service Daemon via Command Endpoint on Linux¶
This example triggers a restart of the httpd service on the local system when the procs service check executed via Command Endpoint fails. It only triggers if the service state is Critical and attempts to restart the service before a notification is sent.
Requirements:
- Icinga 2 as client on the remote node
- icinga user with sudo permissions to the httpd daemon
Example on CentOS 7:
# visudo
icinga ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart httpd
Note: Distributions might use a different name. On Debian/Ubuntu the service is called apache2.
Define an EventCommand object restart_service which allows to trigger local service restarts. Put it into a global zone to sync its configuration to all clients.
[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/global-templates/eventcommands.conf object EventCommand "restart_service" { command = [ PluginDir + "/restart_service" ] arguments = { "-s" = "$service.state$" "-t" = "$service.state_type$" "-a" = "$service.check_attempt$" "-S" = "$restart_service$" } vars.restart_service = "$procs_command$" }
This event command triggers the following script which restarts the service. The script only restarts the service if the service state is CRITICAL. Warning and Unknown states are ignored as they do not indicate an immediate failure.
[root@icinga2-agent1.localdomain /]# vim /usr/lib64/nagios/plugins/restart_service

#!/bin/bash

# Read state, state type, check attempt and service name from the command line
while getopts "s:t:a:S:" opt; do
  case $opt in
    s)
      servicestate=$OPTARG
      ;;
    t)
      servicestatetype=$OPTARG
      ;;
    a)
      serviceattempt=$OPTARG
      ;;
    S)
      service=$OPTARG
      ;;
  esac
done

if [ -z "$servicestate" ] || [ -z "$servicestatetype" ] || [ -z "$serviceattempt" ] || [ -z "$service" ]; then
  echo "USAGE: $0 -s servicestate -t servicestatetype -a serviceattempt -S service"
  exit 3;
else
  # Only restart on the third attempt of a critical event
  if [ "$servicestate" == "CRITICAL" ] && [ "$servicestatetype" == "SOFT" ] && [ "$serviceattempt" -eq 3 ]; then
    sudo /usr/bin/systemctl restart "$service"
  fi
fi

[root@icinga2-agent1.localdomain /]# chmod +x /usr/lib64/nagios/plugins/restart_service
Add a service on the master node which is executed via command endpoint on the client. Set the event_command attribute to restart_service, the name of the previously defined EventCommand object.
[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/icinga2-agent1.localdomain.conf object Service "Process httpd" { check_command = "procs" event_command = "restart_service" max_check_attempts = 4 host_name = "icinga2-agent1.localdomain" command_endpoint = "icinga2-agent1.localdomain" vars.procs_command = "httpd" vars.procs_warning = "1:10" vars.procs_critical = "1:" }
In order to test this configuration just stop the httpd on the remote host icinga2-agent1.localdomain.
[root@icinga2-agent1.localdomain /]# systemctl stop httpd
You can enable the debug log and search for the executed command line.
[root@icinga2-agent1.localdomain /]# tail -f /var/log/icinga2/debug.log | grep restart_service
Use Event Commands to Restart Service Daemon via Command Endpoint on Windows¶
This example triggers a restart of the httpd service on the remote system when the service-windows service check executed via Command Endpoint fails. It only triggers if the service state is Critical and attempts to restart the service before a notification is sent.
Requirements:
- Icinga 2 as client on the remote node
- Icinga 2 service with permissions to execute Powershell scripts (which is the default)
Define an EventCommand object restart_service-windows which allows to trigger local service restarts. Put it into a global zone to sync its configuration to all clients.
[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/global-templates/eventcommands.conf object EventCommand "restart_service-windows" { command = [ "C:\\Windows\\SysWOW64\\WindowsPowerShell\\v1.0\\powershell.exe", PluginDir + "/restart_service.ps1" ] arguments = { "-ServiceState" = "$service.state$" "-ServiceStateType" = "$service.state_type$" "-ServiceAttempt" = "$service.check_attempt$" "-Service" = "$restart_service$" "; exit" = { order = 99 value = "$$LASTEXITCODE" } } vars.restart_service = "$service_win_service$" }
This event command triggers the following script which restarts the service. The script only restarts the service if the service state is CRITICAL. Warning and Unknown states are ignored as they do not indicate an immediate failure.
Add the restart_service.ps1 Powershell script into C:\Program Files\Icinga2\sbin:
param(
  [string]$Service = '',
  [string]$ServiceState = '',
  [string]$ServiceStateType = '',
  [int]$ServiceAttempt = ''
)

if (!$Service -Or !$ServiceState -Or !$ServiceStateType -Or !$ServiceAttempt) {
  $scriptName = GCI $MyInvocation.PSCommandPath | Select -Expand Name;
  Write-Host "USAGE: $scriptName -ServiceState servicestate -ServiceStateType servicestatetype -ServiceAttempt serviceattempt -Service service" -ForegroundColor red;
  exit 3;
}

# Only restart on the third attempt of a critical event
if ($ServiceState -eq "CRITICAL" -And $ServiceStateType -eq "SOFT" -And $ServiceAttempt -eq 3) {
  Restart-Service $Service;
}

exit 0;
Add a service on the master node which is executed via command endpoint on the client. Set the event_command attribute to restart_service-windows, the name of the previously defined EventCommand object.
[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/icinga2-agent2.localdomain.conf object Service "Service httpd" { check_command = "service-windows" event_command = "restart_service-windows" max_check_attempts = 4 host_name = "icinga2-agent2.localdomain" command_endpoint = "icinga2-agent2.localdomain" vars.service_win_service = "httpd" }
In order to test this configuration just stop the httpd on the remote host icinga2-agent2.localdomain.
C:\> net stop httpd
You can enable the debug log and search for the executed command line in C:\ProgramData\icinga2\var\log\icinga2\debug.log.
Use Event Commands to Restart Service Daemon via SSH¶
This example triggers a restart of the httpd daemon via SSH when the http service check fails.
Requirements:
- SSH connection allowed (firewall, packet filters)
- icinga user with public key authentication
- icinga user with sudo permissions to restart the httpd daemon.
Example on Debian:
# ls /home/icinga/.ssh/
authorized_keys

# visudo
icinga ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart
Define a generic EventCommand object event_by_ssh which can be used for all event commands triggered using SSH:
[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/local_eventcommands.conf /* pass event commands through ssh */ object EventCommand "event_by_ssh" { command = [ PluginDir + "/check_by_ssh" ] arguments = { "-H" = "$event_by_ssh_address$" "-p" = "$event_by_ssh_port$" "-C" = "$event_by_ssh_command$" "-l" = "$event_by_ssh_logname$" "-i" = "$event_by_ssh_identity$" "-q" = { set_if = "$event_by_ssh_quiet$" } "-w" = "$event_by_ssh_warn$" "-c" = "$event_by_ssh_crit$" "-t" = "$event_by_ssh_timeout$" } vars.event_by_ssh_address = "$address$" vars.event_by_ssh_quiet = false }
The actual event command only passes the event_by_ssh_command attribute. The event_by_ssh_service custom variable takes care of passing the correct daemon name, while test $service.state_id$ -gt 0 makes sure that the daemon is only restarted when the service is not in an OK state.
object EventCommand "event_by_ssh_restart_service" {
  import "event_by_ssh"

  // only restart the daemon if state > 0 (not-ok)
  // requires sudo permissions for the icinga user
  vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo systemctl restart $event_by_ssh_service$"
}
Now set the event_command attribute to event_by_ssh_restart_service and tell it which service should be restarted using the event_by_ssh_service attribute.
apply Service "http" {
  import "generic-service"
  check_command = "http"

  event_command = "event_by_ssh_restart_service"
  vars.event_by_ssh_service = "$host.vars.httpd_name$"

  //vars.event_by_ssh_logname = "icinga"
  //vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa"

  assign where host.vars.httpd_name
}
Specify the httpd_name custom variable on the host to assign the service and set the event handler service.
object Host "remote-http-host" { import "generic-host" address = "192.168.1.100" vars.httpd_name = "apache2" }
In order to test this configuration just stop the httpd on the remote host icinga2-agent1.localdomain.
[root@icinga2-agent1.localdomain /]# systemctl stop httpd
You can enable the debug log and search for the executed command line.
[root@icinga2-agent1.localdomain /]# tail -f /var/log/icinga2/debug.log | grep by_ssh