Network and server monitoring is the continuous observation of infrastructure and its operating parameters, giving administrators the information they need to manage the IT environment. Monitoring network devices allows for early identification of issues, prevention of failures, and maintenance of high availability and performance of systems. This gives administrators better insight into the functioning of the entire infrastructure and enables them to respond promptly to potential problems.
Network monitoring plays a key role in managing IT infrastructure. When properly configured, it not only helps in the rapid detection and resolution of problems but also in their prevention. Preventive measures are made possible through constant access to data on the performance of networks, servers, applications, and network devices.
Various tools and protocols are used for network monitoring. The most popular of these is SNMP (Simple Network Management Protocol), which allows administrators to collect and analyze data from network devices such as routers, switches, servers, and end devices. Using SNMP, information about the status of devices, their load, and the detection of issues can be gathered.
Data collected via SNMP can be supplemented with data from other sources, such as system logs (syslog) or NetFlow, which allow for the analysis of network traffic. However, merely collecting data is not sufficient; it is crucial to process, analyze, and visualize it properly.
To achieve this, administrators utilize a variety of tools that assist in monitoring infrastructure. There are many commercial and open-source solutions available on the market that support IT administrators’ work. Examples of such tools include LibreNMS, Nagios, SolarWinds, and Zabbix.
Zabbix is a comprehensive open-source software solution for monitoring IT infrastructure that allows monitoring the status and performance of various components such as networks, servers, virtual machines, and cloud services. Zabbix supports a wide range of devices, from networking hardware to servers and virtual machines, and can also monitor devices like UPSs, IoT sensors (e.g., thermometers, entrance counters, humidity measuring devices), and other devices connected to the network.
Zabbix enables not only the collection and analysis of data but also its visualization, reporting, and alerting in case of problems. A significant advantage of this tool is its openness: Zabbix is an open-source project, so every user has access to the source code, and a large community actively supports the software’s development. Its large user base and well-prepared documentation make Zabbix user-friendly and flexible in configuration.
Zabbix consists of three main modules:
Figure 1. Zabbix Architecture
An additional element of the architecture can be the Zabbix Proxy, which collects data on network performance and availability on behalf of the Zabbix server. With this architecture, Zabbix becomes a highly scalable application. In large installations, when the Zabbix server or proxy requires more resources, another Zabbix Proxy can be added to collect data from another part of the network.
The Zabbix Agent can be installed on various operating systems, including Linux, Windows, and macOS.
Figure 2. Available Zabbix Installers by Platform
The software can run on physical machines and in virtual environments, and it can also be deployed in the cloud or in containers (e.g., Docker). Depending on the chosen environment and the number of monitored devices, the resource requirements will vary.
Installing Zabbix requires adequate hardware resources: CPU, RAM, and disk space. These resources depend on the number of monitored devices and the amount of data collected. For example, larger installations call for more powerful servers and ample disk space for storing monitoring history.
Figure 3. Recommended Physical Parameters by the Manufacturer
The disk on which the data is stored must have an appropriate size, depending on how long we want to retain historical data and how large that data will be. The required disk space can be calculated using a formula that takes into account the configuration file, history, trends, and events. The size of each parameter can be determined using:
Figure 4. Formula for Required Disk Space
Trends: A built-in mechanism in Zabbix that allows for the reduction of historical data. It stores minimum, maximum, average values, and the total count of values for each hour for numerical data. Trends help decrease the amount of stored data without losing information about long-term performance changes.
History: Stores every collected value, which means it is more resource-intensive than trends. History is useful when detailed information about each collected value is required.
Events: Generated by triggers in the Zabbix system. Each event is recorded in the database, allowing for tracking when and why a particular issue occurred. The amount of space allocated for events depends on the number of alarms generated in the system.
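The sizing logic described above can be sketched in a few lines of Python. This is only an illustration of how history, trends, and events contribute to disk usage; the per-record byte sizes and the fixed configuration size are assumptions (rough, database-dependent figures), and all function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600


def history_bytes(items, refresh_seconds, days, bytes_per_value=90):
    """History keeps every value: items / refresh rate = values per second."""
    values_per_second = items / refresh_seconds
    return days * values_per_second * SECONDS_PER_DAY * bytes_per_value


def trends_bytes(items, days, bytes_per_trend=90):
    """Trends keep one aggregated record per item per hour."""
    return days * (items / 3600) * SECONDS_PER_DAY * bytes_per_trend


def events_bytes(events_per_second, days, bytes_per_event=170):
    """Events grow with the number of trigger state changes."""
    return days * events_per_second * SECONDS_PER_DAY * bytes_per_event


def total_bytes(items, refresh_seconds, history_days, trend_days,
                events_per_second, event_days,
                configuration_bytes=10 * 1024 ** 2):
    """Sum of all components; byte sizes are assumed defaults, not measured."""
    return (configuration_bytes
            + history_bytes(items, refresh_seconds, history_days)
            + trends_bytes(items, trend_days)
            + events_bytes(events_per_second, event_days))


if __name__ == "__main__":
    # Hypothetical installation: 3000 items polled every 60 s,
    # 30 days of history, 1 year of trends, 1 event/s kept for 1 year.
    size = total_bytes(3000, 60, 30, 365, 1, 365)
    print(f"Estimated database size: {size / 1024 ** 3:.1f} GiB")
```

The example makes the trade-off visible: shortening history retention reduces the largest term, while trends stay comparatively cheap because they store only one record per item per hour.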
The choice of the database where the data will be stored depends on the preferences and experience of the administrator.
Now that we have discussed the physical requirements, we should also mention network communication. The default values are as follows:
Figure 5. Network Communication for the Zabbix Application
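Before adding hosts, it can help to confirm that the required ports are reachable through any firewalls. The sketch below assumes the commonly documented defaults (TCP 10050 for the agent, TCP 10051 for the server or proxy trapper) have not been changed; the function name is hypothetical.

```python
import socket

ZABBIX_AGENT_PORT = 10050   # assumed default: passive agent checks
ZABBIX_SERVER_PORT = 10051  # assumed default: server/proxy trapper


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("192.0.2.10", ZABBIX_AGENT_PORT)` would check whether an agent on a (placeholder) host accepts connections from the machine running the script. A successful TCP connection only shows that the port is reachable, not that the agent accepts requests from this particular server.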
In Zabbix, a “host” refers to any physical or virtual device, application, service, or any logically related set of monitored parameters. To add a new host, navigate to the Configuration tab => Hosts => Create host.
Figure 6. Configuration of a New Host
The value of “Host name” must be unique for each object created in Zabbix. When creating a host, we have the option to assign it to the appropriate host group, which will facilitate future configuration. Therefore, before proceeding with the configuration, we should analyze our needs and consider what groups will be created. If we are operating in a distributed architecture, we can also choose which proxy server will be responsible for collecting data from the host.
Depending on whether we will monitor our object using an agent or SNMP, we select the appropriate option and provide the IP address of our host. Once the configuration is complete, we click Add.
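The same host definition can also be expressed as a request to the Zabbix API instead of being entered in the web form. The sketch below only builds the JSON-RPC body for the `host.create` method with a single agent interface; the helper name, the sample host name, IP address, and group ID are hypothetical, and SNMP interfaces need additional fields that are not shown here.

```python
import json


def build_host_create_request(host_name, ip, group_id,
                              template_id=None, request_id=1):
    """Build a JSON-RPC body for host.create with one agent interface."""
    params = {
        "host": host_name,  # must be unique within Zabbix
        "interfaces": [{
            "type": 1,        # 1 = Zabbix agent interface
            "main": 1,        # default interface of this type
            "useip": 1,       # connect by IP rather than DNS name
            "ip": ip,
            "dns": "",
            "port": "10050",  # assumed default agent port
        }],
        "groups": [{"groupid": str(group_id)}],
    }
    if template_id is not None:
        params["templates"] = [{"templateid": str(template_id)}]
    return {"jsonrpc": "2.0", "method": "host.create",
            "params": params, "id": request_id}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_host_create_request("web01", "192.0.2.10", group_id=2)
    print(json.dumps(request, indent=2))
```

Sending the request additionally requires authentication and the URL of the Zabbix frontend, which depend on the installation.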
Host groups allow for the grouping of hosts of the same type. In the future, a template can be assigned to a particular group instead of doing so individually for each host. If we choose not to rely on host groups for this, we can also assign the appropriate template directly to the host.
If we already have a file containing a list of hosts, we can import them into Zabbix from that file.
An item is an individual metric used for data collection. After configuring a host, an item must be added to obtain actual data. One way to quickly add multiple items is to assign one of the predefined templates to the host. However, to optimize system performance, it may be necessary to fine-tune the templates to ensure there are only as many items and as frequent monitoring as needed.
Items can be created from the host configuration level or from the template level. To create a new item, navigate to Configuration => Hosts => host_name => Items => Create Item.
Figure 7. Configuration of a New Item
Each item must have a unique name. One item can be used for multiple hosts. Depending on the needs, the item type can be customized, such as data collected by an agent, SNMP, or other data sources. We can individually adjust the data retention length or leave the default global settings. To complete the configuration, we click Add.
Figure 8. Example List of Item Types
A trigger is a logical expression that “evaluates” the data collected by items and represents the current state of the system. Triggers allow for the definition of a threshold, determining what state of the data is “acceptable.” If the data exceeds the acceptable threshold, the trigger will be “activated” and change its status to PROBLEM.
To create a new trigger, follow the same steps as for an item: Configuration => Hosts => host_name => Triggers => Create Trigger.
Figure 9. Configuration of a Trigger
We need to create a unique name for the trigger and select the appropriate severity, which defines the importance of the problem in the system. The most challenging part is creating the correct expression that will evaluate the state of the collected data. We can use the expression wizard, which simplifies the task, or create it manually. For example, we can configure a trigger that responds to a lack of response from three consecutive pings to the device. We can also set a recovery expression that defines the conditions for resolving the trigger. To finish, we click Add or Update if we are editing an existing trigger.
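The "three consecutive failed pings" example can be modelled outside Zabbix to make the logic explicit. In recent Zabbix versions such a condition is typically written along the lines of `max(/host/icmpping,#3)=0`, though the exact syntax depends on the version in use. The Python class below is a hypothetical simulation, not Zabbix code, and it assumes the trigger stays OK until three values have been collected.

```python
from collections import deque


class ConsecutiveFailureTrigger:
    """PROBLEM when the last N ping results are all 0 (no reply)."""

    def __init__(self, failures_required=3):
        # Only the most recent N values matter for the evaluation.
        self.window = deque(maxlen=failures_required)
        self.state = "OK"

    def update(self, ping_ok):
        """Record one ping result (1 = reply, 0 = no reply), return state."""
        self.window.append(ping_ok)
        window_full = len(self.window) == self.window.maxlen
        # max(...) == 0 means every value in the window is a failure.
        if window_full and max(self.window) == 0:
            self.state = "PROBLEM"
        else:
            self.state = "OK"
        return self.state


if __name__ == "__main__":
    trigger = ConsecutiveFailureTrigger()
    for result in [1, 0, 0, 0, 0, 1]:
        print(result, trigger.update(result))
```

A single lost ping does not raise a problem here, which is the point of evaluating several consecutive values: it filters out short-lived packet loss. The first successful ping after an outage returns the trigger to OK, which plays the role of the default recovery condition.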
In Zabbix, an event is a record of changes in the state of a monitored item or trigger. Events are a key element of the monitoring system as they log when specific changes occurred, allowing for precise tracking of issues and the system’s responses to these problems.
An example is a trigger event: every time a trigger changes its status (OK → PROBLEM → OK), an event is generated. Events that represent problems can be viewed in the Monitoring => Problems tab.
Figure 10. Example of a Generated Event
With a large amount of data flowing into Zabbix, it is significantly easier for users to analyze the data if they can view a visual representation of the situation rather than just numbers. In this case, graphs come into play. Graphs allow for a quick understanding of data flow, correlating problems, discovering when something began, or determining when something might escalate into an issue.
Figure 11. Creating a New Graph
Creating a new graph is done from the following path: Configuration => Hosts => host_name => Graphs => Create Graph.
Figure 12. Example of a Created Graph
In addition to a unique name for the graph, we can configure its size and select the item based on which the graph will be created. We can also add a legend to the graph, and if there is a configured trigger that responds to the exceeding of certain values, this will be noted on the graph as an event.
Figure 13. Example of a Created Graph for RAM Usage of a Virtual Machine
Maps, screens, and dashboards allow multiple graphs and events to be visualized in one place. A dashboard serves as a central location where we can present the status of the entire network.
Figure 14. Dashboard for Assessing the Status of the Network
Maps allow for the graphical grouping of hosts. One example is a map of the physical connections between network devices, which represents the network topology.
Figure 15. Graphical Topology of Network Device Connections
The color of each device on the map changes according to its status and any active triggers. We can also create nested maps that allow navigation between different maps by clicking on a specific device or group of devices.
Figure 16. Higher-Level Map
The higher-level map in Figure 16 contains nested lower-level maps. A problem triggered in one of the locations will be displayed on the global map.
Screens are nothing more than a slideshow composed of selected maps, allowing us to create a sequence of successive maps or dashboards on the monitoring screen.
Figure 17. Creating a Screen
Templates are a useful tool for simplifying the administrator’s work. In templates, you can define the values of items, graphs, and triggers, which will be automatically assigned to devices or virtual machines that are added to them. This way, we don’t have to configure variables separately for each host, but rather for a group of hosts. It’s advisable to consider the division of devices before starting the configuration of Zabbix to effectively utilize templates. Nested templates can also be created within templates.
Figure 18. Example of Templates with Defined Values
Zabbix supports the creation of macros. Macros are variables that can be defined in any way. They assign a specific value depending on the context. Using macros saves time and simplifies configuration. Macros can be used, for example, in items, such as “item.key[server_{HOST.HOST}_local]”. Effective use of macros makes configuration clearer.
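To illustrate what macro substitution does, the following Python sketch expands placeholders of the form `{NAME}` or `{$NAME}` in a string. It is a simplified model only: Zabbix resolves macros internally with its own precedence rules, and the function name and sample values are hypothetical.

```python
import re

# Matches built-in style macros such as {HOST.HOST}
# and user-style macros such as {$CPU_MAX}.
MACRO_PATTERN = re.compile(r"\{(\$?[A-Z0-9_.]+)\}")


def expand_macros(text, values):
    """Replace known macros in text; leave unknown macros untouched."""
    def replace(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return MACRO_PATTERN.sub(replace, text)


if __name__ == "__main__":
    key = "item.key[server_{HOST.HOST}_local]"
    print(expand_macros(key, {"HOST.HOST": "web01"}))
```

Because the value is supplied per host, one item definition in a template can yield a different key on every host it is applied to.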
All Zabbix users access the application through the web interface. Each user is assigned a unique login name and password. User accounts can be defined locally or, for example, using LDAP. Communication between the user and the Zabbix web server is secured using SSL/TLS (HTTPS).
Figure 19. List of Created Users
The Zabbix agent can be deployed on the monitored device to actively monitor local resources and applications, such as hard drives, memory, and CPU statistics. The agent collects operational data locally and sends it to the Zabbix server for further processing. In the event of a failure (e.g., a full disk or malfunctioning processes), the Zabbix server can immediately notify administrators of the problem. From the agent, we can also configure active monitoring tasks, such as executing the fping command to assess whether the machine can communicate with the Internet.
Figure 20. List of available agents depending on the system
Figure 21. Using the Zabbix agent to check whether the virtual machine’s system is in a read-only state
Figure 22. Example of Zabbix agent configuration
Zabbix Proxy is a module that can collect performance and availability data on behalf of the Zabbix server. The proxy can take over part of the Zabbix server’s load and is invaluable in distributed installations. With the proxy, we can centralize monitoring from multiple locations, and all data is sent to a single Zabbix server.
Figure 23. Example of using a proxy server
Instead of manually adding hosts or agents, Zabbix also offers a host auto-discovery feature. This can be achieved using SNMP by scanning a specific subnet or through agent auto-registration. Auto-discovery automates the process of adding new hosts to the system.
Figure 24. Setting up auto-discovery rules
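Conceptually, subnet-based discovery walks through an address range and applies a check to each address. The Python sketch below models only that idea; it is not how Zabbix implements discovery, and the function name and the `probe` callback (for example an SNMP query or a TCP port check) are hypothetical.

```python
import ipaddress


def discover_hosts(cidr, probe):
    """Return the addresses in a subnet for which probe(address) is true.

    cidr  -- subnet in CIDR notation, e.g. "192.0.2.0/28"
    probe -- callable taking an IP address string and returning True
             when the device responds (SNMP, ICMP, agent port, ...)
    """
    network = ipaddress.ip_network(cidr, strict=False)
    # hosts() skips the network and broadcast addresses of the subnet.
    return [str(address) for address in network.hosts()
            if probe(str(address))]


if __name__ == "__main__":
    # Stand-in probe for illustration: pretend only .2 and .5 respond.
    responding = {"192.0.2.2", "192.0.2.5"}
    print(discover_hosts("192.0.2.0/29", lambda ip: ip in responding))
```

In Zabbix the equivalent pieces are the IP range and the checks defined in a discovery rule, with actions deciding what happens to each discovered device.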
Zabbix provides an API that enables automation of system configuration and interaction. The Zabbix API operates based on HTTP requests and data encoded in JSON format. It can be used for automatically creating hosts, items, triggers, and generating reports.
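A minimal sketch of such a call in Python, using only the standard library, might look as follows. The function names are hypothetical, the frontend URL is a placeholder, and the sketch assumes a Zabbix version that accepts an API token in the `Authorization: Bearer` header; older versions pass the token in an `auth` field of the request body instead.

```python
import json
import urllib.request


def build_rpc_request(method, params, request_id=1):
    """Wrap a method name and its parameters in a JSON-RPC 2.0 envelope."""
    return {"jsonrpc": "2.0", "method": method,
            "params": params, "id": request_id}


def call_zabbix_api(url, method, params, api_token=None, timeout=10):
    """POST a JSON-RPC request to the API endpoint and return its result."""
    body = json.dumps(build_rpc_request(method, params)).encode("utf-8")
    headers = {"Content-Type": "application/json-rpc"}
    if api_token:
        # Assumption: token-based authentication via HTTP header.
        headers["Authorization"] = f"Bearer {api_token}"
    request = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(f"Zabbix API error: {reply['error']}")
    return reply["result"]
```

With a placeholder address, `call_zabbix_api("https://zabbix.example.com/api_jsonrpc.php", "apiinfo.version", {})` would return the API version; that particular method is called without a token. Methods such as `host.get` or `host.create` follow the same pattern with a token supplied.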
In the Monitoring => Latest Data tab, we can check the most recent data received from agents or SNMP. This tab is useful for verifying the accuracy of the data received, such as resource usage information for virtual machines. Here, you can also check whether the data is arriving on time and if there are any communication issues.
Figure 25. Data received regarding disk I/O on virtual machines
The Reports tab allows users to generate custom-defined reports. With these reports, administrators can assess the state of the network, see how many triggers have been fired, understand which SLAs are being met, and evaluate the downtime of individual resources. An example of a report is the SLA report, which provides information about the availability levels of services.
Figure 26. Example SLA Report
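The availability figure shown in an SLA report comes down to simple arithmetic: the share of the reporting period during which the service was up. The Python sketch below shows that calculation under the assumption that total downtime for the period is already known; the function names are hypothetical and Zabbix performs its own calculation internally.

```python
def availability_percent(period_seconds, downtime_seconds):
    """Percentage of the reporting period during which the service was up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # Clamp downtime to the valid range 0..period.
    downtime = min(max(downtime_seconds, 0), period_seconds)
    return 100.0 * (period_seconds - downtime) / period_seconds


def meets_sla(period_seconds, downtime_seconds, target_percent):
    """True when measured availability reaches the agreed target."""
    return availability_percent(period_seconds, downtime_seconds) >= target_percent


if __name__ == "__main__":
    month = 30 * 24 * 3600          # 30-day reporting period in seconds
    downtime = 2 * 3600             # hypothetical: two hours of outage
    print(f"{availability_percent(month, downtime):.3f}%")
    print("99.9% target met:", meets_sla(month, downtime, 99.9))
```

Working through the numbers this way also shows how little downtime a high target allows: a 99.9% target over 30 days leaves roughly 43 minutes.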
Zabbix supports a wide range of integrations with third-party systems. If a specific integration is not officially supported, users can benefit from community assistance, which offers numerous plugins and scripts. These integrations allow alerts to be sent to various tools, such as email, SMS, or messaging applications. Zabbix can be integrated with LDAP for centralized user management, as well as with SMTP servers for sending email alerts. It is also possible to send SMS notifications through telecommunications operators.
Zabbix is a powerful and versatile tool for monitoring IT infrastructure, providing full control over various components, from networks to applications and servers. With its modular architecture, flexible configuration options, and the ability to integrate with other systems, Zabbix meets the needs of both small and large organizations. Its open architecture and active community make it one of the best open-source tools in its class. Zabbix enables the automation of many processes, enhances security, and ensures the stability of the entire IT infrastructure, making it an invaluable tool in the daily management of IT environments.
Drawing on our many years of experience, we help companies manage their IT infrastructure effectively. Our expertise, backed by numerous projects and active cooperation with clients, allows us to select solutions precisely tailored to the needs of each organization. We know the challenges involved in monitoring networks and servers, so our work always takes both performance optimization and security into account, regardless of the size of the enterprise.
If you want to learn more or have doubts about which solution would be best for you, talk to our engineers!