Grafana

This page is about the analytics and visualization application Grafana, one of our recommended components for Cinchy v5 on Kubernetes.

Table of Contents

1. Grafana Overview

2. Getting Started with Grafana

3. Recommended Dashboards

4. Setting Up Alerts

5. Recommended Alerts

6. Updating your Grafana Password

1. Grafana Overview

Grafana is an open source analytics and interactive visualization web application. When connected to your Cinchy platform, it provides charts, graphs, and alerting capabilities (Image 1).

Grafana and its paired application Prometheus (which collects metrics from the running components in your environment) are the recommended visualization applications for Cinchy v5 on Kubernetes.

2. Getting Started with Grafana

Grafana has a robust library of documentation and tutorials designed to help you learn the fundamentals of the application. We have listed a few notable ones below:

When using the default configuration pairing of Grafana and Prometheus, Prometheus is already set up as a data source in your metrics dashboard.

2.1 Accessing your Saved Dashboards:

Your Cinchy deployment comes with several saved dashboards out of the box. They are a good starting point for your metrics monitoring, and you can customize, manage, and add dashboards at any time.

1. Navigate to the left navigation pane and select the Dashboards icon > Manage (Image 2).

2. You will see a list of all of the Dashboards available to you (Image 3). Clicking on any of them will take you to a full metrics view (Image 4).

3. You can favourite any of your commonly used or most important dashboards by clicking on the star (Image 5).

4. Once you have favourited a dashboard, you can find it by navigating to the left navigation pane and selecting the Dashboards icon > Home. This opens the Dashboards Home, which shows both your favourite and your recent dashboards (Image 6).

3. Recommended Dashboards

Your Cinchy v5 deployment comes with some out-of-the-box dashboards premade for you. You are able to customize these to suit your specifications. The following are a few notable ones:

3.1 Kubernetes/Compute Resources/Cluster

Purpose: This dashboard provides a general overview of your entire cluster including all of your environments and pods (Image 7).

Metrics:

The following are some example metrics that you could expect to see from this dashboard:

CPU Usage
CPU Quota
Memory Usage
Memory Requests
Current Network Usage
Bandwidth (Transmitted and Received)
Average Container Bandwidth by Namespace
Rate of Packets
Rate of Packets Dropped
Storage IO & Distribution

3.2 Kubernetes/Compute Resources/Namespace (Workloads)

Purpose: This dashboard is useful for looking at environment-specific details (Image 8). Use the namespace drop-down menu to select the environment you want to visualize (Image 9). This can be particularly helpful during load testing. You can also drill down to a specific workload by clicking on its name.

Metrics:

The following are some example metrics that you could expect to see from this dashboard:

CPU Usage
CPU Quota
Memory Usage
Memory Quota
Current Network Usage
Bandwidth (Transmitted and Received)
Average Container Bandwidth by Workload
Rate of Packets
Rate of Packets Dropped

4. Setting Up Alerts

Grafana allows you to set up push alerts against your dashboards and queries. Once you have created your dashboard, you can follow the steps below to set up your alert.

Grafana does not have the capability to run alerts against queries with template variables.
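
Before adding an alert to a panel, it can help to check whether its query references a template variable. The following is a minimal Python sketch, not part of Grafana or Cinchy: `find_template_variables` is a hypothetical helper, and the pattern assumes the `$name`, `${name}`, and `[[name]]` variable syntaxes that Grafana documents.

```python
import re

# Assumed variable syntaxes: $name, ${name} or ${name:format}, and [[name]].
TEMPLATE_VARIABLE = re.compile(
    r"\$\{\w+(?::\w+)?\}"      # ${namespace} or ${namespace:csv}
    r"|\$\w+"                  # $namespace
    r"|\[\[\w+(?::\w+)?\]\]"   # [[namespace]]
)

def find_template_variables(query: str) -> list[str]:
    """Return every template-variable reference found in a panel query."""
    return TEMPLATE_VARIABLE.findall(query)

# Illustrative queries: the first uses a dashboard variable, the second does not.
templated = 'sum(rate(container_cpu_usage_seconds_total{namespace="$namespace"}[5m]))'
plain = '100 - (avg by (cpu,node_name) (irate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)'

for query in (templated, plain):
    found = find_template_variables(query)
    print("can alert" if not found else f"remove variables first: {found}")
```

If the check reports variables, one option is to duplicate the panel and hard-code the values (for example, a fixed namespace) in the copy used for alerting.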

To send emails from Grafana, you need to configure your SMTP settings. These are normally set by the automations script run during your initial Cinchy v5 deployment. If you did not provide this information at that time, you must do so before setting up your email alerts.

4.1 Set Up Your Notifications Channel

Your notifications channel defines who receives your alert. To set one up:

1. Click on the Alert icon in the left navigation pane (Image 10), and locate "Notifications Channel".

2. Click the "Add a Channel" button

3. Add in the following parameters, including any optional checkboxes you wish to use (Image 11):

Name: The name of this channel

Type: You have several options here, but email is the most common

Addresses: Input all the email addresses you want to be notified of this alert, separated by a comma

4. Click Test to send out a test email, if desired.

5. Save your Notification Channel

4.2 Setting up your Alert

The following details how to set up alerts on your dashboards. You can also set up alerts upon creation of your dashboard from the same window.

1. Navigate to the dashboard and dashboard panel that you want to set up an alert for. In this example, we are setting up an alert for CPU usage on our cluster.
2. Click on the dashboard name > Edit.
3. Click on the Alert tab (Image 12).

4. Input the following parameters to set up your alert (Image 13):

Alert Name: A title for your alert
Alert Timing: Choose how often to evaluate and for how long. In this example it is evaluated every minute for five minutes.
Conditions: Here you can set your threshold conditions for when an alert will be sent out. In this example, it is sent when the average of query A is above 75.
Set what happens if there's no data, or an error in your data
Add in your notification channel (i.e., who will be sent this notification)
Add a message to accompany the alert.

5. Click Apply > Save to finalize your alert.
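
To make the timing and condition settings concrete, here is a simplified Python model of the example rule (evaluate every minute, for five minutes, alert when the average of query A is above 75). It is a sketch of the general "pending, then alerting" behaviour under those assumptions, not Grafana's actual implementation; `alert_states` and its parameters are hypothetical names.

```python
from statistics import mean

def alert_states(evaluations, threshold=75.0, for_intervals=5):
    """Return the rule state ("ok", "pending", or "alerting") after each evaluation.

    evaluations: one list of query A values per evaluation interval.
    """
    states = []
    streak = 0  # consecutive evaluations in which the condition held
    for values in evaluations:
        if mean(values) > threshold:
            streak += 1
            # The condition must hold for the whole "for" window before firing.
            states.append("alerting" if streak > for_intervals else "pending")
        else:
            streak = 0
            states.append("ok")
    return states

# One evaluation per minute: a quiet minute, six busy minutes, then recovery.
evaluations = [[60, 70]] + [[80, 90]] * 6 + [[50, 55]]
print(alert_states(evaluations))
```

In this model a short spike only moves the rule to "pending"; a notification would be sent only once the breach has lasted the full five minutes.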


5. Recommended Alerts

Below are a few alerts we recommend setting up on your Grafana.

5.1 CPU Usage

Set up this alert to notify you when the CPU Usage on your nodes exceeds a specified limit.

Dashboard Query:

You can use the following example queries to set up a dashboard that will capture CPU Usage by Node (Image 14).

avg by (node_name) (100 - ((avg by (cpu,node_name) (irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100))

100 - ((avg by (cpu,node_name) (irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100)
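
The arithmetic behind these two queries can be sketched in Python. The second query turns each core's idle fraction into a usage percentage; the first averages those percentages per node. The function name and sample values below are illustrative assumptions, not data from a real cluster.

```python
from collections import defaultdict
from statistics import mean

def cpu_usage_by_node(idle_rates):
    """Compute average CPU usage (%) per node.

    idle_rates maps (node_name, cpu) to the fraction of time that core was
    idle (0.0 to 1.0), which is what irate(...{mode="idle"}[1m]) yields.
    """
    per_node = defaultdict(list)
    for (node, _cpu), idle in idle_rates.items():
        per_node[node].append(100 - idle * 100)  # per-core usage (second query)
    # Average the per-core values for each node (first query).
    return {node: mean(usages) for node, usages in per_node.items()}

# Hypothetical sample: two cores on node-a, one core on node-b.
sample = {
    ("node-a", "0"): 0.25,
    ("node-a", "1"): 0.50,
    ("node-b", "0"): 0.75,
}
print(cpu_usage_by_node(sample))
```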

Alert:

Set up your alert. This example uses a threshold limit of 75 (Image 15).

5.2 Memory Usage

Set up this alert to notify you when the Memory Usage on your nodes exceeds a specified limit.

Dashboard Query:

You can use the following example query to set up a dashboard that will capture Memory Usage by Node (Image 16).

((node_memory_MemTotal_bytes-node_memory_MemAvailable_bytes) / (node_memory_MemTotal_bytes))*100
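
As a worked example of this formula, the Python sketch below computes the same percentage from two byte counts and compares it with the example threshold of 85. The function name and the sample node size are assumptions chosen for illustration.

```python
def memory_usage_percent(mem_total_bytes, mem_available_bytes):
    """Mirror the query: (MemTotal - MemAvailable) / MemTotal * 100."""
    return (mem_total_bytes - mem_available_bytes) / mem_total_bytes * 100

GIB = 1024 ** 3
THRESHOLD = 85  # example alert threshold used on this page

# Hypothetical node with 16 GiB of memory, of which 2 GiB is still available.
usage = memory_usage_percent(16 * GIB, 2 * GIB)
print(f"{usage:.1f}% used, alert condition met: {usage > THRESHOLD}")
```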

Alert:

Set up your alert. This example uses a threshold limit of 85 (Image 17).

5.3 Disk Usage

Set up this alert to notify you when the Disk Usage on your nodes exceeds a specified limit.

Dashboard Query:

You can use the following example query to set up a dashboard that will capture Disk Usage by Node (Image 18).

(sum((node_filesystem_size_bytes))by(node_name) - sum((node_filesystem_free_bytes))by(node_name)) *100/(sum((node_filesystem_avail_bytes))by(node_name)+(sum((node_filesystem_size_bytes))by(node_name) - sum((node_filesystem_free_bytes))by(node_name)))
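
The query divides used space by used plus available space rather than by total size. A likely reason, stated here as an assumption, is that some filesystems reserve blocks that count as free but are not available to ordinary processes, so this form matches what a tool such as `df` reports. A minimal Python sketch of the arithmetic, with hypothetical names and sample values:

```python
def disk_usage_percent(size_bytes, free_bytes, avail_bytes):
    """Mirror the query: used * 100 / (avail + used), where used = size - free."""
    used = size_bytes - free_bytes
    return used * 100 / (avail_bytes + used)

GIB = 1024 ** 3

# Hypothetical node: 100 GiB total, 40 GiB free, but only 20 GiB available
# to unprivileged users because the remainder is reserved.
pct = disk_usage_percent(100 * GIB, 40 * GIB, 20 * GIB)
naive = (100 * GIB - 40 * GIB) * 100 / (100 * GIB)
print(f"query result: {pct:.1f}%  used/size: {naive:.1f}%")
```

In this example the query reports a higher figure than a plain used/size ratio, which makes the alert trip earlier when reserved space is significant.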

Alert:

Set up your alert. This example uses a threshold limit of 80 (Image 17).

5.4 Iowait

Set up this alert to monitor the share of CPU time spent in iowait. A high value usually indicates a slow or overloaded disk or network.

Dashboard Query:

You can use the following example queries to set up a dashboard that will capture the CPU Iowait (Image 19).

(sum(irate(node_cpu_seconds_total{mode="iowait"}[1m]))by(node_name) * 100 / 4)
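
The `/ 4` in this query appears to normalize the summed per-core rates by the number of CPU cores on each node. That reading is an inference and is not stated on this page, so adjust the divisor if your nodes have a different core count. The Python sketch below shows the arithmetic with the divisor made explicit; the function name and sample values are hypothetical.

```python
def iowait_percent(per_core_iowait, cpu_count=None):
    """Return iowait as a percentage of total CPU capacity on one node.

    per_core_iowait: per-core iowait rates (seconds of iowait per second),
    as produced by irate(node_cpu_seconds_total{mode="iowait"}[1m]).
    cpu_count: divisor to use; defaults to the number of cores supplied.
    """
    cores = cpu_count if cpu_count is not None else len(per_core_iowait)
    return sum(per_core_iowait) * 100 / cores

# Hypothetical four-core node: equivalent to the "* 100 / 4" in the query.
print(iowait_percent([0.5, 0.25, 0.25, 0.0]))
```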

Alert:

Set up your alert. This example uses a threshold limit of 60 (Image 19).

6. Updating your Grafana Password

This capability was added in Cinchy v5.4.

Your Grafana password can be updated in your deployment.json file (you may have renamed this during your original deployment).

1. Navigate to "cluster_component_config" > "grafana".
2. The default password is set to "prom-operator"; update this with your preferred new password, written in clear text.
3. Run the below command in the root directory of your devops.automations repo to update your configurations. If you have changed the name of your deployment.json file, make sure to update the command accordingly.

dotnet Cinchy.DevOps.Automations.dll "deployment.json"

4. Commit and push your changes.

5. If your environment is not set up to apply configuration changes automatically, navigate to the ArgoCD portal and refresh your component(s). If that does not work, re-sync.


