This page serves as a general guide to using ArgoCD for monitoring purposes.
ArgoCD is implemented as a Kubernetes controller which continuously monitors running applications and compares the current, live state against the desired target state (as specified in your Git repo).
You can use ArgoCD's dashboard (Image 1) to visually monitor your namespaces and pods, and to quickly visualize deployment issues. It can easily show you what your cluster or pods are doing, and if they are healthy.
ArgoCD has a robust set of documentation that can help you to get started with the application. We recommend the following two pages:
Your ArgoCD dashboard contains a lot of important information about how your Cinchy instance is behaving.
The application tiles view is the default view when logging into ArgoCD. You can also access it through the grid widget in the upper right hand corner of the screen.
These application tiles each represent a pod in one of your Cinchy environments. By default, you should have an application tile for each of the following pods per namespace (Image 2):
Base
Connections
Event Listener
IDP
Maintenance CLI
Meta Forms
Web
Worker
Each application tile shows you some important information about the pod, including its health and sync status. You can easily use the Sync, Refresh, and Delete buttons to manage your pods.
You can also click the star button on any application tile to favourite the associated pod.
The pie chart view (Image 3) shows an easy visualization of the health of your applications. Access this view by clicking on the pie chart widget located in the upper right hand corner of the screen.
You can filter your view so that it only shows certain data. You can find the various filter options from the Filter column on the left hand side of any of the data views.
Favorites Only: You can filter by only the tiles that you have favourited (Image 4).
Sync Status: You can easily view what is out of sync using this filter (Image 5).
Health Status: Check on the health of your applications by filtering using these parameters (Image 6).
Labels: Use these labels to filter by specific instances/environments (Image 7).
Projects: If you choose to group your applications into projects, you can filter them using this tile (Image 8).
Clusters: If you have more than one cluster, you can filter for it using the Clusters tile (Image 9).
Namespace: Lastly, you can filter by Namespace. The below example shows a filter based on the dev-aurora-1 namespace (Image 10).
To bring up a more detailed view of your applications, click on the application tile. This view will show you all of your components, their health, and their sync status (Image 11). You can use the top navigation buttons to perform actions such as syncing, rolling back, or refreshing.
This view can be particularly useful for load testing, since you can see each individual pod spinning up and down.
Quickly view the status of your apps by looking at the health and sync status along the top of the page (Image 12).
Clicking on any individual pod or component tile in this view will bring up its information, including a Summary, a list of Events, your Manifest, and Parameters (Image 13).
You can also use this screen to edit or delete applications.
From the detailed tile summary you can also set your sync policies, such as automation, resource pruning, and self healing (Image 14).
You can click on the "Logs" tab in your detailed summary page to view the applicable logs for your selected pod (Image 15). You can filter, follow, snooze, copy, or download any logs.
This page details monitoring and logging for the Cinchy v5 platform when deployed on Kubernetes.
Cinchy v5 on Kubernetes offers and recommends a robust set of open-source tools and applications for monitoring, logging, and visualizing the data on your Cinchy instances.
Click on the respective tool below to learn more about it:
This page is about the analytics and visualization application Grafana, one of our recommended components for Cinchy v5 on Kubernetes.
Grafana is an open source analytics and interactive visualization web application. When connected to your Cinchy platform, it provides charts, graphs, and alerting capabilities (Image 1).
Grafana, paired with Prometheus (which consumes metrics from the running components in your environment), is the recommended visualization application for Cinchy v5 on Kubernetes.
Grafana has a robust library of documentation and tutorials designed to help you learn the fundamentals of the application. We have listed a few notable ones below:
When using the default configuration pairing of Grafana and Prometheus, Prometheus is already set up as a data source in your metrics dashboard.
There are some saved dashboards that come out of the box with your Cinchy deployment. These dashboards will provide a great jumping off point for your metrics monitoring, and you can always customize, manage, and add further dashboards at your leisure.
1. Navigate to the left navigation pane and select the Dashboards icon > Manage (Image 2).
2. You will see a list of all of the Dashboards available to you (Image 3). Clicking on any of them will take you to a full metrics view (Image 4).
3. You can favourite any of your commonly used or most important dashboards by clicking on the star (Image 5).
4. Once you have favourited a dashboard, you can easily find it by navigating to the left navigation pane and selecting the Dashboards icon > Home. This will open the Dashboards Home, where you can see both your favourite and your recent dashboards (Image 6).
Your Cinchy v5 deployment comes with some out-of-the-box dashboards premade for you. You are able to customize these to suit your specifications. The following are a few notable ones:
Purpose: This dashboard provides a general overview of your entire cluster including all of your environments and pods (Image 7).
Metrics:
The following are some example metrics that you could expect to see from this dashboard:
CPU Usage
CPU Quota
Memory Usage
Memory Requests
Current Network Usage
Bandwidth (Transmitted and Received)
Average Container Bandwidth by Namespace
Rate of Packets
Rate of Packets Dropped
Storage IO & Distribution
Purpose: This dashboard is useful for looking at environment specific details (Image 8). You can use the namespace drop down menu to select which environment you want to visualize (Image 9). This can be particularly helpful during load testing. You are also able to drill down to a specific workload by clicking on its name.
Metrics:
The following are some example metrics that you could expect to see from this dashboard:
CPU Usage
CPU Quota
Memory Usage
Memory Quota
Current Network Usage
Bandwidth (Transmitted and Received)
Average Container Bandwidth by Workload
Rate of Packets
Rate of Packets Dropped
Grafana allows you to set up push alerts against your dashboards and queries. Once you have created your dashboard, you can follow the steps below to set up your alert.
Grafana does not have the capability to run alerts against queries with template variables.
To send emails out from Grafana, you need to configure your SMTP. This would have been done in the automations script run during your initial Cinchy v5 deployment. If you did not input this information at that time, you must do so before setting up your email alerts.
Your notification channel defines who will receive your alert. To set one up:
1. Click on the Alert icon on the left navigation tab (Image 10) and locate "Notification Channels".
2. Click the "Add a Channel" button
3. Add in the following parameters, including any optional checkboxes you wish to use (Image 11):
Name: The name of this channel
Type: You have several options here, but email is the most common
Addresses: Input all the email addresses you want to be notified of this alert, separated by a comma
4. Click Test to send out a test email, if desired.
5. Save your Notification Channel
The following details how to set up alerts on your dashboards. You can also set up alerts upon creation of your dashboard from the same window.
1. Navigate to the dashboard and dashboard panel that you want to set up an alert for. In this example, we are setting up an alert for CPU usage on our cluster.
2. Click on the dashboard name > Edit.
3. Click on the Alert tab (Image 12).
4. Input the following parameters to set up your alert (Image 13):
Alert Name: A title for your alert
Alert Timing: Choose how often to evaluate and for how long. In this example it is evaluated every minute for five minutes.
Conditions: Here you can set your threshold conditions for when an alert will be sent out. In this example, it is sent when the average of query A is above 75.
Set what happens if there's no data, or an error in your data
Add in your notification channel (i.e., who will be sent this notification)
Add a message to accompany the alert.
5. Click Apply > Save to finalize your alert.
Below are a few alerts we recommend setting up on your Grafana.
Set up this alert to notify you when the CPU Usage on your nodes exceeds a specified limit.
You can use the following example queries to set up a dashboard that will capture CPU Usage by Node (Image 14).
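As a rough sketch, assuming the standard node-exporter metrics that ship with the recommended Prometheus pairing (metric names can vary between exporter versions), a CPU usage query might look like:

```promql
# Percentage of CPU in use per node: 100% minus the idle share over 5m
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```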
Set up your alert. This example uses a threshold limit of 75 (Image 15).
Set up this alert to notify you when the Memory Usage on your nodes exceeds a specified limit.
You can use the following example queries to set up a dashboard that will capture Memory Usage by Node (Image 16).
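A hedged example, again assuming standard node-exporter metrics:

```promql
# Percentage of memory in use per node
100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
```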
Set up your alert. This example uses a threshold limit of 85 (Image 17).
Set up this alert to notify you when the Disk Usage on your nodes exceeds a specified limit.
You can use the following example queries to set up a dashboard that will capture Disk Usage by Node (Image 18).
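A hedged example, assuming standard node-exporter filesystem metrics:

```promql
# Percentage of disk space used per filesystem, excluding tmpfs
100 * (1 - node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes)
```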
Set up your alert. This example uses a threshold limit of 80 (Image 17).
Set up this alert to check the amount of iowait from the CPU. A high value usually indicates a slow or overloaded disk or network.
You can use the following example queries to set up a dashboard that will capture the CPU Iowait (Image 19).
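A hedged example, assuming standard node-exporter CPU metrics:

```promql
# Percentage of CPU time spent waiting on I/O, per node
100 * avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m]))
```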
Set up your alert. This example uses a threshold limit of 60 (Image 19).
This capability was added in Cinchy v5.4.
Your Grafana password can be updated in your deployment.json file (you may have renamed this during your original deployment).
Navigate to "cluster_component_config" > "grafana".
2. The default password is set to "prom-operator"; update this with your preferred new password, written in clear text.
3. Run the below command in the root directory of your devops.automations repo to update your configurations. If you have changed the name of your deployment.json file, make sure to update the command accordingly.
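As a sketch, the command typically takes the form below; the DLL name here is an assumption, so substitute the automations entry point from your own repo:

```bash
# Re-run the deployment automations against your configuration file
dotnet Cinchy.DevOps.Automations.dll "deployment.json"
```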
4. Commit and push your changes.
5. If your environment is not set up to automatically apply changes upon configuration, navigate to the ArgoCD portal and refresh your component(s). If that does not work, re-sync.
When deploying Cinchy v5 on Kubernetes, we recommend using Opensearch Dashboards for your logging. Opensearch is a community-driven fork of Elasticsearch created by Amazon, and it captures and indexes all of your logs into a single, easily accessible dashboard location. These logs can be queried, searched, and filtered, and Correlation IDs mean that they can also be traced across various components. These logging components take advantage of persistent storage.
Review some of the Opensearch documentation here:
The below sections will guide you through setting up your first Index, Visualization, Dashboard, and Alert.
Opensearch comes with sample data that you can use to get a feel for the various capabilities. You will find this on the main page upon logging in.
1. Navigate to your cinchy.kubernetes/environment_kustomizations/instance_template/worker/kustomization.yaml file.
2. Copy the Base64 encoded string in the value parameter.
3. Decode the value to retrieve your AppSettings (a sample command is sketched after these steps).
4. Navigate to the Serilog section of the decoded AppSettings and update the "Default" parameter as needed to set your log level. The options are:

| Level | Description |
|---|---|
| Verbose | Verbose is the noisiest level, rarely (if ever) enabled for a production app. |
| Debug | Debug is used for internal system events that are not necessarily observable from the outside, but useful when determining how something happened. This is the default setting for Cinchy. |
| Information | Information events describe things happening in the system that correspond to its responsibilities and functions. Generally these are the observable actions the system can perform. |
| Warning | When service is degraded, endangered, or may be behaving outside of its expected parameters, Warning level events are used. |
| Error | When functionality is unavailable or expectations broken, an Error event is used. |
| Fatal | The most critical level, Fatal events demand immediate attention. |
5. Ensure that you commit your changes.
6. Navigate to ArgoCD > Worker Application and refresh.
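As a minimal sketch of steps 3 and 4 above, assuming a Linux shell and a standard Serilog configuration layout (your appsettings may be shaped differently):

```bash
# Step 3: decode the Base64 value copied from kustomization.yaml
echo "<base64-encoded-value>" | base64 -d > appsettings.json
```

```json
"Serilog": {
  "MinimumLevel": {
    "Default": "Debug"
  }
}
```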
The following are some common search patterns when looking through your Opensearch Logs.
If an HTTP request to Cinchy Web/IDP fails, check the page's requests and the relevant response headers to find the "x-correlation-id" header. That header value can be used to search and find all logs associated with the HTTP request.
When debugging batch syncs, filter the "ExecutionId" field in the logs for your batch sync execution ID to narrow down your search.
When debugging real time syncs, search for your data sync config name in the Event Listener or Workers logs to find all the associated logging information.
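Hedged DQL sketches of these searches are below; the field names are assumptions and depend on your index mappings, so adjust them to match your own fields:

```
CorrelationId:"<x-correlation-id header value>"
ExecutionId:"<your batch sync execution id>"
message:"<your data sync config name>"
```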
The first step to utilizing the power of Opensearch Dashboards is to set up an index to pull data from your sources. An Index Pattern identifies which indices you want to explore. An index pattern can point to a specific index, for example, your log data from yesterday, or all indices that contain your log data.
1. Log in to Opensearch. You would have configured the access point during your deployment installation; typically it will be found at <baseurl>/dashboard.
If this is your first time logging in, the username and password will be set to admin/admin.
We highly recommend you update the password as soon as possible.
2. Navigate to the Stack Management tab in the left navigation menu (Image 1).
3. From the left navigation, click on Index Patterns (Image 2).
4. Click on the Create Index Pattern button.
5. To set up your index pattern, you must define the source. Opensearch will list the sources available to you on the screen below. Input your desired source(s) in the text box (Image 3).
You can use the asterisk (*) to match multiple sources.
6. Configure your index pattern settings (Image 4).
Time field: Select a primary time field to use with the global time filter
Custom index pattern ID: By default, Opensearch gives a unique identifier to each index pattern. You can use this field to optionally override the default ID with a custom one.
7. Once created, you can review your Index Patterns from the Index Patterns page (Image 5).
8. Click on your Index Pattern to review your fields (Image 6).
You can easily pull out any data from your index sources and view them in a variety of visualizations.
1. From the left navigation pane, click Visualize (Image 7).
2. If you have any Visualizations, they will appear on this page. To create a new one, click the Create Visualization button (Image 8).
3. Select your visualization type from the populated list (Image 9).
4. Choose your source (Image 10). If the source you want to pull data from isn't listed, you will need to set it up as an index first.
5. Configure the data parameters that appear in the right hand pane of the Create screen. These options will vary depending on what type of visualization you choose in step 3. The following example uses a pie chart visualization (Image 11):
Metrics
Aggregation: Choose how you want your data aggregated. This example uses Count.
Custom Label: You can use this optional field for custom labelling.
Buckets
Aggregation: Choose how you want your data aggregated. This example uses Split Slices > Terms.
Field: This drop down is populated based on the index source you chose. Select which field you want to use in your visualization. This example uses machine.os.keyword.
Order By: Define how you want your data to be ordered. This example uses Metric: Count, in descending order of size 10.
Choose whether to group other values in a separate bucket. If you toggle this on, you will need to label the new bucket.
Choose whether to show missing values.
Advanced
You can optionally provide a JSON input. This will be merged with the Opensearch aggregation definition.
Options
The variables in the options tab can be used to configure the UI of the visualization itself.
6. There are a few ways to further focus your visualization:
Use DQL to search your index data (Image 12). You can also save any queries you write for easy access by clicking on the save icon.
Add a filter on any of your fields (Image 13).
Update your date filter (Image 14).
7. Click Save when you are finished with your visualization.
Once you have created your visualizations, you can combine them together on one Dashboard for easy access.
You can also create new visualizations from the Dashboard screen.
1. From the left navigation pane, click on Dashboards (Image 15).
2. If you have any Dashboards, they will appear on this page. To create a new one, click the Create Dashboard button (Image 16).
3. The "Editing New Dashboard" screen will appear. Click on Add an Existing object (Image 17).
4. Select any of the visualizations you created and it will be automatically added to your Dashboard (Image 18). Repeat this step for as many visualizations as you'd like to appear.
5. Click Save to finish (Image 19).
This capability was added in Cinchy v5.4.
Your Opensearch password can be updated in your deployment.json file (you may have renamed this during your original deployment).
Navigate to "cluster_component_config" > "opensearch".
2. There are two users that you can configure passwords for: Admin and Kibana Server. The Kibana Server user is used for communication between the Opensearch dashboard and the Opensearch server. The default password for both is set to "password". To update either one, you will need to use a machine with docker available.
3. Update your Admin password:
1. Your password must be hashed. You can do so by running the following command on a machine with docker available, inputting your new password where noted:
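A sketch of this command, assuming the standard Opensearch image and the hash tool bundled with the opensearch-security plugin (match the image tag to your deployed version):

```bash
# Hash the new password using the security plugin's hash tool
docker run -it opensearchproject/opensearch \
  /usr/share/opensearch/plugins/opensearch-security/tools/hash.sh -p "<your-new-password>"
```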
2. Navigate to "opensearch_admin_user_hashed_password" and input your hashed password.
3. You must also provide your password in a base64 encoded format; use any base64 encoding tool to encode your cleartext password (a sample command is sketched after these steps).
4. Navigate to "opensearch_admin_user_password_base64" and input your encoded password.
4. Update your Kibana Server password:
1. Your password must be hashed. You can do so by running the same docker hashing command shown above, inputting your new password where noted.
2. Navigate to "opensearch_kibanaserver_user_hashed_password" and input your hashed password.
3. You must also provide your new password in cleartext. Navigate to "opensearch_kibanaserver_user_password" and input your cleartext password.
5. Run the below command in the root directory of your devops.automations repo to update your configurations. If you have changed the name of your deployment.json file, make sure to update the command accordingly.
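As in the Grafana section, a sketch of this command (the DLL name is an assumption; use your own repo's automations entry point):

```bash
dotnet Cinchy.DevOps.Automations.dll "deployment.json"
```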
6. Commit and push your changes.
7. If your environment is not set up to automatically apply changes upon configuration, navigate to the ArgoCD portal and refresh your component(s). If that does not work, re-sync.
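For orientation, once both passwords are updated, the relevant region of deployment.json might look roughly like the sketch below. The nesting is an assumption based on the key names above; verify it against your actual file:

```json
"cluster_component_config": {
  "opensearch": {
    "opensearch_admin_user_hashed_password": "<hashed admin password>",
    "opensearch_admin_user_password_base64": "<base64-encoded admin password>",
    "opensearch_kibanaserver_user_hashed_password": "<hashed kibanaserver password>",
    "opensearch_kibanaserver_user_password": "<cleartext kibanaserver password>"
  }
}
```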
Opensearch comes with the ability to set up alerts based on any number of monitors. You can then push these alerts via email, should you desire.
Prior to setting up a monitor or alert, ensure that you have added your data source as an index pattern.
Definitions:

| Term | Definition |
|---|---|
| Monitor | A job that runs on a defined schedule and queries OpenSearch indices. The results of these queries are then used as input for one or more triggers. |
| Trigger | Conditions that, if met, generate alerts. |
| Alert | An event associated with a trigger. When an alert is created, the trigger performs actions, which can include sending a notification. |
| Action | The information that you want the monitor to send out after being triggered. Actions have a destination, a message subject, and a message body. |
| Destination | A reusable location for an action. Supported locations are Amazon Chime, Email, Slack, or custom webhook. |
Your destination is where you want your alerts to be pushed. Opensearch supports various options; however, this guide will focus on email.
1. From the left navigation pane, click Alerting (Image 1).
2. Click on the Destinations Tab > Add Destination
3. Add a name to label your destination and select Email as type (Image 2)
4. You will need to assign a Sender. This is the email address that the alert will send from when you specify this specific destination. To add a new Sender, click Manage Senders (Image 3).
5. Click Add Sender
6. Add in the following information (Image 4):
Sender Name
Email Address
Host (this is the host address for the email provider)
Port (this is the Port of the email provider)
Encryption
7. You will need to assign your Recipients. This is the email address(es) that will receive the alert when you specify this destination. To add a new Recipient, you can either type their email(s) into the box or click Manage Email Groups to create an email group (Image 5).
8. Click Update to finalize your Destination.
You will need to authenticate your sender to allow emails to come through. Please contact Cinchy Customer Support to help you with this step.
Via email: support@cinchy.com
Via phone: 1-888-792-6051
Through the support portal
Your monitor is a job that runs on a defined schedule and queries OpenSearch indices. The results of these queries are then used as input for one or more triggers.
1. From the Alerting dashboard, select Monitors > Create Monitor (Image 6).
2. Under Monitor Details, add in the following information (Image 7).
Monitor Name
Monitor Type (This example uses Per Bucket)
Whereas query-level monitors run your specified query and then check whether the query's results trigger any alerts, bucket-level monitors let you select fields to create buckets and categorize your results into those buckets.
The alerting plugin runs each bucket’s unique results against a script you define later, so you have finer control over which results should trigger alerts. Each of those buckets can trigger an alert, but query-level monitors can only trigger one alert at a time.
Monitor Defining Method: the way you want to define your query and triggers. (This example uses Visual Editor)
Visual definition works well for monitors that you can define as "some value is above or below some threshold for some amount of time." Query definition gives you flexibility in terms of what you query for (using the OpenSearch query DSL) and how you evaluate the results of that query (Painless scripting).
Schedule: Choose a frequency and time zone for your monitor.
3. Under Data Source add in the following information (Image 8):
Index: Define the index you want to use as a source for this monitor
Time Field: Select the time field that will be used for the x-axis of your monitor
4. The Query section will appear differently depending on the Monitor Defining Method selected in step 2 (Image 9). This example is using the visual editor.
To define a monitor visually, select an aggregation (for example, count() or average()), a data filter if you want to monitor a subset of your source index, and a group-by field if you want to include an aggregation field in your query. At least one group-by field is required if you're defining a bucket-level monitor. Visual definition works well for most monitors.
A trigger is a condition that, if met, will generate an alert.
1. To add a trigger, click the Add a Trigger button (Image 10).
2. Under New Trigger, define your trigger name and severity level (with 1 being the highest) (Image 11).
3. Under Trigger Conditions, you will specify the thresholds for the query metrics you set up previously (Image 12). In the below example, our trigger will be met if our COUNT threshold goes ABOVE 10000.
You can also use keyword filters to drill down into a more specific subset of data from your data source.
4. In the Action section you will define what happens if the trigger condition is met (Image 13). Enter the following information to set up your Action:
Action Name
Message Subject: In the case of an email alert, this will be the email subject line.
Message: In the case of an email alert, this will be the email body.
Perform Action: If you’re using a bucket-level monitor, decide whether the action is performed per execution or per alert.
Throttling: Enable action throttling if you wish. Use action throttling to limit the number of notifications you receive within a given span of time.
5. Click Send Test Message, if you want to test that the alert functions correctly.
6. Click Save.
In this example, we are pushing an alert based on errors. We will monitor our Connections stream for any instance of 'error' and push out an alert when our trigger threshold is hit.
1. First, we create our monitor by defining the following (Image 14):
Index: In this example we are looking specifically at Connections.
Time Field
Time Range: Define how far back you want to monitor.
Data Filter: We want to monitor specifically whenever the Stream field of our index is stderr (standard error).
This is how our example monitor will appear; it shows when in the last 15 days our Connections app had errors in the log (Image 15).
2. Once our monitor is created, we need to define a trigger. When this condition is met, the alert will be pushed out to our defined destination. In this example we want to be alerted when there is more than one stderr in our Connections stream (Image 16). Input the following:
Trigger Name
Severity Level
Trigger Condition: In this example, we use IS ABOVE and the threshold of 1.
The trigger threshold will be visible on your monitoring graph as a red line.
In this example, we are pushing an alert based on the kubectl.kubernetes.io/restartedAt annotation, which updates whenever your pod restarts. We will monitor this annotation across our entire product-mssql instance and push out an alert when our trigger threshold is hit.
1. First, we create our monitor by defining the following (Image 17):
Index: In this example we are looking at our entire product-mssql instance.
Time Field
Query: This example uses the total count of the kubectl.kubernetes.io/restartedAt annotation.
Time Range: Define how far back you want to monitor. This example goes back 30 days.
This is how our example monitor will appear; it shows when in the last 30 days our instance had restarts (Image 18).
2. Once our monitor is created, we need to define a trigger. When this condition is met, the alert will be pushed out to our defined destination. In this example we want to be alerted when there are more than 100 restarts across our instance (Image 19). Input the following:
Trigger Name
Severity Level
Trigger Condition: In this example, we use IS ABOVE and the threshold of 100.
The trigger threshold will be visible on your monitoring graph as a red line.
In this example, we are pushing an alert based on status codes. We will monitor our entire instance for 400 status codes and push out an alert when our trigger threshold is hit.
1. First, we create our monitor by defining the following (Image 20):
Index: In this example we are looking across our entire product-mssql-1 instance.
Time Field
Time Range: Define how far back you want to monitor. In this example we are looking at the past day.
Data Filter: We want to monitor specifically whenever the Status Code is 400 (bad request).
This is how our example monitor will appear (note that there are no instances of a 400 status code in this graph) (Image 21).
2. Once our monitor is created, we need to define a trigger. When this condition is met, the alert will be pushed out to our defined destination. In this example we want to be alerted when there is at least one 400 status code across our instance (Image 22). Input the following:
Trigger Name
Severity Level
Trigger Condition: In this example, we use IS ABOVE and the threshold of 0.
The trigger threshold will be visible on your monitoring graph as a red line.
Ensure that you authenticate your sender, else your alert will not work.