Monitor Scope Issues: Datadog Troubleshooting Guide


Datadog is an observability platform that provides monitoring and security services for cloud-based applications. Its features include serverless monitoring for AWS Lambda, synthetic monitoring, and cloud security management.

Datadog's monitoring features include the ability to create and manage monitors, which watch a metric or a check and notify your team when a defined threshold is exceeded. When creating a monitor, you choose its type, such as anomaly, APM, composite, custom check, or forecast. You then define the search query, set alert conditions, configure notifications and automations, and set permissions and audit notifications.

Separately, Datadog provides authorization scopes, which limit and define granular access to your organization's data. These scopes are used with OAuth2 clients for Datadog Apps. Examples include cloud_cost_management_read, which allows viewing Cloud Cost pages, and dashboards_public_share, which allows generating public and authenticated links to share dashboards externally. These authorization scopes are unrelated to a monitor's scope, which is the tag filter in the monitor's query that determines which data the monitor evaluates.

shundigital

Monitor status and monitor state

Monitor status is tracked by group. For a multi-alert monitor, a group is a set of tags with one value for each grouping key. For a simple alert, there is only one group, representing everything within the monitor's scope.

A monitor's state is updated based on the evaluation results of its query and configuration. If a monitor is in an alert state, the "Resolve" button is visible and can be used to resolve the monitor manually. "Resolve" artificially switches the monitor status to "OK" for its next evaluation. That evaluation then runs normally on the monitor's data, so a monitor whose data still breaches the threshold will alert again.
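The interaction between a manual resolve and the next evaluation can be illustrated with a toy model. This is a hypothetical sketch, not Datadog's implementation: one group with a single alert threshold, where each evaluation recomputes the status and "Resolve" only forces OK until that next evaluation runs.

```python
from dataclasses import dataclass


@dataclass
class ToyGroup:
    """Toy model of one monitor group; not Datadog's implementation."""

    threshold: float  # hypothetical alert threshold
    status: str = "OK"

    def evaluate(self, value: float) -> str:
        # Each evaluation recomputes the status from the latest value.
        self.status = "ALERT" if value > self.threshold else "OK"
        return self.status

    def resolve(self) -> str:
        # Manual resolve: force the status to OK right now.
        self.status = "OK"
        return self.status


group = ToyGroup(threshold=90.0)
print(group.evaluate(95.0))  # ALERT
print(group.resolve())       # OK
print(group.evaluate(95.0))  # ALERT again, because the data still breaches
```

Resolving is therefore not a fix in itself. If the underlying data keeps breaching, the group returns to ALERT on the next evaluation.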

Monitor status and state can be affected by a number of factors, including:

  • Metrics being too sparse within a metric monitor's evaluation window
  • Monitor state updates due to external conditions, such as auto-resolve
  • Monitor configurations, such as alert conditions and recovery thresholds
  • Monitor scope and groups
  • Monitor arithmetic and sparse metrics


Verify the presence of data

If your monitor's state or status is not what you expect, confirm the behaviour of the underlying data source. For a metric monitor, you can use the history graph to view the data points being pulled in by the metric query.
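The history graph is the place to check this in the UI. If you have exported the raw points for an evaluation window, a small helper like the one below can flag gaps that would make a metric look sparse to the monitor. This is a hypothetical sketch: timestamps are in seconds, and the expected reporting interval is an assumption you supply for your metric.

```python
from typing import List, Tuple


def find_gaps(points: List[Tuple[int, float]], window_start: int,
              window_end: int, expected_interval: int) -> List[Tuple[int, int]]:
    """Return (gap_start, gap_end) pairs longer than expected_interval seconds.

    points are (timestamp, value) pairs assumed to lie inside the window.
    """
    gaps = []
    previous = window_start
    for ts, _value in sorted(points):
        if ts - previous > expected_interval:
            gaps.append((previous, ts))
        previous = ts
    # A silent stretch at the end of the window also counts as a gap.
    if window_end - previous > expected_interval:
        gaps.append((previous, window_end))
    return gaps


points = [(0, 1.0), (60, 1.2), (120, 0.9), (300, 1.1)]
print(find_gaps(points, window_start=0, window_end=300, expected_interval=60))
# [(120, 300)]
```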

To search your monitors on the Manage Monitors page, construct a query using the facet panel on the left or the search bar at the top. You can then select one or more results with the checkboxes next to each monitor and update them in bulk.

Monitor status and groups

For both monitor evaluations and state, status is tracked by group. For a multi-alert monitor, a group is a set of tags with one value for each grouping key (for example, env:dev, host:myhost for a monitor grouped by env and host). For a simple alert, there is only one group (*), representing everything within the monitor's scope.
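To make the grouping rule concrete, the following hypothetical sketch derives the groups a multi-alert monitor would track from the tags on its series, with one value per grouping key. A simple alert with no grouping keys collapses to the single "*" group. Series that lack one of the grouping keys are skipped here. That is a simplification, and the sketch does not model how Datadog actually handles a missing grouping tag.

```python
from typing import Dict, Iterable, List, Set, Tuple


def monitor_groups(series_tags: Iterable[Dict[str, str]],
                   group_by: List[str]) -> Set[Tuple[str, ...]]:
    """Return the groups a monitor grouped by `group_by` would track (toy model)."""
    if not group_by:
        # Simple alert: everything in scope is one group.
        return {("*",)}
    groups = set()
    for tags in series_tags:
        # One tag value per grouping key; other tags (e.g. device) are ignored.
        if all(key in tags for key in group_by):
            groups.add(tuple(f"{key}:{tags[key]}" for key in group_by))
    return groups


series = [
    {"env": "dev", "host": "myhost"},
    {"env": "dev", "host": "other"},
    {"env": "dev", "host": "myhost", "device": "sda"},
]
print(sorted(monitor_groups(series, ["env", "host"])))
# [('env:dev', 'host:myhost'), ('env:dev', 'host:other')]
print(monitor_groups(series, []))
# {('*',)}
```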

By default, Datadog keeps monitor groups available in the UI for 24 hours, or 48 hours for host monitors, unless the query is changed.

If you anticipate creating new monitor groups within the scope of your multi-alert monitors, you may want to configure a delay for the evaluation of these new groups. This can help you avoid alerts from the expected behaviour of new groups, such as high resource usage associated with the creation of a new container.

If your monitor queries crawler-based cloud metrics, which can arrive with a delay, use an evaluation delay so that the metrics have arrived before the monitor evaluates.
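Both delays described above are set in a monitor's options. The following is a minimal sketch that assumes the Datadog Monitors API field names evaluation_delay and new_group_delay, both in seconds. The metric, tags, threshold, and delay values are placeholders, so check the API reference before relying on the exact shape.

```python
import json

# Sketch of a metric monitor definition; all values are illustrative placeholders.
monitor = {
    "name": "High CPU by host",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{env:prod} by {host} > 90",
    "message": "CPU usage is above 90% on {{host.name}}.",
    "options": {
        "thresholds": {"critical": 90},
        # Skip evaluating newly appearing groups (e.g. new containers) for 10 minutes.
        "new_group_delay": 600,
        # Evaluate data that is 15 minutes old, so backfilled cloud metrics have arrived.
        "evaluation_delay": 900,
    },
}

print(json.dumps(monitor, indent=2))
```

The new-group delay only affects groups that appear after the monitor is running. The evaluation delay shifts every evaluation back in time.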

Alert configurations

If your monitor query uses the as_count() function, see the "as_count() in Monitor Evaluations" guide.

If using recovery thresholds, check the conditions listed in the recovery thresholds guide to see if the behaviour is expected.
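Recovery thresholds are a common reason why a monitor's state differs from its latest evaluation. The toy model below illustrates this for a "greater than" monitor. It is not Datadog's code, and it assumes strict comparisons. The names critical and critical_recovery mirror the threshold names used in monitor definitions. Once a group is in ALERT, it stays there until the value drops to or below the recovery threshold, even if the value is already under the alert threshold.

```python
def next_state(current: str, value: float, critical: float,
               critical_recovery: float) -> str:
    """Toy state transition for one group of a 'greater than' monitor."""
    if value > critical:
        return "ALERT"
    if current == "ALERT" and value > critical_recovery:
        # Below the alert threshold, but not yet past the recovery threshold.
        return "ALERT"
    return "OK"


state = "OK"
history = []
for value in [95, 85, 75]:
    state = next_state(state, value, critical=90, critical_recovery=80)
    history.append(state)
print(history)  # ['ALERT', 'ALERT', 'OK']
```

Note that a value of 85 yields ALERT when the group is already alerting, but OK when it starts from OK.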


Alert configurations

The evaluation frequency determines how often Datadog runs the monitor query. For most configurations it is once per minute. The frequency can be customised, and it depends on the evaluation window: longer windows result in lower evaluation frequencies.

Monitors can notify at two threshold levels, alert and warning. Notifications can be sent through channels such as email, Slack, or PagerDuty. You can also attach workflow automations or cases to alert notifications.

Additionally, you can set up advanced alert conditions, such as notifications for missing data. This is useful when you expect a metric to report continuously, so that an absence of data likely indicates a problem.
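Here is a sketch of how warning thresholds, per-level notifications, and no-data alerts fit together in a monitor definition. It assumes the Monitors API field names thresholds, notify_no_data, and no_data_timeframe (in minutes), along with the {{#is_alert}}, {{#is_warning}}, and {{#is_no_data}} message conditionals. The @-handles and email address are hypothetical placeholders for integrations configured in your own organization.

```python
# Message routing: each conditional block is only rendered for that notification type.
# @pagerduty-my-service, @slack-my-channel and the email address are placeholders.
message = (
    "{{#is_alert}}CPU is above the alert threshold on {{host.name}}. "
    "@pagerduty-my-service{{/is_alert}}\n"
    "{{#is_warning}}CPU is above the warning threshold on {{host.name}}. "
    "@slack-my-channel{{/is_warning}}\n"
    "{{#is_no_data}}No CPU data from {{host.name}}. "
    "@oncall@example.com{{/is_no_data}}"
)

options = {
    # The warning threshold sits below the alert (critical) threshold.
    "thresholds": {"critical": 90, "warning": 80},
    # Notify if the metric stops reporting for 10 minutes.
    "notify_no_data": True,
    "no_data_timeframe": 10,
}

print(message)
print(options)
```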

You can also define permissions and audit notifications. Permissions restrict editing of the monitor to specific roles or users, and audit notifications alert you when the monitor is modified.

Finally, you can control how a monitor resolves. The "Resolve" button resolves a monitor manually, which is useful when data is reported intermittently and the monitor would otherwise not return to OK on its own. You can also configure auto-resolve, which automatically resolves a triggered alert after a set period without data for the metric.
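These settings also map to fields on a monitor definition. The following minimal sketch assumes the Monitors API fields restricted_roles (IDs of roles allowed to edit the monitor), notify_audit, and timeout_h (hours without data before a triggered monitor auto-resolves). The role ID is a dummy value, not a real identifier.

```python
# Access-control, audit and auto-resolve settings; the role ID is a dummy placeholder.
monitor_settings = {
    # Only users holding these roles can edit the monitor.
    "restricted_roles": ["00000000-0000-0000-0000-000000000000"],
    "options": {
        # Notify tagged users when the monitor definition changes.
        "notify_audit": True,
        # Auto-resolve a triggered monitor after 24 hours without data.
        "timeout_h": 24,
    },
}

print(monitor_settings)
```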


Monitor status and groups

Monitor Status

The monitor status page shows a monitor's status over time. Its header contains the current status, the time of that status, and the monitor title, along with buttons to mute, resolve, and open settings. You can mute the entire monitor or only part of it by setting a mute scope, which is built from the monitor's group tags. The "Resolve" button resolves the monitor manually by switching its status to OK for its next evaluation.
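To see how a mute scope narrows what is silenced, here is a toy matcher. It assumes a group is muted only when it carries every tag in the scope, using exact string matching. This is a simplification for illustration, and Datadog's scope syntax may support richer matching than this.

```python
from typing import Dict, List


def is_muted(group_tags: Dict[str, str], mute_scope: List[str]) -> bool:
    """Toy check: a group is muted when it has every tag in the mute scope.

    An empty scope mutes the whole monitor.
    """
    group = {f"{key}:{value}" for key, value in group_tags.items()}
    return all(tag in group for tag in mute_scope)


group = {"env": "dev", "host": "myhost"}
print(is_muted(group, ["env:dev"]))   # True
print(is_muted(group, ["env:prod"]))  # False
print(is_muted(group, []))            # True (whole monitor muted)
```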

Monitor Groups

Monitor groups break a monitor's results down by tag values, such as env:dev, host:myhost for a monitor grouped by env and host. By default, Datadog keeps monitor groups available in the UI for 24 hours (48 hours for host monitors), which gives you a window for analysis and troubleshooting. To avoid alerts from expected behaviour, such as high resource usage when a new container is created, you can configure a delay before new groups are evaluated. This helps keep notifications relevant and actionable.

A monitor's overall status is distinct from the status of its individual groups. Status is tracked per group and updated from evaluation results, and the group breakdown shows which specific components or entities within the system are affected.


Absent notifications

  • Check email preferences for the recipient and ensure that "Notification from monitor alerts" is checked.
  • Check the event stream for events with the string "Error delivering notification".
  • If you use multiple @opsgenie-[...] notifications in one monitor, Datadog sends them to Opsgenie with the same alias. Opsgenie deduplicates alerts by alias, so it discards what it sees as duplicates.

Frequently asked questions

What does the monitor status page show?
The monitor status page displays the monitor's status over time, broken out by group. The header contains the monitor's status, the time of that status, and the monitor title.

Why doesn't a monitor's state always match its latest evaluation?
While monitor evaluations are stateless, monitors themselves are stateful: their state is updated based on the evaluation results of their queries and configurations. A monitor evaluation with a given status won't necessarily cause the monitor's state to change to the same status. Recovery thresholds are one example.

What is a monitor group?
For a multi-alert monitor, a group is a set of tags with one value for each grouping key (for example, env:dev, host:myhost for a monitor grouped by env and host). For a simple alert, there is only one group (*), representing everything within the monitor's scope.

What is a monitor scope?
A monitor's scope is the tag filter in its query that determines which data the monitor evaluates, for example {env:prod} in avg:system.cpu.user{env:prod}. A scope of {*} includes everything. This is different from Datadog's authorization scopes, which limit and define the granular access that applications have to an organization's Datadog data.
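Within a monitor, the scope is the tag filter in braces in its query, for example {env:prod,service:web} in avg:system.cpu.user{env:prod,service:web} by {host}, or {*} to include everything. The hypothetical helper below is not Datadog's query parser. It uses a simple regular expression to extract the metric name, scope tags, and grouping keys from a basic single-metric monitor query, which can help when checking whether a scope matches the tags your hosts and services actually report.

```python
import re
from typing import List, Optional, Tuple

# "<metric>{<scope>}" optionally followed by " by {<keys>}"; simple queries only.
QUERY_RE = re.compile(
    r"(?P<metric>[\w.]+)\{(?P<scope>[^}]*)\}(?:\s+by\s+\{(?P<by>[^}]*)\})?"
)


def split_query(query: str) -> Optional[Tuple[str, List[str], List[str]]]:
    """Return (metric, scope_tags, group_by_keys), or None if nothing matches."""
    match = QUERY_RE.search(query)
    if match is None:
        return None
    scope = [t.strip() for t in match.group("scope").split(",") if t.strip()]
    by = [k.strip() for k in (match.group("by") or "").split(",") if k.strip()]
    return match.group("metric"), scope, by


print(split_query("avg(last_5m):avg:system.cpu.user{env:prod,service:web} by {host} > 90"))
# ('system.cpu.user', ['env:prod', 'service:web'], ['host'])
```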
