Create AWS Metrics and Alarms based on Logs with Terraform

Felipe López
Nov 13, 2021

In this short post I want to show how to create a metric and an alarm based on log analysis with Terraform.

This approach is useful for legacy services where developers are reluctant to touch core functionality.

Repo: https://github.com/felipeloha/tf-cw-filter

The example analyses unstructured logs and creates a metric with dimensions based on a text pattern.

Specifically, it measures the execution time of API calls with a (very) specific HTTP status and triggers an alarm when the value is too high.

The log sample with pattern [DEP GET $STATUS $EXEC_TIME] would look like this:

DEP GET 200 250
DEP GET 200 900
DEP GET 200 550
DEP GET 500 100
DEP GET 500 100
DEP GET 408 1000

The following code creates a metric filter that parses these lines, captures the execution time and classifies the samples by HTTP status code:

resource "aws_cloudwatch_log_metric_filter" "metric" {
  name           = "my-metric-filter"
  pattern        = "[dep = DEP, get = GET, status, time]"
  log_group_name = aws_cloudwatch_log_group.logs.name

  metric_transformation {
    name      = "metric-exec-time"
    namespace = "metric-namespace"
    value     = "$time"
    unit      = "Milliseconds"
    dimensions = {
      status = "$status"
    }
  }
}
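The filter above refers to aws_cloudwatch_log_group.logs, which is not shown in this post. A minimal sketch of such a log group could look like the following; the group name and the retention period are assumptions made for illustration, not values taken from the repo.

```hcl
# Hypothetical log group that the metric filter attaches to.
# The name and retention below are illustrative assumptions.
resource "aws_cloudwatch_log_group" "logs" {
  name              = "/legacy-service/api" # assumed log group name
  retention_in_days = 14                    # assumed retention period
}
```

The service only needs to write its "DEP GET ..." lines into this group; no code change is required beyond shipping the logs to CloudWatch.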

This snippet creates an alarm that is triggered whenever successful calls (status 200) are too slow, that is, when their average execution time over a 60-second period is 800 ms or more:

resource "aws_cloudwatch_metric_alarm" "too-slow-200" {
  alarm_name          = "too-slow-200"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "metric-exec-time"
  namespace           = "metric-namespace"
  period              = 60
  statistic           = "Average"
  threshold           = 800
  alarm_description   = "200 are too slow"
  dimensions = {
    status = "200"
  }
}

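Beware of:

  • Choosing the alarm's period, evaluation periods and statistic carefully: an average over a single 60-second period can hide slow outliers, or fire because of one slow call when traffic is low.
  • Reacting to multiple HTTP codes: this requires multiple metrics and alarms with different patterns, e.g. "[dep, get, status = 2*, time]".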
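Both points can be sketched together as a second filter that matches every 2xx status, plus an alarm with less jumpy settings. The resource names, the metric name "metric-exec-time-2xx" and the values chosen for evaluation_periods, datapoints_to_alarm and treat_missing_data are assumptions for illustration and should be tuned to the real traffic.

```hcl
# Sketch: one metric for all 2xx responses (hypothetical names).
resource "aws_cloudwatch_log_metric_filter" "metric_2xx" {
  name           = "my-metric-filter-2xx"
  pattern        = "[dep = DEP, get = GET, status = 2*, time]"
  log_group_name = aws_cloudwatch_log_group.logs.name

  metric_transformation {
    name      = "metric-exec-time-2xx" # assumed metric name
    namespace = "metric-namespace"
    value     = "$time"
    unit      = "Milliseconds"
    # No dimensions: the pattern already restricts the samples to 2xx.
  }
}

# Sketch: alarm when 2 of the last 3 one-minute averages reach 800 ms.
resource "aws_cloudwatch_metric_alarm" "too_slow_2xx" {
  alarm_name          = "too-slow-2xx"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  metric_name         = "metric-exec-time-2xx"
  namespace           = "metric-namespace"
  period              = 60
  statistic           = "Average"
  threshold           = 800
  evaluation_periods  = 3              # assumed value
  datapoints_to_alarm = 2              # assumed value
  treat_missing_data  = "notBreaching" # minutes without calls do not count as breaches
  alarm_description   = "2xx calls are too slow"
}
```

Because this metric is published without dimensions, the alarm does not set a dimensions block either; an alarm only sees a metric when namespace, name and dimensions all match.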

I hope this saves you some time! Happy coding!
