Skip to content
This repository has been archived by the owner on Jan 7, 2019. It is now read-only.

Commit

Permalink
158-simplify-cloudwatch-to-graphite-app-remove-leadbutt-dependency
Browse files Browse the repository at this point in the history
Refactor/ simplification of the app:

* Remove `pipe-watchdog.py` entirely
  * We're no longer using pipes to direct the output. An error should be output to the cloudwatch error logs on failure `cf logs --recent digitalmarketplace-cloudwatch-to-graphite-preview`
  * Remove this from the run command in `manifest.yml.example`

* Add `flake8` dependency and `.flake8` file in line with GDS recommendations
* Activate travis service on repo to check Flake8 rules and requirements files
  * Add `.travis.yml`

* Add `Makefile` with shortcuts for running requirements freeze and flake8
* Update `README.md`
* Change requirements handling
  * Add `requirements-app.txt`
  * Ensure `requirements.txt` contains full pinned requirements
  * Add `flake8` requirement for devs

* Rewrite `app.py`
  * Add function descriptions to all functions
  * Remove dependency on `leadbutt`, `signal` and `subprocess`
  * Use `boto3` instead of `leadbutt`s `boto` (https://github.com/crccheck/cloudwatch-to-graphite/blob/master/leadbutt.py#L25)
  * Remove `get_timestamp` method. The same can be achieved with `time.time`
  * `logger.warn` is deprecated, use `logger.warning` instead
  * Bring string formatting in line with rest of strings in file
  * Rename `initialize_metrics ` to `create_hostedgraphite_base_metrics` for better description
  * Remove `call_leadbutt` function and replace with code calling aws ourselves. Extrapolated from (https://github.com/crccheck/cloudwatch-to-graphite/blob/master/leadbutt.py)
  *  Change `loglevel` to `INFO` this is all that is required now we're not piping output
  * Move retry decorator on to wrapper method in `if __name__` block

* Update `config.yaml.j2` template
  * `Dimensions` default = `{}` to avoid type manipulations
  * Remove `Auth` section from config. This can be hard coded until we have any need to specify it as a config variable
  * Remove `Options.Count` again, this can be hard coded. It's only used in one place and we've never needed to change it

Extrapolate code out of leadbutt
This code drops leadbutts retry functionality. Boto3 has the capacity to retry failed attempts. We should prefer this.
  • Loading branch information
benvand committed Sep 25, 2018
1 parent ccca2d9 commit 57f38df
Show file tree
Hide file tree
Showing 12 changed files with 163 additions and 93 deletions.
1 change: 0 additions & 1 deletion .cfignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,5 @@

# App requirements
!app.py
!pipe-watchdog.py
!config.yaml

3 changes: 3 additions & 0 deletions .flake8
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[flake8]
exclude = venv*,__pycache__,.git
max-line-length = 120
11 changes: 11 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
language: python
sudo: false
python:
- "3.6"
dist: "trusty"
install:
- pip install -r requirements-dev.txt
script:
- make test-requirements test-flake8
notifications:
email: false
28 changes: 28 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
SHELL := /bin/bash
VIRTUALENV_ROOT := $(shell [ -z $$VIRTUAL_ENV ] && echo $$(pwd)/venv || echo $$VIRTUAL_ENV)

.PHONY: virtualenv
virtualenv:
[ -z $$VIRTUAL_ENV ] && [ ! -d venv ] && python3 -m venv venv || true

.PHONY: test-flake8
test-flake8: virtualenv
${VIRTUALENV_ROOT}/bin/flake8 .

.PHONY: freeze-requirements
freeze-requirements:
rm -rf venv-freeze
python3 -m venv venv-freeze
$$(pwd)/venv-freeze/bin/pip install -r requirements-app.txt
echo '# This file is autogenerated. Do not edit it manually.' > requirements.txt
cat requirements-app.txt >> requirements.txt
echo '' >> requirements.txt
$$(pwd)/venv-freeze/bin/pip freeze -r <(sed '/^--/d' requirements-app.txt) | sed -n '/The following requirements were added by pip freeze/,$$p' >> requirements.txt
rm -rf venv-freeze

.PHONY: test-requirements
test-requirements:
@diff requirements-app.txt requirements.txt | grep '<' \
&& { echo "requirements.txt doesn't match requirements-app.txt."; \
echo "Run 'make freeze-requirements' to update."; exit 1; } \
|| { echo "requirements.txt is up to date"; exit 0; }
5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,10 @@ Digital Marketplace Cloudwatch to Graphite

## Purpose

Ships Cloudwatch metrics to Hosted Graphite using https://github.com/crccheck/cloudwatch-to-graphite.
Ships Cloudwatch metrics to Hosted Graphite.
Based on the work of https://github.com/crccheck/cloudwatch-to-graphite and the `leadbutt` script. Particularly config parsing to define metrics.

An example config file is also included (`config.yaml.example`) to help you define the correct structure for your metrics.
An example config file is included (`config.yaml.example`) to help you define the correct structure for your metrics.

## Initial setup

Expand Down
130 changes: 92 additions & 38 deletions app.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,9 @@
#!/usr/bin/env python
import os
import signal
import subprocess
import logging
from datetime import datetime, timedelta
import os

from datetime import datetime, timedelta
import boto3
import requests
import yaml
from retrying import retry
Expand All @@ -13,58 +12,113 @@
logger = logging.getLogger("app")


def get_timestamp():
return (
(datetime.now() - timedelta(days=5)) - datetime(1970, 1, 1)
).total_seconds()
def get_config():
"""Get the config file."""
with open('config.yaml') as fp:
config = yaml.load(fp)
return config


def send_to_hostedgraphite(metrics):
"""Put request to hostedgraphite."""
response = requests.put(
"http://www.hostedgraphite.com/api/v1/sink",
auth=(os.getenv("HOSTED_GRAPHITE_API_KEY"), ""),
data=metrics
)

if response.status_code >= 400:
logger.warn(f"Error sending metrics to hosted graphite - Status code {response.status_code}")
if not response.ok:
logger.warning(
"Error sending metrics to hosted graphite - {}: {}".format(response.status_code, response.reason)
)
else:
logger.info(f"Metrics sent to hosted graphite - Status code {response.status_code}")
logger.info("Metrics sent to hosted graphite - Status code {}".format(response.status_code))


def initialize_metrics():
initialized_metrics = []
timestamp = int(get_timestamp())
with open('config.yaml') as fp:
config = yaml.load(fp)
for metric in config['Metrics']:
stats = metric['Statistics'] if isinstance(metric['Statistics'], list) else [metric['Statistics']]
for stat in stats:
metric_name = (metric['Options']['Formatter']
.replace("%(statistic)s", stat.lower()))
initialized_metrics.append("{} 0.0 {}".format(metric_name, timestamp))
send_to_hostedgraphite("\n".join(initialized_metrics))
def format_config_metric_entry_for_hostedgraphite_base_metric(config_metric_entry, timestamp):
hostedgraphite_metric_name = config_metric_entry['Options']['Formatter'].replace(
"%(statistic)s",
config_metric_entry['Statistics'].lower()
)
hostedgraphite_base_metric = "{} 0.0 {}".format(hostedgraphite_metric_name, timestamp)
return hostedgraphite_base_metric


@retry(wait_fixed=60000, retry_on_result=lambda res: res is None)
def call_leadbutt():
def create_hostedgraphite_base_metrics(config):
"""Take a metric entry from the config format it into a base metric used to initialise the metric on hostedgraphite.
"""
See https://github.com/crccheck/cloudwatch-to-graphite for more info on leadbutt
base_metrics = []
timestamp = int(((datetime.now() - timedelta(days=5)) - datetime(1970, 1, 1)).total_seconds())
for config_metric_entry in config['Metrics']:
hostedgraphite_base_metric = format_config_metric_entry_for_hostedgraphite_base_metric(
config_metric_entry,
timestamp
)
hostedgraphite_base_metrics.append(hostedgraphite_base_metric)
return hostedgraphite_base_metrics


def format_cloudwatch_metric_datapoint_for_hostedgraphite(metric, result):
"""Given a cloudwatch metric datapoint convert it into the format hostedgraphite expects."""
hostedgraphite_metric_name = (
cloudwatch_metric_datapoint['Options']['Formatter'] % {'statistic': cloudwatch_metric_datapoint['Statistics']}
).lower()
return '{0} {1} {2}'.format(
hostedgraphite_metric_name,
result[cloudwatch_metric_datapoint['Statistics']],
result['Timestamp'].strftime('%s')
)


def get_metric_from_cloudwatch(client, metric):
"""Call the client once for th supplied metric."""
end_time = datetime.utcnow()
start_time = end_time - timedelta(seconds=600)
return client.get_metric_statistics(
Period=60,
StartTime=start_time,
EndTime=end_time,
MetricName=metric['MetricName'],
Namespace=metric['Namespace'],
Statistics=[metric['Statistics']],
Dimensions=[{'Name': k, 'Value': v} for k, v in metric['Dimensions'].items()],
Unit=metric.get('Unit', 'None'),
)


def get_metrics_from_cloudwatch_and_format_for_hostedgraphite(config):
"""Get metrics from config, call cloudwatch for metric entry and format metric datapoints to expected hostedgraphite
format.
For each metric in the config call the cloudwatch api to return the datapoints for that metric.
For each cloudwatch datapoint create one hostedgraphite metric entry for supply to hostedgraphite.
"""
result = subprocess.Popen("leadbutt", stdout=subprocess.PIPE)
metrics = result.communicate()[0].decode("utf-8")
send_to_hostedgraphite(metrics)
hostedgraphite_metrics = []

client = boto3.client('cloudwatch', region_name="eu-west-1")
for metric_entry in config['Metrics']:
cloudwatch_metric = get_metric_from_cloudwatch(client, metric_entry)
for cloudwatch_metric_datapoint in cloudwatch_metric['Datapoints']:
hostedgraphite_metric = format_cloudwatch_metric_datapoint_for_hostedgraphite(
metric_entry,
cloudwatch_metric_datapoint
)
hostedgraphite_metrics.append(hostedgraphite_metric)

return hostedgraphite_metrics


if __name__ == '__main__':
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)s:%(name)s:%(levelname)s:%(message)s",
)
logging.basicConfig(level=logging.INFO, format="%(asctime)s:%(name)s:%(levelname)s:%(message)s")
config = get_config()

base_metrics = create_hostedgraphite_base_metrics(config)
send_to_hostedgraphite("\n".join(base_metrics))

# python by default installs a signal handler that largely ignores SIGPIPE, but we want to use it for our watchdog
# mechanism, so reinstate the *unix* default, which is a fatal handler
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
@retry(wait_fixed=60000, retry_on_result=lambda res: res is None)
def sleep_and_send_retry():
"""Wrapper to apply retry to get and send methods."""
hostedgraphite_metrics = get_metrics_from_cloudwatch_and_format_for_hostedgraphite(config)
send_to_hostedgraphite('\n'.join(hostedgraphite_metrics))

initialize_metrics()
call_leadbutt()
sleep_and_send_retry()
12 changes: 4 additions & 8 deletions config.yaml.j2
Original file line number Diff line number Diff line change
@@ -1,16 +1,14 @@
{% set apps = ["antivirus-api", "api", "admin-frontend", "briefs-frontend", "brief-responses-frontend", "buyer-frontend", "router", "search-api", "supplier-frontend", "user-frontend"] -%}
{% set environments = ["preview", "staging", "production"] -%}
{% set log_types = ["application", "nginx"] -%}
Auth:
region: "eu-west-1"
Metrics:
{% for environment in environments -%}
{% for app in apps -%}
{% for bucket in range(10) -%}
- Namespace: "DM-RequestTimeBuckets"
MetricName: "{{ environment }}-{{ app }}-request-times-{{ bucket }}"
Statistics: "Sum"
Dimensions: []
Dimensions: {}
Options:
Formatter: 'cloudwatch.request_time_buckets.{{ environment }}.{{ app }}.request_time_bucket_{{ bucket }}.%(statistic)s'
{% endfor -%}
Expand All @@ -21,13 +19,13 @@ Metrics:
- Namespace: "DM-500s"
MetricName: "{{ environment }}{% if app != "nginx" %}-{{ app }}{% endif %}-nginx-500s"
Statistics: "Sum"
Dimensions: []
Dimensions: {}
Options:
Formatter: 'cloudwatch.application_500s.{{ environment }}.{{ app }}.500s.%(statistic)s'
- Namespace: "DM-APIClient-Retries"
MetricName: "{{ environment }}-{{ app }}-apiclient-retries"
Statistics: "Sum"
Dimensions: []
Dimensions: {}
Options:
Formatter: "cloudwatch.apiclient_retries.{{ environment }}.{{ app }}.retries.%(statistic)s"
{% endfor -%}
Expand All @@ -36,7 +34,7 @@ Metrics:
- Namespace: "DM-429s"
MetricName: "{{ environment }}-router-nginx-429s"
Statistics: "Sum"
Dimensions: []
Dimensions: {}
Options:
Formatter: "cloudwatch.router_429s.{{ environment }}.router.429s.%(statistic)s"
{% endfor -%}
Expand All @@ -53,5 +51,3 @@ Metrics:
{% endfor -%}
{% endfor -%}
{% endfor -%}
Options:
Count: 10
2 changes: 1 addition & 1 deletion manifest.yml.example
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ applications:
buildpack: python_buildpack
no-route: true
health-check-type: none
command: python app.py 2>&1 | python pipe-watchdog.py
command: python app.py
instances: 1
memory: 128M
env:
Expand Down
41 changes: 0 additions & 41 deletions pipe-watchdog.py

This file was deleted.

4 changes: 4 additions & 0 deletions requirements-app.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
retrying==1.3.3
requests==2.17.3
PyYAML==3.13
boto3==1.9.6
1 change: 1 addition & 0 deletions requirements-dev.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
-r requirements.txt

Jinja2==2.10
flake8==3.5.0
18 changes: 16 additions & 2 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,17 @@
# This file is autogenerated. Do not edit it manually.
retrying==1.3.3
cloudwatch-to-graphite==0.9.2
requests==2.17.3
requests==2.17.3
PyYAML==3.13
boto3==1.9.6

## The following requirements were added by pip freeze:
botocore==1.12.7
certifi==2018.8.24
chardet==3.0.4
docutils==0.14
idna==2.5
jmespath==0.9.3
python-dateutil==2.7.3
s3transfer==0.1.13
six==1.11.0
urllib3==1.21.1

0 comments on commit 57f38df

Please sign in to comment.