Building a Docker container for easy SSH into Opsworks Stacks

Part of the concept behind Opsworks is the ability to create and destroy instances dynamically. If your instances are configured by Chef recipes all the way from AMI to processing production workloads, this is probably something you do pretty regularly.

But this probably means that the IP addresses behind your instances change regularly. At some point you might get tired of constantly going back to the Opsworks console to look up an IP address; I know I did.

It turns out it's not too difficult to generate an ssh config file using boto3 to pull down the instances' IP addresses. I chose to do this in Python, and an example script is below. In my case, our instances all have private IP addresses, so that's the property I'm using.

import os
import boto3

# Start from a clean ssh config on every run
ssh_config_filename = '/home/meuser/.ssh/config'
if os.path.exists(ssh_config_filename):
    os.remove(ssh_config_filename)

if not os.path.exists('/home/meuser/.ssh/'):
    os.mkdir('/home/meuser/.ssh/')

# Map each AWS credential profile to the Opsworks stacks (and key files) in that account
profiles = {
    'NoPHI': [{'StackName': 'My-Dev-Stack', 'IdentityFile': 'my-dev-private-key.pem', 'ShortName': 'dev'}],
    'PHI': [{'StackName': 'My-prod-stack', 'IdentityFile': 'my-prod-private-key.pem', 'ShortName': 'prod'}]
}

for profile in profiles.keys():
    session = boto3.Session(profile_name=profile)

    opsworks_client = session.client('opsworks')
    opsworks_stacks = opsworks_client.describe_stacks()['Stacks']
    for opsworks_stack in opsworks_stacks:
        for stack in profiles[profile]:
            if opsworks_stack['Name'] == stack['StackName']:
                instances = opsworks_client.describe_instances(StackId=opsworks_stack['StackId'])
                for instance in instances['Instances']:
                    # Append one Host entry per instance
                    with open(ssh_config_filename, "a") as ssh_config_file:
                        ssh_config_file.write("Host " + (stack['ShortName'] + '-' + instance['Hostname']).lower() + '\n')
                        ssh_config_file.write(" Hostname " + instance['PrivateIp'] + '\n')
                        ssh_config_file.write(" User ubuntu\n")
                        ssh_config_file.write(" IdentityFile " + '/home/meuser/keys/' + stack['IdentityFile'] + '\n')
                        ssh_config_file.write("\n")

This script will run through the different AWS account profiles you specify, find the instances in the stacks you specify, and let you ssh into each of them by a short alias. For example, if you have an instance named "myinstance1" in your Opsworks dev stack:

ssh dev-myinstance1
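
For reference, each entry the script appends to the config file looks something like this (the private IP is made up):

Host dev-myinstance1
 Hostname 10.20.30.40
 User ubuntu
 IdentityFile /home/meuser/keys/my-dev-private-key.pem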

If you run Linux as your working machine, you're really done at this point. But if you're on Windows like me, there's another step that can make this even easier: running the script in a Linux Docker container.

First, you'll need to install Docker for Windows. It might be helpful to go through some of their walkthroughs as well if you aren't familiar with Docker.

Once you have the Docker daemon installed and running, you'll need to create a Docker image from a Dockerfile that can run the python script above. The example below uses the ubuntu:latest image, installs Python, copies your AWS secret keys and the private keys used for ssh into the image, and runs the python script.

You will need to put the files being copied over (ssh_config_updater.py, my-prod-private-key.pem, my-dev-private-key.pem, and credentials) in the same directory as the Dockerfile.
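
The credentials file is a standard AWS shared credentials file. Since the script opens boto3 sessions with profile_name 'NoPHI' and 'PHI', it needs a section for each; the key values below are placeholders:

[NoPHI]
aws_access_key_id = <dev account access key>
aws_secret_access_key = <dev account secret key>

[PHI]
aws_access_key_id = <prod account access key>
aws_secret_access_key = <prod account secret key>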

FROM ubuntu:latest
RUN useradd -d /home/meuser -m meuser
RUN apt-get update
RUN apt-get install -y python-pip
RUN pip install --upgrade pip
RUN apt-get install -y vim
RUN apt-get install -y ssh
RUN pip install --upgrade awscli
ADD my-dev-private-key.pem /home/meuser/keys/my-dev-private-key.pem
ADD my-prod-private-key.pem /home/meuser/keys/my-prod-private-key.pem
RUN chmod 600 /home/meuser/keys/*
RUN chown meuser /home/meuser/keys/*
ADD ssh_config_updater.py /home/meuser/ssh_config_updater.py
ADD credentials /home/meuser/.aws/credentials
RUN pip install boto3
USER meuser
WORKDIR /home/meuser
RUN python /home/meuser/ssh_config_updater.py
CMD /bin/bash

Once you have your Dockerfile and build directory set up, you can run the command below with the Docker daemon running.

docker build -t opsworks-manage .

Once that command finishes, you can ssh into your instances with

docker run -it --name opsworks-manage opsworks-manage ssh dev-myinstance1

This creates a container named opsworks-manage. While that container is running, you can reuse it to ssh into other instances using

docker exec -it opsworks-manage ssh dev-myinstance1

A couple notes: I'm using the default "ubuntu" account AWS builds into Ubuntu instances for simplicity. This account has full sudo rights, and in practice you should create another account to use for normal management, either through an Opsworks recipe or by using Opsworks to create the user account.

Another note: because this example bakes ssh keys and credentials files into the Docker image, you should never push the image to a container registry. If you plan on version controlling the Dockerfile, make sure to use a .gitignore file to keep that sensitive information out of source control.

AWS Codepipeline: Alert on Stage Failure

We’ve been using AWS Codepipeline for some time now and for the most part it’s a great managed service. Easy to get started with and pretty simple to use.

That being said, it does lack some features out of the box that most CICD systems have ready for you. The one I’ll be tackling today is alerting on a stage failure.

Out of the box, Codepipeline won't alert you when there's a failure at a stage. Unless you actually go look at it in the console, you won't know that anything is broken. For example, when I started working on this blog entry, I checked one of the pipelines that delivers to our test environment and found it in a failed state.

In this case the failure is expected, because our Opsworks stacks are set to turn off test instances outside business hours. But for almost any other failure, I would want to alert the team responsible for making the change that failed.

For a solution, we'll use these resources:

  • AWS Lambda
  • Boto3
  • AWS SNS Topics
  • Cloudformation

First we'll need a Lambda function that can get a list of the pipelines in our account, scan their stages, detect failures, and produce alerts. Below is a basic example of what we're using. I'm far from a python expert, so I understand that there are improvements that could be made, particularly around error handling.

import boto3
import logging
import os

def lambda_handler(event, context):
    # Get a cloudwatch logger
    logger = logging.getLogger('mvp-alert-on-cp-failure')
    logger.setLevel(logging.DEBUG)

    sns_topic_arn = os.environ['TOPIC_ARN']

    # Obtain boto3 resources
    logger.info('Getting boto 3 resources')
    code_pipeline_client = boto3.client('codepipeline')
    sns_client = boto3.client('sns')

    logger.debug('Getting pipelines')
    for pipeline in code_pipeline_client.list_pipelines()['pipelines']:
        logger.debug('Checking pipeline ' + pipeline['name'] + ' for failures')
        for stage in code_pipeline_client.get_pipeline_state(name=pipeline['name'])['stageStates']:
            logger.debug('Checking stage ' + stage['stageName'] + ' for failures')
            if 'latestExecution' in stage and stage['latestExecution']['status'] == 'Failed':
                logger.debug('Stage failed! Sending SNS notification to ' + sns_topic_arn)
                # Collect the names of the failed actions within the stage
                failed_actions = ''
                for action in stage['actionStates']:
                    logger.debug(action)
                    logger.debug('Checking action ' + action['actionName'] + ' for failures')
                    if 'latestExecution' in action and action['latestExecution']['status'] == 'Failed':
                        logger.debug('Action failed!')
                        failed_actions += action['actionName']
                logger.debug('Publishing failure alert: ' + pipeline['name'] + '|' + stage['stageName'] + '|' + failed_actions)
                alert_subject = 'Codepipeline failure in ' + pipeline['name'] + ' at stage ' + stage['stageName']
                alert_message = 'Codepipeline failure in ' + pipeline['name'] + ' at stage ' + stage['stageName'] + '. Failed actions: ' + failed_actions
                logger.debug('Sending SNS notification')
                sns_client.publish(TopicArn=sns_topic_arn, Subject=alert_subject, Message=alert_message)

    return "And we're done!"

If you're looking closely, you're probably wondering about the environment variable named "TOPIC_ARN", which leads us to the next piece: a Cloudformation template to create this Lambda function.

The Cloudformation template needs to do a few things.

  1. Create the Lambda function. I’ve chosen to do this using AWS Serverless Application Model.
  2. Create an IAM Role for the Lambda function to execute under.
  3. Create IAM policies that will give the IAM role read access to your pipelines, and publish access to your SNS topic.
  4. Create an SNS topic with a list of the individuals you want to get the email.

The only really newfangled Cloudformation feature I'm using here is AWS SAM; the rest of these have existed for quite a while. In my opinion, one of the main ideas behind AWS SAM is to package your entire serverless function in a single Cloudformation template, so the example below does all four of these steps.

#############################################
### Lambda function to alert on pipeline failures
#############################################

LambdaAlertCPTestFail:
  Type: AWS::Serverless::Function
  Properties:
    Handler: mvp-alert-on-cp-failure.lambda_handler
    Role: !GetAtt IAMRoleAlertOnCPTestFailure.Arn
    Runtime: python2.7
    Timeout: 300
    Events:
      CheckEvery30Minutes:
        Type: Schedule
        Properties:
          Schedule: cron(0/30 12-23 ? * MON-FRI *)
    Environment:
      Variables:
        STAGE_NAME: Test
        TOPIC_ARN: !Ref CodePipelineTestStageFailureTopic
CodePipelineTestStageFailureTopic:
  Type: "AWS::SNS::Topic"
  Properties:
    DisplayName: MvpPipelineFailure
    Subscription:
      -
        Endpoint: 'pipelineCurator@example.com'
        Protocol: 'email'
    TopicName: MvpPipelineFailure
IAMPolicyPublishToTestFailureTopic:
  Type: "AWS::IAM::Policy"
  DependsOn: IAMRoleAlertOnCPTestFailure
  Properties:
    PolicyName: !Sub "Role=AlertOnCPTestFailure,Env=${AccountParameter},Service=SNS,Rights=Publish"
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        -
          Effect: "Allow"
          Action:
            - "sns:Publish"
          Resource:
            - !Ref CodePipelineTestStageFailureTopic
    Roles:
      - !Ref IAMRoleAlertOnCPTestFailure
IAMPolicyGetPipelineStatus:
  Type: "AWS::IAM::Policy"
  DependsOn: IAMRoleAlertOnCPTestFailure
  Properties:
    PolicyName: !Sub "Role=AlertOnCPTestFailure,Env=${AccountParameter},Service=CodePipeline,Rights=R"
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        -
          Effect: "Allow"
          Action:
            - "codepipeline:GetPipeline"
            - "codepipeline:GetPipelineState"
            - "codepipeline:ListPipelines"
          Resource:
            - "*"
    Roles:
      - !Ref IAMRoleAlertOnCPTestFailure
IAMRoleAlertOnCPTestFailure:
  Type: "AWS::IAM::Role"
  Properties:
    RoleName: !Sub "Role=AlertOnCPTestFailure,Env=${AccountParameter},Service=Lambda"
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        -
          Effect: "Allow"
          Principal:
            Service:
              - "lambda.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    Path: "/"

#############################################
### End of pipeline failure alerting lambda function
#############################################

And that’s about it. A couple notes on the Cloudformation template:

Alert Frequency

I'm using a cron expression as my schedule, currently set to go off every half hour during business hours, because we don't have overnight staff who would be able to look at pipeline failures. You can easily increase the frequency with something like

cron(0/5 12-23 ? * MON-FRI *)

Lambda Environment Variables

One of the announcements from re:Invent I was most excited about was AWS Lambda environment variables. This is a pretty magical feature that lets you pass values into your Lambda functions. In this case, I'm using it to pass the ARN of an SNS topic created in the same Cloudformation template into the Lambda function.

Long story short, that means we can create resources in AWS and pass references to them into code without needing a way to search for them or hard-coding their values in source.

      Environment:
        Variables:
          STAGE_NAME: Test
          TOPIC_ARN: !Ref CodePipelineTestStageFailureTopic

Flowerboxes

The CFT this example comes from contains multiple pipeline management functions, so the flowerboxes ("###############") at the beginning and end of the Lambda function definition are our way of keeping the resources for each Lambda function separated.

SNS Notifications

When you create an SNS topic with an email subscription, each subscriber will have to confirm it: they'll get an email and have to click the link before they receive notifications.

Snippets

These are snippets I pulled out of our pipeline management Cloudformation stack, so you'll have to put them into a Cloudformation template that references the SAM Cloudformation Transform and has a valid header like the one below:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources: ......

Happy alerting!

AWS Serverless Application Model: Here we go!

AWS Serverless Application Model (SAM) was released a couple of months ago. The punch line of this new release, in my mind, is the ability to version your Lambda function code and your Cloudformation template next to each other, the idea being a completely packaged serverless application that deploys from a single repository.
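
In practice, that can be as simple as a directory holding the template and the handler file side by side, something like:

ec2-management/
    ec2-management-cft.yml          (the SAM Cloudformation template)
    ebs_available_date_tagger.py    (the lambda function code)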

I spent an afternoon playing around with AWS SAM, and I’m already a pretty big fan. It makes deploying lambda functions a lot easier, especially when you have different accounts you want to use them in.

The example below creates a lambda function that tags EBS volumes as they become available:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  EbsVolumeAvailableTagger:
    Type: AWS::Serverless::Function
    Properties:
      Handler: ebs_available_date_tagger.lambda_handler
      Role: !GetAtt EbsCleanerIAMRole.Arn
      Runtime: python2.7
  IAMEbsVolumeListTagPolicy:
    Type: "AWS::IAM::Policy"
    DependsOn: EbsCleanerIAMRole
    Properties:
      PolicyName: !Sub "Role=EBSCleaner,Env=${AccountParameter},Service=Lambda,Rights=RW"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Action:
              - "ec2:CreateTags"
              - "ec2:DeleteTags"
              - "ec2:DescribeTags"
              - "ec2:DescribeVolumeAttribute"
              - "ec2:DescribeVolumeStatus"
              - "ec2:DescribeVolumes"
            Resource:
              - "*"
      Roles:
        - !Ref EbsCleanerIAMRole
  EbsCleanerIAMRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: !Sub "Role=EbsCleaner,Env=${AccountParameter},Service=Lambda"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "lambda.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
Parameters:
  AccountParameter:
    Type: String
    Default: NoPHI
    AllowedValues:
      - NoPHI
      - Prod
      - Staging
      - Corporate
    Description: Enter the account where this lambda function is being created. Will
      be used to properly name the created IAM role.

And then the python that it runs:

import boto3
import re
import logging
import time

def lambda_handler(event, context):

    # Number of days to wait before deleting a volume
    volumeDaysOld = 30

    # Get a cloudwatch logger
    logger = logging.getLogger('EbsVolumeCleanup')
    logger.setLevel(logging.DEBUG)

    # Obtain boto3 resources
    logger.info('Getting boto 3 resources')
    opsworksClient = boto3.client('opsworks')
    ec2Client = boto3.client('ec2')

    availableVolumes = ec2Client.describe_volumes(Filters=[{'Name': 'status', 'Values': ['available']}])

    availableVolumesToTag = []

    for volume in availableVolumes['Volumes']:
        logger.info(volume)
        if 'Tags' in volume:
            tags = volume['Tags']
            # Default of None avoids a StopIteration when the tag isn't present
            availableDate = next((tag for tag in tags if tag['Key'] == 'volumeAvailableDate'), None)
            if availableDate:
                logger.info('Volume was available ' + availableDate['Value'])
            else:
                logger.info('Volume not yet tagged')
                availableVolumesToTag.append(volume['VolumeId'])
        else:
            availableVolumesToTag.append(volume['VolumeId'])

    logger.info('Volumes to be tagged available: ' + str(len(availableVolumesToTag)) + ' ' + '|'.join(availableVolumesToTag))
    if availableVolumesToTag:
        ec2Client.create_tags(Resources=availableVolumesToTag, Tags=[{'Key': 'volumeAvailableDate', 'Value': time.strftime("%d/%m/%Y")}])

    return 0

If you put the two of these into a directory together, you can use the aws cloudformation package and deploy CLI commands to push them to your account.

aws cloudformation package --template-file ec2-management-cft.yml --output-template-file instance-management-cft-staging.yml --s3-bucket cft-deployment-bucket --s3-prefix "lambda/ec2-management"

aws cloudformation deploy --template-file instance-management-cft-staging.yml --stack-name Staging-InstanceManagement --capabilities CAPABILITY_NAMED_IAM --parameter-overrides AccountParameter=Staging

Those commands will package your lambda function, uploading the code to your S3 bucket and inserting the correct CodeUri property into the output template, and then deploy the resulting stack.
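
If you open the output template, you'll see the function resource now carries a CodeUri pointing at the uploaded artifact, along these lines (the object key here is illustrative):

EbsVolumeAvailableTagger:
  Type: AWS::Serverless::Function
  Properties:
    Handler: ebs_available_date_tagger.lambda_handler
    CodeUri: s3://cft-deployment-bucket/lambda/ec2-management/1a2b3c4d5e6f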

Lambda Logging to Cloudwatch

If you're an AWS user, either professionally or personally, I can't encourage you enough to try out Lambda. Lambda lets you run code in a few different languages (currently Python, Node, and Java) without worrying about the server environment it runs on.

Unfortunately (or fortunately, depending on your perspective), as with any new technology or paradigm, there are caveats to Lambda.

For example, one problem we've solved with Lambda is monitoring web service endpoints. Lambda lets us make an HTTP call to a web service using the python httplib module. But because the python script runs on a server we don't control or configure, it isn't pointed at our DNS servers by default. You can imagine our initial confusion when the Lambda function said the web service was unavailable, but we never saw any traffic to the service.

The best way we found to gain insight into what Lambda is actually doing is to log from Lambda to a Cloudwatch log stream. This lets you output logs and put retention policies on them. Amazon has been helpful enough to tie the built-in python logger into Cloudwatch, so all you really have to do is create a logging object similar to the example below.

https://gist.github.com/LenOtuye/a7c14d8753d8268ab6b53c6a15535a70
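
In case the embedded gist doesn't render, a minimal sketch of that setup looks like this:

import logging

# Lambda pre-wires the python logging handlers to Cloudwatch Logs,
# so getting a logger and setting a level is all that's needed
logger = logging.getLogger('LogName')
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info('This message ends up in Cloudwatch Logs')
    return 'done'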

Your logs will then be dumped into a Cloudwatch log group named "/aws/lambda/<your function name>".

One thing to note is that, from my experience, you can't control that name. Even if you create the logger with logger = logging.getLogger('LogName'), the log group will still be named after the Lambda function.

To give your Lambda function permissions to log to Cloudwatch, it will need to run under a role that has those permissions. The IAM role should allow Lambda resources to assume it, for example:

https://gist.github.com/LenOtuye/6e3216e129592b59327d1af9751ee1ee

And then you will need to give it permissions similar to the following (plus whatever rights your lambda function needs for its actual work):

https://gist.github.com/LenOtuye/7bbf8922a547f937862055bb74791188
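
In case those embeds don't render either, here's a sketch (not the exact gist contents) of the two pieces in the same Cloudformation style as the templates above: a role that Lambda can assume, with an inline policy granting the usual Cloudwatch Logs rights. The resource and policy names are hypothetical; scope the Resource down to your function's log group if you prefer.

LambdaLoggingRole:
  Type: "AWS::IAM::Role"
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        -
          Effect: "Allow"
          Principal:
            Service:
              - "lambda.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    Policies:
      -
        PolicyName: "AllowCloudwatchLogging"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            -
              Effect: "Allow"
              Action:
                - "logs:CreateLogGroup"
                - "logs:CreateLogStream"
                - "logs:PutLogEvents"
              Resource: "arn:aws:logs:*:*:*"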