Graph Your RI Commitment Over Time (subtitle: HOW LONG AM I PAYING FOR THIS?!?!?)

In my last post I talked about distributing your committed RI spend over time. The goal is to strike a balance: don't buy too many 1-year RIs (front-loading your spend and missing out on the deeper savings of a 3-year commitment), but don't buy too many 3-year RIs either (back-loading your spend and risking a bill you're stuck footing if your organization goes through major changes).

Our solution for striking that balance is a short PowerShell script that graphs our RI commitment over time.

# Get the active RI entries from the EC2 API (via the AWS Tools for PowerShell)
$ri_entries = Get-EC2ReservedInstance -Filter @(@{Name="state";Values="active"});

# Array to hold the relevant RI data
$ri_data = @();

# Calculate the monthly cost for each RI (approximating a month as 30 days)
foreach ($ri_entry in $ri_entries) {
    $ri = @{};
    $hourly = $ri_entry.RecurringCharges.Amount;
    $monthly = $hourly * 24 * 30 * $ri_entry.InstanceCount;
    $ri.monthly = $monthly;
    $ri.End = $ri_entry.End;
    $ri_data += $ri;
}

# Three years into the future (maximum duration of RIs as of 1.22.2019)
$three_years_out = (get-date).addyears(3);

# Our current date iterator
$current = (get-date);

# Array to hold the commit by month
$monthly_commit = @();

# CSV file name to save output
$csv_name = "ri_commitment-$((get-date).tostring('ddMMyyyy')).csv";

# Remove the CSV if it already exists
if (test-path $csv_name) {
    remove-item -force $csv_name;
}

# Insert CSV headers
"date,commitment" | out-file $csv_name -append -encoding ascii;

# Iterate from today to three years in the future
while($current -lt $three_years_out) {

    # Find the sum of the RIs that are active on this date
    # all RI data -> RIs that have expirations after current -> select the monthly measure -> get the sum -> select the sum
    $commit = ($ri_data | ? {$_.End -gt $current} | % {$_.monthly} | measure -sum).sum;

    # Build a row of the CSV
    $output = "$($current),$($commit)";

    # Print the output to standard out for quick review
    write-host $output;

    # Write out to the CSV for deeper analysis
    $output | out-file $csv_name -append -encoding ascii;

    # Increment to the next month and repeat
    $current = $current.addmonths(1);
}

OK, "short" isn't quite the right word. It's a little lengthy, but at the end it kicks out a CSV in your working directory with each month and your RI commitment for that month.

From there it’s easy to create a graph that shows your RI spend commit over time.
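
If you'd rather skip the spreadsheet, here's a minimal sketch of charting that CSV with Python, assuming pandas and matplotlib are installed. The file name is just an example; point it at whatever the script above produced.

import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV produced by the PowerShell script (example file name)
df = pd.read_csv('ri_commitment-22012019.csv', parse_dates=['date'])

# Plot monthly commitment against time and save the chart
df.plot(x='date', y='commitment', legend=False)
plt.ylabel('Monthly RI commitment ($)')
plt.title('RI commitment over time')
plt.savefig('ri_commitment.png')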

That gives you an idea of how much spend you’ve committed to, and for how long.

AWS CodePipeline: Alert on Stage Failure

We've been using AWS CodePipeline for some time now, and for the most part it's a great managed service: easy to get started with and pretty simple to use.

That being said, it does lack some features out of the box that most CI/CD systems have ready for you. The one I'll be tackling today is alerting on a stage failure.

Out of the box, CodePipeline won't alert you when a stage fails. Unless you go in and literally look at it in the console, you won't know anything is broken. For example, when I started working on this blog entry I checked one of the pipelines that delivers to our test environment and found it in a failed state.

In this case the failure was expected, because our OpsWorks stacks are set to turn off test instances outside business hours. But for almost any other failure, I'd want to alert the team responsible for the change that broke things.

For a solution, we'll use these resources:

  • AWS Lambda
  • Boto3
  • AWS SNS Topics
  • Cloudformation

First we'll need a Lambda function that can list the pipelines in our account, scan their stages, detect failures, and produce alerts. Below is a basic example of what we're using. I'm far from a Python expert, so I'm sure there are improvements that could be made, particularly around error handling.
import boto3
import logging
import os

def lambda_handler(event, context):
    # Get a logger (output lands in CloudWatch Logs)
    logger = logging.getLogger('mvp-alert-on-cp-failure')
    logger.setLevel(logging.DEBUG)

    sns_topic_arn = os.environ['TOPIC_ARN']

    # Obtain boto3 clients
    logger.info('Getting boto3 clients')
    code_pipeline_client = boto3.client('codepipeline')
    sns_client = boto3.client('sns')

    logger.debug('Getting pipelines')
    for pipeline in code_pipeline_client.list_pipelines()['pipelines']:
        logger.debug('Checking pipeline ' + pipeline['name'] + ' for failures')
        for stage in code_pipeline_client.get_pipeline_state(name=pipeline['name'])['stageStates']:
            logger.debug('Checking stage ' + stage['stageName'] + ' for failures')
            if 'latestExecution' in stage and stage['latestExecution']['status'] == 'Failed':
                logger.debug('Stage failed! Sending SNS notification to ' + sns_topic_arn)
                # Collect the names of the actions that failed in this stage
                failed_actions = ''
                for action in stage['actionStates']:
                    logger.debug(action)
                    logger.debug('Checking action ' + action['actionName'] + ' for failures')
                    if 'latestExecution' in action and action['latestExecution']['status'] == 'Failed':
                        logger.debug('Action failed!')
                        failed_actions += action['actionName'] + ' '
                logger.debug('Publishing failure alert: ' + pipeline['name'] + '|' + stage['stageName'] + '|' + failed_actions)
                alert_subject = 'CodePipeline failure in ' + pipeline['name'] + ' at stage ' + stage['stageName']
                alert_message = alert_subject + '. Failed actions: ' + failed_actions
                logger.debug('Sending SNS notification')
                sns_client.publish(TopicArn=sns_topic_arn, Subject=alert_subject, Message=alert_message)

    return "And we're done!"

If you're looking closely, you're probably wondering about the environment variable named "TOPIC_ARN", which leads us to the next piece: a CloudFormation template to create this Lambda function.

The CloudFormation template needs to do a few things:

  1. Create the Lambda function. I've chosen to do this using the AWS Serverless Application Model (SAM).
  2. Create an IAM role for the Lambda function to execute under.
  3. Create IAM policies that give that role read access to your pipelines and publish access to your SNS topic.
  4. Create an SNS topic subscribed by the people you want to get the email.

The only really newfangled CloudFormation feature I'm using here is AWS SAM; the rest have existed for quite a while. One of the main ideas behind AWS SAM is packaging your entire serverless function in a single CloudFormation template, so the example below does all four of these steps.
#############################################
### Lambda function to alert on pipeline failures
#############################################

  LambdaAlertCPTestFail:
    Type: AWS::Serverless::Function
    Properties:
      # Note: SAM also needs a CodeUri (or InlineCode) pointing at the
      # function source; it's omitted here, as in our larger stack.
      Handler: mvp-alert-on-cp-failure.lambda_handler
      Role: !GetAtt IAMRoleAlertOnCPTestFailure.Arn
      Runtime: python2.7
      Timeout: 300
      Events:
        CheckEvery30Minutes:
          Type: Schedule
          Properties:
            Schedule: cron(0/30 12-23 ? * MON-FRI *)
      Environment:
        Variables:
          STAGE_NAME: Test
          TOPIC_ARN: !Ref CodePipelineTestStageFailureTopic

  CodePipelineTestStageFailureTopic:
    Type: "AWS::SNS::Topic"
    Properties:
      DisplayName: MvpPipelineFailure
      Subscription:
        - Endpoint: 'pipelineCurator@example.com'
          Protocol: 'email'
      TopicName: MvpPipelineFailure

  IAMPolicyPublishToTestFailureTopic:
    Type: "AWS::IAM::Policy"
    DependsOn: IAMRoleAlertOnCPTestFailure
    Properties:
      PolicyName: !Sub "Role=AlertOnCPTestFailure,Env=${AccountParameter},Service=SNS,Rights=Publish"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Action:
              - "sns:Publish"
            Resource:
              - !Ref CodePipelineTestStageFailureTopic
      Roles:
        - !Ref IAMRoleAlertOnCPTestFailure

  IAMPolicyGetPipelineStatus:
    Type: "AWS::IAM::Policy"
    DependsOn: IAMRoleAlertOnCPTestFailure
    Properties:
      PolicyName: !Sub "Role=AlertOnCPTestFailure,Env=${AccountParameter},Service=CodePipeline,Rights=R"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Action:
              - "codepipeline:GetPipeline"
              - "codepipeline:GetPipelineState"
              - "codepipeline:ListPipelines"
            Resource:
              - "*"
      Roles:
        - !Ref IAMRoleAlertOnCPTestFailure

  IAMRoleAlertOnCPTestFailure:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: !Sub "Role=AlertOnCPTestFailure,Env=${AccountParameter},Service=Lambda"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "lambda.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"

#############################################
### End of pipeline failure alerting lambda function
#############################################

And that's about it. A couple of notes on the CloudFormation template:

Alert Frequency

I'm using a cron expression as my schedule, currently set to fire every half hour during business hours (the hours in the expression are UTC), because we don't have overnight staff who could look at pipeline failures. You can easily up the frequency with something like:

cron(0/5 12-23 ? * MON-FRI *)

Lambda Environment Variables

One of the announcements from re:Invent I was most excited about was AWS Lambda environment variables. It's a pretty magical feature that lets you pass values into your Lambda functions. In this case, I'm using it to pass the ARN of an SNS topic created in the same CloudFormation template into the Lambda function.

Long story short, that means we can create resources in AWS and pass references to them into code without having to search for them or hard-code their values.

      Environment:
        Variables:
          STAGE_NAME: Test
          TOPIC_ARN: !Ref CodePipelineTestStageFailureTopic
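
You may notice the example function above never actually reads STAGE_NAME; it's there so alerting can be restricted to a single stage. A minimal sketch of how that filter could look (stage_matches_filter is a hypothetical helper, not part of the function above):

import os

def stage_matches_filter(stage_name):
    # Hypothetical helper: only alert on the stage named in STAGE_NAME.
    # When the variable is unset, alert on every stage.
    stage_filter = os.environ.get('STAGE_NAME')
    return stage_filter is None or stage_name == stage_filter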

Flowerboxes

The CFT this example comes from contains management functions for multiple pipelines, so the flower boxes ("###############") at the beginning and end of the Lambda function definition are our way of keeping the resources for each Lambda function separated.

SNS Notifications

When you create an SNS topic with an email subscription, the recipient has to confirm it. They'll get an email and have to click the confirmation link before they receive any notifications.
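
If you want to check who has confirmed, you can list the topic's subscriptions with boto3; unconfirmed ones report 'PendingConfirmation' instead of a subscription ARN. The topic ARN below is a placeholder.

import boto3

sns = boto3.client('sns')

# Placeholder ARN; substitute the topic created by the template above
response = sns.list_subscriptions_by_topic(
    TopicArn='arn:aws:sns:us-east-1:123456789012:MvpPipelineFailure')

for sub in response['Subscriptions']:
    # Unconfirmed subscriptions show 'PendingConfirmation' here
    print(sub['Endpoint'], sub['SubscriptionArn'])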

Snippets

These are snippets I pulled out of our pipeline management CloudFormation stack. Obviously you'll have to put them into a CloudFormation template that references the SAM transform and has a valid header like the one below:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  # ...the snippets above go here...
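
One deployment note: because the template creates IAM resources with explicit names (RoleName), CloudFormation will ask for the CAPABILITY_NAMED_IAM acknowledgement. A typical package-and-deploy looks something like this (bucket and stack names are placeholders):

# Upload the function source and rewrite CodeUri to point at S3
aws cloudformation package \
    --template-file template.yaml \
    --s3-bucket my-deploy-bucket \
    --output-template-file packaged.yaml

# Create/update the stack; named IAM resources need this capability
aws cloudformation deploy \
    --template-file packaged.yaml \
    --stack-name pipeline-alerting \
    --capabilities CAPABILITY_NAMED_IAM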

Happy alerting!

Example CloudFormation Template for an OpsWorks Stack

I recently spent some time struggling through the AWS documentation to create an OpsWorks stack using CloudFormation. The documentation is comprehensive for the individual resources, but a little lacking on examples of how to link them together.

I thought I'd share a sanitized example of what I ended up with; it's below.

To implement this in your own environment, you'll need to swap out the account-specific information: the OpsWorks service role, subnets, security groups, and current AMI IDs, as well as the Chef recipe to run on the Setup lifecycle event.

Lastly, this won't be much use unless you have a repository of Chef recipes to run on these instances.

This template creates an OpsWorks stack with a single layer and two Ubuntu 16.04 instances. The instances depend on each other so that they're created at different times and therefore get different hostnames.

The instances are time-controlled, with a schedule that turns them on during Central time business hours.

The full template is in this gist: https://gist.github.com/LenOtuye/9b62d9b8dbf65e33f84477b6a0f6e40d
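
As a side note on the time-based scheduling: the template sets the schedule per instance, but if you want to experiment with schedule shapes outside CloudFormation, boto3 exposes the same setting. A minimal sketch, where the instance ID and UTC hours are placeholders:

import boto3

opsworks = boto3.client('opsworks', region_name='us-east-1')

# Business hours in UTC (roughly 8am-5pm Central), Monday through Friday;
# hour keys are strings and 'on' marks the hours the instance should run
schedule = {
    day: {str(hour): 'on' for hour in range(14, 23)}
    for day in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
}

# Placeholder instance ID; substitute an instance from your stack
opsworks.set_time_based_auto_scaling(
    InstanceId='11111111-2222-3333-4444-555555555555',
    AutoScalingSchedule=schedule)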