Loving Docker as a Windows guy: August 2019

Over the last few years I’ve slowly made the transition from “docker detractor” through “docker accepter” all the way to “docker enthusiast” where I am now. Docker isn’t the only container tool out there, but it’s certainly one of the most popular. It gives a spin on application management and deployment that’s honestly pretty refreshing.

At the same time, I’ve been a windows enthusiast my whole life. I’m not opposed to Ubuntu, RHEL, or OSX, and I’m not a stranger to grep, find, and ls, but I’ll defend the use of Windows Server, “where-object” and “get-childitem” any day.
The Docker experience on windows has evolved a ton in the last few years. If you haven’t downloaded Docker Desktop, I would highly recommend grabbing it and playing around (even just to win a game of “have you tried?” at work).
In this post I’ll walk through some of my surprises (pleasant and unpleasant) with using Docker on windows for the last few years.

Volume mapping is useful, but a little tricky

If you didn't know, Docker lets you mount directories from your host OS into your container with the docker run --volume (or -v) option.
This works between Windows hosts and Linux containers, and Windows hosts and Windows containers, and can be really useful for running Linux utilities against files on your Windows machine, like using a current version of openssl to update certificates.

What’s weird about it?

Anytime you're moving files between Windows and Linux, you're going to encounter some weirdness. At the top of the list:
  • Files created on your Windows machine show up in Linux with "-rwxr-xr-x" permissions. This can be a little confusing if you author a bash script on your laptop, then submit the Dockerfile to a build tool like CircleCI, and realize the script doesn't have execute permissions there
  • The default line endings (CRLF on Windows, LF on Linux) will drive you crazy
  • Support for specifying the host-side path is different between docker and docker-compose (see the sketch after this list)
    • docker run -v wants an absolute path, but you can get the effect of a relative path by expanding "$((pwd).path)" inside the string you pass to -v
    • docker-compose supports relative paths inside of your workspace with the normal "./"
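For illustration, here's roughly what that difference looks like in practice (the keys folder is hypothetical):

# docker run wants an absolute host path, so let PowerShell expand the current directory into the string
docker run -v "$((pwd).Path)/keys:/keys" -it alpine sh

# docker-compose.yml, on the other hand, happily takes a path relative to the compose file:
#   volumes:
#     - ./keys:/keys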

Alpine will make you hate Windows Server Core

Alpine containers are delightfully tiny. Think of them like cheese burger sliders, or mini bagels, or minimuffins (you get the picture). The base alpine container is single digit MB (that’s not an exaggeration). Obviously as you add packages and tools to it, it starts to grow, but that’s a great starting point.
Compare that to Windows Server Core, which starts off at just over 5 GB, and that only grows as you start adding packages and tools. By the time we have a working Windows container it's often in the tens of GB.
Partly because of that size, Windows Server Core containers take a long time for some basic operations, like extracting, starting, and stopping. It can be frustrating compared to Alpine, which can be completely downloaded and up and running in a matter of seconds.

Windows Nano Server is better

Microsoft's Nano Server is a slimmed-down version of Windows Server that starts off at just under 400 MB, but that shrinking comes with caveats: it only runs .NET Core, not the full .NET Framework.
That makes the case for Nano Server a little more obscure. If you're committed to specific Windows features that your .NET Core app can't get on Linux, it makes sense. But most of .NET Core runs on Linux now, so think about migrating to Linux.

Microsoft’s Image Tagging and Versioning Confuses Me

While I was writing this post, I tried 3 or 4 different image tags from Docker Hub before I found the right combination of architecture and OS version that would work on my laptop.
Understanding those failures comes down to understanding what Docker does at its core. It is not a virtualization tool; your containers share your host's kernel, so you obviously need a kernel that's compatible with the image you're trying to run. It can still be frustrating to get the first container running.
I've also spent a lot of time trying to find the right image repo on Docker Hub. Searching terms that seem logical to me, like "windows" or "windows 2019", doesn't always return the images you'd expect. I haven't found a good central listing of "all the windows containers from Microsoft".
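For reference, the base image tags encode the Windows version, and that version has to line up with your host's build (or be older than it, if you use Hyper-V isolation). A couple of tags that show the pattern:

# Server Core from the Windows Server 2019 long-term servicing channel
docker pull mcr.microsoft.com/windows/servercore:ltsc2019

# Nano Server for the 1809 release
docker pull mcr.microsoft.com/windows/nanoserver:1809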

Docker Desktop’s Moby Linux VM is smooth

When you run a Linux container on Windows, Docker Desktop uses Microsoft Hyper-V behind the scenes to create a small Linux VM (the Moby VM) that actually runs your containers.
The first time I found that out, it gave me some heart palpitations. “A container inside of a VM running on my windows laptop? The resource drain will be astronomical! The performance will be subterranean! The compatibility will be atrocious!”
But after a few weeks of using it, I calmed down quite a bit. We've been running Docker in production at work for close to 2 years, and I have yet to see any compatibility issues pulling those containers down to my laptop and running them. Most of the development happens on OSX or actual Linux machines, but the Moby VM works quite well.
This is going to change with the Windows Subsystem for Linux 2, and I think that's the right move. Microsoft is making Linux tooling and development first-class citizens on Windows. But the Linux VM running inside of Hyper-V is stable enough that I won't be an early adopter.

Networking is cool and a little confusing

It's pretty straightforward to expose a port on the container to your host with the -p flag.
And that works really smoothly, especially on Windows containers.
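For example, publishing container port 80 on host port 8080 is a one-liner (the image and ports here are arbitrary):

# Map host port 8080 to container port 80, then browse to http://localhost:8080
docker run -d -p 8080:80 nginx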
On Linux containers, the VM running your container is technically behind a Hyper-V switch, so some Docker networking modes like "host" and "bridge" don't work like you'd expect them to. This is fine if you're running a single container, but when you want to create a group of containers with docker-compose you have to take it into consideration.
My recommendation is to keep things simple and use docker-compose as it was meant to be used. That is, let docker-compose manage the DNS and networking behind the scenes, and expose as few ports from your container services as you can.

Editing text files in windows containers is really hard, apparently?

If you've spent any time in Linux you're probably competent with at least one command line text editor (nano, vi, etc.). But apparently none of those are readily available in most Windows container images. Searching the internet for "editing text files in powershell" leads me to a bunch of articles about using "set-content" or using PowerShell to open Notepad.
Those approaches have their uses, but what if I just want to edit a config file in a Windows container? Apparently my only options are to write a regex or map the volume to my host. Weird.
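For completeness, the Set-Content approach those articles describe looks something like this (the file path and setting are hypothetical). It works, but it's really a regex find-and-replace rather than editing:

# Read the file, swap one setting, and write it back in place
(Get-Content C:\app\web.config) -replace 'Debug="true"', 'Debug="false"' | Set-Content C:\app\web.config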

Using Docker as an Admin tool on Windows

Have you ever tried to download openssl on Windows? You need to convert a cert, or just do some crypto work, so you google "openssl windows" and find the SourceForge entry. After a few minutes of scrolling around confused, you finally accept that the page hasn't had a release in years.

So you go back to Google and click on the link for openssl.org, and realize that they don't distribute any binaries at all (Windows or otherwise).
You scroll a few entries further down, still looking for an executable or a guide to get openssl on Windows, and you click on a promising article heading. It turns out to be a guide for Cygwin (and it would work, but then you have Cygwin sitting on your machine, and you'll probably never use it again). You think to yourself, "There has to be an executable somewhere."
Next you jump to page two of the Google results (personally, it's the first time I've gone to page two in years) and find more of the same: Linux fanatics using Cygwin, source code you could compile yourself, and obscure religious wars like SChannel vs. every other cryptography provider.
All you really want is to go from a .pfx to a .pem, and you're running in circles looking for the most popular crypto tool in the world to do it.
Enter Docker.
At work a number of our services are deployed on Docker, so I already have Docker Desktop installed, and it's usually in Linux container mode on my workstation. It only took a couple of commands to get into openssl in an Alpine container.
Here are my commands for reference:
PS C:\Users\bolson\Documents> docker run -v "$((pwd).path)/keys:/keys" -it alpine
/ # cd keys/
/keys # ls -l | grep corp.pem
-rwxr-xr-x 1 root root 1692 Jul 6 17:09 corp.pem
/keys # apk add openssl
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
(1/1) Installing openssl (1.1.1c-r0)
Executing busybox-1.30.1-r2.trigger
OK: 6 MiB in 15 packages
/keys # openssl
OpenSSL>
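From there, the actual conversion is one more command (assuming a hypothetical corp.pfx sitting in the same mounted keys folder):

/keys # openssl pkcs12 -in corp.pfx -out corp.pem -nodes

The -nodes flag leaves the private key unencrypted in the output, so treat the resulting .pem accordingly.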

I realize there are plenty of other options for getting a Linux shell up and running on Windows. You could grab VirtualBox and an Ubuntu ISO, you could open a Cloud9 environment on AWS and get to an Amazon Linux instance there, you could use the Windows Subsystem for Linux, you could dual boot your Windows laptop with some Linux distro, and the list goes on.

Those approaches are fine and would all work, but they either take time, cost money, or are focused on one specific scenario, and wouldn't have much utility outside of getting you into openssl to convert your cert. If I realize that one of the certs I need to convert is a JKS store instead of a .pfx, I can flip over to a Docker image with the Java keytool installed pretty easily.

Cleanup is easy with a few powershell commands

# docker ps -a and docker images both print a header row, so skip it before parsing out the IDs
$containers = docker ps -a | select-object -skip 1
foreach ($container in $containers) { $id = ($container -split "[ ]+")[0]; docker rm $id }

$images = docker images | select-object -skip 1
foreach ($image in $images) { $id = ($image -split "[ ]+")[2]; docker rmi $id }

And that's why, as a Windows user, I love Docker. You get simple, easy access to Linux environments for utilities, and it's straightforward to map directories on your Windows machine into the Linux containers.

Nowadays you can use the Windows Subsystem for Linux for easy command line SSH access from Windows, but before that went GA on Windows 10 I used Docker for an easy SSH client (I know that plink exists, so this time you can accuse me of forcing Docker to be a solution).

You can create a simple Dockerfile that adds the OpenSSH client to an Alpine container like so

FROM alpine
RUN apk add openssh-client

And then run it with

docker build -t ssh .
docker run -v "$($env:userprofile)/documents/keys:/keys" -it ssh sh

And you're up and running with an SSH client. Simple!
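From there, connecting is one more command. One gotcha: ssh is picky about private key permissions, and you can't really change permissions on files sitting on the Windows-mounted volume, so copy the key into the container first (the key name and host below are made up):

/ # cp /keys/id_rsa /tmp/id_rsa && chmod 600 /tmp/id_rsa
/ # ssh -i /tmp/id_rsa ec2-user@10.0.0.5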

Again, there are other ways of accomplishing all of these tasks. But if your organization is investing in Docker, using it for a few simple management tasks can give you some familiarity with the mechanics and make it easier for you to support it on all kinds of development platforms.

Using Unit Tests for Communication

Expressing software requirements is hard. Software is abstract and intangible; you can't touch it, see it, or feel it, so talking about it can become very difficult.

There are plenty of tools and techniques that people use to make communicating about software easier like UML diagrams or software state diagrams. Today I’d like to talk about a software development technique my team has been using for communication: Test Driven Development.
Test Driven Development (TDD) is a very involved software development approach, and I won't go into it in depth in this post, but here's a quick summary:
  1. Write tests before code, expecting them to fail
  2. Build out code that adds the behavior the tests look for
  3. Rerun the tests, expecting them to pass
On my team we’ve started trying to use unit tests as a communication tool. Here’s an example.
At my company we use the concept of a unique customer Site ID that consists of the two-letter code for the state the customer's headquarters is in, and a sequential 3 digit number. At least, most of the time. The Site ID concept grew organically, and like any standard that starts organically it has exceptions. Here are a few that I'm aware of:
  1. One very large customer uses an abbreviation of their company name instead of a state code
  2. Most systems pad with zeros, some do not (e.g. TX001 in some systems is TX1 in others)
  3. Most systems use a 3 digit number, some use four (e.g. TX001 vs TX0001)
After we added "site id converters" to a few modules in our config management code, we decided it was time to centralize that functionality and build a single "site id converter" function. When I was writing the card, it was clear there was enough variation that it was going to take a fair amount of verbiage to spell out what I wanted. Let's give it a try for fun.

Please build a function that takes a site ID (the site ID can be a two-letter state code or a three-letter customer code, followed by 1, 3, or 4 digits). The function should also take a "target domain", which is where the site ID will be used (for example "citrix" or "bi"). The function should convert the site ID into the right format for the target domain. For example, if TX0001 and "citrix" are passed in, the function should convert it to "TX001". If TX001 and "bi" are passed in, the function should convert it to "TX0001".

It’s not terrible, but it gets more complex when you start unpacking it and notice the details I may have forgotten to add. What if the function gets passed an invalid domain? What should the signature look like? What module should the function go into?

And it gets clarified when we add a simple unit test with lots of details that would feel a little awkward to put in the card.

Describe "Get SiteID by domain tests" {
InModuleScope CommonFunctions {
It "converts long id to short for citrix" {
Get-SiteIdByDomain -siteid "TX0001" -domain "citrix" | Should be "TX001"
}
}

I'm not suggesting you stop writing cards or user stories; rather, the right answer usually seems to be a little of both. Some verbiage to tell the background, give some motivation for the card, and open a dialog with the engineer doing the work. Some unit tests to give more explicit requirements on the interface to the function.

This approach also leaves the engineer free to pick their own implementation details. They could use a few nested “if” statements, they could use powershell script blocks, or anything else they can think of (that will pass a peer review). As long as it meets the requirements of the unit test the specifics are wide open.
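For illustration, here's one hypothetical implementation that would satisfy the test above. The function name and the domains come from the test; the regex, the padding rules, and the error handling are just one way it could go:

function Get-SiteIdByDomain([string]$siteid, [string]$domain) {
    # Split the id into its alphabetic prefix and numeric suffix
    if ($siteid -notmatch '^(?<prefix>[A-Za-z]+)(?<number>\d+)$') {
        throw "Invalid site id: $siteid";
    }
    $prefix = $Matches['prefix'].ToUpper();
    $number = [int]$Matches['number'];

    # Each target domain expects a different amount of zero padding
    switch ($domain.ToLower()) {
        "citrix" { return "{0}{1:d3}" -f $prefix, $number }
        "bi"     { return "{0}{1:d4}" -f $prefix, $number }
        default  { throw "Unknown target domain: $domain" }
    }
}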

A note about TDD

Please note, I'm not a TDD fanatic. TDD is a really useful tool that drives you to write testable code, keeps your units small, keeps your functions reasonably sized, and lets you refactor with confidence. As long as your code still passes your unit test suite, you can make any changes you want.
But it's not a magic hammer. In my experience it's not a good fit to start with TDD when:
  • You're using an unfamiliar SDK or framework; TDD will slow down your exploration
  • You have a small script that is truly one-time use; TDD isn't worth the overhead
  • You have a team that is unfamiliar with TDD; introducing it can be good, but using it as a hard and fast rule will demoralize and delay

Docker Windows container for Pester Tests

I recently wrote an intro to unit testing your PowerShell modules with Pester, and I wanted to give a walkthrough of our method for running those unit tests inside of a Windows container with Docker Desktop.

Before we get started, I'd like to acknowledge that this post is obviously filled with trendy buzzwords (CICD, Docker, config management, *Game of Thrones, docker-compose, you get the picture). All of the components we're going to talk through today add concrete value to our business, and we didn't do any resume-driven development.

Why?

Here’s a quick run through of our motivation for each of the pieces I’ll cover in this post.
  1. Docker image for running unit tests 
    1. Gives engineers a consistent way to run the unit tests. On your workstation you might need different versions of SDKs and tools, but a Docker container lets you pin versions of things like the AWS PowerShell tools
    2. Makes all pathing consistent: you can set up your laptop any way you like, but the paths inside of the container are consistent
  2. Docker-compose
    1. Provides a way to customize unit test runs to a project
    2. Provides a consistent way for engineers to map drives into the container
  3. Code coverage metrics
    1. At my company we don’t put too much stock in code coverage metrics, but they offer some context for how thorough an engineer has been with unit tests
    2. We keep a loose goal of 60%
  4. Unit test passing count
    1. Code with a failed unit test does not go to production; a failed unit test has a high chance of causing a production outage

How!

The first step is to set up Docker Desktop for Windows. The biggest struggle I've seen people have getting Docker running on Windows is getting virtualization enabled, so pay extra attention to that step.
Once you have Docker installed you'll need to create an image you can use to run your unit tests, a script to execute them, and a docker-compose file. The whole structure will look like this:

  • /
    • docker-compose.yml
    • /pestertester
      • Dockerfile
      • Run-AllUnitTests.ps1

We call our image “pestertester” (I’m more proud of that name than I should be).

There are two files inside of the pestertester folder: a Dockerfile that defines the image, and a script called Run-AllUnitTests.ps1.
Here's a simple example of the Dockerfile. For more detail on how to write a Dockerfile you should explore the Dockerfile reference.

FROM mcr.microsoft.com/windows/servercore
RUN "powershell Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force"
RUN "powershell Install-Module -Scope CurrentUser -Name AWSPowerShell -Force;"
COPY ./Run-AllUnitTests.ps1 c:/scripts/Run-AllUnitTests.ps1

All we need for these unit tests is the AWS Powershell Tools, and we install NuGet so we can use powershell’s Install-Module.

We played around with several different docker images before we picked mcr.microsoft.com/windows/servercore.

  1. We moved away from any of the .NET containers because we didn’t need the dependencies they added, and they were very large
  2. We moved away from nano server images because some of our powershell modules call functions outside of .NET core
Next we have the script Run-AllUnitTests.ps1. The main requirement for this script to work is that each module sit in its own folder, with its tests in a "tests" subfolder next to it:
  • /ConfigModule
    • ConfigModule.psm1
    • /tests
      • ConfigModule.tests.ps1
  • /ConfigModule2
    • ConfigModule2.psm1
    • /tests
      • ConfigModule2.tests.ps1
The script isn’t too complicated
$results = @();

# Find every "tests" directory (skipping our DSC resources) and run the Pester tests inside it
gci -recurse -include tests -directory | ? {$_.FullName -notlike "*dsc*"} | % {
    set-location $_.FullName;
    $tests = gci;
    foreach ($test in $tests) {
        # The module under test sits next to the tests folder and shares the test file's base name
        $module = $test.Name.Replace("tests.ps1","psm1")
        $result = invoke-pester ".\$test" -CodeCoverage "..\$module" -passthru -quiet;
        $results += @{
            Module = $module;
            Total = $result.TotalCount;
            passed = $result.PassedCount;
            failed = $result.FailedCount
            codecoverage = [math]::round(($result.CodeCoverage.NumberOfCommandsExecuted / $result.CodeCoverage.NumberOfCommandsAnalyzed) * 100,2)
        }
    }
}

# Print a summary for each module, color-coding failures and coverage
foreach ($result in $results) {
    write-host -foregroundcolor Magenta "module: $($result['Module'])";
    write-host "Total tests: $($result['total'])";
    write-host -ForegroundColor Green "Passed tests: $($result['passed'])";
    if($result['failed'] -gt 0) {
        $color = "Red";
    } else {
        $color = "Green";
    }
    write-host -foregroundcolor $color "Failed tests: $($result['failed'])";
    if($result['codecoverage'] -gt 60) {
        $color = "Green";
    } elseif($result['codecoverage'] -gt 30) {
        $color = "Yellow";
    } else {
        $color = "Red";
    }
    write-host -ForegroundColor $color "CodeCoverage: $($result['codecoverage'])";
}

The script iterates through any subdirectories named “tests”, and executes the unit tests it finds there, running code coverage metrics for each module.

The last piece to tie all of this together is a docker-compose file. The docker compose file handles

  1. Mapping the windows drives into the container
  2. Executing the script that runs the unit tests
The docker-compose file is pretty straightforward too
version: '3.7'

services:
  pestertester:
    build: ./pestertester
    volumes:
      - c:\users\bolson\documents\github\dt-infra-citrix-management\ssm:c:\ssm
    stdin_open: true
    tty: true
    command: powershell "cd ssm;C:\scripts\Run-AllUnitTests.ps1"

Once you’ve got all of this setup, you can run your unit tests with

docker-compose run pestertester

Once the container starts up you'll see your test results.

Experience

We've been running Linux containers in production for a couple of years now, but we're just starting to pilot Windows containers. Even the documentation positions this tooling as a development platform rather than something production-ready:

Docker is a full development platform for creating containerized apps, and Docker Desktop for Windows is the best way to get started with Docker on Windows.

Running our unit tests inside of windows containers has been a good way to get some experience with them without risking production impact.

A couple final thoughts

Windows containers are large; even Server Core and Nano Server are gigabytes.

The container we landed on is 11 GB.

If you need to run Windows containers, and you can't stick to .NET Core and get onto Nano Server, you're going to be stuck with pretty large images.

Start-up times for Windows containers will be a few minutes

Especially the first time on a machine, while resources are getting loaded.

Versatile Pattern

This pattern of unit testing inside of a container is pretty versatile. You can use it with any unit testing framework, and any operating system you can run inside a container.

*no actual game of thrones references will be in this blog post

AWS S3 Lifecycle Policies – Prep for Deep Archive

AWS recently released a new S3 storage class called Deep Archive. It’s an archival data service with pretty low cost for data you need to hold onto, but don’t access very often.

Deep Archive is about half the cost of Glacier at $0.00099 per GB per month (roughly a dollar per terabyte), but you sacrifice the option to get your data back in minutes; your only retrieval options take hours.

I work for a health care company, so we hold onto patient data for years. There are plenty of reasons we might need to retrieve data from years ago, but few of them would have a time limit of less than several weeks. That makes Deep Archive a great fit for our long-term data retention.

Setting it up is as simple as changing an existing lifecycle transition to point at Deep Archive instead of Glacier, or creating a new S3 lifecycle transition to Deep Archive.
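For reference, here's roughly what a new Deep Archive rule looks like with the .NET classes the AWS PowerShell tools expose (the bucket name, prefix, and 90-day threshold are made up, and this assumes an AWSPowerShell module new enough to know about the DEEP_ARCHIVE storage class). Keep in mind that Write-S3LifecycleConfiguration replaces the bucket's entire lifecycle configuration, so merge in any existing rules first:

# Transition objects under the "archive/" prefix to Deep Archive after 90 days
$transition = new-object -typename Amazon.S3.Model.LifecycleTransition;
$transition.Days = 90;
$transition.StorageClass = [Amazon.S3.S3StorageClass]::DeepArchive;

$predicate = new-object -typename Amazon.S3.Model.LifecyclePrefixPredicate;
$predicate.Prefix = "archive/";
$filter = new-object -typename Amazon.S3.Model.LifecycleFilter;
$filter.LifecycleFilterPredicate = $predicate;

$rule = new-object -typename Amazon.S3.Model.LifecycleRule;
$rule.ID = "ArchivePrefixToDeepArchive";
$rule.Status = "Enabled";
$rule.Filter = $filter;
$rule.Transitions = @($transition);

Write-S3LifecycleConfiguration -BucketName "my-example-bucket" -Configuration_Rule $rule;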

We put together a quick script to find the lifecycle transition rules in our S3 buckets that already move data to Glacier:

$buckets = get-s3bucket;

# Iterate through buckets in the current account
foreach ($bucket in $buckets) {
    write-host -foregroundcolor Green "Bucket: $($bucket.BucketName)";

    # Get the lifecycle configuration for each bucket
    $lifecycle = Get-S3LifecycleConfiguration -BucketName $bucket.BucketName;

    # Print a warning if there are no lifecycles for this bucket
    if(!$lifecycle) {
        write-host -foregroundcolor Yellow "$($bucket.BucketName) has no life cycle policies";
    } else {
        # Iterate the transition rules in this lifecycle
        foreach ($rule in $lifecycle.Rules) {
            write-host -foregroundcolor Magenta "$($rule.Id) with prefix: $($rule.Filter.LifecycleFilterPredicate.Prefix)";
            # Print a warning if there are no transitions
            if(!($rule.Transitions)) {
                write-host -foregroundcolor Yellow "No lifecycle transitions";
            }

            # Iterate the transitions and print the rules
            foreach ($transition in $rule.Transitions) {
                if($transition.StorageClass -eq "GLACIER") {
                    $color = "Yellow";
                } else {
                    $color = "White";
                }
                write-host -foregroundcolor $color "After $($transition.Days) days transition to $($transition.StorageClass)";
            }
        }
    }
}

To run this script you'll need the AWS PowerShell tools installed, IAM credentials set up, and a default region initialized.
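If you haven't done that before, the setup is roughly this (the keys and region below are placeholders):

# Install the AWS PowerShell tools, store a credential profile, and pick a default region
Install-Module -Name AWSPowerShell -Scope CurrentUser
Set-AWSCredential -AccessKey "AKIA_EXAMPLE" -SecretKey "EXAMPLE_SECRET_KEY" -StoreAs default
Initialize-AWSDefaultConfiguration -ProfileName default -Region us-east-1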

When you run the script it will print out your current S3 buckets, the lifecycle rules, and the transitions in each of them, highlighting the transitions to Glacier in yellow.

Unit Testing PowerShell Modules with Pester

Pester is a unit testing framework for PowerShell. There are some good tutorials for it on its GitHub page and a few other places, but I'd like to pull together some of the key motivating use cases I've found and a couple of the gotchas.

Let’s start with a very simple example.
This is the contents of a simple utility module named Util.psm1
function Get-Sum([int]$number1, [int]$number2) {
    $result = $number1 + $number2;
    write-host "Result is: $($result)";
    return $result;
}

And this is the content of a simple unit test file named UtilTest.ps1

Import-Module .\Util.psm1
Describe "Util Function Tests" {
    It "Get-Sum Adds two numbers" {
        Get-Sum 2 2 | Should be 4;
    }
}

We can run these tests using “Invoke-Pester .\UtilTest.ps1”.

And already there’s a gotcha here that wasn’t obvious to me from the examples online. Let’s say I change my function to say “Sum is:” instead of “Result is” and save the file. When I re-run my pester tests I still see “Result is:” printed out.

What's also interesting is that the second run took 122 ms, while the first took 407 ms.

It turns out both of these changes are results of the same fact: once the module you are testing is loaded into memory, it will stay there until you remove it. That means any changes you make while trying to fix your unit tests won't take effect until you've refreshed the module. The fix is simple:

Import-Module .\Util.psm1
Describe "Util Function Tests" {
    It "Get-Sum Adds two numbers" {
        Get-Sum 2 2 | Should be 4;
    }
}
Remove-Module Util;

Removing the module after running your tests makes powershell pull a fresh copy into memory so you can see the changes.

The next gotcha is the Mock keyword. Let's say I want to hide the write-host output from my function so it doesn't clutter up my unit test results. The obvious way is to use the "Mock" keyword to create a new version of write-host that doesn't actually write anything. My first attempt looked like this:

Import-Module .\Util.psm1
Describe "Util Function Tests" {
    It "Get-Sum Adds two numbers" {
        Mock write-host;
        Get-Sum 2 2 | Should be 4;
    }
}
Remove-Module Util;

But I still see the write-host output in my unit test results.

It turns out the reason is that Mock creates its mocks in the current scope, instead of in the scope of the module being tested. There are two ways of fixing this: wrapping your tests in InModuleScope, or using the -ModuleName parameter on Mock. Here's an example of the first option:

Import-Module .\Util.psm1

InModuleScope Util {
    Describe "Util Function Tests" {
        It "Get-Sum Adds two numbers" {
            Mock write-host;
            Get-Sum 2 2 | Should be 4;
        }
    }
}
Remove-Module Util;

And just like that the output goes away!
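For completeness, the second option looks roughly like this: the same test, but the mock is pushed into the module's scope with the -ModuleName parameter instead of wrapping everything in InModuleScope:

Import-Module .\Util.psm1
Describe "Util Function Tests" {
    It "Get-Sum Adds two numbers" {
        Mock write-host -ModuleName Util;
        Get-Sum 2 2 | Should be 4;
    }
}
Remove-Module Util;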

Graph Your RI Commitment Over Time (subtitle: HOW LONG AM I PAYING FOR THIS?!?!?)

In my last post I talked about distributing your committed RI spend over time. The goal is to avoid buying too many 1-year RIs (front-loading your spend) and missing out on the savings of committing to 3 years, while also not buying too many 3-year RIs (back-loading your spend) and risking a bill you still have to foot if your organization goes through major changes.

Our solution for balancing this is a powershell snippet that graphs our RI commitment over time.

# Get RI entries from AWS console
$ri_entries = Get-EC2ReservedInstance -filter @(@{Name="state";Value="active"});

# Array to hold the relevant RI data
$ri_data = @();

# Calculate monthly cost for RIs
foreach ($ri_entry in $ri_entries) {
    $ri = @{};
    $hourly = $ri_entry.RecurringCharges.Amount;
    $monthly = $hourly * 24 * 30 * $ri_entry.InstanceCount;
    $ri.monthly = $monthly;
    $ri.End = $ri_entry.End;
    $ri_data += $ri;
}

# Three years into the future (maximum duration of RIs as of 1.22.2019)
$three_years_out = (get-date).addyears(3);

# Our current date iterator
$current = (get-date);

# Array to hold the commit by month
$monthly_commit = @();

# CSV file name to save output
$csv_name = "ri_commitment-$((get-date).tostring('ddMMyyyy')).csv";

# Remove the CSV if it already exists
if(test-path $csv_name) {
    remove-item -force $csv_name;
}

# Insert CSV headers
"date,commitment" | out-file $csv_name -append -encoding ascii;

# Iterate from today to three years in the future
while($current -lt $three_years_out) {

    # Find the sum of the RIs that are active on this date
    # all RI data -> RIs that have expirations after current -> select the monthly measure -> get the sum -> select the sum
    $commit = ($ri_data | ? {$_.End -gt $current} | % {$_.monthly} | measure -sum).sum;

    # Build a row of the CSV
    $output = "$($current),$($commit)";

    # Print the output to standard out for quick review
    write-host $output;

    # Write out to the CSV for deeper analysis
    $output | out-file $csv_name -append -encoding ascii;

    # Increment to the next month and repeat
    $current = $current.addmonths(1);
}

Ok, "snippet" is probably not the right word. It's a little lengthy, but at the end it kicks out a CSV in your working directory with each month and your RI commitment for it.

From there it’s easy to create a graph that shows your RI spend commit over time.

That gives you an idea of how much spend you’ve committed to, and for how long.

AWS Powershell Tools: Get Specific Tags

A quick AWS PowerShell tools snippet post here. When you call Get-EC2Instance from the AWS PowerShell tools, it returns an instance object that has a Tags attribute, which is a PowerShell list of EC2 Tag objects.
I'm usually a fan of how the AWS PowerShell tools object models are set up, but this is one case where I feel there could be some improvement. Instead of using a list and forcing users to iterate it to find the right tag, the EC2 object's "Tags" property should be a hashtable with the tag Key as the hash key, so you can index directly to the value. But this is what we have to work with for now.
So we came up with a simple function to generate a map of desired EC2 tags from an instance.
function Get-Ec2InstanceTag($instance, [array]$desiredTagKeys) {
    $instanceTags = $instance.Tags;
    $tagMap = @{};
    foreach ($desiredTagKey in $desiredTagKeys) {
        foreach ($instanceTag in $instanceTags) {
            if($desiredTagKey -eq $instanceTag.Key) {
                $tagMap[$desiredTagKey] = $instanceTag.Value;
            }
        }
    }
    return $tagMap;
}

Usage for this function looks like this:
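(The instance below comes from Get-EC2Instance, and the tag keys "Name" and "Environment" are just examples.)

# Grab the first instance in the first reservation and pull out just the tags we care about
$instance = (Get-EC2Instance)[0].Instances[0];
$tags = Get-Ec2InstanceTag -instance $instance -desiredTagKeys @("Name", "Environment");
$tags["Name"];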

AWS Powershell Tools Snippets: S3 Multipart Upload Cleanup

My company does quite a bit with AWS S3. We use it to store static files and images, we push backups to it, we use it to deliver application artifacts, and the list goes on.

When you push a significant amount of data to and from S3, you're bound to experience some network interruptions that could stop an upload. Most of the time S3 clients will recover on their own, but there are some cases where they might struggle.

One such case is when you are pushing a large file using S3 multipart uploads. An interrupted multipart upload can leave pieces of files sitting in S3 that are not useful for anything, but are still taking up space and costing you money. We recently worked with AWS support to get a report of how many incomplete uploads we had sitting around, and it was in the double-digit terabytes!

We started looking for a way to clean them up and found that AWS recently created a way to manage these with a bucket lifecycle policy. Some details are in a doc here, and there's an example of how to create this policy with the AWS CLI towards the bottom.

We decided to recreate this functionality in PowerShell using the Write-S3LifecycleConfiguration cmdlet, to make it a little easier to apply the policy to all of the buckets in our account at once.

It took a little reverse engineering. The Write-S3LifecycleConfiguration cmdlet doesn't have many useful examples out there. In the end I wound up creating the policy I wanted in the AWS console, and then using Get-S3LifecycleConfiguration to see how AWS represents the policies in their .NET class structure.

It seems to me that there are a lot of classes between you and creating this policy, but that could mean that AWS has future plans to make these policies even more dynamic and useful.

The code I came up with at the end is below. Hope it’s helpful!


$rule = new-object -typename Amazon.S3.Model.LifecycleRule;
$incompleteUploadCleanupDays = new-object -typename Amazon.S3.Model.LifecycleRuleAbortIncompleteMultipartUpload
$incompleteUploadCleanupDays.DaysAfterInitiation = 7
$rule.AbortIncompleteMultipartUpload = $incompleteUploadCleanupDays
$rule.ID = "WholeBucketPolicy"
$rule.status = "Enabled"

$prefixPredicate = new-object -type Amazon.S3.Model.LifecyclePrefixPredicate

$lifecycleFilter = new-object -type Amazon.S3.Model.LifecycleFilter

$lifecycleFilter.LifecycleFilterPredicate = $prefixPredicate

$rule.Filter = $lifecycleFilter

foreach ($bucket in get-s3bucket) {
    write-host "Bucket name: $($bucket.bucketname)"

    # Pull whatever lifecycle rules are already on the bucket (this can come back empty)
    $existingRules = get-s3lifecycleconfiguration -bucketname $bucket.bucketname
    $newPolicyNeeded = $true;
    foreach ($existingRule in $existingRules.rules) {
        if($existingRule.ID -eq $rule.ID) {
            write-host "Policy $($rule.ID) already exists, skipping bucket"
            $newPolicyNeeded = $false;
        }
    }
    if($newPolicyNeeded) {
        write-host "Rule not found, adding"

        # Write-S3LifecycleConfiguration replaces the whole configuration,
        # so keep any existing rules and append ours
        $updatedRules = @();
        if($existingRules) { $updatedRules += $existingRules.rules }
        $updatedRules += $rule;

        Write-S3LifecycleConfiguration -bucketname $bucket.bucketname -configuration_rule $updatedRules
    }
}
}