AWS Lambda Functions with Modular Powershell

It’s been a while since AWS released support for running Powershell in Lambda Functions. Up until now all of the Lambda functions I’ve worked with have been either python or NodeJS, but we recently had a project that needed to update a database from inside of a deployment pipeline.

We’re trying to retire an old admin box where users would run Powershell management scripts to do things like provision servers, configure customer accounts, etc. Rather than trust a user to remember which script to run, pick the right parameters, check the output, etc, etc, we wanted all of that to happen as part of a CICD pipeline.

I know what you’re thinking: “Why would you drag along the powershell scripts? It’s probably a zillion line single script that’s impossible to debug or manage.” But the codebase is pretty well designed — it’s modular, implements defensive programming, has tons of business logic in it, and most of it has >50% unit test coverage with Pester. So there’s a lot of good stuff there that would take time to rebuild in another language.

So what’s our plan?

Our goals are to have a Lambda function that runs some .NET Core Powershell code. The code needs to have unit tests that can run in a CI environment, and it needs to be deployed from a pipeline.

The .NET Core requirement is interesting. As a windows user I'm still on Powershell 5.1 on my laptop, and I use the old monolithic AWSPowerShell module (AWS Tools for PowerShell). That's not to say I couldn't make the jump, but I'd rather not risk breaking my other tooling for this one project. So we'll set up a docker environment to do our local development. Let's get started!

It’s probably obvious, but I’ll be assuming you’re working on a windows machine.

Setting up a Docker image for the build

The first thing we need is a docker image we can use to run our local builds and deploys in.

My first instinct was to use an image from the Powershell repo, but I had a terrible time getting the right version of .NET Core installed. I was getting errors running the Powershell cmdlets for building the lambda functions, .NET Core install errors, and the list went on.

After struggling for a while I flipped to the mcr.microsoft.com/dotnet/core/sdk image, and it worked much better. After reading several of the tutorials I decided to use the 3.1-bionic tag because:

  • Lambda functions are going to run in a linux environment
  • It’s smaller than running a windows container
  • It worked! 🙂

The first step is to get into the container and see what tools we’ll need. So I ran

docker run -it --rm mcr.microsoft.com/dotnet/core/sdk:3.1-bionic

And of course, because this is a linux container, our interpreter is bash off the bat. Installing Powershell is pretty easy:

dotnet tool install --global PowerShell

That gives us the pwsh program we can use to get into Powershell on the linux container. To see this running try

export PATH="$PATH:/root/.dotnet/tools"
pwsh
gci # prints your current directory
exit # returns you to bash

Next we need to install the zip package.

apt-get update \
&& apt-get install zip -y

This will take several minutes while your container downloads the packages. You can run these as separate commands too if you'd like. Once apt finishes without errors you're all set.

Next we need to install the AWSLambdaPSCore module into the Powershell environment. To do that you'll run

pwsh -command "Install-Module AWSLambdaPSCore -Confirm:\$false -Force; Import-Module AWSLambdaPSCore;"

You can break that apart too if you want to. As long as you don't see an error importing the Powershell module you should be good to go! Let's put all of that work into a Dockerfile

FROM mcr.microsoft.com/dotnet/core/sdk:3.1-bionic
RUN dotnet tool install --global PowerShell \
&& apt-get update \
&& apt-get install zip -y
# python3 python3-pip -y \
# && pip3 install awscli --upgrade
RUN export PATH="$PATH:/root/.dotnet/tools" \
&& export PATH=~/.local/bin:$PATH \
&& pwsh -command "Install-Module AWSLambdaPSCore -Confirm:\$false -force; Import-Module AWSLambdaPSCore;"
ENTRYPOINT [ "pwsh" ]
ENV PATH="~/.local/bin:/root/.dotnet/tools:${PATH}"

You can build this by running this command inside of the directory with your Dockerfile

docker build -t lambda-dotnet .
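If you want to sanity check the image before moving on, start an interactive container. Because the ENTRYPOINT is pwsh you should land straight at a Powershell prompt (a quick, optional check):

docker run -it --rm lambda-dotnet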

Project Structure

Now that we have a working Dockerfile, we need to create our project structure. It will look like this (there's a rough sketch of the full tree after the list):

  • Update-MetaData will hold our powershell handler function
  • CommonModules will hold our testable powershell code and unit tests
  • scripts will hold deployment tools
  • docker-compose.yml, which we'll dig into later
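Putting those pieces together, here's roughly what the tree ends up looking like on disk (the file names match the ones we'll create through the rest of the post):

.
├── CommonModules
│   ├── Common.psd1
│   ├── Common.psm1
│   └── tests
│       └── Common.tests.ps1
├── Update-MetaData
│   └── Set-AppData.ps1
├── scripts
│   └── Deploy-Lambda.ps1
├── Dockerfile
├── docker-compose.yml
└── .env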

Creating the Module and Unit test

Let’s create our common module first. You’ll need to create a new Powershell module manifest with New-ModuleManifest.

New-ModuleManifest CommonModules\Common.psd1 -RootModule Common.psm1 -FunctionsToExport "*" -moduleversion "0.0.1" -Description "Testable module"

This tells powershell to create a new module definition, points to the Common.psm1 powershell module, and exports all functions. You can get a lot fancier with your psd1 files, but those are the basics to get started.

Next we’ll create a simple function in our common module

# Function to insert some data
function Set-AppData {
    Write-Host "working on inserting data";
    return $true;
}
Export-ModuleMember -Function 'Set-AppData';
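With both the manifest and the psm1 in place you can sanity check the pair before writing any tests. This is optional, but it catches typos early. Run it from the project root; Test-ModuleManifest and Import-Module are standard Powershell cmdlets:

# Validate the manifest, import the module, and exercise the function
Test-ModuleManifest CommonModules\Common.psd1
Import-Module ./CommonModules/Common.psd1 -Force
Set-AppData # should print the message and return True
Remove-Module Common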

And add a single unit test for it in a tests\Common.tests.ps1 file

$module = "Common.psm1";
if (Test-Path ".\$module") {
    Import-Module ".\$module";
} elseif (Test-Path "..\$module") {
    Import-Module "..\$module";
}
# Test all of the common functions
Describe "Common Functions tests" {
    InModuleScope Common {
        It "returns true" {
            Set-AppData | Should be $true;
        }
    }
}
# Cleanup the module so we can test new changes
Remove-Module Common;

The boilerplate code at the top lets you execute your tests from either the CommonModules directory or the CommonModules\tests directory. We should be able to invoke this and see our tests pass.
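Pester isn't installed in our .NET Core container by default, and the "Should be" syntax above is Pester 4 style, so grab a 4.x version if you're running inside the container (this is my own setup step, not something the AWS tooling requires). Then Invoke-Pester runs the suite:

Install-Module Pester -MaximumVersion 4.99.99 -Force
cd CommonModules
Invoke-Pester ./tests/Common.tests.ps1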

Powershell Handler

Our Powershell handler is going to be a little hard to unit test. It seems like it has to be a ps1 file, and AWS Lambda is going to run the script from top to bottom. That makes it a little hard to inject unit tests if we have a lot of logic in that handler script.

That's why we're keeping our handler thin (and using the common modules we created above). Let's look at a slim implementation of the Lambda handler

# PowerShell script file to be executed as a AWS Lambda function.
# When executing in Lambda the following variables will be predefined.
# $LambdaInput - A PSObject that contains the Lambda function input data.
# $LambdaContext - An Amazon.Lambda.Core.ILambdaContext object that contains information about the currently running Lambda environment.
# The last item in the PowerShell pipeline will be returned as the result of the Lambda function.
# To include PowerShell modules with your Lambda function, like the AWSPowerShell.NetCore module, add a "#Requires" statement
# indicating the module and version.
#Requires -Modules @{ModuleName='AWSPowerShell.NetCore';ModuleVersion='3.3.335.0'},@{ModuleName='Common';ModuleVersion='0.0.1'}
write-host "Inserting new app data"
Import-Module Common;
# Use the function from CommonModules to insert new app data
Set-AppData;

The end of this script looks like pretty standard Powershell: we import a module and call a function.

The top of the script is comments, which get generated when you call New-AWSPowerShellLambda.

The middle of this script is interesting, though.

#Requires -Modules @{ModuleName='AWSPowerShell.NetCore';ModuleVersion='3.3.335.0'},@{ModuleName='Common';ModuleVersion='0.0.1'}
write-host "Inserting new app data"
Import-Module Common;

The #Requires line tells the AWS tools how to pull together the modules that your function requires. And here's where we hit an interesting part of our build: we need a place for the Lambda tools to find both our Common module and the AWSPowerShell.NetCore module.

Walking through the build

The next piece is the deploy script that will run inside of the container (this is the scripts/Deploy-Lambda.ps1 our docker-compose file will call later). We'll start by creating a new temp directory and registering it as a local Powershell repo.

new-item -itemtype Directory -Path /tmp/ -Name "localpsrepo";
Register-PSRepository -Name LocalPSRepo `
-SourceLocation '/tmp/localpsrepo' `
-ScriptSourceLocation '/tmp/localpsrepo' `
-InstallationPolicy Trusted;
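If you want to confirm the registration took, Get-PSRepository should list the new repo with its installation policy set to Trusted (purely an optional check):

Get-PSRepository -Name LocalPSRepo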

Now we’ll publish our common modules to it, and save the AWSPowerShell.NetCore module to it as well.

publish-module -Name '/lambda/CommonModules/Common.psd1' `
-Repository LocalPSRepo `
-NuGetApiKey 'AnyStringWillDo';
save-package -name "AWSPowerShell.NetCore" `
-Provider NuGet `
-source https://www.powershellgallery.com/api/v2 `
-RequiredVersion 3.3.335.0 `
-Path /tmp/localpsrepo;

That gives our call to Publish-AWSPowerShellLambda a local repo to pull from.

Publish-AWSPowerShellLambda -scriptpath /lambda/Update-MetaData/Set-AppData.ps1 `
-StagingDirectory /tmp/lambda `
-ProfileName sandbox `
-region us-east-1 `
-ModuleRepository LocalPSRepo `
-Name Set-AppData `
-IamRoleArn arn:aws:iam::*********:role/lambda_basic_execution
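One housekeeping note that's my own addition rather than anything the AWS tooling requires: if you re-run this script in the same container, Register-PSRepository will complain that LocalPSRepo already exists. Tearing the temporary repo down at the end of the script keeps re-runs clean:

# Tear down the temporary repo so the script can be re-run in the same container
Unregister-PSRepository -Name LocalPSRepo;
Remove-Item -Path /tmp/localpsrepo -Recurse -Force;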

That's the build. Lastly, let's pull it all together with a docker-compose file.

Docker Compose File

Docker compose is a tool for building and running multiple services at a time in a docker development environment. Among other things it lets you call out volumes to map into your container, which is our main use.

Our goal is to map the powershell script, modules, and our AWS credentials into the container so we can run the build using the .NET Core container.

Your docker-compose file will look like this

version: '3.7'
services:
  lambda-dotnet:
    build: "."
    volumes:
      - .:/lambda
      - "${USER_HOME}/.aws:/root/.aws"
    command: /lambda/scripts/Deploy-Lambda.ps1
    stdin_open: true
    tty: true

NOTE: There are some tricks to sharing files between a windows host and a linux container. There are some tips in my post here.

Lastly you’ll need to create a .env file (reference here) that looks like this

USER_HOME=c:\users\bolson\

And that’s it! You can build and deploy your lambda function with

docker-compose run lambda-dotnet
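Once the deploy finishes you can smoke test the function from anywhere the AWSPowerShell.NetCore (or AWSPowerShell) module is installed. This is just a sketch using the same profile, region, and function name from the publish step:

# Invoke the deployed Lambda and read back the returned payload
$result = Invoke-LMFunction -FunctionName Set-AppData -ProfileName sandbox -Region us-east-1;
[System.IO.StreamReader]::new($result.Payload).ReadToEnd();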

My troubleshooting routine

This post is going to have fewer technical examples and be more about my troubleshooting methodology. The concepts I'm going to describe may seem rudimentary to some, intuitive to others, and eye-opening to a few. I've watched enough junior engineers wrestle with solving vague problems that I felt it was worth documenting my approach.

When you're a junior developer or a computer science student, it's easy to get used to curated problems, that is, bugs or issues that are scoped for you and may come with some guidance toward a solution. We've all gotten assignments like, "Use Dijkstra's algorithm to….", "User Jim Bob is having trouble printing, please call him and have him update his print driver", "When you get this page in the middle of the night follow this troubleshooting document to restart x job."

Don't get me wrong, these problems can still be frustrating and can take a lot of work to resolve. Dijkstra's algorithm is not simple, and walking a user through updating a print driver over the phone is something I still have nightmares about. But those problems give you a starting point like which algorithm to use, looking at a print driver, or following a document. I classify those problems differently than what I think of as "vague issues".

Vague issues are problems that don't come with any guidance, and don't scope themselves for you. Things like, "This application is acting slow", or "we're getting consistent reports of an error we can't seem to reproduce", or "a lot of users at different locations are complaining they can't print". Problems like these don't have any scope for you, they don't have a documented solution, and they aren't solved easily. As a DevOps practitioner I've seen these problems both with my development hat and my Ops hat on. The specific tools you use with each are different, but my general approach is the same. I'll give an example of each.

The steps to my routine are

  1. Grasp the severity of the problem
  2. Pick wide, useful outer bounds for the broken system
  3. Divide the system into components based on your insight points
  4. Assess the insight point closest to the middle of the system and decide if the problem is closer to the symptom, or closer to the back end
  5. Shrink the system based on your decision, picking new components if it makes sense
  6. Repeat until you are confident you’ve found the problem

Let’s dive in!

Grasp the severity of the problem

All troubleshooting and debugging is done on a mix of qualitative and quantitative data. By that I mean every time you solve a problem you are working with a mix of numbers (“The API is generating 50% 500 errors”, “90% of print jobs are never printing”) and feelings (“Users are furious, support is overwhelmed!”, “The app is so much slower than it was earlier today I can hardly use it!”). In most cases qualitative data is much easier to get on the fly.

Because you'll often be working with qualitative data you'll want to take the temperature of the people giving it to you before you dive in. A small problem can be presented as a world ending event if an important customer calls in. Ask questions that push users to pull quantitative data out of the anecdotes they're telling you, like

  • How many users have called in?
  • How long ago did this start?
  • Are the users all from one location or several?

And some questions that help you understand the emotional state of the person answering the questions

  • How serious do you feel this problem is?
  • How frustrated are the users?

This is a soft-skill interaction rather than a technical one, but it will help you understand how much time the person reporting the problem has spent working on it, and how calmly they are relaying what they've learned.

This is also a good time to decide if the problem is a "drop everything now" type of issue, or an "I'll put this on my todo list for next week" type of issue.

Pick wide, useful outer bounds for the broken system

This is your chance to brainstorm everything that could be related to the problem. Here are a few examples I've worked through recently.

A page of our application is reported as being slow

The outer bounds for the system are the user's web browser (they might be using an old or poorly performing machine) all the way back to the disk drives on the database the page connects to.

Users at our corporate office couldn’t get to the internet

The outer bounds are the users themselves (I can't rule out that a user is typing in a wrong web address), all the way to our ISP handoff (I trust that if the ISP had a problem I would've seen an email).

A few general rules for picking your outer bounds when you’re having a problem

  1. If you can’t confidently rule out that a component is contributing to the problem then include it in the system
  2. Check your confidence level on every component. Classify your knowledge into “I thinks” and “I knows”

Divide the system into components based on your insight points

Any technology system is made up of dozens, possibly hundreds of components. You could think about a bug in terms of bits and bytes, but it’s not helpful. Instead divide the system on your insight points, or places you can gather logs or metrics, or inject an active test.

For our slow web page example the components are

  • The load balancer that routes traffic to the web server — I can see load balancer metrics
  • The SSL off loader that decrypts and re-encrypts the traffic for inspection — this is in nginx and we can see logs for timeouts
  • The IPS/IDS device that does layer 7 inspection on the app — I can see if a rule suddenly started going off
  • A downstream API that handles a synchronous call — I can see request completion times in the logs
  • The disk drives the webserver writes logs to — I can see read/write latencies
  • The database the application queries — I can have a DBA pull query plans and check for row locks
  • The disk drives the database uses to store data — I can see read/write latencies

For users at our corporate office not being able to access the internet, the insight points are

  • The DHCP server that hands out IP addresses to clients — I can inject an active test by renewing my laptop’s DHCP address
  • The DNS server that resolves hostnames to IPs for our office — I can do an nslookup for an active test, or look at DNS caches
  • The core switches that route traffic to our ISP — I can look at switch logs or traffic statistics on a port
  • The firewall we use to allow traffic outbound — I can look at firewall CPU or logs
  • Our ISP who handles the traffic outside of our building — I can use a vendor portal to check traffic stats, or call them to find out if there is a regional problem

Assess the insight point closest to the middle of the system and decide if the problem is closer to the symptom, or closer to the back end

The idea here is that you are doing a binary search for the source of the problem. Looking in the middle of the system can tell you which direction to move in.

Keep in mind that the hardest problems to diagnose are the ones without a single root cause: problems with two or three small contributors. While you're troubleshooting keep an open mind, and don't dismiss a contributor just because it isn't the only source of the problem.

For our slow web page example

In this instance I would start on the web server and look at request counts and latencies. If I see a lower number of requests than usual, it's likely the problem is before the traffic hits my app server, and it could be an inbound IPS/IDS, SSL offloader, load balancer, or even the internet connection into my data center.

If I see normal numbers of requests with high latency, I’ll move backwards towards the disks where the logs are stored, or the database itself.

For users at our corporate office not being able to access the internet

I start with a user's machine and see if they are getting a DHCP address. If they are, can they resolve DNS entries? Can they ping their next hop on the way to the internet? If they can, the problem is closer to the internet. If they can't, the problem is closer to the user's laptop, like a DHCP server or a DNS server being down.
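To make that concrete, here are the kinds of commands I might run from a Windows client while walking down that list. They're illustrative only, your network will differ:

ipconfig /renew                         # are we getting a DHCP address?
Resolve-DnsName www.example.com         # can we resolve DNS entries?
Test-NetConnection 8.8.8.8 -TraceRoute  # can we reach our next hop and beyond?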

Shrink the system based on your decision, picking new components if it makes sense

At this point in either scenario you’ve shrunk your problem domain pretty well, which is great! You have fewer things to look at, and you’re probably moving in the right direction.

Important things to remember at this stage are

  • Don't shrink your problem domain too quickly. Use what you've observed to pull in a little closer without skipping components you haven't examined yet
  • Reference what you’ve seen, but watch for your own bias. When you’re troubleshooting, it’s easy to blame something you don’t understand. Always question assumptions about pieces you feel you own
  • Be expedient, but deliberate. When you’re working on a problem and people are hounding you for an ETA on a fix, it’s easy to get agitated. Don’t let urgency rush quality troubleshooting. It can make a problem drag out if you get spooked and move too quickly.

Repeat until you are confident you’ve found the problem

Rinse and repeat these steps! You’re recursively looking for a solution. There is no substitute for steady, consistent, calm troubleshooting. Wild guesses based on not enough information can muddy the waters.

Conclusion

Troubleshooting and debugging are hard. Most engineers, whether software engineers or IT engineers, would rather create something from scratch. But there is a lot of value in being able to take a system you aren't familiar with, break apart its components, and start digging into its core functions and problems.