AWS S3 Lifecycle Policies – Prep for Deep Archive

AWS recently released a new S3 storage class called Deep Archive. It's an archival storage class with a very low cost for data you need to hold onto but don't access very often.

Deep Archive is about half the cost of Glacier at $0.00099 per GB per month, but you sacrifice the option to get your data back in minutes; retrievals take hours instead.

I work for a healthcare company, so we hold onto patient data for years. There are plenty of reasons we might need to retrieve data from years ago, but few of them would need a turnaround of less than several weeks. That makes Deep Archive a great fit for our long-term data retention.

Setting it up is as simple as changing an existing lifecycle transition from Glacier to Deep Archive, or creating a new S3 Lifecycle transition to Deep Archive.
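
Here's a minimal sketch of what a new Deep Archive transition could look like with the AWS PowerShell tools; the bucket name, rule ID, and 180-day threshold are all placeholders:

# Transition objects to Deep Archive after a placeholder 180 days
$transition = New-Object -TypeName Amazon.S3.Model.LifecycleTransition
$transition.Days = 180
$transition.StorageClass = "DEEP_ARCHIVE"

# Placeholder rule ID; the empty prefix predicate applies the rule to the whole bucket
$rule = New-Object -TypeName Amazon.S3.Model.LifecycleRule
$rule.ID = "ArchiveOldData"
$rule.Status = "Enabled"
$rule.Transitions = @($transition)
$filter = New-Object -TypeName Amazon.S3.Model.LifecycleFilter
$filter.LifecycleFilterPredicate = New-Object -TypeName Amazon.S3.Model.LifecyclePrefixPredicate
$rule.Filter = $filter

# Note: this replaces the bucket's existing lifecycle configuration
Write-S3LifecycleConfiguration -BucketName "example-bucket" -Configuration_Rule $rule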

We put together a quick script to find the lifecycle transition rules in our S3 buckets that already move data to Glacier.

$buckets = Get-S3Bucket

# Iterate through buckets in the current account
foreach ($bucket in $buckets) {
    Write-Host -ForegroundColor Green "Bucket: $($bucket.BucketName)"

    # Get the lifecycle configuration for each bucket
    $lifecycle = Get-S3LifecycleConfiguration -BucketName $bucket.BucketName

    # Print a warning if there are no lifecycles for this bucket
    if (!$lifecycle) {
        Write-Host -ForegroundColor Yellow "$($bucket.BucketName) has no lifecycle policies"
    } else {
        # Iterate the transition rules in this lifecycle
        foreach ($rule in $lifecycle.Rules) {
            Write-Host -ForegroundColor Magenta "$($rule.Id) with prefix: $($rule.Filter.LifecycleFilterPredicate.Prefix)"

            # Print a warning if there are no transitions
            if (!($rule.Transitions)) {
                Write-Host -ForegroundColor Yellow "No lifecycle transitions"
            }

            # Iterate the transitions, highlighting Glacier transitions in yellow
            foreach ($transition in $rule.Transitions) {
                if ($transition.StorageClass -eq "GLACIER") {
                    $color = "Yellow"
                } else {
                    $color = "White"
                }
                Write-Host -ForegroundColor $color "After $($transition.Days) days, transition to $($transition.StorageClass)"
            }
        }
    }
}

To run this script you'll need the AWS Tools for PowerShell installed, IAM credentials configured, and a default region set.
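
If you haven't done that before, here's a minimal setup sketch; the access key, secret key, and region are placeholders:

# One-time setup; the keys and region below are placeholders
Install-Module -Name AWSPowerShell
Set-AWSCredential -AccessKey "AKIA..." -SecretKey "..." -StoreAs default
Set-DefaultAWSRegion -Region us-east-1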

When you run the script it will print out your current S3 buckets, the lifecycle rules, and the transitions in each of them, highlighting the transitions to Glacier in yellow.

AWS PowerShell Tools Snippets: S3 Multipart Upload Cleanup

My company does quite a bit with AWS S3. We use it to store static files and images, we push backups to it, we use it to deliver application artifacts, and the list goes on.

When you push a significant amount of data to and from S3, you're bound to experience some network interruptions that could stop an upload. Most of the time S3 clients will recover on their own, but there are some cases where they might struggle.

One such case is when you're pushing a large file using S3 multipart uploads. A failed upload can leave pieces of files sitting in S3 that are not useful for anything, but still take up space and cost you money. We recently worked with AWS support to get a report of how many incomplete uploads we had sitting around, and it was in the double-digit terabytes!
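
If you want to see what's lingering in one of your own buckets, the PowerShell tools expose the ListMultipartUploads API; here's a quick sketch, with the bucket name as a placeholder:

# List in-progress (and possibly abandoned) multipart uploads in one bucket;
# "example-bucket" is a placeholder
Get-S3MultipartUpload -BucketName "example-bucket" |
    Select-Object Key, UploadId, Initiated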

We started looking for a way to clean them up and found that AWS recently added a way to manage these with a bucket lifecycle policy. The details are in the AWS documentation, which includes an example of how to create the policy with the AWS CLI towards the bottom.

We decided to recreate this functionality in PowerShell using the Write-S3LifecycleConfiguration cmdlet to make it a little easier to apply the policy to all of the buckets in our account at once.

It took a little reverse engineering. The Write-S3LifecycleConfiguration cmdlet doesn't have many useful examples. In the end I wound up creating the policy I wanted in the AWS console, and then using Get-S3LifecycleConfiguration to see how AWS represents the policies in its .NET class structure.
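
That inspection step is easy to reproduce; "example-bucket" is a placeholder for whichever bucket you created the console policy on:

# Dump the rules to see how the console-created policy maps to the .NET classes
$config = Get-S3LifecycleConfiguration -BucketName "example-bucket"
$config.Rules | Format-List *
$config.Rules[0].AbortIncompleteMultipartUpload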

It seems to me that there are a lot of classes between you and creating this policy, but that could mean that AWS has future plans to make these policies even more dynamic and useful.

The code I came up with in the end is below. Hope it's helpful!


# Build the cleanup rule: abort incomplete multipart uploads after 7 days
$rule = New-Object -TypeName Amazon.S3.Model.LifecycleRule
$incompleteUploadCleanupDays = New-Object -TypeName Amazon.S3.Model.LifecycleRuleAbortIncompleteMultipartUpload
$incompleteUploadCleanupDays.DaysAfterInitiation = 7
$rule.AbortIncompleteMultipartUpload = $incompleteUploadCleanupDays
$rule.ID = "WholeBucketPolicy"
$rule.Status = "Enabled"

# An empty prefix predicate makes the rule apply to the whole bucket
$prefixPredicate = New-Object -TypeName Amazon.S3.Model.LifecyclePrefixPredicate
$lifecycleFilter = New-Object -TypeName Amazon.S3.Model.LifecycleFilter
$lifecycleFilter.LifecycleFilterPredicate = $prefixPredicate
$rule.Filter = $lifecycleFilter

foreach ($bucket in Get-S3Bucket) {
    Write-Host "Bucket name: $($bucket.BucketName)"

    # Buckets with no lifecycle configuration return nothing, so start
    # from an empty array to avoid appending to a null result
    $existingConfig = Get-S3LifecycleConfiguration -BucketName $bucket.BucketName
    $allRules = @()
    if ($existingConfig) { $allRules = @($existingConfig.Rules) }

    # Skip buckets that already have this policy
    $newPolicyNeeded = $true
    foreach ($existingRule in $allRules) {
        if ($existingRule.ID -eq $rule.ID) {
            Write-Host "Policy $($rule.ID) already exists, skipping bucket"
            $newPolicyNeeded = $false
        }
    }

    if ($newPolicyNeeded) {
        Write-Host "Rule not found, adding"
        # Write back the existing rules plus the new cleanup rule
        $allRules += $rule
        Write-S3LifecycleConfiguration -BucketName $bucket.BucketName -Configuration_Rule $allRules
    }
}
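
Once it's run, you can spot-check a bucket to confirm the rule landed; again, "example-bucket" is a placeholder:

# Confirm the cleanup rule is attached and shows the 7-day setting
(Get-S3LifecycleConfiguration -BucketName "example-bucket").Rules |
    Where-Object { $_.ID -eq "WholeBucketPolicy" } |
    Select-Object ID, Status,
        @{ Name = "AbortAfterDays"; Expression = { $_.AbortIncompleteMultipartUpload.DaysAfterInitiation } }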