Diving Into (and Reducing) Your AWS Costs!

AWS uses a “pay as you go” model for most of its services. You can start using them at any time, you often get a runway of free usage to get up to speed on a service, and then you’re charged for what you use. No contract negotiations, no figuring out bulk discounts, and no provisioning for max capacity.

This model is a double-edged sword. It’s great when you’re:

  • First getting started
  • Working with a predictable workload
  • Working with a modern technology stack (i.e. most of your resources are stateless and can be ephemeral)

But it has some challenges when:
  • Your workload is unpredictable
  • Your stack is not stateless (i.e. you have to provision for max capacity)
  • Your environment is complex with a lot of services being used by different teams

It’s easy to have your AWS costs run away from you, and you can suddenly find yourself paying much more than you need or want to. We recently found ourselves in that scenario. Obviously I can’t show you our actual account costs, but I’ll walk you through the process we used to start digging into (and reducing) our costs, using one of my personal accounts.

Step 1: AWS Cost Explorer

Cost Explorer is your first stop for understanding your AWS bill. Navigate to your AWS Billing Dashboard, then launch Cost Explorer. If you haven’t been in Cost Explorer before, it doesn’t hurt to look at some of the alerts on the home page, but the really interesting data is in Costs and Usage.
My preference is to switch to the “stack” view.
I find this helps you see your costs in context. If you’re looking to cut costs, the obvious place to start is the service that takes up the largest section of the bar. For this account it’s ElastiCache.
ElastiCache is pretty straightforward to cut costs for (you either reduce the number of nodes or the node size), so let’s pick a more interesting service like S3.
Once you’ve picked a service to try to cut costs for, add a service filter on the right-hand side and group by usage type.
Right away we can see that most of our costs are TimedStorage-ByteHrs, which translates to S3 Standard storage, so we’ll focus our cost savings on that storage class.
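If you’d rather pull that same breakdown from a script, here’s a minimal boto3 sketch against the Cost Explorer API; the date range is a placeholder, and it assumes UnblendedCost is the metric you care about.

import boto3

ce = boto3.client("ce")

# Monthly S3 spend broken down by usage type (e.g. TimedStorage-ByteHrs)
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2019-01-01", "End": "2019-02-01"},  # placeholder date range
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Simple Storage Service"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(usage_type, round(cost, 2))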

Next we’ll go to CloudWatch to see where our storage in this class lives. Open up CloudWatch, open up Metrics, and select S3.


Inside of S3, click on Storage Metrics, search for “StandardStorage”, and select all buckets.


Then change your time window to something pretty long (say, 6 months) and your view type to Number.

This will give you a view of specific buckets and how much they’re storing. You can quickly skim through to find the buckets storing the most data.
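If you have too many buckets to skim comfortably, the same numbers can be pulled programmatically. A rough boto3 sketch (the six-month window mirrors the console view above):

import boto3
from datetime import datetime, timedelta

s3 = boto3.client("s3")
cloudwatch = boto3.client("cloudwatch")

end = datetime.utcnow()
start = end - timedelta(days=180)  # roughly six months

# Report the latest StandardStorage size for every bucket in the account
for bucket in s3.list_buckets()["Buckets"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": bucket["Name"]},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        StartTime=start,
        EndTime=end,
        Period=86400,  # daily datapoints
        Statistics=["Average"],
    )
    if stats["Datapoints"]:
        latest = max(stats["Datapoints"], key=lambda point: point["Timestamp"])
        print(bucket["Name"], round(latest["Average"] / 1024 ** 3, 1), "GiB")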

Once you have your largest storage points, you can clean them up or apply S3 lifecycle policies to transition them to cheaper storage classes.
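As an example, a lifecycle rule like the sketch below would tier Standard storage down over time; the bucket name, prefix, day counts, and target storage classes are all placeholders you’d tune to your own access patterns.

import boto3

s3 = boto3.client("s3")

# Placeholder bucket name and thresholds -- adjust to how the data is actually accessed
s3.put_bucket_lifecycle_configuration(
    Bucket="my-largest-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-standard-storage",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)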

After you’re done with your largest cost areas, you rinse and repeat on other services.
This is a good exercise to do regularly. Even if you have good discipline around cleaning up old AWS resources, costs can still crop up.
Happy cost savings!

VPC Flow Logs through CloudWatch Logs Insights

You know all those times you think to yourself, “I wish there were a faster way to search all these logs I keep putting in CloudWatch”?

Well, apparently Alexa was reading your mind at one of those times, because at AWS re:Invent 2018 Amazon released CloudWatch Logs Insights. It’s advertised as a more interactive, helpful log analytics solution than the log search option we already have.
This last week I got some questions about blocked traffic in our AWS account, which seemed like the perfect opportunity to give it a shot. (NOTE: You will need to be sending your VPC flow logs to CloudWatch for this to work for you.)
Here are some of the queries I tried out, most of them based loosely on the examples page.

Count of Blocked Traffic by Source IP and Destination Port

fields @timestamp, dstPort, srcAddr
| stats count(*) as countSrcAddr by srcAddr, dstPort
| sort countSrcAddr desc

This query gives us the top blocked senders by destination service.

Using this I pretty quickly found an ELB with the wrong instance port.

This worked for me because we separate our accept and reject flow logs. If you keep them together you can add a filter as the first line:

filter action = 'REJECT'
| fields @timestamp, dstPort, srcAddr
| stats count(*) as countSrcAddr by srcAddr, dstPort
| sort countSrcAddr desc
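
Everything here ran in the Logs Insights console, but the same queries can be run from a script through the CloudWatch Logs API. A minimal boto3 sketch, assuming a hypothetical log group name for the reject flow logs:

import time
import boto3

logs = boto3.client("logs")

query = """
fields @timestamp, dstPort, srcAddr
| stats count(*) as countSrcAddr by srcAddr, dstPort
| sort countSrcAddr desc
"""

end = int(time.time())
start = end - 24 * 3600  # last 24 hours

# "/vpc/flowlogs/reject" is a placeholder -- use your own flow log group name
started = logs.start_query(
    logGroupName="/vpc/flowlogs/reject",
    startTime=start,
    endTime=end,
    queryString=query,
)

# Poll until the query finishes, then print each result row as a dict
while True:
    results = logs.get_query_results(queryId=started["queryId"])
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in results["results"]:
    print({field["field"]: field["value"] for field in row})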

Blocked Destination Addresses and Ports for a Specific Source

During our troubleshooting we noticed a specific address sending us a lot of traffic on port 8088, which is typically used as a vendor-specific or alternate HTTP port. This was a little odd because we don’t use 8088 externally or internally.
We dug in using this query:
filter srcAddr = 'x.x.x.x'
| fields @timestamp, dstPort, srcAddr, dstAddr

Where x.x.x.x is the source IP address.

I’m not going to post the screenshot because it shows a lot of our destination addresses and it would take time to block them out, but you get the idea.

We did a lookup on the address and it was owned by DigitalOcean, a cloud hosting company. It’s likely someone was running a scan from a server in their environment; it’s hard to say whether the intentions were good or bad.

To satisfy my curiosity I wanted to ask the question, “When did the scan begin and how aggressive is it?”
So I added a “stats” command to group the sum of the packets into 30-minute bins.

filter srcAddr = 'x.x.x.x' and dstPort = '8088'
| fields @timestamp, dstPort, srcAddr, dstAddr
| stats sum(packets) by bin(30 min)

When you use the stats command with a time series you can get a “visualization” like the one below:

It looks like the scan lasted about 6 hours, from 4 am or so my time to around 9:45 my time.

Conclusion

CloudWatch Logs Insights is a much faster way to analyze your logs than the existing CloudWatch search. The query language is pretty flexible and reasonably intuitive (though I did spend several minutes scratching my head over the syntax before I found a helpful example).
While it’s an improvement over what was there, it’s not on par with a log analytics tool like Splunk or a data visualization tool like Kibana. The visualizations page for Insights only works with time series data (as far as I can tell) and isn’t very intuitive for combining multiple query results. For that you still have to import the query into a CloudWatch dashboard.
Amazon’s edge over those more mature tools is that it’s already integrated into your AWS account, with almost no setup required (except getting your logs into CloudWatch), and its pay-as-you-go pricing model. As usual with AWS it’s extremely friendly for getting up and running, but it’s easy to see how the cost could grow out of control if you’re not paying attention (picture someone creating a query over a month’s worth of data on a dashboard that refreshes every 30 seconds).
Happy querying!

Lambda Logging to CloudWatch

If you’re an AWS user, either professionally or personally, I can’t encourage you enough to try out Lambda. Lambda lets you run code in a few different languages (currently Python, Node, and Java) without worrying about the server environment it runs on.
Unfortunately (or fortunately, depending on your perspective), as with any new technology or paradigm, there are caveats to Lambda.
For example, one problem we’ve solved with Lambda is monitoring web service endpoints. Lambda allows us to make an HTTP call to a web service using the Python httplib library. But because the Python script runs on a server we don’t control or configure, it isn’t pointed at our DNS servers by default. You can imagine our initial confusion when the Lambda function said the web service was unavailable, but we never saw any traffic to the service.
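For context, the endpoint check itself is simple; here’s a rough sketch of that kind of function, where the hostname and path are placeholders and the httplib import assumes a Python 2 runtime (the module was renamed http.client in Python 3).

import httplib

def lambda_handler(event, context):
    # Placeholder hostname and path -- point these at your own web service
    conn = httplib.HTTPSConnection("service.example.internal", timeout=5)
    conn.request("GET", "/health")
    response = conn.getresponse()
    status = response.status
    conn.close()
    # Anything other than a 200 means the endpoint is considered unavailable
    return {"status": status, "available": status == 200}
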
The best way we found to gain insight into what Lambda is actually doing is to log from Lambda to a CloudWatch log stream. This lets you output logs and put retention policies on them. Amazon has been helpful enough to tie the built-in Python logger into CloudWatch, so all you really have to do is create a logging object.

Below is an example:

https://gist.github.com/LenOtuye/a7c14d8753d8268ab6b53c6a15535a70
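
In case the gist doesn’t load, a minimal sketch of that setup looks something like this (the handler body and log message are just illustrative):

import logging

# Lambda ties the root logger into CloudWatch Logs for you
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Received event: %s", event)
    # ... do the function's actual work here ...
    return "done"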

Your logs will then be dumped into a log stream under the log group named “/aws/lambda/<your function name>”.

One thing to note is that, in my experience, you can’t control the name of the log stream. Even if you create the logger with logger = logging.getLogger('LogName'), the log stream will still be named after the Lambda function.

To give your Lambda function permission to log to CloudWatch, it will need to run under an IAM role that grants that access. The role should allow the Lambda service to assume it, for example:

https://gist.github.com/LenOtuye/6e3216e129592b59327d1af9751ee1ee

And then you will need to give it permissions similar to the following (plus whatever rights your Lambda function needs for its actual work):

https://gist.github.com/LenOtuye/7bbf8922a547f937862055bb74791188
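
If the gists don’t load, here’s a rough boto3 sketch of that role setup; the role and policy names are placeholders, and the logs actions are the usual minimum needed to write to CloudWatch Logs.

import json
import boto3

iam = boto3.client("iam")

# Trust policy: let the Lambda service assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="lambda-logging-example",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Permissions policy: just enough to create log groups/streams and write events
logging_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
        ],
        "Resource": "arn:aws:logs:*:*:*",
    }],
}

iam.put_role_policy(
    RoleName="lambda-logging-example",
    PolicyName="lambda-cloudwatch-logging",  # placeholder policy name
    PolicyDocument=json.dumps(logging_policy),
)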