If you’re in a hurry and want a quick takeaway, here it is:
- Use reservations
- Operational Optimisation
  - Man-hours matter too
- Infrastructure Optimisation
  - Re-visit and resize instances (check their average CPU utilisation or use Trusted Advisor)
  - Use a mix of spot and on-demand instances
    - Separate ASGs
  - Stage environment (switch on/off; can use CloudFormation or Lambda too)
- Architecture Optimisation
  - Eliminate the web tier: use S3
  - Use SQS and Lambda: go serverless
- Tips and precautions
  - Turn off bastion hosts when not needed.
  - Use lifecycle rules for S3.
  - Don’t compromise on performance, HA or security to save costs.
Now that we know you have some time to read, let’s go ahead!
Cloud platforms have definitely changed the way we deploy applications on the web and build corporate infrastructure, and they have largely removed the need for on-premises data centres.
They have not only made provisioning resources simple, but also made it possible to avoid under-utilised resources, because capacity is now just a click away.
This in itself is a breakthrough in the IT industry and is helping many organisations lower their costs. But what many people don’t know is that a few simple changes can help them reduce their costs further.
A note before we begin
Before we dive into it, I seriously urge you to look into reservations. Check whether you have stable workloads and whether they are reserved.
Also, what I have seen is that when an application does not run enough EC2 instances to make a significant impact on the bill, it may still be running up a serious RDS bill.
Make sure that you have reserved instances for all the stable workloads on both EC2 and RDS.
Also, if the workload changes frequently, say every few months, choose Convertible RIs.
Types of Cost Optimisation
Okay, so I was going through an AWS re:Invent video, and it actually inspired this blog post.
The speaker gave an exceptionally good model for dividing up an application, and it makes looking for cost savings simple and efficient.
We’ll follow the same model and divide any given application into three optimisation processes:
- Operational Optimisation
- Infrastructural Optimisation
- Architectural Optimisation
Operational Optimisation
This is a small yet powerful part of the overall cost optimisation.
It focuses on the fact that man-hours are worth more than a few dollars. Say, for example, that deploying your own DB cluster on a fleet of EC2 instances works out cheaper than using RDS (Aurora, for instance).
The main point is that the time spent on saving those few dollars could be better spent on other aspects of the business, like optimising DB queries to use RDS in a cost-effective manner.
Also, optimising queries is a one-time change whose benefit lasts, while manually maintaining a DB cluster is a recurring and time-consuming task.
Infrastructural Optimisation
Your infrastructure makes up almost all of your bill, leaving only a small share for other items such as AWS Lambda.
Those could arguably be considered part of your infrastructure too. However, let’s set that aside and get back to the task at hand.
To make sure that your infrastructure is tuned to save you money, check the following points.
Re-visit your instances again and again
Customer demand changes, and so does the load on the various elements of your infrastructure.
Many times such changes cause unnecessary costs for your organisation.
Say, for instance, that you have a cluster of t2.2xlarge instances serving a subdomain responsible for a service xyz.
It used to be heavily used, but for the last few months it has not had many requests to process.
That means a large part of your instances’ capacity is simply going to waste. So you go to the instances and look at their CPU utilisation. Well, the average CPU utilisation has been around 20% for the last few months.
Guess what, you resize the instances to t2.xlarge. Assuming that you are in us-east-1 and have 4 instances in an ASG, you have reduced your bill from $1069.1 to $534.5 a month.
That’s a 50% saving just because you revisited the instances.
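The arithmetic behind those figures can be sketched as follows. The hourly rates are my assumption (on-demand Linux rates for us-east-1 that reproduce the numbers above), and a month is taken as 720 hours, so check the current price list before relying on them.

```python
# Assumed on-demand Linux hourly rates for us-east-1 (illustrative only;
# check the current AWS price list before relying on them).
HOURLY_RATE_USD = {
    "t2.2xlarge": 0.3712,
    "t2.xlarge": 0.1856,
}
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost(instance_type: str, count: int) -> float:
    """Return the on-demand cost in USD of `count` instances for one month."""
    return HOURLY_RATE_USD[instance_type] * count * HOURS_PER_MONTH


before = monthly_cost("t2.2xlarge", 4)
after = monthly_cost("t2.xlarge", 4)
saving_pct = (before - after) / before * 100

print(f"before: ${before:.2f}, after: ${after:.2f}, saving: {saving_pct:.0f}%")
```

Swapping in your own instance types, counts and rates gives a quick estimate before you touch anything in the console.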
Make it a practice to do this once a month. Also, if the CPU utilisation of any instance is above 60-70% most of the time, you need to move to a larger instance type.
Keep in mind that we don’t want to save costs by degrading the performance of the application.
It’s just like garbage collection, but here we collect unused costs back instead of unused memory. I don’t know if this last line made things simpler for you or not. LOL.
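The monthly review can be reduced to a simple rule of thumb. The sketch below is a hypothetical helper, not an AWS API: it takes CPU utilisation samples you have already collected (for example from CloudWatch), and its thresholds are assumptions loosely based on the "around 20%" and "60-70%" figures above.

```python
from statistics import mean

# Assumed thresholds, loosely based on the figures in the text; tune them.
LOW_AVERAGE_PCT = 25.0    # an average at or below this suggests downsizing
HIGH_SAMPLE_PCT = 60.0    # a sample above this counts as "busy"
BUSY_FRACTION = 0.5       # "most of the time" = more than half the samples


def resize_recommendation(cpu_samples_pct: list[float]) -> str:
    """Suggest 'upsize', 'downsize', 'keep' or 'no-data' from CPU samples (%)."""
    if not cpu_samples_pct:
        return "no-data"
    busy = sum(1 for s in cpu_samples_pct if s > HIGH_SAMPLE_PCT)
    if busy / len(cpu_samples_pct) > BUSY_FRACTION:
        return "upsize"      # never trade performance for savings
    if mean(cpu_samples_pct) <= LOW_AVERAGE_PCT:
        return "downsize"
    return "keep"


print(resize_recommendation([18.0, 22.0, 20.0]))
print(resize_recommendation([75.0, 80.0, 65.0, 30.0]))
```

The upsize check runs first on purpose, so a busy instance is never recommended for downsizing.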
Use a mix of spot and on-demand instances
This assumes that you have already allocated Reserved Instances for your stable workloads.
Now, for the workloads that aren’t predictable, or for spikes in your workloads, prefer a combination of spot and on-demand instances.
For simple stateless use cases, it does not matter which instance serves a request, and even if an instance is terminated we can still go on with our work. Thus we can use spot instances!!
Also, if a process needs the instance to be available until it completes and it will take less than 6 hours, we can again use spot instances, this time with a spot block.
That’s an interesting thing to note, because it means you can use spot instances for almost all of your workloads.
Now let’s see how we can use them. First of all, whatever you bid, you always pay the current spot price, and that’s usually much lower than on-demand pricing.
So we need to choose a price high enough that there are very few situations where we cannot get spot instances or they get terminated, yet low enough to make an impact on our bill.
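To get a feel for that trade-off, here is a small simulation. Everything in it is hypothetical: the hourly spot prices are made-up numbers and `simulate_spot` is an illustrative helper, not an AWS API. It models the rule above in a simplified way: in every hour where the spot price is at or below your maximum price you run and pay the spot price, not your maximum.

```python
def simulate_spot(hourly_spot_prices: list[float], max_price: float) -> tuple[int, float]:
    """Return (hours_run, total_cost) for a given maximum price.

    Simplification: an hour is either fully run or fully lost.
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price <= max_price:
            hours_run += 1
            total_cost += price  # you pay the current price, not your bid
    return hours_run, total_cost


# Made-up hourly spot prices in USD, including one price spike.
prices = [0.05, 0.06, 0.25, 0.07, 0.05]

for max_price in (0.06, 0.10, 0.30):
    hours, cost = simulate_spot(prices, max_price)
    print(f"max ${max_price:.2f}: ran {hours}/{len(prices)} hours for ${cost:.2f}")
```

A higher maximum price buys you more uptime, but only costs more in the hours when the market price actually spikes.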
Then we can use a load balancer that points to two ASGs. One of them has our spot instances and the other one has on-demand instances.
Say, for example, that we scale the spot ASG as per our needs. For cases where spot instances are unavailable, we scale out the other ASG with on-demand instances.
See! Now we have one ASG serving most of our workload on cheap spot capacity, and the on-demand ASG only steps in when spot instances are not available.
It sounds a bit tricky to implement, but once done, it pays off well.
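The fallback logic itself is small and can be sketched as a pure function. `split_capacity` and its parameters are hypothetical names of mine; in a real setup the two numbers it returns would be applied as the desired capacities of the spot ASG and the on-demand ASG.

```python
def split_capacity(required: int, spot_available: int) -> tuple[int, int]:
    """Return (spot_desired, on_demand_desired) for the two ASGs.

    required:       instances needed to serve the current load
    spot_available: spot instances we can actually get right now
    """
    if required < 0 or spot_available < 0:
        raise ValueError("capacities must be non-negative")
    spot_desired = min(required, spot_available)   # use spot first
    on_demand_desired = required - spot_desired    # cover the gap on-demand
    return spot_desired, on_demand_desired


print(split_capacity(10, 10))  # spot covers everything
print(split_capacity(10, 4))   # on-demand covers the shortfall
print(split_capacity(10, 0))   # spot unavailable, all on-demand
```

Spot always goes first, so the on-demand ASG stays at zero whenever spot capacity is enough.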
Spendthrift dev/stage environment
Stage and development environments use a lot of resources. Their infrastructure is generally identical to the production environment, so they may cost almost as much too.
And in some cases, they may cost more.
But the important thing to note here is that, unless you have an international team of developers working on the same stage/development environment around the clock, a lot of those resources sit idle.
What you can do is simply shut down your resources whenever they are not being used, for instance from 8 PM to 8 AM.
You know better than I do how much you can save by shutting down your resources for 12 hours a day, or 360 hours a month. That’s as if your resources were running for only half the month.
Also, you can deploy your infrastructure using CloudFormation and then use Lambda functions to destroy it in the evening and re-create it in the morning using the same templates. See?
Again, this seems quite tricky and cumbersome, but it can without any doubt save you a lot of money.
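The scheduling decision is the easy part. Here is a minimal sketch of it, assuming the 8 AM to 8 PM working window from the example above and a 30-day month. The function names are hypothetical, and the actual start/stop or create/destroy calls are deliberately left out.

```python
from datetime import datetime, time

# Assumed working window from the example above; adjust to your team.
WORK_START = time(8, 0)   # 8 AM
WORK_END = time(20, 0)    # 8 PM


def should_be_running(now: datetime) -> bool:
    """True if the stage/dev environment should be up at `now` (local time)."""
    return WORK_START <= now.time() < WORK_END


def off_hours_per_month(days: int = 30) -> int:
    """Hours per month during which the environment is shut down."""
    on_hours_per_day = WORK_END.hour - WORK_START.hour
    return (24 - on_hours_per_day) * days


print(should_be_running(datetime(2024, 1, 15, 9, 0)))   # mid-morning
print(should_be_running(datetime(2024, 1, 15, 21, 0)))  # late evening
print(off_hours_per_month())
```

A scheduled function could call `should_be_running` every hour or so and start or stop the environment when the answer changes.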
And please make sure that anything you do to save costs does not affect your application’s performance, availability or security.
If it does, you are better off spending more and keeping these three things intact.
Architectural Optimisation
Go serverless. Period.
That’s all this section is about. If you understood the above three words, skip this section. Otherwise read along.
Serverless is when you do not provision any resources yourself but rather leverage AWS services to make your application work.
And since you did not provision any resources, you do not pay for any that sit idle.
But wait, what do you pay for then?
Well, you pay for what you use.
For instance, if you have a Lambda function, you’ll pay only for the times the function is triggered. (By the way, Lambda has a huge free tier. It’s simply amazing.)
Take a chatbot mobile app as an example.
The traditional approach would involve setting up servers to listen for users’ requests, save any data required, query any records and then send the response. Nice and neat.
But this is a lot of work, and it brings unnecessary costs too.
Consider this implementation.
You create the mobile app and distribute it to your users. Now whenever a user types anything into the app, it calls an API provided by API Gateway (serverless). API Gateway then triggers a Lambda function, again serverless.
The function runs and, if it needs to store or retrieve something, uses an S3 bucket.
So now if you have only two users, you don’t pay for an entire infrastructure but only for the requests served to those users. If you have millions of users, you serve all of them without deploying any additional resources.
Magic, isn’t it?
I truly encourage you to go and try implementing serverless for your workloads, and believe me, you won’t regret it.
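To make the chatbot flow a bit more concrete, here is a minimal sketch of what the Lambda side could look like. It assumes API Gateway’s Lambda proxy integration, where the request body arrives as a string in `event["body"]` and the function returns a dict with `statusCode` and `body`. The `message` field and the echo reply are hypothetical, and the S3 reads and writes are left out to keep it self-contained.

```python
import json


def handler(event, context):
    """Entry point that Lambda invokes for each API Gateway request."""
    # With proxy integration the body is a JSON string (or None if empty).
    body = json.loads(event.get("body") or "{}")
    message = body.get("message", "")  # hypothetical request field

    # Placeholder "bot" logic; a real bot would look things up or store state.
    reply = f"You said: {message}" if message else "Say something!"

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"reply": reply}),
    }


# Local usage example with a fake event; no AWS account needed.
fake_event = {"body": json.dumps({"message": "hello"})}
print(handler(fake_event, None))
```

Since the handler is a plain function, you can test it locally like this before wiring it up to API Gateway.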
Some tips and precautions
- Use S3 lifecycle rules. Extensively!!
- Shut down your bastion host and turn it on only when you need it to access your web servers, which is obviously not very often.
- Do not save costs by compromising on performance, HA or security, because if any one of them is missing, your app is useless and whatever you are still spending after the savings goes down the drain anyway.
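For the S3 lifecycle tip, a rule might look like the sketch below. The rule ID, the `logs/` prefix and the day counts are placeholders of mine, not recommendations. The dict follows the shape of S3’s lifecycle configuration, which with boto3 would be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.

```python
# Hypothetical lifecycle rule: the ID, prefix and day counts are placeholders.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # only objects under logs/
            "Transitions": [
                # Move to cheaper storage classes as objects age.
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Delete the objects entirely after a year.
            "Expiration": {"Days": 365},
        }
    ]
}

for rule in lifecycle_configuration["Rules"]:
    steps = [f'{t["StorageClass"]} after {t["Days"]} days' for t in rule["Transitions"]]
    print(rule["ID"], "->", ", ".join(steps))
```

Pick the prefix and ages from how your data is actually read, so that lifecycle rules never get in the way of performance.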