I have been reading a lot about FinOps and cloud cost optimization these days, and I came across AWS's short ebook about data cost optimization.
Cost optimization is one of the pillars of AWS's Well-Architected Framework. When we think about cost optimization, we usually consider only compute resources, while there are significant optimizations beyond that – storage, network, etc.
Below is a summary of the six sections that appear in the ebook, with some comments –
Optimize the cost of information infrastructure – the main point in this section is to use Graviton instances where applicable.
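To make this concrete, here is a minimal sketch of launching a Graviton (arm64) instance with boto3; the SSM parameter path and the instance type are assumptions you should verify for your own region and workload –

```python
import boto3

ssm = boto3.client("ssm")
ec2 = boto3.client("ec2")

# Assumed SSM public parameter for the latest Amazon Linux 2023 arm64 AMI –
# verify the exact path for your region before using it.
ami_id = ssm.get_parameter(
    Name="/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64"
)["Parameter"]["Value"]

# m7g is a Graviton-based instance family; pick the size that fits your workload.
ec2.run_instances(ImageId=ami_id, InstanceType="m7g.large", MinCount=1, MaxCount=1)
```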
Decouple data storage from compute – five suggestions here, which are pretty standard –
- Compress data when applicable, and use optimal data structures for your task.
- Consider data temperature when choosing a data store and storage class – use a suitable S3 storage class and manage it with a lifecycle policy (see the sketch after this list).
- Use low-cost compute resources, such as Spot Instances, when applicable – I have some dissonance here, since I'm not sure Spot Instances are attractive these days (see here), specifically with the overhead of handling interrupted instances.
- Deploy compute close to data to reduce data transfer costs – trivial.
- Use Amazon S3 Select and Amazon S3 Glacier Select to reduce data retrieval – Amazon S3 Select has several limitations (see here), so I'm not sure it is worth the effort; querying via Athena is usually a better option.
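For the data-temperature point above, here is a minimal sketch of such a lifecycle policy with boto3; the bucket name, prefix, and day thresholds are hypothetical and should be tuned to your actual access patterns –

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; transition/expiration days are only an illustration.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-analytics-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-by-temperature",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm data
                    {"Days": 90, "StorageClass": "GLACIER"},      # cold data
                ],
                "Expiration": {"Days": 365},  # drop data nobody reads anymore
            }
        ]
    },
)
```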
Plan and provision capacity for predictable workload usage
- Choosing the right instance type based on workload pattern and growth – common sense. Convertible Reserved Instances save a bit less than standard ones, but in a fast-changing startup environment they lower the chance that the commitment ends up underutilized.
- Deploying rightsizing based on average or median workload usage – this contradicts the best practices described in the Cloud FinOps book, so I'm a bit hesitant here.
- Using automatic scaling capabilities to meet peak demand – this is the most relevant advice in this section. Use Auto Scaling groups or similar to balance both performance and cost (a minimal sketch follows below).
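A minimal sketch of the auto-scaling advice, attaching a target-tracking policy to an existing Auto Scaling group with boto3; the group name and the 50% CPU target are assumptions –

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical group name; the target value should reflect your own performance/cost trade-off.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="data-workers",
    PolicyName="target-cpu-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,  # scale out above 50% average CPU, scale in below it
    },
)
```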
Access capacity on demand for unpredictable workloads
- Use Amazon Athena for ad hoc SQL workloads – as mentioned above, I prefer Athena over Amazon S3 Select (see the query sketch after this list).
- Use AWS Glue instead of Amazon EMR for infrequent ETL jobs – I don't have a strong opinion here, but if you have a data strategy in mind, I would try to align with it. Additionally, I feel that other AWS services can be even easier and more cost-effective to work with – for example, Apache Spark in Amazon Athena, Step Functions, etc.
- Use on-demand resources for transient workloads or short-term development and testing needs – having said that, you should still keep an eye on your production services, ensure they are utilized correctly and rightsize them if needed.
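Since I keep preferring Athena over S3 Select, here is a minimal sketch of an ad hoc query with boto3; the database, table, and results bucket are hypothetical –

```python
import boto3

athena = boto3.client("athena")

# Hypothetical database/table and results location – Athena bills per data scanned,
# so partitioning and columnar formats (e.g. Parquet) keep this cheap.
response = athena.start_query_execution(
    QueryString="SELECT customer_id, SUM(amount) AS total FROM events GROUP BY customer_id",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution with this id for status
```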
Avoid data duplication with a centralized storage layer
Implement a central storage layer to share data among tenants – I would shorten this to "have a data strategy" – where you are, where you want to go, etc. – which is not trivial in a startup's early days.
Leverage up to $100,000 in AWS Activate credits
This might be a bit contradictory to the rest of the document, since it feels like free money and delays your concern about cloud costs.