5 thoughts on Jobs to Be Done

After it was mentioned in “Second Brain” and had been waiting in my reading list for a long time, I finally listened to “Jobs to Be Done: A Roadmap for Customer-Centered Innovation” by Stephen Wunker, Jessica Wattman, and David Farber. Here are my thoughts about it –

1. Examples relevancy – The book was published in 2016 and brings several examples – Slack, Snapchat, etc. Are those examples still relevant? Do they still emphasize the relevant points?

This problem is not unique to this book. One often finds an anecdote and uses it to emphasize or justify a theory. As time passes, the perspective also changes, and it sometimes differs from the theory.

2. Multiple stakeholders – B2B purchasing procedures frequently involve a varied group of stakeholders, each with their unique jobs to be done. This point of view is often overlooked and has great implications for how sales should be done.

3. Emotional and Social Components – jobs to be done are not only functional tasks such as getting from here to there, wearing something, etc. Jobs to be done also have emotional and social components that should be addressed.

4. Obstacles to use and obstacles to adoption – Obstacles to adoption are challenges that restrict a consumer’s inclination to purchase a product or service. Facilitating the ease with which people can learn about and experiment with your new offering can diminish obstacles to adoption. Obstacles to use refer to impediments that hinder success, ultimately reducing a customer’s probability of ongoing product usage, acquiring supplementary features, or upgrading to more recent editions. I love this insight, and it is an important distinction, especially when one needs to prioritize.

5. Effective Brainstorming – brainstorming is discussed at length in one of the chapters. Personally, I have many doubts about group brainstorming as a way to encourage creativity. If there is one thing to take from this discussion, it is that such a session should be well-moderated. See more here

5 thoughts on Building a Second Brain

“a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”

Herbert A. Simon

The quote above appears in “Building a Second Brain” by Tiago Forte, which I finished reading last week. I strongly relate to it, and the book helped me reflect on my personal knowledge management.

TL;DR – find a personal knowledge management method that works for you (e.g., the PARA method). It does not have to be perfect. You don’t have to shift everything. Just get it started and adjust as you go. 

Here are a few thoughts I had while reading the book –

1. Progressive Summarization – the progressive summarization technique reminded me of a joke my uncle told me a while back – a student in his first semester asks the lecturer how best to prepare for the exam. She tells him – after every class, summarize your class notes. At the end of each week, summarize all the daily summaries. At the end of each month, summarize the weekly summaries, and so on. In the week before the exam, summarize the summary from the day before. They meet just before the exam, and she asks him how it went. He answers – “great, I was able to summarize everything into one word – bullshit.”

2. Divergence and Convergence – In my first or second semester at the university, I took a class on academic writing. We were told that a good academic essay is built like an hourglass: it starts with a very wide question or statement, then narrows down to a specific claim or case, and finishes by zooming back out to the broader picture. See more here. Divergence and convergence work the same way. You start very scattered, then connect the dots, focus, reach some advancement, and repeat.

3. Hemingway’s bridges – “The “Hemingway Bridge” is a technique used by author Ernest Hemingway in which he would stop his writing for the day only AFTER he knew what was coming next.” (here). Each of us has our own hooks that help us restart the next time. A few years ago, I read Hila Noga’s post about getting your programming flow going, and it is a Hemingway Bridge for developers.

4. Blog as an interface – I initially created the blog so it would be easier for me to search for links I once saw and to share with other people. One can view it as some interface to my second brain. I am still in the process of thinking about which methods are right for me to adapt from the book.

5. Listening to an e-book – I’m a big fan of highlighting and writing comments in books, papers, etc. The audiobook format is challenging for me in this aspect; moreover, I usually listen to an e-book while doing other things like walking or driving, which means I miss some of the second brain practice. I still need to figure out how to tackle this. On the other hand, I use writing and notes of all kinds to unload my brain and as an easier way to access things in the future. I was very happy that the topic of offloading was widely discussed in the book.

Team Health Check

Today, I heard Dafna Rosenblum’s (see Dafna’s blog here) talk at the EMIL (Engineering Manager IL) meetup about “Team Health Check”. It was the first time I had heard about the concept, so I read more about it.

Spotify developed the team health check concept and introduced it in 2014 (here).  About 6 months ago, Spotify published a new post about “Getting More from Your Team Health Checks”. The post focuses on improving the team experience in this workshop and suggests the following main ideas –

  • Customize wisely – tailor the right questions and health checks that fit the team and the organization.
  • Dig Deep – Good facilitation is essential to enable more profound conversations. Good facilitation should help to step outside the team’s day-to-day communication patterns and create a psychologically safe place (see here) to raise issues. The post suggests a few strategies on how to do it.
  • Follow through – you should follow up and reiterate the topics that were raised in the team health check workshop. That can happen in 1-1 meetings, by scheduling the required meetings, by re-checking your priorities and attention, etc.

I found a few online tools to help facilitate team health checks – 

  • teamhealthcheck.io – “A free, anonymous, and super simple tool to run a version of the Spotify Health Check Survey. “
  • Miro boards – There are multiple miro templates for team health checks – e.g., here, here, and here.

Another tip from Dafna is to start this meeting (and every meeting) with a quick intake to break the ice and increase engagement. For example, ask each member to describe their mood using an emoji.

5 interesting things (04/09/2023)

12 Debugging tools I wish I knew earlier 🔨 – it describes debugging strategies more than debugging tools (i.e., minimal reproduction is not a tool). One strategy I missed in this post is adding breakpoints. If I were to write this post, I would order the items by escalation; for example, reading the error message would rank higher. However, it is an important post, especially for junior developers.

https://careercutler.substack.com/p/12-debugging-tools-i-wish-i-knew

Consistency Patterns – This post explains the common consistency patterns – strong, eventual, and weak consistency – and their trade-offs. It also mentions the idea of causal consistency, which I find very interesting.

https://systemdesign.one/consistency-patterns/

Remote work requires communicating more, less frequently – he had me at “Think of it like gzip compression, but for human-to-human communication. Yes, there’s slightly more processing overhead at the start, but it allows greater communications throughput using fewer “packets” (communicate more using less)”. Seriously, once your organization grows above ten people and you start having clients, you will have people remote (colleagues or clients), and you will have to optimize your communication to pass your message.

https://ben.balter.com/2023/08/04/remote-work-communicate-more-with-less/

Git log customization – I’m setting up a new computer for development and looking for a format that would be easy for me to use, so this post came exactly on time.

https://www.justinjoyce.dev/customizing-git-log-format/

Structuring your Infrastructure as Code – I like the layers approach of this post and the examples from all 3 public cloud providers. I would like to give more thought to the exact layers and their order. Note that this post is written by a Pulumi solutions engineer, so it might not work as well with other IaC tools.

https://leebriggs.co.uk/blog/2023/08/17/structuring-iac

5 interesting things (27/07/2023)

Designing Age-Inclusive Products: Guidelines And Best Practices – I have a 91-year-old grandmother who, for the last 10 years, has not been able to book a doctor’s appointment herself because she does not use a smartphone and cannot follow voice navigation. Even without a personal perspective, I am very interested in accessibility, and I try to pay attention to inclusivity and accessibility topics wherever relevant. However, I always wonder whether those are general best practices or are limited to specific cohorts. Specifically, in this case, younger people usually have more technology literacy than older people and therefore can achieve their goals with less optimized flows and UI.

https://www.smashingmagazine.com/2023/07/designing-age-inclusive-products-guidelines-best-practices/

On Becoming VP of Engineering – A two-part blog post series by Emily Nakashima, Honeycomb’s first VP of Engineering. The first part focuses on her path – coming originally from design, frontend, and product engineering and becoming a VP of Engineering who also manages backend and infrastructure.

The second part talks about the day-to-day work and the shift in focus when moving from a director position to a VP position. I strongly agree with her saying, “Alignment is your most important deliverable,” and also think it is one of the hardest things to achieve.

https://www.honeycomb.io/blog/becoming-vp-of-engineering-pt1

https://www.honeycomb.io/blog/becoming-vp-of-engineering-pt2

Project Management for Software Engineers – “This article is a collection of techniques I’ve learned for managing projects over time, that attempts to combine agile best practices with project management best practices.”. While a degree in computer science teaches lots of algorithms, software development, and so on, it does not teach project management and time management. Those skills are usually not required in junior positions but can help you have a more significant impact. Having said that, one should find the exact practices that fit them, and those practices can evolve over time.

https://sookocheff.com/post/engineering-management/project-management-for-software-engineers/

Designing Pythonic library APIs – A while ago (roughly two years), I looked for a post/tutorial regarding SDK design best practices and could not find something I was happy with. I like the examples (both good and bad) in this post. If you are in a hurry, all the takeaways are summarized at the end (but they are sometimes hard to understand without context).

https://benhoyt.com/writings/python-api-design/

Fern – “Fern is an open source toolkit for designing, building, and consuming REST APIs. With Fern, you can generate client libraries, API documentation, and boilerplate for your backend server.”. I haven’t tried it myself yet, but if it works, it seems like cookie-cutter on steroids. In the era of LLMs, the next step is to generate all of those from free text.

https://github.com/fern-api/fern

5 interesting things (06/07/2023)

Potential impacts of Large Language Models on Engineering Management – this post is an essential starter for a discussion, and I can think of other impacts. For example – how are interviewing and skill assessment of new team members affected by LLMs? Which skills should be evaluated these days (focusing on engineering positions)?

One general caveat about using LLMs is trusting them completely, without any doubt. This is crucial for performance reviews. Compare it to code – if the code does not work, it is easy to trace and fix; if a performance review is wrong, it might be hard to pinpoint what went wrong and where, and the person receiving it might not have the confidence to say something.

https://www.engstuff.dev/p/potential-impacts-of-large-language

FastAPI best practices – one of the most reasoned and detailed guides I have read. The repository issues also serve as comments to the guide and are worth reading. Ideally, I would like to take most of the ideas and turn them into a cookie-cutter project that is easy to create.

https://github.com/zhanymkanov/fastapi-best-practices

How Product Strategy Fails in the Real World — What to Avoid When Building Highly-Technical Products – I have seen it all in action and hope to do better in the future.

https://review.firstround.com/how-product-strategy-fails-in-the-real-world-what-to-avoid-when-building-highly-technical-products

1 dataset 100 visualizations – I imagine this project as an assignment in a data visualization/data journalism course.  Yes, there are many ways to display data. Are they all good? Do they convey the desired message?

There is a risk in being too creative, and there are some visualizations there I cannot imagine using for anything reasonable.

https://100.datavizproject.com/

Automating Python code quality – one additional advantage of using tools like Black, isort, etc., is that they reduce the cognitive load of code review. The reviewer no longer needs to check for style issues and can focus on deeper issues.

https://blog.fidelramos.net/software/python-code-quality

Bonus – more extensive pre-commit template – 

https://github.com/br3ndonland/template-python/blob/main/.pre-commit-config.yaml

Did you Miss me? PyCon IL 2023

Today I talked about working with missing data at PyCon IL. We started with a bit of theory about mechanisms of missing data –

  • MCAR – The fact that the data are missing is independent of the observed and unobserved data.
  • MAR – The fact that the data are missing is systematically related to the observed but not the unobserved data.
  • MNAR – The fact that the data are missing is systematically related to the unobserved data.

And deep-dived into an almost real-world example that utilizes the Python ecosystem – pandas, scikit-learn, and missingno.
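
As a taste of what such an exploration can look like (a minimal sketch with made-up data, not the example from the talk), one can construct a column whose missingness depends on another, observed column – an MAR-like pattern – and inspect it with missingno –

import numpy as np
import pandas as pd
import missingno as msno

# toy data: "income" is missing more often for older people,
# i.e., the missingness depends on the observed "age" column (MAR-like)
rng = np.random.default_rng(42)
df = pd.DataFrame({"age": rng.integers(20, 90, size=200)})
df["income"] = rng.normal(50_000, 10_000, size=200)
df.loc[df["age"] > 65, "income"] = np.nan

print(df.isna().mean())   # share of missing values per column
msno.matrix(df)           # visual overview of the missingness pattern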

My slides are available here and my code is here.

3 related posts I wrote about working with missing data in Python –

Pandas fillna vs scikit-learn SimpleImputer

Missing data is prevalent in real-world data and can be missing for various reasons. Gladly, both pandas and scikit-learn offer several imputation tools to deal with it. Pandas offers a basic yet powerful interface for univariate imputations using fillna and more advanced functionality using interpolate. scikit-learn offers SimpleImputer for univariate imputations and KNNImputer and IterativeImputer for multivariate imputations. In this post, we will focus on fillna and SimpleImputer and compare them.

Basic Functionality

SimpleImputer offers four strategies to fill in the NaN values – mean, median, most_frequent, and constant.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
imp_mean = SimpleImputer(strategy='mean')
pd.DataFrame(imp_mean.fit_transform(df))

output –

      0    1    2
0   7.0  2.0  7.5
1   4.0  3.5  6.0
2  10.0  5.0  9.0

Can we achieve the same with pandas? Yes!

df.fillna(df.mean())

Want to impute with the most frequent value?

Assume – df = pd.DataFrame(['a', 'a', 'b', np.nan])

With SimpleImputer

imp_mode = SimpleImputer(strategy='most_frequent')
pd.DataFrame(imp_mode.fit_transform(df))

With fillna

df.fillna(df.mode()[0])

And the output of both –

   0
0  a
1  a
2  b
3  a

Different Strategies

Want to apply different strategies for different columns? With scikit-learn, you will need several imputers, one per strategy. With fillna, you can pass a dictionary, for example –

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
df.fillna({1: 10000, 2: df[2].mean()})
    0        1    2
0   7      2.0  7.5
1   4  10000.0  6.0
2  10      5.0  9.0
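
For comparison, here is a sketch (not from the original post) of the scikit-learn counterpart, combining per-column imputers with a ColumnTransformer – note that the transformed columns come first in the output, followed by the passthrough column –

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
ct = ColumnTransformer(
    transformers=[
        ("const", SimpleImputer(strategy='constant', fill_value=10000), [1]),
        ("mean", SimpleImputer(strategy='mean'), [2]),
    ],
    remainder='passthrough',  # column 0 is passed through untouched
)
pd.DataFrame(ct.fit_transform(df))
         0    1     2
0      2.0  7.5   7.0
1  10000.0  6.0   4.0
2      5.0  9.0  10.0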

Advanced Usage

Want to impute values drawn from a normal distribution? No brainer –

mean = 5
scale = 2
df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
df.fillna(
    pd.DataFrame(
        np.random.normal(mean, scale, df.shape)))
    0         1         2
0   7  2.000000  3.857513
1   4  5.407452  6.000000
2  10  5.000000  9.000000

Missing indicator

Using SimpleImputer, one can add indicator columns that take the value 1 if the original value was missing, and 0 otherwise. This can also be done using MissingIndicator –

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
mean_imp = SimpleImputer(strategy='mean', add_indicator=True)
pd.DataFrame(mean_imp.fit_transform(df))
      0    1    2    3    4
0   7.0  2.0  7.5  0.0  1.0
1   4.0  3.5  6.0  1.0  0.0
2  10.0  5.0  9.0  0.0  0.0

Note that indicator columns (i.e., columns 3 and 4 in the example above) are added only for columns that contain missing values. Therefore, there is no missing-indicator column corresponding to column 0. If you are converting back and forth to pandas DataFrames, you should note this nuance.
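
For completeness, a minimal sketch (not from the original post) of the MissingIndicator route on the same DataFrame – it produces only the indicator columns, again only for columns that contain missing values –

from sklearn.impute import MissingIndicator

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
# features='missing-only' by default, so indicators are created
# only for columns 1 and 2, which contain missing values
indicator = MissingIndicator()
pd.DataFrame(indicator.fit_transform(df).astype(int))
   0  1
0  0  1
1  1  0
2  0  0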

Another nuance to note when working with SimpleImputer is that columns that contain only missing values are dropped by default –

df =  pd.DataFrame(
    [[7, 2, np.nan, np.nan], [4, np.nan, 6, np.nan],
    [10, 5, 9, np.nan]])
mean_imp = SimpleImputer(strategy='mean')
pd.DataFrame(mean_imp.fit_transform(df))
      0    1    2
0   7.0  2.0  7.5
1   4.0  3.5  6.0
2  10.0  5.0  9.0

This behavior can be controlled by setting keep_empty_features=True. While it is manageable, tracing columns might be challenging –

mean_imp = SimpleImputer(
    strategy='mean',
    keep_empty_features=True,
    add_indicator=True)
pd.DataFrame(mean_imp.fit_transform(df))
      0    1    2    3    4    5    6
0   7.0  2.0  7.5  0.0  0.0  1.0  1.0
1   4.0  3.5  6.0  0.0  1.0  0.0  1.0
2  10.0  5.0  9.0  0.0  0.0  0.0  1.0

There is an elegant way to achieve similar behavior in pandas –

df = pd.DataFrame(
    [[7, 2, np.nan, np.nan], [4, np.nan, 6, np.nan],
     [10, 5, 9, np.nan]])
pd.concat(
    [df.fillna(df.mean()), 
     df.isnull().astype(int).add_suffix("_ind")], axis=1)
    0    1    2   3  0_ind  1_ind  2_ind  3_ind
0   7  2.0  7.5 NaN      0      0      1      1
1   4  3.5  6.0 NaN      0      1      0      1
2  10  5.0  9.0 NaN      0      0      0      1

Working with dates

Want to work with dates and fill several columns with different types? No problem with pandas –

from datetime import datetime

df = pd.DataFrame(
    {"date": [
        datetime(2023, 6, 20), np.nan,
        datetime(2023, 6, 18), datetime(2023, 6, 16)],
     "values": [np.nan, 1, 3, np.nan]})
df.fillna(df.mean())

Before –

        date  values
0 2023-06-20     NaN
1        NaT     1.0
2 2023-06-18     3.0
3 2023-06-16     NaN

After –

        date  values
0 2023-06-20     2.0
1 2023-06-18     1.0
2 2023-06-18     3.0
3 2023-06-16     2.0

Working with dates is an advantage that fillna has over SimpleImputer.

Backward and forward filling

So far, we have treated the records and their order as independent. That is, we could have shuffled the records, and that would not have affected the expected imputed values. However, there are cases, for example when representing time series, where the order matters and we would like to impute based on later values (backfill) or earlier values (forward fill). This is done by setting the method parameter.

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6],
     [10, np.nan, 9], [np.nan, 5, 10]])
df.fillna(method='bfill')
      0    1     2
0   7.0  2.0   6.0
1   4.0  5.0   6.0
2  10.0  5.0   9.0
3   NaN  5.0  10.0

One can also limit the number of consecutive values which are imputed –

df.fillna(method='bfill', limit=1)
      0    1     2
0   7.0  2.0   6.0
1   4.0  NaN   6.0
2  10.0  5.0   9.0
3   NaN  5.0  10.0

Note that when using bfill or ffill, and especially when specifying a limit other than None, it is possible that not all the values will be imputed.

For me, that’s a killer feature of fillna compared to SimpleImputer.
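
One way to handle those leftover values (a sketch, not from the original post) is to chain a forward fill after the backward fill so the trailing gaps are covered as well –

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6],
     [10, np.nan, 9], [np.nan, 5, 10]])
# backfill first, then forward-fill whatever the backfill could not reach
df.fillna(method='bfill').fillna(method='ffill')
      0    1     2
0   7.0  2.0   6.0
1   4.0  5.0   6.0
2  10.0  5.0   9.0
3  10.0  5.0  10.0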

Treat Infinite values as na

Setting pd.options.mode.use_inf_as_na = True will treat infinite values (i.e., np.inf and -np.inf) as missing values, for example –

df = pd.DataFrame([1, 2, np.inf, np.nan])
df.fillna(1000)

pd.options.mode.use_inf_as_na = False

     0
0  1.0
1  2.0
2  inf
3  1000.0

pd.options.mode.use_inf_as_na = True

     0
0  1.0
1  2.0
2  1000.0
3  1000.0

Note that inf and NaN are not treated the same in other use cases, e.g. – df[0].value_counts(dropna=False)

0
1.0    1
2.0    1
NaN    1
NaN    1
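
The use_inf_as_na option has been deprecated in recent pandas releases (worth verifying against the version you use), so an explicit replacement is arguably the safer route – a small sketch –

df = pd.DataFrame([1, 2, np.inf, np.nan])
# replace infinities with NaN explicitly, then impute as usual
df.replace([np.inf, -np.inf], np.nan).fillna(1000)
        0
0     1.0
1     2.0
2  1000.0
3  1000.0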

Summary

Both pandas and scikit-learn offer basic functionality to deal with missing values. Assuming you are working with a pandas DataFrame, pandas fillna can achieve everything SimpleImputer can do and more – working with dates, backward and forward fill, etc. Additionally, there are some edge cases and specific behaviors to pay attention to when choosing what to use. For example, when using the bfill or ffill method, some values may not be imputed if they are the last or first ones, respectively.

Few thoughts on Cloud FinOps Book

I just completed the “Cloud FinOps” book by J.R. Storment and Mike Fuller, and here are a few thoughts –

  1. At first, I wondered whether I should read the 1st edition, which I had easy access to, or the 2nd, which I had to buy. After reading a sample, I decided to buy the 2nd edition, and I am glad I did. This domain and community move quickly; a 2019 version would have been outdated and misleading.
  2. FinOps involves a paradigm shift – developers should consider not only the performance of their architecture (i.e., memory and CPU consumption, speed, etc.) but also the cost associated with the resources they will use. Procurement is no longer done and approved by the finance team alone; developers’ decisions can have a significant influence on the cloud bill. FinOps teams bridge the engineering and finance teams (and more) and speak the language of all parties, along with additional skill sets and an overview of the entire organization.
  3. A general rule of thumb regarding commitments –
    1. Longer commitment period (3 years → 1 year) = lower price (higher discount)
    2. More upfront (full upfront → partial upfront → no upfront )= lower price (higher discount)
    3. More specific (RI → Convertible RI → SP, region, etc.) = lower price (higher discount)
  4. The FinOps team should be up-to-date on new cloud technology updates and cost-reduction options. I have been familiar with reserved and spot instances for a long time, but there are many other cost-reduction bits and bytes to pay attention to. For example, the following 2 points –
    1. When purchasing savings plans (SP), which are monetary commitments as opposed to resource-unit commitments, the spend amount you commit to is post-discount. Moreover, AWS will apply the SP to the resources that yield the highest discount. This implies that the blended discount rate diminishes when committing to more money (see the sketch after this list).
    2. The CloudFront security savings bundle (here) is a savings plan that ties together the usage of CloudFront and WAF. The book predicts that such plans, i.e., plans that combine the usage of multiple products, will become common soon.
  5. Commitments (e.g., SP, RI) are one of many ways to reduce costs. Removing idle resources (e.g., unattached drives), using correct storage classes (e.g., infrequent access, Glacier), or making architecture changes (e.g., rightsizing, moving from servers to serverless, going via VPC endpoints, etc.) can help avoid and reduce cost. Those activities can happen in parallel – a centralized FinOps team manages commitments (aka cost reduction) while decentralized engineering teams optimize the resources they use (aka cost avoidance). Ideally, it is a tango: each team moves a little step at a time to optimize their part.
  6. The FinOps domain-specific knowledge goes even further. For example, there are costs that engineers tend to miss or wrongly estimate, e.g., network traffic costs, the number of events, and data storage.
  7. The inform phase is part of the FinOps lifecycle – making the data available to the relevant participants. The Prius effect, i.e., real-time feedback, instantly influences behavior even without explicit recommendations or guidance. Visualizations (done right) can help understand and react to the data better. A point emphasized multiple times in the book – put the data in the path of the engineers or any other stakeholder. Don’t ask them to log in to a different system to review the data; integrate with existing systems they use regularly.
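
To make point 4.1 concrete, here is a small sketch with entirely made-up usage and discount rates (not AWS pricing): the commitment is applied to the highest-discount usage first, so larger commitments spill over to less-discounted usage and the blended discount rate drops –

# Hypothetical illustration – the resources and discount rates below are made up.
hourly_usage = [                      # (resource, on-demand $/hour, SP discount)
    ("compute-optimized fleet", 6.0, 0.40),
    ("general-purpose fleet", 8.0, 0.28),
    ("burst workloads", 4.0, 0.15),
]

def blended_discount(commitment_per_hour):
    """Overall discount achieved for a given hourly SP commitment (post-discount $)."""
    remaining = commitment_per_hour
    covered_on_demand = 0.0
    saved = 0.0
    # the commitment is applied to the highest-discount usage first
    for _, cost, discount in sorted(hourly_usage, key=lambda r: -r[2]):
        discounted_cost = cost * (1 - discount)
        spend = min(remaining, discounted_cost)
        covered = spend / (1 - discount)  # on-demand value covered by this spend
        covered_on_demand += covered
        saved += covered * discount
        remaining -= spend
        if remaining <= 0:
            break
    return saved / covered_on_demand if covered_on_demand else 0.0

for commitment in (2, 6, 12):
    print(f"${commitment}/h commitment -> {blended_discount(commitment):.0%} blended discount")
# larger commitments reach lower-discount usage, so the blended rate drops (40% -> 36% -> 30%)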

A few resources I find helpful –

  1. FinOps foundation website – includes many resources and community knowledge – https://www.finops.org/introduction/what-is-finops/
  2. FinOps podcast – https://www.finops.org/community/finops-podcast/
  3. Infracost lets engineers see a cost breakdown and understand costs before making changes in the terminal, VS Code, or pull requests. https://www.infracost.io/
  4. Cloud Custodian – “Cloud Custodian is a tool that unifies the dozens of tools and scripts most organizations use for managing their public cloud accounts into one open source tool” – https://cloudcustodian.io/
  5. FinOut – A Holistic Cost Management Solution For Your Cloud. I recently participated in a demo and that looks super interesting. https://www.finout.io/
  6. Startup guide to data cost optimization – my post summarizing AWS’s ebook about data cost optimization for startups – https://tomron.net/2023/06/01/startup-guide-to-data-cost-optimization-summary/
  7. Twitter thread I wrote in Hebrew about the book – https://twitter.com/tomron696/status/1657686198327062529

Startup guide to data cost optimization – summary

I have been reading a lot about FinOps and cloud cost optimization these days, and I came across AWS’s short ebook about data cost optimization.

Cost optimization is part of AWS’s Well-Architected Framework. When we think about cost optimization, we usually only consider computing resources, while there are significant optimizations beyond that – storage, network, etc.

Below are the six sections that appear in the ebook, with some comments –

Optimize the cost of information infrastructure – the main point in this section is to use Graviton instances where applicable.

Decouple storage data from compute data – 5 suggestions here which are pretty standard –

  1. Compress data when applicable, and use optimal data structures for your task.
  2. Consider data temperature when choosing a data store and storage class – use the suitable S3 storage class and manage it using a lifecycle policy (see the sketch after this list).
  3. Use low-cost compute resources, such as Spot Instances, when applicable – I have some dissonance here since I’m not sure that spot instances are attractive these days (see here), specifically with the overhead of handling preempted instances.
  4. Deploy compute close to data to reduce data transfer costs – trivial.
  5. Use Amazon S3 Select and Amazon S3 Glacier Select to reduce data retrieval – Amazon S3 Select has several limitations (see here), so I’m not sure it is worth the effort; it may be better to query via Athena.
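
As an illustration of point 2 above (a sketch only – the bucket name, prefix, and transition days are placeholders, not recommendations), such a lifecycle policy can be set with boto3 –

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-analytics-bucket",          # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cool-down-raw-events",
                "Filter": {"Prefix": "raw-events/"},   # placeholder prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 730},           # delete after two years
            }
        ]
    },
)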

Plan and provision capacity for predictable workload usage

  1. Choosing the right instance type based on workload pattern and growth – this is common sense. You’ll save a little less if you purchase convertible reserved instances; however, in a fast-changing startup environment, there is a higher chance the commitment won’t end up underutilized.
  2. Deploying rightsizing based on average or median workload usage – this contradicts best practices described in the Cloud FinOps book, so I’m a bit hesitant here.
  3. Using automatic scaling capabilities to meet peak demand – this is the most relevant advice in this section. Use auto-scaling groups or similar to accommodate both performance and cost.

Access capacity on demand for unpredictable workloads

  1. Use Amazon Athena for ad hoc SQL workloads – as mentioned above, I prefer Athena over AWS S3 Select.
  2. Use AWS Glue instead of Amazon EMR for infrequent ETL jobs – I don’t have a strong opinion here, but if you have a data strategy in mind, I would try to adjust to it. Additionally, I feel that other AWS services can be even easier and more cost-effective to work with – for example, Apache Spark in Amazon Athena, Step Functions, etc.
  3. Use on-demand resources for transient workloads or short-term development and testing needs – having said that, you should still keep an eye on your production services, ensure they are utilized correctly and rightsize them if needed.

Avoid data duplication with a centralized storage layer

Implement a central storage layer to share data among tenants – I would shorten it to “have a data strategy” – where you are, where you want to go, etc. – which is not trivial in early startup days.

Leverage up to $100,000 in AWS Activate credits

This might contradict the rest of the document a bit, since it feels like free money and delays your concern about cloud costs.