Learn in Public – week 02

I started this week with deeplearning.ai’s course on semantic caching, created in collaboration with Redis. That sent me down a rabbit hole, exploring different LLM caching strategies and the products that support them.

One such product is AWS Bedrock Prompt Caching. If large parts of your prompts are static (specifically, the prefixes), retokenizing the prefix on every request is a waste of time and money. Prompt or context caching lets you process the prefix once and store it, reducing costs and improving performance.

Sounds great, right? Let’s check the pricing mode. If your requests are more than 5 minutes apart, your cache will be cleared. If your requests are short, caching won’t be activated; if the cache hit rate is low, you will pay an extra, non-usage-based premium for cache writes. I highly recommend reading the “How Much Does Bedrock Prompt Caching Cost?” section in the article “Amazon Bedrock Prompt Caching”.

4 AWS re:Invent announcment to check

AWS re:Invent 2025 took place this week, and as always, dozens of announcements were unveiled. At the macro level, announcing Amazon EC2 Trn3 UltraServers for faster, lower-cost generative AI training can make a significant difference in the market, which is primarily biased towards Nvidia GPUs. At the micro-level, I chose four announcements that I find compelling and relevant for my day-to-day.

AWS Transform custom – AWS Transform enables organizations to automate the modernization of codebases at enterprise scale, including legacy frameworks, outdated runtimes, infrastructure-as-code, and even company-specific code patterns. The custom agent applies those transformation rules defined in documentation, natural language descriptions, or code samples consistently across the organization’s repositories.

Technical debt tends to accumulate quietly, damaging developer productivity and satisfaction. Transform custom wishes to “crush tech debt” and free up developers to focus on innovation instead. For organizations managing many microservices, legacy modules, or long-standing systems, this could dramatically reduce the maintenance burden and risk and increase employees’ satisfaction and retention over time.

https://aws.amazon.com/blogs/aws/introducing-aws-transform-custom-crush-tech-debt-with-ai-powered-code-modernization

Partially complementary, AWS introduced 2 frontier agents in addition to the already existing Kiro agent –

DevOps agent – an on-call / incident-response agent that integrates across monitoring, code repos, and service tickets to detect root causes and coordinate responses.
https://aws.amazon.com/blogs/aws/aws-devops-agent-helps-you-accelerate-incident-response-and-improve-system-reliability-preview/
Security agent – an agent that proactively secures applications throughout the entire development lifecycle by conducting automated design and code reviews, and performing on-demand penetration testing
https://aws.amazon.com/blogs/aws/new-aws-security-agent-secures-applications-proactively-from-design-to-deployment-preview/

AWS Lambda Durable Functions – Durable Functions enable building long-running, stateful, multi-step applications and workflows – directly within the serverless paradigm. Durable functions support a checkpoint-and-replay model: your code can pause (e.g., wait for external events or timeouts) and resume within 1 year without incurring idle compute costs during the pause.

Many real-world use cases, such as approval flows, background jobs, human-in-the-loop automation, and cross-service orchestration, require durable state, retries, and waiting. Previously, these often required dedicated infrastructure or complex orchestration logic. Durable Functions enable teams to build more robust and scalable workflows and reduce overhead.

https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions

AWS S3 Vectors (General Availability) – Amazon S3 Vectors was announced about 6 months ago and is now generally available. This adds native vector storage and querying capabilities to S3 buckets. That is, you can store embedding/vector data at scale, build vector indexes, and run similarity search via S3, without needing a separate vector database. The vectors can be enriched with metadata and integrated with other AWS services for retrieval-augmented generation (RAG) workflows. I think of it as “Athena” for embeddings.

This makes it much easier and cost-effective for teams to integrate AI/ML features – even if they don’t want to manage a dedicated vector DB and reduces the barrier to building AI-ready data backends.

https://aws.amazon.com/blogs/aws/amazon-s3-vectors-now-generally-available-with-increased-scale-and-performance

Amazon SageMaker Serverless Customization – Fine-Tuning Models Without Infrastructure – AWS announced a new capability that accelerates model fine-tuning by eliminating the need for infrastructure management. Teams can upload a dataset and select a base model, and SageMaker handles the fine-tuning pipeline, scaling, and optimization automatically – all in a serverless, pay-per-use model. This customized model can also be deployed using Bedrock for Serverless inference. It is a game-changer, as serving a customized model was previously very expensive. This feature makes fine-tuning accessible to far more teams, especially those without dedicated ML engineers.

https://aws.amazon.com/blogs/aws/new-serverless-customization-in-amazon-sagemaker-ai-accelerates-model-fine-tuning

These are just a handful of the (many) announcements from re:Invent 2025, and they represent a small, opinionated slice of what AWS showcased. Collectively, they highlight a clear trend: Amazon is pushing hard into AI-driven infrastructure and developer automation – while challenging multiple categories of startups in the process.

While Trn3 UltraServers aim to chip away at NVIDIA’s dominance in AI training, the more immediate impact may come from the developer- and workflow-focused releases. Tools like Transform Custom, the new frontier agents, and Durable Functions promise to reduce engineering pain – if they can handle the real, messy complexity of enterprise systems. S3 Vectors and SageMaker Serverless Customization make it far easier to adopt vector search and fine-tuning without adding a new operational burden.

From Demo Hell to Scale: Two Takes on Building Things That Last

I recently came across two blog posts that made me think, especially in light of a sobering statistic I’ve seen floating around: a recent MIT study reports that 95% of enterprise generative AI pilots fail to deliver real business impact or move beyond demo mode.

One post is a conversation with Werner Vogels, Amazon’s long-time CTO, who shares lessons from decades of building and operating systems at internet scale. The other, from Docker, outlines nine rules for making AI proof-of-concepts that don’t die in demo land.

Despite their different starting points, I was surprised by how much the posts resonated with one another. Here’s a short review of where they align and where they differ.

Where They Agree

Solve real problems, not hype – Both warn against chasing the “cool demo.” Docker calls it “Solve Pain, Not Impress”, while Vogels is blunt: “Don’t build for hype.” This advice sounds obvious, but it’s easy to fall into the trap of chasing novelty. Whether you’re pitching to executives or building at AWS scale, both warn that if you’re not anchored in a real customer pain, the project is already off track.
Build with the end in mind – Neither believes in disposable prototypes. Docker advises to design for production from day zero—add observability, guardrails, testing, and think about scale early. Vogels echoes with “What you build, you run”, highlighting that engineers must take ownership of operations, security, and long-term maintainability. Both perspectives converge on the same principle: if you don’t build like it’s going to live in production, it probably never will.
Discipline over speed – Both posts emphasize discipline over blind speed. Docker urges teams to embed cost and risk awareness into PoCs, even tracking unit economics from day one. Vogels stresses that “cost isn’t boring—it’s survival” and frames decision-making around reversibility: move fast when you can reverse course, slow down when you can’t. Different wording, same idea: thoughtful choices early save pain later.

Where They Differ

Scope: the lab vs. the long haul – Docker’s post is tightly focused on how to build POCs in the messy realities of AI prototyping and how to avoid “demo theater” and make something that survives first contact with production. Vogels’ advice is broader, aimed at general engineering, technology leadership, infrastructure, decision-making at scale, and organization-level priorities. Vogels speaks from decades of running Amazon-scale systems, where the horizon is years, not weeks.
Tactics vs. culture – Docker’s advice is concrete and technical: use remocal workflows, benchmark early, add prompt testing to CI/CD. Vogels is less about specific tools and more about culture: engineers owning what they build, organizations learning to move fast on reversible decisions, and leaders setting clarity as a cultural value. Docker tells you what to do. Vogels tells you how to think.
Organizational Context and Scale – Docker speaks to teams fighting to get from zero to one—making PoCs credible beyond the demo stage. Vogels speaks from AWS’s point of view, where the challenge is running infrastructure that millions rely on. Docker’s post is about survival; Vogels is about resilience at scale.

What strikes me about these two perspectives is how perfectly they complement each other. Docker’s advice isn’t really about AI – it’s about escaping demo hell by building prototypes with production DNA from day one. Vogels tackles what happens when you actually succeed: keeping systems reliable when thousands depend on them. They’re describing the same journey from different ends. Set up your prototypes with the right foundations, and you dramatically increase the odds that your product will one day face the kinds of scale and resilience questions Vogels addresses.

AWS has entered the building

AWS has released several notable announcements within the LLM ecosystem over the last few days.

Introducing Amazon S3 Vectors (preview) – Amazon S3 Vectors is a durable, cost-efficient vector storage solution that natively supports large-scale AI-ready data with subsecond query performance, reducing storage and query costs by up to 90%.

Why I find it interesting –

Balancing cost and performance – i.e., storing on a database is more expensive but yields better results. If you know what the “hot vectors” are, you can store them in the database and store the rest in S3.
Designated buckets – it started with table buckets and has now evolved to vector buckets. Interesting direction.

Launch of Kiro – the IDE market is on fire with OpenAI’s acquisition falling apart, Claude code and cursor competition, and now Amazon reveals Kiro with the promise – “helps you do your best work by bringing structure to AI coding with spec-driven development”

Why I find it interesting –

At first, I wondered why AWS entered this field, but I assume it is a must-have these days, and might lead to higher adoption of their models or Amazon Q.
The different IDEs and CLI tools are influenced by each other so it will be interesting to see how a new player influences this space.

Strand agents are now at v1.0.0 – Strand Agents are an AWS open-source SDK that enables building and running AI agents across multiple environments and models, with many pre-built tools that are easy to use.

Why I find it interesting –

The bedrock agents interface was limiting for a production-grade agent, specifically in terms of deployment modes, model support, and observability. Strand agents open many more doors.
There are many agent frameworks out there (probably two more were released while you read this post). Many of them experience different issues when working with AWS Bedrock. If you are using AWS as your primary cloud provider, it should be a leading candidate.

5 interesting things (11/05/2025)

Agents app design pattern – this is a back-to-basics adaptation. How would we read this 14 years from now? Would the ideas he mentioned there be a standard?

https://github.com/humanlayer/12-factor-agents

The original document “12 factors app” is also worth reading (note that it was first published 2011+-) –

https://12factor.net

When the Agents Go Marching In: Five Design Paradigms Reshaping Our Digital Future

This post complements the previous one, covering the same topics. If you are in a hurry, jump to the “The Reinvention of UX: Five Emerging Paradigms” section. I feel that I cope with all those aspects, e.g., building trust, transparency, cognitive load distribution, etc., on a daily basis.

https://medium.com/cyberark-engineering/when-the-agents-go-marching-in-five-design-paradigms-reshaping-our-digital-future-a219009db198

Using Alembic to create and seed a database
Seeding a database is essential for testing, development, and ensuring consistent application behavior across different environments. Alembic is a lightweight database migration tool for Python, designed to work seamlessly with SQLAlchemy.

We use Alembic to manage our database migrations, and I recently needed to seed our database for consistency across environments. I looked for several solutions and eventually used the solution in this post to create a migration that seeds the database –

https://medium.com/@fimd/using-alembic-to-create-and-seed-a-database-8f498638c406

A Field Guide to Rapidly Improving AI Products – while this post focuses on AI products, specifically ones LLM-based, multiple lessons can also be adapted to non-LLM-based AI products and general products. Conducting an error analysis, generating synthetic data (preferably with domain express), and using a data viewer are good starting points.

https://hamel.dev/blog/posts/field-guide/

I Tried Running an MCP Server on AWS Lambda… Here’s What Happened – this post involves two topics I think a lot about these days – MCP and serverless computing. I think it is clear why I think a lot about MCPs. But why do I think about serverless computing? I think of it as a low-cost solution for early-stage startups. Early-stage startups usually have low traffic, which does not justify the cost of having servers up 24/7. On the other hand, the serverless development experience still needs some refinement, and there are services that one would like to host that do not support running in a serverless manner.

https://www.ranthebuilder.cloud/post/mcp-server-on-aws-lambda

5 interesting thing (28/03/2025)

PgAI – LLMs have been part of everyday life already for a while. One aspect I think has not been explored well so far is using them as part of ETL. The implementations I have seen so far don’t take advantage of batch APIs and are not standardized to enable the easy replacement of a model. Having said that, I believe those hurdles will be overcome soon.

https://github.com/timescale/pgai

Related links

Snowflake cortex complete – call llm functions from snowflake https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/1.5.0/api/model/snowflake.cortex.Complete)
OpenAI Postgres extension – allows calling OpenAI or local models using Ollama. https://www.crunchydata.com/blog/accessing-large-language-models-from-postgresql
Do you use LLMs in your ETL pipelines – Reddit discussion regarding using LLMs in ETL pipelines. There are pros and cons of course, specially for cases you need a deterministic answer but there are also other cases –
https://www.reddit.com/r/dataengineering/comments/1h17gjm/do_you_use_llms_in_your_etl_pipelines/

Life Altering PostgreSql Patterns – a back-to-basics post. I agree with most of the points mentioned there, specifically around adding creaetd_at, updated_at, and deleted_at attributes to all tables and saving state data as logs rather than saving only the latest state. I found the section about enum tables interesting. This is the first time I was exposed to this idea, and the ability to add a description or metadata is excellent.

https://mccue.dev/pages/3-11-25-life-altering-postgresql-patterns

Via this post, I learned about the on update cascade option, you can read more about it here – https://medium.com/geoblinktech/postgresql-foreign-keys-with-condition-on-update-cascade-330e1b25b6e5

AI interfaces of the future – I usually don’t share videos, but I think this talk is thought-provoking for several reasons –

Gen UI patterns – an emerging field, the talk reviews several products and highlights good and destructive patterns. Some of the patterns, like suggestions or auto-complete, are transparent to us but are present in many products we know, and that’s something important to notice when you build such a product.
Product review: Knowing what is out there is good for inspiration, ideas, and understanding the competitive landscape. However, new products are coming out every day, and it is hard to track all of them.

Simplify Your Tech Stack: Use PostgreSQL for Everything – Two widespread tensions, especially in startups, are build vs. buy conflicts and using specialized products or technologies (e.g., different databases) that are top of the breed but not many people can use and maintain vs. more common technology that more people can maintain but can have performance drawbacks or other limitations. Mainly working in startups, I usually prefer to use standard technology to run faster, knowing that the product, focus, and priorities often change. With that being said, I acknowledge that early adoption of new technologies can be life-changing for a startup, but figuring out what to bet on is hard.

https://medium.com/timescale/simplify-your-tech-stack-use-postgresql-for-everything-f77c96026595

CDK Monitoring Constructs – if you are using AWS CDK as your IAC tool, CDK monitoring constructs enable you to create cloudwatch alarms and dashboards almost out of the box. I wish they would release and add additional options at a faster pace.

https://pypi.org/project/cdk-monitoring-constructs/

5 interesting things (31/05/2024)

How we built Text-to-SQL at Pinterest – Text-to-SQL and vice versa became one of the canonical examples of LLM, and every product needs one. The post described a very interesting work that can be implemented relatively easily. I relate the most to the closing paragraph, which emphasizes the gap between demos, tutorials, benchmarks, and real-world use cases. – “It would be helpful for applied researchers to produce more realistic benchmarks which include a larger amount of denormalized tables and treat table search as a core part of the problem.”

https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff

(p.s I mentioned post in a recent LinkedIn post – LLMs in the enterprise – looking beyond the hype on what’s possible today)

How an empty S3 bucket can make your AWS bill explode – this story completely blew my mind (and gladly not my account). I was happy to see that AWS is looking into this issue and wondered if in bigger accounts, such anomalies could get unnoticed.

https://medium.com/@maciej.pocwierz/how-an-empty-s3-bucket-can-make-your-aws-bill-explode-934a383cb8b1

The Design Philosophy of Great Tables – great_tables is a Python package for creating wonderful-looking tables. This post shares its visual design philosophy and is worth reading if you create tables even if you will not use this package.

https://posit-dev.github.io/great-tables/blog/design-philosophy/

1-measure-3-1 – a variation of the 1-3-1 problem-solving method for making proposals. I found it specifically effective for engineers as it is structured and focused.

https://www.annashipman.co.uk/jfdi/1-measure-3-1.html

On Making Mistakes — I love it when people combine experience or knowledge in one field or domain with another. For example, someone brings her experience as a soccer player to managing a team, or someone uses lessons he learned as a supermarket cashier to software architecture. This post discusses making mistakes and working through them and refers to several domains, including improv, chess, and F1 team management.

https://read.perspectiveship.com/p/on-making-mistakes

5 interesting things (09/02/2024)

Closing the women’s health gap: A $1 trillion opportunity to improve lives and economies – a McKinsey report that highlights the gender health gap and points to the opportunity – potential for a $1 trillion economic gain with additional societal impact. One interesting point is that there are gaps and flaws throughout the value chain – drug effectiveness, therapy access, research functions, etc. This hints that there are many opportunities out there that can make a significant impact.

https://www.mckinsey.com/mhi/our-insights/closing-the-womens-health-gap-a-1-trillion-dollar-opportunity-to-improve-lives-and-economies

Slashing Data Transfer Costs in AWS by 99% – one of the costs developers often forget or dismiss when considering architecture is the cost of data transfer. The solution described in this post is elegant and demonstrates the effect of deep knowledge and understanding of the domain. Simple to trivial architectural decisions can cost so much.

https://www.bitsand.cloud/posts/slashing-data-transfer-costs

3 questions that will make you a phenomenal rubber duck – I previously mentioned that debugging skills are essential, and it is important to iterate and refine them. I especially liked the 3rd question – “If your hypothesis were wrong, how could we disprove it?” as it forces one to think the other way around and see a slightly bigger picture.

https://blog.danslimmon.com/2024/01/18/3-questions-that-will-make-you-a-phenomenal-rubber-duck

Product Managing to Prevent Burnout – burnout is more common than we think and can have many causes. Moreover, different people would react differently to different cultures and would burn out or not burn out accordingly. The most important takeaway is that managing and controlling burnout is a team sport; it is not only the concern of the direct manager, but product managers can also participate in this effort. (I strongly recommend the honeycomb blog)

https://www.honeycomb.io/blog/product-managing-prevent-burnout

The “errors” that mean you’re doing it right – I was able to identify or witness almost all the errors mentioned in the post. I also think some of those errors, such as Letting someone go soon after hiring, Pivoting a strategy just after creating it, etc, could be attributed to the sunk cost fallacy. And if we want to make the opening sentence more extreme – “If you don’t make mistakes, you’re not working”.

https://longform.asmartbear.com/good-problems-to-have

Few thoughts on Cloud FinOps Book

I just completed “Cloud FinOps” book by J.R. Storment and Mike Fuller, and here are a few thoughts –

At first, I wondered whether I should read the 1st edition, which I had easy access to, or the 2nd, which I had to buy. After reading a sample, I decided to buy the 2nd edition and am glad. This domain and community move quickly; a 2019 version would have been outdated and misleading.
FinOps involves a paradigm shift – developers should consider not only the performance of their architecture (i.e., memory and CPU consumption, speed, etc.) but the cost associated with the resources they will use. Procurement is not done and approved by the finance team anymore. Developers’ decisions can have a significant influence on the cloud bill. FinOps teams bridge the engineering and finance teams (and more) and speak the language of all parties, along with additional skill sets and an overview of the entire organization.
A general rule of thumb regarding commitments –
1. Longer commitment period (3 years → 1 year) = lower price (higher discount)
2. More upfront (full upfront → partial upfront → no upfront )= lower price (higher discount)
3. More specific (RI → Convertible RI → SP, region, etc.) = lower price (higher discount)
The FinOps team should be up-to-date about the new cloud technologies updates and cost reduction options. I have been familiar with reserve and spot instances for a long time, but there are many other cost reduction options bits and bytes to pay attention to. For example, the following 2 points –
1. When purchasing saving plans (SP), which are monetary as appose to resource units commitments, the spend amount you commit to is post discount. Moreover, AWS will apply the SP to the resources that yield the highest discount. This implies that the discount rate diminishes when committing to more money.
2. CloudFront security savings bundle (here) is a saving plan that ties together the usage of CloudFront and WAF. The book predicts that such plans, e.g., combining multiple product usage, will become common soon.
Commitments (e.g., SP, RI) are one of many ways to reduce costs. Removing idle resources (e.g., unattached drives), using correct storage classes (e.g., infrequent access, glacier), or making architecture changes (e.g., rightsizing, moving from servers to serverless, going via VPC endpoints, etc.) can help avoid and reduce cost. Those activities can happen in parallel – centralized FinOps team to manage commitments (aka cost reduction) and decentralized engineering teams optimize the resources they use (aka cost avoidance). Ideally, it is a tango. Each team moves a little step at a time to optimize their part.
The FinOps domain-specific knowledge goes even further. For example, costs that engineers tend to miss or wrongly estimate e.g. network traffic cost, number of events, data storage events.
The inform phase is part of the FinOps lifecycle – making the data available to the relevant participants. The Prius effect, i.e., real-time feedback, instantly influences behavior even without explicit recommendations or guidance. Visualizations (done right) can help understand and react to the data better. A point emphasized multiple times in the book – put the data in the path of the engineers or any other stakeholder. Don’t ask them to log in to a different system to review the data; integrate with existing systems they use regularly.

Few resources I find helpful –

FinOps foundation website – includes many resources and community knowledge – https://www.finops.org/introduction/what-is-finops/

FinOps podcast – https://www.finops.org/community/finops-podcast/
Infracost lets engineers see a cost breakdown and understand costs before making changes in the terminal, VS Code, or pull requests. https://www.infracost.io/
Cloud Custodian – “Cloud Custodian is a tool that unifies the dozens of tools and scripts most organizations use for managing their public cloud accounts into one open source tool” – https://cloudcustodian.io/
FinOut – A Holistic Cost Management Solution For Your Cloud. I recently participated in a demo and that looks super interesting. https://www.finout.io/
Startup guide to data cost optimization – my post summarizing AWS’s ebook about data cost optimization for startups – https://tomron.net/2023/06/01/startup-guide-to-data-cost-optimization-summary/
Twitter thread I wrote in Hebrew about the book – https://twitter.com/tomron696/status/1657686198327062529

	Nicole S on 5 Python NLP pacakges
	blissful4bdd2399fa on CSV to radar plot
	tom on CSV to radar plot
	Matt on CSV to radar plot
	“ – Tom… on 📚 Book club Q1 2024 – 3…