LLM Debt: The Double-Edged Sword of AI Integration

Have you noticed how half the posts on LinkedIn these days feel like they were written by an LLM – too many words for too little substance? Or how product roadmaps suddenly include “AI features” that nobody asked for, just because it sounds good in a pitch deck? Or those meetings where someone suggests “let’s use GPT for this,” when a simple SQL query, an if-statement, or a much simpler ML model would do the job?

Laurence Tratt recently coined the term LLM inflation to describe how humans use LLMs to expand simple ideas into verbose prose, only for others to shrink them back down. That concept got me thinking about a related phenomenon: LLM debt.

LLM debt is the growing cost of misusing LLMs — by adding them where they don’t belong and neglecting them where they could help.

We’re all familiar with technical debt, product debt, and design debt¹ — the shortcuts or missed opportunities that slow us down over time. Similarly, organizations are quietly accumulating LLM debt.

So, what does LLM debt look like in practice? It’s a double-edged liability:

  • Overuse: Integrating LLMs where they’re unnecessary adds latency, complexity, cost, and stochasticity to systems that could be simpler, faster, and more reliable without them. For example, sending every API request through a multibillion-parameter model when a simple regex or deterministic logic would suffice.
  • Underuse: Failing to adopt LLM-based tools where they could genuinely help results in wasted effort and missed opportunities. Think of teams manually triaging support tickets, writing repetitive documentation, or analyzing text data by hand when an LLM could automate much of the work.
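To make the overuse point concrete, here is a minimal sketch of routing requests through deterministic logic first and reserving the model for cases no simple rule covers. The `call_llm` function is a hypothetical stand-in for a real model call, not an actual API:

```python
import re

# Deterministic path: fast, free, and reproducible.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")


def call_llm(prompt: str) -> str:
    # Placeholder for an actual LLM request (slow, costly, stochastic).
    return "llm-answer"


def is_valid_email(text: str) -> bool:
    return bool(EMAIL_RE.match(text))


def answer(question: str) -> str:
    # Fall back to the model only when no simple rule applies.
    if question.startswith("validate:"):
        return str(is_valid_email(question.removeprefix("validate:")))
    return call_llm(question)
```

The structural point is the dispatch: the cheap, deterministic branch handles the well-defined cases, and only the remainder pays the model's latency and cost.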

Like product or technical debt, a small amount of LLM debt can be strategic: it allows experimentation, faster prototyping, or proof-of-concept development. Left unmanaged, however, it compounds, creating systems that are over-engineered in some areas and under-leveraged in others, which slows product evolution and innovation. Like other types of debt, it should be owned and managed.

LLMs are powerful, but they come with costs. Just as we track and manage technical debt, we need to recognize, measure, and pay down our LLM debt. That means asking tough questions before adding LLMs to the stack, and also being bold enough to leverage them where they could provide real value.

If LLM inflation showed us how words can expand and collapse in unhelpful cycles, LLM debt shows us how our systems can quietly accumulate inefficiencies that slow us down. Recognizing it early is the key to keeping our products lean, intelligent, and future-ready.

  1. I previously wrote about some of those topics here and talked about them in Hebrew here

Things I recently built with my daughter

My daughter just finished first grade and is about to turn seven. One day, she noticed me playing a math puzzle on a news site and asked if she could join. I told her that when an easy one came along, I’d be happy to solve it with her.

A few days passed. She asked again.

I didn’t want to say no, so I offered to sketch a similar game on paper. But designing a good puzzle turned out to be more challenging than expected. I scribbled a lightweight version instead—and that sparked an idea. I asked if she’d like to build a game together.

1. Math Grid

🔗 Play Math Grid

This was our first game. It’s a visual puzzle: players are given a grid with some numeric constraints on rows and columns, and the goal is to fill it in correctly. Inspired by logic puzzles like Sudoku and Kakuro, the constraints are simplified for early elementary-level arithmetic, so it stays approachable while still encouraging reasoning.

2. Number Detector

🔗 Play Number Detector

Later, while solving exercises from her school workbook, we came across number riddles. This inspired our second app. In Number Detector, the player is given partial clues (like a number’s sum of digits or a multiple constraint) and has to figure out the correct number from a limited set.

We added randomness and regeneration to keep things fresh, making the practice repeatable and varied.
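The core mechanic can be sketched in a few lines: filter a candidate range by each clue until the answer emerges. This is an illustrative reconstruction, not the app's actual code:

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    return sum(int(d) for d in str(n))


def solve(candidates, *, digit_sum_is=None, multiple_of=None):
    """Return the candidates consistent with every provided clue."""
    result = list(candidates)
    if digit_sum_is is not None:
        result = [n for n in result if digit_sum(n) == digit_sum_is]
    if multiple_of is not None:
        result = [n for n in result if n % multiple_of == 0]
    return result
```

For example, "a two-digit number whose digits sum to 9 and which is a multiple of 5" narrows the field to just 45 and 90, which is also a handy way to check that a hand-written riddle has exactly one answer.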

3. Triangle Tactics

🔗 Play Triangle Tactics

A month later, while playing a pen-and-paper strategy game, she casually said: “Let’s make it into an app—like we did in the old days.” 🙂

Her request blew my mind.

We built Triangle Tactics, a simple turn-based game where players connect dots to form triangles, and the player who forms the most triangles wins. What surprised me most was the technical challenge: not the game logic, but rendering the dots and lines correctly. It was solved only after switching from the default LLM to a different one.


Reflections

These experiments turned into more than just games. They became opportunities to:

  • Understand the importance of English: We worked in English—despite it not being our native language—and talked about why fluency matters, especially when working with tools, prompts, or documentation.
  • Explore computers as collaborators: “What do you mean it misunderstood the prompt?”, “How should we ask to get the result we want?”
  • Practice QA thinking: “Why isn’t this button working?”, “Is it behaving the way we expected?”
  • Enjoy quality time: A fun excuse to step outside our usual routines and build something together.

AWS has entered the building

AWS has released several notable announcements within the LLM ecosystem over the last few days.

Introducing Amazon S3 Vectors (preview) – Amazon S3 Vectors is a durable, cost-efficient vector storage solution that natively supports large-scale AI-ready data with subsecond query performance, reducing storage and query costs by up to 90%.

Why I find it interesting –

  1. Balancing cost and performance – storing vectors in a dedicated database is more expensive but yields better query performance. If you know which vectors are “hot,” you can keep them in the database and store the rest in S3.
  2. Designated buckets – it started with table buckets and has now evolved to vector buckets. Interesting direction.

Launch of Kiro – the IDE market is on fire, with OpenAI’s acquisition falling apart, the Claude Code and Cursor competition, and now Amazon revealing Kiro with the promise that it “helps you do your best work by bringing structure to AI coding with spec-driven development”.

Why I find it interesting –

  1. At first, I wondered why AWS entered this field, but I assume it is a must-have these days, and might lead to higher adoption of their models or Amazon Q.
  2. The different IDEs and CLI tools are influenced by each other, so it will be interesting to see how a new player influences this space.

Strands Agents is now at v1.0.0 – Strands Agents is an AWS open-source SDK that enables building and running AI agents across multiple environments and models, with many easy-to-use pre-built tools.

Why I find it interesting –

  1. The Bedrock Agents interface was limiting for a production-grade agent, specifically in terms of deployment modes, model support, and observability. Strands Agents opens many more doors.
  2. There are many agent frameworks out there (probably two more were released while you read this post). Many of them experience different issues when working with AWS Bedrock. If you are using AWS as your primary cloud provider, it should be a leading candidate.

5 interesting things (11/05/2025)

Agents app design pattern – this is a back-to-basics adaptation. How will we read this 14 years from now? Will the ideas mentioned there have become a standard?

https://github.com/humanlayer/12-factor-agents

The original document, “The Twelve-Factor App,” is also worth reading (note that it was first published around 2011) –

https://12factor.net

When the Agents Go Marching In: Five Design Paradigms Reshaping Our Digital Future

This post complements the previous one, covering the same topics. If you are in a hurry, jump to the “The Reinvention of UX: Five Emerging Paradigms” section. I feel that I deal with all those aspects, e.g., building trust, transparency, and cognitive load distribution, on a daily basis.

https://medium.com/cyberark-engineering/when-the-agents-go-marching-in-five-design-paradigms-reshaping-our-digital-future-a219009db198

Using Alembic to create and seed a database

Seeding a database is essential for testing, development, and ensuring consistent application behavior across different environments. Alembic is a lightweight database migration tool for Python, designed to work seamlessly with SQLAlchemy.

We use Alembic to manage our database migrations, and I recently needed to seed our database for consistency across environments. I looked for several solutions and eventually used the solution in this post to create a migration that seeds the database –

https://medium.com/@fimd/using-alembic-to-create-and-seed-a-database-8f498638c406
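For reference, a minimal sketch of what such a seed migration might look like. The revision identifiers, table, and rows below are made up for illustration and are not taken from the linked post:

```python
"""A hypothetical Alembic migration that seeds a roles table."""
import sqlalchemy as sa
from alembic import op

# Alembic-generated identifiers (placeholder values).
revision = "0002_seed_roles"
down_revision = "0001_create_roles"
branch_labels = None
depends_on = None

# Lightweight table definition used only inside this migration.
roles = sa.table(
    "roles",
    sa.column("id", sa.Integer),
    sa.column("name", sa.String),
)


def upgrade() -> None:
    # bulk_insert writes the seed rows as part of the migration,
    # so every environment gets the same baseline data.
    op.bulk_insert(roles, [
        {"id": 1, "name": "admin"},
        {"id": 2, "name": "member"},
    ])


def downgrade() -> None:
    # Remove only the seeded rows so the migration stays reversible.
    op.execute(sa.delete(roles).where(roles.c.id.in_([1, 2])))
```

Keeping the seed data in a migration (rather than an ad-hoc script) means it is versioned, ordered relative to schema changes, and applied identically everywhere.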

A Field Guide to Rapidly Improving AI Products – while this post focuses on AI products, specifically LLM-based ones, multiple lessons can also be adapted to non-LLM-based AI products and general products. Conducting an error analysis, generating synthetic data (preferably with domain experts), and using a data viewer are good starting points.

https://hamel.dev/blog/posts/field-guide/

I Tried Running an MCP Server on AWS Lambda… Here’s What Happened – this post involves two topics I think a lot about these days – MCP and serverless computing. I think it is clear why I think a lot about MCPs. But why do I think about serverless computing? I think of it as a low-cost solution for early-stage startups. Early-stage startups usually have low traffic, which does not justify the cost of having servers up 24/7. On the other hand, the serverless development experience still needs some refinement, and there are services that one would like to host that do not support running in a serverless manner.

https://www.ranthebuilder.cloud/post/mcp-server-on-aws-lambda

5 interesting things (28/03/2025)

PgAI – LLMs have been part of everyday life for a while now. One aspect I think has not been explored well so far is using them as part of ETL. The implementations I have seen so far don’t take advantage of batch APIs and are not standardized to enable the easy replacement of a model. Having said that, I believe those hurdles will be overcome soon.

https://github.com/timescale/pgai

Related links

Life Altering PostgreSQL Patterns – a back-to-basics post. I agree with most of the points mentioned there, specifically around adding created_at, updated_at, and deleted_at attributes to all tables and saving state data as logs rather than saving only the latest state. I found the section about enum tables interesting; it was the first time I was exposed to this idea, and the ability to add a description or metadata is excellent.

https://mccue.dev/pages/3-11-25-life-altering-postgresql-patterns

Via this post, I learned about the “on update cascade” option; you can read more about it here – https://medium.com/geoblinktech/postgresql-foreign-keys-with-condition-on-update-cascade-330e1b25b6e5
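A small sqlite3 sketch (illustrative only; the post itself is about PostgreSQL) of the timestamp and soft-delete pattern described above: every table carries created_at, updated_at, and deleted_at, and “deleting” a row means setting deleted_at rather than removing the row:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        created_at TEXT NOT NULL,
        updated_at TEXT NOT NULL,
        deleted_at TEXT
    )
""")


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


def add_user(name: str) -> int:
    ts = now()
    cur = conn.execute(
        "INSERT INTO users (name, created_at, updated_at) VALUES (?, ?, ?)",
        (name, ts, ts),
    )
    return cur.lastrowid


def soft_delete(user_id: int) -> None:
    # Soft delete: mark the row instead of removing it, preserving history.
    conn.execute(
        "UPDATE users SET deleted_at = ?, updated_at = ? WHERE id = ?",
        (now(), now(), user_id),
    )


def active_users() -> list[str]:
    rows = conn.execute("SELECT name FROM users WHERE deleted_at IS NULL")
    return [r[0] for r in rows]
```

The payoff is that queries filter on `deleted_at IS NULL` while the full history of every row survives for audits and debugging.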

AI interfaces of the future – I usually don’t share videos, but I think this talk is thought-provoking for several reasons –

  • Gen UI patterns – an emerging field, the talk reviews several products and highlights good and destructive patterns. Some of the patterns, like suggestions or auto-complete, are transparent to us but are present in many products we know, and that’s something important to notice when you build such a product.
  • Product review: Knowing what is out there is good for inspiration, ideas, and understanding the competitive landscape. However, new products are coming out every day, and it is hard to track all of them.

Simplify Your Tech Stack: Use PostgreSQL for Everything – two widespread tensions, especially in startups, are build-vs-buy conflicts and the choice between specialized, best-of-breed products or technologies (e.g., different databases) that few people can use and maintain, and more common technologies that more people can maintain but may have performance drawbacks or other limitations. Working mainly in startups, I usually prefer standard technology so we can run faster, knowing that the product, focus, and priorities often change. That said, I acknowledge that early adoption of new technologies can be life-changing for a startup, but figuring out what to bet on is hard.

https://medium.com/timescale/simplify-your-tech-stack-use-postgresql-for-everything-f77c96026595

CDK Monitoring Constructs – if you are using AWS CDK as your IaC tool, CDK Monitoring Constructs enables you to create CloudWatch alarms and dashboards almost out of the box. I wish they would release additional options at a faster pace.

https://pypi.org/project/cdk-monitoring-constructs/

Better Plotly Bar Chart

I’m reading “Storytelling with Data” by Cole Nussbaumer Knaflic. I plan to write up my thoughts and insights from the book once I finish it. For now, I wanted to play with its ideas a bit and create a better bar chart visualization that –

  1. Highlights the category you find most important by assigning it a special color (i.e., prominent_color in the code), while the remaining categories use the same color (i.e., latent_color).
  2. Removes grids and makes the background and paper colors the same to reduce cognitive load.

The implementation is flexible, so if you feel like changing one of the settings (e.g., showing the grid lines or centering the title), you can pass it via keyword arguments when calling the function.


from typing import Any

import pandas as pd
import plotly.graph_objects as go


def barchart(
    df: pd.DataFrame,
    x_col: str,
    y_col: str,
    title: str | None = None,
    latent_color: str = 'gray',
    prominent_color: str = 'orange',
    prominent_value: Any | None = None,
    **kwargs: Any,
) -> go.Figure:
    """Create a bar chart that highlights a single category.

    Args:
        df (pd.DataFrame): Dataframe to plot.
        x_col (str): Name of the x column.
        y_col (str): Name of the y column.
        title (str | None, optional): Chart title. Defaults to None.
        latent_color (str, optional): Color for the values we don't want to highlight. Defaults to 'gray'.
        prominent_color (str, optional): Color for the value we want to highlight. Defaults to 'orange'.
        prominent_value (Any | None, optional): Value of the category to highlight. Defaults to None.

    Returns:
        go.Figure: Plotly figure object.
    """
    # Color the highlighted category differently from all the others.
    colors = [
        prominent_color if value == prominent_value else latent_color
        for value in df[x_col]
    ]
    fig = go.Figure(
        data=[go.Bar(x=df[x_col], y=df[y_col], marker_color=colors)],
        layout=go.Layout(
            title=title,
            xaxis=dict(title=x_col, showgrid=False),
            yaxis=dict(title=y_col, showgrid=False),
            # Matching plot and paper backgrounds removes visual noise.
            plot_bgcolor='white',
            paper_bgcolor='white',
        ),
    )
    # Any extra keyword arguments override the defaults above.
    fig.update_layout(**kwargs)
    return fig


if __name__ == "__main__":
    data = {'categories': ['A', 'B', 'C', 'D', 'E'],
            'values': [23, 45, 56, 78, 90]}
    df = pd.DataFrame(data)
    fig = barchart(df, 'categories', 'values', prominent_value='C', title='My Chart', yaxis_showgrid=True)
    fig.show()

📚 Book club Q2 2024 – 3 Reading recommendations

This quarter, I read 3 books that relate to France and the resistance during the Second World War –

  • Code Name Hélène by Ariel Lawhon. This thriller is based on Nancy Wake’s story. It also reminded me of “Agent Sonya” by Ben Macintyre, which I read a while ago and enjoyed.
  • The Paris Architect by Charles Belfoure – a touching story with lots of imagination, but I was a bit disappointed it was not based on a true story.
  • The Dressmaker’s Secret by Rosalie Ham – this book tells the story of Coco Chanel during WWII through the prism of her personal assistant, or more accurately, of the granddaughter of her personal assistant, who unpacks the story. I didn’t know about her relationship with the Nazis, and it was interesting to learn about it.

I also listened to “Setting the Table” by Danny Meyer, and it was very insightful; it even made me publish a LinkedIn post.

5 interesting things (31/05/2024)

How we built Text-to-SQL at Pinterest – Text-to-SQL (and vice versa) has become one of the canonical examples of LLM use, and every product needs one. The post describes very interesting work that can be implemented relatively easily. I relate most to the closing paragraph, which emphasizes the gap between demos, tutorials, benchmarks, and real-world use cases – “It would be helpful for applied researchers to produce more realistic benchmarks which include a larger amount of denormalized tables and treat table search as a core part of the problem.”

https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff

(P.S. I mentioned this post in a recent LinkedIn post – LLMs in the enterprise – looking beyond the hype on what’s possible today)

How an empty S3 bucket can make your AWS bill explode – this story completely blew my mind (and thankfully not my account). I was happy to see that AWS is looking into this issue, and I wondered whether such anomalies could go unnoticed in bigger accounts.

https://medium.com/@maciej.pocwierz/how-an-empty-s3-bucket-can-make-your-aws-bill-explode-934a383cb8b1

The Design Philosophy of Great Tables – great_tables is a Python package for creating wonderful-looking tables. This post shares its visual design philosophy and is worth reading if you create tables, even if you will not use this package.

https://posit-dev.github.io/great-tables/blog/design-philosophy/

1-measure-3-1 – a variation of the 1-3-1 problem-solving method for making proposals. I found it particularly effective for engineers, as it is structured and focused.

https://www.annashipman.co.uk/jfdi/1-measure-3-1.html

On Making Mistakes — I love it when people combine experience or knowledge from one field or domain with another. For example, someone brings her experience as a soccer player to managing a team, or someone applies lessons he learned as a supermarket cashier to software architecture. This post discusses making mistakes and working through them, and refers to several domains, including improv, chess, and F1 team management.

https://read.perspectiveship.com/p/on-making-mistakes

📚 Book club Q1 2024 – 3 Reading recommendations

This year, I am tracking my book reading for the first time. I don’t know if I’ll keep doing it after this year, or even last until the end of the year, but for now, I’m all in. It helps me reflect on my reading and remember the books I enjoyed.

Fundable: Why Some Entrepreneurs Get Funded, And Others Do Not! by Sephi Shapira

This was the best value for my time in a long while. The book is filled with concrete, practical advice, which I immediately found myself using.

Beartown by Fredrik Backman –

That’s the perfect book for me—it has lots of sports and a human story, some gender tension, and two more books in this series (Us Against You and Winners). 

STFU: The Power of Keeping Your Mouth Shut in an Endlessly Noisy World by Dan Lyons

For long parts of this book, it felt like a paradox to me – why write 200+ pages to say we should all shut up? Just do it.

I listened to this book after also listening to Disrupted by Lyons. I like his style and narration. When listening to Disrupted, I thought that it probably damaged his employability, which turned out to be true, as he discussed in STFU.

The book can sometimes be extreme or refer to an extreme crowd (50 out of 50 in the Talkaholics test). However, I liked it because it discusses many aspects of our lives – friends and family, work, etc. – and offers a few things I could immediately adopt. For example, I lowered my cell phone usage near my kids and hope to stick with it.

In one of the last chapters, he mentions Never Split the Difference by Chris Voss and Tahl Raz, which I also recently listened to. I listened to it after a friend told me that she was starting to reread the book, and after finishing it, I completely understood why she wanted to revisit it.

I read the book to improve my negotiation skills. I’m not sure there is an immediate effect, but as the two books pointed out – I talk less and actively listen more. So I try to pause before I answer, be succinct and hum, and let the other person talk.

5 interesting things (08/03/2024)

(Almost) Every infrastructure decision I endorse or regret after 4 years running infrastructure at a startup – in my current role as a CTO of an early-stage startup, I make many choices about tools, programming languages, architecture, vendors, etc. This retrospective view was fascinating not only for the tools themselves but also for the arguments.

https://cep.dev/posts/every-infrastructure-decision-i-endorse-or-regret-after-4-years-running-infrastructure-at-a-startup/

Everything You Can Do with Python’s textwrap Module – I have used Python for more than 10 years and never heard of the textwrap module. Maybe you, too, haven’t heard of it.

https://towardsdatascience.com/everything-you-can-do-with-pythons-textwrap-module-0d82c377a4c8
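A few of its helpers in action, all from the standard library:

```python
import textwrap

paragraph = ("Python's textwrap module reformats plain text: it can wrap, "
             "fill, indent, dedent, and shorten strings.")

# wrap() returns a list of lines, fill() joins them with newlines.
wrapped = textwrap.wrap(paragraph, width=40)
filled = textwrap.fill(paragraph, width=40)

# shorten() truncates at a word boundary and appends a placeholder.
short = textwrap.shorten(paragraph, width=40, placeholder="...")

# indent() prefixes every line, handy for quoting or nesting output.
indented = textwrap.indent("line one\nline two", "> ")
```

These cover most of the everyday cases: wrapping log or CLI output, truncating previews, and quoting multi-line strings.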

It was never about LLM performance – I couldn’t agree more. The performance gaps between different LLMs are becoming negligible. Now, it is about the experience you build using those models and the guardrails you put in place to ensure that experience.

https://read.technically.dev/p/it-was-never-about-llm-performance

How to build an enterprise LLM application: Lessons from GitHub Copilot – the post ends with a summary of 3 key takeaways – 

  • Identify a focused problem and thoughtfully discern an AI’s use cases.
  • Integrate experimentation and tight feedback loops into the design process.
  • As you scale, continue to leverage user feedback and prioritize user needs.

Those takeaways are general and correct for almost every product launch I can think of. The post provides more concrete tips for LLM applications. It is interesting to read about a product at such a scale that I use on a daily basis.

https://github.blog/2023-09-06-how-to-build-an-enterprise-llm-application-lessons-from-github-copilot/

Speaking for Hackers – public speaking is hard, from choosing a topic and submitting a CFP to preparing your talk and slides and wrapping it all up. Every step can be tricky, and each of us finds different steps harder. This site provides excellent materials for all the parts before, during, and after the talk, making it easier to step out of our shells and share the knowledge.

https://sfhbook.netlify.app/