5 interesting things (28/03/2025)

PgAI – LLMs have been part of everyday life already for a while. One aspect I think has not been explored well so far is using them as part of ETL. The implementations I have seen so far don’t take advantage of batch APIs and are not standardized to enable the easy replacement of a model. Having said that, I believe those hurdles will be overcome soon.

https://github.com/timescale/pgai

Related links

Life Altering PostgreSql Patterns – a back-to-basics post. I agree with most of the points mentioned there, specifically around adding created_at, updated_at, and deleted_at attributes to all tables and saving state data as logs rather than saving only the latest state. I found the section about enum tables interesting. This is the first time I was exposed to this idea, and the ability to add a description or metadata is excellent.

https://mccue.dev/pages/3-11-25-life-altering-postgresql-patterns

Via this post, I learned about the on update cascade option. You can read more about it here – https://medium.com/geoblinktech/postgresql-foreign-keys-with-condition-on-update-cascade-330e1b25b6e5

AI interfaces of the future – I usually don’t share videos, but I think this talk is thought-provoking for several reasons –

  • Gen UI patterns – an emerging field, the talk reviews several products and highlights good and destructive patterns. Some of the patterns, like suggestions or auto-complete, are transparent to us but are present in many products we know, and that’s something important to notice when you build such a product.
  • Product review: Knowing what is out there is good for inspiration, ideas, and understanding the competitive landscape. However, new products are coming out every day, and it is hard to track all of them.

Simplify Your Tech Stack: Use PostgreSQL for Everything – Two widespread tensions, especially in startups, are build vs. buy and specialized vs. common technology: best-of-breed products (e.g., different databases) that few people can use and maintain vs. more common technology that more people can maintain but that can have performance drawbacks or other limitations. Having worked mainly in startups, I usually prefer standard technology to move faster, knowing that the product, focus, and priorities often change. With that being said, I acknowledge that early adoption of new technologies can be life-changing for a startup, but figuring out what to bet on is hard.

https://medium.com/timescale/simplify-your-tech-stack-use-postgresql-for-everything-f77c96026595

CDK Monitoring Constructs – if you are using AWS CDK as your IaC tool, CDK Monitoring Constructs enable you to create CloudWatch alarms and dashboards almost out of the box. I wish they would release and add additional options at a faster pace.

https://pypi.org/project/cdk-monitoring-constructs/

Better Plotly Bar Chart

I’m reading “Storytelling with Data” by Cole Nussbaumer Knaflic. I plan to write my thoughts and insights from the book once I finish it. For now, I wanted to play with it a bit and create a better bar chart visualization that –

  1. Highlights the category you find most important and assigns a special color to it (i.e., prominent_color in the code), while the remaining categories use the same color (i.e., latent_color).
  2. Removes grids and makes the background and paper colors the same to reduce cognitive load.

The implementation is flexible, so if you feel like changing one of the settings (e.g., showing the grid lines or centering the title), you can pass it via keyword arguments when calling the function.


from typing import Any

import pandas as pd
import plotly.graph_objects as go


def barchart(
    df: pd.DataFrame,
    x_col: str,
    y_col: str,
    title: str | None = None,
    latent_color: str = 'gray',
    prominent_color: str = 'orange',
    prominent_value: Any | None = None,
    **kwargs: Any,
) -> go.Figure:
    """Plot a bar chart that highlights a single category.

    Args:
        df (pd.DataFrame): Dataframe to plot
        x_col (str): Name of x column
        y_col (str): Name of y column
        title (str | None, optional): Chart title. Defaults to None.
        latent_color (str, optional): Color to use for the values we don't want to highlight. Defaults to 'gray'.
        prominent_color (str, optional): Color to use for the value we want to highlight. Defaults to 'orange'.
        prominent_value (Any | None, optional): Value of the category we want to highlight. Defaults to None.
        **kwargs: Extra layout settings forwarded to fig.update_layout.

    Returns:
        go.Figure: Plotly figure object
    """
    # One color per bar: the highlighted category gets prominent_color,
    # every other category gets latent_color.
    colors = [
        prominent_color if value == prominent_value else latent_color
        for value in df[x_col]
    ]
    fig = go.Figure(
        data=[go.Bar(x=df[x_col], y=df[y_col], marker_color=colors)],
        layout=go.Layout(
            title=title,
            xaxis=dict(title=x_col, showgrid=False),
            yaxis=dict(title=y_col, showgrid=False),
            plot_bgcolor='white',
            paper_bgcolor='white',
        ),
    )
    # Let the caller override any of the defaults set above.
    fig.update_layout(**kwargs)
    return fig


if __name__ == "__main__":
    data = {'categories': ['A', 'B', 'C', 'D', 'E'],
            'values': [23, 45, 56, 78, 90]}
    df = pd.DataFrame(data)
    fig = barchart(df, 'categories', 'values', prominent_value='C', title='My Chart', yaxis_showgrid=True)
    fig.show()

📚 Book club Q2 2024 – 3 Reading recommendations

This quarter, I read 3 books that relate to France and the resistance during the Second World War –

  • Code Name Hélène by Ariel Lawhon. This thriller is based on Nancy Wake’s story. It also reminded me of “Agent Sonya” by Ben Macintyre, which I read a while ago and enjoyed.
  • The Paris Architect by Charles Belfoure – a touching story with lots of imagination, but I was a bit disappointed it was not based on a real story.
  • The Dressmaker’s Secret by Rosalie Ham – this book tells the story of Coco Chanel during WWII from the prism of her personal assistant or more accurately from the granddaughter of her personal assistant who unpacks the story. I didn’t know about her relationships with the Nazis, and it was interesting to learn about them.

I also listened to “Setting the Table” by Danny Meyer. It was very insightful and even made me publish a LinkedIn post.

5 interesting things (31/05/2024)

How we built Text-to-SQL at Pinterest – Text-to-SQL and vice versa became one of the canonical examples of LLM applications, and every product needs one. The post describes very interesting work that can be implemented relatively easily. I relate the most to the closing paragraph, which emphasizes the gap between demos, tutorials, and benchmarks on the one hand and real-world use cases on the other – “It would be helpful for applied researchers to produce more realistic benchmarks which include a larger amount of denormalized tables and treat table search as a core part of the problem.”

https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff

(P.S. I mentioned this post in a recent LinkedIn post – LLMs in the enterprise – looking beyond the hype on what’s possible today)

How an empty S3 bucket can make your AWS bill explode – this story completely blew my mind (and luckily not my account). I was happy to see that AWS is looking into this issue and wondered if, in bigger accounts, such anomalies could go unnoticed.

https://medium.com/@maciej.pocwierz/how-an-empty-s3-bucket-can-make-your-aws-bill-explode-934a383cb8b1

The Design Philosophy of Great Tables –  great_tables is a Python package for creating wonderful-looking tables. This post shares its visual design philosophy and is worth reading if you create tables even if you will not use this package.

https://posit-dev.github.io/great-tables/blog/design-philosophy/

1-measure-3-1 – a variation of the 1-3-1 problem-solving method for making proposals. I found it specifically effective for engineers as it is structured and focused.

https://www.annashipman.co.uk/jfdi/1-measure-3-1.html

On Making Mistakes — I love it when people combine experience or knowledge in one field or domain with another. For example, someone brings her experience as a soccer player to managing a team, or someone uses lessons he learned as a supermarket cashier to software architecture. This post discusses making mistakes and working through them and refers to several domains, including improv, chess, and F1 team management.

https://read.perspectiveship.com/p/on-making-mistakes

📚 Book club Q1 2024 – 3 Reading recommendations

This year, I am tracking my book reading for the first time. I don’t know if I’ll keep doing it after this year or if I’ll last until the end of the year, but for now, I’m all in. This helps me reflect on my reading and remember the books I enjoyed.

Fundable: Why Some Entrepreneurs Get Funded, And Others Do Not! by Sephi Shapira

This is the best value for my time in a long time. The book is filled with concrete and practical advice, which I immediately found myself using.

Beartown by Fredrik Backman –

That’s the perfect book for me—it has lots of sports and a human story, some gender tension, and two more books in this series (Us Against You and Winners). 

STFU: The Power of Keeping Your Mouth Shut in an Endlessly Noisy World by Dan Lyons

For long parts of this book, it was a paradox to me – why write 200+ pages to say we should all shut up – just do it.

I listened to this book after also listening to Disrupted by Lyons. I like his style and narration. When listening to Disrupted, I thought that it probably damaged his employability, which turned out to be true, as he discussed in STFU.

The book can sometimes be extreme or refer to an extreme crowd (50 out of 50 in the Talkaholics test). However, I liked the book because it discusses many aspects of our lives – friends and family, work, etc. – and a few things I can immediately adopt. For example, I lowered my cell phone usage near my kids and hope to stick with it.

In one of the last chapters, he mentions Never Split the Difference by Chris Voss and Tahl Raz, which I also recently listened to. I listened to Never Split the Difference after a friend told me that she was starting to reread this book, and after listening to it, I completely understood why she wanted to reread it.

I read the book to improve my negotiation skills. I’m not sure there is an immediate effect, but as the two books pointed out – I talk less and actively listen more. So I try to pause before I answer, be succinct and hum, and let the other person talk.

5 interesting things (08/03/2024)

(Almost) Every infrastructure decision I endorse or regret after 4 years running infrastructure at a startup – in my current role as a CTO of an early-stage startup, I make many choices about tools, programming languages, architecture, vendors, etc. This retrospective view was fascinating not only for the tools themselves but also for the arguments.

https://cep.dev/posts/every-infrastructure-decision-i-endorse-or-regret-after-4-years-running-infrastructure-at-a-startup/

Everything You Can Do with Python’s textwrap Module – I have used Python for more than 10 years and had never heard of the textwrap module. Maybe you, too, haven’t heard of it.

https://towardsdatascience.com/everything-you-can-do-with-pythons-textwrap-module-0d82c377a4c8
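
For a quick taste, here is a minimal sketch of a few textwrap functions from the standard library. The sample text and widths are my own and are not taken from the linked post.

```python
import textwrap

# Sample text invented for this sketch.
text = "The textwrap module wraps and shortens plain text for display in terminals and logs."

# Wrap into lines of at most 40 characters, joined with newlines.
wrapped = textwrap.fill(text, width=40)

# Collapse whitespace and truncate to at most 30 characters, ending with the placeholder.
short = textwrap.shorten(text, width=30, placeholder="...")

# Remove the common leading whitespace from an indented multi-line string.
block = textwrap.dedent("""\
    SELECT *
    FROM users
""")

# Prefix every non-empty line, e.g. to quote the block in a message.
quoted = textwrap.indent(block, "> ")

print(wrapped)
print(short)
print(quoted)
```

dedent is handy for triple-quoted strings that live inside indented functions, which is where I would expect to use it most.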

It was never about LLM performance – I couldn’t agree more. The performance gaps between different LLMs are becoming negligible. Now, it is about the experience you build using those models and the guardrails you put in to ensure the experience.

https://read.technically.dev/p/it-was-never-about-llm-performance

How to build an enterprise LLM application: Lessons from GitHub Copilot – the post ends with a summary of 3 key takeaways – 

  • Identify a focused problem and thoughtfully discern an AI’s use cases.
  • Integrate experimentation and tight feedback loops into the design process
  • As you scale, continue to leverage user feedback and prioritize user needs

Those takeaways are general and correct for almost every product launch I can think of. The post provides more concrete tips for LLM applications. It is interesting to read about a product on such a scale that I use it on a daily basis.

https://github.blog/2023-09-06-how-to-build-an-enterprise-llm-application-lessons-from-github-copilot/

Speaking for Hackers – public speaking is hard, from choosing a topic and submitting a CFP to preparing your talk and slides and wrapping it all up. Every step can be tricky, and each of us finds different parts harder. This site provides excellent materials for all the parts before, during, and after the talk, making it easier to step out of our shells and share the knowledge.

https://sfhbook.netlify.app/

5 interesting things (09/02/2024)

Closing the women’s health gap: A $1 trillion opportunity to improve lives and economies – a McKinsey report that highlights the gender health gap and points to the opportunity – potential for a $1 trillion economic gain with additional societal impact. One interesting point is that there are gaps and flaws throughout the value chain – drug effectiveness, therapy access, research functions, etc. This hints that there are many opportunities out there that can make a significant impact.

https://www.mckinsey.com/mhi/our-insights/closing-the-womens-health-gap-a-1-trillion-dollar-opportunity-to-improve-lives-and-economies

Slashing Data Transfer Costs in AWS by 99% – one of the costs developers often forget or dismiss when considering architecture is the cost of data transfer. The solution described in this post is elegant and demonstrates the effect of deep knowledge and understanding of the domain. Simple to trivial architectural decisions can cost so much.

https://www.bitsand.cloud/posts/slashing-data-transfer-costs

3 questions that will make you a phenomenal rubber duck – I previously mentioned that debugging skills are essential, and it is important to iterate and refine them. I especially liked the 3rd question – “If your hypothesis were wrong, how could we disprove it?” as it forces one to think the other way around and see a slightly bigger picture.

https://blog.danslimmon.com/2024/01/18/3-questions-that-will-make-you-a-phenomenal-rubber-duck

Product Managing to Prevent Burnout – burnout is more common than we think and can have many causes. Moreover, different people would react differently to different cultures and would burn out or not burn out accordingly. The most important takeaway is that managing and controlling burnout is a team sport; it is not only the concern of the direct manager, but product managers can also participate in this effort. (I strongly recommend the honeycomb blog)

https://www.honeycomb.io/blog/product-managing-prevent-burnout

The “errors” that mean you’re doing it right – I was able to identify or witness almost all the errors mentioned in the post. I also think some of those errors, such as Letting someone go soon after hiring, Pivoting a strategy just after creating it, etc, could be attributed to the sunk cost fallacy. And if we want to make the opening sentence more extreme – “If you don’t make mistakes, you’re not working”.

https://longform.asmartbear.com/good-problems-to-have

5 interesting things (15/01/2024)

SQL as API – I saw several efforts to expose RDBMS as API over the years. This post suggests another angle – exposing an API that accepts SQL. Consider this a brain teaser.

https://valentin.willscher.de/posts/sql-api/

SomeEstimates – For me, the loss of trust described in the post is the most troubling implication of a culture where estimates are often missed –

“Another negative outcome is a loss of trust between developers and management since a constant sense of urgency is tantamount to no sense of urgency at all.”

https://www.shaiyallin.com/post/someestimates

How to Make Anthropic’s Claude Models Consistently Generate Valid JSON – Getting valid and consistent JSON from an LLM is an issue. Prompt engineering, as described in this post, can solve some of those issues; the json_repair package mentioned there can solve additional problems. With the GPT store announced this week and the evolving models, I believe this will be solved soon in one way or another.

https://levelup.gitconnected.com/how-to-make-anthropics-claude-models-consistently-generate-valid-json-d74ce037ca46

Bonus – https://github.com/mangiucugna/json_repair
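
To illustrate the kind of problem involved, here is a minimal standard-library sketch that pulls the first valid JSON value out of a chatty model reply. This is neither the prompt-engineering approach from the post nor how json_repair works; the function name extract_json and the sample reply are hypothetical.

```python
import json
from typing import Any


def extract_json(reply: str) -> Any:
    """Return the first valid JSON object or array found in reply.

    Hypothetical helper for illustration: it only skips surrounding text,
    it does not repair malformed JSON.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            continue
    raise ValueError("no valid JSON found in reply")


# Sample reply invented for this sketch.
reply = 'Sure! Here is the JSON you asked for:\n{"name": "Ada", "tags": ["a", "b"]}\nLet me know if you need anything else.'
parsed = extract_json(reply)
print(parsed["name"])
```

A helper like this only handles extra text around the JSON. Truncated or slightly malformed output is the case the post says json_repair helps with.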

My PostgreSQL wishlist – Another brain teaser. The items I most relate to are having created_at and updated_at columns created and maintained automatically and being immutable. I’m curious to follow the comments on this post.

https://ryanguill.com/postgresql/sql/2024/01/08/postgresql-wishlist.html

Everyday storytelling for engineers. The CAO Method – Storytelling has become an overused buzzword in the last few years (I thought it was already over the hill). Still, this post is important not due to the specific method but to the recognition that ICs practice storytelling every day, and mastering this skill can affect your promotion, career path, tasks you get, etc.

https://tonyfreed.substack.com/p/everyday-storytelling-for-engineers

5 interesting things (13/12/2023)

Engineering Team Lessons from Cycling – having a background in team sports (Rugby) and individual sports (running), I enjoy such posts that bring experience from one domain to another.

https://benjiweber.co.uk/blog/2023/10/15/engineering-team-lessons-from-cycling/

How to (and how not to) design REST APIs – although I read several posts about REST API best practices, I found this post very insightful, reasoned, and with great examples.

https://github.com/stickfigure/blog/wiki/how-to-(and-how-not-to)-design-rest-apis

Handling a Regional Outage: Comparing the Response From AWS, Azure and GCP – luckily for the post author, all the major cloud services had regional outages in the last while, so he can compare their responses. This will not tip the scales when choosing a cloud provider but will let you know what to expect. It is also an interesting thought on handling outages as a provider.

https://open.substack.com/pub/pragmaticengineer/p/handling-a-regional-outage-comparing

Python Errors As Values – it is sometimes tough to move from one technology to another – being a newbie all over again, thinking differently, adapting to a new ecosystem, etc. It also makes you ponder concepts that were previously perceived as obvious. For example, the approach for errors in Python. Without spoilers – there is an elegant Pythonic way to implement it.

https://www.inngest.com/blog/python-errors-as-values

croniter – this is a cron utilities package. For example, it helps you find the next time a cronjob should be called given a datetime object. It can also find the previous iteration, validate a cron expression, test if a datetime matches a cron condition, etc.

https://github.com/kiorky/croniter