6 Thoughts on Smart Brevity

Another one bites the dust – I just finished listening to “Smart Brevity: The Power of Saying More with Less” by Jim VandeHei, Mike Allen, and Roy Schwartz. One of my chief complaints about engineering training is the dismissal of communication skills – speaking, reading, and writing.

Effective communication for developers is everywhere –

  • Being succinct and accurate in daily meetings
  • Opening clear tickets so others can understand, reproduce, and prioritize
  • Effectively communicating their work, suggestions, etc.

Here are some thoughts I had while reading the book –

  1. Using LLMs – ChatGPT, along with other tools such as Grammarly, Wordtune, and so on, can help you tune the tone of your passage and adjust it to your audience. The book recommends asking a friend for feedback, but in 2023, we can first ask ChatGPT.
  2. CV – the book suggests many scenarios for using smart brevity – social media, presentations, etc. I would also like to suggest writing a CV. Many CVs start with a short paragraph about the person. You can use this paragraph wisely.
  3. Spell out your takeaways – I understand the rationale but struggle with this advice. It is a rhetorical trick – tell your audience the one thing you want them to take from the talk instead of leaving it open.
  4. Amazon 6-pagers – the book opens with phrases like “word overdose” and “word addicts,” stating that since we moved from print to the web, the number of words is unlimited, and nobody reads past the headlines. The authors then refer to Amazon’s 6-pagers as a good example of a compact way of transferring a message. Six pages is a lot, and there is probably a way to reduce the length.
  5. Culture – I find this book very American-centric, complaining that people need to be more direct. This is, of course, relative to the Israeli culture I come from. I strongly recommend reading “The Culture Map” by Erin Meyer on this topic.
  6. Content structure – one of the recurring recommendations is to use bullet points to draw attention and help the audience focus. Additional recommendations include using bold fonts to emphasize important points, combining graphs and charts, etc. As a reader, I completely get this. As a writer who tries to distill her words, I want my words to stand out without using those tricks.

7 thoughts on Working Backwards

I had a long day of walking around and waiting a lot, so it was a good chance to listen to “Working Backwards” by Colin Bryar and Bill Carr. It was more insightful than I anticipated. Here are my thoughts about it –

  1. Corporates vs. startups – whenever I read or hear about the best practices and success stories of big organizations (Amazon, Netflix, etc.), I wonder what I can adapt and use in a small startup. Two things stood out to me –
    • Hiring for diversity – the book mentions that Amazon cared about diversity, specifically concerning gender, from its early days. One of the groups noticed some bias in their hiring process, specifically in CV screening, and wanted to improve it without changing the bar. To eliminate biases that arise when reading women’s CVs, they modified the hiring process so that every woman who applied to a position would go through a phone interview.
    • Diving into the details – in startups, there are always fires to put out, new requests, and urgent tasks. I can improve on diving into the details, and doing so can prevent some of those fires.
  2. “Be stubborn on the vision but flexible on the details.” – I love this quote, and I think it is one of the enablers for Amazon’s innovative culture.
  3. Blackberry inspiration – the authors mention that when designing the Kindle, they were inspired by Blackberry, which was innovative for its time, as it was always synced and available. As with the examples in the Jobs to Be Done book – while the inspiration still holds, the example is a bit outdated. Maybe I should read that one next.
  4. 6-pagers – one of the justifications given in the book for moving from PowerPoint to six-pagers is that it eliminates the differences between a skilled marketing person, a junior developer, and an experienced VP and makes things more equal. I disagree with this argument. It might decrease the dependency on presentation skills, but I still think there is a gap, and experience matters even in writing – choosing the right wording and style and surfacing the relevant doubts, concerns, or benefits for the relevant stakeholders and audience. Writing is also an art, and writing a concise passage is not an easy task.
  5. WatchNow subscription – when talking about Prime Video, they mention Netflix’s early days, when its streaming service was called WatchNow and was offered as a subscription to whoever had a DVD subscription. This resonates with the Jobs to Be Done advice regarding obstacles to adoption – offer customers a way to try the product (freemium, limited trial, etc.) before they buy it.
  6. AWS – as they say, the origin story and other stories about AWS could fill a book of their own. I look forward to such a book. Having said that, they talk about the pricing of S3 – whether it should be subscription- or usage-based, what should count as usage (storage size, API calls, etc.), and how the pricing changed once they better understood the usage patterns. This is a great anecdote showing that even if you work backwards and prepare the PR and FAQ before you develop the product, you will learn new things once users start using it.
  7. 2 pizza teams legend – there are a few concepts associated with Amazon, such as “2 pizza teams,” which, after reading the book, I find is highly misunderstood or very freely interpreted. On the other hand, I haven’t heard many companies or people discussing the “Working Backwards” process, which I find far more interesting, and I wonder why one concept is so popular while the other stays in the shade.

5 thoughts on Jobs to Be Done

After it was mentioned in “Second Brain” and had been waiting on my reading list for a long time, I finally listened to “Jobs to Be Done: A Roadmap for Customer-Centered Innovation” by Stephen Wunker, Jessica Wattman, and David Farber. Here are my thoughts about it –

1. Examples relevancy – The book was published in 2016 and brings several examples – Slack, Snapchat, etc. Are those examples still relevant? Do they still emphasize the relevant points?

This problem is not unique to this book. One often finds an anecdote and uses it to emphasize or justify a theory. As time passes, perspectives change, and the anecdote sometimes no longer fits the theory.

2. Multiple stakeholders – B2B purchasing procedures frequently involve a varied group of stakeholders, each with their own unique jobs to be done. This point of view is often overlooked and has great implications for how sales should be done.

3. Emotional and Social Components – jobs to be done are not only functional tasks such as getting from here to there, wearing something, etc. Jobs to be done also have emotional and social components that should be addressed.

4. Obstacles to use and obstacles to adoption – Obstacles to adoption are challenges that restrict a consumer’s inclination to purchase a product or service. Facilitating the ease with which people can learn about and experiment with your new offering can diminish obstacles to adoption. Obstacles to use refer to impediments that hinder success, ultimately reducing a customer’s probability of ongoing product usage, acquiring supplementary features, or upgrading to more recent editions. I love this insight, and it is an important distinction, especially when one needs to prioritize.

5. Effective Brainstorming – brainstorming is discussed at length in one of the chapters. Personally, I have many doubts about group brainstorming as a way to encourage creativity. If there is one thing to take from this discussion, it is that such a session should be well-moderated. See more here

5 thoughts on Building a Second Brain

“a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”

Herbert A. Simon

The quote above appears in “Building a Second Brain” by Tiago Forte, which I finished reading last week. I strongly relate to it, and the book helped me reflect on my personal knowledge management.

TL;DR – find a personal knowledge management method that works for you (e.g., the PARA method). It does not have to be perfect. You don’t have to migrate everything. Just get started and adjust as you go.

Here are a few thoughts I had while reading the book –

1. Progressive Summarization – the progressive summarization technique reminded me of a joke my uncle told me a while back – a student in his first semester asked the lecturer how best to prepare for the exam. She told him – after every class, summarize your class notes. At the end of each week, summarize the daily summaries. At the end of each month, summarize the weekly summaries, and so on. In the week before the exam, summarize the summary from the day before. They met just before the exam, and she asked him how it went. He answered – “Great, I was able to summarize everything into one word – bullshit.”

2. Divergence and Convergence – In my first or second semester at university, I took a class on academic writing. We were told that a good academic essay is built like an hourglass: it starts with a very wide question or statement, narrows down to a specific claim or case, and finishes by zooming out to the broader picture. See more here. Divergence and convergence work the same way. You start very scattered, then connect the dots, focus, reach some advancement, and repeat.

3. Hemingway’s bridges – “The “Hemingway Bridge” is a technique used by author Ernest Hemingway in which he would stop his writing for the day only AFTER he knew what was coming next.” (here). Each of us has our own hooks that help us restart the next time. A few years ago, I read Hila Noga’s post about getting your programming flow going – it is a Hemingway Bridge for developers.

4. Blog as an interface – I initially created the blog to make it easier for me to search for links I once saw and to share them with other people. One can view it as an interface to my second brain. I am still in the process of deciding which of the book’s methods are right for me to adopt.

5. Listening to an e-book – I’m a big fan of highlighting and writing comments in books, papers, etc. The audiobook format is challenging for me in this respect. Moreover, I usually listen while doing other things, like walking or driving, which misses some of the second brain practice. I still need to figure out how to tackle this. On the other hand, I use writing and notes of all kinds to unload my brain and to make them easier to access in the future. I was very happy that the topic of offloading was widely discussed in the book.

Team Health Check

Today, I heard Dafna Rosenblum’s (see Dafna’s blog here) talk at the EMIL (Engineering Manager IL) meetup about “Team Health Check”. It was the first time I had heard of the concept, so I read more about it.

Spotify developed the team health check concept and introduced it in 2014 (here).  About 6 months ago, Spotify published a new post about “Getting More from Your Team Health Checks”. The post focuses on improving the team experience in this workshop and suggests the following main ideas –

  • Customize wisely – tailor the right questions and health checks that fit the team and the organization.
  • Dig Deep – Good facilitation is essential to enable more profound conversations. It should help the team step outside its day-to-day communication patterns and create a psychologically safe space (see here) to raise issues. The post suggests a few strategies for how to do it.
  • Follow through – you should follow up and revisit the topics raised in the team health check workshop. That can happen in 1-1 meetings, by scheduling the required meetings, by checking your priorities and attention, etc.

I found a few online tools to help facilitate team health checks – 

  • teamhealthcheck.io – “A free, anonymous, and super simple tool to run a version of the Spotify Health Check Survey.”
  • Miro boards – There are multiple Miro templates for team health checks – e.g., here, here, and here.

Another tip from Dafna, for starting any meeting, is to open with a quick check-in to break the ice and increase engagement. For example, ask each member to describe their mood using an emoji.

5 interesting things (04/09/2023)

12 Debugging tools I wish I knew earlier 🔨 – it describes debugging strategies more than debugging tools (e.g., minimal reproduction is not a tool). One strategy I missed in this post is adding breakpoints. If I were to write this post, I would order the items by escalation; for example, reading the error message would be placed higher. However, it is an important post, especially for junior developers.

https://careercutler.substack.com/p/12-debugging-tools-i-wish-i-knew

Consistency Patterns – This post explains the common consistency patterns – strong, eventual, and weak consistency – and the trade-offs between them. It also mentions the idea of causal consistency, which I find very interesting.

https://systemdesign.one/consistency-patterns/

Remote work requires communicating more, less frequently – he had me at “Think of it like gzip compression, but for human-to-human communication. Yes, there’s slightly more processing overhead at the start, but it allows greater communications throughput using fewer “packets” (communicate more using less)”. Seriously, once your organization grows above ten people and you start having clients, you will have remote people (colleagues or clients), and you will have to optimize your communication to get your message across.

https://ben.balter.com/2023/08/04/remote-work-communicate-more-with-less/

Git log customization – I’m setting up a new computer for development and looking for a log format that would be easy for me to use, so this post came exactly on time.

https://www.justinjoyce.dev/customizing-git-log-format/

Structuring your Infrastructure as Code – I like this post’s layers approach and the examples from all 3 public cloud providers. I would like to give more thought to the exact layers and their order. Note that this post is written by a Pulumi solutions engineer, so it might not map as well to other IaC tools.

https://leebriggs.co.uk/blog/2023/08/17/structuring-iac

5 interesting things (27/07/2023)

Designing Age-Inclusive Products: Guidelines And Best Practices – I have a 91-year-old grandmother who, for the last 10 years, has not been able to book a doctor’s appointment herself, as she does not use a smartphone and cannot follow voice navigation. Even without the personal perspective, I am very interested in accessibility, and I try to pay attention to inclusivity and accessibility topics wherever relevant. However, I always wonder whether those are general best practices or are limited to specific cohorts. Specifically, in this case, younger people usually have more technology literacy than older people and can therefore achieve their goals even with less optimized flows and UI.

https://www.smashingmagazine.com/2023/07/designing-age-inclusive-products-guidelines-best-practices/

On Becoming VP of Engineering – A two-part blog post series by Emily Nakashima, Honeycomb’s first VP of Engineering. The first part focuses on her path – coming originally from design, frontend, and product engineering and becoming a VP of Engineering who also manages the backend and infrastructure.

The second part talks about the day-to-day work and the shift in focus when moving from a director position to a VP position. I strongly agree with her saying, “Alignment is your most important deliverable,” and also think it is one of the hardest things to achieve.

https://www.honeycomb.io/blog/becoming-vp-of-engineering-pt1

https://www.honeycomb.io/blog/becoming-vp-of-engineering-pt2

Project Management for Software Engineers – “This article is a collection of techniques I’ve learned for managing projects over time, that attempts to combine agile best practices with project management best practices.”. While a degree in computer science teaches lots of algorithms, software development, and so on, it does not teach project management or time management. Those skills are usually not required in junior positions but can help you have a more significant impact. Having said that, one should find the practices that fit them, and those can evolve over time.

https://sookocheff.com/post/engineering-management/project-management-for-software-engineers/

Designing Pythonic library APIs – A while ago (roughly two years), I looked for a post, tutorial, etc. on SDK design best practices and could not find anything I was happy with. I like the examples (both good and bad) in this post. If you are in a hurry, all the takeaways are summarized at the end (but they are sometimes hard to understand without context).

https://benhoyt.com/writings/python-api-design/

Fern – “Fern is an open source toolkit for designing, building, and consuming REST APIs. With Fern, you can generate client libraries, API documentation, and boilerplate for your backend server.”. I haven’t tried it myself yet, but if it works, it seems like cookiecutter on steroids. In the era of LLMs, the next step is to generate all of those from free text.

https://github.com/fern-api/fern

5 interesting things (06/07/2023)

Potential impacts of Large Language Models on Engineering Management – this post is an essential starter for a discussion, and I can think of other impacts. For example – how are interviewing and assessing the skills of new team members affected by LLMs? What skills should be evaluated these days (focusing on engineering positions)?

One general caveat of using LLMs is trusting them completely, without any doubts. This is crucial for performance reviews. Unlike code – where, if it does not work, it is easy to trace and fix – if a performance review is wrong, it might be hard to pinpoint what went wrong and where, and the person receiving it might not have the confidence to say something.

https://www.engstuff.dev/p/potential-impacts-of-large-language

FastAPI best practices – one of the most reasoned and detailed guides I have read. The repository’s issues serve as comments on the guide and are worth reading as well. Ideally, I would like to take most of the ideas and turn them into a cookiecutter project that is easy to create.

https://github.com/zhanymkanov/fastapi-best-practices

How Product Strategy Fails in the Real World — What to Avoid When Building Highly-Technical Products – I have seen all of these in action and hope to do better in the future.

https://review.firstround.com/how-product-strategy-fails-in-the-real-world-what-to-avoid-when-building-highly-technical-products

1 dataset 100 visualizations – I imagine this project as an assignment in a data visualization/data journalism course.  Yes, there are many ways to display data. Are they all good? Do they convey the desired message?

There is a risk in being too creative, and there are some visualizations there I cannot imagine using for anything reasonable.

https://100.datavizproject.com/

Automating Python code quality – one additional advantage of using tools like Black, isort, etc., is that they reduce the cognitive load of code reviews. The reviewer no longer needs to check for style issues and can focus on deeper ones.

https://blog.fidelramos.net/software/python-code-quality

Bonus – a more extensive pre-commit template –

https://github.com/br3ndonland/template-python/blob/main/.pre-commit-config.yaml

Did you Miss me? PyCon IL 2023

Today I talked about working with missing data at PyCon IL. We started with a bit of theory about the mechanisms of missing data (a small simulation sketch follows the list) –

  • MCAR (Missing Completely At Random) – the fact that the data are missing is independent of both the observed and unobserved data.
  • MAR (Missing At Random) – the fact that the data are missing is systematically related to the observed but not the unobserved data.
  • MNAR (Missing Not At Random) – the fact that the data are missing is systematically related to the unobserved data.
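
To make the mechanisms concrete, here is a minimal simulation sketch – the dataset, probabilities, and column names are hypothetical, made up purely for illustration –

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "income": rng.normal(10_000, 2_000, n),
})

# MCAR: values go missing with a fixed probability,
# independent of anything in the data
mcar = df["income"].mask(rng.random(n) < 0.1)

# MAR: missingness depends only on the *observed* age column
mar = df["income"].mask((df["age"] < 30) & (rng.random(n) < 0.5))

# MNAR: missingness depends on the *unobserved* income value itself
mnar = df["income"].mask((df["income"] > 12_000) & (rng.random(n) < 0.5))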

We then dove into an almost real-world example that utilizes the Python ecosystem – pandas, scikit-learn, and missingno.

My slides are available here and my code is here.

3 related posts I wrote about working with missing data in Python –

Pandas fillna vs scikit-learn SimpleImputer

Missing data is prevalent in real-world data and can be missing for various reasons. Gladly, both pandas and scikit-learn offer several imputation tools to deal with it. Pandas offers a basic yet powerful interface for univariate imputations using fillna and more advanced functionality using interpolate. scikit-learn offers SimpleImputer for univariate imputations and KNNImputer and IterativeImputer for multivariate imputations. In this post, we will focus on the fillna and SimpleImputer functionality and compare them.
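
As a quick aside before diving in, a minimal sketch of what interpolate looks like (it is not covered further in this post):

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 7.0])
# Linear interpolation by default – fills in 2.0 and 5.0
s.interpolate()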

Basic Functionality

SimpleImputer offers four strategies to fill in the NaN values – mean, median, most_frequent, and constant.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
imp_mean = SimpleImputer(strategy='mean')
pd.DataFrame(imp_mean.fit_transform(df))

output –

      0    1    2
0   7.0  2.0  7.5
1   4.0  3.5  6.0
2  10.0  5.0  9.0

Can we achieve the same with pandas? Yes!

df.fillna(df.mean())

Want to impute with the most frequent value?

Assume – df = pd.DataFrame(['a', 'a', 'b', np.nan])

With SimpleImputer

imp_mode = SimpleImputer(
    strategy='most_frequent')
pd.DataFrame(imp_mode.fit_transform(df))

With fillna

df.fillna(df.mode()[0])

And the output of both –

   0
0  a
1  a
2  b
3  a
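
For completeness, a quick sketch of the constant strategy (the fourth one) on the same data – the fill_value here is an arbitrary choice of mine:

df = pd.DataFrame(['a', 'a', 'b', np.nan])
imp_const = SimpleImputer(strategy='constant', fill_value='a')
pd.DataFrame(imp_const.fit_transform(df))

# The pandas equivalent
df.fillna('a')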

Different Strategies

Want to apply different strategies to different columns? Using scikit-learn, you will need several imputers, one per strategy (or a ColumnTransformer – see the sketch after the example). Using fillna, you can pass a dictionary mapping each column to its fill value, for example –

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
df.fillna({1: 10000, 2: df[2].mean()})
    0        1    2
0   7      2.0  7.5
1   4  10000.0  6.0
2  10      5.0  9.0
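
For reference, a sketch of my own (not from the original comparison) of how the scikit-learn route can be consolidated with a ColumnTransformer, using the column indices of the example above:

from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([
    ('const', SimpleImputer(strategy='constant', fill_value=10000), [1]),
    ('mean', SimpleImputer(strategy='mean'), [2]),
], remainder='passthrough')
# Note: the output columns are reordered – transformed columns
# first, passthrough columns last
pd.DataFrame(ct.fit_transform(df))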

Advanced Usage

Want to impute values drawn from a normal distribution? No brainer –

mean = 5
scale = 2
df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
df.fillna(
    pd.DataFrame(
        np.random.normal(mean, scale, df.shape)))
    0         1         2
0   7  2.000000  3.857513
1   4  5.407452  6.000000
2  10  5.000000  9.000000

Missing indicator

Using SimpleImputer, one can add indicator columns that contain 1 if the original value was missing and 0 otherwise. This can also be done using the standalone MissingIndicator transformer (a sketch follows below) –

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6], [10, 5, 9]])
mean_imp = SimpleImputer(strategy='mean', add_indicator=True)
pd.DataFrame(mean_imp.fit_transform(df))
      0    1    2    3    4
0   7.0  2.0  7.5  0.0  1.0
1   4.0  3.5  6.0  1.0  0.0
2  10.0  5.0  9.0  0.0  0.0

Note that the missing indicator columns (i.e., columns 3 and 4 in the example above) correspond only to columns with missing values. Therefore, there is no missing indicator column corresponding to column 0. If you are converting back and forth to pandas DataFrames, you should note this nuance.
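
For reference, a minimal sketch of the standalone MissingIndicator mentioned above; by default it, too, emits columns only for features that had missing values during fit:

from sklearn.impute import MissingIndicator

indicator = MissingIndicator()
# Boolean mask with one column per feature that had missing
# values (features 1 and 2 of the DataFrame above)
pd.DataFrame(indicator.fit_transform(df))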

Another nuance to note when working with SimpleImputer is that columns that contain only missing values are dropped by default –

df =  pd.DataFrame(
    [[7, 2, np.nan, np.nan], [4, np.nan, 6, np.nan],
    [10, 5, 9, np.nan]])
mean_imp = SimpleImputer(strategy='mean')
pd.DataFrame(mean_imp.fit_transform(df))
      0    1    2
0   7.0  2.0  7.5
1   4.0  3.5  6.0
2  10.0  5.0  9.0

This behavior is controllable by setting keep_empty_features=True. While it is manageable, tracing the columns might be challenging –

mean_imp = SimpleImputer(
    strategy='mean',
    keep_empty_features=True,
    add_indicator=True)
pd.DataFrame(mean_imp.fit_transform(df))
      0    1    2    3    4    5    6
0   7.0  2.0  7.5  0.0  0.0  1.0  1.0
1   4.0  3.5  6.0  0.0  1.0  0.0  1.0
2  10.0  5.0  9.0  0.0  0.0  0.0  1.0

There is an elegant way to achieve similar behavior in pandas –

df = pd.DataFrame(
    [[7, 2, np.nan, np.nan], [4, np.nan, 6, np.nan],
     [10, 5, 9, np.nan]])
pd.concat(
    [df.fillna(df.mean()), 
     df.isnull().astype(int).add_suffix("_ind")], axis=1)
    0    1    2   3  0_ind  1_ind  2_ind  3_ind
0   7  2.0  7.5 NaN      0      0      1      1
1   4  3.5  6.0 NaN      0      1      0      1
2  10  5.0  9.0 NaN      0      0      0      1

Working with dates

Want to work with dates and fill several columns of different types? No problem with pandas –

from datetime import datetime

df = pd.DataFrame(
    {"date": [
        datetime(2023, 6, 20), np.nan,
        datetime(2023, 6, 18), datetime(2023, 6, 16)],
     "values": [np.nan, 1, 3, np.nan]})
df.fillna(df.mean())

Before –

        date  values
0 2023-06-20     NaN
1        NaT     1.0
2 2023-06-18     3.0
3 2023-06-16     NaN

After –

        date  values
0 2023-06-20     2.0
1 2023-06-18     1.0
2 2023-06-18     3.0
3 2023-06-16     2.0

Working with dates is an advantage that fillna has over SimpleImputer.

Backward and forward filling

So far, we have treated the records and their order as independent. That is, we could have shuffled the records without affecting the expected imputed values. However, there are cases – for example, when representing time series – where the order matters and we would like to impute based on later values (backward fill) or earlier values (forward fill). This is done by setting the method parameter.

df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, 6],
     [10, np.nan, 9], [np.nan, 5, 10]])
df.fillna(method='bfill')
      0    1     2
0   7.0  2.0   6.0
1   4.0  5.0   6.0
2  10.0  5.0   9.0
3   NaN  5.0  10.0

One can also limit the number of consecutive values that are imputed –

df.fillna(method='bfill', limit=1)
      0    1     2
0   7.0  2.0   6.0
1   4.0  NaN   6.0
2  10.0  5.0   9.0
3   NaN  5.0  10.0

Note that when using bfill or ffill – and moreover, when setting limit to a value other than None – it is possible that not all the values will be imputed (see the sketch below).

For me, that’s a killer feature of fillna compared to SimpleImputer.
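
One possible way to handle those leftover values – a suggestion of mine, not something fillna does for you – is to chain a second pass in the opposite direction:

# bfill leaves the trailing NaN in column 0; a follow-up ffill catches it
df.fillna(method='bfill').fillna(method='ffill')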

Treat infinite values as NA

Setting pd.options.mode.use_inf_as_na = True will treat infinite values (i.e., np.inf, np.INF, np.NINF) as missing values, for example –

df = pd.DataFrame([1, 2, np.inf, np.nan])
df.fillna(1000)

With pd.options.mode.use_inf_as_na = False (the default) –

        0
0     1.0
1     2.0
2     inf
3  1000.0

With pd.options.mode.use_inf_as_na = True –

        0
0     1.0
1     2.0
2  1000.0
3  1000.0
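
If you prefer not to flip a global option, a local alternative – a common pandas pattern, sketched here as a suggestion rather than part of the original comparison – is to replace the infinities explicitly first:

# Convert +/-inf to NaN locally, then fill as usual
df.replace([np.inf, -np.inf], np.nan).fillna(1000)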

Note that inf and NaN are not treated the same in other use cases; e.g., df[0].value_counts(dropna=False) counts them separately –

0
1.0    1
2.0    1
NaN    1
NaN    1

Summary

Both pandas and scikit-learn offer basic functionality for dealing with missing values. Assuming you are working with a pandas DataFrame, pandas’ fillna can achieve everything SimpleImputer can do and more – working with dates, backward and forward fill, etc. Additionally, there are some edge cases and specific behaviors to pay attention to when choosing what to use – for example, when using the bfill or ffill method, some values may not be imputed if they are the last or first ones, respectively.