Things I learned today (20/07/2021)

Delay queues let you postpone the delivery of new messages to a queue for a number of seconds

AWS documentation

This means that all the messages which are pushed to this queue would be visible to the consumer after the delay period. The minimum delay which is also the default delay is 0 and the maximum is 15 minutes.

Note that when changing the delay of a queue the behaviour of FIFO queues and standard queues is different – 


For standard queues, the per-queue delay setting is not retroactive—changing the setting doesn’t affect the delay of messages already in the queue.

For FIFO queues, the per-queue delay setting is retroactive—changing the setting affects the delay of messages already in the queue.

AWS documentation

If you need to delay the visibility of specific messages rather than all messages in the queue, you can use message timers to add an initial invisibility period for a message. This is only supported by standard queues. Note that setting a message timer for an individual message overrides the delay period of the delay queue.
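To make the difference between the two settings concrete, here is a minimal sketch that only builds the keyword arguments one would pass to boto3's SQS client (`create_queue` for the queue-level delay, `send_message` for a per-message timer). The helper names, queue name and queue URL are made up for illustration, and no AWS call is made.

```python
MAX_DELAY_SECONDS = 15 * 60  # 900 seconds, the maximum delay mentioned above


def _check_delay(delay_seconds: int) -> int:
    """Validate that a delay falls in the 0..900 second range."""
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_SECONDS} seconds")
    return delay_seconds


def delay_queue_kwargs(queue_name: str, delay_seconds: int = 0) -> dict:
    """Arguments for sqs.create_queue: a delay applied to every new message."""
    # Queue attributes are passed to SQS as strings.
    return {
        "QueueName": queue_name,
        "Attributes": {"DelaySeconds": str(_check_delay(delay_seconds))},
    }


def message_timer_kwargs(queue_url: str, body: str, delay_seconds: int) -> dict:
    """Arguments for sqs.send_message: a timer for one message (standard queues only)."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "DelaySeconds": _check_delay(delay_seconds),
    }


queue_args = delay_queue_kwargs("orders-delayed", delay_seconds=60)
message_args = message_timer_kwargs(
    "https://sqs.eu-west-1.amazonaws.com/123456789012/orders-delayed",  # made-up URL
    "hello",
    delay_seconds=300,  # overrides the 60 second queue delay for this message
)

# With real credentials this would look roughly like:
#   import boto3
#   sqs = boto3.client("sqs")
#   sqs.create_queue(**queue_args)
#   sqs.send_message(**message_args)
```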

See the image below to understand message timeline in a queue –

See more here –
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-timers.html

Things I learned today (19/07/2021)

[AWS] independently map Availability Zones to names for each account

AWS documentation


This means that eu-west-1a in my account is not necessarily the same as eu-west-1a in your account.

Why does this matter? For example, you may want to share subnets across accounts. Or maybe you want to ensure that services in different accounts are not in the same availability zone.

So how can you achieve this? Use availability zone IDs, which are unique and consistent identifiers for availability zones.
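A small sketch of how this comparison could look in code. It assumes the response shape of EC2's `describe_availability_zones` call, where each zone has a `ZoneName` and a `ZoneId`; the two responses below are hand-written stand-ins with made-up mappings, not real account data.

```python
def az_name_to_id(response: dict) -> dict:
    """Map each Availability Zone name to its zone ID."""
    return {az["ZoneName"]: az["ZoneId"] for az in response["AvailabilityZones"]}


# Stand-ins for what two different accounts might get back (values are illustrative).
# With real credentials the response would come from something like:
#   boto3.client("ec2", region_name="eu-west-1").describe_availability_zones()
account_a = {
    "AvailabilityZones": [
        {"ZoneName": "eu-west-1a", "ZoneId": "euw1-az1"},
        {"ZoneName": "eu-west-1b", "ZoneId": "euw1-az2"},
    ]
}
account_b = {
    "AvailabilityZones": [
        {"ZoneName": "eu-west-1a", "ZoneId": "euw1-az2"},
        {"ZoneName": "eu-west-1b", "ZoneId": "euw1-az1"},
    ]
}

ids_a = az_name_to_id(account_a)
ids_b = az_name_to_id(account_b)

# The same name points at different zones in the two accounts...
same_name_same_zone = ids_a["eu-west-1a"] == ids_b["eu-west-1a"]
# ...while comparing zone IDs finds the name that matches in the other account.
name_in_b = next(name for name, zone_id in ids_b.items() if zone_id == ids_a["eu-west-1a"])
```

Here `same_name_same_zone` is `False` and `name_in_b` is `"eu-west-1b"`, which is exactly the situation the zone IDs are meant to untangle.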

See more here – https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html

5 interesting things (02/07/2021)

Conducting a Successful Onboarding Plan and Onboarding Process – I believe that onboarding is important for the entire employment period. It helps set expectations, lets the new hire get into the code and contribute meaningfully faster, and assures both sides that they made the right choice (or, if not, lets them find out at an early stage). One thing I miss in this plan is the social part, which I think is also important – having lunch \ coffee \ etc. with more than just the mentor.
I look forward to the next part, “Conducting a Successful Offboarding Plan and Offboarding Process”. It might sound like a joke, but it is not. A good offboarding process can help the organization learn and grow, and leave the employee with a good taste so she might come back in the future or recommend that her friends join \ use the product.

https://blog.usejournal.com/conducting-a-successful-onboarding-plan-and-onboarding-process-6ec1b01ec2ae

The challenges of AWS Lambda in production – serverless has been gaining popularity in recent years, and AWS Lambda in particular. While it often sounds like a magic solution for scalability and isolation, it also has issues worth knowing about. In this post Lucas De Mitri from Sinch presents problems they ran into and possible solutions. For a high-level view of Lambda functions, just read the conclusion part.

https://medium.com/wearesinch/the-challenges-of-aws-lambda-in-production-fc9f14b182be

My Arsenal Of AWS Security Tools – In a previous post I pointed out ElectricEye, a tool that continuously monitors your AWS services for configurations that can lead to degradation of confidentiality, integrity or availability. This GitHub repo aggregates open source tools for AWS security: defensive, offensive, auditing, DFIR, etc.

https://github.com/toniblyx/my-arsenal-of-aws-security-tools

3 Problems to Stop Looking For in Code Reviews – I find the post title inaccurate but I like the attitude. As a reviewer you should not be bothered by tiny issues that can be enforced by tooling. A few tools are mentioned in the post, and I would also add git hooks, which I find very powerful. I also agree with the insight that code reviews usually happen too late in the development process, and I am constantly looking for the balance between letting developers move forward and giving feedback at the right time.

https://medium.com/swlh/3-problems-to-stop-looking-for-in-code-reviews-981bb169ba8b

The Power of Product Thinking – In a previous post I mentioned that understanding the cost structure and the trade-offs between different architectures (cost-wise but also performance- and feature-wise) is a way to become a more valuable team member. Product thinking is another skill that can make you a more valuable and influential team member. This post explains what product thinking is (and isn’t) and rounds it off by suggesting several practices for developing product thinking. Totally liked it and am going to adopt some of the suggested practices.

https://future.a16z.com/product-thinking/

AWS tagging best practices – 5 things to know

I read the AWS tagging best practices whitepaper, which was published in December 2018, and distilled 5 takeaways.

1. Use cases – tags have several use-cases including:

  • Cost allocation – using AWS Cost Explorer you can break down AWS costs by tag
  • Access Control – IAM policies support tag-based conditions
  • Automation – for example, tags can be used to opt into or out of automated tasks
  • AWS Console Organization and Resource Groups – e.g. create a custom console that organizes and consolidates AWS resources based on one or more tags
  • Security Risk Management – use tags to identify resources that require heightened security risk management practices
  • Operations Support – I find this use case tightly related to the automation use case

2. Standardized tag names and tag values

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton (check here)

A good practice, as suggested in the whitepaper, is to gather tagging requirements from all stakeholders and only then start implementing. A minimal first step can be to define a convention for tag names and values that everyone can follow; see the example from the document below.

tag names example


3. Cost allocation tags delay – this is something I experienced personally – “Cost allocation tags appear in your billing data only after you have (1) specified them in the Billing and Cost Management Console and (2) tagged resources with them”. And even then it can take around 24 hours to appear, take it into account.


4. Tag everything – sounds trivial, but sometimes organizations tag only some of their resources. Tag everything you can to get more comprehensive and accurate data about your expenses. A nice feature in the Billing and Cost Management Console is the ability to find resources that don’t have a specific tag, so you can easily find out what you missed.


5. Tags limitations – until 2016 AWS allowed up to 10 tags for a given resource. The current limit is 50. It definitely allows much more but it is still a limit to bear in mind when creating a tagging strategy. One way to avoid it is by using compound values, e.g. “anycompany:technical-contact = Susan Jones;sue.jones@anycompany.com; +12015551213” rather than a tag for each attribute (e.g. “anycompany:technical-contact-name = Susan Jones”).
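A minimal sketch of the compound-value idea, using the whitepaper's example contact. The helper names are hypothetical, and the 256-character cap on tag values is an assumption about the common AWS limit; the exact limits and allowed characters can differ between services, so check the one you are tagging.

```python
MAX_TAG_VALUE_LENGTH = 256  # assumption: common AWS cap for a tag value


def pack_tag_value(*parts: str, sep: str = ";") -> str:
    """Join several attributes into one compound tag value."""
    for part in parts:
        if sep in part:
            raise ValueError(f"{part!r} contains the separator {sep!r}")
    value = sep.join(parts)
    if len(value) > MAX_TAG_VALUE_LENGTH:
        raise ValueError("compound value is too long for a single tag")
    return value


def unpack_tag_value(value: str, sep: str = ";") -> list:
    """Split a compound tag value back into its attributes."""
    return [part.strip() for part in value.split(sep)]


# One tag instead of three (name, email, phone).
contact = pack_tag_value("Susan Jones", "sue.jones@anycompany.com", "+12015551213")
tags = {"anycompany:technical-contact": contact}

name, email, phone = unpack_tag_value(tags["anycompany:technical-contact"])
```

The trade-off is that a compound value is harder to filter on than separate tags, so it fits attributes that are read together rather than ones used for cost allocation or access control.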

4 interesting things (24/06/2021) – Hebrew

5 years ago I published a blog post about 5 blogs I read in Hebrew. Three of those are still alive and kicking and I enjoy them (reversim, software archiblog, the bloggerit); the other two are no longer active. Additionally, in those 5 years podcasts became much more popular, so I also included 2 podcasts I listen to.

Maya writes algorithms – Maya started this blog while studying for job interviews. In the first posts she presented questions she bumped into and her solutions (and the stream of thought that brought her to those solutions). Later posts also include writing about specific tools (e.g kubectl, git hooks) or interpersonal skills like talking in conferences, being prepared for a code review, etc.


https://algoritmim.co.il/

Internet Israel – Ran Bar Zik is a very experienced Full stack developer and Tech journalist who is also well known for his dad jokes. He writes mostly about Front End and security (but not only) and published several books about software development in Hebrew (those are usually very hard to find).


https://internet-israel.com/

Big Picture (podcast) – a podcast about tech and strategy. Each episode is a deep dive into the strategy of one company such as snapchat, spotify, twilio, etc.

https://bigpicture.buzzsprout.com/

No Tarbut (podcast)- podcast about the daily life of software engineering teams. The topics range from more tech related episodes such as  monitoring, tools, etc. to so-called softer topics such as performance review, leading without authority, salary discussion, creating an inclusive culture. 

http://notarbut.co/

5 interesting things (20/06/2021)

9 Steps to Software Project Handovers – handover is always a challenge, especially when a person leaves the organization and is no longer available for questions or able to access the resources. Similar issues can also arise when you leave a project for a while and then come back – you don’t always remember all the tricks you used to run the code or the tiny bits of each function. This post suggests practical steps and behaviours, many of which can be TL;DRed with the Zen of Python – “Explicit is better than implicit.”.
https://betterprogramming.pub/9-steps-to-software-project-handovers-9325fbb72cfc

How to make an awesome Python package in 2021 – this post is a walk-through of building a Python package. My main issue with it is the dependency management. I would put the dependencies in a separate file, separate the development dependencies from the usage dependencies, and lock the versions. In short, I would rather use pipenv or poetry instead of pip, although this could also be achieved with pip.

https://antonz.org/python-packaging/

A Simple Framework for Software Engineering Management – the suggested framework is indeed easy – 3 types of responsibilities (people management, delivery leadership and technical system ownership) vs 3 ranks of priorities (issues, things that are ok, ideas and aspirations). This framework is a good starting point for engineering leaders but also for engineers that can switch the people management with colleagues relations or similar or for personal growth. 

https://medium.com/swlh/a-simple-framework-for-software-engineering-management-f70b216540f2

Full Cycle Data Science (FCDS) – this is a heavy read but worth it both for data practitioners  and managers. Data science projects fail often. Sometimes it is because the problem is not defined well, other times because there is not enough data, data is not relevant, etc. FCDS tries to cast light and solve some of those problems – “In a nutshell, FCDS is a way of life that enables a single data practitioner to close the full product lifecycle and independently deliver end-to-end products, focusing only on where they bring added value”.
https://towardsdatascience.com/fcds-b2d2e6b08d34

Endorse People Publicly, and Other Actions for Allies – Better Allies is an approach of taking everyday actions to create inclusive and engaging workplaces. This post is part of a weekly newsletter where Karen Catlin (Advocate for Inclusive Workplaces) shares five simple actions to create a more inclusive workplace and be a better ally. It is a weekly reminder to be aware of biases and gaps, with ideas for small and consistent changes that can make us and the people around us more comfortable and help everyone be the best version of themselves.
https://code.likeagirl.io/endorse-people-publicly-and-other-actions-for-allies-9352915c0956

5 interesting things – AWS edition (18/06/21)

As I collect items for my posts and wait until I have time to write about them I noticed I have many items related to AWS and decided to have a special edition.


12 Common Misconceptions about DynamoDB – many times our beliefs about certain tools or technology are based on hearing more than doing or doing but not getting into the depth of things and when running into a problem solving it with a solution we already know. This post describes features and qualities of DynamoDB that are sometimes ignored.

https://dynobase.dev/dynamodb-11-common-misconceptions/

Related Bonus – I really liked the link to Alex DeBrie post about single table design with DynamoDB

https://www.alexdebrie.com/posts/dynamodb-single-table/

AWS Chalice – it is not an official offering but rather a Python package for writing serverless applications. The syntax is very similar to Flask, and there is native support for local testing, AWS SAM and Terraform integration, etc. Disclaimer – if you are on multi-cloud I would not move from Flask or FastAPI to Chalice. Also note the limits of the underlying services (AWS Lambda, AWS API Gateway, etc.) and make sure they don’t constrain your app.

https://aws.github.io/chalice/index

Related Bonus – auth0 tutorial on How to Create CRUD REST API with AWS Chalice
https://auth0.com/blog/how-to-create-crud-rest-api-with-aws-chalice/


ElectricEye – “ElectricEye is a set of Python scripts (affectionately called Auditors) that continuously monitor your AWS infrastructure looking for configurations related to confidentiality, integrity and availability that do not align with AWS best practices.”. It is hard to know and follow all AWS best practices and this bundle of scripts is supposed to help uncover those. I have not tried it myself yet but it seems promising.
https://github.com/jonrau1/ElectricEye


My Comprehensive Guide to AWS Cost Control – computing and cloud costs take up a big portion of every tech organization’s spending these days. Being a more valuable team member also means being aware of the costs and choosing wisely between the different alternatives.

https://corey.tech/aws-cost/


The Best Way To Browse 6K+ Quality AWS GitHub Repositories – most of the time we are not reinventing the wheel, and someone has probably already done something very similar to what we are doing. Let’s browse GitHub to find it and accelerate our process.

https://app.polymersearch.com/discover/aws

Bonus – AWS snowball – I found out that this service exists only this week and it blew my mind – https://aws.amazon.com/snowball/

5 interesting things (06/05/2021)

How Hashicorp works – Hashicorp develops open-source products that are widely used in the industry including Terraform, Vault, Consul, etc. “How HashiCorp Works” provides a glimpse of Hashicorp’s culture and practices. I appreciate this kind of transparency and chance to learn. 

https://works.hashicorp.com/

Make boring plans – a more accurate title would be “make predictable plans”. That is, the next tasks should be predictable based on the team’s knowledge of the product pains, bugs, customers’ requests, etc. A possible good way to measure how boring the plans are is to ask each team member to prioritize the top-k tasks we should work on in the next period (quarter \ sprint, etc.) and check whether the lists overlap. Disclaimer – each team member has their own view, pain points, and features they would like to develop, and might be biased towards them.

https://skamille.medium.com/make-boring-plans-9438ce5cb053

Explainable AI Cheat Sheet  – cheat sheet, video and resources regarding XAI. This is a very good way to get into this field.

https://ex.pegg.io/

I’ve code reviewed over 750 pull requests at Amazon. Here’s my exact thought process – code review is an art and a way personal relations manifest themselves. One day I might write a longer post about code reviews, but for now I want to focus on the last 2 points in this post – “I approve when the PR is good, not perfect” and “I seek feedback for whether I’m reviewing well”. “Good not perfect” – this depends on the team standards, DoD, the PR scope, etc. Specifically, in startups, where time and money are limited, each delay has its costs. “I seek feedback” – how is the quality of my CR measured? What are the goals of CR (familiarity with the code, finding bugs, enforcing standards, something else)? I would like to see or find ways to assess the quality of the CR and give feedback to the code reviewer.

https://curtiseinsmann.medium.com/ive-code-reviewed-over-750-pull-requests-at-amazon-here-s-my-exact-thought-process-cec7c942a3a4


My Clean and Tidy Checklist for Clean and Tidy Data – it is commonly believed that “Data scientists spend 80% of their time cleaning data”. This post provides a conceptual framework for cleaning data, so the time data scientists spend on cleaning data might drop to 79% 😉

https://towardsdatascience.com/my-clean-and-tidy-checklist-for-clean-and-tidy-data-fbdeacb3736c

5 interesting things (23/04/21)

You Are Probably Not Making The Most of Pandas “read_csv” Function – this might seem trivial, and everything can be found in the documentation, but it is well served here with many examples.
https://towardsdatascience.com/you-are-probably-not-making-the-most-of-pandas-read-csv-function-51bcf069e646

Disasters I’ve seen in a microservices world – I experienced most of the disasters described in this post and totally agree with the bottom line – “These edge cases become the new normal at a certain scale, and we should cope with them.”

https://world.hey.com/joaoqalves/disasters-i-ve-seen-in-a-microservices-world-a9137a51

Chess2Vec – while there have been many x2Vec works in recent years, this one is passion-based. The writer is an informatics professor who wanted to apply the algorithm to a hobby of his – chess. I think this is a great example of a side project and I would love to see more such combinations.
https://towardsdatascience.com/chess2vec-map-of-chess-moves-712906da4de9

Driving Cultural Change Through Software Choices – there are several approaches to driving change; this post presents a somewhat more immediate, straightforward, role-model approach. The author’s idea is that if you choose or provide tools that reflect your values, your team will adopt those values too.

https://skamille.medium.com/driving-cultural-change-through-software-choices-bf69d2db6539

Letter to (new) managers – an insightful post for managers and people who strive to become managers. Two quotes I liked – “Trust is consistency over time” and “We start managing others the way we manage ourselves, but to do better, we need to learn new tools and use them adaptively.”. Managing others the way we manage ourselves is one of the most common mistakes I have seen managers make, and I try to be super aware of it myself.

https://productlessons.substack.com/p/letter-to-new-managers

5 tips for using Pandas

Recently, I worked closely with Pandas and found out a few things that might be common knowledge but were new to me and helped me write more efficient code in less time.


1. Don’t drop the na

Count the number of unique values including Na values.

Consider the following pandas DataFrame –

import pandas as pd

# 13 purchases by 5 users; 5 of the purchases have no discount code
df = pd.DataFrame({"userId": list(range(5))*2 + [1, 2, 3],
                   "purchaseId": range(13),
                   "discountCode": [1, None]*5 + [2, 2, 2]})

Result

If I want to count the discount codes by type I might use –  df['discountCode'].value_counts() which yields – 

1.0    5
2.0    3

This will miss the purchases without discount codes. If I also care about those, I should do –

df['discountCode'].value_counts(dropna=False)

which yields –

NaN    5
1.0    5
2.0    3

This is also relevant for nunique. For example, if I want to count the number of unique discount codes a user used – df.groupby("userId").agg(count=("discountCode", lambda x: x.nunique(dropna=False)))

See more here – https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nunique.html
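As a runnable illustration of the difference, the sketch below rebuilds the example DataFrame and counts unique discount codes per user with and without NA values. It calls nunique directly on the grouped column, which is a shorter alternative to the lambda above; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"userId": list(range(5)) * 2 + [1, 2, 3],
                   "purchaseId": range(13),
                   "discountCode": [1, None] * 5 + [2, 2, 2]})

codes_per_user = df.groupby("userId")["discountCode"]

# Default behaviour: purchases without a discount code are ignored.
without_na = codes_per_user.nunique()

# dropna=False: "no discount code" counts as one more distinct value.
with_na = codes_per_user.nunique(dropna=False)

comparison = pd.DataFrame({"without_na": without_na, "with_na": with_na})
print(comparison)
```

Every user in this data has at least one purchase without a code, so each `with_na` count is exactly one higher than the matching `without_na` count.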

2. Margin on Row \ columns  only

 Following the above example, assume you want to know, for each discount code, which users used it and, for each user, which discount codes she used. Additionally, you want to know how many unique discount codes each user used and how many unique users used each code. You can use a pivot table with the margins argument –

df.pivot_table(index="userId", columns="discountCode",
               aggfunc="nunique", fill_value=0,
               margins=True)

Result –

It would be nice to have the option to get margins only for rows or only for columns. The dropna option does not act as expected – the NA values are taken into account in the aggregation function but are not added as a column or an index in the resulting DataFrame.

3. plotly backend


Pandas’ plotting capabilities are nice, but you can go one step further and use plotly very easily by setting it as the pandas plotting backend. Just add the following line after importing pandas (no need to import plotly, but you do need to install it) –

pd.options.plotting.backend = "plotly"

Note that plotly still doesn’t support all pandas plotting options (e.g. subplots, hexbins) but I believe it will improve in the future.


See more here – https://plotly.com/python/pandas-backend/


4. Categorical dtype and qcut

Categorical variables are common – e.g., gender, race, part of day, etc. They can be ordered (e.g. part of day) or unordered (e.g. gender). Using the categorical data type one can validate data values better and compare them in case they are ordered (see user guide here). qcut bins numeric values into quantile-based buckets, with a customizable number of bins and labels, and returns them as categorical data.

See documentation here and the post that caught my attention about it here – https://medium.com/datadriveninvestor/5-cool-advanced-pandas-techniques-for-data-scientists-c5a59ae0625d
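A short sketch of both ideas, with made-up data: an ordered categorical dtype for part of day (validation and comparison), and qcut for turning amounts into four equally sized buckets. The category names, labels and values are illustrative assumptions.

```python
import pandas as pd

# Ordered categorical: the order of `categories` defines the comparison order.
part_of_day = pd.CategoricalDtype(
    categories=["morning", "noon", "evening", "night"], ordered=True
)
visits = pd.Series(["noon", "morning", "night", "evening"], dtype=part_of_day)

later_than_noon = visits > "noon"   # element-wise comparison using the category order
earliest = visits.min()             # min/max are defined for ordered categoricals

# Validation: values outside the declared categories become NaN.
with_typo = pd.Series(["noon", "brunch"], dtype=part_of_day)
invalid_mask = with_typo.isna()

# qcut: split numeric values into 4 quantile-based buckets of equal size.
amounts = pd.Series([5, 10, 20, 40, 80, 160, 320, 640])
buckets = pd.qcut(amounts, q=4, labels=["low", "mid", "high", "top"])
```

Because qcut cuts at quantiles rather than at fixed widths, each of the four buckets above ends up with two values even though the amounts are heavily skewed.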

5. tqdm integration


tqdm is a progress bar that wraps any Python iterable. You can also use it to follow the progress of pandas apply by calling progress_apply instead of apply (you need to initialize it first with tqdm.pandas()).

See more here – https://github.com/tqdm/tqdm#pandas-integration
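A minimal sketch of the integration, assuming both pandas and tqdm are installed; the column names and the doubling function are placeholders for a slower real computation.

```python
import pandas as pd
from tqdm import tqdm

# Registers progress_apply (and friends) on pandas objects.
tqdm.pandas(desc="doubling")

df = pd.DataFrame({"value": [1, 2, 3]})

# Same result as .apply, but a progress bar is drawn while it runs.
df["doubled"] = df["value"].progress_apply(lambda v: v * 2)
```

On a three-row frame the bar finishes instantly; the benefit shows up with slow per-row functions on large frames.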