The day-to-day work of software engineers is shifting – less time is spent on implementation, and more on specifying, planning, and reviewing. As AI coding tools improve, the bottleneck moves upstream and downstream: writing clear specs, structuring changes well, and conducting thoughtful reviews become the highest-leverage skills.
One surprisingly weak point in that workflow is GitHub PR file order. Files are shown in alphabetical order, which rarely reflects the logical flow of the change. Good reviews are narrative: start with the contract, then the core logic, then the edges and tests. When the order is wrong, reviewers waste cognitive energy reconstructing the story instead of evaluating the change.
That realization pushed me to build a small Chrome extension that lets you reorder files in a PR so the review reads in the right order. Check it out here.
It was also a personal reminder of the joy of doing something for the first time – I had never built a Chrome extension before. My first attempt was messy and unsuccessful, so I wiped everything and restarted using Spec-Driven Development (SDD) with OpenSpec. The second iteration was dramatically smoother, more structured, and consumed fewer tokens. In a world where implementation is getting cheaper, clarity in specs, understanding of trade-offs, and the review experience are quickly becoming the real craft of engineering.
Fun fact about me – I played Rugby Sevens for years, including on a national team. It taught me more about engineering leadership than I expected. Small teams, huge space, high speed – Sevens and modern engineering run on the same dynamics.
Here are 7 lessons Sevens taught me about engineering leadership.
1️⃣ It’s Not About Weight – It’s About Teamwork
The first lesson I learned came from scrumming with a weight disadvantage but a huge coordination advantage. In Rugby (in 15s even more than in 7s), you don’t win scrums because your players are heavier. A team wins because: you bind together, you push at the same moment, you trust the rhythm. In lineouts, timing matters more than height, and coordination matters more than individual strength.
In engineering, this shows up everywhere:
A brilliant engineer misaligned with the team slows progress.
A “lighter” team moving in sync outperforms a group of heavy hitters pulling in different directions.
Velocity is not the sum of individual forces. It’s a synchronized force.
Lesson: Coordination multiplies talent.
2️⃣ Communication Should Be Constant and Loud
In Rugby Sevens, silence is dangerous. Players shout constantly:
“Inside!”
“Switch!”
“Up!”
“Blind!”
It’s not noise – it’s alignment. There’s too much space and speed to assume others know what you’re thinking.
In an office, you get accidental alignment:
Body language
Whiteboard moments
Corridor clarifications
In remote work, silence becomes ambiguity. If communication isn’t explicit and frequent, gaps appear.
Lesson: Constant communication creates alignment. Alignment brings results.
3️⃣ You Can Only Pass Backwards
In rugby, you’re only allowed to pass the ball backwards.
If you want to move forward, you must either run forward yourself or align with teammates who are already in motion.
You can’t throw the ball ahead and hope someone figures it out. The constraint forces structure.
To gain ground, teams spread wide.
They create overlap.
They run support lines before they’re needed.
They time short, precise passes.
You go forward by passing backward – and that only works with discipline. In engineering, this principle shows up everywhere. You can’t “pass forward” sloppily:
Product can’t throw half-defined specs over the wall
Engineering can’t push messy code to QA
Leadership can’t announce strategy without alignment
When work is tossed ahead without positioning and support, the play breaks. Real velocity doesn’t come from heroic sprints. It comes from synchronized movement.
Too many handoffs? You lose momentum. Too few? You get isolated and tackled.
4️⃣ Reset Between Games
In Sevens tournaments, you play multiple games on the same day.
Win big? Reset. Lose badly? Reset.
The next kickoff has no memory.
Engineering teams often carry emotional baggage:
A big launch → complacency
An outage → overreaction
Strong teams don’t ignore outcomes – they process them quickly. They celebrate briefly, learn fast, and show up focused for the next “game.”
Lesson: Don’t let yesterday’s result dictate today’s execution.
5️⃣ Scoring Under the Posts vs Securing the Try
In rugby, when a player breaks through, they often try to run closer to the center before grounding the ball.
Why? Because it makes the conversion kick easier.
But that extra effort increases the risk of being tackled and losing the try entirely.
In engineering:
Do we ship now?
Or optimize a bit more?
Refactor first?
Polish the UI further?
Sometimes we optimize the conversion and lose the try.
Lesson: Secure value first. Optimize second.
6️⃣ You Don’t Always Need More Resources – Just a Change of Angle
In Sevens, attacking the wide-open side is obvious. Great teams exploit the blind side instead.
You don’t need more force. You need better perspective.
In engineering:
Reframing a product problem
Reorganizing teams instead of hiring more
Solving a process issue instead of pushing harder
Sometimes the breakthrough isn’t scaling effort – it’s shifting angle.
Lesson: Strategy beats brute force.
7️⃣ Commit Fast – Adjust Faster
In Rugby Sevens, hesitation kills.
Pause before a tackle, you miss. Delay the pass, the overlap’s gone. Half-commit to a line break, and you get isolated.
The game rewards decisiveness.
Engineering leadership is the same. Over-analysis drains momentum. Once you pick a direction, the team needs full commitment – not tentative buy-in where everyone’s still hedging.
But here’s the nuance: when a defender overcommits, great players side-step. Decisiveness doesn’t mean rigidity. You commit fully, and when new information shows up or the situation shifts, you adjust fast – without ego, without drama.
The Quarterback Paradox – I’m not sure the phenomenon it describes is truly a paradox (recruiting for a critical position is hard even when you have plenty of data), but I love and strongly agree with the post’s closing line: “As in the NFL, in organizations the hardest part is often not finding talent, but creating the conditions in which real potential does not break before it has a chance to become reality.”
What LEGO Can Teach Us about Autonomy and Engagement – Who doesn’t like LEGO? We all played with it as children, and some of us still build today. In this post, Pawel Brodzinski describes a neat experiment he runs in training sessions – teams first build a LEGO set under a manager’s direction, then self-organize for a second build, and consistently report higher engagement when given more autonomy. While it shows a clear effect, the experiment has some drawbacks – most notably an order effect: the self-organized build always comes second, so the engagement boost could partly stem from participants being warmed up and more comfortable rather than from autonomy alone. Always nice to read about LEGO as an adult.
Skyll – Skills are markdown instruction files that teach AI coding agents how to perform specific tasks. Today, skills must be manually installed before a session, meaning developers need to know upfront which skills they’ll need. Skyll is an open-source search engine and API that lets any AI agent discover and retrieve skills on demand at runtime, ranked by relevance, without pre-installation. You can think of it as a package manager for agent capabilities, enabling agents to be truly self-extending and autonomous.
Skyhook.io radar – Existing K8s dashboards tend to be heavyweight or cloud-dependent, or to require cluster-side components. Radar’s zero-install, single-binary approach with real-time topology and traffic visualization answers the need of developers and platform teams who want quick, frictionless cluster observability that can even run on a laptop. It’s especially useful for DevEx-focused teams looking to reduce the friction of Kubernetes debugging and operations.
Babysitter – If you’ve worked with coding agents, you’ve probably experienced this pain: a lack of structured process control and non-deterministic workflows. Babysitter lets you define iterative workflows (research → spec → TDD loop → quality gate → deploy) that are deterministic, resumable across sessions, and auditable, which is critical for moving AI-assisted development from ad-hoc experimentation toward reliable, production-grade engineering workflows and complex features.
With the growing adoption of AI and recent moves like Anthropic’s Cowork plugin marketplace, there’s a popular narrative that traditional SaaS is dead. The implication is that the combination of AI agents + marketplaces will commoditize software entirely, and the old subscription paradigms won’t survive.
I tend to believe that’s an overstatement. SaaS isn’t dying – it’s evolving. What is under threat is SaaS as we’ve known it: long-term seat licenses, one-size-fits-all tiers, and feature-driven pricing. AI makes core functionality easier to replicate and access, pushing raw features toward commodity status. The value and pricing increasingly lie in outcomes, workflows, integrations, and customer trust. More than in the past, pricing is a strategic tool. See a similar conversation in David Ondrej’s post on Twitter (link in the first comment).
That’s why understanding the possible pricing models and their tradeoffs matters more than ever. I’ve been listening to Ulrik Lehrskov-Schmidt’s webinar series on Agentic AI pricing –“Making Money in Uncertain Times.” In a world where both cost structures (compute, labor, infrastructure) and capabilities (model performance, automation) are shifting rapidly, we can’t anchor pricing to static assumptions. Instead, pricing needs to reflect real value delivery, signal predictable economics to customers, and align engineering decisions with business outcomes.
This isn’t just a GTM or sales conversation – it’s also a core product and engineering conversation. How we design systems, how we think about metrics and outcomes, and how we package those outcomes for customers all influence pricing levers.
Coding agents are no longer a novelty – they’re everywhere. Over the past year, we’ve seen massive adoption across startups and enterprises, alongside real improvements in autonomy, reasoning depth, and multi-step code execution. Tools like Claude Code, Codex, Copilot, and Kiro are shipping updates at a relentless pace, and teams are increasingly comfortable letting agents refactor modules, write tests, and manage pull requests.
But there’s a catch: these tools are token eaters. Autonomous agents don’t just answer a prompt – they plan, reflect, re-read the codebase, call tools, retry, and iterate. At scale, that translates into serious API bills.
That’s why we’re seeing growing interest in a different deployment pattern: running coding agents against local or self-hosted models. Ollama recently announced ollama launch, a command that sets up and runs coding tools such as Claude Code, OpenCode, and Codex with local or cloud models. vLLM, LiteLLM, and OpenRouter also provide similar integrations. That signals that this is no longer fringe experimentation. For many teams, local LLMs are emerging as a viable path to reduce cost, improve stability, and gain tighter control over privacy.
Deployment models for coding agents
When teams talk about “running models locally,” they often mean different things. In practice, there are three distinct deployment patterns – and they differ meaningfully in cost structure, performance profile, and governance posture.
Local (Developer Machine) – the model runs directly on a developer’s laptop or workstation (e.g., via Ollama).
Hosted (Org-Managed Infrastructure / VPC) – the organization runs the model on its own infrastructure, either on-premises GPU servers or in a private cloud/VPC (e.g., via vLLM, Kubernetes, or managed GPU clusters).
Managed LLM API (e.g., Anthropic, OpenAI, etc.) – the model runs fully managed by a provider; the organization interacts via API.
| Dimension | Local (Dev Machine) | Hosted (Org VPC / On-Prem) | Managed LLM API |
|---|---|---|---|
| Cost Structure | No per-token fees. Hardware cost borne by the developer. Cheap at small scale; uneven across the team. | No per-token fees. Significant infra + ops cost. Economical at scale if usage is high. | Usage-based (per token / per request). Predictable but can become very expensive with agent loops. |
| Cost at Scale (Agents) | Hard to standardize; limited by laptop GPU/CPU. | Strong cost efficiency at high volume. | Token costs compound quickly. Expensive in large org rollouts. |
| Performance (Latency) | Very low latency locally, but limited by hardware. Large models may be slow or impossible. | Good latency with a well-provisioned GPU cluster. Can optimize with batching. | Typically excellent latency and throughput; globally distributed infra. |
| Model Size / Capability | Limited to smaller models (typically 7B–34B; maybe 70B with strong GPUs). | Can run large open models (70B+), depending on infra budget. | Access to frontier SOTA models (often strongest reasoning & coding quality). |
| Quality (Coding Tasks) | Improving. “Good enough” for many workflows, especially with fine-tuned coding models. | Strong – can choose the best open models and fine-tune internally. | Often highest raw reasoning quality and reliability on complex multi-file tasks. |
| Security / Privacy | Code never leaves the device. Strong for IP protection. Risk: inconsistent security posture across developers. | | |
| Tool | Role | Description |
|---|---|---|
| Ollama | Local model runtime | Lightweight CLI + API that serves models locally; integrates with multiple agents (Claude Code, Codex, Droid, OpenCode) and supports on-prem inferencing with moderate hardware. |
| vLLM | High-performance LLM server | Optimized for scalable reasoning and long-context LLM inference; integrates with agents (e.g., Claude Code) via Anthropic-Messages API compatibility. |
| OpenRouter | Unified LLM API broker | Central API layer for 400+ LLMs including local and cloud endpoints; can route agents to preferred backends with cost/redundancy optimization. |
| LiteLLM | Unified LLM API | Enables developers to use many LLM APIs, such as OpenAI, Anthropic, Gemini, and Ollama, in a single, OpenAI-compatible format. |
Notable models

| Model | Primary Use | Latest Release |
|---|---|---|
| Qwen3-Coder | Alibaba’s 480B-parameter MoE coding model; SOTA results among open models on agentic coding tasks. | |
1. Cost Will Drive Hybrid Routing
Cost is the most immediate driver. Autonomous coding agents are token-intensive by design. At enterprise scale, those token costs compound quickly.
Local inference eliminates per-token fees, which makes it attractive for high-volume, repetitive tasks. But frontier proprietary models still maintain an edge on complex, cross-repository reasoning and edge cases. The likely outcome is not full replacement, but intelligent routing:
Simpler or repetitive tasks → local or hosted open models
Complex, cross-repository reasoning → frontier managed APIs
Tools like OpenRouter and LiteLLM are already enabling this pattern, and by the end of 2026, hybrid routing is likely to be the default deployment strategy for medium- to large-sized engineering organizations.
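As a rough illustration of what such routing could look like, here is a minimal sketch in Python. The heuristic, thresholds, and model names are all hypothetical placeholders; real routers (e.g., built on LiteLLM or OpenRouter) use richer signals such as task type, context size, and failure fallbacks.

```python
# Hypothetical sketch of task-based model routing: simple/repetitive work
# goes to a local open model, complex cross-repo reasoning to a managed API.
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    files_touched: int          # how many files the agent expects to edit
    needs_deep_reasoning: bool  # e.g., cross-repository refactors


# Assumed backend names - purely illustrative, not real endpoints.
LOCAL_MODEL = "local/qwen3-coder"
MANAGED_MODEL = "managed/frontier-model"


def route(task: Task, max_local_files: int = 3) -> str:
    """Pick a backend for a task using a naive complexity heuristic."""
    if task.needs_deep_reasoning or task.files_touched > max_local_files:
        return MANAGED_MODEL
    return LOCAL_MODEL


if __name__ == "__main__":
    simple = Task("rename a variable", files_touched=1, needs_deep_reasoning=False)
    hard = Task("refactor auth across services", files_touched=12, needs_deep_reasoning=True)
    print(route(simple))  # -> local/qwen3-coder
    print(route(hard))    # -> managed/frontier-model
```

The point of the sketch is the shape of the decision, not the heuristic itself: the routing policy lives in one place, so tightening or loosening it doesn't touch the agents.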
2. Standardization Will Lower the Switching Cost
Hybrid only works if switching models is frictionless.
As coding agents like Claude Code, Codex, Copilot, and others converge around shared inference interfaces (Ollama, vLLM, OpenAI-compatible endpoints), swapping models in and out becomes operationally simple. This reduces lock-in and makes experimentation safer. As interoperability improves, the barrier to trying local models drops dramatically – and adoption follows.
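The practical upshot: when every backend speaks the same (typically OpenAI-compatible) protocol, switching models is a configuration change rather than a code change. A minimal sketch, with the endpoint URLs and model names assumed purely for illustration:

```python
# Sketch: the same OpenAI-style chat payload works against any compatible
# backend - only the base URL and model name change. URLs are examples.
BACKENDS = {
    "local":   {"base_url": "http://localhost:11434/v1", "model": "qwen3-coder"},
    "managed": {"base_url": "https://api.example.com/v1", "model": "frontier-model"},
}


def build_request(backend: str, prompt: str) -> dict:
    """Assemble an OpenAI-compatible /chat/completions request."""
    cfg = BACKENDS[backend]
    return {
        "url": f"{cfg['base_url']}/chat/completions",
        "json": {
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
        },
    }


# Swapping backends changes the URL, not the shape of the request.
local_req = build_request("local", "write a unit test")
managed_req = build_request("managed", "write a unit test")
assert local_req["json"]["messages"] == managed_req["json"]["messages"]
```

Because the payload shape is identical across backends, experimenting with a local model is a one-line config edit, which is exactly what lowers the barrier to adoption.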
3. Open-Source Coding Models Will Close the Gap
Tool-use fine-tuning is maturing. Code reasoning benchmarks are becoming more rigorous.
By late 2026, open-weight coding models are likely to be “production-grade” for a substantial share of workflows – especially where cost control and data sovereignty matter more than absolute frontier performance.
4. Resilience Will Matter as Much as Cost
There’s also a structural pressure building: agent-driven workloads amplify the impact of API outages. When a coding agent is embedded into CI pipelines or developer workflows, downtime is no longer an inconvenience – it’s a blocker.
As usage scales, reliance on a single managed API becomes a risk vector. This will accelerate investment in redundancy:
Secondary API providers
Local fallback models
On-prem capacity for critical workflows
Summary
In 2026, hybrid won’t just be about cost optimization – it will be about operational resilience.
The future is not “local vs cloud.” It’s composable, policy-driven model infrastructure.
Organizations that treat model routing, hosting strategy, and redundancy as part of their core engineering architecture – rather than as an afterthought – will have structural advantages in cost control, privacy, and reliability.
2026 won’t be the year enterprises abandon managed APIs. It will be the year they stop depending on them exclusively.
A sense of humor is one of the most underestimated leadership skills
Following the recommendation in “The Great CEO Within,” I listened to “The One Minute Manager.” In one of the chapters, the authors mention using humor as a leadership tool. It’s not about becoming a comedian, but about showing up as human. Humor is a powerful and often overlooked tool. Why it matters:
Humor helps build trust and rapport – people are more likely to engage and collaborate when they feel comfortable and connected.
It can reduce stress and tension, boosting well-being and performance.
Humor makes leaders more approachable and memorable, signaling confidence and emotional intelligence.
Shared laughter fosters psychological safety, helping teams voice ideas and take risks.
Of course, balance is key – humor should complement clarity and respect, not replace them. Too many jokes or poorly timed humor can actually backfire, so think of it as a strategic leadership tool, not a default setting. It’s about knowing when a light moment can lower defenses, reset the room, or simply remind everyone that work is done by humans, not robots.
I finished listening to “The Great CEO Within” by Matt Mochary and Alex MacCaw. A few thoughts:
1️⃣ A tactical cheat sheet – I view the book as a tactical cheat sheet: short, practical chapters you can skim for ideas. It’s great for quick exposure, and most chapters include references for deeper dives. For me, this book has excellent value for time.
2️⃣ Revisit in the LLM era – Two well-known ideas in the book are Getting Things Done and Inbox Zero. Inbox-zero and productivity advice hit differently today. With LLMs helping triage emails, summarize threads, and highlight what actually matters, the principles remain the same – but the execution is far more automated.
3️⃣ Optimizing meetings – TL;DR: come prepared, use written communication in advance, and don’t deviate from the planned agenda. The authors suggest holding all meetings, including 1:1s, on the same day. From my experience, for 1:1s that go beyond status updates and require real attention (e.g., feedback), stacking too many of them on the same day can be overwhelming for most people.
4️⃣ “The first thing to optimize is yourself” – One of my favorite quotes from the book. It emphasizes founders’ and leaders’ health, mental and physical, something that has historically been overlooked. A good reminder that sustainable leadership starts with managing your own energy.
The book also mentions principles of conscious leadership: listening to feedback and acting on it, and not being afraid to make (and admit) mistakes. This week, I also read a blog titled “Reflection is a Crucial Leadership Skill”, which made these ideas more actionable and down-to-earth.
I started this week with deeplearning.ai’s course on semantic caching, created in collaboration with Redis. That sent me down a rabbit hole, exploring different LLM caching strategies and the products that support them.
One such product is AWS Bedrock Prompt Caching. If large parts of your prompts are static (specifically, the prefixes), retokenizing the prefix on every request is a waste of time and money. Prompt or context caching lets you process the prefix once and store it, reducing costs and improving performance.
Sounds great, right? Let’s check the pricing model. If your requests are more than 5 minutes apart, your cache will be cleared. If your requests are short, caching won’t be activated; and if the cache hit rate is low, you will pay an extra, non-usage-based premium for cache writes. I highly recommend reading the “How Much Does Bedrock Prompt Caching Cost?” section in the article “Amazon Bedrock Prompt Caching”.
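To make the trade-off concrete, here is a back-of-the-envelope calculator. The prices below are hypothetical placeholders, not Bedrock’s actual rates; the general pattern is that cache writes carry a premium over base input tokens while cache reads are heavily discounted, so the break-even point depends entirely on your hit rate:

```python
# Back-of-the-envelope prompt-caching economics. All prices are
# hypothetical placeholders (USD per 1K tokens), not real Bedrock rates.
BASE_INPUT = 0.003    # normal input-token processing
CACHE_WRITE = 0.00375  # premium to write the prefix into the cache (miss)
CACHE_READ = 0.0003   # discounted rate when the prefix is served from cache


def cost_without_cache(prefix_tokens: int, requests: int) -> float:
    """Every request re-processes the full prefix at the base rate."""
    return requests * prefix_tokens / 1000 * BASE_INPUT


def cost_with_cache(prefix_tokens: int, requests: int, hit_rate: float) -> float:
    """Misses pay the write premium; hits pay the discounted read rate."""
    hits = requests * hit_rate
    misses = requests - hits
    per_k = prefix_tokens / 1000
    return misses * per_k * CACHE_WRITE + hits * per_k * CACHE_READ


if __name__ == "__main__":
    # Example: 10K-token static prefix, 1,000 requests
    print(round(cost_without_cache(10_000, 1_000), 2))       # baseline
    print(round(cost_with_cache(10_000, 1_000, 0.2), 2))     # low hit rate
    print(round(cost_with_cache(10_000, 1_000, 0.9), 2))     # high hit rate
```

In this toy example, a 20% hit rate makes caching slightly more expensive than no caching at all (the write premium dominates), while a 90% hit rate cuts the bill by roughly a factor of five – which is exactly why the hit rate and the 5-minute eviction window matter.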
🚀 AI is moving deeper into digital health – over the past week, both OpenAI and Anthropic have introduced major features aimed at bringing powerful AI capabilities into healthcare and life sciences. Links in the first comment
🔹 OpenAI: ChatGPT Health OpenAI has launched ChatGPT Health, a dedicated health experience that lets users securely connect their medical records and wellness app data (e.g., Apple Health, Function, and MyFitnessPal) to get more informed insights about their health and wellness. The feature is designed to help people better interpret test results, prepare for doctor visits, and navigate everyday health questions — not replace clinicians. Enhanced privacy protections ensure that health chats and data remain isolated and encrypted, and that users retain full control over connections and data.
🔹 Anthropic: Claude for Healthcare & Life Sciences Following an earlier announcement regarding Claude for Life Sciences, Anthropic introduced Claude for Healthcare alongside expanded life science capabilities, bringing its Claude AI into regulated medical and scientific use cases. This includes HIPAA-ready infrastructure and connectors to industry data sources (like CMS coverage rules, ICD-10 codes, and NPI registries) to support tasks such as prior authorizations, claims management, and clinical documentation. Claude can also summarize medical histories and explain test results in plain language. On the life sciences side, new integrations with clinical trial, preprint, and bioinformatics platforms aim to accelerate research workflows and regulatory documentation.
Both announcements show the AI industry racing into digital health with different focus areas. OpenAI’s move toward personalized health guidance for individuals complements Anthropic’s broader, enterprise-oriented tools for providers and researchers. Together, they raise exciting possibilities and important questions about regulatory standards, data privacy, and the role of AI in care delivery.
Bonus – GrantFlow – a grant management platform that automates discovery, planning, and application workflows for researchers and institutions.