Andrey Kislyuk

Refrigeration Adventures

2023-09-13T01:15:00+00:00

I have a refrigerator from GE’s ZISS480 series. This particular one was made in 2003, but this family of side-by-side fridges made by GE in Louisville, Kentucky has been in production since the 1990s and is still made to this day with minor changes (which is great for parts availability), and sold under the Cafe, Profile, and Monogram brands.

I have the maintenance manual for this fridge and since it’s been in use for 20 years (we got it used), I’ve had to fix a few problems with it. Here I’ll describe the most difficult issue I’ve had to deal with - hopefully this will help you!

The problem started with the icemaker/water dispenser module. Appliance technicians recommend fridges without an icemaker or water dispenser, because most issues with the appliances seem to come from those. After this experience, I definitely agree. A separate ice maker machine that doesn’t involve your fridge seems like a more reliable solution.

Anyway, the ice dispenser chute has a flap door that opens to dispense ice, and closes when done. This area started icing over and weeping water. The icing then spread upwards into the icemaker compartment (as I learned later, this is not the kind of problem that can be ignored for long). First I changed the flap door, which did not help. After taking apart the freezer door, I realized that the flap door was not closing shut, which essentially meant the fridge always had a hole in it. The problem was in the dispenser solenoid responsible for closing the door - it had rusted through because water or condensate had been dripping on it. I replaced the solenoid, the door started closing again, and the chute stopped icing over. Yay! Problem solved! But it turned out not to be so simple.

The icing in the icemaker compartment persisted. A few months later, it grew to jam the icemaker, and a few weeks after that, the fridge stopped cooling properly. Temperatures in both compartments rose to unsafe levels and I started to stress out.

I followed the troubleshooting flowcharts in the manual, which led me to discover that the evaporator fan was blocked. On these refrigerators, the evaporator is in a compartment above the freezer, while the condenser is above the fresh food section. The evaporator fan is critical - both sections of the fridge require forced airflow. The evaporator has ducts around it that guide the air through the coils, to the fan, and to the baffle that opens to send cold air to the fresh food section.

The evaporator fan would get blocked by icicles and stop spinning. At this point I went on a bit of a wild goose chase - the flowcharts were not helping. I reached the end of the flowcharts and had a technician check the sealed system to see if it was low on refrigerant - but no, it was fine, and the tech had no idea what was wrong. He suspected that the fridge had a hole in it (which was not wrong - it had had a hole! I thought I’d fixed that problem…) Finally I decided to take apart the defrost system and see if it was working properly. I’d tried to get to it before, but the freezer ceiling panel under the evaporator and defroster was glued in place and would not budge.

The defrost system is essential to the operation of any modern fridge. Warm air enters the fridge when opening doors. As it’s cooled, the moisture condenses and must be taken care of - otherwise it will accumulate on the coldest surfaces, and eventually clog the whole fridge. So every few hours, refrigerators turn on a heater next to the evaporator coils. Frost melts and drips down from the evaporator, and is caught by a drain pan. Critically, the drain pan must keep the meltwater liquid and send it outside the fridge. This is typically done with a heated trough pan connected to a hose that runs to the outside of the fridge and then into another pan sitting under the compressor (using the heat of the compressor to speed up evaporation).

My first warning sign about the defrost system was that the pan under the compressor was bone dry. So the fridge was not able to get rid of defrost meltwater, and that’s what was dripping down into the freezer and jamming the evaporator fan. But I’d run the defrost cycle manually multiple times, felt the heat from the defrost heater, and nothing improved. What was going on?

The process of taking apart the defrost system is shown in this video. I had to apply a lot of force to get the freezer ceiling panel to come down. It was being held by a combination of glue and a huge ice sheet enveloping the entire bottom of the compartment. The defrost cycle did nothing to melt it. There was a lot more water and ice up there than I expected!

The defrost drain pan sits on top of a big styrofoam block. This styrofoam was completely iced over too, and after blowing a hair dryer at it for a while, I tilted the defrost pan and unleashed a deluge of meltwater. It would have taken many days in the sun for this block of styrofoam-clad ice to melt. But when I turned on the defrost cycle again, both heaters (on the evaporator and under the bottom of the drain pan) got nice and hot.

So now I knew why the defroster was not working. The drain pan was completely encased in ice, and any time the heater turned on, most of the heat went to trying to melt the ice under the pan. The ice in the pan never melted, clogged the drain hole, and just kept growing into an iceberg. The defroster fell further and further behind, and meltwater eventually just flowed over the pan and down below. Even though the evaporator coils themselves were not getting iced over too badly, the ice underneath was really restricting the airflow - and blocking it completely, once it got in the way of the fan.
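A rough back-of-the-envelope sketch shows why a heater stuck under an ice block falls behind. Only the latent heat of fusion of ice (about 334 kJ/kg) is a physical constant here. The heater wattage, cycle length, and ice mass are made-up illustrative numbers, not measurements from this fridge, and the model generously assumes that every joule goes into melting ice already at 0 °C.

```python
import math

# Latent heat of fusion of water ice in joules per kilogram (physical constant).
LATENT_HEAT_FUSION_J_PER_KG = 334_000


def ice_melted_kg(heater_watts: float, cycle_minutes: float) -> float:
    """Upper bound on the ice melted in one defrost cycle.

    Assumes all heater output goes into melting ice that is already at 0 C,
    which is optimistic: real ice is colder and some heat escapes.
    """
    energy_joules = heater_watts * cycle_minutes * 60
    return energy_joules / LATENT_HEAT_FUSION_J_PER_KG


def cycles_to_melt(ice_kg: float, heater_watts: float, cycle_minutes: float) -> int:
    """Number of whole defrost cycles needed to melt a given mass of ice."""
    return math.ceil(ice_kg / ice_melted_kg(heater_watts, cycle_minutes))


# Hypothetical values: a 30 W pan heater, 20-minute cycles, 2 kg of ice.
per_cycle = ice_melted_kg(heater_watts=30, cycle_minutes=20)
cycles = cycles_to_melt(ice_kg=2.0, heater_watts=30, cycle_minutes=20)
print(f"{per_cycle:.3f} kg melted per cycle, {cycles} cycles to clear 2 kg")
```

Even under these optimistic assumptions it takes many cycles to clear the block, and between cycles the freezer has hours to refreeze whatever melted.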

After making sure no ice or water was left anywhere around the evaporator, I removed much of the styrofoam (leaving only enough to support the drain pan in its original position) and put everything back together. After that, the fridge worked like new!

Root cause analysis

This experience made me understand the fundamentals of how refrigerators work better, and later I found other resources that explain what happened in more general terms. As an engineer, I like finding and fixing these kinds of issues - even though this was more stressful since my family’s food was on the line. In incident postmortems at work, engineers are trained to analyze incident timelines, response practices, contributing causes, and ultimately root causes to synthesize learnings. It’s fun to apply this practice to the fridge. What are some contributing causes that got the fridge to this point, and how could they be avoided?

The ice chute door solenoid was positioned where water or condensation was able to collect on it and rust it through. Sealing the solenoid compartment against moisture would have stopped this.
The evaporator fan is positioned in such a way that overflow from the defrost system can drip on it (and jam it when it turns to ice). Since the evaporator fan is essential to cooling, it would be better to reposition the fan or ensure overflow drips elsewhere (like through the suction side opening in the front of the ceiling panel).
The drain pan is engineered in such a way that water can get into the space between the pan and its styrofoam enclosure, or even into pores in the styrofoam enclosure. This means if the drain pan ever gets overwhelmed and overflows, an ice heatsink can form under the pan, disabling its heater and requiring the deep manual defrost that I ended up doing. It would be best to seal the space under the pan so no water can get in, making sure the space under the heater keeps acting as insulation (has low thermal conductivity) and all heat goes into melting the water next to the drain.

What about the root cause? As best I can tell, it’s that last one - a defrost system drain pan that is too easily overwhelmed by an excess of meltwater. But there’s room for ambiguity here, which is why attributing the root cause is not usually the point of incident post-mortem exercises - the focus is on learning how to operate the overall system safely and reliably.

Wrenching on a budget, 2023 mountain bike edition

2023-09-09T22:15:00+00:00

Recently I assembled a new bike from store-bought components for the first time. Now is a great time to buy a bike, since components have become cheaper after the supply chain caught up with and overshot the pandemic demand surge. Buying parts individually allows you to take advantage of sales and spend on high-end components only where you know they are worth it. For example, the largest component manufacturer, Shimano, segments its product lines and trickles down the best technology into the lower segments every year. So this year, Shimano Deore XT shifters are worth it for the double-upshift capability, but for the rest of the components, Deore or even “un-branded” Shimano can be perfectly fine.

If you’ve ever serviced your own bike, assembling a new one is not hard, and comes with a nice sense of accomplishment. You will also gain a better understanding of the fundamentals of how your bike works. YouTube videos from Park Tool, Berm Peak Express and other great channels provide a wealth of knowledge. Here I’ll list some learning tips for DIY wrenching without spending like you’re about to open your own bike shop.

To install or remove a chain, you will need a multitool that can press out a chain link pin, to shorten the chain to its desired length. This is very easy. Pressing the pin back in to connect the chain is not required if you bought a quick link chain, which snaps into place.
To install or remove a cassette, you will need a cassette lockring tool - which is fairly universal, and can also be used on Shimano centerlock disc brake lockrings. When removing a cassette, cassette pliers are recommended, but not required. To get leverage when turning the lockring counter-clockwise, you can improvise by using another rear wheel and wrapping the chain in an S shape around the biggest sprockets on both cassettes.
To install or remove a bottom bracket, you will need a bottom bracket tool, which is unfortunately a lot less standardized. This is the one place where a custom tool is both required for a full build, and also likely to depreciate over time as bike companies come up with yet another bottom bracket standard. (If you can, avoid press-in bottom brackets! They are almost impossible to service yourself if anything goes wrong.)
To install a headset, you might be told that you need a special press tool, but that’s not really the case. All you need is a threaded rod and a set of large, sturdy washers that can evenly distribute force across the top and bottom of the headset. You can pick these up at a local hardware store for less than $10. It also helps to press in one bearing cup at a time, to minimize the chance of tilting either cup and causing damage. (Removing a headset does unfortunately seem to require specialized tools, unless you get very crafty with your washer diameters.)
To install a crown race (which is the part of your headset that interfaces with your fork), you need a foot long piece of 1.5” PVC pipe to hammer it onto the fork. You don’t even need a mallet, as you can turn the fork upside down and gently tap the fork and pipe on a hard surface to bump the crown race on. A 1.5” O-ring can help protect the surface of the race. Again, these can be picked up at your local hardware store for 5 bucks or so.
To install a fork, you may need to hammer a star nut into the steerer tube. This again may seem daunting or you may see recommendations for a special tool, but all you really need is a long-ish bolt and a mallet, plus some patience to guide the star nut in while it tilts around until it gains purchase against the steerer tube all across its two sets of metal flaps.
Many YouTube videos mention this, but there is a trick to making sure your headset is tight and works correctly. If you’re not careful when installing the headset cap, you will end up using the cap and star nut to compress the top of the steerer tube instead of the entire fork/bearings/cups/head tube assembly. This will cause your bike to feel loose and wobbly. To avoid this, make sure that the topmost rim of the stem/spacer stack is above the top of the steerer tube - in other words, eliminate the possibility that when tightened, the headset cap will touch the rim of the steerer tube.
To cut your brake and cable hoses to size, you need some sort of tool that is sharp enough to make a clean cut. I used cutting pliers, but I can see a Dremel cut-off wheel or even a hacksaw working fine if you can clamp the hose without damaging it.
Connecting hydraulic brake hoses to calipers and levers may seem daunting, but if you’ve ever connected a faucet, dishwasher, or any other plumbing hose in your kitchen or bathroom, technologically this process is no different. All you need to do is jam the barbed fitting into the tip of the hose, then carefully position the hose and soft metal “olive” ring, and gently wrench the fitting together to compress the ring to form a seal.
To bleed your own brakes (or fill new brakes with hydraulic oil), all you really need is a small plastic syringe to pump oil into the caliper. The rest of the “bleed kit” can be replaced by wiping down your lever and caliper with a rag.

And that’s it! Hopefully this gives you a sense of satisfaction in putting together your own ride!

OSS developer experiences: part 2

2022-09-09T21:01:00+00:00

This is a follow-up to my earlier post, OSS developer experiences.

In that post, I wrote about a comment that someone made on GitHub telling me to delete my project because it was “causing problems”. At first I just closed the issue and moved on, but, as I put it earlier:

Only recently while sorting through a backlog of issues in need of attention across my OSS projects did I look back at this and realize the ridiculous insidiousness of this comment. … To tell an OSS developer that they should delete their project is a completely unacceptable way of communicating. It’s a form of harassment, and I will call it out as such. The main reason I bring this up here is to raise awareness. I have enough experience to blow this off and move on without a second thought, but I can easily imagine myself 10 years earlier, when some comment like this from someone I perceive to be competent might discourage me from continuing altogether.

Sadly, I’ve recently encountered another form of misbehavior on the part of an OSS community participant who styles himself an “Open Source Contributor”. Apparently, the way this person chose to contribute is by copying a project of mine, stripping the LICENSE file out of the project, replacing my name with his own name throughout the project documentation, and publishing the project on GitHub and PyPI. While this person did delete most of the code in the project (making the result non-functional), he kept much of the documentation in place, making it obvious where it was copied from.

I release my OSS projects under the Apache license, which I think strikes the right balance between attribution and freedom of re-use. Specifically, I treat the software I release as if it was an academic paper: if you use it as inspiration, it would be nice if you cited me, but it’s not required; if you copy my work directly, it is required that you acknowledge my work as the origin. Doing otherwise and distributing the result is a breach of the license and a copyright violation. Many software engineers and OSS contributors work on derivatives of SignXML (GitHub keeps track of 90 forks - and those are just the forks made in the UI). I’m glad that the project is useful to them, and I’m grateful for their contributions, including bug reports and PRs. This person, though, chose to explicitly violate the license.

When I contacted this person to explain the situation and request attribution, he denied that he copied anything (despite the fact that the copies remained in the project’s git history), saying “even if I use signxml in the name of this project both projects are completely different. So, please tell me what I copied from you project line by line.” He then closed, and deleted from GitHub, both the original issue and the follow-on where I explicitly requested that he abide by the terms of the license, but there he implicitly acknowledged copying the documentation by saying, “BTW, apart from docs there nothing in common about these two projects”. Finally, he deleted the contents of the README.rst file in the root of the project (while the infringing copies, again, remained in the project’s git history and on PyPI).

The main reason for writing this post, like the previous one, is, again, awareness. As an OSS developer, you might face situations where people steal your creative output, plagiarize, misattribute it to themselves, and otherwise abuse your work. It’s quite likely that you won’t have the legal or administrative resources to do anything about this violation (I did try). That’s life; not everything in life is perfectly fair or goes our way. But, to quote my previous post again:

Among the many issues in the world, this one should be relatively easy to spot and eliminate. If you see someone [doing] something like this or otherwise giving grief to an OSS maintainer, chime in and call it out for what it is: harassment. Perhaps the harasser will adjust their behavior, perhaps not, but in any case it can give the maintainer confidence to ignore them. You never know when this bit of support and encouragement might foster the next confident individual, and inspire their amazing contributions to society.

And perhaps think of any unsponsored open source software that you use and admire the most, and reach out to its volunteer maintainers to thank them (I know I don’t do this enough).

The only thing I’ll add is: act to give the maintainer confidence to ignore or confront the harasser - giving them the choice.

Cromulent: WDL workflows with miniwdl and AWS Fargate

2020-10-12T04:19:35+00:00

Cromulent: a workflow management toolkit that’s perfectly adequate and not quite as daunting to set up as Cromwell

In this post, I’m going to go over some of the technologies that power my day-to-day work as a bioinformatician, as well as exciting developments that enable these technologies to serve bioinformaticians better going forward.

Background

Bioinformatics workflows (often referred to as pipelines) typically process data using a series of Linux executables, feeding the output of each one to the next. Each of the executables is a bioinformatics software package responsible for (often very sophisticated and idiosyncratic) data processing steps such as filtering input DNA (cleaning data), aligning DNA to a reference genome database, calling variants (identifying unique features in individual genomes), clustering and normalizing DNA reads for quantitative DNA analysis assays, annotating genetic features, de novo genome assembly, etc.
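As a minimal sketch of this "series of executables" pattern, the toy pipeline below runs each step as a separate process and feeds its output to the next one. The two steps are hypothetical stand-ins (tiny inline Python programs launched with the current interpreter), not real bioinformatics tools. A real pipeline would invoke an aligner, a variant caller, and so on.

```python
import subprocess
import sys

# Hypothetical step 1: drop reads containing an ambiguous base "N" (data cleaning).
FILTER_READS = """
import sys
for line in sys.stdin:
    if "N" not in line:
        sys.stdout.write(line)
"""

# Hypothetical step 2: count the reads that survived filtering.
COUNT_READS = """
import sys
print(sum(1 for _ in sys.stdin))
"""


def run_step(code: str, data: str) -> str:
    """Run one step as its own process, passing data through stdin/stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=data,
        capture_output=True,
        text=True,
        check=True,  # fail loudly if a step exits with a non-zero status
    )
    return result.stdout


def run_pipeline(steps, data: str) -> str:
    """Feed the output of each step to the next one, in order."""
    for code in steps:
        data = run_step(code, data)
    return data


reads = "ACGT\nANNT\nGGCA\n"
result = run_pipeline([FILTER_READS, COUNT_READS], reads)
print(result.strip())
```

Everything a workflow framework adds (typed inputs and outputs, containers, scheduling, retries) is layered on top of this basic chaining.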

At their basic level, bioinformatics workflows (just like many other scientific and business workflows) can be expressed as shell scripts. A long time ago, scientists would submit these scripts to a batch job scheduler like PBS, SGE, Condor or LSF. The scheduler would dispatch the script to run on a computer that was part of a cluster and connected to a shared NFS filesystem. Today, these same scripts can be run on cloud APIs like AWS Batch, which enable several key improvements:

They use Docker containers, which empower developers to use whatever software they need, and play a key role in enabling reproducibility (a big priority in science).
They provide on-demand compute capacity, which allows you to burst scale your “cluster” to very large numbers of VMs without paying for running them all the time (scientists tend to run workloads in very lumpy ways - the load on your system will often go from zero to many thousands of CPUs as experiments are prepared, run, analyzed and re-analyzed).
They allow easy access to the resulting data (for example, by uploading it to S3, where any authorized web application can access it).
They lower the barrier to entry: anyone can create a cloud PaaS account for free and pay as they go to operate scientific computing infrastructure that previously required hundreds of thousands of dollars of upfront investment and approval by institutional administrators.

Shell scripts are great, but it’s not clear how they can be used as units of reproducible/composable scientific computing. The community has come up with a bewildering array of workflow management frameworks, which by and large attempt to address the key needs for describing workflow inputs and outputs and establishing the runtime contract between the workflows and their execution environment. Given the number of “mostly dead” projects on the list linked above, it is clear that the problem of workflow description and management is very non-trivial and requires careful attention to software interfaces, abstractions, and community engagement for any tool or project that hopes to succeed in this space.

Subjectively, the workflow description and management projects with the most traction in the community are Snakemake, Nextflow, CWL and WDL. Each of these projects boasts a vibrant user community, a rich feature set, and a sustained development history. In my day-to-day work I have ended up using WDL extensively. This is in large part due to the excellent work of my colleague Mike Lin. Thanks to Mike, WDL is the only workflow language with multiple production runtime (interpreter) implementations, and one of very few with a formal, tested language specification. The next ingredient for success is a deep focus on developer tooling provided by the miniwdl project. (My employer, CZI, sponsors miniwdl development; the CZI EOSS grant program also sponsors Nextflow and CWL).

Miniwdl is a WDL interpreter and runtime (workflow runner). Like all foundational software, miniwdl started with a deep, informed focus on developer productivity: it emphasizes speed of execution, usable interfaces and abstractions, and good error messages that support the developer when something goes wrong. In fact, miniwdl started as a linter, which reflects this focus on developer experience.

By itself, miniwdl is great at running workflows on a single computer, but it does not yet integrate directly with cloud-based systems like the AWS Batch API that I mentioned earlier. And this is where I finish setting the stage and dive into the novel part of this blog post: a miniwdl plugin providing an execution backend to run WDL workflows on AWS Fargate containers.

Wiring up correctly

Great software is defined by its interfaces. The interfaces embody the commitment to the user (or developer). Stable, well-specified, well-documented interfaces at well-chosen abstraction boundaries enable software to evolve and grow over time. In the context of workflows, a workflow engine can easily fall into the trap of tying itself too closely to a particular execution environment infrastructure (or conversely, under-specifying its execution environment contract). So wherever reasonable, miniwdl provides a plugin interface to maintain and provide future flexibility as to which cloud API the workflow tasks can run on, or what kinds of URLs can be given to the workflow as inputs.

I used the miniwdl task container backend plugin API to run WDL workflows on AWS Fargate. Why Fargate? Because of its advantage in latency, or dispatch time. When developing workflows locally, miniwdl uses the Docker daemon on the local computer to orchestrate tasks - one Docker container per task. This is excellent for local debugging - tasks start as quickly as Docker can run them, within seconds. But if you want to run your workflow in a production-representative environment, you’re going to be dispatching it to a cloud container management API like Kubernetes or AWS Batch. And in those APIs, it can take a long time - minutes - to provision a VM for your workflow. That will kill your development velocity - and it will also pile up burst scaling latency and job dispatch overhead in production. With Fargate, containers take about 10 seconds to start (not perfect, but still much faster than cold-starting containers with other cloud APIs).

Another aspect of this work is to make these tools accessible to anybody with an AWS account - and to make sure that the baseline compute capacity is zero, so there are no nasty billing surprises at the end of the month. To achieve this, I used the functionality in the aegea package to enable automatic provisioning of a Fargate ECS cluster and its associated resources.

Running the workflow

So what does this look like in practice? To get started, we first need to create an Amazon EFS filesystem which will be used to handle inputs and outputs for our workflow, and an Amazon EC2 instance which will orchestrate the workflow. You can follow the official AWS EFS guide, or install the aegea tool and run aegea efs create wdltest --tags mountpoint=/mnt/efs followed by aegea launch wdltest --security-group aegea.efs and aegea ssh wdltest to launch and connect to the EC2 instance.

After these steps are complete, let’s install the prerequisites on the orchestrator EC2 instance:

pip3 install git+https://github.com/chanzuckerberg/miniwdl-plugins#subdirectory=aws-fargate

To configure miniwdl to use the Fargate scheduler, we need to set the container_backend config option by either running

export MINIWDL__SCHEDULER__CONTAINER_BACKEND=aws_fargate

or adding the following lines to the file ~/.config/miniwdl.cfg:

[scheduler]
container_backend = aws_fargate
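The two forms above appear to follow one naming pattern: MINIWDL__SECTION__KEY in the environment corresponds to KEY under [SECTION] in the config file. The sketch below illustrates that mapping using only Python's standard configparser module. The helper names are hypothetical, and this is not miniwdl's actual configuration loader.

```python
import configparser
import io


def env_to_option(name: str, prefix: str = "MINIWDL"):
    """Split PREFIX__SECTION__KEY into a lowercase (section, key) pair."""
    parts = name.split("__")
    if len(parts) != 3 or parts[0] != prefix:
        raise ValueError(f"not a {prefix}__SECTION__KEY name: {name}")
    return parts[1].lower(), parts[2].lower()


def render_config(env: dict) -> str:
    """Render environment-style settings as INI text."""
    config = configparser.ConfigParser()
    for name, value in env.items():
        section, key = env_to_option(name)
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, key, value)
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


text = render_config({"MINIWDL__SCHEDULER__CONTAINER_BACKEND": "aws_fargate"})
print(text)
```

The printed text contains the same [scheduler] section and container_backend line shown above.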

For our first test workflow run, let’s use the Dockstore md5 checksum test tool. Assuming the EFS filesystem is mounted on /mnt/efs, use the following command to run the workflow and produce a checksum of the /bin/bash executable:

miniwdl run https://raw.githubusercontent.com/briandoconnor/dockstore-tool-md5sum/1.0.4/Dockstore.wdl inputFile=/bin/bash --verbose --dir /mnt/efs --as-me

Behind the scenes, there are several things that the miniwdl and aegea tools are managing for us. To access the EFS filesystem, the orchestrator instance and the Fargate container must both be in a security group that provides network connectivity to it. To run a Fargate task containing the WDL task, an ECS cluster must first be created. And to stream WDL task logs while the container runs, miniwdl and the Fargate plugin must work together to monitor the runtime state. All of these things are taken care of automatically, but if you want to find out more, have a look at the miniwdl and aegea docs.

Next up

Amazon Web Services and Google Cloud Platform are the two main competitors pushing the state of the art in value-added managed cloud orchestration services. A recent post by Tim Bray sheds some light on the “cloud glue” level workflow orchestration services that Amazon and Google provide, their advantages and drawbacks. In production workloads that I am responsible for, we use AWS Step Functions to orchestrate cloud-level workflows. This approach makes the managed API itself responsible for the top level state of your workflow; you no longer have to worry about managing a central server and database (or distributed locking/state management system) to dispatch your workflows and keep track of their state, and all the headaches that go along with that. You also gain a much tighter integration between workflow management and other cloud APIs. We leverage this to great effect to string together AWS Batch jobs and AWS Lambda function executions - a powerful combination that increases the dynamic range of our workflow resource management to span from sub-second dispatch latency, 128MB RAM, fractional-core Lambda containers to 4TB RAM, 128-core instances if needed for heavy duty tasks. On the flip side, by using these APIs you do lock yourself into a more cloud-specific ecosystem. So to keep our workflows portable and available to the community, we combine the two approaches: we use Step Functions for cloud-level orchestration, and miniwdl/WDL for instance-level orchestration. Step Functions ends up being responsible for managed resource provisioning and state tracking, and miniwdl/WDL for type checking and I/O marshalling.

In my next post, I will cover the details of how we use AWS Step Functions, as well as spot (preemptible) instances - an important cost optimization tool that is now integrated into both AWS Fargate and AWS Batch.

OSS developer experiences

2020-09-21T04:19:35+00:00

I have been an open source software developer since 2005. There are many reasons why I contribute OSS, but the bottom line is that writing software that will be free forever and that others find useful brings me great joy and fulfillment. I’m sure I’m not alone in this. I’ve acquired my most important software engineering skills in the context of OSS development. I’ve been lucky enough to be mentored by very talented engineers and to mentor others in venues like GSoC as well.

And I’ve been lucky enough to author and contribute a few packages that have enjoyed modest popularity. As any developer can attest, the software that ends up being popular is not the most algorithmically or scientifically profound, nor what I consider the most valuable. And that’s fine.

A few years ago, I wrote a utility that ended up being among my more popular projects. A few months later, someone else started a similar project with different design goals, and unintentionally used the same name as mine. That’s fine, name collisions happen. We can coexist.

I don’t go out seeking controversy when doing OSS work. I also don’t do growth hacking on my OSS projects; all my projects grew their adoption organically, and I’m proud of that. So imagine my surprise when a few weeks ago someone opened an issue in my project saying that it “should be deprecated in favor of (other project)”, that I should upload the other project’s build artifacts to the package manager to replace mine, and that my project “creates problems.”

At the time I had a lot of other things to take care of urgently, so I just looked at the issue in bewilderment, closed it, and moved on. Only recently while sorting through a backlog of issues in need of attention across my OSS projects did I look back at this and realize the ridiculous insidiousness of this comment.

Open source software developers volunteer their time to contribute software for free as an unambiguous public good. To tell an OSS developer that they should delete their project is a completely unacceptable way of communicating. It’s a form of harassment, and I will call it out as such.

The main reason I bring this up here is to raise awareness. I have enough experience to blow this off and move on without a second thought, but I can easily imagine myself 10 years earlier, when some comment like this from someone I perceive to be competent might discourage me from continuing altogether.

Among the many issues in the world, this one should be relatively easy to spot and eliminate. If you see someone saying something like this or otherwise giving grief to an OSS maintainer, chime in and call it out for what it is: harassment. Perhaps the harasser will adjust their behavior, perhaps not, but in any case it can give the maintainer confidence to ignore them. You never know when this bit of support and encouragement might foster the next confident individual, and inspire their amazing contributions to society.

And perhaps think of any unsponsored open source software that you use and admire the most, and reach out to its volunteer maintainers to thank them (I know I don’t do this enough).

Effective communication patterns in engineering teams

2020-09-21T04:19:35+00:00

Over the course of my career, I have been part of several science and engineering teams. I have seen great leaders and terrible leaders, as well as junior team members who grew quickly and those who got stuck. People who led by example and inspired others, and people who drove others up the wall and caused untold suffering.

The defining factor separating the good from the bad in all of the above experiences was emotional awareness and empathetic communication.

Engineering education and professional training do not prepare new graduates for the intense teamwork required in professional settings. Except in some internship opportunities, most new graduates have only episodic exposure to working in teams. While ambitious engineers usually have a good idea of the technical achievements their next growth phase might entail (or their next promotion might require), they often have no idea how, or at least no clear incentive, to improve their communication skills - skills that are critical for organizing their team, motivating and mentoring their colleagues, communicating product requirements to product management and executive stakeholders, and managing customer relationships (yes, engineers should play a role in that).

Effective communication is the limiting factor in most engineers’ professional growth.

Some scientists - and engineers with science training - have an advantage, as science training emphasizes written communication as well as teaching skills through TA work in graduate school. Even that advantage turns out to be relatively small, because most organizations don’t actually verbalize the success factors that enable effective communication in their engineering teams.

Google’s “people ops” (“HR department that hopefully doesn’t suck”) put together a nice website outlining some of these key success factors, but even that only just begins to summarize the key insights that engineers realize over time.

The key breakthrough in my understanding of the importance of practicing emotional awareness and empathetic communication came from my work with an engineering executive who stepped in to manage a bunch of teams as they struggled to find structure and clarity. After this woman joined the team, she did not focus on holding people to specific engineering objectives or product outcomes. Instead she advocated (but did not force) an overall project management process, and spent almost all of her time building relationships with team members without regard to their level in the reporting chain. Her calendar was always open for 1:1s, and after establishing initial rapport her next step was always to give the person a copy of the book, How to Talk So Kids Will Listen & Listen So Kids Will Talk.

The simple and profound truth is that truly empathizing with each person on your team is a necessary and often sufficient condition for upleveling your communication skills to the point where they are no longer a limiting factor in your professional growth. The content of the book is just as applicable to adults as it is to children. In workplace teams, the default dynamic is to suppress the negative emotions that come from neglecting our need for empathy - with major negative consequences. If even marginally emotionally abusive behavior is normalized, the team suffers permanent damage and can no longer function properly. Conversely, empathetic, emotionally aware, and comprehensive communication empowers the team to make incremental improvements which translate to greatly increased performance over time.

The last key pattern that I have observed in successful engineering teams is a productive interaction between the customer champion, the internal product stakeholder, and the technical roadmap planner. Different teams have different names for these roles, but they are commonly referred to as a UX researcher, a product manager, and a technical product manager or scrum master. Each competency in this trio is necessary for the team to succeed, and effective, high-bandwidth, low-ego communication among the three is a key determinant of team productivity and happiness - and of all the success factors listed on the Google website above. The customer champion must effectively and passionately represent the customer’s needs - in fact, they need to be backed up by each team member as well (this is the Amazon leadership principle of “customer obsession”). The product manager must determine product strategy, plan large long-term efforts, and steer the team through complex product improvements. The TPM or scrum master must have deep technical knowledge of the product’s infrastructure and the ability to plan tactical efforts to execute the team’s strategy. The TPM must also be an effective advocate for the team’s ability to manage technical debt (which works in a way that’s remarkably similar to sovereign debt). In fact, the third thing that the engineering executive I mentioned above did, after impressing the power of empathy on the team and establishing a PM process, was to inquire about the team’s technical debt level and what the team was doing to manage it over time.

The interaction among the UX-PM-TPM trio can be evaluated to determine the level of stakeholder value alignment. Do the three parties have a shared understanding of the team’s priorities and how they will deliver customer value over time? If there is ambiguity about the answer, then they can go back to the communication first principles to improve this mutual understanding.