Engineering teams are constantly driving the implementation of new features and solutions for their customers, and who doesn’t get excited for a new adventure? Customers are excited about the journey. But when your customers are in the driver’s seat and they see a Check Engine error, they know there is something wrong. Uh oh—the journey stops there.
The last thing you want to be known for is providing a broken journey, so it is important to ensure your products are in top form to give the optimal experience. Yet, providing the time and focus for reducing errors or broken experiences, known to engineers as technical debt, is another story: there is high demand to create new work, but low capacity for maintenance.
A few years ago, I was approached by a senior leader whose teams faced rising tech debt, low morale, high pressure, and high repair costs and repair times. As a process improvement person in a technical role, I was tapped to help the engineering teams solve these issues and stand up a culture of engineering excellence and ownership.
Through several remediation team sprints, our team was able to make a meaningful impact on those teams' tech debt.
The savings in technical debt capacity were massive:
- more than 50% reduction in technical debt
- approximately 7% reduction in change fail rate
- installation of automated fail-safe limits
- updated database delivery life cycle (DDLC) and software development life cycle (SDLC)
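For readers who have not worked with change fail rate before, here is a minimal sketch of how it is commonly calculated: failed changes divided by total changes. The function name and every number below are hypothetical and are not data from this engagement. The sketch also shows why it helps to say whether a reduction is measured in percentage points or relative to the starting rate.

```python
def change_fail_rate(failed_changes: int, total_changes: int) -> float:
    """Return the fraction of deployed changes that caused a failure."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    if not 0 <= failed_changes <= total_changes:
        raise ValueError("failed_changes must be between 0 and total_changes")
    return failed_changes / total_changes


# Hypothetical figures for illustration only.
before = change_fail_rate(failed_changes=30, total_changes=200)
after = change_fail_rate(failed_changes=16, total_changes=200)

# Two common ways to express the same improvement.
point_drop = (before - after) * 100               # in percentage points
relative_drop = (before - after) / before * 100   # as a share of the starting rate

print(f"Before: {before:.1%}, after: {after:.1%}")
print(f"Drop: {point_drop:.1f} points, or {relative_drop:.1f}% relative")
```

With these made-up inputs, the same improvement can be described as two quite different numbers, so it is worth agreeing on one definition before reporting the metric.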
The financial savings were enough to fund other programs and products that are much more meaningful for the organization and its customers.
This article explains what I learned about technical debt while engaging with engineering teams and highlights why and how teams and leaders can reduce tech debt to create a smooth journey for their customers.
Why should you reduce your tech debt?
Whether we like to admit it or not, technical debt exists, and it continues to grow as new features are created. This debt results in many painful experiences, such as:
- Customer or stakeholder complaints
- High repair costs, error rates, and time to repair
- Unclear job ownership and lack of prioritization for failures
- Limited performance metrics and reactive production support
In my experience with engineering teams at Discover, I realized that the biggest impact of technical debt was on our engineers themselves. They felt pressure to constantly deliver new features, but still had to patch the current, non-optimal product. Their requests to prioritize technical debt and modernization went unheard. The team needed something to change.
How to get started
Acknowledging the need for change is a milestone in and of itself, but taking the right steps to address technical debt is another. Empowered teams and technically savvy product owners will naturally make space for tech debt solutions, and our engineers were raising their hands to do just that.
What they needed was a strategy and support, and when I met with the senior leader to discuss findings, I recommended the following approach:
- Identify leaders to lead and drive the right behaviors
- Understand what is broken
- Gather a team to address the issues
Let's break these down into more detailed and actionable steps.
Identify leaders to lead and drive the right behaviors
A leader can be the one to set the tone and push to get things done. In our example, we established a manager in charge of reducing tech debt. The person we selected for our engagement was passionate, collaborative, and had the technical knowledge to understand development in detail.
When selecting someone to drive your tech debt paydown, look for someone who will:
- Drive change: use their technical expertise and passion to influence measurable impact
- Think scientifically: ensure root cause identification and solution building
- Focus at all levels on reducing incidents:
  - Focus on process: develop new standards and automate fail safes
  - Assure quality at the source: evolve processes to prevent errors from being installed initially, through internal and external resources and vendors
  - Respect every individual: create a culture of “you build it, you own it”, making everyone responsible for preventing and fixing the errors they make
Understand what is broken
After selecting a leader to tackle tech debt, we asked ourselves “Where do we start?” The Check Engine light in a car indicates that something is wrong. Our large volume of tech debt was certainly an indicator that we needed to address the issue, but first we had to understand it.
We researched our problem by completing the following steps, which you can also use with your own team:
- Gather more data: identify what needs improvement and who owns those jobs
- Improve error research: check data logs and interview people handling tech debt
- Seek perfection: create a dashboard to drive action towards repeat offenders and to allow for ownership of each team’s portfolio
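As a rough illustration of the data behind such a dashboard, the sketch below counts failures per job and owner and lists the repeat offenders first. It is not the tooling our team used; the `Incident` fields, the job and team names, and the `min_failures` threshold are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    job: str    # name of the failing job (hypothetical field)
    owner: str  # team that owns the job (hypothetical field)


def repeat_offenders(incidents, min_failures=2):
    """Return (job, owner, failure_count) rows, most failures first."""
    counts = Counter((i.job, i.owner) for i in incidents)
    rows = [
        (job, owner, n)
        for (job, owner), n in counts.items()
        if n >= min_failures  # keep only jobs that failed repeatedly
    ]
    # Sort by failure count descending, then job name for a stable order.
    return sorted(rows, key=lambda row: (-row[2], row[0]))


# Made-up incident log for illustration only.
incident_log = [
    Incident("nightly_load", "data-platform"),
    Incident("billing_export", "payments"),
    Incident("nightly_load", "data-platform"),
    Incident("audit_sync", "compliance"),
    Incident("billing_export", "payments"),
    Incident("nightly_load", "data-platform"),
]

for job, owner, failures in repeat_offenders(incident_log):
    print(f"{job:<16} {owner:<14} {failures}")
```

Grouping by owner as well as job keeps the ownership question visible: each row points to a team that can act on it.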
Gather a team to address the issues
Once we had a better understanding of where the debt existed and which teams were the largest drivers of it, our leader gathered a team to address the issues identified in the tech debt data review. It was important for our team not just to patch local breaks, but to resolve issues permanently, prevent them in the future, and fix the underlying system problems.
- Local issues in this case refer to human error, a singular break, or something not related to other breaks.
- Global issues refer to larger scale concerns, such as misaligned standards, poor documentation and governance, and insufficient quality assurance support.
To ensure that we truly solved the root cause of our technical debt, it was important to gather the experts involved in the work and have our team drive to perfection.
We used the following three-pronged approach to address our issues. You can also use it when addressing your own tech debt.
Create a multi-tiered review flow:
- Research local and global problems. Identify the root cause of the issue and check whether it impacts more than the one issue.
- Solve local and global issues. Identify the local solution and check whether a global solution is needed.
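One simple way to picture the local versus global check is sketched below. It assumes, purely for illustration, that a root cause linked to more than one incident is a candidate global issue and that anything else is local. That rule, the function name, and the sample findings are hypothetical; a real review would also weigh standards, documentation, and governance gaps.

```python
from collections import defaultdict


def classify_root_causes(findings):
    """Label each root cause 'local' or 'global'.

    findings: iterable of (incident_id, root_cause) pairs.
    Assumption: a cause seen in more than one distinct incident is global.
    """
    incidents_by_cause = defaultdict(set)
    for incident_id, root_cause in findings:
        incidents_by_cause[root_cause].add(incident_id)
    return {
        cause: "global" if len(incident_ids) > 1 else "local"
        for cause, incident_ids in incidents_by_cause.items()
    }


# Made-up review findings for illustration only.
review_findings = [
    ("INC-1", "missing schema validation"),
    ("INC-2", "missing schema validation"),
    ("INC-3", "typo in job config"),
]

for cause, scope in classify_root_causes(review_findings).items():
    print(f"{scope:<6} {cause}")
```

A cause labeled global here would get both its local fix and a wider follow-up, such as a new standard or an automated check.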
Create constancy of purpose
Do this with a two-week remediation team sprint through tiered research layers. Week 1 is for the root cause analysis and week 2 is where a solution is identified.
The image below shows the different research tiers and the time devoted to them.
- Tier 1: Tuesday/Thursday review
- Tier 2: Weekly review
- Tier 3: Biweekly review
Continually resolve local and global issues, reprioritize based on impact, and establish practices and procedures to prevent errors from occurring.
The image shows how this is a continual process of planning, doing, checking and acting while addressing local and global problems with local and global solutions.
Through several remediation team sprints, our team was able to make a meaningful impact on the engineering teams' tech debt. This showed up in many ways, but most importantly, it allowed our engineers to focus on delivering new work while reducing the impact of technical debt through local and global best practices and prevention.
My best recommendations for reducing technical debt are to set the goal for your organization, establish a leader to focus on this effort, gather data through research to understand the problem, and then focus your time and constancy of purpose on perfection. Consider these other goals as well:
- Create value for customers: build more visibility into the value this provides to business partners and developers
- Assure quality at the source: continue to simplify, automate, and enhance systems to prevent errors
- Seek perfection: as repeat incident offenders decrease, shift focus to reducing job run time and other SLA-related activities
- Think scientifically: use the metadata reporting available from cloud technology to continue reduction efforts
Reducing technical debt will not only ensure a smooth, completed customer journey, but it will also improve your engineers’ morale as they are freed up to focus on forward-thinking initiatives.