Engineering managers are almost entirely absent from the AI transformation discourse. There’s a structural reason for that, and understanding it is the first step to doing something about it.
Engineers write on the internet. C-suite decisions make headlines. Engineering managers absorb pressure from above, complexity from below, and produce outcomes that get credited in both directions. The system doesn’t reward the EM voice publicly. But the EM position gives you something that’s genuinely hard to replicate: accountability for what happens to the team, combined with proximity to all three layers of the problem at once.
That’s not a consolation prize. It’s a specific kind of leverage, if you decide to use it deliberately.
You’re accountable for what nobody else fully sees
Writers go where the audience is or where the authority sits. EMs are neither the audience nor the authority, which is why the playbooks keep missing them. Executives get advice that assumes frictionless implementation. Engineers get advice that assumes organizational stability. At the team level, neither holds.
The EM isn’t the only person with this view. A good Staff or Principal Engineer often has comparable exposure: technical depth, some business context, real influence on architecture decisions. In many organizations, the senior IC has more technical credibility than the EM and less organizational noise to cut through.
The difference isn’t the view. It’s the accountability. When something goes wrong at the team level (delivery slips, quality degrades, an engineer burns out, AI adoption produces incidents instead of velocity), the EM is the one who carries it. That asymmetry is uncomfortable. It’s also what makes the EM’s perspective structurally different from everyone else’s. You don’t just see the intersection where the playbooks break down. You’re responsible for what happens there.
The question isn’t whether that position is valuable. It is. The question is whether you’re using it actively or just absorbing it quietly.
The numbers give you something to work with, if you translate them
49% of tech employers expect to use AI to reduce headcount by 2029 1. Yet 87% of developers don’t view AI as an immediate threat to their employment 1.
Leadership and the team are operating in different realities. That gap lands on you. But before we get to what to do with it, there’s a harder truth worth naming: if the executive mandate really is to cut headcount, no engineering metric will stop it. The EM who walks into that conversation armed with first-pass acceptance rates and post-merge rework data will be read, correctly, as a middle manager defending their headcount.
So the first move isn’t metrics. It’s understanding what conversation you’re actually in.
If the goal is genuinely better AI adoption, not just faster adoption, then the data is useful. What the productivity numbers leave out is that only a third of developers report significant gains from AI investments 2. The actual bottlenecks (knowledge fragmentation, legacy systems, undocumented architecture, the particular flavor of technical debt in every mature codebase) don’t respond to code generation. The 2024 DORA report found that a 25% increase in AI adoption is associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in system stability 3, largely because faster generation produces pull requests that are larger, harder to review, and more likely to break things downstream.
That data is useful in the right framing. Not “we need to slow down AI adoption” (that’s a losing argument). But: “here’s what adoption without governance is costing us in incident rate and rework, and here’s what it would cost to fix it.” Incidents have a price. Rework has a price. Attrition (which follows when engineers feel responsible for code they didn’t fully understand) has a price. Those are numbers a C-suite can act on.
Martin Fowler’s work on AI-assisted development points at a useful metric shift: instead of measuring time to first output (what leadership tends to track), measure first-pass acceptance rate, iteration cycles per task, and post-merge rework 4. The value of those numbers isn’t in presenting them as engineering KPIs. It’s in translating them: “every 10% increase in post-merge rework costs us roughly X engineer-weeks per quarter.” That’s a margin conversation, not a methodology conversation.
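To make that translation concrete, here is a minimal sketch of how the three measures and the engineer-weeks figure could be computed from pull request records. Everything in it is an assumption for illustration: the `PullRequest` fields, the definition of rework as a ratio of post-merge hours to pre-merge hours, the reading of “10% increase” as ten percentage points of that ratio, and the sample numbers. Your tracker will define these differently, and the source article does not prescribe formulas.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical record; these field names are illustrative assumptions.
    review_rounds: int      # review rounds before merge (1 = accepted on first pass)
    build_hours: float      # engineer-hours spent before merge
    rework_hours: float     # engineer-hours spent fixing the change after merge


def first_pass_acceptance_rate(prs):
    """Fraction of PRs merged after a single review round."""
    return sum(1 for pr in prs if pr.review_rounds == 1) / len(prs)


def mean_iteration_cycles(prs):
    """Average number of review rounds per PR."""
    return sum(pr.review_rounds for pr in prs) / len(prs)


def post_merge_rework_ratio(prs):
    """Post-merge rework hours as a share of pre-merge hours."""
    return sum(pr.rework_hours for pr in prs) / sum(pr.build_hours for pr in prs)


def rework_cost_engineer_weeks(quarterly_build_hours, ratio_increase, hours_per_week=40.0):
    """Engineer-weeks per quarter added when the rework ratio rises by ratio_increase."""
    return quarterly_build_hours * ratio_increase / hours_per_week


# Made-up sample data, for illustration only.
sample_prs = [
    PullRequest(review_rounds=1, build_hours=10, rework_hours=0),
    PullRequest(review_rounds=3, build_hours=20, rework_hours=8),
    PullRequest(review_rounds=2, build_hours=10, rework_hours=4),
    PullRequest(review_rounds=1, build_hours=40, rework_hours=8),
]

print(first_pass_acceptance_rate(sample_prs))
print(mean_iteration_cycles(sample_prs))
print(post_merge_rework_ratio(sample_prs))
print(rework_cost_engineer_weeks(quarterly_build_hours=2000, ratio_increase=0.10))
```

The last function is the one that matters in the room: it turns a ratio engineers care about into a capacity figure that finance can price.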
Make the invisible track visible, in business terms
The Harness State of Software Delivery 2025 documents a pattern that’s become structural 5. Developers who use AI to rapidly ship features get noticed by non-technical leadership. The same prompting loop that generates fragile code can generate fast hotfixes for the incidents that code creates. From far enough away, that looks like responsiveness.
The engineer who spent three weeks preventing five future incidents has nothing to show for it.
This isn’t a new problem. Engineering organizations have always struggled to see the person who quietly keeps things from breaking. AI has made the visible track faster and cheaper, which widens the gap. The EM sees both tracks. The intervention isn’t explaining this to leadership in engineering terms: it’s translating it into risk.
The practical move: attach a cost to the invisible work. Not in sprint points or engineering hours, but in incidents avoided, in on-call load, in customer impact. “This refactor eliminated the class of failures that caused three P1s last year” is a sentence leadership can evaluate. It’s not a guarantee they’ll fund the next one. But it’s the only version of the argument that has a chance. Prevention work that gets described in engineering terms stays invisible. Prevention work described in business terms at least enters the conversation.
Help your engineers through the identity shift, without going backwards
A 2025 PNAS study found that developers who use AI receive competence ratings 9% lower for identical output, with a larger penalty for women 6.
The stigma shapes behavior before it shows up in surveys. Engineers use the tools privately, skip mentioning it in reviews, don’t raise it in 1:1s. A longitudinal study of GitHub developers found that AI adoption reduced peer collaboration by nearly 80% 7, not because people stopped caring, but because the stigma quietly restructured how work gets done.
The practical consequence: knowledge stops moving the way it used to. For senior engineers, working alone with AI can feel like focus. For junior developers, it’s the disappearance of the informal learning channel: the questions in the flow of work, the review comments that actually taught something, the proximity where judgment develops without anyone formally teaching it.
The temptation is to reconstruct 2019. Structured sessions with AI turned off, pair programming as it used to be. It’s understandable and it won’t work. You can’t un-generate the tools, and pretending you can produces resentment, not learning.
The real problem isn’t that junior developers are writing less code. It’s that they’re not developing the judgment to evaluate the code that gets generated. That’s a different skill, and it requires a different training environment. Not blackboards, but deliberate exercises that assume generation and stress-test evaluation: “Why is this design wrong?” “What does this code not handle?” “Where would this break at scale?” Threat modeling, architecture review, debugging sessions that start from broken AI output. The goal isn’t less AI. It’s building the judgment that makes AI useful instead of dangerous.
Make AI use discussable in your team. Run a session where people share how they actually use it, what works, what doesn’t. The stigma dissolves when the conversation is normalized. And when it’s normalized, you get the signal early when something isn’t working, before it surfaces as a performance issue or a production incident.
Own the open questions, and admit what you don’t know
The discourse treats EMs as stable context. We’re not.
How do you assess a PR when significant portions weren’t written by a human? How do you calibrate expectations when the productivity baseline is moving? What does it mean to evaluate engineering judgment when the judgment being exercised is mostly about directing and validating an agent? How do you interview for technical competence when the candidate can solve any algorithm in real time with the right prompt?
These questions don’t have industry consensus yet. But here’s the honest version: they don’t have good answers yet, and that includes the EMs who are closest to them. We’re all improvising. The teams that will come out of this well won’t be the ones who found the right framework first. They’ll be the ones whose EMs were honest about the uncertainty, made their working assumptions explicit, and updated them as they learned, instead of projecting confidence they didn’t have.
Add the budget question to the list. Individual developer AI usage can spike to thousands of dollars per month in agentic workflows without governance 8. Tokenmaxxing (high-volume consumption driven by visibility rather than output) reportedly became widespread enough that Meta built public leaderboards, awarded “Session Immortal” titles to top spenders, and abolished the system after it contributed to production incidents. A simple policy built now (approved tools, token expectations, review standards for AI-generated code) is infinitely better than no policy. You don’t need a perfect framework. You need a starting point your team can reason against.
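A starting point can be small enough to fit on one screen. The sketch below writes the three policy elements named above as data plus one check. The class, the field names, the tool names, and the thresholds are all hypothetical, chosen only to show the shape; nothing here reflects a real vendor API or a recommended budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIUsagePolicy:
    # Hypothetical policy shape; every field and value is an assumption.
    approved_tools: frozenset      # tools the team has agreed to use
    monthly_token_budget: int      # expected ceiling per developer per month
    min_human_reviewers: int       # reviewers required on AI-generated changes


def policy_violations(policy, tool, tokens_this_month, human_reviewers):
    """Return a list of plain-language problems; an empty list means within policy."""
    problems = []
    if tool not in policy.approved_tools:
        problems.append(f"tool '{tool}' is not on the approved list")
    if tokens_this_month > policy.monthly_token_budget:
        over = tokens_this_month - policy.monthly_token_budget
        problems.append(f"token usage is {over} over the monthly expectation")
    if human_reviewers < policy.min_human_reviewers:
        problems.append("AI-generated change has too few human reviewers")
    return problems


# Illustrative values only.
team_policy = AIUsagePolicy(
    approved_tools=frozenset({"assistant-a", "assistant-b"}),
    monthly_token_budget=5_000_000,
    min_human_reviewers=1,
)

print(policy_violations(team_policy, "assistant-c", 6_000_000, 0))
```

The point of returning reasons rather than a pass or fail flag is that the output is something your team can argue with, which is what a first policy is for.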
On governance more broadly: Fowler describes AI assistants as “junior developers with infinite energy but zero context.” 4 The frustration loop most teams hit (generate code, review it, find it doesn’t fit the codebase, regenerate with corrections, repeat) happens because the AI defaults to generic internet patterns instead of your team’s standards. The fix is encoding team knowledge explicitly: architecture decisions documented, conventions written down, review standards articulated.
That’s real and valuable. It’s also a lot of work that competes directly with performance cycles, hiring, alignment meetings, and the production incidents that AI is simultaneously generating more of. There’s no clean answer here. The realistic version isn’t “map everything.” It’s “pick one standard, document it, use it as AI context, and treat every documented convention as a compounding investment.” It never feels urgent enough. It always matters.
The job nobody prepared you for
This is a harder version of engineering management than the one most people signed up for. It means closing the gap between what leadership believes about AI productivity and what’s actually happening at the team level, holding space for engineers going through a real identity shift, building governance structures that didn’t exist before, navigating your own transition at the same time, and doing all of it without a playbook, because the playbook doesn’t exist yet.
That’s not a complaint. It’s a more interesting job than the previous version. And it’s worth being honest about what it actually is, instead of either pretending it’s manageable or waiting for someone to acknowledge how hard it is.
Nobody will. That’s not how this works.
Invisible work doesn’t get resourced unless you name it in terms that move decisions. The view from the middle of this transition is the most complete one in the organization. Use it. And be honest about what you don’t know yet.
Footnotes
1. “Why developers and their bosses disagree over generative AI”, LeadDev. https://leaddev.com/ai/why-developers-and-their-bosses-disagree-over-generative-ai
2. Atlassian State of Developer Experience 2025. https://www.atlassian.com/teams/software-development/state-of-developer-experience-2025
3. Google Cloud / DORA, “Announcing the 2024 DORA Report.” https://cloud.google.com/blog/products/devops-sre/announcing-the-2024-dora-report. 2025 data shows throughput improved with stronger practices; stability remains negatively correlated. https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report
4. Martin Fowler, “Patterns for Reducing Friction in AI-Assisted Development.” https://martinfowler.com/articles/reduce-friction-ai/
5. Harness State of Software Delivery 2025. https://www.harness.io/state-of-software-delivery
6. Reif, Larrick & Soll, “Evidence of a social evaluation penalty for using AI,” PNAS, May 2025. https://www.pnas.org/doi/10.1073/pnas.2426766122
7. “The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot,” arXiv:2410.02091. https://arxiv.org/pdf/2410.02091
8. Enterprise AI spending patterns, industry sources (2025/2026). Meta leaderboard detail cited in secondary research; primary source not independently verified.