Software engineering is not moving toward a future where developers simply "use AI." It is moving toward a future where developers manage agents.
That distinction matters.
The next major shift in software work is not only about code generation. It is about orchestration, delegation, verification, context management, cost control, security, and judgment. Developers will still need hard technical skills, but the shape of the job is changing. The developer is becoming less like a person typing every line by hand and more like the smallest possible engineering organization: someone who can define work, delegate execution, verify output, manage constraints, and decide what deserves to ship.
This is not just a philosophical claim. The tooling market is already moving there. OpenAI describes Codex as a cloud-based software engineering agent that can write features, answer questions about a codebase, fix bugs, and propose pull requests in isolated environments. GitHub's Copilot cloud agent can research a repository, create an implementation plan, make changes on a branch, and let the developer review the diff before opening a pull request. Anthropic describes Claude Code as an agentic coding system where engineers focus more on architecture, product thinking, continuous orchestration, managing agents, giving direction, and making decisions.
At the same time, the evidence does not support the naive claim that AI automatically makes every developer faster in every context. GitHub's Copilot experiment found that developers completed a JavaScript HTTP server task 55.8% faster with Copilot. But a 2025 METR randomized controlled trial found that experienced open-source developers working in mature repositories they knew well took 19% longer when using early-2025 AI tools.
So the right conclusion is not:
AI makes software engineering easy.
The better conclusion is:
AI changes the shape of software engineering work. Developers who learn to manage agents well will compound faster than developers who treat AI as magic.
DORA's 2025 report supports this more nuanced view. It found that AI adoption among software development professionals reached 90%, with a median of two hours per day spent working with AI, and more than 80% of respondents reporting productivity gains. But DORA also frames AI as an amplifier: it magnifies an organization's existing strengths and weaknesses rather than automatically improving the system around it.
That is the center of this thesis.
AI is not replacing engineering judgment.
It is making engineering judgment more important.
What I Mean by "Agent"
When I say agent, I do not mean a chatbot inside a product.
I mean a persistent software coworker: a loop around a model, tools, memory, preferences, context, skills, and permissions.
People often imagine the future as humanoid robots doing work for us. They know our names, remember our preferences, help with tasks, and become part of our daily lives. But the first useful version of that may already exist — just without a physical body.
An agent is not just a model.
An agent is not just a chat window.
An agent is not just autocomplete.
An agent is a system that can reason, plan, remember, use tools, and act on behalf of a user. Google Cloud defines AI agents as software systems that use AI to achieve goals and complete tasks for users, with reasoning, planning, memory, autonomy, and multimodal capabilities. IBM similarly describes AI agents as systems that perform tasks autonomously through workflows using available tools, while adapting to user expectations over time through memory and planning.
The model can change. The interface can change. The tools can change.
But the agent should remain.
That is the key distinction.
Most AI usage today is still transient. You open a session, explain yourself, ask for help, get an output, close the tab, and then repeat the same setup somewhere else. Notion has its assistant. Linear has its assistant. Slack has its assistant. The IDE has its assistant. The terminal has its assistant.
That is useful, but it is not the ideal end state.
The stronger version is one agent above all products.
The same agent should be able to use Slack, GitHub, Notion, Linear, the terminal, the browser, CI, dashboards, documentation, and internal tools — the same way a human coworker would.
The product wants the assistant to belong to the product.
The developer wants the agent to belong to the developer.
That tension will shape the agent era.
Why Chat Matters
The terminal matters. The IDE matters. Dashboards matter.
But companies coordinate in chat.
Slack, Teams, Discord, Telegram, WhatsApp, or whatever a company uses — that is where people already dump context. Screenshots, logs, PDFs, CSVs, customer issues, priorities, decisions, disagreements, and half-formed thoughts already live there.
So the agent should not be trapped in chat, but it should be reachable from chat.
You mention it, delegate work, ask for memory, request a summary, check progress, or tell it to use another tool. Then it goes to the browser, the terminal, the repository, the docs, the ticket system, CI, or wherever the work needs to happen.
The terminal is where execution often happens.
The IDE is where code changes often happen.
Chat is where coordination happens.
That distinction matters.
The end goal is not a terminal-only agent, a Jira clone, or one assistant per SaaS product. The end goal is a persistent agent that can participate where humans coordinate and act where work needs to be done.
Recommendations for Developers
1. Every Developer Should Have an Agent to Call Their Own
A company should not stop at giving a developer a laptop, GitHub access, Slack, Jira, Linear, production credentials, and documentation.
It should give them an agent.
Not a generic assistant.
Their agent.
A developer's agent should know their working style, technical preferences, review standards, recurring tasks, communication habits, and current project context. It should remember what the developer likes, what they dislike, what worked before, what failed before, and what standards they care about.
This does not mean the agent replaces the developer. It means the developer gains a persistent execution and memory layer.
GitHub's Copilot cloud agent already points in this direction: it can research a repository, create implementation plans, fix bugs, implement incremental features, improve test coverage, update documentation, address technical debt, and work in an ephemeral development environment with tests and linters.
The important shift is this:
The agent belongs to the developer.
The tools are places where the agent acts.
That is different from a product-specific assistant.
A Notion assistant knows Notion.
A Linear assistant knows Linear.
A Slack assistant knows Slack.
A developer's agent should use all of them.
2. Shift Up or Shift Down
Agents put pressure on the software engineering org chart.
Either developers shift up and begin acting more like managers, or the work shifts down to an execution layer of agents beneath them.
In practice, both happen.
A developer with agents defines work, delegates tasks, reviews output, manages context, controls cost, verifies results, and decides what gets shipped.
That is management, even if HR still calls the person "Software Engineer."
Anthropic's own framing around Claude Code is explicit: engineers focus on architecture, product thinking, continuous orchestration, managing multiple agents, giving direction, and making decisions.
So the recommendation is simple:
Learn management skills before your title changes.
Developers will need to specify work clearly, decompose problems, assign tasks, review results, maintain alignment, and decide when output is good enough.
The job does not become less technical.
It becomes technical management at a smaller scale.
3. Think Linearly, Not Magically
A lot of agent orchestration tools imply a fantasy workflow:
- Set a big goal.
- Spawn many tasks.
- Let agents run in parallel for hours.
- Come back to finished work.
That will sometimes work for bounded tasks.
It should not be the default mental model.
Software development has hidden context, unclear constraints, shifting priorities, legacy behavior, product judgment, and integration risk. Human teams already struggle with multitasking. Agents do not magically remove that.
Even in a team of three to six people, the work usually converges around one goal: a feature, a release, an incident, a customer problem, or a product bet.
We are not built for infinite parallelism.
We are built for focus.
The mixed productivity evidence supports this disciplined view. Copilot helped developers complete a bounded JavaScript task 55.8% faster, but METR found that early-2025 AI tools slowed experienced developers by 19% on mature codebases they already knew.
That does not mean agents are useless.
It means agent work needs structure.
A better sequence is:
- Plan.
- Clarify.
- Implement.
- Test.
- Review.
- Refine.
- Ship.
Agents can help at every step.
But the human still sets the direction.
Use agents to increase focus, not fragment it.
4. You Are Responsible for the Agents You Manage
Agents do not remove responsibility.
They redistribute execution.
If an agent opens a pull request under your direction, you are responsible for that pull request. If it introduces a bug, leaks data, breaks a contract, writes insecure code, or creates technical debt, "the agent did it" is not a serious excuse.
Delegation is not abdication.
The security evidence is clear enough to be cautious. A Stanford-led study found that participants with access to an AI coding assistant wrote significantly less secure code than those without one, and were also more likely to believe their code was secure.
That is the dangerous part.
Agents can increase output and confidence at the same time.
But confidence is not correctness.
So developers should treat agent autonomy as a permission ladder:
- Read-only access.
- Local write access.
- Branch write access.
- Pull request creation.
- Limited automation.
- Production access only in exceptional, tightly controlled cases.
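The ladder above can be made concrete in code. Here is a minimal sketch, assuming a policy check runs before every agent action; the names and levels are illustrative, not any particular tool's API:

```python
from enum import IntEnum

class AgentPermission(IntEnum):
    # Ordered ladder: each level implies everything below it.
    READ_ONLY = 1
    LOCAL_WRITE = 2
    BRANCH_WRITE = 3
    PULL_REQUEST = 4
    LIMITED_AUTOMATION = 5
    PRODUCTION = 6  # exceptional, tightly controlled cases only

def is_allowed(granted: AgentPermission, required: AgentPermission) -> bool:
    """An action is allowed only if the agent's granted level covers it."""
    return granted >= required

# A typical agent sits in the middle of the ladder, not at the top.
agent_level = AgentPermission.BRANCH_WRITE
print(is_allowed(agent_level, AgentPermission.READ_ONLY))   # True
print(is_allowed(agent_level, AgentPermission.PRODUCTION))  # False
```

The point of encoding this explicitly is that escalation becomes a deliberate, reviewable decision rather than a default.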
IBM's overview of AI agents also emphasizes that high-impact actions should require human approval and monitoring.
A great agent-native developer is not someone who blindly trusts agents.
It is someone who can delegate aggressively and still remain accountable.
5. Hard Skills Matter More, Not Less
There is a dangerous idea that agents make technical depth less important.
The opposite is more likely.
If you manage agents, you need enough hard skill to know when they are wrong.
You need to know what guardrails the application needs:
- linting
- type checks
- tests
- benchmarks
- vulnerability scans
- dependency checks
- monitoring
- permission boundaries
- rollback plans
- data privacy constraints
- observability
- reproducible environments
If the agent writes code that passes the happy path but breaks edge cases, you need to catch it.
If it adds a dependency that creates supply-chain risk, you need to catch it.
If it writes code that is clever but unmaintainable, you need to catch it.
If it changes behavior without understanding the product contract, you need to catch it.
DORA's 2025 report is useful here because it frames AI as an amplifier of existing strengths and weaknesses. Weak testing, weak documentation, weak architecture, weak ownership, and weak review do not disappear when AI enters the workflow. They can become more expensive because AI can generate more work faster.
Agents increase output.
Hard skills protect quality.
6. Build Verification Into Everything
Agents are persuasive.
That is dangerous.
They can produce code that looks correct, tests that look reasonable, explanations that sound confident, and architectural justifications that sound mature.
But plausibility is not correctness.
So the agent-native developer must become obsessed with verification.
Do not only ask:
Can the agent build this?
Ask:
How will I know it built the right thing?
Every serious agent workflow needs checks:
- tests
- type systems
- linters
- security scanners
- dependency audits
- benchmarks
- assertions
- evals
- preview environments
- human review
- production monitoring
OpenAI's Codex documentation says Codex can run commands including test harnesses, linters, and type checkers, and that agents perform best with configured development environments, reliable testing setups, and clear documentation.
That is the correct direction.
The future does not belong only to people who prompt well.
It belongs to people who verify well.
7. Manage Context, Memory, Tokens, and Cost Like Engineering Resources
Part of the developer's job is now managing token usage, model choice, reasoning levels, context length, latency, and cost.
Models are good enough that you do not need the latest and greatest model for everything.
A sensible workflow might look like this:
- Use a frontier model with high reasoning for planning.
- Use a cheaper model for execution.
- Use a mid-tier model for review.
- Use deterministic tools for tests, formatting, linting, vulnerability checks, and benchmarks.
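The workflow above can be as simple as a lookup table. A sketch, with invented tier names standing in for whatever models your stack actually offers:

```python
# Hypothetical routing table: send each task to the cheapest tier that fits.
ROUTES = {
    "plan": "frontier-high-reasoning",
    "implement": "cheap-fast",
    "review": "mid-tier",
}

# Deterministic work should not consume model tokens at all.
DETERMINISTIC = {"test", "format", "lint", "scan", "benchmark"}

def route(task: str) -> str:
    """Pick an execution backend for a task kind."""
    if task in DETERMINISTIC:
        return "deterministic-tool"
    return ROUTES.get(task, "mid-tier")  # default to the middle tier

print(route("plan"))  # frontier-high-reasoning
print(route("lint"))  # deterministic-tool
```

The table is trivial on purpose: the engineering judgment lives in deciding which tasks belong in which row, not in the dispatch code.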
Not every task needs maximum intelligence.
Some tasks need deep reasoning.
Some need fast execution.
Some need narrow context.
Some need broad context.
Some need memory.
Some need retrieval.
Some need a cheap model and a good test suite.
This is becoming a real product and budget concern. GitHub's Copilot documentation explicitly includes usage costs for cloud agents, and GitHub's feature set includes metrics for tracking pull requests created by Copilot cloud agent, merged pull requests, and median time to merge.
So context and cost are no longer abstract concerns.
They are part of software engineering practice.
Context is a budget.
Memory is a budget.
Reasoning is a budget.
Latency is a budget.
Money is a budget.
A junior agent user dumps everything into the context window and asks the strongest model to figure it out.
A senior agent-native developer gets strong results under constraints.
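Treating context as a budget can be literal. Here is a minimal sketch that keeps only the most recent messages that fit a token budget, using a crude characters-per-token estimate; a real system would use the model's own tokenizer:

```python
def trim_to_budget(messages, budget_tokens, estimate=lambda m: len(m) // 4):
    """Keep the newest messages that fit under the token budget.

    `estimate` is a rough ~4-characters-per-token heuristic; swap in a
    real tokenizer for production use.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Even this naive recency policy beats dumping everything into the window; smarter variants summarize or retrieve older context instead of dropping it.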
8. Make Your Work Agent-Readable
If agents are going to work with us, our systems need to become easier for agents to understand.
That means:
- clearer READMEs
- stronger tests
- explicit architecture decision records
- typed interfaces
- stable APIs
- useful logs
- searchable documentation
- smaller pull requests
- consistent conventions
- clear repository instructions
- reproducible development environments
This is not just theory. Agent-friendly conventions like instruction files, reproducible dev environments, and explicit documentation are already becoming common practice — not because a vendor mandates them, but because they make the difference between an agent that flounders and one that delivers.
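As a concrete illustration, one emerging convention is an instruction file at the repository root (often named AGENTS.md). The contents below are a hypothetical sketch; the commands, paths, and rules are project-specific placeholders:

```markdown
# AGENTS.md — instructions for coding agents in this repository

## Setup
- Install dependencies with `npm ci`.
- Run the full test suite with `npm test` before proposing changes.

## Conventions
- TypeScript strict mode; no `any` without a comment explaining why.
- Keep pull requests small and single-purpose.
- Update `docs/` whenever a public API changes.

## Boundaries
- Never edit files under `migrations/` without explicit human approval.
- Flag any new dependency in the pull request description.
```

Notice that every rule in the file is one a new human teammate would also want on day one.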
The goal is not bureaucracy.
The goal is legibility.
A codebase that is easier for agents to understand is usually easier for humans to understand too.
9. Do Not Confuse Hype Cycles With Category Importance
Some agent tools are overhyped.
Some demos are exaggerated. Some products will disappear. Some wrappers will become irrelevant. Some platforms will turn out to be thin orchestration layers around existing models.
That does not make the category unimportant.
Tools like OpenClaw, Hermes Agent, Claude Code, Codex, OpenCode, Cursor, and similar systems are early attempts at something larger: supervised, persistent, tool-using software labor.
Jensen Huang's comments about OpenClaw are useful here as a market signal, not as proof. Reuters reported that Huang compared OpenClaw's rapid rise to Linux and described it as a personalized operating system of AI agents that act for users. Reuters also quoted Huang saying that every company needs an OpenClaw strategy and that "this is the new computer." NVIDIA's own announcement described NemoClaw as a stack for the OpenClaw agent platform with privacy and security controls for autonomous AI agents.
That does not prove OpenClaw specifically wins.
It does suggest that major infrastructure companies are treating agent runtimes as a new computing layer, not merely as developer toys.
Maybe terminal-first tools are transitional.
Maybe chat-native agents win.
Maybe local agents matter more than cloud agents.
Maybe the real moat is memory.
Maybe the real moat is tool access.
Maybe the real moat is verification.
Maybe the real moat is trust.
But the category is real.
Messy does not mean irrelevant.