
Go Slow to Go Fast: The New AI Playbook for IT Infrastructure

By Jonathan Knepher

Note: This is post #4 of Forcepoint’s 2026 Future Insights series, providing predictions and analysis of developing shifts in the cybersecurity landscape.


For decades, IT infrastructure teams have been told to move faster. Deploy faster, patch faster, modernize faster and do everything at a pace that matches rising business pressure. That mindset once helped, but in 2026 it will be one of the biggest drivers of instability, outages and technical debt.

AI changes this reality in a surprising way. It amplifies whatever foundations you already have in place. Plug it into messy architectures and rushed change processes and it will accelerate the wrong things. Slow down long enough to strengthen fundamentals like change control, run thorough technical reviews and empower your teams to act, and AI becomes an engine for safe speed at scale.

This last post in our 2026 Future Insights series looks at a quieter revolution. The fastest organizations in 2026 will be the ones that first learn how to move slowly with their IT infrastructure so they can ultimately move fast.

What “go slow to go fast” really means for infrastructure teams

Go slow to go fast does not mean red tape for its own sake. It means investing early in understanding, testing and governance so you can move with confidence later.

In change control that shows up as:

  • Taking time to understand how systems really behave before automating them
  • Designing change windows, rollout patterns and rollback plans before the first push
  • Treating every incident or outage as a lesson that refines the next change

The payoff is not theoretical. Teams that embrace this approach ship changes more often with fewer emergency rollbacks and shorter downtime. They earn the right to move fast because they already did the slow work when it mattered.
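One way to make that slow work concrete is to capture the change plan itself as a small, reviewable artifact instead of tribal knowledge. The sketch below is a minimal illustration in Python; the service name, rollout stages and rollback thresholds are hypothetical, not a prescribed standard.

```python
# Minimal sketch of a change plan captured as data rather than tribal knowledge.
# All names and thresholds here are hypothetical illustrations.
from dataclasses import dataclass, field


@dataclass
class ChangePlan:
    service: str
    change_window: str                                   # agreed low-traffic window
    rollout_stages: list = field(default_factory=lambda: [0.05, 0.25, 1.0])  # canary fractions
    health_metrics: list = field(default_factory=lambda: ["error_rate", "p99_latency_ms"])
    rollback_trigger: str = "error_rate > baseline * 1.5 for 5 minutes"
    rollback_action: str = "redeploy previous artifact"


plan = ChangePlan(service="edge-proxy", change_window="Sat 02:00-04:00 UTC")
print(f"Rolling out {plan.service} in stages {plan.rollout_stages}; "
      f"roll back if {plan.rollback_trigger}")
```

A plan written down like this can be reviewed before the first push and compared against what actually happened afterward, which is exactly the feedback loop the third bullet above describes.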

This is the context I live in every day as a VP in Site Reliability Engineering working across a global infrastructure environment. My experience spans on-prem data centers, private cloud platforms and hybrid environments where those worlds meet. Across all of them, I keep coming back to the same framework based on three key pillars.

Pillar 1: Fundamentals first, every time

The first pillar is deceptively simple. When something breaks or when you plan a big change, start with fundamentals.

In practice that means:

  • Start from the fundamentals and move up the stack: check power, connectivity and basic system health before chasing ghosts or misleading signals higher up
  • Review and evaluate recent changes and dependencies before inventing new theories
  • Investigate until you clearly understand what the systems and monitors are telling you; don’t make assumptions from limited data
  • Use monitoring, alerting and runbooks that force a bottom-up checklist before major escalation (a minimal sketch of such a checklist follows this list)
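To show what a bottom-up checklist can look like in practice, here is a minimal sketch in Python. The check functions and host name are hypothetical placeholders for whatever tooling your environment already uses; the point is the order of the checks, not the implementation.

```python
# Hypothetical bottom-up triage checklist: verify fundamentals in order before escalating.
import socket


def check_power_and_hardware(host: str) -> bool:
    # Placeholder: in practice, query your BMC/IPMI or data center monitoring here.
    return True


def check_connectivity(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_recent_changes(host: str) -> list:
    # Placeholder: pull recent change records for this host from your CMDB or change log.
    return []


def triage(host: str) -> str:
    if not check_power_and_hardware(host):
        return "hardware/power issue - start at the physical layer"
    if not check_connectivity(host):
        return "network/reachability issue - check links and firewalls"
    changes = check_recent_changes(host)
    if changes:
        return f"review {len(changes)} recent change(s) before forming new theories"
    return "fundamentals look healthy - escalate with evidence attached"


print(triage("db-01.example.internal"))
```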

I have seen this play out in both directions. In one case, the instinct was to trigger a failover for a critical service. Slowing down to check fundamentals revealed an issue with a single host. A rushed failover would have multiplied the blast radius. A methodical checklist kept the incident contained, ensuring it did not have customer impact.

Fundamentals first also applies before changes. When my teams spend time mapping dependencies, understanding baseline performance and reviewing historical incidents, we design better rollouts. We know which metrics matter. We know what failure looks like. That knowledge makes later changes and automation far safer.

Over the next few years, this kind of discipline will separate resilient AI era infrastructure from fragile stacks that crumble under automation. AI can help surface patterns in logs and metrics. It cannot replace the human habit of asking basic questions before taking big actions.

Pillar 2: Empower the people closest to the problem

The second pillar is ownership. The people closest to the problem must feel empowered to act, not just open tickets and wait.

Earlier in my career I worked in environments where operations teams saw themselves as traffic cops. When something broke they paged engineering, watched the clock and hoped. That model collapses under modern complexity. Every escalation chain adds minutes of delay and confusion.

In a go slow to go fast culture, operations and SRE teams:

  • Log into systems to investigate instead of fearing they lack authority or might make it worse
  • Read logs, correlate alerts and form a first pass hypothesis
  • Have clear guidelines for when they can fix, when they must roll back and when to escalate

This does not remove engineering. It reserves engineering for the problems that truly require complex analysis and review. The people on the front line are trusted to handle the rest.

That trust speeds everything up. Incidents are triaged faster. Simple issues are fixed without long war rooms. Engineers spend more time improving systems and less time firefighting. Those improved systems have fewer issues.  Most importantly, the same people who run the system day to day are deeply involved in planning and rehearsing major changes.

As AI copilots arrive in consoles and runbooks, this empowerment becomes even more important. A team that already owns decisions will use AI as a partner, able to verify what is and is not appropriate to act on. A team that has learned to wait for instructions will treat AI as the new boss. Only one of those paths leads to safe operations and sustainable speed.

Pillar 3: Let observability and automation carry the speed

The third pillar is where the “fast” finally appears. Once fundamentals and ownership are in place, you let observability and automation carry the speed.

My teams have spent months reinstrumenting our systems. We audited old checks, rewired dashboards and rebuilt alerts so that each signal meant something. We consolidated on a modern observability stack instead of a sprawl of disconnected tools. Only then did we implement and enable automatic failover and self-healing for well understood failure modes.

The result is a system where most issues are resolved before customer impact. Alerts trigger runbooks. Known patterns trigger safe automations. Humans focus on the high-risk edge cases and only have to verify minor alerts.
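As a rough illustration of that pattern, here is a minimal sketch of how known alert patterns can be routed to runbooks or safe automations while anything unrecognized always goes to a human. The alert names and actions are invented for the example.

```python
# Minimal sketch of pattern-matched automation with a human fallback.
# Alert names and remediation actions are hypothetical.
KNOWN_PATTERNS = {
    "disk_usage_high": "runbook: rotate logs or expand the volume (safe to automate)",
    "primary_unreachable": "runbook: verify host health, then fail over if confirmed",
}


def handle_alert(alert_name: str, automate: bool = True) -> str:
    action = KNOWN_PATTERNS.get(alert_name)
    if action is None:
        return f"unknown pattern '{alert_name}': page a human, do not automate"
    if automate and "safe to automate" in action:
        return f"auto-remediating: {action}"
    return f"human-in-the-loop: {action}"


for alert in ("disk_usage_high", "primary_unreachable", "novel_latency_spike"):
    print(handle_alert(alert))
```

The dictionary only ever grows after the slow work of understanding a failure mode, which is why automation arrives last rather than first.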

Automation is not the starting point. It is the reward for doing the slow work first.

Why this matters across on-prem, private cloud and hybrid

One misconception about go slow to go fast is that it belongs only in traditional data centers. The opposite is true.

On-prem and self-hosted environments have long hardware lead times and manual work. Slowing down here means taking capacity planning seriously, testing failover paths in the physical environments and rehearsing big changes. The payoff is fewer surprise shortages and upgrades that do not require all hands on deck.

Private cloud environments add another layer. You get the flexibility of software defined infrastructure, but you still own the racks, networks and core services. Slowing down here means hardening your platform services, validating multitenant impact of changes and aligning your internal platform roadmap with what application teams really need. The payoff is a platform teams trust instead of a fragile internal product they route around.

Hybrid environments multiply complexity. Workloads, identities and data move across trust boundaries between on-prem, private cloud and any public cloud services you rely on. Slowing down here means mapping dependencies carefully, tightening approvals for cross-boundary changes and limiting blast radius when something goes wrong. The payoff is the ability to migrate, re-platform or extend services without weeks of change freeze.

In all of these worlds, the same three pillars apply:

  • Understand the system.  
  • Empower the people closest to it.  
  • Let observability and automation deliver the speed.

How AI changes the speed equation

AI sits on top of these foundations. It can sharpen them or shatter them.

For me, there are two near-term areas where AI can help infrastructure teams move slow in the right places so they can move faster overall.

First is alert noise. In busy periods my teams may see an alert every minute. Today, humans do most of the correlation work. AI can help by batching related alerts, clustering them into single incidents and highlighting the few that truly threaten availability or security. Instead of reacting to a stream of red, teams see a focused set of AI-prioritized problems tied to likely root causes.
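A real AI correlation engine will use far richer signals, but the basic idea can be sketched with a simple grouping rule: collapse alerts for the same service within a short time window into one incident, then rank incidents by severity. The services, severities and window below are invented for illustration.

```python
# Toy illustration: collapse a stream of alerts into a handful of incidents
# by grouping on service and time window, then rank by worst severity.
from collections import defaultdict

alerts = [
    {"service": "auth", "severity": 3, "ts": 100},
    {"service": "auth", "severity": 4, "ts": 130},
    {"service": "billing", "severity": 2, "ts": 140},
    {"service": "auth", "severity": 5, "ts": 170},
]

WINDOW = 300  # seconds; same-service alerts inside one window form a single incident

incidents = defaultdict(list)
for alert in sorted(alerts, key=lambda a: a["ts"]):
    key = (alert["service"], alert["ts"] // WINDOW)
    incidents[key].append(alert)

ranked = sorted(incidents.items(),
                key=lambda kv: max(a["severity"] for a in kv[1]),
                reverse=True)
for (service, _), grouped in ranked:
    print(f"{service}: {len(grouped)} related alerts, "
          f"max severity {max(a['severity'] for a in grouped)}")
```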

Second is capacity planning, especially for on-prem, private cloud and other self-hosted infrastructure. Today, people pore over graphs in tools like Grafana, or over spreadsheets, to decide when to buy and deploy new hardware or how to scale shared clusters. AI models can forecast demand, spot seasonal patterns and recommend when and where to add capacity. They can also propose smarter placement of workloads between on-prem and private cloud to manage cost and risk.
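As a toy illustration of that kind of forecasting, the sketch below fits a linear trend to invented monthly peak-utilization numbers, adds a crude seasonal profile and flags when a capacity threshold is likely to be crossed. Production forecasting would use better models and real telemetry; the numbers here are assumptions.

```python
# Toy capacity forecast: linear trend plus a crude seasonal profile on invented data.
import numpy as np

monthly_peak_util = np.array([0.52, 0.55, 0.54, 0.58, 0.61, 0.60,
                              0.64, 0.66, 0.65, 0.70, 0.73, 0.72])  # fraction of capacity

months = np.arange(len(monthly_peak_util))
slope, intercept = np.polyfit(months, monthly_peak_util, 1)
seasonal = monthly_peak_util - (slope * months + intercept)  # residuals as a rough profile

THRESHOLD = 0.85  # order new hardware before peaks exceed this fraction of capacity
for ahead in range(1, 13):
    m = len(monthly_peak_util) + ahead - 1
    forecast = slope * m + intercept + seasonal[m % 12]
    if forecast >= THRESHOLD:
        print(f"Capacity threshold {THRESHOLD:.0%} projected in ~{ahead} month(s); "
              f"start procurement now given hardware lead times.")
        break
```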

Looking a bit further ahead, AI will play a bigger role in change simulation. Models can analyze historical incidents and configuration diffs to assign risk scores to proposed changes. They can suggest smaller batches, safer sequences and more realistic test scenarios. Digital twins and high-fidelity staging environments will let teams rehearse major changes before they ever reach production, whether those changes land in a data center, a private cloud cluster or a hosted control plane.
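A change risk score could start as something as simple as the hypothetical weighting below. The fields and weights are illustrative only; a real model would be trained and validated against your own incident history.

```python
# Hypothetical risk score for a proposed change, weighted by signals that often
# correlate with incidents. Weights and fields are illustrative, not a validated model.
def change_risk_score(change: dict) -> float:
    score = 0.0
    score += 0.3 * min(change["files_touched"] / 50, 1.0)          # large diffs are riskier
    score += 0.3 * (1.0 if change["touches_shared_dependency"] else 0.0)
    score += 0.2 * min(change["recent_incidents_on_component"] / 3, 1.0)
    score += 0.2 * (1.0 if change["outside_change_window"] else 0.0)
    return round(score, 2)


proposed = {
    "files_touched": 42,
    "touches_shared_dependency": True,
    "recent_incidents_on_component": 2,
    "outside_change_window": False,
}
risk = change_risk_score(proposed)
print(f"risk={risk}: " + ("split into smaller batches and rehearse in staging"
                          if risk >= 0.5 else "standard review path"))
```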

All of this helps only if the slow work came first. Dirty telemetry, weak alerting and a lack of production unit tests produce bad AI recommendations. Weak governance turns AI automation into a new way to break things at scale. Go slow to go fast in the AI era means validating AI suggestions with human judgment, documenting guardrails and logging every AI-assisted change for later review.
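One lightweight way to put that guardrail into practice is to wrap AI-suggested actions in a function that refuses to apply anything without human sign-off and records every suggestion to an audit log. The sketch below is a hypothetical illustration, not a prescribed implementation; the file name and fields are assumptions.

```python
# Minimal sketch of a guardrail around AI-suggested actions: require explicit human
# approval and write an audit record for every AI-assisted change.
import json
import time
from typing import Optional

AUDIT_LOG = "ai_assisted_changes.jsonl"


def apply_ai_suggestion(suggestion: dict, approved_by: Optional[str]) -> bool:
    record = {
        "ts": time.time(),
        "suggestion": suggestion,
        "approved_by": approved_by,
        "applied": approved_by is not None,
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    if approved_by is None:
        return False  # guardrail: no human sign-off, no change
    # ...apply the change through your normal, tested deployment path here...
    return True


applied = apply_ai_suggestion({"action": "restart service", "target": "cache-02"},
                              approved_by="jdoe")
print("applied" if applied else "blocked pending review")
```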

What leaders should do now for 2028

By 2028, the gap between organizations that mastered this shift and those that did not will be obvious. The winners will not be the ones that moved fastest in 2025. They will be the ones that slowed down long enough to give AI something trustworthy to accelerate.

Three moves stand out.

First, codify these three pillars. Make fundamentals, ownership and observability explicit expectations in your teams. Put them in runbooks, performance plans and architecture reviews.

Second, audit your change pipeline. Identify the riskiest on-prem, private cloud and hybrid changes. Look for spots where you are hoping for the best instead of testing for the worst. Insert slower, more deliberate checks there, then measure how much rework and unplanned downtime you avoid.

Third, invest in clean telemetry before you invest in AI on top of it. The more clearly you can see your systems, the more useful AI becomes in forecasting, correlating and simulating.

The 2026 Future Insights series has explored how AI is reshaping threats, securing new digital actors and getting ahead of AI technical debt. The final insight is simple. In infrastructure, AI does not reward reckless speed. It rewards the leaders who move slow with intention so they can move fast with confidence at the appropriate time. 


    Jonathan Knepher

    Jonathan Knepher has a deep background in security, technology and corporate development, and he has been instrumental in advancing Forcepoint's cybersecurity solutions.

    Read more articles by Jonathan Knepher
