25 DevOps Interview Questions You Must Know in 2026

25 essential DevOps interview questions covering CI/CD, containers, IaC, monitoring, and culture — with answer strategies and what interviewers actually look for.

14 min read


DevOps interviews are a different beast. You won’t just whiteboard algorithms or debate REST vs GraphQL. They’ll ask you to explain how you’d deploy something to production at 3 PM on a Friday, what happens when the pipeline breaks mid-release, and whether you’ve actually been paged at 2 AM for a service you built yourself.

I’ve sat through dozens of these — on both sides of the table. The pattern is consistent: interviewers aren’t testing what you’ve memorized. They’re testing what you’ve operated. Broken pipelines, flaky tests, containers that refuse to start, alerts that fire at the worst possible moment. That’s the real curriculum.

These 25 questions keep showing up across DevOps interviews at startups, scale-ups, and big tech. They’re grouped by domain so you can focus on where you’re weakest. For each one, I’ll tell you what the interviewer is actually looking for and how to structure an answer that lands.

CI/CD — The Pipeline Is the Product

If there’s one area that comes up in every single DevOps interview, it’s CI/CD. Not just “do you know what it stands for” — but whether you can design, debug, and defend a pipeline under pressure.

1. What’s the difference between continuous integration, continuous delivery, and continuous deployment?

What they’re looking for: Whether you understand the spectrum, not just the acronyms. CI means merging and testing code frequently. Delivery means the artifact is always deployable but a human clicks the button. Deployment means it goes to production automatically.

Answer tip: Don’t just define them. Say which one you’ve actually practiced and why. “We did continuous delivery because our compliance team needed a manual gate before production” is ten times better than a textbook definition.

2. How would you design a CI/CD pipeline from scratch for a new microservice?

What they’re looking for: Structured thinking. Can you reason through stages — lint, unit tests, build, integration tests, security scan, artifact push, staging deploy, smoke tests, production deploy?

Answer tip: Walk through it sequentially. Mention what blocks the pipeline at each stage. Interviewers love hearing about fail-fast design: “If linting fails, nothing else runs. No point building a broken artifact.”
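The fail-fast ordering above can be sketched in a few lines of Python. This is a toy stage runner, not any real CI system's API — the stage names and the `run_pipeline` helper are illustrative; the point is simply that the first failure stops everything downstream, so you never build a broken artifact.

```python
# Toy fail-fast pipeline runner (stage names and helpers are illustrative).
# Cheap stages run first; the first failure stops everything downstream.

def run_pipeline(stages):
    """Run (name, fn) pairs in order; return the failing stage name, or None."""
    for name, fn in stages:
        if not fn():
            return name  # fail fast: skip every later, more expensive stage
    return None

ran = []  # records which stages actually executed

def stage(name, ok=True):
    def fn():
        ran.append(name)
        return ok
    return (name, fn)

# Cheapest-first ordering, with a simulated broken build
stages = [
    stage("lint"),
    stage("unit-tests"),
    stage("build", ok=False),
    stage("integration-tests"),  # never reached
]

failed = run_pipeline(stages)
print(failed)  # -> build
print(ran)     # -> ['lint', 'unit-tests', 'build']
```

The expensive integration stage never runs, which is exactly the behavior the "no point building a broken artifact" answer describes.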

3. A deployment just failed in production. Walk me through your rollback strategy.

What they’re looking for: Operational maturity. Do you panic or do you have a plan? Blue-green, canary, feature flags, database migration rollbacks — they want specifics.

Answer tip: Describe a real rollback you’ve done. The messier, the better. “We had a database migration that couldn’t be reversed, so we used a feature flag to disable the new code path while we wrote a forward-fix.” That kind of story sticks.
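The feature-flag escape hatch from that story can be made concrete. This is a minimal sketch with a hypothetical flag store and function names — real systems use LaunchDarkly, Unleash, or a config service — but the mechanism is the same: flipping a flag routes traffic back to the old code path without a redeploy.

```python
# Sketch of a feature-flag kill switch (flag store and names are hypothetical).
# When a deploy can't be rolled back (e.g. an irreversible migration),
# the flag disables the new code path while a forward-fix is written.

FLAGS = {"new_pricing_engine": False}  # flipped off during the incident

def new_price(order_total):
    raise RuntimeError("simulated bug in the new code path")

def price(order_total):
    if FLAGS["new_pricing_engine"]:
        return new_price(order_total)        # buggy new path, now disabled
    return round(order_total * 1.10, 2)      # old, known-good path

print(price(100))  # prints 110.0 — traffic safely back on the old path
```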

4. How do you handle secrets in a CI/CD pipeline?

What they’re looking for: Security awareness. Hardcoded secrets in YAML files are still shockingly common. They want to hear about vault integrations, environment-scoped variables, rotation policies.

Answer tip: Name the specific tools you’ve used — GitHub Actions secrets, HashiCorp Vault, AWS Secrets Manager. Then mention what you’d never do: commit secrets to git, log them in pipeline output, pass them as build arguments in Docker.
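The "what you'd never do" half of that answer translates directly into code. A minimal sketch, assuming secrets are injected as environment variables by the CI system or a vault agent (the variable name and masking helper are illustrative):

```python
import os

# Sketch: secrets come from the environment, never from source code,
# and are masked before anything reaches a log line.

def get_secret(name):
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"secret {name} not provided by the environment")
    return value

def mask(value):
    """Show only the last 4 characters — enough to identify, not to leak."""
    return "*" * max(len(value) - 4, 0) + value[-4:]

# Stand-in for a CI-injected secret (in real pipelines this is set by the
# runner, e.g. from GitHub Actions secrets or Vault, never in code)
os.environ["DB_PASSWORD"] = "s3cr3t-hunter2"

token = get_secret("DB_PASSWORD")
print(mask(token))  # -> **********ter2
```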

5. How do you deal with flaky tests in a pipeline?

What they’re looking for: Pragmatism. Everyone has flaky tests. The question is whether you quarantine them, track flakiness metrics, retry with caution, or just re-run the pipeline and hope.

Answer tip: “We tagged known flaky tests and ran them in a separate non-blocking stage while we fixed the root causes. We tracked flake rate weekly. If a test was flaky for more than two sprints, we either fixed it or deleted it.” That’s an answer that shows process maturity.
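The quarantine policy in that answer is easy to sketch as code. Thresholds and names below are illustrative, not from any real tool — the idea is just: track pass/fail history per test, compute a flake rate, and escalate anything that stays flaky too long.

```python
from collections import defaultdict

# Sketch of a flaky-test quarantine policy (thresholds are illustrative).

MAX_FLAKY_RUNS = 20  # stand-in for "flaky for more than two sprints"

history = defaultdict(list)  # test name -> list of booleans (True = passed)

def record(test, passed):
    history[test].append(passed)

def flake_rate(test):
    runs = history[test]
    return runs.count(False) / len(runs)

def verdict(test):
    if flake_rate(test) == 0:
        return "healthy"
    if len(history[test]) > MAX_FLAKY_RUNS:
        return "fix or delete"           # flaky for too long
    return "quarantine (non-blocking)"   # run in a separate stage meanwhile

# Simulate 30 runs of an intermittently failing test and one stable test
for i in range(30):
    record("test_checkout", i % 7 != 0)
record("test_login", True)

print(verdict("test_login"))     # -> healthy
print(verdict("test_checkout"))  # -> fix or delete
```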

Containers & Orchestration — Where Theory Meets Triage

Docker and Kubernetes dominate this section. If you’ve only done tutorials, this is where it’ll show.

6. What happens when you run docker run?

What they’re looking for: Depth. Do you know about image layers, the container runtime, namespaces, cgroups? Or do you just know the command?

Answer tip: Walk through it layer by layer. “The Docker daemon pulls the image if it’s not cached, creates a writable container layer on top of the image layers, sets up namespaces for process isolation and cgroups for resource limits, then starts the entrypoint process.” That level of detail separates operators from tutorial followers.

7. How is a Docker image different from a container?

What they’re looking for: A clear mental model. An image is a read-only template. A container is a running instance of that template with a writable layer on top.

Answer tip: Use an analogy if it helps — “An image is a class, a container is an instance” — but follow it immediately with something concrete. “I can run five containers from the same image, each with its own state, logs, and network identity.”
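The class/instance analogy can be made literal in a few lines. This is purely illustrative — these are not Docker APIs — but it captures exactly the concrete follow-up: one shared template, many instances with independent state.

```python
# The "image is a class, container is an instance" analogy, made literal.
# (Illustrative classes only — not Docker's actual object model.)

class Image:
    """Read-only template: same entrypoint and layers for every container."""
    def __init__(self, entrypoint):
        self.entrypoint = entrypoint

class Container:
    """Running instance: shares the template, owns its writable state."""
    def __init__(self, image, name):
        self.image = image   # all containers point at the same image
        self.name = name     # own network identity
        self.logs = []       # own writable layer

nginx = Image("nginx -g 'daemon off;'")
a = Container(nginx, "web-1")
b = Container(nginx, "web-2")
a.logs.append("started")

print(a.image is b.image)  # True  — one shared, read-only template
print(a.logs, b.logs)      # ['started'] []  — independent state
```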

8. Explain how Kubernetes scheduling works.

What they’re looking for: Whether you understand the control plane. The API server accepts the pod spec, etcd stores it, the scheduler evaluates node fitness (resources, taints, tolerations, affinity rules), and the kubelet on the chosen node pulls the image and starts the container.

Answer tip: Mention a real scheduling problem you’ve hit. “We had pods stuck in Pending because our node pool didn’t have enough memory headroom. Adding resource requests and limits fixed the scheduling, but it also exposed that we’d been over-provisioning CPU and under-provisioning memory for months.”
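The node-fitness step is the easiest part of scheduling to demonstrate. Here's a toy version that only checks memory requests — the real kube-scheduler also evaluates taints, tolerations, affinity, and a much richer scoring phase — but it shows why missing resource headroom leaves pods in Pending:

```python
# Toy node-fitness filter (numbers and scoring are illustrative; the real
# kube-scheduler also handles taints, tolerations, affinity, and scoring).

nodes = {
    "node-a": {"mem_free_mib": 512},
    "node-b": {"mem_free_mib": 4096},
    "node-c": {"mem_free_mib": 2048},
}

def schedule(pod_mem_request_mib):
    # Filter phase: drop nodes that can't satisfy the request
    fits = {n: v for n, v in nodes.items()
            if v["mem_free_mib"] >= pod_mem_request_mib}
    if not fits:
        return "Pending"  # what you see as a pod stuck in Pending
    # Naive "most free memory" scoring stands in for the real scoring phase
    return max(fits, key=lambda n: fits[n]["mem_free_mib"])

print(schedule(1024))  # -> node-b
print(schedule(8192))  # -> Pending (no node has the headroom)
```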

9. How would you debug a pod stuck in CrashLoopBackOff?

What they’re looking for: A systematic debugging approach. Not just “check the logs” — but a sequence: kubectl describe pod, check events, kubectl logs (including previous container), exec into a sidecar, check readiness/liveness probes, review resource limits.

Answer tip: Frame it as a checklist you actually follow. Interviewers want to see that you won’t flail when something breaks in production. Bonus points for mentioning that CrashLoopBackOff often means the application is failing, not Kubernetes.

10. When would you NOT use Kubernetes?

What they’re looking for: Judgment. Kubernetes is powerful but expensive in complexity. If you’re running three services on a small team, ECS or even plain Docker Compose might be the right call.

Answer tip: This is a maturity question. “Kubernetes makes sense when you have enough services and enough team members to justify the operational overhead. For a small team shipping two services, I’d pick ECS Fargate or Cloud Run. Less to manage, faster to ship.” Interviewers love candidates who know when not to reach for the big tool.

Infrastructure as Code — Reproducibility or Regret

IaC questions test whether you’ve felt the pain of manual infrastructure changes and emerged with discipline.

11. Why is Infrastructure as Code important?

What they’re looking for: Not a sales pitch for Terraform. They want to hear about reproducibility, audit trails, collaboration, drift detection, and disaster recovery.

Answer tip: Tell a horror story. “We had an environment that was manually configured over two years. When we needed to replicate it for a compliance audit, it took three engineers two weeks. After we moved to Terraform, spinning up a new environment took 20 minutes.” Stories like that are why IaC exists.

12. How do you manage Terraform state, and what can go wrong?

What they’re looking for: Operational depth. State locking, remote backends (S3 + DynamoDB, Terraform Cloud), state corruption, import of manually created resources, state file secrets exposure.

Answer tip: Mention a state-related problem you’ve hit. “We once had two engineers run terraform apply at the same time before we set up state locking. One apply overwrote the other’s changes. We lost a NAT gateway and had to import it back manually.” That’s the kind of scar tissue interviewers respect.

13. What’s the difference between Terraform and Ansible?

What they’re looking for: Understanding of declarative vs imperative, provisioning vs configuration management. Terraform creates infrastructure. Ansible configures what’s running on it. They overlap, but the sweet spot is different.

Answer tip: “Terraform tells the cloud what should exist. Ansible tells machines how to be configured. I’ve used both together — Terraform provisions the EC2 instances, Ansible installs and configures the application. They’re complementary, not competing.”

14. How do you structure Terraform for multiple environments?

What they’re looking for: Real-world organization skills. Workspaces vs directory structure. Modules for reusability. Variable files per environment. Remote state with environment-specific backends.

Answer tip: Describe your actual folder layout. “We used a directory-per-environment structure — environments/dev, environments/staging, environments/prod — each calling the same root modules with different tfvars. Workspaces felt too implicit for our team size.”
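A directory-per-environment layout like the one described might look something like this (module names are illustrative):

```text
environments/
  dev/
    main.tf        # calls ../../modules/* with dev-specific values
    backend.tf     # dev-specific remote state configuration
    dev.tfvars
  staging/
    ...            # same shape as dev
  prod/
    ...
modules/
  network/
  service/
```

Each environment directory is its own root module with its own state, which makes the blast radius of any `apply` explicit — the property that tends to matter more than DRY-ness as the team grows.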

15. How do you handle breaking changes in Terraform?

What they’re looking for: Risk awareness. Renaming a resource destroys and recreates it. Changing a provider version can introduce drift. State moves, imports, and targeted applies all have footguns.

Answer tip: “terraform plan is sacred. We never applied without reviewing the plan. And when we needed to rename resources, we used terraform state mv first to avoid destruction. One time someone renamed an RDS instance in code without moving state — Terraform tried to destroy and recreate the production database. The plan review caught it.”

Monitoring & Observability — Because Deployment Is Only Half the Job

Shipping code without monitoring is like driving without headlights. Interviewers know this, and they’ll test whether you do too.

16. What’s the difference between monitoring and observability?

What they’re looking for: Nuance. Monitoring tells you when something is wrong (alerts, dashboards). Observability tells you why — using logs, metrics, and traces together to diagnose problems you didn’t predict.

Answer tip: “Monitoring is the smoke alarm. Observability is having a floor plan, heat sensors, and a camera so you can figure out where the fire started and why.” Then ground it: “We had dashboards for latency and error rates, but we couldn’t debug cross-service failures until we added distributed tracing with Jaeger.”

17. How do you decide what to alert on?

What they’re looking for: Discipline. Alerting on everything creates noise. Alerting on nothing creates blindness. They want to hear about SLOs, error budgets, and symptom-based alerting vs cause-based alerting.

Answer tip: “We alerted on user-facing symptoms — error rate above 1%, p99 latency above 500ms — not on internal metrics like CPU. High CPU isn’t a problem if users aren’t affected. We burned out our on-call team once with noisy alerts before we learned that lesson.”
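Symptom-based rules like those can be expressed as a tiny rule table. A sketch, with the thresholds from the answer above and made-up metric values — note there is deliberately no CPU rule:

```python
# Sketch of symptom-based alert rules (thresholds from the answer above;
# metric values are made up). Deliberately no rule on CPU: it's a cause,
# not a user-facing symptom.

SLO_RULES = [
    (lambda m: m["error_rate"] > 0.01,    "error rate above 1%"),
    (lambda m: m["p99_latency_ms"] > 500, "p99 latency above 500ms"),
]

def firing_alerts(metrics):
    return [reason for check, reason in SLO_RULES if check(metrics)]

healthy  = {"error_rate": 0.002, "p99_latency_ms": 180, "cpu": 0.95}  # busy but fine
degraded = {"error_rate": 0.030, "p99_latency_ms": 620, "cpu": 0.40}

print(firing_alerts(healthy))   # -> []  (high CPU alone pages no one)
print(firing_alerts(degraded))  # -> both rules fire
```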

18. Explain the concept of SLOs, SLIs, and SLAs.

What they’re looking for: Whether you can connect abstract concepts to real operations. SLI is the metric (request latency), SLO is the target (99.9% of requests under 200ms), SLA is the business contract with consequences.

Answer tip: Keep it concrete. “Our SLI was successful API responses divided by total responses. Our SLO was 99.95% over a 30-day window. That gave us an error budget of about 22 minutes of downtime per month. When we consumed half the budget in one incident, we paused feature work to stabilize.”
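The error-budget arithmetic in that answer is worth being able to do on a whiteboard. A 99.95% SLO over a 30-day window works out like this:

```python
# Error-budget arithmetic for the SLO in the answer above.

slo = 0.9995
window_minutes = 30 * 24 * 60           # 43,200 minutes in a 30-day window
budget_minutes = (1 - slo) * window_minutes

print(round(budget_minutes, 1))         # -> 21.6 minutes of allowed downtime
```

That 21.6 minutes is the "about 22 minutes" in the answer — tighten the SLO by one nine (99.995%) and the budget shrinks to roughly two minutes, which is why nines are expensive.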

19. How do you approach logging in a microservices architecture?

What they’re looking for: Structured logging, centralized aggregation, correlation IDs, log levels, retention policies. Not just “we use ELK.”

Answer tip: “Every service emits structured JSON logs with a correlation ID that flows through all downstream calls. We ship everything to a centralized system — we used Loki, but ELK or Datadog work too. The key is that you can search by correlation ID and reconstruct an entire request path across services.”
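The structured-log-plus-correlation-ID pattern fits in a few lines. A minimal sketch — field names are illustrative, and real setups typically propagate the ID via HTTP headers or OpenTelemetry trace context rather than function arguments:

```python
import json
import uuid

# Sketch of structured JSON logging with a propagated correlation ID
# (field names are illustrative).

def log(service, message, correlation_id, **fields):
    entry = {
        "service": service,
        "correlation_id": correlation_id,
        "message": message,
        **fields,
    }
    print(json.dumps(entry))  # ships to Loki/ELK/Datadog in a real setup
    return entry

# The edge service mints the ID once; every downstream call reuses it,
# so one search by correlation_id reconstructs the whole request path.
cid = str(uuid.uuid4())
e1 = log("api-gateway", "request received", cid, path="/checkout")
e2 = log("payments", "charge created", cid, amount_cents=4200)
e3 = log("email", "receipt queued", cid)
```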

20. What’s distributed tracing and when would you use it?

What they’re looking for: Understanding of trace context propagation, spans, and when tracing adds value vs when it’s overkill. Tracing shines in microservices where a single request touches five or more services.

Answer tip: “We added OpenTelemetry tracing when debugging a latency spike that logs alone couldn’t explain. Turns out one downstream service was making a synchronous call to a third-party API that occasionally took 3 seconds. Without tracing, we’d have blamed the wrong service for weeks.”

Culture & Process — The Part That Actually Defines DevOps

DevOps is a culture before it’s a toolchain. Interviewers who ask these questions are testing whether you understand that.

21. What does DevOps mean to you?

What they’re looking for: Not a dictionary definition. They want to hear about shared ownership, fast feedback loops, automation as a principle, and breaking down silos between development and operations.

Answer tip: “DevOps, to me, is the practice of making the entire team responsible for what happens after code is written — not just before. If I write it, I should be able to deploy it, monitor it, and get paged when it breaks. That ownership changes how you write code.”

22. How do you handle a production incident?

What they’re looking for: Process. Do you have an incident framework? Roles (incident commander, communicator, debugger)? A severity classification? Post-incident reviews?

Answer tip: Walk through a real incident. “We had a SEV-1 where our payment service went down. I was incident commander. We opened a war room, assigned roles, communicated status every 15 minutes to stakeholders, identified the root cause in 40 minutes — a misconfigured connection pool — and wrote a blameless postmortem the next day.”

23. What’s a blameless postmortem and why does it matter?

What they’re looking for: Whether you understand psychological safety in engineering. Blame drives hiding. Blamelessness drives learning.

Answer tip: “A blameless postmortem focuses on what the system allowed to happen, not who did it. We ask ‘why did the system let this deploy without a canary?’ not ‘why did you push that broken code?’ The goal is to fix the process, not punish the person. Teams that blame stop reporting incidents. Teams that learn from incidents get more resilient.”

24. How do you balance speed of delivery with system reliability?

What they’re looking for: The SRE concept of error budgets. Ship fast when you have budget. Slow down and stabilize when you don’t.

Answer tip: “Error budgets are the mechanism. If we’re within our SLO, we ship features aggressively. If we’ve consumed most of our error budget, we pause and invest in reliability — better tests, improved monitoring, architectural fixes. It’s a data-driven way to settle the eternal ‘ship vs stabilize’ debate.”

25. How do you evaluate whether to build a tool internally or use an existing solution?

What they’re looking for: Judgment and cost awareness. Build vs buy isn’t just about money — it’s about maintenance burden, opportunity cost, and team expertise.

Answer tip: “I default to using existing solutions unless there’s a strong reason not to. The maintenance cost of internal tooling is almost always underestimated. We once built a custom deployment tool because ‘no existing tool fits our workflow.’ Two years later, three engineers were spending 30% of their time maintaining it. We migrated to ArgoCD and got those engineers back.”

How to Structure Your Answers

A pattern I’ve seen work in DevOps interviews, over and over:

  1. State your understanding of the concept. One or two sentences. Don’t lecture.
  2. Give a concrete example from your experience. Real project, real consequence. “At my last company, we…” carries more weight than “In theory, you should…”
  3. Mention the trade-offs. Every DevOps decision has one. If your answer doesn’t include a downside or a “it depends,” it probably sounds rehearsed.
  4. Land on what you’d recommend and why. Show that you can make a decision, not just list options.

DevOps interviewers are operators themselves. They can smell a scripted answer from across the room. Operational experience — including failures — is what separates the candidates who get offers from the ones who get polite rejections.

FAQ

How many DevOps tools should I know for an interview?

Depth beats breadth. Know one CI/CD tool deeply (GitHub Actions or GitLab CI are the most common), one container orchestrator (Kubernetes), one IaC tool (Terraform), and one monitoring stack (Prometheus + Grafana or Datadog). If you can operate those four confidently, you can pick up alternatives in days.

Do DevOps interviews include coding challenges?

Often, yes. Expect scripting questions in Bash or Python — parsing logs, automating a task, writing a simple health check. Some companies also include system design rounds where you architect a deployment pipeline or a monitoring system. It’s less LeetCode and more practical problem-solving.
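To make that concrete, here's the flavor of scripting task these rounds tend to ask — parse some access-log lines and compute an error rate. The log format below is simplified and the lines are made up:

```python
# Typical interview-style scripting task: compute the 5xx error rate
# from access-log lines (format simplified, data made up).

LOG = """\
2026-01-10T09:00:01Z GET /api/users 200 12ms
2026-01-10T09:00:02Z GET /api/users 500 3ms
2026-01-10T09:00:03Z POST /api/orders 201 45ms
2026-01-10T09:00:04Z GET /api/users 502 2ms
"""

def error_rate(log_text):
    # Field 3 of each line is the HTTP status code
    statuses = [int(line.split()[3])
                for line in log_text.splitlines() if line.strip()]
    errors = sum(1 for s in statuses if s >= 500)
    return errors / len(statuses)

print(f"{error_rate(LOG):.0%}")  # -> 50%
```

Being able to write something like this fluently, narrating your choices as you go, matters more than any particular syntax.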

Should I get certified before applying for DevOps roles?

Certifications like the AWS DevOps Professional or CKA (Certified Kubernetes Administrator) help get past resume filters, especially at large companies. But they won’t carry the interview. I’ve seen certified candidates struggle to explain how terraform plan works in practice. The cert opens doors. Operational stories walk you through them.

What’s the biggest red flag in a DevOps interview answer?

Saying “we” for everything without being able to explain your specific contribution. Interviewers will follow up with “what was your role in that?” If you can’t answer clearly, they’ll assume you watched someone else do the work.


Want the full preparation playbook? DevOps questions are one slice. For coding, system design, behavioral rounds, and a week-by-week study plan, read how to prepare for a technical interview in 2026.

Studying cloud alongside DevOps? These questions pair well with our top cloud interview questions for AWS, Azure, and GCP.
