DevOps Engineer

IND- MH- Pune
Job Description

About This Role:

We are hiring a hands-on DevOps Engineer to manage and support production-grade cloud infrastructure for Kibo’s commerce platform. This role focuses on Kubernetes (EKS), Terraform, and real-time production troubleshooting in a 24/7 on-call environment.

ABOUT KIBO 

KIBO is a composable digital commerce platform for B2C, D2C, and B2B organizations that want to simplify the complexity in their businesses and deliver modern customer experiences. KIBO is the only modular, modern commerce platform that supports experiences spanning B2B and B2C Commerce, Order Management, and Subscriptions. Companies like Ace Hardware, Zwilling, Jelly Belly, Nivel, and Honey Birdette trust Kibo to bring simplicity and sophistication to commerce operations and deliver experiences that drive value.

KIBO's cutting-edge solution is MACH Alliance Certified and has been recognized by Forrester, Gartner, IDC, Internet Retailer, and TrustRadius. KIBO has been named a leader in The Forrester Wave™: Order Management Systems, Q1 2025 and in the IDC MarketScape report “Worldwide Enterprise Headless Digital Commerce Applications 2024 Vendor Assessment”.

By joining KIBO, you will be part of a team of Kibonauts all over the world in a remote-friendly environment. Whether your job is to build, sell, or support KIBO’s commerce solutions, we tackle challenges together with the approach of trust, growth mindset, and customer obsession. If you’re seeking a unique challenge with amazing growth potential, then come work with us!

 

WHAT YOU’LL DO 

  • Manage and operate production-grade Kubernetes clusters (EKS preferred), ensuring high availability and scalability
  • Troubleshoot real-time production issues across distributed systems and microservices
  • Diagnose and resolve issues such as:
    • Pod failures (CrashLoopBackOff, Pending, OOMKilled)
    • Node failures, autoscaling, and resource constraints
    • Networking, ingress, and service connectivity issues
  • Build, maintain, and debug infrastructure using Terraform (modules, remote state, locking, drift handling)
  • Implement and enhance monitoring & alerting systems using Prometheus, Grafana, and related tools
  • Perform root cause analysis (RCA) for incidents and drive permanent fixes to improve system reliability
  • Participate in a 24/7 on-call rotation, owning incidents and resolving them independently
  • Collaborate with engineering teams to improve system performance, resilience, and deployment processes
  • Automate deployments, infrastructure provisioning, and operational workflows to reduce manual effort
  • Ensure adherence to security best practices across infrastructure and deployments 
 
Skills & Requirements

WHAT YOU’LL NEED 

  • 8+ years of experience as a DevOps Engineer, owning and operating production Kubernetes clusters (EKS preferred), including cluster health, scaling, and availability
  • Troubleshoot real-time production issues independently across microservices and distributed systems
  • Debug and resolve critical issues such as:
    • Pods stuck in CrashLoopBackOff, Pending, or OOMKilled states
    • Node failures, node pressure, and autoscaling issues
    • Service connectivity, ingress, and networking issues
  • Investigate and fix cluster-level issues including scheduling, resource constraints, and misconfigurations
  • Build and maintain infrastructure using Terraform, including:
    • Writing and modifying modules
    • Managing remote state and locking
    • Handling drift and failed deployments
    • Designing and implementing reusable Terraform modules for scalable infrastructure
    • Troubleshooting and resolving Terraform apply failures and infrastructure inconsistencies in production
  • Monitor system health using Prometheus, Grafana, and logging tools, and proactively identify issues
  • Perform root cause analysis (RCA) for production incidents and implement long-term fixes
  • Handle on-call incidents (24/7 rotation) and take full ownership until resolution
  • Work closely with development teams to improve system reliability, performance, and scalability
  • Automate operational tasks and improve deployment and infrastructure processes
  • Ensure security best practices across infrastructure, networking, and access controls

KIBO PERKS 

  • Flexible schedule and hybrid work setting 

  • Paid company holidays and global volunteer holiday 

  • Generous health, wellness, benefits, and time away programs 

  • Commitment to individual growth and development and opportunity for internal mobility 

  • Passionate, high-achieving teammates excited to help you succeed and learn 

  • Company-sponsored events and other activities  

At Kibo, we celebrate and support all differences. Kibo is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, and veteran status.

 
