Site Reliability Engineer
Mumbai
Created Date: 2026-06-18T12:00:00.258Z
Experience: 5+ years
Salary: upto
Industry: 24
Openings: 1
Primary Responsibilities:
Site Reliability Engineering
- Develop, automate, deploy, and maintain cloud infrastructure using Pulumi, Python, and Shell scripting.
- Build and automate CI/CD pipelines for application and infrastructure deployments.
- Deploy, manage, and upgrade Kubernetes environments (GKE) and tools such as Istio, Argo CD, and cert-manager.
- Monitor, troubleshoot, and optimize cloud networking components including VPCs, subnets, DNS, firewall rules, VPC peering, and private services access.
- Deploy, scale, monitor, and support microservices running on Kubernetes.
- Manage cloud resources including virtual machines, cloud functions, and storage services.
- Support and troubleshoot databases, caching, and messaging systems such as PostgreSQL, MongoDB, Redis, Spanner, and Confluent.
- Configure and support authentication systems and API integrations.
- Implement observability, monitoring, logging, and alerting solutions.
- Participate in security audits and implement cloud security best practices.
Operational Excellence
- Improve CI/CD processes, release management, rollback strategies, and production support.
- Enhance monitoring, dashboards, logging, tracing, and alert quality.
- Participate in incident management, troubleshooting, and root cause analysis.
- Create and maintain technical documentation, runbooks, and architecture documents.
- Track tasks, issues, and releases using project management tools.
- Support continuous improvement initiatives and engineering best practices.
Leadership & Collaboration
- Work closely with Software Engineering, QA, Security, and Infrastructure teams.
- Provide technical guidance and support to junior SRE team members.
- Mentor and assist Level 1 SREs.
- Support cross-functional teams in delivering reliable and scalable solutions.
Experience Requirements:
- 5+ years of experience in Site Reliability Engineering, DevOps, Cloud Engineering, or related roles.
- Strong experience with cloud platforms, preferably Google Cloud Platform (GCP).
- Hands-on experience with Kubernetes (GKE) and containerized environments.
- Proficiency in Python, Shell Scripting, YAML, and JSON.
- Experience building and maintaining CI/CD pipelines.
- Strong Linux administration and troubleshooting skills.
- Experience with infrastructure automation tools and APIs.
- Knowledge of networking concepts including VPCs, DNS, firewalling, and cloud connectivity.
- Experience with monitoring and observability platforms such as Prometheus, Datadog, Splunk, or Google Cloud Monitoring.
- Strong understanding of Git, DevOps practices, and automation workflows.
- Excellent English communication skills.
Preferred Qualifications:
- Experience with Istio Service Mesh, Argo CD, and cert-manager.
- Knowledge of PostgreSQL, MongoDB, Redis, Spanner, and messaging platforms.
- Familiarity with authentication and identity management systems.
- Experience with security audits and cloud security best practices.
- GCP Certifications such as:
  - Professional Cloud DevOps Engineer
  - Professional Cloud Architect
  - Professional Cloud Security Engineer
- Additional certifications in Python, DevOps, or Observability platforms.