SigFig - Site Reliability Engineering Manager - CI/CD Pipeline
Posted 2025-05-30

Responsibilities:
- Lead a global, distributed SRE/DevOps team operating in a 24/7 production environment.
- Develop and implement automation frameworks for self-healing, auto-remediation, and system optimization.
- Enhance monitoring and observability through tools like Splunk, Prometheus, and AI-powered alerting platforms.
- Improve CI/CD pipelines using Jenkins, GitHub Actions, and ArgoCD, and drive continuous delivery at scale.
- Manage and scale infrastructure using Terraform, Kubernetes, Puppet, and similar tools.
- Act as the first technical escalation point for Level-2/Level-3 troubleshooting of production incidents involving Linux servers, cloud networking, and Kubernetes clusters.
- Lead post-incident reviews, implement automated solutions for root cause issues, and contribute to a growing incident knowledge base.
- Collaborate cross-functionally with Engineering, Security, and Product to align reliability initiatives with business objectives.
- Establish and enforce SLOs and error budgets to continually raise system reliability standards.

Requirements:
- 7+ years of experience in SRE, DevOps, or Technical Operations roles.
- 2+ years in a leadership role managing global, distributed teams in a high-uptime environment.
- Proven experience with AWS, GCP, or Azure, and implementing infrastructure as code at scale.
- Strong scripting skills in Python, Bash, or similar for automation and operational tooling.
- Deep understanding of observability and incident management best practices.
- Experience with CI/CD and deployment orchestration tools.
- Familiarity with containerized and microservices-based architectures.
- Passion for automation, reliability engineering, and continuous improvement.
- Excellent communication and leadership skills to coordinate across global teams.
- Previous experience in fintech or highly regulated environments is a plus.

(ref: hirist.tech)