Engineer III, Software – SRE
Responsibilities:
- Implement tools and processes necessary to achieve required SLOs for this company's Platform.
- Implement CI/CD pipelines.
- Automate delivery of platform services using infrastructure as code. Build self-service playbooks for the platform that can be consumed by globally distributed teams at this company.
- Support incident response management processes.
- Resolve support and escalation issues.
- Conduct post-incident reviews.
- Collaborate with application and business stakeholders to ensure a high-quality product is developed and deployed to production. Work diligently with other engineering teams to ratify release processes necessary to meet business goals.
- Drive the continuous improvement process.
Required Knowledge and Skills:
- Strong knowledge of one of the major public cloud platforms (Azure, AWS, GCP).
- Experience supporting CI/CD for cloud native applications.
- Hands-on programming experience in Python or another object-oriented programming language.
- Strong knowledge of infrastructure and application monitoring tools such as Prometheus, Grafana, and Datadog.
- Experience implementing IaC concepts using Terraform, Chef, or Puppet.
- Experience with Elasticsearch and Kibana.
- Experience administering databases.
- Extensive knowledge of Linux administration.
- Strong knowledge of Docker and Helm.
- Experience administering Kubernetes clusters.
- Experience deploying applications that use a service mesh.
- Experience working with incident response management processes.
- Bachelor’s degree.
- 5+ years’ experience in software engineering.
Preferred Knowledge and Skills:
- Understanding of GitOps principles.
- Experience implementing secure and compliant Kubernetes platforms.
- Experience deploying and managing stateful distributed services in Kubernetes.
- Experience with security scanning tools.
- Experience with intrusion detection systems.
- Experience with messaging systems such as Kafka or RabbitMQ.
- Working knowledge of Databricks, Team Foundation Server, TeamCity, Octopus Deploy, and Datadog.
Working Conditions:
- Corporate office/lab environment.
- Ability to travel 10% of the time.