Our client, who is in the education space, is looking for a Software Engineer or DevOps Engineer who is interested in managing projects and products (not people) in the role of Site Reliability Engineer. This individual will draw on their engineering experience to foster a DevOps culture by developing documentation, requirements, processes, and solutions that enhance the reliability of the infrastructure.

• Hands-on design, analysis, development and troubleshooting of highly distributed, large-scale production systems and event-driven services spanning on-prem and AWS-based hosting
• Ownership of reliability, uptime, system security, cost, operations, capacity and performance analysis
• Share a 24×7 on-call rotation with your team and respond to incidents; lead triage bridges during incidents and provide needed status updates
• Create and maintain monitoring, alerting and dashboarding solutions that improve visibility into applications' performance and business metrics and keep operational workload in check
• Use automation technologies to ensure repeatability, eliminate toil, reduce time to action and repair services
• Participate in technical training events and game day scenarios
• Partner with engineering, security, system administration, performance, QA and product management teams to improve the availability and quality of service of products
• Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)
• Follow software development practices, including complying with agile software development methodology and building standards for code reviews, work packaging, and continuous delivery

Required Skills:
• Strong Linux administration/build/management skills
• Development experience in at least one of these languages: Java, Go, C# or Python; strong skills in reading, understanding and writing code in that language
• Demonstrated expertise building and managing highly scaled production infrastructure in on-prem and AWS-based environments
• Extensive experience troubleshooting n-tier architectures with diverse sets of technologies (e.g., load balancers, web/app/caching/database servers, queues, threading, memory, CPU, heap, storage, network, OS) strongly desired
• Strong experience using application and infrastructure monitoring systems (like Splunk, CloudWatch, Datadog, New Relic, Sumo Logic, ELK)
• Excellent presentation and communication skills
• Mastery of infrastructure automation technologies (like Terraform, Puppet, Ansible, Chef)
• Expertise with continuous-deployment-based software development lifecycles (e.g., CI/CD)
• Experience with common middleware (e.g., Apache, NGINX, IIS, Tomcat, JBoss)
• Experience with SQL databases (e.g., PostgreSQL, Oracle, MySQL)
• Expertise with SDLC branching, SCM, and code deployment systems (Git/Gitflow, Jenkins, CircleCI, Travis CI, etc.)
• Expertise in container/container-fleet-orchestration technologies (like Docker, Vagrant, Mesosphere)
• BS degree in Computer Science or a related technical field, and/or equivalent industry experience

