Technical Architect: AWS Site Reliability
- Software Engineering
- Professional
In this role, you’ll work in one of our IBM Consulting Client Innovation Centers (Delivery Centers), where we deliver deep technical and industry expertise to a wide range of public and private sector clients around the world. Our delivery centers offer our clients locally based skills and technical expertise to drive innovation and adoption of new technology.
At IBM, work is more than a job – it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.
Your Role and Responsibilities
The Site Reliability Engineer (SRE) is a critical role in cloud-based projects. An SRE works with the development squads to build automation for platform and infrastructure management and provisioning, as well as service monitoring, using the same methods used in software development to support application development. SREs create a bridge between development and operations by applying a software engineering mindset to system administration topics. They split their time between operations/on-call duties and developing systems and software that help increase site reliability and performance.
Required Technical and Professional Expertise
- 10+ years of experience in a senior SRE-related role. Deep understanding of the AWS platform's technology and capabilities to support site reliability goals.
- Identifies points of failure and performance bottlenecks and provides feedback to the architecture teams. Identifies the tools best suited for integration into the CI/CD pipeline to measure performance, code quality, and code coverage. Defines the quality gates in the CI/CD pipeline by working with the application architects.
- Defines and implements the strategy for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Identifies and implements methods for scaling the applications, as well as tools for logging, monitoring, alerting, and runbook automation for auto-remediation (self-healing).
- Works with the application and support teams during critical situations to identify the root cause of failures and helps fix them. Incorporates aspects of software engineering and applies them to IT operations problems.
- Applies aspects of software engineering to operations with the goal of creating software systems that are highly scalable and reliable.
Preferred Technical and Professional Expertise
- Smart Monitoring & Alerting for resilience, capacity & performance optimization
- Good understanding of applying infrastructure as code to control and automate build processes, Windows servers and infrastructure, and hybrid identity using Azure AD Connect for SSO integration
- Understanding of DevOps and CI/CD tools (such as Jenkins, Ansible, Packer, Docker)
About IBM
IBM’s greatest invention is the IBMer. We believe that through the application of intelligence, reason and science, we can improve business, society and the human condition, bringing the power of an open hybrid cloud and AI strategy to life for our clients and partners around the world.
Restlessly reinventing since 1911, we are not only one of the largest corporate organizations in the world, we’re also one of the biggest technology and consulting employers, with many of the Fortune 50 companies relying on the IBM Cloud to run their business.
At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it’s time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.