Cloud Platform Operations Engineer
As a preferred supplier to one of my biggest clients, I am seeking a Cloud Platform Operations Engineer to join my client in Romania on a hybrid working basis.
- As part of the Cloud Team, your primary focus is the delivery of cloud-related services and products to internal customers.
- You will be responsible for providing end-to-end operations support for cloud services through monitoring, incident response, and incident resolution for the Public Cloud Platform environment.
- You will also support incidents affecting hosted Workloads (applications and infrastructure), providing these services in a timely manner to the owning teams, who are customers of the Platform.
- The successful candidate will monitor the cloud platform to ensure availability, reliability, security and performance, responding to incidents and escalating them to the appropriate teams as required to ensure SLAs are met.
- You will follow incident, change, release, and problem management processes, participating in and driving root cause analysis on incidents to ensure issues are identified and acted upon in a timely manner, and using those processes to continually improve the stability of the environment for all customers.
- You will be an expert in observability, responsible for centralized platform monitoring and alerting tooling, setting the direction and enabling greater observability for all.
- You will seek to use automation to remove repetitive tasks, developing Runbooks and Playbooks for use across the organization.
- You will be an expert in operations in cloud environments, using that expertise to support development teams in taking on more operations responsibilities and driving DevOps practices to enable the company to efficiently scale cloud workloads.
- Perform infrastructure engineering activities, including platform upgrades, image patching, monitoring, configuration, and troubleshooting.
- Contribute to incident root cause analysis, with enough knowledge to lead discussions and drive tangible improvements.
- Work with the design leads and platform engineers to improve and shape the CCP and CCoE outcomes.
- Work with peers inside and outside the team to promote and expand automation of operational tasks through consultancy and hands-on expertise, identifying the best fit and highlighting benefits.
- Design, build and implement tools to aid observability, identification and resolution of incidents that occur on the cloud platform with a strong emphasis on reducing Mean Time To Recover (MTTR).
- Provide backup and recovery support and guidance for cloud resources. This includes participating in disaster recovery procedures to support disaster recovery testing.
- Be a point of contact for key ITIL processes that impact the Cloud Platform and hosted Workloads, providing advice and guidance on how to implement the various requirements in a cloud-first way.
- Participate in security incident response and investigation as needed.
- Execute capacity management, ensuring the platform is not constrained by increased adoption.
- Execute pre-flight checks to ensure silent running for Public Cloud.
- Produce technical documents to support day-to-day operations.
If you want to be involved with tech innovation in a fast-paced environment, with an employer that offers a generous salary + benefits (including performance-related bonuses) and professional growth, then please don’t delay in sending me your CV 🙂