The position involves overseeing the availability and capacity of essential applications to optimize performance. Responsibilities include offering technical support for these applications, resolving issues, and facilitating the setup and integration of new applications within the company's ecosystem. The successful candidate will play a crucial role in maintaining the stability and efficiency of the organization's application infrastructure.
Key Responsibilities:
Design, implement, and maintain highly available and scalable infrastructure.
Work alongside development teams to ensure applications are created with reliability and performance in mind.
Monitor system performance, identify bottlenecks, and proactively implement optimizations to improve system efficiency.
Create and maintain automation tools for deployment, configuration, and monitoring of systems and services.
Perform system capacity planning and provide recommendations for scaling resources to meet growing demands.
Diagnose and resolve complex technical issues related to infrastructure, networking, and application performance.
Establish and improve monitoring, alerting, and logging systems to ensure timely detection and resolution of incidents.
Cooperate with cross-functional teams to define and implement best practices for infrastructure, deployment, and operational processes.
Take part in on-call rotation and provide prompt response and resolution to production incidents.
Remain current with industry trends and emerging technologies in cloud computing and infrastructure automation.
Technical Skills:
Extensive experience with on-premises infrastructure and cloud platforms such as AWS and Azure.
Proficiency in infrastructure-as-code (IaC) tools like Terraform or CloudFormation.
Solid understanding of containerization technologies such as Docker and orchestration tools like Kubernetes.
Experience with configuration management tools like Ansible, Puppet, or Chef.
Strong scripting skills in languages such as Python, Bash, or PowerShell.
In-depth knowledge of Linux systems administration and networking concepts.
Familiarity with monitoring and logging tools like Prometheus, Grafana, and the ELK Stack.
Education
Academic Qualification(s):
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).
Professional Qualification(s):
Service management certifications (e.g., ITIL) are beneficial.
Experience (Number of relevant years):
Minimum of 3-4 years of experience in a similar SRE or infrastructure engineering role.