Roles and Responsibilities
- Serve as a technical leader on the Operations team, building a next-generation operations system and supporting and improving key aspects of infrastructure services, including security, availability, and performance.
- Create procedures and runbooks for the operational and security aspects of the platform.
- Mentor the team on best practices for security, cost optimization, and infrastructure monitoring.
Provide advanced business and engineering support services to end users:
- Gather business requirements from the product owner
- Lead other admins and platform engineers through design and implementation decisions, balancing strategic design against tactical needs
- Research and deploy new tools and frameworks to maintain the big data platform
- Help create training and onboarding programs for new customers
- Lead Agile/Kanban workflows and team processes
- Troubleshoot and resolve customer issues
- Provide status updates to the Operations product owner and other stakeholders
- Track all details in the issue tracking system (JIRA)
- Review issues and triage new service and support requests
- Use DevOps automation tools, including GitLab build jobs
- Fulfill ad hoc data and report requests from other functional groups.
- Provide on-call support for P1 issues on weekends and holidays on a rotating basis.
Required skills
- Experience: 12+ years total, including 5+ years of full-stack development using JavaScript frameworks on the front end and Java or Python on the back end.
- Infrastructure: 5+ years of supporting systems infrastructure operations, upgrades, deployments, and monitoring
- DevOps: 5+ years of experience with DevOps automation using Terraform and GitLab Runner or Jenkins
- AWS: Hands-on experience with, and a strong understanding of, AWS services; advanced experience with IAM policy and role management.
- Security: Experience implementing role-based security, including Active Directory (AD) integration, security policies, and auditing in a Linux/Hadoop/AWS environment. Familiarity with penetration testing and vulnerability scanning tools for remediating security vulnerabilities.
- Monitoring: Hands-on experience with monitoring tools such as AWS CloudWatch, Datadog, and Elasticsearch
- Demonstrated ability to learn new technologies quickly
Nice-to-have skills
- Ability to work with teams across different geographic regions
- Data science tools: Jupyter, TensorFlow