3 - 6 years
Client is a post-trade technology and professional services firm. We are a fully hosted technology platform, coupled with a team of experienced hedge fund professionals, offering sophisticated solutions for the most complex post-trade challenges facing asset managers. Built on a platform developed and tested by the D. E. Shaw group for its own post-trade activities, we were launched as an independent company in 2015. Since our launch, we have grown to support more than $65 billion in assets from a number of leading hedge funds, with a staff of over 600 software development, accounting, operations, and treasury professionals. By providing cutting-edge technology, automation, and security, we enable clients' operations, accounting, treasury, and enterprise data management teams to achieve unparalleled results.
Client seeks a highly skilled Site Reliability Engineer to join our Technology team. You will be working as part of a cross-functional product team to create elegant solutions to highly complex and intricate business challenges.
Responsibilities include:
• Working with the rest of the team to deploy, maintain, and run a highly available, multi-tenant distributed system
• Automating both the infrastructure creation and the application deployment to that environment
• Contributing to the design/architecture of the system
• Programming in the core application (e.g., instrumenting code with monitoring metrics, setting up traces, shipping and organizing logs)
• Ensuring the system performs as intended
The ideal candidate will have at least 6 years of experience in an SRE/Operations/DevOps role running distributed systems in production.
Requirements:
• Experience with automated provisioning and management of AWS infrastructure and services
• Strong knowledge of Linux systems internals and administration
• Deep experience with Kubernetes and Docker
• Experience automating the software dev/test/deployment lifecycle with continuous integration and continuous deployment
• Experience with scaling, monitoring, and troubleshooting actively running systems
• Ability to program in Java, C++, or C#
• Comfortable with configuration management tools: Ansible, Chef, Puppet, etc.
• Other technologies: Fluentd, key-value datastores, API management/service meshes, Git, key management
Must-have skills:
Experience in Linux programming, Kafka, Terraform, AWS, and Kubernetes is a must. Decent programming experience in Java. Proficient in writing code.
We are looking for the following skill set in this role:
• DevOps, debugging skills, and experience with logging and monitoring solutions such as Elasticsearch, Kibana, Fluentd, Logstash, OpenCensus, Prometheus, AWS CloudWatch/Cloud Metrics, and Datadog
• Linux: administration and internals, networking, scripting, debugging skills, LDAP, [Docker, Ansible/Puppet, security]
• AWS skills
• Experience in managing messaging middleware infrastructure such as Kafka (AWS MSK), RabbitMQ, and ActiveMQ
Preferred universities are Tier 1 institutes such as IIT, NIT, BITS-Pilani, Anna University, COE Pune, DTU, NSIT, PEC, PSG, RVCE, and TIET Patiala.
Candidates without Linux programming, Linux administration, AWS, and Kubernetes experience are not suitable for this role.
The process will involve a HackerRank test, followed by a telephone interview and 3-4 face-to-face interviews.
4 to 6 years of experience in Operations and/or Development teams.
• Excellent Linux administration and troubleshooting skills.
• Solid experience in AWS administration and automation.
• Good programming knowledge (Python).
• In-depth experience in handling incidents, coordination, and RCA.
• Monitoring and alerting.
• Should be comfortable working in a 24x7 environment.