Design and build a scalable architecture to ingest data into the database from multiple sources.
Write scripts to merge and load data into the MongoDB database.
Develop and automate large-scale, high-performance data processing systems (batch and/or streaming).
Evangelize high-quality software engineering practices for building data infrastructure and pipelines at scale.
Lead data engineering projects to ensure pipelines are reliable, efficient, testable, and maintainable.
Design our data models for optimal storage and retrieval and to meet critical product and business requirements.
Understand and influence logging to support our data flow, architecting logging best practices where needed.
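The merge-and-load scripting described above might look like the following sketch. It is illustrative only, assuming pymongo and hypothetical source and collection names; the merge keys later sources over earlier ones before an idempotent upsert.

```python
# Hypothetical sketch of merging records from multiple sources by a shared
# key before loading them into MongoDB. Source data and the "etl"/"users"
# names are illustrative, not taken from this posting.
from collections import defaultdict

def merge_records(*sources, key="id"):
    """Collapse records from multiple sources into one document per key.
    Fields from later sources overwrite overlapping fields from earlier ones."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record[key]].update(record)
    return list(merged.values())

# Loading the merged documents with upserts keeps reruns idempotent:
#
#   from pymongo import MongoClient, UpdateOne
#   coll = MongoClient("mongodb://localhost:27017")["etl"]["users"]
#   coll.bulk_write([
#       UpdateOne({"id": doc["id"]}, {"$set": doc}, upsert=True)
#       for doc in merge_records(source_a, source_b)
#   ])
```

Upserting on the merge key (rather than inserting) is what makes such a script safe to re-run when an upstream source is replayed.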
Experience with languages used for Machine Learning experimentation and implementation, such as Python, and frameworks like PyTorch and TensorFlow.
Experience with languages used in production for integration, such as Python and Java, is a plus.
Experience with, or willingness to learn, state-of-the-art NLP models such as the Transformer family.
Experience with, or willingness to learn, state-of-the-art Deep Learning paradigms and their application to areas such as Text Classification, Semantic Modeling, Ranking, and Personalization.
Understanding of analytics and experimentation.
Experience in building an end-to-end data pipeline is a plus.
Ability to communicate clearly and to influence others effectively.
Creative
Self-driven
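The "end-to-end data pipeline" experience mentioned above can be pictured with a minimal extract-transform-load sketch. All names and data here are hypothetical; a real pipeline would read from actual sources, write to a real sink such as MongoDB, and run under a scheduler.

```python
# Minimal end-to-end batch pipeline sketch: extract -> transform -> load.
# Stages are composed as generators so rows stream through without
# materializing the whole dataset. All names and values are illustrative.

def extract():
    # Stand-in for reading raw rows from a source system.
    yield from [{"user_id": 1, "amount": "10.50"},
                {"user_id": 2, "amount": "3.00"}]

def transform(rows):
    # Normalize types so downstream consumers get clean data.
    for row in rows:
        yield {**row, "amount": float(row["amount"])}

def load(rows, sink):
    # Stand-in for a database write; appends to an in-memory list.
    sink.extend(rows)
    return len(sink)

warehouse = []
loaded = load(transform(extract()), warehouse)
```

Keeping each stage a small, pure function is what makes a pipeline like this testable and maintainable: each stage can be unit-tested in isolation and swapped for a batch or streaming implementation.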