Veryfi is looking for our next great data engineer, who will build out and scale our analytics platform and its corresponding data pipelines. You will be responsible for building and scaling a robust platform that delivers our ML/AI-driven insights, and for coordinating with the data visualization team to create engaging and insightful content.
Responsibilities:
- Craft data engineering components, applications and entities to empower self-service of our big data
- Develop and implement technical best practices for ETL: data movement, data quality and data cleansing
- Optimize and tune ETL processes using reusability, parameterization, workflow design, caching, parallel processing, and other performance-tuning techniques
Qualifications:
- Knowledgeable about data engineering best practices, comfortable in a fast-paced startup
- Experience with data warehousing, streaming data, and the supporting architectures: pub/sub, stream processors/data aggregators, real-time analytics, and data-lake cluster computing frameworks
- Mastery of the components needed to architect solutions for complex data platforms and large-scale CI/CD data pipelines using a variety of technologies (REST APIs, advanced SQL, Amazon S3, Apache Kafka, data lakes, etc.), from relational SQL databases (e.g. MySQL, Postgres) and newer NoSQL stores (e.g. MongoDB, Neo4j) to in-memory caches (e.g. Redis, Memcached)
- Working knowledge of distributed computing and data modeling principles
- Experience with object-oriented design, coding, and testing patterns, including experience engineering software platforms and data infrastructures
- Experience with Big Data, PySpark, and streaming data
- Knowledge of data management standards, data governance practices, and data quality dimensions
- Experience with UNIX systems, writing shell scripts, and programming in Python
- Hands-on experience in Python using libraries such as NumPy, Pandas, and PySpark