**No firms: we cannot work with firms due to regulatory reasons.**
Reality Defender seeks a data engineer to join the Data Engineering team. You would work on product-oriented research and development for in-the-wild deepfake media detection, with an emphasis on dataset construction and preprocessing.
Large Language Model (LLM) Ethics
Reality Defender (RD) does not allow the use of Large Language Models or online chatbots such as ChatGPT in any part of the interview process, including video calls, take-home assignments, and other stages.
Location: Remote
Responsibilities
- Build large-scale multimodal datasets and benchmarks, with an eye to ensuring diversity and minimizing bias.
- Write and maintain code for dataset preprocessing, labeling, and sampling.
- Automate data augmentation, quality control, and content moderation.
- Work closely with the AI team on deep learning model training and evaluation.
Qualifications
Required:
- At least 3 years of work experience in software engineering, data science, or an equivalent field.
- Proficiency in Python and at least one deep learning library (ideally PyTorch).
- Experience building diverse and balanced datasets.
- Solid understanding of linear algebra, statistics and deep learning concepts.
Nice to have:
- Familiarity with audio and video file formats and codecs.
- Experience with text preprocessing and tokenizers.
- Experience with AI fairness or Responsible AI, particularly as it relates to dataset construction.