Job description
A reputable and well-established Technology company is actively recruiting a Data Engineer for their team in Abu Dhabi.
Please note that you must meet all the criteria set out below for your application to be considered. Suitable candidates will be contacted within 5 working days. If you are not contacted by us within that time, please consider your application unsuccessful on this occasion.
The main responsibilities include, but are not limited to:
- Prepare and manage the datasets that power LLM fine-tuning and AI workflows.
- Build ingestion pipelines for structured/unstructured data using Python.
- Clean, normalize, and convert data into formats suitable for LLM fine-tuning (e.g., JSONL, CSV).
- Create high-quality, task-specific datasets for training and evaluation.
- Apply versioning to datasets using DVC or LakeFS for reproducibility.
- Generate embeddings using HuggingFace or Sentence Transformers.
- Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows.
- Tokenize and chunk long-form data for context window optimization.
Skills
To be successful, you will need to meet the following requirements:
- 10+ years of experience in a Data Engineering role.
- 2+ years of experience in an AI-adjacent data role.
- Experience managing datasets on object and file storage (MinIO, NFS).
- Proficiency in Python, pandas, and text processing tools.
- Familiarity with tokenization libraries (HuggingFace Tokenizers, SentencePiece).
- Understanding of LLM data constraints (context windows, formatting, prompt injection).
- Applicants should be available for face-to-face interviews in the location mentioned above.