Lead Engineer Bigdata - PySpark

Posted: 2026/08/24
Other Business Support Services

Job description

Discover your future at Citi

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.


Job Overview


We are seeking a highly skilled and experienced Senior Big Data/PySpark Engineer to join our dynamic Big Data Analytics team. The ideal candidate will have a strong background in Python programming and extensive experience with Apache Spark, particularly PySpark, for large-scale data processing and analytics. This role involves designing, developing, and optimizing robust, scalable data pipelines, working with vast datasets, and contributing to the architecture of our Big Data solutions.


Responsibilities:


  • Design, develop, and maintain efficient, scalable, and reliable data pipelines using PySpark.
  • Implement complex data transformations, aggregations, and data quality checks on large datasets.
  • Collaborate with multiple stakeholders (technology and business) to understand data requirements and translate them into technical specifications.
  • Optimize PySpark jobs for performance, efficiency, and cost-effectiveness.
  • Develop and maintain documentation for data pipelines, data models, and data processing logic.
  • Participate in code reviews, ensuring code quality, best practices, and adherence to established standards.
  • Troubleshoot and resolve issues in existing data pipelines and data processing jobs.
  • Stay up-to-date with the latest advancements in PySpark, Apache Spark, and the broader Big Data ecosystem.
  • Mentor junior developers and contribute to the continuous improvement of the team's technical capabilities and processes.

Required Qualifications:


  • 8–12 years of relevant experience.
  • Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
  • 5+ years of professional experience in software development with a focus on Big Data technologies.
  • 5+ years of hands-on experience specifically with PySpark for large-scale data processing.
  • Strong proficiency in Python programming, including object-oriented design and data manipulation libraries (e.g., Pandas, NumPy).
  • In-depth understanding of Apache Spark architecture, including Spark Core, Spark SQL, Spark Streaming, and DataFrame API.
  • Experience with data storage technologies such as HDFS, Amazon S3, Azure Blob Storage, or similar distributed file systems and object stores.
  • Solid understanding of relational databases and SQL.
  • Experience with version control systems (e.g., Git).
  • Excellent problem-solving, analytical, and communication skills.

Preferred Qualifications:


  • Experience with cloud platforms (AWS, Azure, GCP) and their Big Data services (e.g., EMR, Databricks, Glue, Azure Synapse, Google Dataproc).
  • Familiarity with workflow orchestration tools (e.g., Apache Airflow, Luigi).
  • Experience with streaming data processing (e.g., Kafka, Spark Streaming).
  • Knowledge of data warehousing concepts and data modeling techniques.
  • Experience with containerization technologies (e.g., Docker, Kubernetes).
  • Understanding of data governance, data security, and compliance best practices.

------------------------------------------------------

Job Family Group:
Technology

------------------------------------------------------

Job Family:
Applications Development

------------------------------------------------------

Time Type:
Full time

------------------------------------------------------

Most Relevant Skills:
Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills:
For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------


Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.


If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.



This job post has been translated by AI and may contain minor differences or errors.