Design and implement distributed computing solutions
Job Summary
This is a data engineer position responsible for the design, development, implementation, and maintenance of data flow channels and data processing systems.
Responsibilities include developing and optimizing scalable Spark Java-based data pipelines for processing and analyzing large-scale financial data, and designing and implementing distributed computing solutions for risk modeling, pricing, and regulatory compliance.
The role requires strong proficiency in Python and Spark Java, knowledge of core Spark concepts, and experience with relational (SQL) and NoSQL databases.
Skills & Requirements
Must-have
Spark Java development expertise
Python and Apache Spark
Design and implementation of distributed computing solutions
Big Data
Spark performance tuning
CI/CD pipelines and version control
Batch processing frameworks
Nice-to-have
Strong interpersonal and communication skills
Experience in a fast-paced financial environment
Mathematical and analytical mindset
Key Requirements
5-8 years of experience in data ecosystems
4-5 years of hands-on experience in Hadoop, Scala, Java, Spark, Hive, Kafka, Impala, and Unix scripting
3+ years of experience with SQL and NoSQL databases
Experience with Confluent Kafka, Red Hat jBPM, and CI/CD build pipelines
Experience with cloud platforms (OpenShift, AWS, GCP)
Experience with container technologies (Docker, Pivotal Cloud Foundry)
Bachelor’s/University degree or equivalent experience