Python, Java, Scripting languages (Bash, Shell), Apache Spark, Scala, Airflow, Hadoop, Kafka, AWS, SQL and NoSQL databases |
Design, develop, and maintain ETL (Extract, Transform, Load) processes and data pipelines on AWS for batch and real-time data. Implement data storage solutions, including data lakes and data warehouses, using AWS services such as S3, Redshift, DynamoDB, RDS, Glue, EC2, and EMR, or third-party managed services such as Snowflake, Databricks, Redis, and Neo4j. Choose the most suitable data format (Parquet, ORC, Avro, JSON) for processing and accessing the data. Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business objectives. Optimize and tune data processing and transformation jobs for performance and scalability. Ensure that data security, compliance, and governance practices are followed. Monitor data pipeline health, troubleshoot issues, and implement proactive solutions. Implement a standardized, governed, and optimized data model in the consumption layer. Select and implement a CI/CD tool for the cloud environment, weighing factors such as the preferred cloud provider, integration capabilities, ease of use, scalability, and pricing (e.g., Jenkins, GitLab CI/CD, AWS CodePipeline, Google Cloud Build, Bamboo, Drone, Travis CI, Chef, Puppet, or Bash scripts). Maintain documentation and best practices for data engineering processes and standards. Stay up to date with emerging AWS technologies and best practices in data engineering. |
Proven experience as a data engineer with a focus on AWS cloud technologies. Strong knowledge of AWS data services, including S3, Redshift, Glue, EMR, EC2, DynamoDB, RDS, Lambda, and Athena. Proficiency in programming languages such as Python, Scala, or Java. Experience with ETL tools and frameworks such as Apache Spark. Experience with data modeling and SQL. Knowledge of data governance, security, and compliance practices. Excellent problem-solving and communication skills. AWS certifications (e.g., AWS Certified Data Engineer) are a plus. |