DESCRIPTION
Although the role category specified in the GPP is Remote, this position requires a Hybrid work arrangement.
Key Responsibilities:
- Implement and automate deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
- Continuously monitor and troubleshoot data quality and integrity issues.
- Implement data governance processes and methods for managing metadata, access, and retention for internal and external users.
- Develop reliable, efficient, scalable, high-quality data pipelines with monitoring and alerting mechanisms, using ETL/ELT tools or scripting languages (a minimal pipeline sketch follows this list).
- Develop physical data models and implement data storage architectures as per design guidelines.
- Analyze complex data elements and systems, data flow, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
- Participate in testing and troubleshooting of data pipelines.
- Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB).
- Apply agile development practices, such as DevOps, Scrum, Kanban, and continuous improvement cycles, to data-driven applications.
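To make the pipeline responsibilities concrete, here is a minimal batch ETL sketch in PySpark with a simple data-quality gate and alert. The paths, column names, and checks are hypothetical placeholders, not actual Cummins systems.

    # Minimal batch ETL sketch: ingest -> transform -> quality gate -> load.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Ingest: a raw relational extract landed as Parquet (hypothetical path).
    raw = spark.read.parquet("/landing/orders/")

    # Transform: deduplicate, drop bad records, derive a partition column.
    clean = (
        raw.dropDuplicates(["order_id"])
           .filter(F.col("order_ts").isNotNull())
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Monitor: a simple data-quality gate standing in for whatever
    # monitoring/alerting mechanism the team actually uses.
    null_keys = clean.filter(F.col("customer_id").isNull()).count()
    if null_keys > 0:
        raise ValueError(f"DQ check failed: {null_keys} rows missing customer_id")

    # Load: write to a curated zone, partitioned for downstream consumers.
    clean.write.mode("overwrite").partitionBy("order_date").parquet("/curated/orders/")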
RESPONSIBILITIES
Qualifications:
- A college, university, or equivalent degree in a relevant technical discipline, or equivalent relevant experience, is required.
- This position may require licensing for compliance with export controls or sanctions regulations.
Competencies:
- System Requirements Engineering: Uses appropriate methods and tools to translate stakeholder needs into verifiable requirements.
- Collaborates: Builds partnerships and works collaboratively with others to meet shared objectives.
- Communicates effectively: Develops and delivers multi-mode communications that convey a clear understanding of the unique needs of different audiences.
- Customer focus: Builds strong customer relationships and delivers customer-centric solutions.
- Decision quality: Makes good and timely decisions that keep the organization moving forward.
- Data Extraction: Performs ETL activities from various sources and transforms the data for consumption by downstream applications and users.
- Programming: Creates, writes, and tests computer code, test scripts, and build scripts using industry standards and tools.
- Quality Assurance Metrics: Applies measurement science to assess whether a solution meets its intended outcomes.
- Solution Documentation: Documents information and solutions based on knowledge gained during product development activities.
- Solution Validation Testing: Validates configuration item changes or solutions using defined best practices.
- Data Quality: Identifies, understands, and corrects flaws in data to support effective information governance.
- Problem Solving: Solves problems using systematic analysis processes and industry-standard methodologies.
- Values differences: Recognizes the value that different perspectives and cultures bring to an organization.
QUALIFICATIONS
Knowledge/Skills:
Must-Have:
- 3-5 years of experience in data engineering with a strong background in Azure Databricks and Scala/Python.
- Hands-on experience with Spark (Scala/PySpark) and SQL.
- Experience with Spark Streaming, Spark internals, and query optimization.
- Proficiency in Azure Cloud Services.
- Agile Development experience.
- Unit testing of ETL pipelines (see the sketch after this list).
- Experience creating ETL pipelines with ML model integration.
- Knowledge of Big Data storage strategies (optimization and performance).
- Critical problem-solving skills.
- Basic understanding of Data Models (SQL/NoSQL) including Delta Lake or Lakehouse.
- Quick learner.
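To illustrate the ETL unit-testing requirement above, here is a minimal sketch using pytest and a local SparkSession; the transform, schema, and values are illustrative assumptions, not part of any actual codebase.

    # Minimal sketch of unit-testing an ETL transform with a local SparkSession.
    import pytest
    from pyspark.sql import SparkSession, functions as F

    def add_net_amount(df):
        # Transform under test: derive net_amount from gross and tax.
        return df.withColumn("net_amount", F.col("gross") - F.col("tax"))

    @pytest.fixture(scope="module")
    def spark():
        return SparkSession.builder.master("local[1]").appName("etl-tests").getOrCreate()

    def test_add_net_amount(spark):
        source = spark.createDataFrame([(100.0, 18.0)], ["gross", "tax"])
        result = add_net_amount(source).first()
        assert result["net_amount"] == 82.0

Keeping transforms as small pure functions over DataFrames is what makes this style of testing possible.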
Nice-to-Have:
- Understanding of the ML lifecycle.
- Exposure to Big Data open-source technologies.
- Experience with Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka (a streaming sketch follows this list).
- SQL query language proficiency.
- Experience with clustered compute cloud-based implementations.
- Familiarity with developing applications requiring large file movement in a cloud-based environment.
- Exposure to Agile software development.
- Experience building analytical solutions.
- Exposure to IoT technology.
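To illustrate the Spark Streaming and Kafka items, here is a minimal Structured Streaming sketch that reads a Kafka topic and appends to a Delta table. The broker address, topic, and paths are hypothetical, and it assumes the spark-sql-kafka connector and Delta Lake are available on the cluster (as on Databricks).

    # Minimal Structured Streaming sketch: Kafka source -> Delta sink.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("telemetry_stream").getOrCreate()

    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
             .option("subscribe", "telemetry")                  # hypothetical topic
             .load()
             .select(F.col("value").cast("string").alias("payload"), "timestamp")
    )

    query = (
        events.writeStream.format("delta")
              .option("checkpointLocation", "/checkpoints/telemetry/")
              .outputMode("append")
              .start("/curated/telemetry/")
    )
    query.awaitTermination()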
Experience:
- Relevant experience preferred, such as working in temporary student employment, internships, co-ops, or other extracurricular team activities.
- Knowledge of the latest technologies in data engineering is highly preferred, including:
  - Exposure to open-source Big Data technologies
  - Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka, or equivalent college coursework
  - SQL query language
  - Clustered compute cloud-based implementation experience
  - Familiarity with developing applications requiring large file movement in a cloud-based environment
  - Exposure to Agile software development
  - Exposure to building analytical solutions
  - Exposure to IoT technology
Work Schedule:
Most of the work will be with US-based stakeholders and requires a 2-3 hour overlap with EST hours on an as-needed basis.
Job: Systems/Information Technology
Organization: Cummins Inc.
Role Category: Remote
Job Type: Exempt - Experienced
ReqID: 2410605
Relocation Package: No