DESCRIPTION
Although the role category specified in the GPP is Remote, this position requires a Hybrid work arrangement.
Key Responsibilities:
- Implement and automate deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
- Continuously monitor and troubleshoot data quality and integrity issues.
- Implement data governance processes and methods for managing metadata, access, and retention for internal and external users.
- Develop reliable, efficient, scalable, high-quality data pipelines with monitoring and alerting mechanisms, using ETL/ELT tools or scripting languages (a minimal pipeline sketch follows this list).
- Develop physical data models and implement data storage architectures as per design guidelines.
- Analyze complex data elements and systems, data flow, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
- Participate in testing and troubleshooting of data pipelines.
- Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB).
- Apply agile development practices, such as DevOps, Scrum, Kanban, and continuous improvement cycles, to data-driven applications.
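To make the pipeline responsibilities concrete, here is a minimal batch ETL sketch in PySpark with a simple data-quality gate and alert. The paths, column names, and checks are hypothetical placeholders, not actual Cummins systems.

    # Minimal batch ETL sketch: ingest -> transform -> quality gate -> load.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Ingest: a raw relational extract landed as Parquet (hypothetical path).
    raw = spark.read.parquet("/landing/orders/")

    # Transform: deduplicate, drop bad records, derive a partition column.
    clean = (
        raw.dropDuplicates(["order_id"])
           .filter(F.col("order_ts").isNotNull())
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Monitor: a simple data-quality gate standing in for whatever
    # monitoring/alerting mechanism the team actually uses.
    null_keys = clean.filter(F.col("customer_id").isNull()).count()
    if null_keys > 0:
        raise ValueError(f"DQ check failed: {null_keys} rows missing customer_id")

    # Load: write to a curated zone, partitioned for downstream consumers.
    clean.write.mode("overwrite").partitionBy("order_date").parquet("/curated/orders/")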
RESPONSIBILITIES
Qualifications:
- A college, university, or equivalent degree in a relevant technical discipline, or equivalent relevant experience, is required.
- This position may require licensing for compliance with export controls or sanctions regulations.
Competencies:
- System Requirements Engineering: Uses appropriate methods and tools to translate stakeholder needs into verifiable requirements.
- Collaborates: Builds partnerships and works collaboratively with others to meet shared objectives.
- Communicates effectively: Develops and delivers multi-mode communications that convey a clear understanding of the unique needs of different audiences.
- Customer focus: Builds strong customer relationships and delivers customer-centric solutions.
- Decision quality: Makes good and timely decisions that keep the organization moving forward.
- Data Extraction: Performs ETL activities from various sources and transforms the data for consumption by downstream applications and users.
- Programming: Creates, writes, and tests computer code, test scripts, and build scripts using industry standards and tools.
- Quality Assurance Metrics: Applies measurement science to assess whether a solution meets its intended outcomes.
- Solution Documentation: Documents information and solutions based on knowledge gained during product development activities.
- Solution Validation Testing: Validates configuration item changes or solutions using defined best practices.
- Data Quality: Identifies, understands, and corrects flaws in data to support effective information governance.
- Problem Solving: Solves problems using systematic analysis processes and industry-standard methodologies.
- Values differences: Recognizes the value that different perspectives and cultures bring to an organization.
QUALIFICATIONS
Knowledge/Skills:
Must-Have:
- 3-5 years of experience in data engineering with a strong background in Azure Databricks and Scala/Python.
- Hands-on experience with Spark (Scala/PySpark) and SQL.
- Experience with Spark Streaming, Spark internals, and query optimization.
- Proficiency in Azure Cloud Services.
- Agile Development experience.
- Unit testing of ETL pipelines (see the sketch after this list).
- Experience creating ETL pipelines with ML model integration.
- Knowledge of Big Data storage strategies (optimization and performance).
- Critical problem-solving skills.
- Basic understanding of Data Models (SQL/NoSQL) including Delta Lake or Lakehouse.
- Quick learner.
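To illustrate the ETL unit-testing requirement above, here is a minimal sketch using pytest and a local SparkSession; the transform, schema, and values are illustrative assumptions, not part of any actual codebase.

    # Minimal sketch of unit-testing an ETL transform with a local SparkSession.
    import pytest
    from pyspark.sql import SparkSession, functions as F

    def add_net_amount(df):
        # Transform under test: derive net_amount from gross and tax.
        return df.withColumn("net_amount", F.col("gross") - F.col("tax"))

    @pytest.fixture(scope="module")
    def spark():
        return SparkSession.builder.master("local[1]").appName("etl-tests").getOrCreate()

    def test_add_net_amount(spark):
        source = spark.createDataFrame([(100.0, 18.0)], ["gross", "tax"])
        result = add_net_amount(source).first()
        assert result["net_amount"] == 82.0

Keeping transforms as small pure functions over DataFrames is what makes this style of testing possible.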
Nice-to-Have:
- Understanding of the ML lifecycle.
- Exposure to Big Data open-source technologies.
- Experience with Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka (a streaming sketch follows this list).
- SQL query language proficiency.
- Experience with clustered compute cloud-based implementations.
- Familiarity with developing applications requiring large file movement in a cloud-based environment.
- Exposure to Agile software development.
- Experience building analytical solutions.
- Exposure to IoT technology.
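To illustrate the Spark Streaming and Kafka items, here is a minimal Structured Streaming sketch that reads a Kafka topic and appends to a Delta table. The broker address, topic, and paths are hypothetical, and it assumes the spark-sql-kafka connector and Delta Lake are available on the cluster (as on Databricks).

    # Minimal Structured Streaming sketch: Kafka source -> Delta sink.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("telemetry_stream").getOrCreate()

    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
             .option("subscribe", "telemetry")                  # hypothetical topic
             .load()
             .select(F.col("value").cast("string").alias("payload"), "timestamp")
    )

    query = (
        events.writeStream.format("delta")
              .option("checkpointLocation", "/checkpoints/telemetry/")
              .outputMode("append")
              .start("/curated/telemetry/")
    )
    query.awaitTermination()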
Experience:
- Relevant experience preferred, such as working in temporary student employment, internships, co-ops, or other extracurricular team activities.
- Knowledge of the latest technologies in data engineering is highly preferred, including:
  - Exposure to open-source Big Data technologies
  - Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka, or equivalent college coursework
  - SQL query language
  - Clustered compute cloud-based implementation experience
  - Familiarity with developing applications requiring large file movement in a cloud-based environment
  - Exposure to Agile software development
  - Exposure to building analytical solutions
  - Exposure to IoT technology
Work Schedule:
Most of the work will be with US-based stakeholders and requires a 2-3 hour overlap with EST hours on an as-needed basis.
Job: Systems/Information Technology
Organization: Cummins Inc.
Role Category: Remote
Job Type: Exempt - Experienced
ReqID: 2410605
Relocation Package: No