DESCRIPTION
Leads projects for the design, development, and maintenance of a data and analytics platform. Ensures efficient processing, storage, and availability of data for analysts and other consumers. Collaborates with key business stakeholders, IT experts, and subject-matter experts to plan, design, and deliver optimal analytics and data science solutions. Works on one or multiple product teams simultaneously.
Note: Although the role is categorized as Remote, it follows a hybrid work model.
Key Responsibilities:
- Design and automate the deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
- Develop frameworks for continuous monitoring and troubleshooting of data quality and integrity issues.
- Implement data governance processes for metadata management, data access, and retention policies for internal and external users.
- Provide guidance on building reliable, efficient, scalable, and quality data pipelines with monitoring and alert mechanisms, using ETL/ELT tools or scripting languages (a minimal sketch follows this list).
- Design and implement physical data models to define database structures and optimize performance through indexing and table relationships.
- Optimize, test, and troubleshoot data pipelines.
- Develop and manage large-scale data storage and processing solutions using distributed and cloud-based platforms such as data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, and others.
- Utilize modern tools and architectures to automate common, repeatable, and tedious data preparation and integration tasks.
- Drive automation in data integration and management by modernizing the data management infrastructure.
- Ensure the success of critical analytics initiatives by employing agile and DevOps practices such as Scrum and Kanban.
- Coach and mentor less experienced team members.
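For illustration only, a minimal PySpark sketch of the ingest-validate-write pattern referenced above: read raw data, apply a simple data-quality gate, and write partitioned columnar output. All paths, column names, and thresholds are hypothetical, not part of this role description.

    # Minimal pipeline sketch: ingest, quality gate, partitioned write.
    # Paths, columns, and the 1% tolerance are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-ingest").getOrCreate()

    # Hypothetical landing zone holding semi-structured order events.
    raw = spark.read.json("s3a://landing/orders/")

    # Simple quality gate: required keys must be present in almost all rows.
    total = raw.count()
    bad = raw.filter(F.col("order_id").isNull() | F.col("amount").isNull()).count()
    if total == 0 or bad / total > 0.01:
        # A production pipeline would fire an alert here; this sketch fails fast.
        raise RuntimeError(f"quality gate failed: {bad}/{total} bad rows")

    cleaned = (
        raw.dropDuplicates(["order_id"])
           .withColumn("ingested_at", F.current_timestamp())
    )

    # Partitioned columnar output for downstream analysts and consumers.
    cleaned.write.mode("append").partitionBy("order_date").parquet(
        "s3a://curated/orders/"
    )

In practice, the gate thresholds and failure handling would hook into whatever monitoring and alerting stack the team already runs.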
SKILLS AND COMPETENCIES
Technical Skills:
- Expert-level proficiency in Spark, including optimization, debugging, and troubleshooting of Spark jobs (see the sketch after this list).
- Solid knowledge of Azure Databricks for scalable, distributed data processing.
- Strong coding skills in Python and Scala for data processing.
- Experience with SQL, especially for large datasets.
- Knowledge of data formats such as Iceberg, Parquet, ORC, and Delta Lake.
- Experience developing CI/CD processes.
- Deep understanding of Azure data services such as Azure Blob Storage, Azure Data Lake Storage, and Azure Synapse Analytics (formerly Azure SQL Data Warehouse).
- Familiarity with data lakes, data warehouses, and modern data architectures.
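As a small illustration of the Spark tuning skills listed above (hypothetical table names and paths, not Cummins code): broadcasting a small dimension table to avoid shuffling the large side of a join, then inspecting the physical plan.

    # Routine Spark join tuning: broadcast the small side, verify the plan.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("join-tuning").getOrCreate()

    facts = spark.read.parquet("/data/facts")  # large fact table (hypothetical)
    dims = spark.read.parquet("/data/dims")    # small dimension table (hypothetical)

    # Broadcast hint: ship `dims` to every executor instead of shuffling `facts`.
    joined = facts.join(F.broadcast(dims), on="dim_id", how="left")

    # Print the physical plan; a BroadcastHashJoin confirms the hint took effect.
    joined.explain()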
Competencies:
- System Requirements Engineering - Translates stakeholder needs into verifiable requirements, establishing acceptance criteria and assessing the impact of requirement changes.
- Collaborates - Builds partnerships and works collaboratively with others to meet shared objectives.
- Communicates effectively - Develops and delivers clear, audience-specific communications.
- Customer focus - Builds strong customer relationships and delivers customer-centric solutions.
- Decision quality - Makes timely and informed decisions to keep the organization moving forward.
- Data Extraction - Performs ETL activities from various sources using appropriate tools and technologies.
- Programming - Develops, tests, and maintains computer code and scripts to meet business and compliance requirements.
- Quality Assurance Metrics - Uses IT Operating Model (ITOM) and SDLC standards to assess solution quality.
- Solution Documentation - Documents solutions for improved productivity and knowledge transfer.
- Solution Validation Testing - Ensures configuration changes and solutions meet customer requirements.
- Data Quality - Identifies, understands, and corrects data flaws to enhance information governance.
- Problem Solving - Uses systematic analysis to identify root causes and implement robust solutions.
- Values differences - Recognizes and appreciates diverse perspectives and cultures.
QUALIFICATIONS
Education, Licenses, Certifications:
- Bachelor's or master's degree in Computer Science, Information Technology, Engineering, or a related field.
Experience:
- 8+ years of experience in data engineering or a related field, with experience in a leadership role.
- Intermediate experience in relevant disciplines is required.
- Knowledge of the latest data engineering technologies and trends is preferred, including:
  - Analyzing complex business systems, industry requirements, and data regulations.
  - Processing and managing large datasets.
  - Designing and developing Big Data platforms using open-source and third-party tools.
  - Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka or equivalent.
  - SQL query language.
  - Cloud-based clustered compute implementation.
  - Developing applications requiring large file movement in a cloud-based environment.
  - Building analytical solutions.
- Intermediate experience in the following is preferred:
  - IoT technology.
  - Agile software development.
Job: Systems/Information Technology
Organization: Cummins Inc.
Role Category: Remote
Job Type: Exempt - Experienced
ReqID: 2411156
Relocation Package: No