- Design and implement big data ETL (extract, transform and load) pipelines for large-scale (TB/PB) applications that involve moving data between sources including the Hadoop Distributed File System, Hive, HBase, Kafka and various Relational Database Management Systems (Microsoft SQL Server, Oracle and MySQL).
- Import and export large-scale data between the Hadoop Distributed File System and Relational Database Management Systems using Sqoop.
- Develop Hive Query Language code using built-in Hive functions, and extend Hive functionality by implementing custom Hive user-defined functions (UDFs) to generate aggregation reports.
- Design and implement ETL pipelines with both the MapReduce and Spark Big Data execution/processing engines.
- Design and develop Big Data applications using the Cask open-source application development framework.
- Write Big Data ETL applications using programming languages like Java, Scala and Python.
- Orchestrate and manage resource allocation across Hadoop applications with Cron, Bash scripts, Oozie and YARN.
- Write Impala queries to generate ad-hoc reports on hot data.
- Generate deep insights and valuable business intelligence by joining multiple datasets and applying Big Data analytics and machine-learning techniques across financial and information-security data.
- Pre-process large datasets using Pig Latin and build full-text search over ETL metrics data using Solr.
- Consume SOAP/RESTful web services, and parse and extract XML and JSON data exchanged between the systems hosting the data.
- Develop software products using the Scrum agile process framework.
- Implement object-oriented programming concepts, design principles and patterns.
- Build and deploy Java/Scala Big Data applications using the Maven build tool and the continuous integration/deployment tool Jenkins.
- Develop, monitor and troubleshoot applications on Cloudera Distribution of Hadoop (CDH) clusters across production and non-production environments.
- Develop, refactor and enhance existing products/software artifacts and resolve product defects to address customer issues.
- Monitor production-deployed Hadoop ETL applications, respond to alerts, and provide maintenance and support.
- Provide technical expertise and code reviews to team members.
- Write technical specifications and software documentation for ETL applications.
- Work both individually and as part of a team as required, and possess strong analytical, communication and interpersonal skills.
- Cross-disciplinary knowledge spanning Big Data engineering, data security and the financial domain.
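The web-services duty above (parsing XML and JSON data exchanged between systems) can be sketched with Python's standard library. The payloads and field names below are hypothetical examples, not taken from the actual applications; the point is that records from different wire formats normalize to one internal shape before loading.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads of the kind an ETL job might receive from a
# RESTful (JSON) endpoint and a SOAP-style (XML) endpoint.
json_payload = '{"account": "A-1001", "balance": 2500.75, "currency": "USD"}'
xml_payload = """<account>
  <id>A-1001</id>
  <balance>2500.75</balance>
  <currency>USD</currency>
</account>"""

def parse_json_record(text):
    """Extract the fields the downstream pipeline needs from a JSON payload."""
    data = json.loads(text)
    return {"id": data["account"], "balance": float(data["balance"])}

def parse_xml_record(text):
    """Extract the same fields from an equivalent XML payload."""
    root = ET.fromstring(text)
    return {"id": root.findtext("id"),
            "balance": float(root.findtext("balance"))}

# Both sources reduce to one internal record shape.
print(parse_json_record(json_payload) == parse_xml_record(xml_payload))  # True
```

In a real pipeline the payloads would arrive over HTTP (e.g. via an HTTP client) rather than as literals, but the parsing and normalization logic is the same.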
18 months of experience in the job offered or as a Big Data Engineer. Experience must include 12 months of experience in the following:
- Design and implement Big Data ETL pipelines for large-scale applications that involve moving data between sources including the Hadoop Distributed File System, Hive, HBase, Kafka and various Relational Database Management Systems such as MS SQL Server, Oracle and MySQL.
- Develop advanced Hive Query Language code using built-in Hive functions, and extend Hive functionality by implementing custom Hive user-defined functions (UDFs) to generate aggregation reports.
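Hive UDFs of the kind described above are normally Java classes whose evaluate() method Hive calls once per row. As a minimal illustration of the per-row logic such a UDF might wrap, here is a plain-Python sketch; the masking rule and function name are hypothetical, not taken from the actual reports.

```python
def mask_account_number(account):
    """Keep the last four characters and mask the rest, as a UDF's
    evaluate() method might do before an aggregation report is published."""
    if account is None:          # a Hive UDF must tolerate NULL inputs
        return None
    if len(account) <= 4:        # too short to mask meaningfully
        return account
    return "*" * (len(account) - 4) + account[-4:]

print(mask_account_number("1234567890"))  # ******7890
```

In Hive itself, the equivalent Java class would be packaged in a JAR, registered with CREATE FUNCTION, and then invoked inside the aggregation query like any built-in function.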
Naztec International Group LLC
263 N Jog Road, West Palm Beach, FL 33413
Jesus M. Velarde – Operations Manager
Job Category: Developer, Information Technology, Programmer
Job Title: Hadoop Developer
Location: West Palm Beach, FL