Job Summary
As a Data Engineer at TecCentric, you will collaborate with architects and other engineers to advise on, prototype, develop, and debug data infrastructure on Google Cloud Platform. You will work on real-world data challenges that our clients are facing today. Engagements range from purely consultative to heavily hands-on and cover a wide range of domains, including data migrations, data archival, and disaster recovery, as well as big data analytics solutions that combine batch or streaming data pipelines, data lakes, and data warehouses.
You will be involved in the design and execution of several projects, operating autonomously with little supervision, and will represent your area of expertise in client-facing conversations.
Principal Responsibilities
- Solve complex challenges by developing innovative software systems and analyzing petabytes of data.
- Adapt quickly to software engineering best practices and collaborate to create high-quality software.
- Design, build, and operate petabyte-scale data pipelines.
- Collaborate with the engineering, project management, and solution architecture teams to develop new products and capabilities for our clients.
- Learn and apply functional programming techniques.
- Ensure that produced software is testable and tested.
- Participate actively in discussions on architecture, design, and implementation.
- Contribute actively to the planning, execution, and success of complex technical projects.
Qualifications
- Bachelor’s degree in Computer Science, Data Science, Software Engineering, Mathematics, or a related technical field, or equivalent practical experience.
- Expertise in at least one of the following domains:
  - Big Data: managing Hadoop clusters (all included services), troubleshooting cluster operation issues, migrating Hadoop workloads, architecting solutions on Hadoop, experience with NoSQL data stores such as Cassandra and HBase, building batch/streaming ETL pipelines with frameworks such as Spark, Spark Streaming, and Apache Beam, and working with messaging systems like Pub/Sub, Kafka, and RabbitMQ (see the illustrative pipeline sketch at the end of this posting).
  - Data warehouse modernization: building comprehensive data warehouse solutions, including technical architectures, star/snowflake schema designs, infrastructure components, ETL/ELT pipelines, and reporting/analytic tools. Must have hands-on experience with batch or streaming data processing software (such as Beam, Airflow, Hadoop, Spark, Hive); see the illustrative orchestration sketch at the end of this posting.
  - Data migration: migrating data stores to reliable and scalable cloud-based stores, including strategies for minimizing downtime; may involve converting relational data stores to NoSQL, or vice versa.
  - Backup, restore & disaster recovery: building production-grade backup, restore, and disaster recovery solutions at up to petabyte scale.
- Experience writing software in one or more languages such as Python, Java, Scala, or Go.
- Experience building production-grade data solutions (relational and NoSQL).
- Experience with systems monitoring/alerting, capacity planning and performance tuning.
- Experience in technical consulting or another customer-facing role.
- Experience working with Google Cloud data products (CloudSQL, Spanner, Cloud Storage, Pub/Sub, Dataflow, Dataproc, Bigtable, BigQuery, Dataprep, Composer, etc.).
- Experience with IoT architectures and building real-time data streaming pipelines.
- Applied experience operationalizing machine learning models on large datasets.
- Knowledge and understanding of industry trends and new technologies and ability to apply trends to architectural needs.
Certifications (preferably any of the following):
- Google Certified Professional Data Engineer
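
For illustration only, the kind of streaming ETL pipeline referenced under the Big Data domain above might look like the following minimal Apache Beam (Python) sketch. The project, topic, and table names are placeholders, not TecCentric or client resources.

    # Minimal sketch: read JSON events from Pub/Sub and append them to BigQuery.
    # All resource names below are placeholders for illustration.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def run():
        options = PipelineOptions(streaming=True)

        with beam.Pipeline(options=options) as p:
            (
                p
                # Read raw messages from a Pub/Sub topic (placeholder name).
                | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                    topic="projects/example-project/topics/events")
                # Decode bytes and parse each message as JSON.
                | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
                | "ParseJson" >> beam.Map(json.loads)
                # Append parsed rows to an existing BigQuery table (placeholder name).
                | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                    "example-project:analytics.events",
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                    create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                )
            )


    if __name__ == "__main__":
        run()

The same code can be launched on Dataflow by supplying the DataflowRunner and the appropriate project and region pipeline options.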
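Similarly, the ETL/ELT orchestration mentioned under the data warehouse modernization domain could be sketched as a small Airflow DAG. The bucket, dataset, table, and SQL below are placeholders chosen for illustration, not an actual client pipeline.

    # Minimal sketch: daily ELT job loading raw files into a staging table,
    # then building a reporting fact table. All names are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="daily_orders_elt",        # placeholder DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        # Load the day's raw files from Cloud Storage into a BigQuery staging table.
        load_raw = GCSToBigQueryOperator(
            task_id="load_raw_orders",
            bucket="example-raw-bucket",                     # placeholder bucket
            source_objects=["orders/{{ ds }}/*.json"],
            source_format="NEWLINE_DELIMITED_JSON",
            destination_project_dataset_table="example-project.staging.orders",
            write_disposition="WRITE_TRUNCATE",
            autodetect=True,
        )

        # Transform staged rows into the reporting fact table (placeholder SQL).
        transform = BigQueryInsertJobOperator(
            task_id="build_fact_orders",
            configuration={
                "query": {
                    "query": (
                        "INSERT INTO `example-project.warehouse.fact_orders` "
                        "SELECT * FROM `example-project.staging.orders` "
                        "WHERE DATE(order_ts) = '{{ ds }}'"
                    ),
                    "useLegacySql": False,
                }
            },
        )

        load_raw >> transform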