The Lead Data Engineer leads the design and development of tools and process enhancements for the data pipeline. This pipeline includes high-volume, complex ETL projects on a wide variety of data inputs such as test scores, school characteristics, directory, course enrollment, college readiness, postsecondary outcomes, and others.
As the primary architect of the data pipeline, the Lead Data Engineer supports Data Strategy in implementing scalable processes for analytics, and serves as the internal expert on all issues related to GreatSchools’ data infrastructure. They also collaborate closely with Data Science, TechOps, and Web Engineering functions and provide leadership and guide implementation for the processing and loading of data for display on the website and integration into data feeds.
This position reports to the VP, Data Strategy. It is a full-time, exempt position with headquarters located in Oakland, CA (remote work OK).
GreatSchools is the leading national nonprofit empowering parents to unlock educational opportunities for their children and support stronger schools in their community. We aim to make school quality information relevant and actionable to support equity for underserved communities who have faced systemic racism and unjust barriers to access a quality education for their children. We are the only national organization that collects education data from all 51 state departments of education and the federal government and then provides analysis, insights and school quality ratings to parents, communities, advocacy organizations, researchers, policymakers and others.
GreatSchools.org reaches 45 million users per year, 40% of whom are from low-income households. We also work to uncover bright spots in American education and catalyze discussions about equity. We have worked for two decades to develop parent support resources, including articles, videos, podcasts, parent tips, and learning materials parents can use at home to strengthen the home-school connection and keep their child learning and on track.
Demonstrate expertise in GreatSchools data schemas:
Design, own, and manage database warehouse and storage, serving as internal lead on where and how data is stored, and why. This includes understanding the complex interplay between data pipelines and product and software development.
Advise on and implement data architecture solutions, working closely with Web Engineering and Data Science in solving data architecture problems.
Lead improvements to GreatSchools data pipeline:
Identify and assess new tools and processes that increase the efficiency of ETL processes and analytics.
Serve as the lead project owner for defining and implementing tool and process enhancements.
Support the integration of Data Engineering and Data Science through writing scalable code.
Lead and own the Data Engineering release process.
Advise and support Data Engineering team:
Develop tools and processes to support ETL developers through increased automation and quality control of data loading.
Provide guidance for the work of other data team members.
Advise and assist in size and difficulty assessment for infrastructure projects.
Lead and advise on the code review process.
Track and share best practices related to data engineering.
Represent Data Engineering on our Data Governance and Data Strategy Council.
5+ years of experience as a Data Engineer or in a similar role designing and building scalable and robust data pipelines to enable data-driven decisions.
5+ years of experience using Python for processing of large data sets.
Clear and concise communication skills, with the ability to communicate effectively with a diverse team.
Core competencies (apply to all GS staff members)
Database design and management:
In-depth knowledge of relational databases (MySQL, PostgreSQL)
Proficiency in writing advanced SQL queries, and expertise in performance tuning of SQL queries
Experience with database transformation, modeling and normalization
Data pipeline development:
Experience with data processing and workflow management tools such as Spark, Airflow/Luigi, Azkaban, etc.
Experience working with AWS data technologies like Glue, S3, EMR, Lambda, DynamoDB, Redshift, etc.
Experience integrating custom, open source, and purchased tools into robust systems
Programming and software development:
Proficiency in programming languages including Python (preferred), Ruby
Knowledge of professional software engineering practices & best practices for the full software development life cycle
Strong algorithm & data structure knowledge
Familiarity with common software development tools and methods, e.g., JIRA, git, continuous deployment
Experience supporting and working with cross-functional teams in a dynamic environment
Experience with Agile Methodology preferred
Working knowledge of school data preferred
Remote or Oakland (CA)
When safe, periodic travel (2-3 times per year) to Oakland, CA may be necessary.