We’re hiring a Research Engineer who is strongly committed to the principles of free knowledge, open source and open data, transparency, privacy, and collaboration to join the Research team. As a Research Engineer on our team, you will support the research scientists in addressing knowledge gaps on the Wikimedia projects, help Wikimedia volunteers improve knowledge integrity, and help build a more global community of Wikimedia researchers. We’re accepting applications until August 31, with a start date on or before October 30.
You’ll work remotely with a distributed team whose members are spread across Europe and North America. Here are some things we’ve worked on recently that should give you a better sense of what you could be working on:
- We built a hyperlink recommendation algorithm (by building on past research) to support the Growth team in their newcomer task recommendations.
- We used readers’ trajectories on Wikipedia to inform Wikipedia editors about the COVID-19-related pages that readers turn to for information. (code)
- We worked with the Analytics, Legal, and Security teams to find a privacy-preserving way to store COVID-19-related page-view traces beyond the 90-day limit that is our standard for purging this data. (code)
- We ran surveys on Wikipedia in 14 languages, collecting readers’ demographic data, motivations, and needs in order to study the effect of demographics on reader behavior. (ongoing results)
- We built an NLP model to identify Unsourced Statements in Wikipedia articles. (paper, code)
You can learn more about what we have done in the past six months by reading our biannual report.
You will be responsible for:
- Defining engineering projects to improve the research scientists’ workflows. For example, in collaboration with the Legal, Security, and Analytics teams, you will develop a process for public data releases by the team.
- Collaborating with the Analytics Engineering and Machine Learning Platform teams to improve data collection, sanitization, and processing
- Building experimental APIs for the models developed by the team
- Writing distributed computing code in Spark for the algorithms developed by the research scientists
- Acting as the Research team’s engineering contact for internal and external conversations and decision making
Skills and experience:
- Experience working as a research or data engineer on complex applied research projects
- Comfort with mathematics and the basics of statistics
- Strong understanding of Computer Science fundamentals, such as algorithms, data structures, and complexity
- Familiarity with scientific computing libraries in Python. Experience with open source machine learning libraries such as scikit-learn and deep learning frameworks such as Keras, TensorFlow, or PyTorch
- Experience with Hadoop and related technologies: HDFS, YARN, MapReduce, Hive, Spark, etc. (more info about our Hadoop cluster and analytics servers)
- Experience with MySQL/Postgres technologies
- Experience developing RESTful APIs for data retrieval
- Strong written and oral communication skills in English, including the ability to communicate complex technical issues to a cross-team and cross-functional audience
- BS, MS, or PhD in Computer Science, Mathematics, Statistics, or a closely related engineering field; or the equivalent in related work experience
We know that you won’t know how all of our systems work on day one. With solid fundamentals and teamwork, you will get there.
Qualities that are important to us:
- Commitment to the mission of the organization
- Commitment to our guiding principles
- Ability to disagree respectfully while still working toward a solution
- Willingness to understand math and algorithms
- Good at async communication
- Solution-focused. The Wikimedia ecosystem is complex, resources are limited, and our guiding principles are ambitious. We want you to find solutions that embrace these factors.
- Self-motivated
- Ability to navigate ambiguity and bring a project to completion with limited direction
- Curiosity and commitment to learn
Additionally, we’d love it if you have:
- A portfolio of open source programming projects
- Experience in label collection using crowdsourcing platforms or large-scale systems
- Production-level experience with Hadoop, Spark, Flink, Hive, Kafka, etc.
- Experience working with volunteers
- Experience editing Wikipedia or other Wikimedia or open data / knowledge projects