Core Data Curators
The Core Data Curators maintain the core datasets.
Curation involves identifying and locating core (public) data and packaging it as high-quality data packages.
New team members wanted: We are always seeking volunteers to join the Data Curators team. Be part of a crack team and hone your data-wrangling skills while helping to provide high-quality data to the community.
- Anyone can contribute: details on the roles and skills needed below.
- Get involved: read more below or jump straight to the sign-up section.
- Data Curators Guide: You can dive straight in and start packaging data by following the core data curators guide.
What Roles and Skills are Needed
We have a variety of roles from identifying new "core" data, to collecting and packaging the data, to performing quality control.
Core Skills – at least one of these skills is strongly recommended:
- Data Wrangling Experience. Much of our source data is not complex (just an Excel file or similar) and can be "wrangled" in a spreadsheet program. We therefore recommend at least one of:
  - Experience with a spreadsheet application such as Excel or (preferably) Google Docs, including use of formulas and (desirably) macros. You should at least know how to quickly convert a cell containing '2014' to '2014-01-01' across 1000 rows.
  - Coding for data processing (especially scraping) in one or more of Python, JavaScript, Bash
- Data sleuthing - the ability to dig up data on the web (specific desirable skills: you know how to search by filetype in Google, you know where the developer tools are in Chrome or Firefox, you know how to find the URL a form posts to)
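For those taking the coding route, the '2014' to '2014-01-01' conversion mentioned above can be sketched in Python. This is only an illustration: the column name `year`, the sample rows, and the helper names `year_to_iso_date` and `convert_year_column` are assumptions made for the example, not part of any curator tooling.

```python
import csv
import io


def year_to_iso_date(value: str) -> str:
    """Turn a bare four-digit year into the first day of that year."""
    text = value.strip()
    if len(text) == 4 and text.isascii() and text.isdigit():
        return f"{text}-01-01"
    # Anything that is not a plain four-digit year is left untouched.
    return value


def convert_year_column(rows, column):
    """Yield copies of each row with the given column converted."""
    for row in rows:
        converted = dict(row)
        converted[column] = year_to_iso_date(row[column])
        yield converted


# Hypothetical sample data standing in for a real source file.
sample = "year,value\n2013,10\n2014,12\n"
reader = csv.DictReader(io.StringIO(sample))
for row in convert_year_column(reader, "year"):
    print(row)
```

Because the rows are processed one at a time, the same approach works for 1000 rows or many more without loading the whole file into memory.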
Desirable Skills (the more the better!):
- Data vs Metadata: know the difference between data and metadata
- Familiarity with Git (and GitHub)
- Familiarity with a command line (preferably bash)
- Know what JSON is
- Mac or Unix is your default operating system (this makes access to the relevant tools that much easier)
- Knowledge of Web APIs and/or HTML
- Use of curl or a similar command-line tool for accessing Web APIs or web pages
- Scraping using a command-line tool or (even better) by coding it yourself
- Know what a Data Package and a Tabular Data Package are
- Know what a text editor is (e.g. Notepad, TextMate, Vim, Emacs, …) and know how to use one (useful both for working with data and for editing Data Package metadata)
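As a rough illustration of the Data Package metadata mentioned above, the sketch below builds a minimal descriptor for a single CSV file and prints it as JSON. The dataset name, file path, and column names are hypothetical, and the helper `build_descriptor` is invented for this example. The Data Package specification and the core data curators guide remain the authority on which properties are actually required.

```python
import json


def build_descriptor(name, csv_path, fields):
    """Build a minimal, illustrative Tabular Data Package descriptor.

    `fields` is a list of (column name, type) pairs describing the CSV.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path,
                # The schema describes the columns of the tabular resource.
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical dataset used purely for illustration.
descriptor = build_descriptor(
    "example-dataset",
    "data/example.csv",
    [("date", "date"), ("value", "number")],
)
print(json.dumps(descriptor, indent=2))
```

The printed JSON is the kind of content that would typically be saved as `datapackage.json` alongside the data file.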
Get Involved - Sign Up Now
Here's what you need to know when you sign up:
- Time commitment: Members of the team commit to at least 8-16 hours per month (this is an average: if you are especially busy one month and do less, that is fine)
- Schedule: There is no schedule, so you can contribute at any time that suits you: evenings, weekends, lunchtimes, etc.
- Location: All activity is carried out online, so you can be based anywhere in the world
- Skills: see above
To register your interest, please get in touch via the issue tracker: https://github.com/datasets/awesome-data/issues