Bad Data


Real-world examples of how not to do data

Bad Data is a site providing real-world examples of how not to prepare or provide data. It showcases the poorly structured, the mis-formatted, or the just plain ugly.

Its primary purpose is to serve as an educational tool for governments and other organizations – though there may also be some aspect of entertainment.

It is also a good source of practice material for budding data wranglers, those tasked with cleaning and transforming data (in fact, the repo began as a place to keep practice data for Data Explorer).

New examples are wanted and welcome – submit them here »

History

Examples

Add an Example

New examples are wanted! Here's how to submit one.

What information to provide

What we'd like to see in a good example:

  • Original data URL (and name of the organization providing the data)
  • File format (e.g. CSV, Shapefile, etc.)
  • A description of what's wrong
  • A nice illustrative image or screenshot

Optional (but desirable): a backup of the data in case it goes away! (If the file is more than ~100 KB, please provide a chopped-down version illustrating the main "badness".)
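One way to produce a chopped-down version is to keep only the leading whole lines that fit under the size limit. The sketch below is a minimal illustration, not part of the site's tooling: the function names `chop_lines` and `chop_file` and the 100,000-byte default are assumptions made for this example. It works on raw bytes rather than decoded text, so encoding problems in the original (often part of the "badness") are preserved as-is.

```python
from pathlib import Path


def chop_lines(data: bytes, max_bytes: int = 100_000) -> bytes:
    """Return the longest prefix of whole lines that fits in max_bytes."""
    if len(data) <= max_bytes:
        return data
    # Find the last newline that still lies inside the size limit.
    cut = data.rfind(b"\n", 0, max_bytes)
    # No newline within the limit: fall back to a hard cut.
    return data[:max_bytes] if cut == -1 else data[: cut + 1]


def chop_file(src: str, dst: str, max_bytes: int = 100_000) -> int:
    """Write a chopped-down copy of src to dst; return the bytes written."""
    with open(src, "rb") as f:
        # Read one byte past the limit so oversized files can be detected.
        head = f.read(max_bytes + 1)
    sample = chop_lines(head, max_bytes)
    Path(dst).write_bytes(sample)
    return len(sample)
```

This only suits line-oriented formats such as CSV. For binary formats (a Shapefile, for instance) a byte-level cut would likely corrupt the file, so a sample would need to be produced with a format-aware tool instead. If the problem shows up deep in the file rather than at the top, pick the relevant rows by hand.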

Also don't forget to credit yourself in an appropriate way (if you want to be credited!).

Lastly, note that your contribution will be licensed under this site's general CC Attribution License.

How to Contribute

Option 1: Fork and Pull

This website is stored in a GitHub repo, which you can fork to add your example via a pull request. Here are detailed instructions:

  1. Fork the bad-data GitHub repository

  2. Choose a "slug" for your example e.g. my-bad-data-example

  3. Copy the ex/template/ directory to a new directory ex/{your-slug}

  4. Edit the ex/{your-slug}/index.md file

    • Change the frontmatter attributes (i.e. the key/value items at the very top of the file) as appropriate
    • Add the description (markdown formatted) to the main section of that page
  5. Commit the new file

  6. Then submit the pull request
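Steps 2 and 3 can be sketched as a small script run against a local clone of your fork. This is an illustrative sketch only: the function name `create_example` is hypothetical, and the slug rule (lowercase letters, digits and hyphens, as in my-bad-data-example) is an assumption, since the site does not state what counts as a valid slug. The ex/template and ex/{your-slug} paths come from the steps above.

```python
import re
import shutil
from pathlib import Path

# Assumption: slugs look like "my-bad-data-example" (lowercase letters,
# digits and single hyphens). The site may accept other forms.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def create_example(repo_root: str, slug: str) -> Path:
    """Copy ex/template to ex/<slug> and return the path of its index.md."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"unexpected slug format: {slug!r}")
    root = Path(repo_root)
    template = root / "ex" / "template"
    target = root / "ex" / slug
    if not template.is_dir():
        raise FileNotFoundError(f"template directory not found: {template}")
    # copytree raises FileExistsError if the target already exists,
    # which protects an existing example from being overwritten.
    shutil.copytree(template, target)
    return target / "index.md"
```

Calling `create_example(".", "my-bad-data-example")` from the root of the clone returns the path of the new index.md, ready to be edited in step 4. Forking, committing and opening the pull request still happen through git and GitHub as described in the steps.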

Option 2: Open an Issue

If the thought of forks and pulls gives you the jitters, there's an ultra-simple alternative: just open an issue in the issue tracker and add the information requested!

© 2024 All rights reserved
