Purpose: find a freelance data journalist to help with content writing and documentation for the DataHub and Portal stack and offerings.
Vignesh + Daniela (11 April 2024)
He currently works for The Hindu from 3.30pm to 9pm Indian time (IST).
He does exactly what we need. He works on a data story and "proves it" by collecting, analysing and preparing the data. He then creates visualizations and graphs, and publishes it all in an article, e.g. see https://www.thehindu.com/sci-tech/energy-and-environment/un-climate-chief-says-two-years-to-save-the-planet/article68051233.ece
He is available from around 7.30am IST to around 1-2pm IST.
Agenda:
- Check in
- Short intros
- Background and needs
- The latest version of DH Cloud - what it does and how it works
- Where we are in our project and what we need
- DataHub Cloud Onboarding
- Next steps
Context and needs
Write dataset and data story posts and publish them with DH Cloud. We will use them to showcase what DH Cloud does, which we hope will (indirectly) sell the product itself. Roughly two types of post:
- Data stories, e.g. "How (or why) are interest rates rising around the world"
- Target readers: anyone interested in the topic… (they don't need to be a data person)
- Datasets: "here is this interesting dataset on DataHub", or even "here's an interesting one online", though usually we will have created a page on DataHub for the dataset.
- Target readers: data geeks, i.e. data engineers, data scientists, data journalists: people who play around with data
DataHub Cloud Onboarding
- Clicking on the template doesn't open it in a new tab, so users basically leave the app
We are going to open this up to a wider audience so that people can create data stories on their own and use them to publish datasets.
- One goal is to test DataHub and surface its shortcomings
- The other is to create interesting examples/showcases that we can market
"Great, I can add a chart, but how do I add my data there?"
He can start working immediately, paid hourly.
Next steps
- He will first get back to us with his rate
- Then he will prepare the contract and send it to us to review and approve
- Then he will get my WhatsApp and share 4-5 data stories with me
- I will select the first one that I want him to work on
- He will create it, prepare his data and ping me
- We will get on a call where I will walk him through the process of updating e.g. the bar chart with his data
Vignesh + Rufus (March 2023)
Vignesh is a data journalist we found on Upwork. The engagement was paused in the end, but these are a good set of background notes.
Agenda
- Check-in
- Background on the need
- What are we going to do? See below
- Who are the target readers? Data geeks, i.e. data engineers, data scientists, data journalists: people who play around with data …
- Examples of Vignesh work
- Summary: two overall goals
- Publish content that is interesting and gets traffic
- Try out the tooling (be a guinea pig)
Background
We have https://DataHub.io, a community-oriented data publishing platform and marketplace aimed at "power data users": data analysts, data journalists, data engineers, data scientists.
Two distinct, complementary aspects:
- Easy data and data-rich content publishing / sharing
- Marketplace/catalog of datasets: think Kaggle datasets or, in some aspects, Statista
You can check out the site, blog and collections:
What's the brief? (and why)
Write more posts on the DataHub blog. Roughly three types of post:
- Data stories, e.g. "How (or why) are interest rates rising around the world"
- Datasets: "here is this interesting dataset on DataHub", or even "here's an interesting one online", though usually we will have created a page on DataHub for the dataset.
- LOW PRIORITY: data tooling and data wrangling
Dataset vs data story posts
Any given post exists on some spectrum between what we term a "dataset" post and a "data story" post.
datasets <----------------> data story
A dataset post is a simple write-up/README for a dataset e.g. this dataset is about XXX, it has these columns, these rows etc.
A data story post tells a data story, or writes up a dataset in a data-story way, e.g. this dataset is interesting because we can answer this question, tell this story etc.
In general, a dataset is made more interesting by a question, i.e. a data story. E.g. I have a dataset on real estate prices across major cities. Story: where are the most expensive places to live on the planet? Or I have a dataset on suicides per year. Story: is suicide going up or down, and where? Hence, whilst it is simpler to write a dataset README, it is more engaging to write the data story.
Why write posts?
Write this content so that …
- We present more datasets, better (via writing up the dataset or creating a data story) so that more people find and use the datasets and find the platform
- For easy data and data-rich content publishing: this acts as evangelism and discovery
- As a Marketplace of datasets: Content marketing of the datasets
- Show off the platform i.e. show what you can do with the site
- Demo the product i.e. the data publishing systems
- To dogfood the site - try out the tools ourselves, try out the datasets ourselves
- Data wrangler/geek (journalists) evangelism
- Content marketing type posts about tools and datasets you can use, because this attracts our audience for either use (publishing or marketplace)
Tasks brainstorm
Dataset or data story posts
- Find examples of datasets to wrangle or simple data stories to write, for example by looking at the board https://github.com/orgs/datasets/projects/1
- https://github.com/datasets/awesome-data/issues/305 - most expensive cities
- Write up a dataset topic / collection e.g. "here are some interesting datasets on climate change"
- Here's an example of a "dataset topic" blog post: https://datahub.io/blog/machine-learning-datasets (BTW this could definitely be updated …)
- Look at collections: https://datahub.io/collections
- Reproduce examples from the openly licensed ourworldindata.org, following the DataHub tutorial: https://datahub.io/blog/tutorial-publishing-data-rich-doc-on-datahub
- Review some of their good topics and reproduce them (explaining the approach as an exercise in our tooling)
- Specific example: https://ourworldindata.org/brief-history-of-AI
- Why?
- Test out DataHub docs and our approach, e.g. being able to intersperse charts with rich text
- Good practice for writing our own rich topical articles
- Have a potential story about "how I reproduced X with this tool" (cf. how people build Twitter clones to demo a new tech stack)
- Create data stories in the form of simple two-number comparison visualizations. Here's an example and background: https://datahub.io/blog/comparotron-a-simple-way-to-visualize-and-share-comparisons
- You could start by just coming up with a few examples of nice comparisons, perhaps with a paragraph of background text (no visualizations needed)
Inspirations:
- https://twitter.com/jburnmurdoch/status/1641799627128143873 - data journalism
- Idea: fertility rates and population growth over time: how are they related? See https://www.google.com/search?client=opera&q=fertility+rates+in+africa&sourceid=opera&ie=UTF-8&oe=UTF-8
- Also see JEP paper
Tool posts
Here's an example of writing up a tool / technology (sort of content marketing for what we are doing): https://datahub.io/blog/attribute-relation-file-format-arff
This is pretty obscure. What I'd suggest is coming up with some examples of other things we could write up, and then I could say "yes, go for that one" (sort of like pitching your editor!)
Some ideas so far:
- Post about CSV command-line tools
- Post about https://github.com/BurntSushi/xsv