Raw file urls
Situation, Complications & Hypothesis
We used to have /r/... URLs; they no longer work and we didn't tell people
S: We used to have working /r/... URLs.
C: They no longer work, and we didn't tell people.
H: Tell people and/or reinstate something similar (including redirects at least for a time!)
We want raw perma-urls
S: We used to have perma-urls for raw files, and specifically for data. Now we don't, and we link directly to the file's location in the R2 bucket.
C: Direct links to R2 are prone to breakage because:
- we may change the R2 public domain (e.g. use Cloudflare worker as a proxy)
- we may change R2 bucket schema
- we may want to add in permissions at some point
- we may want analytics of some kind
We used to auto-generate html pages for data files (resources)
S: We used to have pages like /r/{resource-name}.html which provided a basic HTML rendering of the file.
C: This has disappeared.
H: Bring this back but have these files at their normal location.
We used to auto-generate json versions of CSV data
S: We had /r/data.json auto-generated from /r/data.csv
C: These don't exist anymore and some people might want them
H: (For now) Don't generate these in our app, because this is a significant complication and we need to think it through (it's part of the pipeline work). We can suggest doing this generation in the underlying datasets …
What this really means is that the hypothesis is to tell people this has gone away … (perhaps better: have /r/xxx.json pages redirect or respond with a useful error / info message 😉)
Overall hypothesis:
- Reinstate raw links in some way at some special url e.g. /_r/… or /_raw/
- Reinstate HTML rendering in some way (and at the "normal" file path), i.e. /@xx/project/data/myfile.csv would render an HTML version of the CSV, with a nice table and any info from the datapackage.
- Do NOT reinstate json generation for now
- Redirect or return useful error messages for old /r/… urls (may be a bit of work, as it involves looking up info in datapackage.json, but that's not the end of the world)
- (?) write a blog post about this … (can be referenced in the error message point)
Extras:
- Use those raw links in our frontend code when creating charts
NB: We can proceed incrementally: do the first item, then the 4th (add some redirects), then come back for the 2nd, then more of the 4th, and so on.
Solution
New endpoint for fetching raw file contents: /@{username}/{projectName}/_r/{branch}/{path-to-file}
With branch being set to the - placeholder for now.
- Will redirect to the file's R2 location, e.g. https://r2.datahub.io/clt98ke170001ia08btn3tzgm/main/raw/data/vix-daily.csv
- Will work for any file: media, data and markdown files alike.
Also: Replace direct links to R2 in frontend with the new permalinks.
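A minimal sketch of how the new endpoint could parse a permalink and map it to an R2 location. Everything here is an assumption made for illustration: the function names (buildRawPermalink, parseRawPermalink, toR2Url) are hypothetical, the R2 key layout {projectId}/{branch}/raw/{path} is inferred from the single example URL above, and the project id and default branch would come from a lookup in the real app.

```typescript
// Sketch only. The base URL and key layout are inferred from the one example
// URL in this document and may not match the real bucket schema.
const R2_BASE = "https://r2.datahub.io"; // assumption: current public R2 domain

interface RawRequest {
  username: string;
  projectName: string;
  branch: string; // "-" is the placeholder for now
  filePath: string;
}

// Build the permalink the frontend would use instead of a direct R2 link.
function buildRawPermalink(username: string, projectName: string, filePath: string): string {
  return `/@${username}/${projectName}/_r/-/${filePath}`;
}

// Parse /@{username}/{projectName}/_r/{branch}/{path-to-file}.
// Returns null when the pathname does not have that shape.
function parseRawPermalink(pathname: string): RawRequest | null {
  const match = /^\/@([^/]+)\/([^/]+)\/_r\/([^/]+)\/(.+)$/.exec(pathname);
  if (match === null) return null;
  const [, username, projectName, branch, filePath] = match;
  return { username, projectName, branch, filePath };
}

// Map a parsed request to the URL the endpoint would redirect to.
// `projectId` and `defaultBranch` are supplied by the caller (hypothetical
// lookup); the "-" placeholder resolves to the default branch.
function toR2Url(req: RawRequest, projectId: string, defaultBranch: string): string {
  const branch = req.branch === "-" ? defaultBranch : req.branch;
  return `${R2_BASE}/${projectId}/${branch}/raw/${req.filePath}`;
}
```

Because only toR2Url knows the bucket layout, changing the R2 domain or schema later would not touch any link that already uses the /_r/ permalink.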
Old endpoint for fetching/displaying RESOURCE contents at /@{username}/{projectName}/r/{resource-name-or-id-datapackage}.{extension}
Note: no support for nested datasets.
- Will look up the resource path in the datapackage based on resource name or id.
- If found and extension same as in file path:
  - Will redirect (permanent, 301) to new /_r/-/... path.
- If found and extension in URL is .json or .html:
  - Return 404 with useful message and link to original file contents at /_r/-/... (for now, until we add support or decide to drop support definitively)
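The lookup rules for the old /r/ endpoint can be sketched as a pure function that returns either a redirect or an error. This is an illustration under assumptions, not the actual handler: resolveLegacyResource and its types are hypothetical names, the datapackage is reduced to the fields needed here, and the two cases the text does not specify (resource not found, or an extension that is neither the file's own nor .json/.html) are assumed to return a plain 404.

```typescript
// Minimal slice of datapackage.json used by this sketch.
interface Resource {
  name: string;
  id?: string;
  path: string;
}
interface DataPackage {
  resources: Resource[];
}

type LegacyResponse =
  | { status: 301; location: string }
  | { status: 404; message: string };

// Extension of a file path, including the dot and lowercased ("" if none).
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

// `projectBase` is "/@{username}/{projectName}"; `requested` is the
// "{resource-name-or-id}.{extension}" part of the old URL.
function resolveLegacyResource(
  pkg: DataPackage,
  projectBase: string,
  requested: string,
): LegacyResponse {
  const dot = requested.lastIndexOf(".");
  const key = dot > 0 ? requested.slice(0, dot) : requested;
  const ext = dot > 0 ? requested.slice(dot).toLowerCase() : "";

  // Look the resource up by name or id (no nested datasets).
  let resource: Resource | undefined;
  for (const candidate of pkg.resources) {
    if (candidate.name === key || candidate.id === key) {
      resource = candidate;
      break;
    }
  }
  if (resource === undefined) {
    // Assumption: the text does not say what happens when nothing is found.
    return { status: 404, message: `No resource "${key}" found in datapackage.json.` };
  }

  const rawLink = `${projectBase}/_r/-/${resource.path}`;
  if (ext === extensionOf(resource.path)) {
    return { status: 301, location: rawLink };
  }
  if (ext === ".json" || ext === ".html") {
    return {
      status: 404,
      message: `Generated ${ext} versions are not available for now. Original file: ${rawLink}`,
    };
  }
  // Assumption: any other extension mismatch is treated as not found.
  return { status: 404, message: `Resource "${key}" has no "${ext}" version.` };
}
```

Checking the matching extension first means a resource that really is a .json or .html file still redirects to its raw contents instead of hitting the error branch.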
Rabbit holes
- Worrying about conflicts with /_r path
- Worrying about permissions (e.g. files from private repos)
No-goes
- An endpoint for CSVs converted to JSON. (/r/xyz.json)
- An endpoint for file previews (for now at least, until it's needed). (/r/xyz.html)