MarkdownDB

A MarkdownDB is a ContentBase where the primary text files are markdown.

In other words: a database where the "data" is markdown.

Markdown with frontmatter combines unstructured content with structured data in an open, highly accessible format.

Combine that with blocks (i.e. backtick-labelled sections) and you have extensible unstructured content in which you can embed other structured or unstructured content.

Combine that with web components / JSX and you have full inline JavaScript.

More reasons: notes/markdown-is-eating-the-world (aka markdown is cool) 😎

Job Stories High Level

i.e. why do I want a MarkdownDB?

When generating rendered pages I want to get a list of all or some of the markdown files we have so that I can render them and create a blog index, etc.

When generating an individual rendered page I want information about all the pages that reference it so that I can list all the backlinks.

When creating a tag page I want to list all pages with that tag so that I can generate that page.
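These three stories can be sketched as simple queries over an in-memory index. The record shape (`PageRecord`) and the function names below are hypothetical and only illustrate the idea; they are not an existing MarkdownDB API.

```typescript
// Hypothetical record shape for one indexed markdown file.
interface PageRecord {
  path: string;    // e.g. "blog/hello.md"
  tags: string[];  // tags from frontmatter or body
  links: string[]; // paths of the pages this page links to
}

// List all files, or only those under a folder (e.g. to build a blog index).
function listPages(pages: PageRecord[], folder?: string): PageRecord[] {
  if (folder === undefined) return pages;
  const prefix = folder.endsWith("/") ? folder : folder + "/";
  return pages.filter((p) => p.path.startsWith(prefix));
}

// Backlinks: every page whose links include the target path.
function backlinksTo(pages: PageRecord[], target: string): PageRecord[] {
  return pages.filter((p) => p.links.includes(target));
}

// Tag page: every page carrying the given tag.
function pagesWithTag(pages: PageRecord[], tag: string): PageRecord[] {
  return pages.filter((p) => p.tags.includes(tag));
}

const samplePages: PageRecord[] = [
  { path: "blog/hello.md", tags: ["intro"], links: ["notes/markdown.md"] },
  { path: "blog/second.md", tags: ["intro", "news"], links: ["blog/hello.md"] },
  { path: "notes/markdown.md", tags: [], links: [] },
];
```

A real index would likely live in a database rather than an array, but the queries keep the same shape: filter by path, by outgoing link, or by tag.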

Job stories

Index a folder of files

I want to create a db index given a folder of markdown and other files

Bonus

  • Index multiple folders, with support for configuring each one, e.g. adding a prefix (use case: I have all my blog files in a separate folder over here)
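One way to picture indexing several folders with a per-folder prefix is sketched below. It assumes the relative file paths have already been collected by a directory walk; `FolderSource`, `IndexEntry` and `buildIndex` are hypothetical names, and stripping the `.md` extension from the URL path is an assumption made for the example.

```typescript
// Hypothetical config: one entry per source folder.
interface FolderSource {
  folder: string;    // folder on disk, e.g. "../my-blog"
  urlPrefix: string; // where its files appear in the index, e.g. "blog" ("" for none)
  files: string[];   // relative paths found in that folder (from a directory walk)
}

interface IndexEntry {
  sourcePath: string; // folder + relative path
  urlPath: string;    // prefix + relative path, without ".md" for markdown files
  extension: string;  // "md", "png", ... ("" if none)
}

function buildIndex(sources: FolderSource[]): IndexEntry[] {
  const entries: IndexEntry[] = [];
  for (const src of sources) {
    for (const rel of src.files) {
      const dot = rel.lastIndexOf(".");
      const slash = rel.lastIndexOf("/");
      // An extension needs a dot inside the file name, not at its start.
      const hasExt = dot > slash + 1;
      const extension = hasExt ? rel.slice(dot + 1).toLowerCase() : "";
      const withoutExt = extension === "md" ? rel.slice(0, dot) : rel;
      const urlPath = [src.urlPrefix, withoutExt].filter((s) => s !== "").join("/");
      entries.push({ sourcePath: `${src.folder}/${rel}`, urlPath, extension });
    }
  }
  return entries;
}

const demoIndex = buildIndex([
  { folder: "content", urlPrefix: "", files: ["index.md", "img/logo.png"] },
  { folder: "../my-blog", urlPrefix: "blog", files: ["2024/hello.md"] },
]);
```

Here the blog files live in a separate folder but are indexed under `blog/`, next to the main content.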

Structured data extraction

Frontmatter

Extract frontmatter

  • deal with nested frontmatter
  • deal with casting types e.g. string, number so that we can query in useful ways e.g. find me all blog posts before date X
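A minimal sketch of extraction with type casting follows. It is deliberately simplified: it handles only flat `key: value` lines (no nested frontmatter) and is not a YAML parser, which a real implementation would most likely rely on. `castScalar` and `parseFrontmatter` are hypothetical names.

```typescript
type FrontmatterValue = string | number | boolean | Date;

// Cast a raw scalar to a typed value so it can be queried meaningfully.
function castScalar(raw: string): FrontmatterValue {
  const text = raw.trim();
  const quoted = /^(["'])(.*)\1$/.exec(text);
  if (quoted) return quoted[2] ?? ""; // quoted values always stay strings
  if (text === "true") return true;
  if (text === "false") return false;
  if (/^-?\d+(\.\d+)?$/.test(text)) return Number(text);
  if (/^\d{4}-\d{2}-\d{2}$/.test(text)) return new Date(text + "T00:00:00Z");
  return text;
}

// Split a document into frontmatter fields and body.
// Only flat "key: value" lines are handled in this sketch.
function parseFrontmatter(source: string): { data: { [key: string]: FrontmatterValue }; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { data: {}, body: source };
  const data: { [key: string]: FrontmatterValue } = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip blank or malformed lines
    data[line.slice(0, sep).trim()] = castScalar(line.slice(sep + 1));
  }
  return { data, body: match[2] ?? "" };
}

const parsedDoc = parseFrontmatter(
  "---\ntitle: Hello\ndate: 2024-01-15\ndraft: false\nweight: 3\n---\n# Body\n"
);

// "Find me all blog posts before date X" becomes a plain comparison.
const postDate = parsedDoc.data["date"];
const isBefore2025 = postDate instanceof Date && postDate.getTime() < Date.UTC(2025, 0, 1);
```

Because the date is stored as a `Date` rather than a string, date-range queries do not depend on string formatting.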

Tags

Extract tags from the frontmatter tags attribute and from the body, e.g. #abc
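As a rough sketch, tag extraction could merge both sources into one de-duplicated list. The exact rule for what counts as a body tag is an assumption here (a `#` followed by a letter, at the start of the text or after whitespace), and `extractTags` is a hypothetical name.

```typescript
// Collect tags from a frontmatter `tags` value and from #hashtags in the body.
function extractTags(frontmatterTags: unknown, body: string): string[] {
  const found = new Set<string>();
  if (Array.isArray(frontmatterTags)) {
    for (const t of frontmatterTags) {
      if (typeof t === "string" && t.trim() !== "") found.add(t.trim());
    }
  } else if (typeof frontmatterTags === "string") {
    // Also allow a comma-separated string such as "a, b".
    for (const t of frontmatterTags.split(",")) {
      if (t.trim() !== "") found.add(t.trim());
    }
  }
  // Assumed rule: "#" then a letter, then letters, digits, "_", "-" or "/".
  // Requiring whitespace (or start of text) before "#" skips headings such as
  // "# Title" and URL fragments such as "page#section".
  const bodyTag = /(^|\s)#([A-Za-z][\w\/-]*)/g;
  let m: RegExpExecArray | null;
  while ((m = bodyTag.exec(body)) !== null) {
    if (m[2]) found.add(m[2]);
  }
  return Array.from(found);
}

const demoTags = extractTags(
  ["blog", "news"],
  "# Title\nSome #abc text, a link to page#frag and #news again"
);
```

This sketch does not skip code blocks, so something like `#include` inside a code sample would be picked up as a tag; a parser-based approach would avoid that.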

Links

I want to extract links so I can compute backlinks, dead links, etc.

  • standard markdown links
  • obsidian wiki links
  • embeds of files e.g. ![...]
    • wiki link embeds of files

So all of these

[...](...)
![...](...)
[[...]]   # wiki link style
[[...|my title]]   # wiki link style with a title
![[...]]  # for images and other embeds
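The five forms above can be matched with a single regular expression, as in the sketch below. This is an approximation (it ignores code blocks, reference-style links and nested brackets), and `ExtractedLink` and `extractLinks` are hypothetical names.

```typescript
interface ExtractedLink {
  target: string;       // URL or page path
  title: string | null; // link text or wiki link title, if any
  style: "markdown" | "wiki";
  embed: boolean;       // true for ![...](...) and ![[...]]
}

function extractLinks(body: string): ExtractedLink[] {
  const links: ExtractedLink[] = [];
  // Optional "!" followed by either [[target|title]] or [text](target "title").
  const pattern =
    /(!?)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]|(!?)\[([^\]]*)\]\(([^)\s]+)[^)]*\)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(body)) !== null) {
    if (m[2] !== undefined) {
      // Wiki link branch: groups 1 to 3.
      links.push({ target: m[2].trim(), title: m[3] ?? null, style: "wiki", embed: m[1] === "!" });
    } else if (m[6] !== undefined) {
      // Standard markdown branch: groups 4 to 6.
      links.push({ target: m[6], title: m[5] || null, style: "markdown", embed: m[4] === "!" });
    }
  }
  return links;
}

const demoLinks = extractLinks(
  "See [docs](https://example.com) and [[Home|my title]] plus " +
    "![[diagram.png]] and ![alt](img/a.png) and [[notes/x]]"
);
```

With the targets in hand, backlinks are the inverse mapping, and dead links are targets that match no indexed file.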

Tasks

Extracting tasks like:

- [ ] this is a task

See Obsidian Dataview.
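A small sketch of task extraction, assuming the common task list syntax (`- [ ]` for open and `- [x]` for done). The `TaskItem` shape and `extractTasks` are hypothetical names.

```typescript
interface TaskItem {
  description: string;
  checked: boolean;
  line: number; // 1-based line number within the body
}

function extractTasks(body: string): TaskItem[] {
  const tasks: TaskItem[] = [];
  // A list marker (-, * or +), then "[ ]" or "[x]", then the task text.
  const taskLine = /^\s*[-*+] \[( |x|X)\] (.*)$/;
  body.split(/\r?\n/).forEach((text, i) => {
    const m = taskLine.exec(text);
    if (m) {
      tasks.push({ description: (m[2] ?? "").trim(), checked: m[1] !== " ", line: i + 1 });
    }
  });
  return tasks;
}

const demoTasks = extractTasks(
  "# Todo\n- [ ] this is a task\n- [x] done already\n- not a task"
);
```

Keeping the line number makes it possible to link a task back to its place in the source file.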

Computed fields (or just any operations on incoming records)

cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields

When loading a file I want to create new fields based on some computation so that I can have additional metadata

  • Add a type based on the folder of the file so that I can label blog posts
  • Add a layout based on the folder so that I can change layouts per folder
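The two examples above might look like this as computed fields. This sketch is loosely modelled on the idea of a map from field name to function; `LoadedFile`, `ComputedFields` and `applyComputedFields` are hypothetical names, and the fallback values ("page", "default", "post") are assumptions for illustration.

```typescript
interface LoadedFile {
  path: string; // e.g. "blog/2024/hello.md"
  metadata: { [key: string]: unknown };
}

// A computed field derives one new metadata value from the loaded file.
type ComputedFields = { [field: string]: (file: LoadedFile) => unknown };

// First path segment, or "" for files at the root.
function topFolder(file: LoadedFile): string {
  const slash = file.path.indexOf("/");
  return slash === -1 ? "" : file.path.slice(0, slash);
}

const computedFields: ComputedFields = {
  // Label files by folder, e.g. everything under blog/ gets type "blog".
  type: (file) => topFolder(file) || "page",
  // Pick a layout from the folder, unless frontmatter already set one.
  layout: (file) =>
    file.metadata["layout"] ?? (topFolder(file) === "blog" ? "post" : "default"),
};

// Returns a copy of the file with the computed fields merged into its metadata.
function applyComputedFields(file: LoadedFile, fields: ComputedFields): LoadedFile {
  const metadata = { ...file.metadata };
  for (const key of Object.keys(fields)) {
    const compute = fields[key];
    if (compute) metadata[key] = compute(file);
  }
  return { ...file, metadata };
}

const enrichedPost = applyComputedFields(
  { path: "blog/hello.md", metadata: { title: "Hi" } },
  computedFields
);
```

Each function receives the original file rather than the partially enriched one, so the result does not depend on the order of the fields.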

Validation of data on way in

When loading a file I want to validate it against a schema/type so that I know the data in the database is "valid"

  • When validation fails what happens?
  • Error messages should be super helpful
  • Follow the principle of erroring early

When loading a file I want to allow "extra" metadata by default so that I don't get endless warnings about it
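A sketch of validation on load that follows these points: it fails early by throwing, reports every problem with the file path in one message, and ignores fields the schema does not mention. The schema format and the names `Schema`, `FieldRule` and `validateMetadata` are assumptions made for the example.

```typescript
type FieldType = "string" | "number" | "boolean" | "date";
interface FieldRule {
  type: FieldType;
  required?: boolean;
}
type Schema = { [field: string]: FieldRule };

function matchesType(value: unknown, type: FieldType): boolean {
  if (type === "date") return value instanceof Date && !isNaN(value.getTime());
  return typeof value === type;
}

// Throws as soon as an invalid file is loaded, listing every problem at once.
function validateMetadata(
  path: string,
  metadata: { [key: string]: unknown },
  schema: Schema
): void {
  const problems: string[] = [];
  for (const field of Object.keys(schema)) {
    const rule = schema[field];
    if (!rule) continue;
    const value = metadata[field];
    if (value === undefined) {
      if (rule.required) {
        problems.push(`missing required field "${field}" (expected ${rule.type})`);
      }
    } else if (!matchesType(value, rule.type)) {
      problems.push(
        `field "${field}" should be ${rule.type} but got ${typeof value}: ${JSON.stringify(value)}`
      );
    }
  }
  // Fields not named in the schema are deliberately ignored: "extra" metadata is allowed.
  if (problems.length > 0) {
    throw new Error(`Invalid frontmatter in ${path}:\n- ${problems.join("\n- ")}`);
  }
}

const blogSchema: Schema = {
  title: { type: "string", required: true },
  date: { type: "date" },
};
```

For example, `validateMetadata("blog/ok.md", { title: "Hi", extra: 1 }, blogSchema)` passes despite the extra field, while a numeric `title` throws an error that names the file, the field, the expected type and the offending value.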

When accessing a File I want to cast it to a proper TypeScript type so that I can use it from code with all the benefits of TypeScript

BYOT (bring your own types)

When working with MarkdownDB I want to create my own types … so that when I get an object out it is cast to the right TypeScript type
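One possible shape for "bring your own types" is a user-supplied type together with a runtime type guard, as sketched here. `BlogPost`, `Guard` and `getAs` are hypothetical names; this shows a general TypeScript pattern rather than a confirmed MarkdownDB design.

```typescript
// A user-defined type for one kind of document.
interface BlogPost {
  title: string;
  date: Date;
}

// A guard checks an unknown value at runtime and narrows its static type.
type Guard<T> = (value: unknown) => value is T;

const isBlogPost: Guard<BlogPost> = (value): value is BlogPost => {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  return typeof v["title"] === "string" && v["date"] instanceof Date;
};

// Hypothetical accessor: returns the metadata typed as T, or throws if the guard rejects it.
function getAs<T>(metadata: unknown, guard: Guard<T>, path: string): T {
  if (!guard(metadata)) throw new Error(`${path} does not match the requested type`);
  return metadata;
}

const typedPost = getAs(
  { title: "Hello", date: new Date("2024-01-15T00:00:00Z") },
  isBlogPost,
  "blog/hello.md"
);
// Typed access with no casts at the call site.
const postYear = typedPost.date.getUTCFullYear();
```

The guard keeps the static type honest: the cast only succeeds when the stored data really has the expected shape.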

Misc

CLI tool for indexing

When working with a folder of markdown files I want to create a MarkdownDB (index) on the command line so that I have it to use

Architecture

Tasks

  • What is the pipeline
  • How do we have "plugins" e.g. for adding

Schema

  • Nodes / Files
  • Blocks within them

Node properties:

  • type
    • text (markdown)
    • blob
  • metadata
  • body/payload ? not sure we have this in DB (this is on disk / storage)
  • blob_type (? or inferred)
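The node properties listed above could be written down as TypeScript types along these lines. Everything here is an assumption for discussion: the names `DbNode` and `DbBlock`, the added `path` field, inferring the blob type from the file extension, and leaving the body out of the record because it stays on disk.

```typescript
type DbNodeType = "text" | "blob";

// A block within a text node, i.e. a backtick-labelled section.
interface DbBlock {
  label: string; // the label after the backticks
  content: string;
}

interface DbNode {
  path: string;                         // assumed: location on disk / in storage
  type: DbNodeType;                     // "text" for markdown, "blob" otherwise
  metadata: { [key: string]: unknown };
  blobType: string | null;              // e.g. "png"; here inferred from the extension
  blocks: DbBlock[];                    // blocks within a text node
  // The body/payload is not stored here, on the assumption that it stays on disk.
}

// Build a node record from a path, inferring type and blob type.
function makeNode(path: string, metadata: { [key: string]: unknown } = {}): DbNode {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
  const isText = ext === "md";
  return {
    path,
    type: isText ? "text" : "blob",
    metadata,
    blobType: isText ? null : ext,
    blocks: [],
  };
}

const textNode = makeNode("blog/hello.md", { title: "Hello" });
const blobNode = makeNode("img/logo.PNG");
```

Inferring `blobType` answers the "or inferred" question one way; storing it explicitly would be the alternative if extensions turn out to be unreliable.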

© 2024 All rights reserved