Matching Obsidian links with Regex

To match the basic types of links in Obsidian[1], we need a regex that captures the following:

  • Simple internal links ([[Obsidian]])
  • Internal links with alias ([[PKM app|Obsidian]])
  • File embeds (![[obsidian-banner.png]])
  • File embeds with aliases (![[Obsidian Banner|obsidian-banner.png]])

This regular expression matches all four link types listed above:

const mdReg = /(!)?\[\[(?:(.+?)\|)?(.+?)\]\]/g;

Let’s break that down:

  1. (!)? (Optional capture group): Captures the ! character before a link, which marks an embed. If there is none, the group is undefined.
  2. \[\[: Two open square brackets. The [ character needs to be escaped with a backslash in regular expressions.
  3. (?:(.+?)\|)? (Optional non-capturing group wrapping a capture group): Lazily matches the characters between the open brackets and a pipe |. The inner group captures the alias of a link or embed. If there is no pipe, the group is undefined.
  4. (.+?) (Capture group): Lazily captures the remaining characters up to the closing brackets. Without an alias, this is all the text within [[ ]].
  5. \]\]: Closing square brackets (need to be escaped with \).
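
To see the groups from the breakdown in action, here is a minimal sketch that runs the expression against one line. The sample string and the property names (bang, beforePipe, afterPipe) are made up for illustration. Note that in JavaScript a capture group that did not take part in the match comes back as undefined.

```javascript
// Sample input (made up for illustration): an aliased link and an embed on one line.
const sample = "See [[PKM app|Obsidian]] and ![[obsidian-banner.png]].";

const linkRegex = /(!)?\[\[(?:(.+?)\|)?(.+?)\]\]/g;

// matchAll() returns an iterator; spread it into an array of match arrays.
const groups = [...sample.matchAll(linkRegex)].map((m) => ({
  full: m[0], // the whole match, brackets included
  bang: m[1], // "!" for embeds, otherwise undefined
  beforePipe: m[2], // text before the pipe, otherwise undefined
  afterPipe: m[3], // text after the pipe, or everything inside [[ ]]
}));

console.log(groups);
```

The first entry has beforePipe set to "PKM app" and afterPipe set to "Obsidian". The second has bang set to "!" and afterPipe set to "obsidian-banner.png", with beforePipe left undefined.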

You can test this RegEx and see a visual explanation on Regexr.

Using the Regex in JavaScript

Using JavaScript, we can then iterate over those matches:

const content = "…"; // A string with some links in it

const mdRegex = /(!)?\[\[(?:(.+?)\|)?(.+?)\]\]/g;
const matchesMd = content.matchAll(mdRegex);

for (const match of matchesMd) {
  const embedPrefix = match[1]; // -> "!" or `undefined`
  const alt = match[2]; // -> alt text or `undefined`
  const link = match[3]; // -> link text (always set when there is a match)
  console.log([embedPrefix, alt, link]);
}

This is the output we get when replacing the content variable with different types of Markdown:

Matching a simple link

const content = "[[Get started with Obsidian]]";
// … running the match of matchesMd loop

// -> [undefined,undefined,"Get started with Obsidian"]

Matching a link with alias

const content =
  "[[From plain-text note-taking|I have used plain-text based apps]]";
// … running the match of matchesMd loop

// -> [undefined,"From plain-text note-taking","I have used plain-text based apps"]

Matching an Embed

const content = "![[obsidian-banner.png]]";
// … running the match of matchesMd loop

// -> ["!",undefined,"obsidian-banner.png"]

Matching an Embed with alias

const content = "![[screenshot of an obsidian graph|obsidian-graph.png]]";
// -> ["!","screenshot of an obsidian graph","obsidian-graph.png"]

That’s pretty good already!

But we can make this even more convenient and powerful by moving the logic into a reusable function. When we call getMdMatches() and pass it a string, the function returns an array with one object per match.

const getMdMatches = (content) => {
  const mdReg = /(!)?\[\[(?:(.+?)\|)?(.+?)\]\]/g;
  const matchesMd = content.matchAll(mdReg);
  const matchesArr = [...matchesMd];
  const outputArray = matchesArr.map((match) => ({
    link: match[3],
    alt: match[2] ?? "",
    isEmbed: Boolean(match[1]),
    original: match[0],
  }));

  return outputArray;
};

Example output of the function

const content = `[[Get started with Obsidian]]
[[From plain-text note-taking|I have used plain-text based apps]]
![[obsidian-banner.png]]
![[screenshot of an obsidian graph|obsidian-graph.png]]`;

const mdMatches = getMdMatches(content);
console.log(mdMatches);

// [
//   {
//     "link": "Get started with Obsidian",
//     "alt": "",
//     "isEmbed": false,
//     "original": "[[Get started with Obsidian]]"
//   },
//   {
//     "link": "I have used plain-text based apps",
//     "alt": "From plain-text note-taking",
//     "isEmbed": false,
//     "original": "[[From plain-text note-taking|I have used plain-text based apps]]"
//   },
//   {
//     "link": "obsidian-banner.png",
//     "alt": "",
//     "isEmbed": true,
//     "original": "![[obsidian-banner.png]]"
//   },
//   {
//     "link": "obsidian-graph.png",
//     "alt": "screenshot of an obsidian graph",
//     "isEmbed": true,
//     "original": "![[screenshot of an obsidian graph|obsidian-graph.png]]"
//   }
// ]

The function works with any amount of text in between links:

const content = `First of all, tell me a little bit about what's your experience with note-taking apps like?

-> [[No prior experience|I have no prior experience]]
-> [[From standard note-taking|I’ve used note-taking apps like Evernote and OneNote]]
-> [[From plain-text note-taking|I have used plain-text based apps]]

![[obsidian-banner.png]]

→ [[Get started with Obsidian]]`;

const mdMatches = getMdMatches(content);
console.log(mdMatches);

// [
//   {
//     "link": "I have no prior experience",
//     "alt": "No prior experience",
//     "isEmbed": false,
//     "original": "[[No prior experience|I have no prior experience]]"
//   },
//   {
//     "link": "I’ve used note-taking apps like Evernote and OneNote",
//     "alt": "From standard note-taking",
//     "isEmbed": false,
//     "original": "[[From standard note-taking|I’ve used note-taking apps like Evernote and OneNote]]"
//   },
//   {
//     "link": "I have used plain-text based apps",
//     "alt": "From plain-text note-taking",
//     "isEmbed": false,
//     "original": "[[From plain-text note-taking|I have used plain-text based apps]]"
//   },
//   {
//     "link": "obsidian-banner.png",
//     "alt": "",
//     "isEmbed": true,
//     "original": "![[obsidian-banner.png]]"
//   },
//   {
//     "link": "Get started with Obsidian",
//     "alt": "",
//     "isEmbed": false,
//     "original": "[[Get started with Obsidian]]"
//   }
// ]
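
One caveat is worth checking before relying on this for arbitrary notes. In the examples above every link sits on its own line, and since . does not match a newline, each match stays on one line. When a plain link is followed by an aliased link on the same line, the lazy (.+?) before the pipe can stretch across both. The sketch below shows this with a made-up sample line, next to a stricter variant that uses negated character classes. The stricter expression is an assumption offered for comparison, not the regex used in this post.

```javascript
// Made-up sample: a plain link followed by an aliased link on the SAME line.
const line =
  "[[Get started with Obsidian]] or [[No prior experience|I have no prior experience]]";

// The expression from this post.
const lazyRegex = /(!)?\[\[(?:(.+?)\|)?(.+?)\]\]/g;

// Stricter variant (an assumption, not the post's regex): the text before the
// pipe may not contain brackets or a pipe, and the text after it may not
// contain brackets, so a match cannot run past a closing ]].
const strictRegex = /(!)?\[\[(?:([^\[\]|]+)\|)?([^\[\]]+)\]\]/g;

const lazyMatches = [...line.matchAll(lazyRegex)].map((m) => m[0]);
const strictMatches = [...line.matchAll(strictRegex)].map((m) => m[0]);

console.log(lazyMatches.length); // -> 1 (both links merged into one match)
console.log(strictMatches.length); // -> 2
```

The trade-off is that the stricter variant will not match link text that itself contains square brackets, so it is worth testing against your own notes before swapping it in.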

Use case: Transforming text for blog posts

I write my blog posts in Obsidian and then move them into a Nuxt site. There are a few transformations that I need to apply to make Obsidian’s Markdown work with Nuxt:

  • Internal links ([[…]]) are replaced with the equivalent public note
  • Embeds (![[…]]) are replaced with the Markdown syntax (![alt](link)) and the files are moved to the blog folder.

Using getMdMatches() makes it easy to match all links, format them in the way I need and replace the original link.
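
As a rough illustration of the embed step, here is a minimal sketch that rewrites embeds into standard Markdown images. The convertEmbeds() helper and the "/images" folder are hypothetical names chosen for this example, and moving the files themselves is left out. It follows this post's convention of treating the text before the pipe as the alt text.

```javascript
// Same matcher as getMdMatches() in this post, repeated so the snippet runs on its own.
const getMdMatches = (content) =>
  [...content.matchAll(/(!)?\[\[(?:(.+?)\|)?(.+?)\]\]/g)].map((match) => ({
    link: match[3],
    alt: match[2] ?? "",
    isEmbed: Boolean(match[1]),
    original: match[0],
  }));

// Hypothetical helper: replaces each embed with ![alt](path).
// `assetPath` is an assumed public folder, not something defined in this post.
const convertEmbeds = (content, assetPath = "/images") =>
  getMdMatches(content)
    .filter((match) => match.isEmbed)
    .reduce(
      (text, { original, alt, link }) =>
        // A replacer function avoids special handling of "$" in the replacement.
        text.replace(original, () => `![${alt}](${assetPath}/${encodeURI(link)})`),
      content
    );

const note =
  "Intro\n![[screenshot of an obsidian graph|obsidian-graph.png]]\n[[Get started with Obsidian]]";

console.log(convertEmbeds(note));
```

The embed line becomes ![screenshot of an obsidian graph](/images/obsidian-graph.png), while the internal link is left untouched for a separate step.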

Beyond that, there may be many use cases where it is helpful to extract Obsidian links and embeds from a body of text. The regex and functions mentioned in this post provide a reliable and convenient way to transform your notes to your heart’s desire!

Footnotes

  1. This list doesn’t include heading links (#) or block references (^). I don’t use them often enough. 🤷‍♂️