+++
title = "[EN] Automatic Meaningful Custom IDs for Org Headings"
author = ["Lucien “Phundrak” Cartier-Tilet"]
date = 2020-06-06
tags = ["emacs", "orgmode"]
categories = ["emacs", "linux", "conlanging", "orgmode"]
draft = false
[menu.main]
weight = 2001
identifier = "en-automatic-meaningful-custom-ids-for-org-headings"
+++
Spoiler alert: I will just modify a bit of code that already exists. Go
directly to the bottom if you want the solution, or read the whole post if
you are interested in how I got there.
<div class="ox-hugo-toc toc local">
<div></div>

- [The issue](#the-issue)
- [A first solution](#a-first-solution)
- [These headers are not meaningful](#these-headers-are-not-meaningful)

</div>
<!--endtoc-->
## The issue {#the-issue}
About two to three years ago, as I was working on a project that was meant
to be published on the internet, I looked for a solution to get fixed anchor
links to my various headings when I performed HTML exports. As some of you
may know, by default when an Org file is exported to an HTML file, a random
ID will be generated for each header, and this ID will be used as their
anchor. Here’s a quick example of a simple org file:
```org
#+title: Sample org file
* First heading
Reference to a subheading
* Second heading
Some stuff written here
** First subheading
Some stuff
** Second subheading
Some other stuff
```
<div class="src-block-caption">
<span class="src-block-number">Code Snippet 1</span>:
Example org file
</div>

And this is the result once exported to HTML (with a lot of noise removed
from `<head>`):
```html
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Sample org file</title>
<meta name="generator" content="Org mode" />
<meta name="author" content="Lucien Cartier-Tilet" />
</head>
<body>
<div id="content">
<h1 class="title">Sample org file</h1>
<div id="outline-container-orgd8e6238" class="outline-2">
<h2 id="orgd8e6238"><span class="section-number-2">1</span> First heading</h2>
<div class="outline-text-2" id="text-1">
<p>
Reference to a subheading
</p>
</div>
</div>
<div id="outline-container-org621c39a" class="outline-2">
<h2 id="org621c39a"><span class="section-number-2">2</span> Second heading</h2>
<div class="outline-text-2" id="text-2">
<p>
Some stuff written here
</p>
</div>
<div id="outline-container-orgae45d6b" class="outline-3">
<h3 id="orgae45d6b"><span class="section-number-3">2.1</span> First subheading</h3>
<div class="outline-text-3" id="text-2-1">
<p>
Some stuff
</p>
</div>
</div>
<div id="outline-container-org9301aa9" class="outline-3">
<h3 id="org9301aa9"><span class="section-number-3">2.2</span> Second subheading</h3>
<div class="outline-text-3" id="text-2-2">
<p>
Some other stuff
</p>
</div>
</div>
</div>
</div>
</body>
</html>
```
<div class="src-block-caption">
<span class="src-block-number">Code Snippet 2</span>:
Output HTML file
</div>

As you can see, all the anchors are in the format `org[a-f0-9]{7}`. First,
this is not really meaningful if you want to read the anchor and guess where
it will lead you. Second, these anchors will change each time you export
your Org file to HTML. If I want to share a URL pointing to a specific
heading on my website… well, I can’t, since it will change the next time I
update the document. And I don’t want to have to set a `CUSTOM_ID` property
on each of my headings manually. So, what to do?
## A first solution {#a-first-solution}
A first solution I found came from [this blog post](https://writequit.org/articles/emacs-org-mode-generate-ids.html), where Lee Hinman
described the very same issue they had and wrote some Elisp code to remedy
it (it’s a great read, go take a look). And it worked: for some time I used
their code in my Emacs configuration file to generate unique custom IDs for
my Org headers. Basically, the code checks whether `auto-id:t` is set in an
`#+OPTIONS` header. If it is, it iterates over all of the Org headers, and
for each one that doesn’t already have a `CUSTOM_ID` property, it inserts
one made from a UUID generated by Emacs. And tada! Whenever we save our
file, each of these headers gets a
`h-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}` custom ID
that won’t change the next time we export our Org file to HTML.
Woohoo!

Except…
## These headers are not meaningful {#these-headers-are-not-meaningful}
Ok, alright, that’s still a huge step forward: we don’t have to type any
`CUSTOM_ID` property manually anymore, it’s done automatically for us. But
when I send someone a link like
`https://langue.phundrak.com/eittland#h-76fc0b91-e41c-42ad-8652-bba029632333`,
the first reaction to this URL is often something along the lines of “What
the fuck?”. And they’re right: the anchor part of this URL is unreadable.
How is anyone supposed to guess it links to the description of the vowels
of the Eittlandic language? (That’s a constructed language I’m working on;
you won’t find anything about it outside my website.)

So, I went back to my Emacs configuration file and, through some trial and
error, finally found a way to get a consistent custom ID that is both
readable and set automatically. With the current state of my code, what you
get is the complete path of the Org heading, with all spaces replaced by
underscores and headings separated by dashes, followed by a unique identifier
taken from an Emacs-generated UUID. Now, the same link as above looks like
`https://langue.phundrak.com/eittland#Aperçu_structurel-Inventaire_phonétique_et_orthographe-Voyelles_pures-84f05c2c`.
It won’t be more readable to you if you don’t speak French, but you can
guess it is way better than what we had before. I even added a safety net by
removing forward slashes (as well as tildes and square brackets) from the ID.
The final identifier is there to keep the ID unique in case two headings in
the Org file end up with identical paths for one reason or another.
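
The following is a minimal sketch that makes these rules concrete. It applies the same string transformations to plain strings, so you can try them in `*scratch*` without an Org buffer. The function `my/org-path-slug` and its arguments are hypothetical and not part of the configuration shared in this post. In the actual code, the path comes from `org-get-outline-path` and the suffix from a truncated UUID.

```emacs-lisp
;; Hypothetical helper, not part of the configuration in this post: it
;; only mirrors the string transformations applied to a heading path.
(defun my/org-path-slug (path heading suffix)
  "Build a readable custom ID from outline PATH, HEADING and SUFFIX.
PATH is a list of ancestor heading titles, HEADING the title of the
current heading and SUFFIX a short unique string."
  (let* ((orgpath (mapconcat #'identity path "-"))
         ;; Join the ancestors and the current title with dashes.
         (full (if (string= orgpath "")
                   heading
                 (concat orgpath "-" heading)))
         ;; Replace every run of whitespace with a single underscore...
         (slug (replace-regexp-in-string "[[:space:]]+" "_" full))
         ;; ...then delete forward slashes, tildes and square brackets.
         (slug (replace-regexp-in-string "[][/~]" "" slug)))
    (concat slug "-" suffix)))

(my/org-path-slug '("Aperçu structurel"
                    "Inventaire phonétique et orthographe")
                  "Voyelles pures"
                  "84f05c2c")
;; => "Aperçu_structurel-Inventaire_phonétique_et_orthographe-Voyelles_pures-84f05c2c"
```

Whitespace is replaced before the other characters are deleted. As a result, a title such as `a / b` becomes `a__b`, because the underscores on either side of the deleted slash remain.
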

The modifications I made to the first function, `eos/org-id-new`, are
minimal: I just split the UUID on its dashes and keep only its first part,
which shortens it to eight hexadecimal characters.
```emacs-lisp
(defun eos/org-id-new (&optional prefix)
  "Create a new globally unique ID.
An ID consists of two parts separated by a dash:
- a prefix
- a unique part that will be created according to
  `org-id-method'; only its first dash-separated segment is kept.
PREFIX can specify the prefix, the default is given by the
variable `org-id-prefix'.  However, if PREFIX is the symbol
`none', don't use any prefix even if `org-id-prefix' specifies
one.
So a typical ID could look like \"Org-4nd91V40HI\"."
  (let* ((prefix (if (eq prefix 'none)
                     ""
                   (concat (or prefix org-id-prefix)
                           "-")))
         unique)
    ;; No prefix was given and `org-id-prefix' is nil.
    (if (equal prefix "-")
        (setq prefix ""))
    (cond
     ((memq org-id-method
            '(uuidgen uuid))
      (setq unique (org-trim (shell-command-to-string org-id-uuid-program)))
      ;; Fall back to Org's own UUID generator if the program failed.
      (unless (org-uuidgen-p unique)
        (setq unique (org-id-uuid))))
     ((eq org-id-method 'org)
      (let* ((etime (org-reverse-string (org-id-time-to-b36)))
             (postfix (if org-id-include-domain
                          (progn
                            (require 'message)
                            (concat "@"
                                    (message-make-fqdn))))))
        (setq unique (concat etime postfix))))
     (t (error "Invalid `org-id-method'")))
    ;; Keep only the first dash-separated segment of the unique part;
    ;; for a UUID, that is its first eight hexadecimal digits.
    (concat prefix (car (split-string unique "-")))))
```
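
To see what that truncation keeps, the following sketch generates a UUID with Org’s built-in `org-id-uuid`. That is the generator `eos/org-id-new` falls back to when `org-id-uuid-program` does not return a valid UUID. The sketch then keeps the UUID’s first dash-separated segment. It only covers the UUID values of `org-id-method`.

```emacs-lisp
(require 'org-id)

;; `org-id-uuid' returns a random version 4 UUID: five dash-separated
;; groups of 8, 4, 4, 4 and 12 lowercase hexadecimal digits.
(let* ((uuid (org-id-uuid))
       ;; Same expression as at the end of `eos/org-id-new'.
       (short (car (split-string uuid "-"))))
  (message "%s -> %s" uuid short)
  short)
```
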
Next comes the actual generation of the custom ID. As you can see, the
`let` has been replaced by a `let*`, which allows me to build the ID from
the variables `orgpath` and `heading`. The former joins the titles of the
current heading’s ancestors with dashes. The latter appends the title of the
current heading to `orgpath`, separated by a dash, unless `orgpath` is
empty. The code then turns the result into a slug: every run of whitespace
is replaced by an underscore, and forward slashes, tildes and square
brackets are deleted. Finally, `heading` is passed to the function described
above, which appends the unique ID to it.
```emacs-lisp
(defun eos/org-custom-id-get (&optional pom create prefix)
  "Get the CUSTOM_ID property of the entry at point-or-marker POM.
If POM is nil, refer to the entry at point.  If the entry does not
have a CUSTOM_ID, the function returns nil.  However, when CREATE
is non-nil, create a CUSTOM_ID if none is present already.  PREFIX
will be passed through to `eos/org-id-new'.  In any case, the
CUSTOM_ID of the entry is returned."
  (interactive)
  (org-with-point-at pom
    (let* (;; Titles of the ancestors of the entry, joined with dashes.
           (orgpath (mapconcat #'identity (org-get-outline-path) "-"))
           ;; Full path to the entry: whitespace becomes underscores,
           ;; then slashes, tildes and square brackets are deleted.
           (heading (replace-regexp-in-string
                     "/\\|~\\|\\[\\|\\]" ""
                     (replace-regexp-in-string
                      "[[:space:]]+" "_"
                      (if (string= orgpath "")
                          (org-get-heading t t t t)
                        (concat orgpath "-" (org-get-heading t t t t))))))
           (id (org-entry-get nil "CUSTOM_ID")))
      (cond
       ;; Keep an existing, non-blank CUSTOM_ID untouched.
       ((and id
             (stringp id)
             (string-match "\\S-" id))
        id)
       (create
        (setq id (eos/org-id-new (concat prefix heading)))
        (org-entry-put pom "CUSTOM_ID" id)
        (org-id-add-location id (buffer-file-name (buffer-base-buffer)))
        id)))))
```
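
To see what `orgpath` and `heading` are built from, here is a small sketch. It runs the two Org functions used above, `org-get-outline-path` and `org-get-heading`, on a throwaway buffer that mirrors part of the sample file from the beginning of this post. The helper name `my/sample-heading-info` is hypothetical.

```emacs-lisp
(require 'org)

(defun my/sample-heading-info ()
  "Return the outline path and title of \"First subheading\".
Hypothetical helper: it builds a temporary Org buffer and queries
the same functions `eos/org-custom-id-get' relies on."
  (with-temp-buffer
    (org-mode)
    (insert "* Second heading\nSome stuff written here\n"
            "** First subheading\nSome stuff\n")
    ;; Put point on the "First subheading" headline.
    (goto-char (point-min))
    (re-search-forward "^\\*\\* First subheading")
    (beginning-of-line)
    ;; The bare title of the entry (no tags, TODO keyword, priority or
    ;; COMMENT), then the titles of its ancestors only.
    (let ((title (org-get-heading t t t t))
          (path (org-get-outline-path)))
      (list path title))))

(my/sample-heading-info)
;; => (("Second heading") "First subheading")
```

From these two values, `eos/org-custom-id-get` builds the prefix `Second_heading-First_subheading`, and `eos/org-id-new` then appends the first segment of a UUID to it.
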
The rest of the code is unchanged; here it is anyway:
```emacs-lisp
(defun eos/org-add-ids-to-headlines-in-file ()
  "Add CUSTOM_ID properties to all headlines in the current file
which do not already have one.
Only adds ids if the `auto-id' option is set to `t' in the file
somewhere. ie, #+OPTIONS: auto-id:t"
  (interactive)
  (save-excursion
    (widen)
    (goto-char (point-min))
    (when (re-search-forward "^#\\+OPTIONS:.*auto-id:t"
                             (point-max)
                             t)
      (org-map-entries (lambda ()
                         (eos/org-custom-id-get (point)
                                                'create))))))

(add-hook 'org-mode-hook
          (lambda ()
            (add-hook 'before-save-hook
                      (lambda ()
                        (when (and (eq major-mode 'org-mode)
                                   (eq buffer-read-only nil))
                          (eos/org-add-ids-to-headlines-in-file))))))
```
Note that you **will need** the package `org-id` to make this code work. You
simply need to add the following code before the code I shared above:
```emacs-lisp
(require 'org-id)
(setq org-id-link-to-org-use-id 'create-if-interactive-and-no-custom-id)
```
And that’s how my links are now way more readable **and** persistent! The only
downside I found is that when you move headings and their path changes, or
when you modify the heading itself, the custom ID is not updated
automatically. I could fix that by regenerating the custom ID on each save,
regardless of whether one already exists, but that would risk overwriting
IDs I set manually.
<div class="html">
<div></div>
<script defer src="https://commento.phundrak.com/js/commento.js"></script>
<div id="commento"></div>
</div>