

+++
title = "[EN] Automatic Meaningful Custom IDs for Org Headings"
author = ["Lucien “Phundrak” Cartier-Tilet"]
date = 2020-06-06
tags = ["emacs", "orgmode"]
categories = ["emacs", "linux", "conlanging", "orgmode"]
draft = false
[menu.main]
  weight = 2001
  identifier = "en-automatic-meaningful-custom-ids-for-org-headings"
+++

Spoiler alert: I will just modify a bit of code that already exists. Go directly to the bottom if you want the solution, or read the whole post if you are interested in how I got there.

The issue

About two to three years ago, as I was working on a project that was meant to be published on the internet, I looked for a solution to get fixed anchor links to my various headings when I performed HTML exports. As some of you may know, by default when an Org file is exported to an HTML file, a random ID is generated for each heading, and this ID is used as its anchor. Here's a quick example of a simple Org file:

#+title: Sample org file
* First heading
  Reference to a subheading
* Second heading
  Some stuff written here
** First subheading
   Some stuff
** Second subheading
   Some other stuff
Code Snippet 1: Example org file

And this is the result once exported to HTML (with a lot of noise removed from <head>):

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

<head>
    <title>Sample org file</title>
    <meta name="generator" content="Org mode" />
    <meta name="author" content="Lucien Cartier-Tilet" />
</head>

<body>
    <div id="content">
        <h1 class="title">Sample org file</h1>
        <div id="outline-container-orgd8e6238" class="outline-2">
            <h2 id="orgd8e6238"><span class="section-number-2">1</span> First heading</h2>
            <div class="outline-text-2" id="text-1">
                <p>
                    Reference to a subheading
                </p>
            </div>
        </div>
        <div id="outline-container-org621c39a" class="outline-2">
            <h2 id="org621c39a"><span class="section-number-2">2</span> Second heading</h2>
            <div class="outline-text-2" id="text-2">
                <p>
                    Some stuff written here
                </p>
            </div>
            <div id="outline-container-orgae45d6b" class="outline-3">
                <h3 id="orgae45d6b"><span class="section-number-3">2.1</span> First subheading</h3>
                <div class="outline-text-3" id="text-2-1">
                    <p>
                        Some stuff
                    </p>
                </div>
            </div>
            <div id="outline-container-org9301aa9" class="outline-3">
                <h3 id="org9301aa9"><span class="section-number-3">2.2</span> Second subheading</h3>
                <div class="outline-text-3" id="text-2-2">
                    <p>
                        Some other stuff
                    </p>
                </div>
            </div>
        </div>
    </div>
</body>

</html>
Code Snippet 2: Output HTML file

As you can see, all the anchors follow the format org[a-f0-9]{7}. First, this is not really meaningful if you want to read the anchor and guess where it will lead you. Second, these anchors will change each time you export your Org file to HTML. If I want to share a URL to a specific heading on my website… well, I can't: it will change the next time I update the document. And I don't want to have to set a CUSTOM_ID property manually for each one of my headings. So, what to do?

A first solution

A first solution I found came from this blog post, where Lee Hinman described the very same issue they had and wrote some Elisp code to remedy it (it's a great read, go take a look). It worked, and for some time I used their code in my Emacs configuration file to generate unique custom IDs for my Org headings. Basically, the code detects whether auto-id:t is set in an #+OPTIONS header. If it is, it iterates over all of the Org headings and inserts a CUSTOM_ID for each one of them, made from a UUID generated by Emacs. And tada! Each heading gets a custom ID of the form h-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} that won't change the next time we export our Org file to HTML. The IDs are inserted when we save the file, and only for headings which don't already have a CUSTOM_ID property. Woohoo!

Except…

These headers are not meaningful

OK, alright, that's still a huge step forward: we don't have to type any CUSTOM_ID property manually anymore, it's done automatically for us. But when I send someone a link like https://langue.phundrak.com/eittland#h-76fc0b91-e41c-42ad-8652-bba029632333, the first reaction to this URL is often something along the lines of “What the fuck?”. And they're right, the anchor in this URL is unreadable. How is anyone supposed to guess it links to the description of the vowels of the Eittlandic language? (That's a constructed language I'm working on, you won't find anything about it outside my website.)

So I went back to my Emacs configuration file and, through some trial and error, finally found a way to get a consistent custom ID which is readable and automatically set. With the current state of my code, what you get is the complete path of the Org heading, with all spaces replaced by underscores and headings separated by dashes, followed by a final unique identifier taken from an Emacs-generated UUID. Now, the same link as above will look like https://langue.phundrak.com/eittland#Aperçu_structurel-Inventaire_phonétique_et_orthographe-Voyelles_pures-84f05c2c. It won't be more readable to you if you don't speak French, but you can guess it is way better than what we had before. I even added a safety net by removing all forward slashes. The final ID is there to ensure the anchor stays unique in case we'd have two identical paths in the Org file for one reason or another.

The modifications I made to the first function, eos/org-id-new, are minimal: I just split the UUID on its dashes and keep only its first part. This is basically a way to shorten it.

(defun eos/org-id-new (&optional prefix)
  "Create a new globally unique ID.

An ID consists of two parts separated by a dash:
- a prefix
- a   unique   part   that   will   be   created   according   to
  `org-id-method'.

PREFIX  can specify  the  prefix,  the default  is  given by  the
variable  `org-id-prefix'.  However,  if  PREFIX  is  the  symbol
`none', don't  use any  prefix even if  `org-id-prefix' specifies
one.

So a typical ID could look like \"Org-4nd91V40HI\"."
  (let* ((prefix (if (eq prefix 'none)
                     ""
                   (concat (or prefix org-id-prefix)
                           "-")))
         unique)
    (if (equal prefix "-")
        (setq prefix ""))
    (cond
     ((memq org-id-method
            '(uuidgen uuid))
      (setq unique (org-trim (shell-command-to-string org-id-uuid-program)))
      (unless (org-uuidgen-p unique)
        (setq unique (org-id-uuid))))
     ((eq org-id-method 'org)
      (let* ((etime (org-reverse-string (org-id-time-to-b36)))
             (postfix (if org-id-include-domain
                          (progn
                            (require 'message)
                            (concat "@"
                                    (message-make-fqdn))))))
        (setq unique (concat etime postfix))))
     (t (error "Invalid `org-id-method'")))
    (concat prefix (car (split-string unique "-")))))
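To make the last line of the function concrete, here is a minimal sketch of the shortening step on its own. The helper name my/short-unique is hypothetical (it is not part of the code above); the sample UUID is the one from the unreadable link earlier in this post.

```elisp
;; Hypothetical helper, only here to illustrate the shortening step.
(defun my/short-unique (unique)
  "Return the part of UNIQUE that comes before its first dash."
  (car (split-string unique "-")))

;; A UUID is cut down to its first group of eight hex digits.
(my/short-unique "76fc0b91-e41c-42ad-8652-bba029632333")
;; → "76fc0b91"

;; A string without any dash is returned unchanged.
(my/short-unique "4nd91V40HI")
;; → "4nd91V40HI"
```

Eight hexadecimal digits are far fewer than a full UUID, but the heading path in front of it already does most of the work of telling anchors apart.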

Next comes the actual generation of the custom ID. As you can see, the let has been replaced by a let*, which lets me build the ID from the variables orgpath and heading. The former joins the titles of the parent headings with dashes; the latter appends the title of the current heading to orgpath, separated by a dash if orgpath is not empty. The result is then turned into a slug: every run of whitespace is replaced by an underscore, and forward slashes, tildes, and square brackets are deleted. Finally, heading is passed as the prefix to the function described above, which appends the short unique ID to it.

(defun eos/org-custom-id-get (&optional pom create prefix)
  "Get the CUSTOM_ID property of the entry at point-or-marker POM.

If POM is nil, refer to the entry at point. If the entry does not
have a CUSTOM_ID, the function returns nil. However, when CREATE
is non-nil, create a CUSTOM_ID if none is present already. PREFIX
will  be passed  through to  `eos/org-id-new'. In  any case,  the
CUSTOM_ID of the entry is returned."
  (interactive)
  (org-with-point-at pom
    (let* ((orgpath (mapconcat #'identity (org-get-outline-path) "-"))
           (heading (replace-regexp-in-string
                     "/\\|~\\|\\[\\|\\]" ""
                     (replace-regexp-in-string
                      "[[:space:]]+" "_"
                      (if (string= orgpath "")
                          (org-get-heading t t t t)
                        (concat orgpath "-" (org-get-heading t t t t))))))
           (id (org-entry-get nil "CUSTOM_ID")))
      (cond
       ((and id
             (stringp id)
             (string-match "\\S-" id)) id)
       (create (setq id (eos/org-id-new (concat prefix heading)))
               (org-entry-put pom "CUSTOM_ID" id)
               (org-id-add-location id
                                    (buffer-file-name (buffer-base-buffer)))
               id)))))
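To see what the let* above produces without opening an Org buffer, here is a standalone sketch of the same slug logic. The function my/heading-slug is hypothetical: it takes the outline path as a plain list of strings instead of calling org-get-outline-path, and the French heading titles are my guess at the path behind the example link, reconstructed from its anchor.

```elisp
;; Hypothetical standalone version of the slug logic, for illustration only.
(defun my/heading-slug (outline-path heading)
  "Build a slug from OUTLINE-PATH (a list of parent titles) and HEADING."
  (let* ((orgpath (mapconcat #'identity outline-path "-"))
         (full (if (string= orgpath "")
                   heading
                 (concat orgpath "-" heading))))
    ;; First turn each run of whitespace into "_", then delete / ~ [ ].
    (replace-regexp-in-string
     "/\\|~\\|\\[\\|\\]" ""
     (replace-regexp-in-string "[[:space:]]+" "_" full))))

(my/heading-slug '("Aperçu structurel" "Inventaire phonétique et orthographe")
                 "Voyelles pures")
;; → "Aperçu_structurel-Inventaire_phonétique_et_orthographe-Voyelles_pures"

;; A top-level heading has an empty path, so no leading dash is added.
(my/heading-slug nil "A/B [draft]")
;; → "AB_draft"
```

The real function then hands this slug to eos/org-id-new as the prefix, which is where the trailing short ID comes from.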

The rest of the code is unchanged, but here it is anyway:

(defun eos/org-add-ids-to-headlines-in-file ()
  "Add CUSTOM_ID properties to all headlines in the current file
which do not already have one.

Only adds ids if the `auto-id' option is set to `t' in the file
somewhere. ie, #+OPTIONS: auto-id:t"
  (interactive)
  (save-excursion
    (widen)
    (goto-char (point-min))
    (when (re-search-forward "^#\\+OPTIONS:.*auto-id:t"
                             (point-max)
                             t)
      (org-map-entries (lambda ()
                         (eos/org-custom-id-get (point)
                                                'create))))))

(add-hook 'org-mode-hook
          (lambda ()
            (add-hook 'before-save-hook
                      (lambda ()
                        (when (and (eq major-mode 'org-mode)
                                   (eq buffer-read-only nil))
                          (eos/org-add-ids-to-headlines-in-file))))))
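The hook above registers its lambda on the global before-save-hook, which is why it has to check the major mode itself. A possible alternative, sketched here and not what I actually run, is to make the hook buffer-local through the LOCAL argument of add-hook. The two my/ function names are hypothetical, and the sketch assumes eos/org-add-ids-to-headlines-in-file from above is already defined.

```elisp
;; Hypothetical names; assumes `eos/org-add-ids-to-headlines-in-file'
;; (defined above) is loaded by the time a buffer gets saved.
(defun my/org-auto-id-before-save ()
  "Add missing CUSTOM_IDs unless the buffer is read-only."
  (unless buffer-read-only
    (eos/org-add-ids-to-headlines-in-file)))

(defun my/org-auto-id-enable ()
  "Run the ID pass before each save, in the current buffer only."
  ;; The fourth argument (LOCAL) makes the hook buffer-local, so
  ;; buffers in other major modes never run it.
  (add-hook 'before-save-hook #'my/org-auto-id-before-save nil t))

(add-hook 'org-mode-hook #'my/org-auto-id-enable)
```

Since the hook only exists in buffers where org-mode-hook ran, the major-mode test is no longer needed.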

Note that you will need the package org-id to make this code work. You simply need to add the following code before the code I shared above:

(require 'org-id)
(setq org-id-link-to-org-use-id 'create-if-interactive-and-no-custom-id)

And that's how my links are now way more readable and persistent! The only downside I found is that when you move a heading and its path changes, or when you rename the heading itself, the custom ID is not automatically updated. I could fix that by regenerating the custom ID on each save, regardless of whether one already exists, but that would risk overwriting an ID that was set manually.
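A possible middle ground for stale IDs, sketched here as an untested idea and not as part of my configuration, is a command that regenerates the ID of a single heading on demand. The name my/org-custom-id-refresh is hypothetical; the sketch relies on org-entry-delete from Org and on eos/org-custom-id-get from above, and expects point to be on the heading in question.

```elisp
(require 'org)

;; Hypothetical command; assumes `eos/org-custom-id-get' (defined
;; above) is loaded.  Call it with point on the heading to refresh.
(defun my/org-custom-id-refresh ()
  "Drop the CUSTOM_ID of the heading at point and generate a new one."
  (interactive)
  ;; Remove the old property so the getter sees no existing ID...
  (org-entry-delete nil "CUSTOM_ID")
  ;; ...then let it build a fresh one from the current outline path.
  (eos/org-custom-id-get (point) 'create))
```

Because it only runs when called explicitly, manually set IDs elsewhere in the file are left alone. Any link already shared with the old anchor will of course stop working.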