Debugging wrong HTML fragment being served
Related to [notes / todo#wrong-html-fragment-being-served-for-imaginary-numbers-link]
This might involve the template I’m using to render links. There are some details about it here: notes / Link render hooks.
The issue
The htmx-powered link to Real and complex numbers in space on notes / Introduction to imaginary numbers is displaying the fragment for notes / Introduction to complex numbers.
Run a local deploy
I don’t think the issue is stale data, but to be sure, run the local deploy script.
Check to see if the link’s destination is being found in the link template
This is where the context that’s passed to the link template is assembled:
{{/* Try to get dbId if conditions are met */}}
{{ $dbId := "" }}
{{- if not $rel }}
{{ $fragmentsMap := site.Data.fragments.sections }}
{{ $url := $attrs.href }}
{{ $fragmentData := index $fragmentsMap $url }}
{{ $dbId = $fragmentData.db_id }}
{{ if eq .Destination "/notes/real-and-complex-numbers-in-space/" }}
{{ warnf "Destination: %v" .Destination }}
{{ warnf "Text: %v" .Text }}
{{ warnf "Page.Title: %v" .Page.Title }}
{{ end }}
{{- end }}
The link destination is being found:
WARN Destination: /notes/real-and-complex-numbers-in-space/
WARN Text: Real and complex numbers in
space
WARN Page.Title: Debugging wrong HTML fragment being served
WARN Text: notes / Real and complex numbers in space
WARN Page.Title: Introduction to Imaginary Numbers
Interestingly, I added the link that’s causing the issue to this page too. It’s also pulling in the wrong HTML fragment.
Why am I using Destination and the link’s href value interchangeably?
These should all be the same:
{{ if eq .Destination "/notes/real-and-complex-numbers-in-space/" }}
{{ warnf "Destination: %v" .Destination }}
{{ warnf "$attrs.href: %v" $attrs.href }}
{{ warnf "$url: %v" $url }}
{{ end }}
Messy code, but they’re identical values:
WARN Destination: /notes/real-and-complex-numbers-in-space/
WARN $attrs.href: /notes/real-and-complex-numbers-in-space/
WARN $url: /notes/real-and-complex-numbers-in-space/
Get the value for the key from the data JSON file
Use jq to check the value of the key "/notes/real-and-complex-numbers-in-space/":
❯ jq '.["/notes/real-and-complex-numbers-in-space/"]' data/fragments/sections.json
{
"db_id": 699
}
Check what value’s being found for the key in the template
{{ if eq .Destination "/notes/real-and-complex-numbers-in-space/" }}
{{ warnf "Destination: %v" .Destination }}
{{ warnf "$dbId: %v" $dbId }}
{{ end }}
It’s the expected value:
WARN $dbId: 699
Check what’s stored for dbId 699 in the database
It’s looking like the issue isn’t on the Hugo end. In the SQLite database used by the embeddings
generator, the row with id=699, which should be associated with the Real and complex numbers in
space page, is storing the fragment for Introduction to complex numbers.
Hmmm:
embeddings_generator on master [!] via v3.11.13 (.venv)
❯ ipython
In [1]: import sqlite3
In [2]: con = sqlite3.connect("./sqlite/sections.db")
In [3]: cur = con.cursor()
In [4]: cur.execute("SELECT * FROM sections WHERE id = 699")
Out[4]: <sqlite3.Cursor at 0x7ff3216c7cc0>
In [5]: cur.fetchone()
Out[5]:
(699,
'01KCTBZF1BN6WGKKF96BA3KYN0-',
'01KCTBZF1BN6WGKKF96BA3KYN0',
'',
'<h2><a href="/notes/introduction-to-complex-numbers/">Introduction to complex numbers</a></h2>',
'<div class="article-fragment"><p>What is a complex number?</p>\n<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">\n <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/SP-YJe7Vldo?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>\n </div>\n\n</div>',
1768784646.1574106)
The database table:
CREATE TABLE IF NOT EXISTS sections (
    id INTEGER PRIMARY KEY,
    section_id TEXT NOT NULL UNIQUE,
    post_id TEXT NOT NULL,
    section_heading_slug TEXT NOT NULL,
    html_heading TEXT NOT NULL,
    html_fragment TEXT NOT NULL,
    updated_at REAL NOT NULL,
    UNIQUE(post_id, section_heading_slug)
);
The section_id and post_id columns in the row that seems to be causing the issue look a little
strange (identical except for the trailing - symbol), but that’s OK. The - would be followed by the
heading slug if the link was to a section heading rather than to the first section of the post.
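Checking one row by hand worked here, but the same comparison could be run over every entry. The following is a rough sketch, not the code I actually ran. It assumes that sections.json maps each URL to an object with a db_id (as the jq output suggests) and that html_heading contains a link whose href is that same URL (as in row 699). The function name find_mismatched_fragments is made up for illustration.

```python
import sqlite3


def find_mismatched_fragments(
    fragments: dict, con: sqlite3.Connection
) -> list[tuple[str, int, str]]:
    """Return (url, db_id, html_heading) for rows whose heading does not link to url.

    Assumption: each stored html_heading contains href="<url>" for the URL
    that points at it in the fragments data.
    """
    mismatches = []
    for url, entry in fragments.items():
        row = con.execute(
            "SELECT html_heading FROM sections WHERE id = ?", (entry["db_id"],)
        ).fetchone()
        html_heading = row[0] if row else "<missing row>"
        if f'href="{url}"' not in html_heading:
            mismatches.append((url, entry["db_id"], html_heading))
    return mismatches
```

To use it, load the JSON data file with json.load, open the SQLite file with sqlite3.connect, and print whatever the function returns. An empty list would mean every db_id points at a heading for the expected URL.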
The cause of the issue(s)
What actually triggered the issue was my new deploy script and a lack of attention on my part. The script makes a call to:
.venv/bin/python main.py
That script traverses the built Hugo HTML files to extract heading sections that are saved to an SQLite database, and text content that’s used to generate embeddings for Chromadb. If an error happened in the Python code, it would get output to the console, then the script would happily chug along, pushing corrupt data to the Docker container that’s running the API.
This should fix that problem:
#!/usr/bin/env bash
set -euo pipefail # Exit on any error, unset variable, or failed pipeline stage
# Trap errors
trap 'echo "ERROR: Deployment failed at line $LINENO. Remote deployment aborted."; exit 1' ERR
# ...
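The bash fix only helps if the Python step actually exits with a non-zero status when something goes wrong. A small sketch of that behaviour, using a throwaway one-line script as a stand-in for main.py (the stand-in is an assumption, not the real script):

```python
import subprocess
import sys

# Stand-in for main.py: a script that dies with an uncaught exception.
failing = subprocess.run(
    [sys.executable, "-c", "raise RuntimeError('boom')"],
    capture_output=True,
    text=True,
)

# An uncaught exception makes the interpreter exit with status 1,
# which is the signal that `set -e` and the ERR trap react to.
exit_status = failing.returncode
last_line = failing.stderr.strip().splitlines()[-1]
print(exit_status, last_line)
```

The caveat is that this only holds for uncaught exceptions. If the Python code catches an error and just logs it, the exit status stays 0 and the deploy script would carry on as before.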
What was triggering the errors in the Python code?
There were two issues:
The first was happening here:
for child in root.iterchildren():
    if child.tag in heading_tags:
        if current_fragment is not None and has_text(current_fragment):
            current_fragment = fix_relative_links(current_fragment, rel_path)
            html_fragment = serialize(current_fragment, pretty_print=False)
            html_heading = serialize(current_heading_element, pretty_print=False)
            embeddings_text = section_texts(current_fragment, headings_path)
            sections.append(
                {
                    "html_fragment": html_fragment,
                    "html_heading": html_heading,
                    "heading_id": heading_id,
                    "heading_href": heading_href,
                    "headings_path": headings_path,
                    "embeddings_text": embeddings_text,
                }
            )
        current_fragment = etree.Element("div", {"class": "article-fragment"})
        heading_level = get_heading_level(child.tag)
        headings_path = (
            headings_path[:heading_level] + [child.text]
        )  # if child.text is None, an error will be triggered in heading_link function
        current_heading_element, heading_id, heading_href = heading_link(
            child, headings_path, rel_path
        )
It turns out that markdown like this ## $1$ is a number, with the
markup.goldmark.extensions.passthrough.delimiters set to this:
[markup.goldmark.extensions.passthrough.delimiters]
block = [['\[', '\]'], ['$$', '$$']]
inline = [['\(', '\)'], ['$', '$']]
results in the text of the heading element (child.text) being None. So headings_path ends up looking like:
['foo', 'bar', 'baz', None]
The heading_link function then calls:
" > ".join(headings_path)
That doesn’t work:
In [7]: arr
Out[7]: ['foo', 'bar', 'baz', None]
In [8]: " > ".join(arr)
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 " > ".join(arr)
TypeError: sequence item 3: expected str instance, NoneType found
Error handling could be added to the code, but really I just want the code to fail at this point.
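For reference, here is a sketch of why .text can come back as None, plus one way to make the failure more descriptive while still stopping the run. It uses the standard library ElementTree rather than lxml (the two follow the same convention for .text). It also assumes the rendered heading starts with a child element wrapping the math, which I have not confirmed against the built HTML. The helper extend_headings_path is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed shape of the rendered heading: the math sits in a child element,
# so no text precedes the first child.
heading = ET.fromstring("<h2><span>1</span> is a number</h2>")
leading_text = heading.text  # None, because nothing comes before <span>
full_text = "".join(heading.itertext())  # all text, including children and tails


def extend_headings_path(
    headings_path: list[str], level: int, text: Optional[str]
) -> list[str]:
    """Hypothetical helper: fail loudly before None can reach str.join."""
    if text is None:
        raise ValueError(
            f"heading at level {level} has no leading text; "
            f"path so far: {headings_path[:level]}"
        )
    return headings_path[:level] + [text]
```

This keeps the "just fail" behaviour, but the error names the heading path that caused it instead of surfacing later as a TypeError inside join.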
The second issue was related to some overly complex SQL:
def save_to_sqlite(
    self,
    section_id: str,
    post_id: str,
    section_heading_slug: str,
    html_heading: str,
    html_fragment: str,
    updated_at: float,
) -> int:
    cursor = self.con.execute(
        """
        INSERT INTO sections
        (section_id, post_id, section_heading_slug, html_heading, html_fragment, updated_at)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(section_id) DO UPDATE SET
            html_heading = excluded.html_heading,
            html_fragment = excluded.html_fragment,
            updated_at = excluded.updated_at
        RETURNING id
        """,
        (
            section_id,
            post_id,
            section_heading_slug,
            html_heading,
            html_fragment,
            updated_at,
        ),
    )
    return cursor.fetchone()[0]
The table has a unique constraint on section_id. (UNIQUE(post_id, section_heading_slug) is
actually redundant due to the way that section_id is created. So is the call to CREATE UNIQUE INDEX...):
def create_sections_table(self, con: sqlite3.Connection) -> None:
    cur = con.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sections (
            id INTEGER PRIMARY KEY,
            section_id TEXT NOT NULL UNIQUE,
            post_id TEXT NOT NULL,
            section_heading_slug TEXT NOT NULL,
            html_heading TEXT NOT NULL,
            html_fragment TEXT NOT NULL,
            updated_at REAL NOT NULL,
            UNIQUE(post_id, section_heading_slug)
        );
    """)
    cur.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_section_id ON sections(section_id);"
    )
    cur.execute("CREATE INDEX IF NOT EXISTS idx_post_id ON sections(post_id);")
    return None
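The reason the explicit CREATE UNIQUE INDEX is redundant is that SQLite already builds an index for each UNIQUE constraint. A quick sketch to see this, using an in-memory database and a trimmed copy of the table (the HTML and timestamp columns are left out for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sections (
        id INTEGER PRIMARY KEY,
        section_id TEXT NOT NULL UNIQUE,
        post_id TEXT NOT NULL,
        section_heading_slug TEXT NOT NULL,
        UNIQUE(post_id, section_heading_slug)
    )
""")

# The second column of each PRAGMA index_list row is the index name.
index_names = sorted(row[1] for row in con.execute("PRAGMA index_list('sections')"))
print(index_names)
```

No CREATE INDEX statement was run, yet the table already has two automatic indexes, one per UNIQUE constraint.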
The ON CONFLICT clause is (was) causing issues. If there’s a unique key violation, something has
gone wrong and the script should exit. What went wrong today is that I copied a Hugo markdown file
and gave it a new name. This resulted in two posts on the site with identical id values in their
front matter, so both posts produced the same section_id and the upsert silently overwrote one
post’s fragment with the other’s. There are a few ways duplicate ids can happen, as the id is being
added via the default.md file like this:
+++
date = "{{ .Date }}"
id = "{{ .File.UniqueID }}" # usually OK.
draft = true
title = "{{ replace .File.ContentBaseName "-" " " | title }}"
summary = """
"""
tags = []
+++
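Since a copied file keeps its original id, a pre-deploy check for duplicate ids might be worth having. The sketch below is one possible approach, not something that exists in the project. It assumes +++-delimited TOML front matter with the id on its own line as a double-quoted string, and it uses a regular expression instead of a full TOML parser to stay dependency-free. The names front_matter_id and find_duplicate_ids are made up.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Optional

# Front matter block between the opening and closing +++ lines.
FRONT_MATTER = re.compile(r"\A\+\+\+\n(.*?)\n\+\+\+", re.DOTALL)
# Simplification: only matches a line of the form id = "value".
ID_LINE = re.compile(r'^id\s*=\s*"([^"]*)"', re.MULTILINE)


def front_matter_id(path: Path) -> Optional[str]:
    """Return the id from a file's front matter, or None if there is none."""
    match = FRONT_MATTER.match(path.read_text(encoding="utf-8"))
    if match is None:
        return None
    id_match = ID_LINE.search(match.group(1))
    return id_match.group(1) if id_match else None


def find_duplicate_ids(content_dir: Path) -> dict[str, list[Path]]:
    """Map each id used by more than one markdown file to the files using it."""
    by_id = defaultdict(list)
    for path in sorted(content_dir.rglob("*.md")):
        post_id = front_matter_id(path)
        if post_id:
            by_id[post_id].append(path)
    return {post_id: paths for post_id, paths in by_id.items() if len(paths) > 1}
```

Calling find_duplicate_ids on the site's content directory and exiting with a non-zero status when the result is non-empty would stop the deploy before any data is written.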
A (possibly adequate) solution for now is to remove the ON CONFLICT clause from the INSERT
statement and call the INSERT statement in a try block:
try:
    cursor = self.con.execute(
        """
        INSERT INTO sections
        (section_id, post_id, section_heading_slug, html_heading, html_fragment, updated_at)
        VALUES (?, ?, ?, ?, ?, ?)
        RETURNING id
        """,
        (
            section_id,
            post_id,
            section_heading_slug,
            html_heading,
            html_fragment,
            updated_at,
        ),
    )
    return cursor.fetchone()[0]
except sqlite3.IntegrityError:
    # print debugging info
    # ...
    raise  # raise exception to stop execution
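To make the difference between the two statements concrete, here is a small sketch against an in-memory database. The table is a cut-down version of sections, and the section_id and fragment values are placeholders rather than real data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sections ("
    "id INTEGER PRIMARY KEY, "
    "section_id TEXT NOT NULL UNIQUE, "
    "html_fragment TEXT NOT NULL)"
)

UPSERT = """
    INSERT INTO sections (section_id, html_fragment) VALUES (?, ?)
    ON CONFLICT(section_id) DO UPDATE SET html_fragment = excluded.html_fragment
"""
PLAIN_INSERT = "INSERT INTO sections (section_id, html_fragment) VALUES (?, ?)"

# Two different posts sharing one front matter id produce the same section_id.
con.execute(UPSERT, ("DUPLICATE-ID-", "fragment for post A"))
con.execute(UPSERT, ("DUPLICATE-ID-", "fragment for post B"))  # overwrites A, no error
rows_after_upsert = con.execute("SELECT id, html_fragment FROM sections").fetchall()

# Without ON CONFLICT, the same collision raises instead of overwriting.
try:
    con.execute(PLAIN_INSERT, ("DUPLICATE-ID-", "fragment for post C"))
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True
```

With the upsert, the row keeps its id but ends up holding post B's fragment, which matches what I saw for id 699. With the plain INSERT, the collision surfaces as an IntegrityError and the run stops.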