the gizmo's role in markup

This post adds tagged markup to my set of gizmos (labels with automatic behavior attached--like functions and CSS classes) and then tries to apply my intuitions about gizmo-naming from the last post.

The math is easy because it just makes something from the last post explicit. The three lines below are roughly equivalent to one another:

<span class="date posted">Sunday, Jun 5 2022</span>
<date class="posted">Sunday, Jun 5 2022</date>
<datePosted>Sunday, Jun 5 2022</datePosted>

By equivalent, I mean we can interpret them similarly and attach the same automatic behavior.

Why does this matter? It suggests two (unintuitive?) things about markup:

1. Since we can attach automatic behavior to marked-up free-form text and data, they aren't inert.
2. We can think of each distinct tag{class,name} intersection as a ~function. (To simplify, I'll use "tag-label" to refer to this intersection and set advanced CSS/xpath selectors aside.)
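
To make the "tag-label as ~function" idea concrete, here is a minimal sketch in Python using the standard library's html.parser. Everything named here (tag_label, mark_date_posted, BEHAVIORS, GizmoParser, run) is hypothetical and exists only for illustration. It assumes that "span" is a neutral carrier and that tag and class words can simply be joined in order, which is a simplification of how real selectors work.

```python
from html.parser import HTMLParser


def tag_label(tag, attrs):
    """Collapse a tag and its classes into one normalized tag-label."""
    # Assumption: "span" carries no meaning of its own, so only its classes count.
    words = [] if tag == "span" else [tag]
    words += (dict(attrs).get("class") or "").split()
    # html.parser lowercases tag names, so compare on a lowercased, joined form.
    return "".join(words).lower()


def mark_date_posted(text):
    """The automatic behavior attached to the (hypothetical) tag-label."""
    return f"[posted: {text}]"


# Each tag-label maps to a function, much like a name bound to a callable.
BEHAVIORS = {"dateposted": mark_date_posted}


class GizmoParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # behavior (or None) for each currently open element
        self.results = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(BEHAVIORS.get(tag_label(tag, attrs)))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Apply the innermost open element's behavior, if it has one.
        if self.stack and self.stack[-1]:
            self.results.append(self.stack[-1](data))


def run(markup):
    parser = GizmoParser()
    parser.feed(markup)
    parser.close()
    return parser.results
```

Under these assumptions, feeding any of the three example lines to run() triggers the same function with the same text, which is the sense in which they are roughly equivalent.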

Applying these intuitions to markup

It also suggests that the last post's intuitions about gizmos might apply to markup, so let's try them on again:

1. When you re-use a tag-label, weigh whether you're re-using it for the same reason. If the reasons differ, use a new name for this new reason.
2. It may be a good idea to habitually name tag-labels with a dash of why. As the fraction of tag-labels that encode why increases, it'll get easier (less mental work/overhead) to spot diverging purposes per #1.
3. It's probably easier to write lower-churn markup in languages where you can cheaply alias multiple names to a single implementation--and you should use this mechanism to accomplish #1.
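
As a rough illustration of the aliasing point, the sketch below binds two tag-labels to one implementation and later lets one of them diverge. The labels ("date-posted", "date-updated") and functions (show_date, show_updated, render) are made up for this example, not taken from any real system.

```python
def show_date(text):
    """Shared implementation for every date-like tag-label."""
    return f"Date: {text}"


def show_updated(text):
    """A later, more specific behavior for one of the reasons."""
    return f"Updated: {text}"


# Two tag-labels, two reasons, one implementation (for now).
behaviors = {
    "date-posted": show_date,
    "date-updated": show_date,
}


def render(label, text):
    return behaviors[label](text)


before = render("date-updated", "Jun 6 2022")

# The reasons diverge: rebind one alias. Markup that already uses
# "date-updated" does not need to change.
behaviors["date-updated"] = show_updated

after = render("date-updated", "Jun 6 2022")
```

Because each reason had its own name from the start, the change is a one-line rebinding rather than a hunt through existing markup for the uses that meant "updated".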

I find these aspirations harder to apply to markup. I think it's because markup is more likely to have multiple consumers with different needs--like an API. An API and its consumers are often designed by different people at different times. It's easier to weigh whether multiple sites call an internal function for the same reason than to think through whether a given name's dash-of-why will square with how it's used later by someone else.

This lens is helping me sort out some things that've bugged me about HTML for a long time.

Focusing on HTML

The HTML spec isn't trying to define a language for annotating what our documents mean; it's trying to brute-force the API-design problem I just described by defining a set of names and prescribing the reasons for using them. At least in theory, this gets the writers and consumers of markup on the same page.

In reality, the relationship between these names and the prescribed reasons for using them can be unintuitive to people who write markup. This is because the names and reasons don't reflect the perspective of the writer--they reflect the reasons and perspectives of this API's main consumers: browsers, screen readers, search engines, and so on.

Note: If you feel like I'm avoiding a specific 8-letter S-word, you're right. Stay tuned for a post dedicated to this word... :)

HTML's syntax is simple, but conforming with the reasons prescribed in the spec entails something fairly complex: learning, taking, and reasoning from the abstract perspective of software like browsers and search engines that need to ~understand the markup. (And it's complex without even considering needs internal to the organization producing the markup.)

This sounds like fairly advanced computational thinking that won't be easy to offload without affecting accuracy (as I suspect often happens, both within organizations and systemically). I imagine software that depends on the precise use of these tags is either very inconsistent or frequently falls back on fuzzy heuristics.
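
A purely hypothetical sketch of what such a fallback might look like: a consumer that prefers HTML's time element with its datetime attribute, and otherwise guesses from the text. The names (DateFinder, DATE_GUESS, find_date) and the date pattern are assumptions made for illustration; this is not a description of how any real browser, screen reader, or search engine works.

```python
import re
from html.parser import HTMLParser


class DateFinder(HTMLParser):
    """Collect a machine-readable date if present, plus all visible text."""

    def __init__(self):
        super().__init__()
        self.precise = None
        self.text = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("datetime")
        # Keep the first <time datetime="..."> value we see.
        if tag == "time" and value and self.precise is None:
            self.precise = value

    def handle_data(self, data):
        self.text.append(data)


# Hypothetical heuristic: anything shaped like "Jun 5 2022" or "Jun 5, 2022".
DATE_GUESS = re.compile(r"\b[A-Z][a-z]{2} \d{1,2},? \d{4}\b")


def find_date(markup):
    finder = DateFinder()
    finder.feed(markup)
    finder.close()
    if finder.precise:
        return ("precise", finder.precise)
    # The writer didn't use the prescribed name, so fall back to guessing.
    guess = DATE_GUESS.search(" ".join(finder.text))
    return ("guess", guess.group(0)) if guess else ("unknown", None)
```

In this sketch the consumer only gets a reliable answer when the writer used the name the way the spec prescribes. Otherwise its answer is only as good as its pattern.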

(It'd be interesting to hear from people who work in accessibility or search, or who study the use of these elements in the wild.)

Zooming out

Because HTML obliges us to take on a set of names and reasons that don't reflect our own perspective, there's a potential for conflict with the naming practice that this series exists to encourage. Some identifiers that make sense won't be available. Sometimes the spec will oblige you to use a name that appears to mean something else in your broader ~namescape, and the mismatch will be hard to notice without a firm grasp of the perspectives the spec embodies. This tension won't cause much trouble for some projects, and in others it'll be a regular source of confusion.

HTML is an essential part of the web--but it could also be adding friction to your workflows.

In the terms of this series, it's akin to inheritance. With HTML, you aren't really in control of the namescape. You're inheriting and extending someone else's class (which may or may not square with the names, concepts, reasons, and perspectives that govern your software).

Note: Markup languages with clean-slate namespaces are a better fit for the naming practices I'm suggesting. They obviously entail extra overhead and friction of their own (like working in another language, not being able to lean on an existing talent pool, having to make more naming decisions, needing a translator if your target is still HTML, and so on). I'm not recommending one for now--just picking at knots.

Up to now, this series has been focused on building up the concept of the gizmo--a superset of entities that are roughly automatic behavior attached to a free-form label. The next (and hopefully last?) post in this series will turn that framing on its side and consider automatic behavior without a label--the anonymous gizmo.

This is post 3 of 4 in a series on naming things