Semantic Formats, Presentation Formats and HTML
This post is inspired by Terry Crowley. You should read all his posts.
There are two kinds of formats: semantic and presentation. Let’s look at these with an example, an order placed on Amazon:
A presentation format says where to display what: display the order name with a certain font size and position, and the order date on top with a grey background and with ORDER PLACED in all caps.
A semantic format describes information at a logical level without regard to how it might be displayed. For example:
{
"item_name": "Sancha Tea Biotique English Breakfast Tea",
"order_date": "2022-02-17",
"total_price": 700,
"ship_to": "Kartick Vaddadi",
"item_count": 2
}
This can be rendered in different ways: visually, with the item count shown as a (2) overlay on top of the product photo, or as audio if you ask Alexa about your order.
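To make the "one semantic record, many renderings" idea concrete, here is a minimal Python sketch. The functions `render_badge` and `render_speech` are hypothetical names invented for illustration; they are not how Amazon or Alexa actually work.

```python
# The order record from above, as a Python dict.
order = {
    "item_name": "Sancha Tea Biotique English Breakfast Tea",
    "order_date": "2022-02-17",
    "total_price": 700,
    "ship_to": "Kartick Vaddadi",
    "item_count": 2,
}


def render_badge(order: dict) -> str:
    """Visual rendering (hypothetical): text for a count overlay on the photo."""
    return f"({order['item_count']})"


def render_speech(order: dict) -> str:
    """Audio rendering (hypothetical): a sentence a voice assistant could read."""
    return (
        f"Your order of {order['item_count']} {order['item_name']} "
        f"was placed on {order['order_date']}."
    )


print(render_badge(order))
print(render_speech(order))
```

Neither renderer needs to know about the other, because both read from the same semantic record.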
Presentation formats are consumed by humans, and semantic formats are consumed by programs. Data is typically converted from one semantic format to another multiple times before finally being rendered to a presentation format:
Semantic → Semantic → Semantic → Presentation
In this case, the seller might upload an XML file describing the items they’re selling to Amazon. That data might then be stored in a SQL database, served from there as JSON, and finally rendered for display as HTML:
XML → SQL → JSON → HTML
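The chain above can be sketched end to end in a few lines of Python using only the standard library. Everything specific here is an assumption made for illustration: the XML element names, the `items` table, the per-item price of 350, and the function names. A real pipeline would look quite different, but the shape is the same: three semantic hops, then one presentation step.

```python
import html
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical seller feed; the element names are made up for this sketch.
SELLER_XML = """
<items>
  <item>
    <name>Sancha Tea Biotique English Breakfast Tea</name>
    <price>350</price>
  </item>
</items>
""".strip()


def xml_to_sql(xml_text: str, conn: sqlite3.Connection) -> None:
    """Semantic -> semantic: load the seller's XML into a SQL table."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price INTEGER)")
    for item in ET.fromstring(xml_text).iter("item"):
        conn.execute(
            "INSERT INTO items (name, price) VALUES (?, ?)",
            (item.findtext("name"), int(item.findtext("price"))),
        )


def sql_to_json(conn: sqlite3.Connection) -> str:
    """Semantic -> semantic: serve the stored rows as JSON."""
    rows = conn.execute("SELECT name, price FROM items ORDER BY rowid").fetchall()
    return json.dumps([{"item_name": name, "price": price} for name, price in rows])


def json_to_html(payload: str) -> str:
    """Semantic -> presentation: render the JSON as an HTML list."""
    items = "".join(
        f"<li>{html.escape(item['item_name'])}: {item['price']}</li>"
        for item in json.loads(payload)
    )
    return f"<ul>{items}</ul>"


conn = sqlite3.connect(":memory:")
xml_to_sql(SELLER_XML, conn)
payload = sql_to_json(conn)
page = json_to_html(payload)
print(page)
```

Only the last function knows anything about display; the first two move the same facts between semantic formats.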
(When I say HTML, I mean HTML along with CSS.)
HTML began life as a format for documents, but has now become a rendering surface for applications, similar to iOS views like labels and buttons. As a result of this messy history, HTML has both semantic and presentational elements. For example, <div> is presentational, with no logical meaning. HTML also offers <section>, which renders exactly as a div, but is semantic. If you’re writing an article about the Ukraine conflict, structured into three parts — history, current events, and a recommendation for how to ease tensions — you might use a <section> for each of these parts of your article.
Some people insist on using semantic HTML.
But now that HTML has become a presentation format, it’s not critical for HTML to be semantic. If Amazon wants to analyse how much tea consumers are purchasing per month, they’re not going to use HTML as the input to this analysis program. They’re going to use a semantic format like JSON. That waters down the benefit of semantic HTML. Since HTML is rendered from JSON or another semantic format, there are diminishing returns in HTML being semantic, too.
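As a rough illustration of why the analysis would start from JSON, here is a small Python sketch that totals tea items per month. The order records and the function name `tea_items_per_month` are invented for this example, and matching on the word "tea" in the item name is a deliberately crude assumption.

```python
import json
from collections import Counter

# Hypothetical order records; only the first one comes from the example above.
ORDERS_JSON = """
[
  {"item_name": "Sancha Tea Biotique English Breakfast Tea",
   "order_date": "2022-02-17", "item_count": 2},
  {"item_name": "Green Tea", "order_date": "2022-02-25", "item_count": 1},
  {"item_name": "Green Tea", "order_date": "2022-03-03", "item_count": 3},
  {"item_name": "Coffee Beans", "order_date": "2022-03-04", "item_count": 5}
]
"""


def tea_items_per_month(orders_json: str) -> dict:
    """Sum item_count per YYYY-MM for orders whose name mentions tea."""
    totals = Counter()
    for order in json.loads(orders_json):
        if "tea" in order["item_name"].lower():
            month = order["order_date"][:7]  # "2022-02-17" -> "2022-02"
            totals[month] += order["item_count"]
    return dict(totals)


print(tea_items_per_month(ORDERS_JSON))
```

The fields the analysis needs are already named in the JSON. Getting the same numbers out of rendered HTML would mean scraping them back out of markup designed for display.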
In some cases, semantic HTML produces meaningless contortions. For example, HTML offers a <b> and a <strong> tag that both make text bold. <strong> has been defined as indicating that “its contents have strong importance, seriousness, or urgency.” This is a circular definition: if <strong> indicates strong importance, what does strong importance mean? That’s like saying a great work of art has positive esthetic qualities. It doesn’t actually explain anything. You wouldn’t expect it to have negative esthetic qualities, would you? The next sentence gives the game away: “Browsers typically render the contents in bold type.” So it’s just a veneer over a presentation tag. There’s some more self-contradictory explanation: “The <strong> element is for content that is of greater importance, while the <b> element is used to draw attention to text without indicating that it's more important.” But why would you draw attention to it if it’s not more important? Instead of going through all these convolutions, the simpler solution is to just realise that there’s nothing wrong with using presentation tags. If you want to make something bold, make it bold.
I’m not saying that you shouldn’t use semantic tags at all. If you feel the code is more maintainable that way, sure. Just don’t insist on it as a best practice to be followed always. An analogy is the factory pattern: a factory helps in some cases, and you should use it then, not everywhere and uncritically. Code that doesn’t use a factory is not necessarily bad, and similarly non-semantic HTML is not necessarily bad.
Considering that most HTML is auto-generated from JSON, worrying about its maintainability is like worrying that the internals of an exe file are messy. Who cares? Nobody is going to edit it. Generated HTML is thrown away when the user closes the tab, so it doesn’t need to be maintainable. The maintainability of the code that generates the HTML matters more. If you have Django code that generates HTML, focus on the maintainability of the Django code.
Non-semantic HTML often has a better ROI. One of my team members used <center> in one project because he found that achieving the same result in CSS took him two days. His conclusion was that best practices do not always give the best ROI. Code need not be maximally separated, abstracted, modular, extensible, reusable, testable or <insert other buzzword here>. Code that chases all of these is over-engineered. Use your judgment.