An intro to HTML

HTML stands for HyperText Markup Language.

HTML is the standard markup language/format for creating web pages, containing the content and structure of a page as a series of tags/elements.

In our ongoing analogy, HTML is the skeleton of the web. At its most basic it is a text file, in a folder on a computer, with a .html extension.

As we heard in our first class, this format was first described by our pal Tim Berners-Lee in 1991. It builds on SGML, an earlier and similar markup standard that predates his work. There have been five major revisions to the spec since then, which added (and sometimes deprecated or removed) tags and syntax.

The basic document

HTML consists of a range of elements, nested inside one another, like a matryoshka doll of text.

As code:

<!DOCTYPE html>
<html>
  <head>
    <title>Page title</title>
  </head>
  <body>
    <h1>This is a heading</h1>
    <p>This is a paragraph.</p>
    <p>This is another paragraph.</p>
  </body>
</html>

The <html> element contains all the elements of the page, the <head> element contains the <title>, and the <body> element contains the <h1> and the two <p> elements.

We call these semantic elements, which is to say that they give their contents a meaning or a role. (Remember Tim’s diagram.) These roles are then interpreted by your browser (Chrome, Safari, Firefox, etc.) when it loads the file, to ultimately display the page. We call this parsing the document.

The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

Tim Berners-Lee

In our example, here is what we’ve told the computer:
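
Spelled out as comments on the same document, that reads roughly as follows. The wording of each comment is only one informal way of putting it, not official terminology:

```html
<!DOCTYPE html>                        <!-- this file is an HTML document -->
<html>                                 <!-- everything in here is the page -->
  <head>                               <!-- information about the page, not shown in it -->
    <title>Page title</title>          <!-- the name of the page, used for the browser tab -->
  </head>
  <body>                               <!-- the visible content lives in here -->
    <h1>This is a heading</h1>         <!-- the top-level heading -->
    <p>This is a paragraph.</p>        <!-- a paragraph of text -->
    <p>This is another paragraph.</p>  <!-- a second, separate paragraph -->
  </body>
</html>
```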

We use semantic elements to help structure and describe our content, but also for accessibility: screen readers use the tag type to work out what things are.

What are elements?

Elements are composed of tags (opening, closing) and their content:
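
As a small sketch, here is the paragraph from our basic document taken apart. The comment is only there as a label:

```html
<!-- <p> is the opening tag, </p> is the closing tag,
     and the text between them is the content. -->
<p>This is a paragraph.</p>
```

The opening tag, the content, and the closing tag together make up one element.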

Some elements do not have any content or children, like <br> or <img>. These are called empty elements and do not have a closing tag.
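
For instance, something like the following, where the file name dog.jpg and the alt text are made up for illustration:

```html
<p>
  First line<br>
  Second line
</p>

<!-- An image has no content of its own, so there is no </img> -->
<img src="dog.jpg" alt="A poodle sitting on a lawn">
```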

Common elements

There are many, many HTML elements, all with particular uses. (We’ll unpack some more, below.)
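
A short and far from complete sampler of ones you will probably meet early on. The URL and file name here are placeholders:

```html
<h1>Main heading</h1>
<h2>Subheading</h2>
<p>A paragraph with <em>emphasized</em> and <strong>important</strong> text.</p>
<a href="https://example.com">A link</a>
<img src="photo.jpg" alt="Description of the image">
<div>A generic container, with no meaning of its own</div>
```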

Attributes

All HTML elements can have attributes, which provide more information about the element:
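
A minimal sketch of one attribute on a link, using a placeholder URL:

```html
<!-- href is the attribute name; the quoted URL is its value -->
<a href="https://example.com">Visit the example site</a>
```

Attributes always go inside the opening tag, usually written as name="value".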

Common attributes
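
A few that show up constantly, sketched with made-up values (the id, class, and file names are only examples):

```html
<!-- id: a unique name for one element on the page -->
<h1 id="page-title">Dog Breeds</h1>

<!-- class: a reusable label that many elements can share -->
<p class="intro">There are many kinds of dog breeds.</p>

<!-- src and alt: the image file and its text alternative -->
<img src="poodle.jpg" alt="A white poodle">

<!-- lang: the language of the element's content -->
<p lang="fr">Bonjour</p>
```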

Case, whitespace, tabs, line breaks

HTML doesn’t care about the capitalization of tag names, extra whitespace, or line breaks. The browser will just read everything from left to right, as if it is one long, running sentence, collapsing any run of spaces and line breaks into a single space. So the shouty <HTML> and quieter <html> are interpreted the same.

The browser displays both of these in exactly the same way:

<body>
  <h1>Dog Breeds</h1>
  <p>There are many kinds of dog breeds</p>
  <ul>
    <li>German Shepherd</li>
    <li>Bulldog</li>
    <li>Poodle</li>
  </ul>
</body>

<body><h1>Dog Breeds</h1><p>There are many kinds of
dog breeds</p><ul><li>German Shepherd</li>
<li>Bulldog</li><li>Poodle</li></ul></body>

But obviously, the first one is much more readable to us humans. We can use whitespace, tabs/indenting, and line breaks to make the code easier for us to read. There are a lot of common patterns, like indenting to indicate hierarchy/nesting, but there are also no wrong ways to do it! In HTML, spaces are code ergonomics for you: like a good chair or desk, they allow you to work more comfortably.

Code is read more often than it is written. Code should always be written in a way that promotes readability.

Guido van Rossum

Block elements

Block-level elements always start on a new line, and take up the full width available—stretching out to the left and right of their parent/container. They stack on top of each other. Importantly, block elements can have a top and bottom margin, unlike inline elements:

<address> <article> <aside> <blockquote> <canvas> <dd> <div> <dl> <dt> <fieldset> <figcaption> <figure> <footer> <form> <h1>–<h6> <header> <hr> <li> <main> <nav> <noscript> <ol> <p> <pre> <section> <table> <tfoot> <ul>
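
A quick sketch of the stacking behavior. The three elements are deliberately written on one line here, but each one should still start on its own line in the browser:

```html
<h1>Dog Breeds</h1><p>This paragraph starts on a new line.</p><p>So does this one.</p>
```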

Inline elements

Inline elements do not start on a new line, and only take up as much width as necessary. I like to think of these as the little metal slugs from printing. Other text and inline elements will continue to flow around them, and they can wrap to new lines:

<em> <strong> <span> <a> <img>
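
A small sketch of inline elements sitting inside a block-level paragraph (the URL is a placeholder):

```html
<p>
  The <strong>Poodle</strong> is <em>very</em> clever, according to
  <a href="https://example.com/poodles">this page</a>.
</p>
```

The bold, italic, and linked words stay in the flow of the sentence rather than starting new lines.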

So many elements!

Comments

You can comment part of the code and the browser won’t show it. Comments are often used to explain your thinking, organize your code, “turn off” a bit of code, or hide whatever you’d like.
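
In HTML, a comment starts with <!-- and ends with -->. A minimal sketch:

```html
<!-- This is a comment. The browser will not display it. -->
<p>This paragraph is displayed.</p>

<!-- <p>This paragraph is "turned off" for now.</p> -->
```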

Keep in mind that comments are still readable by anyone who views the page source.

I highly recommend getting into a habit of commenting your code, especially when starting out. If you figure something tricky out, write down why and how you solved it to help you understand and remember. And you’ll often come back to things. Commenting your code is a gift to your future self!

Tables

Tables aren’t used as often anymore. You used to have to use them to get any kind of multi-column or grid layout, but that job has moved to <div> and other layout elements (which need even more CSS!). Tables are still the right element for genuinely tabular data.
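
A small table might look something like this, with placeholder data made up for illustration:

```html
<table>
  <thead>
    <tr><th>Breed</th><th>Size</th></tr>
  </thead>
  <tbody>
    <tr><td>Poodle</td><td>Medium</td></tr>
    <tr><td>Bulldog</td><td>Medium</td></tr>
  </tbody>
</table>
```

Here <tr> is a row, <th> is a header cell, and <td> is a regular data cell.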

This syntax is pretty verbose, for what you get.

Lists

Any time you have more than two of something, you probably have a list. Lists are also commonly used for semantic navigation elements: think “here’s a list of links in this site.”
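
A sketch of the two basic kinds, unordered (<ul>) and ordered (<ol>), plus a list of links wrapped in a <nav>. The link targets are placeholders:

```html
<ul>
  <li>German Shepherd</li>
  <li>Bulldog</li>
  <li>Poodle</li>
</ul>

<ol>
  <li>Open the file</li>
  <li>Edit the heading</li>
  <li>Save and reload</li>
</ol>

<nav>
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/about">About</a></li>
  </ul>
</nav>
```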

Description lists

There are specific lists for defining things.
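
A minimal sketch, where <dt> is the term and <dd> is its description:

```html
<dl>
  <dt>HTML</dt>
  <dd>HyperText Markup Language, the structure of a page.</dd>
  <dt>CSS</dt>
  <dd>Cascading Style Sheets, the appearance of a page.</dd>
</dl>
```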

These aren’t much to look at without CSS, though. Soon!

Details/summary

There is even some basic interactivity (way, way ahead of JavaScript) with details disclosure elements that open and close.
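
A minimal sketch, with made-up text. The <summary> is the part that is always visible, and clicking it toggles the rest:

```html
<details>
  <summary>Which breeds are on the list?</summary>
  <p>German Shepherd, Bulldog, and Poodle.</p>
</details>
```

Adding the open attribute to <details> should make it start out expanded.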

You can do a lot with these.

Again, there are many, many, many, many HTML elements. Try to find the one that best fits your usage, using a semantic element that fits your content wherever possible.

User-agent styles

We haven’t applied any styles/CSS here yet, so everything we see in these examples is based on user-agent stylesheets—that is, each browser’s own default display (and behavior) for an element type. This is what the web was, before CSS! But as a designer, rarely what you want. We’ll get into writing our own styles in the coming weeks.

3.2. Priority of Constituencies

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors; which in turn should be given more weight than costs to implementors; which should be given more weight than costs to authors of the spec itself, which should be given more weight than those proposing changes for theoretical reasons alone. Of course, it is preferred to make things better for multiple constituencies at once.

W3C, HTML Design Principles