[M1L1] HTML: Structuring Web Pages

HTML (Hypertext Markup Language) is the code that is used to structure a web page and its content. For example, content could be structured within a set of paragraphs, a list of bulleted points, or using images and tables. When you ask your Web browser to display a page like this one by typing in an address or clicking on a link, it first reads through some HTML that it has been sent ‘behind the scenes’, interprets it and then shows you a human-friendly version.

HTML is a markup language that you use to define the structure of your content. We begin with simple unstructured text, and add code to give it meaning. Let’s start with the following two lines:

My cat is very grumpy.
She is always sitting on my keyboard.

This is actually a valid HTML snippet on its own. HTML does not require you to use any markup on text. However, a Web browser would have no idea how we wanted to display these lines, and we have given no clue about their function within the page – is it a title, a list item, an aside? The browser cannot know unless we tell it. Nor would it know whether we want these two sentences laid out separately or together.

To define the structure we need to use elements to enclose, or wrap, different parts of the text to make it appear or act in a certain way. Elements are written using tags, which are delimited by angle brackets <>; for example, <p> is a tag indicating a paragraph. Tags tell the browser to make a word link to somewhere else, italicize text, make the font bigger or smaller, show a picture, and so on.

So, let’s see what happens when we put these lines into an editor. I have embedded the Glitch editor below so that you can use it without leaving this lesson. The editor consists of three panes arranged side by side. On the left is a list of the files in the project (currently there aren’t many). In the middle is the code of the page you are currently editing, and on the right is the result of running the current code. In this case it simulates the result of saving our HTML code to a file and opening it in a web browser:

We can immediately see that the ‘browser’ view on the right is ignoring the line breaks in our text. In fact, if we were to add spaces between the words ‘My’ and ‘cat’, we would find that the browser pane on the right did not change.

All ‘white space’ characters (line breaks, tabs, spaces etc.) are treated the same by a web browser: as a simple space. Runs of multiple spaces, line breaks and tabs are also condensed down into a single space. Such treatment of white space is a typical feature of many coding languages; it allows programmers to use line breaks to separate blocks of code that deal with similar themes, or indentation to make it clear that a line depends on the one above. This makes code easier for humans to read without conveying any meaning to the computer.
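
As a rough illustration of this collapsing behaviour, the sketch below spaces the same two sentences very unevenly. The spacing is an arbitrary choice made for this example. The lines between <!-- and --> are an HTML comment, which the browser does not display. A browser should still show the text as one run of words separated by single spaces, just as it does for the tidy version above.

```html
<!-- Illustrative only: the extra spaces, tabs and line breaks
     below are all collapsed to single spaces by the browser. -->
My      cat is
    very grumpy.


She is    always sitting
        on my keyboard.
```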

First, let’s tell our web browser that these lines are paragraphs by enclosing them in p tags, so that they look like this:

<p>My cat is very grumpy.</p>
<p>She is always sitting on my keyboard.</p>

Give it a go:

In the Glitch editor, click ‘Remix to edit’. Glitch will create a copy of the project that you can edit. By default it will give your project a title consisting of two English words separated by a hyphen. You can see this name at the bottom of the screen. Make a note of it because you may need to find it later.
Place the mouse cursor at the start of the ‘My cat is very grumpy.’ line.
Type <p>
Woah! Something weird probably just happened. The Glitch editor (like many code editors) has tried to be helpful by automatically completing your tag, and adding some ‘padding’ so that it is easier to read. Remember that HTML treats all white space as a single space, so this doesn’t affect display. Unfortunately this ‘helpful’ behaviour isn’t quite what we want. We need to start each line with <p> (the opening tag), and end it with </p> (the corresponding closing tag), so…
Delete the closing </p> tag and spaces, so that <p> is immediately followed by My cat[…]. At the end of the line, add </p>
Repeat these steps on the line below.

As you are editing, the preview pane should automatically update. When you have finished, it should show two separate paragraphs. Congratulations, you have just written your first HTML code!

Anatomy of an HTML element

Before we move on, let’s explore the paragraph element a bit further.

The main parts of this element are as follows:

The opening tag: This consists of the name of the element (in this case, ‘p’), wrapped in opening and closing angle brackets. This states where the element begins or starts to take effect — in this case where the paragraph begins.
The closing tag: This is the same as the opening tag, except that it includes a forward slash before the element name. This states where the element ends — in this case where the paragraph ends. Failing to add a closing tag is a common error for beginners (and indeed for experts!) and can lead to strange results.
The content: This is the content of the element, which in this case is just text.
The element: The opening tag, the closing tag, and the content together comprise the element.

Elements can also have attributes that look like the following:

<p class="editor-note">My cat is very grumpy.</p>

Attributes contain extra information about the element that you don’t want to appear in the actual content. Here, class is the attribute name, and editor-note is the attribute value.

The class attribute allows you to give the element an identifier that can be used later to group elements together and target them with display rules or other functionality.

An attribute should always have the following:

A space between it and the element name (or the previous attribute, if the element already has one or more attributes).
The attribute name, followed by an equals sign.
The attribute value, enclosed by opening and closing quotation marks.
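
The rules above can be sketched with an element that carries more than one attribute. The lang attribute and its value en are assumptions added for this illustration (lang is a standard HTML attribute that declares the language of the content); only the class attribute with the value editor-note comes from this lesson.

```html
<!-- Two attributes on one element: each is preceded by a space,
     and each value is wrapped in quotation marks. -->
<p class="editor-note" lang="en">My cat is very grumpy.</p>
```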

Nesting elements

You can put elements inside other elements — this is called nesting. If we wanted to state that our cat is very grumpy, we could wrap the word “very” in a strong element, which means that the word is to be strongly emphasized:

<p>My cat is <strong>very</strong> grumpy.</p>

Give this a go in the Glitch editor, and have a look at the result. Remember that you might need to correct the editor’s ‘helpful’ auto-completion.

You always need to make sure that your elements are properly nested: in the example above, we opened the p element first, then the strong element; therefore, we have to close the strong element first, then the p element. A fundamental stipulation in the rules of the HTML language is that elements cannot overlap. The following is therefore incorrect:

<p>My cat is <strong>very grumpy.</p></strong>

The elements have to open and close correctly so that they are clearly inside or outside one another. If they overlap as shown above, then your web browser will try to make the best guess at what you were trying to say. Although HTML interpreters are almost universally forgiving, such guesswork can lead to unexpected results. Don’t do it!
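
A slightly deeper example may help to show the pattern. The sketch below assumes an extra em (emphasis) element, which this lesson has not introduced, nested inside the strong element. Each element is closed in the reverse of the order in which it was opened.

```html
<!-- Opened: p, strong, em. Closed: em, strong, p. -->
<p>My cat is <strong>very, <em>very</em></strong> grumpy.</p>
```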

Empty elements

Some elements have no content and are called empty elements. The img element is an example of this. It allows us to display an image (hence im[a]g[e] – img). Here is an example:

<img src="" alt="A cat at work">

This contains two attributes, but there is no closing </img> tag and no inner content. This is because an image element doesn’t wrap content to affect it. Its purpose is to display an image in the HTML page.

The alt attribute provides an alt[ernative] description for visually-impaired users that is read out by assistive technology, or if the image cannot be shown for some reason (e.g. it has been deleted from the server). The src attribute tells the browser where the image file to be displayed is located (its s[ou]rc[e]).

Let’s put this example in our example HTML file. Place the cursor at the end of your two paragraphs, after the final closing </p> tag, and type (or paste) the image tag above. You’ll notice that the preview page is not yet displaying anything. We have told the browser that there is an image to be displayed, but we have not told it which image, so the img tag is ignored. We need to enter the address of an image file in the src attribute. We can use the address of any image on the web. Let’s use an image of a cat from Wikimedia Commons. Copy the line below, and paste it into the img tag’s src attribute, between the double quotation marks:

https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Cat_on_laptop_-_Just_Browsing.jpg/320px-Cat_on_laptop_-_Just_Browsing.jpg

Your preview should update to include the image of the cat. If you can’t see the image, check that your img tag looks exactly like this, including spaces (or lack thereof). Spaces within the src attribute will be treated as part of the filename, and can disrupt display:

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Cat_on_laptop_-_Just_Browsing.jpg/320px-Cat_on_laptop_-_Just_Browsing.jpg" alt="A cat at work">

Marking up text

This section will cover some of the essential HTML elements you’ll use for marking up the text.

Headings

Heading elements allow you to specify that certain parts of your content are headings — or subheadings. In the same way that a book has the main title, chapter titles, and subtitles, an HTML document can too. HTML contains 6 heading levels, h1-h6, denoting descending levels of importance:

<h1>Main Title</h1>
<h2>Top-level Heading</h2>
<h3>Subheading</h3>
<h4>Sub-subheading</h4>

Now try adding a suitable title to your HTML page at the top.

You’ll see that your heading level 1 is displayed differently. This is because browsers have a set of built-in typographical rules for displaying common elements. We will discover how to amend these default styles in the next lesson.

It is important that you don’t use heading elements simply to make text bigger or bold unless that text really is a heading within the structure of your document. Like other HTML elements, h1-h6 have a semantic function, not an aesthetic one. That is to say that they convey meaning about the function of the content they enclose, rather than simply describing how you want it to look. The aesthetics of elements on your page are defined using Cascading Style Sheets. Headings are also used for accessibility and for other purposes such as Search Engine Optimisation (SEO). Try to create a meaningful sequence of headings on your pages, without skipping levels.
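
As a sketch of a meaningful heading sequence, the outline below uses invented headings about a cat. The wording is purely illustrative; the point is that each level sits directly beneath the level above it, and that no level is skipped.

```html
<h1>My cat</h1>

<h2>Temperament</h2>
<p>My cat is very grumpy.</p>

<!-- An h3 is a subheading of the h2 above it. -->
<h3>Favourite places to sit</h3>
<p>She is always sitting on my keyboard.</p>

<!-- Returning to h2 starts a new top-level topic. -->
<h2>Diet</h2>
<p>She always comes home for food.</p>
```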

Paragraphs

As we saw above, p elements are for containing paragraphs of text; you’ll use these frequently when marking up regular text content:

<p>This is a single paragraph</p>

Add a few paragraphs to your page below your img element.

Lists

HTML has special elements for adding lists of various kinds. The markup for a list always consists of at least 2 elements, which identify the list itself and its individual items. The most common list types are ordered and unordered lists:

Unordered lists are for lists where the order of the items doesn’t matter, such as a shopping list. These are wrapped in a ul element.
Ordered lists are for lists where the order of the items does matter, such as a recipe. These are wrapped in an ol element.

Each item inside the lists is put inside an li (list item) element.

For example, if we wanted to turn part of the following paragraph into a list:

<p>My cat is proud, mischievous, and independent, but she always comes home for food.</p>

We could modify the markup to this:

<p>My cat is</p>
    
<ul> 
  <li>proud</li>
  <li>mischievous</li>
  <li>independent</li>
</ul>

<p>but she always comes home for food.</p>
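
For comparison, here is a minimal sketch of an ordered list. The steps are invented for this illustration; what matters is that the list is wrapped in an ol element, and that the browser numbers the li items in the order in which they are written.

```html
<p>To get my cat off the keyboard:</p>

<ol>
  <li>Open a tin of food.</li>
  <li>Wait for her to run to the kitchen.</li>
  <li>Close the laptop lid.</li>
</ol>
```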

Try adding an ordered or unordered list to your example page.

Links

Links are very important: they are what makes the Web a web! To add a link, we need to use a simple element, a (short for a[nchor]), that links us to another document. To make text within your paragraph into a link, follow these steps:

Choose some text. We chose the text “My cat”.
Wrap the text in an a element, as shown below:
```
<a>My cat</a>
```
Give the a element an href attribute, as shown below:
```
<a href="">My cat</a>
```
Fill in the value of this attribute with the web address that you want the link to link to, in this case the Wikipedia page for cats:
```
<a href="https://en.wikipedia.org/wiki/Cat">My cat</a>
```

You might get unexpected results if you omit the https:// or http:// part, called the protocol, at the beginning of the web address.
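
To illustrate, the sketch below contrasts a link that includes the protocol with one that omits it. Without the protocol, browsers generally treat the address as a path relative to the current page, so the second link would most likely point to a non-existent page on your own site rather than to Wikipedia.

```html
<!-- Includes the protocol: goes to Wikipedia. -->
<a href="https://en.wikipedia.org/wiki/Cat">My cat</a>

<!-- Omits the protocol: resolved relative to the current page,
     so it is unlikely to reach Wikipedia. -->
<a href="en.wikipedia.org/wiki/Cat">My cat</a>
```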

After making a link, click it to make sure it is sending you where you wanted to go. If you want to avoid navigating away from the page you are on, hold down Ctrl (Cmd on a Mac) while you click, and the link will open in a new window or tab.

Another important protocol in links is mailto:. The mailto: protocol tells the browser that the following text is an email address. An example use of this protocol is as follows:

If you want to contact her, just <a href="mailto:kitty@example.com">send my cat an email</a>.

href might seem like a rather obscure choice for an attribute name at first. If you are having trouble remembering it, remember that it stands for hypertext reference.

Add a link to your page now, if you haven’t already done so.

Sections

We often need to group together sections of content within an HTML page so that we can display them in a certain way (for example in a box). Authors often use the div element to do this. The div[ision] element is used to mark up a logical block of text, images, lists etc. Remember that elements cannot overlap, so a div cannot – for example – contain only part of a paragraph. Used on its own it has no effect on the content of your page, but we will see in the next lesson how you can use style rules to unleash the power of the div.

Note: HTML provides a few other elements that can be used in a similar way to the div (e.g. section, nav, aside, header and footer). In contrast to the div, which is agnostic about its content, these semantic elements all convey meaning about the function of their content: section marks a logical section within the document; nav contains some kind of navigation aid; aside is a supplementary comment; header belongs at the head of the document or section, and footer at the end. They can be used in the same way as the div, but should only be used where their meaning matches the content.
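
As a minimal sketch, the markup below groups a heading and a paragraph inside a div, and then shows the same grouping with the semantic section element. The class name cat-profile is a hypothetical label chosen for this illustration. Neither wrapper should noticeably change how the content is displayed until style rules are applied.

```html
<!-- A div groups the content without saying what it is. -->
<div class="cat-profile">
  <h2>My cat</h2>
  <p>My cat is very grumpy.</p>
</div>

<!-- A section groups the same content and also marks it
     as a logical section of the document. -->
<section class="cat-profile">
  <h2>My cat</h2>
  <p>My cat is very grumpy.</p>
</section>
```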

Anatomy of an HTML document

So far we have been dealing with snippets of HTML, but we have not created a complete document. Browsers display the snippets we have written because they are forgiving, but the snippets are not, technically, complete and valid HTML documents. To create a page, we need a bit of structure to label the page and provide some metadata.

Here is an example of a valid HTML page, which displays the text ‘Hello World!’:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My test page</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>

Here, we have the following:

<!DOCTYPE html> — The doctype. In the mists of time, when HTML was young (around 1991/92), doctypes were meant to act as links to a set of rules that the HTML page had to follow to be considered good HTML, which could mean automatic error checking and other useful things. However, they are now for the most part just a historical artifact that needs to be included for everything to work right.
<html></html> — the html element. This element wraps all the content on the entire page and is sometimes known as the root element.
<head></head> — the head element. This element acts as a container for all the stuff you want to include on the HTML page that isn’t the content you are showing to your page’s viewers. This includes things like keywords and a page description that you want to appear in search results, CSS to style our content, character set declarations, and more.
<meta charset="utf-8"> — This element sets the character set your document should use. A character set is in some ways analogous to an alphabet; it defines which characters and symbols you can write in. The one we have selected here, UTF-8, is a kind of superset that includes characters from the vast majority of written languages. Essentially, your page can now handle any textual content you might put on it. There is no reason not to set this, and it can help avoid some problems later on.
<title></title> — the title element. This sets the title of your page, which is the title that appears in the browser tab the page is loaded in. It is also used to describe the page when you bookmark/favourite it.
<body></body> — the body element. This contains all the content that you want to show to web users when they visit your page, whether that’s text, images, videos, games, playable audio tracks, or whatever else.
<p></p> – we’ve seen this tag before; it is the paragraph element.
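
Putting the pieces from this lesson together, a complete page might look something like the sketch below. The title, the description text and the body content are invented for illustration. The meta element named description is one common way of supplying the page description mentioned above, and the lang attribute on the html element is an optional extra that declares the language of the page.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <!-- Hypothetical description of the kind that may appear in search results. -->
    <meta name="description" content="A short page about my grumpy cat.">
    <title>My grumpy cat</title>
  </head>
  <body>
    <h1>My grumpy cat</h1>
    <p>My cat is <strong>very</strong> grumpy.</p>
    <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Cat_on_laptop_-_Just_Browsing.jpg/320px-Cat_on_laptop_-_Just_Browsing.jpg" alt="A cat at work">
    <p>My cat is</p>
    <ul>
      <li>proud</li>
      <li>mischievous</li>
      <li>independent</li>
    </ul>
    <p>Read more about <a href="https://en.wikipedia.org/wiki/Cat">cats</a>.</p>
  </body>
</html>
```

Everything inside head describes the page, while everything inside body is what a visitor actually sees.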

Building our Example Page

Now that we have the basics down, let’s begin to build our profile page.

We’re going to overwrite the content of our index.html page, so if you want to keep it, take a copy by using the menu on the right-hand side and clicking Copy. Once you have created your new file, switch back to the original index.html and delete everything there.

Next, open up the sample-profile.html page, and copy it into index.html.

Now you will need to change all the text about me to being about you, as well as updating all the links. Feel free to add as much additional text as you would like. Remember that Glitch is a tool for teaching and experimentation and it isn’t password protected. You should only include text that you would be comfortable appearing in a public forum.

You might also want to add a profile photo or avatar. If there isn’t already an image of you on a public website that you can point an img element at, you can upload one directly to Glitch. This is a simple process:

First, locate the image you want to use on your computer. Make sure it is an appropriate size. If the name of the file contains spaces, you might also want to rename it so that it only uses lower case letters and has no spaces; web browsers find simple filenames like this easier to deal with.
In the left-hand (file) pane of the Glitch editor, switch to the Assets folder.
Click on the ‘Add Asset’ menu, then select ‘Upload’.
Find your image and upload it.
To place your image, you will need to create an img tag, like this: <img src="" alt="A headshot of me, looking professional">

You will need to set the src attribute to point to the file you uploaded. We can do this using a relative file path, and do not need the http:// or https:// protocols. When we use a relative path, the browser starts looking in the folder that our HTML file is in and moves from there. For example, if our file was called my-photo.jpeg, the src attribute would look like this: src="assets/my-photo.jpeg".
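
The two forms of address can be compared in the sketch below. The file name my-photo.jpeg and the assets folder follow the example above; both are placeholders for whatever you actually uploaded.

```html
<!-- Absolute address: includes the protocol and the full location of the file. -->
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Cat_on_laptop_-_Just_Browsing.jpg/320px-Cat_on_laptop_-_Just_Browsing.jpg" alt="A cat at work">

<!-- Relative path: resolved starting from the folder that contains this HTML file. -->
<img src="assets/my-photo.jpeg" alt="A headshot of me, looking professional">
```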

If you have finished editing, you are ready to Go to the Next Lesson >

This lesson is adapted from Mozilla’s Excellent ‘Getting Started with the Web’ tutorials. HTML basics by Mozilla Contributors is licensed under the Creative Commons Attribution-ShareAlike 2.5 Generic license. This page is similarly licensed under CC-BY-SA 2.5.

<Hands:On>

Digital Training Programme for Humanities Postgraduates