B.1 HTML
HTML stands for HyperText Markup Language, and it is the language behind most web pages you see. You can use the menu View -> View Source or the context menu View Page Source to see the full HTML source of a web page in your browser. All elements on a page are represented by HTML tags. For example, the tag <p> represents paragraphs, and <img> represents images.
The good thing about HTML is that it has only a limited number of tags, and the number of commonly used ones is small. This means you can realistically hope to master the language fully and quickly.
Most HTML tags appear in pairs, with an opening tag and a closing tag, e.g., <p></p>. You write the content between the opening and closing tags, e.g., <p>This is a paragraph.</p>. There are a few exceptions, such as the <img> tag, which has no content and no closing tag; it can be closed by a slash / in the opening tag, e.g., <img src="foo.png" />. You can specify attributes of an element in the opening tag using the syntax name=value (a few attributes do not require a value).
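To make the attribute syntax concrete, here is a minimal sketch that reads the opening tags of a small snippet with Python's standard html.parser module and prints each tag with its attributes. Python is used only as a convenient way to inspect the markup. The snippet itself is an illustration: the class attribute on <p> and the <input disabled> element are assumptions added here to show an attribute with a value and one without.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []  # list of (tag, {attribute name: value}) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value is None when
        # the attribute was written without one (e.g., disabled).
        self.seen.append((tag, dict(attrs)))


def collect_attributes(html):
    """Return the opening tags found in html, in document order."""
    parser = AttributeCollector()
    parser.feed(html)
    parser.close()
    return parser.seen


# Hypothetical snippet: a paired tag, a self-closed tag, a valueless attribute.
snippet = '<p class="intro">Hi</p><img src="foo.png" /><input disabled>'

for tag, attrs in collect_attributes(snippet):
    print(tag, attrs)
```

An attribute written without a value shows up with the value None, which is how this parser distinguishes it from an attribute set to an empty string.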
HTML documents often have the filename extension .html (or .htm). Below is the overall structure of an HTML document:
<html>
<head>
</head>
<body>
</body>
</html>
Basically, an HTML document consists of a head section and a body section. You can specify the metadata and include assets like CSS files in the head section. Normally the head section is not visible on a web page. It is the body section that holds the content to be displayed to a reader. Below is a slightly richer example document:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>Your Page Title</title>
<link rel="stylesheet" href="/css/style.css" />
</head>
<body>
<h1>A First-level Heading</h1>
<p>A paragraph.</p>
<img src="/images/foo.png" alt="A nice image" />
<ul>
<li>An item.</li>
<li>Another item.</li>
<li>Yet another item.</li>
</ul>
<script src="/js/bar.js"></script>
</body>
</html>
In the head, we declare the character encoding of this page to be UTF-8 via a <meta> tag, specify the title via the <title> tag, and include a stylesheet via a <link> tag.
The body contains a first-level section heading <h1> [43], a paragraph <p>, an image <img>, an unordered list <ul> with three list items <li>, and a JavaScript file included at the end via <script>.
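The nesting described above can be checked mechanically. The following is a rough sketch, again using Python's standard html.parser module, that prints an indented outline of the example document. The class name OutlinePrinter and the helper outline() are made up for this illustration, and the sketch assumes that tags without content (<meta>, <link>, <img>) are written with the closing slash, as they are in the example.

```python
from html.parser import HTMLParser

EXAMPLE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>Your Page Title</title>
<link rel="stylesheet" href="/css/style.css" />
</head>
<body>
<h1>A First-level Heading</h1>
<p>A paragraph.</p>
<img src="/images/foo.png" alt="A nice image" />
<ul>
<li>An item.</li>
<li>Another item.</li>
<li>Yet another item.</li>
</ul>
<script src="/js/bar.js"></script>
</body>
</html>
"""


class OutlinePrinter(HTMLParser):
    """Collect tag names, indented by how deeply they are nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1  # whatever follows is inside this element

    def handle_endtag(self, tag):
        self.depth = max(self.depth - 1, 0)

    def handle_startendtag(self, tag, attrs):
        # Tags closed by a slash (e.g., <img ... />) have no children,
        # so they are listed without changing the depth.
        self.lines.append("  " * self.depth + tag)


def outline(html):
    """Return the indented list of tag names in html."""
    parser = OutlinePrinter()
    parser.feed(html)
    parser.close()
    return parser.lines


print("\n".join(outline(EXAMPLE)))
```

The output lists meta, title, and link under head, and h1, p, img, ul (with its three li items), and script under body, which matches the description of the two sections.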
There are much better tutorials on HTML than this section, such as those offered by MDN and w3schools.com, so we are not going to make this section a full tutorial. Instead, we just want to provide a few tips on HTML:
You may validate your HTML code via this service: https://validator.w3.org. This validator will point out potential problems in your HTML code. It actually works for XML and SVG documents, too.
Among all HTML attributes, file paths (the src attribute of some tags like <img>) and links (the href attribute of the <a> tag) may be the most confusing to beginners. Paths and links can be relative or absolute, and may come with or without the protocol and domain. You have to understand exactly what a link or path points to. A full link is of the form http://www.example.com/foo/bar.ext, where http specifies the protocol (it can be other protocols like https or ftp), www.example.com is the domain, and foo/bar.ext is the path to the file under the root directory of the website.
If you refer to resources on the same website (the same protocol and domain), we recommend that you omit the protocol and domain names, so that the links will continue to work even if you change the protocol or domain. For example, a link <a href="/hi/there.html"> on a page http://example.com/foo/ refers to http://example.com/hi/there.html. It does not matter if you change http to https, or example.com to another-domain.com.
Within the same website, a link or path can be relative or absolute. The meaning of an absolute path does not change no matter where the current HTML file is placed, but the meaning of a relative path depends on the location of the current HTML file. Suppose you are currently viewing the page example.com/hi/there.html:
An absolute path /foo/bar.ext always means example.com/foo/bar.ext. The leading slash means the root directory of the website.
A relative path ../images/foo.png means example.com/images/foo.png (.. means to go one level up). However, if the HTML file there.html is moved to example.com/hey/again/there.html, this path in there.html will refer to example.com/hey/images/foo.png.
When deciding whether to use relative or absolute paths, here is the rule of thumb: if you will not move the resources referred or linked to from one subpath to another (e.g., from example.com/foo/ to example.com/bar/), but only move the HTML pages that use these resources, use absolute paths; if you want to change the subpath of the URL of your website, but the relative locations of HTML files and the resources they use do not change, you may use relative links (e.g., you can move the whole website from example.com/ to example.com/foo/).
If the above concepts sound too complicated, a simpler approach is to either think ahead carefully about the structure of your website and avoid moving files, or use redirect rules if supported (such as 301 or 302 redirects).
If you link to a different website or web page, you have to include the domain in the link, but it may not be necessary to include the protocol, e.g., //test.example.com/foo.css is a valid path. The actual protocol of this path matches the protocol of the current page, e.g., if the current page is https://example.com/, this link means https://test.example.com/foo.css. It may be beneficial to omit the protocol because browsers generally refuse to embed HTTP resources on pages served through HTTPS (for security reasons), e.g., an image at http://example.com/foo.png cannot be embedded on a page https://example.com/hi.html via <img src="http://example.com/foo.png" />, but <img src="//example.com/foo.png" /> will work if the image can be accessed via HTTPS, i.e., https://example.com/foo.png. The main drawback of not including the protocol is that such links and paths do not work if you open the HTML file locally without using a web server, e.g., if you just double-click the HTML file in your file browser to open it in your web browser. [44]
A very common mistake that people make is a link without the leading double slashes before the domain. You may think www.example.com is a valid link. It is not! At least it does not link to the website that you intend to link to. It works when you type it in the address bar of your browser because your browser will normally autocomplete it to http://www.example.com. However, if you write a link <a href="www.example.com">See this link</a>, you will be in trouble. The browser will interpret this as a relative link, and it is relative to the URL of the current web page, e.g., if you are currently viewing http://yihui.org/cn/, the link www.example.com actually means http://yihui.org/cn/www.example.com! Now you should know the Markdown text [Link](www.example.com) is typically a mistake, unless you really mean to link to a subdirectory of the current page or a file with literally the name www.example.com.
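The resolution rules in the tips above can be tried out without a browser. The sketch below uses urljoin() from Python's standard urllib.parse module, which follows the same general rules browsers use for turning a link on a page into a full URL. The helper name resolve() is made up for this illustration, and the local base URL file:///path/to/the/file.html is an assumed stand-in for an HTML file opened without a web server; all other pages and links are the examples from the text.

```python
from urllib.parse import urljoin, urlsplit

# The three parts of a full link: protocol, domain, and path.
parts = urlsplit("http://www.example.com/foo/bar.ext")
print(parts.scheme, parts.netloc, parts.path)


def resolve(page, link):
    """Return the full URL that link points to when it appears on page."""
    return urljoin(page, link)


cases = [
    # (page being viewed, link or path written in that page)
    ("http://example.com/foo/", "/hi/there.html"),                   # absolute path
    ("http://example.com/hi/there.html", "../images/foo.png"),       # relative path
    ("http://example.com/hey/again/there.html", "../images/foo.png"),  # same path, moved page
    ("https://example.com/", "//test.example.com/foo.css"),          # protocol omitted
    ("http://yihui.org/cn/", "www.example.com"),                     # missing leading //
    ("file:///path/to/the/file.html", "//example.com/foo.png"),      # page opened locally
]

for page, link in cases:
    print(f"{link} on {page} -> {resolve(page, link)}")
```

The second and third cases show the same relative path pointing to different files once the page moves, and the fifth case shows why a link written as www.example.com ends up under the current page instead of on another website.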
[43] There are six possible heading levels, from h1, h2, …, to h6.
[44] That is because without a web server, an HTML file is viewed via the protocol file. For example, you may see a URL of the form file://path/to/the/file.html in the address bar of your browser. The path //example.com/foo.png will be interpreted as file://example.com/foo.png, which is unlikely to exist as a local file on your computer.