So, why do we even need URL shorteners? The answer is simple: because URLs are too long. Twitter, cell phones, and any kind of manual text entry make the issue more obvious, but it isn't limited to them. Essentially, most interesting content on the web has a URL that is too long to remember, type in, or share. This can be a problem if you are:
- Sending an email to someone who uses a crappy email client that wraps (breaks) lines over some character limit.
- Hanging posters in your dorm with a URL to get more information.
- Giving a talk at a conference where you want the audience to write down or remember a URL for later.
- Having a verbal conversation with a friend: "I'll send you a link later" is a symptom of this issue.
When "moving pictures" (video) first became available to a large audience, we largely just recorded plays - what we were used to pre-video. Only with time did we learn that the new medium afforded interesting new possibilities: camera angles, shifting scenes, overlaid audio, special effects, etc.
The web evolved similarly. In the original web, most web servers were designed as a way to access a collection of files on a server somewhere. We were familiar with file systems, and the pre-web internet was a lot of FTP and BBS servers. Our URLs naturally then mirrored file systems. There was certainly nothing that I know of in the HTTP spec that said they had to. This got us into some trouble:
With the file system as a metaphor, URLs got extensions (.html, .php, .asp). Even though the HTTP spec defined a way to communicate the content type outside the URL structure, we were familiar with the extension UI element. However, the vast majority of the URLs we interacted with were all one content type: HTML. Sure, HTML embedded .gif and .js, but users rarely interacted with those URLs directly; they were hidden. What type of software generated the page (.php, .asp, .jsp) wasn't remotely interesting. For the vast majority of URLs we were viewing, the information presented in the extension was redundantly obvious or plain irrelevant. Even this post will have a URL that ends with .html, 5 characters of needless redundancy!
With the file system as a metaphor, URLs became organized hierarchically into directories. We grouped them by topic, date or whatever with well-defined levels of hierarchy. Each file in one folder. Most early http servers would even automatically generate and serve an "index" page which listed all the files in a particular directory. What was a weak metaphor for a hard drive file system became worse on the internet. Hyperlinks made certain of that. Instead of there being only one path to navigate through a series of directories to a document on the internet, links made sure there were plenty of paths to navigate. Our URLs looked like a tree, but on closer inspection, we had really built a web.
Take this post for example. Its path looks something like:
However, I sincerely doubt that you navigated to this post by first looking for documents that I created in 2009, followed by those I created in May (month 05). You came through either a hyperlink or a feed reader. The directory structure here is showing information that isn't usually that interesting to a user actually interacting with a URL. How often are book titles based on Dewey Decimal categories?
The file system metaphor can't explain all our woes. After all, who in their right mind would ever name a file something as long as why-do-we-even-need-url-shorteners.html? And originally, the web wasn't named this way. Had I chosen it, this page might have been named url-shorteners.html or long-URLs-rant.html. But then search engines came along. And before long it became known that one of their ranking signals was the words contained in URLs. Users didn't type in URLs anyway, right? They just clicked on them, so it quickly became more important to create URLs for Search Engine Marketing than for Usability: more keywords are always better.
But you can't blame search engines. People frequently named their pages with descriptive URLs. Using this as a signal made lots of sense. And once webmasters noticed it and reacted to it, this custom was only further reinforced. As a result we have why-do-we-even-need-url-shorteners.html (39 characters) instead of url-shorteners.html (19 characters).
The HTML spec isn't completely blameless either. Since our metaphor was a file system, we never really expected significant amounts of dynamic content. When HTML forms were designed, we imagined things like a way to leave a comment for a webmaster, or a way to upload a file. After all, what other interactions had we really done in the days of FTP or BBS systems?
Historically, each hostname (subdomain) generally referred to a different machine. Most machines exposed to the internet were not running HTTP servers. As a result, most uses of hostnames were for things other than a web browser. Since the default was not HTTP, we needed a way to refer to the machine running the HTTP server. A custom arose - the HTTP server would run on the machine named www. It was short, easy to type, memorable, and unique. These days, with hardware load balancers, HTTP hostnames rarely refer to individual machines directly. Instead, a single hostname can refer to hundreds of separate machines. However, www has stuck around because people have come to expect it. The mere presence of a www prefix calls up the concept of a web page in most minds. As you'll notice, gregable.com doesn't have a www and neither do URL shorteners - 4 unneeded characters that will be with most URLs for a long time.
Change you can believe in:
Fortunately, this is not a chicken and egg problem. If you run a website or a CMS, you could write better URLs today without waiting for your customers to do something first. Not all chickens have that much control, but many do. And many websites are already paying attention. Take a close look at how Twitter carefully crafts their URLs to be user interface elements in themselves.
Here are a few of my suggested rules of thumb, but first an important disclaimer. I do work for a search engine company, but the opinions expressed on my blog are my own and not necessarily those of my employer. These recommendations may not be valid in the context of search engine optimization. They are simply my opinions about how URLs could be effectively used as a User Interface Element. With that out of the way, here we go:
- Drop the www. But if your users type it, make sure you still get them to the right place.
- Drop the extensions (.html, .php) for HTML pages - they are the default. Keep them for non-HTML documents (PDF, images, text) because they are useful hints to a user about what to expect.
- Don't let HTML forms dictate your URL structure. They are a necessary evil for actual user-input, but they create awful URL UI experiences.
- Use directory structures for things users care about, not uninteresting categorization. Each level you add makes the URL longer and potentially harder to remember/reuse.
- URLs should be descriptive. Long numbers are often really bad; a few words are really good.
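The first two rules (drop the www and the extension, but still get users to the right place) can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the function name `canonical_url` and the list of extensions are hypothetical, and a real site would use the result as the target of a redirect in whatever server it runs.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of extensions that only reveal how an HTML page was produced.
HTML_EXTENSIONS = (".html", ".htm", ".php", ".asp", ".jsp")


def canonical_url(url: str) -> str:
    """Return a shorter form of `url` that the server could redirect to."""
    parts = urlsplit(url)

    # Rule 1: drop a leading "www." from the hostname.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Rule 2: drop extensions for HTML pages; other types (.pdf, .png) stay.
    path = parts.path
    for ext in HTML_EXTENSIONS:
        if path.lower().endswith(ext):
            path = path[: -len(ext)]
            break

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
```

If a user types the longer form, the server can compare the requested URL with `canonical_url(requested)` and redirect when they differ, so old habits and old links keep working.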
For example, this URL could easily have been as long as:
http://www.gregable.blogspot.com/2009/05/why-do-we-even-need-url-shorteners.html (80 chars)
Or it could potentially have been as short and descriptive as:
http://gregable.com/long-urls.html (34 chars)
34 chars isn't bad. Even a tinyurl would look like http://tinyurl.com/ddvhhc (25 chars). And consider how much more information is conveyed in the short and descriptive URL for a cost of 9 measly characters.
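The character counts above are easy to check. A minimal Python sketch, using only the three URLs quoted in this post (the dictionary labels are my own):

```python
# The three URLs compared in this post, keyed by an illustrative label.
urls = {
    "long": "http://www.gregable.blogspot.com/2009/05/why-do-we-even-need-url-shorteners.html",
    "descriptive": "http://gregable.com/long-urls.html",
    "tinyurl": "http://tinyurl.com/ddvhhc",
}

# Length of each URL in characters.
lengths = {label: len(url) for label, url in urls.items()}

for label, count in lengths.items():
    print(f"{label}: {count} chars")

# Extra characters the descriptive URL costs compared with the tinyurl.
print("cost of being descriptive:", lengths["descriptive"] - lengths["tinyurl"])
```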