Aug 22, 2012

rel=canonical as a browser feature

I informally propose that rel=canonical become a tag respected not only by search engines but also by browsers.

For a while now, HTML5 browsers have let JavaScript change the URL displayed in the address bar with window.history.pushState, without reloading the page. The changes are, of course, subject to the same-origin policy: the protocol, hostname, and port cannot be modified, only the path, query parameters, and fragment.
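
A minimal sketch of that same-origin rule, in TypeScript. `PushStateTarget`, `isSameOrigin`, and `tryPushState` are hypothetical names, and the interface only stands in for `window.history` so the logic can run outside a browser. It relies on the standard `URL` class, whose `origin` combines protocol, hostname, and port (with default ports normalized away).

```typescript
// Hypothetical stand-in for window.history so the sketch runs outside a browser.
interface PushStateTarget {
  pushState(data: unknown, unused: string, url?: string | null): void;
}

// True when targetHref (absolute or relative) resolves to the same
// protocol, hostname, and port as currentHref.
function isSameOrigin(currentHref: string, targetHref: string): boolean {
  const current = new URL(currentHref);
  const target = new URL(targetHref, current);
  return current.origin === target.origin;
}

// Changes the displayed URL only when the browser would allow it.
// A real browser throws a SecurityError instead of returning false.
function tryPushState(history: PushStateTarget, currentHref: string, targetHref: string): boolean {
  if (!isSameOrigin(currentHref, targetHref)) {
    return false;
  }
  history.pushState(null, "", targetHref);
  return true;
}
```

In a page, `tryPushState(window.history, location.href, "/photos/42")` would update the address bar without a request to the server.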

Before this, JavaScript developers hacked the same effect into older browsers by putting text after the "#" symbol in the URL. That approach has a number of problems, but it was useful enough to become fairly widely used. Modern browsers no longer require it.
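
One of those problems, sketched below with hypothetical helper names: the fragment never leaves the browser, so the server only ever sees the real path. A link someone shares as `http://example.com/app#/photos/42` reaches the server as a request for `/app`, and only JavaScript running on the page can work out which photo was meant.

```typescript
// Hypothetical helpers illustrating the old "#" trick.

// The part of a URL the browser actually sends in an HTTP request:
// path plus query string. The fragment ("#...") stays on the client.
function requestTarget(href: string): string {
  const url = new URL(href);
  return url.pathname + url.search;
}

// The pre-pushState trick: leave the real path alone and record the
// application's "virtual" location in the fragment instead.
function withHashRoute(href: string, route: string): string {
  const url = new URL(href);
  url.hash = route; // the setter adds the leading "#"
  return url.href;
}
```

For example, `withHashRoute("http://example.com/app", "/photos/42")` produces `http://example.com/app#/photos/42`, and `requestTarget` of that URL is just `/app`.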

To see what I mean, click on this little demo: and look at the URL bar. Unless you are running an outdated browser, you should see the URL change every few hundred milliseconds, even though the page is not being re-fetched from the server.
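
The demo's source isn't reproduced here, so the following is only a hypothetical reconstruction of that kind of page, not the actual demo. The names `showTick` and `startUrlDemo` are made up, and it uses `replaceState` rather than `pushState` (my assumption) so the ticking doesn't flood the back button with history entries.

```typescript
// Hypothetical stand-in for window.history so the sketch runs outside a browser.
interface ReplaceStateTarget {
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

// Rewrites only the query string; a relative "?..." URL keeps the current path.
function showTick(history: ReplaceStateTarget, n: number): string {
  const url = `?tick=${n}`;
  history.replaceState(null, "", url);
  return url;
}

// Updates the address bar every intervalMs milliseconds without any request
// to the server. Returns the timer so the caller can stop it with clearInterval.
function startUrlDemo(history: ReplaceStateTarget, intervalMs = 300): ReturnType<typeof setInterval> {
  let n = 0;
  return setInterval(() => {
    showTick(history, n);
    n += 1;
  }, intervalMs);
}
```

In a browser, `const timer = startUrlDemo(window.history);` starts it and `clearInterval(timer)` stops it.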

The rel=canonical link tag tells search engines, in effect, "I know you fetched this URL, but I suggest you treat this other URL as the one in your search index." It is essentially the same idea as window.history.pushState, only for search engines.
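
To make the search-engine side concrete, here's a deliberately simplified sketch of how a crawler might choose the URL to index. The function names are hypothetical, the regex only handles quoted `href` values, and real crawlers use a proper HTML parser and treat the tag as a hint rather than a directive.

```typescript
// Hypothetical, simplified extraction of <link rel="canonical" href="...">.
// Assumes a quoted href; a real crawler would use a full HTML parser.
function extractCanonicalHref(html: string): string | null {
  const tag = html.match(/<link\b[^>]*\brel\s*=\s*["']?canonical\b[^>]*>/i);
  if (tag === null) return null;
  const href = tag[0].match(/\bhref\s*=\s*["']([^"']*)["']/i);
  return href === null ? null : (href[1] ?? null);
}

// The URL a search engine would file the page under if it honors the hint:
// the canonical URL (resolved against the fetched URL), or else the
// fetched URL itself.
function indexedUrl(fetchedUrl: string, html: string): string {
  const href = extractCanonicalHref(html);
  if (href === null) return fetchedUrl;
  try {
    return new URL(href, fetchedUrl).href;
  } catch {
    return fetchedUrl; // malformed canonical URL: ignore the hint
  }
}
```

So a page fetched as `http://example.com/a.htm?utm_source=x` that declares `<link rel="canonical" href="/a.html">` would be indexed as `http://example.com/a.html`.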

I propose that a rel=canonical link tag on any HTML page should visibly change the URL in the browser's address bar, as long as the canonical URL satisfies the same-origin policy relative to the page.

The motivations are the same as for pushState. If a user copies and pastes the displayed URL, they get the canonical one, which is a more satisfying experience. If a user mistypes a URL (e.g., .html vs. .htm), sending them to the correct one generally requires a 301 redirect, which adds a round trip and therefore latency. The JavaScript solution is less reliable: some users browse with JavaScript turned off, and the script may not run until the page has finished loading.
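
Here's a rough sketch of the browser behavior I have in mind, written as TypeScript for illustration only. The function names are hypothetical, and a real implementation would live inside the browser rather than in page script. Two design choices are my own assumptions: it uses `replaceState` so the mistyped URL doesn't linger in the back history, and it keeps the user's `#fragment` when the canonical URL has none.

```typescript
// Hypothetical stand-in for window.history so the sketch runs outside a browser.
interface ReplaceStateTarget {
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

// Parses a URL, returning null instead of throwing on malformed input.
function parseUrl(href: string, base?: string): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Decides what the address bar should show for a page whose
// <link rel="canonical"> points at canonicalHref. Returns null when the
// displayed URL should be left alone.
function canonicalDisplayUrl(currentHref: string, canonicalHref: string): string | null {
  const current = parseUrl(currentHref);
  if (current === null) return null;
  const target = parseUrl(canonicalHref, current.href);
  if (target === null) return null;
  // Same rule as pushState: protocol, hostname, and port must match.
  // Opaque origins (serialized as "null", e.g. file: URLs) never qualify.
  if (current.origin === "null" || target.origin !== current.origin) return null;
  // Keep the user's #fragment if the canonical URL doesn't specify one.
  if (target.hash === "") target.hash = current.hash;
  return target.href === current.href ? null : target.href;
}

// Swaps the displayed URL without a redirect or a reload.
function applyCanonical(history: ReplaceStateTarget, currentHref: string, canonicalHref: string): boolean {
  const next = canonicalDisplayUrl(currentHref, canonicalHref);
  if (next === null) return false;
  history.replaceState(null, "", next);
  return true;
}
```

As page script, this would be `const link = document.querySelector<HTMLLinkElement>('link[rel="canonical"]'); if (link) applyCanonical(window.history, location.href, link.href);`. The point of the proposal, though, is for the browser to do the equivalent natively, with no script and no 301 round trip.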

Are there any obvious reasons I'm missing why this is a horrible idea?

Are there any regular Gregable readers who work on browser standards and might want to propose this more formally?