Jan 27, 2011

Referrer Stripping

Inspired by Gabriel Weinberg's strange fixation on stripping search queries from HTTP referrers, I thought it would be interesting to describe how to actually strip referrers from an HTTP request.  I don't think I've seen any good information on this elsewhere on the web.  It's a neat trick, albeit only marginally useful.

First, let me explain what Gabriel is doing so as to compare.  If you do a query on Duck Duck Go and click a result, javascript on that page intercepts the click (in a sense) and sends you to http://duckduckgo.com/post.html instead.  http://duckduckgo.com/post.html is just static HTML with some more javascript that then re-sends you on your way to the actual destination page you had in mind, so the referrer sent to the destination is http://duckduckgo.com/post.html.  Duck Duck Go prefetches this URL before you click on any URL and your browser caches it, so there is only a tiny latency hit from running the javascript, and no network delay.  Reasonably clever.
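To make the idea of an intermediate page concrete, here is a minimal sketch of the two halves of such a redirect.  It is not Duck Duck Go's actual code.  The function names are hypothetical, and passing the destination in the URL fragment is my assumption, chosen because browsers leave the fragment out of the referrer they send.

```javascript
// Hypothetical helpers. The fragment-based hand-off is an assumption,
// not a description of how Duck Duck Go really passes the destination.
const BOUNCE_PAGE = "http://duckduckgo.com/post.html";

// Build the URL a result click is sent to. The destination rides in the
// fragment, which is not included in the referrer.
function buildBounceUrl(destination) {
  return BOUNCE_PAGE + "#" + encodeURIComponent(destination);
}

// On the intermediate page: recover the destination from location.hash.
// Returns null for anything that is not a plain http(s) link.
function readDestination(hash) {
  const raw = hash.charAt(0) === "#" ? hash.slice(1) : hash;
  let destination;
  try {
    destination = decodeURIComponent(raw);
  } catch (err) {
    return null; // malformed escape sequence
  }
  return /^https?:\/\//i.test(destination) ? destination : null;
}

// In a browser, the intermediate page would then run something like:
//   const dest = readDestination(window.location.hash);
//   if (dest) window.location.replace(dest);
```

The http(s) check is there so the intermediate page cannot be pointed at a javascript: URL.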

This doesn't in fact strip referrers at all.  It just changes them, so that the referrer sent to the destination page is simply http://duckduckgo.com/post.html.  The query is gone, but the fact that you did a query, and that you did it on Duck Duck Go, isn't.  I'd honestly suspect the fact that one uses Duck Duck Go to be more revealing about demographics than the query itself, but I digress.

You as a user can strip referrers by modifying your browser: [Chrome, Firefox, Opera, IE?, Safari?]  What if you, as a webmaster, really want to strip the referrer for users clicking on your links?  That's a browser feature, and as a webmaster you can't change your users' settings, right?  It turns out that you can, but it's a pain in the ass because you need to do different things in different browsers.

With later versions of Webkit and hence Safari and Chrome, it's well known that attaching rel=noreferrer to an anchor will successfully tell the browser to not send the referrer on that request.  If you want to do this in javascript instead of through a plain anchor, you can make it happen by creating an anchor element in the DOM  and then simulating a click event via event.initMouseEvent.  Older versions of Safari and Chrome don't work here, and I don't know a workaround, but these browsers auto-update, so it's not common to see really old versions.
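A minimal sketch of the javascript route follows.  The function name openWithoutReferrer is hypothetical, and the document is passed in as a parameter purely so the function can be exercised outside a browser.  In a page you would pass window.document.

```javascript
// Navigate to `url` through a temporary rel=noreferrer anchor.
// `doc` is the page's document object (an assumption of this sketch:
// it is a parameter rather than the global so it can be stubbed).
function openWithoutReferrer(doc, url) {
  const link = doc.createElement("a");
  link.href = url;
  link.rel = "noreferrer";
  doc.body.appendChild(link);

  // Simulate a plain left click on the anchor.
  const click = doc.createEvent("MouseEvents");
  click.initMouseEvent(
    "click", true, true, doc.defaultView, 1, // type, bubbles, cancelable, view, detail
    0, 0, 0, 0,                              // screenX, screenY, clientX, clientY
    false, false, false, false,              // ctrl, alt, shift, meta
    0, null                                  // left button, no related target
  );
  link.dispatchEvent(click);

  doc.body.removeChild(link);
  return link;
}
```

Note that initMouseEvent has since been deprecated in favor of the MouseEvent constructor, so treat this as an illustration of the technique rather than something to paste into a current site.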

Firefox and IE don't support rel=noreferrer as far as I know.  However for as far back as I've tested them, a web page that performs a meta-refresh to a destination URL, even a 0-second refresh, will not send any referrer to that destination URL.  This doesn't work in Webkit - it passes a referrer, but for some reason it works in Firefox and IE.  It's undefined behavior, nowhere in the spec, so it could change in later versions of these browsers, buyer beware.  However rel=noreferrer is part of HTML5, so later versions of these browsers will probably eventually work with rel=noreferrer.  Want to do this with a plain link or javascript?  Simply stick an intermediate page with the meta refresh as an in-between URL like Gabriel does with javascript, and you'll have the same effect.
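As a rough illustration, an intermediate page of this kind could be generated like so.  The function names are hypothetical, and the markup is just the smallest page I can think of that carries a 0-second meta refresh.  Whether the browser then omits the referrer is the undefined behavior described above, not something this code can guarantee.

```javascript
// Escape text for use inside a double-quoted HTML attribute.
function escapeAttribute(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the HTML for an intermediate page that forwards to
// `destination` via a 0-second meta refresh.
function buildRefreshPage(destination) {
  return [
    "<!DOCTYPE html>",
    "<html><head>",
    '<meta http-equiv="refresh" content="0; url=' + escapeAttribute(destination) + '">',
    "</head><body></body></html>"
  ].join("\n");
}
```

Your link or javascript then points at the URL that serves this page instead of pointing at the destination directly.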

Konqueror and Opera don't allow any of these tricks last I checked (it's been a while), and for your various other browsers (phones mostly), all bets are off.

I don't really know why you would want to strip referrers as a webmaster.  It is literally "breaking" the way browsers and the internet are supposed to work.  I've used it occasionally for internal systems (like a control panel) where you don't want a referrer to expose the existence of a URL that isn't usually accessible, but one can use obfuscation with a different URL in this case, as Gabriel does.  Obfuscation is guaranteed to work in all browsers, and so is simpler to implement and maintain.

Jan 2, 2011

Preserving Digital Remains

Lately, I've started researching my genealogical history a little bit. Intrigued by discovering previously unknown relatives within 23andMe, I've spent some time combing through public records, bought a couple more 23andMe kits for living ancestors, and started talking to my family to see what they remember while they are still alive. It's a fascinating puzzle problem.

One thing that I've started wondering about is the future of the content I'm creating during my lifetime. This goes beyond backups: I'm reasonably skilled at protecting my data while I'm still alive. I'd like to have some of my data survive me. I want my story to be immortal. I have no clue how I would go about making such guarantees. Lots of organizations are trying very hard to bring data from the past into the present, but how would one go about pushing data from the present into the future?

This isn't just motivated by ego, although that's partially true. For example, it would be useful for my future relatives to be able to know my medical history to understand their own risks. Should I become famous after my death for some odd reason, it would be historically interesting to have more details on my life story. As someone living in my ancestors' distant future, I would absolutely love it if I could comb through the digital remains of my great great great grandparents who lived during the Civil War.

It's an interesting problem. How can I ensure that the content in my blog posts will be immortal and searchable for all time? Can I safely assume that Blogger will keep my blog running forever? Yahoo closing Geocities makes it pretty clear that there is no guarantee of persistence in free services. Lots of companies have appeared that allow me to send data to family immediately after my passing, but what if the future historian who is interested in my data hasn't yet been born, and family that outlives me doesn't take much care to preserve my data?

One approach would be to create a company designed for this purpose that would attempt to outlive its employees. That company would require a large upfront payment for storage, for example $100/GB. The price would be calculated so that the interest on a safe investment of that size could be expected to cover ongoing archival costs for eternity, along with some profit.
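The pricing logic here is the standard perpetuity calculation: the upfront payment has to be large enough that one year of interest pays for one year of storage plus margin. A minimal sketch, where the function name and every input figure are made up for illustration:

```javascript
// Upfront price per GB such that yearly interest covers yearly cost plus profit.
//   annualCostPerGb    - assumed yearly cost of archiving one GB, in dollars
//   annualInterestRate - assumed safe yearly return, e.g. 0.025 for 2.5%
//   profitMargin       - assumed markup on cost, e.g. 0.25 for 25%
function upfrontPricePerGb(annualCostPerGb, annualInterestRate, profitMargin) {
  if (annualInterestRate <= 0) {
    throw new RangeError("interest rate must be positive");
  }
  return (annualCostPerGb * (1 + profitMargin)) / annualInterestRate;
}

// Hypothetical inputs: $2/GB/year, 2.5% interest, 25% margin.
const examplePrice = upfrontPricePerGb(2, 0.025, 0.25);
console.log("Upfront price per GB: $" + examplePrice.toFixed(2));
```

With those invented inputs the price comes out to about $100/GB. The formula also shows how sensitive the scheme is: halve the interest rate and the required upfront payment doubles.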

Even this seems risky, though. What if interest rates decline and storage costs increase (peak oil causing a rise in electricity costs, for example)? What if the company changes hands and the new owners decide that the best option for shareholders is to delete all old data and cash out the interest-bearing bank account? Maybe the executives decide to invest in something a little risky to improve profits and the company goes bankrupt.

Maybe we could back this organization by some large government, such as the US. Costs are still paid as a large upfront chunk, so it would require no taxpayer burden unless the ongoing costs were gravely misestimated. Presumably it would be politically unpopular to risk losing this data, so there would be more pressure felt from historians or the like than if the organization were profit motivated. Governments have historically done at least a reasonable job of preserving records, such as Census data or birth certificates.

I think lots of people would pay a reasonable cost for this service. Perhaps most wouldn't while they were living, but as part of a funeral service, a mortuary could accept a box of writing or a CD or whatever and scan/upload all of the data found for the deceased. Compared to the rest of the costs of a funeral, this line item would be pretty modest. If there were privacy issues, you could just put some kind of digital seal on the data: do not open for 100 years. Future historians would presumably find this to be a very valuable trove of information.