Feb 18, 2014

Network Aware Sheetfed Scanner

A little over 3 years ago, I posted a question asking about a consumer product available for scanning to a network drive.

These days there are plenty of options, so I went out and bought a Brother MFC-7860DW. I hadn't researched this heavily, so don't consider this a review. It's a scanner/printer that can handle wifi (handy where I wanted to put it), print duplex (yay), and can scan via ADF (automatic document feed, aka sheetfed) straight to a pre-configured FTP location. I overlooked one feature, duplex ADF scanning, which I wish I had.

Now that I have a scanner that uploads to a local FTP drive, I wanted more. I wanted to then automatically upload to a Google Drive folder. Google Drive has very good automatic OCR of all text in uploaded scans, is accessible anywhere, and it's much better than an FTP directory in terms of usability. I can even load documents on a cell phone app.

I started a small project that would monitor for new files on the FTP drive and then upload them to Google Drive. I thought this would take an afternoon, but 600 lines of code later I'm only now feeling done with the project. I did throw in learning Go as part of this, which didn't help things.

The project is here if it inspires anyone: https://github.com/Gregable/ScanServer

One of the big issues I didn't think about initially was that scanned files wouldn't be instantly "complete". It would take several seconds between a file being created and the scanner finishing its upload. If I started processing the file too quickly, I'd get a partial upload. If I slept too long, I'd add too much latency to the whole process. I wanted to be able to look at the uploaded file seconds after scanning it so I could verify that it looked right before disposing of (shredding) the scanned document, so extra latency wasn't acceptable.
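One way to handle the partial-upload problem is to treat a file as finished once it has gone a few seconds without being modified. The sketch below is a minimal Python illustration of that idea, not the code from the project; the function name `seems_finished` and the 5-second quiet period are assumptions.

```python
import os
import time


def seems_finished(path, quiet_seconds=5.0, now=None):
    """Guess whether an upload is complete.

    Assumption: the FTP upload keeps modifying the file while it writes,
    so a file left untouched for quiet_seconds is probably done.
    """
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= quiet_seconds
```

A shorter quiet period lowers latency but raises the risk of picking up a partial file, which is the trade-off described above.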

Another issue was scanning duplex. I thought it should be possible to scan two single-sided documents and have my script merge them. As it turns out, this is possible, though tricky. If you flip a multiple-page document over, the page order is reversed. As a result, you need to interleave pages backwards. Something like:

  1. Front doc, Page 1
  2. Back doc, Page 3
  3. Front doc, Page 2
  4. Back doc, Page 2
  5. Front doc, Page 3
  6. Back doc, Page 1
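The backwards interleaving in the list above can be sketched in a few lines of Python. This is an illustration only (the project itself is written in Go), and `interleave_duplex` is a hypothetical name.

```python
def interleave_duplex(front_pages, back_pages):
    """Merge two single-sided scans of the same stack into reading order.

    front_pages: fronts in scan order (page 1 first).
    back_pages: backs in scan order, which is reversed because the
    stack was flipped over before the second pass.
    """
    if len(front_pages) != len(back_pages):
        raise ValueError("front and back scans must have the same page count")
    merged = []
    for front, back in zip(front_pages, reversed(back_pages)):
        merged.append(front)
        merged.append(back)
    return merged


print(interleave_duplex(["F1", "F2", "F3"], ["B1", "B2", "B3"]))
# ['F1', 'B3', 'F2', 'B2', 'F3', 'B1']
```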

My scanner's buttons can select different filename prefixes for uploaded docs, things like "laser", "receipt", and "estimate". I ended up choosing one of the prefixes to indicate duplex docs.

I then had to think about error cases. What if a user accidentally scans only the front half as duplex and marks the back half as single-sided, or vice-versa? In these cases, I wanted to upload the document as two single-sided documents. What if I saw two duplex documents come in, but they had different numbers of pages? In this case, I wanted to treat the first one as a single-sided document, but keep the second as potentially the first side of a duplex document that might come next. What if I saw a single duplex document come in and nothing followed? In this case, I wanted to wait 15 minutes to see if I got the other side of the duplex document, and if not give up and treat it as a single-sided document. These cases were fun to work through.
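The error cases above amount to a small state machine with at most one pending "front side" at a time. Here is a rough Python sketch of those rules. The names `Scan`, `DuplexPairer`, and `WAIT_SECONDS` are hypothetical, and the real project may structure this quite differently.

```python
from dataclasses import dataclass
from typing import List, Optional

WAIT_SECONDS = 15 * 60  # give up on a lone duplex scan after 15 minutes


@dataclass
class Scan:
    pages: List[str]
    duplex: bool     # True if scanned with the "duplex" filename prefix
    arrived: float   # arrival time in seconds


class DuplexPairer:
    """Hypothetical helper deciding which scans to merge."""

    def __init__(self):
        self.pending: Optional[Scan] = None  # possible front side

    def expire(self, now):
        """Release a pending scan as single-sided if it waited too long."""
        if self.pending and now - self.pending.arrived >= WAIT_SECONDS:
            pages, self.pending = self.pending.pages, None
            return [pages]
        return []

    def add(self, scan):
        """Feed in one scan; return the documents now ready to upload."""
        ready = self.expire(scan.arrived)
        if not scan.duplex:
            # A single-sided scan breaks any pending pair: upload both as-is.
            if self.pending:
                ready.append(self.pending.pages)
                self.pending = None
            ready.append(scan.pages)
        elif self.pending is None:
            self.pending = scan
        elif len(self.pending.pages) == len(scan.pages):
            backs = reversed(scan.pages)  # flipped stack scans in reverse
            ready.append([p for pair in zip(self.pending.pages, backs) for p in pair])
            self.pending = None
        else:
            # Page counts differ: the older scan was really single-sided.
            ready.append(self.pending.pages)
            self.pending = scan
        return ready
```

Calling `expire` periodically with the current time covers the case where a duplex scan arrives and nothing follows.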

Ultimately, I had fun implementing this with Go channels and goroutines. I had one goroutine in a loop monitoring for new FTP documents and waiting until they seemed to be "finished" uploading before outputting the document to a channel. I had another goroutine sorting out the duplex / single-sided issues, creating merged documents from duplex uploads, and handling the user error cases above. This goroutine spit out the documents to be uploaded to Drive into another channel. A third goroutine uploaded to Drive and spit out to a "final" channel which cleaned up any temporary files created and echoed success messages to the console.
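For readers who don't know Go, the same pipeline shape can be approximated in Python with threads and queues standing in for goroutines and channels. This is only an analogy under that assumption, not a translation of the project. The stage functions are placeholders, and `DONE`, `stage`, and `run_pipeline` are hypothetical names.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def stage(work, inbox, outbox):
    """Apply work() to every item from inbox and pass results to outbox."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(work(item))


def run_pipeline(filenames):
    found, merged, uploaded = queue.Queue(), queue.Queue(), queue.Queue()
    # Placeholder stages: real ones would merge duplex scans and call Drive.
    workers = [
        threading.Thread(target=stage, args=(lambda f: f + ".merged", found, merged)),
        threading.Thread(target=stage, args=(lambda f: f + ".uploaded", merged, uploaded)),
    ]
    for worker in workers:
        worker.start()
    for name in filenames:  # stands in for the FTP-monitoring loop
        found.put(name)
    found.put(DONE)
    results = list(iter(uploaded.get, DONE))  # the "final" consumer
    for worker in workers:
        worker.join()
    return results
```

Each stage only talks to its neighbors through a queue, which is what makes it easy to reason about the stages independently.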

Currently I have this little script running on a desktop and monitoring the FTP store (on a NAS) via NFS. Assuming I don't find any more bugs, I will next move it over to a raspberry pi so I don't have to leave the desktop running.

Jan 12, 2014

Mt Olympia Hike, Part of Mt Diablo State Park

I'm posting a little trip report mostly because I couldn't find too much good information ahead of time. Hopefully this will be useful to someone else.

I found a brief description of this hike in Weekend Sherpa. It's a ~6 mi hike up and down Mount Olympia on the back side of Mt Diablo Park. The climb has ~2,000 ft of elevation gain. The gain isn't evenly spread though, so once you really get climbing it averages about a 20% grade with brief blips at or above 30%. The trail is also very narrow in spots, making footing a little tricky. At no point did I feel it was dangerous beyond the slight risk of twisting an ankle though. You can see the route and elevation profile on Strava: http://www.strava.com/activities/105653802

It was a fun trail. Mostly exposed, there were some great views. The geology is interesting with several outcroppings that lent character to the route. Short twisty trees and burned out areas also add color and interest. Not terribly popular, as I think it's probably pretty difficult to find.

SummitPost has a brief description of the trailhead / parking:

Start from the Marsh Creek Road trailhead (elevation 900 feet, there is no sign that says Mount Diablo State Park but none that says "private property, no trespassing" either). Pass under the gated entrace, and follow a dirt road around a small hill on the right.

The trailhead parking appears to be on the South Side of Marsh Creek Road, but it is poorly marked. I'm not 100% sure this is the correct spot, though I think so. It's more of a dirt pullout with a gate across it. There are a couple of mailboxes and when I was there some trash cans had been set out. The gate has some security cameras and signs. It looks like private property except that one of the gates has a Mount Diablo State Park sign on it, presumably new since the SummitPost writeup. There are no parking signs, but similarly no "no parking" signs either. When I arrived there were 2 other cars, 3 others when I left. Here's what it looks like on Google Streetview:

The maps available online also don't have most of the trails marked. Google Maps doesn't have them marked, and even Mt Diablo's park brochure doesn't show the trails. The map you really want is the Mt Diablo Interpretive Association topo. I bought mine at the Fremont REI on the way up. Here's a quick view of that section of the map:

You start out walking about a half mile down the unmarked Three Springs Road from the parking lot in the Northeast before coming to Mt Olympia trail markers. The trails are well marked, fortunately. I continued on the road as far as I could and then took the Olympia Trail to East Trail to Mount Olympia. This is a steep but pretty route. On the way back I took the longer route around via Mount Olympia Rd. It's a fire road, not as attractive, but easier on the knees for the descent.

There is also a trail option south from Olympia about a mile to North Peak. I didn't go that route as the peak was shrouded in clouds and I was getting chilly at the top of Olympia as it was. I may return some time and take this option though.

Oct 6, 2013

Majority Voting Algorithm - Find the majority element in a list of values

I haven't done an algorithms post in a while, so the usual disclaimer first: If you don't find programming algorithms interesting, stop reading. This post is not for you.

On the other hand, if you do find algorithms interesting, in addition to this post, you might also want to read my other posts with the algorithms tag.

Problem Statement

Imagine that you have a non-sorted list of values. You want to know if there is a value that is present in the list for more than half of the elements in that list. If so what is that value? If not, you need to know that there is no majority element. You want to accomplish this as efficiently as possible.

One common reason for this problem could be fault-tolerant computing. You perform multiple redundant computations and then verify that a majority of the results agree.

Simple Solution

Sort the list. If there is a majority value, it must now be the middle value. To confirm it's the majority, run another pass through the list and count its frequency.

The simple solution is O(n lg n) due to the sort though. We can do better!
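As a baseline for comparison, the sort-based approach might look like this in Python. It is a minimal sketch, and the function name is my own.

```python
def majority_by_sorting(values):
    """Return the majority element of values, or None if there is none."""
    if not values:
        return None
    ordered = sorted(values)                 # O(n lg n)
    candidate = ordered[len(ordered) // 2]   # a majority must cover the middle
    # Confirmation pass: a majority needs strictly more than half the slots.
    if ordered.count(candidate) > len(ordered) // 2:
        return candidate
    return None
```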

Boyer-Moore Algorithm

The Boyer-Moore algorithm is presented in this paper: Boyer-Moore Majority Vote Algorithm. The algorithm uses O(1) extra space and O(N) time. It requires exactly 2 passes over the input list. It's also quite simple to implement, though a little trickier to understand how it works.

In the first pass, we generate a single candidate value which is the majority value if there is a majority. The second pass simply counts the frequency of that value to confirm. The first pass is the interesting part.

In the first pass, we need 2 values: 
  1. A candidate value, initially set to any value
  2. A count, initially set to zero.

For each element in our input list, we first examine the count value. If the count is equal to 0, we set the candidate to the value at the current element. Next, we compare the element's value to the current candidate value. If they are the same, we increment count by 1. If they are different, we decrement count by 1.

In python:

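candidate = 0  # any initial value works
count = 0
for value in values:  # values is the input list
  if count == 0:
    # No surviving candidate: adopt the current value.
    candidate = value
  if candidate == value:
    count += 1
  else:
    count -= 1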

At the end of all of the inputs, the candidate will be the majority value if a majority value exists. A second O(N) pass can verify that the candidate is the majority element (an exercise left for the reader).
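For readers who would rather not do the exercise, here is one way both passes could fit together. This is a sketch and assumes `values` is a list (or another sequence that can be iterated twice).

```python
def majority_element(values):
    """Boyer-Moore majority vote: return the majority value or None."""
    # First pass: find the only value that could be the majority.
    candidate, count = None, 0
    for value in values:
        if count == 0:
            candidate = value
        if candidate == value:
            count += 1
        else:
            count -= 1

    # Second pass: confirm the candidate fills more than half the list.
    occurrences = sum(1 for value in values if value == candidate)
    if occurrences > len(values) // 2:
        return candidate
    return None
```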


To see how this works, we only need to consider cases that contain a majority value. If the list does not contain a majority value, the second pass will trivially reject the candidate.

First, consider a list where the first element is not the majority value, for example this list with majority value 0:

[5, 5, 0, 0, 0, 5, 0, 0, 5]

When processing the first element, we assign the value of 5 to candidate and 1 to count. Since 5 is not the majority value, at some point in the list our algorithm must find another value to pair with every 5 we've seen so far, thus count will drop to zero at some point before the last element in the list. In the above example, this occurs at the 4th element:

List Values:
[5, 5, 0, 0, ...

Count value:
[1, 2, 1, 0, ...

At the point that count returns to zero, we have consumed exactly the same number of 5's as other elements. If all of the other elements were the majority element as in this case, we've consumed 2 majority elements and 2 non-majority elements. This is the largest number of majority elements we could have consumed, but even still the majority element must still be a majority of the remainder of the input list (in our example, the remainder is ... 0, 5, 0, 0, 5]). If some of the other elements were not majority elements (for example, if the value was 4 instead), this would be even more true.
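To follow the argument on the full example, it can help to record the candidate and count after every element. The helper below is a small instrumented version of the first pass; `trace_first_pass` is a hypothetical name.

```python
def trace_first_pass(values):
    """Return (value, candidate, count) after each step of the first pass."""
    candidate, count = None, 0
    trace = []
    for value in values:
        if count == 0:
            candidate = value
        if candidate == value:
            count += 1
        else:
            count -= 1
        trace.append((value, candidate, count))
    return trace


steps = trace_first_pass([5, 5, 0, 0, 0, 5, 0, 0, 5])
print([count for _, _, count in steps])
# [1, 2, 1, 0, 1, 0, 1, 2, 1]
```

Each time the count hits zero, the prefix consumed so far can be thrown away and the first pass effectively restarts on the remainder.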

Similarly, if the first element was a majority element and count at some point drops to zero, the majority element is still the majority of the remainder of the input list, since again we have consumed an equal number of majority and non-majority elements.

This in turn demonstrates that the range of elements from the time candidate is first assigned to when count drops to zero can be discarded from the input without affecting the final result of the first pass of the algorithm. We can repeat this over and over again discarding ranges that prefix our input until we find a range that is a suffix of our input where count never drops to zero.

Given an input list suffix where count never drops to zero, we must have more values that equal the first element than values that do not. Hence, the first element (candidate) must be the majority of that list and is the only possible candidate for the majority of the full input list, though it is still possible there is no majority at all.

Fewer Comparisons

The above algorithm makes 2 passes through our list, and so requires 2N comparisons in the worst case. It requires another N more if you consider the comparisons of count to 0. There is another, more complicated, algorithm that operates using only 3N/2 - 2 comparisons, but requires N additional storage. The paper (Finding a majority among N votes) also proves that 3N/2 - 2 is optimal.

Their approach is to rearrange all of the elements so that no two adjacent elements have the same value and keep track of the leftovers in a "bucket". 

In the first pass, you start with an empty rearranged list and an empty "bucket". You take elements from your input and compare with the last element on the rearranged list. If they are equal you place the element in the "bucket". If they are not equal, you add the element to the end of the list and then move one element from the bucket to the end of the list as well. The last value on your list at the end of this phase is your majority candidate.

In the second pass, you repeatedly compare the candidate to the last value on the list. If they are the same, you discard two values from the end of the list. If they are different, you discard the last value from the end of the list and a value from the bucket. In this way you always pass over two values with one comparison. If the bucket ever empties, you are done and have no majority element. If you remove all elements from the rearranged list without emptying the bucket your candidate is the majority element.
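Here is a rough Python sketch of the two passes as described above. It is my reading of the description rather than code from the paper: it makes no attempt to hit the 3N/2 - 2 comparison bound, and the handling of a single leftover list element is an assumption I added so that small inputs behave sensibly.

```python
def majority_by_rearranging(values):
    """List-and-bucket majority search; returns the majority or None."""
    rearranged, bucket = [], []
    # First pass: build a list with no two equal neighbors. The bucket
    # only ever holds copies of the list's last value.
    for value in values:
        if rearranged and rearranged[-1] == value:
            bucket.append(value)
        else:
            rearranged.append(value)
            if bucket:
                rearranged.append(bucket.pop())
    if not rearranged:
        return None

    candidate = rearranged[-1]
    # Second pass: cancel each candidate against one non-candidate.
    while rearranged:
        if rearranged[-1] == candidate:
            if len(rearranged) == 1:
                return candidate  # assumption: a lone leftover candidate wins
            rearranged.pop()      # the candidate
            rearranged.pop()      # its neighbor, which must differ
        else:
            if not bucket:
                return None       # nothing left to cancel against
            rearranged.pop()
            bucket.pop()
    return candidate if bucket else None
```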

Given the extra complexity and storage, I doubt this algorithm would have better real performance than Boyer-Moore in all but some contrived cases where equality comparison is especially expensive.

Distributed Boyer-Moore

Of course, Gregable readers probably know that I like to see if these things can be solved in parallel on multiple processors. It turns out that someone has done all of the fun mathematical proof to show how to solve this in parallel: Finding the Majority Element in Parallel.

Their solution boils down to an observation (with proof) that the first phase of Boyer-Moore can be solved by combining the results for sub-sequences of the original input as long as both the candidate and count values are preserved. So for instance, if you consider the following array:

[1, 1, 1, 2, 1, 2, 1, 2, 2]  (Majority = 1)

If you were to run Boyer-Moore's first pass on this, you'd end up with:
candidate = 1
count = 1

If you were to split the array up into two parts and run Boyer-Moore on each of them, you'd get something like:

split 1:
[1, 1, 1, 2, 1]
candidate = 1
count = 3

split 2:
[2, 1, 2, 2]
candidate = 2
count = 2

You can then basically run Boyer-Moore over the resulting candidate, count pairs the same as you would if it were a list containing only the value candidate repeated count times. So for instance, split 1's result could be considered the same as [1, 1, 1] and split 2's as [2, 2]. However, knowing that these are the same value repeated means you can process each pair in constant time using something like the following python:

candidate = 0
count = 0
for candidate_i, count_i in parallel_output:
  if candidate_i == candidate:
    count += count_i
  elif count_i > count:
    # The incoming run outvotes the current one and takes over.
    count = count_i - count
    candidate = candidate_i
  else:
    count = count - count_i

This algorithm can be run multiple times as well to combine parallel outputs in a tree-like fashion if necessary for additional performance.
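Putting the pieces together on the example array, a self-contained sketch might look like the following. The names `first_pass` and `combine` are my own, and the "parallel" part is simulated with an ordinary list comprehension.

```python
def first_pass(values):
    """Boyer-Moore first pass over one split: return (candidate, count)."""
    candidate, count = None, 0
    for value in values:
        if count == 0:
            candidate = value
        if candidate == value:
            count += 1
        else:
            count -= 1
    return candidate, count


def combine(pairs):
    """Merge (candidate, count) results from several splits into one."""
    candidate, count = None, 0
    for candidate_i, count_i in pairs:
        if candidate_i == candidate:
            count += count_i
        elif count_i > count:
            candidate, count = candidate_i, count_i - count
        else:
            count -= count_i
    return candidate, count


splits = [[1, 1, 1, 2, 1], [2, 1, 2, 2]]
partials = [first_pass(split) for split in splits]  # would run in parallel
print(partials)           # [(1, 3), (2, 2)]
print(combine(partials))  # (1, 1)
```

Because `combine` returns the same kind of pair it consumes, its outputs can be fed back into `combine` to build the tree-like reduction mentioned above.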

As a final step, a distributed count needs to be performed to verify the final candidate.

Birds of a Feather

If you are one of the handful of people interested in voting algorithms and advanced software algorithms like this, you are the type of person I'd like to see working with me at Google.  If you send me your resume (ggrothau@gmail.com), I can make sure it gets in front of the right recruiters and watch to make sure that it doesn't get lost in the pile that we get every day.

Jul 7, 2013

Solar, with some live data

As a follow up to my previous post, Residential Solar Financials, we've now had our solar set up long enough to get some meaningful data from PG&E:

The panels were installed in late May, but PG&E didn't come out and set up net metering until early July. Until PG&E came out, any extra power being generated was being put back on the grid as a freebie.

Here's the day to day net for June from PG&E. The end of the month was pretty hot and the A/C was getting some real use.

Here's an example daily snapshot. Each bar represents power consumption for 15 minutes. A/C got some use in the evening to cool down the house. You can see a nice curve from the solar output with some chunks missing where we had a load running.

Scan from our electricity bill for June. The different bars represent the 3 different time-of-day rates. We don't generate much power in the off-peak rate as most of this is night-time. We banked $32 of energy credits for the winter! Last year we had to pay about $100 for June, so this is a big difference.

In total so far, we're averaging around 20.6 kWh/day of generation, which is actually a little bit higher than predicted.

May 25, 2013

Residential Solar Financials

What would you say if I told you that from a purely financial perspective, one of the best investments you could make is a rooftop solar install?

You probably wouldn't believe me. You'd probably tell me that I'm not factoring in risk of the panels breaking down. You'd be quick to point out that I'm likely neglecting the time value of money. You'd conclude, "Greg, I'm sure it's great for the environment and the country, but that's different than a good bet for my retirement portfolio."

But, I'd bet you'd be interested if I could prove you wrong. I'm going to try.

First of all, the devil is very likely in the details. If you have cheap electricity rates already, a shady roof, or live somewhere without much sun, this may not be as great of a deal. I'd been wanting to install solar for a long time, but as an apartment dweller it hadn't been an option.

I recently bought a house in the Bay Area. I waited a year to install solar so that I'd first have a good idea of my power usage. The photo on the right is my array. Below, I'll tell you about all of the financial details which is why I think this'll be a great investment for me. It's probably a bit boring with a bunch of math and numbers. I wouldn't consider you lazy if you felt it wasn't worth reading.

Installation Cost

The array above is 10x SunPower E20 327W panels. In addition to the panels, you also need an inverter, which is a device that converts the variable DC output from solar panels into AC power with the right voltage and frequency characteristics to connect up to the electrical grid. There is also some amount of electrical and construction work that is required to mount the panels on tracks on the roof, run conduit to carry the current, etc. In total, including install and all other costs, my system grossed $19,820.

However, right off the bat were several rebates from this number. There is a very significant 30% federal tax credit which I'll see next year at tax time. There is a California rebate program that just ran dry which rebated back a little bit based on how much is generated. This was worth another ~2% discount. I also had a rebate from SunPower (the equipment manufacturer) for 35c/W installed.

In total, after all of the various rebates, my net cost was $12,346. Definitely an investment.
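The rebate arithmetic can be roughly reproduced as below. I'm assuming each rebate is calculated independently against the gross price and that the state rebate is exactly 2%; the post only says "~2%", so the result lands near, not exactly on, the net cost quoted above.

```python
GROSS_COST = 19_820          # dollars, from the post
ARRAY_WATTS = 10 * 327       # ten 327 W panels

federal_credit = 0.30 * GROSS_COST          # 30% federal tax credit
state_rebate = 0.02 * GROSS_COST            # assumed to be exactly 2%
manufacturer_rebate = 0.35 * ARRAY_WATTS    # 35 cents per installed watt

net_cost = GROSS_COST - federal_credit - state_rebate - manufacturer_rebate
print(round(net_cost))  # about 12,333, close to the $12,346 quoted
```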


Most panels will come with some form of manufacturer warranty, often guaranteeing a certain amount of production given a certain amount of light. It does seem like there is some financial risk in collecting on these warranties - the solar manufacturer must not go bankrupt and collecting may be a hassle even if free. SunPower warranties their panels for 25 years. The warranty covers the cost of panels and the work to replace them. They expect some degradation over time, but they guarantee 95% of their rated performance for 5 years and then that drops by 0.4% every year for the next 20. At the end of the 25 years the warranty guarantees output will be at least 87% of the original rated output.
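The warranty schedule is simple enough to write down as a function. This is a sketch of the terms as described in this post, not SunPower's actual warranty language, and `guaranteed_fraction` is a hypothetical name.

```python
def guaranteed_fraction(year):
    """Guaranteed share of rated output in a given year of ownership."""
    if year > 25:
        return None                 # out of warranty: nothing guaranteed
    if year <= 5:
        return 0.95                 # flat guarantee for the first 5 years
    return 0.95 - 0.004 * (year - 5)  # then 0.4 points lost per year


for year in (1, 5, 15, 25):
    print(year, round(guaranteed_fraction(year), 3))
```

At year 25 this gives 0.95 - 20 × 0.004 = 0.87, matching the 87% figure.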

The inverter has a separate warranty for only 10 years. Most inverters tend to have a more limited lifespan than panels and it is expected that I'll need to replace my inverter once during the life of the panels, probably around 15 years out. I asked for an estimate of what it would cost to replace a dead inverter and was quoted $400/kW or about $1,200 in my case. For my estimation, I'll conservatively add in $1,200 cost, however it is certainly possible that inverters will be cheaper by that point - they are just electronics after all and electronics tend to get cheaper over time.


The US Dept of Energy runs a web tool called PVWatts that allows a person to calculate with reasonable accuracy how much power a set of panels will generate, given a number of inputs: location, angle of roof, direction of roof, size of array. It's no guarantee but the model should be fairly accurate over long periods of time. In my case, plugging in my numbers produced a generation estimate of 4,830 kWh per year. A solar installer will likely run the PVWatts numbers for you as part of their design process.

Dollar Savings

As a Californian, I buy my power from the utility PG&E. In California, PG&E uses tiered rates. Tiering works kinda like income tax brackets. The first X kWh you use in a month cost some low price. After that is used up, the next X kWh you use in a month cost a good deal more. This continues through several tiers. The highest tiers have a very high cost per kWh. So, while your average cost per kWh may be somewhere in the middle of these costs, saving a little power is all savings at the highest, most expensive, tier. The higher tiers can be as much as 5x more expensive than the lower tiers. Tiering works in favor of solar generation as you will save money in the highest tier first. To see what tiers you are paying from, you can simply look at your bill.
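To see why tiering favors solar, here is a toy bill calculator. The tier sizes and prices are invented for illustration and are NOT PG&E's actual rates; only the bracket-style mechanics come from the description above.

```python
# Hypothetical tiers: (kWh in this tier, price per kWh). Not real PG&E rates.
TIERS = [(300, 0.13), (100, 0.15), (200, 0.30), (float("inf"), 0.34)]


def monthly_bill(kwh):
    """Charge usage bracket by bracket, like income tax."""
    bill, remaining = 0.0, kwh
    for size, price in TIERS:
        used = min(remaining, size)
        bill += used * price
        remaining -= used
        if remaining <= 0:
            break
    return bill


before = monthly_bill(700)   # no solar
after = monthly_bill(350)    # solar covers half of the usage
print(f"bill drops {1 - after / before:.0%} for a 50% cut in grid usage")
```

With these made-up tiers, halving grid usage cuts the bill by well over half, because the kWh removed are the most expensive ones.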

In addition to tiering, I can also take advantage of Time of Use rates. Let me explain. The biggest load from residential electric usage is air conditioning. Heating is frequently powered by natural gas, but A/C can't be. Worse, in a particular area, A/C usage is highly correlated - almost everyone runs their max A/C at the same time - middle of the afternoon, when it's hottest. This is a problem for utilities: peak load on the grid is on hot summer afternoons, which sometimes causes brownouts.

Unlike water or gas, you can't cheaply store electricity. You must produce it at the moment it's needed. This means that you have to have production capacity on your grid equal to the peak usage on a record hot summer afternoon, but 98% of the year you won't need this capacity you've paid for. This high demand and low supply means that the power company is generally losing money during this peak time - you are paying them less for that electricity than they are paying to generate it. They make it up the rest of the year.

Instead of fixed rates, PG&E also offers Time of Use rates if you choose to use them. Rates during summer afternoons (1pm-7pm) are 3x higher than at other times of the day, such as at night. Those high-price rate periods are exactly when solar is generally performing the best. This means I can generate grid power in the afternoon and get paid peak rates, but then when I use power at night, I pay off-peak prices that are 3x cheaper.

PG&E actually has a tool on their website that will tell you, for the last year, how much you would have paid with your current rate plan and a time-of-use rate plan. Even without solar, many folks will save money with Time of Use rates. This will especially be true for folks with electric cars that charge overnight. Plug-in cars and solar have a synergy of price savings due to time of use rates.

The exact tier levels and prices vary, but the shape is roughly correct.
Time of use also prefers roofs that face south-southwest.  A due south roof would be optimal for kWh generation, but a slightly SW roof shifts power generation a little more towards the afternoon where the time of use rate is higher.  Unfortunately my roof is south-southeast.

Tiering and Time of Use conspire to significantly improve the economics of solar.  Generally you can design a solar system that covers X% of your power needs and way more than X% of your cost.  In my case, I was already efficiently using power, so I'm getting a significantly lower multiplier out of this equation than most people would.

In my case, as a rough estimate based on previous years, I will be generating about 75% of the power I'm using, but saving about 85% of my bill.  If I increase my power usage down the line, the savings bonus will grow even larger.

Bottom Line Savings

Based on the above, I expect around $1,032/yr savings in electricity starting out.  My production will drop over time however the price of energy will very likely rise faster.

Financial Comparison

The risk for a solar install is much lower than the stock market.  Assuming one has a balanced portfolio of stocks and bonds, it is therefore fairer to compare solar returns to the bond section of a portfolio than to stock or other risky investments.  The returns from solar are very predictable and nearly guaranteed.  There are some risks - the panel manufacturer could go out of business just before your panels all fail.  Electricity rates could suddenly drop due to new amazing technology.  However, both of these are low probability events compared to swings in the stock market.  As a result, I'm going to compare a solar investment to a long term bond investment.  Current 30 yr investment grade bond rates have a yield of around 2-2.5%.  I'm going to give bonds the benefit of the doubt and use the higher 2.5%.  Even with much higher returns (stock level returns), the math still comes out in favor of solar, just less so.

Solar "returns" are actually just savings rather than income.  As a result, the returns for solar are effectively tax-free!  Bonds have returns with tax consequences.  I'm going to assume a 20% marginal tax bracket for you.

Solar output may decline, but we can bound it at 0.4% per year as per our warranty.  Electricity prices tend to rise over time.  Solar companies like to make estimates using a 5% electricity price growth which has certainly happened in the past, but this may be a little optimistic.  Still, it seems fairly safe to assume that electricity prices will at least keep up with inflation and will very likely beat inflation.  Solar is actually an inflation-protected investment.  I'm going to be reasonably conservative and assume a 3% / year increase in electricity prices.

To model, I'd like to compare returns from my solar install to taking the same money and investing in the above bond fund.  On the solar side of the equation, every year I save money from my panels, I'll take that savings and put it into a bond investment to grow alongside solar.

Here's my final spreadsheet: Solar Return Comparison

With the assumptions above, the solar system returns its initial investment around year 10.  It still lags traditional investments, however, until around year 20.  By year 25 though, returns via solar are more than double that of traditional investment.

If I assume that the panels completely die exactly 1 day out of warranty and provide no additional value, I'd need to be able to achieve 8.9% bond returns to make solar unattractive.  Try it by changing the alternative yield to 8.9%.
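If you'd rather play with the assumptions in code than in a spreadsheet, a simplified model might look like the sketch below. To be clear, this is NOT the spreadsheet: I'm assuming savings arrive at the end of each year, are reinvested at the after-tax bond yield, and that the inverter is replaced once in year 15. The spreadsheet does its accounting in its own way, so crossover years from this sketch should not be expected to match the figures quoted in the post.

```python
def compare(years=25, cost=12_346, first_year_savings=1_032,
            degradation=0.004, price_growth=0.03,
            bond_yield=0.025, tax_rate=0.20,
            inverter_cost=1_200, inverter_year=15):
    """Return (year, solar_balance, bond_balance) rows, one per year."""
    after_tax = bond_yield * (1 - tax_rate)
    bond, solar, savings = float(cost), 0.0, float(first_year_savings)
    rows = []
    for year in range(1, years + 1):
        bond *= 1 + after_tax                       # alternative investment
        solar = solar * (1 + after_tax) + savings   # reinvested savings
        if year == inverter_year:
            solar -= inverter_cost
        rows.append((year, round(solar), round(bond)))
        # Next year: panels produce a bit less, power costs a bit more.
        savings *= (1 + price_growth) * (1 - degradation)
    return rows


rows = compare()
payback_year = next(year for year, solar, _ in rows if solar >= 12_346)
print("accumulated savings pass the install cost in year", payback_year)
```

Changing `bond_yield`, `price_growth`, or `tax_rate` in the call to `compare` is the code equivalent of editing the values in the spreadsheet.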

You can make a copy and change any of the values on the right to see how it affects things.  Small changes don't have big effects fortunately.  You can increase the alternative investment returns to much higher numbers or significantly reduce electricity price growth, and solar still beats out investments within the 25 year warranty period.

The cost of your install and the expected generation might make a big difference though.  Especially if your panels must face north or will be shaded for parts of the year.

There are lots of ways to be more optimistic and see dramatically better returns in the model:

  • More home electricity usage leads to better tiering savings because I'll be "saving" more power that otherwise would have been at a higher tier.
  • Panels may decay slower than the worst case under warranty.  They may continue to work fine well past 25 years.
  • Inverter may not fail as early as 15 years or may be less expensive to replace than budgeted
  • You may have a higher marginal tax rate
  • Electricity may become more expensive much faster (for instance, lots of inflation or a carbon tax).
  • You may be able to additionally sell carbon credits for reduced emissions.  PG&E is looking into a program to do this.
  • Roof panels may reduce your cooling costs due to additional roof shading.

What if I sell my house?

Solar is documented to be one of those investments that raises the value of a home significantly.  One model to consider is that a new homeowner has a monthly bill in mind.  That bill includes mortgage, utilities, etc.  In theory, a homeowner would be willing to pay $X more in mortgage if they saved $X more in utilities - it's the same net result.  If the new homeowner will save $1,000/yr on utility bills, it would make sense that they'd be willing to take on a mortgage with a $1,000/yr higher payment.  30 year mortgage rates are around 4% which means that they could take on a ~$17,000 larger mortgage.  Even after real estate agent commissions, this is an immediate return on the solar install.
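The ~$17,000 figure comes from the standard present-value-of-an-annuity formula. A quick check, assuming one payment per year and annual compounding (a real mortgage compounds monthly, which shifts the answer slightly):

```python
def loan_supported_by(payment_per_year, rate=0.04, years=30):
    """Largest loan a fixed yearly payment can pay off at the given rate."""
    return payment_per_year * (1 - (1 + rate) ** -years) / rate


print(round(loan_supported_by(1_000)))  # roughly 17,300
```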

You can even flip this on its head to show that if you have some spare cash and are choosing between solar and paying down the mortgage, the solar install may be a better deal.

Won't I have better returns if I wait a few years?

I've heard this question from a few people.  Folks have heard that the price of panels has been dropping quickly.  Some estimates put the panel prices (cost per watt) dropping at around 7% per year.  The drop in prices was the cause for the failure of Solyndra in the Bay Area and certainly didn't help out Suntech either.  The assumption is that the longer one waits, the better the returns will be.

I'll admit that I don't really know how to analyze this possibility.  I can throw a little cold water on this idea though.  Solar installations have costs outside the panels.  There are the roof racks, wiring, conduit, inverter, and most importantly the labor.  I didn't get a specific breakdown for my install of how much was panels vs. everything else, but from what I've read, it's about 50/50 at this point.  The non-panel costs aren't dropping very fast.  This means that even if panels drop to half their current price next year, you'll only save maybe ~25% on the entire install.

Another concern is that incentives may disappear.  CA's solar incentive is now gone as of a few months ago. The federal incentive of 30% of the total cost expires at the end of 2016.  It is unclear what, if anything, it will be replaced with, but it's unlikely to be higher than what it is today.


Lastly, if you are looking for a company in the Bay Area to do an install, I would recommend getting a quote from ProVoltz, who did my install.  It's not the type of work I see done multiple times, but from what I could tell they did a great job.

Also, if you are considering SunPower panels, fill this form out and we'll both end up with $200 gift certificates for the referral.

Mar 17, 2013

Backups with a ReadyNAS Ultra 4 Plus and CrashPlan

After a long vacation full of photography, you load all of your photos up in Picasa.  You spend hours tweaking colors, cropping, tagging faces, and plopping down geo-tags onto a big map.  The very next day and your computer won't start.  The drive is dead.

Despite the fact that those photos and all of your other digital memories are priceless to you, you have never spent the effort to set up a reasonable backup.  It was something you planned to do, but never got around to.  Three years of photos are now locked away in a lifeless hard drive.

You consult your friends.  You get 5 different recommendations for 5 different pieces of software that will attempt to recover the disk.  The platters won't spin though, so no software will help.  Next you try to find an exact duplicate of that disk and swap the platters.  This isn't remotely easy and has a very low chance of success.  Maybe you care about your photos enough that you even send the drive off to a professional drive recovery service.  Perhaps that works and 30% of your files are recovered, but the cost ends up sky high.

Most of my geek friends at least claim to have a backup system in place for the files that they care the most about.  These systems usually have one or more of the following flaws:

  • Technically complex, e.g. cron jobs, command lines, shell scripts.  (guilty)
  • Cost fairly large sums of money (the cheapest online backups I see usually start at $5/mo)
  • Require regular human action (swap out thumb drives, burn a CD).
These flaws generally aren't fatal for geeks, but they are for non-geeks (ie: family).  So, how do geeks approach the family tech support backup problem?  I'll share my solution with you, though I'm sure there are alternatives.

For software, we are going to install CrashPlan.  CrashPlan is a client/server backup system with a number of really handy features that we want:
  1. Unless you want to back up to CrashPlan's servers, it's free to use.
  2. It's a Java app that runs on multiple systems (Windows, Mac, Linux).
  3. It has a decent graphical UI that is non-technical.
  4. You can offer to be a backup destination for a friend.  The process is very simple for both of you: CrashPlan gives you a 6 character code, and if the friend enters that code in their client, they can back up to you.  Firewalls, dynamic IP addresses, etc. are all negotiated for you, keeping things simple.  Backups are encrypted before being sent, so there is no privacy risk.

This last feature is what I use for my family backups.  However, on my side things get a little more geeky/technical.

I don't like leaving my machines on when not in use, due to power consumption.  However, by default this would make backups challenging, as data will only transfer when both my machine and my family member's machine are up.

Instead, I use a Network Attached Storage device (NAS) to store my backups.  It's low-power-ish and always on, which saves me money over leaving an energy-hungry computer on all the time.  For CrashPlan, you'll need a NAS that has an x86 processor and allows you to run software on it.  I use the ReadyNAS Ultra 4 Plus.

On my NAS, I install the Community Plugin that enables Root SSH Access and reboot.  Now, I have root access to my NAS with the admin password used to set up the NAS.  Simply run: ssh root@nas

Now I need to add a new source to the end of /etc/apt/sources.list: deb http://archive.debian.org/debian-backports etch-backports main non-free

Next update our package list: apt-get update

Next we need to install Java.  We first reconfigure our dialog so we can accept the terms and conditions:  dpkg-reconfigure debconf
Select [1] for dialog and [3] for medium

Install Java: apt-get install sun-java6-jre
Select 'yes' for everything

You can reconfigure again (optional):  dpkg-reconfigure debconf

Select [6] for noninteractive and [3] for medium.

Finally, we can install Crashplan on the ReadyNAS.
wget http://download.crashplan.com/installs/linux/install/CrashPlan/CrashPlan_3.2.1_Linux.tgz

tar -xvf CrashPlan_3.2.1_Linux.tgz

cd CrashPlan-install


Defaults work for most questions except backup location.  I used /backup/crashplan.

Once installed, you can log out of your SSH connection.  Crashplan is running as a server.
Of course, you still need to do some configuration which can only be done from the Crashplan client UI.  From your computer, install Crashplan and follow these instructions for connecting to your server's headless client:

Once you have that set up, you'll be able to generate a CrashPlan backup code, something like FJSW3X.  Send this to your family, and ask them to install CrashPlan and use your backup code.  The first backup may take a while, but after that CrashPlan should keep up to date incrementally with no intervention or hassle from your family.

Jan 13, 2013

Software Development Books

I'm often looking to improve the software that I write.  If you are in the same boat, here are a few books that I feel have helped me.  This list is not exhaustive, just the ones I could think of off the top of my head that I'd recommend.  Please share others that have been good for you too; I'm always looking for more.

Note, there are affiliate codes in these links, though feel free to not use them, I don't really care.  If you are in the Bay Area, I would plug my favorite bookstore which frequently has some of this kind of stuff in stock (BookBuyers)

  • Algorithms on Strings, Trees, and Sequences:  Very likely the best book on string algorithms (and trees/sequences).  It references computational biology, but you need not know a tree from a frog to get a ton of value out of this book.  One of my Google coworkers is always borrowing this book.  If you are interested in more about the wonderful world of strings, this book will get you pretty far.
  • Refactoring: This was a very useful read when I read it a few years ago.  It came at the right time in my programming development.  This almost has less to do with the mechanics of how to refactor and more with how to structure code in the first place.  The examples are easy enough, but seeing them and the reasons why they reduce complexity helped a ton.
  • Design Patterns: I don't get as much value out of this as most people.  I don't find myself implementing the "X pattern" so much as perusing patterns has occasionally set off a light bulb in my head on how to structure things.  I feel like there is more I can learn from this still and intend to revisit.
  • Coders at Work: A collection of interviews with some of the big software developers in the field.  Full of lots of nuts and bolts insights and opinions on software development.  This isn't so much about software engineering, but about everything that goes on around it.  Unlike the above three books whose hardbacks are high-quality productions with diagrams, this one is a cheap paperback book with only text - there is no reason not to just grab the Kindle version.  Note I also read Founders at Work, but found it to concentrate more on things like fundraising / making deals - Coders was more relevant.
  • JavaScript: The Good Parts (O'Reilly):  More than a few people have mentioned that they never could wrap their brain around JavaScript until Crockford's book.  I found myself in the same position.  I've forgotten too much from this book as I don't use JavaScript frequently enough, but this is a great place to start if you want to understand it.  There is also an @Google Tech Talk from Crockford on the same subject that might give you a flavor.
  • Wireless Nation: The Frenzied Launch of the Cellular Revolution: A little off-topic, but this is a fascinating book that takes a look into how the cellular industry got started in the US.  It helps you to understand clearly how we got to where we are now, such as why the standards are so fragmented.  It's also a delightfully fun read.

Jan 1, 2013

2012, Looking back

2012 was an abysmal year for the Gregable blog.  Only 4 posts!  They were decent, but not great.  Google Plus has taken some of my steam for short form postings, but really the blame lies on my shoulders.

Anyway, 2013 should be better.  With this post, I'll already be caught up to March of 2012's volume.  Dear Gregable readers, what would you like to know more about?  Help me break out of my writer's block.

And... Happy New Year to you and yours!

Aug 22, 2012

rel=canonical as a browser feature

I informally propose that rel=canonical become a tag that not only search engines respect, but also browsers.

For a little while now, HTML5 browsers have had a feature where JavaScript can modify the displayed URL of the page using window.history.pushState.  The changes are of course subject to same origin policy rules (i.e. the protocol, hostname, and port cannot be modified, only the path and parameters).

Originally javascript folks hacked this in with older browsers by shoving text after the "#" symbol in the URL.  Even though there are a number of problems with this, it was useful enough that it became somewhat widely used.  With modern browsers this is no longer required.

To see what I mean, click on this little demo: http://kurtly.tumblr.com/sticky-history and look at the URL bar.  If you aren't running an outdated browser, you should see the URL changing every few hundred msec.  The page is not being re-fetched from the server.

The rel=canonical link tag has been telling search engines basically "I know you are fetching the URL http://gregable.com/foo, but I'd suggest you should pretend this URL is http://gregable.com/bar in your search index".  Basically the same idea as the window.history.pushState functionality, only for search engines.

I propose that a rel=canonical link tag on any HTML page which satisfies the same origin policy should visibly change the URL in the browser.  All the same motivations exist for this as they do in the browser.  If a user copy/pastes the displayed URL, they'll get a more satisfying experience.  If the user mis-types a URL (ie: .html vs .htm), sending them to the correct one generally requires a 301 redirect, which adds latency.  The JavaScript solution is less reliable, as users sometimes surf with JavaScript off, and the JavaScript may not execute until the page has finished loading either.
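To make the proposed rule concrete, here is a minimal Python sketch of the decision a browser would make.  Nothing here is an existing browser API: CanonicalFinder, origin, and displayed_url are hypothetical names, and the same-origin check is a simplified scheme/host/port comparison.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.href = attr["href"]


def origin(url):
    """(scheme, host, port) triple, filling in the default port when none is given."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def displayed_url(fetched_url, html):
    """URL the address bar would show under the proposal."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return fetched_url
    canonical = urljoin(fetched_url, finder.href)  # resolve relative hrefs
    # Only honor the canonical URL when it stays on the same origin.
    return canonical if origin(canonical) == origin(fetched_url) else fetched_url


page = '<html><head><link rel="canonical" href="/bar"></head></html>'
print(displayed_url("http://gregable.com/foo", page))  # prints http://gregable.com/bar
```

A canonical link pointing at a different host would be ignored and the fetched URL shown unchanged, mirroring the restriction pushState already has.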

Are there any obvious reasons I'm missing why this is a horrible idea?

Are there any regular Gregable readers who work on browser standards and might want to propose this more formally?

May 8, 2012

LED Bulbs

I've just been trying out some LED light bulbs and they seem to have progressed a great deal since the last time I played with them.  For recessed fixtures that have a narrow angle of lighting, they seem to be a pretty good deal.

Previous generations of LED light bulbs had problems:

  • Blueish color of light
  • Delay after turning on the wall switch
  • Wouldn't work with dimmer controls
  • Not as many lumens (brightness) as desired.
I've bought a couple different bulbs off of Amazon and tried them out.  I ended up really liking these ecoBrites: http://www.amazon.com/gp/product/B003THZHOU.  No affiliation / kickbacks for me at all, I'm sure there are other great options out there too.

They seem to solve all of the above problems, though they do look a little bit different than regular bulbs if you look at the bulb when it's turned off.  

The key is to look for bulbs of a certain "color temperature".  The blue or "cool" colors are a higher temperature (around 4000-5000K) whereas the yellowish incandescents tend to be a warmer color around 2500-3000K.  CFLs are usually a higher temperature too, though not usually blue, so they look very white.

My PG&E rates are tiered - 12.8c/kWh for the baseline, then it goes up to 14.6c/kWh for the next chunk, and I'm actually bumping a small amount into the 30c/kWh rate lately.  So, my incremental cost of shaving off power usage is 30c/kWh initially and if I can get it down enough, probably 14.6c/kWh.

The above bulbs are 7W and replace 60W incandescents.  So, I'm saving 53W while these run.  They cost $39/bulb though.  Very conservatively, let's go with the 14.6c/kWh rate.  That's roughly 0.77c/hr in savings.  Assume I run each bulb for only 2 hrs per day.  To save $39, it'll take 6.9 yrs to break even.  That's the conservative number.

If you assume only that:
  • I'm replacing a bulb, so would have to pay $7 anyway, the breakeven is 5.6 years
  • I need to buy a new incandescent bulb every ~750 hrs, the breakeven is 4.4 years
  • the extra 53W of heat an incandescent bulb generates needs to be matched by at least 53W of air conditioning work (likely far more due to inefficiency), the breakeven is 3.4 years
  • I'm actually reducing my bill by the 30c/kWh rate, the breakeven is 3.4 years
  • I'm using the bulb for 3 hrs / day, the breakeven is 4.6 years
If you assume all of the above, my breakeven becomes only 11 months.
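The breakeven arithmetic can be sketched in Python.  The function and scenario names below are made up for illustration, and it covers only the scenarios that need no extra assumptions: the bulb-lifetime and air conditioning adjustments are left out.

```python
def breakeven_years(bulb_cost, watts_saved, cents_per_kwh, hours_per_day):
    """Years until the electricity saved pays back the bulb's purchase price."""
    dollars_per_day = (watts_saved / 1000) * (cents_per_kwh / 100) * hours_per_day
    return bulb_cost / dollars_per_day / 365


watts_saved = 60 - 7  # 60W incandescent replaced by a 7W LED

scenarios = {
    "conservative": breakeven_years(39, watts_saved, 14.6, 2),
    "minus the $7 bulb I'd buy anyway": breakeven_years(39 - 7, watts_saved, 14.6, 2),
    "30c/kWh tier": breakeven_years(39, watts_saved, 30, 2),
    "3 hrs/day": breakeven_years(39, watts_saved, 14.6, 3),
}
for name, years in scenarios.items():
    print(f"{name}: {years:.1f} years")
```

Rounded to one decimal, these come out to about 6.9, 5.7, 3.4, and 4.6 years, in line with the figures above (the $7 case lands a hair above the quoted 5.6).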

In practice, the real story is probably somewhere in the middle.  I do need to buy incandescent replacements periodically, I sometimes need to use air conditioning, but certainly not always, and my savings is probably a mix between the 30c and 14.6c rates once all is said and done.  So maybe the breakeven is 2-3 years, so roughly a 26% return.  That still seems like a very good investment these days.