Feb 18, 2014

Network Aware Sheetfed Scanner

A little over 3 years ago, I posted a question asking about a consumer product available for scanning to a network drive.

These days there are plenty of options, so I went out and bought a Brother MFC-7860DW. I hadn't researched this heavily, so don't consider this a review. It's a scanner/printer that supports Wi-Fi (handy where I wanted to put it), prints duplex (yay), and can scan via ADF (automatic document feeder, aka sheetfed) straight to a pre-configured FTP location. I overlooked one feature that I wish I had: duplex ADF scanning.

Now that I have a scanner that uploads to a local FTP drive, I wanted more. I wanted to then automatically upload to a Google Drive folder. Google Drive has very good automatic OCR of all text in uploaded scans, is accessible anywhere, and it's much better than an FTP directory in terms of usability. I can even load documents on a cell phone app.

I started a small project that would monitor for new files on the FTP drive and then upload them to Google Drive. I thought this would take an afternoon, but 600 lines of code later I'm only now feeling done with the project. I did throw in learning Go as part of this, which didn't help things.

The project is here if it inspires anyone: https://github.com/Gregable/ScanServer

One of the big issues I didn't think about initially was that scanned files wouldn't be instantly "complete". Several seconds would pass between a file being created and the scanner finishing its upload. If I started processing the file too quickly, I'd get a partial upload. If I slept too long, I'd add too much latency to the whole process. I wanted to be able to look at the uploaded file within seconds of scanning it, so I could verify that it looked right before disposing of (shredding) the paper original, so extra latency wasn't acceptable.

Another issue was scanning duplex. I thought it should be possible to scan two single-sided documents and have my script merge them. As it turns out, this is possible, though tricky. If you flip a multiple-page document over, the page order is reversed. As a result, you need to interleave pages backwards. Something like:

  1. Front doc, Page 1
  2. Back doc, Page 3
  3. Front doc, Page 2
  4. Back doc, Page 2
  5. Front doc, Page 3
  6. Back doc, Page 1
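
The ordering above can be sketched in a few lines of Go. Pages are represented as plain strings here purely for illustration; the function name `interleaveDuplex` is hypothetical, and a real implementation would operate on PDF pages rather than labels.

```go
package main

import "fmt"

// interleaveDuplex merges the front-side pages (in reading order) with the
// back-side pages, which arrive in reverse order because the stack was
// flipped over before the second scan.
func interleaveDuplex(front, back []string) ([]string, error) {
	if len(front) != len(back) {
		return nil, fmt.Errorf("page count mismatch: %d front, %d back", len(front), len(back))
	}
	merged := make([]string, 0, 2*len(front))
	for i := range front {
		// Front pages count up while back pages count down.
		merged = append(merged, front[i], back[len(back)-1-i])
	}
	return merged, nil
}

func main() {
	front := []string{"Front doc, Page 1", "Front doc, Page 2", "Front doc, Page 3"}
	back := []string{"Back doc, Page 1", "Back doc, Page 2", "Back doc, Page 3"}
	pages, err := interleaveDuplex(front, back)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, p := range pages {
		fmt.Printf("%d. %s\n", i+1, p)
	}
}
```

Run as-is, this prints the same six-line ordering as the list above.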

My scanner lets you use its buttons to select different filename prefixes for uploaded docs, things like "laser", "receipt", and "estimate". I ended up choosing one of the prefixes to indicate duplex docs.

I then had to think about error cases. What if a user accidentally scans only the front half as duplex and marks the back half as single-sided? Or vice versa? In these cases, I wanted to upload the document as two single-sided documents. What if I saw two duplex documents come in, but they had different numbers of pages? In this case, I wanted to treat the first one as a single-sided document, but keep the second as potentially the first side of a duplex document that might come next. What if I saw a single duplex document come in and nothing followed? In this case, I wanted to wait 15 minutes to see if I got the other side of the duplex document, and if not, give up and treat it as a single-sided document. These cases were fun to work through.
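
Those rules amount to a small state machine that holds at most one pending duplex document. Here is a minimal sketch of that idea; the `Doc`, `Output`, and `Pairer` types are hypothetical names of mine and are not taken from ScanServer. The 15-minute wait is left to the caller, who would call `Flush` when a timer fires.

```go
package main

import "fmt"

// Doc describes one scanned file (hypothetical type for illustration).
type Doc struct {
	Name   string
	Pages  int
	Duplex bool // true if the filename carried the "duplex" prefix
}

// Output is a document ready for upload. Back is nil for single-sided output.
type Output struct {
	Front Doc
	Back  *Doc
}

// Pairer holds at most one duplex document waiting for its other side.
type Pairer struct {
	pending *Doc
}

// Add feeds the next scanned document in and returns whatever is now ready.
func (p *Pairer) Add(d Doc) []Output {
	if !d.Duplex {
		// A single-sided doc ends any wait: release the pending doc as-is.
		out := p.Flush()
		return append(out, Output{Front: d})
	}
	if p.pending == nil {
		p.pending = &d // first side; wait for the second
		return nil
	}
	if p.pending.Pages == d.Pages {
		out := []Output{{Front: *p.pending, Back: &d}}
		p.pending = nil
		return out
	}
	// Page counts differ: give up on the old doc, keep the new one pending.
	out := []Output{{Front: *p.pending}}
	p.pending = &d
	return out
}

// Flush releases a pending doc as single-sided, e.g. after a timeout.
func (p *Pairer) Flush() []Output {
	if p.pending == nil {
		return nil
	}
	out := []Output{{Front: *p.pending}}
	p.pending = nil
	return out
}

func main() {
	var p Pairer
	docs := []Doc{
		{"front.pdf", 3, true},
		{"back.pdf", 3, true},
		{"receipt.pdf", 1, false},
	}
	for _, d := range docs {
		for _, o := range p.Add(d) {
			if o.Back != nil {
				fmt.Printf("merge %s + %s\n", o.Front.Name, o.Back.Name)
			} else {
				fmt.Printf("upload %s single-sided\n", o.Front.Name)
			}
		}
	}
}
```

Keeping the timer outside the state machine makes the pairing logic easy to test without waiting on a clock.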

Ultimately, I had fun implementing this with Go channels and goroutines. I had one goroutine in a loop monitoring for new FTP documents and waiting until each one seemed to be "finished" uploading before sending it to a channel. I had another goroutine sorting out the duplex / single-sided issues, creating merged documents from duplex uploads, and handling the user error cases above. This goroutine sent the documents to be uploaded to Drive into another channel. A third goroutine uploaded to Drive and sent results to a "final" channel, whose consumer cleaned up any temporary files created and echoed success messages to the console.
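
The overall shape is a classic channel pipeline. The sketch below shows only that shape: the stage names `watchFTP`, `resolveDuplex`, and `uploadToDrive` are hypothetical, and each stage is a stub that tags a string rather than doing real FTP, PDF, or Drive work.

```go
package main

import "fmt"

// watchFTP stands in for the goroutine that notices finished uploads.
// Here it simply emits a fixed list of filenames.
func watchFTP(names []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, n := range names {
			out <- n
		}
	}()
	return out
}

// resolveDuplex stands in for the duplex / single-sided sorting stage.
func resolveDuplex(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for n := range in {
			out <- "resolved(" + n + ")"
		}
	}()
	return out
}

// uploadToDrive stands in for the upload stage.
func uploadToDrive(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for n := range in {
			out <- "uploaded(" + n + ")"
		}
	}()
	return out
}

func main() {
	scans := watchFTP([]string{"scan1.pdf", "scan2.pdf"})
	// The final loop plays the role of the cleanup/logging consumer.
	for result := range uploadToDrive(resolveDuplex(scans)) {
		fmt.Println("done:", result)
	}
}
```

Each stage closes its output channel when its input is exhausted, so shutdown propagates down the pipeline without extra signalling.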

I also moved this little project over to a Raspberry Pi. As a result, I don't even need a computer running to keep this setup going, and given the ~3W power draw of a Raspberry Pi, it's a no-brainer to leave it running continuously.

1 comment:

Mark said...

I can recall the first scanner I bought; it was a UMAX. Unlike today, where we have a whole range of suppliers, the choice was limited back then. UMAX scanners were reasonably priced but not cheap. Nonetheless, if you wanted one you had little choice: they were the cheapest quality brand and worked great, especially with OCR software such as TextBridge or OmniPage. It did what was needed, and at the time people were over the moon with the results.

There was a time in London, England, when a huge company was employing people just to file its documents, and they were doing so manually. I approached the director of the company with a system of my own, which was 100% original at the time and meant they could reduce their workers by only a fraction, and this earned me over £10,000 for less than one month’s work. What I knew about at the time was optical character recognition software: I knew how to program it, how to use it, and how to add it to a feed from the UMAX scanner. When I showed the director what I could do for the company, I became a living cash register for the company.

The scanning network you've come up with would have been useful and still can be in a modern-day work environment. In fact, many offices all around the world still have their office files in folders and need to have them scanned into a computer. However, one of the things that puts them off is that they imagine it to be very time-consuming. If one can illustrate it being applied in a simple way at the right price, it suddenly becomes appealing.

Can you add a photo of its construction, or perhaps a short YouTube video demonstrating it in action? That would be helpful. If this is not possible, perhaps you could draw a diagram? A third post on the subject or an update would be nice.
Scanning can be a deep subject and, believe it or not, many companies worldwide have files all over the place, so a little lesson in scanning is useful for everybody.