Linux Paperless Office?


RedAlpha3
07-09-2006, 09:35 AM
There are two areas of computing where I find it difficult to ditch Windows. The first is the world of popular gadgetry, which comes with proprietary Windows-based software and no mention of Linux on the box! (As I gain more experience, this problem has lessened.)

The second is the paperless office. I have used a program called Paperport for years. It helps me maintain an electronic scanned backup of all of the papers necessary in life, from Income Tax Returns to copies of postcards from friends. I have come to rely on it! However, Paperport loves to phone home! The XP security problems have me really nervous, and I have a genuine dislike of all things Micros*ft!!!

Does anyone use a Linux computer as a filing cabinet? (I'd love to be able to scan files and save them as PDF.) Does anyone have good experiences with software in this area? Has anyone read any articles on the subject? I'd welcome your experiences; hopefully I can combine a number of programs and get rid of XP for good.

Cheers

ryancw
07-09-2006, 10:20 PM
I don't scan everything like you do, but I have an HP LaserJet 3030, and with the hplip package installed, it scans using xsane. Xsane can scan into a variety of formats. I usually use png. I don't know if it can scan directly into pdf, but ImageMagick makes it very easy to convert formats: just type convert foo.png foo.pdf.

My Canon LiDE 20 scanner also works with xsane.

RedAlpha3
07-10-2006, 02:47 PM
I have been using quiteinsane with my ancient Epson Perfection 610 scanner (can't get the 1670 to work!) and it works well. Produces a reasonable copy.

The ImageMagick idea was new to me and it works very quickly. Is there any way I can convert a batch of files in order to save time?

Thanks for your help

bhobjj
07-10-2006, 06:03 PM
RedAlpha3 wrote:
Is there any way I can convert a batch of files in order to save time?

Thanks for your help

cd to the directory you are working on and then:

$ for i in *png; do convert $i `basename $i png`pdf; done


or you could open a text editor and save the script:

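#!/bin/sh
# This script converts every png file in the current directory to pdf.
for i in *.png
do
    # basename strips the .png suffix; the quotes keep names with spaces intact
    convert "$i" "$(basename "$i" .png).pdf"
done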


Now this brings up another question.
Is there an easy way to merge multiple pdf files into a single file?

-BoB

RedAlpha3
07-10-2006, 06:45 PM
Thanks for that Bob.

$ for i in *png; do convert $i `basename $i png`pdf; done

Does this work because ImageMagick has been installed, or are these commands part of another program?

I'm gradually getting the idea that I could produce a series of scripts which automate the whole document-saving process, including saving, converting, etc and use the command line to do what I have tried to find a GUI to do for me. Does that make sense?

bhobjj wrote:
Now this brings up another question.
Is there an easy way to merge multiple pdf files into a single file?

Heh heh! I feel like I'm being led by the hand, very gradually. This is exactly the next question, in order to save documents with multiple pages.

I really appreciate your help.

bhobjj
07-10-2006, 09:53 PM
RedAlpha3 wrote:
Thanks for that Bob.

$ for i in *png; do convert $i `basename $i png`pdf; done

Does this work because ImageMagick has been installed, or are these commands part of another program?

The bash shell is being used to automatically run the convert command once per file. Convert is part of ImageMagick.

for i in *png
for each file whose name ends in png

do convert $i
run the convert command on it ($i holds the name of the file currently being processed)

`basename $i png`pdf;
uses another command, basename, to strip the png suffix from the name, so that pdf can be appended as the extension of the output file.
The shell has a built-in way to do this, but it is not as clean looking:
${i%%.png}.pdf

done
the end of the loop

The same thing can be done for many tasks where you want to operate on multiple files.
Eg: use lame to convert wav files to mp3s:
$ for i in *wav; do lame -h $i `basename $i wav`mp3; done
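
A slightly more defensive variant of the png-to-pdf loop, as a sketch only: it quotes the file names so that names containing spaces survive, and it uses the shell's own suffix stripping instead of basename. The function name convert_all_png and the CONVERT variable are made up for this example; CONVERT exists only so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch: convert every .png in a directory to a .pdf of the same name.
# CONVERT defaults to ImageMagick's convert; set CONVERT=echo to print
# what would be run without touching any files.
CONVERT=${CONVERT:-convert}

convert_all_png() {
    dir=${1:-.}
    for i in "$dir"/*.png; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$i" ] || continue
        # ${i%.png} strips the .png suffix; the quotes keep spaces intact.
        "$CONVERT" "$i" "${i%.png}.pdf"
    done
}

# Usage: convert_all_png /path/to/scans
```

Running it with CONVERT=echo first is a cheap way to check the output names before any real conversion happens.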


RedAlpha3 wrote:
I'm gradually getting the idea that I could produce a series of scripts which automate the whole document-saving process, including saving, converting, etc and use the command line to do what I have tried to find a GUI to do for me. Does that make sense?

Yes
The power of the shell is amazing. Actually the power of a human brain deciding how to organize a series of commands to accomplish a task is amazing.
And you can use GUI front ends for your script if you want. KDE and GNOME both have dialog tools (kdialog and zenity, respectively) that a script can call.
The best place to start is the work that has already been done. Look at some shell scripts, play with them and customize them:

http://www.linuxcommand.org/learning_the_shell.php
http://tldp.org/guides.html#abs
http://zazzybob.com/
http://g-scripts.sourceforge.net/

-BoB

tom_servo
07-10-2006, 09:57 PM
Check out the page at:
http://ansuz.sooke.bc.ca/software/pdf-append.php
The gs command worked for me the one and only time I have ever had to combine a whole bunch of pdfs. Since your files are originally graphics anyway, the talk about making huge pdfs by converting the text to graphics with the pdf2ps route doesn't really apply.
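
For reference, a Ghostscript merge usually looks something like the sketch below. It is not necessarily the exact command from the linked page. The flags are standard Ghostscript options (pdfwrite output device, batch mode, no pause between pages); the function name merge_pdfs and the GS variable are invented for this example, and GS exists only so the command line can be inspected without running Ghostscript.

```shell
#!/bin/sh
# Sketch: merge several PDFs into one with Ghostscript's pdfwrite device.
# GS defaults to gs; set GS=echo to print the command instead of running it.
GS=${GS:-gs}

merge_pdfs() {
    out=$1
    shift
    # -dNOPAUSE/-dBATCH: run non-interactively and exit when done.
    # -q: suppress the start-up banner.
    "$GS" -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}

# Usage: merge_pdfs bigfoo.pdf foo1.pdf foo2.pdf foo3.pdf
```

The input files are appended in the order given, so zero-padded names (page01.pdf, page02.pdf, ...) keep a shell glob in page order.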

As an alternative, it might be easier to leave them as .png files, and create an HTML file to display them in order. HTML is easy to create in a script, and I am lazy. :)
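
A rough sketch of that HTML idea, assuming the page images sit in one directory and their names sort into the right order. The function name make_index and the output name index.html are made up for illustration, and nothing here escapes special characters in file names.

```shell
#!/bin/sh
# Sketch: write index.html showing every .png in a directory, in name order.
# Assumes file names contain no &, <, > or quote characters.
make_index() {
    dir=${1:-.}
    {
        echo '<html><body>'
        for f in "$dir"/*.png; do
            # Skip the unexpanded pattern when there are no png files.
            [ -e "$f" ] || continue
            name=${f##*/}    # strip the directory part
            printf '<p><img src="%s" alt="%s"></p>\n' "$name" "$name"
        done
        echo '</body></html>'
    } > "$dir/index.html"
}

# Usage: make_index /path/to/scans
```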



Alain

ryancw
07-10-2006, 10:09 PM
bhobjj wrote:
Now this brings up another question.
Is there an easy way to merge multiple pdf files into a single file?

-BoB

Yes, I think so.

convert foo1.pdf foo2.pdf foo3.pdf bigfoo.pdf

will put all three foo files into one bigfoo.pdf file. Each little foo file will still exist, so you may want to delete them after, or not.

bhobjj
07-11-2006, 07:01 AM
RedAlpha3 wrote:
Now this brings up another question.
Is there an easy way to merge multiple pdf files into a single file?

Heh heh! I feel like I'm being led by the hand, very gradually. This is exactly the next question, in order to save documents with multiple pages.

I really appreciate your help.

I found a Debian package (http://packages.debian.org/stable/text/pdfjam) called pdfjam (http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/software/pdfjam/) that looks good.
It is a set of scripts for manipulating pdf files.

-BoB

tom_servo
07-11-2006, 07:58 AM
ryancw wrote:
Code:
convert foo1.pdf foo2.pdf foo3.pdf bigfoo.pdf

Oh! So obvious. If only I had known! I never would have thought of convert for pdf files. I was even surprised it would convert images to pdf. There seems to always be something to learn in these forums, thanks.


Alain

danieldk
07-11-2006, 08:36 AM
ryancw wrote:
Code:
convert foo1.pdf foo2.pdf foo3.pdf bigfoo.pdf

One warning: don't do this with PDF files that contain text! convert will rasterize the PDF files, so all text will be rendered as bitmaps. You'll lose sharpness both on screen and in print, and you'll end up with a jumbo PDF. One of the intentions of PDF is to be able to store text and font information, so that the final renderer (whether it is Acrobat Reader, Ghostscript, or a printer after conversion to PostScript) can render it to a device-specific high quality format.
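
For scans that are still available as images, one way to sidestep the problem is to build the multi-page PDF directly from the page images instead of from per-page PDFs, so that nothing gets rasterized a second time. A minimal sketch, assuming ImageMagick's convert is installed; the function name pages_to_pdf and the IM_CONVERT variable are invented here for illustration.

```shell
#!/bin/sh
# Sketch: combine several page images into one multi-page PDF.
# IM_CONVERT defaults to ImageMagick's convert; set IM_CONVERT=echo
# to print the command instead of running it.
IM_CONVERT=${IM_CONVERT:-convert}

pages_to_pdf() {
    out=$1
    shift
    # convert takes the input images first and the output file last;
    # each input image becomes one page of the output PDF.
    "$IM_CONVERT" "$@" "$out"
}

# Usage: pages_to_pdf letter.pdf page1.png page2.png page3.png
```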

RedAlpha3
07-11-2006, 09:52 AM
I've been trying to analyze what I want to do with the items I scan, and can see that it seems simple, but isn't!

I will scan an item and then save it in one or another of many formats. Any documents can contain text, graphics or both. Some documents would have several pages.

The documents need to be easily retrievable in a logical filing system (straightforward) and then either displayed and perused on a monitor or printed onto A4 paper.

Basically, that is it! You want the documents to be easy to read and clear, to take as little storage space as possible. Not sure what else.
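
One way the filing step could be scripted, purely as a sketch: the year/category directory layout, the ARCHIVE location and the function name file_document are all assumptions made up for this example, not an established convention.

```shell
#!/bin/sh
# Sketch: file a scanned PDF under ARCHIVE/<year>/<category>/.
# The layout and the ARCHIVE default are assumptions for illustration.
ARCHIVE=${ARCHIVE:-$HOME/archive}

file_document() {
    src=$1
    category=$2
    year=$3
    dest="$ARCHIVE/$year/$category"
    mkdir -p "$dest" || return 1
    # Refuse to overwrite a document that has already been filed.
    if [ -e "$dest/${src##*/}" ]; then
        echo "already filed: $dest/${src##*/}" >&2
        return 1
    fi
    mv "$src" "$dest/"
}

# Usage: file_document tax-return.pdf tax 2006
```

With a layout like this, retrieving a document is a matter of browsing by year and category, or running find or ls over the archive directory.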

The PDFJam series of scripts looks very promising. I'm experimenting with that now.


ryancw wrote:
Code:
convert foo1.pdf foo2.pdf foo3.pdf bigfoo.pdf


One warning: don't do this with PDF files that contain text! convert will rasterize the PDF files, so all text will be rendered as bitmaps. You'll lose sharpness both on screen and in print, and you'll end up with a jumbo PDF. One of the intentions of PDF is to be able to store text and font information, so that the final renderer (whether it is Acrobat Reader, Ghostscript, or a printer after conversion to PostScript) can render it to a device-specific high quality format.

There is a noticeable loss of sharpness on screen, as you suggest, Daniel, but I can't see a major loss of quality in hard copy. (Perhaps my 300dpi laser printer is a bit past it!) It would be wise to keep the best quality I can, though.

Thanks for all of your thoughts :)

Magnus
07-16-2006, 11:06 AM
If you just want to merge PDF files I think the command pdfjoin would be worth considering. As far as I know there is no quality loss involved. I'm not able to check which Debian package provides it, but it should at least be available from the Debian Sarge repository unless I'm completely wrong.

EDIT: After further checking I see it's actually a part of the pdfjam package mentioned above.

Good luck
/Magnus