bhobjj
02-12-2007, 10:26 PM
There is a problem with trying to grep OpenOffice "ods" spreadsheet files.
The files are actually compressed (zip) archives, and the cell contents are stored in a file called "content.xml" inside the archive. I came up with this:
for i in *.ods; do echo "$i"; unzip -p "$i" content.xml | o3totxt | grep -i "searchstring"; done
Echoing the file name matters: grep only ever sees the piped "content.xml" stream, so without the echo there is no way to tell which spreadsheet a match came from.
This uses o3totxt, part of the o3read (http://siag.nu/o3read/) package from Siag Office (a Debian package is available).
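If you have GNU grep, one possible alternative to echoing every file name is to let grep label the piped input itself: `-H` forces a file-name prefix and `--label` supplies the name to print for standard input. The sketch below assumes GNU grep, unzip and o3totxt are on the PATH; `search_ods` and `list_ods` are hypothetical helper names, not part of o3read. `list_ods` uses `grep -q` so that only the names of matching spreadsheets are printed.

```shell
# search_ods and list_ods are hypothetical helpers, not part of o3read.
# Assumes GNU grep (for --label), unzip and o3totxt are installed.

# Print every matching line, prefixed with the spreadsheet it came from.
# Usage: search_ods PATTERN FILE.ods...
search_ods() {
    local pattern=$1 f
    shift
    for f in "$@"; do
        # -H forces a file-name prefix; --label names the stdin stream
        unzip -p "$f" content.xml | o3totxt | grep -H --label="$f" -i -- "$pattern"
    done
}

# Print only the names of spreadsheets that contain PATTERN.
# Usage: list_ods PATTERN FILE.ods...
list_ods() {
    local pattern=$1 f
    shift
    for f in "$@"; do
        # grep -q prints nothing; its exit status says whether it matched
        if unzip -p "$f" content.xml | o3totxt | grep -qi -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `search_ods "searchstring" *.ods` prints lines such as `budget.ods:matching text`, while `list_ods "searchstring" *.ods` prints just the matching file names.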
The same script with a zenity front-end:
#!/bin/bash
######################################
ver=0.1.0
# RKN Newport New Hampshire USA
# 07-02-08
# scalcs
# search for strings in Oo calc files
# requires zenity, unzip and o3read (o3totxt)
#####################################
# mktemp is more portable than Debian's tempfile; delete the file on exit
tempfile=$(mktemp) || exit 1
trap 'rm -f "$tempfile"' EXIT
# zenity exits non-zero when the user presses Cancel
Directory=$(zenity --file-selection --directory --title="Search Directory" --text="Choose a directory to search") || exit 1
Search=$(zenity --entry --title="search Oocalc files" --text="Enter Search String") || exit 1
#`kdialog --inputbox "search Oocalc files"` # can this be done with kdialog?
for i in "$Directory"/*.ods
do
    [ -e "$i" ] || continue   # no .ods files: skip the unexpanded pattern
    echo "$i" >> "$tempfile"
    # uncompress xml file from the archive --> strip xml markup --> send to grep
    unzip -p "$i" content.xml | o3totxt | grep -i -- "$Search" >> "$tempfile"
done
zenity --text-info --title="Search Results" --width=500 --height=250 --filename="$tempfile"
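On the kdialog question in the script's comments: kdialog offers comparable dialogs (`--getexistingdirectory`, `--inputbox` and `--textbox`), so the same front end could plausibly be built with it. The following is a sketch assuming a KDE system with kdialog, unzip and o3totxt installed; it swaps the zenity calls for kdialog ones and otherwise follows the script above. `kde_scalcs` is a hypothetical function name.

```shell
# kde_scalcs: hypothetical kdialog version of the zenity front end.
# Assumes kdialog, unzip and o3totxt (from o3read) are installed.
kde_scalcs() {
    local dir search tmp f
    # kdialog prints the chosen value on stdout and exits non-zero on Cancel
    dir=$(kdialog --getexistingdirectory "$HOME") || return 1
    search=$(kdialog --inputbox "Enter search string") || return 1
    tmp=$(mktemp) || return 1
    for f in "$dir"/*.ods; do
        [ -e "$f" ] || continue   # no .ods files: skip the unexpanded pattern
        echo "$f" >> "$tmp"
        unzip -p "$f" content.xml | o3totxt | grep -i -- "$search" >> "$tmp"
    done
    # --textbox FILE [WIDTH HEIGHT] shows the file in a read-only window
    kdialog --textbox "$tmp" 500 250
    rm -f "$tmp"
}
```

Define the function (for instance in a small script) and then run `kde_scalcs`.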
I played around with sed (I had trouble getting consistent carriage returns) and tried a few other scripts to strip the XML markup, but o3read does a much cleaner job.
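For systems without o3read, a crude sed filter can still make content.xml greppable. The main trick for the line-break problem is to turn each closing `</text:p>` tag (the ODF element that holds a cell's text) into a newline before deleting the remaining tags. This is a rough sketch, not a replacement for o3totxt: it assumes GNU sed (for `\n` in the replacement), leaves XML entities such as `&amp;` undecoded, and simply drops spacing elements such as `<text:s/>`. `ods_to_lines` is a hypothetical name.

```shell
# ods_to_lines: hypothetical stand-in for o3totxt when o3read is missing.
# Reads content.xml on stdin, prints roughly one line per cell paragraph.
# Assumes GNU sed (\n in a replacement); entities like &amp; stay encoded.
ods_to_lines() {
    # 1) end each <text:p> paragraph with a newline
    # 2) delete every remaining tag
    # 3) drop empty lines left behind (e.g. by the <?xml ...?> header)
    sed -e 's|</text:p>|\n|g' -e 's/<[^>]*>//g' | sed '/^$/d'
}
```

Usage: `unzip -p budget.ods content.xml | ods_to_lines | grep -i "searchstring"`.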