Topic: recoll seems to only index html and txt files [SOLVED]

Offline spottyrover

  • Full Member
  • ***
  • Posts: 80
recoll seems to only index html and txt files [SOLVED]
« on: January 27, 2013, 09:34:01 PM »
I am trying to use recoll to index my drive so that I can search inside doc and pdf files.
I noticed this warning after the initial indexing run:


External applications/commands needed and not found for indexing your file types: (/home/david/.recoll//missing):

pstotext (application/postscript)
rcldoc (application/msword)
rclpdf (application/pdf)
rclpython (text/x-python)
rclsoff (application/vnd.sun.xml.calc application/vnd.sun.xml.draw application/vnd.sun.xml.writer)
rclsvg (image/svg+xml)
rcltex (text/x-tex)
rclxml (application/xml text/xml)
xls2csv (application/vnd.ms-excel)

So my question is: where do I find these programs? They are not available in Synaptic.

Thanks for your help

Dave
« Last Edit: February 19, 2013, 12:25:35 AM by spottyrover »
pclos 32bit lxde (fast), kernel 3.2.18-pclos2.pae, AMD Athlon(tm) II X4 620, nvidia gforce 8400gs driver 310.19-1pclos. 12 gig ram
Favorite software tvheadend / Xbmc

Offline medoc

  • New Friend
  • *
  • Posts: 7
Re: recoll seems to only index html and txt files.
« Reply #1 on: January 29, 2013, 11:11:12 AM »
Recoll should list the installable applications, not the names of the filters. This is a bug! I'll look into it.

[edit] If recollindex lists the names of the filter programs instead of the names of the actual applications, that suggests an installation problem: the filter scripts are probably missing or not executable. The filter scripts (rcldoc, rclpdf, etc.) should be found under /usr/share/recoll/filters, and should be executable. Could you please check that this is the case?
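One way to run that check is sketched below in Python. It is only an illustration and not part of Recoll: the function name non_executable_files is made up for this example, and the directory is the /usr/share/recoll/filters path mentioned above. It lists every regular file that is not executable by user, group and other, which matters here because the scripts belong to root while indexing runs as an ordinary user.

```python
import os
import stat
from pathlib import Path


def non_executable_files(directory):
    """Return the sorted names of regular files in `directory` that are
    not executable by user, group AND other (i.e. lack a+x)."""
    all_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # 0o111
    missing = []
    for entry in Path(directory).iterdir():
        if not entry.is_file():
            continue  # skip subdirectories and other non-regular entries
        if (entry.stat().st_mode & all_exec) != all_exec:
            missing.append(entry.name)
    return sorted(missing)


if __name__ == "__main__":
    # Path taken from the post above; adjust it if your installation differs.
    filters_dir = "/usr/share/recoll/filters"
    if os.path.isdir(filters_dir):
        for name in non_executable_files(filters_dir):
            print(name)
    else:
        print(f"{filters_dir} not found")
```

If the script prints nothing, every file in the directory already has its execute bits set.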

You'll find information about the applications that you should install to index the different formats on the Recoll Web site: http://www.recoll.org/features.html
« Last Edit: January 29, 2013, 11:47:08 AM by medoc »

Offline spottyrover

Re: recoll seems to only index html and txt files.
« Reply #2 on: February 01, 2013, 05:37:57 AM »
Sorry it took me so long to get back to you.

In reply to
  The filter scripts (rcldoc, rclpdf, etc.) should be found under /usr/share/recoll/filters, and should be executable. Could you please check that this is the case?

The answer is no: they are not executable, and they are owned by root.

I have made the following executable (they are still owned by root):


rcldoc (application/msword)
rclpdf (application/pdf)
rclpython (text/x-python)
rclsoff (application/vnd.sun.xml.calc application/vnd.sun.xml.draw application/vnd.sun.xml.writer)
rclsvg (image/svg+xml)
rcltex (text/x-tex)
rclxml (application/xml text/xml)
However, I could not find xls2csv (application/vnd.ms-excel) or pstotext (application/postscript).


I also installed as many of the applications as I could from http://www.recoll.org/features.html

Thanks for your time
I will test the changes over the next few days and let you know what happens.

Thanks again

Dave
« Last Edit: February 05, 2013, 08:55:39 PM by spottyrover »

Offline medoc

Re: recoll seems to only index html and txt files.
« Reply #3 on: February 02, 2013, 04:55:08 AM »
Hello,

I installed PCLinuxOS in a virtual machine to check on your problem, and the filters are indeed installed as non-executable.

This is a packaging issue. The packager should be notified, but I don't know where to report the bug.

Cheers,

jf

Offline TerryN

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 740
Re: recoll seems to only index html and txt files.
« Reply #4 on: February 02, 2013, 05:05:11 AM »
Quote from medoc:
  This is a packaging issue. The packager should be notified, but I don't know where to report the bug.

Report it here in this thread  ;)

Terry
« Last Edit: February 02, 2013, 05:20:55 AM by TerryN »
Dell E521 - AMD 64 X2 5000+, 4GB RAM, ATI X1300 graphics
PCLinuxOS 2013 (KDE)

Offline medoc

Re: recoll seems to only index html and txt files.
« Reply #5 on: February 02, 2013, 06:21:14 AM »
OK, so here goes the bug report:

The PCLinuxOS package for Recoll (recoll-1.18.1-1pclos2013) installs the filter scripts without the executable bit, so indexing always fails for every file type except the few handled internally (text/plain, text/html...).


Offline spottyrover

Re: recoll seems to only index html and txt files.
« Reply #6 on: February 05, 2013, 09:04:53 PM »
Thanks for all of your help

So far, at the office, I have only made the filters listed above executable, and everything I have tested now works.
Should I make all of the files in the folder executable?
pstotext (listed on the Recoll website) is not in the repository.
I installed catdoc for MS Excel files (not sure whether it has done anything yet).
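A quick way to see whether helper programs such as xls2csv or pstotext are actually available is to look them up on the PATH. The Python sketch below does this with shutil.which. The function name missing_commands is invented for this example, and the command names are simply the ones from the warning quoted at the start of the thread.

```python
import shutil


def missing_commands(names):
    """Return the commands from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Helper commands named in the Recoll warning earlier in this thread.
    helpers = ["pstotext", "xls2csv"]
    absent = missing_commands(helpers)
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All helper commands were found.")
```

This only tells you whether the commands can be found. It does not check that Recoll can use them, so re-running the indexer is still the real test.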

Thanks again

Dave

Offline medoc

Re: recoll seems to only index html and txt files.
« Reply #7 on: February 06, 2013, 12:19:30 AM »
Yes, all the files in the 'filters' directory should be executable.
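As a rough sketch of how that could be done in one pass, the Python snippet below adds the execute bits to every regular file in a directory. It is an illustration under stated assumptions, not an official Recoll tool: make_executable is a hypothetical helper name, and for /usr/share/recoll/filters it would have to be run as root because the files are owned by root.

```python
import stat
from pathlib import Path


def make_executable(directory):
    """Add execute permission for user, group and other to every regular
    file in `directory`. Returns the names of the files that were changed."""
    changed = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue  # leave subdirectories untouched
        mode = stat.S_IMODE(entry.stat().st_mode)
        new_mode = mode | 0o111  # equivalent of chmod a+x
        if new_mode != mode:
            entry.chmod(new_mode)
            changed.append(entry.name)
    return changed
```

For example, calling make_executable("/usr/share/recoll/filters") as root returns the list of scripts whose permissions were updated. Running it a second time returns an empty list.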

pstotext is sometimes packaged with ghostscript, but I don't know for sure; this depends on the distribution.

Cheers,

jf

Offline TerryN

Re: recoll seems to only index html and txt files.
« Reply #8 on: February 06, 2013, 04:21:58 PM »
Quote from medoc:
  OK, so here goes the bug report:
  The PCLinuxOS package for Recoll (recoll-1.18.1-1pclos2013) installs the filter scripts without the executable bit, so indexing always fails for every file type except the few handled internally (text/plain, text/html...).

Updated package will appear in the repo soon.

Terry

Online JohnW_57

  • PCLinuxOS Tester
  • Hero Member
  • *******
  • Posts: 2259
Re: recoll seems to only index html and txt files.
« Reply #9 on: February 06, 2013, 04:30:47 PM »
Quote from TerryN:
  Quote from medoc:
    OK, so here goes the bug report:
    The PCLinuxOS package for Recoll (recoll-1.18.1-1pclos2013) installs the filter scripts without the executable bit, so indexing always fails for every file type except the few handled internally (text/plain, text/html...).
  Updated package will appear in the repo soon.
  Terry

An oops here:

ib64uuid-devel (version 2.18-2pclos2011) will be installed
lib64xapian-devel (version 1.2.6-1pclos2011) will be installed
libstdc++-devel (version 4.5.2-4pclos2011) will be installed
recoll (version 1.18.1-2pclos2013) will be installed
zlib1-devel (version 1.2.5-2pclos2011) will be installed

JohnW

PCLinuxOS 2013 KDE4 (64 bit) on: home build system:  Intel Core 2 Quad (q6700) (2.66ghz), Asus P5K motherboard, 4 gig ddr2 memory, Asus Nvidia Geforce GTS 250 1024 mb gddr3, Crucial M4 128 SSD,  2x Samsung 500 gig HDD (sata), TSSTcorp CDDVDW SH-224BB.

Offline TerryN

Re: recoll seems to only index html and txt files.
« Reply #10 on: February 07, 2013, 03:40:23 AM »
Quote from JohnW_57:
  An oops here:
  ib64uuid-devel (version 2.18-2pclos2011) will be installed
  lib64xapian-devel (version 1.2.6-1pclos2011) will be installed
  libstdc++-devel (version 4.5.2-4pclos2011) will be installed
  recoll (version 1.18.1-2pclos2013) will be installed
  zlib1-devel (version 1.2.5-2pclos2011) will be installed

That problem was present in 1.16.1-1, fixed in 1.17.1-1 and re-introduced in 1.18.1-1  ???
Fixed (again) in 1.18.1-3 coming soon  ;D

Terry

Offline spottyrover

Re: recoll seems to only index html and txt files [SOLVED]
« Reply #11 on: February 19, 2013, 12:26:12 AM »
Thank you all for your help

Dave