Joomla/Mambo – PDF Indexer Module

PDF Indexer

Allow PDFs to be searched via the Joomla/Mambo search module.

This Joomla/Mambo component indexes the PDFs located within your Joomla directory, and the corresponding mosbot makes that index searchable through the Joomla search component. As a result, the text inside your PDFs shows up when someone searches your Joomla site.

Version 2.4

New Features:
* Joomla 1.5 legacy support
* More bug fixes

Also Featuring:
* Indexes only new PDFs, so indexing is much faster.
* Detects PDF file changes. If a PDF has changed, it is automatically re-indexed on the next pass.
* Deletes index entries for PDFs that have been removed from your file structure.
* Password-protected PDF indexing!
* Ability to edit past indexes (for image-based PDFs, add keywords and phrases)
* Improved MosBot
* Various other bug fixes

Does not work on servers running PHP in safe mode or with popen() disabled.
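
A quick way to see whether a server is affected is to ask PHP itself. The following is only a rough sketch: the function name pdf_indexer_env_problems is made up for this example and is not part of PDF Indexer, and it assumes that "Popen is off" means popen() is missing or listed in disable_functions.

```php
<?php
// Hypothetical helper (not part of PDF Indexer): list reasons why a
// popen()-based indexer would probably not work on this server.
function pdf_indexer_env_problems()
{
    $problems = array();

    // safe_mode only exists before PHP 5.4; on newer versions ini_get()
    // returns false for it, which is treated here as "not enabled".
    $safeMode = ini_get('safe_mode');
    if ($safeMode !== false
        && in_array(strtolower((string) $safeMode), array('1', 'on', 'true', 'yes'), true)) {
        $problems[] = 'safe_mode is enabled';
    }

    // popen() can be switched off through the disable_functions directive.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    if (!function_exists('popen') || in_array('popen', $disabled, true)) {
        $problems[] = 'popen() is not available';
    }

    return $problems;
}

$problems = pdf_indexer_env_problems();
echo $problems ? implode("\n", $problems) . "\n" : "Environment looks OK\n";
```

Drop this into a scratch file on the server and open it in a browser, or run it from the command line. Keep in mind that the web server and the command line can use different php.ini files.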

Great work! Really!
But I ran into a big problem…

I have tested PDF Indexer with Joomla 1.5 and it works PERFECTLY with “small” PDF files.
By “small” I mean up to 1 MB.
When I tried a “bigger” PDF file, like 10 MB or, even worse, 40 MB or 80 MB, indexing seemed to work (no errors were reported), but when I looked for the file under “Modify Indexes” in the Administration Menu… it wasn’t there.
🙁

Solution:

1. Edit the file…
/administrator/components/com_file_index/admin.file_index.php

Lines 366,445:
Change this…
$contents .= fread($handle2, 8192);
to this…
$contents .= fread($handle2, $fileSize);

Then add the following line near the top of the file…
ini_set("memory_limit","128M");
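
For context on the fread() change: a single fread() call returns at most the number of bytes you ask for, so the text of a big file has to be collected either with one large read or with a loop. Here is a minimal sketch of the loop variant. The helper name read_all and the in-memory demo stream are my own assumptions; how the component actually opens $handle2 is not shown here.

```php
<?php
// Hypothetical helper: read everything from an open stream, one chunk at a time.
function read_all($handle, $chunkSize = 8192)
{
    $contents = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: stop instead of looping forever
        }
        $contents .= $chunk;
    }
    return $contents;
}

// Demo on an in-memory stream holding 20000 bytes, more than one chunk.
$handle = fopen('php://memory', 'w+');
fwrite($handle, str_repeat('a', 20000));
rewind($handle);
$text = read_all($handle);
fclose($handle);
echo strlen($text), "\n"; // prints 20000
```

Either way the whole text ends up in one PHP string, which is why memory_limit has to be raised for big PDFs.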

Alternative: if you have access to it, change memory_limit = 32M to memory_limit = 128M in your /etc/php.ini file, then restart Apache!

2. Edit the file…
/etc/my.cnf
set-variable = max_allowed_packet=xM
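where xM is the size you need in megabytes (for example, 5M). On newer MySQL versions the set-variable prefix is no longer accepted; there, write max_allowed_packet=xM on its own.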
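
To get a feeling for which value of x is enough, you can compare the size of the extracted text with the packet limit. This is only a rough sketch: both function names are invented for this example, and the real INSERT statement is somewhat larger than the text itself (SQL overhead, escaping), so leave some headroom.

```php
<?php
// Hypothetical helper: convert MySQL-style shorthand such as "5M" into bytes.
// MySQL treats K, M and G as powers of 1024.
function shorthand_to_bytes($value)
{
    $value  = trim($value);
    $number = (int) $value; // "5M" casts to 5
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Hypothetical check: would text of $textBytes bytes fit below the limit?
function fits_in_packet($textBytes, $maxAllowedPacket)
{
    return $textBytes < shorthand_to_bytes($maxAllowedPacket);
}

var_dump(fits_in_packet(1000000, '5M')); // bool(true)
var_dump(fits_in_packet(6000000, '5M')); // bool(false)
```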

3. Fire up phpMyAdmin or open your favorite MySQL manager.
Go to the Joomla database, find the table that stores the index data, and change the type of the Description field to LONGTEXT.
Restart MySQL!
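
If you prefer plain SQL over clicking through phpMyAdmin, the field change comes down to one ALTER TABLE statement. The sketch below only builds that statement as a string. The table name jos_file_index and the column name description are placeholders I made up, so look up the real names in your database first.

```php
<?php
// Hypothetical helper: backtick-quote a MySQL identifier.
function quote_identifier($name)
{
    return '`' . str_replace('`', '``', $name) . '`';
}

// Hypothetical helper: build the statement that switches a column to LONGTEXT.
function build_longtext_alter($table, $column)
{
    return 'ALTER TABLE ' . quote_identifier($table)
        . ' MODIFY ' . quote_identifier($column) . ' LONGTEXT';
}

// Placeholder names, not taken from the component.
echo build_longtext_alter('jos_file_index', 'description'), ";\n";
// prints: ALTER TABLE `jos_file_index` MODIFY `description` LONGTEXT;
```

Note that MODIFY replaces the whole column definition, so attributes such as NOT NULL are lost unless you repeat them. SHOW CREATE TABLE shows the current definition.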

That’s all!

The results were great. A 78 MB PDF file was indexed in a few seconds!
😉