Extract text from MS Office
Some useful programs for extracting text:
- catdoc - extract text from MS Word
- xls2csv - extract text from MS Excel
- pdftotext - extract text from PDF
- ppthtml - extract text from MS PowerPoint
Then a simple PHP function can capture the output, e.g.:
function extractWord($word_file)
{
    if (file_exists($word_file))
    {
        // escapeshellarg() prevents malicious command execution
        exec('/usr/bin/catdoc -w ' . escapeshellarg($word_file), $output);
        // $output is an array corresponding to lines of output
        return join("\n", $output);
    }
    // the file does not exist
    return null;
}
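
The same pattern can probably be extended to the other tools in the list. The following is only a rough sketch: the function names buildExtractCommand() and extractText() are made up for illustration, the /usr/bin paths are assumptions about where the tools are installed, and choosing the tool by file extension is a simplification.

```php
<?php
// Hypothetical helper: picks a command line based on the file extension.
// The /usr/bin paths are assumptions; adjust them to your installation.
function buildExtractCommand($file)
{
    // extension => array(command prefix, command suffix)
    $tools = array(
        'doc' => array('/usr/bin/catdoc -w', ''),
        'xls' => array('/usr/bin/xls2csv', ''),
        'pdf' => array('/usr/bin/pdftotext', ' -'), // "-" sends the text to stdout
        'ppt' => array('/usr/bin/ppthtml', ''),
    );
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    if (!isset($tools[$ext])) {
        return null; // no known tool for this extension
    }
    list($prefix, $suffix) = $tools[$ext];
    // escapeshellarg() prevents malicious command execution
    return $prefix . ' ' . escapeshellarg($file) . $suffix;
}

// Hypothetical wrapper: runs the command and returns its output as one string.
function extractText($file)
{
    if (!file_exists($file)) {
        return null;
    }
    $command = buildExtractCommand($file);
    if ($command === null) {
        return null;
    }
    exec($command, $output, $status);
    // a non-zero exit status is treated as a failure
    if ($status !== 0) {
        return null;
    }
    return join("\n", $output);
}
```

Note that ppthtml writes HTML rather than plain text, so its output may need to be passed through strip_tags() before use.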