I am working on a problem for which I need to extract the text from PDF files, and I am using PDFBox to do this. I run this command in a terminal (Ubuntu Linux):
java -jar pdfbox-app-1.8.7.jar ExtractText [path leading to file here]
This works for a single file. However, I want to do it recursively for thousands of files in a particular directory, so that I do not have to enter each PDF path by hand. I would appreciate any solution that works from the terminal or from a script.
You can use the find command:
find /path/to/directory -type f -exec java -jar pdfbox-app-1.8.7.jar ExtractText {} \;
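If the directory also contains files that are not PDFs, it may help to restrict the search by name. The following is a minimal sketch, not a definitive script: the function name extract_all_pdfs and the PDFBOX_JAR variable are made up for illustration, and it assumes java is on the PATH and that the find in use supports -iname (GNU find on Ubuntu does).

```shell
#!/bin/sh
# Assumed location of the PDFBox jar; override by setting PDFBOX_JAR beforehand.
PDFBOX_JAR="${PDFBOX_JAR:-pdfbox-app-1.8.7.jar}"

# Hypothetical helper: run ExtractText on every PDF below the given directory.
extract_all_pdfs() {
    dir="$1"
    # -type f        : regular files only
    # -iname '*.pdf' : match .pdf and .PDF, skip everything else
    # -exec ... {} \; : start one java process per matching file
    find "$dir" -type f -iname '*.pdf' \
        -exec java -jar "$PDFBOX_JAR" ExtractText {} \;
}

# Example call (placeholder path):
# extract_all_pdfs /path/to/directory
```

As far as I know, ExtractText writes a .txt file next to each PDF when no output file is given, so the extracted text should end up beside the originals. Starting a JVM for each of thousands of files is slow but simple; the loop can be left running unattended.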