the basic process was to download
xpdf and add it to the Windows path, thus making it accessible from
within R. Two methods follow:
Method One (easiest) - using the awesome ?system command:
(1) Download xpdf (whichever is the latest version):
ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl4-win32.zip
(2) Unzip it
# system(paste("[app]", "[pdf file]"), wait = FALSE)
> system(paste('"C:/Program Files/xpdf/pdftotext.exe"', '"C:/Documents and Settings/tony/Desktop/test/r-intro.pdf"'), wait=FALSE)
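The call above can also be wrapped in a small helper. This is only a rough sketch: pdftotext_cmd and pdf_to_text are hypothetical names of my own, not part of R or xpdf. It assumes pdftotext's default behaviour of writing a .txt file next to the PDF when no output file is named, and it uses wait = TRUE (rather than wait = FALSE as above) so the text file exists before R tries to read it.

```r
# Build the command line for pdftotext, quoting both paths so that
# spaces (as in "Program Files") do not split the arguments.
pdftotext_cmd <- function(exe, pdf) {
  paste(shQuote(exe, type = "cmd"), shQuote(pdf, type = "cmd"))
}

# Hypothetical helper: convert one PDF and return its text as lines.
pdf_to_text <- function(exe, pdf) {
  stopifnot(file.exists(exe), file.exists(pdf))
  # wait = TRUE so the conversion has finished before we read the result
  status <- system(pdftotext_cmd(exe, pdf), wait = TRUE)
  if (status != 0) stop("pdftotext exited with status ", status)
  # Assumption: with no output name given, pdftotext writes
  # <name>.txt next to <name>.pdf
  txt <- sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)
  readLines(txt, warn = FALSE)
}

# Usage, with the same paths as the example above:
# lines <- pdf_to_text("C:/Program Files/xpdf/pdftotext.exe",
#                      "C:/Documents and Settings/tony/Desktop/test/r-intro.pdf")
```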
Method Two - if you want to use the tm package like I did last year,
?readPDF requires the following (not documented anywhere that I know
of, but this is what you do):
(1) Download xpdf (whichever is the latest version):
ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl4-win32.zip
(2) Unzip it
(3) Download the Redmond utility for adding folders to your Windows path
(free version button is in the top left of the page):
http://redmondlab.googlepages.com/path
(4) Unzip it
(5) Open the 'Redmond Path' application.
(6) Click on the green plus ('+') in the top left-hand corner.
(7) Navigate to the folder which contains the files: C:/../xpdf-3.02pl4-win32
(8) Add it and click Ok.
Then you can do something like:
> library(tm)
> my.path <- "C:/Documents and Settings/tony/Desktop/[put your pdfs in here]"
> Corpus(DirSource(my.path), readerControl = list(reader=readPDF))
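If readPDF cannot find the converter, it may help to check from inside R whether pdftotext is actually visible on the path. The following is a rough sketch using base R's Sys.which, Sys.getenv and Sys.setenv. add_to_path is a hypothetical helper name of my own, and a PATH change made this way only lasts for the current R session, so it is an alternative to the Redmond Path utility rather than a description of what that utility does.

```r
# Hypothetical helper: append a folder to PATH for this R session only.
add_to_path <- function(dir) {
  old <- Sys.getenv("PATH")
  parts <- strsplit(old, .Platform$path.sep, fixed = TRUE)[[1]]
  # Only append if the folder is not already listed
  if (!(dir %in% parts)) {
    Sys.setenv(PATH = paste(old, dir, sep = .Platform$path.sep))
  }
  invisible(Sys.getenv("PATH"))
}

# Sys.which returns "" when the program cannot be found on the path.
found <- Sys.which("pdftotext")
if (!nzchar(found)) {
  message("pdftotext not found on the path")
  # add_to_path("C:/../xpdf-3.02pl4-win32")  # the folder from step (7)
}
```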
There are some limitations to how well the conversions work depending
on the pdf file, but it was so long ago now that I'm afraid I don't
remember the details.