Python - pyPdf for lists of PDFs
I have got pyPdf to work fine with a single PDF file, but I cannot seem to get it to work with a list of files, or in a loop over multiple PDFs, without it failing because a string is not callable. Any ideas I can use as a workaround?
    def getPDFContent(path):
        content = ""
        # Load the PDF into pyPdf
        pdf = pyPdf.PdfFileReader(file(path, "rb"))
        # Iterate over the pages
        for i in range(0, pdf.getNumPages()):
            # Extract the text from the page and add it to content
            content += pdf.getPage(i).extractText() + "\n"
        # Collapse whitespace
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
        return content

    #print getPDFContent(r"Z:\GIS\MasterPermits\12300983.pdf").encode("ascii", "ignore")

    # Find the PDFs
    for root, dirs, files in os.walk(folder1):
        for file in files:
            if file.endswith(('.pdf')):
                d=os.path.join(root, file)
                print getPDFContent(d).encode("ascii", "ignore")

    Traceback (most recent call last):
      File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 50, in <module>
        print getPDFContent(d).encode("ascii", "ignore")
      File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 32, in getPDFContent
        pdf = pyPdf.PdfFileReader(file(path, "rb"))
    TypeError: 'str' object is not callable
I was using a list and got the exact same error. I didn't think it was a big deal, but as of right now it is becoming one. I know I was able to work around similar issues in arcpy, but nothing close to this one.
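Try not to use the names of built-in types as variable names. Your loop rebinds the name file to a filename string, so the later call file(path, "rb") inside getPDFContent calls that string instead of the built-in, which raises the TypeError.

Don't do this:

    for file in files:

Do this instead:

    for myfile in files: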
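To make the cause concrete, here is a minimal Python 3 sketch that uses only the standard library. The helper names `shadowing_demo` and `find_pdfs` are hypothetical and not part of the original script. Because Python 3 no longer has the `file` built-in, `open` stands in for it here. The case-insensitive extension match is also an assumption that goes slightly beyond the original `endswith('.pdf')` check.

```python
import os


def shadowing_demo():
    """Reproduce the error by rebinding a built-in name to a string."""
    # Assumption: `open` plays the role that `file` played in the Python 2 script.
    open = "12300983.pdf"  # the name now refers to a str, not the built-in
    try:
        open("anything.pdf", "rb")  # this calls the string, not the built-in
    except TypeError as exc:
        return str(exc)
    return None


def find_pdfs(folder):
    """Walk `folder` and return the paths of all .pdf files, sorted."""
    paths = []
    for root, dirs, files in os.walk(folder):
        for filename in files:  # `filename` does not shadow any built-in
            if filename.lower().endswith(".pdf"):
                paths.append(os.path.join(root, filename))
    return sorted(paths)
```

Calling `shadowing_demo()` returns the text of the same kind of TypeError shown in the traceback. Each path returned by `find_pdfs(folder1)` could then be passed to `getPDFContent` without the loop variable clobbering the built-in that the function relies on.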