python - pyPdf for lists of PDFs

September 15, 2014

I have got pyPdf to work fine on a single PDF file, but I cannot get it to work on a list of files, or in a loop over multiple PDFs, without it failing because a string is not callable. Any ideas for a workaround?

    import os
    import pyPdf

    def getPDFContent(path):
        content = ""
        # Load PDF into pyPdf
        pdf = pyPdf.PdfFileReader(file(path, "rb"))
        # Iterate over the pages
        for i in range(0, pdf.getNumPages()):
            # Extract text from the page and add it to content
            content += pdf.getPage(i).extractText() + "\n"
        # Collapse whitespace
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
        return content

    #print getPDFContent(r"z:\gis\masterpermits\12300983.pdf").encode("ascii", "ignore")

    # Find PDFs (folder1 is set elsewhere in the script, not shown)
    for root, dirs, files in os.walk(folder1):
        for file in files:
            if file.endswith('.pdf'):
                d = os.path.join(root, file)
                print getPDFContent(d).encode("ascii", "ignore")

The traceback:

    Traceback (most recent call last):
      File "c:\documents and settings\dknight\desktop\readpdf.py", line 50, in <module>
        print getPDFContent(d).encode("ascii", "ignore")
      File "c:\documents and settings\dknight\desktop\readpdf.py", line 32, in getPDFContent
        pdf = pyPdf.PdfFileReader(file(path, "rb"))
    TypeError: 'str' object is not callable

I was using a list and got the exact same error. I didn't think it was a big deal, but right now it is becoming one. I know I have been able to work around similar issues in arcpy, but nothing close to this.
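A stripped-down reproduction of the error, independent of pyPdf, may help show what is going on. This Python 3 sketch uses list (the name mentioned above) because Python 3 no longer has a file built-in; run_script, BROKEN and FIXED are hypothetical names for illustration. It runs a tiny script with the same shape as the question in a fresh namespace via exec, then runs a copy whose loop variable is renamed.

```python
def run_script(source):
    """Run source in a fresh global namespace; return the TypeError message, if any."""
    try:
        exec(source, {})
    except TypeError as exc:
        return str(exc)
    return None


# A tiny hypothetical script with the same shape as the question:
#   1. first_chars() calls the built-in list(); the name 'list' is looked up
#      in the script's globals (then builtins) each time that line runs.
#   2. A later module-level loop uses 'list' as its loop variable, which
#      creates a global named 'list' bound to a string ("b.pdf" at the end).
#   3. The final call therefore evaluates "b.pdf"("a.pdf") and fails.
BROKEN = '''
def first_chars(path):
    return list(path)[:3]

for list in ["a.pdf", "b.pdf"]:
    pass

first_chars("a.pdf")
'''

# The same script with a loop variable that does not shadow a built-in.
FIXED = BROKEN.replace("for list in", "for name in")

print(run_script(BROKEN))  # 'str' object is not callable
print(run_script(FIXED))   # None
```

The key point is that names used inside a function body are resolved when the function runs, not when it is defined, so a loop placed after the function can still break it.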

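Try not to use built-in names as variable names. In Python 2, file is a built-in type, and the loop for file in files: runs at module level, so it rebinds the global name file to a string (the current filename). When getPDFContent then calls file(path, "rb"), it is calling that string, hence TypeError: 'str' object is not callable. Naming a variable list breaks list() in the same way.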

Don't do this:

for file in files:

Do this instead:

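for myfile in files:

and use myfile inside the loop body as well (myfile.endswith('.pdf'), os.path.join(root, myfile)).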
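For reference, a Python 3 version of the whole script might look roughly like this. It is a sketch under assumptions: pyPdf is long unmaintained, so it swaps in its successor, the pypdf package (PdfReader, reader.pages, page.extract_text()); the folder is read from the command line instead of the undefined folder1; and the helper names collapse_whitespace, get_pdf_content and find_pdfs are hypothetical. The loop variables are filename and pdf_path, so no built-in is rebound.

```python
import os
import sys


def collapse_whitespace(text):
    """Turn non-breaking spaces into spaces and squeeze whitespace runs to one space."""
    return " ".join(text.replace("\xa0", " ").split())


def get_pdf_content(path):
    """Return the whitespace-collapsed text of every page of the PDF at path.

    Assumes the pypdf package (pip install pypdf). It is imported here, not at
    module level, so the other helpers work even when pypdf is not installed.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)  # accepts a filesystem path directly
    # 'or ""' guards against a page that yields no extractable text
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return collapse_whitespace(text)


def find_pdfs(folder):
    """Yield the path of every file ending in .pdf under folder."""
    # 'filename', not 'file', so no built-in name is rebound
    for root, dirs, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(".pdf"):
                yield os.path.join(root, filename)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python readpdf.py <folder>
    for pdf_path in find_pdfs(sys.argv[1]):
        # Mirror the original encode("ascii", "ignore"), then decode so that
        # Python 3 prints text rather than a bytes literal
        print(get_pdf_content(pdf_path).encode("ascii", "ignore").decode("ascii"))
```

Keeping the directory walk in its own generator makes it easy to test without any PDFs or pypdf installed.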
