python - Replace a substring in a string
I'm having an issue with a program in Python. I'm trying to read the content of an HTML file, remove the HTML tags, and then remove the stop words.
Actually, I can remove the tags, but I can't remove the stop words. The program gets the stop words from a text file and stores them in a list. The format of that file is the following:
a ... yours
If I test the code step by step in the Python interpreter, it works, but when I run 'python main.py' it doesn't work.
My code is:
from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

def remove_stop_words(textcontent, stopwords):
    for stopword in stopwords:
        word = stopword.replace('\n','') + ' '
        textcontent.replace(word, '')
    return textcontent

def main():
    stopwords = open("stopwords.txt", "r").readlines()
    emailcontent = open("mail.html", "r").read()
    textcontent = strip_tags(emailcontent)
    print remove_stop_words(textcontent.lower(), stopwords)

main()
I hope someone can help me.
The issue here is that you are not saving the result of textcontent.replace(word, ''). The replace function does not modify the textcontent variable in place; rather, the result is returned as a new string. Thus, you need to assign the result back to textcontent.
textcontent.replace(word, '')
should be:
textcontent = textcontent.replace(word, '')
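For illustration, here is a minimal sketch of the corrected remove_stop_words function with that one-line change applied. The sample sentence and the two-item stop word list are made up for the demonstration and are not taken from the question's files; the stop words carry a trailing newline to mimic what readlines() would return.

```python
def remove_stop_words(textcontent, stopwords):
    """Return textcontent with each stop word (followed by a space) removed."""
    for stopword in stopwords:
        # Lines from readlines() keep their trailing newline, so drop it first.
        word = stopword.replace('\n', '') + ' '
        # str.replace returns a new string; rebind the name to keep the result.
        textcontent = textcontent.replace(word, '')
    return textcontent


# Hypothetical sample data, only for demonstration.
sample = "this is a test of the parser"

# Calling replace without assigning the result leaves the original untouched.
sample.replace("the ", "")
assert sample == "this is a test of the parser"

cleaned = remove_stop_words(sample, ["a\n", "the\n"])
print(cleaned)  # this is test of parser
```

Note that this keeps the question's approach of plain substring replacement, so a stop word such as "a " would also be removed from the end of a longer word that happens to end in "a".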