python - Replace a substring in a string


I'm having an issue with a program in Python. I'm trying to read the content of an HTML file, remove the HTML tags, and then remove the stop words.

Right now I can remove the tags, but I can't remove the stop words. The program reads the stop words from a text file and stores them in a list. The format of the file is the following:

a
...
yours
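Because the stop-word file holds one word per line, readlines() keeps the trailing newline on every entry, which is why the code strips '\n' before matching. A minimal Python 3 sketch of loading such a file into a clean list is shown here; the helper names parse_stop_words and load_stop_words, the default path, and the UTF-8 encoding are assumptions for illustration, and the three-word input is a made-up sample.

```python
def parse_stop_words(text):
    """Turn 'one word per line' text into a list of clean stop words."""
    # splitlines() drops the line endings that readlines() would keep,
    # strip() removes stray spaces or '\r', and blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_stop_words(path="stopwords.txt"):
    # Hypothetical helper: the path and the UTF-8 encoding are assumptions.
    with open(path, encoding="utf-8") as f:
        return parse_stop_words(f.read())


print(parse_stop_words("a\nabout\nyours\n"))  # ['a', 'about', 'yours']
```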

If I test the code step by step in the Python interpreter, it works, but when I run 'python main.py' it doesn't work.

My code is:

from HTMLParser import HTMLParser

# Collects only the text between the tags, dropping the markup itself
class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

def remove_stop_words(textcontent, stopwords):
    for stopword in stopwords:
        # readlines() keeps the trailing '\n', so strip it and match "word "
        word = stopword.replace('\n','') + ' '
        textcontent.replace(word, '')
    return textcontent

def main():
    stopwords = open("stopwords.txt", "r").readlines()
    emailcontent = open("mail.html", "r").read()
    textcontent = strip_tags(emailcontent)
    print remove_stop_words(textcontent.lower(), stopwords)

main()
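The program in the question is Python 2 (the print statement and the HTMLParser module name give that away). In Python 3 the module is html.parser, and the subclass should call the base initializer, because it sets attributes such as convert_charrefs that feed() relies on. A minimal sketch of the same tag stripper under Python 3, keeping the question's class and function names:

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects the text content of an HTML document, discarding the tags."""

    def __init__(self):
        # Run the base initializer; convert_charrefs=True (the default)
        # turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        # Called for every run of text between tags.
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text still buffered at the end of the input
    return s.get_data()


print(strip_tags("<p>Hello <b>world</b></p>"))  # Hello world
```

Note that text inside script and style elements is also passed to handle_data, so it would end up in the output as well.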

I hope you can help me.

The issue here is that you are not saving the result of textcontent.replace(word, ''). Python strings are immutable, so replace does not modify the textcontent variable in place; instead, it returns a new string with the replacements made.
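A short demonstration of that behavior, using an arbitrary sample sentence: the call produces a new string, and the original one is unchanged until you assign the result to a name.

```python
text = "the cat sat on the mat"

# str.replace builds and returns a brand-new string...
result = text.replace("the ", "")

# ...while the original string object is left untouched.
print(text)    # the cat sat on the mat
print(result)  # cat sat on mat
```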

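Thus, you need to save the result back into textcontent. (This is probably also why the code seemed to work in the interpreter: when an expression is typed at the interactive prompt, its returned value is echoed to the screen, even though textcontent itself never changes.)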

textcontent.replace(word, '') 

should be:

textcontent = textcontent.replace(word, '') 
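With that fix applied, a Python 3 version of the function might look like the first function in this sketch. Because it matches raw substrings, a stop word followed by a space also matches the tail of a longer word (so "a " eats the end of "pizza "), and a stop word at the very end of the text, with no trailing space, is not removed. The second function, remove_stop_words_by_token, is a hypothetical alternative that is not part of the answer: it splits on whitespace and compares whole words, at the cost of collapsing line breaks into single spaces and not handling punctuation attached to words.

```python
def remove_stop_words(textcontent, stopwords):
    for stopword in stopwords:
        word = stopword.replace('\n', '') + ' '
        # Rebind the name to the new string returned by replace().
        textcontent = textcontent.replace(word, '')
    return textcontent


def remove_stop_words_by_token(textcontent, stopwords):
    # Hypothetical alternative: compare whole words instead of substrings.
    stop = {w.strip() for w in stopwords}
    return ' '.join(w for w in textcontent.split() if w not in stop)


stopwords = ["a\n", "and\n"]  # as returned by readlines()
print(remove_stop_words("a cat and a dog", stopwords))             # cat dog
print(remove_stop_words("a pizza and a soda", stopwords))          # pizzsoda
print(remove_stop_words_by_token("a pizza and a soda", stopwords))  # pizza soda
```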
