python - Accessing an html file from cache memory -
i writing code in python following things: 1) gets html file internet. 2) extracts urls. 3) compare these urls search key , opens correct webpage user wants open. using following code:
def open_page(name): try: links = lxml.html.parse('http://www.w3schools.com/html/').xpath("//a/@href") url in links: if re.search(name, url): self.get_webpage.open('http://www.w3schools.com/html/'+url) break except indexerror e: pass`
i have call method many times in module making process of opening webpage slow. tried check execution time of each line of method , came know lxml.html.parse() consuming of time. if try use html file stored in local system, method works speedily. there way can html file of webpage http://www.w3schools.com/html/ cache after first time? p.s. not want save html file permanently in local system, because in case can miss updates/changes on html file.
it sounds want cache page, want check nothing has changed since last downloaded it.
the if-modified-since http header friend in endeavor. when making http request, can provide header field time last downloaded page. if page has not changed on server since time, server return 304 not modified status code , not send page content, saving trouble of downloading again.
here's how might go doing in python 2:
import contextlib import datetime import urllib2 contextlib.closing(urllib2.urlopen(urllib2.request( "http://www.w3schools.com/html/", headers={"if-modified-since": last_access_time}))) u: if u.getcode() != 304: cached_html = lxml.html.parse(u) last_access_time = datetime.datetime.now() html = cached_html
last_access_time
, cached_html
stored on disk.
Comments
Post a Comment