In a Python exercise, I need to develop a small web proxy server that can cache web pages. The proxy server only needs to understand simple GET requests, but it must be able to handle all kinds of objects, not just HTML pages but also images. Below is my code:
```python
# Proxy Server
from socket import *
import sys

# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
serverName = '0.0.0.0'
serverPort = 8000
tcpSerSock.bind((serverName, serverPort))
tcpSerSock.listen(1)
while 1:
    # Start receiving data from the client
    print('Ready to serve...')
    tcpCliSock, addr = tcpSerSock.accept()
    print('Received a connection from:', addr)
    message = tcpCliSock.recv(1024).decode()
    print(message)
    # Extract the filename from the given message
    filename = message.split()[1].partition("/")[2]
    print(f"filename: {filename}")
    fileExists = "False"
    try:
        # Check whether the file exists in the cache (binary mode, so images work too)
        with open(filename, "rb") as f:
            outputdata = f.readlines()
        fileExists = "True"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send(b"HTTP/1.0 200 OK\r\n")
        tcpCliSock.send(b"Content-Type:text/html\r\n")
        tcpCliSock.send(b"\r\n")  # blank line ends the headers
        #############################################
        for line in outputdata:
            tcpCliSock.send(line)  # lines are already bytes
        print("Read from cache")
    except IOError:
        if fileExists == "False":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = filename.replace("www.", "", 1).partition("/")[0]
            print(f"hostn:{hostn}")
            try:
                c.connect((hostn, 80))
                print("Successfully connected")
                request_str = "GET /" + " HTTP/1.0\r\n\r\n"
                c.sendall(request_str.encode())
                buf_data = b""
                # Read the response into buffer
                while True:
                    data = c.recv(1024)
                    if not data:
                        break
                    buf_data += data
                # Create a new file in the cache for the requested file
                # Also send the response in the buffer to client socket and the corresponding file in the cache
                tcpCliSock.send(buf_data)
                # tcpCliSock.send("\r\n".encode())
                # The use of tempFile?
                # tempFile = open("./"+filename.replace("/", "_"), "wb")
                # tempFile.write(buf_data)
            except Exception as E:
                print("Illegal request")
                print(E)
        else:
            nf_header = "<h1>404 Not Found</h1>"
            tcpCliSock.send(nf_header.encode())
    tcpCliSock.close()
tcpSerSock.close()
sys.exit()
```

I tested the program with the URL "http://127.0.0.1:8000/httpforever.com". The program freezes when the browser requests the "favicon.ico" file. Below are the console output and part of the output HTML file. In the complete output HTML file, all text and links are loaded.

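The commented-out `tempFile` lines hint at how the cache is meant to be filled. Below is a minimal sketch, assuming the cache lives in a local directory and that a request target such as `httpforever.com/img/logo.png` is flattened into one file name by replacing `/` with `_`, as the commented code does. The names `cache_path`, `read_cached`, `write_cached` and the `cache` directory are hypothetical. Opening the files in binary mode (`"rb"`/`"wb"`) is what lets the same code store images as well as HTML.

```python
import os

CACHE_DIR = "cache"  # hypothetical location of the on-disk cache


def cache_path(target, cache_dir=CACHE_DIR):
    """Flatten a target like 'httpforever.com/img/a.png' into a single file name."""
    name = target.replace("/", "_") or "index"
    return os.path.join(cache_dir, name)


def read_cached(target, cache_dir=CACHE_DIR):
    """Return the cached bytes for target, or None on a cache miss."""
    try:
        # Binary mode: works for images as well as HTML
        with open(cache_path(target, cache_dir), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


def write_cached(target, data, cache_dir=CACHE_DIR):
    """Store the raw response bytes for target, creating the cache directory if needed."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path(target, cache_dir), "wb") as f:
        f.write(data)
```

If `data` is the full upstream response (status line and headers included, as `buf_data` is), a cache hit can be sent back to the client unchanged, without building a new header.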

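The favicon request can be reproduced in isolation. A browser fetches the icon from the root of the server it is talking to, so the proxy receives `GET /favicon.ico` with no origin host in the path. The sketch below mirrors the parsing idea in the code (first path segment = host). It shows that this request yields `favicon.ico` as the host name, which `c.connect((hostn, 80))` presumably cannot resolve. Unlike the fixed `"GET /"` in the code, it also forwards the real path and adds a `Host` header, which many servers expect. The helper names `split_target` and `upstream_request` are hypothetical.

```python
def split_target(message):
    """Split a proxy request such as 'GET /httpforever.com/css/a.css HTTP/1.1'
    into (host, path). Returns None for an empty or malformed request."""
    parts = message.split()
    if len(parts) < 2:
        return None  # e.g. the browser opened a connection and sent nothing
    # Drop the leading '/', then treat the first segment as the origin host
    host, _, rest = parts[1].lstrip("/").partition("/")
    return host, "/" + rest


def upstream_request(host, path):
    """Build a minimal HTTP/1.0 GET for the origin server, with a Host header."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode()


print(split_target("GET /httpforever.com/ HTTP/1.1\r\n\r\n"))  # ('httpforever.com', '/')
print(split_target("GET /favicon.ico HTTP/1.1\r\n\r\n"))       # ('favicon.ico', '/')
```

The empty-message check matters because `message.split()[1]` raises `IndexError` if a client connects and sends nothing.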
I believe the output is correct, but the program cannot finish the last request, which is the request for the favicon. I suspect this is because I am not parsing the filename variable correctly, so the socket cannot connect to the correct host (hostn). My question is: how can I complete all the requests so that the program above exits without errors? Any hints or help would be much appreciated.
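On the exit part: the `while 1` loop in the code has no `break`, so `tcpSerSock.close()` and `sys.exit()` after it are never reached. After the favicon request fails, the loop presumably returns to `tcpSerSock.accept()`, which blocks until another connection arrives, and that would look like a freeze. Below is a minimal sketch of a loop that can end, assuming a hypothetical `handle(client)` callback does the per-request work and that a hypothetical `max_requests` limit (or Ctrl+C) is an acceptable way to stop. On Windows, a blocking `accept()` may not react to Ctrl+C until the next connection arrives.

```python
import socket


def make_server(port=8000, host="0.0.0.0", backlog=5):
    """Create a listening TCP socket (SO_REUSEADDR avoids 'address in use' on restart)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(backlog)
    return server


def serve(server, handle, max_requests=None):
    """Accept and handle connections until max_requests is reached
    (forever if None) or Ctrl+C is pressed; always close the listening socket."""
    handled = 0
    try:
        while max_requests is None or handled < max_requests:
            client, addr = server.accept()  # blocks until the next connection
            try:
                handle(client)
            except Exception as exc:
                # One failed request (e.g. an unresolvable host) does not stop the server
                print("Request from", addr, "failed:", exc)
            finally:
                client.close()
            handled += 1
    except KeyboardInterrupt:
        print("Shutting down")
    finally:
        server.close()
    return handled
```

A call such as `serve(make_server(8000), handle_request, max_requests=10)`, with a hypothetical `handle_request` holding the body of the original loop, returns once ten connections (favicon requests included) have been processed.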
