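Python Source Code Analysis: From SocketServer to SimpleHTTPServer

Let's start at the bottom layer and work our way up, to see just how simple SimpleHTTPServer really is.

The inheritance relationships among the classes covered in this article: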

server.png

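SocketServer: the network service framework

Python abstracts a network service into two main classes: a Server class, which handles the connection-level network operations, and a RequestHandler class, which handles the data. It also provides two MixIn classes for extending a Server to use multiple processes or multiple threads. Building a network service needs both a "Server" and a "RequestHandler" (the RequestHandler instance is created inside the Server and works together with it).

The figure below is the class inheritance diagram of socketserver. It looks complicated, but we only need to focus on the TCP part, namely: BaseServer / TCPServer / ThreadingMixIn / ThreadingTCPServer and BaseRequestHandler / StreamRequestHandler.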

server.png

SocketServer –> BaseServer

BaseServer is initialized through __init__ and exposes the serve_forever and handle_request methods.

SocketServer –> BaseServer –> __init__()

    def __init__(self, server_address, RequestHandlerClass):
        """Constructor. May be extended, do not override."""
        self.server_address = server_address
        self.RequestHandlerClass = RequestHandlerClass
        self.__is_shut_down = threading.Event()
        self.__shutdown_request = False

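The source of __init__ is simple. Its main job is to create the server object and record the server address and the class used to handle requests. server_address is a tuple of host and port.

SocketServer –> BaseServer –> serve_forever()

Once the server object exists, it is used to start an endless loop.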

    def serve_forever(self, poll_interval=0.5):
        """Handle one request at a time until shutdown.

        Polls for shutdown every poll_interval seconds. Ignores
        self.timeout. If you need to do periodic tasks, do them in
        another thread.
        """
        self.__is_shut_down.clear()
        try:
            # XXX: Consider using another file descriptor or connecting to the
            # socket to wake this up instead of polling. Polling reduces our
            # responsiveness to a shutdown request and wastes cpu at all other
            # times.
            with _ServerSelector() as selector:
                selector.register(self, selectors.EVENT_READ)

                while not self.__shutdown_request:
                    ready = selector.select(poll_interval)
                    if ready:
                        self._handle_request_noblock()

                    self.service_actions()
        finally:
            self.__shutdown_request = False
            self.__is_shut_down.set()

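serve_forever takes one parameter, poll_interval, the timeout of each select poll (0.5 seconds by default). It then loops until a shutdown is requested, using select to watch the listening socket for network I/O.

When select reports the socket as ready, meaning a connection or data is waiting, _handle_request_noblock is called.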
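
To see this loop from the outside, here is a minimal sketch that runs serve_forever() in a background thread and stops it again with shutdown(). GreetingHandler and run_demo are names made up for this example, and the loopback address with port 0 (let the OS choose a free port) is an assumption to keep the demo self-contained.

```python
import socket
import socketserver
import threading


class GreetingHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler for this sketch: greets every client."""

    def handle(self):
        self.request.sendall(b"hello\n")


def run_demo():
    # Port 0 lets the OS pick a free port; server_address then holds it.
    with socketserver.TCPServer(("127.0.0.1", 0), GreetingHandler) as server:
        # serve_forever() blocks, so it runs in a background thread.
        worker = threading.Thread(
            target=server.serve_forever, kwargs={"poll_interval": 0.1}
        )
        worker.start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as client:
                with client.makefile("rb") as stream:
                    reply = stream.readline()
        finally:
            # shutdown() asks the loop to stop and waits until it has exited.
            server.shutdown()
            worker.join()
    return reply


if __name__ == "__main__":
    print(run_demo())
```

shutdown() has to be called from a different thread than the one running serve_forever(), because it waits for the loop to finish.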

SocketServer –> BaseServer –> _handle_request_noblock()

    def _handle_request_noblock(self):
        """Handle one request, without blocking.

        I assume that selector.select() has returned that the socket is
        readable before this function was called, so there should be no risk of
        blocking in get_request().
        """
        try:
            request, client_address = self.get_request()
        except OSError:
            return
        if self.verify_request(request, client_address):
            try:
                self.process_request(request, client_address)
            except Exception:
                self.handle_error(request, client_address)
                self.shutdown_request(request)
            except:
                self.shutdown_request(request)
                raise
        else:
            self.shutdown_request(request)

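Calling _handle_request_noblock() starts the processing of one request.

  1. First, get_request is called to obtain the connection (it returns the request socket and the client address straight from the listening socket). The implementation comes from the concrete server class; the TCPServer version is shown here.

    def get_request(self):
        """Get the request and client address from the socket.
        May be overridden.
        """
        return self.socket.accept()
  2. With the connection in hand, verify_request is called to validate the request (by default every request passes).

    def verify_request(self, request, client_address):
        """Verify the request. May be overridden.
        Return True if we should proceed with this request.
        """
        return True
  3. Once verified, process_request is called to process the request. This method is the entry point for the MixIns: the MixIn classes override it to handle the request in a new thread or a new process.

    def process_request(self, request, client_address):
        """Call finish_request.
        Overridden by ForkingMixIn and ThreadingMixIn.
        """
        self.finish_request(request, client_address)
        self.shutdown_request(request)
  4. finish_request does the actual work on the request (it creates a RequestHandler object, which performs the concrete processing), after which shutdown_request ends the request.

    def finish_request(self, request, client_address):
        """Finish one request by instantiating RequestHandlerClass."""
        self.RequestHandlerClass(request, client_address, self)

    def shutdown_request(self, request):
        """Called to shutdown and close an individual request."""
        self.close_request(request)
  5. If an error occurs along the way, handle_error is called to deal with it, and shutdown_request closes the connection.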
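
The order of these calls can be observed with a small tracing subclass. This is only a sketch: TracingServer, NoopHandler and trace_one_request are hypothetical names, and handle_request() is used instead of serve_forever() so that exactly one request is served in the current thread.

```python
import socket
import socketserver


class TracingServer(socketserver.TCPServer):
    """Hypothetical subclass that records which hooks run for a request."""

    def __init__(self, *args, **kwargs):
        self.calls = []
        super().__init__(*args, **kwargs)

    def verify_request(self, request, client_address):
        self.calls.append("verify_request")
        return super().verify_request(request, client_address)

    def process_request(self, request, client_address):
        self.calls.append("process_request")
        super().process_request(request, client_address)

    def finish_request(self, request, client_address):
        self.calls.append("finish_request")
        super().finish_request(request, client_address)

    def shutdown_request(self, request):
        self.calls.append("shutdown_request")
        super().shutdown_request(request)


class NoopHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # The handler reaches the server through self.server.
        self.server.calls.append("handle")


def trace_one_request():
    with TracingServer(("127.0.0.1", 0), NoopHandler) as server:
        # The connection waits in the listen backlog until it is accepted.
        client = socket.create_connection(server.server_address, timeout=5)
        try:
            # handle_request() serves exactly one request and then returns.
            server.handle_request()
        finally:
            client.close()
        return server.calls


if __name__ == "__main__":
    print(trace_one_request())
```

Returning False from verify_request would skip process_request entirely and go straight to shutdown_request, which matches the else branch of _handle_request_noblock.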

SocketServer –> BaseRequestHandler

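BaseRequestHandler is the superclass of all request handler objects. It defines a small set of hooks; a subclass has to implement handle(), which holds the actual data-processing code.

The constructor sets the request, client_address, and server variables, then calls setup() (which subclasses may override to prepare the socket connection), handle() (overridden in subclasses to process the data), and finish(), in that order.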

    class BaseRequestHandler:
        """Base class for request handler classes.

        This class is instantiated for each request to be handled. The
        constructor sets the instance variables request, client_address
        and server, and then calls the handle() method. To implement a
        specific service, all you need to do is to derive a class which
        defines a handle() method.

        The handle() method can find the request as self.request, the
        client address as self.client_address, and the server (in case it
        needs access to per-server information) as self.server. Since a
        separate instance is created for each request, the handle() method
        can define other arbitrary instance variables.
        """
        def __init__(self, request, client_address, server):
            self.request = request
            self.client_address = client_address
            self.server = server
            self.setup()
            try:
                self.handle()
            finally:
                self.finish()
        def setup(self):
            pass
        def handle(self):
            pass
        def finish(self):
            pass

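To sum up: building a network service takes a BaseServer (in practice, one of its subclasses) to handle network I/O, and the server internally creates a RequestHandler object for each request to do the actual processing.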
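
The setup() / handle() / finish() sequence of the handler's constructor can be checked without any network at all, because the constructor only stores its arguments and calls the three hooks. In the sketch below a plain list stands in for the socket; LifecycleHandler and run_lifecycle are made-up names for illustration.

```python
import socketserver


class LifecycleHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler that logs each hook into self.request."""

    def setup(self):
        self.request.append("setup")

    def handle(self):
        self.request.append("handle")
        raise ValueError("simulated failure while handling")

    def finish(self):
        # Runs even though handle() raised, thanks to try/finally.
        self.request.append("finish")


def run_lifecycle():
    events = []  # stands in for the request socket in this sketch
    try:
        # Instantiating the handler is what triggers the three hooks.
        LifecycleHandler(events, ("127.0.0.1", 0), None)
    except ValueError:
        events.append("error propagated")
    return events


if __name__ == "__main__":
    print(run_lifecycle())
```

The exception still reaches the caller after finish() has run; in a real server it is process_request's caller that catches it and hands it to handle_error.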

Next, the code related to TCP services ...

SocketServer –> TCPServer

TCPServer inherits from BaseServer. Its constructor creates the socket and then calls server_bind() and server_activate().

    class TCPServer(BaseServer):
        address_family = socket.AF_INET
        socket_type = socket.SOCK_STREAM
        request_queue_size = 5  # listen backlog: size of the pending-connection queue
        allow_reuse_address = False

        def __init__(self, server_address, RequestHandlerClass, bind_and_activate=True):
            """Constructor. May be extended, do not override."""
            BaseServer.__init__(self, server_address, RequestHandlerClass)
            self.socket = socket.socket(self.address_family,
                                        self.socket_type)  # create an IPv4 TCP socket
            if bind_and_activate:
                try:
                    self.server_bind()
                    self.server_activate()
                except:
                    self.server_close()
                    raise

        def server_bind(self):
            """Called by constructor to bind the socket.
            May be overridden.
            """
            if self.allow_reuse_address:
                self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # allow address reuse
            self.socket.bind(self.server_address)  # bind to the address
            self.server_address = self.socket.getsockname()

        def server_activate(self):
            """Called by constructor to activate the server.
            May be overridden.
            """
            self.socket.listen(self.request_queue_size)  # start listening on the port

        def server_close(self):
            """Called to clean-up the server.
            May be overridden.
            """
            self.socket.close()

After the socket has been created (socket.socket), bound (socket.bind), and activated (socket.listen), it has to start accepting connections (socket.accept).
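
The create / bind / activate steps can also be driven by hand, which makes each one visible. The sketch below passes bind_and_activate=False so that the constructor only creates the socket; ReusableTCPServer and bind_manually are hypothetical names, and port 0 is an assumption that lets the OS choose a free port.

```python
import socketserver


class ReusableTCPServer(socketserver.TCPServer):
    # Class attribute read by server_bind(); turns on SO_REUSEADDR.
    allow_reuse_address = True


def bind_manually():
    # With bind_and_activate=False the constructor only creates the socket.
    server = ReusableTCPServer(
        ("127.0.0.1", 0), socketserver.BaseRequestHandler, bind_and_activate=False
    )
    try:
        requested = server.server_address  # still the address we passed in
        server.server_bind()               # setsockopt, bind, getsockname
        server.server_activate()           # listen(request_queue_size)
        actual = server.server_address     # now carries the real port
    finally:
        server.server_close()
    return requested, actual


if __name__ == "__main__":
    print(bind_manually())
```

Note that server_bind() overwrites server_address with the result of getsockname(), which is why the port changes from 0 to the one actually assigned.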

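The get_request() method of this class returns exactly that: the next connection accepted on the socket. It is called from _handle_request_noblock in the BaseServer base class. BaseServer does not define the method itself; TCPServer supplies the implementation: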

    def get_request(self):
        """Get the request and client address from the socket.
        May be overridden.
        """
        return self.socket.accept()

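shutdown_request() is overridden to shut down the write side explicitly before closing, while tolerating platforms that raise an error when the socket is no longer connected. This detail can be ignored here.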

    def shutdown_request(self, request):
        """Called to shutdown and close an individual request."""
        try:
            #explicitly shutdown. socket.close() merely releases
            #the socket and waits for GC to perform the actual close.
            request.shutdown(socket.SHUT_WR)  # SHUT_WR: close the write side; no further writes are allowed on this socket
        except OSError:
            pass #some platforms may raise ENOTCONN here
        self.close_request(request)

In addition, the class provides a fileno() method, which is the interface (a file descriptor) that the selector needs.

SocketServer –> StreamRequestHandler

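Just as TCPServer inherits from BaseServer, on the handler side StreamRequestHandler inherits from BaseRequestHandler. It overrides the base class's setup() and finish() methods so that the connection can be read and written through file-like objects.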

    def setup(self):
        self.connection = self.request
        if self.timeout is not None:
            self.connection.settimeout(self.timeout)
        if self.disable_nagle_algorithm:
            self.connection.setsockopt(socket.IPPROTO_TCP,
                                       socket.TCP_NODELAY, True)
        self.rfile = self.connection.makefile('rb', self.rbufsize)
        # Buffered file, to avoid waiting when reading large amounts of data.
        if self.wbufsize == 0:
            self.wfile = _SocketWriter(self.connection)
            # Unbuffered writer: buffering is neither applicable nor needed here.
        else:
            self.wfile = self.connection.makefile('wb', self.wbufsize)

    def finish(self):
        if not self.wfile.closed:
            try:
                self.wfile.flush()
            except socket.error:
                # A final socket error may have occurred here, such as
                # the local error ECONNABORTED.
                pass
        self.wfile.close()
        self.rfile.close()

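setup() applies the timeout if one is set and disables Nagle's algorithm if disable_nagle_algorithm is true (Nagle's algorithm allows at most one unacknowledged small segment per TCP connection, which cuts down the number of small packets on the network). It then creates a readable (rfile) and a writable (wfile) "file" object.

These are not real files. They wrap the receive and send operations and present them as file operations. self.rfile is the object for reading data from the client and offers the usual read methods; self.wfile is the object for sending data to the client. From then on, data arriving from the client can be read from rfile, and sending data to the client is just a matter of calling write on wfile.

The finish() method flushes wfile and then closes both rfile and wfile.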
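
A short sketch of what working with rfile and wfile looks like from a handler: the one below reads the client's input line by line and writes each line back in upper case. UpperCaseHandler and run_upper_demo are names invented for this example, and the two sample lines are arbitrary.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: echoes every line back in upper case."""

    def handle(self):
        # rfile behaves like a binary file opened for reading.
        for line in self.rfile:
            # wfile behaves like a binary file opened for writing.
            self.wfile.write(line.upper())


def run_upper_demo(lines=(b"hello\n", b"world\n")):
    with socketserver.TCPServer(("127.0.0.1", 0), UpperCaseHandler) as server:
        # Serve exactly one connection in a background thread.
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.create_connection(server.server_address, timeout=5) as client:
            client.sendall(b"".join(lines))
            # Closing the write side makes the handler's loop see end of file.
            client.shutdown(socket.SHUT_WR)
            with client.makefile("rb") as reply:
                received = reply.readlines()
        worker.join()
    return received


if __name__ == "__main__":
    print(run_upper_demo())
```

The handler never touches recv() or send() directly; everything goes through the two file objects that setup() prepared.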

SocketServer –> ThreadingMixIn

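BaseServer leaves a hook that a MixIn can use to add multithreading or multiprocessing. ThreadingMixIn is combined with a server class through multiple inheritance and overrides the process_request() method of the original class.

process_request() starts a new thread that runs process_request_thread() (which does the same job as process_request() in BaseServer).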

    class ThreadingMixIn:
        """Mix-in class to handle each request in a new thread."""
        # Decides how threads will act upon termination of the
        # main process
        daemon_threads = False

        def process_request_thread(self, request, client_address):
            """Same as in BaseServer but as a thread.
            In addition, exception handling is done here.
            """
            try:
                self.finish_request(request, client_address)
            except Exception:
                self.handle_error(request, client_address)
            finally:
                self.shutdown_request(request)

        def process_request(self, request, client_address):
            """Start a new thread to process the request."""
            t = threading.Thread(target = self.process_request_thread,
                                 args = (request, client_address))
            t.daemon = self.daemon_threads
            t.start()

In practice, the two are combined through multiple inheritance:

    class ThreadingTCPServer(ThreadingMixIn, TCPServer): pass
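
One way to observe the effect of the MixIn is to have the handler report the name of the thread it runs in. In this sketch the accept loop runs in a thread named "accept-loop"; ThreadNameHandler, handler_thread_name and that thread name are all assumptions made for the demo.

```python
import socket
import socketserver
import threading


class ThreadNameHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: tells the client which thread served it."""

    def handle(self):
        name = threading.current_thread().name
        self.wfile.write(name.encode() + b"\n")


def handler_thread_name(server_class):
    with server_class(("127.0.0.1", 0), ThreadNameHandler) as server:
        worker = threading.Thread(
            target=server.serve_forever,
            kwargs={"poll_interval": 0.05},
            name="accept-loop",
        )
        worker.start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as client:
                with client.makefile("rb") as stream:
                    reply = stream.readline().decode().strip()
        finally:
            server.shutdown()
            worker.join()
    return reply


if __name__ == "__main__":
    # Plain TCPServer: the handler runs inside the accept loop's thread.
    print(handler_thread_name(socketserver.TCPServer))
    # ThreadingTCPServer: the handler runs in a thread of its own.
    print(handler_thread_name(socketserver.ThreadingTCPServer))
```

With the plain server a slow handler blocks the accept loop; with the threading variant the loop is free to accept the next connection right away.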


http.server: the HTTP service

This module defines the classes used to implement an HTTP server (web server). The four covered here are:
HTTPServer

BaseHTTPRequestHandler

SimpleHTTPRequestHandler

CGIHTTPRequestHandler

http.server –> HTTPServer

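HTTPServer is a subclass of socketserver.TCPServer. It creates and listens on the HTTP socket, stores the server address in instance variables named server_name and server_port, and dispatches requests to the handler.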

    class HTTPServer(socketserver.TCPServer):
        allow_reuse_address = 1 # Seems to make sense in testing environment
        def server_bind(self):
            """Override server_bind to store the server name."""
            socketserver.TCPServer.server_bind(self)
            host, port = self.server_address[:2]
            self.server_name = socket.getfqdn(host)
            self.server_port = port
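
A quick way to look at the two extra attributes is to create an HTTPServer and read them back. describe_server is a made-up helper for this sketch; the value of server_name depends on how the local machine resolves 127.0.0.1, so no particular name is assumed.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def describe_server():
    # Binding happens in the constructor, so the attributes are set
    # as soon as the object exists. Port 0 means "any free port".
    with HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler) as httpd:
        return {
            "address": httpd.server_address,
            "server_name": httpd.server_name,
            "server_port": httpd.server_port,
        }


if __name__ == "__main__":
    print(describe_server())
```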

http.server –> BaseHTTPRequestHandler

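This class inherits from socketserver.StreamRequestHandler and handles the HTTP requests that reach the server. On its own it does not actually respond to any HTTP request; each request method (for example GET or POST) has to be handled by a subclass. BaseHTTPRequestHandler provides a number of class variables, instance variables, and methods for subclasses to use.

It calls its internal parse_request() method to parse the request line and headers, then calls the method that matches the request type. The method name is built from the request. (For example, if the request type is GET, the do_GET() method is called.)

BaseHTTPRequestHandler overrides only the handle() method of socketserver.StreamRequestHandler and deals purely with processing the data.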

  1. Override the base class's handle() method

    def handle(self):
        """Handle multiple requests if necessary."""
        self.close_connection = True

        self.handle_one_request()
        while not self.close_connection:
            self.handle_one_request()
  2. Call the handle_one_request() method

    def handle_one_request(self):
        """Handle a single HTTP request.
        You normally don't need to override this method; see the class
        __doc__ string for information on how to handle specific HTTP
        commands such as GET and POST.
        """
        try:
            self.raw_requestline = self.rfile.readline(65537)
            if len(self.raw_requestline) > 65536:
                self.requestline = ''
                self.request_version = ''
                self.command = ''
                self.send_error(HTTPStatus.REQUEST_URI_TOO_LONG)
                return
            if not self.raw_requestline:
                self.close_connection = True
                return
            # parse_request() parses the request message and extracts command, path, version, headers, etc.
            if not self.parse_request():
                # An error code has been sent, just exit
                return
            mname = 'do_' + self.command  # if command is GET, do_GET() is called below
            if not hasattr(self, mname):
                self.send_error(
                    HTTPStatus.NOT_IMPLEMENTED,
                    "Unsupported method (%r)" % self.command)
                return
            method = getattr(self, mname)  # look up the bound method
            method()  # call it
            self.wfile.flush() #actually send the response if not already done.
        except socket.timeout as e:
            #a read or a write timed out. Discard this connection
            self.log_error("Request timed out: %r", e)
            self.close_connection = True
            return
  3. The actual do_<command> implementations

The remaining work, that is, any further processing of the message and sending the response, has to be implemented in a subclass, for example by defining do_GET() / do_POST() / do_OPTIONS() / do_TRACE().
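
To illustrate the do_<command> dispatch, here is a minimal sketch of a subclass that implements only do_GET(). HelloHandler and fetch are hypothetical names; the response text is arbitrary. A request with any other method falls into the hasattr() check and gets a 501 response from the base class.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that only knows how to answer GET."""

    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(method, path):
    with HTTPServer(("127.0.0.1", 0), HelloHandler) as httpd:
        # Serve exactly one request in a background thread.
        worker = threading.Thread(target=httpd.handle_request)
        worker.start()
        conn = http.client.HTTPConnection("127.0.0.1", httpd.server_port, timeout=5)
        try:
            conn.request(method, path)
            response = conn.getresponse()
            status, body = response.status, response.read()
        finally:
            conn.close()
            worker.join()
    return status, body


if __name__ == "__main__":
    print(fetch("GET", "/abc"))
    print(fetch("POST", "/abc")[0])  # no do_POST defined on the handler
```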

http.server –> SimpleHTTPRequestHandler

SimpleHTTPRequestHandler serves as an example implementation: it handles GET and HEAD requests.

    class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
        """Simple HTTP request handler with GET and HEAD commands.
        This serves files from the current directory and any of its
        subdirectories. The MIME type for files is determined by
        calling the .guess_type() method.
        The GET and HEAD requests are identical except that the HEAD
        request omits the actual contents of the file.
        """
        server_version = "SimpleHTTP/" + __version__  # overrides server_version

        def do_GET(self):
            """Serve a GET request."""
            f = self.send_head()
            if f:
                try:
                    self.copyfile(f, self.wfile)
                finally:
                    f.close()

        def do_HEAD(self):
            """Serve a HEAD request."""
            f = self.send_head()
            if f:
                f.close()

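Here do_GET() and do_HEAD() both do only simple work: they call send_head(), which sends the response headers and returns either the requested file or a listing of the directory. do_GET() then copies that content to the client, while do_HEAD() just closes it.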
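
The difference between the two methods shows up when the same file is requested both ways. The sketch below serves a temporary directory and issues a GET and a HEAD for one file; QuietHandler, get_and_head and hello.txt are names invented for the demo, and the directory argument it relies on is available in Python 3.7 and later.

```python
import functools
import http.client
import pathlib
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the demo quiet


def get_and_head():
    results = {}
    with tempfile.TemporaryDirectory() as root:
        pathlib.Path(root, "hello.txt").write_bytes(b"hello from disk\n")
        # Serve the temporary directory instead of the current one.
        handler = functools.partial(QuietHandler, directory=root)
        with HTTPServer(("127.0.0.1", 0), handler) as httpd:
            worker = threading.Thread(
                target=httpd.serve_forever, kwargs={"poll_interval": 0.05}
            )
            worker.start()
            try:
                for method in ("GET", "HEAD"):
                    conn = http.client.HTTPConnection(
                        "127.0.0.1", httpd.server_port, timeout=5
                    )
                    conn.request(method, "/hello.txt")
                    response = conn.getresponse()
                    results[method] = (
                        response.status,
                        response.getheader("Content-Length"),
                        response.read(),
                    )
                    conn.close()
            finally:
                httpd.shutdown()
                worker.join()
    return results


if __name__ == "__main__":
    print(get_and_head())
```

Both responses carry the same status and Content-Length header; only the GET response has a body.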

Finally, let's look at the code that starts up the whole SimpleHTTPServer.

    def test(HandlerClass=BaseHTTPRequestHandler,
             ServerClass=HTTPServer, protocol="HTTP/1.0", port=8000, bind=""):
        """Test the HTTP request handler class.
        This runs an HTTP server on port 8000 (or the port argument).
        """
        server_address = (bind, port)
        HandlerClass.protocol_version = protocol
        # Start the server
        with ServerClass(server_address, HandlerClass) as httpd:
            sa = httpd.socket.getsockname()
            serve_message = "Serving HTTP on {host} port {port} (http://{host}:{port}/) ..."
            print(serve_message.format(host=sa[0], port=sa[1]))
            try:
                httpd.serve_forever()  # loop forever, serving HTTP
            except KeyboardInterrupt:
                print("\nKeyboard interrupt received, exiting.")
                sys.exit(0)

    if __name__ == '__main__':
        # ... (a pile of argument handling)
        handler_class = SimpleHTTPRequestHandler
        test(HandlerClass=handler_class, port=8000, bind='')


end

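In fact, after this quick pass through the code, I realized that reading the documentation first would have made things easier to understand and more efficient.

I don't read source code very often. Do you have any good suggestions or techniques? I'd appreciate any advice.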