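WSGI: a protocol that specifies the interface between a general-purpose web server and a Python app.
WSGI app: a Python app that conforms to the WSGI specification.
mod_wsgi: an extension module for the Apache server and an implementation of the WSGI protocol on Apache. With it, you can run WSGI apps on Apache.
In short: in WSGIScriptAlias mode (mod_wsgi's embedded mode), the Python interpreter is embedded in the Apache process and the request-handling code runs inside Apache's worker child processes. In WSGIDaemonProcess mode, the Python interpreter runs in a separate process that is isolated from the Apache processes.
How does mod_wsgi initialize Python? How does it relate to Apache? What roughly happens when a simple HTTP request comes in? The rest of this post briefly analyzes WSGIScriptAlias mode.
Apache configuration: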
WSGIScriptAlias /hello /var/www/hello.wsgi
This tells Apache that hello.wsgi is a mod_wsgi app and that every request under /hello/ is forwarded to it.
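The contents of hello.wsgi are not shown in this post. As a rough idea of what such a script could contain, here is a minimal sketch. The greeting text and headers are made up for illustration; the only thing taken from the rest of the post is the callable name `application`, which is the `def application(environ, start_response)` signature referred to later on.

```python
# hello.wsgi (hypothetical contents, the original script is not shown)

def application(environ, start_response):
    """Minimal WSGI callable that returns a plain-text greeting."""
    body = b"Hello, world!\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # Hand the status line and headers to the server ...
    start_response("200 OK", headers)
    # ... and return the body as an iterable of byte strings.
    return [body]
```

The server calls `application` once per request and iterates over whatever it returns.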
WSGI code:
jaime@westeros:~/source/mod-wsgi-3.3$ ls
build-2.6 build-3.2 debian Makefile.in mod_wsgi.lo posix-ap2X.mk.in win32-ap22py31.mk
build-2.7 configure LICENCE mod_wsgi.c mod_wsgi.slo README
build-3.1 configure.ac Makefile mod_wsgi.la posix-ap1X.mk.in win32-ap22py26.mk
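mod_wsgi.c contains a lot of code for Apache 1.3, and many of those functions share names with their 2.0 counterparts, which is misleading and makes the file hard to read. The unifdef tool can replace all of the 1.3-related code with blank lines, which keeps the line numbers intact while making the file much cleaner: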
jaime@westeros:~/source/mod-wsgi-3.3$ sudo apt-get install unifdef
jaime@westeros:~/source/mod-wsgi-3.3$ unifdef -DAP_SERVER_MAJORVERSION_NUMBER=2 -b mod_wsgi.c > mod_wsgi-clean.c
The entry point of the Apache module, mod_wsgi.c +15085:
    /* Dispatch list for API hooks */
    module AP_MODULE_DECLARE_DATA wsgi_module = {
        STANDARD20_MODULE_STUFF,
        wsgi_create_dir_config,    /* create per-dir config structures */
        wsgi_merge_dir_config,     /* merge per-dir config structures */
        wsgi_create_server_config, /* create per-server config structures */
        wsgi_merge_server_config,  /* merge per-server config structures */
        wsgi_commands,             /* table of config file commands */
        wsgi_register_hooks        /* register hooks */
    };
The functions behind each configuration directive, mod_wsgi.c +14982:
    static const command_rec wsgi_commands[] =
    {
        AP_INIT_RAW_ARGS("WSGIScriptAlias", wsgi_add_script_alias,
            NULL, RSRC_CONF, "Map location to target WSGI script file."),
        ...
    #if defined(MOD_WSGI_WITH_DAEMONS)
        AP_INIT_RAW_ARGS("WSGIDaemonProcess", wsgi_add_daemon_process,
            NULL, RSRC_CONF, "Specify details of daemon processes to start."),
        ...
        AP_INIT_TAKE1("WSGILazyInitialization", wsgi_set_lazy_initialization,
            NULL, RSRC_CONF, "Enable/Disable lazy Python initialization."),
    #endif
        ...
    };
wsgi_add_script_alias roughly does some setup work: it tells Apache's dispatcher to watch for URLs that look like XXX and to call us to handle them.
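To make the idea of "URLs under the alias" concrete, here is a simplified model of how a mount point such as /hello divides a request path into the WSGI variables SCRIPT_NAME and PATH_INFO. This is only a sketch of the concept; `split_mount` is a hypothetical helper, not Apache's or mod_wsgi's actual matching code.

```python
def split_mount(path, mount="/hello"):
    """Return (SCRIPT_NAME, PATH_INFO) if path falls under mount, else None."""
    if path == mount:
        # Exact hit on the mount point: nothing is left over for the app.
        return mount, ""
    if path.startswith(mount + "/"):
        # Everything after the mount point is passed to the app as PATH_INFO.
        return mount, path[len(mount):]
    # "/hellox" and unrelated paths are not ours to handle.
    return None
```

For example, `split_mount("/hello/world")` yields `("/hello", "/world")`, while `split_mount("/other")` yields `None`, meaning the request is left to other handlers.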
The interesting part is wsgi_register_hooks, mod_wsgi.c +14931:
    static void wsgi_register_hooks(apr_pool_t *p)
    {
        ...
        static const char * const p6[] = { "mod_python.c", NULL };

        ap_hook_post_config(wsgi_hook_init, p6, NULL, APR_HOOK_MIDDLE);
        ap_hook_child_init(wsgi_hook_child_init, p6, NULL, APR_HOOK_MIDDLE);

        ap_hook_translate_name(wsgi_hook_intercept, p1, n1, APR_HOOK_MIDDLE);
        ap_hook_handler(wsgi_hook_handler, NULL, NULL, APR_HOOK_MIDDLE);
        ...
    }
Judging by their names, wsgi_hook_init and wsgi_hook_child_init do initialization work. Let's first look at what wsgi_hook_handler does, mod_wsgi.c +8690:
    static int wsgi_hook_handler(request_rec *r)
    {
        ...

        /*
         * Only process requests for this module. First check for
         * where target is the actual WSGI script. Then need to
         * check for the case where handler name mapped to a handler
         * script definition.
         */

        // ... a pile of argument-checking code ...
        ...

        /* Build the sub process environment. */

        // The environment variables required by the WSGI protocol are set
        // here. They differ for every request, so every request has to
        // pass through this point.
        wsgi_build_environment(r);
        ...

        // Handling for WSGIDaemonProcess mode
        /*
         * Execute the target WSGI application script or proxy
         * request to one of the daemon processes as appropriate.
         */

    #if defined(MOD_WSGI_WITH_DAEMONS)
        status = wsgi_execute_remote(r);

        if (status != DECLINED)
            return status;
    #endif
        ...

        return wsgi_execute_script(r);
    }
wsgi_hook_handler is the entry point for every request, and it finishes by calling wsgi_execute_script, mod_wsgi.c +6404:
    static int wsgi_execute_script(request_rec *r)
    {
        ...

        /* Grab request configuration. */

        config = (WSGIRequestConfig *)ap_get_module_config(r->request_config,
                                                           &wsgi_module);

        /*
         * Acquire the desired python interpreter. Once this is done
         * it is safe to start manipulating python objects.
         */

        // Get the interpreter. A WSGI app can run in its own Python
        // interpreter, and several interpreters can run at the same time
        // in one process.
        // application_group is set in the wsgi_application_group function.
        // It depends on the request's server name, port and script name,
        // and it decides which interpreter each request maps to.
        interp = wsgi_acquire_interpreter(config->application_group);

        if (!interp) {
            ap_log_rerror(APLOG_MARK, WSGI_LOG_CRIT(0), r,
                          "mod_wsgi (pid=%d): Cannot acquire interpreter '%s'.",
                          getpid(), config->application_group);

            return HTTP_INTERNAL_SERVER_ERROR;
        }

        /* Calculate the Python module name to be used for script. */

        if (config->handler_script && *config->handler_script)
            script = config->handler_script;
        else
            script = r->filename;

        // Work out the Python module name for this app
        name = wsgi_module_name(r->pool, script);
        ...

        modules = PyImport_GetModuleDict();
        module = PyDict_GetItemString(modules, name);

        Py_XINCREF(module);

        if (module)
            exists = 1;

        /*
         * If script reloading is enabled and the module for it has
         * previously been loaded, see if it has been modified since
         * the last time it was accessed. For a handler script will
         * also see if it contains a custom function for determining
         * if a reload should be performed.
         */

        // Reload-related code: detect whether the app code has been modified
        if (module && config->script_reloading) {
            if (wsgi_reload_required(r->pool, r, script, module, r->filename)) {
                ...

    #if defined(MOD_WSGI_WITH_DAEMONS)
                if (*config->process_group) {
                    /*
                     * Need to restart the daemon process. We bail
                     * out on the request process here, sending back
                     * a special response header indicating that
                     * process is being restarted and that remote
                     * end should abandon connection and attempt to
                     * reconnect again. We also need to signal this
                     * process so it will actually shutdown. The
                     * process supervisor code will ensure that it
                     * is restarted.
                     */

                    Py_BEGIN_ALLOW_THREADS
                    ap_log_rerror(APLOG_MARK, WSGI_LOG_INFO(0), r,
                                 "mod_wsgi (pid=%d): Force restart of "
                                 "process '%s'.", getpid(),
                                 config->process_group);
                    Py_END_ALLOW_THREADS
                    ...

                    wsgi_release_interpreter(interp);

                    r->status = HTTP_INTERNAL_SERVER_ERROR;
                    r->status_line = "0 Rejected";

                    wsgi_daemon_shutdown++;
                    // WSGIDaemonProcess mode: kill the current daemon
                    // process so that the code is loaded afresh
                    kill(getpid(), SIGINT);

                    return OK;
                }
                else {
                    ...
                    PyDict_DelItemString(modules, name);
                }
    #else
                /*
                 * Need to reload just the script module. Remove
                 * the module from the modules dictionary before
                 * reloading it again. If code is executing
                 * within the module at the time, the callers
                 * reference count on the module should ensure
                 * it isn't actually destroyed until it is
                 * finished.
                 */

                // WSGIScriptAlias mode: delete the old module
                PyDict_DelItemString(modules, name);
    #endif
            }
        }
        ...

        // On the first request the module has to be loaded
        /* Load module if not already loaded. */

        if (!module) {
            module = wsgi_load_source(r->pool, r, name, exists, script,
                                      config->process_group,
                                      config->application_group);
        }
        ...

        // The exciting moment: run the app code!
        status = HTTP_INTERNAL_SERVER_ERROR;

        /* Determine if script exists and execute it. */

        if (module) {
            PyObject *module_dict = NULL;
            PyObject *object = NULL;

            module_dict = PyModule_GetDict(module);
            object = PyDict_GetItemString(module_dict, config->callable_object);

            if (object) {
                AdapterObject *adapter = NULL;
                adapter = newAdapterObject(r);

                if (adapter) {
                    PyObject *method = NULL;
                    PyObject *args = NULL;

                    Py_INCREF(object);
                    status = Adapter_run(adapter, object);   // here, right here
                    Py_DECREF(object);
                    ...
            }
            else {
                Py_BEGIN_ALLOW_THREADS
                ap_log_rerror(APLOG_MARK, WSGI_LOG_ERR(0), r,
                              "mod_wsgi (pid=%d): Target WSGI script '%s' does "
                              "not contain WSGI application '%s'.",
                              getpid(), script, config->callable_object);
                Py_END_ALLOW_THREADS

                status = HTTP_NOT_FOUND;
            }
        }

        // Error handling
        /* Log any details of exceptions if execution failed. */

        if (PyErr_Occurred())
            wsgi_log_python_error(r, NULL, r->filename);

        /* Cleanup and release interpreter, */

        Py_XDECREF(module);

        wsgi_release_interpreter(interp);

        return status;
    }
Adapter_run +3823:
    static int Adapter_run(AdapterObject *self, PyObject *object)
    {
        ...
        vars = Adapter_environ(self);

        // Get the start_response function
        start = PyObject_GetAttrString((PyObject *)self, "start_response");

        // Prepare the arguments. Remember def application(environ, start_response)?
        args = Py_BuildValue("(OO)", vars, start);

        // Run the app code
        self->sequence = PyEval_CallObject(object, args);

        if (self->sequence != NULL) {
            if (!Adapter_process_file_wrapper(self)) {
                int aborted = 0;

                iterator = PyObject_GetIter(self->sequence);

                if (iterator != NULL) {
                    PyObject *item = NULL;

                    // Walk the returned iterator and output each item
                    while ((item = PyIter_Next(iterator))) {
                        ...
                        if (length && !Adapter_output(self, msg, length, 0)) {
                            if (!PyErr_Occurred())
                                aborted = 1;
                            Py_DECREF(item);
                            break;
                        }
                    }
                }
                ...
            }

            // If the returned sequence has a close method, call it
            if (PyObject_HasAttrString(self->sequence, "close")) {
                PyObject *args = NULL;
                PyObject *data = NULL;

                close = PyObject_GetAttrString(self->sequence, "close");

                args = Py_BuildValue("()");
                data = PyEval_CallObject(close, args);

                Py_DECREF(args);
                Py_XDECREF(data);
                Py_DECREF(close);
            }
            ...
        }
        ...
    }
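The control flow of Adapter_run may be easier to follow in Python. The sketch below mirrors the steps visible in the C listing (call the app with an environ and a start_response, iterate over the result, skip empty items, call close() if present). It is not a translation of mod_wsgi: `run_wsgi_app` and its return shape are assumptions made for illustration, and the file-wrapper fast path and error logging are left out.

```python
def run_wsgi_app(app, environ):
    """Call a WSGI app and collect its output, loosely following Adapter_run."""
    response = {"status": None, "headers": None}
    chunks = []

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = list(headers)
        return chunks.append  # stands in for the legacy write() callable

    # Equivalent of PyEval_CallObject(object, (environ, start_response)).
    result = app(environ, start_response)
    try:
        for item in result:
            if item:  # the C loop only outputs items with a non-zero length
                chunks.append(item)
    finally:
        # If the returned object has a close method, call it.
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return response["status"], response["headers"], b"".join(chunks)
```

The `finally` clause is a choice made in this sketch so that close() runs even when iteration raises; the listing above only shows the close() call itself.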
AdapterObject is a custom Python type used to run the WSGI program; it provides the start_response method:
    typedef struct {
        PyObject_HEAD
        int result;
        request_rec *r;
    #if defined(MOD_WSGI_WITH_BUCKETS)
        apr_bucket_brigade *bb;
    #endif
        WSGIRequestConfig *config;
        InputObject *input;
        PyObject *log;
        int status;
        const char *status_line;
        PyObject *headers;
        PyObject *sequence;
        int content_length_set;
        apr_off_t content_length;
        apr_off_t output_length;
    } AdapterObject;

    static PyTypeObject Adapter_Type;
    ...
    static PyMethodDef Adapter_methods[] = {
        { "start_response", (PyCFunction)Adapter_start_response, METH_VARARGS, 0 },
        { "write",          (PyCFunction)Adapter_write, METH_VARARGS, 0 },
        { "file_wrapper",   (PyCFunction)Adapter_file_wrapper, METH_VARARGS, 0 },
        { NULL, NULL}
    };
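The Adapter_xxx family of functions is the actual implementation of the WSGI protocol. I admit that the earlier claim, that the WSGI-related variables are set in wsgi_build_environment, was not quite right: most of them are set in Adapter_environ :)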
Adapter_start_response is the C implementation of start_response.
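The C body of Adapter_start_response is not shown here, so as a reference point this is a small Python sketch of what the WSGI specification requires of start_response: it records the status and headers, returns a write callable, refuses a second call without exc_info, and re-raises the application's exception if headers have already gone out. The class `Adapter` and the attributes `headers_sent` and `output` are assumptions for illustration; only `status_line` and `headers` correspond to fields in the struct above.

```python
class Adapter:
    """Toy stand-in for AdapterObject: holds per-request response state."""

    def __init__(self):
        self.status_line = None
        self.headers = None
        self.headers_sent = False  # hypothetical flag, not in the C struct
        self.output = []           # hypothetical sink for body data

    def start_response(self, status, headers, exc_info=None):
        if exc_info is not None:
            try:
                if self.headers_sent:
                    # Too late to change the response: re-raise the app's error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # drop the reference to avoid a cycle
        elif self.status_line is not None:
            raise RuntimeError("headers have already been set")
        self.status_line = status
        self.headers = list(headers)
        return self.write

    def write(self, data):
        if self.status_line is None:
            raise RuntimeError("write() called before start_response()")
        if data:
            self.headers_sent = True  # headers leave with the first body bytes
        self.output.append(data)
```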
How is the interpreter obtained?
    static InterpreterObject *wsgi_acquire_interpreter(const char *name)
    {
        PyThreadState *tstate = NULL;
        PyInterpreterState *interp = NULL;
        InterpreterObject *handle = NULL;
        ...

        /*
         * Check if already have interpreter instance and
         * if not need to create one.
         */

        handle = (InterpreterObject *)PyDict_GetItemString(wsgi_interpreters,
                                                           name);

        if (!handle) {
            // No interpreter was found, so a new one is created here
            handle = newInterpreterObject(name);
            ...
            // Store it in wsgi_interpreters
            PyDict_SetItemString(wsgi_interpreters, name, (PyObject *)handle);
        }
        else
            Py_INCREF(handle);

        interp = handle->interp;

        /*
         * Create new thread state object. We should only be
         * getting called where no current active thread
         * state, so no need to remember the old one. When
         * working with the main Python interpreter always
         * use the simplified API for GIL locking so any
         * extension modules which use that will still work.
         */

        // Thread-related code
        ...

        return handle;
    }
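Stripped of reference counting and thread state handling, the lookup above is a get-or-create on a dictionary keyed by the application group name. A minimal Python sketch of that pattern follows. `Interpreter`, `acquire_interpreter` and `application_group` are hypothetical names; the group-name format is inferred from the `127.0.1.1|/hello` value that appears in the logs later in this post, and the rule for omitting default ports is an assumption.

```python
class Interpreter:
    """Placeholder for InterpreterObject; the real one wraps a sub-interpreter."""

    def __init__(self, name):
        self.name = name


_interpreters = {}  # plays the role of the wsgi_interpreters dictionary


def acquire_interpreter(name):
    """Return the interpreter for this group, creating it on first use."""
    handle = _interpreters.get(name)
    if handle is None:
        handle = Interpreter(name)
        _interpreters[name] = handle
    return handle


def application_group(server_name, port, script_name):
    """Build a group name shaped like the '127.0.1.1|/hello' seen in the logs."""
    # Assumption: default HTTP/HTTPS ports are left out of the name.
    host = server_name if port in (80, 443) else "%s:%d" % (server_name, port)
    return host + "|" + script_name
```

Two requests with the same server name, port and script name therefore share one interpreter within a process, which matches the "search" then "found" log lines on the second request.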
The app code is loaded in the wsgi_load_source function:
    static PyObject *wsgi_load_source(apr_pool_t *pool, request_rec *r,
                                      const char *name, int exists,
                                      const char* filename,
                                      const char *process_group,
                                      const char *application_group)
    {
        ...
        fp = fopen(filename, "r");
        n = PyParser_SimpleParseFile(fp, filename, Py_file_input);
        ...
        co = (PyObject *)PyNode_Compile(n, filename);
        PyNode_Free(n);

        // Load the module from its name and the compiled code object co
        if (co)
            m = PyImport_ExecCodeModuleEx((char *)name, co, (char *)filename);

        Py_XDECREF(co);

        if (m) {
            ...
            // Record the module's modification time
            PyModule_AddObject(m, "__mtime__", object);
        }
        else {
            Py_BEGIN_ALLOW_THREADS
            if (r) {
                ap_log_rerror(APLOG_MARK, WSGI_LOG_ERR(0), r,
                              "mod_wsgi (pid=%d): Target WSGI script '%s' cannot "
                              "be loaded as Python module.", getpid(), filename);
            }
            ...
            wsgi_log_python_error(r, NULL, filename);
        }

        return m;
    }
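The load-and-reload cycle can be imitated in pure Python: compile the script, execute it into a fresh module, remember the file's modification time in `__mtime__`, and throw the module away when the file changes. The sketch below is an approximation under those assumptions; `load_source`, `reload_required` and `get_module` are illustrative names, and the real wsgi_reload_required may apply further checks that are not shown in this post.

```python
import os
import types


def load_source(name, filename, modules):
    """Compile and execute a script file as a fresh module, recording its mtime."""
    mtime = os.path.getmtime(filename)
    with open(filename, "r") as fp:
        code = compile(fp.read(), filename, "exec")
    module = types.ModuleType(name)
    module.__file__ = filename
    exec(code, module.__dict__)
    module.__mtime__ = mtime  # same attribute name the C code attaches
    modules[name] = module
    return module


def reload_required(module, filename):
    """True when the file on disk no longer matches the recorded mtime."""
    return os.path.getmtime(filename) != module.__mtime__


def get_module(name, filename, modules):
    """Return the cached module, reloading it if the script was modified."""
    module = modules.get(name)
    if module is not None and reload_required(module, filename):
        del modules[name]  # mirrors PyDict_DelItemString(modules, name)
        module = None
    if module is None:
        module = load_source(name, filename, modules)
    return module
```

Passing in a plain dict for `modules` keeps the sketch away from `sys.modules`; mod_wsgi itself works on the interpreter's real module dictionary.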
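That is roughly how mod_wsgi handles a request in WSGIScriptAlias mode once Apache has received it and called wsgi_hook_handler. One question remains: when exactly is the Python environment initialized? Let's go back and look.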
wsgi_hook_init mod_wsgi.c +13031:
    static int wsgi_hook_init(apr_pool_t *pconf, apr_pool_t *ptemp,
                              apr_pool_t *plog, server_rec *s)
    {
        ...
        /* Retain reference to base server. */

        wsgi_server = s;

        /* Retain record of parent process ID. */

        wsgi_parent_pid = getpid();

        /* Determine whether multiprocess and/or multithread. */

        ap_mpm_query(AP_MPMQ_IS_THREADED, &wsgi_multithread);
        wsgi_multithread = (wsgi_multithread != AP_MPMQ_NOT_SUPPORTED);

        ap_mpm_query(AP_MPMQ_IS_FORKED, &wsgi_multiprocess);
        if (wsgi_multiprocess != AP_MPMQ_NOT_SUPPORTED) {
            ap_mpm_query(AP_MPMQ_MAX_DAEMONS, &wsgi_multiprocess);
            wsgi_multiprocess = (wsgi_multiprocess != 1);
        }

        /* Retain reference to main server config. */

        wsgi_server_config = ap_get_module_config(s->module_config, &wsgi_module);

        /*
         * Check that the version of Python found at
         * runtime is what was used at compilation.
         */

        wsgi_python_version();

        /*
         * Initialise Python if required to be done in
         * the parent process. Note that it will not be
         * initialised if mod_python loaded and it has
         * already been done.
         */

        if (wsgi_python_required == -1)
            wsgi_python_required = 1;

        // Where Python gets initialized depends on wsgi_python_after_fork,
        // that is, on the WSGILazyInitialization directive.
        // Is it before the Apache process forks, or after?
        if (!wsgi_python_after_fork)
            wsgi_python_init(pconf);

        /* Startup separate named daemon processes. */

        // In WSGIDaemonProcess mode the daemon processes are started here.
        // This is the place to start if you want to explore daemon mode.
    #if defined(MOD_WSGI_WITH_DAEMONS)
        status = wsgi_start_daemons(pconf);
    #endif

        return status;
    }
The initialization function that runs after fork:
    static void wsgi_hook_child_init(apr_pool_t *p, server_rec *s)
    {
        ...
        // wsgi_python_required depends on the WSGIRestrictEmbedded directive
        if (wsgi_python_required) {
            /*
             * Initialise Python if required to be done in
             * the child process. Note that it will not be
             * initialised if mod_python loaded and it has
             * already been done.
             */

            if (wsgi_python_after_fork)
                wsgi_python_init(p);

            /*
             * Now perform additional initialisation steps
             * always done in child process.
             */

            wsgi_python_child_init(p);
        }
    }
These two are only the Apache-facing initialization hooks, called by Apache. The real Python initialization happens in two steps, wsgi_python_init and wsgi_python_child_init:
    static void wsgi_python_init(apr_pool_t *p)
    {
        static int initialized = 1;

        /* Perform initialisation if required. */

        if (!Py_IsInitialized() || !initialized) {
            ...
            /* Initialise Python. */

            ap_log_error(APLOG_MARK, WSGI_LOG_INFO(0), wsgi_server,
                         "mod_wsgi (pid=%d): Initializing Python.", getpid());

            initialized = 1;

            Py_Initialize();   // the mysterious and powerful Py_Initialize

            /* Initialise threading. */

            PyEval_InitThreads();

    #if PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION >= 2
            /*
             * We now want to release the GIL. Before we do that
             * though we remember what the current thread state is.
             * We will use that later to restore the main thread
             * state when we want to cleanup interpreters on
             * shutdown.
             */

            wsgi_main_tstate = PyThreadState_Get();
            PyEval_ReleaseThread(wsgi_main_tstate);
    #else
            PyThreadState_Swap(NULL);
            PyEval_ReleaseLock();
    #endif

            wsgi_python_initialized = 1;

            /*
             * Register cleanups to be performed on parent restart
             * or shutdown. This will destroy Python itself.
             */

            apr_pool_cleanup_register(p, NULL, wsgi_python_parent_cleanup,
                                      apr_pool_cleanup_null);
        }
    }

    static void wsgi_python_child_init(apr_pool_t *p)
    {
        // The second initialization step; by now the fork has happened

        /*
         * Trigger any special Python stuff required after a fork.
         * Only do this though if we were responsible for the
         * initialisation of the Python interpreter in the first
         * place to avoid it being done multiple times. Also only
         * do it if Python was initialised in parent process.
         */

        /* Finalise any Python objects required by child process. */

        /* Initialise Python interpreter instance table and lock. */

        // The dictionary that holds all interpreters
        wsgi_interpreters = PyDict_New();

        /*
         * Initialise the key for data related to a thread. At
         * the moment we only record an integer thread ID to be
         * used in lookup table to thread states associated with
         * an interprter.
         */

        /*
         * Cache a reference to the first Python interpreter
         * instance. This interpreter is special as some third party
         * Python modules will only work when used from within this
         * interpreter. This is generally when they use the Python
         * simplified GIL API or otherwise don't use threading API
         * properly. An empty string for name is used to identify
         * the first Python interpreter instance.
         */

        /* Loop through import scripts for this process and load them. */

        // Process wsgi_import_list
        if (wsgi_import_list) {
            ...
        }
    }
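Ha, almost done. Now let's print some interesting output to see when, and in which process, these functions are called. Note that the patch below applies to the code that has not been run through unifdef: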
diff --git a/mod_wsgi.c b/mod_wsgi.c index f0764b8..1781f7b 100644 --- a/mod_wsgi.c +++ b/mod_wsgi.c @@ -29,6 +29,8 @@ * */ +#define INFO(fmt, args...) ap_log_error(APLOG_MARK, WSGI_LOG_ERR(0), wsgi_server, "[pid %d] %s:%s:%d "fmt, getpid(),__FILE__, __PRETTY_FUNCTION__, __LINE__,args) + #define CORE_PRIVATE 1 #include "httpd.h" @@ -5722,10 +5724,14 @@ static void wsgi_python_init(apr_pool_t *p) static int initialized = 1; #endif + INFO("%s", "enter"); + /* Perform initialisation if required. */ if (!Py_IsInitialized() || !initialized) { + INFO("%s", "init python"); + /* Enable Python 3.0 migration warnings. */ #if PY_MAJOR_VERSION == 2 && PY_MINOR_VERSION >= 6 @@ -5859,6 +5865,8 @@ static PyObject *wsgi_interpreters = NULL; static InterpreterObject *wsgi_acquire_interpreter(const char *name) { + INFO("search interpreter %s", name); + PyThreadState *tstate = NULL; PyInterpreterState *interp = NULL; InterpreterObject *handle = NULL; @@ -5893,6 +5901,9 @@ static InterpreterObject *wsgi_acquire_interpreter(const char *name) name); if (!handle) { + + INFO("create interpreter %s", name); + handle = newInterpreterObject(name); if (!handle) { @@ -5916,6 +5927,8 @@ static InterpreterObject *wsgi_acquire_interpreter(const char *name) else Py_INCREF(handle); + INFO("found interpreter %s", name); + interp = handle->interp; /* @@ -6339,6 +6352,8 @@ static int wsgi_execute_script(request_rec *r) * it is safe to start manipulating python objects. */ + INFO("%s", "enter"); + interp = wsgi_acquire_interpreter(config->application_group); if (!interp) { @@ -6543,6 +6558,7 @@ static int wsgi_execute_script(request_rec *r) PyObject *method = NULL; PyObject *args = NULL; + INFO("%s", "app running"); Py_INCREF(object); status = Adapter_run(adapter, object); Py_DECREF(object); @@ -6693,6 +6709,8 @@ static void wsgi_python_child_init(apr_pool_t *p) int thread_id = 0; int *thread_handle = NULL; + INFO("%s", "init python further"); + /* Working with Python, so must acquire GIL. 
*/ state = PyGILState_Ensure(); @@ -6778,6 +6796,9 @@ static void wsgi_python_child_init(apr_pool_t *p) /* Loop through import scripts for this process and load them. */ if (wsgi_import_list) { + + INFO("%s", "dealing with wsgi_import_list"); + apr_array_header_t *scripts = NULL; WSGIScriptFile *entries; @@ -8115,6 +8136,7 @@ static void wsgi_log_script_error(request_rec *r, const char *e, const char *n) static void wsgi_build_environment(request_rec *r) { + INFO("%s", "enter"); WSGIRequestConfig *config = NULL; const char *value = NULL; @@ -8862,6 +8884,7 @@ static int wsgi_hook_handler(request_rec *r) if (!r->handler) return DECLINED; + INFO("handler %s, file %s", r->handler, r->filename); /* * Construct request configuration and cache it in the * request object against this module so can access it later @@ -9082,6 +9105,7 @@ static int wsgi_hook_handler(request_rec *r) #if AP_SERVER_MAJORVERSION_NUMBER < 2 + /* * Apache 1.3 module initialisation functions. */ @@ -12909,6 +12933,9 @@ static int wsgi_hook_daemon_handler(conn_rec *c) static int wsgi_hook_init(apr_pool_t *pconf, apr_pool_t *ptemp, apr_pool_t *plog, server_rec *s) { + + INFO("%s", "enter"); + void *data = NULL; const char *userdata_key = "wsgi_init"; char package[128]; @@ -13028,6 +13055,8 @@ static void wsgi_hook_child_init(apr_pool_t *p, server_rec *s) } #endif + INFO("%s", "enter"); + if (wsgi_python_required) { /* * Initialise Python if required to be done in @@ -13500,6 +13529,7 @@ static authn_status wsgi_check_password(request_rec *r, const char *user, * the last time it was accessed. 
*/ + /* FIXME: Reloading */ if (module && config->script_reloading) { if (wsgi_reload_required(r->pool, r, script, module, NULL)) { /* @@ -14804,6 +14834,9 @@ static int wsgi_hook_logio(apr_pool_t *pconf, apr_pool_t *ptemp, static void wsgi_register_hooks(apr_pool_t *p) { + + INFO("%s", "enter"); + static const char * const p1[] = { "mod_alias.c", NULL }; static const char * const n1[]= { "mod_userdir.c", "mod_vhost_alias.c", NULL };
The log output, corresponding to the Apache configuration given above:
[Fri Sep 30 14:22:20 2011] [error] [pid 21372] mod_wsgi.c:wsgi_hook_init:12937 enter [Fri Sep 30 14:22:20 2011] [error] [pid 21372] mod_wsgi.c:wsgi_register_hooks:14838 enter [Fri Sep 30 14:22:20 2011] [error] [pid 21373] mod_wsgi.c:wsgi_hook_init:12937 enter [Fri Sep 30 14:22:20 2011] [notice] Apache/2.2.17 (Ubuntu) mod_wsgi/3.3 Python/2.7.1+ configured -- resuming normal operations [Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_child_init:13058 enter [Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_python_init:5727 enter [Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_python_init:5733 init python [Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_hook_child_init:13058 enter [Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_python_init:5727 enter [Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_python_init:5733 init python [Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_python_child_init:6712 init python further [Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_python_child_init:6712 init python further jaime@westeros:/var/www$ ps aux | grep apache2 jaime 20827 0.0 0.0 3928 508 pts/2 S+ 14:17 0:00 tail -f /var/log/apache2/error.log root 21373 0.0 0.1 10224 3036 ? Ss 14:22 0:00 /usr/sbin/apache2 -k start www-data 21377 0.0 0.3 234368 6752 ? Sl 14:22 0:00 /usr/sbin/apache2 -k start www-data 21378 0.0 0.3 234392 6500 ? Sl 14:22 0:00 /usr/sbin/apache2 -k start jaime 23119 0.0 0.0 4156 856 pts/3 S+ 16:37 0:00 grep --color=auto apache2
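After Apache starts, wsgi_hook_init and wsgi_register_hooks run in the main process 21372, and wsgi_hook_init runs once more in another process, 21373. Two child processes, 21377 and 21378, are then created. Each of them runs wsgi_hook_child_init, wsgi_python_init and wsgi_python_child_init in that order. At this point Apache has finished starting and Python has been initialized, but the interpreter for the app has not been created yet.
The first request is handled by process 21377, which creates the interpreter and loads hello.wsgi: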
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi [Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_build_environment:8139 enter [Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6355 enter [Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello [Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5905 create interpreter 127.0.1.1|/hello [Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello [Fri Sep 30 14:22:29 2011] [info] [client 127.0.0.1] mod_wsgi (pid=21377, process='', application='127.0.1.1|/hello'): Loading WSGI script '/var/www/hello.wsgi'. [Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6561 app running [Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico [Fri Sep 30 14:22:29 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico
The second request needs no extra work: the existing interpreter is reused and the code has already been loaded. Cool:
[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi [Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_build_environment:8139 enter [Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6355 enter [Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello [Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello [Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6561 app running [Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico [Fri Sep 30 14:22:36 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico
Before the third request hello.wsgi was modified, so the code has to be reloaded:
[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi [Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_build_environment:8139 enter [Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6355 enter [Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello [Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello [Fri Sep 30 14:22:47 2011] [info] [client 127.0.0.1] mod_wsgi (pid=21377, process='', application='127.0.1.1|/hello'): Reloading WSGI script '/var/www/hello.wsgi'. [Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6561 app running [Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico [Fri Sep 30 14:22:47 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico
Although the first three requests were all handled by 21377, we did eventually see 21378 take one:
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi [Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_build_environment:8139 enter [Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_execute_script:6355 enter [Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello [Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_acquire_interpreter:5905 create interpreter 127.0.1.1|/hello [Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello [Fri Sep 30 14:41:37 2011] [info] [client 127.0.0.1] mod_wsgi (pid=21378, process='', application='127.0.1.1|/hello'): Loading WSGI script '/var/www/hello.wsgi'. [Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_execute_script:6561 app running [Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico [Fri Sep 30 14:41:37 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico
Notes:
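- The Python C API code and the Apache C code are mixed together, but they are really just operating on variables that belong to different libraries; all of it is C code. Once libpython and libapache are linked into the process, each keeps its own state in global variables, and the rest of the code simply operates on those variables. This partly explains why mod_python and mod_wsgi conflict: both link against the same library, libpython, and unless they are carefully coordinated things go wrong very easily. http://code.google.com/p/modwsgi/wiki/InstallationIssues#Incompatible_ModPython_Versions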
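wsgi_daemon_index holds a mapping from process_group to socket. Given the name of a process group, you can find the socket that the group's processes are listening on. This socket is the key to communicating with the daemons. It is created before the fork, so every child process can access it after the fork; each daemon has to close all socket fds that do not belong to its own process group.
wsgi_daemon_lists is the list of all daemon processes that have been started.
When Apache starts, wsgi_hook_init calls wsgi_start_daemons, which creates all the daemons. From then on the number of daemons is fixed.
After wsgi_hook_init returns in pid 7838, Apache forks another child process, pid 7843, which runs without root privileges and calls wsgi_hook_child_init. This process is responsible for handling and dispatching all requests: it calls wsgi_hook_handler for each request and, inside wsgi_execute_remote, talks to the actual daemon process over a socket. This Apache child process can be called mod_wsgi's dispatcher. pid 7842 is a daemon process.
Whether in embedded mode or in daemon mode, execution always ends up in the wsgi_execute_script function.
The request headers and the standard CGI variables are passed to the daemon process through r->subprocess_env; see wsgi_build_environment and wsgi_send_request. Note that the object r crosses a process boundary on its way from the dispatcher to the daemon, so it is no longer the original r.
If a daemon process finds that the code needs to be reloaded, it sends a 0 Rejected message to the dispatcher and then kills itself. Apache catches the signal for the dead daemon child and starts a new daemon process, which listens on the same socket.
If a daemon finds that everything is fine and no reload is needed (always the case for a freshly started daemon), it sends a 0 Continue message to the dispatcher, telling it that it can go on.
If the dispatcher receives a 0 Rejected message, it tries to connect again, until it either receives 0 Continue or exceeds the retry limit. In effect, 0 Continue can be seen as a synchronization mechanism.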
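The 0 Rejected / 0 Continue exchange described above amounts to a bounded retry loop on the dispatcher side. The Python sketch below models only that loop. `dispatch` and `connect` are hypothetical: `connect` stands for whatever opens the socket, sends the request and returns the daemon's first status line, and the default of three retries is an arbitrary value chosen for the example, not mod_wsgi's setting.

```python
def dispatch(connect, max_retries=3):
    """Connect to a daemon, retrying while it answers '0 Rejected'.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_retries + 1):
        status = connect()
        if status == "0 Continue":
            # The daemon is ready: carry on and read the real response.
            return attempt
        if status != "0 Rejected":
            raise RuntimeError("unexpected reply: %r" % status)
        # "0 Rejected": the daemon is restarting to reload code, so
        # drop this connection and try again against its replacement.
    raise RuntimeError("daemon still restarting after %d attempts" % max_retries)
```

With a daemon that rejects once and then accepts, `dispatch` returns 2: the first connection is abandoned and the second one reaches the restarted process.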
[Sun Oct 30 13:00:17 2011] [error] [pid 7837] mod_wsgi.c:wsgi_hook_init:13658 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7837] mod_wsgi.c:wsgi_register_hooks:15564 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_hook_init:13658 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_python_init:5817 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_python_init:5823 init python [Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7838): Python home /usr/local/sae/python. [Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7838): Initializing Python. [Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_start_daemons:11955 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_start_process:11540 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_start_process:11944 ok, we're father [Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_hook_init:13754 forking a new process to listen all connections, will call wsgi_hook_child_init [Sun Oct 30 13:00:17 2011] [warn] pid file /var/run/apache2.pid overwritten -- Unclean shutdown of previous Apache run? [Sun Oct 30 13:00:17 2011] [notice] Apache/2.2.17 (Ubuntu) mod_wsgi/3.3 Python/2.6.7 configured -- resuming normal operations [Sun Oct 30 13:00:17 2011] [info] Server built: Sep 1 2011 09:25:26 [Sun Oct 30 13:00:17 2011] [error] [pid 7843] mod_wsgi.c:wsgi_hook_child_init:13784 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7843] mod_wsgi.c:wsgi_python_child_init:6883 init python further [Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7843): Attach interpreter ''. [Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_start_process:11558 ok in child, we're a new daemon process [Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7842): Starting process 'wic' with uid=1000, gid=1000 and threads=1. 
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_python_child_init:6883 init python further [Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7842): Attach interpreter ''. [Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_main:11276 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_main:11428 creating thread 0 [Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_thread:11119 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_worker:10887 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_monitor_thread:11181 enter [Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_monitor_thread:11203 check worker status