Skip to content

Latest commit

 

History

History
1016 lines (765 loc) · 39.8 KB

100.rst

File metadata and controls

1016 lines (765 loc) · 39.8 KB

100 Miles

mod_wsgi 流程简单分析: 一个嵌入python的例子

WSGI: 一个协议,描述通用服务器与python app之间的接口规范

wsgi app:遵守wsgi规范的python app

mod_wsgi: apache服务器的一个扩展模块, wsgi协议在apache服务器上的一个实现,有了它, 你就可以在apache上运行wsgi app

总的来说,WSGIScriptAlias 模式,python解释器被嵌入到apache进程当中,请求处理代码是在apache的 worker子进程中执行。WSGIDaemonProcess python解释器运行在单独的进程之中,和apache进程是隔离的。

mod_wsgi怎么完成python初始化?和apache关系怎样?一个简单的http请求进来之后, 处理流程大概是什么?下面将针对 WSGIScriptAlias 模式进行简要分析。

apache配置:

WSGIScriptAlias  /hello /var/www/hello.wsgi

告诉apache hello.wsgi是一个mod_wsgi app,所有 /hello/ 下面的请求都转发给它。

wsgi代码:

jaime@westeros:~/source/mod-wsgi-3.3$ ls
build-2.6  build-3.2     debian    Makefile.in  mod_wsgi.lo
posix-ap2X.mk.in   win32-ap22py31.mk
build-2.7  configure     LICENCE   mod_wsgi.c   mod_wsgi.slo      README
build-3.1  configure.ac  Makefile  mod_wsgi.la  posix-ap1X.mk.in
win32-ap22py26.mk

mod_wsgi.c有很多代码是关于apache 1.3的,和2.0代码有很多重名的函数,容易误导, 不便于阅读,可使用 unifdef 工具,将1.3相关的代码全部用空行替代,保留行号 的同时又清爽了许多:

jaime@westeros:~/source/mod-wsgi-3.3$ sudo apt-get install unifdef
jaime@westeros:~/source/mod-wsgi-3.3$ unifdef -DAP_SERVER_MAJORVERSION_NUMBER=2 -b mod_wsgi.c  > mod_wsgi-clean.c

apache模块的入口 mod_wsgi.c +15085

/* Dispatch list for API hooks */

module AP_MODULE_DECLARE_DATA wsgi_module = {
    STANDARD20_MODULE_STUFF,
    wsgi_create_dir_config,    /* create per-dir    config structures */
    wsgi_merge_dir_config,     /* merge  per-dir    config structures */
    wsgi_create_server_config, /* create per-server config structures */
    wsgi_merge_server_config,  /* merge  per-server config structures */
    wsgi_commands,             /* table of config file commands       */
    wsgi_register_hooks        /* register hooks                      */
};

配置选项对应的函数 mod_wsgi.c +14982:

static const command_rec wsgi_commands[] =
{
    AP_INIT_RAW_ARGS("WSGIScriptAlias", wsgi_add_script_alias,
        NULL, RSRC_CONF, "Map location to target WSGI script file."),
    ...

#if defined(MOD_WSGI_WITH_DAEMONS)
    AP_INIT_RAW_ARGS("WSGIDaemonProcess", wsgi_add_daemon_process,
        NULL, RSRC_CONF, "Specify details of daemon processes to start."),
    ...

    AP_INIT_TAKE1("WSGILazyInitialization", wsgi_set_lazy_initialization,
        NULL, RSRC_CONF, "Enable/Disable lazy Python initialization."),
#endif
    ...

};

wsgi_add_script_alias大致做了一些初始化的工作,告诉apache dispatcher留意了, 看到类似XXX的url,要调用我们来处理。

有意思的是这个 wsgi_register_hooks mod_wsgi.c +14931+:

static void wsgi_register_hooks(apr_pool_t *p)
{
    ...
    static const char * const p6[] = { "mod_python.c", NULL };

    ap_hook_post_config(wsgi_hook_init, p6, NULL, APR_HOOK_MIDDLE);
    ap_hook_child_init(wsgi_hook_child_init, p6, NULL, APR_HOOK_MIDDLE);

    ap_hook_translate_name(wsgi_hook_intercept, p1, n1, APR_HOOK_MIDDLE);
    ap_hook_handler(wsgi_hook_handler, NULL, NULL, APR_HOOK_MIDDLE);
    ...
}

从名字上看,wsgi_hook_init, wsgi_hook_child_init是做初始化工作的。 我们先看wsgi_hook_handler做了什么 mod_wsgi.c +8690:

static int wsgi_hook_handler(request_rec *r)
{
     ...

    /*
     * Only process requests for this module. First check for
     * where target is the actual WSGI script. Then need to
     * check for the case where handler name mapped to a handler
     * script definition.
     */
    // blablabla 一堆参数检查代码
    ...

    /* Build the sub process environment. */

    // wsgi协议相关环境变量在这里设置,每次请求都不一样
    // 故此处是每次请求的必经之地
    wsgi_build_environment(r);

    ...

    // WSGIDaemonProcess 模式处理代码
    /*
     * Execute the target WSGI application script or proxy
     * request to one of the daemon processes as appropriate.
     */

#if defined(MOD_WSGI_WITH_DAEMONS)
    status = wsgi_execute_remote(r);

    if (status != DECLINED)
        return status;
#endif

    ...

    return wsgi_execute_script(r);
}

wsgi_hook_handler 是每次请求的入口,最后调用wsgi_execute_script mod_wsgi.c +6404:

static int wsgi_execute_script(request_rec *r)
{

    ...

    /* Grab request configuration. */

    config = (WSGIRequestConfig *)ap_get_module_config(r->request_config,
                                                       &wsgi_module);

    /*
     * Acquire the desired python interpreter. Once this is done
     * it is safe to start manipulating python objects.
     */

    // 获得解释器,一个wsgi app可以运行在单独的python解释器里
    // 在一个进程里,可以有多个解释器同时运行

    // application_group 在 wsgi_application_group 函数中设置
    // 与req请求的servername,port,scriptname有关,每次请求对应于哪个解释器由它来决定
    interp = wsgi_acquire_interpreter(config->application_group);

    if (!interp) {
        ap_log_rerror(APLOG_MARK, WSGI_LOG_CRIT(0), r,
                      "mod_wsgi (pid=%d): Cannot acquire interpreter '%s'.",
                      getpid(), config->application_group);

        return HTTP_INTERNAL_SERVER_ERROR;
    }

    /* Calculate the Python module name to be used for script. */

    if (config->handler_script && *config->handler_script)
        script = config->handler_script;
    else
        script = r->filename;

    // 找到这个app的python模块名字
    name = wsgi_module_name(r->pool, script);
    ...

    modules = PyImport_GetModuleDict();
    module = PyDict_GetItemString(modules, name);

    Py_XINCREF(module);

    if (module)
        exists = 1;

    /*
     * If script reloading is enabled and the module for it has
     * previously been loaded, see if it has been modified since
     * the last time it was accessed. For a handler script will
     * also see if it contains a custom function for determining
     * if a reload should be performed.
     */

    // Reload相关代码,检测app代码是否被修改
    if (module && config->script_reloading) {
        if (wsgi_reload_required(r->pool, r, script, module, r->filename)) {

            ...

#if defined(MOD_WSGI_WITH_DAEMONS)
            if (*config->process_group) {
                /*
                 * Need to restart the daemon process. We bail
                 * out on the request process here, sending back
                 * a special response header indicating that
                 * process is being restarted and that remote
                 * end should abandon connection and attempt to
                 * reconnect again. We also need to signal this
                 * process so it will actually shutdown. The
                 * process supervisor code will ensure that it
                 * is restarted.
                 */

                Py_BEGIN_ALLOW_THREADS
                ap_log_rerror(APLOG_MARK, WSGI_LOG_INFO(0), r,
                             "mod_wsgi (pid=%d): Force restart of "
                             "process '%s'.", getpid(),
                             config->process_group);
                Py_END_ALLOW_THREADS
                ...

                wsgi_release_interpreter(interp);

                r->status = HTTP_INTERNAL_SERVER_ERROR;
                r->status_line = "0 Rejected";

                wsgi_daemon_shutdown++;

                // WSGIDaemonProcess 模式,杀掉当前daemon进程,重新加载
                kill(getpid(), SIGINT);

                return OK;
            }
            else {
            ...

                PyDict_DelItemString(modules, name);
            }
#else
            /*
             * Need to reload just the script module. Remove
             * the module from the modules dictionary before
             * reloading it again. If code is executing
             * within the module at the time, the callers
             * reference count on the module should ensure
             * it isn't actually destroyed until it is
             * finished.
             */

           // WSGIScriptAlias 模式,删除旧的模块
            PyDict_DelItemString(modules, name);
#endif
        }
    }
    ...

    // 如果是第一次请求,则需要加载该模块
    /* Load module if not already loaded. */

    if (!module) {
        module = wsgi_load_source(r->pool, r, name, exists, script,
                                  config->process_group,
                                  config->application_group);
    }
    ...


    // 激动人心的时刻到了,执行app代码!
    status = HTTP_INTERNAL_SERVER_ERROR;

    /* Determine if script exists and execute it. */
    if (module) {
        PyObject *module_dict = NULL;
        PyObject *object = NULL;

        module_dict = PyModule_GetDict(module);
        object = PyDict_GetItemString(module_dict, config->callable_object);

        if (object) {
            AdapterObject *adapter = NULL;
            adapter = newAdapterObject(r);

            if (adapter) {
                PyObject *method = NULL;
                PyObject *args = NULL;

                Py_INCREF(object);
                status = Adapter_run(adapter, object); // 这里,这里
                Py_DECREF(object);
                ...
        }
        else {
            Py_BEGIN_ALLOW_THREADS
            ap_log_rerror(APLOG_MARK, WSGI_LOG_ERR(0), r,
                          "mod_wsgi (pid=%d): Target WSGI script '%s' does "
                          "not contain WSGI application '%s'.",
                          getpid(), script, config->callable_object);
            Py_END_ALLOW_THREADS

            status = HTTP_NOT_FOUND;
        }
    }

    // 错误处理
    /* Log any details of exceptions if execution failed. */

    if (PyErr_Occurred())
        wsgi_log_python_error(r, NULL, r->filename);

    /* Cleanup and release interpreter, */

    Py_XDECREF(module);

    wsgi_release_interpreter(interp);

    return status;
}

Adapter_run +3823:

static int Adapter_run(AdapterObject *self, PyObject *object)
{
    ...

    vars = Adapter_environ(self);

    // 获取 start_response 函数
    start = PyObject_GetAttrString((PyObject *)self, "start_response");

    // 准备参数,还记得 def application(environ, start_response) 吗?
    args = Py_BuildValue("(OO)", vars, start);

    // 执行app代码
    self->sequence = PyEval_CallObject(object, args);

    if (self->sequence != NULL) {
        if (!Adapter_process_file_wrapper(self)) {
            int aborted = 0;

            iterator = PyObject_GetIter(self->sequence);

            if (iterator != NULL) {
                PyObject *item = NULL;

                // 遍历返回的iterator,输出每一行
                while ((item = PyIter_Next(iterator))) {
                    ...

                    if (length && !Adapter_output(self, msg, length, 0)) {
                        if (!PyErr_Occurred())
                            aborted = 1;
                        Py_DECREF(item);
                        break;
                    }

                }
            }
            ...

        }


        // 如果返回的seq有close方法则调用
        if (PyObject_HasAttrString(self->sequence, "close")) {
            PyObject *args = NULL;
            PyObject *data = NULL;

            close = PyObject_GetAttrString(self->sequence, "close");

            args = Py_BuildValue("()");
            data = PyEval_CallObject(close, args);

            Py_DECREF(args);
            Py_XDECREF(data);
            Py_DECREF(close);
        }
        ...

    }
    ...

}

AdapterObject 是自定义的python类型,用来运行wsgi程序,含有start_response方法:

typedef struct {
        PyObject_HEAD
        int result;
        request_rec \*r;
#if defined(MOD_WSGI_WITH_BUCKETS)
        apr_bucket_brigade \*bb;
#endif
        WSGIRequestConfig \*config;
        InputObject \*input;
        PyObject \*log;
        int status;
        const char \*status_line;
        PyObject \*headers;
        PyObject \*sequence;
        int content_length_set;
        apr_off_t content_length;
        apr_off_t output_length;
} AdapterObject;

static PyTypeObject Adapter_Type;
...
static PyMethodDef Adapter_methods[] = {
    { "start_response", (PyCFunction)Adapter_start_response, METH_VARARGS, 0 },
    { "write",          (PyCFunction)Adapter_write, METH_VARARGS, 0 },
    { "file_wrapper",   (PyCFunction)Adapter_file_wrapper, METH_VARARGS, 0 },
    { NULL, NULL}
};

Adapter_xxx 系列函数,是wsgi协议的具体实现。我承认,前面说的在wsgi_build_environment中设置wsgi相关变量的说法有不对 的地方,大多数变量是在 Adapter_environ 中设置的:)

Adapter_start_response C实现的start_response

如何获得解释器?:

static InterpreterObject *wsgi_acquire_interpreter(const char *name)
{
    PyThreadState *tstate = NULL;
    PyInterpreterState *interp = NULL;
    InterpreterObject *handle = NULL;
    ...

    /*
     * Check if already have interpreter instance and
     * if not need to create one.
     */

    handle = (InterpreterObject *)PyDict_GetItemString(wsgi_interpreters,
                                                       name);

    if (!handle) {
        // 如果没有查找到解释器,新解释器在这里被创建
        handle = newInterpreterObject(name);
        ...

        // 存储到 wsgi_interpreters
        PyDict_SetItemString(wsgi_interpreters, name, (PyObject *)handle);
    }
    else
        Py_INCREF(handle);

    interp = handle->interp;

    /*
     * Create new thread state object. We should only be
     * getting called where no current active thread
     * state, so no need to remember the old one. When
     * working with the main Python interpreter always
     * use the simplified API for GIL locking so any
     * extension modules which use that will still work.
     */
    // thread 相关代码
    ...

    return handle;
}

加载app代码在wsgi_load_source函数:

static PyObject *wsgi_load_source(apr_pool_t *pool, request_rec *r,
                                  const char *name, int exists,
                                  const char* filename,
                                  const char *process_group,
                                  const char *application_group)
{
    ...

    fp = fopen(filename, "r");

    n = PyParser_SimpleParseFile(fp, filename, Py_file_input);
    ...

    co = (PyObject *)PyNode_Compile(n, filename);
    PyNode_Free(n);

    // 根据文件名字name,编译过的代码co,加载该模块
    if (co)
        m = PyImport_ExecCodeModuleEx((char *)name, co, (char *)filename);

    Py_XDECREF(co);

    if (m) {
        ...
        // 设置模块修改时间
        PyModule_AddObject(m, "__mtime__", object);
    }
    else {
        Py_BEGIN_ALLOW_THREADS
        if (r) {
            ap_log_rerror(APLOG_MARK, WSGI_LOG_ERR(0), r,
                          "mod_wsgi (pid=%d): Target WSGI script '%s' cannot "
                          "be loaded as Python module.", getpid(), filename);
        }
        ...
        wsgi_log_python_error(r, NULL, filename);
    }

    return m;
}

以上即是WSGIScriptAlias模式下,一个请求收到之后,apache调用wsgi_hook_handler, mod_wsgi的大致处理流程。还有一个问题,python环境到底是在什么时候初始化的呢? 让我们回头看。

wsgi_hook_init mod_wsgi.c +13031:

static int wsgi_hook_init(apr_pool_t *pconf, apr_pool_t *ptemp,
                          apr_pool_t *plog, server_rec *s)
{

    ...

    /* Retain reference to base server. */

    wsgi_server = s;

    /* Retain record of parent process ID. */

    wsgi_parent_pid = getpid();

    /* Determine whether multiprocess and/or multithread. */

    ap_mpm_query(AP_MPMQ_IS_THREADED, &wsgi_multithread);
    wsgi_multithread = (wsgi_multithread != AP_MPMQ_NOT_SUPPORTED);

    ap_mpm_query(AP_MPMQ_IS_FORKED, &wsgi_multiprocess);
    if (wsgi_multiprocess != AP_MPMQ_NOT_SUPPORTED) {
        ap_mpm_query(AP_MPMQ_MAX_DAEMONS, &wsgi_multiprocess);
        wsgi_multiprocess = (wsgi_multiprocess != 1);
    }

    /* Retain reference to main server config. */

    wsgi_server_config = ap_get_module_config(s->module_config, &wsgi_module);

    /*
     * Check that the version of Python found at
     * runtime is what was used at compilation.
     */

    wsgi_python_version();

    /*
     * Initialise Python if required to be done in
     * the parent process. Note that it will not be
     * initialised if mod_python loaded and it has
     * already been done.
     */

    if (wsgi_python_required == -1)
        wsgi_python_required = 1;

    // 在哪里初始化python,取决于 wsgi_python_after_fork 即 WSGILazyInitialization 选项
    // 是在apache进程fork之前,还是之后?
    if (!wsgi_python_after_fork)
        wsgi_python_init(pconf);

    /* Startup separate named daemon processes. */

    // WSGIDaemonProcess 模式下启动daemon进程,要探索daemon模式的奥秘,这里即是入口
#if defined(MOD_WSGI_WITH_DAEMONS)
    status = wsgi_start_daemons(pconf);
#endif

    return status;
}

fork 之后的初始化函数:

static void wsgi_hook_child_init(apr_pool_t *p, server_rec *s)
{
    ...

    // wsgi_python_required 取决于 WSGIRestrictEmbedded 选项
    if (wsgi_python_required) {
        /*
         * Initialise Python if required to be done in
         * the child process. Note that it will not be
         * initialised if mod_python loaded and it has
         * already been done.
         */

        if (wsgi_python_after_fork)
            wsgi_python_init(p);

        /*
         * Now perform additional initialisation steps
         * always done in child process.
         */

        wsgi_python_child_init(p);
    }
}

这两个只是和apache相关的,由apache调用的hook初始化,真正的python初始化在 wsgi_python_init, wsgi_python_child_init 两步初始化:

static void wsgi_python_init(apr_pool_t *p)
{

    static int initialized = 1;


    /* Perform initialisation if required. */

    if (!Py_IsInitialized() || !initialized) {
        ...


        /* Initialise Python. */

        ap_log_error(APLOG_MARK, WSGI_LOG_INFO(0), wsgi_server,
                     "mod_wsgi (pid=%d): Initializing Python.", getpid());

        initialized = 1;

        Py_Initialize(); // 神秘而又强大的 Py_Initialize

        /* Initialise threading. */

        PyEval_InitThreads();
#if PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION >= 2
        /*
     * We now want to release the GIL. Before we do that
     * though we remember what the current thread state is.
     * We will use that later to restore the main thread
     * state when we want to cleanup interpreters on
     * shutdown.
         */

        wsgi_main_tstate = PyThreadState_Get();
        PyEval_ReleaseThread(wsgi_main_tstate);
#else
        PyThreadState_Swap(NULL);
        PyEval_ReleaseLock();
#endif

        wsgi_python_initialized = 1;

        /*
         * Register cleanups to be performed on parent restart
         * or shutdown. This will destroy Python itself.
         */

        apr_pool_cleanup_register(p, NULL, wsgi_python_parent_cleanup,
                                  apr_pool_cleanup_null);

    }
}


static void wsgi_python_child_init(apr_pool_t *p)
{

    // 第二步初始化所做的工作, 此时已经fork了

    /*
     * Trigger any special Python stuff required after a fork.
     * Only do this though if we were responsible for the
     * initialisation of the Python interpreter in the first
     * place to avoid it being done multiple times. Also only
     * do it if Python was initialised in parent process.
     */

    /* Finalise any Python objects required by child process. */

    /* Initialise Python interpreter instance table and lock. */

    // 存放所有解释器的字典
    wsgi_interpreters = PyDict_New();

    /*
     * Initialise the key for data related to a thread. At
     * the moment we only record an integer thread ID to be
     * used in lookup table to thread states associated with
     * an interprter.
     */

    /*
     * Cache a reference to the first Python interpreter
     * instance. This interpreter is special as some third party
     * Python modules will only work when used from within this
     * interpreter. This is generally when they use the Python
     * simplified GIL API or otherwise don't use threading API
     * properly. An empty string for name is used to identify
     * the first Python interpreter instance.
     */

    /* Loop through import scripts for this process and load them. */

    // 处理wsgi_import_list
    if (wsgi_import_list) {
        ...
    }
}

ha, 终于快完了,现在,让我们打印一些有趣的输出,来看一看这些函数在什么时间, 哪个进程被调用。注意,下面的patch针对没有使用过 unifdef 的代码:

diff --git a/mod_wsgi.c b/mod_wsgi.c
index f0764b8..1781f7b 100644
--- a/mod_wsgi.c
+++ b/mod_wsgi.c
@@ -29,6 +29,8 @@
  *
  */

+#define INFO(fmt, args...) ap_log_error(APLOG_MARK, WSGI_LOG_ERR(0), wsgi_server, "[pid %d] %s:%s:%d "fmt, getpid(),__FILE__, __PRETTY_FUNCTION__, __LINE__,args)
+
 #define CORE_PRIVATE 1

 #include "httpd.h"
@@ -5722,10 +5724,14 @@ static void wsgi_python_init(apr_pool_t *p)
     static int initialized = 1;
 #endif

+    INFO("%s", "enter");
+
     /* Perform initialisation if required. */

     if (!Py_IsInitialized() || !initialized) {

+        INFO("%s", "init python");
+
         /* Enable Python 3.0 migration warnings. */

 #if PY_MAJOR_VERSION == 2 && PY_MINOR_VERSION >= 6
@@ -5859,6 +5865,8 @@ static PyObject *wsgi_interpreters = NULL;

 static InterpreterObject *wsgi_acquire_interpreter(const char *name)
 {
+    INFO("search interpreter %s", name);
+
     PyThreadState *tstate = NULL;
     PyInterpreterState *interp = NULL;
     InterpreterObject *handle = NULL;
@@ -5893,6 +5901,9 @@ static InterpreterObject *wsgi_acquire_interpreter(const char *name)
                                                        name);

     if (!handle) {
+
+        INFO("create interpreter %s", name);
+
         handle = newInterpreterObject(name);

         if (!handle) {
@@ -5916,6 +5927,8 @@ static InterpreterObject *wsgi_acquire_interpreter(const char *name)
     else
         Py_INCREF(handle);

+    INFO("found interpreter %s", name);
+
     interp = handle->interp;

     /*
@@ -6339,6 +6352,8 @@ static int wsgi_execute_script(request_rec *r)
      * it is safe to start manipulating python objects.
      */

+    INFO("%s", "enter");
+
     interp = wsgi_acquire_interpreter(config->application_group);

     if (!interp) {
@@ -6543,6 +6558,7 @@ static int wsgi_execute_script(request_rec *r)
                 PyObject *method = NULL;
                 PyObject *args = NULL;

+                INFO("%s", "app running");
                 Py_INCREF(object);
                 status = Adapter_run(adapter, object);
                 Py_DECREF(object);
@@ -6693,6 +6709,8 @@ static void wsgi_python_child_init(apr_pool_t *p)
     int thread_id = 0;
     int *thread_handle = NULL;

+    INFO("%s", "init python further");
+
     /* Working with Python, so must acquire GIL. */

     state = PyGILState_Ensure();
@@ -6778,6 +6796,9 @@ static void wsgi_python_child_init(apr_pool_t *p)
     /* Loop through import scripts for this process and load them. */

     if (wsgi_import_list) {
+
+        INFO("%s", "dealing with wsgi_import_list");
+
         apr_array_header_t *scripts = NULL;

         WSGIScriptFile *entries;
@@ -8115,6 +8136,7 @@ static void wsgi_log_script_error(request_rec *r, const char *e, const char *n)

 static void wsgi_build_environment(request_rec *r)
 {
+    INFO("%s", "enter");
     WSGIRequestConfig *config = NULL;

     const char *value = NULL;
@@ -8862,6 +8884,7 @@ static int wsgi_hook_handler(request_rec *r)
     if (!r->handler)
         return DECLINED;

+    INFO("handler %s, file %s", r->handler, r->filename);
     /*
      * Construct request configuration and cache it in the
      * request object against this module so can access it later
@@ -9082,6 +9105,7 @@ static int wsgi_hook_handler(request_rec *r)

 #if AP_SERVER_MAJORVERSION_NUMBER < 2

+
 /*
  * Apache 1.3 module initialisation functions.
  */
@@ -12909,6 +12933,9 @@ static int wsgi_hook_daemon_handler(conn_rec *c)
 static int wsgi_hook_init(apr_pool_t *pconf, apr_pool_t *ptemp,
                           apr_pool_t *plog, server_rec *s)
 {
+
+    INFO("%s", "enter");
+
     void *data = NULL;
     const char *userdata_key = "wsgi_init";
     char package[128];
@@ -13028,6 +13055,8 @@ static void wsgi_hook_child_init(apr_pool_t *p, server_rec *s)
     }
 #endif

+    INFO("%s", "enter");
+
     if (wsgi_python_required) {
         /*
          * Initialise Python if required to be done in
@@ -13500,6 +13529,7 @@ static authn_status wsgi_check_password(request_rec *r, const char *user,
      * the last time it was accessed.
      */

+    /* FIXME: Reloading */
     if (module && config->script_reloading) {
         if (wsgi_reload_required(r->pool, r, script, module, NULL)) {
             /*
@@ -14804,6 +14834,9 @@ static int wsgi_hook_logio(apr_pool_t *pconf, apr_pool_t *ptemp,

 static void wsgi_register_hooks(apr_pool_t *p)
 {
+
+    INFO("%s", "enter");
+
     static const char * const p1[] = { "mod_alias.c", NULL };
     static const char * const n1[]= { "mod_userdir.c",
                                       "mod_vhost_alias.c", NULL };

日志输出,对应于上面给出的apache配置文件:

[Fri Sep 30 14:22:20 2011] [error] [pid 21372] mod_wsgi.c:wsgi_hook_init:12937 enter
[Fri Sep 30 14:22:20 2011] [error] [pid 21372] mod_wsgi.c:wsgi_register_hooks:14838 enter
[Fri Sep 30 14:22:20 2011] [error] [pid 21373] mod_wsgi.c:wsgi_hook_init:12937 enter
[Fri Sep 30 14:22:20 2011] [notice] Apache/2.2.17 (Ubuntu) mod_wsgi/3.3 Python/2.7.1+ configured -- resuming normal operations
[Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_child_init:13058 enter
[Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_python_init:5727 enter
[Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_python_init:5733 init python
[Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_hook_child_init:13058 enter
[Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_python_init:5727 enter
[Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_python_init:5733 init python
[Fri Sep 30 14:22:20 2011] [error] [pid 21377] mod_wsgi.c:wsgi_python_child_init:6712 init python further
[Fri Sep 30 14:22:20 2011] [error] [pid 21378] mod_wsgi.c:wsgi_python_child_init:6712 init python further

jaime@westeros:/var/www$ ps aux | grep apache2
jaime    20827  0.0  0.0   3928   508 pts/2    S+   14:17   0:00 tail -f /var/log/apache2/error.log
root     21373  0.0  0.1  10224  3036 ?        Ss   14:22   0:00 /usr/sbin/apache2 -k start
www-data 21377  0.0  0.3 234368  6752 ?        Sl   14:22   0:00 /usr/sbin/apache2 -k start
www-data 21378  0.0  0.3 234392  6500 ?        Sl   14:22   0:00 /usr/sbin/apache2 -k start
jaime    23119  0.0  0.0   4156   856 pts/3    S+   16:37   0:00 grep --color=auto apache2

启动apache之后,在主进程21372中,执行wsgi_hook_init, wsgi_register_hooks, 其中wsgi_hook_init 在另一个进程中21373中也被执行了。 创建了两个子进程21377, 21378。每个进程都按顺序执行wsgi_hook_child_init, wsgi_python_init, wsgi_python_child_init。 此时,apache已经启动完成,python也已经初始化,但是解释器还没有创建。

第一次请求,由进程21377负责处理,创建了解释器,也加载了hello.wsgi:

[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_build_environment:8139 enter
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6355 enter
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5905 create interpreter 127.0.1.1|/hello
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello
[Fri Sep 30 14:22:29 2011] [info] [client 127.0.0.1] mod_wsgi (pid=21377, process='', application='127.0.1.1|/hello'): Loading WSGI script '/var/www/hello.wsgi'.
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6561 app running
[Fri Sep 30 14:22:29 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico
[Fri Sep 30 14:22:29 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico

第二次请求,什么也不需要做,解释器使用原来的,代码也已经加载过了,cool:

[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi
[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_build_environment:8139 enter
[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6355 enter
[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello
[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello
[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6561 app running
[Fri Sep 30 14:22:36 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico
[Fri Sep 30 14:22:36 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico

第三次请求,修改了hello.wsgi,所以需要重新加载代码, reloading:

[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi
[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_build_environment:8139 enter
[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6355 enter
[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello
[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello
[Fri Sep 30 14:22:47 2011] [info] [client 127.0.0.1] mod_wsgi (pid=21377, process='', application='127.0.1.1|/hello'): Reloading WSGI script '/var/www/hello.wsgi'.
[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_execute_script:6561 app running
[Fri Sep 30 14:22:47 2011] [error] [pid 21377] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico
[Fri Sep 30 14:22:47 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico

虽然前三次请求都由21372执行,但我们确实观测到了21378:

[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_hook_handler:8887 handler wsgi-script, file /var/www/hello.wsgi
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_build_environment:8139 enter
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_execute_script:6355 enter
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_acquire_interpreter:5868 search interpreter 127.0.1.1|/hello
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_acquire_interpreter:5905 create interpreter 127.0.1.1|/hello
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_acquire_interpreter:5930 found interpreter 127.0.1.1|/hello
[Fri Sep 30 14:41:37 2011] [info] [client 127.0.0.1] mod_wsgi (pid=21378, process='', application='127.0.1.1|/hello'): Loading WSGI script '/var/www/hello.wsgi'.
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_execute_script:6561 app running
[Fri Sep 30 14:41:37 2011] [error] [pid 21378] mod_wsgi.c:wsgi_hook_handler:8887 handler image/x-icon, file /var/www/favicon.ico
[Fri Sep 30 14:41:37 2011] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico

Notes:

  • Python c api代码和apache c代码混在一起,其实只不过是对不同lib的变量进行操作罢了, 实际上都是c代码。当把libpython,libapache链接到本进程时,它们有各自的变量在全局空间里, 保存着自己的状态,其他的代码就是对这些变量的操作。 这部分解释了为什么mod_python, mod_wsgi会冲突,因为他们都链接了同一个库libpython, 如果协调 不善,则极易出问题。 http://code.google.com/p/modwsgi/wiki/InstallationIssues#Incompatible_ModPython_Versions

daemon模式备忘

wsgi_daemon_index 存放process_group到socket的一个映射, 由进程组的名字, 可以找到该组 进程正在监听的socket, 这个socket是与daemon通信的关键, 在fork之前创建, fork之后所有的子进程 都可访问, daemon需要关掉所有不是本进程组的socket fd。

wsgi_daemon_lists 所有已启动的daemon进程列表。

在apache启动的时候, 由wsgi_hook_init 调用start_daemons,创建所有的daemons, 此后daemon的数量就是固定的了。

pid7838 wsgi_hook_init调用返回之后, apache 又fork起了一个子进程 pid 7843, 非root权限, 调用wsgi_hook_child_init,此进程 负责处理分发所有的请求, 对每个请求调用wsgi_hook_handler, 在wsgi_execute_remote中和真正的daemon进程通过 socket进行交互, 该apache子进程可以被成为modwsgi的dispatcher。pid 7842是一个daemon进程。

不管是embedded模式, 还是daemon模式, 最后都会走到wsgi_execute_script函数。

请求headers, 标准的CGI变量, 是通过r->subprocess_env传递到daemon进程中的, 参见wsgi_build_environment, wsgi_send_request。 对象r,从dispatcher到daemon, 跨越了不同的进程, 已经不是原来的r了, 这点需要注意。

daemon进程如果发现需要reload代码, 则会发送一个0 Rejected 消息给dispatcher, 然后杀掉自己。apache捕获到daemon子进程死掉的信号, 重新启动一个daemon process, 仍然监听同一个socket。

daemon如果发现一切正常, 不需要reload(新的daemon总是如此), 会发送0 Continue的消息给dispatcher, 告诉它可以go on了。

dispatcher如果收到0 Rejected信号, 会重新尝试连接,直到收到0 Continue或超出重试次数为止。实际上, 0 Continue可以被看作是一种同步机制。

[Sun Oct 30 13:00:17 2011] [error] [pid 7837] mod_wsgi.c:wsgi_hook_init:13658 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7837] mod_wsgi.c:wsgi_register_hooks:15564 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_hook_init:13658 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_python_init:5817 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_python_init:5823 init python
[Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7838): Python home /usr/local/sae/python.
[Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7838): Initializing Python.
[Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_start_daemons:11955 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_start_process:11540 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_start_process:11944 ok, we're father
[Sun Oct 30 13:00:17 2011] [error] [pid 7838] mod_wsgi.c:wsgi_hook_init:13754 forking a new process to listen all connections, will call wsgi_hook_child_init
[Sun Oct 30 13:00:17 2011] [warn] pid file /var/run/apache2.pid overwritten -- Unclean shutdown of previous Apache run?
[Sun Oct 30 13:00:17 2011] [notice] Apache/2.2.17 (Ubuntu) mod_wsgi/3.3 Python/2.6.7 configured -- resuming normal operations
[Sun Oct 30 13:00:17 2011] [info] Server built: Sep  1 2011 09:25:26
[Sun Oct 30 13:00:17 2011] [error] [pid 7843] mod_wsgi.c:wsgi_hook_child_init:13784 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7843] mod_wsgi.c:wsgi_python_child_init:6883 init python further
[Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7843): Attach interpreter ''.
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_start_process:11558 ok in child, we're a new daemon process
[Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7842): Starting process 'wic' with uid=1000, gid=1000 and threads=1.
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_python_child_init:6883 init python further
[Sun Oct 30 13:00:17 2011] [info] mod_wsgi (pid=7842): Attach interpreter ''.
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_main:11276 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_main:11428 creating thread 0
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_thread:11119 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_daemon_worker:10887 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_monitor_thread:11181 enter
[Sun Oct 30 13:00:17 2011] [error] [pid 7842] mod_wsgi.c:wsgi_monitor_thread:11203 check worker status