SWIG Memory Deallocation

Date: 2012-04-22 (last modified), 2008-12-30 (created)

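Introduction

Recipe description

This cookbook recipe describes how memory blocks allocated in C via `malloc()` calls are automatically deallocated when the corresponding Python numpy array objects are destroyed. The recipe uses SWIG and a modified `numpy.i` helper file.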

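More specifically, new fragments were added to the existing `numpy.i` to handle automatic deallocation of arrays whose size is not known in advance. As with the original fragments, a block of `malloc()` memory can be converted into a returned numpy Python object via a call to `PyArray_SimpleNewFromData()`. However, the returned Python object is created using `PyCObject_FromVoidPtr()`, which ensures that the allocated memory is freed automatically when the Python object is destroyed. The examples below show how these new fragments avoid memory leaks.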

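Since the new fragments are based on the `_ARGOUTVIEW_` ones, the name `_ARGOUTVIEWM_` was chosen, where `M` stands for managed. All managed fragments (ARRAY1, 2 and 3, FARRAY1, 2 and 3) have been implemented and have now been extensively tested.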
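
The difference between a plain view and a managed view can be sketched with a toy model in pure Python. This is not the numpy.i implementation: `Block`, `UnmanagedView`, `ManagedView` and `run` are hypothetical names, and a class-level counter stands in for `malloc()` and `free()`.

```python
import gc


class Block:
    """Stand-in for a malloc()'d memory block; `live` counts unfreed blocks."""
    live = 0

    def __init__(self, n):
        self.data = [0] * n
        self.freed = False
        Block.live += 1

    def free(self):
        # Mimics free(): release the block exactly once.
        if not self.freed:
            self.freed = True
            Block.live -= 1


class UnmanagedView:
    """Like an ARGOUTVIEW array: exposes the data but never frees the block."""

    def __init__(self, block):
        self.data = block.data


class ManagedView:
    """Like an ARGOUTVIEWM array: frees the block when the view is destroyed."""

    def __init__(self, block):
        self._block = block
        self.data = block.data

    def __del__(self):
        self._block.free()


def run(view_type, repeats=5):
    """Create and delete `repeats` views; return the number of leaked blocks."""
    Block.live = 0
    for _ in range(repeats):
        view = view_type(Block(4))
        del view
    gc.collect()  # force destruction on interpreters without reference counting
    return Block.live


leaked_unmanaged = run(UnmanagedView)
leaked_managed = run(ManagedView)
print(leaked_unmanaged, leaked_managed)  # prints "5 0"
```

In the real fragments the counter is replaced by actual C memory, and the role of `__del__` is played by the destructor attached to the object that owns the block.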

Where to get the files

At the moment, the modified numpy.i file is available here (last updated 2012-04-22):

* http://ezwidgets.googlecode.com/svn/trunk/numpy/numpy.i
* http://ezwidgets.googlecode.com/svn/trunk/numpy/pyfragments.swg

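How the code came about

The original memory deallocation code was written by Travis Oliphant (see http://blog.enthought.com/?p=62 ) and, as far as I know, these clever people were the first to use it in a swig file (see http://niftilib.sourceforge.net/pynifti, file nifticlib.i). Lisandro Dalcin then pointed out a simplified implementation using CObjects, which Travis details in an updated blog post.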

How to use the new fragments

Important steps

In yourfile.i, the %init function uses the same `import_array()` call you already know:

%init %{
    import_array();
%}

... then just use ARGOUTVIEWM_ARRAY1 instead of ARGOUTVIEW_ARRAY1, and memory deallocation is handled automatically when the Python array is destroyed (see the examples below).

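A simple ARGOUTVIEWM_ARRAY1 example

The SWIG-wrapped C function allocates memory for an array of N integers using `malloc()`. From Python, this function is called repeatedly and the created array is destroyed each time (M times).

Using the ARGOUTVIEW_ARRAY1 provided in numpy.i, this will leak memory (I know ARGOUTVIEW_ARRAY1 was not designed for this purpose, but it's tempting!).

Using the ARGOUTVIEWM_ARRAY1 fragment instead, the memory allocated with `malloc()` is automatically deallocated when the array is deleted.

The Python test program creates and deletes an array of 1024^2 ints 2048 times, using both ARGOUTVIEW_ARRAY1 and ARGOUTVIEWM_ARRAY1. When memory allocation fails, an exception is generated in C and caught in Python, showing which iteration finally caused the allocation to fail.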

The C source (ezalloc.c and ezalloc.h)

Here is the ezalloc.h file:

void alloc(int ni, int** veco, int *n);

Here is the ezalloc.c file:

#include <stdio.h>
#include <stdlib.h>   /* malloc() */
#include <errno.h>
#include "ezalloc.h"

void alloc(int ni, int** veco, int *n)
{
    int *temp;
    temp = (int *)malloc(ni*sizeof(int));

    if (temp == NULL)
        errno = ENOMEM;

    //veco is either NULL or pointing to the allocated block of memory...
    *veco = temp;
    *n = ni;
}

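The interface file (ezalloc.i)

The file (available here: ezalloc.i) does a couple of interesting things:

* As mentioned in the introduction, calling the `import_array()` function in the `%init` section now also initialises the memory deallocation code. There is nothing else to add here.
* An exception is generated if memory allocation fails. After a few iterations of the code construct, using `errno` and `SWIG_fail` is the simplest approach I could come up with.
* In this example, two inline functions are created, one using ARGOUTVIEW_ARRAY1 and the other ARGOUTVIEWM_ARRAY1. Both functions use the `alloc()` function (see ezalloc.h and ezalloc.c).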

%module ezalloc
%{
#include <errno.h>
#include "ezalloc.h"

#define SWIG_FILE_WITH_INIT
%}

%include "numpy.i"

%init %{
    import_array();
%}

%apply (int** ARGOUTVIEWM_ARRAY1, int *DIM1) {(int** veco1, int* n1)}
%apply (int** ARGOUTVIEW_ARRAY1, int *DIM1) {(int** veco2, int* n2)}

%include "ezalloc.h"

%exception
{
    errno = 0;
    $action

    if (errno != 0)
    {
        switch(errno)
        {
            case ENOMEM:
                PyErr_Format(PyExc_MemoryError, "Failed malloc()");
                break;
            default:
                PyErr_Format(PyExc_Exception, "Unknown exception");
        }
        SWIG_fail;
    }
}

%rename (alloc_managed) my_alloc1;
%rename (alloc_leaking) my_alloc2;

%inline %{

void my_alloc1(int ni, int** veco1, int *n1)
{
    /* The function... */
    alloc(ni, veco1, n1);
}

void my_alloc2(int ni, int** veco2, int *n2)
{
    /* The function... */
    alloc(ni, veco2, n2);
}

%}

Don't forget that you will need the numpy.i file in the same directory for this to compile.
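
The `errno` dispatch performed by the `%exception` block can be mirrored in plain Python for illustration. This is only a sketch of the mapping logic, not what SWIG generates: `ERRNO_TO_EXCEPTION` and `raise_for_errno` are hypothetical names, and the messages are copied from the interface file above.

```python
import errno

# Hypothetical lookup table mirroring the switch statement in %exception.
ERRNO_TO_EXCEPTION = {
    errno.ENOMEM: (MemoryError, "Failed malloc()"),
}


def raise_for_errno(code):
    """Raise the Python exception matching a C errno value; 0 means success."""
    if code == 0:
        return
    # Any errno value without an entry falls through to a generic exception,
    # like the `default:` branch of the C switch.
    exc_type, message = ERRNO_TO_EXCEPTION.get(code, (Exception, "Unknown exception"))
    raise exc_type(message)


try:
    raise_for_errno(errno.ENOMEM)
except MemoryError as exc:
    caught = str(exc)

print(caught)  # prints "Failed malloc()"
```

Resetting `errno` to 0 before `$action`, as the interface file does, matters because `errno` keeps its last value until something overwrites it.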

The setup file (setup_alloc.py)

Here is the setup_alloc.py file:

#! /usr/bin/env python

# System imports
from distutils.core import *
from distutils      import sysconfig

# Third-party modules - we depend on numpy for everything
import numpy

# Obtain the numpy include directory.  This logic works across numpy versions.
try:
    numpy_include = numpy.get_include()
except AttributeError:
    numpy_include = numpy.get_numpy_include()

# alloc extension module
_ezalloc = Extension("_ezalloc",
                   ["ezalloc.i","ezalloc.c"],
                   include_dirs = [numpy_include],

                   extra_compile_args = ["--verbose"]
                   )

# ezalloc setup
setup(  name        = "alloc functions",
        description = "Testing managed arrays",
        author      = "Egor Zindy",
        version     = "1.0",
        ext_modules = [_ezalloc]
        )

Compiling the module

The setup command line is (on Windows, using mingw):

$> python setup_alloc.py build --compiler=mingw32

or on UN*X, simply:

$> python setup_alloc.py build

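Testing the module

If everything goes according to plan, there should be a `_ezalloc.pyd` file in the `build\lib.XXX` directory (on UN*X, the extension module is a `.so` file instead). The file needs to be copied to the directory containing the `ezalloc.py` file (generated by swig).

A Python test program (test_alloc.py) is provided in the SVN repository and reproduced below: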

import ezalloc

n = 2048

# this multiplied by sizeof(int) to get size in bytes...
# assuming sizeof(int)=4 on a 32bit machine (sorry, it's late!)
m = 1024 * 1024
err = 0

print("ARGOUTVIEWM_ARRAY1 (managed arrays) - %d allocations (%d bytes each)" % (n, 4*m))
for i in range(n):
    try:
        # allocating some memory
        a = ezalloc.alloc_managed(m)
        # deleting the array
        del a
    except Exception:
        err = 1
        print("Step %d failed" % i)
        break

if err == 0:
    print("Done!\n")

# reset the flag so the second test is reported independently of the first
err = 0

print("ARGOUTVIEW_ARRAY1 (unmanaged, leaking) - %d allocations (%d bytes each)" % (n, 4*m))
for i in range(n):
    try:
        # allocating some memory
        a = ezalloc.alloc_leaking(m)
        # deleting the array
        del a
    except Exception:
        err = 1
        print("Step %d failed" % i)
        break

if err == 0:
    print("Done? Increase n!\n")

Then, running

$> python test_alloc.py

produces output similar to this:

ARGOUTVIEWM_ARRAY1 (managed arrays) - 2048 allocations (4194304 bytes each)
Done!

ARGOUTVIEW_ARRAY1 (unmanaged, leaking) - 2048 allocations (4194304 bytes each)
Step 483 failed

The unmanaged array leaks memory every time the array view is deleted. The managed one frees the memory block seamlessly. This was tested on both Windows XP and Linux.
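
As a rough sanity check of the numbers in the sample output, the amount of memory leaked before the failure can be worked out as follows. The sketch assumes `sizeof(int)` is 4, as the test script does, and takes the failing step from the sample output; the variable names are illustrative only.

```python
BYTES_PER_INT = 4                 # assumption: sizeof(int) == 4
ints_per_array = 1024 * 1024      # m in test_alloc.py
allocations = 2048                # n in test_alloc.py
failed_step = 483                 # from the sample output; steps 0..482 succeeded

bytes_per_array = ints_per_array * BYTES_PER_INT
leaked_mib = failed_step * bytes_per_array / 2**20
total_gib = allocations * bytes_per_array / 2**30

print(bytes_per_array)  # prints 4194304
print(leaked_mib)       # prints 1932.0
print(total_gib)        # prints 8.0
```

About 1.9 GiB had leaked when `malloc()` failed, which would be consistent with the address space limit of a 32-bit process. That explanation is an inference, not something stated in the original test; the failing step will differ from machine to machine.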

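A simple ARGOUTVIEWM_ARRAY2 example

The following example shows how to return a two-dimensional array from C that also benefits from automatic memory deallocation.

A naive "crop" function is wrapped using SWIG/numpy.i and returns a slice of the input array. When used as `array_out = crop.crop(array_in, d1_0,d1_1, d2_0,d2_1)`, it is equivalent to the native numpy slicing `array_out = array_in[d1_0:d1_1, d2_0:d2_1]`.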
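
The intended semantics can be written down as a small pure-Python reference that works on nested lists instead of numpy arrays. `crop_reference` is a hypothetical helper used only for illustration, and `table` mimics the test array used later in this recipe.

```python
def crop_reference(rows, d1_0, d1_1, d2_0, d2_1):
    """Return rows d1_0..d1_1-1 and columns d2_0..d2_1-1 of a list of lists."""
    return [row[d2_0:d2_1] for row in rows[d1_0:d1_1]]


# 5 rows by 9 columns: entry (row, col) holds (col + 1) * 10**row
table = [[col * 10 ** row for col in range(1, 10)] for row in range(5)]

cropped = crop_reference(table, 2, 4, 1, 5)
print(cropped)  # prints [[200, 300, 400, 500], [2000, 3000, 4000, 5000]]
```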

The C source (crop.c and crop.h)

Here is the crop.h file:

void crop(int *arr_in, int dim1, int dim2, int d1_0, int d1_1, int d2_0, int d2_1, int **arr_out, int *dim1_out, int *dim2_out);

Here is the crop.c file:

#include <stdio.h>    /* printf() */
#include <stdlib.h>
#include <errno.h>

#include "crop.h"

void crop(int *arr_in, int dim1, int dim2, int d1_0, int d1_1, int d2_0, int d2_1, int **arr_out, int *dim1_out, int *dim2_out)
{
    int *arr=NULL;
    int dim1_o=0;
    int dim2_o=0;
    int i,j;

    //value checks
    if ((d1_1 < d1_0) || (d2_1 < d2_0) ||
        (d1_0 >= dim1) || (d1_1 >= dim1) || (d1_0 < 0) || (d1_1 < 0) ||
        (d2_0 >= dim2) || (d2_1 >= dim2) || (d2_0 < 0) || (d2_1 < 0))
    {
        errno = EPERM;
        goto end;
    }

    //output sizes
    dim1_o = d1_1-d1_0;
    dim2_o = d2_1-d2_0;

    //memory allocation
    arr = (int *)malloc(dim1_o*dim2_o*sizeof(int));
    if (arr == NULL)
    {
        errno = ENOMEM;
        goto end;
    }

    //copying the cropped arr_in region to arr (naive implementation)
    printf("\n--- d1_0=%d d1_1=%d (rows)  -- d2_0=%d d2_1=%d (columns)\n",d1_0,d1_1,d2_0,d2_1);
    for (j=0; j<dim1_o; j++)
    {
        for (i=0; i<dim2_o; i++)
        {
            arr[j*dim2_o+i] = arr_in[(j+d1_0)*dim2+(i+d2_0)];
            printf("%d ",arr[j*dim2_o+i]);
        }
        printf("\n");
    }
    printf("---\n\n");

end:
    *dim1_out = dim1_o;
    *dim2_out = dim2_o;
    *arr_out = arr;
}
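
The row-major index arithmetic used in the copy loop can be checked with a pure-Python transcription that operates on a flat list. This is a sketch for illustration: `crop_flat` is a hypothetical name, the value checks are copied from crop.c, and an `IndexError` stands in for setting `errno` to EPERM.

```python
def crop_flat(arr_in, dim1, dim2, d1_0, d1_1, d2_0, d2_1):
    """Crop a row-major flat list of dim1 x dim2 values, mirroring crop.c."""
    # Same value checks as the C code (errno = EPERM there).
    if (d1_1 < d1_0 or d2_1 < d2_0
            or not (0 <= d1_0 < dim1) or not (0 <= d1_1 < dim1)
            or not (0 <= d2_0 < dim2) or not (0 <= d2_1 < dim2)):
        raise IndexError("Index error")

    dim1_o = d1_1 - d1_0
    dim2_o = d2_1 - d2_0
    out = [0] * (dim1_o * dim2_o)
    for j in range(dim1_o):          # output rows
        for i in range(dim2_o):      # output columns
            out[j * dim2_o + i] = arr_in[(j + d1_0) * dim2 + (i + d2_0)]
    return out, dim1_o, dim2_o


# Flat 5 x 9 table: entry (row, col) holds (col + 1) * 10**row
flat = [col * 10 ** row for row in range(5) for col in range(1, 10)]
result = crop_flat(flat, 5, 9, 2, 4, 1, 5)
print(result)  # prints ([200, 300, 400, 500, 2000, 3000, 4000, 5000], 2, 4)
```

Note that, as written, these checks reject an end index equal to the dimension (for example `d1_1 == dim1`), so a crop that reaches the last row or column is refused, whereas native slicing would accept it.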

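The interface file (crop.i)

The file (available here: crop.i) does a couple of interesting things:

* The array dimensions DIM1 and DIM2 are in the same order as array.shape on the Python side. In a row-major array definition for an image, DIM1 would be the number of rows and DIM2 the number of columns.
* Using the errno library, an exception is generated when memory allocation fails (ENOMEM) or when there is a problem with the indexes (EPERM).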

%module crop
%{
#include <errno.h>
#include "crop.h"

#define SWIG_FILE_WITH_INIT
%}

%include "numpy.i"

%init %{
    import_array();
%}

%exception crop
{
    errno = 0;
    $action

    if (errno != 0)
    {
        switch(errno)
        {
            case EPERM:
                PyErr_Format(PyExc_IndexError, "Index error");
                break;
            case ENOMEM:
                PyErr_Format(PyExc_MemoryError, "Not enough memory");
                break;
            default:
                PyErr_Format(PyExc_Exception, "Unknown exception");
        }
        SWIG_fail;
    }
}

%apply (int* IN_ARRAY2, int DIM1, int DIM2) {(int *arr_in, int dim1, int dim2)}
%apply (int** ARGOUTVIEWM_ARRAY2, int* DIM1, int* DIM2) {(int **arr_out, int *dim1_out, int *dim2_out)}

%include "crop.h"

Don't forget that you will need the numpy.i file in the same directory for this to compile.

The setup file (setup_crop.py)

Here is the setup_crop.py file:

#! /usr/bin/env python

# System imports
from distutils.core import *
from distutils      import sysconfig

# Third-party modules - we depend on numpy for everything
import numpy

# Obtain the numpy include directory.  This logic works across numpy versions.
try:
    numpy_include = numpy.get_include()
except AttributeError:
    numpy_include = numpy.get_numpy_include()

# crop extension module
_crop = Extension("_crop",
                   ["crop.i","crop.c"],
                   include_dirs = [numpy_include],

                   extra_compile_args = ["--verbose"]
                   )

# crop setup
setup(  name        = "crop test",
        description = "A simple crop test to demonstrate the use of ARGOUTVIEWM_ARRAY2",
        author      = "Egor Zindy",
        version     = "1.0",
        ext_modules = [_crop]
        )

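Testing the module

If everything goes according to plan, there should be a `_crop.pyd` file in the `build\lib.XXX` directory (on UN*X, the extension module is a `.so` file instead). The file needs to be copied to the directory containing the `crop.py` file (generated by swig).

A Python test program (test_crop.py) is provided in the SVN repository and reproduced below: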

import crop
import numpy

# numpy.int was only an alias for the builtin int, which works everywhere
a = numpy.zeros((5,10), int)
a[numpy.arange(5),:] = numpy.arange(10)

b = numpy.transpose([(10 ** numpy.arange(5))])
a = (a*b)[:,1:] #this array is most likely NOT contiguous

print(a)
print("dim1=%d dim2=%d" % (a.shape[0],a.shape[1]))

d1_0 = 2
d1_1 = 4
d2_0 = 1
d2_1 = 5

c = crop.crop(a, d1_0,d1_1, d2_0,d2_1)
d = a[d1_0:d1_1, d2_0:d2_1]

print("returned array:")
print(c)

print("native slicing:")
print(d)

The output looks like this:

$ python test_crop.py
[[    1     2     3     4     5     6     7     8     9]
 [   10    20    30    40    50    60    70    80    90]
 [  100   200   300   400   500   600   700   800   900]
 [ 1000  2000  3000  4000  5000  6000  7000  8000  9000]
 [10000 20000 30000 40000 50000 60000 70000 80000 90000]]
dim1=5 dim2=9

--- d1_0=2 d1_1=4 (rows)  -- d2_0=1 d2_1=5 (columns)
200 300 400 500
2000 3000 4000 5000
---

returned array:
[[ 200  300  400  500]
 [2000 3000 4000 5000]]
native slicing:
[[ 200  300  400  500]
 [2000 3000 4000 5000]]

numpy.i takes care of making the array contiguous if needed, so the only thing left to take care of is the array orientation.

Conclusion and comments

That's all folks! Files are available on the Google Code SVN (http://code.google.com/p/ezwidgets/source/browse/#svn/trunk/numpy). As usual, comments are welcome!

Regards, Egor

Section author: EgorZindy