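Many of Python's built-in functions are written in C. (I'm not entirely sure this holds for all of them, but no Python source for the built-ins ships with Python, whereas the CPython source tree contains concrete C implementations.) Their source is in Python/bltinmodule.c in the CPython repository. Note also that an installed Python (as opposed to the CPython source), for example the Python34/include folder, contains bltinmodule.h but no bltinmodule.c.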
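A quick way to see this from the interpreter is to ask the inspect module for the source of a built-in. This is only a minimal sketch, assuming CPython; len is used purely as an example of a built-in function.

```python
import inspect

# C-implemented built-ins have their own type, distinct from Python functions.
is_builtin = inspect.isbuiltin(len)

try:
    inspect.getsource(len)
    has_python_source = True
except (TypeError, OSError):
    # inspect cannot locate a .py file for a C-implemented callable.
    has_python_source = False

print(type(len).__name__, is_builtin, has_python_source)
```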

open() in Python 2.7:

open(name[, mode[, buffering]])
  • name: name is the file name to be opened
  • mode: mode is a string indicating how the file is to be opened
  • buffering: The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.

Its C implementation:

static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
    return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}

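I did not trace where this call ends up, but it simply calls the file type object, PyFile_Type (defined in Objects/fileobject.c in the CPython 2.7 source), so open() is certainly implemented directly in C.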

open() in Python 3.4:

open(file, mode='r', buffering=-1, encoding=None, 
errors=None, newline=None, closefd=True, opener=None)

I won't go through each parameter here. The point to note is that the built-in open() is actually io.open().

The io module documentation says as much:

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, 
newline=None, closefd=True, opener=None)
This is an alias for the builtin open() function.
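The alias can be checked directly. A minimal sketch, assuming Python 3:

```python
import builtins
import io

# If io.open is an alias for the built-in, both names refer to one object.
same_object = io.open is builtins.open
print(same_object)
```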

Naturally, builtin_open() no longer appears in bltinmodule.c.


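Comparing the two versions, open() in 3.4 adds a good deal of functionality over 2.7. In particular, encoding is now a parameter of open(), so handling UTF-8 no longer means reaching for import codecs every time, as in Python 2. One might also guess that an open() implemented in Python rather than C must be slower. In CPython, however, the io module is backed by the C extension module _io, so the built-in open() is not running as pure Python.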
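To make the encoding point concrete, here is a small sketch that reads the same UTF-8 file two ways: with the encoding parameter of open(), and by wrapping a binary file in a codecs stream reader, which is roughly the approach Python 2 code took through the codecs library. The sample text and the temporary file are assumptions made for illustration.

```python
import codecs
import os
import tempfile

text = "héllo, 世界"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Python 3 style: open() encodes and decodes itself.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        via_open = f.read()

    # codecs style: open in binary mode and decode through a StreamReader.
    with open(path, "rb") as raw:
        reader = codecs.getreader("utf-8")(raw)
        via_codecs = reader.read()
finally:
    os.remove(path)

print(via_open == via_codecs)
```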

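Let's look at how open() is implemented in Python 3.

A Python version of open() is in Python34/Lib/_pyio.py, the pure-Python implementation of the io module. (CPython normally uses the C module _io instead; _pyio.py is intended to behave the same way and is much easier to read.)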

def open(file, mode="r", buffering=-1, encoding=None, errors=None,
         newline=None, closefd=True, opener=None):
    if not isinstance(file, (str, bytes, int)):
        raise TypeError("invalid file: %r" % file)
    if not isinstance(mode, str):
        raise TypeError("invalid mode: %r" % mode)
    if not isinstance(buffering, int):
        raise TypeError("invalid buffering: %r" % buffering)
    if encoding is not None and not isinstance(encoding, str):
        raise TypeError("invalid encoding: %r" % encoding)
    if errors is not None and not isinstance(errors, str):
        raise TypeError("invalid errors: %r" % errors)
    modes = set(mode)
    if modes - set("axrwb+tU") or len(mode) > len(modes):
        raise ValueError("invalid mode: %r" % mode)
    creating = "x" in modes
    reading = "r" in modes
    writing = "w" in modes
    appending = "a" in modes
    updating = "+" in modes
    text = "t" in modes
    binary = "b" in modes
    if "U" in modes:
        if creating or writing or appending:
            raise ValueError("can't use U and writing mode at once")
        import warnings
        warnings.warn("'U' mode is deprecated",
                      DeprecationWarning, 2)
        reading = True
    if text and binary:
        raise ValueError("can't have text and binary mode at once")
    if creating + reading + writing + appending > 1:
        raise ValueError("can't have read/write/append mode at once")
    if not (creating or reading or writing or appending):
        raise ValueError("must have exactly one of read/write/append mode")
    if binary and encoding is not None:
        raise ValueError("binary mode doesn't take an encoding argument")
    if binary and errors is not None:
        raise ValueError("binary mode doesn't take an errors argument")
    if binary and newline is not None:
        raise ValueError("binary mode doesn't take a newline argument")
    raw = FileIO(file,
                 (creating and "x" or "") +
                 (reading and "r" or "") +
                 (writing and "w" or "") +
                 (appending and "a" or "") +
                 (updating and "+" or ""),
                 closefd, opener=opener)
    result = raw
    try:
        line_buffering = False
        if buffering == 1 or buffering < 0 and raw.isatty():
            buffering = -1
            line_buffering = True
        if buffering < 0:
            buffering = DEFAULT_BUFFER_SIZE
            try:
                bs = os.fstat(raw.fileno()).st_blksize
            except (OSError, AttributeError):
                pass
            else:
                if bs > 1:
                    buffering = bs
        if buffering < 0:
            raise ValueError("invalid buffering size")
        if buffering == 0:
            if binary:
                return result
            raise ValueError("can't have unbuffered text I/O")
        if updating:
            buffer = BufferedRandom(raw, buffering)
        elif creating or writing or appending:
            buffer = BufferedWriter(raw, buffering)
        elif reading:
            buffer = BufferedReader(raw, buffering)
        else:
            raise ValueError("unknown mode: %r" % mode)
        result = buffer
        if binary:
            return result
        text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
        result = text
        text.mode = mode
        return result
    except:
        result.close()
        raise
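The argument checks at the top of the function can be exercised from the interpreter. The sketch below calls the built-in open() (not _pyio.open) with a few invalid combinations and records which exception type comes back; try_open is a hypothetical helper written only for this illustration, and the exact error messages may differ between the C and Python implementations.

```python
import os
import tempfile


def try_open(path, *args, **kwargs):
    """Hypothetical helper: return the exception name open() raises, or 'ok'."""
    try:
        open(path, *args, **kwargs).close()
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return "ok"


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    results = {
        "rw": try_open(path, "rw"),                     # two of r/w/a/x at once
        "rr": try_open(path, "rr"),                     # repeated mode character
        "rbt": try_open(path, "rbt"),                   # text and binary together
        "rb + encoding": try_open(path, "rb", encoding="utf-8"),
        "r unbuffered": try_open(path, "r", buffering=0),
        "rb unbuffered": try_open(path, "rb", buffering=0),
        "mode not str": try_open(path, 1),
    }
finally:
    os.remove(path)

for case, outcome in results.items():
    print(case, "->", outcome)
```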

This part deserves attention:

raw = FileIO(file,
             (creating and "x" or "") +
             (reading and "r" or "") +
             (writing and "w" or "") +
             (appending and "a" or "") +
             (updating and "+" or ""),
             closefd, opener=opener)
result = raw

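I could not find Python source for FileIO, and my feeling was that FileIO() is written in C. That is indeed the case: in 3.4, _pyio.py does not define FileIO but imports it from the C module _io. So the raw file access is still C, while most of open()'s new features are delegated to the other classes of the io module (the buffered classes and TextIOWrapper).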
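Which parts are C and which are Python can be checked with inspect. This is a sketch that assumes CPython; the getattr call is there because, as far as I know, some CPython versions wrap _pyio.open in a staticmethod.

```python
import inspect
import io
import _pyio

# The open() that programs actually call is a C built-in.
builtin_open_is_c = inspect.isbuiltin(io.open)

# _pyio.open is the pure-Python counterpart; unwrap a staticmethod if present.
pyio_open = getattr(_pyio.open, "__func__", _pyio.open)
pyio_open_is_python = inspect.isfunction(pyio_open)

# The module a class reports tells us where it was defined.
fileio_module = io.FileIO.__module__

print(builtin_open_is_c, pyio_open_is_python, fileio_module)
```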

Now note this part:

if binary:
    return result
text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
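The effect of this branch shows up in the type of the object open() returns. A minimal sketch, using a temporary file as an assumption: unbuffered binary mode yields the raw FileIO, buffered binary mode yields one of the Buffered classes, and text mode stacks a TextIOWrapper on top.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    cases = [
        ("rb, buffering=0", {"mode": "rb", "buffering": 0}),
        ("rb", {"mode": "rb"}),
        ("wb", {"mode": "wb"}),
        ("r+b", {"mode": "r+b"}),
        ("r", {"mode": "r"}),
    ]
    layers = {}
    for label, kwargs in cases:
        with open(path, **kwargs) as f:
            layers[label] = type(f)

    # In text mode the lower layers stay reachable through .buffer and .raw.
    with open(path, "r", encoding="utf-8") as f:
        text_stack = (type(f), type(f.buffer), type(f.buffer.raw))
finally:
    os.remove(path)

for label, cls in layers.items():
    print(label, "->", cls.__name__)
```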

The encoding and related handling is delegated to TextIOWrapper. TextIOWrapper is a class, and the encoding side of it comes down to these two methods:

def _get_encoder(self):
    make_encoder = codecs.getincrementalencoder(self._encoding)
    self._encoder = make_encoder(self._errors)
    return self._encoder
def _get_decoder(self):
    make_decoder = codecs.getincrementaldecoder(self._encoding)
    decoder = make_decoder(self._errors)
    if self._readuniversal:
        decoder = IncrementalNewlineDecoder(decoder, self._readtranslate)
    self._decoder = decoder
    return decoder

Once codecs shows up, everything falls into place.
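The two codecs calls used above can be tried on their own. This sketch, with UTF-8 and the character "é" chosen only as an example, shows why an incremental decoder suits a buffered reader: it holds on to an incomplete multi-byte sequence until the remaining bytes arrive.

```python
import codecs

# Same factory calls as in _get_decoder / _get_encoder.
make_decoder = codecs.getincrementaldecoder("utf-8")
decoder = make_decoder("strict")

data = "é".encode("utf-8")  # two bytes in UTF-8

# Feed the bytes one at a time, as if they came from separate buffer reads.
first = decoder.decode(data[:1])   # incomplete sequence, nothing emitted yet
second = decoder.decode(data[1:])  # sequence completed

make_encoder = codecs.getincrementalencoder("utf-8")
encoder = make_encoder("strict")
encoded = encoder.encode("é")

print(repr(first), repr(second), encoded == data)
```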


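To sum up: in Python 2, open() is implemented directly in C and offers few features; anything more calls for the codecs library. Python 3 effectively drops that "low-end" open() and seems to merge the codecs version with the built-in one. Judging from the code above, binary mode returns the FileIO or buffered object with no codec involved, while text mode always goes through TextIOWrapper, which relies on the incremental encoders and decoders from codecs whether or not encoding is passed; what you save is the import codecs. As for speed, CPython's built-in open() is backed by the C module _io rather than by _pyio.py, so the extra layers do not by themselves mean it runs as Python code.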