Some details of the built-in open() function in Python 3
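Many of Python's built-in functions are written in C. (I am not completely sure this holds for all of them, but a Python installation contains no Python source for the built-ins, while the CPython source tree contains their C implementations.) The source is in Python/bltinmodule.c in the CPython tree. Note that an installed Python (as opposed to the CPython source tree), for example under Python34/include, ships the header bltinmodule.h but not bltinmodule.c, since the C sources are already compiled into the interpreter.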
open() in Python 2.7:
open(name[, mode[, buffering]])
name: name is the file name to be opened
mode: mode is a string indicating how the file is to be opened
buffering: The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.
Its C implementation:
static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
    return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}
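This call simply constructs an instance of the built-in file type, PyFile_Type, which is defined in Objects/fileobject.c in the CPython 2.7 source tree. So in Python 2, open() is implemented entirely in C.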
open() in Python 3.4:
open(file, mode='r', buffering=-1, encoding=None,
errors=None, newline=None, closefd=True, opener=None)
I will not go through every parameter here. The point to note is that the built-in open() is actually io.open().
The documentation of the io module says so as well:
io.open(file, mode='r', buffering=-1, encoding=None, errors=None,
newline=None, closefd=True, opener=None)
This is an alias for the builtin open() function.
Naturally, builtin_open() can no longer be found in Python/bltinmodule.c.
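The aliasing can be checked directly in the interpreter. This is a minimal sketch, assuming CPython 3; the variable name same_object is only for illustration.

```python
import builtins
import io

# On CPython 3 the built-in open and io.open are expected to be
# one and the same function object, not two separate implementations.
same_object = builtins.open is io.open
print(same_object)
```

If this prints True, any statement about io.open() applies equally to the built-in open().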
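Comparing the two versions, open() in 3.4 adds a good deal of functionality over 2.7. In particular, encoding is now a parameter of open() itself, so handling UTF-8 no longer requires import codecs every time as it did in Python 2. One might also guess that an open() written in Python rather than C would be slower. Note, however, that _pyio.py is the pure-Python implementation of the io module; CPython's io module normally takes open() from the C extension module _io, so the Python code shown in this post describes the behaviour rather than the code path that actually runs by default.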
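To make the encoding point concrete, here is a small sketch that writes and reads a UTF-8 file with the Python 3 open() and compares the result with decoding the raw bytes by hand. The file name demo.txt and the sample string are made up for illustration.

```python
import os
import tempfile

text = "你好, UTF-8"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # Python 3: the encoding is an argument of open() itself.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        decoded_by_open = f.read()

    # Without the text layer: read raw bytes and decode them manually.
    with open(path, "rb") as f:
        decoded_by_hand = f.read().decode("utf-8")

print(decoded_by_open == decoded_by_hand == text)
```

In Python 2 the first form was not available on the built-in open(), which is why codecs.open() was the usual way to get the same effect there.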
Now let's look at how open() is implemented in Python 3. The pure-Python source of open() is in Python34/Lib/_pyio.py:
def open(file, mode="r", buffering=-1, encoding=None, errors=None,
         newline=None, closefd=True, opener=None):
    if not isinstance(file, (str, bytes, int)):
        raise TypeError("invalid file: %r" % file)
    if not isinstance(mode, str):
        raise TypeError("invalid mode: %r" % mode)
    if not isinstance(buffering, int):
        raise TypeError("invalid buffering: %r" % buffering)
    if encoding is not None and not isinstance(encoding, str):
        raise TypeError("invalid encoding: %r" % encoding)
    if errors is not None and not isinstance(errors, str):
        raise TypeError("invalid errors: %r" % errors)
    modes = set(mode)
    if modes - set("axrwb+tU") or len(mode) > len(modes):
        raise ValueError("invalid mode: %r" % mode)
    creating = "x" in modes
    reading = "r" in modes
    writing = "w" in modes
    appending = "a" in modes
    updating = "+" in modes
    text = "t" in modes
    binary = "b" in modes
    if "U" in modes:
        if creating or writing or appending:
            raise ValueError("can't use U and writing mode at once")
        import warnings
        warnings.warn("'U' mode is deprecated",
                      DeprecationWarning, 2)
        reading = True
    if text and binary:
        raise ValueError("can't have text and binary mode at once")
    if creating + reading + writing + appending > 1:
        raise ValueError("can't have read/write/append mode at once")
    if not (creating or reading or writing or appending):
        raise ValueError("must have exactly one of read/write/append mode")
    if binary and encoding is not None:
        raise ValueError("binary mode doesn't take an encoding argument")
    if binary and errors is not None:
        raise ValueError("binary mode doesn't take an errors argument")
    if binary and newline is not None:
        raise ValueError("binary mode doesn't take a newline argument")
    raw = FileIO(file,
                 (creating and "x" or "") +
                 (reading and "r" or "") +
                 (writing and "w" or "") +
                 (appending and "a" or "") +
                 (updating and "+" or ""),
                 closefd, opener=opener)
    result = raw
    try:
        line_buffering = False
        if buffering == 1 or buffering < 0 and raw.isatty():
            buffering = -1
            line_buffering = True
        if buffering < 0:
            buffering = DEFAULT_BUFFER_SIZE
            try:
                bs = os.fstat(raw.fileno()).st_blksize
            except (OSError, AttributeError):
                pass
            else:
                if bs > 1:
                    buffering = bs
        if buffering < 0:
            raise ValueError("invalid buffering size")
        if buffering == 0:
            if binary:
                return result
            raise ValueError("can't have unbuffered text I/O")
        if updating:
            buffer = BufferedRandom(raw, buffering)
        elif creating or writing or appending:
            buffer = BufferedWriter(raw, buffering)
        elif reading:
            buffer = BufferedReader(raw, buffering)
        else:
            raise ValueError("unknown mode: %r" % mode)
        result = buffer
        if binary:
            return result
        text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
        result = text
        text.mode = mode
        return result
    except:
        result.close()
        raise
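The layering in the source above (raw FileIO, then a Buffered* object, then TextIOWrapper) can be observed from the outside by looking at the type open() returns for different arguments. The following is only a sketch: layer_name is a helper invented for this example, demo.bin is a made-up file name, and the type names are what I expect on CPython 3.

```python
import os
import tempfile


def layer_name(path, mode, **kwargs):
    """Open the file and report the class name of the returned object."""
    with open(path, mode, **kwargs) as f:
        return type(f).__name__


unbuffered_text_error = None
rejected_modes = []

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"data")

    layers = {
        "rb, buffering=0": layer_name(path, "rb", buffering=0),  # raw only
        "rb": layer_name(path, "rb"),      # reading branch
        "ab": layer_name(path, "ab"),      # writing/appending branch
        "r+b": layer_name(path, "r+b"),    # updating branch
        "r": layer_name(path, "r", encoding="utf-8"),  # text layer on top
    }

    # Text mode needs a buffer, so buffering=0 is refused.
    try:
        open(path, "r", buffering=0, encoding="utf-8")
    except ValueError as exc:
        unbuffered_text_error = str(exc)

    # Mode strings that fail the validation at the top of open().
    for bad_mode in ("rw", "rbt", "rr"):
        try:
            open(path, bad_mode)
        except ValueError:
            rejected_modes.append(bad_mode)

for label, name in layers.items():
    print(label, "->", name)
print(unbuffered_text_error)
print(rejected_modes)
```

Each line of output corresponds to one branch in the source: the early return for unbuffered binary files, the three Buffered* branches, and the final TextIOWrapper for text mode.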
Pay attention to this part:
raw = FileIO(file,
             (creating and "x" or "") +
             (reading and "r" or "") +
             (writing and "w" or "") +
             (appending and "a" or "") +
             (updating and "+" or ""),
             closefd, opener=opener)
result = raw
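FileIO has no Python source in the 3.4 version of _pyio.py because it is written in C: as far as I can tell it is imported from the C extension module _io, whose source is Modules/_io/fileio.c in the CPython tree. So in this version of open(), only the raw file access is done in C, while most of the new functionality (buffering, encoding, newline handling) is handled by several Python classes layered on top of it.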
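Where these objects come from can be inspected at runtime. This is a rough check rather than a proof, assuming CPython 3, where io takes its implementation from the C extension module _io; the three variable names are only for illustration.

```python
import inspect
import io
import _pyio

# inspect.isbuiltin() is true for functions implemented in C.
open_is_c_function = inspect.isbuiltin(io.open)

# The module a class reports tells us where it was defined.
fileio_module = io.FileIO.__module__

# The pure-Python open() in _pyio is a different object from io.open.
pure_python_is_separate = _pyio.open is not io.open

print(open_is_c_function, fileio_module, pure_python_is_separate)
```

On CPython I would expect this to report that io.open is a C function and that FileIO is defined in _io, which means the _pyio.py source is best read as a description of what the C code does.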
Now note this part:
if binary:
    return result
text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
Encoding and the related processing are delegated to TextIOWrapper, which is a class. Its encoding-related part consists of the following two methods:
def _get_encoder(self):
    make_encoder = codecs.getincrementalencoder(self._encoding)
    self._encoder = make_encoder(self._errors)
    return self._encoder

def _get_decoder(self):
    make_decoder = codecs.getincrementaldecoder(self._encoding)
    decoder = make_decoder(self._errors)
    if self._readuniversal:
        decoder = IncrementalNewlineDecoder(decoder, self._readtranslate)
    self._decoder = decoder
    return decoder
Once codecs shows up here, everything falls into place.
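The reason TextIOWrapper asks codecs for an incremental decoder, rather than decoding each chunk independently, is that a buffered read can end in the middle of a multi-byte character. A minimal sketch of that behaviour, using only codecs.getincrementaldecoder from the snippet above; the sample string and the chunk boundaries are chosen for illustration:

```python
import codecs

# Build a decoder the same way _get_decoder() does.
make_decoder = codecs.getincrementaldecoder("utf-8")
decoder = make_decoder("strict")

data = "中文".encode("utf-8")  # two characters, three bytes each

# Split the bytes so that both boundaries fall inside a character.
chunks = [data[:2], data[2:4], data[4:]]

pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left over

text = "".join(pieces)
print(pieces)
print(text)
```

The first chunk yields an empty string because the decoder holds the incomplete bytes until the rest of the character arrives, which is exactly what a text layer on top of a byte buffer needs.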
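To sum up, open() in Python 2 is implemented directly in C on top of the file type, with few features; anything more, such as encodings, has to come from the codecs library. Python 3 effectively drops that "low-end" open() and merges the codecs-style version with the built-in one. The new parameters are not what switches codecs on, though: every text-mode file goes through TextIOWrapper and therefore through a codec, with a platform-dependent default when encoding is not given, and only binary mode skips that layer. What you save is the explicit import codecs. As for speed, the open() that CPython actually runs by default is the C implementation in the _io module, which follows the same layered design as the _pyio.py code shown here.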