阅读go-bible时,看到下面一番话——
Contains和HasPrefix等函数好像想想也没太大优化空间,都是一顿遍历搞定,Join和Fields第一时间想不出来优化方案,那就看看源码研究下go的设计者们是怎么思考和实现的,Fields都没见人用过,那就只看Join部分。
先来一个小例子:
a := []string{"a", "b", "c"}
// join strings with comma
fmt.Println(strings.Join(a, ","))
// console output
a,b,c
vscode ctrl+click查看实现再加一点注释
// Join concatenates the elements of its first argument to create a single string. The separator
// string sep is placed between elements in the resulting string.
func Join(elems []string, sep string) string {
// 先看切片长度,0和1直接返回原值,无需添加sep
switch len(elems) {
case 0:
return ""
case 1:
return elems[0]
}
// 计算得到需要添加的 sep 字符串总长度
n := len(sep) * (len(elems) - 1)
for i := 0; i < len(elems); i++ {
n += len(elems[i]) // 累加获取返回值的长度
}
// Builder 写入效率高于字符串直接拼接
var b Builder
b.Grow(n)
b.WriteString(elems[0])
for _, s := range elems[1:] {
b.WriteString(sep)
b.WriteString(s)
}
return b.String()
}
那就先看看Builder
// A Builder is used to efficiently(高效) build a string using Write methods.
// It minimizes memory copying(最小化内存拷贝). The zero value is ready to use.(切片零值可用)
// Do not copy a non-zero Builder.(不要拷贝Builder)
type Builder struct {
addr *Builder // of receiver, to detect copies by value(检测)
buf []byte
}
看看WriteString
// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
b.copyCheck()
b.buf = append(b.buf, s...)
return len(s), nil
}
// 检测方法
func (b *Builder) copyCheck() {
if b.addr == nil {
// This hack works around a failing of Go's escape analysis
// that was causing b to escape and be heap allocated.
// See issue 23382.
// TODO: once issue 7921 is fixed, this should be reverted to
// just "b.addr = b".
b.addr = (*Builder)(noescape(unsafe.Pointer(b)))
} else if b.addr != b {
panic("strings: illegal use of non-zero Builder copied by value")
}
}
copyCheck方法不允许对Builder结构体的拷贝进行WriteString等操作,事实上几乎每个Builder的相关方法都首先调用copyCheck
var b strings.Builder
b.Grow(100)
b.WriteString("a")
b.WriteString("b")
b.WriteString("c")
fmt.Println(b.String())
c := b
c.WriteString("d")
fmt.Println(c.String())
// console 输出情况
abc
panic: strings: illegal use of non-zero Builder copied by value
然后看看Grow和WriteString
// grow copies the buffer to a new, larger buffer so that there are at least n
// bytes of capacity beyond len(b.buf).
func (b *Builder) grow(n int) {
// 容量直接翻倍并加上n,在n很大时也保证了足够的空间分配
// 其实在这个函数之前都觉得除了copyCheck之外和普通的切片写入差异不大
// 也许设计者认为在大批量,多内容写入时尽早地分配足够的空间能够减少扩容的次数
// 性能在很大程度上得到了保证
// 原生的slice扩容相较之下扩容次数肯定会更多,因为系数一开始2,1024后1.25,
// 相较之下strings.Builder很明显更适合大文本的写入
buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
copy(buf, b.buf)
b.buf = buf
}
// Grow grows b's capacity, if necessary, to guarantee space for
// another n bytes. After Grow(n), at least n bytes can be written to b
// without another allocation. If n is negative, Grow panics.
func (b *Builder) Grow(n int) {
b.copyCheck() // 拷贝检测
if n < 0 {
panic("strings.Builder.Grow: negative count")
}
// 函数完成后,保证能够写入n个byte
// 不是每次调用都会产生实际的效果,取决于目前是否需要扩容,扩容次数应该尽量少
if cap(b.buf)-len(b.buf) < n {
b.grow(n) // 执行扩容
}
}
// 另一个调用了grow方法的方法
// WriteRune appends the UTF-8 encoding of Unicode code point r to b's buffer.
// It returns the length of r and a nil error.
func (b *Builder) WriteRune(r rune) (int, error) {
b.copyCheck()
// Compare as uint32 to correctly handle negative runes.
if uint32(r) < utf8.RuneSelf {
b.buf = append(b.buf, byte(r))
return 1, nil
}
l := len(b.buf)
if cap(b.buf)-l < utf8.UTFMax {
b.grow(utf8.UTFMax)
}
n := utf8.EncodeRune(b.buf[l:l+utf8.UTFMax], r)
b.buf = b.buf[:l+n]
return n, nil
}
// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
b.copyCheck()
b.buf = append(b.buf, s...) // 添加完事
return len(s), nil
}
看完注释和代码其实理念很简单,大文本写入嘛,性能瓶颈大概率是在容器的扩容上,次数多自然性能差,那就在扩容时给予足够大的权重,strings.Builder中选择了系数2,同时多分配了需要的n长度空间,显著减少了扩容次数,都说到这份上了,就不上测试了。