Go 内置 strings.Join 背后干了点啥?

作者:guoxj
浏览:369

    阅读go-bible时,看到下面一番话——

go-bible-strings-Join
go-bible-strings-Join

    Contains和HasPrefix等函数好像想想也没太大优化空间,都是一顿遍历搞定,Join和Fields第一时间想不出来优化方案,那就看看源码研究下go的设计者们是怎么思考和实现的,Fields都没见人用过,那就只看Join部分。

    先来一个小例子:

a := []string{"a", "b", "c"}
// join strings with comma
fmt.Println(strings.Join(a, ","))

// console output
a,b,c

vscode ctrl+click查看实现再加一点注释

// Join concatenates the elements of its first argument to create a single string. The separator
// string sep is placed between elements in the resulting string.
func Join(elems []string, sep string) string {
    // 先看切片长度,0和1直接返回原值,无需添加sep
    switch len(elems) {
    case 0:
        return ""
    case 1:
        return elems[0]
    }
    // 计算得到需要添加的 sep 字符串总长度
    n := len(sep) * (len(elems) - 1)
    for i := 0; i < len(elems); i++ {
        n += len(elems[i]) // 累加获取返回值的长度
    }
    // Builder 写入效率高于字符串直接拼接
    var b Builder
    b.Grow(n)
    b.WriteString(elems[0])
    for _, s := range elems[1:] {
        b.WriteString(sep)
        b.WriteString(s)
    }
    return b.String()
}

    那就先看看Builder

// A Builder is used to efficiently(高效) build a string using Write methods.
// It minimizes memory copying(最小化内存拷贝). The zero value is ready to use.(切片零值可用)
// Do not copy a non-zero Builder.(不要拷贝Builder)
type Builder struct {
    addr *Builder // of receiver, to detect copies by value(检测)
    buf  []byte
}

    看看WriteString

// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, s...)
    return len(s), nil
}

// 检测方法
func (b *Builder) copyCheck() {
    if b.addr == nil {
        // This hack works around a failing of Go's escape analysis
        // that was causing b to escape and be heap allocated.
        // See issue 23382.
        // TODO: once issue 7921 is fixed, this should be reverted to
        // just "b.addr = b".
        b.addr = (*Builder)(noescape(unsafe.Pointer(b)))
    } else if b.addr != b {
        panic("strings: illegal use of non-zero Builder copied by value")
    }
}

    copyCheck方法不允许对Builder结构体的拷贝进行WriteString等操作,事实上几乎每个Builder的相关方法都首先调用copyCheck

var b strings.Builder
b.Grow(100)
b.WriteString("a")
b.WriteString("b")
b.WriteString("c")
fmt.Println(b.String())

c := b
c.WriteString("d")
fmt.Println(c.String())


// console 输出情况
abc
panic: strings: illegal use of non-zero Builder copied by value

    然后看看Grow和WriteString

// grow copies the buffer to a new, larger buffer so that there are at least n
// bytes of capacity beyond len(b.buf).
func (b *Builder) grow(n int) {
    // 容量直接翻倍并加上n,在n很大时也保证了足够的空间分配
    // 其实在这个函数之前都觉得除了copyCheck之外和普通的切片写入差异不大
    // 也许设计者认为在大批量,多内容写入时尽早地分配足够的空间能够减少扩容的次数
    // 性能在很大程度上得到了保证
    // 原生的slice扩容相较之下扩容次数肯定会更多,因为系数一开始2,1024后1.25,
    // 相较之下strings.Builder很明显更适合大文本的写入
    buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
    copy(buf, b.buf)
    b.buf = buf
}

// Grow grows b's capacity, if necessary, to guarantee space for
// another n bytes. After Grow(n), at least n bytes can be written to b
// without another allocation. If n is negative, Grow panics.
func (b *Builder) Grow(n int) {
    b.copyCheck() // 拷贝检测
    if n < 0 {
        panic("strings.Builder.Grow: negative count")
    }
    // 函数完成后,保证能够写入n个byte
    // 不是每次调用都会产生实际的效果,取决于目前是否需要扩容,扩容次数应该尽量少
    if cap(b.buf)-len(b.buf) < n {
        b.grow(n)  // 执行扩容
    }
}

// 另一个调用了grow方法的方法
// WriteRune appends the UTF-8 encoding of Unicode code point r to b's buffer.
// It returns the length of r and a nil error.
func (b *Builder) WriteRune(r rune) (int, error) {
    b.copyCheck()
    // Compare as uint32 to correctly handle negative runes.
    if uint32(r) < utf8.RuneSelf {
        b.buf = append(b.buf, byte(r))
        return 1, nil
    }
    l := len(b.buf)
    if cap(b.buf)-l < utf8.UTFMax {
        b.grow(utf8.UTFMax)
    }
    n := utf8.EncodeRune(b.buf[l:l+utf8.UTFMax], r)
    b.buf = b.buf[:l+n]
    return n, nil
}


// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, s...) // 添加完事
    return len(s), nil
}

    看完注释和代码其实理念很简单,大文本写入嘛,性能瓶颈大概率是在容器的扩容上,次数多自然性能差,那就在扩容时给予足够大的权重,strings.Builder中选择了系数2,同时多分配了需要的n长度空间,显著减少了扩容次数,都说到这份上了,就不上测试了。




登录后回复

共有1条评论


guoxj

go version go1.18.4 windows/amd64

2022-10-07 01:33 回复