Golang Memory Management
[TOC]
See also: "Plain-Language Go Memory Management (1): Allocation Principles" and "(2): Demystifying Stack Memory".
Golang's memory management is based on tcmalloc, which is a strong starting point, and the runtime adds many optimizations of its own. Below we walk through the implementation via the source code. The analysis was originally based on go1.8rc3; some excerpts come from later releases (e.g. go1.13), and the differences are noted where they matter.
1. About tcmalloc
For background on tcmalloc, see an introductory article on it; the original paper is "TCMalloc: Thread-Caching Malloc".
2. Golang Memory Management
0. Background
First, a quick look at Go's runtime scheduler. Go has three core concepts: G, M and P.
- G: the execution context of a goroutine.
- M: an operating-system thread.
- P: a processor; the key scheduling unit, roughly a logical CPU.
Running a goroutine requires all three: a G bound to a P and an M. That is all we need here; the scheduler gets a more detailed treatment in a later article.
1. Escape Analysis
In manually managed languages such as C/C++, variables allocated with malloc or new live on the heap. Go works differently, even though it also has new: the compiler performs escape analysis to decide where each variable should live. Here is a simple example.
package main
import ()
func foo() *int {
	var x int
	return &x
}
func bar() int {
	x := new(int)
	*x = 1
	return *x
}
func main() {}
Save the file as escape.go and run:
$ go run -gcflags '-m -l' escape.go
./escape.go:6: moved to heap: x
./escape.go:7: &x escape to heap
./escape.go:11: bar new(int) does not escape
Under go1.13 the output looks like:
$ go run -gcflags "-m -l" cmd.go
.\cmd.go:6:6: moved to heap: x
.\cmd.go:10:10: bar new(int) does not escape
This means x in foo() ends up allocated on the heap, while x in bar() ends up on the stack (even though it was allocated with new). The FAQ on golang.org addresses exactly this: How do I know whether a variable is allocated on the heap or the stack?
From a correctness standpoint, you don’t need to know. Each variable in Go exists as long as there are references to it. The storage location chosen by the implementation is irrelevant to the semantics of the language.
The storage location does have an effect on writing efficient programs. When possible, the Go compilers will allocate variables that are local to a function in that function’s stack frame. However, if the compiler cannot prove that the variable is not referenced after the function returns, then the compiler must allocate the variable on the garbage-collected heap to avoid dangling pointer errors. Also, if a local variable is very large, it might make more sense to store it on the heap rather than the stack.
In the current compilers, if a variable has its address taken, that variable is a candidate for allocation on the heap. However, a basic escape analysis recognizes some cases when such variables will not live past the return from the function and can reside on the stack.
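A minimal program illustrating the FAQ's correctness point: returning the address of a local is safe in Go, because escape analysis sees the address escape and moves the variable to the heap. All names here are illustrative, not from the runtime.

```go
package main

import "fmt"

// newCounter returns a pointer to a local variable. In C this would be a
// dangling pointer; in Go the compiler allocates n on the heap instead,
// so the pointer remains valid after newCounter returns.
func newCounter() *int {
	n := 0
	return &n
}

func main() {
	c := newCounter()
	*c = 42
	fmt.Println(*c) // the heap-allocated value is still alive here
}
```

Running this with `-gcflags '-m -l'` would report `moved to heap: n`, mirroring the foo() case above.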
2. Key Data Structures
The key pieces:
- mcache: a per-P cache; think of it as the local cache.
- mcentral: a global cache; mcache refills from it when it runs short.
- mheap: when mcentral runs short as well, memory is requested from the operating system through mheap.
Together they form a multi-level allocator. The simplified allocation path is as follows (details come later):
- Allocate from mcache first.
- If mcache is short, refill it from mcentral.
- If mcentral is short, refill it from mheap.
- If mheap is short, request memory from the operating system.
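The four-step path above can be sketched as a toy three-level allocator. Every type and size below (heap, centralList, pcache, the 32-byte refill batch) is invented for illustration and hides the runtime's real spans, size classes and locking.

```go
package main

import "fmt"

// heap is the lowest level; it pretends to get pages from the OS.
type heap struct{ grows int }

func (h *heap) grow(n int) []byte { h.grows++; return make([]byte, n) }

// centralList is the shared middle tier (a stand-in for mcentral).
type centralList struct {
	h    *heap
	free []byte
}

func (c *centralList) take(n int) []byte {
	if len(c.free) < n {
		c.free = append(c.free, c.h.grow(64)...) // refill from "mheap"
	}
	out := c.free[:n]
	c.free = c.free[n:]
	return out
}

// pcache is the per-P front tier (a stand-in for mcache): lock-free
// in the real runtime because each P owns its own.
type pcache struct {
	c    *centralList
	free []byte
}

func (p *pcache) alloc8() []byte {
	if len(p.free) < 8 {
		p.free = p.c.take(32) // refill from "mcentral"
	}
	b := p.free[:8]
	p.free = p.free[8:]
	return b
}

func main() {
	h := &heap{}
	p := &pcache{c: &centralList{h: h}}
	for i := 0; i < 16; i++ { // 16 small allocations, 128 bytes total
		p.alloc8()
	}
	fmt.Println("times the heap had to grow:", h.grows)
}
```

Most allocations are served from the local tier; only occasionally does a request cascade all the way down, which is the whole point of the design.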
2.1 mcache
Every goroutine runs while bound to a P, and mcache is that P's cache, so allocating from it requires no locking. The mcache structure is shown below.
runtime/mcache.go
// Per-thread (in Go, per-P) cache for small objects.
// No locking needed because it is per-thread (per-P).
//
// mcaches are allocated from non-GC'd memory, so any heap pointers
// must be specially handled.
//
//go:notinheap
type mcache struct {
	// The following members are accessed on every malloc,
	// so they are grouped here for better caching.
	next_sample uintptr // trigger heap sample after allocating this many bytes
	local_scan  uintptr // bytes of scannable heap allocated

	// Allocator cache for tiny objects w/o pointers.
	// See "Tiny allocator" comment in malloc.go.

	// tiny points to the beginning of the current tiny block, or
	// nil if there is no current tiny block.
	//
	// tiny is a heap pointer. Since mcache is in non-GC'd memory,
	// we handle it by clearing it in releaseAll during mark
	// termination.
	tiny             uintptr
	tinyoffset       uintptr
	local_tinyallocs uintptr // number of tiny allocs not counted in other stats

	// The rest is not accessed on every malloc.

	alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

	stackcache [_NumStackOrders]stackfreelist

	// Local allocator stats, flushed during GC.
	local_largefree  uintptr                  // bytes freed for large objects (>maxsmallsize)
	local_nlargefree uintptr                  // number of frees for large objects (>maxsmallsize)
	local_nsmallfree [_NumSizeClasses]uintptr // number of frees for small objects (<=maxsmallsize)

	// flushGen indicates the sweepgen during which this mcache
	// was last flushed. If flushGen != mheap_.sweepgen, the spans
	// in this mcache are stale and need to be flushed so they
	// can be swept. This is done in acquirep.
	flushGen uint32
}
For now, focus on alloc [numSpanClasses]*mspan, an array of pointers to mspans. There are _NumSizeClasses = 67 size classes; in the version shown each class additionally splits into scan and noscan variants, so numSpanClasses = _NumSizeClasses << 1 = 134 (older versions simply had alloc [_NumSizeClasses]*mspan). Each element is an mspan carved into blocks of one particular size; when memory is requested, the appropriate element is chosen for the object. The 67 block sizes are 0, 8 bytes, 16 bytes, ..., which differs slightly from tcmalloc.
// file: sizeclasses.go
var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}
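An allocation request is rounded up to the smallest class that fits, so e.g. a 33-byte object occupies a 48-byte block. The linear scan and the truncated table below are illustrative; the real runtime uses precomputed lookup tables (size_to_class8 / size_to_class128) instead.

```go
package main

import "fmt"

// A prefix of class_to_size, enough to show the rounding behavior.
var classToSize = []uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128}

// roundUpToClass returns the block size a request of `size` bytes
// actually occupies, or 0 if it exceeds the classes listed here.
func roundUpToClass(size uint16) uint16 {
	for _, c := range classToSize[1:] {
		if size <= c {
			return c
		}
	}
	return 0 // beyond the small-object classes shown: large-object path
}

func main() {
	fmt.Println(roundUpToClass(1), roundUpToClass(33), roundUpToClass(64))
}
```

The gap between the requested 33 bytes and the 48-byte block is internal fragmentation, the price paid for a small, fixed set of classes.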
On closer inspection a question arises: alloc looks like the freelist array of a classic memory pool, where each element would normally be a linked list of blocks of one fixed size. Here every element is uniformly an mspan, so the block size must be recorded inside the mspan itself. Let's look at its structure.
2.2 mspan
In tcmalloc, a span is the basic unit of memory management. Go's mspan looks like this (some fields omitted).
// runtime/mheap.go
//go:notinheap
type mspan struct {
	next *mspan     // next span in list, or nil if none
	prev *mspan     // previous span in list, or nil if none
	list *mSpanList // For debugging. TODO: Remove.

	startAddr uintptr // address of first byte of span aka s.base()
	npages    uintptr // number of pages in span

	manualFreeList gclinkptr // list of free objects in mSpanManual spans

	// freeindex is the slot index between 0 and nelems at which to begin scanning
	// for the next free object in this span.
	// Each allocation scans allocBits starting at freeindex until it encounters a 0
	// indicating a free object. freeindex is then adjusted so that subsequent scans begin
	// just past the newly discovered free object.
	//
	// If freeindex == nelem, this span has no free objects.
	//
	// allocBits is a bitmap of objects in this span.
	// If n >= freeindex and allocBits[n/8] & (1<<(n%8)) is 0
	// then object n is free;
	// otherwise, object n is allocated. Bits starting at nelem are
	// undefined and should never be referenced.
	//
	// Object n starts at address n*elemsize + (start << pageShift).
	freeindex uintptr
	// TODO: Look up nelems from sizeclass and remove this field if it
	// helps performance.
	nelems uintptr // number of object in the span.

	// Cache of the allocBits at freeindex. allocCache is shifted
	// such that the lowest bit corresponds to the bit freeindex.
	// allocCache holds the complement of allocBits, thus allowing
	// ctz (count trailing zero) to use it directly.
	// allocCache may contain bits beyond s.nelems; the caller must ignore
	// these.
	allocCache uint64 // bitmap of free objects; a 1 bit means the slot is free

	// allocBits and gcmarkBits hold pointers to a span's mark and
	// allocation bits. The pointers are 8 byte aligned.
	// There are three arenas where this data is held.
	// free: Dirty arenas that are no longer accessed
	//       and can be reused.
	// next: Holds information to be used in the next GC cycle.
	// current: Information being used during this GC cycle.
	// previous: Information being used during the last GC cycle.
	// A new GC cycle starts with the call to finishsweep_m.
	// finishsweep_m moves the previous arena to the free arena,
	// the current arena to the previous arena, and
	// the next arena to the current arena.
	// The next arena is populated as the spans request
	// memory to hold gcmarkBits for the next GC cycle as well
	// as allocBits for newly allocated spans.
	//
	// The pointer arithmetic is done "by hand" instead of using
	// arrays to avoid bounds checks along critical performance
	// paths.
	// The sweep will free the old allocBits and set allocBits to the
	// gcmarkBits. The gcmarkBits are replaced with a fresh zeroed
	// out memory.
	allocBits  *gcBits
	gcmarkBits *gcBits

	// sweep generation:
	// if sweepgen == h->sweepgen - 2, the span needs sweeping
	// if sweepgen == h->sweepgen - 1, the span is currently being swept
	// if sweepgen == h->sweepgen, the span is swept and ready to use
	// if sweepgen == h->sweepgen + 1, the span was cached before sweep began and is still cached, and needs sweeping
	// if sweepgen == h->sweepgen + 3, the span was swept and then cached and is still cached
	// h->sweepgen is incremented by 2 after every GC

	sweepgen    uint32
	divMul      uint16     // for divide by elemsize - divMagic.mul
	baseMask    uint16     // if non-0, elemsize is a power of 2, & this will get object allocation base
	allocCount  uint16     // number of allocated objects
	spanclass   spanClass  // size class and noscan (uint8)
	state       mSpanState // mspaninuse etc
	needzero    uint8      // needs to be zeroed before allocation
	divShift    uint8      // for divide by elemsize - divMagic.shift
	divShift2   uint8      // for divide by elemsize - divMagic.shift2
	scavenged   bool       // whether this span has had its pages released to the OS
	elemsize    uintptr    // computed from sizeclass or from npages
	limit       uintptr    // end of data in span
	speciallock mutex      // guards specials list
	specials    *special   // linked list of special records sorted by offset.
}
From the structure we can see:
- next, prev: link fields; mspans are generally used on linked lists.
- npages: an mspan's size is always a whole number of pages.
- spanclass: encodes a size class between 0 and _NumSizeClasses plus a noscan bit; this answers the question above. For example, size class 3 means the mspan is carved into 32-byte blocks.
- elemsize: computed from the size class or from npages; for size class 3, elemsize = 32 bytes. Allocations larger than 32 KB always get whole pages, with elemsize = page_size * npages.
- nelems: the total number of blocks in the span.
- freeindex: in 0 ~ nelems-1, marking how far allocation has progressed through the span.
2.3 mcentral
As mentioned above, when mcache runs out it refills from mcentral, so let's look at mcentral next.
// runtime/mheap.go
//go:notinheap
type mSpanList struct {
	first *mspan // first span in list, or nil if none
	last  *mspan // last span in list, or nil if none
}

// runtime/mcentral.go
// Central list of free objects of a given size.
//
//go:notinheap
type mcentral struct {
	lock      mutex
	spanclass spanClass
	nonempty  mSpanList // list of spans with a free object, ie a nonempty free list
	empty     mSpanList // list of spans with no free objects (or cached in an mcache)

	// nmalloc is the cumulative count of objects allocated from
	// this mcentral, assuming all spans in mcaches are
	// fully-allocated. Written atomically, read under STW.
	nmalloc uint64
}
Notes on mcentral:
- spanclass: mcentral also carries a span class; does that mean there is one mcentral per class? Yes.
- lock: needed because multiple Ps compete for the same mcentral.
- nonempty: a doubly linked list of the mspans in this mcentral that still have free objects.
- empty: mspans already in use (or lent out to an mcache); effectively a way to keep track of every span.
Where does mcentral live? Although we present mcentral and mheap as two separate parts, both are global structures and could well be defined together. That is in fact the case: the mcentrals are embedded in mheap.
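The span handoff mcentral performs can be sketched as follows. centralSim and its fields are illustrative stand-ins; the real cacheSpan also sweeps spans and falls back to mheap when nonempty is exhausted.

```go
package main

import (
	"fmt"
	"sync"
)

type span struct{ freeObjects int }

// centralSim mimics mcentral: a locked pair of span lists for one size class.
type centralSim struct {
	mu       sync.Mutex
	nonempty []*span // spans that still have free objects
	empty    []*span // spans fully allocated or lent to an mcache
}

// cacheSpan hands a span with free objects to a requesting mcache,
// moving it from nonempty to empty under the lock (many Ps may race here).
func (c *centralSim) cacheSpan() *span {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.nonempty) == 0 {
		return nil // the runtime would ask mheap for a fresh span here
	}
	s := c.nonempty[0]
	c.nonempty = c.nonempty[1:]
	c.empty = append(c.empty, s)
	return s
}

func main() {
	c := &centralSim{nonempty: []*span{{freeObjects: 5}}}
	s := c.cacheSpan()
	fmt.Println(s.freeObjects, len(c.nonempty), len(c.empty))
}
```

Contrast with mcache: the per-P cache needs no lock at all, while this shared tier takes its mutex on every refill, which is exactly why the runtime keeps refills infrequent.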
2.4 mheap
// runtime/mheap.go

// minPhysPageSize is a lower-bound on the physical page size. The
// true physical page size may be larger than this. In contrast,
// sys.PhysPageSize is an upper-bound on the physical page size.
const minPhysPageSize = 4096

// Main malloc heap.
// The heap itself is the "free" and "scav" treaps,
// but all the other global data is here too.
//
// mheap must not be heap-allocated because it contains mSpanLists,
// which must not be heap-allocated.
//
//go:notinheap
type mheap struct {
	// lock must only be acquired on the system stack, otherwise a g
	// could self-deadlock if its stack grows with the lock held.
	lock      mutex
	free      mTreap // free spans
	sweepgen  uint32 // sweep generation, see comment in mspan
	sweepdone uint32 // all spans are swept
	sweepers  uint32 // number of active sweepone calls

	// allspans is a slice of all mspans ever created. Each mspan
	// appears exactly once.
	//
	// The memory for allspans is manually managed and can be
	// reallocated and move as the heap grows.
	//
	// In general, allspans is protected by mheap_.lock, which
	// prevents concurrent access as well as freeing the backing
	// store. Accesses during STW might not hold the lock, but
	// must ensure that allocation cannot happen around the
	// access (since that may free the backing store).
	allspans []*mspan // all spans out there

	// sweepSpans contains two mspan stacks: one of swept in-use
	// spans, and one of unswept in-use spans. These two trade
	// roles on each GC cycle. Since the sweepgen increases by 2
	// on each cycle, this means the swept spans are in
	// sweepSpans[sweepgen/2%2] and the unswept spans are in
	// sweepSpans[1-sweepgen/2%2]. Sweeping pops spans from the
	// unswept stack and pushes spans that are still in-use on the
	// swept stack. Likewise, allocating an in-use span pushes it
	// on the swept stack.
	sweepSpans [2]gcSweepBuf

	_ uint32 // align uint64 fields on 32-bit for atomics

	// Proportional sweep
	//
	// These parameters represent a linear function from heap_live
	// to page sweep count. The proportional sweep system works to
	// stay in the black by keeping the current page sweep count
	// above this line at the current heap_live.
	//
	// The line has slope sweepPagesPerByte and passes through a
	// basis point at (sweepHeapLiveBasis, pagesSweptBasis). At
	// any given time, the system is at (memstats.heap_live,
	// pagesSwept) in this space.
	//
	// It's important that the line pass through a point we
	// control rather than simply starting at a (0,0) origin
	// because that lets us adjust sweep pacing at any time while
	// accounting for current progress. If we could only adjust
	// the slope, it would create a discontinuity in debt if any
	// progress has already been made.
	pagesInUse         uint64  // pages of spans in stats mSpanInUse; R/W with mheap.lock
	pagesSwept         uint64  // pages swept this cycle; updated atomically
	pagesSweptBasis    uint64  // pagesSwept to use as the origin of the sweep ratio; updated atomically
	sweepHeapLiveBasis uint64  // value of heap_live to use as the origin of sweep ratio; written with lock, read without
	sweepPagesPerByte  float64 // proportional sweep ratio; written with lock, read without
	// TODO(austin): pagesInUse should be a uintptr, but the 386
	// compiler can't 8-byte align fields.

	// Scavenger pacing parameters
	//
	// The two basis parameters and the scavenge ratio parallel the proportional
	// sweeping implementation, the primary differences being that:
	//  * Scavenging concerns itself with RSS, estimated as heapRetained()
	//  * Rather than pacing the scavenger to the GC, it is paced to a
	//    time-based rate computed in gcPaceScavenger.
	//
	// scavengeRetainedGoal represents our goal RSS.
	//
	// All fields must be accessed with lock.
	//
	// TODO(mknyszek): Consider abstracting the basis fields and the scavenge ratio
	// into its own type so that this logic may be shared with proportional sweeping.
	scavengeTimeBasis     int64
	scavengeRetainedBasis uint64
	scavengeBytesPerNS    float64
	scavengeRetainedGoal  uint64
	scavengeGen           uint64 // incremented on each pacing update

	// Page reclaimer state

	// reclaimIndex is the page index in allArenas of next page to
	// reclaim. Specifically, it refers to page (i %
	// pagesPerArena) of arena allArenas[i / pagesPerArena].
	//
	// If this is >= 1<<63, the page reclaimer is done scanning
	// the page marks.
	//
	// This is accessed atomically.
	reclaimIndex uint64
	// reclaimCredit is spare credit for extra pages swept. Since
	// the page reclaimer works in large chunks, it may reclaim
	// more than requested. Any spare pages released go to this
	// credit pool.
	//
	// This is accessed atomically.
	reclaimCredit uintptr

	// Malloc stats.
	largealloc  uint64                  // bytes allocated for large objects
	nlargealloc uint64                  // number of large object allocations
	largefree   uint64                  // bytes freed for large objects (>maxsmallsize)
	nlargefree  uint64                  // number of frees for large objects (>maxsmallsize)
	nsmallfree  [_NumSizeClasses]uint64 // number of frees for small objects (<=maxsmallsize)

	// arenas is the heap arena map. It points to the metadata for
	// the heap for every arena frame of the entire usable virtual
	// address space.
	//
	// Use arenaIndex to compute indexes into this array.
	//
	// For regions of the address space that are not backed by the
	// Go heap, the arena map contains nil.
	//
	// Modifications are protected by mheap_.lock. Reads can be
	// performed without locking; however, a given entry can
	// transition from nil to non-nil at any time when the lock
	// isn't held. (Entries never transitions back to nil.)
	//
	// In general, this is a two-level mapping consisting of an L1
	// map and possibly many L2 maps. This saves space when there
	// are a huge number of arena frames. However, on many
	// platforms (even 64-bit), arenaL1Bits is 0, making this
	// effectively a single-level map. In this case, arenas[0]
	// will never be nil.
	arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

	// heapArenaAlloc is pre-reserved space for allocating heapArena
	// objects. This is only used on 32-bit, where we pre-reserve
	// this space to avoid interleaving it with the heap itself.
	heapArenaAlloc linearAlloc

	// arenaHints is a list of addresses at which to attempt to
	// add more heap arenas. This is initially populated with a
	// set of general hint addresses, and grown with the bounds of
	// actual heap arena ranges.
	arenaHints *arenaHint

	// arena is a pre-reserved space for allocating heap arenas
	// (the actual arenas). This is only used on 32-bit.
	arena linearAlloc

	// allArenas is the arenaIndex of every mapped arena. This can
	// be used to iterate through the address space.
	//
	// Access is protected by mheap_.lock. However, since this is
	// append-only and old backing arrays are never freed, it is
	// safe to acquire mheap_.lock, copy the slice header, and
	// then release mheap_.lock.
	allArenas []arenaIdx

	// sweepArenas is a snapshot of allArenas taken at the
	// beginning of the sweep cycle. This can be read safely by
	// simply blocking GC (by disabling preemption).
	sweepArenas []arenaIdx

	// curArena is the arena that the heap is currently growing
	// into. This should always be physPageSize-aligned.
	curArena struct {
		base, end uintptr
	}

	_ uint32 // ensure 64-bit alignment of central

	// central free lists for small size classes.
	// the padding makes sure that the mcentrals are
	// spaced CacheLinePadSize bytes apart, so that each mcentral.lock
	// gets its own cache line.
	// central is indexed by spanClass.
	central [numSpanClasses]struct {
		mcentral mcentral
		pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
	}

	spanalloc             fixalloc // allocator for span*
	cachealloc            fixalloc // allocator for mcache*
	treapalloc            fixalloc // allocator for treapNodes*
	specialfinalizeralloc fixalloc // allocator for specialfinalizer*
	specialprofilealloc   fixalloc // allocator for specialprofile*
	speciallock           mutex    // lock for special record allocators.
	arenaHintAlloc        fixalloc // allocator for arenaHints

	unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

var mheap_ mheap
mheap_ is a global variable, initialized during startup (in mallocinit()). Walking through the structure:

- allspans []*mspan: every span is ultimately obtained through mheap_, and every mspan ever created is recorded in allspans. The struct's lock guards concurrent access; the comments mention STW, which we will cover in a later article on Go's GC.
- central [numSpanClasses]…: the mcentrals introduced above, one per span class. The pad field is byte padding to avoid false sharing; see False Sharing on Wikipedia for background, we won't go into it here.
- sweepgen, sweepdone: GC-related. (Go's GC strategy is mark-and-sweep; these fields track sweeping, and we won't go deeper here.)
- spanalloc, cachealloc fixalloc: fixalloc is a free-list allocator for fixed-size blocks; cachealloc, for example, allocates mcache-sized blocks.

A few fields described below belong to the go1.8 layout and no longer appear in the struct shown above, but the ideas carry over:

- free [_MaxMHeapList]mSpanList: an array of span lists, where the mspans in free[n] consist of n pages, for 1 ~ 127 (_MaxMHeapList - 1) pages; e.g. free[3] is a list of mspans of 3 pages each. free holds the free (unallocated) lists; there is a corresponding busy list.
- freelarge mSpanList: a list of mspans of more than 127 pages each, with a corresponding busylarge.
- spans []*mspan: records the mapping between arena page numbers and mspans.
- arena_start, arena_end, arena_used: the arena is the contiguous virtual address range Go allocates heap memory from, managed by mheap; all heap allocations come out of the arena. How is availability tracked? Operating systems commonly do one of two things: chain all free blocks on a list, or mark blocks with a bitmap. Combined with spans above, the memory layout was:

+-----------------------+---------------------+-----------------------+
|         spans         |       bitmap        |         arena         |
+-----------------------+---------------------+-----------------------+

The remaining fields are statistics and GC bookkeeping, which we set aside for a dedicated article.
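The spans mapping described above is plain page arithmetic: an arena address belongs to page (x - arena_start) >> 13, and spans[page] points at the owning mspan. A sketch with made-up addresses (pageIndex is not a runtime function):

```go
package main

import "fmt"

const pageShift = 13 // Go's 8 KB virtual pages

// pageIndex maps an arena address to its page number, i.e. the index
// into the spans array that holds the owning *mspan.
func pageIndex(addr, arenaStart uintptr) uintptr {
	return (addr - arenaStart) >> pageShift
}

func main() {
	arenaStart := uintptr(0xc000000000) // illustrative arena base
	addr := arenaStart + 3*8192 + 100   // 100 bytes into the 4th page
	fmt.Println(pageIndex(addr, arenaStart))
}
```

This constant-time lookup is what lets the runtime (and the GC) go from any interior pointer back to the span metadata that describes it.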
3. Initialization
During startup, the structures introduced above are initialized. Let's look at the initialization code directly: mallocinit().
// runtime/malloc.go

// OS memory management abstraction layer
//
// Regions of the address space managed by the runtime may be in one of four
// states at any given time:
// 1) None - Unreserved and unmapped, the default state of any region.
// 2) Reserved - Owned by the runtime, but accessing it would cause a fault.
//               Does not count against the process' memory footprint.
// 3) Prepared - Reserved, intended not to be backed by physical memory (though
//               an OS may implement this lazily). Can transition efficiently to
//               Ready. Accessing memory in such a region is undefined (may
//               fault, may give back unexpected zeroes, etc.).
// 4) Ready - may be accessed safely.
//
// This set of states is more than is strictly necessary to support all the
// currently supported platforms. One could get by with just None, Reserved, and
// Ready. However, the Prepared state gives us flexibility for performance
// purposes. For example, on POSIX-y operating systems, Reserved is usually a
// private anonymous mmap'd region with PROT_NONE set, and to transition
// to Ready would require setting PROT_READ|PROT_WRITE. However the
// underspecification of Prepared lets us use just MADV_FREE to transition from
// Ready to Prepared. Thus with the Prepared state we can set the permission
// bits just once early on, we can efficiently tell the OS that it's free to
// take pages away from us when we don't strictly need them.
//
// For each OS there is a common set of helpers defined that transition
// memory regions between these states. The helpers are as follows:
//
// sysAlloc transitions an OS-chosen region of memory from None to Ready.
// More specifically, it obtains a large chunk of zeroed memory from the
// operating system, typically on the order of a hundred kilobytes
// or a megabyte. This memory is always immediately available for use.
//
// sysFree transitions a memory region from any state to None. Therefore, it
// returns memory unconditionally. It is used if an out-of-memory error has been
// detected midway through an allocation or to carve out an aligned section of
// the address space. It is okay if sysFree is a no-op only if sysReserve always
// returns a memory region aligned to the heap allocator's alignment
// restrictions.
//
// sysReserve transitions a memory region from None to Reserved. It reserves
// address space in such a way that it would cause a fatal fault upon access
// (either via permissions or not committing the memory). Such a reservation is
// thus never backed by physical memory.
// If the pointer passed to it is non-nil, the caller wants the
// reservation there, but sysReserve can still choose another
// location if that one is unavailable.
// NOTE: sysReserve returns OS-aligned memory, but the heap allocator
// may use larger alignment, so the caller must be careful to realign the
// memory obtained by sysReserve.
//
// sysMap transitions a memory region from Reserved to Prepared. It ensures the
// memory region can be efficiently transitioned to Ready.
//
// sysUsed transitions a memory region from Prepared to Ready. It notifies the
// operating system that the memory region is needed and ensures that the region
// may be safely accessed. This is typically a no-op on systems that don't have
// an explicit commit step and hard over-commit limits, but is critical on
// Windows, for example.
//
// sysUnused transitions a memory region from Ready to Prepared. It notifies the
// operating system that the physical pages backing this memory region are no
// longer needed and can be reused for other purposes. The contents of a
// sysUnused memory region are considered forfeit and the region must not be
// accessed again until sysUsed is called.
//
// sysFault transitions a memory region from Ready or Prepared to Reserved. It
// marks a region such that it will always fault if accessed. Used only for
// debugging the runtime.

func mallocinit() {
	if class_to_size[_TinySizeClass] != _TinySize {
		throw("bad TinySizeClass")
	}

	testdefersizes()

	if heapArenaBitmapBytes&(heapArenaBitmapBytes-1) != 0 {
		// heapBits expects modular arithmetic on bitmap
		// addresses to work.
		throw("heapArenaBitmapBytes not a power of 2")
	}

	// Copy class sizes out for statistics table.
	for i := range class_to_size {
		memstats.by_size[i].size = uint32(class_to_size[i])
	}

	// Check physPageSize.
	if physPageSize == 0 {
		// The OS init code failed to fetch the physical page size.
		throw("failed to get system page size")
	}
	if physPageSize < minPhysPageSize {
		print("system page size (", physPageSize, ") is smaller than minimum page size (", minPhysPageSize, ")\n")
		throw("bad system page size")
	}
	if physPageSize&(physPageSize-1) != 0 {
		print("system page size (", physPageSize, ") must be a power of 2\n")
		throw("bad system page size")
	}
	if physHugePageSize&(physHugePageSize-1) != 0 {
		print("system huge page size (", physHugePageSize, ") must be a power of 2\n")
		throw("bad system huge page size")
	}
	if physHugePageSize != 0 {
		// Since physHugePageSize is a power of 2, it suffices to increase
		// physHugePageShift until 1<<physHugePageShift == physHugePageSize.
		for 1<<physHugePageShift != physHugePageSize {
			physHugePageShift++
		}
	}

	// Initialize the heap.
	mheap_.init()
	_g_ := getg()
	_g_.m.mcache = allocmcache()

	// A pointer size (sys.PtrSize) of 8 means this is a 64-bit system.
	// Create initial arena growth hints.
	if sys.PtrSize == 8 {
		// On a 64-bit machine, we pick the following hints
		// because:
		//
		// 1. Starting from the middle of the address space
		// makes it easier to grow out a contiguous range
		// without running in to some other mapping.
		//
		// 2. This makes Go heap addresses more easily
		// recognizable when debugging.
		//
		// 3. Stack scanning in gccgo is still conservative,
		// so it's important that addresses be distinguishable
		// from other data.
		//
		// Starting at 0x00c0 means that the valid memory addresses
		// will begin 0x00c0, 0x00c1, ...
		// In little-endian, that's c0 00, c1 00, ... None of those are valid
		// UTF-8 sequences, and they are otherwise as far away from
		// ff (likely a common byte) as possible. If that fails, we try other 0xXXc0
		// addresses. An earlier attempt to use 0x11f8 caused out of memory errors
		// on OS X during thread allocations. 0x00c0 causes conflicts with
		// AddressSanitizer which reserves all memory up to 0x0100.
		// These choices reduce the odds of a conservative garbage collector
		// not collecting memory because some non-pointer block of memory
		// had a bit pattern that matched a memory address.
		//
		// However, on arm64, we ignore all this advice above and slam the
		// allocation at 0x40 << 32 because when using 4k pages with 3-level
		// translation buffers, the user address space is limited to 39 bits
		// On darwin/arm64, the address space is even smaller.
		// On AIX, mmaps starts at 0x0A00000000000000 for 64-bit.
		// processes.
		for i := 0x7f; i >= 0; i-- {
			var p uintptr
			switch {
			case GOARCH == "arm64" && GOOS == "darwin":
				p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
			case GOARCH == "arm64":
				p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
			case GOOS == "aix":
				if i == 0 {
					// We don't use addresses directly after 0x0A00000000000000
					// to avoid collisions with others mmaps done by non-go programs.
					continue
				}
				p = uintptr(i)<<40 | uintptrMask&(0xa0<<52)
			case raceenabled:
				// The TSAN runtime requires the heap
				// to be in the range [0x00c000000000,
				// 0x00e000000000).
				p = uintptr(i)<<32 | uintptrMask&(0x00c0<<32)
				if p >= uintptrMask&0x00e000000000 {
					continue
				}
			default:
				p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
			}
			// Before go1.10 the runtime instead reserved one contiguous
			// range of virtual address space here:
			//   pSize = bitmapSize + spansSize + arenaSize + _PageSize
			//   p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
			//   if p != 0 {
			//       break
			//   }
			hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
			hint.addr = p
			hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
		}
	} else { // the 32-bit path
		// On a 32-bit machine, we're much more concerned
		// about keeping the usable heap contiguous.
		// Hence:
		//
		// 1. We reserve space for all heapArenas up front so
		// they don't get interleaved with the heap. They're
		// ~258MB, so this isn't too bad. (We could reserve a
		// smaller amount of space up front if this is a
		// problem.)
		//
		// 2. We hint the heap to start right above the end of
		// the binary so we have the best chance of keeping it
		// contiguous.
		//
		// 3. We try to stake out a reasonably large initial
		// heap reservation.

		const arenaMetaSize = (1 << arenaBits) * unsafe.Sizeof(heapArena{})
		meta := uintptr(sysReserve(nil, arenaMetaSize))
		if meta != 0 {
			mheap_.heapArenaAlloc.init(meta, arenaMetaSize)
		}

		// We want to start the arena low, but if we're linked
		// against C code, it's possible global constructors
		// have called malloc and adjusted the process' brk.
		// Query the brk so we can avoid trying to map the
		// region over it (which will cause the kernel to put
		// the region somewhere else, likely at a high
		// address).
		procBrk := sbrk0()

		// If we ask for the end of the data segment but the
		// operating system requires a little more space
		// before we can start allocating, it will give out a
		// slightly higher pointer. Except QEMU, which is
		// buggy, as usual: it won't adjust the pointer
		// upward. So adjust it upward a little bit ourselves:
		// 1/4 MB to get away from the running binary image.
		p := firstmoduledata.end
		if p < procBrk {
			p = procBrk
		}
		if mheap_.heapArenaAlloc.next <= p && p < mheap_.heapArenaAlloc.end {
			p = mheap_.heapArenaAlloc.end
		}
		p = round(p+(256<<10), heapArenaBytes)
		// Because we're worried about fragmentation on
		// 32-bit, we try to make a large initial reservation.
		arenaSizes := []uintptr{
			512 << 20,
			256 << 20,
			128 << 20,
		}
		for _, arenaSize := range arenaSizes {
			a, size := sysReserveAligned(unsafe.Pointer(p), arenaSize, heapArenaBytes)
			if a != nil {
				mheap_.arena.init(uintptr(a), size)
				p = uintptr(a) + size // For hint below
				break
			}
		}
		hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
		hint.addr = p
		hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
	}
}
3.1 arena 相关
arena 相关地址的大小初始化代码如下。
1// runtime/mheap.go
2// Try to add at least npage pages of memory to the heap,
3// returning whether it worked.
4//
5// h must be locked.
6func (h *mheap) grow(npage uintptr) bool {
7 ask := npage << _PageShift
8
9 nBase := round(h.curArena.base+ask, physPageSize)
round 是一个对齐操作,将 n 向上取整为 a 的倍数(这里的 a 是 physPageSize)。实现也很有意思,代码如下。
1// runtime/stubs.go
2// round n up to a multiple of a. a must be a power of 2.
3func round(n, a uintptr) uintptr {
4 return (n + a - 1) &^ (a - 1)
5}
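round 的对齐效果可以用一个可运行的小例子验证(这里把 round 单独抄出来,位运算思路:先加 a-1 越过下一个对齐边界,再用 &^ 清掉低位):

```go
package main

import "fmt"

// round 将 n 向上取整为 a 的倍数,a 必须是 2 的幂。
func round(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

func main() {
	fmt.Println(round(1, 8192))    // 1 对齐到一页:8192
	fmt.Println(round(8192, 8192)) // 已对齐,不变:8192
	fmt.Println(round(8193, 8192)) // 越过边界,进到下一页:16384
}
```
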
这里值得注意的是 Golang 内存管理的虚拟地址页大小为 8K。
1// runtime/malloc.go
2_PageSize = 1 << _PageShift
3_PageShift = 13
go1.8的解析
arena 相关地址的大小初始化代码如下。

1arenaSize := round(_MaxMem, _PageSize)
2bitmapSize = arenaSize / (sys.PtrSize * 8 / 2)
3spansSize = arenaSize / _PageSize * sys.PtrSize
4spansSize = round(spansSize, _PageSize)
5
6_MaxMem = uintptr(1<<_MHeapMap_TotalBits - 1)
首先解释一下变量 _MaxMem,里面还有一个变量就不再列出来了。简单来说 _MaxMem 就是系统为 arena 区分配的大小:64 位系统分配 512 G;Windows 64 位系统的 arena 区分配 32 G。round 是一个对齐操作,向上取 _PageSize 的倍数。实现也很有意思,代码如下。
1// round n up to a multiple of a. a must be a power of 2.
2func round(n, a uintptr) uintptr {
3    return (n + a - 1) &^ (a - 1)
4}
bitmap 用两个 bit 表示一个字的可用状态,那么算下来 bitmap 的大小为 16 G。读过 Golang 源码的同学会发现这段代码的注释里写的 bitmap 大小是 32 G,其实是这段注释很久没有更新了:之前是用 4 个 bit 表示一个字的可用状态,这真是一个悲伤的故事啊。
spans 记录的是 arena 区的页号和对应的 mspan 指针的对应关系。比如 arena 区内存地址 x,对应的页号就是 page_num = (x - arena_start) / page_size,那么 spans 就会在 spans[page_num] 记录 x 所在的 mspan 指针。如果 arena 为 512 G 的话,spans 区的大小为 512 G / 8 K * 8 = 512 M。这里值得注意的是 Golang 内存管理的虚拟地址页大小为 8 K。
1_PageSize = 1 << _PageShift
2_PageShift = 13
所以这一段连续的虚拟内存布局(64 位)如下:

+-----------------------+---------------------+-----------------------+
|      spans 512M       |     bitmap 16G      |      arena 512G       |
+-----------------------+---------------------+-----------------------+
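上面三段的大小可以用 go1.8 的公式直接算出来验证一下(下面是一段可运行的示意代码,常量取 64 位平台的值):

```go
package main

import "fmt"

const (
	pageSize = 1 << 13 // 8K,即 _PageSize
	ptrSize  = 8       // 64 位指针大小,即 sys.PtrSize
)

// metaSizes 按 go1.8 的公式,由 arena 大小推出 bitmap 和 spans 两段的大小。
func metaSizes(arenaSize uint64) (bitmapSize, spansSize uint64) {
	bitmapSize = arenaSize / (ptrSize * 8 / 2) // 每个字(8 byte)用 2 bit 描述
	spansSize = arenaSize / pageSize * ptrSize // 每页对应一个 mspan 指针
	return
}

func main() {
	bitmapSize, spansSize := metaSizes(512 << 30) // arena 512G
	fmt.Println(bitmapSize>>30, "G")              // bitmap:16 G
	fmt.Println(spansSize>>20, "M")               // spans:512 M
}
```
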
3.2 虚拟地址申请
go1.8的解析
主要是下面这段代码。

1//尝试从不同地址开始申请
2for i := 0; i <= 0x7f; i++ {
3 switch {
4 case GOARCH == "arm64" && GOOS == "darwin":
5  p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
6 case GOARCH == "arm64":
7  p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
8 default:
9  p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
10 }
11 pSize = bitmapSize + spansSize + arenaSize + _PageSize
12 //向 OS 申请大小为 pSize 的连续的虚拟地址空间
13 p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
14 if p != 0 {
15  break
16 }
17}
初始化的时候,Golang 向操作系统申请一段连续的地址空间,就是上面的 spans + bitmap + arena。p 就是这段连续地址空间的起始地址,不同平台的 p 取值不一样。向 OS 申请的时候视不同的 OS 调用不同的系统调用,比如 Unix 系统调用 mmap(mmap 向操作系统内核申请新的虚拟地址区间,可指定起始地址和长度),Windows 系统调用 VirtualAlloc(类似 mmap)。
1//bsd
2func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
3 if sys.PtrSize == 8 && uint64(n) > 1<<32 || sys.GoosNacl != 0 {
4  *reserved = false
5  return v
6 }
7
8 p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
9 if uintptr(p) < 4096 {
10  return nil
11 }
12 *reserved = true
13 return p
14}
15//darwin
16func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
17 *reserved = true
18 p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
19 if uintptr(p) < 4096 {
20  return nil
21 }
22 return p
23}
24//linux
25func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
26 ...
27 p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
28 if uintptr(p) < 4096 {
29  return nil
30 }
31 *reserved = true
32 return p
33}
34//windows
35func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
36 *reserved = true
37 // v is just a hint.
38 // First try at v.
39 v = unsafe.Pointer(stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_RESERVE, _PAGE_READWRITE))
40 if v != nil {
41  return v
42 }
43 // Next let the kernel choose the address.
44 return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_RESERVE, _PAGE_READWRITE))
45}
go1.13
1// linux
2func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
3 p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
4 if err != 0 {
5 return nil
6 }
7 return p
8}
9// windows
10func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
11 // v is just a hint.
12 // First try at v.
13 // This will fail if any of [v, v+n) is already reserved.
14 v = unsafe.Pointer(stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_RESERVE, _PAGE_READWRITE))
15 if v != nil {
16 return v
17 }
18
19 // Next let the kernel choose the address.
20 return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_RESERVE, _PAGE_READWRITE))
21}
3.3 mheap 初始化
go1.8
我们上面介绍 mheap 结构的时候知道 spans, bitmap, arena 都是存在于 mheap 中的,从操作系统申请完地址之后就是初始化 mheap 了。

1func mallocinit() {
2 ...
3 p1 := round(p, _PageSize)
4
5 spansStart := p1
6 mheap_.bitmap = p1 + spansSize + bitmapSize
7 if sys.PtrSize == 4 {
8  // Set arena_start such that we can accept memory
9  // reservations located anywhere in the 4GB virtual space.
10  mheap_.arena_start = 0
11 } else {
12  mheap_.arena_start = p1 + (spansSize + bitmapSize)
13 }
14 mheap_.arena_end = p + pSize
15 mheap_.arena_used = p1 + (spansSize + bitmapSize)
16 mheap_.arena_reserved = reserved
17 if mheap_.arena_start&(_PageSize-1) != 0 {
18  println("bad pagesize", hex(p), hex(p1), hex(spansSize), hex(bitmapSize), hex(_PageSize), "start", hex(mheap_.arena_start))
19  throw("misrounded allocation in mallocinit")
20 }
21 // Initialize the rest of the allocator.
22 mheap_.init(spansStart, spansSize)
23 //获取当前 G
24 _g_ := getg()
25 // 获取 G 上绑定的 M 的 mcache
26 _g_.m.mcache = allocmcache()
27}
p 是这段连续虚拟地址空间的起始地址,先进行对齐,然后初始化 arena,bitmap,spans 地址。mheap_.init() 会初始化 fixalloc 等相关的成员,还有 mcentral 的初始化。
1func (h *mheap) init(spansStart, spansBytes uintptr) {
2 h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
3 h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
4 h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
5 h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
6
7 h.spanalloc.zero = false
8
9 // h->mapcache needs no init
10 for i := range h.free {
11  h.free[i].init()
12  h.busy[i].init()
13 }
14
15 h.freelarge.init()
16 h.busylarge.init()
17 for i := range h.central {
18  h.central[i].mcentral.init(int32(i))
19 }
20
21 sp := (*slice)(unsafe.Pointer(&h.spans))
22 sp.array = unsafe.Pointer(spansStart)
23 sp.len = 0
24 sp.cap = int(spansBytes / sys.PtrSize)
25}
mheap 初始化之后,对当前的线程也就是 M 进行初始化。
1//获取当前 G
2_g_ := getg()
3// 获取 G 上绑定的 M 的 mcache
4_g_.m.mcache = allocmcache()
3.4 per-P mcache 初始化
go1.8
上面好像并没有说到针对 P 的 mcache 初始化,因为这个时候还没有初始化 P。我们看一下 bootstrap 的代码。

1func schedinit() {
2 ...
3 mallocinit()
4 ...
5 if procs > _MaxGomaxprocs {
6  procs = _MaxGomaxprocs
7 }
8 if procresize(procs) != nil {
9  ...
10 }
11}
其中 mallocinit() 上面说过了。对 P 的初始化在函数 procresize() 中执行,我们下面只看内存相关的部分。
1func procresize(nprocs int32) *p {
2 ...
3 // initialize new P's
4 for i := int32(0); i < nprocs; i++ {
5  pp := allp[i]
6  if pp == nil {
7   pp = new(p)
8   pp.id = i
9   pp.status = _Pgcstop
10   pp.sudogcache = pp.sudogbuf[:0]
11   for i := range pp.deferpool {
12    pp.deferpool[i] = pp.deferpoolbuf[i][:0]
13   }
14   atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
15  }
16  // P mcache 初始化
17  if pp.mcache == nil {
18   if old == 0 && i == 0 {
19    if getg().m.mcache == nil {
20     throw("missing mcache?")
21    }
22    // P[0] 分配给主 Goroutine
23    pp.mcache = getg().m.mcache // bootstrap
24   } else {
25    // P[0] 之外的 P 申请 mcache
26    pp.mcache = allocmcache()
27   }
28  }
29  ...
30 }
31 ...
32}
所有的 P 都存放在一个全局数组 allp 中,procresize() 的目的就是将 allp 中用到的 P 进行初始化,同时剥离多余 P 的资源。我们看一下 allocmcache。
1func allocmcache() *mcache {
2 lock(&mheap_.lock)
3 c := (*mcache)(mheap_.cachealloc.alloc())
4 unlock(&mheap_.lock)
5 for i := 0; i < _NumSizeClasses; i++ {
6  c.alloc[i] = &emptymspan
7 }
8 c.next_sample = nextSample()
9 return c
10}
allocmcache 是调用 mheap 中特定分配器 cachealloc 来分配的。我们前面说过 fixalloc 是一个特定大小内存块的 free list。那么 cachealloc 就是 mcache 的 free list,每次分配也很简单,直接取 list 表头就可以了。mcache.alloc 这个时候还没有真正分配,后面会从 mcentral 中分配。
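fixalloc 这种"固定大小块的 free list"可以用几行 Go 代码示意(极简示意,不是 runtime 原码;块大小、字段名都是为演示虚构的):

```go
package main

import "fmt"

// fixAlloc 模拟 fixalloc 的语义:只分配固定大小的块,
// 释放的块挂回空闲链,下次分配优先复用。
type fixAlloc struct {
	free []*[64]byte // 固定 64 byte 块的空闲链
}

func (f *fixAlloc) alloc() *[64]byte {
	if n := len(f.free); n > 0 {
		b := f.free[n-1] // 直接取链表头,O(1)
		f.free = f.free[:n-1]
		return b
	}
	return new([64]byte) // 空闲链为空时才向上层要新内存
}

func (f *fixAlloc) freeBlock(b *[64]byte) { f.free = append(f.free, b) }

func main() {
	var f fixAlloc
	b1 := f.alloc()
	f.freeBlock(b1)
	b2 := f.alloc()
	fmt.Println(b1 == b2) // true:复用了刚释放的块
}
```
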
4. 内存分配
先说一下给对象 object 分配内存的主要流程:
- object size > 32K,则使用 mheap 直接分配。
- object size < 16 byte,使用 mcache 的小对象分配器 tiny 直接分配。 (其实 tiny 就是一个指针,暂且这么说吧。)
- object size > 16 byte && size <=32K byte 时,先使用 mcache 中对应的 size class 分配。
- 如果 mcache 对应的 size class 的 span 已经没有可用的块,则向 mcentral 请求。
- 如果 mcentral 也没有可用的块,则向 mheap 申请,并切分。
- 如果 mheap 也没有合适的 span,则向操作系统申请。
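上面的分配路径决策可以写成一段可运行的示意代码(阈值 16 byte 和 32K 取自 runtime 源码,函数名和返回的描述字符串只是示意):

```go
package main

import "fmt"

const (
	maxTinySize  = 16      // tiny 分配器上限
	maxSmallSize = 32 << 10 // 小对象上限 32K
)

// allocPath 示意 mallocgc 按 size 和是否含指针(noscan)选择分配路径。
func allocPath(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "mheap 直接分配"
	case noscan && size < maxTinySize:
		return "mcache tiny 分配器"
	default:
		return "mcache 对应 size class → mcentral → mheap"
	}
}

func main() {
	fmt.Println(allocPath(8, true))       // 小 noscan 对象走 tiny
	fmt.Println(allocPath(1024, false))   // 常规小对象走 size class
	fmt.Println(allocPath(64<<10, false)) // 大对象直接走 mheap
}
```
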
我们看一下在堆上,也就是 arena 区分配内存的相关函数。
1package main
2func foo() *int {
3 x := 1
4 return &x
5}
6func main() {
7 x := foo()
8 println(*x)
9}
根据之前介绍的逃逸分析,foo() 中的 x 会被分配到堆上。把上面代码保存为 test1.go 看一下汇编代码。
1$ go build -gcflags '-l' -o test1 test1.go
2$ go tool objdump -s "main\.foo" test1
3TEXT main.foo(SB) /Users/didi/code/go/malloc_example/test2.go
4 test2.go:3 0x2040 65488b0c25a0080000 GS MOVQ GS:0x8a0, CX
5 test2.go:3 0x2049 483b6110 CMPQ 0x10(CX), SP
6 test2.go:3 0x204d 762a JBE 0x2079
7 test2.go:3 0x204f 4883ec10 SUBQ $0x10, SP
8 test2.go:4 0x2053 488d1d66460500 LEAQ 0x54666(IP), BX
9 test2.go:4 0x205a 48891c24 MOVQ BX, 0(SP)
10 test2.go:4 0x205e e82d8f0000 CALL runtime.newobject(SB)
11 test2.go:4 0x2063 488b442408 MOVQ 0x8(SP), AX
12 test2.go:4 0x2068 48c70001000000 MOVQ $0x1, 0(AX)
13 test2.go:5 0x206f 4889442418 MOVQ AX, 0x18(SP)
14 test2.go:5 0x2074 4883c410 ADDQ $0x10, SP
15 test2.go:5 0x2078 c3 RET
16 test2.go:3 0x2079 e8a28d0400 CALL runtime.morestack_noctxt(SB)
17 test2.go:3 0x207e ebc0 JMP main.foo(SB)
堆上内存分配调用了 runtime 包的 newobject 函数。
1// runtime/malloc.go
2// Allocate an object of size bytes.
3// Small objects are allocated from the per-P cache's free lists.
4// Large objects (> 32 kB) are allocated straight from the heap.
5func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
6 if gcphase == _GCmarktermination {
7 throw("mallocgc called with gcphase == _GCmarktermination")
8 }
9
10 if size == 0 {
11 return unsafe.Pointer(&zerobase)
12 }
13
14 if debug.sbrk != 0 {
15 align := uintptr(16)
16 if typ != nil {
17 // TODO(austin): This should be just
18 // align = uintptr(typ.align)
19 // but that's only 4 on 32-bit platforms,
20 // even if there's a uint64 field in typ (see #599).
21 // This causes 64-bit atomic accesses to panic.
22 // Hence, we use stricter alignment that matches
23 // the normal allocator better.
24 if size&7 == 0 {
25 align = 8
26 } else if size&3 == 0 {
27 align = 4
28 } else if size&1 == 0 {
29 align = 2
30 } else {
31 align = 1
32 }
33 }
34 return persistentalloc(size, align, &memstats.other_sys)
35 }
36
37 // assistG is the G to charge for this allocation, or nil if
38 // GC is not currently active.
39 var assistG *g
40 if gcBlackenEnabled != 0 {
41 // Charge the current user G for this allocation.
42 assistG = getg()
43 if assistG.m.curg != nil {
44 assistG = assistG.m.curg
45 }
46 // Charge the allocation against the G. We'll account
47 // for internal fragmentation at the end of mallocgc.
48 assistG.gcAssistBytes -= int64(size)
49
50 if assistG.gcAssistBytes < 0 {
51 // This G is in debt. Assist the GC to correct
52 // this before allocating. This must happen
53 // before disabling preemption.
54 gcAssistAlloc(assistG)
55 }
56 }
57
58 // Set mp.mallocing to keep from being preempted by GC.
59 mp := acquirem()
60 if mp.mallocing != 0 {
61 throw("malloc deadlock")
62 }
63 if mp.gsignal == getg() {
64 throw("malloc during signal")
65 }
66 mp.mallocing = 1
67
68 shouldhelpgc := false
69 dataSize := size
70 c := gomcache()
71 var x unsafe.Pointer
72 noscan := typ == nil || typ.ptrdata == 0
73 if size <= maxSmallSize {
74 if noscan && size < maxTinySize { // 小于 16 byte 的小对象分配
75 // Tiny allocator.
76 //
77 // Tiny allocator combines several tiny allocation requests
78 // into a single memory block. The resulting memory block
79 // is freed when all subobjects are unreachable. The subobjects
80 // must be noscan (don't have pointers), this ensures that
81 // the amount of potentially wasted memory is bounded.
82 //
83 // Size of the memory block used for combining (maxTinySize) is tunable.
84 // Current setting is 16 bytes, which relates to 2x worst case memory
85 // wastage (when all but one subobjects are unreachable).
86 // 8 bytes would result in no wastage at all, but provides less
87 // opportunities for combining.
88 // 32 bytes provides more opportunities for combining,
89 // but can lead to 4x worst case wastage.
90 // The best case winning is 8x regardless of block size.
91 //
92 // Objects obtained from tiny allocator must not be freed explicitly.
93 // So when an object will be freed explicitly, we ensure that
94 // its size >= maxTinySize.
95 //
96 // SetFinalizer has a special case for objects potentially coming
97 // from tiny allocator, it such case it allows to set finalizers
98 // for an inner byte of a memory block.
99 //
100 // The main targets of tiny allocator are small strings and
101 // standalone escaping variables. On a json benchmark
102 // the allocator reduces number of allocations by ~12% and
103 // reduces heap size by ~20%.
104 off := c.tinyoffset // 地址对齐
105 // Align tiny pointer for required (conservative) alignment.
106 if size&7 == 0 {
107 off = round(off, 8)
108 } else if size&3 == 0 {
109 off = round(off, 4)
110 } else if size&1 == 0 {
111 off = round(off, 2)
112 }
113 if off+size <= maxTinySize && c.tiny != 0 {// 分配
114 // The object fits into existing tiny block.
115 x = unsafe.Pointer(c.tiny + off)
116 c.tinyoffset = off + size
117 c.local_tinyallocs++
118 mp.mallocing = 0
119 releasem(mp)
120 return x
121 }
122 // tiny 不够了,为其重新分配一个 16 byte 内存块
123 // Allocate a new maxTinySize block.
124 span := c.alloc[tinySpanClass]
125 v := nextFreeFast(span)
126 if v == 0 {
127 v, _, shouldhelpgc = c.nextFree(tinySpanClass)
128 }
129 x = unsafe.Pointer(v)//将申请的内存块全置为 0
130 (*[2]uint64)(x)[0] = 0
131 (*[2]uint64)(x)[1] = 0
132 // See if we need to replace the existing tiny block with the new one
133 // based on amount of remaining free space.
134 // 如果申请的内存块用不完,则将剩下的给 tiny,用 tinyoffset 记录分配了多少。
135 if size < c.tinyoffset || c.tiny == 0 {
136 c.tiny = uintptr(x)
137 c.tinyoffset = size
138 }
139 size = maxTinySize
140 } else {
141 var sizeclass uint8//计算 sizeclass
142 if size <= smallSizeMax-8 {
143 sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
144 } else {
145 sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
146 }
147 size = uintptr(class_to_size[sizeclass])
148 spc := makeSpanClass(sizeclass, noscan)
149 span := c.alloc[spc]
150 v := nextFreeFast(span)//从对应的 span 里面分配一个 object
151 if v == 0 {
152 v, span, shouldhelpgc = c.nextFree(spc)
153 }
154 x = unsafe.Pointer(v)
155 if needzero && span.needzero != 0 {
156 memclrNoHeapPointers(unsafe.Pointer(v), size)
157 }
158 }
159 } else {//object size > 32K byte
160 var s *mspan
161 shouldhelpgc = true
162 systemstack(func() {
163 s = largeAlloc(size, needzero, noscan)
164 })
165 s.freeindex = 1
166 s.allocCount = 1
167 x = unsafe.Pointer(s.base())
168 size = s.elemsize
169 }
170
171 var scanSize uintptr
172 if !noscan {
173 // If allocating a defer+arg block, now that we've picked a malloc size
174 // large enough to hold everything, cut the "asked for" size down to
175 // just the defer header, so that the GC bitmap will record the arg block
176 // as containing nothing at all (as if it were unused space at the end of
177 // a malloc block caused by size rounding).
178 // The defer arg areas are scanned as part of scanstack.
179 if typ == deferType {
180 dataSize = unsafe.Sizeof(_defer{})
181 }
182 heapBitsSetType(uintptr(x), size, dataSize, typ)
183 if dataSize > typ.size {
184 // Array allocation. If there are any
185 // pointers, GC has to scan to the last
186 // element.
187 if typ.ptrdata != 0 {
188 scanSize = dataSize - typ.size + typ.ptrdata
189 }
190 } else {
191 scanSize = typ.ptrdata
192 }
193 c.local_scan += scanSize
194 }
195
196 // Ensure that the stores above that initialize x to
197 // type-safe memory and set the heap bits occur before
198 // the caller can make x observable to the garbage
199 // collector. Otherwise, on weakly ordered machines,
200 // the garbage collector could follow a pointer to x,
201 // but see uninitialized memory or stale heap bits.
202 publicationBarrier()
203
204 // Allocate black during GC.
205 // All slots hold nil so no scanning is needed.
206 // This may be racing with GC so do it atomically if there can be
207 // a race marking the bit.
208 if gcphase != _GCoff {
209 gcmarknewobject(uintptr(x), size, scanSize)
210 }
211
212 if raceenabled {
213 racemalloc(x, size)
214 }
215
216 if msanenabled {
217 msanmalloc(x, size)
218 }
219
220 mp.mallocing = 0
221 releasem(mp)
222
223 if debug.allocfreetrace != 0 {
224 tracealloc(x, size, typ)
225 }
226
227 if rate := MemProfileRate; rate > 0 {
228 if rate != 1 && size < c.next_sample {
229 c.next_sample -= size
230 } else {
231 mp := acquirem()
232 profilealloc(mp, x, size)
233 releasem(mp)
234 }
235 }
236
237 if assistG != nil {
238 // Account for internal fragmentation in the assist
239 // debt now that we know it.
240 assistG.gcAssistBytes -= int64(size - dataSize)
241 }
242
243 if shouldhelpgc {
244 if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
245 gcStart(t)
246 }
247 }
248
249 return x
250}
251
252func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
253 // print("largeAlloc size=", size, "\n")
254
255 if size+_PageSize < size {
256 throw("out of memory")
257 }
258 npages := size >> _PageShift
259 if size&_PageMask != 0 {
260 npages++
261 }
262
263 // Deduct credit for this span allocation and sweep if
264 // necessary. mHeap_Alloc will also sweep npages, so this only
265 // pays the debt down to npage pages.
266 deductSweepCredit(npages*_PageSize, npages)
267
268 s := mheap_.alloc(npages, makeSpanClass(0, noscan), true, needzero)
269 if s == nil {
270 throw("out of memory")
271 }
272 s.limit = s.base() + size
273 heapBitsForAddr(s.base()).initSpan(s)
274 return s
275}
276
277// implementation of new builtin
278// compiler (both frontend and SSA backend) knows the signature
279// of this function
280func newobject(typ *_type) unsafe.Pointer {
281 return mallocgc(typ.size, typ, true)
282}
整个分配过程可以根据 object size 拆解成三部分:size < 16 byte, 16 byte <= size <= 32 K byte, size > 32 K byte。
4.1 size 小于 16 byte
对于小于 16 byte 的内存块,mcache 有个专门的内存区域 tiny 用来分配,tiny 是指针,指向开始地址。 tinyoffset 表示 tiny 当前分配到什么地址了,之后的分配根据 tinyoffset 寻址。先根据要分配的对象大小进行地址对齐,比如 size 是 8 的倍数,tinyoffset 和 8 对齐。然后就是进行分配。如果 tiny 剩余的空间不够用,则重新申请一个 16 byte 的内存块,并分配给 object。如果有结余,则记录在 tiny 上。
4.2 size 大于 32 K byte
对于大于 32 K 的内存分配,直接跳过 mcache 和 mcentral,通过 mheap 分配。这类分配都是按整数页进行的:先右移 _PageShift 算出页数,如果低位(size & _PageMask)不为 0,说明还有余数,再补一页。
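页数的计算可以抄成一个可运行的小函数验证(常量取自 runtime,页大小 8K):

```go
package main

import "fmt"

const (
	_PageShift = 13
	_PageSize  = 1 << _PageShift // 8K
	_PageMask  = _PageSize - 1
)

// pagesFor 按 largeAlloc 的写法计算 size 需要的整数页数:
// 先右移,低位有余数再补一页。
func pagesFor(size uintptr) uintptr {
	npages := size >> _PageShift
	if size&_PageMask != 0 {
		npages++
	}
	return npages
}

func main() {
	fmt.Println(pagesFor(32 << 10)) // 32K 正好 4 页
	fmt.Println(pagesFor(33 << 10)) // 33K 有余数,5 页
}
```
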
4.3 size 介于 16 和 32K
对于 size 介于 16 ~ 32K byte 的内存分配,先计算应该分配的 sizeclass,然后去 mcache 里面 alloc[sizeclass] 申请;如果 mcache.alloc[sizeclass] 不足以申请,则 mcache 向 mcentral 申请,然后再分配。mcentral 给 mcache 分配完之后会判断自己需不需要扩充,如果需要则向 mheap 申请。
我们首先看一下如何计算 sizeclass。预先定义了两个数组:size_to_class8 和 size_to_class128。数组 size_to_class8 的第 i 个值表示大小区间 ((i-1)*8, i*8](smallSizeDiv = 8)对应的 sizeclass,size_to_class128 类似。size 小于等于 1016(1024 - 8,smallSizeMax = 1024)时使用 size_to_class8,否则使用数组 size_to_class128。看一下数组具体的值:0, 1, 2, 3, 3, 4, 4…。举个例子,比如要分配 17 byte 的内存(16 byte 以下的使用 mcache.tiny 分配),sizeclass = size_to_class8[(17+7)/8] = size_to_class8[3] = 3。不得不说这种用空间换时间的策略确实提高了运行效率。
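查表过程可以用一段可运行的示意代码走一遍(表只抄了文中提到的前几项 0, 1, 2, 3, 3, 4, 4…,完整表在 runtime/sizeclasses.go 中):

```go
package main

import "fmt"

const smallSizeDiv = 8

// sizeToClass8 只取前若干项作示意:下标 i 覆盖大小区间 ((i-1)*8, i*8]。
var sizeToClass8 = []uint8{0, 1, 2, 3, 3, 4, 4, 5, 5}

// sizeClass 按文中公式 (size+smallSizeDiv-1)/smallSizeDiv 查表。
func sizeClass(size uintptr) uint8 {
	return sizeToClass8[(size+smallSizeDiv-1)/smallSizeDiv]
}

func main() {
	fmt.Println(sizeClass(17)) // (17+7)/8 = 3 → class 3
	fmt.Println(sizeClass(25)) // (25+7)/8 = 4 → 仍是 class 3(17~32 同一档)
	fmt.Println(sizeClass(8))  // (8+7)/8 = 1 → class 1
}
```
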
计算出 sizeclass,那么就可以去 mcache.alloc[sizeclass] 分配了,注意这是一个 mspan 指针,真正的分配函数是 nextFreeFast() 函数。如下。
1// nextFreeFast returns the next free object if one is quickly available.
2// Otherwise it returns 0.
3func nextFreeFast(s *mspan) gclinkptr {
4 theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
5 if theBit < 64 {
6 result := s.freeindex + uintptr(theBit)
7 if result < s.nelems {
8 freeidx := result + 1
9 if freeidx%64 == 0 && freeidx != s.nelems {
10 return 0
11 }
12 s.allocCache >>= uint(theBit + 1)
13 s.freeindex = freeidx
14 s.allocCount++
15 return gclinkptr(result*s.elemsize + s.base())
16 }
17 }
18 return 0
19}
allocCache 这里是用位图表示内存是否可用,1 表示可用。然后通过 span 里面的 freeindex 和 elemsize 来计算地址即可。
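allocCache 的位图查找可以用标准库 math/bits 模拟(示意代码,非 runtime 原码;span 的 base、elemsize 等参数是为演示假设的):

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextFreeOffset 模拟 nextFreeFast 中 sys.Ctz64(s.allocCache) 的作用:
// 位为 1 表示对应槽位空闲,尾部 0 的个数就是第一个空闲槽相对 freeindex 的偏移。
func nextFreeOffset(allocCache uint64) int {
	return bits.TrailingZeros64(allocCache)
}

func main() {
	var allocCache uint64 = 0b11110100 // 低 2 位已分配,偏移 2 处空闲
	theBit := nextFreeOffset(allocCache)

	// 下面的 span 参数(base、elemsize、freeindex)是假设值,仅作示意。
	base, elemsize, freeindex := uintptr(0x1000), uintptr(32), uintptr(0)
	addr := (freeindex+uintptr(theBit))*elemsize + base
	fmt.Printf("偏移 %d,地址 %#x\n", theBit, addr) // 偏移 2,地址 0x1040
}
```
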
如果 mcache.alloc[sizeclass] 已经不够用了,则从 mcentral 申请内存到 mcache。
1// nextFree returns the next free object from the cached span if one is available.
2// Otherwise it refills the cache with a span with an available object and
3// returns that object along with a flag indicating that this was a heavy
4// weight allocation. If it is a heavy weight allocation the caller must
5// determine whether a new GC cycle needs to be started or if the GC is active
6// whether this goroutine needs to assist the GC.
7//
8// Must run in a non-preemptible context since otherwise the owner of
9// c could change.
10func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
11 s = c.alloc[spc]
12 shouldhelpgc = false
13 freeIndex := s.nextFreeIndex()
14 if freeIndex == s.nelems {
15 // The span is full.
16 if uintptr(s.allocCount) != s.nelems {
17 println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
18 throw("s.allocCount != s.nelems && freeIndex == s.nelems")
19 }
20 c.refill(spc)
21 shouldhelpgc = true
22 s = c.alloc[spc]
23
24 freeIndex = s.nextFreeIndex()
25 }
26
27 if freeIndex >= s.nelems {
28 throw("freeIndex is not valid")
29 }
30
31 v = gclinkptr(freeIndex*s.elemsize + s.base())
32 s.allocCount++
33 if uintptr(s.allocCount) > s.nelems {
34 println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
35 throw("s.allocCount > s.nelems")
36 }
37 return
38}
mcache 向 mcentral 申请,如果 mcentral 不够,则向 mheap 申请。
1// refill acquires a new span of span class spc for c. This span will
2// have at least one free object. The current span in c must be full.
3//
4// Must run in a non-preemptible context since otherwise the owner of
5// c could change.
6func (c *mcache) refill(spc spanClass) {
7 // Return the current cached span to the central lists.
8 s := c.alloc[spc]
9
10 if uintptr(s.allocCount) != s.nelems {
11 throw("refill of span with free space remaining")
12 }
13 if s != &emptymspan {
14 // Mark this span as no longer cached.
15 if s.sweepgen != mheap_.sweepgen+3 {
16 throw("bad sweepgen in refill")
17 }
18 atomic.Store(&s.sweepgen, mheap_.sweepgen)
19 }
20
21 // Get a new cached span from the central lists.
22 s = mheap_.central[spc].mcentral.cacheSpan()
23 if s == nil {
24 throw("out of memory")
25 }
26
27 if uintptr(s.allocCount) == s.nelems {
28 throw("span has no free space")
29 }
30
31 // Indicate that this span is cached and prevent asynchronous
32 // sweeping in the next sweep phase.
33 s.sweepgen = mheap_.sweepgen + 3
34
35 c.alloc[spc] = s
36}
1
2// Allocate a span to use in an mcache.
3func (c *mcentral) cacheSpan() *mspan {
4 // Deduct credit for this span allocation and sweep if necessary.
5 spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
6 deductSweepCredit(spanBytes, 0)
7
8 lock(&c.lock)
9 traceDone := false
10 if trace.enabled {
11 traceGCSweepStart()
12 }
13 sg := mheap_.sweepgen
14retry:
15 var s *mspan
16 for s = c.nonempty.first; s != nil; s = s.next {
17 if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
18 c.nonempty.remove(s)
19 c.empty.insertBack(s)
20 unlock(&c.lock)
21 s.sweep(true)
22 goto havespan
23 }
24 if s.sweepgen == sg-1 {
25 // the span is being swept by background sweeper, skip
26 continue
27 }
28 // we have a nonempty span that does not require sweeping, allocate from it
29 c.nonempty.remove(s)
30 c.empty.insertBack(s)
31 unlock(&c.lock)
32 goto havespan
33 }
34
35 for s = c.empty.first; s != nil; s = s.next {
36 if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
37 // we have an empty span that requires sweeping,
38 // sweep it and see if we can free some space in it
39 c.empty.remove(s)
40 // swept spans are at the end of the list
41 c.empty.insertBack(s)
42 unlock(&c.lock)
43 s.sweep(true)
44 freeIndex := s.nextFreeIndex()
45 if freeIndex != s.nelems {
46 s.freeindex = freeIndex
47 goto havespan
48 }
49 lock(&c.lock)
50 // the span is still empty after sweep
51 // it is already in the empty list, so just retry
52 goto retry
53 }
54 if s.sweepgen == sg-1 {
55 // the span is being swept by background sweeper, skip
56 continue
57 }
58 // already swept empty span,
59 // all subsequent ones must also be either swept or in process of sweeping
60 break
61 }
62 if trace.enabled {
63 traceGCSweepDone()
64 traceDone = true
65 }
66 unlock(&c.lock)
67
68 // Replenish central list if empty.
69 s = c.grow()
70 if s == nil {
71 return nil
72 }
73 lock(&c.lock)
74 c.empty.insertBack(s)
75 unlock(&c.lock)
76
77 // At this point s is a non-empty span, queued at the end of the empty list,
78 // c is unlocked.
79havespan:
80 if trace.enabled && !traceDone {
81 traceGCSweepDone()
82 }
83 n := int(s.nelems) - int(s.allocCount)
84 if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
85 throw("span has no free objects")
86 }
87 // Assume all objects from this span will be allocated in the
88 // mcache. If it gets uncached, we'll adjust this.
89 atomic.Xadd64(&c.nmalloc, int64(n))
90 usedBytes := uintptr(s.allocCount) * s.elemsize
91 atomic.Xadd64(&memstats.heap_live, int64(spanBytes)-int64(usedBytes))
92 if trace.enabled {
93 // heap_live changed.
94 traceHeapAlloc()
95 }
96 if gcBlackenEnabled != 0 {
97 // heap_live changed.
98 gcController.revise()
99 }
100 freeByteBase := s.freeindex &^ (64 - 1)
101 whichByte := freeByteBase / 8
102 // Init alloc bits cache.
103 s.refillAllocCache(whichByte)
104
105 // Adjust the allocCache so that s.freeindex corresponds to the low bit in
106 // s.allocCache.
107 s.allocCache >>= s.freeindex % 64
108
109 return s
110}
111
112// Return span from an mcache.
113func (c *mcentral) uncacheSpan(s *mspan) {
114 if s.allocCount == 0 {
115 throw("uncaching span but s.allocCount == 0")
116 }
117
118 sg := mheap_.sweepgen
119 stale := s.sweepgen == sg+1
120 if stale {
121 // Span was cached before sweep began. It's our
122 // responsibility to sweep it.
123 //
124 // Set sweepgen to indicate it's not cached but needs
125 // sweeping and can't be allocated from. sweep will
126 // set s.sweepgen to indicate s is swept.
127 atomic.Store(&s.sweepgen, sg-1)
128 } else {
129 // Indicate that s is no longer cached.
130 atomic.Store(&s.sweepgen, sg)
131 }
132
133 n := int(s.nelems) - int(s.allocCount)
134 if n > 0 {
135 // cacheSpan updated alloc assuming all objects on s
136 // were going to be allocated. Adjust for any that
137 // weren't. We must do this before potentially
138 // sweeping the span.
139 atomic.Xadd64(&c.nmalloc, -int64(n))
140
141 lock(&c.lock)
142 c.empty.remove(s)
143 c.nonempty.insert(s)
144 if !stale {
145 // mCentral_CacheSpan conservatively counted
146 // unallocated slots in heap_live. Undo this.
147 //
148 // If this span was cached before sweep, then
149 // heap_live was totally recomputed since
150 // caching this span, so we don't do this for
151 // stale spans.
152 atomic.Xadd64(&memstats.heap_live, -int64(n)*int64(s.elemsize))
153 }
154 unlock(&c.lock)
155 }
156
157 if stale {
158 // Now that s is in the right mcentral list, we can
159 // sweep it.
160 s.sweep(false)
161 }
162}
163
164// freeSpan updates c and s after sweeping s.
165// It sets s's sweepgen to the latest generation,
166// and, based on the number of free objects in s,
167// moves s to the appropriate list of c or returns it
168// to the heap.
169// freeSpan reports whether s was returned to the heap.
170// If preserve=true, it does not move s (the caller
171// must take care of it).
172func (c *mcentral) freeSpan(s *mspan, preserve bool, wasempty bool) bool {
173 if sg := mheap_.sweepgen; s.sweepgen == sg+1 || s.sweepgen == sg+3 {
174 throw("freeSpan given cached span")
175 }
176 s.needzero = 1
177
178 if preserve {
179 // preserve is set only when called from (un)cacheSpan above,
180 // the span must be in the empty list.
181 if !s.inList() {
182 throw("can't preserve unlinked span")
183 }
184 atomic.Store(&s.sweepgen, mheap_.sweepgen)
185 return false
186 }
187
188 lock(&c.lock)
189
190 // Move to nonempty if necessary.
191 if wasempty {
192 c.empty.remove(s)
193 c.nonempty.insert(s)
194 }
195
196 // delay updating sweepgen until here. This is the signal that
197 // the span may be used in an mcache, so it must come after the
198 // linked list operations above (actually, just after the
199 // lock of c above.)
200 atomic.Store(&s.sweepgen, mheap_.sweepgen)
201
202 if s.allocCount != 0 {
203 unlock(&c.lock)
204 return false
205 }
206
207 c.nonempty.remove(s)
208 unlock(&c.lock)
209 mheap_.freeSpan(s, false)
210 return true
211}
212
213// grow allocates a new empty span from the heap and initializes it for c's size class.
214func (c *mcentral) grow() *mspan {
215 npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
216 size := uintptr(class_to_size[c.spanclass.sizeclass()])
217 //这里向 mheap 申请
218 s := mheap_.alloc(npages, c.spanclass, false, true)
219 if s == nil {
220 return nil
221 }
222
223 // Use division by multiplication and shifts to quickly compute:
224 // n := (npages << _PageShift) / size
225 n := (npages << _PageShift) >> s.divShift * uintptr(s.divMul) >> s.divShift2
226 s.limit = s.base() + size*n
227 heapBitsForAddr(s.base()).initSpan(s)
228 return s
229}
如果 mheap 不足,则向 OS 申请。接上面的代码 mheap_.alloc()
1// alloc allocates a new span of npage pages from the GC'd heap.
2//
3// Either large must be true or spanclass must indicates the span's
4// size class and scannability.
5//
6// If needzero is true, the memory for the returned span will be zeroed.
7func (h *mheap) alloc(npage uintptr, spanclass spanClass, large bool, needzero bool) *mspan {
8 // Don't do any operations that lock the heap on the G stack.
9 // It might trigger stack growth, and the stack growth code needs
10 // to be able to allocate heap.
11 var s *mspan
12 systemstack(func() {
13 s = h.alloc_m(npage, spanclass, large)
14 })
15
16 if s != nil {
		if needzero && s.needzero != 0 {
			memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
		}
		s.needzero = 0
	}
	return s
}

// alloc_m is the internal implementation of mheap.alloc.
//
// alloc_m must run on the system stack because it locks the heap, so
// any stack growth during alloc_m would self-deadlock.
//
//go:systemstack
func (h *mheap) alloc_m(npage uintptr, spanclass spanClass, large bool) *mspan {
	_g_ := getg()

	// To prevent excessive heap growth, before allocating n pages
	// we need to sweep and reclaim at least n pages.
	if h.sweepdone == 0 {
		h.reclaim(npage)
	}

	lock(&h.lock)
	// transfer stats from cache to global
	memstats.heap_scan += uint64(_g_.m.mcache.local_scan)
	_g_.m.mcache.local_scan = 0
	memstats.tinyallocs += uint64(_g_.m.mcache.local_tinyallocs)
	_g_.m.mcache.local_tinyallocs = 0

	s := h.allocSpanLocked(npage, &memstats.heap_inuse)
	if s != nil {
		// Record span info, because gc needs to be
		// able to map interior pointer to containing span.
		atomic.Store(&s.sweepgen, h.sweepgen)
		h.sweepSpans[h.sweepgen/2%2].push(s) // Add to swept in-use list.
		s.state = mSpanInUse
		s.allocCount = 0
		s.spanclass = spanclass
		if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
			s.elemsize = s.npages << _PageShift
			s.divShift = 0
			s.divMul = 0
			s.divShift2 = 0
			s.baseMask = 0
		} else {
			s.elemsize = uintptr(class_to_size[sizeclass])
			m := &class_to_divmagic[sizeclass]
			s.divShift = m.shift
			s.divMul = m.mul
			s.divShift2 = m.shift2
			s.baseMask = m.baseMask
		}

		// Mark in-use span in arena page bitmap.
		arena, pageIdx, pageMask := pageIndexOf(s.base())
		arena.pageInUse[pageIdx] |= pageMask

		// update stats, sweep lists
		h.pagesInUse += uint64(npage)
		if large {
			memstats.heap_objects++
			mheap_.largealloc += uint64(s.elemsize)
			mheap_.nlargealloc++
			atomic.Xadd64(&memstats.heap_live, int64(npage<<_PageShift))
		}
	}
	// heap_scan and heap_live were updated.
	if gcBlackenEnabled != 0 {
		gcController.revise()
	}

	if trace.enabled {
		traceHeapAlloc()
	}

	// h.spans is accessed concurrently without synchronization
	// from other threads. Hence, there must be a store/store
	// barrier here to ensure the writes to h.spans above happen
	// before the caller can publish a pointer p to an object
	// allocated from s. As soon as this happens, the garbage
	// collector running on another processor could read p and
	// look up s in h.spans. The unlock acts as the barrier to
	// order these writes. On the read side, the data dependency
	// between p and the index in h.spans orders the reads.
	unlock(&h.lock)
	return s
}

// Try to add at least npage pages of memory to the heap,
// returning whether it worked.
//
// h must be locked.
func (h *mheap) grow(npage uintptr) bool {
	ask := npage << _PageShift

	nBase := round(h.curArena.base+ask, physPageSize)
	if nBase > h.curArena.end {
		// Not enough room in the current arena. Allocate more
		// arena space. This may not be contiguous with the
		// current arena, so we have to request the full ask.
		av, asize := h.sysAlloc(ask) // ask the OS for memory (mmap or VirtualAlloc)
		if av == nil {
			print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
			return false
		}

		if uintptr(av) == h.curArena.end {
			// The new space is contiguous with the old
			// space, so just extend the current space.
			h.curArena.end = uintptr(av) + asize
		} else {
			// The new space is discontiguous. Track what
			// remains of the current space and switch to
			// the new space. This should be rare.
			if size := h.curArena.end - h.curArena.base; size != 0 {
				h.growAddSpan(unsafe.Pointer(h.curArena.base), size)
			}
			// Switch to the new space.
			h.curArena.base = uintptr(av)
			h.curArena.end = uintptr(av) + asize
		}

		// The memory just allocated counts as both released
		// and idle, even though it's not yet backed by spans.
		//
		// The allocation is always aligned to the heap arena
		// size which is always > physPageSize, so its safe to
		// just add directly to heap_released. Coalescing, if
		// possible, will also always be correct in terms of
		// accounting, because s.base() must be a physical
		// page boundary.
		memstats.heap_released += uint64(asize)
		memstats.heap_idle += uint64(asize)

		// Recalculate nBase
		nBase = round(h.curArena.base+ask, physPageSize)
	}

	// Grow into the current arena.
	v := h.curArena.base
	h.curArena.base = nBase
	h.growAddSpan(unsafe.Pointer(v), nBase-v)
	return true
}
The full call chain is shown above; at the end of it, sysAlloc issues a system call (mmap, or VirtualAlloc on Windows, much like the initialization path) to request memory from the operating system.
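Note how grow rounds the requested size up to a physical-page boundary with round. For a power-of-two alignment this is the classic add-then-mask trick; a minimal standalone sketch (the physPageSize value here is just for illustration):

```go
package main

import "fmt"

// round rounds n up to the next multiple of a, where a must be a
// power of two: add a-1, then clear the low bits with AND NOT.
func round(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

func main() {
	const physPageSize = 4096
	fmt.Println(round(1, physPageSize))    // 4096
	fmt.Println(round(4096, physPageSize)) // 4096: already aligned
	fmt.Println(round(4097, physPageSize)) // 8192: spills into the next page
}
```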
5. Memory Reclamation
5.1 Reclaiming the mcache
Reclaiming an mcache has two parts. The first part returns the unused memory held in alloc to the corresponding mcentral.
func freemcache(c *mcache) {
	systemstack(func() {
		c.releaseAll()
		...

		lock(&mheap_.lock)
		purgecachedstats(c)
		mheap_.cachealloc.free(unsafe.Pointer(c))
		unlock(&mheap_.lock)
	})
}

func (c *mcache) releaseAll() {
	for i := 0; i < _NumSizeClasses; i++ {
		s := c.alloc[i]
		if s != &emptymspan {
			mheap_.central[i].mcentral.uncacheSpan(s)
			c.alloc[i] = &emptymspan
		}
	}
	// Clear tinyalloc pool.
	c.tiny = 0
	c.tinyoffset = 0
}
releaseAll() returns the mspan of every size class in mcache.alloc to its mcentral. Note that handing a span back requires taking a lock, because the mcentral is global. The second part then returns the mcache itself (by now essentially an empty shell) to mheap.cachealloc, which simply pushes it onto the head of a free list.
1func (f *fixalloc) free(p unsafe.Pointer) {
2 f.inuse -= f.size
3 v := (*mlink)(p)
4 v.next = f.list
5 f.list = v
6}
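fixalloc's free list is an intrusive singly linked list: the next pointer lives inside the freed block itself, so the list costs no extra memory. A minimal standalone sketch of the same push-to-head / pop-from-head pattern (the names here are ours, not the runtime's):

```go
package main

import "fmt"

// node plays the role of runtime.mlink: the freed block itself
// stores the link, so no auxiliary allocation is needed.
type node struct{ next *node }

type freeList struct{ head *node }

// push inserts a freed block at the head, like fixalloc.free.
func (f *freeList) push(n *node) {
	n.next = f.head
	f.head = n
}

// pop hands back the most recently freed block, like the
// fast path of fixalloc's allocation.
func (f *freeList) pop() *node {
	n := f.head
	if n != nil {
		f.head = n.next
	}
	return n
}

func main() {
	var f freeList
	a, b := &node{}, &node{}
	f.push(a)
	f.push(b)
	fmt.Println(f.pop() == b, f.pop() == a, f.pop() == nil) // true true true
}
```

Because blocks are reused in LIFO order, a recently freed block (still warm in cache) is the first one handed out again.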
5.2 Reclaiming from the mcentral
When none of an mspan's objects are in use anymore (allocCount == 0), the mcentral returns the span to the mheap.
func (c *mcentral) freeSpan(s *mspan, preserve bool, wasempty bool) bool {
	...
	lock(&c.lock)
	...
	if s.allocCount != 0 {
		unlock(&c.lock)
		return false
	}

	c.nonempty.remove(s)
	unlock(&c.lock)
	mheap_.freeSpan(s, 0)
	return true
}
5.3 mheap
The mheap does not return memory to the operating system here, but it does tidy up spans, for example by coalescing adjacent free spans into larger ones.
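The runtime's actual coalescing logic lives in its span bookkeeping structures; the core idea, merging free page ranges whose ends touch, can be sketched with a simplified stand-in for a free mspan (our own data structure, not the runtime's):

```go
package main

import (
	"fmt"
	"sort"
)

// span is a simplified stand-in for a free mspan: a run of pages.
type span struct{ start, npages uintptr }

// coalesce sorts free spans by start page and merges every pair
// whose ranges are adjacent, mirroring what mheap does on free.
func coalesce(spans []span) []span {
	sort.Slice(spans, func(i, j int) bool { return spans[i].start < spans[j].start })
	out := spans[:0]
	for _, s := range spans {
		if n := len(out); n > 0 && out[n-1].start+out[n-1].npages == s.start {
			out[n-1].npages += s.npages // adjacent: extend the previous span
		} else {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	free := []span{{10, 2}, {0, 4}, {4, 6}}
	fmt.Println(coalesce(free)) // [{0 12}]
}
```

Merging keeps large contiguous runs available, so a later multi-page allocation does not fail merely because free memory is fragmented into small spans.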
3. Summary
tcmalloc is a design; putting it into practice still involves plenty of engineering. Reading the Golang source teaches you not only how the allocator works but also many interesting tricks, such as padding variables to a cache line to avoid false sharing, or using a de Bruijn sequence to compute trailing zeros (used in sys.Ctz64). That, I think, is the point of reading source code.
References
- Golang的内存管理(上篇) / Golang的内存管理(中篇) / Golang的内存管理(下篇)
- Virtual Memory (CS:APP chapter 9 preview): http://csapp.cs.cmu.edu/2e/ch9-preview.pdf
Memory allocation
The raw allocation functions are internal to the Go SDK and cannot be used from user code. If you need this capability, use https://github.com/smasher164/mem instead:
package mem

import (
	"unsafe"

	"golang.org/x/sys/unix"
)

type slice struct {
	addr unsafe.Pointer
	len  int
	cap  int
}

func mmap(size uint) (unsafe.Pointer, error) {
	b, err := unix.Mmap(-1, 0, int(size), unix.PROT_READ|unix.PROT_WRITE, unix.MAP_PRIVATE|unix.MAP_ANONYMOUS)
	if err != nil {
		return nil, err
	}
	sl := (*slice)(unsafe.Pointer(&b))
	return sl.addr, nil
}

func munmap(p unsafe.Pointer) error {
	size := int(((*header)(p)).size + szheader)
	b := *(*[]byte)(unsafe.Pointer(&slice{p, size, size}))
	return unix.Munmap(b)
}
Windows:
package mem

import (
	"unsafe"

	"golang.org/x/sys/windows"
)

func mmap(size uint) (unsafe.Pointer, error) {
	p, err := windows.VirtualAlloc(0, uintptr(size), windows.MEM_COMMIT|windows.MEM_RESERVE, windows.PAGE_READWRITE)
	if err != nil {
		return nil, err
	}
	return unsafe.Pointer(p), nil
}

func munmap(p unsafe.Pointer) error {
	return windows.VirtualFree(uintptr(p), 0, windows.MEM_RELEASE)
}