Sep 11, 2020 · go · golang · Go language · What is GPM · Go docs · Go technology

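G, P, and M are the three core components of the Go scheduler, each with its own job. Their close cooperation keeps the scheduler running efficiently, and it is what lets Go support high concurrency natively. This article takes a close look at the GPM model.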

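Start with G, short for goroutine. A g mainly stores the goroutine's state together with the values of certain CPU registers, such as the instruction pointer (IP), so that when the goroutine's turn comes again the CPU knows which instruction to resume from.

When a goroutine is switched off the CPU, the scheduler saves the CPU register values into fields of the g object.

When the goroutine is scheduled to run again, the scheduler restores the register values saved in those fields back into the CPU registers.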

This series uses Go 1.9.2. Here is the source of g:

type g struct {
	// Stack used by the goroutine
	stack       stack   // offset known to runtime/cgo
	// Used in stack growth and shrink checks; also serves as the preemption flag
	stackguard0 uintptr // offset known to liblink
	stackguard1 uintptr // offset known to liblink

	_panic         *_panic // innermost panic - offset known to liblink
	_defer         *_defer // innermost defer
	// The m currently bound to this g
	m              *m      // current m; offset known to arm liblink
	// Saved execution context of the goroutine
	sched          gobuf
	syscallsp      uintptr        // if status==Gsyscall, syscallsp = sched.sp to use during gc
	syscallpc      uintptr        // if status==Gsyscall, syscallpc = sched.pc to use during gc
	stktopsp       uintptr        // expected sp at top of stack, to check in traceback
	// Parameter passed in on wakeup
	param          unsafe.Pointer // passed parameter on wakeup
	atomicstatus   uint32
	stackLock      uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
	goid           int64
	// Approximate time at which g became blocked
	waitsince      int64  // approx time when the g become blocked
	// Reason why g is blocked
	waitreason     string // if status==Gwaiting
	// Points to the next g in the global run queue
	schedlink      guintptr
	// Preemption flag. When this is true, stackguard0 equals stackpreempt
	preempt        bool     // preemption signal, duplicates stackguard0 = stackpreempt
	paniconfault   bool     // panic (instead of crash) on unexpected fault address
	preemptscan    bool     // preempted g does scan for gc
	gcscandone     bool     // g has scanned stack; protected by _Gscan bit in status
	gcscanvalid    bool     // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
	throwsplit     bool     // must not split stack
	raceignore     int8     // ignore race detection events
	sysblocktraced bool     // StartTrace has emitted EvGoInSyscall about this goroutine
	// cputicks after the syscall has returned, used for tracing
	sysexitticks   int64    // cputicks when syscall has returned (for tracing)
	traceseq       uint64   // trace event sequencer
	tracelastp     puintptr // last P emitted an event for this goroutine
	// If LockOSThread has been called, this g is bound to a particular m
	lockedm        *m
	sig            uint32
	writebuf       []byte
	sigcode0       uintptr
	sigcode1       uintptr
	sigpc          uintptr
	// Instruction address of the go statement that created this goroutine
	gopc           uintptr // pc of go statement that created this goroutine
	// Instruction address of the goroutine's function
	startpc        uintptr // pc of goroutine function
	racectx        uintptr
	waiting        *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
	cgoCtxt        []uintptr      // cgo traceback context
	labels         unsafe.Pointer // profiler labels
	// Timer cached for time.Sleep
	timer          *timer         // cached timer for time.Sleep

	gcAssistBytes int64
}

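I have annotated the more important fields in the source above. The unannotated ones either have little to do with scheduling or are ones I do not fully understand yet.

The g struct references two fairly simple structs. stack describes the goroutine's runtime stack: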

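// Describes a stack; its address range is [lo, hi)
type stack struct {
	// Top limit of the stack (low address); the stack grows down toward it
	lo uintptr
	// Bottom of the stack (high address)
	hi uintptr
}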

To run, a goroutine needs more than a stack: at the very least it also needs register values such as PC and SP. gobuf stores them:

type gobuf struct {
	// Stores the value of the rsp register
	sp   uintptr
	// Stores the value of the rip register
	pc   uintptr
	// Points to the goroutine
	g    guintptr
	ctxt unsafe.Pointer // this has to be a pointer so that gc scans it
	// Stores the return value of a system call
	ret  sys.Uintreg
	lr   uintptr
	bp   uintptr // for GOEXPERIMENT=framepointer
}
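
The save and restore steps described earlier can be pictured with a toy model. This is only a sketch: toyCPU, toyGobuf, toyG, save, restore, and switchTo are hypothetical names invented for illustration, and the real runtime performs the switch in assembly rather than in Go.

```go
package main

import "fmt"

// toyCPU stands in for the two registers the article focuses on.
type toyCPU struct {
	sp uintptr // stack pointer (rsp on amd64)
	pc uintptr // instruction pointer (rip on amd64)
}

// toyGobuf mirrors only the sp and pc fields of the runtime's gobuf.
type toyGobuf struct {
	sp uintptr
	pc uintptr
}

// toyG is a hypothetical, stripped-down g that holds just a saved context.
type toyG struct {
	id    int
	sched toyGobuf
}

// save copies the CPU registers into g.sched, as when g is switched off the CPU.
func save(cpu *toyCPU, g *toyG) {
	g.sched.sp = cpu.sp
	g.sched.pc = cpu.pc
}

// restore loads g.sched back into the CPU, as when g is scheduled to run again.
func restore(cpu *toyCPU, g *toyG) {
	cpu.sp = g.sched.sp
	cpu.pc = g.sched.pc
}

// switchTo saves the context of from and then resumes to.
func switchTo(cpu *toyCPU, from, to *toyG) {
	save(cpu, from)
	restore(cpu, to)
}

func main() {
	g1 := &toyG{id: 1}
	g2 := &toyG{id: 2, sched: toyGobuf{sp: 0x2000, pc: 0x400200}}
	cpu := &toyCPU{sp: 0x1000, pc: 0x400100} // g1 is currently running

	switchTo(cpu, g1, g2)
	fmt.Printf("running g%d: sp=%#x pc=%#x\n", g2.id, cpu.sp, cpu.pc)

	cpu.pc += 8 // pretend g2 executed a few instructions
	switchTo(cpu, g2, g1)
	fmt.Printf("running g%d: sp=%#x pc=%#x\n", g1.id, cpu.sp, cpu.pc)
}
```

After the second switch, g1 resumes exactly where it left off, while g2's saved pc reflects the progress it made in between.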

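Next is M, short for machine. It represents a worker thread, that is, an OS thread. A G must be scheduled onto an M before it can run; M is the one that actually does the work. The m struct is the M we keep talking about. It stores the stack information M itself uses, the G currently executing on M, the P bound to it, and so on.

When an M has nothing to do, before going to sleep it "spins" looking for work: it checks the global queue, polls the network poller, tries to run GC tasks, or "steals" work.

The source of the m struct: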

// m represents a worker thread and stores the stack information it uses itself
type m struct {
	// Records the stack used by the worker thread (that is, the kernel thread); it is needed when running scheduler code.
	// User goroutine code runs on the goroutine's own stack, so scheduling involves switching stacks.
	g0      *g     // goroutine with scheduling stack
	morebuf gobuf  // gobuf arg to morestack
	divmod  uint32 // div/mod denominator for arm - known to liblink

	// Fields not known to debuggers.
	procid        uint64     // for debuggers, but offset not hard-coded
	gsignal       *g         // signal-handling g
	sigmask       sigset     // storage for saved signal mask
	// m is bound to its worker thread through tls.
	// This is thread-local storage.
	tls           [6]uintptr // thread-local storage (for x86 extern register)
	mstartfn      func()
	// Points to the goroutine that is currently running
	curg          *g       // current running goroutine
	caughtsig     guintptr // goroutine running during fatal signal
	// The p bound to the current worker thread
	p             puintptr // attached p for executing go code (nil if not executing go code)
	nextp         puintptr
	id            int32
	mallocing     int32
	throwing      int32
	// If this field is not the empty string, keep curg running on this m
	preemptoff    string // if != "", keep curg running on this m
	locks         int32
	softfloat     int32
	dying         int32
	profilehz     int32
	helpgc        int32
	// true when this m is spinning: it is out of work and trying to steal work from other threads
	spinning      bool // m is out of work and is actively looking for work
	// m is blocked on a note
	blocked       bool // m is blocked on a note
	// m is executing a write barrier
	inwb          bool // m is executing a write barrier
	newSigstack   bool // minit on C thread called sigaltstack
	printlock     int8
	// m is executing a cgo call
	incgo         bool // m is executing a cgo call
	fastrand      uint32
	// Total number of cgo calls
	ncgocall      uint64      // number of cgo calls in total
	ncgo          int32       // number of cgo calls currently in progress
	cgoCallersUse uint32      // if non-zero, cgoCallers in use temporarily
	cgoCallers    *cgoCallers // cgo traceback if crashing in cgo call
	// When there is no goroutine to run, the worker thread sleeps on park;
	// other threads wake it up through park
	park          note
	// Linked list of all worker threads
	alllink       *m // on allm
	schedlink     muintptr
	mcache        *mcache
	lockedg       *g
	createstack   [32]uintptr // stack that created this thread.
	freglo        [16]uint32  // d[i] lsb and f[i]
	freghi        [16]uint32  // d[i] msb and f[i+16]
	fflag         uint32      // floating point compare flags
	locked        uint32      // tracking for lockosthread
	// The next m waiting for the lock
	nextwaitm     uintptr     // next m waiting for lock
	needextram    bool
	traceback     uint8
	waitunlockf   unsafe.Pointer // todo go func(*g, unsafe.pointer) bool
	waitlock      unsafe.Pointer
	waittraceev   byte
	waittraceskip int
	startingtrace bool
	syscalltick   uint32
	// Handle of the worker thread
	thread        uintptr // thread handle

	// these are here because they are too large to be on the stack
	// of low-level NOSPLIT functions.
	libcall   libcall
	libcallpc uintptr // for cpu profiler
	libcallsp uintptr
	libcallg  guintptr
	syscall   libcall // stores syscall parameters on windows

	mOS
}

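Now for P, short for processor. It provides the "context" in which M executes and holds the resources M needs to run G, such as the local queue of runnable Gs and the memory cache.

An M can execute goroutines only while it is bound to a P. When an M blocks, its whole P is handed over to another M; in other words, the P is taken over.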
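
From user code, the number of Ps is visible through runtime.GOMAXPROCS: calling it with 0 reports the current value without changing it. The sketch below prints that value and then runs many more goroutines than there are Ps, which only works because the Gs are multiplexed onto the available Ps and Ms. The helper names numP and runAll are made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// numP reports the current number of Ps without changing it.
// runtime.GOMAXPROCS only changes the setting when its argument is >= 1.
func numP() int {
	return runtime.GOMAXPROCS(0)
}

// runAll starts n goroutines, waits for all of them to finish,
// and returns how many completed.
func runAll(n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			done++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("Ps (GOMAXPROCS):", numP())
	fmt.Println("goroutines finished:", runAll(1000))
}
```

The printed CPU and P counts depend on the machine, so no fixed output is shown here.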

The source of the p struct:

// p holds the resources required by the Go runtime
type p struct {
	lock mutex

	// Index in allp
	id          int32
	status      uint32 // one of pidle/prunning/...
	link        puintptr
	// Incremented on every call to schedule
	schedtick   uint32
	// Incremented on every system call
	syscalltick uint32
	// Used by the sysmon thread to record the syscall time and running time of the monitored p
	sysmontick  sysmontick // last tick observed by sysmon
	// Points to the bound m; nil if the p is idle
	m           muintptr   // back-link to associated m (nil if idle)
	mcache      *mcache
	racectx     uintptr

	deferpool    [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
	deferpoolbuf [5][32]*_defer

	// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
	goidcache    uint64
	goidcacheend uint64

	// Queue of runnable goroutines. Accessed without lock.
	// This is the local run queue; no lock is needed to access it.
	runqhead uint32 // head of the queue
	runqtail uint32 // tail of the queue
	// Circular queue backed by an array
	runq     [256]guintptr

	// runnext, if non-nil, is a runnable G that was made ready by the
	// current G and has higher priority than the Gs in runq.
	// If the current G still has time left in its time slice, this G should run next.
	// Once it runs, it inherits the remaining time of the current G.
	runnext guintptr

	// Available G's (status == Gdead)
	// Free gs
	gfree    *g
	gfreecnt int32

	sudogcache []*sudog
	sudogbuf   [128]*sudog

	tracebuf traceBufPtr
	traceSwept, traceReclaimed uintptr

	palloc persistentAlloc // per-P to avoid mutex

	// Per-P GC state
	gcAssistTime     int64 // Nanoseconds in assistAlloc
	gcBgMarkWorker   guintptr
	gcMarkWorkerMode gcMarkWorkerMode
	runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point

	pad [sys.CacheLineSize]byte
}
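
The runqhead, runqtail, and runq fields form a circular queue. The following is a minimal single-threaded sketch of that idea, with goroutines represented by plain ints. toyP, put, and get are hypothetical names; the real runtime uses atomic operations on head and tail so that other Ps can steal concurrently, and on overflow it moves part of the local queue to the global queue instead of rejecting the G.

```go
package main

import "fmt"

const runqSize = 256 // same capacity as p.runq

// toyP keeps only the local run queue fields; a G is just an int here.
type toyP struct {
	runqhead uint32
	runqtail uint32
	runq     [runqSize]int
}

// put appends g at the tail and returns false when the queue is full.
// head and tail only ever increase; the slot index is the counter modulo
// the capacity, which is what makes the array behave as a ring.
func (p *toyP) put(g int) bool {
	if p.runqtail-p.runqhead >= runqSize {
		return false
	}
	p.runq[p.runqtail%runqSize] = g
	p.runqtail++
	return true
}

// get removes the g at the head; ok is false when the queue is empty.
func (p *toyP) get() (g int, ok bool) {
	if p.runqhead == p.runqtail {
		return 0, false
	}
	g = p.runq[p.runqhead%runqSize]
	p.runqhead++
	return g, true
}

func main() {
	p := &toyP{}
	for g := 1; g <= 3; g++ {
		p.put(g)
	}
	for {
		g, ok := p.get()
		if !ok {
			break
		}
		fmt.Println("run g", g)
	}
}
```

Because the capacity divides 2^32, the unsigned difference runqtail-runqhead stays correct even after the counters wrap around.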

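G, P, and M stand like the three legs of a tripod and together make up the Go scheduler. G needs an M to run, M relies on the resources P provides, and P holds the Gs waiting to run. Each depends on the others.

The relationship among the three:

Figure: the relationship among G, P, and M (from Cao Da's golang notes)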

An M takes runnable Gs from the local queue of the P it is bound to, also gets runnable Gs from the network poller, and steals Gs from other Ps.
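
The search order can be sketched with a toy scheduler. This is a simplification under stated assumptions: toyP, toySched, pop, and findRunnable are hypothetical, the network poller is left out, a global queue is added as the second source, and the steal step takes half of the victim's queue (rounded up) in one go. The real runtime's logic has more cases than this.

```go
package main

import "fmt"

// toyP is a hypothetical P holding a local queue of goroutine ids.
type toyP struct {
	local []int
}

// toySched holds all Ps plus a global queue.
type toySched struct {
	ps     []*toyP
	global []int
}

// pop removes and returns the first element of q.
func pop(q *[]int) (int, bool) {
	if len(*q) == 0 {
		return 0, false
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g, true
}

// findRunnable looks for work for the M bound to ps[self]: the local
// queue first, then the global queue, then stealing from other Ps.
// It returns the g, where it came from, and whether anything was found.
func (s *toySched) findRunnable(self int) (int, string, bool) {
	if g, ok := pop(&s.ps[self].local); ok {
		return g, "local", true
	}
	if g, ok := pop(&s.global); ok {
		return g, "global", true
	}
	for i, victim := range s.ps {
		if i == self || len(victim.local) == 0 {
			continue
		}
		// Take half of the victim's queue (rounded up), run the first
		// stolen g now and keep the rest in our own local queue.
		n := (len(victim.local) + 1) / 2
		stolen := victim.local[:n]
		victim.local = victim.local[n:]
		s.ps[self].local = append(s.ps[self].local, stolen[1:]...)
		return stolen[0], "stolen", true
	}
	return 0, "", false
}

func main() {
	s := &toySched{
		ps:     []*toyP{{local: []int{1}}, {local: []int{10, 11, 12, 13}}},
		global: []int{100},
	}
	for {
		g, from, ok := s.findRunnable(0)
		if !ok {
			break
		}
		fmt.Println(g, from)
	}
}
```

Stealing in batches means the thief does not have to come back to the victim for every single G, which is the reason for taking half rather than one.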

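To wrap up, let's summarize GPM at a high level, from the angle of their state transitions.

First, the state transitions of G:

Figure: G state transitions

Note that the figure above omits some garbage-collection-related states.

Next, the state transitions of P:

Figure: P state transitions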

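Normally (when the number of Ps is not adjusted while the program runs), a P only switches among the four states in the figure above. When the program starts and initializes, all Ps are in the _Pgcstop state; as the Ps are initialized (runtime.procresize), they are set to _Pidle.

When an M needs to run, it calls runtime.acquirep to put a P into the _Prunning state, and releases it with runtime.releasep.

When a running G needs to enter a system call, the P is set to _Psyscall. If the system monitor seizes it at that point (runtime.retake), the P is changed back to _Pidle.

If a GC happens while the program is running, the P is set to _Pgcstop, and it is set back to _Prunning at runtime.startTheWorld.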
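
The transitions just described can be collected into a small table. This is a sketch, not runtime code: pStatus, transitions, and move are hypothetical, the table contains only the moves named in the text, and allowing the GC stop from the idle, running, and syscall states is an assumption made here because the text does not say which states it applies to.

```go
package main

import "fmt"

// pStatus models the four P states discussed in the text.
type pStatus int

const (
	pIdle pStatus = iota
	pRunning
	pSyscall
	pGCStop
)

// String returns the runtime's name for the status.
func (s pStatus) String() string {
	return [...]string{"_Pidle", "_Prunning", "_Psyscall", "_Pgcstop"}[s]
}

// transitions maps a current status to the statuses reachable from it,
// together with what triggers each move.
var transitions = map[pStatus]map[pStatus]string{
	pGCStop:  {pIdle: "runtime.procresize", pRunning: "runtime.startTheWorld"},
	pIdle:    {pRunning: "runtime.acquirep", pGCStop: "GC stops the world"},
	pRunning: {pIdle: "runtime.releasep", pSyscall: "G enters a system call", pGCStop: "GC stops the world"},
	pSyscall: {pIdle: "runtime.retake", pGCStop: "GC stops the world"},
}

// move returns the trigger for from -> to, or an error if the table
// does not list that transition.
func move(from, to pStatus) (string, error) {
	trigger, ok := transitions[from][to]
	if !ok {
		return "", fmt.Errorf("transition %v -> %v is not in the table", from, to)
	}
	return trigger, nil
}

func main() {
	path := []pStatus{pGCStop, pIdle, pRunning, pSyscall, pIdle, pRunning, pGCStop, pRunning}
	for i := 1; i < len(path); i++ {
		trigger, err := move(path[i-1], path[i])
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%v -> %v (%s)\n", path[i-1], path[i], trigger)
	}
}
```

The path in main follows a P from program start, through a system call that sysmon retakes, to a GC cycle and back.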

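Finally, the state changes of M:

Figure: M state transitions

An M has only two states: spinning and non-spinning. While spinning it works hard to find work. If it finds none, it becomes non-spinning and then goes to sleep, until another worker thread wakes it because there is work to handle, at which point it starts spinning again.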