Inside PostgreSQL Shared Memory - Bruce Momjian

POSTGRESQL is an open-source, full-featured relational database. This presentation gives an overview of the shared memory structures used by Postgres.
Creative Commons Attribution License

http://momjian.us/presentations

Last updated: May, 2017


Outline

1. File storage format
2. Shared memory creation
3. Shared buffers
4. Row value access
5. Locking
6. Other structures

File System /data

[Diagram: three Postgres processes sharing the /data directory]

File System /data/base

[Diagram: three Postgres processes sharing /data, which contains the subdirectories below]

/data
    /base
    /global
    /pg_clog
    /pg_multixact
    /pg_subtrans
    /pg_tblspc
    /pg_twophase
    /pg_xlog

File System /data/base/db

[Diagram: /data/base holds one directory per database]

/data
    /base
        /16385 (production)
        /1 (template1)
        /16821 (test)
        /17982 (devel)
        /21452 (marketing)

File System /data/base/db/table

[Diagram: each database directory holds one file per table]

/data
    /base
        /16385
            /24692 (customer)
            /27214 (order)
            /25932 (product)
            /25952 (employee)
            /27839 (part)
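
The directory nesting above can be expressed as a simple path rule. The following is a minimal sketch, assuming the layout shown on the slide; relation_path is a hypothetical helper written for illustration, not a PostgreSQL function, and the numbers in the usage note are the ones from the slide.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a PostgreSQL function): build the path of a
 * table file as <datadir>/base/<database directory>/<table file>.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
relation_path(char *buf, size_t buflen, const char *datadir,
              unsigned int db_dir, unsigned int table_file)
{
    int n;

    assert(buf != NULL && datadir != NULL);

    n = snprintf(buf, buflen, "%s/base/%u/%u", datadir, db_dir, table_file);
    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* output error or truncated path */
    return n;
}
```

Calling relation_path(buf, sizeof(buf), "/data", 16385, 24692) fills buf with /data/base/16385/24692, the customer table of the production database in the example above.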

File System Data Pages

[Diagram: the table file /data/base/16385/24692 is a sequence of 8k pages]

/data
    /base
        /16385
            /24692
                8k | 8k | 8k | 8k
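
Because a table file is a sequence of equal-sized pages, locating a page is plain arithmetic. This is a small sketch under the assumption of the 8k page size shown above; BLOCK_SIZE, block_offset, and block_count are illustrative names, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 8192         /* 8k pages, as on the slide */

/* Byte offset of page number blkno (zero-based) within a table file. */
static uint64_t
block_offset(uint32_t blkno)
{
    return (uint64_t) blkno * BLOCK_SIZE;
}

/* Number of pages in a file of file_size bytes. */
static uint32_t
block_count(uint64_t file_size)
{
    /* this sketch assumes the file only ever grows by whole pages */
    assert(file_size % BLOCK_SIZE == 0);
    return (uint32_t) (file_size / BLOCK_SIZE);
}
```

For the four-page file in the diagram, block_count(32768) is 4 and the last page starts at block_offset(3), which is byte 24576.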

Data Pages

[Diagram: one 8k page of /data/base/16385/24692 expanded to show its layout]

8K page layout:
    Page Header
    Item, Item, Item
    Tuple, Tuple, Tuple
    Special
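
One way to picture this layout is a toy page in which the item array grows forward from the header while tuple data grows backward from the end. The sketch below is an assumption-laden simplification: ToyPage, ToyItem, and the toy_page_* functions are invented for illustration, the Special area and alignment are ignored, and the real page header carries more fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Toy item: where a tuple starts in the page and how long it is. */
typedef struct ToyItem
{
    uint16_t off;
    uint16_t len;
} ToyItem;

/* Toy page header (hypothetical; far smaller than the real one). */
typedef struct ToyPageHeader
{
    uint16_t lower;             /* end of the item array */
    uint16_t upper;             /* start of the tuple data */
} ToyPageHeader;

typedef union ToyPage
{
    ToyPageHeader hdr;
    char          bytes[PAGE_SIZE];
} ToyPage;

static void
toy_page_init(ToyPage *page)
{
    memset(page, 0, sizeof(*page));
    page->hdr.lower = sizeof(ToyPageHeader);
    page->hdr.upper = PAGE_SIZE;
}

/* Add a tuple; returns its 1-based item number, or 0 if it does not fit. */
static int
toy_page_add(ToyPage *page, const void *tuple, uint16_t len)
{
    ToyItem  item;
    uint16_t free_space = page->hdr.upper - page->hdr.lower;

    if (free_space < sizeof(ToyItem) || len > free_space - sizeof(ToyItem))
        return 0;

    page->hdr.upper -= len;     /* tuple data grows toward the header */
    memcpy(page->bytes + page->hdr.upper, tuple, len);

    item.off = page->hdr.upper;
    item.len = len;
    memcpy(page->bytes + page->hdr.lower, &item, sizeof(item));
    page->hdr.lower += sizeof(ToyItem);     /* item array grows forward */

    return (int) ((page->hdr.lower - sizeof(ToyPageHeader)) / sizeof(ToyItem));
}

/* Look up a tuple by its 1-based item number. */
static const char *
toy_page_get(const ToyPage *page, int itemno, uint16_t *len)
{
    ToyItem item;

    assert(itemno >= 1);
    memcpy(&item,
           page->bytes + sizeof(ToyPageHeader) + (itemno - 1) * sizeof(ToyItem),
           sizeof(item));
    *len = item.len;
    return page->bytes + item.off;
}
```

The indirection through the item array is what lets a tuple be addressed as (page, item) without knowing its byte position inside the page.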

File System Block Tuple

[Diagram: the same 8K page layout (Page Header, Items, Tuples, Special) inside /data/base/16385/24692, with one Tuple singled out]

File System Tuple

[Diagram: a tuple is a Header followed by six Values. int4in('9241') converts input text into a stored value; textout() converts a stored value into the output text 'Martin'.]

Tuple header fields:
    OID - object id of tuple (optional)
    xmin - creation transaction id
    xmax - destruction transaction id
    cmin - creation command id
    cmax - destruction command id
    ctid - tuple id (page / item)
    natts - number of attributes
    infomask - tuple flags
    hoff - length of tuple header
    bits - bit map representing NULLs

Tuple Header C Structures

    typedef struct HeapTupleFields
    {
        TransactionId t_xmin;       /* inserting xact ID */
        TransactionId t_xmax;       /* deleting or locking xact ID */

        union
        {
            CommandId     t_cid;    /* inserting or deleting command ID, or both */
            TransactionId t_xvac;   /* VACUUM FULL xact ID */
        }           t_field3;
    } HeapTupleFields;

    typedef struct HeapTupleHeaderData
    {
        union
        {
            HeapTupleFields  t_heap;
            DatumTupleFields t_datum;
        }           t_choice;

        ItemPointerData t_ctid;     /* current TID of this or newer tuple */

        /* Fields below here must match MinimalTupleData! */

        uint16      t_infomask2;    /* number of attributes + various flags */
        uint16      t_infomask;     /* various flag bits, see below */
        uint8       t_hoff;         /* sizeof header incl. bitmap, padding */

        /* ^ - 23 bytes - ^ */

        bits8       t_bits[1];      /* bitmap of NULLs -- VARIABLE LENGTH */

        /* MORE DATA FOLLOWS AT END OF STRUCT */
    } HeapTupleHeaderData;
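
The variable-length t_bits field records which attributes are NULL, one bit per attribute. The sketch below shows how such a bitmap can be tested. It follows the convention of PostgreSQL's att_isnull() macro, where a 1 bit means the value is present and a 0 bit means NULL; the toy_* names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy NULL bitmap test: bit set = value present, bit clear = NULL.
 * attnum is zero-based.
 */
static bool
toy_att_isnull(int attnum, const uint8_t *bits)
{
    assert(attnum >= 0);
    return !(bits[attnum >> 3] & (1 << (attnum & 0x07)));
}

/* Bytes needed for a bitmap that covers natts attributes. */
static int
toy_bitmap_len(int natts)
{
    return (natts + 7) / 8;
}
```

A nine-column row therefore needs a two-byte bitmap, which is one reason the header length (hoff) varies from tuple to tuple.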

Shared Memory Creation

[Diagram: the postmaster calls fork() to create each postgres process. Every process has its own Program (Text), Data, and Stack, and all of them see the same Shared Memory.]
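
The idea that memory set up before fork() stays shared between parent and child can be tried in a few lines. This is only a sketch of the general mechanism using an anonymous shared mmap() region, not the postmaster's actual code; shared_value_demo is a hypothetical name.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on some systems */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a shared mapping, fork a child that stores value in it, and
 * return what the parent reads once the child has exited.
 * Returns -1 on failure, so value must not be negative.
 */
static int
shared_value_demo(int value)
{
    int   *shared;
    int    result;
    pid_t  pid;

    assert(value >= 0);

    shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0)
    {
        *shared = value;        /* child writes into the shared region */
        _exit(0);
    }

    /* parent: wait for the child, then read what it wrote */
    if (waitpid(pid, NULL, 0) < 0)
        result = -1;
    else
        result = *shared;

    munmap(shared, sizeof(int));
    return result;
}
```

An ordinary global variable would not behave this way: after fork() the Data and Stack areas are private to each process, so only the explicitly shared region carries the child's write back to the parent.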

Shared Memory

[Diagram: the structures held in shared memory]

    PROC
    Proc Array
    Lock Hashes
    LOCK
    PROCLOCK
    Lightweight Locks
    Semaphores
    XLOG Buffers
    CLOG Buffers
    Subtrans Buffers
    Multi-XACT Buffers
    Two-Phase Structs
    Auto Vacuum
    Btree Vacuum
    Statistics
    Background Writer
    Synchronized Scan
    Shared Invalidation
    Buffer Descriptors
    Shared Buffers

Shared Buffers

[Diagram: 8k pages of /data/base/16385/24692 are copied into Shared Buffers with read() and written back with write(). Each shared buffer holds one 8K page (Page Header, Items, Tuples, Special) and has a Buffer Descriptor.]

Buffer Descriptors:
    Pin Count - prevent page replacement
    LWLock - for page changes
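
The role of the pin count can be illustrated with a toy buffer pool in which a buffer may be reused for another page only while nobody has it pinned. Everything here is an assumption for illustration: ToyBufferDesc and the toy_* functions are invented names, the victim search is deliberately naive, and the locking that real descriptors need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer descriptor (hypothetical; the real one has more fields). */
typedef struct ToyBufferDesc
{
    int page_no;                /* page currently held by this buffer */
    int pin_count;              /* processes currently using the buffer */
} ToyBufferDesc;

static void
toy_pin(ToyBufferDesc *buf)
{
    buf->pin_count++;
}

static void
toy_unpin(ToyBufferDesc *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/*
 * Return the index of the first buffer whose page may be replaced,
 * or -1 if every buffer is pinned.
 */
static int
toy_find_victim(const ToyBufferDesc *bufs, int nbufs)
{
    for (int i = 0; i < nbufs; i++)
    {
        if (bufs[i].pin_count == 0)
            return i;
    }
    return -1;
}
```

A process would pin a buffer before following pointers into its page and unpin it afterwards, so the page cannot be swapped out from under it in between.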

HeapTuples

[Diagram: a HeapTuple is a C pointer to a Tuple inside an 8K page (Page Header, Items, Tuples, Special) in Shared Buffers. The Tuple is a Header followed by six Values; int4in('9241') and textout() ('Martin') convert values on input and output.]

Tuple header fields:
    OID - object id of tuple (optional)
    xmin - creation transaction id
    xmax - destruction transaction id
    cmin - creation command id
    cmax - destruction command id
    ctid - tuple id (page / item)
    natts - number of attributes
    infomask - tuple flags
    hoff - length of tuple header
    bits - bit map representing NULLs

Finding A Tuple Value in C

    Datum
    nocachegetattr(HeapTuple tuple,
                   int attnum,
                   TupleDesc tupleDesc,
                   bool *isnull)
    {
        HeapTupleHeader tup = tuple->t_data;
        Form_pg_attribute *att = tupleDesc->attrs;

        /* (the declarations of tp, bp, and off are not shown in this excerpt) */

        {
            int         i;

            /*
             * Note - This loop is a little tricky.  For each non-null attribute,
             * we have to first account for alignment padding before the attr,
             * then advance over the attr based on its length.  Nulls have no
             * storage and no alignment padding either.  We can use/set
             * attcacheoff until we reach either a null or a var-width attribute.
             */
            off = 0;
            for (i = 0;; i++)       /* loop exit is at "break" */
            {
                if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
                    continue;       /* this cannot be the target att */

                if (att[i]->attlen == -1)
                    off = att_align_pointer(off, att[i]->attalign, -1,
                                            tp + off);
                else
                    /* not varlena, so safe to use att_align_nominal */
                    off = att_align_nominal(off, att[i]->attalign);

                if (i == attnum)
                    break;

                off = att_addlength_pointer(off, att[i]->attlen, tp + off);
            }
        }

        return fetchatt(att[attnum], tp + off);
    }
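
The loop in the excerpt can be reduced to a toy version that handles only fixed-length attributes. The sketch below keeps the same three steps (skip NULLs, align, advance by the length) but is not PostgreSQL code: ToyAttr and the toy_* functions are hypothetical, variable-length attributes are left out, and the target attribute is assumed not to be NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy attribute description: fixed length and alignment, in bytes. */
typedef struct ToyAttr
{
    int len;
    int align;                  /* must be a power of two */
} ToyAttr;

/* Round off up to the next multiple of align. */
static int
toy_align(int off, int align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Bit set = value present, bit clear = NULL. */
static bool
toy_isnull(int attnum, const uint8_t *bits)
{
    return !(bits[attnum >> 3] & (1 << (attnum & 0x07)));
}

/*
 * Offset of attribute attnum (zero-based) within the tuple's data area.
 * NULL attributes take no storage and no padding.
 */
static int
toy_attr_offset(const ToyAttr *att, const uint8_t *bits, int attnum)
{
    int off = 0;

    assert(!toy_isnull(attnum, bits));  /* a NULL target has no offset */

    for (int i = 0;; i++)               /* loop exit is at "break" */
    {
        if (toy_isnull(i, bits))
            continue;                   /* no storage, no padding */

        off = toy_align(off, att[i].align);
        if (i == attnum)
            break;

        off += att[i].len;
    }
    return off;
}
```

With four attributes of lengths 4, 2, 1, and 4 bytes, the last one lands at offset 8 rather than 7 because of alignment padding, and making the second attribute NULL moves the third from offset 6 to offset 4.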

Value Access in C

    #define fetch_att(T,attbyval,attlen) \
    ( \
        (attbyval) ? \
        ( \
            (attlen) == (int) sizeof(int32) ? \
                Int32GetDatum(*((int32 *)(T))) \
            : \
            ( \
                (attlen) == (int) sizeof(int16) ? \
                    Int16GetDatum(*((int16 *)(T))) \
                : \
                ( \
                    AssertMacro((attlen) == 1), \
                    CharGetDatum(*((char *)(T))) \
                ) \
            ) \
        ) \
        : \
        PointerGetDatum((char *) (T)) \
    )
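
The by-value branch of the macro picks a load width from the attribute length. A standalone approximation is sketched below; toy_fetch_byval is a hypothetical function, it returns a long instead of a Datum, and it uses memcpy rather than a pointer cast so that it does not depend on the pointer being aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a pass-by-value attribute of 4, 2, or 1 bytes stored at ptr and
 * widen it to a long.
 */
static long
toy_fetch_byval(const void *ptr, int attlen)
{
    if (attlen == (int) sizeof(int32_t))
    {
        int32_t v;

        memcpy(&v, ptr, sizeof(v));
        return v;
    }
    if (attlen == (int) sizeof(int16_t))
    {
        int16_t v;

        memcpy(&v, ptr, sizeof(v));
        return v;
    }

    /* like the macro, only one-byte values remain at this point */
    assert(attlen == 1);
    return *(const char *) ptr;
}
```

Values that are not passed by value take the other branch of the macro, which hands back a pointer into the page instead of copying anything.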

Test And Set Lock Can Succeed Or Fail

[Diagram: a 1 is exchanged with the lock value, which is 0 or 1]

    Success: was 0 on exchange
    Failure: was 1 on exchange, lock already taken

Test And Set Lock x86 Assembler

    static __inline__ int
    tas(volatile slock_t *lock)
    {
        register slock_t _res = 1;

        /*
         * Use a non-locking test before asserting the bus lock.  Note that the
         * extra test appears to be a small loss on some x86 platforms and a small
         * win on others; it's by no means clear that we should keep it.
         */
        __asm__ __volatile__(
            "   cmpb    $0,%1   \n"
            "   jne     1f      \n"
            "   lock            \n"
            "   xchgb   %0,%1   \n"
            "1: \n"
    :       "+q"(_res), "+m"(*lock)
    :
    :       "memory", "cc");
        return (int) _res;
    }
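
The same exchange can be written portably with C11 atomics. This is a sketch for experimentation rather than a replacement for the assembler above; toy_slock_t, toy_tas, and toy_unlock are hypothetical names. As on the slides, a result of 0 means the lock was free and is now held, and 1 means it was already taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_flag toy_slock_t;

/*
 * Atomically set the lock and report its previous state:
 * 0 = was free (success), 1 = was already taken (failure).
 */
static int
toy_tas(toy_slock_t *lock)
{
    assert(lock != NULL);
    return atomic_flag_test_and_set(lock) ? 1 : 0;
}

/* Release the lock so that the next toy_tas() can succeed. */
static void
toy_unlock(toy_slock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear(lock);
}
```

A lock declared as toy_slock_t lock = ATOMIC_FLAG_INIT starts out free, so the first toy_tas(&lock) returns 0 and a second call before toy_unlock(&lock) returns 1.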

Spin Lock Always Succeeds

[Diagram: the same exchange as the test-and-set lock, except that a failure leads to a sleep of increasing duration and another attempt]

    Success: was 0 on exchange
    Failure: was 1 on exchange, lock already taken

Spinlocks are designed for short-lived locking operations, like

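
A spin lock built on test-and-set simply retries until the exchange succeeds, sleeping for longer and longer between rounds of attempts. The sketch below shows that shape with C11 atomics and nanosleep(). The delay bounds, the doubling rule, and the toy_* names are assumptions chosen for illustration, not PostgreSQL's values or code.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

#define TOY_MIN_DELAY_USEC 1000L        /* assumed starting sleep */
#define TOY_MAX_DELAY_USEC 1000000L     /* assumed upper bound */

/* Next sleep length: double the current one, capped at the maximum. */
static long
toy_next_delay(long delay_usec)
{
    if (delay_usec * 2 <= TOY_MAX_DELAY_USEC)
        return delay_usec * 2;
    return TOY_MAX_DELAY_USEC;
}

/*
 * Acquire the lock, retrying until it succeeds. After every
 * spins_per_delay failed attempts, sleep, then lengthen the next sleep.
 * Returns the number of failed attempts before success.
 */
static int
toy_spin_acquire(atomic_flag *lock, int spins_per_delay)
{
    long delay_usec = TOY_MIN_DELAY_USEC;
    int  failures = 0;

    assert(spins_per_delay > 0);

    while (atomic_flag_test_and_set(lock))
    {
        failures++;
        if (failures % spins_per_delay == 0)
        {
            struct timespec ts;

            ts.tv_sec = delay_usec / 1000000L;
            ts.tv_nsec = (delay_usec % 1000000L) * 1000L;
            nanosleep(&ts, NULL);
            delay_usec = toy_next_delay(delay_usec);
        }
    }
    return failures;
}
```

Unlike a bare test-and-set, the caller never sees a failure: toy_spin_acquire() returns only once the lock is held, and the holder releases it with atomic_flag_clear().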

Light Weight Locks Sleep On Lock

[Diagram: the shared memory structures from the Shared Memory slide, including Lightweight Locks and Semaphores]

Light weight locks attempt to acquire the lock, and go to sleep on a semaphore if the lock request fails. Spinlocks control access to


Database Object Locks

[Diagram: the PROC, PROCLOCK, and LOCK structures, together with the Lock Hashes]

Proc

[Diagram: an array of PROC slots marked empty, used, used, empty, used, empty, shown together with the Proc Array]

Other Shared Memory Structures

[Diagram: the shared memory structures from the Shared Memory slide: PROC, Proc Array, Lock Hashes, LOCK, PROCLOCK, Lightweight Locks, Semaphores, XLOG Buffers, CLOG Buffers, Subtrans Buffers, Multi-XACT Buffers, Two-Phase Structs, Auto Vacuum, Btree Vacuum, Statistics, Background Writer, Synchronized Scan, Shared Invalidation, Buffer Descriptors, Shared Buffers]

Conclusion

http://momjian.us/presentations

https://www.flickr.com/photos/john_getchel/