When first starting with Postgres, I read a great book Learn Postgres, which starts with a chapter covering basic terms and concepts with Postgres. Here are some things I found helpful to know, especially for someone coming from a different database system that’s trying to learn Postgres.
Terms:
Cluster: Equivalent to a SQL Server instance, an instance of one or more databases.
Superuser: Similar to dbo in SQL Server, a user that can perform any action with the instance. There can be any number of superusers.
Postmaster: Single process that listens for client connections. The postmaster then creates a new backend process for that connection.
Fork: Clone of a running process. Each client database connection gets a new process.
Vacuum: “Dead” rows are left behind when a record is updated or deleted. The VACUUM process will remove these rows to reclaim the space.
Template Database: Similar to model database in SQL Server – The database that’s copied when a new database is created.
Tablespace: Allows database files to be stored in a different location (Different partition, disk, etc.)
TOAST (The Oversized-Attribute Storage Technique): For storing data too large to fit into a normal row, either by compressing the data or splitting the data into separate parts.
OID: Object Identifier – Unique integer representing a database object. A table can be created with the option
WITH (OIDS = TRUE)
to generate a unique identifier for each row in a table as well.
Storage:
Tables are stored in separate files.
Data stored in 8k pages.
Page has a header, followed by row pointers to data at end of page.
Sub-processes:
CheckPointer: Executes checkpoints
Background Writer: Push data from memory to disk
WalWriter: Writes to the Write Ahead Log
Concurrency:
Postgres uses MVCC (Multi-Version Concurrency Control).
Query Processing: Steps
Parser: Generates Parse Tree – Checks Syntax
Analyzer: Generates Query Tree – Semantic check (Do tables exists, etc.)
Rewriter: If Rules have been created, the statement may need to be rewritten to implement those.
Planner: Generates Plan Tree – Optimize query
Executor: Runs query
PG_DATA Files: Files in Data directory
PG_VERSION: The major version of the Postgres cluster.
pg_hba.conf: HBA = Host Based Authentication. Client authentication configuration.
postgresql.conf: General configuration for the cluster.
postgresql.auto.conf: System configuration set by ALTER SYSTEM commands.
pg_ident.conf: User mapping. This can be used to set up Windows Authentication In Postgres.
postmaster.pid: The Process ID for the postmaster process.
If configuration files are updated, the system either needs to be restarted, or the configs reloaded with the pg_reload_conf command.
Databases:
Users specify a database when they connect to a cluster.
When the cluster is created, three databases are created: postgres, template0, template1.
Postgres is the default database.
Template1 is copied to create any new databases, so any objects that template1 contains will be included in the new DB. Template0 is not meant to be modified, it is to be a pristine copy of the template.
Backups:
Logical: Generates SQL statements to recreate data. Based on a snapshot of the data. Can restore to a different version of PG. Use pg_dump and pg_restore utilities.
Physical: Copy the actual data file, along with the log. Can allow for a point in time restore. Have to stop database service to get consistent backup.
Replication:
Logical Replication: Recreate commands to apply to secondary – Can replicate only selected databases or tables from cluster. Publish and subscribe model. Can replicate between different major versions of Postgres. Doesn’t replicate schema changes.
Physical/Streaming Replication: Applies WAL logs to replica. Synchronous or asynchronous.
Users and Roles:
A Role can represent either a User or a User Group (Combines the User and Role concepts from SQL Server). A Role can have the ability to login to the DB, or not have that ability (For when the Role represents a User Group).
Configuration Files:
pg_hba.conf: HBA = Host-Based Authentication – Users, IP Addresses, databases, and authentication methods (Password, Certificate, LDAP, etc).
pg_ident.conf: User name mapping – Usually OS names to Postgres user names.
postgresql.conf: Server configuration
postgresql.auto.conf: Overrides to postgresql.conf – Managed by Postgres, not meant to be edited manually.
Links:
PostgreSQL Terminology
Rach Belaid – Introduction To Postgres Physical Storage
Several Nines – Understanding Postgresql Architecture
Ottertune – The part of Postgresql we hate the most
Ottertune – Yes, Postgresql has problems. But we’re sticking with it