As for every transactional database, disk I/O is the main limiting factor for PostgresSQL. If you plan to deploy a high-usage database please take some precautions on the used storage. Use either a RAID 1 or 10 and choose a FS that does fast block-IO (ext2).
Accounts and permissions
When starting with a fresh database, you've to create some users first before you can start using it.
su - postgresql createuser -P <user> createdb -O <user> <db>
Also be sure to tune the pg_hba.conf in the data-directory to your needs. To emulate MySQLs default "every user needs to authenticate from everywhere"-semantics use the following config:
local all all md5 host all all 127.0.0.1/32 md5 host all all ::1/128 md5
Performance and Tuning
If you're getting sequential scans despite having indexes and have run a vacuum analyze recently you might need to increase the default_statistics_target.
Since PostgresSQL is a transactional database, old rows don't get actually removed/replaced when you update/delete them (since they might be still needed in older/long running transactions). To actually free them you need to issue a vacuum.
A normal vacuum will only mark deprecated rows for reuse, to actually reclaim diskspace (e.g. when having deleted large amounts of data) you need to issue a full vacuum. Please note that it might be faster to backup the data you want to keep and truncate the table if you plan to remove large portions of a table.
Transaction ID Wraparound
A regularly run plain or "lazy" vacuum should prevent Transaction ID Wraparound in all cases, a full vacuum is not required.
You can check your TXID counters with the following query:
SELECT datname, age(datfrozenxid) FROM pg_database;
If the age is noticeably higher than 1 Billion (2^30 or 1073741824 to be exact) after a recent vacuum something is probably not working as intended; if it should be anywhere near 2 billion you should take immediate action to prevent problems.
Fixing broken databases
set zero_damaged_pages to on; vacuum; pray;
If you want to know which table in your database claims the most diskspace, here is a query that returns the size of the tables from the current database.
A block is 8192 bytes therefore I calculate the kb size in an extra column.
UPDATE: I calculate the size to bytes, so I can use the "pg_size_pretty" function to convert it to a human readable form (KB, MB, GB, ...)
relfilenode column holds the file name for this table / data. You can find it in the data directory from postgres (main/).
The relkind column holds the type of the data and reltuples the count of rows in this table.
SELECT relname, pg_size_pretty(relpages * 8 *1024) as size, relfilenode, relkind, reltuples, relpages FROM pg_class ORDER BY relpages DESC;
relnames starting with pg_toast are TOAST-storage for large tables. Compare the appended number with the relfilenodes to get the associated table.
If you have a lot of huge data, thre will be a lot of toast tables. With this query you get an additional field that shows the toast table name if this table has one, and the original table for each toast table:
SELECT relname, pg_size_pretty(relpages * 8 *1024) AS size, CASE WHEN relkind = 't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE pgd.relfilenode = SUBSTRING(pg.relname FROM 10)) ELSE (SELECT pgc.relname FROM pg_class pgc WHERE pg.reltoastrelid = pgc.relfilenode) END AS refrelname, relfilenode, relkind, reltuples, relpages FROM pg_class pg ORDER BY relpages DESC;
relname | relfilenode | relkind | reltuples | relpages | relpages_kb -----------------------------------------+-------------+---------+-------------+----------+------------- pg_toast_16496 | 16499 | t | 6.74842e+06 | 1684158 | 13473264 eintrag | 16510 | r | 3.97601e+06 | 271484 | 2171872 admin_log | 16496 | r | 9.49351e+06 | 248608 | 1988864 history | 16654 | r | 1.98684e+07 | 204714 | 1637712 ctimes | 16695 | r | 1.36451e+07 | 189826 | 1518608
More about admin functions regarding databasize and table size can be found in the System Admin Functions under the Table 9-48.
To gracefully cancel queries, one can use pg_cancel_backend(). This should be safer than SIGTERMing the according process.
Accessing the Database
psql and other CLI programs
psql password prompts
psql has, unlike the mysql client, no option to supply a password on the command line (which would be insecure). There are two solutions for this problem:
- environment variables:
export PGPASSWORD=password export PGUSER=username export PGHOST=host
- a ~/.pgpass file