How do you find the row count for all your tables in Postgres?
I'm looking for a way to find the row count for all my tables in Postgres. I know I can do this one table at a time with:
SELECT count(*) FROM table_name;
but I'd like to see the row count for all the tables and then order by that to get an idea of how big all my tables are.
There are three ways to get this sort of count, each with its own tradeoffs.
If you want a true count, you have to execute a SELECT statement like the one you used against each table. This is because PostgreSQL keeps row visibility information in the row itself, not anywhere else, so any accurate count can only be relative to some transaction. You're getting a count of what that transaction sees at the point in time when it executes. You could automate this to run against every table in the database, but you probably don't need that level of accuracy or want to wait that long.
The second approach notes that the statistics collector tracks roughly how many rows are "live" (not deleted or obsoleted by later updates) at any time. This value can be off by a bit under heavy activity, but is generally a good estimate:
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
That can also show you how many rows are dead, which is itself an interesting number to monitor.
The third way relies on the ANALYZE command, which the autovacuum process has run regularly since PostgreSQL 8.3 to update table statistics, and which also computes a row estimate. You can grab that one like this:
SELECT nspname AS schemaname, relname, reltuples
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND relkind = 'r'
ORDER BY reltuples DESC;
Which of these queries is better to use is hard to say. Normally I make that decision based on whether there's more useful information I also want to use inside of pg_class or inside of pg_stat_user_tables. For basic counting purposes just to see how big things are in general, either should be accurate enough.
Here is a solution that does not require functions to get an accurate count for each table:
select table_schema,
       table_name,
       (xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
  select table_name, table_schema,
         query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
  from information_schema.tables
  where table_schema = 'public' --<< change here for the schema you want
) t
query_to_xml will run the passed SQL query and return an XML document with the result (the row count for that table). The outer xpath() will then extract the count from that XML and convert it to a number. xpath() returns an array, so the first element is selected with [1] before the cast.
The derived table is not really necessary, but makes the xpath() a bit easier to understand; otherwise the whole query_to_xml() call would need to be passed to the xpath() function.
To get estimates, see Greg Smith's answer.
To get exact counts, the other answers so far are plagued with some issues, some of them serious (see below). Here's a version that's hopefully better:
CREATE FUNCTION rowcount_all(schema_name text default 'public')
  RETURNS table(table_name text, cnt bigint) as
$$
declare
  table_name text;
begin
  for table_name in SELECT c.relname FROM pg_class c
    JOIN pg_namespace s ON (c.relnamespace=s.oid)
    WHERE c.relkind = 'r' AND s.nspname=schema_name
  LOOP
    RETURN QUERY EXECUTE format('select cast(%L as text),count(*) from %I.%I',
       table_name, schema_name, table_name);
  END LOOP;
end
$$ language plpgsql;
It takes a schema name as a parameter, or public if no parameter is given.
To work with a specific list of schemas or a list coming from a query without modifying the function, it can be called from within a query like this:
WITH rc(schema_name,tbl) AS (
  select s.n,rowcount_all(s.n) from (values ('schema1'),('schema2')) as s(n)
)
SELECT schema_name,(tbl).* FROM rc;
This produces three-column output with the schema, the table, and the row count.
Now here are some issues in the other answers that this function avoids:
Table and schema names shouldn't be injected into executable SQL without being quoted, either with quote_ident or with the more modern format() function with its %I format string. Otherwise some malicious person may name their table tablename;DROP TABLE other_table, which is perfectly valid as a table name.
Even without the SQL injection and funny-character problems, table names may exist in variants differing only by case. If a table is named ABCD and another one abcd, the SELECT count(*) FROM... must use a quoted name, otherwise it will skip ABCD and count abcd twice. The %I of format does this automatically.
information_schema.tables lists custom composite types in addition to tables, even when table_type is 'BASE TABLE' (!). As a consequence, we can't iterate on information_schema.tables, otherwise we risk running select count(*) from name_of_composite_type, and that would fail. OTOH pg_class where relkind='r' should always work fine.
The type of COUNT() is bigint, not int. Tables with more than 2.15 billion rows may exist (running a count(*) on them is a bad idea, though).
A permanent type need not be created for a function to return a result set with several columns. RETURNS TABLE(definition...) is a better alternative.
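The quoting rule behind the first two points can also be sketched outside the database, for anyone generating these statements from a script. This is only an illustration: quote_ident_sh and count_query are hypothetical helper names, not PostgreSQL functions, and unlike quote_ident or %I (which quote only when needed) the sketch always wraps the name in double quotes and doubles any embedded double quote.

```shell
#!/usr/bin/env bash

# Hypothetical helper (assumption, not part of PostgreSQL): quote a name as
# an SQL identifier by doubling embedded double quotes and wrapping the
# result in double quotes.
quote_ident_sh() {
  local name=$1
  local q='"'
  printf '"%s"' "${name//$q/$q$q}"
}

# Build one exact-count statement for schema $1 and table $2.
count_query() {
  printf 'SELECT count(*) FROM %s.%s;' \
    "$(quote_ident_sh "$1")" "$(quote_ident_sh "$2")"
}

# The hostile name stays inside a single quoted identifier, and ABCD keeps
# its case instead of being folded to abcd.
count_query public 'tablename;DROP TABLE other_table'; echo
count_query public ABCD; echo
```

Inside PL/pgSQL, format() with %I remains the simpler choice; a helper like this only matters when the SQL text is assembled by the client.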
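If you don't mind potentially stale data, you can estimate the count from the cumulative insert and delete counters kept by the statistics collector. These counters restart from zero whenever the statistics are reset, so treat the result as approximate.
SELECT relname, n_tup_ins - n_tup_del AS rowcount
FROM pg_stat_all_tables;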
The hacky, practical answer for people trying to evaluate which Heroku plan they need and can't wait for Heroku's slow row counter to refresh:
Basically you want to run \dt in psql and copy the results to your favorite text editor. They will look like this:
 public | auth_group                 | table | axrsosvelhutvw
 public | auth_group_permissions     | table | axrsosvelhutvw
 public | auth_permission            | table | axrsosvelhutvw
 public | auth_user                  | table | axrsosvelhutvw
 public | auth_user_groups           | table | axrsosvelhutvw
 public | auth_user_user_permissions | table | axrsosvelhutvw
 public | background_task            | table | axrsosvelhutvw
 public | django_admin_log           | table | axrsosvelhutvw
 public | django_content_type        | table | axrsosvelhutvw
 public | django_migrations          | table | axrsosvelhutvw
 public | django_session             | table | axrsosvelhutvw
 public | exercises_assignment       | table | axrsosvelhutvw
Then run a regex search and replace, searching for:
^[^|]*\|\s+([^|]*?)\s+\| table \|.*$
and replacing every match with:
select '\1', count(*) from \1 union
which will yield you something very similar to this:
select 'auth_group', count(*) from auth_group union
select 'auth_group_permissions', count(*) from auth_group_permissions union
select 'auth_permission', count(*) from auth_permission union
select 'auth_user', count(*) from auth_user union
select 'auth_user_groups', count(*) from auth_user_groups union
select 'auth_user_user_permissions', count(*) from auth_user_user_permissions union
select 'background_task', count(*) from background_task union
select 'django_admin_log', count(*) from django_admin_log union
select 'django_content_type', count(*) from django_content_type union
select 'django_migrations', count(*) from django_migrations union
select 'django_session', count(*) from django_session
;
(You'll need to remove the last union and add the semicolon at the end manually.)
Run it in psql and you're done.
            ?column?            | count
--------------------------------+-------
 auth_group_permissions         |     0
 auth_user_user_permissions     |     0
 django_session                 |  1306
 django_content_type            |    17
 auth_user_groups               |   162
 django_admin_log               |  9106
 django_migrations              |    19
 [..]
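If the editor step gets tedious, the same transformation can be scripted. The following is a rough sketch, not part of the answer above: dt_to_count_query is a hypothetical helper that reads \dt-style rows on standard input and prints a single query. It swaps in union all for union so no trailing keyword has to be removed by hand, and it double-quotes each table name so mixed-case names resolve. It assumes names contain no quote characters.

```shell
#!/usr/bin/env bash

# Hypothetical helper (assumption): turn \dt-style rows
# ("schema | name | table | owner") read from stdin into one
# UNION ALL query that counts every listed table.
dt_to_count_query() {
  awk -F'|' '
    NF >= 3 {
      name = $2; type = $3
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the table name
      gsub(/^[ \t]+|[ \t]+$/, "", type)   # trim the type column
      if (type != "table") next           # skip headers and non-table rows
      printf "%sselect '\''%s'\'', count(*) from \"%s\"", sep, name, name
      sep = "\nunion all\n"
    }
    END { if (sep != "") print ";" }
  '
}

# Example with two rows copied from the listing above.
dt_to_count_query <<'EOF'
 public | auth_group             | table | axrsosvelhutvw
 public | auth_group_permissions | table | axrsosvelhutvw
EOF
```

The generated text can then be pasted into psql or piped to it, in the same way as the hand-edited version.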
Not sure if an answer in bash is acceptable to you, but FWIW...
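#!/bin/bash
# Connection settings from the example; adjust them for your environment.
# PGPASSWORD is exported once so that every psql call below can use it.
export PGPASSWORD=test
PSQL=(psql -h localhost -U fred -d mydb -At)

# List the base tables in the public schema, one name per line.
TABLENAMES=$("${PSQL[@]}" -c "
  SELECT table_name
  FROM information_schema.tables
  WHERE table_type='BASE TABLE' AND table_schema='public'
")

# Count each table. The name is double-quoted so mixed-case names resolve;
# names that themselves contain quote characters are not handled here.
while IFS= read -r TABLENAME; do
  [ -n "$TABLENAME" ] || continue
  "${PSQL[@]}" -c "SELECT '$TABLENAME', count(*) FROM public.\"$TABLENAME\"" </dev/null
done <<< "$TABLENAMES"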