Simplify Range Partition Management in AWS RDS Aurora PostgreSQL with pg_partman and pg_cron
Range partitioning is a powerful technique for managing large datasets in a PostgreSQL database. It allows you to divide your data into multiple smaller and more manageable partitions based on a specified range of values. However, setting up and managing range partitions manually can be complex and time-consuming. In this blog post, I will explore how to simplify range partition management in AWS RDS Aurora PostgreSQL using the pg_partman extension.
What is pg_partman?
pg_partman is a PostgreSQL extension that provides native partitioning capabilities with additional features for easier partition management. It simplifies the process of creating, maintaining, and managing partitions in a PostgreSQL database.
Key Capabilities of pg_partman
- Automatic partition creation for time-based partitioning (e.g., monthly partitions).
- Pre-creation of a default partition to capture data outside specified ranges.
- Intelligent retention management for automatically dropping older partitions.
- Dynamic partition maintenance for operations like splitting, merging, and attaching new partitions.
- Monitoring and reporting functions to gather insights into partition sizes and query performance.
If you want to read more about pg_partman here is the pg_partman GitHub Repository
Installing pg_partman in AWS RDS Aurora PostgreSQL
To install pg_partman in AWS RDS Aurora PostgreSQL, follow these steps:
- Connect to your RDS Aurora PostgreSQL instance using a client tool.
- It’s recommended to create the separate schema for pg_partman. To create the schema and install pg_partman extension execute following commands:
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman WITH SCHEMA partman;
Setting up Range Partitioning with pg_partman
Setting up range partitioning with pg_partman involves the following steps:
Choose a column in your table that will be used as the partition key, such as a date or numeric column, and Create the initial parent table manually.
In below example we’re creating the sales table which partitioned by the sale_date
CREATE TABLE public.sales_data (
id SERIAL,
sale_date DATE,
amount NUMERIC
) PARTITION BY RANGE (sale_date);
Parent partitioned table sales_data is created, now let’s configure pg_partman to manage the child partition of sales_data with create_parent function.
SELECT partman.create_parent(
p_parent_table => 'public.sales_data',
p_control => 'sale_date',
p_type => 'native',
p_interval => 'monthly',
p_premake => 6,
p_start_partition => '2022-04-01'
);
Let’s break down each parameter of the create_parent
function in a simpler language:
p_parent_table
: This is the main table that will be partitioned. It should already exist and include the schema name.p_control
: This parameter determines the column that will be used for partitioning. It should be a column with either integer or time-based data types.p_type
: You can choose between two options: 'native' or 'partman'. 'Native' provides better performance and flexibility, while 'partman' relies on inheritance.p_interval
: This specifies the interval or range for each partition. For example, you can use 'monthly', 'daily', or even specify an integer range.p_premake
: This parameter determines the number of partitions to create in advance to accommodate future data inserts.p_start_partition
: This optional parameter specifies the start date or value for the first partition to be created. It allows you to define a specific starting point for your partitions based on your data. For example, if you setp_start_partition
to '2022-04-01', the first partition will be created starting from April 1, 2022.
We have successfully set up pg_partman to manage the partitions of the sales_data table. You can view the configuration in the partman.part_config table. To see all the configuration details, run the following command:
SELECT * from partman.part_config;
If you want to update the existing pg_partman configuration for a specific table, such as changing the retention period for the child tables of sales_data, you can use the following command:
UPDATE partman.part_config
SET infinite_time_partitions = true,
retention = '3 months',
retention_keep_table=true
WHERE parent_table = 'public.sales_data';
By executing this command, you can modify the retention settings to retain data for a specific duration, such as 3 months, and ensure that the child tables are retained while performing partition maintenance.
To ensure the regular maintenance of partitions, we need to periodically execute the run_maintenance_proc
function provided by pg_partman. To automate this process, we can set up a cron job using the pg_cron extension.
Install pg_cron extension in aurora postgres
To install the pg_cron extension in AWS Aurora PostgreSQL, follow these steps:
- Modify the cluster parameter group associated with your PostgreSQL DB instance. Add “pg_cron” to the value of the “shared_preload_libraries” parameter. This parameter controls the shared libraries that are loaded when the database starts.
- Restart your PostgreSQL DB instance to apply the changes made to the parameter group. This will ensure that the pg_cron extension is loaded and available for use.
- Run flowing command to install pg_cron extention
CREATE EXTENSION pg_cron;
Please note that by default, the pg_cron extension saves all its configuration in the PostgreSQL database itself. However, if you want to change the default database where pg_cron stores its data, you can do so by modifying the "cron.database_name" parameter in the cluster parameter group associated with your database instance.
Configure pg_cron to run run_maintenance_proc
periodically
- Connect to the PostgreSQL database of your Aurora cluster.
- Run the following command to schedule the execution of the run_maintenance_proc function on a monthly basis:
SELECT cron.schedule('@monthly', $$CALL partman.run_maintenance_proc()$$);
By executing this command, pg_cron will be configured to automatically execute the run_maintenance_proc
function every month. This function is responsible for performing maintenance tasks on the partitioned tables managed by pg_partman.
As mentioned before, pg_cron stores its configuration in the default database “postgres”. However, if you want to change the configuration and run the cron job in a specific database like “sales”, you can do so by executing the following command:
UPDATE cron.job SET database = 'sales' WHERE jobid = 1;
This setup ensures that partition maintenance is regularly performed without manual intervention, helping to optimize the performance and organization of your partitioned data.
Congratulations!!! 🥳
By utilizing the capabilities of pg_partman and pg_cron extensions in PostgreSQL, you can automate partition management tasks, such as creation, retention, and maintenance, while scheduling their execution at regular intervals. This simplifies data organization, enhances query performance, and improves overall database efficiency.