Shell Script: Delete Old Backup Files By Name Pattern
Hey guys! Ever found yourself drowning in a sea of old backup files? It's a common problem, especially when you're diligently backing up databases. Those files can quickly eat up your storage, and sifting through them manually to delete the old ones? Ain't nobody got time for that! So, let's dive into crafting a Shell/Bash script that'll automatically delete those outdated backups based on their names and a specific pattern. This is a surefire way to keep your storage tidy and your sanity intact.
Understanding the Backup File Naming Convention
Before we jump into the script, let's break down the naming convention of your backup files. You mentioned they follow this pattern: prodYYYYMMDD_HHMM.sql.gz, like prod20210528_1200.sql.gz. This is actually a pretty smart and organized way to name backups because it includes the date and time of creation right in the filename. This makes it easy to identify and sort files based on their age. We can leverage this pattern in our script to pinpoint and delete older files efficiently.
- prod: This is likely a prefix indicating the production environment or database. It's a good practice to include such prefixes to differentiate backups from different environments.
- YYYYMMDD: The year, month, and day the backup was created. For example, 20210528 means May 28, 2021.
- _: An underscore used as a separator to improve readability.
- HHMM: The hour and minute the backup was created, in 24-hour format. For instance, 1200 means 12:00 PM.
- .sql.gz: The file extension, indicating a gzipped SQL dump file. SQL dumps are a common format for database backups, and gzip compression helps reduce their size.
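To see how the pieces line up, here is a minimal Bash sketch that splits a filename into its parts using Bash's built-in =~ operator, which stores each parenthesized group in the BASH_REMATCH array. The helper name parse_backup_name is hypothetical, and the regex assumes the prefix is always prod:

```shell
#!/bin/bash
# Split a filename of the form prodYYYYMMDD_HHMM.sql.gz into its parts.
# parse_backup_name is a hypothetical helper used only for illustration.
parse_backup_name() {
    local name=$1
    # POSIX extended regex; each (...) group is captured into BASH_REMATCH.
    local re='^(prod)([0-9]{8})_([0-9]{4})\.sql\.gz$'
    if [[ $name =~ $re ]]; then
        echo "prefix=${BASH_REMATCH[1]} date=${BASH_REMATCH[2]} time=${BASH_REMATCH[3]}"
    else
        echo "not a backup file: $name" >&2
        return 1
    fi
}

parse_backup_name "prod20210528_1200.sql.gz"   # prints: prefix=prod date=20210528 time=1200
```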
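Knowing this pattern is crucial because our script will use it to identify files that match the backup naming scheme and then determine which ones are old enough to be deleted. The date +\%Y\%m\%d_\%H\%M command you mentioned is what produces the date and time in this format when each backup is created. The backslashes suggest it lives in a crontab entry, where an unescaped % has a special meaning (cron turns it into a newline); in an ordinary shell or script you would write date +%Y%m%d_%H%M. Our cleanup script uses the same date command, shifted back by the retention period, to compute the cutoff date it compares filenames against.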
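As a small sketch of how such names are produced, the function below formats a timestamp with the same date format string, plus the literal prefix and extension. It assumes GNU date (the -d option parses a given date instead of using the current time), and make_backup_name is a hypothetical helper name:

```shell
#!/bin/bash
# Build a backup filename in the prodYYYYMMDD_HHMM.sql.gz format.
# make_backup_name is a hypothetical helper; -d requires GNU date.
make_backup_name() {
    # With no argument, format the current time ("now").
    # Characters other than %-sequences (prod, .sql.gz) are printed as-is.
    date -d "${1:-now}" +'prod%Y%m%d_%H%M.sql.gz'
}

make_backup_name "2021-05-28 12:00"   # prints: prod20210528_1200.sql.gz
make_backup_name                      # name for a backup taken right now
```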
Crafting the Shell/Bash Script
Alright, let's get our hands dirty and write the script! We'll break it down step-by-step so you can understand exactly what's going on. This script will essentially do the following:
- Define Variables: We'll set variables for the backup directory, the filename pattern, and the retention period (how old a file needs to be before it's deleted).
- Calculate the Cutoff Date: Based on the retention period, we'll calculate the date before which files should be deleted.
- Find Old Backup Files: We'll use the find command to locate files that match the filename pattern, then compare the date embedded in each filename with the cutoff date.
- Delete the Old Files: Finally, we'll use the rm command to delete the identified files.
Here's the script. It relies on GNU find (for -regextype) and GNU date (for -d), which are the defaults on most Linux distributions:
#!/bin/bash
# --- Configuration ---
BACKUP_DIR="/path/to/your/backups" # Replace with the actual path to your backup directory
FILE_PATTERN='prod[0-9]{8}_[0-9]{4}\.sql\.gz' # Extended regex matching the backup filenames
RETENTION_DAYS=7 # Number of days to keep backups
# --- Calculate Cutoff Date ---
CUTOFF_DATE=$(date -d "$RETENTION_DAYS days ago" +%Y%m%d)
# --- Find and Delete Old Files ---
find "$BACKUP_DIR" -maxdepth 1 -type f -regextype posix-extended -regex ".*/${FILE_PATTERN}" -print0 |
while IFS= read -r -d '' file_path; do
    file_name=$(basename "$file_path")
    file_date=${file_name:4:8} # YYYYMMDD, right after the "prod" prefix
    if [[ "$file_date" -lt "$CUTOFF_DATE" ]]; then
        echo "Deleting: $file_path"
        rm -- "$file_path"
    fi
done
echo "Old backups cleanup complete."
Let's dissect this script piece by piece:
- #!/bin/bash: This shebang line tells the system to run the script with Bash when you execute it directly (the execute permission itself comes from chmod, covered below).
- # --- Configuration ---: A comment that groups related settings, making the script more readable.
- BACKUP_DIR="/path/to/your/backups": The directory where your backup files are stored. Remember to replace /path/to/your/backups with your actual directory!
- FILE_PATTERN='prod[0-9]{8}_[0-9]{4}\.sql\.gz': A POSIX extended regular expression that matches your backup filenames. It's in single quotes so the shell passes the backslashes through untouched. Let's break down the regex:
  - prod: Matches the literal string "prod".
  - [0-9]{8}: Matches exactly 8 digits (YYYYMMDD).
  - _: Matches the underscore separator.
  - [0-9]{4}: Matches exactly 4 digits (HHMM).
  - \.sql\.gz: Matches the file extension. The dots are escaped because an unescaped . matches any character.
- RETENTION_DAYS=7: Keeps backups for 7 days. You can adjust this to your desired retention period.
- # --- Calculate Cutoff Date ---: Another comment block for organization.
- CUTOFF_DATE=$(date -d "$RETENTION_DAYS days ago" +%Y%m%d): This is the magic line that calculates the cutoff date. GNU date's -d option accepts relative dates such as "7 days ago", and +%Y%m%d formats the result as YYYYMMDD, the same layout as the date in the filenames. The result is stored in the CUTOFF_DATE variable.
- # --- Find and Delete Old Files ---: You guessed it, another comment block!
- find "$BACKUP_DIR" -maxdepth 1 -type f -regextype posix-extended -regex ".*/${FILE_PATTERN}" -print0: This find command does the heavy lifting:
  - "$BACKUP_DIR": Specifies the directory to search in.
  - -maxdepth 1: Limits the search to that directory, so find doesn't descend into subdirectories.
  - -type f: Matches regular files only.
  - -regextype posix-extended: Interprets -regex patterns as extended regular expressions, which is why {8} and {4} need no backslashes.
  - -regex ".*/${FILE_PATTERN}": find matches the regex against the whole path, not just the filename, so the leading .*/ absorbs the directory part.
  - -print0: Prints each path followed by a NUL character, so filenames containing spaces or newlines can't be split apart.
- The while loop processes each path that find prints:
  - IFS= read -r -d '' file_path: Reads one NUL-terminated path at a time, exactly as find printed it.
  - file_name=$(basename "$file_path"): Strips the directory part, leaving e.g. prod20210528_1200.sql.gz.
  - file_date=${file_name:4:8}: Bash substring expansion that takes the 8 characters starting at offset 4 (right after "prod"), which is the YYYYMMDD date. The script goes by this date in the filename, not by the file's modification time.
  - if [[ "$file_date" -lt "$CUTOFF_DATE" ]]: Compares the two dates as integers. Because both are fixed-width YYYYMMDD numbers, a smaller number always means an earlier date. If the file's date is older than the cutoff date, the following block is executed.
  - echo "Deleting: $file_path": Prints a message indicating which file is being deleted. This is a good practice for logging and debugging.
  - rm -- "$file_path": Deletes the file; the -- ends option parsing so a path can never be mistaken for an rm option. This is the point of no return! Make sure you've tested the script thoroughly before running it on important backups.
- echo "Old backups cleanup complete.": Prints a message indicating that the cleanup process is finished.
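Because the loop's decision depends only on the filename, you can check the regex and the date comparison without touching any files. The sketch below applies the same logic to a few sample names against a fixed cutoff; the helper name would_delete and the cutoff 20210601 are made up for illustration:

```shell
#!/bin/bash
# Apply the cleanup script's matching and comparison logic to a single name.
# would_delete is a hypothetical helper; it never touches the filesystem.
FILE_PATTERN='prod[0-9]{8}_[0-9]{4}\.sql\.gz'

would_delete() {
    local file_name=$1 cutoff=$2
    # Anchor the regex so the whole name must match, as find -regex does.
    local re="^${FILE_PATTERN}\$"
    [[ $file_name =~ $re ]] || return 1
    local file_date=${file_name:4:8}   # YYYYMMDD after the "prod" prefix
    [[ $file_date -lt $cutoff ]]
}

for name in prod20210528_1200.sql.gz prod20210605_0030.sql.gz prod2021_old.sql.gz; do
    if would_delete "$name" 20210601; then
        echo "delete: $name"
    else
        echo "keep:   $name"
    fi
done
```

Running it prints "delete" for the first name and "keep" for the other two: the second is newer than the cutoff, and the third doesn't match the pattern at all.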
Making the Script Executable and Running It
Once you've saved the script (let's say as cleanup_backups.sh), you need to make it executable. Open your terminal and run the following command:
chmod +x cleanup_backups.sh
This command adds execute permissions to the script file. Now you can run the script by typing:
./cleanup_backups.sh
The script will then go through the process of finding and deleting old backup files, printing messages as it goes. Remember to double-check the output and your backup directory after the first run to ensure everything went as planned.
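Before that first real run, it can help to try the same find expression on a scratch directory of dummy files. A minimal sketch, assuming GNU find (for -regextype and -printf); list_backups is a hypothetical helper name:

```shell
#!/bin/bash
# List the files that the cleanup script's find expression would match.
# list_backups is a hypothetical helper; -regextype and -printf are GNU find.
list_backups() {
    find "$1" -maxdepth 1 -type f -regextype posix-extended \
        -regex '.*/prod[0-9]{8}_[0-9]{4}\.sql\.gz' -printf '%f\n' | sort
}

# Build a throwaway directory with dummy backups and one unrelated file.
scratch_dir=$(mktemp -d)
touch "$scratch_dir/prod20210528_1200.sql.gz" \
      "$scratch_dir/prod20210605_0030.sql.gz" \
      "$scratch_dir/notes.txt"

list_backups "$scratch_dir"   # lists the two prod*.sql.gz files, not notes.txt
# Clean up afterwards with: rm -r "$scratch_dir"
```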
Scheduling the Script with Cron
To automate the backup cleanup process, you can schedule the script to run regularly using cron. Cron is a time-based job scheduler in Unix-like operating systems. To schedule a cron job, you need to edit the crontab file. Open your terminal and type:
crontab -e
This will open the crontab file in a text editor. Add a line like the following to schedule the script to run every day at 1:00 AM:
0 1 * * * /path/to/your/cleanup_backups.sh
Let's break down this cron expression:
- 0: Minute (0-59)
- 1: Hour (0-23)
- *: Day of the month (1-31)
- *: Month (1-12)
- *: Day of the week (0-6, where 0 is Sunday)
- /path/to/your/cleanup_backups.sh: The full path to your script.
Remember to replace /path/to/your/cleanup_backups.sh with the actual path to your script. Save the crontab file, and cron will take care of running the script according to your schedule.
Important Considerations and Best Practices
Before you unleash this script on your precious backups, let's talk about some crucial considerations and best practices:
- Testing is Key: Always, always, always test the script thoroughly in a non-production environment before running it on your live backups. Create a test directory, populate it with some dummy backup files, and run the script to see if it behaves as expected. This will save you from potential data loss nightmares.
- Double-Check the BACKUP_DIR: Make absolutely sure that the BACKUP_DIR variable is set correctly. If it's pointing to the wrong directory, you might end up deleting files you didn't intend to delete.
- Understand the FILE_PATTERN: The regular expression in FILE_PATTERN is the heart of the script's file identification logic. Make sure it accurately matches your backup filenames and doesn't accidentally match other files.
- Dry Run Mode: Consider adding a "dry run" mode to your script. This mode would print the files that would be deleted without actually deleting them, which is a great way to preview the script's actions before committing to them. You can implement this using a command-line argument (e.g., -n or --dry-run) and an if statement to conditionally execute the rm command.
- Logging: Implement proper logging in your script. Write messages to a log file indicating which files were deleted and when. This will help you track the script's activity and troubleshoot any issues.
- Error Handling: Add error handling to your script. Check for potential errors, such as the backup directory not existing or the rm command failing, and handle them gracefully. This might involve printing error messages, sending email notifications, or exiting the script with a non-zero exit code.
- Backup Your Backups (Yes, Really!): It might sound like overkill, but it's always a good idea to have a backup of your backups. Consider using a separate backup system or offsite storage to protect your backups from accidental deletion or other disasters.
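Putting the dry-run, logging, and error-handling suggestions together might look like the sketch below. It's one possible layout rather than a finished tool: the function name cleanup_backups, the -n/--dry-run flag, the LOG_FILE variable and its /tmp default are illustrative assumptions, and it still requires GNU find and GNU date. Settings are read from environment variables so the function can be exercised against a test directory:

```shell
#!/bin/bash
# Sketch: cleanup with a dry-run flag, a log file and basic error handling.
# cleanup_backups, LOG_FILE and -n/--dry-run are illustrative names, not part
# of the original script. Requires GNU find and GNU date.
cleanup_backups() {
    local dry_run=0
    case ${1:-} in
        -n|--dry-run) dry_run=1 ;;
    esac

    # Settings can be overridden from the environment; defaults mirror the script.
    local backup_dir=${BACKUP_DIR:-/path/to/your/backups}
    local retention_days=${RETENTION_DAYS:-7}
    local log_file=${LOG_FILE:-/tmp/cleanup_backups.log}
    local pattern='prod[0-9]{8}_[0-9]{4}\.sql\.gz'

    # Fail loudly if the directory is missing instead of silently doing nothing.
    if [[ ! -d $backup_dir ]]; then
        echo "error: backup directory not found: $backup_dir" >&2
        return 1
    fi

    local cutoff_date
    cutoff_date=$(date -d "$retention_days days ago" +%Y%m%d) || return 1

    local status=0 file_path file_name
    # Process substitution (not a pipe) keeps the loop in the current shell,
    # so changes to $status are still visible after the loop ends.
    while IFS= read -r -d '' file_path; do
        file_name=${file_path##*/}
        [[ ${file_name:4:8} -lt $cutoff_date ]] || continue
        if (( dry_run )); then
            echo "Would delete: $file_path"
        elif rm -- "$file_path"; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') deleted $file_path" >> "$log_file"
        else
            echo "error: could not delete $file_path" >&2
            status=1
        fi
    done < <(find "$backup_dir" -maxdepth 1 -type f -regextype posix-extended \
                 -regex ".*/$pattern" -print0)

    return "$status"
}
```

Saved as a script that ends with cleanup_backups "$@", you could preview a run with BACKUP_DIR=/some/test/dir ./cleanup_backups.sh --dry-run and drop the flag once the output looks right.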
Conclusion
So there you have it, guys! A comprehensive guide to crafting a Shell/Bash script for deleting old backup files based on their names and a specific pattern. This script can be a lifesaver when it comes to managing your storage and keeping your backups organized. Just remember to test thoroughly, understand the script's logic, and implement best practices to ensure the safety of your data. Happy scripting!