Remove duplicate files …

Remove duplicates between two directories, e.g. to_keep and old_backup_to_check

First run fdupes -r, which prints blocks of duplicate files separated by blank lines

fdupes -r to_keep old_backup_to_check > dupes.txt
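fdupes prints each set of duplicates as a block of paths separated by a blank line, so dupes.txt looks roughly like this (the paths are illustrative, not from a real run):

```text
to_keep/photos/img_001.jpg
old_backup_to_check/photos/img_001.jpg

to_keep/doc.pdf
old_backup_to_check/doc.pdf
old_backup_to_check/copy_of_doc.pdf
```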

Then a small Python script checks whether at least one file in each block is from to_keep; if so, it removes all the duplicates from old_backup_to_check

import os

filename = "./dupes.txt"
to_keep = "to_keep"

def handle_block(files):
    """Process one block of duplicate paths from fdupes output."""
    if any(f.startswith(to_keep) for f in files):
        for f in files:
            if f.startswith(to_keep):
                continue
            if os.path.exists(f):
                print("delete:", f)
                # uncomment the next line to actually remove
                # os.remove(f)
            else:
                print("the file:", f, "does not exist")
    else:
        print("old backup only:", files)

with open(filename, 'r', encoding='UTF-8') as file:
    files = []
    for line in file:
        line = line.strip()  # remove spaces and '\n'
        if line:
            files.append(line)
        else:
            if files:
                handle_block(files)
            files = []
    if files:  # handle the last block if the file has no trailing blank line
        handle_block(files)

Then, to remove the empty directories left behind, run

fdfind -td -te . old_backup_to_check -x rmdir

as many times as needed (rmdir only deletes a directory once it is empty, so nested empty directories take several passes).
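If fd/fdfind is not available, a single pass of find achieves the same result, because -depth visits children before their parents, so nested empty directories fall in one go (the demo paths below are illustrative):

```shell
# Alternative sketch using find instead of fdfind.
# Create some illustrative nested empty directories for the demo.
mkdir -p demo/old_backup_to_check/a/b/c

# -depth removes c, then b (now empty), then a, then the root.
find demo/old_backup_to_check -depth -type d -empty -exec rmdir {} \;

test ! -d demo/old_backup_to_check && echo "all empty dirs removed"
```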
