Securely Renaming a Massive Amount of Files with PHP – Part 3

Part One: Reading a folder structure recursively.
Part Two: Renaming the files (uniquely) and storing new and old file paths.
Part Three: Copying the files into a sequenced folder system.

This is the third and final part of my tutorial on renaming and copying files with new secure names. In the first two parts we covered:

  • how to set up PHP locally
  • how to call our PHP script from the command line
  • how to recursively read a directory
  • how to store file names in an array
  • how to create UNIQUE hashed versions of the file names, and store them as the array keys.

Now all that is left is copying the files and logging the new names paired with the old names. You could skip the copying and simply rename the files in place, but I DO NOT RECOMMEND this approach. A single error while the script runs could destroy all record of which file is which, and that would be disastrous.


Inside our __construct() function I will call copy_files() after the directory has been fully read and stored in our $file_names array.

    public function copy_files() {

        if( ! is_dir( NEW_DIR ) ) {
            if ( ! mkdir( NEW_DIR ) ) die('Failed to make '. NEW_DIR );
        }

        // Count for # of files in current directory
        $counter = 0;
        // Directory number
        $sub_dir_num = 1;

        // Folder name - This will add 0's to the beginning of our folder counter to
        // always have 4 digits (0001, 0002, etc.)
        $sub_dir = str_pad($sub_dir_num, 4, "0", STR_PAD_LEFT);
        echo $sub_dir;

        if( ! is_dir( NEW_DIR . '\\' . $sub_dir ) ) {
            if ( ! mkdir( NEW_DIR . '\\' . $sub_dir ) ) die('Failed to make '. NEW_DIR. '\\' . $sub_dir );
        }

        foreach( self::$file_names as $file_key => $old_file ) {
            // Create the new file path
            $new_file = NEW_DIR . "\\" . $sub_dir . "\\" . $file_key . self::get_ext( $old_file );
            
            // Store new and old file paths in log array
            self::$log_array[] = array( $new_file, $old_file );

            // If the file fails to copy, error out.
            if( ! copy( $old_file, $new_file ) ) die('Could not copy ' . $old_file . ' to ' . $new_file);
            else self::$copied++;

            // increment counter, if 99 files are in the folder, create new subfolder
            // and reset counter
            $counter++;
            if($counter >= 99 ) {
                $counter=0;
                $sub_dir_num++;
                $sub_dir = str_pad($sub_dir_num, 4, "0", STR_PAD_LEFT);
                
                if( ! is_dir( NEW_DIR . '\\' . $sub_dir ) ) {
                    if ( ! mkdir( NEW_DIR . '\\' . $sub_dir ) ) die('Failed to make '. NEW_DIR. '\\' . $sub_dir );
                }
            }
        }
    }

This function loops through every file that read_directories() found and copies it into the new directory, starting a new numbered subfolder after every 99 files. It also stores the new and old file paths for the log.
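
The folder number can also be derived directly from a file's position in the list. The following is a minimal sketch, not part of the original script: sub_dir_for_index() and its $per_dir parameter are hypothetical names, and the default of 99 simply mirrors the counter check in copy_files(). Computing the folder per file like this would also avoid creating an empty trailing folder when the file count is an exact multiple of 99.

```php
<?php
// Hypothetical helper: maps a zero-based file index to its four-digit folder name.
function sub_dir_for_index( int $index, int $per_dir = 99 ): string {
    // intdiv() gives the zero-based folder number; add 1 so folders start at 0001
    $sub_dir_num = intdiv( $index, $per_dir ) + 1;
    return str_pad( (string) $sub_dir_num, 4, "0", STR_PAD_LEFT );
}

echo sub_dir_for_index( 0 ) . "\n";   // 0001 (first file)
echo sub_dir_for_index( 98 ) . "\n";  // 0001 (99th file, last one in the first folder)
echo sub_dir_for_index( 99 ) . "\n";  // 0002 (100th file starts the second folder)
```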

The only thing left to do is create the log file we named in our parameters (the -l argument)! We will call this function after the files have been copied.

    public function create_log() {
        if( $log = fopen( LOG_FILE. '.csv', 'w' ) ) {

            foreach( self::$log_array as $file ) {
                fputcsv( $log, $file );
            }
            fclose( $log );
            echo "\nLog Filed in " . LOG_FILE . ".csv\n\n";
        } else {
            echo "\n Failed to write " . LOG_FILE . ".csv\n\n";
        }
    }

We made it! The process can take a while (I suggest running it over lunch if you have nearly 43,000 files like I did). Once it is done, you will be left with your new directory and a lovely log file where you specified!
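
Before deleting anything from the original directory, it may be worth checking the copies against the log. The sketch below is an addition under stated assumptions rather than part of the original script: verify_log() is a hypothetical name, and it assumes the log has the two-column layout written by create_log() (new path first, then old path). It compares each pair with md5_file() and returns the rows that do not match.

```php
<?php
// Hypothetical helper: returns the log rows whose copy is missing or differs from the original.
function verify_log( string $log_path ): array {
    $bad_rows = array();

    $log = fopen( $log_path, 'r' );
    if ( $log === false ) {
        throw new RuntimeException( 'Could not open ' . $log_path );
    }

    // Each row is array( new_file, old_file ), as written by create_log().
    // The delimiter, enclosure and escape arguments are spelled out to match fputcsv()'s defaults.
    while ( ( $row = fgetcsv( $log, 0, ',', '"', '\\' ) ) !== false ) {
        if ( count( $row ) < 2 ) continue; // skip blank or malformed lines
        list( $new_file, $old_file ) = $row;

        if ( ! is_file( $new_file ) || ! is_file( $old_file )
            || md5_file( $new_file ) !== md5_file( $old_file ) ) {
            $bad_rows[] = $row;
        }
    }
    fclose( $log );

    return $bad_rows;
}

// Example usage (the log path is whatever was passed to -l, plus ".csv"):
// $bad = verify_log( LOG_FILE . '.csv' );
// echo count( $bad ) . " files failed verification\n";
```

An empty result means every logged copy exists and has the same contents as its original.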

For you lazy folk (all the me’s out there) here is the full script:

<?php

/* Arguments for this script:
    -c  : Current Directory
    -n  : New Directory
    -l  : Log file name
*/
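$args = getopt('c:n:l:');

// Stop early if any of the three required arguments is missing
if ( ! isset( $args['c'], $args['n'], $args['l'] ) ) die("Missing argument. Required: -c <current dir> -n <new dir> -l <log file name>\n");

define ('CURRENT_DIR', $args['c'] );
define ('NEW_DIR', $args['n'] );
define ('LOG_FILE', $args['l'] );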

$hasher = new DirectoryHasher();

class DirectoryHasher {

    public static $file_names = array();
    
    public static $log_array = array();
    
    public static $copied = 0;

    public function __construct() {

        self::read_directories( CURRENT_DIR );

        self::copy_files();
        
        self::create_log();
        
        echo count( self::$file_names ) . ' files found, ' . self::$copied . " copied successfully \n\n\n";
        
    }


    public function read_directories( $dir ) {
        $files = scandir( $dir );

        foreach($files as $file ) {
            $file_path = $dir . "\\" . $file;

            if( substr($file, 0, 1) !="." ) {
                if( is_dir( $file_path ) )  {
                    self::read_directories( $file_path );           
                } else {
                    $hash = md5($file_path);
                    if (! array_key_exists( $hash, self::$file_names) ) {
                        self::$file_names[$hash] = $file_path;
                    } else {
                        $key = 1;
                        $new_hash = $hash;
                        while( array_key_exists( $new_hash, self::$file_names) ) {
                            $new_file = self::new_file_name( $file, $key );
                            $new_hash = md5( $new_file );
                            $key++;
                        }
                        self::$file_names[$new_hash] = $file_path;
                    }
                }
            }
        }
    }
    
    public function new_file_name($file, $key) {
        $ext = self::get_ext($file);
        $file_no_ext = self::remove_ext($file);
        return $file_no_ext . $key . $ext;
    }
    
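    public function get_ext( $file ) {
        // Return the extension with its leading dot, or '' if the file has none
        $ext = pathinfo($file, PATHINFO_EXTENSION);
        return $ext === '' ? '' : '.' . $ext;
    }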
    
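    public function remove_ext( $file ) {
        // Strip the final extension; names without a dot are returned unchanged
        $pos = strrpos($file, '.');
        return $pos === false ? $file : substr($file, 0, $pos);
    }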

    public function copy_files() {

        if( ! is_dir( NEW_DIR ) ) {
            if ( ! mkdir( NEW_DIR ) ) die('Failed to make '. NEW_DIR );
        }

        // Count for # of files in current directory
        $counter = 0;
        // Directory number
        $sub_dir_num = 1;

        // Folder name
        $sub_dir = str_pad($sub_dir_num, 4, "0", STR_PAD_LEFT);
        echo $sub_dir;

        if( ! is_dir( NEW_DIR . '\\' . $sub_dir ) ) {
            if ( ! mkdir( NEW_DIR . '\\' . $sub_dir ) ) die('Failed to make '. NEW_DIR. '\\' . $sub_dir );
        }

        foreach( self::$file_names as $file_key => $old_file ) {
            $new_file = NEW_DIR . "\\" . $sub_dir . "\\" . $file_key . self::get_ext( $old_file );
            self::$log_array[] = array( $new_file, $old_file );

            if( ! copy( $old_file, $new_file ) ) die('Could not copy ' . $old_file . ' to ' . $new_file);
            else self::$copied++;

            $counter++;
            if($counter >= 99 ) {
                $counter=0;
                $sub_dir_num++;
                $sub_dir = str_pad($sub_dir_num, 4, "0", STR_PAD_LEFT);
                
                if( ! is_dir( NEW_DIR . '\\' . $sub_dir ) ) {
                    if ( ! mkdir( NEW_DIR . '\\' . $sub_dir ) ) die('Failed to make '. NEW_DIR. '\\' . $sub_dir );
                }
            }
        }
    }

    public function create_log() {
        if( $log = fopen( LOG_FILE. '.csv', 'w' ) ) {

            foreach( self::$log_array as $file ) {
                fputcsv( $log, $file );
            }
            fclose( $log );
            echo "\nLog Filed in " . LOG_FILE . ".csv\n\n";
        } else {
            echo "\n Failed to write " . LOG_FILE . ".csv\n\n";
        }
    }

}
?>
