
Rust Find Duplicate Files With The Same Digests


This post shows how to find duplicate files in Rust by comparing their digests. The code displays each list of duplicate files under the digest they share.

Personal Use Case

I have a lot of files stored in Google Drive. I plan to back them up to external hard disks to reduce my online storage costs. To use the local disks efficiently, I need to find duplicate files and get rid of them. Doing so manually is not practical with tons of files. Therefore, I need a small Rust application that simply lists all duplicate files.

Find Duplicate Files In Rust

Before we get to the logic, we need to decide on the data structure. The simplest option is a HashMap whose key and value types are String and Vec<String>, respectively.

The key holds the hash (digest) of a file's content, and the value holds the names of all files that produce that digest.
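As a minimal sketch, the data structure could look like the following. The alias name `DigestMap` and the sample digests and paths are hypothetical, for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical alias: maps a digest (hex string) to every file path
/// whose content produced that digest.
type DigestMap = HashMap<String, Vec<String>>;

fn main() {
    let mut files: DigestMap = HashMap::new();

    // Made-up digests and paths, just to show the shape of the data.
    files.insert(
        "ab12".to_string(),
        vec!["a/photo.jpg".to_string(), "b/photo-copy.jpg".to_string()],
    );
    files.insert("cd34".to_string(), vec!["a/notes.txt".to_string()]);

    for (digest, paths) in &files {
        println!("{digest}: {paths:?}");
    }
}
```

A digest that maps to a list with more than one path marks a set of duplicates.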

Recursively Traverse Directories

We need to go through the directory entries one by one, check whether each entry is a file or a directory, and recursively traverse any subdirectories.
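One way to do this with only the standard library is a recursive function over `std::fs::read_dir`. This is a sketch, not necessarily how the original program walks the tree; the function name `collect_files` is hypothetical, and it assumes symbolic links should be skipped.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every regular file under `dir` into `out`.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // DirEntry::file_type does not follow symlinks, so symlinks are
        // neither descended into nor hashed (this avoids cycles).
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect_files(&path, out)?;
        } else if file_type.is_file() {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut files = Vec::new();
    // Scan the current directory in this sketch.
    collect_files(Path::new("."), &mut files)?;
    for f in &files {
        println!("{}", f.display());
    }
    Ok(())
}
```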

Build Up List Of Duplicate Files

As the code goes through the files, it builds the list of duplicate files using our data structure and the following logic.

First, get the full path of the file. Then, generate the hash from the content of the file. If the hash is not yet a key in the HashMap, create an entry with that key and a list containing the file name as its value. Otherwise, retrieve the entry and append the file name to its list of duplicate files.
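A sketch of these steps follows. The post does not say which hash function it uses, so this version assumes std's `DefaultHasher` purely to stay dependency-free; it is not a cryptographic digest, and a real tool would more likely use something like SHA-256 from a third-party crate. The helper names `digest_of` and `record` are hypothetical. `HashMap::entry` with `or_default` covers both branches described above: it creates an empty list for a new hash, then the path is pushed either way.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

/// Hashes the whole file content. ASSUMPTION: DefaultHasher stands in for a
/// real digest; it is not cryptographic and may change between Rust versions.
fn digest_of(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?; // reads the entire file into memory
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(format!("{:016x}", hasher.finish()))
}

/// Appends `path` to the list stored under its digest,
/// creating the list first if the digest is new.
fn record(map: &mut HashMap<String, Vec<String>>, path: &Path) -> io::Result<()> {
    let full_name = path.display().to_string();
    let digest = digest_of(path)?;
    map.entry(digest).or_default().push(full_name);
    Ok(())
}

fn main() -> io::Result<()> {
    let mut map: HashMap<String, Vec<String>> = HashMap::new();
    // Hash every file named on the command line.
    for arg in std::env::args().skip(1) {
        record(&mut map, Path::new(&arg))?;
    }
    println!("{map:#?}");
    Ok(())
}
```

Reading each file fully with `fs::read` keeps the sketch short; for very large files, feeding the hasher in chunks would use less memory.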

Display Duplicate Files After Finding Them

The rest of the code is straightforward. Although the HashMap now groups files by hash, it also contains the unique files, each in a list of one. Therefore, we display only the hashes that map to more than one file.
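Filtering on the list length is enough to drop the unique files. In this sketch the helper names `duplicates` and `print_duplicates` are hypothetical, and the groups are sorted by digest only so the output order is stable, since HashMap iteration order is unspecified.

```rust
use std::collections::HashMap;

/// Keeps only the digests shared by more than one file, sorted by digest.
fn duplicates(map: &HashMap<String, Vec<String>>) -> Vec<(&String, &Vec<String>)> {
    let mut groups: Vec<_> = map.iter().filter(|(_, paths)| paths.len() > 1).collect();
    groups.sort_by(|a, b| a.0.cmp(b.0));
    groups
}

/// Prints each duplicate digest followed by its files, indented.
fn print_duplicates(map: &HashMap<String, Vec<String>>) {
    for (digest, paths) in duplicates(map) {
        println!("{digest}");
        for p in paths {
            println!("  {p}");
        }
    }
}

fn main() {
    let mut map = HashMap::new();
    // Made-up data: only "ab12" has more than one file.
    map.insert("ab12".to_string(), vec!["x/a.jpg".to_string(), "y/a.jpg".to_string()]);
    map.insert("cd34".to_string(), vec!["x/b.txt".to_string()]);
    print_duplicates(&map);
}
```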

The code is basic and can still be improved. For example, the program could take the root directory as a command-line argument. It could even delete or move the duplicate files, retaining only one copy of each.
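For the first improvement, a minimal sketch could read the root directory from `std::env::args`, falling back to the current directory when no argument is given. The function name `root_dir_from` is hypothetical; it takes a generic iterator so it can be exercised without a real command line.

```rust
use std::env;
use std::path::PathBuf;

/// Returns the first command-line argument as the directory to scan,
/// or "." if none was given. Index 0 is the program name.
fn root_dir_from<I: IntoIterator<Item = String>>(args: I) -> PathBuf {
    args.into_iter()
        .nth(1)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("."))
}

fn main() {
    let root = root_dir_from(env::args());
    println!("Scanning {}", root.display());
}
```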

This post is part of the Rust Programming Language For Beginners Tutorial.
