Duplicater for Duplicate Search

Hi, today I would like to show you a very simple idea that is not implemented in any operating system I know. Duplicater is a simple tool for finding files in a folder tree that have different names but the same content.


The solution is very simple. It walks the folder trees, collects all files and calculates an MD5 sum for each file. When it finds a file whose MD5 sum has already been seen, Duplicater moves that file to the MoveTo directory, or deletes it if a file with the same name is already there. Files with equal MD5 sums are treated as identical; an accidental collision is extremely unlikely, although MD5 offers no protection against deliberately crafted collisions. In this way you can analyze your big storage hard drives and eliminate duplicates of whatever you have: photos, movies, documents, presentations, music. Simply put, it looks for all file duplicates. If you like, you can parallelize the hash calculations on multi-core systems with a Parallel.For loop, but I am leaving that up to you. The code is very simple and is shown below.
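Before letting the tool move anything, a report-only dry run can show what it would treat as duplicates. The sketch below is my own illustration, not part of the downloadable project: the `DuplicateReport` class and its `Md5Hex` and `FindDuplicates` methods are hypothetical names, and it assumes a single root folder. It groups every file under the root by MD5 sum with LINQ and keeps only the groups that have more than one member.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical report-only helper: nothing is moved or deleted.
public static class DuplicateReport
{
    // MD5 sum of a file's content as a lowercase hex string,
    // the same format the {0:x2} loop in MainForm produces.
    public static string Md5Hex(string fileName)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(fileName))
        {
            byte[] key = md5.ComputeHash(stream);
            return BitConverter.ToString(key).Replace("-", "").ToLowerInvariant();
        }
    }

    // Groups all files under root by MD5 sum and returns only the groups
    // with two or more files, i.e. the sets of files with identical sums.
    public static List<List<string>> FindDuplicates(string root)
    {
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .GroupBy(Md5Hex)
            .Where(group => group.Count() > 1)
            .Select(group => group.ToList())
            .ToList();
    }
}
```

Printing each group returned by `FindDuplicates` is enough to review the result before running the real move. Creating an MD5 instance per file keeps the helper simple at a small cost in allocations.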

namespace Duplicater
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Security.Cryptography;
    using System.Text;
    using System.Windows.Forms;

    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();
        }

        private void btSelect1_Click(object sender, EventArgs e)
        {
            if (fbDialog1.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                tbSelect1.Text = fbDialog1.SelectedPath;
            }
        }

        private void btSelect2_Click(object sender, EventArgs e)
        {
            if (fbDialog2.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                tbSelect2.Text = fbDialog2.SelectedPath;
            }
        }

        private void btSelect3_Click(object sender, EventArgs e)
        {
            if (fbMoveTo.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                tbMoveTo.Text = fbMoveTo.SelectedPath;
            }
        }

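        private void btMove_Click(object sender, EventArgs e)
        {
            if (string.IsNullOrEmpty(tbSelect1.Text) ||
                string.IsNullOrEmpty(tbSelect2.Text) ||
                string.IsNullOrEmpty(tbMoveTo.Text))
            {
                MessageBox.Show("Select all folder paths first!");
                return;
            }

            // Take a snapshot of both trees before anything is moved, so that
            // moving files cannot disturb a lazy enumeration still in progress.
            var d1 = Directory.GetFiles(tbSelect1.Text, "*.*",
                SearchOption.AllDirectories);
            var d2 = Directory.GetFiles(tbSelect2.Text, "*.*",
                SearchOption.AllDirectories);

            // MD5 sums of the files that are kept.
            var sums = new HashSet<string>();
            // Full paths already hashed. Without this, selecting the same
            // (or a nested) folder twice would treat every file as a
            // duplicate of itself and move it away.
            var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

            using (var md5 = MD5.Create())
            {
                var count = 0;
                count += MoveDuplicates(d1, sums, visited, md5);
                count += MoveDuplicates(d2, sums, visited, md5);

                MessageBox.Show(
                    string.Format(
                    "{0} duplicates were moved or deleted.", count));
            }
        }

        private int MoveDuplicates(
            IEnumerable<string> files,
            HashSet<string> sums, HashSet<string> visited, MD5 md5)
        {
            var count = 0;
            foreach (var fileName in files)
            {
                // Skip a file already processed through the other folder.
                if (!visited.Add(Path.GetFullPath(fileName)))
                {
                    continue;
                }

                byte[] key;
                using (var stream = File.OpenRead(fileName))
                {
                    key = md5.ComputeHash(stream);
                }

                // Turn the 16-byte hash into a 32-character lowercase hex string.
                var hex = new StringBuilder(key.Length * 2);
                foreach (byte k in key)
                {
                    hex.AppendFormat("{0:x2}", k);
                }
                var sum = hex.ToString();

                // HashSet.Add returns false when an earlier file had the same sum.
                if (!sums.Add(sum))
                {
                    var path = Path.Combine(
                        tbMoveTo.Text, Path.GetFileName(fileName));
                    if (!File.Exists(path))
                    {
                        File.Move(fileName, path);
                    }
                    else
                    {
                        // Name clash in MoveTo: the content survives in the
                        // first copy that was kept, so this duplicate is deleted.
                        File.Delete(fileName);
                    }
                    count++;
                }
            }

            return count;
        }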
    }
}
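For the Parallel.For idea mentioned earlier, one possible shape is to split the work into a parallel hashing phase and a sequential decision phase. This is a sketch under my own assumptions, not code from the project: `ParallelHashing`, `HashAll` and `FindDuplicatePaths` are hypothetical names, and the input list is assumed to contain each path only once. Each iteration creates its own MD5 instance because a hash algorithm instance should not be shared between threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical helper: hashes files in parallel, decides duplicates sequentially.
public static class ParallelHashing
{
    // Hashes all files in parallel and returns path -> lowercase hex MD5.
    public static IDictionary<string, string> HashAll(IList<string> files)
    {
        var sums = new ConcurrentDictionary<string, string>();
        Parallel.For(0, files.Count, i =>
        {
            // One MD5 instance per iteration: instances are not thread-safe.
            using (var md5 = MD5.Create())
            using (var stream = File.OpenRead(files[i]))
            {
                byte[] key = md5.ComputeHash(stream);
                sums[files[i]] = BitConverter.ToString(key).Replace("-", "").ToLowerInvariant();
            }
        });
        return sums;
    }

    // Walks the list in its original order, so the first file with a
    // given sum is kept and every later one is reported as a duplicate.
    public static List<string> FindDuplicatePaths(IList<string> files)
    {
        var sums = HashAll(files);
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        foreach (var file in files)
        {
            if (!seen.Add(sums[file]))
            {
                duplicates.Add(file);
            }
        }
        return duplicates;
    }
}
```

Keeping the decision loop sequential means the "first file wins" rule stays deterministic, as in the single-threaded version. Whether parallel hashing pays off depends on the storage: on a single spinning disk the reads may be the bottleneck rather than the CPU.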

You can download the solution's source code here: The Duplicater Source Code.

One more thing: please treat this solution as a prototype only. For a product, you could write a service that listens to file system change notifications, calculates the hash sum of every file on your hard drive and automatically moves duplicates to a folder. I built this for my wife, who was trying to eliminate duplicate photos, and that is when I realized that no operating system I know offers such a feature. It would be awesome to always have a duplicate-free file system, and cloud storage systems holding lots of files could likewise eliminate duplicates and save disk space. So think about the big picture when you read this code, because it is about something more. Enjoy!
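The service idea above could start from .NET's FileSystemWatcher. The sketch below is only an illustration under stated assumptions: `DuplicateWatcher` and `Check` are hypothetical names, the MoveTo folder is assumed to lie outside the watched tree (otherwise moved files would be reported again), and a file that is still being written simply fails to open and is skipped, where a real service would retry it later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch: hashes the existing tree once, then checks every newly
// created file and moves it to moveTo if its MD5 sum is already known.
// Assumes moveTo is outside root.
public sealed class DuplicateWatcher : IDisposable
{
    private readonly string moveTo;
    private readonly HashSet<string> sums = new HashSet<string>();
    private readonly object gate = new object();
    private readonly FileSystemWatcher watcher;

    public DuplicateWatcher(string root, string moveTo)
    {
        this.moveTo = moveTo;
        foreach (var file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            sums.Add(Md5Hex(file));
        }

        // NotifyFilters.FileName only: directory creation is not reported.
        watcher = new FileSystemWatcher(root)
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName
        };
        watcher.Created += (sender, e) =>
        {
            try
            {
                Check(e.FullPath);
            }
            catch (IOException)
            {
                // Still being written or already gone; a real service would retry.
            }
            catch (UnauthorizedAccessException)
            {
                // No permission to read or move this file; skip it.
            }
        };
        watcher.EnableRaisingEvents = true;
    }

    // Returns true if the file was a duplicate and was moved or deleted.
    public bool Check(string path)
    {
        lock (gate)
        {
            if (sums.Add(Md5Hex(path)))
            {
                return false; // New content: keep the file and remember its sum.
            }

            var target = Path.Combine(moveTo, Path.GetFileName(path));
            if (File.Exists(target))
            {
                File.Delete(path); // Name clash in moveTo: same rule as Duplicater.
            }
            else
            {
                File.Move(path, target);
            }
            return true;
        }
    }

    private static string Md5Hex(string fileName)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(fileName))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }

    public void Dispose()
    {
        watcher.Dispose();
    }
}
```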

p ;).
