<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Writing a duplicate file finder in Erlang</title>
	<atom:link href="http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/</link>
	<description>A place for hopefully something useful, but I really don't know.</description>
	<pubDate>Tue, 06 Jan 2009 07:27:24 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<item>
		<title>By: Jim Harris</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-3538</link>
		<dc:creator>Jim Harris</dc:creator>
		<pubDate>Tue, 05 Aug 2008 16:43:54 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-3538</guid>
		<description>I can't get the source. Saving it never finishes; it times out repeatedly, on separate days.

Please advise.

Thanks for any help.

Jim</description>
		<content:encoded><![CDATA[<p>I can&#8217;t get the source. Saving it never finishes; it times out repeatedly, on separate days.</p>
<p>Please advise.</p>
<p>Thanks for any help.</p>
<p>Jim</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: diginux</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-435</link>
		<dc:creator>diginux</dc:creator>
		<pubDate>Wed, 09 May 2007 12:10:44 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-435</guid>
		<description>Pádraig,

Actually, not a bad tool now that I have tried it. I saw it was a GUI and that no information was listed about its required libraries, so I assumed it would be a pain. You should make its introductory page a little more informative.

Thanks!
Jordan Wilberding</description>
		<content:encoded><![CDATA[<p>Pádraig,</p>
<p>Actually, not a bad tool now that I have tried it. I saw it was a GUI and that no information was listed about its required libraries, so I assumed it would be a pain. You should make its introductory page a little more informative.</p>
<p>Thanks!<br />
Jordan Wilberding</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pádraig Brady</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-434</link>
		<dc:creator>Pádraig Brady</dc:creator>
		<pubDate>Wed, 09 May 2007 11:29:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-434</guid>
		<description>Hey, you didn't provide a link to FSlint. Boo, hiss :)
http://www.pixelbeat.org/fslint/

You can download the tarball of fslint.
Also there is an ebuild file linked prominently on that page.

I've been very careful about performance in fslint,
and I've found nothing as robust or fast even though
it's written in shell script.

cheers,
Pádraig.</description>
		<content:encoded><![CDATA[<p>Hey, you didn&#8217;t provide a link to FSlint. Boo, hiss :)<br />
<a href="http://www.pixelbeat.org/fslint/" rel="nofollow">http://www.pixelbeat.org/fslint/</a></p>
<p>You can download the tarball of fslint.<br />
Also there is an ebuild file linked prominently on that page.</p>
<p>I&#8217;ve been very careful about performance in fslint,<br />
and I&#8217;ve found nothing as robust or fast even though<br />
it&#8217;s written in shell script.</p>
<p>cheers,<br />
Pádraig.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: diginux</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-64</link>
		<dc:creator>diginux</dc:creator>
		<pubDate>Tue, 17 Apr 2007 17:30:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-64</guid>
		<description>I'll take your ideas into consideration for a version 2. I know it is "cheating", but in general I still got the same accuracy of results in less time. Also, it seemed quicker to take 512 bytes, hash them, and compare the 16-byte hashes than to compare the 512 bytes directly. I admit that doesn't make much sense, but I didn't have time to investigate further.

Thanks for your comments; I will have to check your implementation out!</description>
		<content:encoded><![CDATA[<p>I&#8217;ll take your ideas into consideration for a version 2. I know it is &#8220;cheating&#8221;, but in general I still got the same accuracy of results in less time. Also, it seemed quicker to take 512 bytes, hash them, and compare the 16-byte hashes than to compare the 512 bytes directly. I admit that doesn&#8217;t make much sense, but I didn&#8217;t have time to investigate further.</p>
<p>Thanks for your comments; I will have to check your implementation out!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-27</link>
		<dc:creator>Justin</dc:creator>
		<pubDate>Sat, 14 Apr 2007 03:04:45 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-27</guid>
		<description>It looks like you are only looking at part of the files, which is cheating :)  You actually do not need to calculate any md5 hashes in order to implement a duplicate file finding program efficiently.  The Perl program you referenced seems to get things almost right, but it looks like it does too much reading.  In your case, you could probably just store the 256+256 byte chunks as the "hash" itself, rather than the md5 hash of it.

I have a duplicate finder program implemented in Python, and the general idea could be implemented in Erlang as well. The method I use is to find all files of the same size, and then compare them all to each other 8K at a time. With this method, the program reads only as many bytes as it needs to determine whether files are different or the same. It only ends up reading an entire file if it is actually a duplicate. This makes it extremely IO efficient, while also being 100% accurate.

http://bouncybouncy.net/dupes.py</description>
		<content:encoded><![CDATA[<p>It looks like you are only looking at part of the files, which is cheating :)  You actually do not need to calculate any md5 hashes in order to implement a duplicate file finding program efficiently.  The Perl program you referenced seems to get things almost right, but it looks like it does too much reading.  In your case, you could probably just store the 256+256 byte chunks as the &#8220;hash&#8221; itself, rather than the md5 hash of it.</p>
<p>I have a duplicate finder program implemented in Python, and the general idea could be implemented in Erlang as well. The method I use is to find all files of the same size, and then compare them all to each other 8K at a time. With this method, the program reads only as many bytes as it needs to determine whether files are different or the same. It only ends up reading an entire file if it is actually a duplicate. This makes it extremely IO efficient, while also being 100% accurate.</p>
<p><a href="http://bouncybouncy.net/dupes.py" rel="nofollow">http://bouncybouncy.net/dupes.py</a></p>
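<p>The size-then-chunk method described above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from dupes.py; the names <code>find_duplicates</code>, <code>split_identical</code>, <code>group_by_size</code>, and <code>CHUNK_SIZE</code> are hypothetical, and the sketch assumes every path given is a distinct, readable regular file.</p>

```python
import os
from collections import defaultdict

CHUNK_SIZE = 8192  # compare files 8K at a time


def group_by_size(paths):
    """Bucket files by size; only files of equal size can be duplicates."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return [group for group in by_size.values() if len(group) > 1]


def split_identical(paths):
    """Partition same-size files into groups with identical contents.

    All files are read in lock-step, CHUNK_SIZE bytes at a time. A group is
    split as soon as its members' chunks differ, and a file that no longer
    matches any other file is not read any further.
    """
    handles = {}
    try:
        for p in paths:
            handles[p] = open(p, "rb")
        pending = [list(paths)]  # groups whose members are identical so far
        duplicates = []
        while pending:
            group = pending.pop()
            buckets = defaultdict(list)
            for p in group:
                buckets[handles[p].read(CHUNK_SIZE)].append(p)
            for chunk, members in buckets.items():
                if len(members) == 1:
                    continue  # unique content: stop reading this file
                if chunk == b"":
                    duplicates.append(members)  # all hit EOF with equal bytes
                else:
                    pending.append(members)
        return duplicates
    finally:
        for handle in handles.values():
            handle.close()


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    result = []
    for group in group_by_size(paths):
        result.extend(split_identical(group))
    return result
```

<p>Because same-size files are read in lock-step and a group is split the moment its chunks differ, a file keeps being read only while it still matches some other file. One trade-off of this sketch: it keeps every file in a size group open at once, so a very large group could run into the operating system's open-file limit.</p>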
]]></content:encoded>
	</item>
	<item>
		<title>By: diginux</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-6</link>
		<dc:creator>diginux</dc:creator>
		<pubDate>Thu, 05 Apr 2007 01:27:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-6</guid>
		<description>I probably will in the next version if I update it. From my testing on my own files, it just wasn't needed, and I was trying to keep the code as clean as possible and reduce the number of comparisons needed. I will definitely do it though if I update the code.</description>
		<content:encoded><![CDATA[<p>I probably will in the next version if I update it. From my testing on my own files, it just wasn&#8217;t needed, and I was trying to keep the code as clean as possible and reduce the number of comparisons needed. I will definitely do it though if I update the code.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-5</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Wed, 04 Apr 2007 22:30:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-5</guid>
		<description>For better accuracy (assuming you didn't already do this), why don't you just take the files whose first and last 256 bytes hash the same, and do a full comparison on just those files?</description>
		<content:encoded><![CDATA[<p>For better accuracy (assuming you didn&#8217;t already do this), why don&#8217;t you just take the files whose first and last 256 bytes hash the same, and do a full comparison on just those files?</p>
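<p>A rough Python sketch of this two-stage check, assuming (as other comments in this thread suggest) that the cheap key is an MD5 digest of the first and last 256 bytes. The names <code>edge_digest</code>, <code>confirmed_duplicates</code>, and <code>EDGE</code> are hypothetical; the final check uses the standard library's <code>filecmp.cmp</code> with <code>shallow=False</code> so that file contents, not just metadata, are compared.</p>

```python
import filecmp
import hashlib
import os
from collections import defaultdict

EDGE = 256  # bytes sampled from each end of a file


def edge_digest(path):
    """MD5 of the first and last EDGE bytes: cheap, but only a hint."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(EDGE)
        f.seek(max(size - EDGE, 0))
        tail = f.read(EDGE)
    return hashlib.md5(head + tail).digest()


def confirmed_duplicates(paths):
    """Group by (size, edge digest), then confirm each group byte for byte."""
    candidates = defaultdict(list)
    for p in paths:
        candidates[(os.path.getsize(p), edge_digest(p))].append(p)

    groups = []
    for members in candidates.values():
        remaining = list(members)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            # shallow=False makes filecmp compare contents, not just stat info
            same = [first] + [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(same)
            remaining = [p for p in rest if p not in same]
    return groups
```

<p>Files that agree on size and on their sampled bytes but differ somewhere in the middle end up in separate groups after the full comparison, which is exactly the false positive that a head-and-tail key alone cannot rule out.</p>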
]]></content:encoded>
	</item>
	<item>
		<title>By: diginux</title>
		<link>http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-4</link>
		<dc:creator>diginux</dc:creator>
		<pubDate>Wed, 04 Apr 2007 18:57:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.diginux.net/2007/04/03/writing-a-duplicate-file-finder-in-erlang/#comment-4</guid>
		<description>Vince, I appreciate testing it in Windows. I did not have the time to try it out yet. Good to hear that it works well!</description>
		<content:encoded><![CDATA[<p>Vince, I appreciate testing it in Windows. I did not have the time to try it out yet. Good to hear that it works well!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
