#1 Posted : Friday, June 10, 2016 10:14:37 AM(UTC)
WillJenkins24

Groups: Registered, Tech Support
Posts: 6


I want to make an application that can find duplicate Word files regardless of file name. By duplicate I mean that the files have exactly the same content, word for word. Does anyone have any idea how this can be done?
 


#2 Posted : Tuesday, July 5, 2016 6:11:31 AM(UTC)
Nick Villalobos

Groups: Registered
Posts: 119

Was thanked: 4 time(s) in 4 post(s)

You can try doing this by reading the text from your DOC file and saving it in a string[] (array of strings) so that each element in the array contains one word. After this, open the same DOC file for reading, read the text word by word, and search for each word in the string[] array. If you find the same word in two different places in the array, delete the later element and keep the first one, then move on to the next word in the file. I think the functions required for this are explained in the following thread (as Priyaranjan described):
https://social.msdn.microsoft.com/Forums/en-US/ab821d14-bfbc-4c08-b44b-7a5d293ecb2c/compare-word-documents-c?forum=isvvba
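
The steps above can be sketched roughly as follows. This is only an illustration in Python, assuming the text has already been extracted from the DOC file into a plain string; the function names are hypothetical and do not come from LEADTOOLS or the linked thread. The `same_words` helper is an extra, added because the original question is about comparing two documents word for word.

```python
import re


def split_words(text):
    """Split extracted document text into a list of words."""
    return re.findall(r"\w+", text)


def drop_repeated_words(words):
    """Keep the first occurrence of each word and drop later repeats."""
    seen = set()
    unique = []
    for word in words:
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique


def same_words(text_a, text_b):
    """Return True when both texts contain the same words in the same order."""
    return split_words(text_a) == split_words(text_b)


words = split_words("the cat saw the other cat")
print(drop_repeated_words(words))
print(same_words("Hello,  world", "Hello, world"))
```

Note that splitting on `\w+` ignores punctuation and spacing, so two documents that differ only in those respects would count as the same here. Whether that is acceptable depends on how strictly "word for word" should be read.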

If you prefer not to use the above approach, you could try converting the DOC file to text format using the document converter class from LEADTOOLS. This class can convert DOC files to text even if they contain images. Dealing with text files should be much easier than dealing with DOC files.
After this, you can use the same approach as above, reading the text from the text files (*.txt) instead of from the Word documents. You can find more details about converting your DOC files here: https://www.leadtools.co...entconverters_using.html
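
Once the documents exist as text files, one possible way to find duplicates regardless of file name is to hash each file's content and group files whose hashes match. The sketch below is in Python and assumes the conversion to *.txt has already been done by some other step; the function names are hypothetical, and collapsing whitespace before hashing is an assumption that may or may not suit your definition of "same content".

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_key(text):
    """Hash the text after collapsing runs of whitespace (an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(texts):
    """Given {file name: text}, return lists of names sharing the same content."""
    groups = defaultdict(list)
    for name, text in texts.items():
        groups[content_key(text)].append(name)
    # Only groups with more than one file are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def find_duplicate_text_files(folder):
    """Read every *.txt file in a folder and group the duplicates."""
    texts = {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in Path(folder).glob("*.txt")
    }
    return group_duplicates(texts)
```

For example, `find_duplicate_text_files("converted")` would return a list such as `[["a.txt", "b.txt"]]` if those two files held the same text, where `converted` is a placeholder folder name. Grouping by hash avoids comparing every file against every other file, which matters once the number of documents grows.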

Nick Villalobos
Developer Support Engineer
LEAD Technologies, Inc.
