Cannot move duplicates to temporary folder



TooMuchData 03-12-2006, 2:28

Using Enterprise edition, I first tried to move duplicate iTunes files on my remote drive.  NoClone found all the dups and smart mark did a good job in selecting duplicates. When I tried to move the duplicates to a temp folder, the system said it could not locate a certain file. I had to click on that screen, then another screen popped up with "File error" and it needed a click on the ok button. This happened multiple times. I gave up using NoClone on iTunes thinking it was something with iTunes. Total data set size was about 2,400 original files using 10G.  At the time I started, there were 10,000 files and 40G. I fixed it without NoClone.

Today I ran NoClone on my photos. 6.5 hours of runtime for 58,000 files in 80G. NoClone found the duplicates (44,000 as a result of the scan) and Smart Mark worked just fine. Then when I tried to move the marked files to a temp folder (on the same remote drive with the name "Throw Me Away") I got a number of the same error messages. First "cannot locate the file" followed by "File Error". Both require manual interaction. This happened multiple times so I was never able to complete the transfer. In fact, NoClone then crashed so I lost the 6.5 hours of work and I am now rerunning the search.

This work is being done on a Windows XP machine with a networked remote HD.  While I am doing this, I have made sure that no other software is running on all my home computers (4 of them) that would access the photos file on the remote drive. So, nothing should be causing the file names to change between the time the scan is done and the time NoClone is to move the found duplicate files.

Help please!

Re: Cannot move duplicates to temporary folder


TooMuchData 03-12-2006, 16:39

The overnight run took 5:15 because I set a lower size limit of 1M.  There are roughly half as many files as I mentioned previously. Again, I am trying to transfer the identified duplicates out of my photos folder on the remote drive and put them in a temp folder on the same drive.  I continue to get these error messages.  See the attachment for examples.

I did go to the column in the report called Report-read errors, selected all 126 entries and then selected "Ignore" which I hope would cause the software to ignore these errors.

As I write this the system is "moving" the files to the temp folder and I have gotten ~6 of the error messages shown in the attachment.

Re: Cannot move duplicates to temporary folder


daikatana 04-04-2006, 18:04
If you are willing to get your hands dirty, there's a workaround for your problem. It involves: exporting the results to a CSV file, editing the file in Excel, and creating a batch file (CMD file) with the "Move" command (in WinXP) for each file that needs to be moved and executing the batch file.
NOTE:
Win2k/XP has MOVE command among its commands, but older Windows platforms (9x, ME, NT) don't have a single (file) MOVE command! In those platforms, you might have to use COPY and DEL commands in a two step operation. The first step would be a batch file that performs COPY operation for each file that needs to be moved to a specific folder, then in the second step, another batch file performs the DEL operation for each file (i.e. delete them from their original folders) that were previously copied to the specific folder. Excel can be used to generate the batch files for either the single-step MOVE operation or the two-step COPY and DEL operations.
The part I'm not sure about is if the Win2k/XP command prompt would be able to execute the MOVE and/or the COPY commands when large files are involved.

Here's the method, if you're willing, and if you have a copy of Excel and can work with various Excel functions:
1. I'm assuming that you're using smart-mark in NoClone to mark the files you want to move. That means you're probably marking files based on date (oldest files to be marked for later deletion). Unfortunately, simply exporting the results into a CSV file won't help you because "marked files" aren't marked or tagged in the CSV file; i.e. there is no separate field in the CSV file with a fieldname such as "selected" or "marked".

    -- hint! hint! NoClone developers! Please add the option of specifying fieldnames or header data prior to export into a CSV file, where the user can select among a list of various fieldnames which fields they want to keep, such as name, folder name, duplicate-number, file-size, file-date, file-version-no, marked/unmarked-tag, and so on!

For our purposes we need the results sorted by duplicate-no and date prior to exporting into a CSV file, to ease working with the file in Excel. If you click on the file-date column header so that the list is sorted with the newest date on top and the oldest at the bottom, and then click on the duplicate-no column header to re-sort by duplicate number, you'll end up with duplicate file groups, where the newest in each group is listed at the top of the group followed by the duplicate files of older date. Then, you can export the list as a CSV file, ready to be used in Excel. (NOTE: if you have Excel already installed and the CSV extension associated with Excel, then NoClone will ask to open the file automatically once the information is written to the file, and you can accept the suggestion to get to work immediately in Excel. Otherwise, you need to start Excel, then import the CSV file manually!)    [see cap-01.gif and cap-02.gif]

Working in Excel:
When the file is opened or imported into Excel, make sure that each record is divided into the relevant columns: i.e. filename is in one column followed by folder or file-path information in the following column, and so on.
IMPORTANT:
Select each column separately and format the contents as per data type. In this case, Filename, In Folder  columns (columns A and B) should be formatted as TEXT (Format -- Cells -- choose "text"). Columns C, D (Duplicate #, and file Size columns) should be formatted as Integer (Format -- Cells -- choose "number", with no decimal places, and you can select thousand separator). Column E (Date Modified) should be formatted as date (Format -- Cells -- Custom, type: <yyyy-mm-dd hh:mm>). The formatting is important, because it will prevent any mistakes due to data type mismatch during sorting or comparison operations!
Also, make sure that the sorting is in order: i.e. each group of duplicate files (which you can identify by the number in the "duplicate number" column being the same for the group of files) is ordered such that the newest dated file is at the top of the group of files followed by the older dated duplicates. If not, then you need to select the rows and columns of data (the data table in Excel-speak) and re-sort them.
  • Select all the data
  • click on Data menu and choose sort
  • in the dialogue window, in 'Sort by', choose "Duplicate #" - and ascending, and in 'Then by', choose "Date Modified" - descending, make sure My List Has "Header row" chosen, and click on OK. Your list or data table will be sorted correctly.
[see cap-03.gif]
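
For anyone who would rather check the ordering outside Excel, here is a minimal Python sketch of the same two-level sort. The sample rows are made up for illustration, and the column order (filename, folder, duplicate #, size, date modified) is an assumption based on the layout described above, not something confirmed from a real NoClone export.

```python
from datetime import datetime

# Hypothetical rows: (filename, folder, duplicate number, size, date modified)
rows = [
    ("b.jpg", "D:\\test\\old\\", 2, 500, "2005-01-10 09:00"),
    ("a.jpg", "D:\\test\\", 1, 1200, "2004-06-01 12:30"),
    ("a.jpg", "D:\\test\\new\\", 1, 1200, "2006-02-15 08:45"),
    ("b.jpg", "D:\\test\\", 2, 500, "2006-03-01 17:20"),
]

def sort_for_marking(rows):
    """Sort by duplicate number ascending, then date modified descending."""
    def modified(row):
        return datetime.strptime(row[4], "%Y-%m-%d %H:%M")
    # Python's sort is stable, so sort by the secondary key first
    # (newest date on top), then by the primary key (duplicate number).
    newest_first = sorted(rows, key=modified, reverse=True)
    return sorted(newest_first, key=lambda row: row[2])

sorted_rows = sort_for_marking(rows)
for row in sorted_rows:
    print(row[2], row[4], row[1] + row[0])
```

Parsing the dates before comparing them plays the same role as formatting the Excel columns by data type: it avoids comparing dates as plain text.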

Save the file as an Excel workbook so that the original CSV file is left intact in case of problems and you need it again.

Looking at the Excel file, you have all the information you need to work with to identify which files you want to keep (newest file in a duplicate-group) and which files you need to move (the older files in a duplicate group).

Excel has several great functions to help you out. The first one you'll need is the "IF()" function, which you'll use to compare the "Duplicate #" column data row by row.  In my example (referring to cap-04.gif), column C has the "duplicate #" information for each file, and the first file in each group of files that have the same "duplicate #" is the newest one (placed at the top of the grouping). So, we write an "IF()" formula in an empty column (G) comparing each row to its predecessor in column C (the "duplicate #" column).  Thus the cell G3 contains the formula:
=IF(C3=C2,"del","keep")
and we copy this formula and paste it to all the rows below G3 in column G. Since the "duplicate #" differs between duplicate groups, the topmost file in each duplicate-group will be tagged with "keep" and the rest of the files in the same duplicate group will be tagged with "del". And because the files in each duplicate-group are sorted by date, and the topmost file of each group is the newest, the newest files will also be the ones that get tagged with "keep"! --Simple enough logic?   ;-)       [see cap-04.gif]
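
The same compare-with-the-row-above logic can be sketched in a few lines of Python. The function name tag_rows is made up for this illustration; it assumes the duplicate numbers arrive already sorted as described, with the newest file first in each group.

```python
def tag_rows(duplicate_numbers):
    """Tag the first row of each duplicate group 'keep' and the rest 'del'.

    Mirrors =IF(C3=C2,"del","keep"): a row is a 'del' candidate only when
    its duplicate number equals the one on the row directly above it.
    """
    tags = []
    previous = None  # nothing above the first data row
    for number in duplicate_numbers:
        tags.append("del" if number == previous else "keep")
        previous = number
    return tags

# Three groups: two files, three files, one file.
print(tag_rows([1, 1, 2, 2, 2, 3]))
```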

Let's talk about a very useful Excel function available among the text/string functions, namely CONCATENATE --you might want to check the help file for more details on this function, because essentially, this is the function we will use to automatically create the Windows MOVE command with filenames and such, which will form the basis for the batch (CMD) file that will be executed in the command window!
CONCATENATE can take many arguments, each of which could be text, a number, the contents of another cell in the worksheet, or another Excel function or formula, and will output all the arguments "concatenated" as a single string.
Since we are planning to do multiple MOVE operations on selected files, that means for each file that needs to be moved, we need to write the equivalent Windows MOVE command with the proper arguments, and put them in a single CMD file for execution. What that translates to, in Excel-speak, is that we have to determine a sort of a template for a single MOVE command, based on the syntax of the command, and figure out a way to supply the relevant arguments for the MOVE command.

According to Windows help file, MOVE command syntax is as follows:
move [/y | /-y] [source] [target]

/y or /-y
are switches that control whether you are asked to confirm before an existing destination file is overwritten: /y suppresses the prompt and /-y forces it. At an interactive command prompt the default is to ask, but when MOVE runs from inside a batch file, as it will here, the default is to overwrite without asking. For our purposes, we don't want to overwrite any file, whether we are prompted to overwrite it or not! Thus, we need to figure out a way to make sure each file being sent/moved to a destination folder is unique! Personally, I prefer to use the original pathname of a file as part of the new filename. (more on this issue later)

source
the pathname and filename of the file that is going to be moved

target
the pathname and [new] filename to move the files to.
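
Since an overwrite is the thing to avoid, it can be worth checking the planned target names for clashes before anything is moved. Below is a minimal Python sketch of such a check; the function name and the sample paths are made up for illustration. It compares names case-insensitively on the assumption that the destination is an ordinary Windows volume, where A.TXT and a.txt refer to the same file.

```python
from collections import Counter

def colliding_targets(targets):
    """Return the target paths (lowercased) that appear more than once."""
    counts = Counter(name.lower() for name in targets)
    return sorted(name for name, count in counts.items() if count > 1)

# Hypothetical planned targets; the first two differ only in case.
planned = [
    "C:\\temp\\a.txt",
    "c:\\TEMP\\A.TXT",
    "C:\\temp\\b.txt",
]
print(colliding_targets(planned))
```

An empty result means no two MOVE commands would land on the same destination name.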


OK, let's take a brief moment to think about the situation. We don't want to overwrite any file during a move operation, so that every old duplicate file from any folder is kept in the destination folder. This is possible only if we can make sure each file has a unique filename when it is put into the destination folder. Since the MOVE command allows a file to be renamed when it is being placed in a destination folder, we can devise a method to rename the files, and with the help of Excel functions this can be easily accomplished. A simple method I personally use is to convert the original file path into something that can be used as a prefix to the new filename.
EX:
    original file:  D:\sample docs\temp\file-01.txt
    [converted] new filename:  d~sample docs~temp~file-01.txt
As you can see, the path spec of the file is modified such that the "\" or ":\" characters are replaced with the "~" character. You can also use "_" or any other character instead of "~" as long as the chosen character isn't used as part of a pathname or a filename anywhere in the original list of files. Just make sure that the character you've chosen is a valid character to be used in a filename, since Windows has a set of conventions for file naming. Also keep in mind that there are limits to the length of a filename, and if you have files in subfolders that are too deeply nested, you may hit a snag!
  • you can use FIND in Excel to check whether the character you've chosen is ever used in the CSV file, since all the files and pathnames you'll be working with are already listed in the CSV file! There's another reason for using a unique character when converting a pathname into a regular text string: the same character can be used to parse filenames so that a pathname can be extracted and reconstructed, especially if you want to re-create a folder structure on a different drive, and move the files to the re-created/re-constructed folders and subfolders!
  • to test if you're exceeding the filename length limits, you can use the LEN function and total the lengths of the contents of columns A and B.
EX:
in H2, type in "=Len(A2)+Len(B2)" and copy and paste this formula into the rows under column H. When you reach the end of the data list in column H, use the MAX() function and select the rows above as the argument for MAX() function. MAX() will show you the maximum value encountered among all the data in column H, so you'll know what's the longest length of filename you might end up using with the file naming scheme I've proposed.
[see cap-05.gif, cap-06.gif]
  • According to Win2k/XP help file, a filename can be max 215 characters long (excluding the pathname), including the spaces. It cannot contain \ / : * ? " < > | characters.
  • If it seems like you are going to have a very long filename due to long and deeply nested folders, think of work-arounds. You can use an index (integer) as part of the filename, incremented from row to row, and add this index number to the filename, separating it with the unique character ("~"), such as file-01~1248.txt or 1248~file-01.txt (for a file located in row 1248). As long as you keep the Excel file, you can always reconstruct the folder name, by performing "lookup" operations in Excel using its many lookup functions and the new filename!
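
The naming scheme and the two checks above (unused separator, longest resulting name) can be sketched in Python as follows. The function names and the sample entries are made up for illustration, and the sketch assumes drive-letter paths such as D:\folder\ rather than network (UNC) paths.

```python
SEPARATOR = "~"
FORBIDDEN = set('\\/:*?"<>|')  # characters Windows does not allow in a filename

def separator_is_unused(entries, separator=SEPARATOR):
    """True if the separator never occurs in any folder or filename."""
    return all(separator not in folder + name for folder, name in entries)

def flatten(folder, name, separator=SEPARATOR):
    """Turn folder + filename into one flat filename, e.g.
    D:\\sample docs\\temp\\ + file-01.txt -> d~sample docs~temp~file-01.txt
    """
    drive, _, rest = folder.partition(":\\")
    parts = [drive.lower()] + [p for p in rest.split("\\") if p]
    return separator.join(parts + [name])

# Hypothetical (folder, filename) pairs, as in columns B and A.
entries = [
    ("D:\\sample docs\\temp\\", "file-01.txt"),
    ("D:\\test\\", "f04-miss-mid.txt"),
]

if separator_is_unused(entries):
    new_names = [flatten(folder, name) for folder, name in entries]
    all_valid = all(not (FORBIDDEN & set(n)) for n in new_names)
    longest = max(len(n) for n in new_names)  # same idea as LEN() plus MAX()
    print(new_names, all_valid, longest)
```

If separator_is_unused returns False, pick a different separator before generating any names, otherwise the original folder structure can no longer be recovered from the new filename.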
Assuming you didn't hit any snags (no too-long filenames, and you've found a unique separator character), let's continue with the MOVE command syntax and creating a CMD file.

For the purposes of demonstration, my data files are in D:\test folder (which was scanned by NoClone and results are output in a CSV file). I want to move the older duplicates to C:\temp folder.

Thus, the command will be:
    MOVE  <original path><original filename>    <destination path><new filename>

In case some of the path names and/or filenames contain spaces, it is always a good idea to use quotes to enclose the "source" and "target" specs; therefore the actual command should look like:
    MOVE    "<original path><original filename>"      "<destination path><new filename>"

Example (using our new file naming convention for the new files):
        MOVE   "D:\test\f04-miss-mid.txt"      "C:\temp\d~test~f04-miss-mid.txt"
or     MOVE   "D:\test\f04-miss-mid.txt"      "C:\temp\test~f04-miss-mid.txt"
  • why include redundant information if the original files are on the same drive such as "D" drive, huh?  :-)
Now, the next step is to figure out a way to generate the MOVE command (as per the above example) for each file listed in the Excel file, and then make sure only those files which are tagged with "del" in column G have the corresponding MOVE command with full arguments, while those files that are tagged with "keep" in column G have a blank cell.

Looking at the example MOVE commands, it is obvious that almost all the information we need is available in various cells. For example, column B contains the path spec for the original file, and it can be modified easily to make up part of the new filename (the prefix). We've also already decided that the destination folder is going to be a specific, fixed folder on another drive. The only thing missing is a string function to help us put everything together. Here comes the CONCATENATE function to the rescue.

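But first... let's copy column B data (original file path information) into an empty column so that we can obtain the "prefix" for the new filenames. In my example this is column H, which the formulas below refer to, so clear the temporary length check from it first if you put it there.  [see cap-07.gif]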
While the pasted data is still selected/hi-lighted, we can perform a very quick editing using Find & Replace, twice:
Find what=        D:\
Replace with=    <empty>
click on Replace All. [see cap-08.gif]

Find what=        \
Replace with=    ~
click on Replace All. [see cap-09.gif]

The data in the "prefix" column will have the drive letter, colon characters removed, and the folder separator character ("\") will be replaced by your unique separator character ("~"). [see cap-10.gif]

Now, we write the formula for the complete Move command, and copy and paste the formula at each row in the next column, so that each file in the list will have its own MOVE command.

In cell I2, type the following formula:
    =CONCATENATE("move     ",CHAR(34),B2,A2,CHAR(34),"     ",CHAR(34),"C:\temp\",H2,A2,CHAR(34))

and copy and paste to the rows below it, until you reach to the end of the data list.  [see cap-11.gif,  see cap-12.gif,  see cap-13.gif]
NOTES:
  • I've used some extra <space> characters in some of the terms of CONCATENATE function to obtain a blocked look, but it's not really necessary, and you can get by with only one <space> character.
  • because <path+filename> terms may include spaces, you need to enclose the <source> and <target> items in double quotes; CHAR(34) is used because it produces a double-quote character.
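
For comparison, here is the same string assembly as a minimal Python sketch. The function name move_command is made up for this illustration, and the sketch assumes, as the Excel formula does, that the folder value ends with a backslash and that the prefix has already been prepared (for example "test~").

```python
def move_command(folder, filename, prefix, destination="C:\\temp\\"):
    """Build one quoted MOVE command, like the CONCATENATE formula in I2.

    folder   -- column B, assumed to end with a backslash
    filename -- column A
    prefix   -- column H, the flattened folder name such as 'test~'
    """
    source = folder + filename
    target = destination + prefix + filename
    # The double quotes play the role of CHAR(34) in the Excel formula.
    return 'move "{}" "{}"'.format(source, target)

print(move_command("D:\\test\\", "f04-miss-mid.txt", "test~"))
```

One thing neither the formula nor this sketch handles: inside a batch file a literal % in a filename has to be written as %%, so filenames containing percent signs would need that extra step.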
Of course, if you use all the rows with the MOVE command, then you'll end up moving all the files into the destination folder (C:\temp) with their new names, but what we want to do is move only the old duplicates in each duplicate group. So, we have to use the tags (keep, del) that we have previously formulated in column G using the IF() function. Thus, if we have the data under column I organized such that those rows which have "keep" in their column G, will have a corresponding "blank" cell in column I, instead of the CONCATENATE function result, we'll be all right!

The easiest way to do this is to combine both IF and CONCATENATE functions by nesting the CONCATENATE function inside the IF function. The logic is:
If <comparison is true> then <output MOVE command>
    <comparison is false> then <output "blank" or "empty">

Original IF() function in cell G3 was:  
=IF(C3=C2,"del","keep")

Original CONCATENATE() function in cell I3 was:
=CONCATENATE("move     ",CHAR(34),B3,A3,CHAR(34),"     ",CHAR(34),"C:\temp\",H3,A3,CHAR(34))

Thus, in cell G3, we edit the IF() function and substitute the CONCATENATE() function contents in place of the "del" term (i.e. for the case when comparison is true), and in place of the "keep" term (i.e. when the comparison is false) we replace it with an empty input ("" two double-quotes with no space in between). Finally, we copy and paste G3 formula into the rows below until we reach the end of the listed data.

The final formula in Cell G3 will be:
=IF(C3=C2,CONCATENATE("move     ",CHAR(34),B3,A3,CHAR(34),"     ",CHAR(34),"C:\temp\",H3,A3,CHAR(34)),"")

[see cap-14.gif]
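
The nested IF/CONCATENATE logic can also be written as a short Python sketch, which may help in checking that the formula does what is intended. Everything here is illustrative: the function name, the sample rows, and the assumptions that the folder column ends with a backslash and that rows are already sorted with the newest file first in each duplicate group.

```python
def commands_for(rows, destination="C:\\temp\\"):
    """Return one entry per row: a MOVE command for older duplicates,
    or an empty string for the first (newest) file of each group.

    rows -- (filename, folder, duplicate number) tuples in sorted order
    """
    commands = []
    previous = None
    for filename, folder, number in rows:
        if number == previous:
            # Same two Find & Replace steps as in Excel: drop the drive
            # part, then turn each backslash into the separator.
            prefix = folder.split(":\\", 1)[-1].replace("\\", "~")
            commands.append('move "{}{}" "{}{}{}"'.format(
                folder, filename, destination, prefix, filename))
        else:
            commands.append("")  # the "keep" case: blank cell
        previous = number
    return commands

sample = [
    ("a.txt", "D:\\test\\", 1),
    ("a.txt", "D:\\test\\old\\", 1),
    ("b.txt", "D:\\test\\", 2),
]
for line in commands_for(sample):
    print(repr(line))
```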

The last step is to copy the contents of column G where you have the blank cells and the cells with the MOVE command (depending on the duplicate files in each grouping and their dates), and paste the data into an empty text file using an ASCII text editor. Windows has Notepad as an editor, and it can do the job, but unfortunately, you can't perform a Find & Replace operation to locate and remove empty/blank lines within Notepad. On the other hand, you can use Word, and paste the information there. Then perform a Find & Replace using the following:
Find what=        ^p^p
Replace with=    ^p
and click on Replace All.  It will remove all the blank lines.
Save the file as "Plain Text" type file, giving it a ".CMD" extension, so that Windows will recognize the file as a Command file or a batch file. Execute the file! Depending on the sizes and number of files, it may take a while to perform the operation.  :-)
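
If Word is not at hand, the blank-line cleanup and the saving step can be sketched in Python instead. The function name and the output filename below are made up for illustration, and the sketch writes to a temporary folder so that running it does not touch any real files. It only writes the CMD file; it does not execute it.

```python
import os
import tempfile

def write_cmd_file(commands, path):
    """Write the non-blank commands to a .CMD file, one per line.

    Returns the number of commands written. Lines are ended with CR+LF,
    the usual line ending for Windows batch files.
    """
    lines = [c for c in commands if c.strip()]  # drop the blank "keep" rows
    with open(path, "w", newline="\r\n") as handle:
        for line in lines:
            handle.write(line + "\n")
    return len(lines)

# Hypothetical column G contents: blanks for kept files, MOVE for the rest.
column_g = [
    "",
    'move "D:\\test\\old\\a.txt" "C:\\temp\\test~old~a.txt"',
    "",
    'move "D:\\test\\old\\b.txt" "C:\\temp\\test~old~b.txt"',
]
cmd_path = os.path.join(tempfile.mkdtemp(), "move-dups.cmd")
written = write_cmd_file(column_g, cmd_path)
print(written, "commands written to", cmd_path)
```

It is worth opening the resulting file in a text editor and reading a few lines before running it, since every line in it will move a file.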
NOTE:
As I said, I never tested moving files of very large size, although I did perform a test with 500 - 1,000 MB size, and this method worked, but... I'm not sure if there's a file size limit that might cause problems. Just make sure your destination drive will have the necessary empty space for the files you want to move to.
Well, here it is: "Poor man's move files" operation!  :-)


Daikatana

P.S.: I've attached a PDF file which has copies of the screen captures showing Excel sheet stuff and others, since I can attach only one file, so you need the pdf file to look at the screen captures that were referenced in this message.
