2008/09/17

Sargasso Sea

Well, that was fun...

I'm not sure which I hate more: good intentions or round-to-its. I've learned a little about C# in the fourteen months since my last post. I've also learned a little about Java. None of it got recorded here, however, because... well, just because.

The title of this entry suggests I spent the time stuck in a morass of seaweed and lost ships. Mostly, however, I was extracting data from Lotus Notes so a certain Fortune 50 pharmaceutical company could retire its Notes/Domino platform. Come to think of it... Nah, that would be too easy.

I did have a couple of occasions to write little C# command-line utilities to clean up the data. They worked all right, I guess, but they were as ad hoc as you can get: basically procedural code stuck inside a single class. Unit testing? What unit testing? I don't gotta show you no unit testing!

I have another cleanup project looming on the horizon and, since I have a little time, I'm trying to keep the horse out in front. I'm also thinking about doing it in Java so I can learn more about that language. The problem I'm having, though, is figuring out what to test and how.

The scenario is this: twenty-some Lotus Notes application logbooks were extracted. The export placed the record data in CSV files, one record for each document in the database. In addition, the exporter detached each Notes document's attached files into a subfolder named after the document's unique ID. Like this:

D:/App/Export/1234ABCD4D3C2B1AA1FE2DC9E34A9F25/MyDoc.doc

(The format actually makes sense; it uniquely ties each attachment to its source document.)
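Because the folder name is the document's unique ID, the ID (and the folder to move) can be recovered from any attachment's full path by looking at its parent. A minimal sketch using java.nio.file, assuming exactly the layout shown above; the class and method names are hypothetical:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, assuming the export layout <export root>/<UNID>/<file name>.
final class AttachmentPaths {
    private AttachmentPaths() {}

    /** The folder holding an attachment, e.g. D:/App/Export/1234ABCD4D3C2B1AA1FE2DC9E34A9F25 */
    static Path documentFolder(String attachmentPath) {
        Path parent = Paths.get(attachmentPath).getParent();
        if (parent == null || parent.getFileName() == null) {
            throw new IllegalArgumentException("No document folder in: " + attachmentPath);
        }
        return parent;
    }

    /** The Notes document's unique ID, i.e. the name of that folder. */
    static String documentId(String attachmentPath) {
        return documentFolder(attachmentPath).getFileName().toString();
    }
}
```

Working on Path objects rather than chopping strings at "/" keeps the same code valid whether the CSV uses forward slashes or Windows backslashes.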

The problem is that there are over 123,000 files, spread across 70,000 subdirectories, all directly under the Export subdirectory, and the users want their files regrouped by region.

Conceptually, the application is simple. Create a list of the CSV files in the D:/App directory. For each CSV, extract the region ID from the file name, create a new subfolder with that ID as its name, then read in the CSV one line at a time. For each line that has a value in the fourth column (which contains the full path and file name), split that value into a string array. Now, for each element of the array, locate the unique-ID subfolder and file, copy them (subfolder and file intact) to the new regional subfolder, erase the original, and rinse and repeat until clean. Oh, and since I'm paranoid about all things hardware, compare each new file against the old one before deleting the old. (The I/O on this is going to be a BITCH!)
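The steps above map fairly directly onto java.nio.file. What follows is a minimal sketch, not a finished tool, and several details are hypothetical assumptions because the post doesn't pin them down: the region ID is taken to be the CSV file name minus ".csv", the paths in the fourth column are assumed to be separated by semicolons, the CSV is assumed to use ordinary double-quote quoting with no embedded line breaks, and files are read as UTF-8. The byte-for-byte check uses Files.mismatch, which needs Java 12 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Sketch of the regrouping pass (Java 12+). Hypothetical assumptions:
//  - "EMEA.csv" holds region "EMEA" (file name minus ".csv");
//  - the fourth column lists attachment paths separated by ';';
//  - double-quote CSV quoting, no embedded line breaks, UTF-8 text.
final class Regrouper {
    static final String PATH_SEPARATOR = ";"; // hypothetical; split() treats it as a regex

    static String regionId(Path csv) {
        String name = csv.getFileName().toString();
        return name.toLowerCase().endsWith(".csv") ? name.substring(0, name.length() - 4) : name;
    }

    /** Splits one CSV line, honouring double-quoted fields and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"' && quoted && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                field.append('"'); // "" inside quotes is a literal quote
                i++;
            } else if (c == '"') {
                quoted = !quoted;
            } else if (c == ',' && !quoted) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    /** Attachment paths from the fourth column; empty if the column is missing or blank. */
    static String[] attachmentPaths(String line) {
        List<String> fields = splitCsvLine(line);
        List<String> paths = new ArrayList<>();
        if (fields.size() >= 4) {
            for (String p : fields.get(3).split(PATH_SEPARATOR)) {
                if (!p.isBlank()) paths.add(p.trim());
            }
        }
        return paths.toArray(new String[0]);
    }

    /** Copies one attachment to regionRoot/UNID/, verifies it, then deletes the original. */
    static void relocate(Path attachment, Path regionRoot) throws IOException {
        Path docFolder = attachment.getParent();
        if (docFolder == null || docFolder.getFileName() == null) {
            throw new IOException("No document folder for " + attachment);
        }
        Path target = regionRoot.resolve(docFolder.getFileName()).resolve(attachment.getFileName());
        Files.createDirectories(target.getParent());
        // No REPLACE_EXISTING: an existing target aborts instead of being overwritten.
        Files.copy(attachment, target, StandardCopyOption.COPY_ATTRIBUTES);
        if (Files.mismatch(attachment, target) != -1L) { // -1 means identical bytes
            throw new IOException("Copy differs from original: " + attachment);
        }
        Files.delete(attachment);
        boolean empty;
        try (DirectoryStream<Path> rest = Files.newDirectoryStream(docFolder)) {
            empty = !rest.iterator().hasNext();
        }
        if (empty) Files.delete(docFolder); // last attachment moved; drop the UNID folder
    }

    static void run(Path appDir) throws IOException {
        try (DirectoryStream<Path> csvs = Files.newDirectoryStream(appDir, "*.csv")) {
            for (Path csv : csvs) {
                Path regionRoot = Files.createDirectories(appDir.resolve(regionId(csv)));
                try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        for (String p : attachmentPaths(line)) {
                            Path attachment = Path.of(p);
                            // Skips header text and files already moved on an earlier run.
                            if (Files.isRegularFile(attachment)) relocate(attachment, regionRoot);
                        }
                    }
                }
            }
        }
    }
}
```

Keeping regionId, splitCsvLine and attachmentPaths free of I/O means they can be unit tested with plain strings, while relocate and run can be exercised against a throwaway directory rather than the real export.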

I guess the first test is assertEquals(Paths.get("D:/App"), path). We'll see how that goes.
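On the "what and how to test" question, one option (an assumption about approach, not a settled design) is to keep the hard-coded D:/App out of the code under test and hand it a throwaway directory instead, so each test builds exactly the files it expects. The sketch below uses plain Java checks in place of JUnit so it runs without a test framework; in a JUnit test the same comparison would be an assertEquals call. CsvFinder, csvNames and CsvFinderCheck are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class under test: lists the CSV exports in a directory.
final class CsvFinder {
    static List<String> csvNames(Path appDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> csvs = Files.newDirectoryStream(appDir, "*.csv")) {
            for (Path csv : csvs) names.add(csv.getFileName().toString());
        }
        Collections.sort(names); // directory listing order is not guaranteed
        return names;
    }
}

// Plain-Java stand-in for a JUnit test: builds a temporary directory
// instead of touching the real D:/App, then checks what the finder reports.
final class CsvFinderCheck {
    public static void main(String[] args) throws IOException {
        Path appDir = Files.createTempDirectory("app");
        Files.createFile(appDir.resolve("EMEA.csv"));
        Files.createFile(appDir.resolve("APAC.csv"));
        Files.createDirectory(appDir.resolve("Export")); // should be ignored
        Files.createFile(appDir.resolve("notes.txt"));   // should be ignored
        List<String> found = CsvFinder.csvNames(appDir);
        if (!found.equals(List.of("APAC.csv", "EMEA.csv"))) {
            throw new AssertionError("unexpected CSV list: " + found);
        }
        System.out.println("ok");
    }
}
```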
