this looks less interesting now that i have used it than it did at first glance. i tried it on a couple of my own pages, and this is what i think is going on: it fetches the source URL, extracts a few features, then looks for results in their DB or google's that strongly intersect with those features. i.e. "page -> x, y, z; find pages that strongly associate with x, y, and z".
true document alignment would be computationally difficult to do quickly, so this is a decent approximation. but you wind up with topically similar stuff, not copies of your original document.
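a minimal python sketch of that guess, to make the "x, y, z" idea concrete. everything in it is an assumption for illustration, not the tool's actual method: the "features" are just the most frequent non-stopword terms, the match score is jaccard overlap of the feature sets, and the names `top_features`, `overlap`, `similar_pages`, and the tiny `STOPWORDS` list are hypothetical.

```python
import re
from collections import Counter

# hypothetical stopword list, just enough for a demo
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
             "for", "on", "with", "that", "this", "it"}


def top_features(text, k=3):
    """return the k most frequent non-stopword terms in text (assumed feature model)."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOPWORDS]
    return {w for w, _ in Counter(words).most_common(k)}


def overlap(a, b):
    """jaccard overlap between two feature sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pages(source_text, corpus, k=3, threshold=0.5):
    """score every page in corpus (url -> text) against the source's features
    and return (score, url) pairs at or above the threshold, best first."""
    src = top_features(source_text, k)
    scored = [(overlap(src, top_features(text, k)), url)
              for url, text in corpus.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)
```

note that word order and exact wording never enter the score. a verbatim copy and a different page that happens to share the same top terms get the same score, which would explain getting topically similar results rather than copies.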
# posted by jose nazario : February 8, 2005 at 5:56 PM