Whatever solution I came up with, it had to satisfy these properties: a) be fast; b) not be dependent on the number of card names, since I knew that M:tG (the only trading card game around at the time) was going to get a lot bigger as time went on, and that the amount of data was going to increase too; c) be tolerant of errors in the data, because some people are terrible spellers, some use horrible abbreviations of card names, and some are just too lazy to look up the real name of a card when they can't think of it, so they guess.
What I did was build a big tree structure containing the card names in such a way that input data could be compared against all the strings in the tree more or less simultaneously. That reduced the problem from O((number of card names) * (amount of data)) to O(amount of data), which was a big win. Once I had that working, I added some error-correcting heuristics to the algorithm so that it could catch the most common errors (things like missng letters, trnasposed letters, incorrwct letters, and so on). That worked very well; remarkably well, in fact. Well enough that I released the first version of the M:tG pricelist on May 18, 1994.
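The original tree structure isn't spelled out here, but the idea can be sketched as a trie of card names searched with a small edit budget, so one pass over an input word checks it against every stored name at once. The card names and helper functions below are my own illustration, not the original code:

```python
# Sketch only: a trie of card names, searched with a tolerance for
# missing, extra, or incorrect letters. Card names are illustrative.

def build_trie(names):
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node["$"] = name  # terminal marker holds the canonical name
    return root

def fuzzy_lookup(root, word, max_edits=1):
    """Return the canonical names within max_edits edits of word."""
    matches = set()

    def walk(node, i, edits):
        if edits > max_edits:
            return
        if i == len(word) and "$" in node:
            matches.add(node["$"])
        if i < len(word):
            walk(node, i + 1, edits + 1)           # input has an extra letter
        for ch, child in node.items():
            if ch == "$":
                continue
            if i < len(word) and ch == word[i]:
                walk(child, i + 1, edits)          # letters agree
            else:
                if i < len(word):
                    walk(child, i + 1, edits + 1)  # incorrect letter
                walk(child, i, edits + 1)          # input is missing a letter

    walk(root, 0, 0)
    return matches
```

Note that with an edit budget of 1, a transposition counts as two edits under this scheme; treating adjacent swaps as a single edit (Damerau-style) would take one more case in `walk`.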
After solving the "what card is this line of data talking about" problem, I had to write a routine to find the price information on the line of data and tally it up. That part was trivial in comparison, so I won't go into detail on it.
On the algorithmic side, the first major change was to modify the format of the card name database so that it could store typical abbreviations and misspellings of cards that the error-correcting heuristics couldn't catch. This is effectively saying "this particular error is really not an error". Call it a hack if you will, but it works, and since the algorithm isn't sensitive to the number of names in the database, there's no speed penalty for adding "virtual" card names.
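The actual database format isn't shown, but the "virtual name" trick can be sketched as extra lookup keys that all resolve to the same canonical record, so the cost of a lookup stays independent of how many aliases a card accumulates. Everything below (names, aliases, the `add_card` helper) is my own illustration:

```python
# Sketch only: known abbreviations and misspellings stored as extra keys
# pointing at the same canonical card record.
cards = {}

def add_card(name, *aliases):
    record = {"name": name}
    for key in (name, *aliases):
        # Each alias entry says "this particular error is not an error".
        cards[key.lower()] = record
    return record

add_card("Lightning Bolt", "L. Bolt", "Lightening Bolt")
```

With this in place, `cards["lightening bolt"]["name"]` resolves straight to `"Lightning Bolt"` with no extra matching work.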
The second major change came with the introduction of the statistical analysis of the data. Obviously, that changed the nature of the prices in the list, but for the better I think.
There was also a minor change along the way in how the program interprets lines with multiple price-like numbers on them. I used to just take the highest one, but now the algorithm for picking the price is tuned to the sale and auction formats people actually use, in an attempt to find more real prices and fewer buyout bid numbers.
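The tuned heuristic itself isn't described, but a hypothetical version of the idea might look like this; the regex, keyword list, and fallback rule are all my assumptions, not the program's actual logic:

```python
import re

# Sketch only: pick one price from a line with several price-like numbers.
# Prefer a number following a keyword that usually marks the real sale
# price; otherwise avoid the likely buyout figure instead of taking the max.
PRICE = re.compile(r"\$?(\d+(?:\.\d\d)?)")

def pick_price(line):
    lower = line.lower()
    nums = [(m.start(), float(m.group(1))) for m in PRICE.finditer(line)]
    if not nums:
        return None
    # Keywords that often precede the actual price in common sale formats.
    for kw in ("current bid", "asking", "price", "sell"):
        pos = lower.find(kw)
        if pos != -1:
            after = [v for s, v in nums if s > pos]
            if after:
                return after[0]
    # No keyword: drop the highest number (often a buyout bid) when there
    # is more than one, then take the highest remaining.
    values = sorted(v for _, v in nums)
    return values[0] if len(values) == 1 else values[-2]
```

So `pick_price("Black Lotus current bid $150, buyout $400")` returns `150.0` rather than the `400` buyout figure that a take-the-highest rule would grab.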
Am I selling out? Maybe. But I've worked hard to make a quality, free product that people find useful and I think I deserve to get a little something back for it. I think my lists are more realistic than those in Scrye magazine; as I understand it Scrye takes surveys of card shop owners to see what they're buying/selling cards for, while I directly sample what normal people all over the world are buying/selling them for. My lists are certainly more current than Scrye's lists. They're definitely cheaper than buying a copy of Scrye every month, and hopefully they're more accessible to people both inside and outside the U.S.