A friend of mine, Paweł R., who is an academic lecturer once gave following task to his students:
Two words are anagrams if they contain the exactly same number of the same letters. Write a program which, for given a set of words, will find a largest group of anagrams from this set. If more than one set exists with the same number of words, write the contents of each set.
A large set of words to play with this problem can be downloaded from my friend's page.
The task is not difficult at all - you just have to group words which are anagrams and then find the largest group. The easiest way to group is to split each word, sort letters ascending to get the hash key that uniquely identifies the word and then store the word in a hashtable using the key.
For example, having "leaf", "flea", "shore" and "horse", you split each word, sort letters and you get two distinct hash keys "aefl" and "ehors" and each hash group containst exactly two words. Let's call such hash a norm of a word.
It is easy to write a program in C# just few dozens of lines long but if are you a LinQ guy you would probably love to try to solve this task using LinQ.
I strongly recommend to stop reading now. Try to solve this task by yourself. You can start with sligtly simplified version: do not find all largest groups of anagrams, find just any largest group.
Are you ready:
You probably came up with something similar to this. When I test it on the large set of words from the link given above, I get a group of seven words, just foreach the result and you will see them.
Now something more complicated - let's find all largest groups, not just the first one. To accomplish this we just group these groups of words by the size of the group and take the first groupped group of words. Sounds funny? Well, take a look:
Isn't LinQ amazing?