* The program reads 10 web pages from each of two or more predefined categories. You can define the categories yourself (examples: "Star Trek fan sites", "Java developer blogs", "information about South American rodents"). The URLs for these web pages should be maintained in a plain-text (.txt) control file that is read when the program starts.
* For each category, the program maintains frequencies of words appearing in the web pages.
* The user can enter any other URL, and the program decides which category it best belongs to, using a similarity metric of your choosing. It also recommends the most closely matching of the other known pages.
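Since the metric is left to you, one common choice is cosine similarity between the per-category word-frequency maps and the new page's frequency map. The sketch below, with illustrative class and method names (`Similarity`, `cosine`), shows how that could look over `Map<String, Integer>` counts; it is one possible metric, not a required one.

```java
import java.util.HashMap;
import java.util.Map;

public class Similarity {

    // Cosine similarity of two frequency vectors stored as word -> count maps.
    // Returns a value in [0, 1]; 1 means identical relative word distributions.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        long dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += (long) e.getValue() * other;
        }
        double normA = 0, normB = 0;
        for (int v : a.values()) normA += (double) v * v;
        for (int v : b.values()) normB += (double) v * v;
        if (normA == 0 || normB == 0) return 0.0; // empty page or category
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> category = new HashMap<>();
        category.put("warp", 3);
        category.put("enterprise", 2);
        Map<String, Integer> page = new HashMap<>();
        page.put("warp", 1);
        page.put("klingon", 4);
        System.out.printf("similarity = %.4f%n", cosine(category, page));
    }
}
```

The program would compute this score between the user's page and each category's aggregate frequency map, then pick the category (and known page) with the highest score.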
<!-- -->
* Use [login to view URL] for all data structures. Maps would be the best fit.
* *Use Swing components for the GUI and, if possible, for the HTML parser.*
* *Use Java networking components for accessing web pages.*
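One way to meet the networking requirement is the standard `java.net.http.HttpClient` (Java 11+). The sketch below, with illustrative names (`PageReader`, `fetch`, `wordFrequencies`), downloads a page body and tallies word counts into a map; a real solution would also strip HTML tags, e.g. with Swing's HTML parser, before counting.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.Map;

public class PageReader {

    // Download the raw body of a URL as a String using the JDK HTTP client.
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Lower-case the text, split on non-letter runs, and tally word counts.
    public static Map<String, Integer> wordFrequencies(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty()) freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // For a live page: wordFrequencies(fetch("https://example.com"));
        Map<String, Integer> freq =
                wordFrequencies("To boldly go where no one has gone before");
        System.out.println(freq);
    }
}
```

Each category's aggregate map could then be built by merging the per-page maps of its 10 control-file URLs.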