Hadoop

############################################

1. Top 10 CommonWords in multiple files

Read multiple files from HDFS in parallel and count the occurrences of the 'CommonWords' (words that appear in all of these files), excluding 'StopWords'.
Sort these 'CommonWords' in descending order of frequency.
Finally, pick the 10 most frequent 'CommonWords' from the sorted list generated in the previous step.
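
The three steps above can be sketched in plain Python, without Hadoop, to make the logic concrete. This is only an illustration under stated assumptions: the function names (word_counts, top_common_words), the tokenizer regex, and the rule that a common word's frequency is the minimum of its per-file counts are all hypothetical choices, not something the task description specifies. In a real MapReduce job the per-file counting would be done by mappers and reducers reading from HDFS.

```python
import re
from collections import Counter


def word_counts(text, stop_words):
    """Count the words in one file's text, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def top_common_words(texts, stop_words, k=10):
    """Return the k most frequent words that occur in every text.

    Assumption: the frequency of a common word is the minimum of its
    counts across the files; other definitions (e.g. the sum) are possible.
    """
    counts = [word_counts(t, stop_words) for t in texts]
    if not counts:
        return []
    # Keep only the words present in every file.
    common = set(counts[0])
    for c in counts[1:]:
        common &= set(c)
    scored = {w: min(c[w] for c in counts) for w in common}
    # Sort by descending frequency, breaking ties alphabetically.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Taking the minimum keeps a word from ranking high just because it is frequent in a single file; if the intended definition differs, only the line that builds scored needs to change.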

Read multiple files from HDFS and use the 'Bag of Words' model to compute the frequency of every word in each file, excluding 'StopWords'.
Compute the TF-IDF of every word w.r.t. each document.
Normalize the TF-IDF of every word w.r.t. each document.
Compute the relevance of every document w.r.t. the query words (Ranked Retrieval Models, mentioned above).
Sort the documents by their relevance to the query words.
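
The ranking pipeline above can be sketched in plain Python, without Hadoop, as follows. The details here are assumptions made for illustration and are not fixed by the task description: raw term counts are used as TF, IDF is ln(N / df), each document vector is L2-normalized, and a document's relevance is the sum of its normalized weights over the query words. The names tokenize and rank_documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text, stop_words):
    """Lowercase the text and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower())
            if t not in stop_words]


def rank_documents(docs, query, stop_words):
    """Rank documents (a dict of name -> text) by relevance to the query."""
    # Bag of words: term counts per document.
    bags = {name: Counter(tokenize(text, stop_words))
            for name, text in docs.items()}
    n_docs = len(bags)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for bag in bags.values():
        df.update(bag.keys())

    # TF-IDF per document, then L2 normalization (assumed scheme).
    weights = {}
    for name, bag in bags.items():
        w = {term: tf * math.log(n_docs / df[term])
             for term, tf in bag.items()}
        norm = math.sqrt(sum(v * v for v in w.values()))
        weights[name] = {t: v / norm for t, v in w.items()} if norm > 0 else w

    # Relevance: sum of the normalized weights of the query terms.
    q_terms = tokenize(query, stop_words)
    scores = {name: sum(w.get(t, 0.0) for t in q_terms)
              for name, w in weights.items()}

    # Sort by descending relevance, breaking ties by document name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Normalizing each document vector keeps long documents from outranking short ones purely because they contain more words. A term that occurs in every document gets an IDF of zero under this scheme and therefore contributes nothing to the score.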