iShubhamSharma/Inverted-Indexing-for-unstructured-data
"Optimized Indexing of unstructured data for Data Lake environment" is a project which is going to deal with indexing pool of unstructured data in Data Lake environment. Data Lake is a repository which hold vast amount of data in its native form. The idea of data lake is to have a single storehouse of all data in an enterprise ranging from the raw data to transformed data which is used for various purposes including visualization, machine learning, analytics and reporting. This project begins with using unstructured data sets containing data in native format, and then indexing it by Inverted Indexing technique using Hashing so as to get optimized results in speed and time.