We learned about text clustering methods that represent each document as a vector of non-stop-words and compare the similarity of documents using the Tanimoto Cosine Distance metric.
1. Write pseudocode that takes as input a corpus (set) of documents and creates a vector for each document, where the
vectors do not contain stop-words and are weighted by the term frequency multiplied by the log of the inverse
document frequency, as described in the course module.
DocumentVectorSet documentVectorSet =
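One possible shape for an answer to question 1, written as runnable Python rather than pseudocode. This is only a sketch under stated assumptions: the stop-word list `STOP_WORDS`, the whitespace tokenizer `tokenize`, and the function name `build_document_vectors` are all hypothetical, and the weight is taken to be raw term count times the natural log of (number of documents / document frequency), which may differ in detail from the formula in the course module.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the course module may use another.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation, drop stop-words."""
    words = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_document_vectors(corpus):
    """Return one sparse vector {term: tf * log(N / df)} per document in the corpus."""
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)

    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # raw count of each term in this document
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


document_vector_set = build_document_vectors([
    "The cat sat on the mat.",
    "The dog sat.",
])
```

Under this weighting, a term that occurs in every document (here "sat") gets weight 0, because log(N / N) = 0. Each vector is stored as a dictionary so that only the terms present in a document take up space.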
2. Write pseudocode that takes two document vectors and measures their similarity.
Similarity similarity =
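One possible shape for an answer to question 2, again as a Python sketch rather than pseudocode. The text above names the metric "Tanimoto Cosine Distance"; since it is not clear from the wording alone which formula the course module intends, the sketch shows both the standard cosine similarity and the standard Tanimoto coefficient for real-valued vectors. The function names and the sparse dictionary representation ({term: weight}) are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def tanimoto_similarity(a, b):
    """T(a, b) = a.b / (|a|^2 + |b|^2 - a.b); returns 0.0 if both vectors are all zeros."""
    ab = dot(a, b)
    denominator = dot(a, a) + dot(b, b) - ab
    if denominator == 0.0:
        return 0.0
    return ab / denominator


doc_a = {"cat": 1.0, "mat": 1.0}
doc_b = {"cat": 1.0}
similarity = cosine_similarity(doc_a, doc_b)
```

Both functions return 1 for identical non-zero vectors and 0 for vectors that share no terms. If a distance is needed instead of a similarity, one common convention is to use 1 minus the similarity.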