Write pseudocode that takes as input a corpus (set) of documents and creates vectors for each document

We learned about text clustering methods that represent each document as a vector of non-stop-words and compare the similarity of documents using the Tanimoto Cosine Distance metric.

1. Write pseudocode that takes as input a corpus (set) of documents and creates a vector for each document, where the vectors do not contain stop-words and are weighted by the term frequency multiplied by the log of the inverse document frequency, as described in the course module.

DocumentVectorSet documentVectorSet =
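
One way the right-hand side of the assignment above could be filled in is sketched below in Python. It is only an illustration under stated assumptions: the function name create_document_vectors, the helper tokenize, and the very short STOP_WORDS list are all hypothetical, term frequency is taken as the raw count of a term in a document, and inverse document frequency as N / df (number of documents divided by the number of documents containing the term). The course module may define these quantities differently.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def create_document_vectors(corpus):
    """Return one {term: tf * log(N / df)} dict per document in the corpus.

    Assumption: tf is the raw term count and idf is N / df.
    """
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat"]
    for vector in create_document_vectors(corpus):
        print(vector)
```

Each vector is stored as a sparse dictionary, so only terms that actually occur in a document take up space. With this weighting, a term that appears in every document gets weight 0, because log(N / N) = 0.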


2. Write pseudocode that takes two document vectors and measures their similarity.

Similarity similarity = DocumentSimilarity(documentVectorA, documentVectorB);
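
A possible body for DocumentSimilarity is sketched below in Python. It assumes that the metric named in the question is the Tanimoto coefficient (extended Jaccard), T(A, B) = A·B / (|A|² + |B|² − A·B), and that each document vector is a sparse dictionary mapping terms to weights. The name document_similarity and the choice to return 0.0 for two all-zero vectors are assumptions made for this illustration, not definitions from the course module.

```python
def document_similarity(vec_a, vec_b):
    """Tanimoto coefficient of two sparse {term: weight} vectors.

    Returns a value in [0, 1] for non-negative weights: 1.0 for identical
    vectors and 0.0 for vectors that share no terms.
    """
    # Dot product over the terms of A; terms missing from B contribute 0.
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a_sq = sum(weight * weight for weight in vec_a.values())
    norm_b_sq = sum(weight * weight for weight in vec_b.values())

    denominator = norm_a_sq + norm_b_sq - dot
    # Assumption: two all-zero (or empty) vectors are treated as dissimilar.
    if denominator == 0:
        return 0.0
    return dot / denominator


if __name__ == "__main__":
    a = {"cat": 1.0, "mat": 1.0}
    b = {"cat": 1.0, "dog": 1.0}
    print(document_similarity(a, b))
```

If a distance is needed for clustering rather than a similarity, one common convention is to use 1 minus this value.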


