Book Distance through Text Analysis - 39

Goals

Build a word frequency list for a book and use it to measure the book's similarity to other books.

Description

Tasks

  1. Using the book metadata, retrieve the books by ID with the BRIDGES API: DataSource.getGutenbergBookText()

  2. Generate a word frequency list from the book text.

  3. Compare two frequency lists to determine how similar the book texts are. You can use an L1 norm (Manhattan distance) comparison.
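Tasks 2 and 3 can be sketched in Python as follows, assuming the book text has already been retrieved as a plain string. The helper names `word_frequencies` and `l1_distance` are hypothetical and not part of the BRIDGES API. The sketch also assumes that words are lowercase letter runs (with internal apostrophes) and that counts are normalized to relative frequencies, so that a long book and a short book can be compared fairly.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return relative word frequencies (count / total words) for a text.

    Assumption: a word is a lowercase run of letters, optionally with
    internal apostrophes (e.g. "don't"); punctuation and digits are ignored.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def l1_distance(freq_a, freq_b):
    """L1 (Manhattan) distance: sum of absolute frequency differences
    over the union of both vocabularies (missing words count as 0)."""
    vocab = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) for w in vocab)


if __name__ == "__main__":
    a = word_frequencies("the cat sat on the mat")
    b = word_frequencies("the dog sat on the log")
    print(round(l1_distance(a, b), 4))  # prints 0.6667
```

With relative frequencies the L1 distance lies between 0 (identical word distributions) and 2 (no words in common). Raw Gutenberg texts include license headers and footers, which this sketch does not strip.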

Extensions

One can change the complexity of the comparison algorithm by using the L2 norm (Euclidean distance) or even the L3 norm.
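The L1, L2, and L3 norms are all special cases of the Minkowski (Lp) distance, so one function parameterized by `p` covers every variant named above. The sketch below assumes word-frequency lists stored as Python dicts mapping each word to its frequency. The name `lp_distance` is hypothetical.

```python
def lp_distance(freq_a, freq_b, p=2):
    """Minkowski (Lp) distance between two word-frequency dicts.

    p=1 gives the L1 (Manhattan) norm, p=2 the L2 (Euclidean) norm,
    and p=3 the L3 norm. Words missing from one dict count as 0.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for a valid norm")
    vocab = set(freq_a) | set(freq_b)
    total = sum(abs(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** p for w in vocab)
    return total ** (1.0 / p)


if __name__ == "__main__":
    a = {"x": 0.5, "y": 0.5}
    b = {"x": 1.0}
    for p in (1, 2, 3):
        print(p, round(lp_distance(a, b, p), 4))
```

Higher values of `p` give more weight to the few words whose frequencies differ most, and less weight to many small differences.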

How did you display the similarity value? You can change the visual aspect of the assignment to convey different meanings. You can display a

Changing the dataset used for this comparison could be a good way to increase engagement with the assignment. Song lyrics are a good alternative to the Gutenberg data.

Help

For C++

Bridges documentation

GraphAdjList documentation

GutenbergData documentation

DataSource documentation

For Java

Bridges documentation

GraphAdjList documentation

GutenbergData documentation

DataSource documentation

For Python

Bridges documentation

GraphAdjList documentation

GutenbergData documentation

DataSource documentation