Visualisation

March 2019

This project explored visualisation techniques for data analysis, extraction, filtering, and knowledge discovery, presented in a user-friendly interface. The data came from the MovieLens dataset, which contains the following information about each movie:

title
year of release
genre
tags (such as atmospheric, or big-budget)

The dataset also contains review data for each movie.

Structure:

Pandas was used to extract and manipulate the data relating to the movies and reviews, and useful functions were created in a helper class to allow operations such as extracting movies by year or calculating an average rating for each movie. The results were visualised using VTK.
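As a rough illustration of the kind of helper described above, here is a minimal sketch. The class name `MovieHelper`, its methods, the column names, and the tiny in-memory tables are all assumptions made for the example, not the project's actual code or the real MovieLens schema.

```python
import pandas as pd

# Hypothetical miniature tables; the column names are assumptions.
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [1995, 1995, 1999],
})
ratings = pd.DataFrame({
    "movieId": [1, 1, 2, 3, 3],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0],
})


class MovieHelper:
    """Hypothetical helper wrapping the movie and rating tables."""

    def __init__(self, movies, ratings):
        self.movies = movies
        self.ratings = ratings

    def movies_in_year(self, year):
        """Return the rows of the movie table released in the given year."""
        return self.movies[self.movies["year"] == year]

    def rating_stats(self):
        """Return the mean rating and rating count per movie, indexed by movieId."""
        return self.ratings.groupby("movieId")["rating"].agg(["mean", "count"])


helper = MovieHelper(movies, ratings)
titles_1995 = helper.movies_in_year(1995)["title"].tolist()
stats = helper.rating_stats()
```

Keeping the filtering and aggregation in one place like this means the visualisation code only has to ask for ready-made tables, whatever rendering library draws them.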

What I Learnt:

How to use a Python visualisation library (VTK) along with a popular data processing library (Pandas) to handle and manipulate data
How to design useful visualisations to see as much information as possible from a set of data

Visualising Movies By Year

The first page in the application shows the number of movies released in each year, along with other information such as the average rating of movies in a specific year. There are options to change how the information is displayed.

Examples:

Number of movies per year

The larger the bar, the more movies in the year. The colour of the bar indicates the average rating of all of the movies in that year (blue is higher, red is lower).
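One simple way to produce a red-to-blue colouring like the one described is a linear blend on the average rating. The sketch below is an assumption about how such a mapping could work, not the project's actual colour map; the function name `rating_to_rgb` and the 0.5 to 5.0 rating range are hypothetical defaults. It returns components in the range 0 to 1, which is the form VTK's colour setters accept.

```python
def rating_to_rgb(avg_rating, lowest=0.5, highest=5.0):
    """Map an average rating to an (r, g, b) tuple with components in [0, 1]."""
    # Clamp to the assumed rating range, then normalise to t in [0, 1],
    # where 0 is the lowest rating and 1 is the highest.
    clamped = max(lowest, min(highest, avg_rating))
    t = (clamped - lowest) / (highest - lowest)
    # Linear blend from pure red (t = 0) to pure blue (t = 1).
    return (1.0 - t, 0.0, t)


low_colour = rating_to_rgb(0.5)
mid_colour = rating_to_rgb(2.75)
high_colour = rating_to_rgb(5.0)
```

A rating halfway along the range comes out as an even mix of red and blue, so years with middling averages appear purple.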

Options

Options to change which visualisation is shown, to change which data is included, and to switch between showing individual movies and aggregating them by year.

Number of movies per year (show individual)

This view shows every movie in each year, coloured by rating. The front of the graph shows the same information as before. The depth of each bar indicates the number of ratings the corresponding movie has. The hover interaction is also shown.


Visualising Movies By Genre

The second page in the application shows the number of movies in each genre, along with other information such as the average rating of movies in a specific genre. Again, there are options to change how the information is displayed.
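The per-genre counts and averages could be computed along the following lines. This is a sketch under assumptions: it supposes genres are stored as a single pipe-separated string per movie and that an `avg_rating` column has already been calculated; the column names and sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical movie table with pipe-separated genres and a precomputed average rating.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Comedy|Romance", "Comedy", "Drama|Romance"],
    "avg_rating": [3.0, 4.0, 5.0],
})

per_genre = (
    movies.assign(genre=movies["genres"].str.split("|"))  # list of genres per movie
    .explode("genre")                                      # one row per (movie, genre) pair
    .groupby("genre")["avg_rating"]
    .agg(["count", "mean"])                                # movies per genre, mean rating
)
```

Note that a movie listed under several genres is counted once in each of them, so the per-genre counts can add up to more than the total number of movies.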

Examples:

Number of movies by genre

Visualising Similarities Between Movies

The final page in the application shows a 3D plot of movies in a specific year range; this aims to show average ratings, number of ratings, and similarity between movies in one plot. The similarities were calculated using t-SNE on the ratings for each movie.
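To make the t-SNE step more concrete, here is a minimal sketch using scikit-learn, which is an assumption: the text does not say which implementation was used. The ratings are synthetic, the movie-by-user matrix layout and the choice to fill unrated cells with 0 are illustrative guesses, and the parameter values are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Synthetic ratings purely for illustration: up to 30 users and 12 movies.
rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "userId": rng.integers(0, 30, size=400),
    "movieId": rng.integers(0, 12, size=400),
    "rating": rng.integers(1, 11, size=400) / 2.0,  # half-star steps from 0.5 to 5.0
})

# One row per movie, one column per user; unrated cells become 0 (an assumption).
matrix = ratings.pivot_table(
    index="movieId", columns="userId", values="rating", aggfunc="mean"
).fillna(0.0)

# Embed each movie's rating vector into 3D; perplexity must be below the movie count.
tsne = TSNE(n_components=3, perplexity=5, init="random", random_state=0)
coords = tsne.fit_transform(matrix.to_numpy())
```

Each row of `coords` is then a 3D position for one movie, so movies rated similarly by the same users tend to land near each other in the plot.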

Examples:

Similarity of Movies in Year

Each sphere represents a movie: the size represents the number of ratings; the colour represents the average rating; the distance between spheres represents similarity.