Distributed processing of Amazon book reviews

23 Jan 2021

The distributed application receives text reviews as input, processes them, and displays the results on a web interface. It performs two operations:

word count: count the total number of occurrences of each word across all received reviews;
sentiment analysis: predict if the received review is positive or negative.
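
The word count operation amounts to a running tally that is updated as each review arrives. The plain-Python sketch below illustrates only that idea; it is not the project's Spark code, and the tokenization rule (lowercase, then runs of letters and apostrophes) and the names `tokenize` and `update_counts` are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(review: str) -> list[str]:
    """Lowercase the review and extract runs of letters/apostrophes (assumed rule)."""
    return re.findall(r"[a-z']+", review.lower())


def update_counts(totals: Counter, review: str) -> Counter:
    """Add the words of one incoming review to the running totals."""
    totals.update(tokenize(review))
    return totals


# Reviews are processed one at a time, as they would be on a stream.
totals = Counter()
for review in ["Great book, great plot.", "Not a great ending."]:
    update_counts(totals, review)
```

After both sample reviews, `totals` holds the counts accumulated across all reviews seen so far, which is the quantity the web interface would display.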

The application can manage and analyse large streams of text data. The data processing pipeline is orchestrated by Kubernetes and consists of:

Kafka, as message broker;
Spark, with the Structured Streaming and MLlib components, for data processing;
MongoDB, as distributed NoSQL database.
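
The flow of data through these three components can be pictured with in-memory stand-ins. This is a toy sketch under loud assumptions: a `queue.Queue` plays the role of the message broker, a plain function plays the role of the processing stage, and a `dict` plays the role of the database. The keyword-based `predict_sentiment` is a hypothetical placeholder, not the project's trained model, and none of the names below come from the project.

```python
from collections import Counter
from queue import Empty, Queue

# Hypothetical toy lexicon; the real application uses a trained model instead.
POSITIVE_WORDS = {"great", "good", "excellent"}


def predict_sentiment(review: str) -> str:
    """Placeholder classifier: 'positive' if any lexicon word appears."""
    words = set(review.lower().split())
    return "positive" if words & POSITIVE_WORDS else "negative"


def run_pipeline(broker: Queue, store: dict) -> None:
    """Drain the broker, process each review, and persist the results."""
    counts = store.setdefault("word_counts", Counter())
    sentiments = store.setdefault("sentiments", [])
    while True:
        try:
            review = broker.get_nowait()
        except Empty:
            break  # nothing left to consume
        counts.update(review.lower().split())
        sentiments.append(
            {"review": review, "sentiment": predict_sentiment(review)}
        )


broker = Queue()  # stands in for the message broker
store = {}        # stands in for the database
for text in ["a great read", "dull and slow"]:
    broker.put(text)
run_pipeline(broker, store)
```

The point of the separation is that producers only talk to the broker and the web interface only reads from the store, so the processing stage can be scaled or restarted independently.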

The web interface is built in React and served by NGINX.

After the development and deployment of the application, experiments are conducted to evaluate non-functional properties such as fault tolerance, performance, and cost.

This project was carried out in 2020/21 as part of the Systèmes Distribués pour le Traitement de Données course, for the Ingénierie des Systèmes d’Information degree at Grenoble INP - Ensimag.

Authors:

Maxime Chaloupe
Gabriele Degola
Concetto Antonino Privitera
Quentin Stentzel