The distributed application receives text reviews as input, processes them, and displays the results on a web interface. It performs two operations:
- word count: count the total number of occurrences of each word across all received reviews;
- sentiment analysis: predict whether each received review is positive or negative.
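The word-count semantics (one running total per word over every review received so far) can be sketched in plain Python. This is a minimal sketch, not the project's Spark code: the tokenization rule (lower-casing and keeping runs of letters and apostrophes) and the function names `tokenize` and `update_word_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenization rule: lower-case the review and keep runs of
# letters and apostrophes. The project's actual rule may differ.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(review: str) -> list[str]:
    """Split a review into lower-case word tokens."""
    return TOKEN_RE.findall(review.lower())


def update_word_counts(totals: Counter, batch: list[str]) -> Counter:
    """Add the word occurrences of a new batch of reviews to the running totals."""
    for review in batch:
        totals.update(tokenize(review))
    return totals


if __name__ == "__main__":
    totals = Counter()
    update_word_counts(totals, ["Great movie, great cast!", "Boring movie."])
    update_word_counts(totals, ["Great soundtrack"])
    print(totals.most_common(2))  # [('great', 3), ('movie', 2)]
```

Each call plays the role of one batch of incoming reviews; the totals persist between calls. In Structured Streaming the equivalent running total is kept as state across micro-batches, for example with a `groupBy` on the word column followed by `count()`.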
The application is designed to manage and analyse large streams of text data. The data processing pipeline is orchestrated with Kubernetes and consists of:
- Kafka, as the message broker;
- Spark, with the Structured Streaming and MLlib components, for data processing;
- MongoDB, as the distributed NoSQL database.
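To illustrate how the three components fit together, the flow can be simulated with in-memory stand-ins. This is a hypothetical sketch, not Kafka, Spark, or MongoDB client code: a list plays the Kafka topic, a read offset plays the consumer's position in the log, and a dict of `{"word", "count"}` documents plays the MongoDB collection. The names `Topic`, `Processor`, `write_counts`, and `run_pipeline` are illustrative only, and whitespace splitting is an assumed tokenization.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Stand-in for a Kafka topic: an append-only log of messages."""
    messages: list = field(default_factory=list)

    def produce(self, message: str) -> None:
        self.messages.append(message)


@dataclass
class Processor:
    """Stand-in for the streaming job: reads new messages in micro-batches."""
    topic: Topic
    offset: int = 0  # index of the next unread message in the log

    def poll_batch(self, max_records: int) -> list:
        batch = self.topic.messages[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


def write_counts(collection: dict, counts: Counter) -> None:
    """Stand-in for an upsert: create each word's document or increment its count."""
    for word, n in counts.items():
        doc = collection.setdefault(word, {"word": word, "count": 0})
        doc["count"] += n


def run_pipeline(reviews: list, max_records: int = 2) -> dict:
    topic = Topic()
    for review in reviews:
        topic.produce(review)  # producer side: reviews enter the broker
    processor = Processor(topic)
    collection: dict = {}
    while batch := processor.poll_batch(max_records):
        # Assumed tokenization: lower-case and split on whitespace.
        counts = Counter(w for review in batch for w in review.lower().split())
        write_counts(collection, counts)  # sink side: results land in the store
    return collection
```

For example, `run_pipeline(["good film", "bad film", "good plot"])` processes two micro-batches and ends with a count of 2 for both `film` and `good`. With a real MongoDB collection, `write_counts` roughly corresponds to calling PyMongo's `update_one({"word": w}, {"$inc": {"count": n}}, upsert=True)` for each word.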

The web interface is built with React and served by NGINX.

After the application was developed and deployed, experiments were conducted to evaluate non-functional properties such as fault tolerance, performance, and cost.
This project was carried out in 2020/21 as part of the Systèmes Distribués pour le Traitement de Données course, for the Ingénierie des Systèmes d’Information degree at Grenoble INP - Ensimag.
Authors:
- Maxime Chaloupe
- Gabriele Degola
- Concetto Antonino Privitera
- Quentin Stentzel