Wine Quality and Type Prediction from Physicochemical Properties Using Neural Networks for Machine Learning: A Free Software for Winemakers and Customers

Quality assessment is a crucial issue within the wine industry. The traditional way of assessing quality, by human experts, is time-consuming and very expensive. Machine learning techniques help in the process of quality assurance in a wide range of industries. The purpose of this study was to develop and offer free software for winemakers and customers, in which they can easily enter the physicochemical properties of a wine and receive an accurate prediction of its anticipated quality and type. We used comprehensive datasets totaling 6497 examples, each containing physicochemical properties and the corresponding quality score. We combined these datasets, then built and trained several neural network models. We evaluated their performance and selected the best model. Wine quality estimation was modeled as a regression problem and wine type detection as a classification problem. The best model performed well for the prediction of wine quality (root mean squared error = 0.54) and type (F-score = 0.99). With our free software, winemakers and customers can examine how a fine change in each physicochemical property could affect the quality of the wine. They can easily determine the importance of each physicochemical property, and which ones can be ignored to reduce cost. The process is very fast and accurate, and does not require taste experts for sensory tests.


Introduction
Artificial Intelligence (AI) in the era of deep learning affects a wide range of industries, even industries that previously did not make use of advanced technologies (Dahal et al., 2021; Mor & Dardeck, 2018; Mor, 2021). AI, and more specifically neural networks for machine learning, helps to enhance production in many different types of industries. Machine learning can help in the process of quality assurance: it makes quality assurance more efficient and more accurate, and reduces the use of expensive manpower (Aich et al., 2018).
In recent years there has been an increase in wine consumption, and the wine industry strives to produce good-quality wine at lower cost (Kumar, 2020). Most of the chemicals are nearly the same across different types of wine. However, the exact concentration of each chemical differs between types of wine (Kumar, 2020).
For the purpose of quality assurance, it is very important to predict wine quality from its chemical properties. To maintain quality at lower cost, there is a need to know the relationship between the exact concentrations of the different chemicals and the wine's quality (Gupta et al., 2020). In the traditional approach, taste experts were used. This task is very challenging, takes a lot of time, and is expensive (Gupta et al., 2020).
Neural networks excel at predicting outcomes from a set of many features. They are designed to model very complex relationships between input features and output (Mor & Dardeck, 2021). In our case, neural networks are well suited to learning the complex relationship between the exact concentration of each chemical in the wine (input) and its quality (output).
To develop an optimal solution for our problem, before constructing and training our model we need to carefully select the most important chemical features of wine whose concentrations might influence quality (Gupta, 2018).
Wine quality is assessed by physicochemical and sensory tests. Physicochemical tests include features such as density, alcohol, and pH values. Sensory tests are performed by human taste experts, which is a very complex and expensive process. In our literature review, we found eleven physicochemical properties that appear consistently in almost all datasets designed to evaluate wine quality (Cortez et al., 2009). The eleven physicochemical properties are: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol (Cortez et al., 2009).
The predominant fixed acids found in wines are tartaric, malic, citric, and succinic. Their respective levels found in wine can vary greatly but in general one would expect to see 1,000 to 4,000 mg/L tartaric acid, 0 to 8,000 mg/L malic acid, 0 to 500 mg/L citric acid, and 500 to 2,000 mg/L succinic acid. All of these acids originate in grapes with the exception of succinic acid, which is produced by yeast during the fermentation process (Sarkar et al., 2018).
Volatile acidity is a measure of the low molecular weight (or steam distillable) fatty acids in wine and is generally perceived as the odour of vinegar. Winemakers are usually most concerned with acetic acid, which accounts for more than 93% of steam distillable acids in wine. The average volatile acidity value for red table wines is about 0.60 g/L. The average volatile acidity value for white table wines is about 0.43 g/L (Vilela, 2018).
Citric acid has many uses in wine production. It is a weak organic acid, often used as a natural preservative or as an additive that gives food or drink a sour taste. Citric acid is often added to wines to increase acidity, complement a specific flavor, or prevent ferric hazes. It can be added to finished wines to increase acidity and give a "fresh" flavor. The disadvantage of adding citric acid is its microbial instability. Since bacteria use citric acid in their metabolism, it may increase the growth of unwanted microbes (Zhong et al., 2020).
Residual sugar comes from the natural grape sugars left over in a wine after the alcoholic fermentation finishes. It is measured in grams per liter. Residual sugar levels vary between types of wine. In fact, many grocery store wines labeled as "dry" contain about 10 g/L of residual sugar. Noticeably sweet wines start at around 35 g/L of residual sugar and go up from there (Wang & Peng, 2017).
Chlorides (sodium chloride) give the wine a salty flavor, which may turn away potential consumers. Sodium chloride, commonly known as salt, is an ionic compound with the chemical formula NaCl. The maximum concentration of chlorides in wine is about 0.20-0.60 g/L (Vallone et al., 2021).
Sulfur dioxide (SO2) and its salts have been added during winemaking since the 17th century. SO2 and its sulfite salts remain an essential winemaking additive, as no other single additive has the same dual properties of anti-oxidation and preservation. In amounts greater than 10 mg/L, however, it can cause adverse reactions and is potentially toxic for wine consumers and winemakers, and should accordingly be handled with care. When SO2 is incorporated into a must or a wine, a fraction of it reacts with sugars, aldehydes (ethanal), or ketones. The remaining fraction, called free SO2, is the one with the most important properties: total SO2 = free SO2 + reacted SO2. The most active fraction of free SO2 is called active SO2 and is composed of molecular SO2. During maturation and storage, free SO2 concentrations of 25 mg/L in red wine and 30 mg/L in white wine are recommended. An active SO2 concentration of 0.35 mg/L ensures minimum protection, and a value of 0.6 mg/L maximum protection (Capece et al., 2020).
Density is the mass per unit volume of wine or must at 20 °C. It is expressed in grams per milliliter and denoted by the symbol ρ20 °C. Alcohol is less dense than water: its specific gravity is approximately 0.8, that is, about 20% less dense than water. As the yeast consumes the sugar in the must and converts it to alcohol, the specific gravity of the must falls. After fermentation is complete, the specific gravity of the wine should be at, or slightly below, 1.00 (Pickering et al., 1998).
pH is a scale used to specify the acidity or basicity of an aqueous solution (lower pH indicates higher acidity). The pH level of a wine ranges from 3 to 4 (Forino et al., 2020).
Sulfites, also commonly called sulfur dioxide, are chemical compounds that contain the sulfite ion. They are found naturally in a variety of food sources, including black tea, peanuts, eggs, and fermented foods. They are also used as a preservative in many foods. Sulfites are a food preservative widely used in winemaking, thanks to their ability to maintain the flavor and freshness of wine. While they're found in many foods and beverages, they're particularly associated with a long list of side effects related to wine consumption, including the dreaded wine-induced headache (Roullier-Gall et al., 2017).
Alcohol is an organic compound that carries at least one hydroxyl functional group (−OH) bound to a saturated carbon atom. Wine can have anywhere between 5% and 23% Alcohol by Volume (ABV). The average alcohol content of wine is about 12%. This amount varies depending on the variety of wine, as well as the winemaker and their desired ABV (Vasiljevic et al., 2018).
Today, with the progress of machine learning techniques and advances in information technologies, it is possible to collect, store, and process massive and highly complex datasets. It is possible to classify wines and to determine the importance of each chemical parameter for wine quality (Aich et al., 2018; Gupta, 2018).
The purpose of this study is to develop and offer an open, easy-to-access system for winemakers and customers, in which they can easily enter the physicochemical properties of a wine and receive an accurate prediction of its anticipated quality.
Our goal is that, with the help of our trained model, winemakers and customers can examine how a fine change in each component could affect the quality of the wine. They could easily determine the importance of each physicochemical property in the wine and which ones can be ignored to reduce cost. In addition, they could determine the concentration of each physicochemical property needed to increase the quality of their wine and apply it accordingly. Therefore, our first task is to find comprehensive datasets which contain the physicochemical properties of wines and the corresponding quality. Our second task is to combine these datasets, build and train models, evaluate their performance, and select the best model. Our third task is to provide easy access to the trained model so that winemakers and customers can predict the anticipated quality of a wine from its physicochemical properties. Such a system could help to enhance the production process and reduce the need for human expertise in product quality assurance.

Method
We used the Wine Quality Dataset from the UCI machine learning repository, which contains two separate datasets, one for red wine and one for white wine (Cortez et al., 2009). Both consist of vinho verde wine samples from the north of Portugal and were designed to model wine quality based on physicochemical tests. We combined the two datasets into one.
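The combination step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study. It assumes the semicolon-delimited layout of the UCI files; the inline rows are made-up stand-ins showing three of the twelve columns, and the `is_red` label column is a hypothetical name introduced here.

```python
import csv
import io

# Tiny stand-ins for the red and white CSV files (illustrative values only).
# The UCI files are semicolon-delimited with a quoted header row.
RED_CSV = '"alcohol";"pH";"quality"\n9.4;3.51;5\n9.8;3.20;5\n'
WHITE_CSV = '"alcohol";"pH";"quality"\n8.8;3.00;6\n'


def read_wines(text, is_red):
    """Parse one semicolon-delimited table and tag each row with its type."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=";"):
        row = {name: float(value) for name, value in record.items()}
        row["is_red"] = is_red  # hypothetical label column: 1 = red, 0 = white
        rows.append(row)
    return rows


def combine(red_text, white_text):
    """Stack red and white samples into one list of labelled rows."""
    return read_wines(red_text, 1) + read_wines(white_text, 0)


wines = combine(RED_CSV, WHITE_CSV)
print(len(wines), wines[0]["is_red"], wines[-1]["is_red"])
```

Adding the type label before stacking is what later allows a single model to learn both targets from the same rows.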
We used the Keras functional API to train a model to predict two outputs. We combined the two datasets to predict the wine quality and whether the wine is red or white solely from the attributes. We modeled wine quality estimations as a regression problem and wine type detection as a classification problem.
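The two-output idea can be illustrated without any framework. The sketch below is not the Keras model from this study: it is a plain-Python forward pass in which one shared hidden layer feeds a linear head for quality (regression) and a sigmoid head for type (classification). The hidden size, the weight initialisation, and the unweighted sum of the two losses are all assumptions made for illustration, and no training loop is shown.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_params(rng, n_features=11, n_hidden=8):
    return {
        "hidden": init_layer(rng, n_features, n_hidden),  # shared trunk
        "quality": init_layer(rng, n_hidden, 1),          # regression head
        "type": init_layer(rng, n_hidden, 1),             # classification head
    }


def forward(params, features):
    """Run one wine through the shared layer and both heads."""
    hidden = relu(dense(features, *params["hidden"]))
    quality = dense(hidden, *params["quality"])[0]       # linear output
    p_red = sigmoid(dense(hidden, *params["type"])[0])   # probability of red
    return quality, p_red


def joint_loss(quality_pred, quality_true, p_red, is_red):
    """Squared error for quality plus binary cross-entropy for type."""
    eps = 1e-12  # guards against log(0)
    squared_error = (quality_pred - quality_true) ** 2
    cross_entropy = -(is_red * math.log(p_red + eps)
                      + (1 - is_red) * math.log(1.0 - p_red + eps))
    return squared_error + cross_entropy


rng = random.Random(0)
params = build_params(rng)
sample = [0.0] * 11  # one standardized wine with every feature at its mean
quality, p_red = forward(params, sample)
loss = joint_loss(quality, 6.0, p_red, 1)
print(quality, p_red, loss)
```

Sharing the hidden layer means both predictions are driven by the same learned representation of the eleven properties, while each head keeps the loss that suits its task.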
Our strategy was to examine different architectures and different numbers of hidden layers. We evaluated the performance of each model and selected the best model for deployment.
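The evaluation relies on the metrics reported in this paper: root mean squared error for quality, and accuracy, precision, recall, and F1-score for type. A minimal sketch of how these can be computed is given below. The function names and the example labels and predictions are made up for illustration, and red wine is arbitrarily treated as the positive class.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted quality scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def binary_report(y_true, y_pred):
    """Confusion-matrix counts and derived metrics, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up predictions, for illustration only.
quality_error = rmse([5, 6, 7, 6], [5.5, 6.0, 6.5, 6.0])
type_report = binary_report([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(quality_error, type_report)
```

Comparing candidate architectures then amounts to computing these values on held-out data for each model and keeping the one with the lowest error and highest F1-score.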

Experiments and Results
To obtain the best results, we had to perform preprocessing to deal with imbalanced data, as presented in Figure 1. We removed the classes with quality equal to 3, 4, 8, and 9 because they had very few observations, as presented in Figure 2. In addition, we normalized the data by the mean and standard deviation of each property. We examined several models. The neural network model with four layers and the ReLU activation function in the hidden layers yielded the best results both in the regression problem of quality and in the classification problem of wine type detection. Figure 3 presents the reduction of root mean squared error during training of wine quality prediction. Figure 4 presents the reduction in loss for wine type prediction during training. Figure 5 presents the wine type confusion matrix. Figure 6 presents four performance metrics of the trained wine type model: accuracy, precision, recall, and F1-score. Figure 7 is the scatter plot for wine quality.
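The two preprocessing steps described above can be sketched as follows. This is a standard-library illustration and not the code used in the study. The row format (one dictionary per wine), the function names, and the use of the population standard deviation are assumptions, and the example rows are made up.

```python
from statistics import mean, pstdev

# Quality classes that remain after dropping 3, 4, 8 and 9.
KEPT_QUALITIES = {5, 6, 7}


def filter_rare_classes(rows):
    """Keep only rows whose quality label is one of the well-populated classes."""
    return [row for row in rows if row["quality"] in KEPT_QUALITIES]


def standardize(rows, feature_names):
    """Return copies of the rows with each feature rescaled to (x - mean) / std."""
    stats = {}
    for name in feature_names:
        values = [row[name] for row in rows]
        stats[name] = (mean(values), pstdev(values))
    scaled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for name, (mu, sigma) in stats.items():
            # A constant feature has zero spread; map it to 0 instead of dividing.
            new_row[name] = (row[name] - mu) / sigma if sigma else 0.0
        scaled.append(new_row)
    return scaled, stats


# Made-up rows for illustration; real rows carry all eleven properties.
rows = [
    {"alcohol": 9.0, "pH": 3.2, "quality": 5},
    {"alcohol": 11.0, "pH": 3.4, "quality": 6},
    {"alcohol": 12.5, "pH": 3.1, "quality": 9},  # rare class, dropped
]
kept = filter_rare_classes(rows)
scaled, stats = standardize(kept, ["alcohol", "pH"])
print(len(kept), scaled[0]["alcohol"], scaled[1]["alcohol"])
```

The returned `stats` would be reused to rescale any new wine before prediction, so that inputs seen at prediction time are on the same scale as the training data.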

Conclusion
The model performs well for the prediction of wine quality and type, as shown by the confusion matrix and the loss metrics. We are offering easy access to our trained model. Winemakers and customers can easily receive an accurate prediction of the anticipated quality and type of a wine from its physicochemical properties.
With our free software and access to the trained model, winemakers and customers can examine how a fine change in each physicochemical property could affect the quality of the wine. They could easily determine the importance of each physicochemical property, and which ones can be ignored to reduce cost. In addition, they could determine the concentration of each physicochemical property needed to increase the quality of their wine and apply it accordingly. The process is very fast and accurate, and does not require taste experts for sensory tests. Sensory tests performed by human taste experts are very complex, slow, and expensive.

References
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547-553.