We address the problem of insufficient interpretability of explanations for domain experts. We solve this issue by introducing the `describe()` function, which automatically generates natural language descriptions of explanations generated with the `ingredients` package.

The `ingredients` package allows for generating prediction validation and prediction perturbation explanations. They allow for both global and local model explanation.

The generic function `describe()` generates a natural language description for explanations generated with the `feature_importance()` and `ceteris_paribus()` functions.

To show how automatic descriptions are generated, we first load the data set and build a random forest model that predicts which of the passengers survived the sinking of the Titanic. Then, using the `DALEX` package, we generate an explainer of the model. Lastly, we select a random passenger whose prediction should be explained.

```
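library("DALEX")
library("ingredients")
library("randomForest")

# Drop rows with missing values; randomForest() fails on NAs by default
titanic <- na.omit(titanic)

# Regress the logical target survived == "yes" on all remaining columns
model_titanic_rf <- randomForest(survived == "yes" ~ .,
                                 data = titanic)

# Wrap the model in a DALEX explainer; column 9 is the target, survived
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic[, -9],
                              y = titanic$survived == "yes",
                              label = "Random Forest")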
```

```
#> Preparation of a new explainer is initiated
#> -> model label : Random Forest
#> -> data : 2099 rows 8 cols
#> -> target variable : 2099 values
#> -> predict function : yhat.randomForest will be used ( default )
#> -> predicted values : numerical, min = 0.01004417 , mean = 0.3238004 , max = 0.9931964
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.804059 , mean = 0.0006397947 , max = 0.91234
#> -> model_info : package randomForest , ver. 4.6.14 , task regression ( default )
#> A new explainer has been created!
```
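
The call that draws the passenger is not shown above. A minimal sketch of how such a row could be selected, assuming the feature columns are everything except column 9 (`survived`). The seed is an assumption added for reproducibility, so the row drawn here will generally differ from the one printed below.

```r
library("DALEX")

# Same preprocessing as in the setup chunk
titanic <- na.omit(titanic)

# Assumption: an arbitrary seed, not taken from the original vignette
set.seed(2019)

# Draw one row at random and drop the target column (survived, column 9)
passanger <- titanic[sample(nrow(titanic), 1), -9]
passanger
```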

```
#>     gender age class    embarked country  fare sibsp parch
#> 314   male  32   3rd Southampton England 16.02     1     0
```

Now we are ready to generate various explanations and then describe them with the `describe()` function.

A feature importance explanation shows the importance of all the model’s variables. As it is a global explanation technique, no passenger needs to be specified.

The `describe()` function easily describes which variables are the most important. As always, the argument `nonsignificance_treshold` sets the level above which variables become significant. For a higher threshold, fewer variables will be described as significant.
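
The chunk that produced the description below is not shown. A minimal sketch of what such a call could look like, assuming `feature_importance()` is applied directly to the explainer and `describe()` is called with its defaults. The setup is repeated so the example runs on its own, and `verbose = FALSE` is added here only to silence the explainer log. Importance is estimated from random permutations, so results vary between runs.

```r
library("DALEX")
library("ingredients")
library("randomForest")

# Setup repeated from the chunk above so this sketch runs on its own
titanic <- na.omit(titanic)
model_titanic_rf <- randomForest(survived == "yes" ~ ., data = titanic)
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic[, -9],
                              y = titanic$survived == "yes",
                              label = "Random Forest",
                              verbose = FALSE)

# Global explanation: permutation-based importance of the model's variables
importance_rf <- feature_importance(explain_titanic_rf)

# Natural language description with the default threshold
description_fi <- describe(importance_rf)
description_fi
```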

```
#> The number of important variables for Random Forest's prediction is 65 out of 108.
#> Variables _baseline_, _baseline_, _baseline_ have the highest importantance.
```

Ceteris Paribus profiles show how the model’s prediction changes as the value of a specified variable changes.

```
perturbed_variable <- "class"

# Profile for the selected passenger, varying only the class variable
cp_rf <- ceteris_paribus(explain_titanic_rf,
                         passanger,
                         variables = perturbed_variable)
plot(cp_rf, variable_type = "categorical")
```

For a user with no experience, interpreting the above plot may not be straightforward. Thus, we generate a natural language description to make it easier.
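
The call behind the description below is not shown. A minimal sketch, assuming `describe()` is called with its defaults on a profile computed for `class` only. The setup is repeated so the example runs on its own, and the seeded passenger is an assumption, so the numbers will differ from the output below.

```r
library("DALEX")
library("ingredients")
library("randomForest")

# Assumption: an arbitrary seed, not taken from the original vignette
set.seed(2019)

# Setup repeated from the chunks above so this sketch runs on its own
titanic <- na.omit(titanic)
model_titanic_rf <- randomForest(survived == "yes" ~ ., data = titanic)
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic[, -9],
                              y = titanic$survived == "yes",
                              label = "Random Forest",
                              verbose = FALSE)

# A seeded draw, so this is not necessarily the passenger printed earlier
passanger <- titanic[sample(nrow(titanic), 1), -9]

# Profile restricted to a single categorical variable
cp_rf <- ceteris_paribus(explain_titanic_rf,
                         passanger,
                         variables = "class")

# The profile holds one variable, so no variables argument is passed
description_cp <- describe(cp_rf)
description_cp
```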

```
#> For the selected instance, prediction estimated by Random Forest is equal to 0.074.
#>
#> Model's prediction would increase substantially if the value of class variable would change to "deck crew", "1st", "engineering crew", "victualling crew".
#> The largest change would be marked if class variable would change to "deck crew".
#>
#> Other variables are with less importance and they do not change prediction by more than 0.06%.
```

Natural language descriptions should be flexible enough to provide the desired level of complexity and specificity. Thus, various parameters can modify the generated description.

```
#> Random Forest predicts that for the selected instance, the probability that the passanger will survive is equal to 0.074
#>
#> The most important change in Random Forest's prediction would occur for class = "deck crew". It increases the prediction by 0.371.
#> The second most important change in the prediction would occur for class = "1st". It increases the prediction by 0.21.
#> The third most important change in the prediction would occur for class = "engineering crew". It increases the prediction by 0.089.
#>
#> Other variable values are with less importance. They do not change the the probability that the passanger will survive by more than 0.075.
```

Please note that `describe()` can handle only one variable at a time, so it is recommended to specify which variable should be described.

```
describe(cp_rf,
         display_numbers = TRUE,
         label = "the probability that the passanger will survive",
         variables = perturbed_variable)
```

```
#> Random Forest predicts that for the selected instance, the probability that the passanger will survive is equal to 0.074
#>
#> The most important change in Random Forest's prediction would occur for class = "deck crew". It increases the prediction by 0.371.
#> The second most important change in the prediction would occur for class = "1st". It increases the prediction by 0.21.
#> The third most important change in the prediction would occur for class = "engineering crew". It increases the prediction by 0.089.
#>
#> Other variable values are with less importance. They do not change the the probability that the passanger will survive by more than 0.075.
```

Continuous variables are described as well.

```
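perturbed_variable_continuous <- "age"

# No variables argument, so profiles are not restricted to one variable
cp_rf <- ceteris_paribus(explain_titanic_rf,
                         passanger)
plot(cp_rf, variables = perturbed_variable_continuous)
# Assumed call (not shown in the source) behind the description below
describe(cp_rf, variables = perturbed_variable_continuous)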
```

```
#> Random Forest predicts that for the selected instance, prediction is equal to 0.074
#>
#> The highest prediction occurs for (age = 2), while the lowest for (age = 28).
#> Breakpoint is identified at (age = 9).
#>
#> Average model responses are *higher* for variable values *lower* than breakpoint (= 9).
```

Ceteris Paribus profiles are described only for a single observation. If we want to assess the influence of more than one observation, we need to describe dependency profiles.
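
The chunk that produced the description below is not shown. A minimal sketch, assuming the profiles are aggregated with `aggregate_profiles(type = "partial")` and then described for `fare`. Drawing 50 passengers with a seed is an assumption made here to illustrate aggregation over several observations, and the names `passengers`, `cp_many` and `pdp_fare` are hypothetical, so the numbers will differ from the output below.

```r
library("DALEX")
library("ingredients")
library("randomForest")

# Assumption: an arbitrary seed, not taken from the original vignette
set.seed(2019)

# Setup repeated from the chunks above so this sketch runs on its own
titanic <- na.omit(titanic)
model_titanic_rf <- randomForest(survived == "yes" ~ ., data = titanic)
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic[, -9],
                              y = titanic$survived == "yes",
                              label = "Random Forest",
                              verbose = FALSE)

# Assumption: 50 randomly drawn passengers instead of a single one
passengers <- titanic[sample(nrow(titanic), 50), -9]

# One Ceteris Paribus profile per passenger, varying only fare
cp_many <- ceteris_paribus(explain_titanic_rf,
                           passengers,
                           variables = "fare")

# Average the individual profiles into a partial dependence profile
pdp_fare <- aggregate_profiles(cp_many, type = "partial")

# Describe the aggregated profile for the continuous variable
description_pdp <- describe(pdp_fare, variables = "fare")
description_pdp
```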

```
#> Random Forest's mean prediction is equal to 0.074.
#>
#> The highest prediction occurs for (fare = 7.1506), while the lowest for (fare = 14.1).
#> Breakpoint is identified at (fare = 13).
#>
#> Average model responses are *higher* for variable values *higher* than breakpoint (= 13).
```

```
# Aggregate the profiles of categorical variables into partial dependence profiles
pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
plot(pdp, variables = perturbed_variable)
```

```
#> Random Forest's mean prediction is equal to 0.074.
#>
#> Model's prediction would increase substantially if the value of class variable would change to "deck crew", "1st", "engineering crew".
#> The largest change would be marked if class variable would change to "2nd".
#>
#> Other variables are with less importance and they do not change prediction by more than 0.06%.
```