TensorFlow Probability is a library for statistical computation and probabilistic modeling built on top of TensorFlow.
Its building blocks include a vast range of distributions and invertible transformations (bijectors), probabilistic layers that may be used in keras models, and tools for probabilistic reasoning including variational inference and Markov Chain Monte Carlo.
Install the released version of tfprobability from CRAN:
install.packages("tfprobability")
To install tfprobability from GitHub, run:
devtools::install_github("rstudio/tfprobability")
Then, use the install_tfprobability() function to install the TensorFlow and TensorFlow Probability Python modules.
library(tfprobability)
install_tfprobability()
This automatically installs the current stable version of TensorFlow Probability together with TensorFlow. Correspondingly, if you need nightly builds,
install_tfprobability(version = "nightly")
will get you the nightly build of TensorFlow as well as TensorFlow Probability.
High-level applications of tfprobability to specific tasks are described in the vignettes/articles and/or featured on the TensorFlow for R blog.
This introductory text illustrates the lower-level building blocks: distributions, bijectors, and probabilistic keras layers.
library(tfprobability)
library(tensorflow)
Distributions are objects with methods to compute summary statistics, (log) probability, and (optionally) quantities like entropy and KL divergence.
# create a binomial distribution with n = 7 and p = 0.3
d <- tfd_binomial(total_count = 7, probs = 0.3)

# compute mean
d %>% tfd_mean()
#> tf.Tensor(2.1000001, shape=(), dtype=float32)

# compute variance
d %>% tfd_variance()
#> tf.Tensor(1.47, shape=(), dtype=float32)

# compute probability
d %>% tfd_prob(2.3)
#> tf.Tensor(0.303791, shape=(), dtype=float32)
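As a sanity check on the values above, the mean and variance can be reproduced in base R from the closed forms n * p and n * p * (1 - p). The probability at the non-integer point 2.3 appears consistent with a gamma-function extension of the binomial coefficient; that reading is an assumption inferred from the printed output, not something this text states. All variable names below are illustrative.

```r
# Closed-form checks for a Binomial(n = 7, p = 0.3), using base R only.
n <- 7
p <- 0.3

binom_mean <- n * p            # expected value n * p
binom_var  <- n * p * (1 - p)  # variance n * p * (1 - p)

# Assumption: evaluate the pmf formula at a non-integer x by replacing
# the factorials in the binomial coefficient with gamma functions.
x <- 2.3
log_coef <- lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
binom_prob_ext <- exp(log_coef + x * log(p) + (n - x) * log(1 - p))

print(c(mean = binom_mean, variance = binom_var, prob = binom_prob_ext))
```

The float32 result 2.1000001 printed by tfd_mean() differs from 2.1 only by single-precision rounding.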
# Represent a cold day with 0 and a hot day with 1.
# Suppose the first day of a sequence has a 0.8 chance of being cold.
# We can model this using the categorical distribution:
initial_distribution <- tfd_categorical(probs = c(0.8, 0.2))
#> Loaded Tensorflow version 2.9.1

# Suppose a cold day has a 30% chance of being followed by a hot day
# and a hot day has a 20% chance of being followed by a cold day.
# We can model this as:
transition_distribution <- tfd_categorical(
  probs = matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2, byrow = TRUE) %>%
    tf$cast(tf$float32)
)

# Suppose additionally that on each day the temperature is
# normally distributed with mean and standard deviation 0 and 5 on
# a cold day and mean and standard deviation 15 and 10 on a hot day.
# We can model this with:
observation_distribution <- tfd_normal(loc = c(0, 15), scale = c(5, 10))

# We can combine these distributions into a single week long
# hidden Markov model with:
d <- tfd_hidden_markov_model(
  initial_distribution = initial_distribution,
  transition_distribution = transition_distribution,
  observation_distribution = observation_distribution,
  num_steps = 7
)

# The expected temperatures for each day are given by:
d %>% tfd_mean() # shape [7], elements approach 9.0
#> tf.Tensor([3. 6. 7.4999995 8.249999 8.625001 8.812501 8.90625 ], shape=(7), dtype=float32)

# The log pdf of a week of temperature 0 is:
d %>% tfd_log_prob(rep(0, 7))
#> tf.Tensor(-19.855635, shape=(), dtype=float32)
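To make the hidden Markov model numbers above less opaque, here is a minimal base-R sketch that re-derives them without TensorFlow. It propagates the state distribution through the transition matrix to get the daily expected temperatures, and uses the standard forward algorithm for the log density. The variable names are illustrative and this is not how tfprobability computes the values internally.

```r
# Base-R re-derivation of the hidden Markov model quantities (no TensorFlow).
initial    <- c(0.8, 0.2)                      # P(cold), P(hot) on day 1
transition <- matrix(c(0.7, 0.3,
                       0.2, 0.8), nrow = 2, byrow = TRUE)
obs_loc    <- c(0, 15)                         # mean temperature per state
obs_scale  <- c(5, 10)                         # standard deviation per state
num_steps  <- 7

# Expected temperature per day: push the state distribution forward one
# day at a time and average the state means.
state_probs <- initial
expected_temps <- numeric(num_steps)
for (t in seq_len(num_steps)) {
  expected_temps[t] <- sum(state_probs * obs_loc)
  state_probs <- as.vector(state_probs %*% transition)
}

# Log density of a week of observations via the forward algorithm.
observations <- rep(0, num_steps)
alpha <- initial * dnorm(observations[1], obs_loc, obs_scale)
for (t in 2:num_steps) {
  alpha <- as.vector(alpha %*% transition) *
    dnorm(observations[t], obs_loc, obs_scale)
}
log_prob <- log(sum(alpha))

print(expected_temps)
print(log_prob)
```

The limit of 9.0 mentioned in the code comment follows from the stationary distribution of the chain: the long-run probability of a hot day is 0.3 / (0.3 + 0.2) = 0.6, and 0.6 * 15 = 9.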
Bijectors are invertible transformations that allow deriving the data likelihood under the transformed distribution from that under the base distribution. For a detailed explanation, see Getting into the flow: Bijectors in TensorFlow Probability on the TensorFlow for R blog.
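The relationship between the two likelihoods is the usual change-of-variables formula. The notation here is an assumption introduced for illustration: X is the base random variable with density p_X, f is the bijector's forward map, and Y = f(X).

```latex
\log p_Y(y) = \log p_X\bigl(f^{-1}(y)\bigr)
            + \log \left| \det \frac{\partial f^{-1}(y)}{\partial y} \right|
```

The second term is the log absolute determinant of the Jacobian of the inverse map, which corrects for how f stretches or compresses volume.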
# create an affine transformation that shifts by 3.33 and scales by 0.5
b <- tfb_shift(3.33)(tfb_scale(0.5))

# apply the transformation
x <- c(100, 1000, 10000)
b %>% tfb_forward(x)
#> tf.Tensor([ 53.33 503.33 5003.33], shape=(3), dtype=float32)
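The affine example above can be written out by hand, which also shows what an invertible transformation buys us for densities. The sketch below uses base R only; the helper names and the choice of a standard normal base distribution are illustrative assumptions, not part of tfprobability.

```r
# Affine map y = 0.5 * x + 3.33 and its inverse, written out in base R.
shift_by <- 3.33
scale_by <- 0.5
forward <- function(x) scale_by * x + shift_by
inverse <- function(y) (y - shift_by) / scale_by

x <- c(100, 1000, 10000)
y <- forward(x)

# Change of variables with a standard normal base distribution (an
# illustrative choice): the density of Y = forward(X) at y0 is the base
# density at inverse(y0), divided by the absolute scale factor.
y0 <- 3
density_y <- dnorm(inverse(y0)) / abs(scale_by)

print(y)
print(density_y)
```

For this base distribution the result agrees with dnorm(y0, mean = 3.33, sd = 0.5), since an affine map of a normal variable is again normal.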
# create a bijector that performs the discrete cosine transform (DCT)
b <- tfb_discrete_cosine_transform()

# run on sample data
x <- matrix(runif(3))
b %>% tfb_forward(x)
#> tf.Tensor(
#> [[0.5221709 ]
#> [0.5336635 ]
#> [0.06735111]], shape=(3, 1), dtype=float32)
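For intuition about what the DCT example computes, here is a hand-written orthonormal type-II DCT in base R. This is a sketch under an assumption: that the bijector applies an orthonormal type-II transform along the last axis, which is my understanding of its default and is not stated in this text. The function name dct2_ortho is hypothetical.

```r
# Orthonormal type-II discrete cosine transform of a numeric vector.
# Assumption: this mirrors the bijector's default behaviour.
dct2_ortho <- function(x) {
  n <- length(x)
  idx <- 0:(n - 1)
  sapply(idx, function(k) {
    s <- sum(x * cos(pi * k * (2 * idx + 1) / (2 * n)))
    norm_factor <- if (k == 0) sqrt(1 / n) else sqrt(2 / n)
    norm_factor * s
  })
}

# A length-1 input is returned unchanged.
print(dct2_ortho(0.5))

# A constant signal has all its energy in the first coefficient.
print(dct2_ortho(c(1, 1)))
```

If the assumption holds, the 3 x 1 input in the example has a last axis of length 1, so the forward pass would return the sampled values themselves.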
tfprobability wraps distributions in Keras layers so we can use them seamlessly in a neural network, and work with tensors as targets as usual. For example, we can use layer_kl_divergence_add_loss to have the network take care of the KL loss automatically, and train a variational autoencoder using only the negative log likelihood as the loss, like this:
library(keras)

encoded_size <- 2
input_shape <- c(2L, 2L, 1L)
train_size <- 100
x_train <- array(runif(train_size * Reduce(`*`, input_shape)), dim = c(train_size, input_shape))
# encoder is a keras sequential model
encoder_model <- keras_model_sequential() %>%
  layer_flatten(input_shape = input_shape) %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dense(units = params_size_multivariate_normal_tri_l(encoded_size)) %>%
  layer_multivariate_normal_tri_l(event_size = encoded_size) %>%
  # last layer adds KL divergence loss
  layer_kl_divergence_add_loss(
    distribution = tfd_independent(
      tfd_normal(loc = c(0, 0), scale = 1),
      reinterpreted_batch_ndims = 1
    ),
    weight = train_size)
# decoder is a keras sequential model
decoder_model <- keras_model_sequential() %>%
  layer_dense(units = 10,
              activation = 'relu',
              input_shape = encoded_size) %>%
  layer_dense(params_size_independent_bernoulli(input_shape)) %>%
  layer_independent_bernoulli(event_shape = input_shape,
                              convert_to_tensor_fn = tfp$distributions$Bernoulli$logits)
# keras functional model uniting them both
vae_model <- keras_model(inputs = encoder_model$inputs,
                         outputs = decoder_model(encoder_model$outputs[1]))

# VAE loss now is just the negative log probability of the data
vae_loss <- function (x, rv_x)
  - (rv_x %>% tfd_log_prob(x))

vae_model %>% compile(
  optimizer = "adam",
  loss = vae_loss
)
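Written as a formula, the objective this setup optimizes has the familiar two-part shape of a variational autoencoder loss. The notation is an assumption introduced for illustration: q(z | x) is the encoder's output distribution, p(z) the prior passed to layer_kl_divergence_add_loss, p(x | z) the decoder's output distribution, and w the value of the layer's weight argument. How the layer scales the term internally is not described in this text.

```latex
\mathcal{L}(x) = \underbrace{-\log p(x \mid z)}_{\text{vae\_loss}}
               + \underbrace{w \,\mathrm{KL}\bigl(q(z \mid x) \,\|\, p(z)\bigr)}_{\text{added by the KL layer}},
\qquad z \sim q(z \mid x)
```

Only the first term appears in the function handed to compile(); the second is contributed by the encoder's last layer.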
vae_model
#> Model: "model"
#> ________________________________________________________________________________
#> Layer (type)                         Output Shape                 Param #
#> ================================================================================
#> flatten_input (InputLayer)           [(None, 2, 2, 1)]            0
#> flatten (Flatten)                    (None, 4)                    0
#> dense_1 (Dense)                      (None, 10)                   50
#> dense (Dense)                        (None, 5)                    55
#> multivariate_normal_tri_l (Multiva   ((None, 2),                  0
#> riateNormalTriL)                     (None, 2))
#> kl_divergence_add_loss (KLDivergen   (None, 2)                    0
#> ceAddLoss)
#> sequential_1 (Sequential)            (None, 2, 2, 1)              74
#> ================================================================================
#> Total params: 179
#> Trainable params: 179
#> Non-trainable params: 0
#> ________________________________________________________________________________

vae_model %>% fit(x_train, x_train, batch_size = 25, epochs = 1)
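The parameter counts in the model summary can be reproduced with a little arithmetic. The sketch below is base R only and rests on two assumptions inferred from the summary rather than stated in the text: that a multivariate normal with a lower-triangular scale over d dimensions needs d + d * (d + 1) / 2 parameters, and that the Bernoulli output layer needs one logit per input element. The helper dense_params is hypothetical.

```r
# Parameters of a dense layer: one weight per input-output pair plus a bias
# per output unit.
dense_params <- function(n_in, n_out) (n_in + 1) * n_out

encoded_size <- 2
n_inputs <- prod(c(2, 2, 1))  # flattened input size

# Assumption: location vector plus lower-triangular scale entries.
mvn_tri_l_size <- encoded_size + encoded_size * (encoded_size + 1) / 2

encoder_params <- dense_params(n_inputs, 10) + dense_params(10, mvn_tri_l_size)
decoder_params <- dense_params(encoded_size, 10) + dense_params(10, n_inputs)
total_params <- encoder_params + decoder_params

print(c(encoder = encoder_params, decoder = decoder_params, total = total_params))
```

The encoder's two dense layers account for the 50 and 55 in the summary, and the decoder's two dense layers for the 74 reported for sequential_1.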