Bayesian Neural Network (VI) for regression - Distributed Training

# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
#   Licensed under the Apache License, Version 2.0 (the "License").
#   You may not use this file except in compliance with the License.
#   A copy of the License is located at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   or in the "license" file accompanying this file. This file is distributed
#   on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
#   express or implied. See the License for the specific language governing
#   permissions and limitations under the License.
# ==============================================================================

This example follows the Bayesian Neural Network (VI) for regression tutorial, extending it with Horovod distributed training.

[ ]:
import warnings
warnings.filterwarnings('ignore')
import mxfusion as mf
import mxnet as mx
import numpy as np
import mxnet.gluon.nn as nn
import mxfusion.components
import mxfusion.inference

First, initialize Horovod with hvd.init(). We also set the global context to GPU or CPU, depending on where the code is executed.

[ ]:
import horovod.mxnet as hvd
import mxnet as mx
hvd.init()
mx.context.Context.default_ctx = mx.gpu(hvd.local_rank()) if mx.test_utils.list_gpus() else mx.cpu()
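
As a quick sanity check (not part of the original tutorial), we can print the rank and world size Horovod assigned to this process; when the notebook runs as a single process without horovodrun or mpirun, the rank is 0 and the size is 1.

[ ]:
# Optional sanity check: outside of horovodrun/mpirun this is rank 0 of a world of size 1.
print('rank:', hvd.rank(), 'local rank:', hvd.local_rank(), 'size:', hvd.size())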

Generate Synthetic Data

[ ]:
import GPy
from pylab import *
import matplotlib.pyplot as plt

np.random.seed(0)
k = GPy.kern.RBF(1, lengthscale=0.1)
x = np.random.rand(1000,1)
y = np.random.multivariate_normal(mean=np.zeros((1000,)), cov=k.K(x), size=(1,)).T
plt.plot(x[:,0], y[:,0], '.')

Model definition

[ ]:
D = 50
net = nn.HybridSequential(prefix='nn_')
with net.name_scope():
    net.add(nn.Dense(D, activation="tanh", in_units=1))
    net.add(nn.Dense(D, activation="tanh", in_units=D))
    net.add(nn.Dense(1, flatten=True, in_units=D))
net.initialize(mx.init.Xavier(magnitude=3))
[ ]:
from mxfusion.components.variables.var_trans import PositiveTransformation
from mxfusion.inference import VariationalPosteriorForwardSampling
from mxfusion.components.functions.operators import broadcast_to
from mxfusion.components.distributions import Normal
from mxfusion import Variable, Model
from mxfusion.components.functions import MXFusionGluonFunction
[ ]:
m = Model()
m.N = Variable()
m.f = MXFusionGluonFunction(net, num_outputs=1, broadcastable=False)
m.x = Variable(shape=(m.N,1))
m.v = Variable(shape=(1,), transformation=PositiveTransformation(), initial_value=mx.nd.array([0.01]))
m.r = m.f(m.x)
for v in m.r.factor.parameters.values():
    v.set_prior(Normal(mean=broadcast_to(mx.nd.array([0]), v.shape),
                       variance=broadcast_to(mx.nd.array([1.]), v.shape)))
m.y = Normal.define_variable(mean=m.r, variance=broadcast_to(m.v, (m.N,1)), shape=(m.N,1))

Inference with Meanfield

[ ]:
from mxfusion.inference import DistributedBatchInferenceLoop, create_Gaussian_meanfield, DistributedGradBasedInference, StochasticVariationalInference

To enable distributed training instead of single-process training, we use the DistributedGradBasedInference class. Its default grad_loop is DistributedBatchInferenceLoop, whereas GradBasedInference defaults to BatchInferenceLoop (a single-process sketch is shown after the next code cell for comparison).

Note that this notebook does not yet run distributed training in Horovod, because it has not been launched with the horovodrun or mpirun command.

[ ]:
observed = [m.y, m.x]
q = create_Gaussian_meanfield(model=m, observed=observed)
alg = StochasticVariationalInference(num_samples=3, model=m, posterior=q, observed=observed)
infr = DistributedGradBasedInference(inference_algorithm=alg, grad_loop=DistributedBatchInferenceLoop())
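
For comparison, the single-process setup from the original Bayesian Neural Network (VI) for regression tutorial builds the inference object with GradBasedInference and a BatchInferenceLoop. The sketch below is illustrative only and is not part of this distributed workflow.

[ ]:
# Illustrative single-process equivalent (not used in this distributed example).
from mxfusion.inference import BatchInferenceLoop, GradBasedInference
infr_single = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())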

We also need to specify the per-worker shape of the data when initializing the inference. In this case, with 4 processes and data of shape (1000,1), we divide the first dimension by 4, so each worker sees a shape of (250,1). The line below will produce an error when executed directly in this notebook, since it is not running under Horovod.

[ ]:
infr.initialize(y=(250,1), x=(250,1))
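
Rather than hardcoding 250, the per-worker shape can also be derived from the Horovod world size. The sketch below is an alternative, not part of the original notebook, and assumes the 1000 data points divide evenly across the workers.

[ ]:
# Hypothetical alternative: derive the per-worker shape from hvd.size() instead of hardcoding it.
N_total = 1000                     # total number of data points generated above
N_worker = N_total // hvd.size()   # assumes N_total divides evenly across workers
infr.initialize(y=(N_worker, 1), x=(N_worker, 1))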
[ ]:
for v_name, v in m.r.factor.parameters.items():
    infr.params[q[v].factor.mean] = net.collect_params()[v_name].data()
    infr.params[q[v].factor.variance] = mx.nd.ones_like(infr.params[q[v].factor.variance])*1e-6
[ ]:
infr.run(max_iter=2000, learning_rate=1e-2, y=mx.nd.array(y), x=mx.nd.array(x), verbose=True)

Use prediction to visualize the resulting BNN

[ ]:
xt = np.linspace(0,1,100)[:,None]
[ ]:
infr2 = VariationalPosteriorForwardSampling(10, [m.x], infr, [m.r])
res = infr2.run(x=mx.nd.array(xt))
[ ]:
yt = res[0].asnumpy()
[ ]:
yt_mean = yt.mean(0)
yt_std = yt.std(0)

for i in range(yt.shape[0]):
    plt.plot(xt[:,0],yt[i,:,0],'k',alpha=0.2)
plt.plot(x[:,0],y[:,0],'.')
plt.show()

Running Horovod

Currently, the only way to execute Horovod with MXFusion is via the horovodrun or mpirun command from the system. We therefore first convert this notebook into a Python script and then execute that script from the command line.

[ ]:
!jupyter nbconvert --to script bnn_regression-distributed.ipynb

To run it with Horovod and enable distributed training, we launch horovodrun or mpirun from the system, specifying the number of processes. More details about running Horovod can be found in the Horovod documentation. A simple invocation has the form: horovodrun -np {number of processes} -H localhost:4 python {python file}

NOTE: Please restart this notebook before executing the code below.

[ ]:
!mpirun -np 4 -H localhost:4 python bnn_regression-distributed.py
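
Equivalently, the same run can be launched with horovodrun, following the format described above (shown here as a sketch with 4 processes):

[ ]:
!horovodrun -np 4 -H localhost:4 python bnn_regression-distributed.py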