{ "info": { "author": "Tom Charnock", "author_email": "charnock@iap.fr", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6" ], "description": "# Information maximiser\n\nUsing neural networks, sufficient statistics can be obtained from data by maximising the Fisher information.\n\nThe neural network takes some data ${\\bf d}$ and maps it to a compressed summary $\\mathscr{f}:{\\bf d}\\to{\\bf x}$ where ${\\bf x}$ can have the same size as the dimensionality of the parameter space, rather than the data space.\n\nTo train the neural network a batch of simulations ${\\bf d}_{\\sf sim}^{\\sf fid}$ created at a fiducial parameter value $\\boldsymbol{\\theta}^{\\rm fid}$ are compressed by the neural network to obtain ${\\bf x}_{\\sf sim}^{\\sf fid}$. From this we can calculate the covariance ${\\bf C_\\mathscr{f}}$ of the compressed summaries. We learn about model parameter distributions using the derivative of the simulation. This can be provided analytically or numercially using ${\\bf d}_{\\sf sim}^{\\sf fid+}$ created above the fiducial parameter value $\\boldsymbol{\\theta}^{\\sf fid+}$ and ${\\bf d}_{\\sf sim}^{\\sf fid-}$ created below the fiducial parameter value $\\boldsymbol{\\theta}^{\\sf fid-}$. 
The simulations are compressed using the network and used to find the derivative of the mean of the summaries $\\partial\\boldsymbol{\\mu}_\\mathscr{f}/\\partial\\theta_\\alpha\\equiv\\boldsymbol{\\mu}_{\\mathscr{f},\\alpha}$ via the chain rule\n$$\\frac{\\partial\\boldsymbol{\\mu}_\\mathscr{f}}{\\partial\\theta_\\alpha} = \\frac{1}{n_{\\textrm{sims}}}\\sum_{i=1}^{n_{\\textrm{sims}}}\\frac{\\partial{\\bf x}_i}{\\partial{\\bf d}_i}\\frac{\\partial{\\bf d}_i}{\\partial\\theta_\\alpha}.$$\nWe then use ${\\bf C}_\\mathscr{f}$ and $\\boldsymbol{\\mu}_{\\mathscr{f},\\alpha}$ to calculate the Fisher information\n$${\\bf F}_{\\alpha\\beta} = \\boldsymbol{\\mu}^T_{\\mathscr{f},\\alpha}{\\bf C}^{-1}_\\mathscr{f}\\boldsymbol{\\mu}_{\\mathscr{f},\\beta}.$$\nWe want to maximise the Fisher information, and we want the summaries to be orthogonal, so to train the network we minimise the loss function\n$$\\Lambda = -\\ln|{\\bf F}_{\\alpha\\beta}|+\\lambda||{\\bf C}_\\mathscr{f}-\\mathbb{1}||_2,$$\nwhere $\\lambda$ is some coupling for the square norm of the deviation of the network covariance from the identity.\n\nWhen using this code please cite arXiv:1802.03537.
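As a rough sketch of what this loss computes, here is a minimal NumPy illustration; the arrays `x` (compressed fiducial summaries) and `dmu` (derivative of the mean of the summaries) are random stand-ins rather than real network outputs, and `lam` plays the role of the coupling strength:

```python
import numpy as np

np.random.seed(0)
n_sims, n_summaries, n_params = 1000, 2, 2

# toy stand-ins: compressed fiducial summaries x, and the derivative of
# their mean with respect to the parameters (mu_f,alpha)
x = np.random.normal(size=(n_sims, n_summaries))
dmu = np.random.normal(size=(n_summaries, n_params))

C = np.cov(x, rowvar=False)          # covariance of the summaries, C_f
F = dmu.T @ np.linalg.inv(C) @ dmu   # Fisher information, F_ab
lam = 2.                             # coupling to the covariance regulariser
loss = -np.log(np.linalg.det(F)) + lam * np.linalg.norm(C - np.eye(n_summaries))
```

Minimising `loss` over the network weights simultaneously maximises $|{\bf F}_{\alpha\beta}|$ and drives the summary covariance towards the identity.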

\nThe code in the paper can be downloaded as v1 or v1.1 of the code kept on zenodo:

\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1175196.svg)](https://doi.org/10.5281/zenodo.1175196)\n
\n\nThis code is run using
\n>`python-3.6.6`\n\n>`tensorflow-1.12.0`\n\n>`numpy-1.15.0`\n\n>`tqdm==4.29.0`\n\nAlthough these precise versions may not be necessary, I have put them here to avoid possible conflicts.\n\n## Load modules\n\n\n```python\n%matplotlib inline\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport tensorflow as tf\nimport IMNN.IMNN as IMNN\nimport IMNN.ABC.ABC as ABC\nimport IMNN.ABC.priors as priors\n```\n\n# Summarising the mean and the variance\n\nFor this example we are going to use $n_{\\bf d}=10$ data points of a 1D field of Gaussian noise with unknown mean and variance to see if the network can learn to summarise them.

\n\nThe likelihood is given by\n$$\\mathcal{L} = \\prod_i^{n_{\\bf d}}\\frac{1}{\\sqrt{2\\pi|\\Sigma|}}\\exp\\left[-\\frac{1}{2}\\frac{(d_i-\\mu)^2}{\\Sigma}\\right]$$\n\nWe can solve this problem analytically, so it is useful to check how well the network does. There is a single sufficient statistic for each of the mean and the variance, which can be found by maximising the likelihood. We find that\n$$\\sum_i^{n_{\\bf d}}d_i = n_{\\bf d}\\mu\\textrm{ and }\\sum_i^{n_{\\bf d}}(d_i-\\mu)^2=n_{\\bf d}\\Sigma$$\n\nWe can calculate the Fisher information by taking the negative of the second derivative of the log-likelihood, taking the expectation by inserting the above relations, and evaluating at some fiducial parameter values\n$${\\bf F}_{\\alpha\\beta} = -\\left.\\left(\\begin{array}{cc}\\displaystyle-\\frac{n_{\\bf d}}{\\Sigma}&0\\\\0&\\displaystyle-\\frac{n_{\\bf d}}{2\\Sigma^2}\\end{array}\\right)\\right|_{\\textrm{fiducial}}.$$\nIf we choose a fiducial mean of $\\mu^{\\textrm{fid}}=0$ and variance of $\\Sigma^{\\textrm{fid}} = 1$ then we obtain a Fisher information matrix of\n\n\n```python\nexact_fisher = -np.array([[-10. / 1., 0.], [0. 
, - 0.5 * 10 / 1.**2.]])\ndeterminant_exact_fisher = np.linalg.det(exact_fisher)\nprint(\"determinant of the Fisher information\", determinant_exact_fisher)\nplt.imshow(np.linalg.inv(exact_fisher))\nplt.title(\"Inverse Fisher matrix\")\nplt.xticks([0, 1], [r\"$\\mu$\", r\"$\\Sigma$\"])\nplt.yticks([0, 1], [r\"$\\mu$\", r\"$\\Sigma$\"])\nplt.colorbar();\n```\n\n determinant of the Fisher information 50.000000000000014\n\n\n\n![png](images/output_8_1.png)\n\n\nLet us observe our _real_ data which happens to have true parameters $\\mu=3$ and $\\Sigma=2$\n\n\n```python\nreal_data = np.random.normal(3., np.sqrt(2.), size = (1, 10))\n```\n\n\n```python\nfig, ax = plt.subplots(1, 1, figsize = (10, 6))\nax.plot(real_data[0], label = \"observed data\")\nax.legend(frameon = False)\nax.set_xlim([0, 9])\nax.set_xticks([])\nax.set_ylabel(\"Data amplitude\");\n```\n\n\n![png](images/output_11_0.png)\n\n\nThe posterior distribution for this data (normalised to integrate to 1) is\n\n\n```python\n\u03bc_array = np.linspace(-10, 10, 1000)\n\u03a3_array = np.linspace(0.001, 10, 1000)\n\nparameter_grid = np.array(np.meshgrid(\u03bc_array, \u03a3_array, indexing = \"ij\"))\ndx = (\u03bc_array[1] - \u03bc_array[0]) * (\u03a3_array[1] - \u03a3_array[0])\n\nanalytic_posterior = np.exp(-0.5 * (np.sum((real_data[0][:, np.newaxis] - parameter_grid[0, :, 0][np.newaxis, :])**2., axis = 0)[:, np.newaxis] / parameter_grid[1, 0, :][np.newaxis, :] + real_data.shape[1] * np.log(2. 
* np.pi * parameter_grid[1, 0, :][np.newaxis, :])))\nanalytic_posterior = analytic_posterior.T / np.sum(analytic_posterior * dx)\n```\n\n\n```python\nfig, ax = plt.subplots(2, 2, figsize = (16, 10))\nplt.subplots_adjust(wspace = 0, hspace = 0)\nax[0, 0].plot(parameter_grid[0, :, 0], np.sum(analytic_posterior, axis = 0), linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[0, 0].legend(frameon = False)\nax[0, 0].set_xlim([-10, 10])\nax[0, 0].set_ylabel('$\\\\mathcal{P}(\\\\mu|{\\\\bf d})$')\nax[0, 0].set_yticks([])\nax[0, 0].set_xticks([])\nax[1, 0].set_xlabel('$\\mu$');\nax[1, 0].set_ylim([0, 10])\nax[1, 0].set_ylabel('$\\Sigma$')\nax[1, 0].set_xlim([-10, 10])\nax[1, 0].contour(parameter_grid[0, :, 0], parameter_grid[1, 0, :], analytic_posterior, colors = \"C2\")\nax[1, 1].plot(np.sum(analytic_posterior, axis = 1), parameter_grid[1, 0, :], linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[1, 1].legend(frameon = False)\nax[1, 1].set_ylim([0, 10])\nax[1, 1].set_xlabel('$\\\\mathcal{P}(\\\\Sigma|{\\\\bf d})$')\nax[1, 1].set_xticks([])\nax[1, 1].set_yticks([])\nax[0, 1].axis(\"off\");\n```\n\n\n![png](images/output_14_0.png)\n\n\nNow lets see how the information maximising neural network can recover this posterior.\n\n## Generate data\n\nWe start by defining a function to generate the data with the correct shape. The shape must be\n```\ndata_shape = None + input shape\n```\n\n\n```python\ninput_shape = [10]\n```\n\nIt is useful to define the generating function so that it only takes in the value of the parameter as its input since the function can then be used for ABC later.
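The `data_shape = None + input shape` convention above can be checked with a tiny sketch (plain NumPy, with the leading `None` axis standing for the number of simulations):

```python
import numpy as np

input_shape = [10]  # ten data points per simulation
n_sims = 1000       # the leading None axis is the number of simulations

# data_shape = None + input shape
data = np.random.normal(0., 1., size=[n_sims] + input_shape)
print(data.shape)  # (1000, 10)
```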

\nThe data needs to be generated at a fiducial parameter value and at perturbed values just below and above the fiducial parameter for the numerical derivative.\n\n\n```python\n\u03b8_fid = np.array([0, 1.])\n\u0394\u03b8pm = np.array([0.1, 0.1])\n```\n\nThe data at the perturbed values should have the shape\n```\nperturbed_data_shape = None + number of parameters + input shape\n```\n\nThe generating function is defined so that the fiducial parameter is passed as a list so that many simulations can be made at once. This is very useful for the ABC function later.\n\n\n```python\ndef simulator(\u03b8, seed, simulator_args):\n if seed is not None:\n np.random.seed(seed)\n if len(\u03b8.shape) > 1:\n \u03bc = \u03b8[:, 0]\n \u03a3 = \u03b8[:, 1]\n else:\n \u03bc = 0.\n \u03a3 = \u03b8\n return np.moveaxis(np.random.normal(\u03bc, np.sqrt(\u03a3), simulator_args[\"input shape\"] + [\u03b8.shape[0]]), -1, 0)\n```\n\n```python\ndef simulator(\u03b8, seed, simulator_args):\n if seed is not None:\n np.random.seed(seed)\n if len(\u03b8.shape) > 1:\n if \u03b8.shape[1] == 2:\n \u03bc = \u03b8[:, 0]\n \u03a3 = \u03b8[:, 1]\n else:\n \u03bc = \u03b8[:, 0]\n \u03a3 = np.ones_like(\u03bc)\n else:\n \u03bc = \u03b8\n \u03a3 = 1.\n return np.moveaxis(np.random.normal(\u03bc, np.sqrt(\u03a3), simulator_args[\"input shape\"] + [\u03b8.shape[0]]), -1, 0)\n```\n\n### Training data\nEnough data needs to be made to approximate the covariance matrix of the output summaries. The number of simulations needed to approximate the covariance is `n_s`. 
If the data is particularly large then it might not be possible to pass all the data into active memory at once, and so the simulations can be split into batches.\n\nFor example, if we wanted to make 10000 simulations, but estimate the covariance using 1000 simulations at a time, we would set\n\n\n```python\nn_s = 1000\nnum_sims = 10 * n_s\nseed = np.random.randint(1e6)\n```\n\nThe training data can now be made\n\n\n```python\nt = simulator(\u03b8 = np.tile(\u03b8_fid, [num_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})\n```\n\nIdeally we would be able to take the derivative of our simulations with respect to the model parameters. We can indeed do that in this case, but since this is possibly a rare occurrence I will show an example where the derivatives are calculated numerically. By suppressing the sample variance between the simulations created at some lower and upper varied parameter values, far fewer simulations are needed.\n\n\n```python\nn_p = 1000\nnum_partial_sims = 10 * n_p\n```\n\nThe sample variance is suppressed by choosing the same initial seed when creating the upper and lower simulations.\n\n\n```python\nt_m = simulator(\u03b8 = np.tile(\u03b8_fid - np.array([0.1, 0.]), [num_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})\nt_p = simulator(\u03b8 = np.tile(\u03b8_fid + np.array([0.1, 0.]), [num_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})\nt_m = np.stack([t_m, simulator(\u03b8 = np.tile(\u03b8_fid - np.array([0., 0.1]), [num_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})], axis = 1)\nt_p = np.stack([t_p, simulator(\u03b8 = np.tile(\u03b8_fid + np.array([0., 0.1]), [num_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})], axis = 1)\nt_d = (t_p - t_m) / (2. 
* \u0394\u03b8pm)[np.newaxis, :, np.newaxis]\n```\n\nThe fiducial simulations and simulations for the derivative must be collected in a dictionary to be stored in the TensorFlow graph or passed to the training function.\n\n\n```python\ndata = {\"data\": t, \"data_d\": t_d}\n```\n\n### Test data\nWe should also make some test data, but here we will use only one combination. This needs adding to the dictionary\n\n\n```python\nnum_validation_sims = n_s\nnum_validation_partial_sims = n_p\nseed = np.random.randint(1e6)\ntt = simulator(\u03b8 = np.tile(\u03b8_fid, [num_validation_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})\ntt_m = simulator(\u03b8 = np.tile(\u03b8_fid - np.array([0.1, 0.]), [num_validation_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})\ntt_p = simulator(\u03b8 = np.tile(\u03b8_fid + np.array([0.1, 0.]), [num_validation_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})\ntt_m = np.stack([tt_m, simulator(\u03b8 = np.tile(\u03b8_fid - np.array([0., 0.1]), [num_validation_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})], axis = 1)\ntt_p = np.stack([tt_p, simulator(\u03b8 = np.tile(\u03b8_fid + np.array([0., 0.1]), [num_validation_partial_sims, 1]), seed = seed, simulator_args = {\"input shape\": input_shape})], axis = 1)\ntt_d = (tt_p - tt_m) / (2. 
* \u0394\u03b8pm)[np.newaxis, :, np.newaxis]\ndata[\"validation_data\"] = tt\ndata[\"validation_data_d\"] = tt_d\n```\n\n### Data visualisation\nWe can plot the data to see what it looks like.\n\n\n```python\nfig, ax = plt.subplots(1, 1, figsize = (10, 6))\nax.plot(data['data'][np.random.randint(num_sims)], label = \"training data\")\nax.plot(data['validation_data'][np.random.randint(num_validation_sims)], label = \"test data\")\nax.legend(frameon = False)\nax.set_xlim([0, 9])\nax.set_xticks([])\nax.set_ylabel(\"Data amplitude\");\n```\n\n\n![png](images/output_36_0.png)\n\n\nIt is also very useful to plot the upper and lower derivatives to check that the sample variance is actually supressed since the network learns extremely slowly if this isn't done properly.\n\n\n```python\nfig, ax = plt.subplots(2, 2, figsize = (20, 10))\nplt.subplots_adjust(hspace = 0)\ntraining_index = np.random.randint(num_partial_sims)\ntest_index = np.random.randint(num_validation_partial_sims)\n\nax[0, 0].plot(t_m[training_index, 0], label = \"lower training data\", color = \"C0\", linestyle = \"dashed\")\nax[0, 0].plot(t_p[training_index, 0], label = \"upper training data\", color = \"C0\")\nax[0, 0].plot(tt_m[test_index, 0], label = \"lower validation data\", color = \"C1\", linestyle = \"dashed\")\nax[0, 0].plot(tt_p[test_index, 0], label = \"upper validation data\", color = \"C1\")\nax[0, 0].legend(frameon = False)\nax[0, 0].set_xlim([0, 9])\nax[0, 0].set_xticks([])\nax[0, 0].set_ylabel(\"Data amplitude with varied mean\")\nax[1, 0].plot(data[\"data_d\"][training_index, 0], label = \"derivative training data\", color = \"C0\")\nax[1, 0].plot(data[\"data_d\"][test_index, 0], label = \"derivative validation data\", color = \"C1\")\nax[1, 0].set_xlim([0, 9])\nax[1, 0].set_xticks([])\nax[1, 0].legend(frameon = False)\nax[1, 0].set_ylabel(\"Amplitude of the derivative of the data\\nwith respect to the mean\");\n\nax[0, 1].plot(t_m[training_index, 1], label = \"lower training data\", color 
= \"C0\", linestyle = \"dashed\")\nax[0, 1].plot(t_p[training_index, 1], label = \"upper training data\", color = \"C0\")\nax[0, 1].plot(tt_m[test_index, 1], label = \"lower validation data\", color = \"C1\", linestyle = \"dashed\")\nax[0, 1].plot(tt_p[test_index, 1], label = \"upper validation data\", color = \"C1\")\nax[0, 1].legend(frameon = False)\nax[0, 1].set_xlim([0, 9])\nax[0, 1].set_xticks([])\nax[0, 1].set_ylabel(\"Data amplitude with varied covariance\")\nax[1, 1].plot(data[\"data_d\"][training_index, 1], label = \"derivative training data\", color = \"C0\")\nax[1, 1].plot(data[\"data_d\"][test_index, 1], label = \"derivative validation data\", color = \"C1\")\nax[1, 1].set_xlim([0, 9])\nax[1, 1].set_xticks([])\nax[1, 1].legend(frameon = False)\nax[1, 1].set_ylabel(\"Amplitude of the derivative of the data\\nwith respect to covariance\");\n```\n\n\n![png](images/output_38_0.png)\n\n\n## Initiliase the neural network\n### Define network parameters\nThe network is initialised with a base set of parameters
\n\n> `\"dtype\"` - `int, optional` - 32 or 64 for 32 or 64 bit floats and integers\n\n> `\"number of simulations\"` - `int` - the number of simulations to use to approximate the covariance\n\n> `\"number of derivative simulations\"` - `int` - the number of derivatives of the simulations used to calculate the derivative of the mean\n\n> `\"fiducial\"` - `list` - the fiducial parameter values at which to train the network\n\n> `\"number of summaries\"` - `int` - number of summaries the network makes from the data\n\n> `\"input shape\"` - `list` - the shape of the input data\n\n> `\"filename\"` - `str, optional` - a filename to save/load the graph with\n\n\n```python\nparameters = {\n    \"dtype\": 32,\n    \"number of simulations\": n_s,\n    \"number of derivative simulations\": n_p,\n    \"fiducial\": \u03b8_fid.tolist(),\n    \"number of summaries\": 2,\n    \"input shape\": input_shape,\n    \"filename\": \"data/model\",\n}\n```\n\n\n```python\ntf.reset_default_graph()\nn = IMNN.IMNN(parameters = parameters)\n```\n\n## Self-defined network\n\nThe information maximising neural network must be provided with a neural network to optimise. In principle, this should be highly specified to pull out the informative features in the data. All weights should be defined in their own variable scope. Additional tensors which control, say, the dropout or the value of a leaky relu negative gradient can be defined and passed to the training and validation phase using a dictionary.\n\nBelow is an example of a network which takes in the data and passes it through a fully connected neural network with two hidden layers of 128 neurons each and outputs two summaries. The activation is leaky relu on the hidden layers and linear on the output.\n\n
\n
**Note that biases should not be included: the gradients required for bias terms are not defined in the core TensorFlow code.**
\n\n\n```python\ndef build_network(data, **kwargs):\n \u03b1 = kwargs[\"activation_parameter\"]\n with tf.variable_scope(\"layer_1\"):\n weights = tf.get_variable(\"weights\", shape = [input_shape[-1], 128], initializer = tf.variance_scaling_initializer())\n output = tf.nn.leaky_relu(tf.matmul(data, weights, name = \"multiply\"), \u03b1, name = \"output\")\n with tf.variable_scope(\"layer_2\"):\n weights = tf.get_variable(\"weights\", shape = [128, 128], initializer = tf.variance_scaling_initializer())\n output = tf.nn.leaky_relu(tf.matmul(output, weights, name = \"multiply\"), \u03b1, name = \"output\")\n with tf.variable_scope(\"layer_3\"):\n weights = tf.get_variable(\"weights\", shape = (128, n.n_summaries), initializer = tf.variance_scaling_initializer())\n output = tf.identity(tf.matmul(output, weights, name = \"multiply\"), name = \"output\")\n return output\n```\n\nExtra tensors such as for dropout, the activation parameter for functions such as leaky relu, or boolean training phase parameter for batch normalisation can be added (with necessary named placeholders). 
Note that in the network above we only need the activation parameter, but I will leave the extra tensors in the next few cells as an example.\n\n\n```python\n\u03b4 = tf.placeholder(dtype = tf.float32, shape = (), name = \"dropout_value\")\n\u03b1 = tf.placeholder(dtype = tf.float32, shape = (), name = \"activation_parameter\")\n\u03d5 = tf.placeholder(dtype = tf.bool, shape = (), name = \"training_phase\")\n```\n\nThe network needs to be passed to the IMNN module only taking in a single tensor, as such we should use a lambda function passing through the tensors that we want.\n\n\n```python\nnetwork = lambda x: build_network(x, activation_parameter = \u03b1, dropout_value = \u03b4, training_phase = \u03d5)\n```\n\nAnd now the tensor names are stored in training and validation dictionaries with their values to be called during training and validation.\n\n\n```python\ntraining_dictionary = {\"dropout_value:0\": 0.8,\n \"activation_parameter:0\": 0.01,\n \"training_phase:0\": True,\n }\n\nvalidation_dictionary = {\"dropout_value:0\": 1.,\n \"activation_parameter:0\": 0.01,\n \"training_phase:0\": False,\n }\n```\n\nThere is a very limited network building ability which can be called to build simple networks including fully connected and 1D, 2D and 3D convolutional. All weights are initialised using He initialisation. A limited number of activation functions can be specified (functions that do not need extra parameters), such as `tanh`, `sigmoid`, `relu`, `elu`. The network architecture is described using a list. Each element of the list is a hidden layer. A dense layer can be made using an integer where thet value indicates the number of neurons. 
A convolutional layer can be built using a list where the first element is an integer giving the number of filters, the second element is a list of the kernel sizes in the x and y directions, the third element is a list of the strides in the x and y directions and the final element is a string, 'SAME' or 'VALID', which describes the padding prescription.\n\n```python\nautomatic_network = {\n    \"activation function\" : tf.nn.relu,\n    \"hidden layers\" : [128, 128]\n}\n```\n\n
This is still under (slow) development - it is usually better to provide your own network, since you know more about your data!
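For instance, using the layer descriptions above, a hypothetical `"hidden layers"` list mixing convolutional and dense layers might look like this (the filter counts, kernels and strides are illustrative only):

```python
# hypothetical "hidden layers" list for the automatic network builder:
# conv layers are [filters, kernel size, strides, padding], dense layers are ints
hidden_layers = [
    [8, [3, 3], [1, 1], "SAME"],    # conv: 8 filters, 3x3 kernel, 1x1 stride
    [16, [3, 3], [2, 2], "VALID"],  # conv: 16 filters, 3x3 kernel, 2x2 stride
    128,                            # dense: 128 neurons
]
```

This list would replace `[128, 128]` in the `automatic_network` dictionary above.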
\n\n## Setup the graph\nThe graph can now be setup easily by passing the network to the setup function.\n\n\n```python\nn.setup(network = network)\n```\n\n saving the graph as data/model.meta\n\n\nThe data can be preloaded to the graph by passing the dictionary to the setup function\n```python\nn.setup(network = network, load_data = data)\n```\n\n## Train the network\nThe training can now be performed by passing the number of weight and bias updates to perform, the learning rate, how many simulations to pass through the network at once and the number of simulations, number of derivatives of the simulations for training and validation, the dictionaries for training and validation tensors if they are used in the network, the data if it hasn't been preloaded to the network and whether to run the history object.\nThe strength of the coupling to the covariance regulariser can be passed using the `constraint_strength` keyword (although it is automatically set to 2). This may need to be changed depending on the size of the Fisher information.\n\nAutomatically, the training function can be rerun to continue training the network further.\n\n\n```python\nupdates = 500\nat_once = 1000\nlearning_rate = 1e-3\n\nn.train(updates, at_once, learning_rate,\n constraint_strength = 2.,\n training_dictionary = training_dictionary,\n validation_dictionary = validation_dictionary,\n get_history = True, data = data, restart = False, diagnostics = True)\n```\n\n\n HBox(children=(IntProgress(value=0, description='Updates', max=500, style=ProgressStyle(description_width='ini\u2026\n\n\n\n\n\nThe network can also be reinitialised before training if something goes wrong by running\n```python\nn.train(updates, at_once, learning_rate, constraint_strength = 2., training_dictionary = training_dictionary, validation_dictionary = validation_dictionary, get_history = True, data = data, restart = True)\n```\nDiagnostics can be collected, including the values of the weights and gradients at every 
epoch, the value of the gradient of the loss function and the determinant of the covariance. Note that this will make the network take a lot longer to train. This option is selected using\n```python\nn.train(updates, at_once, learning_rate, constraint_strength = 2., training_dictionary = training_dictionary, validation_dictionary = validation_dictionary, get_history = True, data = data, restart = False, diagnostics = True)\n```\n\n\nIf run then the history object will contain the value of the determinant of the Fisher information from the training and the validation data.\n\n\n```python\nfig, ax = plt.subplots(2, 1, sharex = True, figsize = (10, 10))\nplt.subplots_adjust(hspace = 0)\nepochs = np.arange(1, len(n.history[\"det F\"]) + 1)\nax[0].plot(epochs, n.history[\"loss\"], label = 'loss from training data')\nax[0].plot(epochs, n.history[\"test loss\"], label = 'loss from validation data')\nax[0].legend(frameon = False)\nax[0].set_xlim([1, epochs[-1]])\nax[0].set_ylabel(r\"$loss$\")\nax[1].plot(epochs, n.history[\"det F\"], label = r'$|{\\bf F}_{\\alpha\\beta}|$ from training data')\nax[1].plot(epochs, n.history[\"det test F\"], label = r'$|{\\bf F}_{\\alpha\\beta}|$ from validation data')\nax[1].legend(frameon = False)\nax[1].axhline(determinant_exact_fisher, color = \"black\", linestyle = \"dashed\")\nax[1].set_xlim([1, epochs[-1]])\nax[1].set_ylabel(r\"$|{\\bf F}_{\\alpha\\beta}|$\")\nax[1].set_xlabel(\"Number of epochs\");\n```\n\n\n![png](images/output_58_0.png)\n\n\nSince we collected the diagnostics we can plot the covariance\n\n\n```python\nfig, ax = plt.subplots(1, 1, sharex = True, figsize = (10, 6))\nax.plot(epochs, n.diagnostics[\"det C\"], label = r'$|{\\bf C}_\\mathscr{f}|$ from training data')\nax.plot(epochs, n.diagnostics[\"det test C\"], label = r'$|{\\bf C}_\\mathscr{f}|$ from validation data')\nax.axhline(1, color = \"black\", linestyle = \"dashed\")\nax.legend(frameon = False)\nax.set_xlim([1, epochs[-1]])\nax.set_ylabel(r\"$|{\\bf 
C}_\\mathscr{f}|$\")\nax.set_xlabel(\"Number of epochs\");\n```\n\n\n![png](images/output_60_0.png)\n\n\nWe'll also plot the weights from the first neuron in each layer to every neuron in the next layer (eventhough it's not particularly useful in this case)\n\n\n```python\nfig, ax = plt.subplots(1, 3, sharex = True, figsize = (20, 6))\nend = len(n.diagnostics[\"weights\"][0][:, 0, 0])\nax[0].plot(epochs, n.diagnostics[\"weights\"][0][:end, 0, :]);\nax[0].set_ylabel(\"Layer 1 weight amplitudes\");\nax[0].set_xlabel(\"Number of epochs\");\nax[0].set_xlim([1, epochs[-1]])\nax[1].plot(epochs, n.diagnostics[\"weights\"][1][:end, 0, :]);\nax[1].set_ylabel(\"Layer 2 weight amplitudes\");\nax[1].set_xlabel(\"Number of epochs\");\nax[1].set_xlim([1, epochs[-1]])\nax[2].plot(epochs, n.diagnostics[\"weights\"][2][:end, 0, :]);\nax[2].set_xlim([1, epochs[-1]])\nax[2].set_ylabel(\"Layer 3 weight amplitudes\");\nax[2].set_xlabel(\"Number of epochs\");\n```\n\n\n![png](images/output_62_0.png)\n\n\nAnd their corresponding gradients are\n\n\n```python\nfig, ax = plt.subplots(1, 3, sharex = True, figsize = (20, 6))\nax[0].plot(epochs, n.diagnostics[\"gradients\"][0][:end, 0, :])\nax[0].axhline(0, color = \"black\", linestyle = \"dashed\")\nax[0].set_ylabel(\"Gradient amplitudes for layer 1 weights\")\nax[0].set_xlabel(\"Number of epochs\");\nax[0].set_xlim([1, epochs[-1]])\nax[1].plot(epochs, n.diagnostics[\"gradients\"][1][:end, 0, :])\nax[1].axhline(0, color = \"black\", linestyle = \"dashed\")\nax[1].set_ylabel(\"Gradient amplitudes for layer 2 weights\")\nax[1].set_xlabel(\"Number of epochs\");\nax[1].set_xlim([1, epochs[-1]])\nax[2].plot(epochs, n.diagnostics[\"gradients\"][2][:end, 0, :])\nax[2].axhline(0, color = \"black\", linestyle = \"dashed\")\nax[2].set_ylabel(\"Gradient amplitudes for layer 3 weights\");\nax[2].set_xlim([1, epochs[-1]])\nax[2].set_xlabel(\"Number of epochs\");\n```\n\n\n![png](images/output_64_0.png)\n\n\nAnd finally, the value of the gradient of the 
loss function\n\n\n```python\nfig, ax = plt.subplots(1, 1, sharex = True, figsize = (10, 6))\nax.plot(epochs, np.mean(n.diagnostics[\"fisher gradient\"], axis = 1)[:, 0], label = 'Gradient of the loss function from first summary')\nax.plot(epochs, np.mean(n.diagnostics[\"fisher gradient\"], axis = 1)[:, 1], label = 'Gradient of the loss function from second summary')\nax.legend(frameon = False)\nax.set_xlim([1, epochs[-1]])\nax.set_ylabel(\"Gradient of the loss function\")\nax.set_xlabel(\"Number of epochs\");\n```\n\n\n![png](images/output_66_0.png)\n\n\n## Resetting the network\nIf you need to reset the weights for any reason, and you don't want to run the training function with `restart = True`, then you can call\n```python\nn.reinitialise_session()\n```\n\n## Saving the network\n\nIf you don't initialise the network with a save name, you can still save the network as a `TensorFlow` `meta` graph. For example, saving the model in the directory `./data` as `saved_model.meta` can be done using the function\n```python\nn.save_network(filename = \"data/saved_model\", first_time = True)\n```\nIf `\"filename\"` is passed with a correct file name when initialising the module then the initialised network will be saved by\n```python\nn.begin_session()\n```\nand then saved at the end of training.\n\n## Loading the network\n\nYou can load the network from a `TensorFlow` `meta` graph (from `./data/saved_model.meta`) using the same parameter dictionary as used when originally training the network and then running\n```python\nn = IMNN.IMNN(parameters = parameters)\nn.restore_network()\n```\nTraining can be continued after restoring the model - although the Adam optimiser might need to reacquaint itself.\n\n## Approximate Bayesian computation\n\nWe can now do ABC (or PMC-ABC) with our calculated summary. From the samples we create simulations at each parameter value and feed each simulation through the network to get summaries. 
The summaries are compared to the summary of the real data to find the distances, which can be used to accept or reject points.\n\nWe start by defining our prior as a truncated Gaussian (uniform is also available). The uniform function is taken from delfi by Justin Alsing.\n\nWe are going to choose a prior mean of 0 for the mean and 1 for the variance, each with a prior variance of 10, cut at $[-10, 10]$ for the mean and $[0, 10]$ for the variance.\n\n\n```python\nprior = priors.TruncatedGaussian(np.array([0., 1.]), np.array([[10., 0.], [0., 10.]]), np.array([-10., 0.]), np.array([10., 10.]))\n```\n\nThe ABC module takes in the _observed_ data, the prior and the TF session. It also takes in the simulator and its arguments and the validation dictionary which needs to be passed to the graph.\n\n\n```python\nabc = ABC.ABC(real_data = real_data, prior = prior, sess = n.sess, get_compressor = n.get_compressor, simulator = simulator, seed = None, simulator_args = {\"input shape\": input_shape}, dictionary = validation_dictionary)\n```\n\n## Gaussian approximation\nBefore running all the simulations needed for approximate Bayesian computation, we can get the Gaussian approximation of the posterior from the MLE and the inverse Fisher information.\n\n\n```python\nprint(\"maximum likelihood estimate\", abc.MLE[0])\nprint(\"determinant of the Fisher information\", np.linalg.det(abc.fisher))\nplt.imshow(np.linalg.inv(abc.fisher))\nplt.title(\"Inverse Fisher matrix\")\nplt.xticks([0, 1], [r\"$\\mu$\", r\"$\\Sigma$\"])\nplt.yticks([0, 1], [r\"$\\mu$\", r\"$\\Sigma$\"])\nplt.colorbar();\n```\n\n maximum likelihood estimate [2.5744095 5.815166 ]\n determinant of the Fisher information 46.824364\n\n\n\n![png](images/output_75_1.png)\n\n\n\n```python\ngaussian_approximation, grid = abc.gaussian_approximation(gridsize = 100)\n```\n\n\n```python\nfig, ax = plt.subplots(2, 2, figsize = (16, 10))\nplt.subplots_adjust(wspace = 0, hspace = 0)\nax[0, 0].plot(parameter_grid[0, :, 0], np.sum(analytic_posterior * (parameter_grid[0, 1, 0] - 
parameter_grid[0, 0, 0]), axis = 0), linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[0, 0].plot(grid[0, :, 0], np.sum(gaussian_approximation * (grid[0, 1, 0] - grid[0, 0, 0]), axis = 0), color = \"C1\", label = \"Gaussian approximation\")\nax[0, 0].axvline(abc.MLE[0, 0], linestyle = \"dashed\", color = \"black\", label = \"Maximum likelihood estimate of mean\")\nax[0, 0].legend(frameon = False)\nax[0, 0].set_xlim([-10, 10])\nax[0, 0].set_ylabel('$\\\\mathcal{P}(\\\\mu|{\\\\bf d})$')\nax[0, 0].set_yticks([])\nax[0, 0].set_xticks([])\nax[1, 0].set_xlabel('$\\mu$');\nax[1, 0].set_ylim([0, 10])\nax[1, 0].set_ylabel('$\\Sigma$')\nax[1, 0].set_xlim([-10, 10])\nax[1, 0].contour(parameter_grid[0, :, 0], parameter_grid[1, 0, :], analytic_posterior, colors = \"C2\")\nax[1, 0].contour(grid[0, :, 0], grid[1, 0, :], gaussian_approximation, colors = \"C1\")\nax[1, 0].axvline(abc.MLE[0, 0], linestyle = \"dashed\", color = \"black\", label = \"Maximum likelihood estimate of mean\")\nax[1, 0].axhline(abc.MLE[0, 1], linestyle = \"dotted\", color = \"black\", label = \"Maximum likelihood estimate of covariance\")\nax[1, 1].plot(np.sum(analytic_posterior * (parameter_grid[1, 0, 1] - parameter_grid[1, 0, 0]), axis = 1), parameter_grid[1, 0, :], linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[1, 1].plot(np.sum(gaussian_approximation * (grid[1, 0, 1] - grid[1, 0, 0]), axis = 1), grid[1, 0, :], color = \"C1\", label = \"Gaussian approximation\")\nax[1, 1].axhline(abc.MLE[0, 1], linestyle = \"dotted\", color = \"black\", label = \"Maximum likelihood estimate of covariance\")\nax[1, 1].legend(frameon = False)\nax[1, 1].set_ylim([0, 10])\nax[1, 1].set_xlabel('$\\\\mathcal{P}(\\\\Sigma|{\\\\bf d})$')\nax[1, 1].set_xticks([])\nax[1, 1].set_yticks([])\nax[0, 1].axis(\"off\");\n```\n\n\n![png](images/output_77_0.png)\n\n\nWe can see that the maximum likelihood estimate for the mean is almost perfect whilst it is incorrect for the 
variance. However, we can now see what the ABC does in its place.\n\n### ABC\nThe simplest ABC takes the number of draws and a switch to state whether to run all the simulations in parallel or sequentially. The full simulations can also be saved by passing a file name. The draws are stored in the class attribute `ABC_dict`.\n\n\n```python\nabc.ABC(draws = 100000, at_once = True, save_sims = None, MLE = True)\n```\n\nIn ABC, draws are accepted if the distance between the simulation summary and the summary of the real data is \"close\", i.e. smaller than some \u03f5 value, which is chosen somewhat arbitrarily.\n\n\n```python\n\u03f5 = 2.\naccept_indices = np.argwhere(abc.ABC_dict[\"distances\"] < \u03f5)[:, 0]\nreject_indices = np.argwhere(abc.ABC_dict[\"distances\"] >= \u03f5)[:, 0]\nprint(\"Number of accepted samples = \", accept_indices.shape[0])\n```\n\n Number of accepted samples = 1223\n\n\n### Plot samples\nWe can plot the output samples and the histogram of the accepted samples, which should peak around `\u03b8 = 1` (where we generated the real data). 
The output samples are a monotonic function of the parameters, which shows that the network has learned how to summarise the data.\n\n\n```python\nfig, ax = plt.subplots(2, 2, figsize = (14, 10))\nplt.subplots_adjust(hspace = 0, wspace = 0.2)\nax[0, 0].scatter(abc.ABC_dict[\"parameters\"][reject_indices, 0], abc.ABC_dict[\"summaries\"][reject_indices, 0], s = 1, alpha = 0.1, label = \"Rejected samples\", color = \"C3\")\nax[0, 0].scatter(abc.ABC_dict[\"parameters\"][accept_indices, 0] , abc.ABC_dict[\"summaries\"][accept_indices, 0], s = 1, label = \"Accepted samples\", color = \"C6\", alpha = 0.5)\nax[0, 0].axhline(abc.summary[0, 0], color = 'black', linestyle = 'dashed', label = \"Summary of observed data\")\nax[0, 0].legend(frameon=False)\nax[0, 0].set_ylabel('First network output', labelpad = 0)\nax[0, 0].set_xlim([-10, 10])\nax[0, 0].set_xticks([])\nax[1, 0].scatter(abc.ABC_dict[\"parameters\"][reject_indices, 0], abc.ABC_dict[\"summaries\"][reject_indices, 1], s = 1, alpha = 0.1, label = \"Rejected samples\", color = \"C3\")\nax[1, 0].scatter(abc.ABC_dict[\"parameters\"][accept_indices, 0] , abc.ABC_dict[\"summaries\"][accept_indices, 1], s = 1, label = \"Accepted samples\", color = \"C6\", alpha = 0.5)\nax[1, 0].axhline(abc.summary[0, 1], color = 'black', linestyle = 'dashed', label = \"Summary of observed data\")\nax[1, 0].legend(frameon=False)\nax[1, 0].set_ylabel('Second network output', labelpad = 0)\nax[1, 0].set_xlim([-10, 10])\nax[1, 0].set_xlabel(\"$\\mu$\")\nax[0, 1].scatter(abc.ABC_dict[\"parameters\"][reject_indices, 1], abc.ABC_dict[\"summaries\"][reject_indices, 0], s = 1, alpha = 0.1, label = \"Rejected samples\", color = \"C3\")\nax[0, 1].scatter(abc.ABC_dict[\"parameters\"][accept_indices, 1] , abc.ABC_dict[\"summaries\"][accept_indices, 0], s = 1, label = \"Accepted samples\", color = \"C6\", alpha = 0.5)\nax[0, 1].axhline(abc.summary[0, 0], color = 'black', linestyle = 'dashed', label = \"Summary of observed data\")\nax[0, 1].legend(frameon=False)\nax[0, 
1].set_ylabel('First network output', labelpad = 0)\nax[0, 1].set_xlim([0, 10])\nax[0, 1].set_xticks([])\nax[1, 1].scatter(abc.ABC_dict[\"parameters\"][reject_indices, 1], abc.ABC_dict[\"summaries\"][reject_indices, 1], s = 1, alpha = 0.1, label = \"Rejected samples\", color = \"C3\")\nax[1, 1].scatter(abc.ABC_dict[\"parameters\"][accept_indices, 1] , abc.ABC_dict[\"summaries\"][accept_indices, 1], s = 1, label = \"Accepted samples\", color = \"C6\", alpha = 0.5)\nax[1, 1].axhline(abc.summary[0, 1], color = 'black', linestyle = 'dashed', label = \"Summary of observed data\")\nax[1, 1].legend(frameon=False)\nax[1, 1].set_ylabel('Second network output', labelpad = 0)\nax[1, 1].set_xlim([0, 10])\nax[1, 1].set_xlabel(\"$\\Sigma$\");\n```\n\n\n![png](images/output_84_0.png)\n\n\n\n```python\nfig, ax = plt.subplots(2, 2, figsize = (16, 10))\nplt.subplots_adjust(wspace = 0, hspace = 0)\nax[0, 0].plot(parameter_grid[0, :, 0], np.sum(analytic_posterior * (parameter_grid[0, 1, 0] - parameter_grid[0, 0, 0]), axis = 0), linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[0, 0].plot(grid[0, :, 0], np.sum(gaussian_approximation * (grid[0, 1, 0] - grid[0, 0, 0]), axis = 0), color = \"C1\", label = \"Gaussian approximation\")\nax[0, 0].hist(abc.ABC_dict[\"parameters\"][accept_indices, 0], np.linspace(-10, 10, 100), histtype = u'step', density = True, linewidth = 1.5, color = \"C6\", label = \"ABC posterior\");\nax[0, 0].axvline(abc.MLE[0, 0], linestyle = \"dashed\", color = \"black\", label = \"Maximum likelihood estimate of mean\")\nax[0, 0].legend(frameon = False)\nax[0, 0].set_xlim([-10, 10])\nax[0, 0].set_ylabel('$\\\\mathcal{P}(\\\\mu|{\\\\bf d})$')\nax[0, 0].set_yticks([])\nax[0, 0].set_xticks([])\nax[1, 0].set_xlabel('$\\mu$');\nax[1, 0].set_ylim([0, 10])\nax[1, 0].set_ylabel('$\\Sigma$')\nax[1, 0].set_xlim([-10, 10])\nax[1, 0].scatter(abc.ABC_dict[\"parameters\"][accept_indices, 0], abc.ABC_dict[\"parameters\"][accept_indices, 1], color = 
\"C6\", s = 1, alpha = 0.5)\nax[1, 0].scatter(abc.ABC_dict[\"parameters\"][reject_indices, 0], abc.ABC_dict[\"parameters\"][reject_indices, 1], color = \"C3\", s = 1, alpha = 0.01)\nax[1, 0].contour(parameter_grid[0, :, 0], parameter_grid[1, 0, :], analytic_posterior, colors = \"C2\")\nax[1, 0].contour(grid[0, :, 0], grid[1, 0, :], gaussian_approximation, colors = \"C1\")\nax[1, 0].axvline(abc.MLE[0, 0], linestyle = \"dashed\", color = \"black\", label = \"Maximum likelihood estimate of mean\")\nax[1, 0].axhline(abc.MLE[0, 1], linestyle = \"dotted\", color = \"black\", label = \"Maximum likelihood estimate of covariance\")\nax[1, 1].hist(abc.ABC_dict[\"parameters\"][accept_indices, 1], np.linspace(0, 10, 100), histtype = u'step', orientation=\"horizontal\", density = True, linewidth = 1.5, color = \"C6\", label = \"ABC posterior\");\nax[1, 1].plot(np.sum(analytic_posterior * (parameter_grid[1, 0, 1] - parameter_grid[1, 0, 0]), axis = 1), parameter_grid[1, 0, :], linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[1, 1].plot(np.sum(gaussian_approximation * (grid[1, 0, 1] - grid[1, 0, 0]), axis = 1), grid[1, 0, :], color = \"C1\", label = \"Gaussian approximation\")\nax[1, 1].axhline(abc.MLE[0, 1], linestyle = \"dotted\", color = \"black\", label = \"Maximum likelihood estimate of covariance\")\nax[1, 1].legend(frameon = False)\nax[1, 1].set_ylim([0, 10])\nax[1, 1].set_xlabel('$\\\\mathcal{P}(\\\\Sigma|{\\\\bf d})$')\nax[1, 1].set_xticks([])\nax[1, 1].set_yticks([])\nax[0, 1].axis(\"off\");\n```\n\n\n![png](images/output_85_0.png)\n\n\nWe now get samples from the posterior distribution, which is not too far from the analytic posterior and is at least unbiased. However, many samples are rejected to achieve this, and the acceptance threshold \u03f5 is chosen somewhat arbitrarily, making the method computationally heavy and the results uncertain. 
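The rejection step that `abc.ABC` performs can be sketched in standalone NumPy (a minimal illustration, not the `ABC` class API — `rejection_abc`, `sample_prior`, `simulate` and `compress` are hypothetical names): draw parameters from the prior, simulate and compress each draw, and accept those whose summaries lie within ϵ of the observed summary.

```python
import numpy as np

def rejection_abc(observed_summary, sample_prior, simulate, compress, draws, epsilon):
    # draw parameters from the prior and push each simulation through the compressor
    parameters = np.array([sample_prior() for _ in range(draws)])
    summaries = np.array([compress(simulate(theta)) for theta in parameters])
    # Euclidean distance between each simulated summary and the observed summary
    distances = np.sqrt(np.sum((summaries - observed_summary) ** 2, axis=1))
    # accept only the draws whose summaries land within epsilon of the observation
    return parameters[distances < epsilon], distances

# toy usage: the summary is the sample mean and the observed summary sits at 1
rng = np.random.RandomState(0)
accepted, distances = rejection_abc(
    observed_summary=np.array([1.]),
    sample_prior=lambda: rng.uniform(-10., 10., size=1),
    simulate=lambda theta: rng.normal(theta, 1., size=100),
    compress=lambda d: np.array([d.mean()]),
    draws=1000, epsilon=0.5)
```

In the tutorial the same three steps run through the trained network and the TensorFlow session, with the simulations optionally executed in parallel via `at_once = True`.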
We can improve on this using a PMC.\n\n## PMC-ABC\nPopulation Monte Carlo ABC is a way of reducing the number of draws by first sampling from the prior, accepting the closest 75% of the samples and weighting them to create a new proposal distribution. The furthest 25% of the original samples are then redrawn from this new proposal distribution. The furthest 25% of the simulation summaries are continually rejected and the proposal distribution updated until the number of draws needed to replace the furthest 25% of the samples becomes much greater than the number of samples being replaced. The ratio of accepted samples to total draws is called the criterion, and iteration stops once it falls below the target value.\n\nIf we want 1000 samples from the approximate distribution at the end of the PMC, we need to set `posterior = 1000`. The initial random draw (as in the ABC above) is set by `draws`; the larger this is, the better the proposal distribution will be on the first iteration.\n\nThe `PMC` can be continued by running again with a smaller criterion.\n\n\n```python\nabc.PMC(draws = 2000, posterior = 2000, criterion = 0.01, at_once = True, save_sims = None, MLE = True)\n```\n\n iteration = 28, current criterion = 0.008327330715775712, total draws = 982296, \u03f5 = 0.432746022939682..\n\nTo restart the PMC from scratch, one can run\n```python\nabc.PMC(draws = 1000, posterior = 1000, criterion = 0.01, at_once = True, save_sims = None, restart = True)\n```\n\n\n```python\nfig, ax = plt.subplots(2, 2, figsize = (14, 10))\nplt.subplots_adjust(hspace = 0, wspace = 0.2)\nax[0, 0].scatter(abc.PMC_dict[\"parameters\"][:, 0] , abc.PMC_dict[\"summaries\"][:, 0], s = 1, label = \"Accepted samples\", color = \"C4\", alpha = 0.5)\nax[0, 0].axhline(abc.summary[0, 0], color = 'black', linestyle = 'dashed', label = \"Summary of observed data\")\nax[0, 0].legend(frameon=False)\nax[0, 0].set_ylabel('First network output', labelpad = 0)\nax[0, 0].set_xlim([-10, 10])\nax[0, 0].set_xticks([])\nax[1, 0].scatter(abc.PMC_dict[\"parameters\"][:, 0], abc.PMC_dict[\"summaries\"][:, 1], s = 1, 
alpha = 0.5, label = \"Accepted samples\", color = \"C4\")\nax[1, 0].legend(frameon=False)\nax[1, 0].set_ylabel('Second network output', labelpad = 0)\nax[1, 0].set_xlim([-10, 10])\nax[1, 0].set_xlabel(\"$\\mu$\")\nax[0, 1].scatter(abc.PMC_dict[\"parameters\"][:, 1], abc.PMC_dict[\"summaries\"][:, 0], s = 1, alpha = 0.5, label = \"Accepted samples\", color = \"C4\")\nax[0, 1].axhline(abc.summary[0, 0], color = 'black', linestyle = 'dashed', label = \"Summary of observed data\")\nax[0, 1].legend(frameon=False)\nax[0, 1].set_ylabel('First network output', labelpad = 0)\nax[0, 1].set_xlim([0, 10])\nax[0, 1].set_xticks([])\nax[1, 1].scatter(abc.PMC_dict[\"parameters\"][:, 1], abc.PMC_dict[\"summaries\"][:, 1], s = 1, alpha = 0.5, label = \"Accepted samples\", color = \"C4\")\nax[1, 1].axhline(abc.summary[0, 1], color = 'black', linestyle = 'dashed', label = \"Summary of observed data\")\nax[1, 1].legend(frameon=False)\nax[1, 1].set_ylabel('Second network output', labelpad = 0)\nax[1, 1].set_xlim([0, 10])\nax[1, 1].set_xlabel(\"$\\Sigma$\");\n```\n\n\n![png](images/output_90_0.png)\n\n\n\n```python\nfig, ax = plt.subplots(2, 2, figsize = (16, 10))\nplt.subplots_adjust(wspace = 0, hspace = 0)\nax[0, 0].plot(parameter_grid[0, :, 0], np.sum(analytic_posterior * (parameter_grid[0, 1, 0] - parameter_grid[0, 0, 0]), axis = 0), linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[0, 0].plot(grid[0, :, 0], np.sum(gaussian_approximation * (grid[0, 1, 0] - grid[0, 0, 0]), axis = 0), color = \"C1\", label = \"Gaussian approximation\")\nax[0, 0].hist(abc.ABC_dict[\"parameters\"][accept_indices, 0], np.linspace(-10, 10, 100), histtype = u'step', density = True, linewidth = 1.5, color = \"C6\", alpha = 0.3, label = \"ABC posterior\");\nax[0, 0].hist(abc.PMC_dict[\"parameters\"][:, 0], np.linspace(-10, 10, 100), histtype = u'step', density = True, linewidth = 1.5, color = \"C4\", label = \"PMC posterior\");\nax[0, 0].axvline(abc.MLE[0, 0], linestyle = 
\"dashed\", color = \"black\", label = \"Maximum likelihood estimate of mean\")\nax[0, 0].legend(frameon = False)\nax[0, 0].set_xlim([-10, 10])\nax[0, 0].set_ylabel('$\\\\mathcal{P}(\\\\mu|{\\\\bf d})$')\nax[0, 0].set_yticks([])\nax[0, 0].set_xticks([])\nax[1, 0].set_xlabel('$\\mu$');\nax[1, 0].set_ylim([0, 10])\nax[1, 0].set_ylabel('$\\Sigma$')\nax[1, 0].set_xlim([-10, 10])\nax[1, 0].scatter(abc.ABC_dict[\"parameters\"][accept_indices, 0], abc.ABC_dict[\"parameters\"][accept_indices, 1], color = \"C6\", s = 1, alpha = 0.2)\nax[1, 0].scatter(abc.PMC_dict[\"parameters\"][:, 0], abc.PMC_dict[\"parameters\"][:, 1], color = \"C4\", s = 1, alpha = 0.7)\nax[1, 0].contour(parameter_grid[0, :, 0], parameter_grid[1, 0, :], analytic_posterior, colors = \"C2\")\nax[1, 0].contour(grid[0, :, 0], grid[1, 0, :], gaussian_approximation, colors = \"C1\")\nax[1, 0].axvline(abc.MLE[0, 0], linestyle = \"dashed\", color = \"black\", label = \"Maximum likelihood estimate of mean\")\nax[1, 0].axhline(abc.MLE[0, 1], linestyle = \"dotted\", color = \"black\", label = \"Maximum likelihood estimate of covariance\")\nax[1, 1].hist(abc.ABC_dict[\"parameters\"][accept_indices, 1], np.linspace(0, 10, 100), histtype = u'step', orientation=\"horizontal\", density = True, linewidth = 1.5, color = \"C6\", alpha = 0.3, label = \"ABC posterior\");\nax[1, 1].plot(np.sum(analytic_posterior * (parameter_grid[1, 0, 1] - parameter_grid[1, 0, 0]), axis = 1), parameter_grid[1, 0, :], linewidth = 1.5, color = 'C2', label = \"Analytic marginalised posterior\")\nax[1, 1].plot(np.sum(gaussian_approximation * (grid[1, 0, 1] - grid[1, 0, 0]), axis = 1), grid[1, 0, :], color = \"C1\", label = \"Gaussian approximation\")\nax[1, 1].hist(abc.PMC_dict[\"parameters\"][:, 1], np.linspace(0, 10, 100), histtype = u'step', orientation=\"horizontal\", density = True, linewidth = 1.5, color = \"C4\", label = \"PMC posterior\");\nax[1, 1].axhline(abc.MLE[0, 1], linestyle = \"dotted\", color = \"black\", label = \"Maximum 
likelihood estimate of covariance\")\nax[1, 1].legend(frameon = False)\nax[1, 1].set_ylim([0, 10])\nax[1, 1].set_xlabel('$\\\\mathcal{P}(\\\\Sigma|{\\\\bf d})$')\nax[1, 1].set_xticks([])\nax[1, 1].set_yticks([])\nax[0, 1].axis(\"off\");\n```\n\n\n![png](images/output_91_0.png)\n\n\nWe can see that the IMNN can recover great posteriors even when the data is extremely far from the fiducial parameter value at which the network was trained! Woohoo - give yourself a pat on the back!\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tomcharnock/information_maximiser.git", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "IMNN", "package_url": "https://pypi.org/project/IMNN/", "platform": "", "project_url": "https://pypi.org/project/IMNN/", "project_urls": { "Homepage": "https://github.com/tomcharnock/information_maximiser.git" }, "release_url": "https://pypi.org/project/IMNN/0.1rc1/", "requires_dist": [ "tensorflow (>=1.12.0)", "tqdm (>=4.29.0)", "numpy (>=1.16.0)", "scipy (>=1.2.0)" ], "requires_python": ">=3.6", "summary": "Using neural networks to extract sufficient statistics from data by maximising the Fisher information", "version": "0.1rc1" }, "last_serial": 4911106, "releases": { "0.1.dev0": [ { "comment_text": "", "digests": { "md5": "3e1977e6bac0f3329c48ca0f15ccf10b", "sha256": "2503e8aeb9bf4413fb4c8fbe7ce87c6ab86544afa1685ffcec3ae2128d63add0" }, "downloads": -1, "filename": "IMNN-0.1.dev0-py3-none-any.whl", "has_sig": false, "md5_digest": "3e1977e6bac0f3329c48ca0f15ccf10b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 35250, "upload_time": "2019-01-23T17:36:04", "url": "https://files.pythonhosted.org/packages/d6/4d/92e3b83a60d39127f8af8f5597448057de8060f39822d78a352001a306e8/IMNN-0.1.dev0-py3-none-any.whl" }, { "comment_text": "", "digests": { 
"md5": "be51168f81db7066ac4b0356b000ba74", "sha256": "e718b51b8f67365ea0069240091040807259d42fca158a3f073572d22461c573" }, "downloads": -1, "filename": "IMNN-0.1.dev0.tar.gz", "has_sig": false, "md5_digest": "be51168f81db7066ac4b0356b000ba74", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 49339, "upload_time": "2019-01-23T17:36:06", "url": "https://files.pythonhosted.org/packages/0d/ad/ea5a670be10cd97bf81eb4033bdd8ca963c4656f1f06322d4a4094dd119a/IMNN-0.1.dev0.tar.gz" } ], "0.1.dev2": [ { "comment_text": "", "digests": { "md5": "2fe58c400d0a3de14f7f16ed0f04f6ee", "sha256": "4eeb4ed8d347b9eedf72356be589101d9e857eaa4a57030fba528e0ba022a76a" }, "downloads": -1, "filename": "IMNN-0.1.dev2-py3-none-any.whl", "has_sig": false, "md5_digest": "2fe58c400d0a3de14f7f16ed0f04f6ee", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 34996, "upload_time": "2019-01-28T11:07:11", "url": "https://files.pythonhosted.org/packages/d4/27/d7e51b66acf3c7b8b3a2f8077f7e012d83cd9809281f4220cd794bb66131/IMNN-0.1.dev2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2c0008b8d90d369680cb1094e126fdfd", "sha256": "f962aa006ba1776a317ddae0b5fe836e5840590b5cfbce4ef89742e2b33988c0" }, "downloads": -1, "filename": "IMNN-0.1.dev2.tar.gz", "has_sig": false, "md5_digest": "2c0008b8d90d369680cb1094e126fdfd", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 48651, "upload_time": "2019-01-28T11:07:13", "url": "https://files.pythonhosted.org/packages/bf/13/bf5345e3b85e425735949638c8e75c2b13c0c746cb04543008aaebf21533/IMNN-0.1.dev2.tar.gz" } ], "0.1.dev3": [ { "comment_text": "", "digests": { "md5": "a51c6ead7513f57e0722cdf02e268e01", "sha256": "5a1dc1d06db2e7809904a3926b457cf9d7669a4e3bea40d551d4d1dadfc7d032" }, "downloads": -1, "filename": "IMNN-0.1.dev3-py3-none-any.whl", "has_sig": false, "md5_digest": "a51c6ead7513f57e0722cdf02e268e01", "packagetype": "bdist_wheel", 
"python_version": "py3", "requires_python": ">=3.6", "size": 35003, "upload_time": "2019-01-28T11:58:49", "url": "https://files.pythonhosted.org/packages/2f/97/ad16b748782bb56040583ed3c4e27a12c8241c4e8c2abbecaad0c5557c38/IMNN-0.1.dev3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d9aae80c21d9b6902d2d5e95060ae409", "sha256": "4499e43651c2585e76a9fc5eed5e1ee55b3c606eed3c59e7fb35a2d9f76795d9" }, "downloads": -1, "filename": "IMNN-0.1.dev3.tar.gz", "has_sig": false, "md5_digest": "d9aae80c21d9b6902d2d5e95060ae409", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 48653, "upload_time": "2019-01-28T11:58:51", "url": "https://files.pythonhosted.org/packages/2e/fe/6f6e666d7a444d755250bf45c035bc6c2e2959a36feb6218ea58c013e22e/IMNN-0.1.dev3.tar.gz" } ], "0.1.dev4": [ { "comment_text": "", "digests": { "md5": "76ecccc76a96f13e18e47b83729c4b7a", "sha256": "ff814a7096f97c3a5aa71d87492cc88c610a5707830e44a941954eba1e4f4b0d" }, "downloads": -1, "filename": "IMNN-0.1.dev4-py3-none-any.whl", "has_sig": false, "md5_digest": "76ecccc76a96f13e18e47b83729c4b7a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 35002, "upload_time": "2019-01-28T12:13:11", "url": "https://files.pythonhosted.org/packages/ab/36/c67ad160b987c50110e0960d48b688bad0445e76d90a8849ceb5c5d6bdcb/IMNN-0.1.dev4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "39f419fc52499f8ebe0e03ba1208f1e2", "sha256": "4b87c285b6a7bc54283d20fe8d3cfdcd8ab8c39274163c08cf8ddc1b546304fc" }, "downloads": -1, "filename": "IMNN-0.1.dev4.tar.gz", "has_sig": false, "md5_digest": "39f419fc52499f8ebe0e03ba1208f1e2", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 48653, "upload_time": "2019-01-28T12:13:13", "url": "https://files.pythonhosted.org/packages/16/41/50978445be50fae20ccaacccbb77e3188cf6cd9f39853b45f38e098ebdb8/IMNN-0.1.dev4.tar.gz" } ], "0.1.dev5": [ { "comment_text": "", "digests": 
{ "md5": "18296fe144d015772ca5576c1a9a3e3c", "sha256": "9957217601cd6d7c2818cd12d6735c957e9a892f45b8b10830baf27c129485ba" }, "downloads": -1, "filename": "IMNN-0.1.dev5-py3-none-any.whl", "has_sig": false, "md5_digest": "18296fe144d015772ca5576c1a9a3e3c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 35001, "upload_time": "2019-01-28T12:17:08", "url": "https://files.pythonhosted.org/packages/ef/96/6bba42cac447e1623a64d3a79d6f78c298e0784a18b9a7e3e1e2693bae9a/IMNN-0.1.dev5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3022e29661f800bf3312ef81ffce0cb1", "sha256": "6132855954e3d0ee4599c7c41076cb02f8bc061c41226e3630f0467a1f4374d3" }, "downloads": -1, "filename": "IMNN-0.1.dev5.tar.gz", "has_sig": false, "md5_digest": "3022e29661f800bf3312ef81ffce0cb1", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 48648, "upload_time": "2019-01-28T12:17:10", "url": "https://files.pythonhosted.org/packages/07/38/92cc32817d3452f2c9a3faccb85107509e02bd3e25a197aaf09f83431998/IMNN-0.1.dev5.tar.gz" } ], "0.1.dev6": [ { "comment_text": "", "digests": { "md5": "5e918e9e534ae7b7be2d08a536ccc9b8", "sha256": "f3b048958ac811e33de46d12317b0f2099da72670e5b50b6bc0fd5b883134444" }, "downloads": -1, "filename": "IMNN-0.1.dev6-py3-none-any.whl", "has_sig": false, "md5_digest": "5e918e9e534ae7b7be2d08a536ccc9b8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 34946, "upload_time": "2019-01-28T18:25:49", "url": "https://files.pythonhosted.org/packages/9f/65/01c081cb92464740725b698bf884210bb29936312f87c2a9bb19a4cbc07e/IMNN-0.1.dev6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d8a17a51a224a45f05ad4394eb65131c", "sha256": "76c8e412a9bfa0a79d4fc55690fa98aa7762759a1e89d72d4671d51ede8ece40" }, "downloads": -1, "filename": "IMNN-0.1.dev6.tar.gz", "has_sig": false, "md5_digest": "d8a17a51a224a45f05ad4394eb65131c", "packagetype": "sdist", 
"python_version": "source", "requires_python": ">=3.6", "size": 48604, "upload_time": "2019-01-28T18:25:53", "url": "https://files.pythonhosted.org/packages/45/d1/97929f102dedc740ae0f985ee5cd40b4b8dc4cbe2d08040468429d01901c/IMNN-0.1.dev6.tar.gz" } ], "0.1.dev8": [ { "comment_text": "", "digests": { "md5": "b7ce6e5aa451b9219500449934154656", "sha256": "4da6851163337e0a355e399ed7482f571c3ee6b864e95113e959b4e785cad79f" }, "downloads": -1, "filename": "IMNN-0.1.dev8-py3-none-any.whl", "has_sig": false, "md5_digest": "b7ce6e5aa451b9219500449934154656", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 38707, "upload_time": "2019-03-07T15:25:54", "url": "https://files.pythonhosted.org/packages/66/7a/9974dd11345297ccf8848bcb0fe23b87dba1c779da41cfa44fe785663e6c/IMNN-0.1.dev8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e2b45deca9f2265d0ed5d54f9923fece", "sha256": "ee7add025a03cdaf48308c3f5d341771d4c751729970a283034998a073b887a5" }, "downloads": -1, "filename": "IMNN-0.1.dev8.tar.gz", "has_sig": false, "md5_digest": "e2b45deca9f2265d0ed5d54f9923fece", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 54410, "upload_time": "2019-03-07T15:25:56", "url": "https://files.pythonhosted.org/packages/63/c4/6a1254c4766211f4225b7999102c835fb79813594f6e1629dee963fabfc8/IMNN-0.1.dev8.tar.gz" } ], "0.1rc1": [ { "comment_text": "", "digests": { "md5": "6fbf225f0f3fb80b5004503f9242a754", "sha256": "4c2271a2d7d7d4c563bc15a98bafd3221486ceaf50ea28b25a1093498d60315d" }, "downloads": -1, "filename": "IMNN-0.1rc1-py3-none-any.whl", "has_sig": false, "md5_digest": "6fbf225f0f3fb80b5004503f9242a754", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 37879, "upload_time": "2019-02-01T18:11:40", "url": "https://files.pythonhosted.org/packages/6e/45/fcef3827bbec6acf00af9c2e1dd77223ba03c0f6d7e673a83e7ec3162a07/IMNN-0.1rc1-py3-none-any.whl" }, { "comment_text": "", 
"digests": { "md5": "9bc2f2f3d09085b862335c9d6d388cca", "sha256": "8d810c18e7f4f5104f0968ace045712253b3b1abfdfe35c3858d9b67667bde63" }, "downloads": -1, "filename": "IMNN-0.1rc1.tar.gz", "has_sig": false, "md5_digest": "9bc2f2f3d09085b862335c9d6d388cca", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 54298, "upload_time": "2019-02-01T18:11:43", "url": "https://files.pythonhosted.org/packages/e7/71/31bb47c26cbf7bef9d91a8a0c2a03f687ece2f422334070beeaff0eb21de/IMNN-0.1rc1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6fbf225f0f3fb80b5004503f9242a754", "sha256": "4c2271a2d7d7d4c563bc15a98bafd3221486ceaf50ea28b25a1093498d60315d" }, "downloads": -1, "filename": "IMNN-0.1rc1-py3-none-any.whl", "has_sig": false, "md5_digest": "6fbf225f0f3fb80b5004503f9242a754", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 37879, "upload_time": "2019-02-01T18:11:40", "url": "https://files.pythonhosted.org/packages/6e/45/fcef3827bbec6acf00af9c2e1dd77223ba03c0f6d7e673a83e7ec3162a07/IMNN-0.1rc1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9bc2f2f3d09085b862335c9d6d388cca", "sha256": "8d810c18e7f4f5104f0968ace045712253b3b1abfdfe35c3858d9b67667bde63" }, "downloads": -1, "filename": "IMNN-0.1rc1.tar.gz", "has_sig": false, "md5_digest": "9bc2f2f3d09085b862335c9d6d388cca", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 54298, "upload_time": "2019-02-01T18:11:43", "url": "https://files.pythonhosted.org/packages/e7/71/31bb47c26cbf7bef9d91a8a0c2a03f687ece2f422334070beeaff0eb21de/IMNN-0.1rc1.tar.gz" } ] }