A Two-Parameter Family of Non-Parametric, Deformed Exponential Manifolds
https://doi.org/10.1007/s41884-022-00079-5
RESEARCH PAPER
Nigel J. Newton (nigeljnewton@gmail.com)
Abstract
We construct a new family of non-parametric statistical manifolds by means of a two-
parameter class of deformed exponential functions, that includes functions with power-
law, linear and sublinear rates of growth. The manifolds are modelled on weighted,
mixed-norm Sobolev spaces that are especially suited to this purpose, in the sense that
an important class of nonlinear superposition operators (those used in the construction
of divergences and tensors) act continuously on them. We analyse variants of these
operators, that map into “subordinate” Sobolev spaces, and evaluate the associated
gain in regularity. With appropriate choice of parameter values, the manifolds support
a large variety of the statistical divergences and entropies appearing in the literature,
as well as their associated tensors, e.g. the Fisher-Rao metric. Manifolds of finite
measures and probability measures are constructed; the latter are shown to be smoothly
embedded submanifolds of the former.
1 Introduction
Great progress has been made during the last five decades on the theory of information
geometry, and its application in many scientific fields. The fundamental parametric
theory is well developed, and is treated pedagogically in a number of texts (See, for
example, [1, 3, 5, 9, 12]). The non-parametric theory, on the other hand, is largely
to be found in a series of research papers. A notable exception is the text [2], which
treats parametric and non-parametric theories in a unified way. The step from the
parametric to the non-parametric setting is not an easy one, since it introduces the
infinite-dimensional spaces of Functional Analysis.
The parametric exponential model is arguably the nucleus of the subject. Its exten-
sion to the non-parametric setting was accomplished by G. Pistone and his co-workers
in the fundamental series of papers [4, 7, 18, 19]. The manifolds there constructed are
“maximally inclusive” in a precise sense, and various statistical divergences, including
the Amari α-divergences, for α in the interval [−1, 1], are smooth on them. As with
parametric exponential manifolds, the log of the density is used as a chart. This requires
a model space with a particularly strong topology: the exponential Orlicz space, which
has a number of disadvantages. Since its publication, several variations of the expo-
nential Orlicz manifold have been developed. In [11], the exponential function was
replaced by the Tsallis q-deformed exponential, which has an important interpretation
in statistical mechanics. (See [20], and Chapter 7 in [13].) The model space used is
$L^\infty$, which significantly restricts membership of the manifold constructed. A large
class of deformed exponential functions (the “ϕ-functions”) was used in [21, 22] to
construct inclusive manifolds of probability measures, in which the model spaces are
Musielak-Orlicz spaces.
The constructions in these references begin with the tangent space at a generic
point, P, of a set of measures. A representation of tangent vectors, derived from the
(deformed) logarithm, is then used to construct a local chart, which naturally maps
to a model space defined in terms of P. However, the model spaces required in this
approach can be difficult to use in practice. A different approach was taken in [14, 15],
where a global chart was used with the specific deformed logarithm $\log_d y = y - 1 + \log y$
to construct an inclusive manifold modelled on Lebesgue $L^p$ spaces (including the
Hilbert space $L^2$). The corresponding deformed exponential has bounded derivatives
of all orders; a property that has a number of advantages. Both the probability density
and its (non-deformed) log (objects of central importance to information geometry)
belong to the model space and, considered as superposition operators mapping into
this space, are continuous.
The sample space in all these manifolds is an abstract probability space: a set
of “outcomes”, a class of measurable subsets of these outcomes, and a probability
measure attaching a number between 0 and 1 to each subset. This has the advantage
of generality: the sample space could be $\mathbb{R}^d$, or the path space of a stochastic process,
and so on. However, topologies, metrics and linear structures on the sample space play
important roles in most applications, including the theory of partial differential equa-
tions. A natural direction for research in the non-parametric theory is to specialise
the manifolds outlined above to such problems by incorporating the topology of the
sample space in the manifolds. One way of achieving this is to use model spaces of
Sobolev type. This was carried out in the context of the exponential Orlicz manifold
in [10], where it was applied to the spatially homogeneous Boltzmann equation. It
was carried out in the context of the $L^p$ manifolds in [17]. The resulting fusion of
information and sample space topologies is mutually beneficial. For example, it was
shown in [17] that log-Sobolev embedding strengthens the topology of the raw $L^p$
manifolds in a useful way.
This paper takes the approach of [14, 15, 17] further, by constructing a new class of
non-parametric manifolds based on a two-parameter deformed exponential, dubbed
the η-exponential. The paper has two primary aims: (i) to provide manifolds on which
a wider class of divergences and entropy functions can be accommodated; (ii) to refine
the Sobolev space methods of [17] in order to increase the degree of smoothness they
confer on these quantities. Regarding the first aim, there is a vast literature on the
importance of different divergences and entropies to particular branches of science,
a full review of which is beyond the scope of this article. Let us mention, however,
the special volume (13) of Entropy on applications of the Tsallis q-divergences and
entropies. The review article by Tsallis [20], in particular, cites many applications in
which the q parameter should be strictly greater than or strictly less than the value
q = 1 of the Boltzmann-Gibbs theory. In the context of Amari’s α-divergences, this
translates to values of α both greater than and less than 1. The author was motivated,
in particular, by the study of multi-objective measures of error in nonlinear filtering
that lead naturally to divergences with α < −1, [16].
The two-parameter η-exponential we use corresponds to a deformed exponential
introduced in [8], but is reparametrised into the Amari setting. The manifolds accom-
modate α-divergences and entropies over a range of α values, but are especially suited
to those for which α ∈ [η− , η+ ], for the chosen parameters −∞ < η− < 1 ≤ η+ <
∞. The parameter values ±1 yield the linear-growth deformed exponential of [14,
15, 17]; values of η− other than −1 yield deformed exponentials with power law or
sublinear growth. For a more detailed account of deformed exponentials, and their use
in Statistical Mechanics, the reader is referred to [13].
The paper is structured as follows. Section 2 introduces the model Sobolev spaces,
expanding considerably on the material in [17]. It also introduces a new class of “subor-
dinate” Sobolev spaces, which are later used in the analysis of superposition operators
derived from Amari’s α-embedding maps. (The latter can be used in the analysis of
divergences and entropies.) Section 3 introduces the two-parameter deformed expo-
nential and uses it to construct manifolds of finite measures. Section 4 then shows that
the subsets of probability measures are smoothly embedded submanifolds of those of
Sect. 3.
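such that $\lim_{z \downarrow 0} \theta(z) < \infty$, $-\sqrt{\theta}$ is convex and, for some $t \in (1, 2]$,
$$\theta(z) = \begin{cases} 0 & \text{if } z = 0 \\ c + z^t & \text{if } z \ge z_0 \end{cases}, \quad \text{where } z_0 \ge 0, \text{ and } c \in \mathbb{R}. \tag{2}$$
where $C \in \mathbb{R}$ is such that $\mu(\mathbb{R}^d) = 1$. (Some examples are given in [17] including
the Gaussian case, in which $c = z_0 = 0$ and $t = 2$.)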
The model spaces used in the construction of the manifolds comprise measurable
functions defined on $\mathbb{R}^d$ having weak derivatives of various orders, that belong to
the Lebesgue spaces $L^\lambda = L^\lambda(\mu)$ for various exponents $\lambda$. ($f \in L^\lambda$ if and only if
$\mathbb{E}_\mu |f|^\lambda := \int |f(x)|^\lambda \mu(dx) < \infty$.) Under (E), $\mu$ is a product measure and the model
spaces admit a log-Sobolev embedding result.
Let $C^\infty(\mathbb{R}^d; \mathbb{R})$ be the space of continuous functions with continuous partial
derivatives of all orders, and let $C_0^\infty(\mathbb{R}^d; \mathbb{R})$ be the subspace of those functions having
compact support. For any $\lambda \in [2, \infty)$, and any $0 \le k \le \lambda$, the space $W^{k,\lambda}$ is the
mixed-norm Sobolev space comprising measurable functions $a \in L^\lambda$ that have weak
partial derivatives up to order $k$, those of order $i$ belonging to the Lebesgue space
$L^{\lambda/i}$. We shall also use the “subordinate” spaces $W^{k,\lambda;l}$, for certain integer values of
$l$. Let $\lambda^\circ$ be the following Lebesgue exponent: if (E) holds and $k \ge 1$ then $\lambda^\circ = \lambda$,
otherwise $\lambda^\circ = \lambda - \varepsilon$ for some $0 < \varepsilon \ll 1$. For $1 \le l \le \lambda^\circ$, the space $W^{k,\lambda;l}$
comprises measurable functions $a \in L^{\lambda^\circ/l}$ that have weak partial derivatives up to order
$k_l := \min\{k, \lambda^\circ - l\}$, those of order $i$ belonging to the Lebesgue space $L^{\lambda^\circ/(l+i)}$.
(For convenience $W^{k,\lambda;0} := W^{k,\lambda}$.)
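Model spaces with more general derivative structures were developed in [17],
including fixed-norm spaces; however, the Lebesgue exponents in $W^{k,\lambda}$ and its
subordinates are especially suited to the deformed logarithms used here. Weak derivatives
are defined in the usual way: for any $\varphi \in C_0^\infty(\mathbb{R}^d; \mathbb{R})$,
$$\int (\partial_i a)\, \varphi \, dx = -\int a \, (\partial_i \varphi) \, dx, \quad \text{where } \partial_i a \text{ is shorthand for } \frac{\partial a}{\partial x_i}. \tag{4}$$
$$D^s a = \partial_1^{s_1} \cdots \partial_d^{s_d} a$$
$$\|a\|^\lambda_{W^{k,\lambda}} = \|a\|^\lambda_{L^\lambda} + \sum_{s \in S_k} \|D^s a\|^\lambda_{L^{\lambda/|s|}}$$
$$\|a\|^\lambda_{W^{k,\lambda;l}} = \|a\|^\lambda_{L^{\lambda^\circ/l}} + \sum_{s \in S_{k_l}} \|D^s a\|^\lambda_{L^{\lambda^\circ/(l+|s|)}}, \quad 1 \le l \le \lambda^\circ. \tag{5}$$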
Theorem 1 (i) For any $1 \le l \le \lambda^\circ$, $W^{k,\lambda;l}$ and $W^{k,\lambda}$ are Banach spaces with
respect to the norms in (5);
(ii) For any $1 \le l \le \lambda^\circ$, $C_0^\infty(\mathbb{R}^d; \mathbb{R})$ is dense in $W^{k,\lambda;l}$ and $W^{k,\lambda}$.
Proof Both parts are proved in Theorem 1 and Lemmas 1 and 2 in [17]. (The only
property required of the log density, $l_r$, is its continuity.) Part (ii) is a consequence of
the non-increasing nature of the Lebesgue exponents in $W^{k,\lambda}$ and $W^{k,\lambda;l}$.
$$W^{k,\lambda;0} := W^{k,\lambda} \prec W^{k,\lambda;l} \prec W^{k,\lambda;\tilde{l}}, \quad \text{where } 1 \le l < \tilde{l} \le \lambda^\circ. \tag{6}$$
The spaces $W^{k,\lambda}$ will be used as model spaces for manifolds of finite measures in Sect.
3, and centred versions of them, as model spaces for manifolds of probability measures
in Sect. 4. The following theorem derives some properties of particular types of map
acting on them. It will be used in the sequel.
Theorem 2 (i) For any $\psi \in C^\infty(\mathbb{R}; \mathbb{R})$ having bounded derivatives of all orders,
the nonlinear superposition operator $\Psi : W^{k,\lambda} \to W^{k,\lambda}$, defined by $\Psi(a)(x) =
\psi(a(x))$, is continuous. Its spatial derivatives are given by the Faà di Bruno
formula
$$D^s \Psi(a) = F_s(a) := \sum_{\pi \in \Pi(s)} \psi^{(|\pi|)}(a) \prod_{\sigma \in \pi} D^\sigma a, \tag{7}$$
where $\pi = \{\sigma_1, \ldots, \sigma_{|\pi|} \in S_{|s|} ;\ 1 \le |\sigma_j| \le |s|,\ \sum_j \sigma_j = s\}$ is a partition of $s$,
$|\pi|$ is the cardinal of $\pi$, and $\Pi(s)$ is the set of all such partitions.
(ii) For appropriate Banach spaces of functions on $\mathbb{R}^d$, $A$, $B$ and $C$, let $\Pi_{A,B} :
A \times B \to C$ be defined by $\Pi_{A,B}(a, b)(x) = a(x)b(x)$. $\Pi_{A,B}$ is a well defined,
continuous, bilinear map in the following instances:
$$\Psi_l^{(i)}(u_1, \ldots, u_i)(x) = \psi^{(i)}(a(x))\, u_1(x) \cdots u_i(x), \quad \text{for } 1 \le i \le l. \tag{10}$$
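Lemma 1 Let $a \in W^{k,\lambda}$, let $(a_n \ne a)$ and $(b_n)$ be sequences converging to $a$ in the
sense of $W^{k,\lambda}$, and let $B$ be the unit ball of $W^{k,\lambda}$. For any continuous, bounded function
$f : \mathbb{R} \to \mathbb{R}$,
$$\|a_n - a\|^{-1}_{W^{k,\lambda}} \, \|(f(b_n) - f(a))(a_n - a)\|_{L^{\lambda^\circ}} \to 0 \tag{11}$$
$$\text{and} \quad \sup_{u \in B} \|(f(a_n) - f(a))u\|_{L^{\lambda^\circ}} \to 0. \tag{12}$$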
$$\|(f(b_n) - f(a))(a_n - a)\|_{L^{\lambda^\circ}} \le \|f(b_n) - f(a)\|_A \, \big\| |a_n - a|^\lambda \big\|_E^{1/\lambda}, \tag{13}$$
where $A$ and $E$ are the following Banach spaces. If (E) does not hold or $k = 0$, then
$A = L^{\lambda\lambda^\circ/\varepsilon}$ and $E = L^1$; (13) is then the classical Hölder inequality on dual Lebesgue
spaces. If (E) holds and $k \ge 1$ then $A = \exp L^{1/\beta}(\mu)$ and $E = L^1 \log^\beta L(\mu)$, where
$\beta = (t - 1)/t$; these are Orlicz spaces based on the complementary Young functions:
$$G_\beta(z) = \int_0^z \big(\exp(y^{1/\beta}) - 1\big)\, dy \quad \text{and} \quad F_\beta(z) = \int_0^z \log^\beta(y + 1)\, dy. \tag{14}$$
It follows from a log-Sobolev embedding theorem (see, for example, Theorem 7.12
in [6]), and the following representation for first-order weak derivatives
that $\big\| |a_n - a|^\lambda \big\|_E \le K \|a_n - a\|^\lambda_{W^{k,\lambda}}$, for some $K < \infty$. (See the proof of Lemma
4 in [17] for fuller details.) Now $f(b_n) - f(a)$ is bounded and converges to zero in
probability, and so it converges to zero in the sense of $A$ (with either definition of $A$),
and (11) follows. A similar argument establishes (12).
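The exponent bookkeeping behind the classical Hölder case of (13) can be checked with a short computation. The following is a sketch only: it writes $\varepsilon := \lambda - \lambda^\circ$ for the gap between the two exponents, and uses $g$ and $h$ as illustrative shorthand (not notation from the paper) for $f(b_n) - f(a)$ and $a_n - a$.

```latex
\begin{align*}
\|gh\|_{L^{\lambda^\circ}}^{\lambda^\circ}
  &= \mathbb{E}_\mu\, |g|^{\lambda^\circ} |h|^{\lambda^\circ}
   \le \big(\mathbb{E}_\mu |g|^{\lambda^\circ r}\big)^{1/r}
       \big(\mathbb{E}_\mu |h|^{\lambda^\circ r'}\big)^{1/r'},
   \qquad \tfrac{1}{r} + \tfrac{1}{r'} = 1, \\
r' = \frac{\lambda}{\lambda^\circ}, \quad
r  &= \frac{\lambda}{\lambda - \lambda^\circ} = \frac{\lambda}{\varepsilon}
   \;\Longrightarrow\;
   \|gh\|_{L^{\lambda^\circ}}
   \le \|g\|_{L^{\lambda\lambda^\circ/\varepsilon}}\,
       \big\| |h|^{\lambda} \big\|_{L^1}^{1/\lambda}.
\end{align*}
```

With this choice of conjugate exponents the first factor is the norm of $L^{\lambda\lambda^\circ/\varepsilon}$ and the second is the $1/\lambda$ power of an $L^1$ norm, which matches the spaces $A$ and $E$ named for the case in which (E) does not hold or $k = 0$.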
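in the sense of $L^{\lambda^\circ/(l+\tilde{l})}$. Furthermore, for any $s \in S_{k_{l+\tilde{l}}}$,
$$H_s(f_n, g_n) - H_s(a, g_n) = \sum_{\sigma \le s} \frac{s!}{\sigma!\,(s - \sigma)!} (D^\sigma f_n - D^\sigma a)\, D^{s-\sigma} g_n,$$
$$\|H_s(f_n, g_n) - H_s(a, g_n)\|_{L^{\lambda^\circ/(|s|+l+\tilde{l})}} \to 0.$$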
$$\|a_n - a\|^{-1}_{W^{k,\lambda}} R_i \to 0 \quad \text{for } i = 1, 2, 3.$$
We can now apply the Leibniz and Faà di Bruno formulae to $D^s(\psi^{(1)}(a)u)$ to show
that it is equal to $(D^s \Psi_1)^{(1)} u$, and is continuous in $(a, u)$. This proves (10) for the
case $l = 1$.
We now proceed by induction on $l$. Suppose that (10) is correct for $l$; then, since
$W^{k,\lambda;l} \prec W^{k,\lambda;l+1}$, $\Psi_{l+1}$ is of class $C^l$, with derivatives as in (10). Setting $\Gamma_{l,n} =
\psi^{(l)}(a_n) - \psi^{(l)}(a) - \psi^{(l+1)}(a)(a_n - a)$, we can apply the arguments used above on
$\Gamma_n$ of (15), and the fact that
where $B$ is the unit ball of $W^{k,\lambda}$, to show that $\Psi_{l+1}^{(l)}$ is of class $C^1$.
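The charts of the statistical manifolds developed here are based on a two-parameter
family of η-deformed logarithms. These are defined in terms of Amari’s α-logarithms,
$\Lambda_\alpha : (0, \infty) \to \mathbb{R}$:
$$\Lambda_\alpha(y) = \begin{cases} \dfrac{2}{1-\alpha}\left(y^{(1-\alpha)/2} - 1\right) & \text{if } \alpha \ne 1 \\ \log y & \text{if } \alpha = 1 \end{cases} \tag{16}$$
The η-logarithm is defined for $\eta = (\eta_-, \eta_+)$, ($-\infty < \eta_- < 1 \le \eta_+ < \infty$) as: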
The deformed logarithm $\log_{(-1,+1)}$ is that used to construct a family of highly inclusive
statistical manifolds in [14, 15, 17]. Setting $\kappa = (\eta_+ - \eta_-)/4$ and $r = (2 - \eta_- - \eta_+)/4$,
$\log_\eta$ is essentially the two-parameter $(\kappa, r)$-logarithm defined in [8]. The different
weightings for the two components of the deformed logarithm used here have no
effect on the membership or properties of the manifolds, but are more convenient in
the context of information geometry.
Now $\inf_y \log_\eta y = -\infty$, $\sup_y \log_\eta y = +\infty$, and $\log_\eta \in C^\infty((0, \infty); \mathbb{R})$ with
strictly positive first derivative $y^{-(1+\eta_-)/2} + y^{-(1+\eta_+)/2}$ and so, according to the inverse
function theorem, $\log_\eta$ is a diffeomorphism from $(0, \infty)$ onto $\mathbb{R}$. Let $\exp_\eta$ be its inverse.
This can be thought of as a deformed exponential function. Using $f^{(n)}$ to denote the
$n$-th derivative of a function $f$, we have
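$$\exp_\eta^{(1)} = \frac{1}{1 + \exp_\eta^{\delta/2}} \exp_\eta^{[1+\eta_+]/2} = \frac{\exp_\eta^{\delta/2}}{1 + \exp_\eta^{\delta/2}} \exp_\eta^{[1+\eta_-]/2}, \tag{18}$$
where $\delta := \eta_+ - \eta_-$ (the value for which the two expressions agree).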
If η− = −1 (as is the case in [14, 15, 17]) then the exponent is 1, and expη has linear
growth; otherwise it has sublinear or power law growth. The exponent itself grows
without limit as η− approaches 1 from below.
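The inversion of the η-logarithm and the derivative formula (18) can be explored numerically. The sketch below makes two assumptions that should be read as such: it takes the η-logarithm to be the sum of the α-logarithms of (16) at α = η− and α = η+ (this is consistent with the first derivative quoted above and with the case (−1, +1) being y − 1 + log y), and it takes δ = η+ − η−. The function names and the bisection solver are illustrative choices, not part of the paper.

```python
import math


def alpha_log(alpha, y):
    """Amari alpha-logarithm of (16), for y > 0."""
    if alpha == 1.0:
        return math.log(y)
    return 2.0 / (1.0 - alpha) * (y ** ((1.0 - alpha) / 2.0) - 1.0)


def log_eta(eta, y):
    """ASSUMED eta-logarithm: sum of the alpha-logarithms at eta_- and eta_+."""
    eta_minus, eta_plus = eta
    return alpha_log(eta_minus, y) + alpha_log(eta_plus, y)


def exp_eta(eta, z, rel_tol=1e-13):
    """Inverse of log_eta, by bisection (log_eta is strictly increasing)."""
    lo = hi = 1.0  # log_eta(eta, 1) = 0, so 1 is a convenient starting point
    while log_eta(eta, lo) > z:
        lo *= 0.5
    while log_eta(eta, hi) < z:
        hi *= 2.0
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        if log_eta(eta, mid) < z:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    return 0.5 * (lo + hi)


def exp_eta_prime(eta, z):
    """First derivative of exp_eta via (18), ASSUMING delta = eta_+ - eta_-."""
    eta_minus, eta_plus = eta
    delta = eta_plus - eta_minus
    y = exp_eta(eta, z)
    return y ** ((1.0 + eta_plus) / 2.0) / (1.0 + y ** (delta / 2.0))


if __name__ == "__main__":
    eta = (-0.5, 1.5)
    z, h = 2.0, 1e-5
    finite_diff = (exp_eta(eta, z + h) - exp_eta(eta, z - h)) / (2.0 * h)
    # The two printed numbers should agree to several digits.
    print(exp_eta_prime(eta, z), finite_diff)
```

Bisection is used only because it needs nothing beyond monotonicity; a Newton iteration based on the first derivative of the η-logarithm would converge faster.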
For any α ∈ R, the Amari embedding map ξα : R → R is as follows:
These maps can be used in the analysis of a large class of divergences, and their
associated tensors. The maps ξ−1 and ξ+1 are especially important since they will
represent the density of a measure and its log, respectively. The following lemma
establishes some of their properties in the context of the η-log.
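The roles of $\xi_{-1}$ and $\xi_{+1}$ can be made explicit under one assumption: that the embedding map is the α-logarithm of (16) composed with $\exp_\eta$. This reading is inferred from the statement that the two maps represent the density and its log; the symbol $\Lambda_\alpha$ for the α-logarithm is likewise an assumption of this sketch.

```latex
\begin{align*}
\xi_\alpha(z) &= \Lambda_\alpha\big(\exp_\eta(z)\big), \qquad
\Lambda_\alpha(y) = \tfrac{2}{1-\alpha}\big(y^{(1-\alpha)/2} - 1\big)\ (\alpha \ne 1), \quad
\Lambda_1(y) = \log y, \\
\xi_{-1}(z) &= \exp_\eta(z) - 1, \qquad
\xi_{+1}(z) = \log \exp_\eta(z), \\
\xi_\alpha^{(1)}(z) &= \exp_\eta(z)^{-(1+\alpha)/2}\, \exp_\eta^{(1)}(z).
\end{align*}
```

Under this reading, $\xi_{-1}(a) + 1$ is the density of the measure with chart value $a$ and $\xi_{+1}(a)$ is its log.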
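Lemma 2 (i) For any $\alpha \in \mathbb{R}$, $\xi_\alpha \in C^\infty(\mathbb{R}; \mathbb{R})$; its derivatives are
$$\xi_\alpha^{(i)} = \frac{f_{\alpha,i}(\exp_\eta)}{\big(1 + \exp_\eta^{\delta/2}\big)^{(2i-1)}} \quad \text{for } 1 \le i < \infty, \tag{22}$$
$$\begin{aligned} f_{\alpha,i+1}(y) &= (y^{\delta/2} + y^{\delta})\, y^{(\eta_- + 1)/2} f^{(1)}_{\alpha,i}(y) - (i - 1/2)\,\delta\, y^{\delta} y^{(\eta_- - 1)/2} f_{\alpha,i}(y) \\ &= (1 + y^{\delta/2})\, y^{(\eta_+ + 1)/2} f^{(1)}_{\alpha,i}(y) - (i - 1/2)\,\delta\, y^{(2\eta_+ - \eta_- - 1)/2} f_{\alpha,i}(y). \end{aligned} \tag{23}$$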
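$$\limsup_{z \to \infty} z^{-\beta_i} |\xi_\alpha^{(i)}(z)| < \infty, \quad \text{where } \beta_i := \frac{1-\alpha}{1-\eta_-} - i. \tag{24}$$
(iv) If $\alpha \in [\eta_-, \eta_+]$, then $\xi_\alpha^{(i)}$ is bounded for all $1 \le i < \infty$.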
Proof Part (i) is straightforward.
The power of $y$ in $f_{\alpha,1}(y)$ is $(\eta_+ - \alpha)/2 = \delta/2 + (\eta_- - \alpha)/2$, and so $\xi_\alpha^{(1)}(z)$ grows
as $\exp_\eta(z)^{(\eta_- - \alpha)/2}$ for large $z$. It now follows from (20) that (24) is correct for $i = 1$.
That it is also correct for $i \ge 2$ follows from an induction argument based on the first
representation of $f_{\alpha,i+1}$ in (23).
If $\alpha \le \eta_+$ then the power of $y$ in $f_{\alpha,1}(y)$ is greater than or equal to 0, and so
(25) is correct for $i = 1$. That it is also correct for $i \ge 2$ follows from an induction
argument based on the second representation of $f_{\alpha,i+1}$ in (23). Part (iv) is an immediate
consequence of parts (ii) and (iii).
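The recursion (23) behind part (i) can be recovered in one step from (22) and (18). The sketch below writes $y = \exp_\eta(z)$ and assumes $\delta = \eta_+ - \eta_-$, the value for which the two forms of (18) agree.

```latex
\begin{align*}
\frac{dy}{dz} &= \frac{y^{(1+\eta_+)/2}}{1 + y^{\delta/2}}
  \qquad \text{(from (18))}, \\
\xi_\alpha^{(i+1)}
  &= \frac{dy}{dz}\, \frac{d}{dy}
     \left[ \frac{f_{\alpha,i}(y)}{(1 + y^{\delta/2})^{2i-1}} \right] \\
  &= \frac{y^{(1+\eta_+)/2}}{1 + y^{\delta/2}}
     \left[ \frac{f^{(1)}_{\alpha,i}(y)}{(1 + y^{\delta/2})^{2i-1}}
          - \frac{(2i-1)\,\tfrac{\delta}{2}\, y^{\delta/2 - 1} f_{\alpha,i}(y)}
                 {(1 + y^{\delta/2})^{2i}} \right] \\
  &= \frac{(1 + y^{\delta/2})\, y^{(\eta_+ + 1)/2} f^{(1)}_{\alpha,i}(y)
          - (i - \tfrac12)\,\delta\, y^{(2\eta_+ - \eta_- - 1)/2} f_{\alpha,i}(y)}
          {(1 + y^{\delta/2})^{2(i+1)-1}}.
\end{align*}
```

The last line uses $\delta/2 - 1 + (1 + \eta_+)/2 = (2\eta_+ - \eta_- - 1)/2$. Its numerator is the second representation of $f_{\alpha,i+1}$ in (23); the first representation follows by factoring $y^{\delta/2}$ out of $y^{(\eta_+ + 1)/2}$.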
Let $\theta_0 := (1 - \eta_-)\lambda/2$. We assume that $\theta_0 > 1$; it then follows from (20) and (24)
that, for any $0 \le i < \lambda/\theta_0$ and any $a \in L^\lambda$,
$$\exp_\eta^{(i)}(a) \in L^{\theta_0 \lambda/(\lambda - i\theta_0)}. \tag{26}$$
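The exponent in (26) can be traced to the growth rate (24). The following sketch takes $\alpha = -1$ in (24), using the relation $\exp_\eta = \xi_{-1} + 1$, and assumes in addition that the derivatives of $\exp_\eta$ remain bounded on the negative half-line, so that only the growth as $z \to \infty$ matters. The constant $K$ is generic.

```latex
\begin{align*}
\beta_i &= \frac{1-(-1)}{1-\eta_-} - i = \frac{2}{1-\eta_-} - i
         = \frac{\lambda}{\theta_0} - i
         = \frac{\lambda - i\theta_0}{\theta_0}, \\
|\exp_\eta^{(i)}(a)| &\le K\big(1 + |a|^{\beta_i}\big), \qquad
a \in L^{\lambda}
\;\Longrightarrow\;
|a|^{\beta_i} \in L^{\lambda/\beta_i}
 = L^{\theta_0\lambda/(\lambda - i\theta_0)}.
\end{align*}
```

The restriction $i < \lambda/\theta_0$ is what keeps $\beta_i$ positive, so that the Lebesgue exponent $\lambda/\beta_i$ is well defined.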
Proof It follows from (M2) that, for any $P \in M$, $\phi(P) \in G$. Suppose, conversely,
that $a \in G$; since $\exp_\eta(a) \in L^1$, we can define the finite measure $P(dx) =
\exp_\eta(a(x))\mu(dx)$. Since $\exp_\eta$ is strictly positive, $P$ satisfies (M1). That it also
satisfies (M2) follows from the fact that $\log_\eta \exp_\eta(a) = a \in G$. We have thus shown that
$P \in M$, and clearly $\phi(P) = a$.
Remark 1 This proposition shows that M is, in one sense, nothing more than a (whole)
Banach space. Manifold theory enters the picture with the introduction of base-point
dependent tensors such as the Fisher-Rao metric and Amari-Chentsov tensor on the
tangent bundle.
$$U(dx) = \exp_\eta^{(1)}(a(x))\, u(x)\, \mu(dx), \quad \text{for some } u \in G, \tag{29}$$
$$\Xi_{\alpha,l}^{(i)}(a)(u_1, \ldots, u_i)(x) = \xi_\alpha^{(i)}(a(x))\, u_1(x) \cdots u_i(x). \tag{31}$$
(ii) For any $1 \le \theta < \theta_0$ (as defined before (26)), the superposition operator
$\mathrm{Exp}_{\eta,\theta} : G \to L^\theta$, defined by $\mathrm{Exp}_{\eta,\theta}(a)(x) = \exp_\eta(a(x))$, is of class $C^{\lambda-1}$,
with derivatives:
$$\mathrm{Exp}_{\eta,\theta}^{(i)}(u_1, \ldots, u_i)(x) = \exp_\eta^{(i)}(a(x))\, u_1(x) \cdots u_i(x). \tag{32}$$
Proof Part (i) is a special case of Theorem 2(iii). Part (ii) can be proved in a similar way;
the essential differences are that the derivatives of $\exp_\eta$ are not necessarily bounded,
and the range space of the superposition operator has a weaker topology. Let $(a_n \in
G \setminus \{a\})$ be a sequence converging to $a$ in the sense of $G$. For any $1 \le i \le \lambda - 1$ let
$\delta_n = \exp_\eta^{(i)}(\beta_n a_n + (1 - \beta_n)a) - \exp_\eta^{(i)}(a)$ for some $0 \le \beta_n(x) \le 1$.
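It follows from (26) and Hölder’s inequality that, for any $u_1, \ldots, u_i$ in the unit ball of
$G$,
$$\|\Gamma_n u_1 \cdots u_{i-1}\|_{L^\theta} \le \|\Gamma_n\|_{L^\gamma} \quad \text{and} \quad \|\Delta_n u_1 \cdots u_i\|_{L^\theta} \le \|\Delta_n u_i\|_{L^\gamma},$$
where $\gamma := \lambda\theta/(\lambda - i\theta)$. In order to prove part (ii), it thus suffices to show that
$$\|a_n - a\|_G^{-1} \|\Gamma_n\|_{L^\gamma} \to 0 \quad \text{and} \quad \sup_{\|u\|_G = 1} \|\Delta_n u\|_{L^\gamma} \to 0. \tag{34}$$
According to (26) and the de la Vallée-Poussin theorem, $|\exp_\eta^{(i)}(a_n)|^\gamma$ is uniformly
integrable. Now $\delta_n$ and $\Delta_n$ both converge to zero in measure, and so (34) follows from
the Lebesgue-Vitali theorem.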
Remark 2 (i) There is a vast choice of range spaces for superposition operators of this
type, each of which results in operators with different properties. Proposition 2 is
not intended to be exhaustive, but to cover some of the more interesting and useful
cases.
(ii) The case l = 0 is worth special mention since the domain and range spaces of
the superposition operators are then both G. If, for example, η− ≤ −1 then the
density p (= ξ−1 (a) + 1) belongs to the model space and varies continuously on
the manifold, as does the log of the density.
The superposition operators $\Xi_{\alpha,l}$ can be used in the analysis of divergences and
entropies. This analysis was carried out for the α-divergences, $\alpha \in [-1, 1]$, in [15],
where it was shown that they are of class $C^l(M \times M)$, for values of $l$ dependent
on $\lambda$. Although we do not pursue these issues here, it is clear that similar methods
can be used with the manifolds of this paper for any $\alpha \in [\eta_-, \eta_+]$. We would also
expect the $(\kappa, r)$ divergences of [8] to exhibit an equivalent degree of smoothness.
Divergences can be used to define various tensor fields on $M$, which depend naturally
on the superposition operators, $\Xi_{\alpha,l}$. In particular, the Fisher-Rao metric on $M$ can be
expressed in terms of the maps ($\xi_\alpha$, $\alpha \in [\eta_-, \eta_+]$) in two different ways, according
to the value of $\eta_-$:
$$\langle U, V \rangle_P = \begin{cases} \mathbb{E}_\mu\, \xi_0^{(1)}(a)\, \xi_0^{(1)}(a)\, uv & \text{if } \eta_- \le 0 \\ \mathbb{E}_\mu \exp_\eta(a)^{\eta_-}\, \xi_{\eta_-}^{(1)}(a)\, \xi_{\eta_-}^{(1)}(a)\, uv & \text{if } \eta_- > 0, \end{cases} \tag{35}$$
Proof The case $\eta_- \le 0$ follows from a repeated application of Lemma 1, starting with
$\psi = (\xi_0^{(1)})^2$. A similar technique can be applied if $\eta_- > 0$; the essential difference is
that, at each stage, we must use Hölder’s inequality and Proposition 2(ii) with $\theta = \eta_-$
to remove the term in $\exp_\eta$.
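On a finite sample space integrability is not an issue, so both expressions in (35) can be evaluated for the same η and compared with the familiar form of the Fisher-Rao metric, E_μ[(p′u)(p′v)/p] with p = exp_η(a) and p′ = exp_η^{(1)}(a). The sketch below does this under stated assumptions: the η-logarithm is taken to be the sum of the α-logarithms at η− and η+, δ = η+ − η−, and the first derivative of ξ_α is taken to be y^{(η+ − α)/2}/(1 + y^{δ/2}) at y = exp_η(a), i.e. (22) with i = 1 and a unit coefficient in f_{α,1}. The weights, vectors and function names are illustrative.

```python
import math


def alpha_log(alpha, y):
    """Amari alpha-logarithm of (16), for y > 0."""
    if alpha == 1.0:
        return math.log(y)
    return 2.0 / (1.0 - alpha) * (y ** ((1.0 - alpha) / 2.0) - 1.0)


def log_eta(eta, y):
    """ASSUMED eta-logarithm: sum of the alpha-logarithms at eta_- and eta_+."""
    return alpha_log(eta[0], y) + alpha_log(eta[1], y)


def exp_eta(eta, z, rel_tol=1e-13):
    """Inverse of log_eta by bisection."""
    lo = hi = 1.0
    while log_eta(eta, lo) > z:
        lo *= 0.5
    while log_eta(eta, hi) < z:
        hi *= 2.0
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        if log_eta(eta, mid) < z:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    return 0.5 * (lo + hi)


def xi_alpha_prime(eta, alpha, p):
    """ASSUMED first derivative of xi_alpha, expressed through p = exp_eta(a)."""
    delta = eta[1] - eta[0]
    return p ** ((eta[1] - alpha) / 2.0) / (1.0 + p ** (delta / 2.0))


def fisher_rao_forms(eta, mu, a, u, v):
    """Return (classical form, first line of (35), second line of (35))."""
    eta_minus, eta_plus = eta
    delta = eta_plus - eta_minus
    classical = via_xi0 = via_xi_eta_minus = 0.0
    for m, ai, ui, vi in zip(mu, a, u, v):
        p = exp_eta(eta, ai)
        dp = p ** ((1.0 + eta_plus) / 2.0) / (1.0 + p ** (delta / 2.0))  # (18)
        classical += m * dp * dp * ui * vi / p
        via_xi0 += m * xi_alpha_prime(eta, 0.0, p) ** 2 * ui * vi
        via_xi_eta_minus += (m * p ** eta_minus
                             * xi_alpha_prime(eta, eta_minus, p) ** 2 * ui * vi)
    return classical, via_xi0, via_xi_eta_minus


if __name__ == "__main__":
    mu = [0.2, 0.3, 0.5]          # reference probability weights
    a = [-0.4, 0.1, 0.7]          # chart value of the base point
    u = [1.0, -2.0, 0.5]          # two tangent vectors in chart coordinates
    v = [0.3, 0.4, -1.0]
    print(fisher_rao_forms((0.5, 1.0), mu, a, u, v))
```

The three numbers agree because, pointwise, each integrand reduces to p^{η+}/(1 + p^{δ/2})² times uv. The choice between the two lines of (35) therefore concerns which superposition operators are available on the infinite-dimensional manifold, not the value of the metric.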
Let $M_0 \subset M$ be the subset of the manifold of Sect. 3, whose members are
probability measures, and let $L_0^\lambda$ (respectively $G_0$) be the co-dimension 1 subspaces of $L^\lambda$
(respectively $G$) whose members have zero $\mu$-mean. Let $\phi_0 : M_0 \to G_0$ be defined
by
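$$\rho_a^{(1)} u = u - \mathbb{E}_{P_a} u$$
$$\rho_a^{(2)}(u, v) = -\frac{\mathbb{E}_\mu \exp_\eta^{(2)}(\rho(a))\,(u - \mathbb{E}_{P_a} u)(v - \mathbb{E}_{P_a} v)}{\mathbb{E}_\mu \exp_\eta^{(1)}(\rho(a))}, \tag{38}$$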
S184 N. J. Newton
(1,0)
ϒa,z u = Eμ exp(1) (0,1) (1)
η (a + z)u and ϒa,z = Eμ expη (a + z) > 0. (40)
and so $\lim_{z \uparrow \infty} \Upsilon(a, z) = \infty$. Furthermore, the monotone convergence theorem shows
that
So $\Upsilon(a, \cdot)$ is a bijection with strictly positive derivative, and the inverse function
theorem shows that it is a $C^{\lambda-1}$-isomorphism. The implicit mapping theorem shows
that $Z : G_0 \to \mathbb{R}$, defined by $Z(a) = \Upsilon(a, \cdot)^{-1}(1)$, is of class $C^{\lambda-1}$. For some
$a \in G_0$, let $P$ be the probability measure with density $p = \exp_\eta(a + Z(a))$; then
$\phi_0(P) = a$ and $P \in M_0$, which proves part (i).
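The normalising shift Z(a) used in this argument can be computed explicitly on a finite sample space, where Υ(a, z) = E_μ exp_η(a + z) is a finite sum that is strictly increasing in z. The sketch below assumes that the η-logarithm is the sum of the α-logarithms of (16) at η− and η+, and that the chart on probability measures is the recentred η-logarithm of the density; the function names and the bisection solver are illustrative choices.

```python
import math


def alpha_log(alpha, y):
    """Amari alpha-logarithm of (16), for y > 0."""
    if alpha == 1.0:
        return math.log(y)
    return 2.0 / (1.0 - alpha) * (y ** ((1.0 - alpha) / 2.0) - 1.0)


def log_eta(eta, y):
    """ASSUMED eta-logarithm: sum of the alpha-logarithms at eta_- and eta_+."""
    return alpha_log(eta[0], y) + alpha_log(eta[1], y)


def exp_eta(eta, z, rel_tol=1e-13):
    """Inverse of log_eta by bisection."""
    lo = hi = 1.0
    while log_eta(eta, lo) > z:
        lo *= 0.5
    while log_eta(eta, hi) < z:
        hi *= 2.0
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        if log_eta(eta, mid) < z:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    return 0.5 * (lo + hi)


def upsilon(eta, mu, a, z):
    """Total mass E_mu exp_eta(a + z) on a finite sample space."""
    return sum(m * exp_eta(eta, ai + z) for m, ai in zip(mu, a))


def normalising_shift(eta, mu, a, tol=1e-12):
    """Solve upsilon(a, z) = 1 for z by bisection (upsilon is increasing in z)."""
    lo, hi = -1.0, 1.0
    while upsilon(eta, mu, a, lo) > 1.0:
        lo *= 2.0
    while upsilon(eta, mu, a, hi) < 1.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if upsilon(eta, mu, a, mid) < 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    eta = (-1.0, 1.0)
    mu = [0.25, 0.25, 0.5]        # reference probability weights
    a = [1.0, -0.2, -0.4]         # zero mu-mean, so a plays the role of a point of G_0
    z = normalising_shift(eta, mu, a)
    density = [exp_eta(eta, ai + z) for ai in a]
    print(z, sum(m * p for m, p in zip(mu, density)))
```

Recentring the η-logarithm of the resulting density recovers a, which is the finite-dimensional counterpart of the statement that the probability measure with density exp_η(a + Z(a)) has chart value a.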
The argument above shows that the inclusion map, $\rho$, is of class $C^{\lambda-1}$. Let $c :
G \to G_0$ be the (linear) superposition operator defined by $c(a)(x) = a(x) - \mathbb{E}_\mu a$;
then $c$ is continuous, and has derivative $c_a^{(1)} u = u - \mathbb{E}_\mu u$. Now $c \circ \rho$ is the identity
map of $G_0$, which shows that $\rho$ is homeomorphic onto its image, $\rho(G_0)$, endowed
with the relative topology. Furthermore, for any $u \in G_0$,
$$u = (c \circ \rho)_a^{(1)} u = c^{(1)}_{\rho(a)} \rho_a^{(1)} u,$$
and so $\rho_a^{(1)}$ is a toplinear isomorphism, and its image, $\rho_a^{(1)} G_0$, is a closed linear
subspace of $G$. Let $E_a$ be the one dimensional subspace of $G$ defined by $E_a =
\{y \exp_\eta^{(1)}(\rho(a)) : y \in \mathbb{R}\}$. If $u \in E_a$ and $v \in \rho_a^{(1)} G_0$ then there exist $y \in \mathbb{R}$ and
$w \in G_0$ such that
$$\mathbb{E}_\mu uv = y\, \mathbb{E}_\mu \exp_\eta^{(1)}(\rho(a))(w - \mathbb{E}_{P_a} w) = 0.$$
So $E_a \cap \rho_a^{(1)} G_0 = \{0\}$, and $\rho_a^{(1)}$ splits $G$ into the direct sum $E_a \oplus \rho_a^{(1)} G_0$. We have
thus shown that $\rho$ is a $C^{\lambda-1}$-immersion, and this completes the proof of part (ii).
Jensen’s inequality shows that there exists a $K_\eta < \infty$ such that
So, for bounded $B$, $\inf_{P \in B} \mathbb{E}_\mu \exp_\eta^{(1)}(\rho(a)) > 0$.
Then $\Phi \circ \Phi_0^{-1}(a, u) = (\rho(a), \rho_a^{(1)} u)$. For any $(P, U) \in TM_0$, $U\phi = \rho_a^{(1)} u =
u - \mathbb{E}_{P_a} u$, and so tangent vectors in $T_P M_0$ are distinguished from those merely in
$T_P M$ by the fact that their total mass is zero.
Any regularity possessed by divergences, entropies and tensors on M involving
fewer than λ derivatives is also enjoyed by their restrictions to M0 .
5 Concluding remarks
This paper has developed a family of non-parametric statistical manifolds that use the
two-parameter deformed logarithm of (17), and a variety of model spaces of Sobolev
type. It has shown that the mixed-norm space $W^{k,\lambda}$ is especially suited to this
application. The Amari embedding maps, $\xi_\alpha$, which are central to the analysis of divergences,
entropies and associated tensors, “lift” to continuous nonlinear superposition oper-
ators acting on the Sobolev model spaces. (A rare property in the theory of such
operators.) Variants of the superposition operators having Sobolev range spaces with
weaker topologies enjoy greater regularity; they were shown to admit multiple deriva-
tives on the manifolds, according to the values of the parameters k, λ and η. Of course,
this paper takes only the first step in a fuller analysis of the information geometry of
the manifolds constructed. However, for reasons of space, we shall go no further here.
Data Availability Data sharing is not applicable to this article as no datasets were generated or analysed
during the current study.
Declarations
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Amari, S.-I., Nagaoka, H.: Methods of Information Geometry, Translations of Mathematical Mono-
graphs, 191. American Mathematical Society, Providence (2000)
2. Ay, N., Jost, J., Van Lê, H., Schwachhöfer, L.: Information Geometry, Ergebnisse der Mathematik und
ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics, vol 64. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-56478-4
3. Barndorff-Nielsen, O.E.: Information and Exponential Families in Statistical Theory, Wiley (1978)
4. Cena, A., Pistone, G.: Exponential statistical manifold. Ann. Inst. Stat. Math. 59, 27–56 (2007)
5. Chentsov, N.N.: Statistical Decision Rules and Optimal Inference. Translations of Mathematical Mono-
graphs, vol. 53. American Mathematical Society, Providence (1982)
6. Cianchi, A., Pick, L., Slavíková, L.: Higher-order Sobolev embeddings and isoperimetric inequalities.
Adv. Math. 273, 568–650 (2015)
7. Gibilisco, P., Pistone, G.: Connections on non-parametric statistical manifolds by Orlicz space geom-
etry. Infin.-Dimens. Anal. Quantum Probab. Relat. Top 1, 325–347 (1998)
8. Kaniadakis, G., Lissia, M., Scarfone, A.M.: Two-parameter deformations of logarithm, exponential,
and entropy: a consistent framework for generalized statistical mechanics. Phys. Rev. E 71, 046128
(2005)
9. Lauritzen, S.L.: Statistical Manifolds, IMS Lecture Notes Series, 10, Institute of Mathematical Statistics
(1987)
10. Lods, B., Pistone, G.: Information geometry formalism for the spatially homogeneous Boltzmann
equation. Entropy 17, 4323–4363 (2015)
11. Loaiza, G., Quiceno, H.R.: A q-exponential statistical Banach manifold. J. Math. Anal. Appl. 398,
466–476 (2013)
12. Murray, M.K., Rice, J.W.: Differential Geometry and Statistics, Monographs in Statistics and Applied
Probability, 48, Chapman Hall (1993)
13. Naudts, J.: Generalised Thermostatistics, Springer (2011)
14. Newton, N.J.: An infinite-dimensional statistical manifold modelled on Hilbert space. J. Funct. Anal.
263, 1661–1681 (2012)
15. Newton, N.J.: Infinite-dimensional statistical manifolds based on a balanced chart. Bernoulli 22, 711–
731 (2016). https://doi.org/10.3150/14-BEJ673
16. Newton, N.J.: Nonlinear filtering and information geometry: a Hilbert manifold approach, in: Ay, N.,
Gibilisco, P., Matúš, F. (eds.) Information Geometry and its Applications, Proceedings in Mathematics
and Statistics, 252, Springer, Cham, 189–208 (2018). https://doi.org/10.1007/978-3-319-97798-0_7
17. Newton, N.J.: A class of non-parametric statistical manifolds modelled on Sobolev space. Inf. Geom.
2, 283–312 (2019). https://doi.org/10.1007/s41884-019-00024-z
18. Pistone, G., Rogantin, M.P.: The exponential statistical manifold: mean parameters, orthogonality and
space transformations. Bernoulli 5, 721–760 (1999)
19. Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability
measures equivalent to a given one. Ann. Stat. 23, 1543–1561 (1995)
20. Tsallis, C.: The nonadditive entropy Sq and its applications in Physics and elsewhere, Entropy, 13,
1765–1804 (2011). https://doi.org/10.3390/e13101765
21. Vigelis, R.F., Cavalcante, C.C.: On ϕ-families of probability distributions. J. Theor. Probab. 26, 870–
884 (2013)
22. Vieira, F.L.J., de Andrade, L.H.F., Vigelis, R.F., Cavalcante, C.C.: A Deformed Exponential Statistical
Manifold, Entropy, 21 (2019). https://doi.org/10.3390/e21050496
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.