Introduction

Faà di Bruno–type formulas describe how higher derivatives behave under composition. In the classical one-variable setting, they express \((f\circ g)^{(n)}\) as a polynomial in the derivatives of \(f\) and \(g\) with explicitly computable combinatorial coefficients (often packaged by Bell polynomials). In 1855 Faà di Bruno derived the identity (for \(f,g:\IR\to\IR\)) $$ \begin{aligned} \frac{d^n}{dx^n}\bigl(f\circ g\bigr)(x) &= \sum_{k=1}^n f^{(k)}(g(x)) \ssum{(*)} \frac{n!}{m_1!\cdots m_n!} \prod_{j=1}^n \Bigl(\frac{g^{(j)}(x)}{j!}\Bigr)^{m_j}. \end{aligned} $$

Here the sum \((*)\) runs over all tuples \(m_1,\dots,m_n\in\IN_0\) constrained by the conditions \(m_1+\dots+m_n=k\) and \(m_1+2m_2+\dots+nm_n=n\).

This formula admits several equivalent repackagings in terms of Bell polynomials, partition sums, umbral expressions — each highlighting different facets of the combinatorial structure.

In several variables the underlying pattern remains the same, but the bookkeeping becomes substantially heavier: one must track multi-index exponents, manage products over vector-valued derivatives, and account for the richer symmetry of set partitions with block multiplicities. A complete explicit multi-variate formula with fully worked-out combinatorial coefficients was only established by Constantine and Savits in 1996 - 141 years after Faà di Bruno's original work.

This note takes a deliberately "analytic-first" viewpoint. We work with maps between real normed vector spaces and isolate the following question as the core of Faà di Bruno:

If \(\phi\) is approximated near \(x_0\) by a polynomial \(T^k_{x_0}\phi\) of degree \(\le k\), and \(\psi\) is approximated near \(y_0 = \phi(x_0)\) by a polynomial \(T^k_{y_0}\psi\) of degree \(\le k\), what polynomial approximates \(\psi\circ\phi\) near \(x_0\)?

There is one natural candidate: the polynomial composition \(T^k_{y_0}\psi\circ T^k_{x_0}\phi\), truncated to degree \(\le k\). Our first main result confirms that this is correct under minimal hypotheses: pointwise \(k\)-fold Fréchet differentiability at a single point (no global smoothness assumptions, and no completeness or finite-dimensionality requirements). Concretely, we show that if \(\phi\) and \(\psi\) admit pointwise Taylor decompositions of order \(k\) at \(x_0\) and \(y_0\) respectively, then \(\psi\circ\phi\) does as well, with $$ T^k_{x_0}(\psi\circ\phi) = \pi_{\le k}\bigl(T^k_{y_0}\psi\circ T^k_{x_0}\phi\bigr). $$

Once this “composition of Taylor polynomials” principle is established, the familiar combinatorial formulas fall out systematically:

  • the partition form is obtained by polarization (passing from homogeneous polynomials to symmetric multilinear maps), yielding a formula for \(D^k(\psi\circ\phi)(0)\) as a sum over set partitions of \([k]\);
  • the multi-index form is obtained by coefficient extraction in coordinates: expand the Taylor polynomials in a monomial basis and compute the coefficient of \(x^\nu\) in \(T^k_{y_0}\psi\bigl(T^k_{x_0}\phi(x)\bigr)\). This produces an explicit coefficient identity that recovers the multi-index Faà di Bruno formula (in particular the structure underlying the Constantine–Savits coefficients) as a corollary of a purely algebraic multinomial bookkeeping lemma.

As an application we include a higher-order Leibniz rule in two parallel forms: a partition identity for Fréchet derivatives and a multi-index identity for partial derivatives. The proof is deliberately routed through the Faà di Bruno formula in both incarnations.

There is a large literature around Faà di Bruno formulas and their multivariate generalizations; we briefly summarize the most relevant references and how they compare to the present treatment.

More directly related to this paper:

  • Faà di Bruno (1855, F. Faà di Bruno. "Sullo sviluppo delle funzioni") [FaaDiBruno1855] [FaaDiBruno1857]. The original one-dimensional formulas introducing the identity that now bears his name, in Italian (1855) and French (1857). These short notes are the primary historical sources.

  • Scott (1861, "Formulae of Successive Differentiation") [Scott1861] [Johnson2002]. In the one-variable case \(g\circ f:\IR\to\IR\), Scott’s identity (as recounted by Johnson) gives a 1D Taylor-composition viewpoint via coefficient extraction: apply Taylor’s theorem to \(g\) in the increment \(f(t+h)-f(t)\) and read off the coefficient of \(h^m\).

  • Constantine–Savits (1996, "A Multivariate Faà di Bruno Formula with Applications") [CS1996]. Foundational multivariate Faà di Bruno formula for arbitrary partial derivatives of a composition \(h(x)=f(g(x))\) with \(f:\IR^m\ra\IR\) and \(g:\IR^d\ra\IR^m\), expressed in full multiindex form using set partitions and multivariate Stirling numbers. Their proof is combinatorial and coordinate-based. The multiindex form stated here recovers their coefficient formula but is derived from a short argument based on jets and polarized Fréchet derivatives.

  • Hernández Encinas–Muñoz Masqué (2003, "A short proof of the generalized Faà di Bruno's formula") [EM2003]. Derive the multivariate Faà di Bruno formula by working at the level of \(r\)-jets (higher cotangent spaces) and using induced algebra morphisms on truncated local rings; in coordinates this amounts to composing truncated multivariate power series and applying the multinomial theorem. This is close in spirit to our "Taylor polynomials compose" viewpoint, but their formulation stays in the finite-dimensional multi-index setting and is phrased in the language of higher cotangent spaces rather than as an explicit functoriality statement for pointwise Taylor polynomials between general normed spaces.

  • Levy (2006, "Why do partitions occur in Faà di Bruno's chain rule for higher derivatives?") [Levy2006]. Gives what is, to our knowledge, the only explicit partition-form Faà di Bruno formula for \(n\)-th Fréchet derivatives between Banach spaces, and explains why partitions appear. The first part uses an algebraic description in terms of coalgebras on germs; the second part proves the Banach-space partition formula via iterated finite differences on parallelepipeds. The partition form proved in this paper agrees with Levy’s formula but is obtained from a very short jet-based computation.

  • Duarte–Torres (2008, "A discrete Faà di Bruno's formula") [DT2008]. Develop a discrete (finite-difference) Faà di Bruno formula for maps between linear spaces, valid for general functions of several variables, by building an algebra of symbolic finite-difference expressions and expanding \(\Delta^\alpha(f\circ g)\) in terms of finite differences of \(f\) and \(g\). Their analysis also yields a streamlined proof of the smooth Faà di Bruno formula when one passes from finite differences to derivatives; philosophically this is close to viewing higher derivatives as jets, but their primary emphasis is on the discrete setting.

Polynomials

We present a brief overview of the basic theory of Polynomials in an infinite dimensional setting of normed vector spaces following [Dineen1999]. The reader who is only interested in the finite dimensional setting can safely skip this section.

(1) Definition (Polynomials [Dineen1999]).

Let \(E,F\) be real normed vector spaces.

  • Write \(\KL(E,F)\) for the vector space of bounded linear maps \(A: E \ra F\), with operator norm: $$ |A| = \sup_{\substack{v \in E, |v| \leq 1}} |A(v)| $$

  • For \(k \ge 0\) let \(\KL(^k E,F)\) be the space of bounded \(k\)-fold multi-linear maps \(A_k: E^{\times k}\to F\), with operator norm: $$ |A_k| = \sup_{\substack{v_1,\dots,v_k \in E \\ |v_i| \leq 1}} |A_k(v_1,\dots,v_k)|. $$

  • Let \(\KL^s(^k E,F) \subset\KL(^k E,F)\) be the subspace of symmetric multi-linear maps.

  • A map \(p: E\to F\) is called a \(k\)-homogeneous polynomial if there exists \(A \in \KL^s(^k E,F)\) such that $$ p(x)=\frac{1}{k!}\,A[x,\dots,x]=\frac{1}{k!}\,A[x^{\tensor k}], \qquad x\in E. $$ We write \(\mathcal{P}^k(E,F) \subset Map(E,F)\) for the space of such maps.

  • A polynomial map \(p: E \to F\) is a finite sum of homogeneous polynomials: $$ p = \sum_{k=0}^n p_k \qtext{with} p_k \in \mathcal{P}^k(E,F). $$ We write \(\mathcal{P}(E,F):=\sum_{k\ge 0} \mathcal{P}^k(E,F) \subset Map(E,F)\) for the space of all polynomial maps, and \(\mathcal{P}_{\le n}(E,F):= \sum_{k=0}^n \mathcal{P}^k(E,F)\) subspace of polynomial maps of degree \(\le n\).

  • For \(F=\IR\) we write \(\mathcal{P}(E):=\mathcal{P}(E,\IR)\).

(2) Lemma (Polarization). Let \(p: E \to F\) be a polynomial map, represented as a sum of multi-linear forms \(p(x) = \sum_{k=0}^n p_k\), \(p_k = \frac{1}{k!} A_k[x^{\tensor k}]\) with \(A_k \in \KL^s(^k E, F)\).

  • The top-degree multi-linear form \(A_n \in \KL^s(^n E,F)\) can be recovered from \(p: E \ra F\) via: $$ A_n[v_1,\dots,v_n] = \sum_{I \subset [n]} (-1)^{n - |I|} p(\sum_{i \in I} v_i) $$ for \(v_i \in E\).

  • For a polynomial map \(p: E \ra F\), \(p \neq 0\) the representation $$ p(x) = \sum_{k=0}^n \frac{1}{k!} A_k[x^{\tensor k}] $$ in terms of \(n \in \IN_0, A_k \in \KL^s(^k E, F)\) is unique.

Proof. The polarization formula is Möbius inversion on the Boolean lattice \(2^{[n]}\): By multilinearity,

\[ p(\sum_{i \in I} v_i) = \sum_{k=0}^n \sum_{i_1,\dots,i_k \in I} \frac{1}{k!} A_k[v_{i_1},\dots,v_{i_k}] = \sum_{J \subset I} A_{|J|}[v_J] \]

where \(A_k[v_J]\) denotes evaluation on the \(|J|\)-tuple indexed by \(J\). The alternating sum \(\sum_{I \subset [n]} (-1)^{n-|I|}\) inverts this: all terms with \(J \subsetneq [n]\) cancel by the identity \(\sum_{I \supset J} (-1)^{n-|I|} = (1-1)^{n-|J|} = 0\), leaving only \(J = [n]\). Uniqueness follows by induction: recover \(A_n\) via polarization, subtract, and repeat.

(3) Definition (Degree Truncation).

Let \(p: E \ra F\) be a polynomial map, \(p \neq 0\). By Lemma (2) 🔗 the decomposition \(p(x) = \sum_{k=0}^n \frac{1}{k!} A_k[x^{\tensor k}]\) is unique. We can hence define:

  • \(\deg(p):= n \in \IN_0\) the degree of \(p\). By convention \(\deg(0) = - \infty\).
  • \(\pi_k(p) := p_k: E \ra F\) the degree-k part of \(p\).
  • \(\Pol_k(p) := A_k \in \KL^s(^k E, F)\) the degree-k polarization of \(p\).
  • \(\pi_{\leq k}(p) \in \mathcal{P}_{\le k}(E,F)\) the (lower) degree truncation of \(p\).
  • \(\pi_{> k}(p) \in \mathcal{P}_{> k}(E,F)\) the upper degree truncation of \(p\).

We will make use of the following properties of polynomial maps:

(4) Lemma (Polynomial Properties). Let \(p : E \ra F\), \(q: F \ra G\) be polynomial maps.

  1. (Composition) The composition \(q \circ p: E \ra G\) is a polynomial with \(\deg(q \circ p) \leq \deg(q) \cdot \deg(p)\).
  2. (Lipschitz) There exists \(\eps > 0, C>0\) such that \(\|\Delta_{0} p(x)\| \le C \|x\|\) for all \(\|x\| < \eps\).
  3. (Upper Vanishing) We have \(\| \pi_{> k}(p)(x)\| / \|x\|^{k} \ra 0\) for \(x \ra 0\).
  4. (Lower Vanishing) If \(\|p(x)\| / \|x\|^{k} \ra 0\) for \(x \ra 0\), and \(k \geq \deg(p)\) then \(p = 0\).

Proof. Ad 1) Expanding \(q(p(x))\) by multilinearity yields a sum of terms \(B[A_1[x^{\tensor j_1}], \dots, A_\ell[x^{\tensor j_\ell}]]\) where \(B \in \KL(^\ell F, G)\) and \(A_i \in \KL(^{j_i} E, F)\) are bounded multilinear.

The composition \((u_1,\dots,u_k) \mapsto B[A_1[u_{I_1}], \dots, A_\ell[u_{I_\ell}]]\) is bounded multilinear, hence each term is a homogeneous polynomial of degree \(j_1 + \cdots + j_\ell \leq \ell \cdot \deg(p)\). Thus \(q \circ p\) is a polynomial with \(\deg(q \circ p) \leq \deg(q) \cdot \deg(p)\).

Ad 2) Write \(\Delta_0 p(x) = p(x) - p(0) = \sum_{j=1}^m \frac{1}{j!} A_j[x^{\tensor j}]\) with \(A_j \in \KL^s(^j E, F)\) bounded. For \(\|x\| < 1\) we have \(\|x\|^j \leq \|x\|\), hence \(\|\Delta_0 p(x)\| \leq \sum_{j=1}^m \frac{\|A_j\|}{j!} \|x\|^j \leq C\|x\|\) with \(C = \sum_{j=1}^m \frac{\|A_j\|}{j!}\).

Ad 3) Each homogeneous component \(p_j\) of degree \(j > k\) satisfies \(\|p_j(x)\| \leq C_j \|x\|^j\), hence \(\|p_j(x)\|/\|x\|^k = C_j \|x\|^{j-k} \to 0\) as \(x \to 0\).

Ad 4) Assume \(p \neq 0\), \(n = \deg(p)\). Write \(p = \sum_{j=m}^n p_j\) with \(p_j = \pi_j(p)\) homogeneous, \(p_m \neq 0\) the minimal non-vanishing degree.

Let \(m = \min\{j : p_j \neq 0\} \leq \deg(p) \leq k\) be the bottom term, then \(p = p_m + \pi_{> d} p\), and by (3) \(\lim_{x \ra 0} \|p(x)\| / \|x\|^k = \|p_d(x)\|/\|x\|^k\)

If \(p \neq 0\), let \(j_0 := \min\{j : p_j \neq 0\} \leq \deg(p) \leq k\) and choose \(v \in E\) with \(p_{j_0}(v) \neq 0\). For \(t \to 0^+\): \(p(tv)/t^k = t^{j_0-k} p_{j_0}(v) + \pi_{> j_0} p(tv)\). Since \(j_0 \leq k\), the leading term \(t^{j_0-k} p_{j_0}(v)\) does not vanish as \(t \to 0^+\), contradicting \(\|p(x)\|/\|x\|^k \to 0\).

Fréchet Differentiability

We summarize the classical notion of Fréchet differentiability (following [LangRFA]), and introduce a weaker point-wise notion of Fréchet differentiability.

(5) Definition (Fréchet differentiability [LangRFA] XIII.6). Let \(E,F\) be normed vector spaces, \(U_E \subset E\) an open subset and \(\phi:U_E \ra F\) a map.

  • (Forward difference) For \(x_0 \in U_E\) let $$ \Delta_{x_0}\phi(h):= \phi(x_0+h) - \phi(x_0) $$ the forward difference, which is defined for \(h \in E\) in a neighbourhood of \(0\).

  • (Fréchet differentiable) We say that \(\phi\) is Fréchet differentiable at a point \(x_0 \in U_E\) if there exists a bounded linear map \(L \in \KL(E,F)\) such that $$ \lim_{h\to 0} \frac{|\Delta_{x_0}\phi(h) - L(h)|}{|h|} = 0. $$ The map \(L\) is uniquely determined and is called the derivative of \(\phi\) at \(x_0\), written \(D\phi(x_0)\).

  • (Continuous Fréchet differentiable) We say that \(\phi\) is continuously Fréchet differentiable on \(U_E\) if \(\phi\) is Fréchet differentiable at every point \(x \in U_E\) and the map: \(x \mapsto D\phi(x) \in \KL(E,F)\) is continuous.

  • (Continuous k-times Fréchet differentiability) We recursively define \(\phi\) to be k-times continuously Fréchet differentiable. For \(k \geq 1\) we require \(\phi: U_E \ra F\) to be continuously Fréchet differentiable on \(U_E\), and \(D\phi: U_E \ra \KL(E,F)\) \(k-1\)-times continuously Fréchet differentiable as a map between normed vector spaces.

  • We write \(C^k(U_E, F)\) for the space of \(k\)-times continuously Fréchet differentiable functions.

(6) Proposition (Taylor Approximation [LangRFA], XIII.6). Let \(\phi: U_E \ra F\) be \(C^k\) as above, and \(x_0 \in U_E\).

  1. The higher Fréchet differentials define symmetric multi-linear forms: $$ D^k\phi(x_0) = D( \dots (D\phi))(x_0) \in \KL(E, \dots, \KL(E,\KL(E,F)) \isom \KL^s(^k E, F). $$

  2. The associated degree-\(k\) polynomial: $$ T^k_{x_0}\phi(h) = \sum_{\ell=1}^k \frac{1}{\ell!} D^\ell\phi(x_0)[h^{\tensor \ell}]: E \ra F $$ is called the degree-\(k\) Taylor polynomial of \(\phi\) at \(x_0\). Note that we start this sum at \(\ell=1\) so that \(T^k_{x_0}\phi(0) = 0\), by convention.

  3. The Taylor Remainder \(R^k_{x_0} \phi = \Delta_{x_0} \phi - T^k_{x_0}\phi\) satisfies: $$ \Delta_{x_0} \phi = T^k_{x_0} \phi + R^k_{x_0} \phi \qtext{and} |R^k_{x_0}\phi(h)|/|h|^k \;\to\; 0 \qtext{as} h \to 0 . $$ We call this sum the Taylor Decomposition of \(\phi\) at \(x_0\).

  4. Let \(\Delta_{x_0} \phi = T + R\) be any other decomposition with: \(T \in \mathcal{P}_{\leq k}(E,F)\), \(T(0) = 0\) a degree \(\leq k\) polynomial and \(R: E \ra F\) with \(\|R(h)\|/\|h\|^k \ra 0\) for \(h \ra 0\). Then \(T = T^k_{x_0}\phi\) and \(R = R^k_{x_0} \phi\).

Proof. Properties 1-3 are standard and can be found, e.g. in [LangRFA], XIII.6. For the uniqueness assume that \(\Delta_{x_0} \phi = T + R = T' + R'\) are two Taylor decompositions of \(\phi\). Then \(p = T-T' = R-R'\) is a polynomial of degree \(\leq k\) with \(\|p(h)\|/\|h\|^k \ra 0\) for \(h \ra 0\). By Lemma (4) 🔗 we must have \(p = 0\).

In the light of this Proposition, we make the following definition of point-wise Fréchet differentiability as approximation property for functions by polynomial maps:

(7) Definition (Pointwise Fréchet decomposition at \(0\)).

  • We say that a map \(\phi:U_E\ra F\) is \(k\)-fold pointwise Fréchet differentiable at \(x_0\) if there exists a "Taylor" decomposition: $$ \Delta_{x_0} \phi = T + R $$ where a polynomial map \(T \in\mathcal{P}_{\le k}(E,F)\) with \(T(0)=0\) and a residual map \(R: U_E \ra F\) such that $$ |R(x)|\,/\,|x|^k \;\to\; 0 \qtext{as} x \to 0. $$

    By (6) 🔗 this decomposition is unique if it exists, and we write: \(T_{x_0}^k\phi = T\) and \(R^k_{x_0}\phi = R\).

  • We define the degree \(1 \leq \ell \leq k\) Fréchet differentials as polarization of the Taylor polynomial via: $$ D^\ell\phi(x_0) = \Pol_\ell T_{x_0}^k\phi \in \KL^s(^\ell E,F) $$ And set \(D^0\phi(x_0) = \phi(x_0) \in \KL^s(^0 E,F) = F\).

(8) Lemma (Lipschitz Continuity). If \(\phi: E \ra F\) is pointwise Fréchet differentiable at \(x\), then \(\phi\) is Lipschitz continuous at \(x\), i.e. there is \(C > 0, \eps > 0\) so that: \(\| \Delta_{x_0} \phi(h) \| < C \|h\|\) for all \(\|h\| < \eps\).

Proof. We may assume \(x_0 = 0\). Let \(\Delta_0\phi = P + R\) be a Taylor decomposition of order \(k\) at \(0\). By Lemma (4) 🔗 there exist \(\eps > 0\) and \(C_1 > 0\) such that \(\|P(h)\| \leq C_1\|h\|\) for \(\|h\| < \eps\). Since \(R(h) = o(\|h\|^k)\) we have \(\|R(h)\| \leq C_2\|h\|^k \leq C_2\|h\|\) for \(\|h\| < \min(\eps, 1)\). Thus \(\|\Delta_0\phi(h)\| \leq \|P(h)\| + \|R(h)\| \leq (C_1 + C_2)\|h\|\) for \(\|h\|\) small.

Faa di Bruno - Composition Form

(9) Theorem (Multivariate Faà di Bruno - Composition Form). Let \(E,F,G\) be normed vector spaces. Let \(U_E \subset E\) be open, and \(x_0 \in U_E\) and \(\phi:E \ra F\) be \(k\)-fold pointwise Fréchet differentiable at \(x_0\). Let \(U_F \subset F\) open with \(y_0 = \phi(x_0) \in U_F\) and \(\psi:U_F \ra G\) be \(k\)-fold pointwise Fréchet differentiable at \(y_0\).

Then:

  • The composition \(\psi \circ \phi: U_E \ra G\) is pointwise \(k\)-fold Fréchet differentiable at \(x_0\).
  • The Taylor Polynomials compose as polynomial maps in \(\mathcal{P}(E,G)\): $$ T_{x_0}^k (\psi \circ \phi) = \pi_{\leq k}( T_{y_0}^k \psi \circ T_{x_0}^k \phi). $$

Proof.

By translation, we may assume \(x_0=0\) and \(\phi(x_0) = y_0 = 0\) as well as \(\psi(y_0) = 0\) throughout the proof. In particular \(\Delta_{x_0} \phi = \phi\), \(\Delta_{y_0} \psi = \psi\).

Choose Taylor decompositions \(\phi = P + R\) and \(\psi = Q + S\), with polynomials \(P,Q\) of degree \(\le k\) and remainders \(R,S\) vanishing to order \(k\) at \(0\). Now write \(\psi\circ\phi = Q(\phi) + S(\phi)=\pi_{\le k}(Q\circ P) + E\), with: $$ E := Q(\phi) + S(\phi) - Q(P) + \pi_{> k}(Q\circ P). $$ We have to show that \(\|E(x)\| / \|x\|^k \to 0\) for \(x \to 0\).

Clearly \(\| \pi_{> k}(Q\circ P) \| / \|x\|^k \to 0\) for \(x \to 0\) be Lemma (4) 🔗.

To see that \(\|S(\phi(x))\|/\|x\|^k \to 0\) we argue as follows: By the Lipschitz property of \(\phi\) (Lemma (8) 🔗), we find \(\delta > 0, C > 0\) so that \(\|\phi(x)\| < C \|x\|\) for all \(\|x\| < \delta\). Since \(S\) is a Taylor residual we have \(\|S(y)\| < \eta(y) \|y\|^k\) with \(\eta(y) \to 0\) as \(y \to 0\). Hence \(\|S(\phi(x))\|/\|x\|^k \leq \eta(\phi(x)) C^k \to 0\) as \(x \to 0\).

For the term \(Q(P+R) - Q(P)\), we decompose \(Q(y) = \sum_{\ell=0}^k Q_\ell(y) = \sum_\ell \frac{1}{\ell!} B_\ell[y^{\tensor \ell}]\) where \(Q_\ell = \pi_\ell Q\) and \(B_\ell = \Pol_\ell Q\). Then $$ Q_\ell(P+R) - Q_\ell(P) = \sum_{\substack{i+j=\ell\ j\ge1}} \frac{1}{i!\,j!}\,B_\ell\big(P^{\tensor i}, R^{\tensor j}\big). $$

It suffices to show that each summand \(B_\ell\big(P^{\tensor i}, R^{\tensor j})\) is in \(o(\|x\|^k)\). To this end we note that:

  • \(\|B_\ell(P(x)^{\tensor i}, R(x)^{\tensor j})\| \leq \|B_\ell\| \cdot \|P(x)\|^i \|R(x)\|^j\) as \(B_\ell \in \KL^s(^\ell F,G)\) bounded.
  • \(\|P(x)\| \le C\|x\|\) for \(\|x\| < \delta\) as \(P\) is Lipschitz (Lemma (4) 🔗) and \(P(0) = 0\).
  • \(\|R(x)\| < \eta(x) \|x\|^k\) with \(\eta(x) \ra 0\) for \(x \to 0\) as \(R\) is a Taylor remainder.

So that: $$ |B_\ell(P^{\tensor i}(x),R^{\tensor j}(x))| / |x|^k \leq |B_\ell| \; C^i \; \eta(x)^j \; |x|^{i + (j - 1) k} \to 0 \qtext{as} x \to 0 $$ For the last step, note that \(j \geq 1\) so the factor \(\eta(x)^j \to 0\) as \(x \to 0\), and \(i + (j - 1) k \geq 0\) hence \(\|x\|^{i + (j - 1) k}\) stays bounded (and even goes to zero for \(j \geq 2\)).

Faa di Bruno - Partition Form

(10) Definition. Let \(I\) be a finite set.

  • An un-ordered (non-empty) partition \(\pi\) of \(I\) is a set \(\pi = \set{B_1,\dots,B_r}\) of subsets \(B_i \subset I\) ("blocks"), so that \(B_i\) are disjoint, \(I = \union_{i=1}^r B_i\) and \(B_i \neq \emptyset\). We write \(\pi \part I\) in this case and call \(|\pi| = r\) the rank of \(\pi\).

  • An ordered partition \(\pi\) of \(I\) is a tuple \(\pi = (B_1,\dots,B_r)\) of subsets \(B_i \subset I\), so that \(B_i\) are disjoint and \(I = \union_{i=1}^r B_i\). We write \(\pi \opart I\) in this case.

    Note that we allow empty blocks for ordered partitions, so that this datum is equivalent to a map \(\sigma: I \ra [r]\) with \(|\pi| = r\) \(B_i = \sigma^{-1}\set{i}\).

  • For a set \(X\), a tuple \(v = (v_i)_{i \in I} \in X^I\) and a block \(B \subset I\), we write \(v_B := (v_b)_{b \in B} \in X^B\) for the restricted tuple.

(11) Lemma (Composition Polarization). If \(k = m_1 + \dots + m_r \in \IN_0\) and \(A_i \in \KL^s(^{m_i} E,F)\), and \(B \in \KL^s(^r F,G)\). Let $$ H(x) = B[\frac{1}{m_1!}A_1[x^{\tensor m_1}], \dots, \frac{1}{m_r!}A_r[x^{\tensor m_r}]] $$ then: $$ \Pol_k(H)[u_1,\dots,u_k] = \ssum{(B_1,\dots,B_r) \opart [k], \\ |B_i| = m_i} B[A_1[u_{B_1}], \dots, A_r[u_{B_r}]]. $$

Proof. Let \(G[u_1,\dots,u_k]\) be the claimed expression for \(\Pol_k(H)[u_1,\dots,u_k]\). As both sides are symmetric, bounded k-multi-linear forms in \(u_1,\dots,u_k\) we only have to validate that \(\frac{1}{k!} G[x^{\tensor k}] = H\). We find: $$ G[x^{\tensor k}] = \sum_{(B_1,\dots,B_r) \opart [k], \\ |B_i| = m_i} B[A_1[x^{\tensor m_1}], \dots, A_r[x^{\tensor m_r}]] = \frac{k!}{m_1! \dots m_r!} \cdot B[A_1[x^{\tensor m_1}], \dots, A_r[x^{\tensor m_r}]] $$ as there are \(k! / m_1! \dots m_r!\) ordered partitions of a set of \([k]\) elements into subsets of order \(m_1,\dots,m_r\). Re-arranging the factorials yields the claimed.

(12) Theorem (Multivariate Faà di Bruno - Partition Form). Let \(x_0\in U_E \subset E\) and set \(y_0:=\phi(x_0)\in F\). Assume that \(\phi:U_E\ra F\) is pointwise \(k\)-fold Fréchet differentiable at \(x_0\) and that \(\psi:F\ra G\) is pointwise \(k\)-fold Fréchet differentiable at \(y_0\). For \(u_1,\dots,u_k\in E\) we have: $$ D^k(\psi\circ\phi)(x_0)[u_1,\dots,u_k] = \sum_{r=1}^k \ssum{\pi \part [k] \\ |\pi| = r} D^r\psi(y_0)\bigl[D^{|B_1|}\phi(x_0)[u_{B_1}] ,\dots, D^{|B_r|}\phi(x_0)[u_{B_r}]\bigr]. $$

Proof. Fix \(u_1,\dots,u_k\in E\). By translation, we may again assume \(x_0=0\) and \(y_0=0\) throughout the proof. Fix Taylor decompositions \(\phi = P + R\), \(\psi = Q + S\) with \(P,Q\) polynomial of degree \(\le k\). By (9) 🔗 we have \(T_{0}^k \psi \circ \phi = \pi_{\leq k}(Q \circ P)\), and hence \(D^k(\psi\circ\phi)(0) = \Pol_k (Q \circ P)\).

Write the polynomial maps in polarized form as $$ P(x) = \sum_{m=1}^k \frac{1}{m!}\,P_m[x^{\tensor m}], \qquad Q(y) = \sum_{r=1}^k \frac{1}{r!}\,Q_r[y^{\tensor r}]. $$

Expanding \(Q_r\) by multilinearity and isolating the degree \(k\) term gives:

\[ \pi_k Q(P(x)) = \sum_{r=1}^k \sum_{\substack{m_1,\dots,m_r\ge1 \\\\ m_1+\cdots+m_r = k}} \frac{1}{r!} Q_r\bigl[\frac{1}{m_1!} P_{m_1}[x^{\tensor m_1}],\dots,\frac{1}{m_r!} P_{m_r}[x^{\tensor m_r}]\bigr]. \]

Applying \(\Pol_k\) on both sides making use of the composition formula from Lemma (11) 🔗 now gives:

\[ D^k(\psi\circ\phi)(0)[u_1,\dots,u_k] = \sum_{r=1}^k \ssum{m_1,\dots,m_r\ge1 \\\\ m_1+\cdots+m_r = k} \ssum{(B_1,\dots,B_r) \opart [k] \\\\ |B_i| = m_i} \frac{1}{r!} Q_r\bigl[P_{m_1}[u_{B_1}],\dots,P_{m_r}[u_{B_r}]\bigr]. \]

Now observe that there are exactly \(r!\) re-orderings of each ordered partition \((B_1,\dots,B_r)\), and these re-orderings yield the same summand as \(Q_r\) is symmetric. Hence the sum over \(m_i, B \opart [k]\) can be replaced by a single sum over un-ordered partitions \(\pi \part [k]\) with given block sizes \(|B_i| = m_i\). Furthermore we have \(Q_r = D^r \psi(0)\), \(P_m=D^m \phi(0)\) by definition. Substituting these into the above formulas yields the claim.

Faa di Bruno - Multi-index Form

(13) Definition (Multi-Indices).

  • Elements \(\nu = (\nu_1,\dots,\nu_d) \in \IN_0^d\) are called multi-indices of dimension \(d\). We write \(|\nu| = \sum_i \nu_i\) for the degree and \(\nu! := \nu_1! \cdots \nu_d!\) for the factorial.

  • If \(A\) is a commutative algebra and \(v \in A^d\) is a \(d\)-tuple, then we write: \(v^\nu = \prod_i v_i^{\nu_i} \in A\).

  • If \(E = \IR^d\), write \(x^1,\dots,x^d: E \ra \IR\) for the coordinate projections, then the set \(x^\nu, \nu \in \IN_0^d\) is a basis of Polynomial functions \(\mathcal{P}(E)\).

  • If \(U \subset \IR^d\) is open and \(\phi: U \ra F\) is pointwise k-fold Fréchet differentiable at \(x_0 \in U\). Let \(\nu \in \IN_0^d\) with \(k = |\nu|\). We define the partial derivative \(\del^\nu \phi (x_0) \in F\) via the basis expansion of \(T_{x_0}^{k} \phi \in \mathcal{P}(\IR^d,F) = \mathcal{P}(\IR^d) \tensor F\) as coefficient of \(x^\nu/\nu!\): $$ T_{x_0}^{k} \phi = \sum_{1 \leq |\nu| \leq k} \frac{x^\nu}{\nu!} \; \del^\nu \phi(x_0). $$

(14) Theorem (Multivariate Faà di Bruno - Multi-index form). Let \(x_0 \in U \subset \IR^d\) and set \(y_0:=\phi(x_0)\in \IR^n\). Assume that \(\phi: U \ra \IR^n\) is pointwise \(k\)-fold Fréchet differentiable at \(x_0\) and that \(f:\IR^n \ra \IR\) is pointwise \(k\)-fold Fréchet differentiable at \(y_0\).

For \(\alpha \in \IN_0^d\), \(\alpha \neq 0\) the partial derivative of the composition are given by:

\[ \frac{\del^\alpha(f \circ \phi)(x_0)}{\alpha!} \;=\; \ssum{\beta \in \IN_0^n \\\\ 1\le|\beta|\le|\alpha|} \del^\beta f(y_0)\; \sum_{s\ge 1}\; \sum_{(*)}\; \prod_{r=1}^s \frac{1}{\beta_r!} \frac{\big( \del^{\alpha_r}\phi(x_0) \big)^{\beta_r}} {\alpha_r!^{|\beta_r|}}, \]

Where \((*)\) runs over all \((\alpha_1,\dots,\alpha_s;\,\beta_1,\dots,\beta_s)\) with \(\alpha_j\in\IN_0^{d}\), \(\beta_j\in\IN_0^{n}\), \(\alpha_j \neq 0, \beta_j \neq 0\) and \(\alpha_1<\cdots<\alpha_s\) for a total order on \(\IN_0^d\) satisfying the constraints: \(\alpha_1|\beta_1|+\cdots+\alpha_s|\beta_s|=\alpha\) and \(\beta_1+\cdots+\beta_s=\beta\).

Proof. We may assume that \(x_0 = y_0 = f(y_0) = 0\), without loss of generality.

Step 1: Taylor Expansion) Write \(x^1,\dots,x^d\) for the coordinate functions on \(\IR^d\) and \(y^1,\dots,y^n\) for the coordinates on \(\IR^n\). Expand the Taylor polynomials of \(\phi\) and \(f\) as $$ P = T_{x_0}^{k} \phi = \ssum{\alpha \in \IN_0^d \\ 1 \leq |\alpha| \leq k} \frac{x^\alpha}{\alpha!} \; p_\alpha, \qquad F = T_{y_0}^{k} f = \ssum{\beta \in \IN_0^n \\ 1\leq |\beta| \leq k} \frac{y^\beta}{\beta!} \; f_\beta, $$ with \(p_\alpha = \del^\alpha \phi(x_0) \in \IR^n\) with components \(p_\alpha^i = \del^\alpha \phi^{(i)}(x_0)\) and \(f_\beta = \del^\beta f(y_0) \in \IR\).

Step 2: Polynomial Composition) By Theorem (9) 🔗 we have \(\del^\nu (f \circ \phi)(0)/\nu! = [x^\nu](F \circ P)\). We compute \([x^\nu](F \circ P)\) as follows: Let \(m \geq 0\). For each coordinate \(y^i\) we have:

\[ (y^i\circ P)^{m} = \Bigl(\sum_{\alpha \in \IN_0^d} \frac{p_\alpha^i}{\alpha!}\; x^\alpha\Bigr)^{m} = \sum_{\alpha_1,\dots,\alpha_m \in \IN_0^d} \prod_{j=1}^m \frac{p_{\alpha_j}^i}{\alpha_j!} x^{\alpha_j} = \ssum{\rho_i:\IN_0^d \ra \IN_0 \\\\ \sum_\alpha \rho_i(\alpha) = m} \Bigl(\frac{m!}{\prod_{\alpha \in \IN_0^d} \rho_i(\alpha)!} \Bigr) \prod_{\alpha\in \IN_0^d} \Bigl(\frac{p_\alpha^i}{\alpha!}\,x^\alpha\Bigr)^{\rho_i(\alpha)}. \]

Note that there are only finitely many maps \(\rho:\IN_0^d \ra \IN_0\) with \(\sum_\alpha \rho(\alpha) = n\), and that each of those maps is finitely supported i.e. \(\rho(\alpha) = 0\) except for finitely many \(\alpha \in \IN_0^{d}\). In the second step the map \(\rho_i\) counts the number of occurrences of \(\alpha\) in a given tuple \((\alpha_1,\dots,\alpha_m)\), i.e. \(\rho_i(\alpha) = \# \Set{j}{\alpha_j = \alpha}\). For each given map \(\rho_i\) we have \(m!/\prod_{\alpha \in \IN_0^d} \rho_i(\alpha)!\) tuples which give the same map.

Now let \(\beta \in \IN_0^n\) and apply this formula to each \(m = \beta_i \in \IN_0\). Write \(\rho: \IN_0^d \ra \IN_0^n\), \(\rho(\alpha) = (\rho_1(\alpha),\dots,\rho_n(\alpha))\), so that:

\[\begin{align*} (y \circ P)^\beta &= \prod_{i=1}^n (y^i \circ P)^{\beta_i} = \ssum{\rho:\IN_0^d \ra \IN_0^n\\\\ \sum_\alpha \rho(\alpha) = \beta} \Bigl( \prod_{i=1}^n \frac{\beta_i!}{\prod_\alpha \rho_i(\alpha)!} \Bigr) \prod_{i=1}^n \prod_{\alpha \in \IN_0^d} \Bigl( \frac{p_\alpha^i}{\alpha!}\,x^\alpha \Bigr)^{\rho_i(\alpha)} \\\\ &= \beta! \; \ssum{\rho:\IN_0^d \ra \IN_0^n \\\\ \sum_\alpha \rho(\alpha) = \beta} \Bigl( \prod_{\alpha \in \IN_0^d} \frac{1}{\rho(\alpha)!} \frac{(p_\alpha)^{\rho(\alpha)}}{(\alpha!)^{|\rho(\alpha)|}} \Bigr) \; x^{\sum_\alpha \alpha|\rho(\alpha)|}. \end{align*}\]

Substituting into \(F(P(x))\) and extracting the \(x^\alpha\) coefficient gives:

\[\begin{align*} \frac{\del^\alpha (f \circ \phi) (0)}{\alpha!} = [x^\alpha]\, (F \circ P) = [x^\alpha]\, \sum_{\beta\in\IN_0^n} \frac{f_\beta}{\beta!}\,(y \circ P)^\beta = \sum_{\beta\in\IN_0^n} f_\beta\, \sum_{\rho\,(*)} \Bigl( \prod_{\gamma \in \IN_0^d} \frac{1}{\rho(\gamma)!} \frac{(p_\gamma)^{\rho(\gamma)}}{(\gamma!)^{|\rho(\gamma)|}} \Bigr) \end{align*}\]

Here \((*)\) runs over all maps \(\rho:\IN_0^d \ra \IN_0^n\) satisfying \(\sum_\gamma \rho(\gamma) = \beta\) and \(\sum_\gamma \gamma|\rho(\gamma)| = \alpha\). We convert this sum into a tuple sum as follows: Fix a total order "\(<\)" on \(\IN_0^d\) and write \(\supp(\rho) = \{\alpha_1 < \cdots < \alpha_s\} \subset \IN_0^{d}\). Setting \(\beta_j = \rho(\alpha_j) \in \IN_0^{n}\) for \(j = 1,\dots,s\), the datum of a map \(\rho: \IN_0^d \ra \IN_0^n\) is equivalent to a tuple \((\alpha_1,\dots,\alpha_s; \beta_1,\dots,\beta_s)\) with \(\alpha_r \in \IN_0^{d}\), \(\beta_r \in \IN_0^{n} \setminus \set{0}\) and \(\alpha_1 < \cdots < \alpha_s\). The constraints become: \(\beta_1 + \cdots + \beta_s = \beta\) and \(|\beta_1|\alpha_1 + \cdots + |\beta_s|\alpha_s = \alpha\).

Since \(P(0) = 0\), we have \(p_0 = 0\), so only tuples with \(\alpha_j \neq 0\) for all \(j\) contribute. The constraint \(|\beta_1|\alpha_1 + \cdots + |\beta_s|\alpha_s = \alpha\) forces \(1 \leq |\beta| \leq |\alpha|\): indeed \(|\alpha| = \sum_j |\alpha_j||\beta_j| \geq \sum_j |\beta_j| = |\beta|\) since \(|\alpha_j| \geq 1\), and \(\alpha \neq 0\) implies \(s \geq 1\) hence \(\beta \neq 0\). This concludes the proof.

As a by-product of the proof, we obtain the following map-form of the Faa di Bruno:

(15) Corollary (Map-Form Faa di Bruno). In the setting of (14) 🔗 we have the following formula:

\[\begin{align*} \frac{\del^\alpha (f \circ \phi) (x_0)}{\alpha!} = \sum_{\beta\in\IN_0^n} \del^\beta f(y_0)\, \sum_{\rho\,(*)} \Bigl( \prod_{\gamma \in \IN_0^d} \frac{1}{\rho(\gamma)!} \frac{(\del^\gamma \phi)^{\rho(\gamma)}}{(\gamma!)^{|\rho(\gamma)|}}. \Bigr) \end{align*}\]

Here \((*)\) runs over all maps \(\rho:\IN_0^d\setminus\set{0} \ra \IN_0^n\) satisfying \(\sum_\gamma \rho(\gamma) = \beta\) and \(\sum_\gamma \gamma|\rho(\gamma)| = \alpha\).

IRemark (Factorials and Symmetry). Controlling the complexity of combinatorial expressions is a key technical difficulty when working with Faà di Bruno. An important insight in this presentation is that factorials are always associated with symmetrization: passing from multilinear maps to polynomials via diagonal evaluation, or quotienting out ordering information (e.g., from ordered to unordered partitions). Formulas are often simpler when symmetry is not yet imposed. We deliberately keep each factorial adjacent to the corresponding symmetrized quantity to make the combinatorics transparent.

Leibniz Formula

As an example demonstrating the utility of the Faà di Bruno formula in composition form, we provide a general Leibniz rule.

(16) Proposition (Leibniz rule). Let \(E\) be a real normed vector space and let \(f_1,\dots,f_n:E\to\IR\) be \(k\)-times Fréchet differentiable at a point \(x \in E\). Denote by \(\widetilde{T}^k_{x}(f) = f(x) + T^k_{x}(f) = \sum_{m = 0}^k \frac{1}{m!} D^m f[h^{\tensor m}]\) the augmented Taylor polynomial (including the degree=0) term. Then:

  1. (Taylor form). $$ \widetilde{T}^k_{x}(f_1 \cdots f_n) = \pi_{\leq k}(\widetilde{T}^k_{x}(f_1) \cdots \widetilde{T}^k_{x}(f_n)). $$

  2. (Partition form.) For \(u_1,\dots,u_k\in E\), $$ D^k (f_1 \cdots f_n)(x)[u_1,\dots,u_k] = \ssum{(B_1,\dots,B_n) \opart [k]}\ D^{|B_1|} f_1(x)[u_{B_1}] \cdots D^{|B_n|} f_n(x)[u_{B_n}] $$ Where we use the identity \(D^0 f_i(x) [] = f_i(x)\) for the empty block.

  3. (Multi-index form.) If \(E=\IR^d\) and \(\nu\in\IN_0^d\) with \(|\nu|=k\), then $$ \frac{\del^\nu}{\nu!}(f_1 \cdots f_n)(x) = \ssum{\nu_1,\dots,\nu_n \in \IN_0^d \\ \nu_1 + \dots + \nu_n = \nu}\ \frac{\del^{\nu_1}f_1(x)}{\nu_1!} \cdots \frac{\del^{\nu_n}f_n(x)}{\nu_n!} $$ where the sum ranges over all tuples \((\nu_1,\dots,\nu_n)\in(\IN_0^d)^n\) with \(\nu_i \in \IN_0^d\) with \(\nu_1+\cdots+\nu_n=\nu\).

Proof. For \(k = 0\) both expressions are trivially true. Hence we may assume \(k \geq 1\).

Ad 1) Set \(\phi = (f_1,\dots,f_n): E \ra \IR^n\), and \(\psi(y) = \prod_{i=1}^n y_i: \IR^n \ra \IR\), \(c_i = f_i(x)\) so that \(\psi \circ \phi = f_1\cdots f_n\) and \(c = \phi(x)\).

As \(\psi(y) = \prod_{i=1}^n y_i\) is a polynomial itself, the Taylor polynomial at \(c\) can be calculated by translation and degree truncation: \(T^k_{c} \psi (h) = \pi_{\leq k} (\psi(c + h) - \psi(c))\). Furthermore \(T^k_{x} \phi = (T^k_{x} f_1, \dots, T^k_{x}f_n) \in \mathcal{P}(E,\IR^n)=\mathcal{P}(E)\tensor \IR^n\).

Now Faa di Bruno in Composition form (Theorem (9) 🔗) yields \(T_{x}^k (f_1 \cdots f_n) = \pi_{\leq k} (\prod_{i=1}^n (c_i + T_{x}^k f_i) - \prod_{i=1}^n c_i)\). Adding \(\prod_{i=1}^n c_i\) on both sides gives the claimed identity: $$ \widetilde{T}^k_{x}(f_1 \cdots f_n) = \pi_{\leq k}(\widetilde{T}^k_{x}(f_1) \cdots \widetilde{T}^k_{x}(f_n)). $$

Ad 2) The partition formula is derived by applying \(Pol_k\) to both sides of (1). We hence find: $$ D^k (f_1 \cdots f_n)(x) = \Pol_k \prod_{i=1}^n \sum_{m=0}^k \frac{1}{m!} D^m f_i(x)[h^{\tensor m}] = \Pol_k \ssum{m_1,\dots,m_n \in \IN_0 \\ m_1 + \dots + m_n = k} \prod_{i=1}^n \frac{1}{m_i!} D^{m_i} f_i(x)[h^{\tensor m_i}]. $$

The claim follows by the Polarization Lemma (11) 🔗 applied to \(B[a_1,\dots,a_n] = \prod a_i\).

Ad 3) Expanding the taylor series in multi-index form \(\widetilde{T}_x^k f_i(h) = \sum_{0 \leq |\mu| \leq k} \frac{h^\mu}{\mu!} \del^\mu f_i(x)\), we find: $$ \frac{\del^\nu F(x)}{\nu!} = [h^\nu] \prod_{i=1}^n \sum_{0 \leq |\mu| \leq k} \frac{h^\mu}{\mu!} \del^\mu f_i(x) = \sum_{\nu_1 + \cdots + \nu_n = \nu} \frac{\del^{\nu_1} f_1(x)}{\nu_1!} \cdots \frac{\del^{\nu_n} f_n(x)}{\nu_n!}. $$