Submodularity ⇔ Diminishing Returns — A Clean, Self-Contained Proof
Section 1 — Why this matters
Submodular functions sit at the heart of modern discrete optimization, powering algorithms for data summarization, influence maximization, active learning, cut/flow, and beyond. Their magic is twofold:
- Algorithmic tractability: submodular minimization is polynomial-time solvable; submodular maximization admits strong approximation guarantees (often via greedy).
- Modeling clarity: they capture the ubiquitous “diminishing returns” (DR) effect: the more you already have, the less benefit you gain from adding one more item.
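To see the DR effect concretely before any formalism, here is a minimal Python sketch assuming a toy coverage function (the AREAS data and element names are illustrative, not part of the theorem):

```python
# A toy coverage function: F(S) = number of distinct points covered by S.
# Coverage functions are a standard example of submodular functions.
AREAS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

def F(S):
    return len(set().union(*(AREAS[item] for item in S)))

# The marginal gain of adding "b" shrinks as the context grows:
print(F({"b"}) - F(set()))                 # 2: alone, "b" covers {3, 4}
print(F({"a", "b"}) - F({"a"}))            # 1: point 3 is already covered
print(F({"a", "b", "c"}) - F({"a", "c"}))  # 0: both 3 and 4 already covered
```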
This post proves the classic equivalence:
Theorem. For a set function F:2^V→ℝ on a finite ground set V, the following are equivalent:
- (Submodularity) For all subsets S,T⊆V,
F(S)+F(T)≥F(S∪T)+F(S∩T).
- (Diminishing Returns) For all A⊆B⊆V and all x∈V∖B,
F(A∪{x})−F(A)≥F(B∪{x})−F(B).
We’ll prove both directions.
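Before the proof, the equivalence is easy to sanity-check by brute force on a tiny ground set. The sketch below (function names like is_submodular are mine) enumerates all pairs for both conditions and confirms they agree, both on arbitrary random set functions and on coverage functions, which are known to be submodular:

```python
import itertools, random

V = [0, 1, 2, 3]

def subsets(ground):
    """All subsets of `ground`, as frozensets."""
    for r in range(len(ground) + 1):
        for combo in itertools.combinations(ground, r):
            yield frozenset(combo)

def is_submodular(F):
    """F(S) + F(T) >= F(S ∪ T) + F(S ∩ T) for all S, T ⊆ V."""
    return all(F[S] + F[T] >= F[S | T] + F[S & T]
               for S in subsets(V) for T in subsets(V))

def has_diminishing_returns(F):
    """ΔF(x|A) >= ΔF(x|B) for all A ⊆ B and x ∉ B."""
    return all(F[A | {x}] - F[A] >= F[B | {x}] - F[B]
               for B in subsets(V) for A in subsets(B)
               for x in set(V) - B)

random.seed(0)
for _ in range(200):
    # arbitrary random set functions (usually not submodular) ...
    F = {S: random.randint(0, 10) for S in subsets(V)}
    assert is_submodular(F) == has_diminishing_returns(F)
    # ... and random coverage functions (always submodular)
    pts = {x: frozenset(random.sample(range(8), 3)) for x in V}
    F = {S: len(frozenset().union(*(pts[x] for x in S))) for S in subsets(V)}
    assert is_submodular(F) and has_diminishing_returns(F)
print("the two conditions agreed on every test function")
```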
Section 2 — The proof
Notation
- 2^V is the Boolean lattice of all subsets of V.
- We write marginal gains as
ΔF(x∣S):=F(S∪{x})−F(S).
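In code, the marginal gain is a one-liner; a sketch assuming F is given as a callable on sets (the helper name is mine):

```python
def marginal_gain(F, x, S):
    """ΔF(x | S) = F(S ∪ {x}) − F(S), for F given as a callable on sets."""
    return F(S | {x}) - F(S)
```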
Direction 1: Submodular ⇒ Diminishing Returns
Goal. If F is submodular, then for all A⊆B and x∉B,
ΔF(x∣A)≥ΔF(x∣B).
Proof. Apply submodularity to the pair (S,T)=(A∪{x},B):
F(A∪{x})+F(B)≥F((A∪{x})∪B)+F((A∪{x})∩B).
Because A⊆B and x∉B, we have
(A∪{x})∪B=B∪{x} and (A∪{x})∩B=A.
Hence
F(A∪{x})+F(B)≥F(B∪{x})+F(A),
which rearranges to
ΔF(x∣A)≥ΔF(x∣B).
□
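The two set identities used above are easy to sanity-check mechanically; a small sketch over randomly sampled triples A⊆B, x∉B (the sampling setup is mine):

```python
import random

random.seed(1)
V = set(range(6))
for _ in range(1000):
    B = {v for v in V if random.random() < 0.5}
    A = {v for v in B if random.random() < 0.5}   # guarantees A ⊆ B
    outside = sorted(V - B)
    if not outside:
        continue                                  # need some x ∉ B
    x = random.choice(outside)
    assert (A | {x}) | B == B | {x}               # (A ∪ {x}) ∪ B = B ∪ {x}
    assert (A | {x}) & B == A                     # (A ∪ {x}) ∩ B = A
print("both set identities held on all sampled triples (A, B, x)")
```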
Direction 2: Diminishing Returns ⇒ Submodular
Assumption (DR). For all A⊆B and x∉B,
ΔF(x∣A)≥ΔF(x∣B).
Goal. Show that for all S,T⊆V,
F(S)+F(T)≥F(S∪T)+F(S∩T).
Rearranged, this is equivalent to
F(T)−F(S∩T)≥F(S∪T)−F(S).
Note that S∩T⊆T and S⊆S∪T, and both inclusions add exactly the same elements:
R:=T∖(S∩T)=(S∪T)∖S.
To prove the inequality, we will compare the two corresponding chains of extensions by the elements of R:
- from S∩T up to T, and
- from S up to S∪T.
Step 1 — Decompose the difference T∖S
Let
U:=S∩T, W:=S∪T, R:=W∖S=T∖U.
In this notation, the target inequality reads
F(T)−F(U)≥F(W)−F(S).
The set R consists exactly of the elements that are in T but not in S. List R in an arbitrary order:
R={r_1,r_2,…,r_k}.
Step 2 — Telescope from U to W and from S to W
Consider the telescoping expansions
F(T)−F(U)=∑_{j=1}^{k}[F(U∪{r_1,…,r_j})−F(U∪{r_1,…,r_{j−1}})]=∑_{j=1}^{k}ΔF(r_j∣U∪{r_1,…,r_{j−1}}),
and
F(W)−F(S)=∑_{j=1}^{k}[F(S∪{r_1,…,r_j})−F(S∪{r_1,…,r_{j−1}})]=∑_{j=1}^{k}ΔF(r_j∣S∪{r_1,…,r_{j−1}}).
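Both telescoping identities can be verified numerically; a sketch using an illustrative coverage function (the specific areas and the sets S, T are mine):

```python
def F(S):
    # illustrative coverage function (assumed for this demo, not from the text)
    areas = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
    return len(set().union(*(areas[i] for i in S)))

S, T = {"a", "b"}, {"b", "c", "d"}
U, W = S & T, S | T
R = sorted(T - U)   # R = T∖U = W∖S, listed in an arbitrary (here sorted) order

# the two telescoping sums from Step 2
lhs = sum(F(U | set(R[:j + 1])) - F(U | set(R[:j])) for j in range(len(R)))
rhs = sum(F(S | set(R[:j + 1])) - F(S | set(R[:j])) for j in range(len(R)))

assert lhs == F(T) - F(U) and rhs == F(W) - F(S)  # the sums do telescope
print(lhs, ">=", rhs)  # 3 >= 2: DR makes each left term at least its right counterpart
```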
Step 3 — Compare term-by-term using DR
For each j, we have
U∪{r_1,…,r_{j−1}}⊆S∪{r_1,…,r_{j−1}}
(since U=S∩T⊆S), and r_j∉S∪{r_1,…,r_{j−1}} by construction (r_j∈R means r_j∉S, and the r_i are distinct). The DR assumption therefore gives
ΔF(r_j∣U∪{r_1,…,r_{j−1}})≥ΔF(r_j∣S∪{r_1,…,r_{j−1}}).
Summing over j=1,…,k yields
F(T)−F(U)≥F(W)−F(S),
which, by the rearrangement at the start of this direction, is exactly the submodularity inequality. □
Takeaways
- The DR form says marginal gains shrink as the context grows. It is the discrete analog of concavity.
- Submodularity on the lattice 2^V is equivalent to DR. Either form can be used depending on whether you want clean algebraic inequalities (submodularity) or stepwise algorithmic reasoning (DR).
- This equivalence underlies the correctness and guarantees of greedy and related algorithms, and it is the bridge to convex analysis via the Lovász extension.
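As one concrete illustration of the algorithmic payoff, here is a hedged sketch of the classic greedy rule for maximizing a monotone submodular function under a cardinality constraint; the Nemhauser–Wolsey–Fisher result gives this rule a (1−1/e) approximation guarantee. The coverage instance is illustrative:

```python
def greedy(F, V, k):
    """Repeatedly add the element with the largest marginal gain ΔF(x | S)."""
    S = set()
    for _ in range(k):
        x = max(V - S, key=lambda v: F(S | {v}) - F(S))
        S.add(x)
    return S

# toy coverage instance (illustrative)
areas = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {6, 7}}

def cover(S):
    return len(set().union(*(areas[i] for i in S)))

print(greedy(cover, set(areas), 2))  # picks {"a", "c"}, covering 6 points
```

The DR inequality is what powers the analysis: it guarantees that each greedy step recovers at least a 1/k fraction of the remaining gap to the optimum.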