Near log-convexity of heat and the $k$-Hamming distance problem

Mert Sağlam

University of Washington

Hamming distance defined

$\Ham(x,y)\colon$ number of mismatches of $x,y$.
[Figure: an example pair of strings with $\Ham(x,y) = 4$.]

$k$-Hamming distance is hard to compute...

  1. Communication complexity. Any $k$-Hamming distance protocol requires $\Omega(k \log k)$ bits (best previous $\Omega(k)$)
  2. Property testing. Any property tester for $k$-linearity requires $\Omega(k \log k)$ queries. (best previous $\Omega(k)$)
  3. Parity decision trees. Any randomized parity decision tree for $k$-Hamming weight has size $\exp(\Omega(k \log k))$ (best previous: unknown)

...due to the way heat behaves

Model I

Communication complexity

Communication complexity: $k$-Hamming distance

$x=010010101010101110$, $\quad y=010010101010010100$, $\quad\Ham(x,y)=?$
  • Alice receives $x\in\{0,1\}^n$.
  • Bob receives $y\in\{0,1\}^n$.
  • Promised: $\Ham(x,y)\le k$.
  • Both have access to shared randomness R.
  • Goal: output $\Ham(x,y)$

An $O(k\log k)$-bit protocol

  • Consider the graph on $V=\{0,1\}^n$
  • Put an edge between $u,v$ if $\Ham(u,v)\le 2k$
  • Alice and Bob fix a proper coloring of this graph
    Protocol.
  1. Alice sends the color of $x$ over to Bob.
  2. Bob outputs the string $x'$ with the received color that is closest to $y$ and declares $x' = x$. This recovers $x$: suppose not (see the figure below).
  • Takes $O(k\log(n/k))$ bits of communication.
  • By hashing (sketched in code below), we may assume $n\approx 10 k^2$, so the protocol costs $O(k\log k)$ bits
[Figure: if $x'\ne x$, then $\Ham(x',y)\le\Ham(x,y)\le k$, so the triangle inequality gives $\Ham(x,x')\le 2k$; hence $x$ and $x'$ are adjacent in the graph, contradicting that they share a color.]
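A minimal Python sketch of the hashing step, not the paper's protocol verbatim; the parameters (bucket count $B = 10k^2$, the demo values of $n$ and $k$) are my own choices. With shared randomness both players hash the $n$ coordinates into $B$ buckets and XOR each bucket; whenever the at most $k$ differing coordinates land in distinct buckets (a constant-probability event by a birthday bound), the number of differing bucket-XORs equals $\Ham(x,y)$.

```python
import random

def bucket_xors(s, buckets, B):
    """XOR the bits of s that fall into each of the B buckets."""
    out = [0] * B
    for i, bit in enumerate(s):
        out[buckets[i]] ^= bit
    return out

def hashed_hamming(x, y, k, seed=0):
    """Reduce length-n strings to ~10k^2 bucket XORs and count disagreements."""
    n, B = len(x), 10 * k * k
    rng = random.Random(seed)                 # stands in for the shared randomness R
    buckets = [rng.randrange(B) for _ in range(n)]
    return sum(a != b for a, b in zip(bucket_xors(x, buckets, B),
                                      bucket_xors(y, buckets, B)))

# Demo: plant exactly 7 mismatches in a random string of length 1000.
random.seed(42)
n, k = 1000, 8
x = [random.randrange(2) for _ in range(n)]
y = list(x)
for i in random.sample(range(n), 7):
    y[i] ^= 1
print(hashed_hamming(x, y, k))                # 7, unless two mismatches collide
```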

Known bounds until 2011

Bound | Rounds | Error | Via | Reference
$O(k\log (k/\delta))$ | 1-round | $\delta$ | | Folklore
$\Omega(k)$ | any | 1/3 | Disj | Kalyanasundaram, Schnitger'1987
$\Omega(k \log (1/\delta))$ | 1-round | $\delta$ | Index | Sağlam'2011
$\Omega(k \log (1/\delta))$ | 1-round | $\delta$ | Augmented Index | Jayram, Woodruff'2011
$\Omega(k)$, quantum | any | 1/3 | $k$-Intersection | Razborov'2002; Huang, Shi, Zhang, Zhu'2005

Model II

Property testing

Property testing: $k$-linearity

Definition. A function $f\colon\field_2^n\to\field_2$ is $k$-linear if $f(x) = \langle x,w\rangle$ for a $w\in\field_2^n$ with $\|w\|_1\le k$.
Given blackbox query access to an unknown function $g\colon\field_2^n\to\field_2$,
  • Accept, if $g$ is $k$-linear,
  • Reject, if for every $k$-linear $f$ we have $\|f-g\|_1\ge \eps 2^n$ (i.e., $g$ is $\eps$-far from $k$-linear).
What is the minimum number of queries to distinguish the two cases?
[Figure: the $k$-linear functions and the functions $\eps$ away from $k$-linear, separated by distance $\eps 2^n$.]
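A toy numerical illustration of these definitions (my own example with $n=10$, $k=3$ and a particular corruption, not from the talk): a $3$-linear $f$, a corrupted copy $g$, and the normalized distance between them computed by brute force.

```python
from itertools import product

n, k = 10, 3
w = [1] * k + [0] * (n - k)                     # ||w||_1 = k

def f(x):                                       # the k-linear function <x, w> over F_2
    return sum(a * b for a, b in zip(x, w)) % 2

def g(x):                                       # f flipped whenever x starts with 1, 1
    return f(x) ^ (x[0] & x[1])

disagreements = sum(f(x) != g(x) for x in product((0, 1), repeat=n))
print(disagreements, disagreements / 2 ** n)    # 256 of 1024 points, i.e. g is 1/4-far from f
```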

Property testing

  • Blais, Brody, Matulef'2011:
    A new connection between $k$-Hamming distance and property testing
  • A lower bound for $k$-Hamming distance implies lower bounds for property testing of
    • $k$-linearity
    • $k$-juntas
    • $k$-DNFs
  • A new wave of interest in $k$-Hamming distance problem

Property testing: $k$-linearity

Bound | Adaptivity | Error | Via | Reference
$O(k^{3/2})$ | 1-round | $1/k$ | $k$-juntas | Blais'2008
$O(k\log k)$ | $O(\log k)$-round | $1/k$ | $k$-juntas | Blais'2009
$O(k\log k)$ | 1-round | $1/k$ | Direct | Buhrman, Garcia-Soriano, Matsliah, de Wolf'2012
$\Omega(k)$ | any | $1/3$ | Direct | Blais, Kane'2012
$\Omega(k)$ | any | $1/3$ | Communication complexity | Blais, Brody, Matulef'2011
The same gap as before!

Back to communication complexity:

$k$-Hamming distance bounds after 2011

Bound | Rounds | Error | Via | Reference
$\Omega(k\log k)$ | 1-round | 1/3 | $\Disj_k$ | Buhrman, Garcia-Soriano, Matsliah, de Wolf'2012
$\Omega(k\log k)$ | 1-round | 1/3 | $\Disj_k$ | Dasgupta, Kumar, Sivakumar'2012
$\Omega(k\log^{(r)} k)$ | $r$-round | 1/3 | $\Disj_k$ | Sağlam, Tardos'2013
$\Omega(k\log (1/\delta))$ | any | $\delta$ | OR∘1vs3 | Blais, Brody, Ghazi'2014
$\Omega(k\log (k/\delta))$ | any | $\delta$ | $k$-Hamming | Sağlam'2018 (This work)

Model III

Parity decision trees

Parity decision trees: $k$-Hamming weight

[Figure: a parity decision tree; each internal node queries a parity $\langle x, w\rangle$ and branches on the answer in $\{0,1\}$, and each leaf is labeled YES or NO.]
  • Suppose a secret $w\in\field_2^n$ is chosen
  • Task: Decide if $\|w\|_1\lt k$
  • We can only pick query points $x\in\field_2^n$ and receive $\langle x,w\rangle\in\field_2$
  • We measure
    Depth: longest path in the tree
    Size: number of nodes
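A minimal sketch of the model in code (the tuple encoding and the toy tree for $n=2$, $k=1$ are mine): a parity decision tree is evaluated by answering parity queries $\langle q, w\rangle$ about the secret $w$, and its depth and size are measured as above.

```python
# A node is ("query", q, child_if_0, child_if_1); a leaf is ("leaf", answer).
def evaluate(node, w):
    while node[0] == "query":
        _, q, child0, child1 = node
        answer = sum(a * b for a, b in zip(q, w)) % 2   # the parity <q, w> in F_2
        node = child1 if answer else child0
    return node[1]

def depth(node):                                # longest root-to-leaf path
    return 0 if node[0] == "leaf" else 1 + max(depth(node[2]), depth(node[3]))

def size(node):                                 # number of nodes
    return 1 if node[0] == "leaf" else 1 + size(node[2]) + size(node[3])

# Toy tree deciding ||w||_1 <= 1 for n = 2: query w_1 + w_2, then w_1 if needed.
yes, no = ("leaf", "YES"), ("leaf", "NO")
tree = ("query", (1, 1),
        ("query", (1, 0), yes, no),             # parity 0: weight is 0 (YES) or 2 (NO)
        yes)                                    # parity 1: weight is exactly 1
print(evaluate(tree, (0, 1)), depth(tree), size(tree))   # YES 2 5
```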

Whole picture

[Diagram: the main theorem, a near-log-convexity statement about Markov chains / heat, yields $\Omega(k\log k)$ lower bounds for the log-PDT-size of $k$-Hamming weight and for the communication complexity of $k$-Hamming distance. These propagate along $\le$ relations: log-PDT-size $\le$ PDT-depth for $k$-Hamming weight; the communication complexity of $k$-Hamming distance is at most the PDT-depth up to a constant factor (simulation); and a property tester for $k$-linearity can be simulated by a communication protocol (Blais, Brody, Matulef'2011), so the query complexity inherits the $\Omega(k\log k)$ bound as well. PDT: Parity Decision Tree.]

From PDTs to random walks

  • Given a PDT that on input $w$
    • accepts if $\|w\|_1\le k$ and,
    • rejects if $\|w\|_1\gt k$.
  • Create a PDT that on input $w$
    • accepts if $\|w\|_1 = k$,
    • rejects if $\|w\|_1 \in \{k-2, k+2\}$
  • Let $a_j = $ fraction of weight-$j$ inputs accepted
  • The way heat behaves ⇒ if the PDT is small, the sequence $\ldots, a_{k-2}, a_{k}, a_{k+2},\ldots$
    is nearly log-convex

Contribution of a single leaf to $a_k$:

[Figure: a parity decision tree with one leaf highlighted; the parity queries answered along the root-to-leaf path form the linear system $Aw = b$.]
  • Which $w\in\{0,1\}^n$ are accepted at this leaf?
  • The set $T = \{w\mid b=Aw\}$
  • Can $T$ have a lot of weight-$k$ vectors, but very few weight-$(k-2)$ and weight-$(k+2)$ vectors?

From PDTs to random walks

  • Consider the random walk on $\{0,1\}^n$:
    1. Start at $w_0=0$
    2. Flip a random coordinate of $w_t$ in each time step $t$, obtain $w_{t+1}$
  • This induces a random walk on the values $Aw_t$.
  • Observe: If we walk for $t\le \frac{n^{1/2}}{10}$ steps, $\Pr[\|w_t\|_1 = t]\ge 0.9$
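A quick simulation of this observation (the values of $n$ and the trial count are mine): flip one uniformly random coordinate per step and check how often the $t\le n^{1/2}/10$ flips hit distinct coordinates, which is exactly the event $\|w_t\|_1 = t$.

```python
import math, random

def prob_weight_equals_t(n, t, trials=20000):
    hits = 0
    for _ in range(trials):
        ones = set()                              # coordinates of w currently equal to 1
        for _ in range(t):
            ones ^= {random.randrange(n)}         # flip a uniformly random coordinate
        hits += (len(ones) == t)
    return hits / trials

n = 10000
t = int(math.sqrt(n) / 10)                        # t = 10
print(prob_weight_equals_t(n, t))                 # about 1 - t^2/(2n) ~ 0.995 >= 0.9
```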

From PDTs to Markov chains

  • This leaf accepts $w_t$ iff $Aw_t = b$
  • Define $m_j =$ fraction of weight-$j$ strings accepted by this leaf
  • We have $m_t \approx\Pr[Aw_t = b]$. (for $t\le n^{1/2}/10$)
  • Understand: probability that we start from node 0 and hit node $b$ at time $t$.

A question on graphs

  • Start from node $0$. Perform a discrete time random walk.
  • Let $S$ be the normalized adjacency matrix of the graph (i.e., stochastic)
  • We have $m_t =\langle \indicate_b, S^t \indicate_0\rangle$
  • Note that $\|\indicate_b\|_2 = \|\indicate_0\|_2 = 1$.
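A small numerical instance (the 8-cycle and the choice of target node are mine): build the normalized adjacency matrix $S$, take the unit indicator vectors, and read off $m_t=\langle \indicate_b, S^t \indicate_0\rangle$.

```python
import numpy as np

N = 8                                   # vertices of the 8-cycle; 2-regular, so divide by 2
S = np.zeros((N, N))
for i in range(N):
    S[i, (i + 1) % N] = S[i, (i - 1) % N] = 0.5

e0 = np.zeros(N); e0[0] = 1.0           # indicator of the start node 0, ||.||_2 = 1
eb = np.zeros(N); eb[3] = 1.0           # indicator of the target node b = 3

for t in range(1, 8):
    m_t = eb @ np.linalg.matrix_power(S, t) @ e0
    print(t, m_t)                       # nonzero only for odd t >= 3 (distance and parity)
```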

A question on Markov chains

  • More generally for positive valued $u,v\colon\Omega\to\reals_+$ with $\|u\|_2=\|v\|_2=1$
    $m_t = \langle v, S^tu\rangle$
  • What can we say about $t\mapsto m_t$ that holds for any $S, u, v$?

The Erdős-Simonovits conjecture

Conjecture. (Erdős, Simonovits'1982)
For $S$ a 0-1 matrix, $u = v$ the normalized uniform vector,
and $k\gt t$ natural numbers of the same parity, we have $m_{k}^{1/k}\ge m_t^{1/t}$.
Alternative stmt. For a graph $G=(V,E)$,
$w_k(v)\colon$ number of walks of length $k$ starting at $v$

$k\mapsto (\frac{1}{|G|}\sum_v \;w_k(v))^{1/k}$

is monotone nondecreasing along increments of two (i.e., over $k$ of a fixed parity).
Recall.
$m_t = \langle v,S^tu\rangle$

The Blakley-Dixon conjecture

Conjecture. (Blakley, Dixon'1966)
For nonnegative $u = v$, and $k\gt t$ natural numbers of the same parity, we have $m_{k}^{1/k}\ge m_t^{1/t}$.

Our results

Theorem I. (This work)
For nonnegative $u$, $v$ which may be different (normalized as before, $\|u\|_2=\|v\|_2=1$), and $k\gt t$ natural numbers of the same parity, we have $m_{k}^{1/k}\ge m_t^{1/t}$. (Equivalent: $m_{t+2}\ge m_t^{1+2/t}$)
Theorem II. (This work)
For any $\eps\gt 0$, there is a $\delta\gt 0$ such that
for nonnegative $u$, $v$ which may be different (normalized as before), either
  • $m_{t+2} m_{t-2}\ge \delta m_t^2$, or
  • $m_{t+2} \ge t^{1-\eps} m_t^{1+2/t}$
Recall.
$m_t = \langle v,S^tu\rangle$
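A numeric sanity check of Theorem I on random instances, not a proof; the construction of a symmetric doubly stochastic $S$ from averaged random permutation matrices is my own choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_symmetric_stochastic(n, perms=6):
    """Average of random permutation matrices and their transposes:
    nonnegative, symmetric, every row summing to 1."""
    S = np.zeros((n, n))
    for _ in range(perms):
        P = np.eye(n)[rng.permutation(n)]
        S += P + P.T
    return S / (2 * perms)

n = 12
for _ in range(5):
    S = random_symmetric_stochastic(n)
    u = rng.random(n); u /= np.linalg.norm(u)    # nonnegative, ||u||_2 = 1
    v = rng.random(n); v /= np.linalg.norm(v)    # nonnegative, ||v||_2 = 1
    for t in (2, 3, 5, 8):
        m_t  = v @ np.linalg.matrix_power(S, t) @ u
        m_t2 = v @ np.linalg.matrix_power(S, t + 2) @ u
        assert m_t2 >= m_t ** (1 + 2 / t) - 1e-12
print("m_{t+2} >= m_t^{1+2/t} held on every sampled instance")
```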

An example random walk

[Figure: a path $x_0, x_1, \ldots, x_{t-1}, x_t$; every edge has weight $\alpha$.]
Consider $\Omega=\{x_0,\ldots,x_t\}$
$u= \indicate_{x_0}$ and $v=\indicate_{x_t}$
$m_{t-4} = 0$
$m_{t-2} = 0$
$m_{t} = \alpha^t$
$m_{t+2} = t\cdot\alpha^{t+2}$
$m_{t+4} = t(t+1)/2\cdot \alpha^{t+4}$
Theorem II. We have either
  • $m_{t+2} m_{t-2}\ge \delta m_t^2$, or
  • $m_{t+2} \ge t^{1-\eps} m_t^{1+2/t}$
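A direct check of this example for concrete values ($t=6$ and $\alpha=0.4$ are my choices): the first branch of Theorem II fails since $m_{t-2}=0$, while $m_{t+2}/m_t^{1+2/t}$ equals exactly $t$, so the second branch holds here with factor exactly $t$.

```python
import numpy as np

t, alpha = 6, 0.4
S = np.zeros((t + 1, t + 1))
for i in range(t):                                  # the path x_0 - x_1 - ... - x_t
    S[i, i + 1] = S[i + 1, i] = alpha               # every edge has weight alpha

u = np.zeros(t + 1); u[0] = 1.0                     # u = indicator of x_0
v = np.zeros(t + 1); v[t] = 1.0                     # v = indicator of x_t

def m(s):
    return v @ np.linalg.matrix_power(S, s) @ u

print(m(t - 2))                                     # 0.0: x_t is unreachable in t-2 steps
print(m(t), alpha ** t)                             # both equal alpha^t
print(m(t + 2), t * alpha ** (t + 2))               # both equal t * alpha^(t+2)
print(m(t + 2) / m(t) ** (1 + 2 / t))               # equals t = 6 (up to floating point)
```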

Theorem I: A proof sketch

The probabilistic setup

[Figure: the state space $\Omega=\{x_1,\ldots,x_6\}$, with an added source state a (arrows $\mu(x_i)$ into $\Omega$) and an added sink state b (arrows $\nu(x_i)$ out of $\Omega$).]
  1. Recall $m_t=\langle v,S^tu\rangle$
  2. Set $\mu \defeq u/\|u\|_1$ and add state a with $S(a,x)=\mu(x)$
  3. Set $\nu \defeq v/\|v\|_1$ and add state b with $S(x,b)=\nu(x)$
  4. Relate $m_t\approx\Pr$ [a walker goes from a to b in exactly $t+2$ steps].

The forward walk $F^t$

  1. $F^t=(F_{-1},F_0,\ldots,F_t,F_{t+1})$.
  2. Start at a, move to $x\in\Omega$ with probability $\mu(x)$.
  3. Move according to $S$ for $t$ steps
  4. Move to b with probability $\nu(F_t)$ and fail with probability $1-\nu(F_t)$.
  5. $\Pr[F_{t+1}=b] = \langle \nu, S^t\mu\rangle$
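A simulation cross-check of the identity in item 5 (the 4-state chain and the specific $\mu$, $\nu$, $t$ below are my own toy choices): sample the forward walk and compare the frequency of reaching b with $\langle \nu, S^t\mu\rangle$.

```python
import numpy as np

rng = np.random.default_rng(1)

# A small symmetric stochastic S on Omega = {0, 1, 2, 3}, plus distributions mu, nu.
S = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
mu = np.array([0.4, 0.3, 0.2, 0.1])
nu = np.array([0.1, 0.2, 0.3, 0.4])
t = 5

def forward_walk_reaches_b():
    x = rng.choice(4, p=mu)              # leave a: land on x with probability mu(x)
    for _ in range(t):
        x = rng.choice(4, p=S[x])        # t steps according to S
    return rng.random() < nu[x]          # move to b with probability nu(F_t), else fail

trials = 100000
empirical = sum(forward_walk_reaches_b() for _ in range(trials)) / trials
exact = nu @ np.linalg.matrix_power(S, t) @ mu
print(empirical, exact)                  # agree to about two decimal places
```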

The backward walk $B^t$

  1. $B^t=(B_{-1},B_0,\ldots,B_t,B_{t+1})$.
  2. At time $t+1$ start at b, move to $x\in\Omega$ with probability $\nu(x)$.
  3. Move according to $S$ for steps $t, t-1, \ldots, 1$. (time ticks backwards)
  4. Move to a with probability $\mu(B_0)$ and fail with probability $1-\mu(B_0)$.
  5. $\Pr[B_{-1}=a] = \langle \nu, S^t\mu\rangle$ (by symmetry of $S$)

Conditioned walk $X^t$

  • Define $X\defeq (F^t\mid F_{t+1}=b)$.
  • By symmetry, it also holds $X=(B^t\mid B_{-1}=a)$.
  • Moreover, it can be shown $X$ is Markovian:

    $\dist(X_{i+1}\mid X_{\le i}=x_{\le i}) = \dist(X_{i+1}\mid X_i=x_i)$

The Kullback-Leibler divergence

Definition. Given two distributions $p,q\colon\Omega\to\reals_+$,

$\kldiv{p}{q}\defeq\sum_x\; p(x)\log\frac{p(x)}{q(x)}$

Lemma (Gibbs' inequality).
If $p = (q\mid E)$ for some event $E$, then $\kldiv{p}{q} = \log\frac{1}{q(E)}$
Lemma (Chain Rule).

$\kldiv{A_1A_2}{B_1B_2}=\kldiv{A_1}{B_1} +\;\kldiv{A_2\mid A_1}{B_2\mid B_1}$.
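A quick numerical illustration of Gibbs' inequality (the distribution $q$ and the event $E$ are arbitrary choices of mine): conditioning $q$ on $E$ gives divergence exactly $\log\frac{1}{q(E)}$.

```python
import numpy as np

def kl(p, q):
    """D(p || q), ignoring the zero-probability terms of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

q = np.array([0.1, 0.2, 0.3, 0.15, 0.25])
E = np.array([True, False, True, True, False])    # an event with q(E) = 0.55

p = np.where(E, q, 0.0)
p /= p.sum()                                      # p = (q | E)

print(kl(p, q), np.log(1 / q[E].sum()))           # both equal log(1/0.55)
```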

The proof idea

  • By Gibbs' inequality, $\kldiv{X^t}{F^t} = \kldiv{X^t}{B^t} = -\log\iprod{\nu}{S^t\mu}$.
  • Likewise, $\kldiv{X^{t+2}}{F^{t+2}} = -\log\iprod{\nu}{S^{t+2}\mu}$.

    “To lower bound $m_{t+2}\approx \iprod{\nu}{S^{t+2}\mu}$, show $\kldiv{X^{t+2}}{F^{t+2}}$ is small”

  • To show $\kldiv{X^{t+2}}{F^{t+2}}$ is small construct a random walk $Z$ which
    • Goes from a to b in $t+4$ steps...
      ...which certifies $\kldiv{X^{t+2}}{F^{t+2}}\le\;\kldiv{Z}{F^{t+2}}$
    • Has, roughly, $\kldiv{Z}{F^{t+2}}\le \frac{t+2}{t}\kldiv{X^t}{F^t}$...
      ...which certifies $m_{t+2}\ge m_t^{1+2/t}$

The proof idea

$-\;\frac{t+2}{t}\log\iprod{\nu}{S^t\mu}=\frac{t+2}{t}\kldiv{X^t}{F^t}\gtapprox \;\kldiv{Z}{F^{t+2}}\ge\;\kldiv{X^{t+2}}{F^{t+2}} = -\log\iprod{\nu}{S^{t+2}\mu}$

  • The two equalities are Gibbs' inequality.
  • The $\gtapprox$ step is the special property of $Z$; the gap here exactly compensates for the $u\to\mu$ and $v\to\nu$ change.
  • The $\ge$ step holds by virtue of $Z$ going from a to b in $t+4$ steps.

The $Z$ walk

  • Pick $J$ uniform random in $[t]$
  • $(Z\mid J)$ will be Markovian, constructed as follows:
  • For time steps $i\lt J$, walk like $\dist(X_{i+1}\mid X_i)$.
  • At time step $j=J$, walk according to $\dist(X_{j-1}\mid X_j)$ (backwards in time)
  • For $i\gt J$, walk like $\dist(X_{i-1}\mid X_{i-2})$ (Lagging 2 steps behind).

Analyzing $Z$

  • $\kldiv{X}{F} = \kldiv{X_0}{F_0} + \kldiv{X_1\mid X_0}{F_1\mid F_0} + \ldots + \kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}$
  • $\kldiv{X}{B} = \kldiv{X_t}{B_t} + \kldiv{X_{t-1}\mid X_t}{B_{t-1}\mid B_t} + \ldots + \kldiv{X_{-1}\mid X_0}{B_{-1}\mid B_0}$
  • $\kldiv{Z\mid J=j}{F} = \kldiv{X_0}{F_0} + \kldiv{X_1\mid X_0}{F_1\mid F_0} + \ldots + \kldiv{X_{j-1}\mid X_j}{B_{j-1}\mid B_j} + \ldots + \kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}$

Analyzing $Z$

  • Expand $Z\mid J$; averaging over all $j$, the red terms (the single backward-kernel term at time $J$) sum to

    $\frac{1}{t}(\kldiv{X}{B^t} -\;\kldiv{X_t}{B_t} -\;\kldiv{X_{-1}\mid X_0}{B_{-1}\mid B_0})$

  • The blue terms (the forward-kernel terms) sum to

    $\kldiv{X}{F} + \;\frac{1}{t}(\kldiv{X}{F^t} -\;\kldiv{X_0}{F_0} -\;\kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t})$

Analyzing $Z$

  • Analyze missing terms $\kldiv{X_t}{B_t} + \;\kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}$ together.
  • Via convexity, $\kldiv{X_t}{B_t} +\;\kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}\ge -\log\|\nu\|_2^2$.
  • Likewise for $\kldiv{X_0}{F_0} +\;\kldiv{X_{-1}\mid X_0}{B_{-1}\mid B_0}$,
    a lower bound $-\log\|\mu\|_2^2$ holds.

Finishing the proof

We showed that the chain below holds, with a slack of
$-\;\log\|u\|_2^2 -\log\|v\|_2^2$ in the $\gtapprox$ step:

$-\;\frac{t+2}{t}\log\iprod{\nu}{S^t\mu}=\frac{t+2}{t}\kldiv{X^t}{F^t}\gtapprox \;\kldiv{Z}{F^{t+2}}\ge\;\kldiv{X^{t+2}}{F^{t+2}} = -\log\iprod{\nu}{S^{t+2}\mu}$

  • But this slack exactly compensates the terms coming from $u\to\mu$ and $v\to\nu$ substitution.
  • We get exactly $m_{t+2}\ge m_t^{1+2/t}$

An analogue for continuous time?

(The heat kernel on the real line: $m_t = \frac{1}{(2\pi t)^{1/2}} e^{-b^2 / (2t)}$.)
Conjecture. Let $u,v\colon\Omega\to\reals_+$ with $\|u\|_2=\|v\|_2=1$
be positive valued unit vectors. Defining

$m_t=\langle v, e^{t(S-I)}u\rangle$,

we have for $\alpha\ge 0$,

$m_{t(1+\alpha)} \cdot m_{t/(1+\alpha)}\ge m_t^{2 + \alpha^2/(1+\alpha)}$.
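The conjecture is easy to probe numerically (a sanity check on one random instance I picked, not evidence beyond the instances tried): $m_t=\langle v, e^{t(S-I)}u\rangle$ is computed via the eigendecomposition of the symmetric matrix $S-I$.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10
P = np.eye(n)[rng.permutation(n)]
S = (P + P.T) / 2                                 # symmetric, doubly stochastic
u = rng.random(n); u /= np.linalg.norm(u)         # nonnegative, ||u||_2 = 1
v = rng.random(n); v /= np.linalg.norm(v)         # nonnegative, ||v||_2 = 1

evals, evecs = np.linalg.eigh(S - np.eye(n))      # spectral decomposition of S - I

def m(t):
    return float(v @ (evecs @ (np.exp(t * evals) * (evecs.T @ u))))

t = 3.0
for alpha in (0.1, 0.5, 1.0, 2.0):
    lhs = m(t * (1 + alpha)) * m(t / (1 + alpha))
    rhs = m(t) ** (2 + alpha ** 2 / (1 + alpha))
    print(alpha, lhs >= rhs, lhs, rhs)            # does the conjectured inequality hold here?
```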

Thank you!

&

Questions?
