Near log-convexity of heat and the $k$-Hamming distance problem

Mert Sağlam

University of Washington

Hamming distance defined

$\Ham(x,y)\colon$ number of mismatches of $x,y$.
[Figure: an example pair of strings with $\Ham(x,y) = 4$.]

$k$-Hamming distance is hard to compute...

  1. Communication complexity. Any $k$-Hamming distance protocol requires $\Omega(k \log k)$ bits (best previous $\Omega(k)$)
  2. Property testing. Any property tester for $k$-linearity requires $\Omega(k \log k)$ queries. (best previous $\Omega(k)$)
  3. Parity decision trees. Any randomized parity decision tree for $k$-Hamming weight has size $\exp(\Omega(k \log k))$ (best previous: unknown)

...due to the way heat behaves

Model I

Communication complexity

Communication complexity: $k$-Hamming distance

$x=010010101010101110$, $\quad y=010010101010010100$, $\quad\Ham(x,y)=?$
  • Alice receives $x\in\{0,1\}^n$.
  • Bob receives $y\in\{0,1\}^n$.
  • Promised: $\Ham(x,y)\le k$.
  • Both have access to shared randomness R.
  • Goal: output $\Ham(x,y)$

An $O(k\log k)$-bit protocol

  • Consider the graph on $V=\{0,1\}^n$
  • Put an edge between $u,v$ if $\Ham(u,v)\le 2k$
  • Alice and Bob fix a proper coloring of this graph
    Protocol.
  1. Alice sends the color of $x$ over to Bob.
  2. Bob outputs the string $x'$ with the received color that is closest to $y$ and declares $x' = x$. This recovers $x$: suppose not (see the figure below).
  • Takes $O(k\log(n/k))$ bits of communication.
  • By hashing (sketched in code below), we may assume $n\approx 10 k^2$, so the protocol costs $O(k\log k)$ bits
[Figure: if $x'\ne x$, then $\Ham(x',y)\le\Ham(x,y)\le k$, so the triangle inequality gives $\Ham(x,x')\le 2k$; hence $x$ and $x'$ are adjacent in the graph, contradicting that they share a color.]
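A minimal Python sketch of the hashing step, not the paper's protocol verbatim; the parameters (bucket count $B = 10k^2$, the demo values of $n$ and $k$) are my own choices. With shared randomness both players hash the $n$ coordinates into $B$ buckets and XOR each bucket; whenever the at most $k$ differing coordinates land in distinct buckets (a constant-probability event by a birthday bound), the number of differing bucket-XORs equals $\Ham(x,y)$.

```python
import random

def bucket_xors(s, buckets, B):
    """XOR the bits of s that fall into each of the B buckets."""
    out = [0] * B
    for i, bit in enumerate(s):
        out[buckets[i]] ^= bit
    return out

def hashed_hamming(x, y, k, seed=0):
    """Reduce length-n strings to ~10k^2 bucket XORs and count disagreements."""
    n, B = len(x), 10 * k * k
    rng = random.Random(seed)                 # stands in for the shared randomness R
    buckets = [rng.randrange(B) for _ in range(n)]
    return sum(a != b for a, b in zip(bucket_xors(x, buckets, B),
                                      bucket_xors(y, buckets, B)))

# Demo: plant exactly 7 mismatches in a random string of length 1000.
random.seed(42)
n, k = 1000, 8
x = [random.randrange(2) for _ in range(n)]
y = list(x)
for i in random.sample(range(n), 7):
    y[i] ^= 1
print(hashed_hamming(x, y, k))                # 7, unless two mismatches collide
```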

Known bounds until 2011

Bound | Rounds | Error | Via | Reference
$O(k\log (k/\delta))$ | 1-round | $\delta$ | | Folklore
$\Omega(k)$ | any | 1/3 | Disj | Kalyanasundaram, Schnitger'1987
$\Omega(k \log (1/\delta))$ | 1-round | $\delta$ | Index | Sağlam'2011
$\Omega(k \log (1/\delta))$ | 1-round | $\delta$ | Augmented Index | Jayram, Woodruff'2011
$\Omega(k)$, quantum | any | 1/3 | $k$-Intersection | Razborov'2002; Huang, Shi, Zhang, Zhu'2005

Model II

Property testing

Property testing: $k$-linearity

Definition. A function $f\colon\field_2^n\to\field_2$ is $k$-linear if $f(x) = \langle x,w\rangle$ for a $w\in\field_2^n$ with $\|w\|_1\le k$.
Given blackbox query access to an unknown function $g\colon\field_2^n\to\field_2$,
  • Accept, if $g$ is $k$-linear,
  • Reject, if for every $k$-linear $f$ we have $\|f-g\|_1\ge \eps 2^n$ (i.e., $g$ is $\eps$-far from $k$-linear).
What is the minimum number of queries to distinguish the two cases?
[Figure: the $k$-linear functions and the functions $\eps$ away from $k$-linear, separated by distance $\eps 2^n$.]
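A toy numerical illustration of these definitions (my own example with $n=10$, $k=3$ and a particular corruption, not from the talk): a $3$-linear $f$, a corrupted copy $g$, and the normalized distance between them computed by brute force.

```python
from itertools import product

n, k = 10, 3
w = [1] * k + [0] * (n - k)                     # ||w||_1 = k

def f(x):                                       # the k-linear function <x, w> over F_2
    return sum(a * b for a, b in zip(x, w)) % 2

def g(x):                                       # f flipped whenever x starts with 1, 1
    return f(x) ^ (x[0] & x[1])

disagreements = sum(f(x) != g(x) for x in product((0, 1), repeat=n))
print(disagreements, disagreements / 2 ** n)    # 256 of 1024 points, i.e. g is 1/4-far from f
```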

Property testing

  • Blais, Brody, Matulef'2011:
    A new connection between $k$-Hamming distance and property testing
  • A lower bound for $k$-Hamming distance implies lower bounds for property testing of
    • $k$-linearity
    • $k$-juntas
    • $k$-DNFs
  • A new wave of interest in $k$-Hamming distance problem

Property testing: $k$-linearity

Bound | Adaptivity | Error | Via | Reference
$O(k^{3/2})$ | 1-round | $1/k$ | $k$-juntas | Blais'2008
$O(k\log k)$ | $O(\log k)$-round | $1/k$ | $k$-juntas | Blais'2009
$O(k\log k)$ | 1-round | $1/k$ | Direct | Buhrman, Garcia-Soriano, Matsliah, de Wolf'2012
$\Omega(k)$ | any | $1/3$ | Direct | Blais, Kane'2012
$\Omega(k)$ | any | $1/3$ | Communication complexity | Blais, Brody, Matulef'2011
The same gap as before!

Back to communication complexity:

$k$-Hamming distance bounds after 2011

Bound | Rounds | Error | Via | Reference
$\Omega(k\log k)$ | 1-round | 1/3 | $\Disj_k$ | Buhrman, Garcia-Soriano, Matsliah, de Wolf'2012
$\Omega(k\log k)$ | 1-round | 1/3 | $\Disj_k$ | Dasgupta, Kumar, Sivakumar'2012
$\Omega(k\log^{(r)} k)$ | $r$-round | 1/3 | $\Disj_k$ | Sağlam, Tardos'2013
$\Omega(k\log (1/\delta))$ | any | $\delta$ | OR∘1vs3 | Blais, Brody, Ghazi'2014
$\Omega(k\log (k/\delta))$ | any | $\delta$ | $k$-Hamming | Sağlam'2018 (This work)

Model III

Parity decision trees

Parity decision trees: $k$-Hamming weight

[Figure: a parity decision tree; each internal node queries a parity $\langle x, w\rangle$ and branches on the answer in $\{0,1\}$, and each leaf is labeled YES or NO.]
  • Suppose a secret $w\in\field_2^n$ is chosen
  • Task: Decide if $\|w\|_1\lt k$
  • We can only pick query points $x\in\field_2^n$ and receive $\langle x,w\rangle\in\field_2$
  • We measure
    Depth: longest path in the tree
    Size: number of nodes
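A minimal sketch of the model in code (the tuple encoding and the toy tree for $n=2$, $k=1$ are mine): a parity decision tree is evaluated by answering parity queries $\langle q, w\rangle$ about the secret $w$, and its depth and size are measured as above.

```python
# A node is ("query", q, child_if_0, child_if_1); a leaf is ("leaf", answer).
def evaluate(node, w):
    while node[0] == "query":
        _, q, child0, child1 = node
        answer = sum(a * b for a, b in zip(q, w)) % 2   # the parity <q, w> in F_2
        node = child1 if answer else child0
    return node[1]

def depth(node):                                # longest root-to-leaf path
    return 0 if node[0] == "leaf" else 1 + max(depth(node[2]), depth(node[3]))

def size(node):                                 # number of nodes
    return 1 if node[0] == "leaf" else 1 + size(node[2]) + size(node[3])

# Toy tree deciding ||w||_1 <= 1 for n = 2: query w_1 + w_2, then w_1 if needed.
yes, no = ("leaf", "YES"), ("leaf", "NO")
tree = ("query", (1, 1),
        ("query", (1, 0), yes, no),             # parity 0: weight is 0 (YES) or 2 (NO)
        yes)                                    # parity 1: weight is exactly 1
print(evaluate(tree, (0, 1)), depth(tree), size(tree))   # YES 2 5
```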

Whole picture

[Diagram: the main theorem, a near-log-convexity statement about Markov chains / heat, yields $\Omega(k\log k)$ lower bounds for the log-PDT-size of $k$-Hamming weight and for the communication complexity of $k$-Hamming distance. These propagate along $\le$ relations: log-PDT-size $\le$ PDT-depth for $k$-Hamming weight; the communication complexity of $k$-Hamming distance is at most the PDT-depth up to a constant factor (simulation); and a property tester for $k$-linearity can be simulated by a communication protocol (Blais, Brody, Matulef'2011), so the query complexity inherits the $\Omega(k\log k)$ bound as well. PDT: Parity Decision Tree.]

From PDTs to random walks

  • Given a PDT that on input $w$
    • accepts if $\|w\|_1\le k$ and,
    • rejects if $\|w\|_1\gt k$.
  • Create a PDT that on input $w$
    • accepts if $\|w\|_1 = k$,
    • rejects if $\|w\|_1 \in \{k-2, k+2\}$
  • Let $a_j = $ fraction of weight-$j$ inputs accepted
  • The way heat behaves ⇒ if the PDT is small, the sequence $\ldots, a_{k-2}, a_{k}, a_{k+2},\ldots$
    is nearly log-convex

Contribution of a single leaf to $a_k$:

[Figure: a parity decision tree with one leaf highlighted; the parity queries answered along the root-to-leaf path form the linear system $Aw = b$.]
  • Which $w\in\{0,1\}^n$ are accepted at this leaf?
  • The set $T = \{w\mid b=Aw\}$
  • Can $T$ have a lot of weight-$k$ vectors, but very few weight-$(k-2)$ and weight-$(k+2)$ vectors?

From PDTs to random walks

  • Consider the random walk on $\{0,1\}^n$:
    1. Start at $w_0=0$
    2. Flip a random coordinate of $w_t$ in each time step $t$, obtain $w_{t+1}$
  • This induces a random walk on the values $Aw_t$.
  • Observe: If we walk for $t\le \frac{n^{1/2}}{10}$ steps, $\Pr[\|w_t\|_1 = t]\ge 0.9$
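A quick simulation of this observation (the values of $n$ and the trial count are mine): flip one uniformly random coordinate per step and check how often the $t\le n^{1/2}/10$ flips hit distinct coordinates, which is exactly the event $\|w_t\|_1 = t$.

```python
import math, random

def prob_weight_equals_t(n, t, trials=20000):
    hits = 0
    for _ in range(trials):
        ones = set()                              # coordinates of w currently equal to 1
        for _ in range(t):
            ones ^= {random.randrange(n)}         # flip a uniformly random coordinate
        hits += (len(ones) == t)
    return hits / trials

n = 10000
t = int(math.sqrt(n) / 10)                        # t = 10
print(prob_weight_equals_t(n, t))                 # about 1 - t^2/(2n) ~ 0.995 >= 0.9
```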

From PDTs to Markov chains

  • This leaf accepts $w_t$ iff $Aw_t = b$
  • Define $m_j =$ fraction of weight-$j$ strings accepted by this leaf
  • We have $m_t \approx\Pr[Aw_t = b]$. (for $t\le n^{1/2}/10$)
  • Understand: probability that we start from node 0 and hit node $b$ at time $t$.

A question on graphs

  • Start from node $0$. Perform a discrete time random walk.
  • Let $S$ be the normalized adjacency matrix of the graph (i.e., stochastic)
  • We have $m_t =\langle \indicate_b, S^t \indicate_0\rangle$
  • Note that $\|\indicate_b\|_2 = \|\indicate_0\|_2 = 1$.
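A small numerical instance (the 8-cycle and the choice of target node are mine): build the normalized adjacency matrix $S$, take the unit indicator vectors, and read off $m_t=\langle \indicate_b, S^t \indicate_0\rangle$.

```python
import numpy as np

N = 8                                   # vertices of the 8-cycle; 2-regular, so divide by 2
S = np.zeros((N, N))
for i in range(N):
    S[i, (i + 1) % N] = S[i, (i - 1) % N] = 0.5

e0 = np.zeros(N); e0[0] = 1.0           # indicator of the start node 0, ||.||_2 = 1
eb = np.zeros(N); eb[3] = 1.0           # indicator of the target node b = 3

for t in range(1, 8):
    m_t = eb @ np.linalg.matrix_power(S, t) @ e0
    print(t, m_t)                       # nonzero only for odd t >= 3 (distance and parity)
```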

A question on Markov chains

  • More generally for positive valued $u,v\colon\Omega\to\reals_+$ with $\|u\|_2=\|v\|_2=1$
    $m_t = \langle v, S^tu\rangle$
  • What can we say about $t\mapsto m_t$ that holds for any $S, u, v$?

The Erdős-Simonovits conjecture

Conjecture. (Erdős, Simonovits'1982)
For $S$ a 0-1 matrix, $u = v$ the normalized uniform vector,
and $k\gt t$ natural numbers of the same parity, we have $m_{k}^{1/k}\ge m_t^{1/t}$.
Alternative stmt. For a graph $G=(V,E)$,
$w_k(v)\colon$ number of walks of length $k$ starting at $v$

$k\mapsto (\frac{1}{|G|}\sum_v \;w_k(v))^{1/k}$

is monotone nondecreasing along increments of two (i.e., over $k$ of a fixed parity).
Recall.
$m_t = \langle v,S^tu\rangle$

The Blakley-Dixon conjecture

Conjecture. (Blakley, Dixon'1966)
For nonnegative $u = v$, and $k\gt t$ natural numbers of the same parity, we have $m_{k}^{1/k}\ge m_t^{1/t}$.

Our results

Theorem I. (This work)
For nonnegative $u$, $v$ which may be different (normalized as before, $\|u\|_2=\|v\|_2=1$), and $k\gt t$ natural numbers of the same parity, we have $m_{k}^{1/k}\ge m_t^{1/t}$. (Equivalent: $m_{t+2}\ge m_t^{1+2/t}$)
Theorem II. (This work)
For any $\eps\gt 0$, there is a $\delta\gt 0$ such that
for nonnegative $u$, $v$ which may be different (normalized as before), either
  • $m_{t+2} m_{t-2}\ge \delta m_t^2$, or
  • $m_{t+2} \ge t^{1-\eps} m_t^{1+2/t}$
Recall.
$m_t = \langle v,S^tu\rangle$
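A numeric sanity check of Theorem I on random instances, not a proof; the construction of a symmetric doubly stochastic $S$ from averaged random permutation matrices is my own choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_symmetric_stochastic(n, perms=6):
    """Average of random permutation matrices and their transposes:
    nonnegative, symmetric, every row summing to 1."""
    S = np.zeros((n, n))
    for _ in range(perms):
        P = np.eye(n)[rng.permutation(n)]
        S += P + P.T
    return S / (2 * perms)

n = 12
for _ in range(5):
    S = random_symmetric_stochastic(n)
    u = rng.random(n); u /= np.linalg.norm(u)    # nonnegative, ||u||_2 = 1
    v = rng.random(n); v /= np.linalg.norm(v)    # nonnegative, ||v||_2 = 1
    for t in (2, 3, 5, 8):
        m_t  = v @ np.linalg.matrix_power(S, t) @ u
        m_t2 = v @ np.linalg.matrix_power(S, t + 2) @ u
        assert m_t2 >= m_t ** (1 + 2 / t) - 1e-12
print("m_{t+2} >= m_t^{1+2/t} held on every sampled instance")
```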

An example random walk

[Figure: a path $x_0, x_1, \ldots, x_{t-1}, x_t$; every edge has weight $\alpha$.]
Consider $\Omega=\{x_0,\ldots,x_t\}$
$u= \indicate_{x_0}$ and $v=\indicate_{x_t}$
$m_{t-4} = 0$
$m_{t-2} = 0$
$m_{t} = \alpha^t$
$m_{t+2} = t\cdot\alpha^{t+2}$
$m_{t+4} = t(t+1)/2\cdot \alpha^{t+4}$
Theorem II. We have either
  • $m_{t+2} m_{t-2}\ge \delta m_t^2$, or
  • $m_{t+2} \ge t^{1-\eps} m_t^{1+2/t}$
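A direct check of this example for concrete values ($t=6$ and $\alpha=0.4$ are my choices): the first branch of Theorem II fails since $m_{t-2}=0$, while $m_{t+2}/m_t^{1+2/t}$ equals exactly $t$, so the second branch holds here with factor exactly $t$.

```python
import numpy as np

t, alpha = 6, 0.4
S = np.zeros((t + 1, t + 1))
for i in range(t):                                  # the path x_0 - x_1 - ... - x_t
    S[i, i + 1] = S[i + 1, i] = alpha               # every edge has weight alpha

u = np.zeros(t + 1); u[0] = 1.0                     # u = indicator of x_0
v = np.zeros(t + 1); v[t] = 1.0                     # v = indicator of x_t

def m(s):
    return v @ np.linalg.matrix_power(S, s) @ u

print(m(t - 2))                                     # 0.0: x_t is unreachable in t-2 steps
print(m(t), alpha ** t)                             # both equal alpha^t
print(m(t + 2), t * alpha ** (t + 2))               # both equal t * alpha^(t+2)
print(m(t + 2) / m(t) ** (1 + 2 / t))               # equals t = 6 (up to floating point)
```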

Theorem I: A proof sketch

The probabilistic setup

[Figure: the state space $\Omega=\{x_1,\ldots,x_6\}$, with an added source state a (arrows $\mu(x_i)$ into $\Omega$) and an added sink state b (arrows $\nu(x_i)$ out of $\Omega$).]
  1. Recall $m_t=\langle v,S^tu\rangle$
  2. Set $\mu \defeq u/\|u\|_1$ and add state a with $S(a,x)=\mu(x)$
  3. Set $\nu \defeq v/\|v\|_1$ and add state b with $S(x,b)=\nu(x)$
  4. Relate $m_t\approx\Pr$ [a walker goes from a to b in exactly $t+2$ steps].

The forward walk $F^t$

  1. $F^t=(F_{-1},F_0,\ldots,F_t,F_{t+1})$.
  2. Start at a, move to $x\in\Omega$ with probability $\mu(x)$.
  3. Move according to $S$ for $t$ steps
  4. Move to b with probability $\nu(F_t)$ and fail with probability $1-\nu(F_t)$.
  5. $\Pr[F_{t+1}=b] = \langle \nu, S^t\mu\rangle$
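A simulation cross-check of the identity in item 5 (the 4-state chain and the specific $\mu$, $\nu$, $t$ below are my own toy choices): sample the forward walk and compare the frequency of reaching b with $\langle \nu, S^t\mu\rangle$.

```python
import numpy as np

rng = np.random.default_rng(1)

# A small symmetric stochastic S on Omega = {0, 1, 2, 3}, plus distributions mu, nu.
S = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
mu = np.array([0.4, 0.3, 0.2, 0.1])
nu = np.array([0.1, 0.2, 0.3, 0.4])
t = 5

def forward_walk_reaches_b():
    x = rng.choice(4, p=mu)              # leave a: land on x with probability mu(x)
    for _ in range(t):
        x = rng.choice(4, p=S[x])        # t steps according to S
    return rng.random() < nu[x]          # move to b with probability nu(F_t), else fail

trials = 100000
empirical = sum(forward_walk_reaches_b() for _ in range(trials)) / trials
exact = nu @ np.linalg.matrix_power(S, t) @ mu
print(empirical, exact)                  # agree to about two decimal places
```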

The backward walk $B^t$

  1. $B^t=(B_{-1},B_0,\ldots,B_t,B_{t+1})$.
  2. At time $t+1$ start at b, move to $x\in\Omega$ with probability $\nu(x)$.
  3. Move according to $S$ for steps $t, t-1, \ldots, 1$. (time ticks backwards)
  4. Move to a with probability $\mu(B_0)$ and fail with probability $1-\mu(B_0)$.
  5. $\Pr[B_{-1}=a] = \langle \nu, S^t\mu\rangle$ (by symmetry of $S$)

Conditioned walk $X^t$

  • Define $X\defeq (F^t\mid F_{t+1}=b)$.
  • By symmetry, it also holds $X=(B^t\mid B_{-1}=a)$.
  • Moreover, it can be shown $X$ is Markovian:

    $\dist(X_{i+1}\mid X_{\le i}=x_{\le i}) = \dist(X_{i+1}\mid X_i=x_i)$

The Kullback-Leibler divergence

Definition. Given two distributions $p,q\colon\Omega\to\reals_+$,

$\kldiv{p}{q}\defeq\sum_x\; p(x)\log\frac{p(x)}{q(x)}$

Lemma (Gibbs' inequality).
If $p = (q\mid E)$ for some event $E$, then $\kldiv{p}{q} = \log\frac{1}{q(E)}$
Lemma (Chain Rule).

$\kldiv{A_1A_2}{B_1B_2}=\kldiv{A_1}{B_1} +\;\kldiv{A_2\mid A_1}{B_2\mid B_1}$.
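A quick numerical illustration of Gibbs' inequality (the distribution $q$ and the event $E$ are arbitrary choices of mine): conditioning $q$ on $E$ gives divergence exactly $\log\frac{1}{q(E)}$.

```python
import numpy as np

def kl(p, q):
    """D(p || q), ignoring the zero-probability terms of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

q = np.array([0.1, 0.2, 0.3, 0.15, 0.25])
E = np.array([True, False, True, True, False])    # an event with q(E) = 0.55

p = np.where(E, q, 0.0)
p /= p.sum()                                      # p = (q | E)

print(kl(p, q), np.log(1 / q[E].sum()))           # both equal log(1/0.55)
```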

The proof idea

  • By Gibbs' inequality, $\kldiv{X^t}{F^t} = \kldiv{X^t}{B^t} = -\log\iprod{\nu}{S^t\mu}$.
  • Likewise, $\kldiv{X^{t+2}}{F^{t+2}} = -\log\iprod{\nu}{S^{t+2}\mu}$.

    “To lower bound $m_{t+2}\approx \iprod{\nu}{S^{t+2}\mu}$, show $\kldiv{X^{t+2}}{F^{t+2}}$ is small”

  • To show $\kldiv{X^{t+2}}{F^{t+2}}$ is small construct a random walk $Z$ which
    • Goes from a to b in $t+4$ steps...
      ...which certifies $\kldiv{X^{t+2}}{F^{t+2}}\le\;\kldiv{Z}{F^{t+2}}$
    • Has, roughly, $\kldiv{Z}{F^{t+2}}\le \frac{t+2}{t}\kldiv{X^t}{F^t}$...
      ...which certifies $m_{t+2}\ge m_t^{1+2/t}$

The proof idea

$-\;\frac{t+2}{t}\log\iprod{\nu}{S^t\mu}=\frac{t+2}{t}\kldiv{X^t}{F^t}\gtapprox \;\kldiv{Z}{F^{t+2}}\ge\;\kldiv{X^{t+2}}{F^{t+2}} = -\log\iprod{\nu}{S^{t+2}\mu}$

  • The two equalities are Gibbs' inequality.
  • The $\gtapprox$ step is the special property of $Z$; the gap here exactly compensates for the $u\to\mu$ and $v\to\nu$ change.
  • The $\ge$ step holds by virtue of $Z$ going from a to b in $t+4$ steps.

The $Z$ walk

  • Pick $J$ uniform random in $[t]$
  • $(Z\mid J)$ will be Markovian, constructed as follows:
  • For time steps $i\lt J$, walk like $\dist(X_{i+1}\mid X_i)$.
  • At time step $j=J$, walk according to $\dist(X_{j-1}\mid X_j)$ (backwards in time)
  • For $i\gt J$, walk like $\dist(X_{i-1}\mid X_{i-2})$ (Lagging 2 steps behind).

Analyzing $Z$

  • $\kldiv{X}{F} = \kldiv{X_0}{F_0} + \kldiv{X_1\mid X_0}{F_1\mid F_0} + \ldots + \kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}$
  • $\kldiv{X}{B} = \kldiv{X_t}{B_t} + \kldiv{X_{t-1}\mid X_t}{B_{t-1}\mid B_t} + \ldots + \kldiv{X_{-1}\mid X_0}{B_{-1}\mid B_0}$
  • $\kldiv{Z\mid J=j}{F} = \kldiv{X_0}{F_0} + \kldiv{X_1\mid X_0}{F_1\mid F_0} + \ldots + \kldiv{X_{j-1}\mid X_j}{B_{j-1}\mid B_j} + \ldots + \kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}$

Analyzing $Z$

  • Expand $Z\mid J$; averaging over all $j$, the red terms (the single backward-kernel term at time $J$) sum to

    $\frac{1}{t}(\kldiv{X}{B^t} -\;\kldiv{X_t}{B_t} -\;\kldiv{X_{-1}\mid X_0}{B_{-1}\mid B_0})$

  • The blue terms (the forward-kernel terms) sum to

    $\kldiv{X}{F} + \;\frac{1}{t}(\kldiv{X}{F^t} -\;\kldiv{X_0}{F_0} -\;\kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t})$

Analyzing $Z$

  • Analyze missing terms $\kldiv{X_t}{B_t} + \;\kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}$ together.
  • Via convexity, $\kldiv{X_t}{B_t} +\;\kldiv{X_{t+1}\mid X_t}{F_{t+1}\mid F_t}\ge -\log\|\nu\|_2^2$.
  • Likewise for $\kldiv{X_0}{F_0} +\;\kldiv{X_{-1}\mid X_0}{B_{-1}\mid B_0}$,
    a lower bound $-\log\|\mu\|_2^2$ holds.

Finishing the proof

We showed that the chain below holds, with a slack of
$-\;\log\|u\|_2^2 -\log\|v\|_2^2$ in the $\gtapprox$ step:

$-\;\frac{t+2}{t}\log\iprod{\nu}{S^t\mu}=\frac{t+2}{t}\kldiv{X^t}{F^t}\gtapprox \;\kldiv{Z}{F^{t+2}}\ge\;\kldiv{X^{t+2}}{F^{t+2}} = -\log\iprod{\nu}{S^{t+2}\mu}$

  • But this slack exactly compensates the terms coming from $u\to\mu$ and $v\to\nu$ substitution.
  • We get exactly $m_{t+2}\ge m_t^{1+2/t}$

An analogue for continuous time?

(The heat kernel on the real line: $m_t = \frac{1}{(2\pi t)^{1/2}} e^{-b^2 / (2t)}$.)
Conjecture. Let $u,v\colon\Omega\to\reals_+$ with $\|u\|_2=\|v\|_2=1$
be positive valued unit vectors. Defining

$m_t=\langle v, e^{t(S-I)}u\rangle$,

we have for $\alpha\ge 0$,

$m_{t(1+\alpha)} \cdot m_{t/(1+\alpha)}\ge m_t^{2 + \alpha^2/(1+\alpha)}$.
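The conjecture is easy to probe numerically (a sanity check on one random instance I picked, not evidence beyond the instances tried): $m_t=\langle v, e^{t(S-I)}u\rangle$ is computed via the eigendecomposition of the symmetric matrix $S-I$.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10
P = np.eye(n)[rng.permutation(n)]
S = (P + P.T) / 2                                 # symmetric, doubly stochastic
u = rng.random(n); u /= np.linalg.norm(u)         # nonnegative, ||u||_2 = 1
v = rng.random(n); v /= np.linalg.norm(v)         # nonnegative, ||v||_2 = 1

evals, evecs = np.linalg.eigh(S - np.eye(n))      # spectral decomposition of S - I

def m(t):
    return float(v @ (evecs @ (np.exp(t * evals) * (evecs.T @ u))))

t = 3.0
for alpha in (0.1, 0.5, 1.0, 2.0):
    lhs = m(t * (1 + alpha)) * m(t / (1 + alpha))
    rhs = m(t) ** (2 + alpha ** 2 / (1 + alpha))
    print(alpha, lhs >= rhs, lhs, rhs)            # does the conjectured inequality hold here?
```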

Thank you!

&

Questions?
