Gábor Tardos)
Manuscript 2020
We answer a 1982 conjecture of Erdős and Simonovits about the growth of number of $k$-walks in a graph, which incidentally was studied even earlier by Blakley and and Dixon in 1966. We prove this conjecture in a more general setup than the earlier treatment, furthermore, through a refinement and strengthening of this inequality, we resolve two related open questions in complexity theory: the communication complexity of the $k$-Hamming distance is $\Omega(k\log k)$ and that consequently any property tester for $k$-linearity requires $\Omega(k\log k)$ queries.
Suppose some initial heat configuration $u\colon\Omega\to\reals_+$ over a discrete space $\Omega$ is given. The configuration evolves according to the map $w\mapsto S w$ in discrete time steps $t=0,1,\ldots$ for some symmetric stochastic matrix $S\colon\Omega\times\Omega\to\reals_+$ and we measure the amount of heat contained in a certain region $v$ over time: $m_t=\langle v,S^tu\rangle$ denotes our measurement at time $t$, where $\|u\|_2 = \|v\|_2 = 1$ for normalization. What can we say about $\{m_t\}_{t=0}^{\infty}$ that holds true for any $S,u,v$?
We show that the $r$-round communication complexity of the $k$-disjointness problem is precisely $\Theta(k\log^{(r)} k)$ bits. Setting $r=\log^* k$, our upper bound gives an $O(k)$-bits, $\log^* k$-rounds protocol with error probability exponentially small in $k$. This improves the best previous protocol due to Håstad and Wigderson, which ran in $O(\log k)$ rounds and erred with constant probability.
Our lower bound applies even to the simpler task of computing the OR of $k$ independent instances of equality. An important aspect of our work is that the lower bound we get is super-linear in $k$, even though a single instance of equality can be solved with $O(1)$ bits and constant error probability.
We present $\eps$-error $L_p$ sampling algorithms using roughly $O(\eps^{-p}\log^2 n)$ bits of space for $p\in[0,2)$ in the streaming model and show that $\Omega(\log^2 n)$ space is required by any such algorithm.
As an application, we present an $O(\log^2 n)$ space algorithm for finding a duplicate in a stream of length $n+1$ over the alphabet $[n]$, improving on the previous $O(\log^3 n)$ bound due to Gopalan and Radhakrishnan. Our $\Omega(\log^2 n)$ lower bound applies here as well, showing this improvement is final.
Update 2: In FOCS 2018, Rajesh Jayaram and David P. Woodruff gave perfect $L_p$ samplers for $p\in[0,2)$ with $O(\log^2 n)$ bits of space, improving on our $O(\eps^{-p}\log^2 n)$ space $\eps$-error sampler. [arXiv]
Update 1: In FOCS 2017, Michael Kapralov, Jelani Nelson, Jakub Pachocki, Zhengyu Wang, David P. Woodruff, Mobin Yahyazadeh gave an $\Omega(\log^2 n \log(1/\delta))$ lower bound to universal relation, where $\delta$ is the error probability of the protocol, improving on our $\Omega(\log^2 n)$ lower bound. [arXiv]
These together settle the 2 out of the 3 questions we left open in this work.We give an $O(\log^2 n)$ space one pass streaming algorithm that finds the period of a stream, provided the stream is periodic. At the core of this algorithm is a new one pass $O(\log n\log m)$ space pattern matching algorithm for finding occurrences of a length $m$ pattern in a text of size $n$. This algorithm uses similar ideas to Porat and Porat’s algorithm in FOCS 2009 but it does not need an offline pre-processing stage and is simpler.
In the second part, we study distance to $p$-periodicity under the Hamming metric and provide space efficient approximation algorithms.