The Smallest Grammar Problem
Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, abhi shelat
IEEE Transactions on Information Theory, Vol. 51, Issue 7, Jul 2005, p2554-2576.
This paper subsumes
Approximating the Smallest Grammar: Kolmogorov Complexity in Natural Models
Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, April Rasala, Amit Sahai, abhi shelat
STOC 2002
IEEE Transactions on Information Theory, Vol. 51, Issue 7, Jul 2005, p2554-2576.
This paper subsumes
Approximating the Smallest Grammar: Kolmogorov Complexity in Natural Models
Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, April Rasala, Amit Sahai, abhi shelat
STOC 2002
This paper addresses the {\em smallest grammar problem}: What is the {\em smallest} context-free grammar that generates exactly one given string $\sigma$?
This is a natural question about a fundamental object connected to many fields, including data compression, Kolmogorov complexity, pattern identification, and addition chains.
Due to the problem's inherent complexity, our objective is to find an approximation algorithm which finds a {\em small} grammar for the input string. We focus attention on the {\em approximation ratio} of the algorithm (and implicitly, worst-case behavior) to establish provable performance guarantees and to address short-comings in the classical measure of {\em redundancy} in the literature.
Our first results are a variety of hardness results, most notably that every efficient algorithm for the smallest grammar problem has approximation ratio at least $\frac{8569}{8568}$ unless $P = NP$.
We then bound approximation ratios for several of the best-known grammar-based compression algorithms, including {\sc LZ78}, {\sc Bisection}, {\sc Sequential}, {\sc Longest Match}, {\sc Greedy}, and {\sc Re-Pair}. Among these, the best upper bound we show is $O(n^{1/2})$.
We finish by presenting two novel algorithms with exponentially better ratios of $O(\log^3 n)$ and $O(\log(n / m^*))$, where $m^*$ is the size of the smallest grammar for that input. The latter highlights a connection between grammar-based compression and {\sc LZ77}.
[PDF]
This is a natural question about a fundamental object connected to many fields, including data compression, Kolmogorov complexity, pattern identification, and addition chains.
Due to the problem's inherent complexity, our objective is to find an approximation algorithm which finds a {\em small} grammar for the input string. We focus attention on the {\em approximation ratio} of the algorithm (and implicitly, worst-case behavior) to establish provable performance guarantees and to address short-comings in the classical measure of {\em redundancy} in the literature.
Our first results are a variety of hardness results, most notably that every efficient algorithm for the smallest grammar problem has approximation ratio at least $\frac{8569}{8568}$ unless $P = NP$.
We then bound approximation ratios for several of the best-known grammar-based compression algorithms, including {\sc LZ78}, {\sc Bisection}, {\sc Sequential}, {\sc Longest Match}, {\sc Greedy}, and {\sc Re-Pair}. Among these, the best upper bound we show is $O(n^{1/2})$.
We finish by presenting two novel algorithms with exponentially better ratios of $O(\log^3 n)$ and $O(\log(n / m^*))$, where $m^*$ is the size of the smallest grammar for that input. The latter highlights a connection between grammar-based compression and {\sc LZ77}.
[PDF]
0 TrackBacks
Listed below are links to blogs that reference this entry: The Smallest Grammar Problem.
TrackBack URL for this entry: http://www.cs.virginia.edu/~shelat/mt/mt-tb.cgi/7
Leave a comment