Compression, Complexity, and P=NP?

Model 1 vs. Model 2 for Compression

A Flaw in Compression Model 1: Source code for decompression algorithm itself might be highly complex.

Question 1: Comprehensible Compression

Is there a "comprehensible" compression takes as input a target bitstream B, and outputs useful, readable Java code?

Question 2: Optimal Compression

Is there an optimal compression takes as input a target bitstream B, and outputs the shortest possible Java program that outputs this bitstream?

Kolmogorov Complexity

The Java-Kolmogorov complexity K_J (B) is the length of the shortest Java program (in bytes) that generates B.

Fact #1: Kolmogorov Complexity is effectively independent of language. For any bit stream, the Java-Kolmogorov Complexity is no more than a constant factor larger than the Python-Kolmogorov Complexity.

Could write a Python interpreter in Java and then runs the Python program.

It means that most bitstreams are fundamentally incompressible no matter what programming language is used for the compression algorithm.

Space/Time Bounded Compression

An optimal compression algorithm takes as input a target bitstream B, and outputs the shortest possible Java program that outputs this bitstream.

No “optimal compression” algorithm exists.

Space Bounded Compression

Takes two inputs, a target bitstream B and a size S, and outputs a Java program of length ≤ S that outputs B.

Does not exist, since it could be used to find K_J (B) by binary searching on S.

Space/Time Bounded Compression

Takes three inputs, a target bitstream B, a size S, a maximum number of lines of bytecode executed T, and outputs a Java program of length ≤ S that outputs B in fewer than T executed lines of bytecode.

For each possible program p of length S or less:

  • If p compiles, run program p until either:

  • p terminates.

  • We output B.

  • T lines of bytecode are executed.

    Runtime: O(T x 2 ^ S).

Efficient Space/Time Bounded Compression

  • Need to make a more precise definition of what we mean by “efficient”.

  • Closely related to an important puzzle in computer science: Does P = NP?

P = NP?

An efficient solution to these three problems (3SAT, Independent Set, and Longest Path) would also give an efficient space/time bounded compression algorithm.

Space/time bounded compression reduces to 3SAT, Independent Set, and Longest Path.

Definition

Two important classes of yes/no problems: P: Efficiently solvable problems. NP: Problems with solutions that are efficiently verifiable.

Examples of problems in P:

  • Is this array sorted?

  • Does this array have duplicates?

Examples of problems in NP:

  • Is there a solution to this 3SAT problem?

  • In graph G, does there exist a path from s to t of weight > k?

Any decision problem for which a yes answer can be efficiently verified can be transformed into a 3SAT problem. This transformation is also "efficient" (polynomial time).

Last updated