EE274 (Fall 23): Homework-2
- Focus area: Lossless Compression
- Due Date: Nov 1, midnight (11:59 PM)
- Weightage: 15%
- Total Points: 110
- Submission Instructions: Provided at the end of HW (ensure you read these!)
- Submission link:
- For written part: HW2-Written
- For programming part: HW2-Code
Please ensure that you follow the Stanford Honor Code while doing the homework. You are encouraged to discuss the homework with your classmates, but you should write your own solutions and code. You are also encouraged to use the internet to look up the syntax of the programming language, but you should not copy-paste code from the internet. If you are unsure about what is allowed, please ask the instructors.
Note for the coding part
Before starting the coding-related HW2 questions, ensure the following instructions from HW1 are followed:

- Ensure you are using the latest version of the SCL `EE274/HWs` GitHub branch. To verify, run the following command in the SCL repository you cloned in HW1:

  ```sh
  git status
  ```

  You should get an output saying `On branch EE274_Fall23/HWs`. If not, run the following command to switch to the correct branch:

  ```sh
  git checkout EE274_Fall23/HWs
  ```

  Finally, ensure you are on the latest commit by running:

  ```sh
  git pull
  ```

  You should see a `HW2` folder in the `HWs` folder.
- Ensure you are in the right conda environment you created in HW1. To verify, run the following command:

  ```sh
  conda activate ee274_env
  ```

- Before starting, ensure the previous tests are passing as expected. To do so, run the following from the `stanford_compression_library` folder:

  ```sh
  find scl -name "*.py" -exec py.test -s -v {} +
  ```

  and ensure all tests except those in the HW folders pass.
Q1: Camping Trip (20 points)
During one of the camping trips, Pulkit was given $n$ rope segments of lengths $l_1, l_2, \ldots, l_n$ by Kedar, and was asked to join all the ropes into a single long segment of length $l_1 + l_2 + \cdots + l_n$. Pulkit can only join two ropes at a time, and the "effort" in joining two ropes of lengths $l_i$ and $l_j$ using a knot is $l_i + l_j$. Pulkit is lazy and would like to minimize his "effort" to join all the segments together.
- [5 points] Do you see any parallels between the problem and one of the prefix-free codes you have learnt about? Please justify your answer.
- [5 points] Let $E$ be the optimal (minimal) value for the effort required to join all the segments. Without solving the problem, can Pulkit get an estimate of what $E$ would be? Leaving your answer in terms of inequalities is fine.
- [10 points] Implement the function `compute_minimum_effort()` in the file `hw2_p1.py`.
HINT: You may use one of the prefix-free code implementations in the SCL Library
Q2: Generating random non-uniform data (25 points)
Consider the problem of sampling from a non-uniform discrete distribution when you are given samples from a uniform distribution.

Let's assume that we are given a single sample of a random variable `U`, uniformly distributed in the unit interval `[0,1)` (e.g. `U = numpy.random.rand()`). The goal is to generate samples from a non-uniform discrete distribution $P$. We will assume that $P$ is a rational distribution, i.e. for any symbol $s$, $P(s) = n_s/M$, where $n_s$ and $M$ are integers.
- [5 points] Given a non-uniform rational distribution, we can sample a single random value from `U` by finding the cumulative distribution bin in which the sample `U` falls, since the cumulative distribution also lies in `[0,1)`. For example, if $P$ is the distribution `P = {A: 2/7, B: 4/7, C: 1/7}`, then we output the symbols `A,B,C` based on the intervals in which `U` lies as follows:

  ```
  if U in [0.0, 2/7) -> output A
  if U in [2/7, 6/7) -> output B
  if U in [6/7, 1.0) -> output C
  ```

  Generalize the above algorithm to any given rational distribution, and describe it in a few lines. For a distribution of alphabet size $k$ (e.g. $k = 3$ in the example above), what is the time/memory complexity of your algorithm wrt $k$?
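As a concrete illustration, the binning scheme for the example distribution above can be sketched as follows (names such as `sample_symbol` are illustrative, not from the starter code):

```python
freqs = {"A": 2, "B": 4, "C": 1}   # P = {A: 2/7, B: 4/7, C: 1/7}
M = sum(freqs.values())            # M = 7

# cumulative left endpoints of each symbol's bin: A -> 0, B -> 2/7, C -> 6/7
cumul, total = {}, 0
for s, f in freqs.items():
    cumul[s] = total / M
    total += f

def sample_symbol(U):
    """Return the symbol whose bin [cumul[s], cumul[s] + f/M) contains U."""
    for s, f in freqs.items():
        if cumul[s] <= U < cumul[s] + f / M:
            return s

print(sample_symbol(0.5))  # 0.5 lies in [2/7, 6/7) -> B
```

The linear scan costs O(k) time per sample; with the cumulative boundaries precomputed in a sorted array, a binary search (e.g. via `bisect`) is also possible, which is worth keeping in mind for the complexity part of the question.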
- [5 points] Complete the function `generate_samples_vanilla` in `hw2_p2.py`, which takes in a non-uniform rational distribution `P` and returns `data_size` number of samples from it. You can assume that the distribution is given as a dictionary, where the keys are the symbols and the values are the frequencies of occurrence in the data. For example, the distribution `P = {A: 2/7, B: 4/7, C: 1/7}` can be represented as `P = Frequencies({'A': 2, 'B': 4, 'C': 1})`. Ensure that your code passes the `test_vanilla_generator` test in `hw2_p2.py`. Feel free to add more test cases.
- [5 points] Given a single sample of a uniform random variable `U`, how can you extend your algorithm in part 2.1 above to sample $n$ i.i.d. random values $X_1, X_2, \ldots, X_n$? Provide a concrete algorithm for `P = {A: 2/7, B: 4/7, C: 1/7}`. Generalize your method to an arbitrary number of samples and describe it in a few sentences.
- [5 points] Pulkit suggested that we can slightly modify the Arithmetic Entropy Coder (`AEC`) we learnt in class to sample a potentially infinite number of i.i.d. samples, given any rational distribution, from a single uniform random variable sample `U`! You can look at Pulkit's implementation `generate_samples_aec` in `hw2_p2.py` to get an idea of how to exploit `AEC`. Can you justify the correctness of Pulkit's method in a few lines?

  Note: Even though we say that we can sample a potentially infinite number of samples, in practice we are limited by the precision of floats.
- [5 points] Now let's say that you have to sample data from a Markov distribution $P(X_1, X_2, \ldots, X_n)$. Recall that for a Markov chain, $P(X_1, X_2, \ldots, X_n) = P(X_1) P(X_2|X_1) \cdots P(X_n|X_{n-1})$. Can you use your technique from Q2.3 or the technique suggested by Pulkit in Q2.4 to sample any number of samples with Markov distribution $P$ from a single uniform random variable sample `U in [0,1)`? Describe the algorithm in a few lines.
Q3: Conditional Entropy (20 points)
In this problem we will get familiar with conditional entropy and its properties.
We learnt a few properties of conditional entropy in class:

- Chain rule of entropy: $H(X,Y) = H(X) + H(Y|X)$
- Conditioning reduces entropy: $H(X|Y) \leq H(X)$, with equality if and only if $X$ and $Y$ are independent

We will use these properties to show some other useful properties about conditional entropy and solve some fun problems!
- [5 points] Let $g$ be an arbitrary function which maps the alphabet $\mathcal{X}$ of $X$ to a discrete set $\mathcal{Y}$. Then show that $H(g(X)|X) = 0$. Can you intuitively explain why this is the case?
- [5 points] Show that $H(g(X)) \leq H(X)$, i.e. processing the data in any form is just going to reduce the entropy. Also, show that equality holds if the function $g$ is invertible, i.e. if there exists a function $g^{-1}$ such that $g^{-1}(g(X)) = X$. Can you intuitively explain why this is the case?
- [4 points] In HW1 of the 2023 edition of EE274, Pulkit and Shubham had an assignment to compress the Sherlock novel (let's call it $X$). Pulkit computed the empirical 0th-order distribution of the letters and used those with arithmetic coding to compress $X$, and received average codelength $\ell_1$. While working on the assignment, Shubham accidentally replaced all letters with lowercase (i.e. `A -> a`, `B -> b`, etc.). Let's call this modified text $Y$. Shubham compressed $Y$ with the same algorithm as Pulkit, and got average codelength $\ell_2$. Do you expect $\ell_1 < \ell_2$, $\ell_1 = \ell_2$, or $\ell_1 > \ell_2$? Justify based on properties of arithmetic coding and Q3.2.
- [6 points] We say that random variables $X_1, X_2, \ldots, X_n$ are pairwise independent if any pair of random variables $X_i, X_j$ ($i \neq j$) are independent. Let $X_1, X_2, X_3$ be three pairwise independent random variables, identically distributed as $X$. Then show that:
  - [3 points] $H(X_1, X_2, X_3) \leq 3H(X)$. When is equality achieved?
  - [3 points] $H(X_1, X_2, X_3) \geq 2H(X)$. When is equality achieved?
- [NOT GRADED, THINK FOR FUN!] Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables. Then show that, using the XOR operation, you can generate more than $n$ pairwise independent random variables, identically distributed as $X_1$.
Q4: Bits-Back coding and rANS (45 points)
In class, we learnt about rANS. We started with the basic idea of encoding a uniform distribution on `{0, ..., 9}` (see the Introduction section in the notes) and then extended it to non-uniform distributions. In this problem, we will look at a different way of thinking about rANS, which will help us understand this modern entropy coder.

Let's start by first generalizing our idea of encoding a uniform distribution on `{0, ..., 9}` to a uniform distribution on `{0, ..., M-1}`, where `M` can be thought of as the number of symbols in the alphabet. The encoding works as follows:
```python
def encode_symbol(state, s, M):
    state = (state * M + s)
    return state
```
The encoder maintains a single state, which we increase when we encode a symbol. Finally, we save the state by simply writing it out in binary form. This function is implemented in `hw2_p4.py` as `UniformDistEncoder`.
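To see what this encoder is doing, note that repeated application simply builds a base-`M` number whose digits are the symbols. A tiny sketch (using `M = 10` for readability; the variable names are illustrative):

```python
def encode_symbol(state, s, M):
    # same arithmetic as the uniform encode step above
    return state * M + s

# encoding the digits 3, 1, 4 with M = 10 builds the decimal number 314
state = 0
for s in [3, 1, 4]:
    state = encode_symbol(state, s, 10)
print(state)  # 314
```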
- [2 points] Write a `decode_symbol` pseudocode which takes in the encoded state `state` and `M`, and returns the symbol `s` and the updated state `state`. Show that your decoder along with the above encoder is lossless, i.e. we can recover the original symbol `s` from the encoded state `state` and `M`.
- [3 points] Show that given data in `{0, ..., M-1}`, the encoder/decoder pair can be used to achieve lossless compression with `~ log2(M)` bits/symbol as we encode a large number of symbols.
- [5 points] Now, implement your decoder in `hw2_p4.py` as `UniformDistDecoder`. Specifically, you just need to implement the `decode_op` function. Ensure that your code passes the `test_uniform_coder` test in `hw2_p4.py`. Feel free to add more test cases.
Now, we want to implement a compressor which works well for non-uniform data. We will use the base encode/decode functions from `UniformDistEncoder`, `UniformDistDecoder` and approach this problem from a new perspective. In class, we learnt that $H(X,Z) = H(X) + H(Z|X)$ for any two random variables $X$ and $Z$. We will use this property to encode the data $X$. Note that this identity implies

$$H(X) = H(X,Z) - H(Z|X)$$

Interpreting this intuitively, it should be possible to encode the data `X` in the following two steps:

a. Encode `X,Z` together using a joint distribution `P(X,Z)`. Assuming an ideal compressor, this will require `H(X,Z)` bits.

b. Decode `Z` from the encoded state using the conditional distribution `P(Z|X)`. Again, assuming an ideal compressor, this lets us recover `H(Z|X)` bits.
Step b gives you an intuition for the question's name -- bits-back! Here `Z` is an additional random variable, typically a latent variable (latent means hidden). Be careful while reading this step, as it is not a compressor for `X` but a decompressor for `Z|X`! This compressor assumes knowledge of `X` and decodes `Z` from the encoded state.
More concretely, we will have an encoder-decoder pair which will work as follows:
```python
def encode_symbol(state, X):
    state, Z = decode_zgx(state, X)   # decode Z|X from state; knows X. returns Z and updated state.
    state = encode_xz(state, (X, Z))  # use this Z, and known X, to encode X,Z together. returns updated state.
    return state

def decode_symbol(state):
    state, (X, Z) = decode_xz(state)  # decode X,Z from state. returns X,Z and updated state.
    state = encode_zgx(state, Z, X)   # encode Z|X by updating state; knows X. returns updated state.
    return state, X
```
Note how the encode step now involves both a decoding and an encoding step from another compressor! This is one of the key ideas behind bits-back coding. We will now implement the above encoder/decoder pair for non-uniform data.
To see the idea, let's work out a simple case together. As usual, let's assume that the non-uniform input is parametrized by frequencies `freq_list[s] for s in {1,..,num_symbols}` and `M = sum(freq_list)`. For instance, if we have 3 symbols `A,B,C` with probabilities `probs = [2/7, 4/7, 1/7]`, then we can think of the frequencies as `freq_list = [2,4,1]` and `M = 7`.

We can also create a cumulative frequency dict as we saw in class, i.e. `cumul = {A: 0, B: 2, C: 6}`.
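The cumulative dict is just a running sum of the frequencies; a small sketch of computing it (using `itertools.accumulate`; variable names are illustrative):

```python
from itertools import accumulate

symbols = ["A", "B", "C"]
freq_list = [2, 4, 1]

# cumul[s] = total frequency of all symbols preceding s
cumul = dict(zip(symbols, accumulate([0] + freq_list[:-1])))
print(cumul)  # {'A': 0, 'B': 2, 'C': 6}
```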
We will assume `Z` to be a uniform random variable in `{0, ..., M-1}`. Then, we can think of `X` as a function of `Z` as follows:

```python
def X(Z):
    for s in symbols:
        if cumul[s] <= Z < cumul[s] + freq_list[s]:
            return s
```
i.e. `X` is the symbol `s` such that `Z` lies in the interval `[cumul(s), cumul(s)+freq_list[s])`. This is shown in the figure below for the example above. For example, if `Z = 4`, then `X = B` as `cumul(B) = 2 <= 4 < 6 = cumul(B) + freq_list[B]`.
- [2 points] What are the values of $H(X)$, $H(Z)$ and $H(X,Z)$ in the above example?
- [3 points] What is the distribution and entropy of $Z$ given $X$, i.e. $P(Z|X=x)$ and $H(Z|X=x)$, in the above example? Note that $H(Z|X=x)$ is a function of `x` and hence you need to report it for each symbol `x` in the alphabet `A,B,C`.
Now, let's implement the encoder/decoder pair for the above example. We will use the following notation:

- `state` is the encoded state
- `s` is the symbol to be encoded (`X`)
- `freq_list` is the list of frequencies of the symbols
- We want to code the above scheme utilizing the joint compressors over `X, Z` (where `Z` is the latent variable). `Z` is uniformly distributed in `{0, ..., M-1}`.
- `combined_symbol` is the joint random variable `Y = (X, Z)`. In our example, we only need to encode `Z` as `X` is a deterministic function of `Z`, i.e. `combined_symbol = Z` and will be in `{0, ..., M-1}`. For the above example, say we want to encode `Y = (B, 4)`; then we only need to encode `Z = 4` as `X = B` is implied.
- `fake_locator_symbol` is the value of `Z|X=x` relative to `cumul(x)`. This is a function of `X=x`. For the above example of `X = B`, `Z|X=B` can be `{2, 3, 4, 5}` and hence `fake_locator_symbol` can be `{0, 1, 2, 3}` respectively, as `cumul(B) = 2`.
- Therefore, `combined_symbol` and `fake_locator_symbol` allow us to have an encoder/decoder pair for both `(X,Z)` and `Z|X`. Continuing our example, if we were to encode `Y = (X=B, Z=4)`, we would encode it as `combined_symbol = 4` and `fake_locator_symbol = 2`. Conversely, if we were to decode `combined_symbol = 4`, we would decode it as `Y = (Z=4, X=B)` and infer that `fake_locator_symbol = 2`.
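The mapping between `combined_symbol` and `(X, fake_locator_symbol)` in the running example can be sketched as follows (the helper name `split` is illustrative, not from the starter code):

```python
freq = {"A": 2, "B": 4, "C": 1}
cumul = {"A": 0, "B": 2, "C": 6}

def split(Z):
    # find the symbol X whose bin [cumul[X], cumul[X] + freq[X]) contains Z,
    # and Z's offset within that bin (the fake_locator_symbol)
    for s in freq:
        if cumul[s] <= Z < cumul[s] + freq[s]:
            return s, Z - cumul[s]

print(split(4))  # ('B', 2): Z = 4 falls in B's bin [2, 6) at offset 2
```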
- [5 points] We have implemented the `encode_symbol` function for `NonUniformDistEncoder`. Looking at the pseudocode above and the example, explain briefly how it works. Specifically, explain how `encode_symbol` achieves the relevant `decode_zgx` and `encode_xz` functions given in the pseudocode, in the context of the example above.
- [10 points] Now implement the `decode_symbol` function for `NonUniformDistEncoder`. You can use the `encode_op` and the `decode_op` from the uniform distribution code. Ensure that your code passes the `test_non_uniform_coder` test in `hw2_p4.py`. Feel free to add more test cases.
- [5 points] Show that for a symbol `s` with frequency `freq[s]`, the encoder uses `~ log2(M/freq[s])` bits and is hence optimal.
Great, we have now implemented a bits-back encoder/decoder pair for a non-uniform distribution. Let us now see how it is equivalent to rANS we studied in class. As a reminder, the rANS base encoding step looks like
```python
def rans_base_encode_step(x, s):
    x_next = (x // freq[s]) * M + cumul[s] + x % freq[s]
    return x_next
```
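Plugging in the running example (`freq = {A: 2, B: 4, C: 1}`, `M = 7`), one encode step can be traced by hand:

```python
freq = {"A": 2, "B": 4, "C": 1}
cumul = {"A": 0, "B": 2, "C": 6}
M = 7

def rans_base_encode_step(x, s):
    return (x // freq[s]) * M + cumul[s] + x % freq[s]

# encoding s = "B" from state x = 3: (3 // 4) * 7 + 2 + 3 % 4 = 0 + 2 + 3 = 5
print(rans_base_encode_step(3, "B"))  # 5
```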
- [5 points] Justify that `encode_symbol` in `NonUniformDistEncoder` performs the same operation as above.
Similarly, as a reminder, the rANS base decoding step looks like
```python
def rans_base_decode_step(x):
    # Step I: find block_id, slot
    block_id = x // M
    slot = x % M
    # Step II: find symbol s
    s = find_bin(cumul_array, slot)
    # Step III: retrieve x_prev
    x_prev = block_id * freq[s] + slot - cumul[s]
    return (s, x_prev)
```
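Tracing the same example through the decode step recovers the encode step's input; a runnable sketch with `find_bin` written inline as a linear scan (the names are illustrative):

```python
freq = {"A": 2, "B": 4, "C": 1}
cumul = {"A": 0, "B": 2, "C": 6}
M = 7

def rans_base_decode_step(x):
    block_id, slot = divmod(x, M)                  # Step I
    s = next(t for t in freq                       # Step II: find_bin by linear scan
             if cumul[t] <= slot < cumul[t] + freq[t])
    x_prev = block_id * freq[s] + slot - cumul[s]  # Step III
    return (s, x_prev)

# decoding x = 5: block_id = 0, slot = 5 -> s = "B", x_prev = 0 * 4 + 5 - 2 = 3
print(rans_base_decode_step(5))  # ('B', 3)
```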
- [5 points] Justify that `decode_symbol` in `NonUniformDistEncoder` performs the same operation as above.
Therefore, bits-back coding is equivalent to rANS! Thinking again in terms of $H(X) = H(X,Z) - H(Z|X)$, it allows us to interpret rANS in terms of latent variables in a more intuitive way. This question was highly motivated by the IT-Forum talk by James Townsend in 2022. Check out the YouTube video if you are interested in learning more!
Submission Instructions
Please submit both the written part and your code on Gradescope via their respective submission links. We will be using both an autograder and manual code evaluation for evaluating the coding parts of the assignments. You can see the scores of the autograded parts of the submissions immediately. For code submission, ensure the following steps are followed for the autograder to work correctly:
- As with HW1, you only need to submit the modified files as mentioned in the problem statement.
- Compress the `HW2` folder into a zip file. One way to obtain this zip file is by running the following zip command in the `HWs` folder, i.e.

  ```sh
  cd HWs
  zip -r HW2.zip HW2
  ```

  Note: To avoid autograder errors, ensure that the directory structure is maintained and that you have compressed the `HW2` folder containing the relevant files, and not the `HWs` folder or the files inside, or something else. Ensure the names of the files inside the folder are exactly as provided in the starter code, i.e. `hw2_p2.py`, `hw2_p4.py`, etc. In summary, your zip file should uncompress to the following directory structure (with the same names):

  ```
  HW2
  ├── hw2_p1.py
  ├── hw2_p2.py
  └── hw2_p4.py
  ```

- Submit the zip file (`HW2.zip` obtained above) on the Gradescope Programming Part Submission link. Ensure that the autograded parts run and give you correct scores.
Before submitting the programming part on Gradescope, we strongly recommend ensuring that the code runs correctly locally.