The secret of success?
This TED talk makes a lot of sense to me, and chimes in with more than 35 years of lecturing experience. Have a look and see what it suggests to you.
The following remarkable procedure is entirely feasible: if I have a class of n students then I can distribute n different binary images, one to each student, such that each student's image looks like white noise,
and yet if all images are combined together in the right way then a meaningful picture emerges.
What's more, I can arrange matters such that if any strict subset of the n students tried to collaborate, then all they would get would be more white noise, no matter how they manipulated their (at most n-1) images!
So any strict subset of the students would possess no information at all about the butterfly picture, but acting all together they would be in a position to produce a perfect reproduction of the image.
How can this be done?
There are many other ways to implement secret-sharing (Google/Bing/DuckDuckGo the phrase "secret sharing"). But this one is nice for probabilists, because it provides a graphic example of why pairwise independence (independence of any two events taken from a larger collection of events) need not imply complete independence.
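For the curious, here is a minimal sketch (in Python, with names of my own invention; the post does not specify an implementation) of the standard XOR construction: hand out n-1 independent noise images, and let the n-th share be the picture XORed with all of them. Any strict subset of shares is jointly uniform noise, because the missing shares act as a one-time pad.

```python
import numpy as np

rng = np.random.default_rng()

def make_shares(secret, n):
    """Split a 0/1 image into n shares: n-1 independent coin-flip images,
    plus the XOR of the secret with all of them."""
    shares = [rng.integers(0, 2, size=secret.shape) for _ in range(n - 1)]
    final = secret.copy()
    for s in shares:
        final ^= s
    shares.append(final)
    return shares

def combine(shares):
    """XOR all n shares together: the noise cancels, leaving the secret."""
    out = np.zeros_like(shares[0])
    for s in shares:
        out ^= s
    return out

secret = rng.integers(0, 2, size=(8, 8))   # stand-in for the butterfly image
shares = make_shares(secret, n=5)
assert np.array_equal(combine(shares), secret)
```

Taking the secret to be known and n = 3 recovers the classic example alluded to above: bits A, B and A XOR B are pairwise independent, but certainly not mutually independent.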
In your third year at Warwick you often have a drink with a friend who enjoys probability (yes, you make some strange friends here). On the Sunday evening before the start of Term 2 she challenges you to a curious game of chance. Hiding a fair six-sided die (numbered 0,1,2,3,4,5) from your observation, she throws it ten times and secretly notes each outcome on a convenient beer mat. Then she turns to you and announces:
"I shall now read you the running averages of the dice throws, but in reverse order.''
(You must have looked puzzled, because she breaks off to explain)
"Let Xn be the result of the nth throw, so the nth running average is Yn=(X1+...+Xn)/n. I will read out the numbers Y10, Y9, ..., Y1 in that order.''
Clearly a flicker of comprehension must have somehow crossed your face, because she now continues:
"At any stage you can stop me reciting the sequence. If the last average I read out was Yn then with probability Yn/5 (arranged using that book of random numbers which I carry everywhere) I will buy you a pint and the game ends. If you don't stop me at all then clearly you get a pint with probability Y1/5. Otherwise you buy me a pint.''
You calculate rapidly: E[Y_1/5] = E[X_1/5] = (1/6)(0 + 1 + ... + 5)/5 = 1/2 even if you wait till the last number, and you get to choose when to stop. Must be to your advantage!
"Seems a reasonably good deal to me,'' you say.
"I'm glad you think so;'' she replies, "before we begin you can buy me a pint to compensate. Then we can play the game K times, where K is as large as you think it should be so that on average you come out ahead.''
How big should K be?
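If you want an empirical hint before Term 2 starts, a Monte Carlo sketch is easy to write (Python, names mine; the "stop once the average beats 2.5" rule below is just one hypothetical strategy among many you could plug in):

```python
import random

def play_once(threshold=2.5):
    """One round: she throws the die ten times and reads Y_10, Y_9, ..., Y_1;
    we stop at the first running average exceeding the threshold."""
    throws = [random.randint(0, 5) for _ in range(10)]
    averages = [sum(throws[:n]) / n for n in range(1, 11)]   # Y_1, ..., Y_10
    for y in reversed(averages):        # she reads them in reverse order
        if y > threshold:               # our stopping rule
            return random.random() < y / 5
    return random.random() < averages[0] / 5   # never stopped: Y_1 decides

games = 10**5
wins = sum(play_once() for _ in range(games))
print(wins / games)   # stubbornly close to 0.5, whatever threshold you try
```

The output hints at the calculation-free answer promised below.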
The very next day you hear about this course, ST318 Probability Theory; by the end of the course you'll have learned enough to be able to answer the above question instantly without any calculation at all. All this and 15 CATS credit too. Now that is what you call fair! How can you resist?
The following notes concern some calculations I have made relating to Zanella et al. (2017). The considerations laid out below are entirely trivial, but they are helpful in clarifying whether or not certain conditions are logically independent of each other.
Consider a random variable X with probability density f(x) = e^φ(x).
Is it possible for the "Fisher information" I = ∫ φ'(x)^2 f(x) dx to be finite while the entropy H = - ∫ log(f(x)) f(x) dx is infinite?
Yes.
Consider a density f(x) which is proportional (for large positive x) to 1/(x (log x)^2), so that φ(x) = constant - (log x + 2 log log x) for large x.
1: Using the change of variable e^u = x, it can be shown that H = - ∫ log(f(x)) f(x) dx is infinite. The contribution to the entropy H from large x is given by - ∫^∞ log(f(x)) f(x) dx, controlled by
∫^∞ (log x + 2 log log x) dx / (x (log x)^2) = ∫^∞ (u + 2 log u) e^u du / (u^2 e^u) ≥ ∫^∞ du / u = ∞.
2: On the other hand, elementary bounds show that I = ∫ φ'(x)^2 f(x) dx can be finite. The contribution to the "Fisher information" I from large x is given by ∫^∞ φ'(x)^2 f(x) dx, related to
∫^∞ (1/x + 2/(x log x))^2 dx / (x (log x)^2) = ∫^∞ (1 + 2/log x)^2 dx / (x^3 (log x)^2) ≤ constant × ∫^∞ dx / x^3 < ∞.
An example of such a density (behaving in the required manner at ±∞) is
f(x) = log(2) |x| / ((2 + x^2) (log(2 + x^2))^2).
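As a quick mechanical check (a sketch using sympy; entirely optional), one can confirm the divergent lower bound for the entropy tail, the convergent comparison integral for the Fisher information tail, and the normalization of the explicit density via the substitution t = 2 + x^2:

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)

# Entropy tail: bounded below by a divergent integral.
print(sp.integrate(1/(x*sp.log(x)), (x, 2, sp.oo)))      # oo

# Fisher-information tail: dominated by a convergent integral.
print(sp.integrate(1/x**3, (x, 2, sp.oo)))               # 1/8

# Normalization: substituting t = 2 + x^2 turns the integral of f into
# log(2) * Integral(1/(t*log(t)**2), (t, 2, oo)), which should equal 1.
print(sp.integrate(1/(t*sp.log(t)**2), (t, 2, sp.oo)))   # 1/log(2)
```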
Zanella, G., Bédard, M., & Kendall, W. S. (2017). A Dirichlet Form approach to MCMC Optimal Scaling. Stochastic Processes and Their Applications, to appear, 22pp. http://doi.org/10.1016/j.spa.2017.03.021
Here is a puzzle which I often use as a starter for a course of lectures on probabilistic coupling. The puzzle arose during some research -- I devised it as a simple example to show myself why a particular idea would not work -- and I developed the solution using calculations. In a lunch queue, I asked James Norris of Cambridge whether he could think of a calculation-free answer, and he found one almost immediately.
Convergence in probability is the simplest form of convergence for random variables: for any positive ε it must hold that P[ |X_n - X| > ε ] → 0 as n → ∞. This kind of convergence is easy to check, though harder to relate to first-year-analysis convergence than the associated notion of convergence almost surely: P[ X_n → X as n → ∞ ] = 1.
Convergence in probability is implied by convergence almost surely (most direct proof: express convergence almost surely in terms of more elementary events, using countable unions and intersections, and do some simple reasoning on the result), but does not imply it. On the other hand a sequence of random variables that converges in probability always has a sub-sequence that converges almost surely. Moreover if every sub-sequence of a sequence of random variables contains a sub-sub-sequence that converges almost surely, then the original sequence converges in probability.
While studying the application of Dirichlet forms to Markov chain Monte Carlo (developing the work of Zanella et al., 2017), the following convergence-in-probability question arose:
Work on a given probability space (Ω, F, P). Suppose that random variables X_1, X_2, ... converge to X in probability.
Given a sub-σ-algebra G ⊆ F, is it the case that the random variables X_1, X_2, ... converge to X in G-conditional probability?
In other words, is it the case that, for every ε > 0,
P[ |X_n - X| > ε | G ] → 0 almost surely as n → ∞?
If the conditioning is simply a conditioning on a single event B of positive probability, then the answer is yes: writing A_n for the event [ |X_n - X| > ε ], consider that
P[ A_n | B ] = P[ A_n ∩ B ] / P[ B ] ≤ P[ A_n ] / P[ B ],
so P[ A_n | B ] will converge to zero if P[ A_n ] does.
However the general answer is, of course, no. In a nutshell, consider a sequence of random variables that converges in probability but not almost surely. Condition on the entire sequence(!), thus rendering it entirely deterministic. There is a positive chance that the conditioned sequence fails to converge; and if so then it cannot converge in (conditional) probability. We now give an explicit example of a sequence that converges in probability but not almost surely, and spell out the details of why conditional convergence in probability then fails.
Consider a Uniform random variable U defined on [0, 1), using the usual Lebesgue σ-algebra F. Define X_1, X_2, ... as follows: consider the ensemble of events [ (k-1) 2^(-m), k 2^(-m) ) for k = 1, ..., 2^m and m = 1, 2, ...; order these (all the events at level m = 1 first, then those at level m = 2, and so on) and let X_n be the indicator random variable of the n-th event, while X = 0. Then P[ X_n = 1 ] = 2^(-m) if X_n is the indicator of an event at level m, hence P[ |X_n - X| > ε ] → 0 whenever 0 < ε < 1. So certainly X_1, X_2, ... converge to X in probability. On the other hand, if G = F then each X_n is G-measurable, so that P[ |X_n - X| > ε | G ] = X_n itself for 0 < ε < 1; and almost surely the sequence X_1, X_2, ... contains infinitely many 1's as well as infinitely many 0's, so almost surely P[ |X_n - X| > ε | G ] cannot converge.
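A small simulation sketch (Python, my own illustration) makes both halves of the example vivid: the proportion of 1's among X_1, ..., X_N dwindles to zero, yet for any fixed sample U the sequence keeps returning to the value 1.

```python
import random

def dyadic_events(max_m):
    """Enumerate the intervals [(k-1)/2^m, k/2^m) for m = 1, ..., max_m."""
    for m in range(1, max_m + 1):
        for k in range(1, 2**m + 1):
            yield ((k - 1) / 2**m, k / 2**m)

U = random.random()                      # one sample from Uniform[0, 1)
xs = [1 if a <= U < b else 0 for a, b in dyadic_events(12)]

# Convergence in probability: P[X_n = 1] = 2^(-m) -> 0 along the enumeration.
print(sum(xs), "ones among", len(xs), "terms")           # 12 ones among 8190
# No almost-sure convergence: exactly one event at each level m contains U,
# so X_n = 1 infinitely often, however far out we look.
print("last index with X_n = 1:", max(i for i, v in enumerate(xs) if v == 1))
```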
More generally, this sort of problem arises whenever there is a G-measurable random variable whose distribution is not purely atomic, so that conditioning on G cannot be reduced to conditioning on the events of a countable partition.
Exactly the same argument shows that L^p convergence does not imply "conditional L^p convergence".
However the facts that a sequence converging in probability always contains a sub-sequence converging almost surely, and that a sequence converges in probability whenever every sub-sequence contains an almost surely convergent sub-sub-sequence, can be used to evade the issues raised here. For example, in the case of the application of Dirichlet forms to Markov chain Monte Carlo, even though convergence in probability is not preserved under conditioning, these considerations can be used to prove a strategic conditional CLT ...
Zanella, G., Bédard, M., & Kendall, W. S. (2017). A Dirichlet Form approach to MCMC Optimal Scaling. Stochastic Processes and Their Applications, to appear, 22pp. http://doi.org/10.1016/j.spa.2017.03.021