This is all standard physics. Consider the two-slit experiment (a light beam, two slits, and a screen), with y being the place on the screen that lights up. For simplicity, think of the screen as one-dimensional. So y is a continuous random variable.
Consider four experiments:
1. Slit 1 is open, slit 2 is closed. Shine light through the slit and observe where the screen lights up. Or shoot photons through one at a time, it doesn’t matter. Either way you get a distribution, which we can call p1(y).
2. Slit 1 is closed, slit 2 is open. Same thing. Now we get p2(y).
3. Both slits are open. Now we get p3(y).
4. Now run experiment 3 with detectors at the slits. You’ll find out which slit each photon goes through. Call the slit x. So x is a discrete random variable taking on two possible values, 1 or 2. Assuming the experiment has been set up symmetrically, you’ll find that Pr(x=1) = Pr(x=2) = 1/2.
You can also record y, thus you can get p4(y), and you can also observe the conditional distributions, p4(y|x=1) and p4(y|x=2). You’ll find that p4(y|x=1) = p1(y) and p4(y|x=2) = p2(y). You’ll also find that p4(y) = (1/2) p1(y) + (1/2) p2(y). So far, so good.
The problem is that p4 is not the same as p3. Heisenberg’s uncertainty principle: putting detectors at the slits changes the distribution of the hits on the screen.
This violates the laws of conditional probability, in which you have random variables x and y, and in which p(x|y) is the distribution of x if you observe y, p(y|x) is the distribution of y if you observe x, and so forth.
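To make the gap between p3 and p4 concrete, here is a minimal numerical sketch. It is a toy model and not a physical calculation: each slit is assumed to contribute a complex amplitude with the same Gaussian envelope and an opposite phase tilt, and the names `two_slit_densities`, `width`, and `fringe` are made up for this illustration. With detectors the densities add; without detectors the amplitudes add before squaring.

```python
import cmath
import math


def normalize(weights, dy):
    """Scale non-negative weights so they integrate to 1 on a grid of spacing dy."""
    total = sum(weights) * dy
    return [w / total for w in weights]


def two_slit_densities(n=2001, y_max=10.0, width=3.0, fringe=2.0):
    """Return the grid and toy versions of p1, p2, p3, p4 (assumed model)."""
    dy = 2 * y_max / (n - 1)
    ys = [-y_max + i * dy for i in range(n)]
    # Assumption: both slits share one Gaussian envelope on the screen.
    env = [math.exp(-y * y / (4 * width ** 2)) for y in ys]
    # Assumption: the two paths differ only by an opposite linear phase.
    psi1 = [e * cmath.exp(1j * fringe * y / 2) for e, y in zip(env, ys)]
    psi2 = [e * cmath.exp(-1j * fringe * y / 2) for e, y in zip(env, ys)]
    p1 = normalize([abs(a) ** 2 for a in psi1], dy)
    p2 = normalize([abs(a) ** 2 for a in psi2], dy)
    # Experiment 3: amplitudes add, then square (interference).
    p3 = normalize([abs(a + b) ** 2 for a, b in zip(psi1, psi2)], dy)
    # Experiment 4: densities add (classical mixture over the slit x).
    p4 = [0.5 * a + 0.5 * b for a, b in zip(p1, p2)]
    return ys, dy, p1, p2, p3, p4


ys, dy, p1, p2, p3, p4 = two_slit_densities()
largest_gap = max(abs(a - b) for a, b in zip(p3, p4))
print("largest pointwise gap between p3 and p4:", largest_gap)
```

Both p3 and p4 integrate to 1 on the grid, but p3 has fringes (including points where it is essentially zero) while p4 is a smooth bump, so no choice of mixing weights over x turns one into the other in this toy model.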
A dissenting argument (that doesn’t convince me)
To complicate matters, Bill Jefferys writes:
As to the two slit experiment, it all depends on how you look at it. Leslie Ballentine wrote an article a number of years ago in The American Journal of Physics, in which he showed that conditional probability can indeed be used to analyze the two slit experiment. You just have to do it the right way.
I looked at the Ballentine article and I’m not convinced. Basically he’s saying that the reasoning above isn’t a correct application of probability theory because you should really be conditioning on all information, which in this case includes the fact that you measured or did not measure a slit. I don’t buy this argument. If the probability distribution changes when you condition on a measurement, this doesn’t really seem to be classical “Boltzmannian” probability to me.
In standard probability theory, the whole idea of conditioning is that you have a single joint distribution sitting out there–possibly there are parts that are unobserved or even unobservable (as in much of psychometrics)–but you can treat it as a fixed object that you can observe through conditioning (the six blind men and the elephant). Once you abandon the idea of a single joint distribution, I think you’ve moved beyond conditional probability as we usually know it.
And so I think I’m justified in pointing out that the laws of conditional probability are false. This is not a new point with me—I learned it in college, and obviously the ideas go back to the founders of quantum mechanics. But not everyone in statistics knows about this example, so I thought it would be useful to lay it out.
What I don’t know is whether there are any practical uses to this idea in statistics, outside of quantum physics. For example, would it make sense to use “two-slit-type” models in psychometrics, to capture the idea that asking one question affects the response to others? I just don’t know.
"If the probability distribution changes when you condition on a measurement, this doesn't really seem to be classical "Boltzmannian" probability to me."
But how is this different from the case where you update your prior using Bayes' Theorem? In your first 3 cases you make one measurement, at y. In the fourth, you make 2 measurements, once at the slit, and once at y. So p4(y) has been updated twice, but p3(y) only once.
Quantum physics tells us that random variable X doesn't even exist in experiment 3, so why should we expect p3(y) to be equal to p4(y)?
At a deeper level, the basic physical quantity about which we have information is not the photon; it's the phase space configuration.
The Feynman Lectures on Physics has an excellent discussion of this point. See Vol. III, Chapter 1. One sentence sums up his explanation:
"If an apparatus is capable of determining which hole the electron goes through, it cannot be so delicate that it does not disturb the pattern in an essential way."
Measurements change the probability distributions.
Andrew: You have to be very careful when discussing quantum mechanics. There is no joint distribution! Your attempt to refute Ballentine's argument assumes that there is, but you are essentially assuming a "hidden variable" model of quantum mechanics; but it is well-known that such models do not work.
What we know is that the observations we make in physics have to be conditioned on the experimental setup. This is nothing new and has been well understood for a long time. That's all Ballentine is saying, and he's right.
Perhaps I'm missing the intended point, but…
There are at least two reasons for P3(y) and P4(y) to be different; either they are describing different variables (i.e. experiments 3 and 4 are different physical systems), or the laws of conditional probability are false. I really don't see how the y in experiments 3 and 4 could be considered to be the same variable, other than you chose the names to be the same. Likewise for experiments 1 and 2, though there is a variable named y in each of them, they are not inherently the same y as in experiments 3 and 4. They only become the same variable when you show that they measure the same physical system.
Perhaps I am missing some vital background or I am taking your title too seriously (quite possible!), but I do not see how the behavior of a physical system can falsify a consequence of the axioms of standard probability. Sure, the system cannot be adequately modeled with conditional probability (one may argue), but that no more falsifies them than the failure of any mathematical model to describe a physical system, unless you take conditional probability to be less a model and more a claim about the nature of the universe. It would be perhaps more accurate to say that conditional probability fails to model certain quantum phenomena. I suspect that Cosma might have some interesting insights into this issue.
Ben Marlin (formerly at University of Toronto, now UBC) has done a lot of interesting work on non-random missing data models, which has a similar flavor to this — conditioning on which data elements you observe (not their values) affects the underlying distributions. One of the practical applications is in collaborative filtering, where people tend to rate movies (or songs, books, etc) that they like more frequently than ones that they don't like. Ben collected a data set in conjunction with Yahoo music to confirm this bias, and he has several models in his thesis that show how explicitly modeling this relationship between response vectors and ratings can improve (among other things) recommendation system predictions. Take a look at chapter 5 of the Ph.D. thesis:
http://people.cs.ubc.ca/~bmarlin/research/researc…
Quantum probability is inherently different from classical probability.
The state of a quantum system is described by a point in an abstract Hilbert space or, equivalently, as a function, called a wavefunction, of either position or momentum variables in a complex L2 space. The square of the modulus of a wavefunction has the properties of a probability density on positions or momenta of the system and it is interpreted as such.
A classical state is described as a point in a Euclidean space. A point in this space gives the positions and momenta of all particles in the system. This is called the phase space of the system.
Wigner came up with a way to represent a quantum state by a "probability density" over the system's corresponding classical phase space. It has the appropriate marginal distributions over positions and momenta, i.e. squares of the moduli of the corresponding wavefunctions, but it differs from a classical pdf in that it can be negative in some regions of the phase space. The negativity is tied to the issues of simultaneous measurements of non-commuting observables, which position and momentum are. Hence quantum probability is essentially different from classical probability.
Bell's analysis of the EPR thought experiment, which involves what Einstein, the E in EPR, called spooky action at a distance, showed that no classical distribution over a phase space can account for the results predicted by quantum mechanics. Physical experiments have shown that the quantum predictions are correct. A nice expository article on this by David Mermin can be found here:
http://scitation.aip.org/vsearch/servlet/VerityServlet?KEY=PHTOAD&smode=results&sort=rel&maxdisp=25&origquery=(mermin)+&disporigquery=(mermin)+&threshold=0&pjournals=PHTOAD&pyears=&possible1=mermin&possible1zone=article&possible3=mermin+&possible3zone=author&bool3=and&OUTLOG=NO&viewabs=PHTOAD&key=DISPLAY&docID=78&page=4&chapter=0
So if you want to employ quantum probability where classical probability is currently used you would, in my opinion, have to demonstrate that the results obtained from the former are in better agreement with the world than are the latter. Only then could you argue that not just the laws of conditional probability are wrong, but all of classical probability is wrong, or at least inappropriate in some limited circumstances.
Chris Fuchs, now at the Perimeter Institute, has been tackling that question from the other side: how should quantum states and quantum measurements be modeled so that probabilistic inference is sound for quantum phenomena. He has a series of papers on arxiv.org. I've read some of the earlier ones, but there's a very recent update that I've not read yet: Quantum-Bayesian Coherence, with Rüdiger Schack. Here's part of their abstract, which seems very relevant to your question:
This type of thing is pretty interesting. I don't think that the laws of conditional probability are false, however. I think that the assumption that the photon passes through one slit in the absence of a detector is what is suspect. In the Copenhagen interpretation, it is *as if* the photon goes through both slits when both slits are open and there is no detector. This renders the marginalization over x in {1,2} irrelevant as we are really dealing with a different system when the detector is there and when it isn't. In other words (and speaking loosely), when we don't have a detector, the x in the joint distribution is not restricted to taking values in {1,2}, giving rise to p3. When we do have a detector this forces x to be in {1,2} and thus alters the distribution p4.
When a physicist told me about this experiment I was really confused as I had (like most statisticians, I expect) assumed a (local) hidden variable theory to hold but this has been shown to be impossible. There is a non-local hidden variable theory, apparently, but I don't understand even the Wikipedia page on it!
This page gives a fascinating description with very little jargon of the type of behaviour one sees in a double slit experiment with or without detection:
http://grad.physics.sunysb.edu/~amarch/
The delayed erasure stuff is particularly strange, even after reading everything up to that point.
Andrew,
This is not quite right.
First, this is not Heisenberg's uncertainty principle, which is: Δx Δp ≥ ħ/2
Second, there is no contradiction with ordinary probability theory. You just have to reason more carefully.
I suggest you have a look at "Consistent Quantum Theory" by Robert Griffiths or even
http://en.wikipedia.org/wiki/Consistent_histories
Best wishes
Larry
As a quick, not-having-thought-too-deeply-about-it response to your objection to conditioning on the measurement, I'd say that the result of the measurement changes your information in a way that makes p3 and p4 probability distributions for different propositions. When you run the experiment with the detectors, you always detect the photon at one and only one slit; p4 is therefore the distribution of y conditional on the photon passing through one or the other of the slits, but not both (1 xor 2). When you run the experiment without the detectors, what you observe is the probability distribution for y, given that the photon passed through slit 1 or slit 2 or both. The difference between the xor and the inclusive or affects the proposition you're conditioning on.
The experiment is not a counterexample to the laws of probability theory (or at least not in a way that's obvious to me), but rather a validation of the wave-particle duality posited by quantum mechanics. The uncertainty principle guarantees that by measuring the position of the photon, you destroy the coherence that produces the interference fringes you see in p3.
While checking this out, I stumbled upon some interesting literature extending classical probability theory into a quantum form, including (of course) quantum conditional probability. This seems an excellent summary, and this SEP article has much to chew on. I suspect, but have not done the hard work to find out for sure, that such a system on the quantum level cancels out, nullifies, or otherwise collapses into classical probability at the macro level in most cases, as "weird" quantum phenomena tend to do.
I see the situation as likely being much like Euclidean geometry and non-Euclidean geometry following Einstein's famous application of the latter. The former becomes something of a special case of the latter.
Then again, my physics is pretty rusty.
I hadn't expected you to grow French post-modern tastes so soon…
Even better, these course notes discuss frequentist and Bayesian takes on quantum probability, as well as conditioning on measurement, collapse of the wave function and quantum probability as non-commutative probability and much more.
You are a brave man tossing this out there.
It is different than you describe. In Test 4 you are not getting the distribution of photons that you have detected going through a given slit. What you are getting is the distribution of photons that did not go through the detector's slit. That is, there is no detector that will tell if a photon "went through a slit" without stopping it (i.e. interacting and entangling), so what is generally done is to figure out where they went and then look at the resulting distribution as if they all went through the other slit. You literally physically cannot do what you say in 4. That is why there is what is called entanglement.
This is normal QM, but I don't see why you think 4 is the same as 3 and that you could apply normal conditional rules. It is NOT the same; that is one of the big things we learned when we learned QM. So you are applying a rule that doesn't apply in this situation and you get a wrong answer … why is this even surprising? In fact in 4 you will get exactly the same results as in 1 and 2 added together. What would you expect? It is 1 or 2 run over and over. There the conditional rules work perfectly.
Test 3 is the odd man out. It is fundamentally different. It is not the same as one or the other slit open, so the fact that there is a different distribution is to be expected. What is the "conditional probability" in Test 3? The photon does not go through one or the other slit in test 3 so what exactly are you conditioning on?
All glibness aside, when you start monitoring the slits, you introduce biases caused by the observer. How do you treat bias in Bayesian analysis? Is there even such a thing?
The consequence of detectors at the individual slits changing the measurements isn't Heisenberg's Uncertainty Principle. The latter only says you can't measure both position and velocity precisely. They're related phenomena (measuring requires observation, and the Copenhagen interpretation of quantum mechanics uses this to explain the observer effect), but they're not the same.
Probability, especially conditional probability, is used all the time in quantum mechanics. It just requires working in a particular model. Your explanation implicitly assumes that the observation act doesn't effectively change the distributions. That is true at the sort of macro level of real-world (human-sized) statistics and theoretical statistics where you can have atomic, precisely known values. At the quantum level, the observation act also contributes to the distribution, because it requires using particle/waves/space aliens/whatever at the same scale as the thing you're trying to measure.
From what you said, it seems the laws of conditional probability hold: When running an experiment in which you know which slits the photons go through, then
p4(y) = .5( p1(y) + p2(y) )
Shining the light through the slits without recording (and so interfering with) the photons' locations is a different experiment. It's weird that p4!=p3, but I don't see that this says anything along the lines of "The Laws of Conditional Probability are False" and that we should stop teaching p(y) = int p(y|x)p(x)dx. Rather, the results simply say that the two experiments are in fact different. The results of the two slit experiment didn't make physicists reject the hypothesis that the rules of conditional probability are true; rather, physicists rejected the hypothesis that knowing which slit the photon went through doesn't matter.
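As a rough check that the usual rule p(y) = int p(y|x)p(x)dx does hold inside experiment 4, here is a small simulation sketch. The Gaussian shapes for p1 and p2 (centered at -1 and +1) and the function name `simulate_experiment_4` are assumptions chosen only for illustration; nothing here models experiment 3.

```python
import random


def simulate_experiment_4(n, seed=0):
    """Draw (x, y) pairs: x is the detected slit, y the screen position."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 2      # Pr(x=1) = Pr(x=2) = 1/2
        center = -1.0 if x == 1 else 1.0        # assumed centers of p1 and p2
        draws.append((x, rng.gauss(center, 1.0)))
    return draws


def mean(values):
    return sum(values) / len(values)


draws = simulate_experiment_4(100_000)
ys_all = [y for _, y in draws]
ys_slit1 = [y for x, y in draws if x == 1]
ys_slit2 = [y for x, y in draws if x == 2]
share1 = len(ys_slit1) / len(draws)

# Recombine the conditional means with the observed slit frequencies.
recombined = share1 * mean(ys_slit1) + (1 - share1) * mean(ys_slit2)
print("marginal mean of y:", mean(ys_all))
print("recombined from conditionals:", recombined)
```

The two printed numbers agree up to floating-point rounding, which is the sample version of marginalizing over x. The simulation can do this because it draws x and y from a single joint distribution, which is exactly what is in dispute for experiment 3.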
Wave-particle duality and the results of the two slit experiment are hard to have an intuition about. Your blog posting reminds me of one of my favorite passages from one of Feynman's books:
"The difficulty (with the two-slit experiment) really is psychological and exists in the perpetual torment that results from your saying to yourself, `But how can it be like that?' which is a reflection of the uncontrolled but utterly vain desire to see it in terms of something familiar."
"There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time. There might have been a time when only one man did…but after people read the paper a lot of people understood, certainly more than twelve. On the other hand, I think I can safely say that nobody understands quantum mechanics."
This may reflect my complete ignorance of the Physics issues, but I do not see the difficulty: if observing X modifies the distribution of Y, ie if we need to account for a binary variable Z as the indicator that X is observed, we are indeed talking of two different distributions, p3(y) and p4(y)… So p(y|x=1)=p1(y) but p(y|x=1,x is observed to be 1) is **not** p1(y), your derivation in 4 is actually the construction of p3(y), while p4(y) should be the marginal of p(x)p(z|x)p(y|x,z)…
«For example, would it make sense to use "two-slit-type" models in psychometrics, to capture the idea that asking one question affects the response to others? I just don't know.»
But other people may know. I suspect that one way your argument might be reformulated is that classic conditional probability theory is based on extensional logic, and quantum and "one question affects the response to others" probability theory is based on intensional (modal) logic.
That most of ordinary maths is based on extensional logic and that there are several varieties of intensional logic is surprisingly little known even by mathematicians and physicists (and programmers, who unwittingly make use of intensional logic every time they assign a value to a variable).
BTW when you write "you have a single joint distribution sitting out there–possibly there are parts that are unobserved or even unobservable (as in much of psychometrics)–but you can treat it as a fixed object that you can observe through conditioning" I think the terminology is imprecise; what we have is a (possibly unknowable) population, and samples, and we attempt to infer from summaries of the sample the properties of summaries of the population.
In your "one question affects the response to others" example you are making use of something similar if the *order* of the questions (as you imply) matters. Because if that happens, that means that an operation of sampling makes an inelastic modification to the summaries of the population (e.g. it may alter the mind state of the members of the population), and thus measurements have time (ordinal or cardinal) dependent effects (note that this is different from extracting balls from an urn).
Are the laws of conditional probability false, or do they simply not apply to the realm of particle physics?
Wow, you take an experiment, apply an inapplicable theory (Newtonian mechanics) to interpret the results, and then use this to falsify the laws of conditional probability!
Why stop there? Describe the quantum bomb detector, and use the results to prove that the entire field of probability is false.
The effect in psychometrics is called priming. This is an entire course in Political Psychology. The most often cited (related) work in public opinion is Zaller (1992) (but also see Jamie Druckman or Paul Brewer's work for more direct discussion from psychologists' perspectives).
When you ask a question (in other words, observe at the slits) you are priming particular frames of reference and stories and thus altering responses not only to the question just asked, but to all subsequent questions as respondents consciously or subconsciously create a story arc of the survey and inferences about what vague questions are really asking. It can be controlled but not eliminated. Often, the most critical dependent variables or most vague but interesting questions are asked first to get an unprimed response, followed by questions about more specific domains.
When we ask, for example, a very general question: "Please rate your feelings about country X on a thermometer from 0 to 100, with 100 being very warm, friendly, and 0 being very cold, hostile, and 50 being neutral…" and so on, we have absolutely no idea what matrix of variables X the respondent is pulling from to answer y, and we assume that most respondents don't either (otherwise we'd just ask them, and political psychologists like me would be out of a job). That is what we try to model after the fact with no expectation whatsoever of explaining all of the variance. But, if you ask question 1 about America's debt to China, question 2 about a conflict between China and Taiwan, and then as question 3 you ask the thermometer question above about China, you have increased the probability of X containing thoughts about debt and Taiwan, and the probability of y being a response to those questions. But that is not the case for all respondents. Some will still pull from any number of potential responses.
In reality, y has multiple distributions depending on X, and there is no observable "real" distribution of y, even when X is set to zero, as slight changes in the question wording, inter alia, can alter the distribution of y.
We understand this problem and don't freak out about it too much because of the squishiness of what we measure. Political psychologists don't expect to be able to make direct observations because our tools are not sharp enough, so expectations about the conditional probability are always tempered by unexplained or unaccounted for variables. We assume going in that error is going to be fairly large (and pray that it is going to be normally distributed) because of what we measure, not because we measured it.
This makes me much more sympathetic to the unobserved variable rebuttal to your argument. You don't have a full conditional probability until you have accounted for all sets of conditions. Political psychologists know that we can't account for everything, though we do believe we are measuring something real.
Most physicists believe in probability theory, including its application to conditional probabilities, and also believe in quantum mechanics. You have to be careful about state space, but they do believe there is a joint probability distribution over the right set of states. There are disputes about the right way to resolve this, and my horse in this race is here. But make no mistake, most physicists believe that probability theory will hold however quantum mechanics is resolved.
I'd hope conditional probability would break down because this is where our understanding in physics breaks down as well. Can't really have the one without the other. And you didn't talk about the real problem that these effects are seen over time as though individual particles somehow know where to go or fit themselves to a pre-existing pattern that depends on which way we choose to look at things.
Robin: The link you posted does not work.
Previous commenter left an incorrect link to his own site. The correct link is http://hanson.gmu.edu/mangledworlds.html.
My understanding of the physics is incomplete, but I believe that the particle is actually treated as a probability distribution, but a probability distribution that includes, erm, imaginary (i.e. complex, i.e., sqrt(-1)) components.
And there is already work applying quantum probability to judgment and decision making, largely pioneered by Jerome Busemeyer.
Isn't this just an issue of the detector affecting the behavior of the experiment, and therefore violating the necessary assumptions for the conditional probability laws?
Does this not just say P(y|x) is not equal to P(y|x and there's a piece of observing equipment there affecting the behaviour of the particles), which doesn't violate any fundamental conditional probability laws?
Bill: That's my point: there is no joint distribution, but when we use classical probability theory (of the sort that we use all the time to analyze surveys, experiments, observational studies, etc.) we are assuming that there is an underlying joint distribution.
Again, I know this is not at all a new point (and I agree with the earlier commenter that the Feynman lectures give a good discussion). But I think it's a point worth making.
To many of the other commenters: You're reminding me how, in different ways, the joint distribution model is inappropriate to the two-slit experiment. That's exactly my point: that the classical probability model, which we use all the time, is inappropriate for modeling quantum uncertainty.
Again, when we teach probability and statistics, we typically imply that uncertainty can always be represented probabilistically. The question is only whether we're using an inappropriate model. But here's a case in which no joint distribution does the trick.
John Taylor writes: "It would be perhaps more accurate to say that conditional probability fails to model certain quantum phenomena." I'll accept that. By "The laws of conditional probability are false," what I meant was, "The laws of conditional probability are not always true."
Danny: Thanks for the link.
Larry: When you say I "just have to reason more carefully," I think you mean that if I include more information in the joint distribution, I can set this up as a coherent probability model. That's fine, but then you're going beyond the usual way we model things probabilistically. If you have to add a new random variable every time you condition on a measurement, I don't think of this as being the same sort of probability theory that we usually use. I'd call this a generalization of classical probability theory, which is my point: classical probability theory (which we use all the time in poli sci, econ, psychometrics, astronomy, etc) needs to be generalized to apply to quantum mechanics. Which makes me wonder if it should be generalized for other applications too.
And this is definitely related to Heisenberg's uncertainty principle–my point was not that it was exactly the same but that it was the same concept (as discussed in Malcom's comment). The two-slit experiment is just a simple tabletop-experiment way to focus on the key issues here.
John: Thanks for the link. Among other things, this just reminds me that physicists are smarter than the rest of us (or, at least, that they have more practice thinking hard about applied mathematics.)
Markk: You write, "I don't see why you think 4 is the same as 3 and that you could apply normal conditional rules." Again, the real point of this is not quantum mechanics–we're all in agreement about how to get right and wrong answers for the two-slit experiment–but rather that . . . what if the classical hidden-variable logic that gives the wrong answer for quantum problems, also gives wrong answers for other problems in applied statistics. We typically assume that joint distribution models can always be applied, and clearly this isn't true.
Peter, Christian: Again, yes, if you accept that taking a measurement changes the distribution, there's no problem at all! But in our everyday use of statistical modeling, we don't do that. We talk about measurement being the same as conditioning. (Think of those classical problems in conditional probability, such as the rare-disease problem or the three-envelopes problem.) In quantum mechanics, we have the experimental data and so we know not to do this reasoning. But what about other settings?
Gorobei: I do think "the entire field of probability is false," if I am following the mathematical convention that "not always true" = "false." (For example, the statement x^2 = 2*x is false. It happens to be true for x=2, but it is not always true, so we say it's false.)
David: Interesting discussion. Priming does seem like a good analogy–it's an example where it's fruitful to introduce new variables to the model to include the measurement process itself.
Winter: Thanks for the link. This looks exactly like the sort of thing I was looking for: an application of a generalized, "quantum" version of probability theory. I'll have to take a look.
Richard: See response to Peter and Christian above.
Andrew, I cannot understand why, if you admit that there's no problem once you accept that taking a measurement changes the distribution, you still think that this means the laws of conditional probability are false. Sure, we don't encounter this in everyday life in statistics, but we don't encounter a lot of things that are the norm in everyday life. Statistics is always all about adjusting the model to be better based on your knowledge. If you know that taking a measurement changes the distribution, you adjust for that. The fact that you have to adjust doesn't mean that the mechanism is false.
I rethought my questions and realized that I didn't quite get what you were asking. It is a weird form of conditioning that happens when an observer is stuck into a quantum mechanical system. Multiplication, as we think of it arithmetically, isn't what we are doing when we calculate the probability of outcomes then. [And you beat me to the punch with your reply to us this morning; re: more general procedures for conditioning].
By the way, did you read Steve Hsu's Black Hole Information/Decoherence slides? Slide 5 is begging for you to take it on.
You could also think of it this way. P3 only applies when it went through both slits. If you measure it going through one or the other, it didn't go through both; it went through the one you measured. P3 comes from the interference (with itself) of the wavefunction going through both. So there really are 3 cases.
1) Photon through slit one.
2) Photon through slit two.
3) Photon through both.
Physicists usually don't call it going through both, but it is a critical part of the experiment that it is able to. And the wavefunction certainly does, which is why you include both paths in the calculation and it interferes with itself. If the wavefunction cannot go through both slits (and a measurement of the photon going through prevents this) then you only go through one and get no interference.
If your detectors don't always catch the photon going through the slits, say just measure the photon about half the time, you'll end up with
1/4 * P1 + 1/4 * P2 + 1/2 * P3
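Taking the commenter's formula at face value, the half-the-time case can be written as a mixture indexed by the detection probability. The sketch below is only an illustration: the general form q/2 * P1 + q/2 * P2 + (1 - q) * P3 is an assumed extrapolation from the q = 1/2 case given above, the function name is hypothetical, and the three-bin distributions are invented numbers.

```python
def partial_detection_mixture(p1, p2, p3, detect_prob):
    """Mix P1, P2, P3 assuming a photon is detected with probability detect_prob.

    Assumption: a detected photon is equally likely to be at either slit,
    and an undetected photon follows the interference distribution P3.
    """
    if not 0.0 <= detect_prob <= 1.0:
        raise ValueError("detect_prob must be between 0 and 1")
    w_slit = detect_prob / 2.0
    w_both = 1.0 - detect_prob
    return [w_slit * a + w_slit * b + w_both * c for a, b, c in zip(p1, p2, p3)]


# Invented three-bin screen distributions, for illustration only.
P1 = [0.5, 0.3, 0.2]
P2 = [0.2, 0.3, 0.5]
P3 = [0.45, 0.1, 0.45]

half = partial_detection_mixture(P1, P2, P3, 0.5)   # weights 1/4, 1/4, 1/2
always = partial_detection_mixture(P1, P2, P3, 1.0) # reduces to p4
never = partial_detection_mixture(P1, P2, P3, 0.0)  # reduces to P3
print(half, always, never)
```

Under these assumptions the two extremes recover experiments 4 and 3, and everything in between is an ordinary classical mixture once the detection outcome is treated as part of the setup.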
I'm sorry if it's been covered, but in psychology questionnaires, sometimes the ordering of the questions changes responses. For example, asking "how happy are you?" and then "how successful is your dating life?" is different from "how successful is your dating life?" and then "how happy are you?" The idea being that in the 2nd ordering, the happiness measure then becomes conditional on the dating success.
In this example, we're trying to measure the joint distribution of happiness and dating success, but in the 2nd ordering, we're probably getting a conditional response.
W2: Exactly. My question is whether a quantum-probability-style model would make sense in such a setting.
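For readers wondering what a quantum-probability-style model of question order could look like, here is a minimal sketch. It is an assumed toy setup, not a model taken from the literature discussed here: the respondent's state is a unit vector in the plane, a "yes" to each question is a projection onto that question's axis, and the angles are arbitrary.

```python
import math


def unit(angle_deg):
    """Unit vector in the plane at the given angle."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def prob_yes_then_yes(state, first, second):
    """Probability of 'yes' to the first question and then 'yes' to the second.

    After a 'yes' to the first question the state is assumed to collapse
    onto that question's axis, so the second probability depends on the order.
    """
    p_first = dot(first, state) ** 2
    p_second_given_first = dot(second, first) ** 2
    return p_first * p_second_given_first


state = unit(60.0)       # hypothetical respondent state
happy_yes = unit(0.0)    # hypothetical axis for "yes" to the happiness question
dating_yes = unit(45.0)  # hypothetical axis for "yes" to the dating question

happy_first = prob_yes_then_yes(state, happy_yes, dating_yes)
dating_first = prob_yes_then_yes(state, dating_yes, happy_yes)
print("happy then dating:", happy_first)
print("dating then happy:", dating_first)
```

With these angles the two orderings give roughly 0.125 and 0.467 for the same pair of "yes" answers, so no single joint distribution over the two answers reproduces both orders. Whether such a model fits survey data better than a classical model with an explicit order variable is an empirical question this sketch does not address.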
This is a great piece. I might also note a very interesting treatment of the bemusements of quantum physics by Tom Stoppard in his play 'Hapgood'.
Nary / New Delhi / India
I would second the suggestion of a previous poster: Read Robert Griffiths' "Consistent Quantum Theory" or Roland Omnes' books on the subject. We're not in the 1920's or 1960's. And "quantum logic" has not shown itself to be useful as an attempt for an interpretation of quantum mechanics.
I missed Andrew's last comment, so this is very late.
Andrew, you commented:
"Bill: That's my point: there is no joint distribution, but when we use classical probability theory (of the sort that we use all the time to analyze surveys, experiments, observational studies, etc.) we are assuming that there is an underlying joint distribution."
Not every classical probability situation has a joint distribution, as Jim Berger pointed out to me (when I first mentioned the quantum mechanics example to him). You can have conditional distributions without a joint distribution. For example, x ~ N(y,1), y ~ N(x,1) has perfectly good conditional distributions, but there is no underlying joint distribution (you can't normalize the obvious distribution).
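A quick numerical sketch of why the normalization fails. It assumes that "the obvious distribution" means the unnormalized density exp(-(x - y)^2 / 2), which is what both conditionals are proportional to; the function name `box_mass` and the grid step are choices made for this illustration.

```python
import math


def box_mass(half_width, step=0.25):
    """Midpoint-rule integral of exp(-(x - y)^2 / 2) over a square box.

    The box is [-half_width, half_width] in both x and y.
    """
    n = int(round(2 * half_width / step))
    centers = [-half_width + (i + 0.5) * step for i in range(n)]
    total = 0.0
    for x in centers:
        for y in centers:
            total += math.exp(-0.5 * (x - y) ** 2)
    return total * step * step


for half_width in (10, 20, 40):
    print(half_width, box_mass(half_width))
```

The total mass roughly doubles each time the box doubles, so it grows without bound and there is no constant that turns this function into a proper joint density, even though each conditional is a perfectly good normal distribution.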
In the case of quantum mechanics, the situation is more subtle, of course. The lack of an underlying joint distribution isn't simply an inability to normalize. The paper I showed Jim is one, by Mermin, I believe, that described the famous Kochen-Specker theorem in a more transparent way.
http://en.wikipedia.org/wiki/Kochen-Specker_theor…
In this example you can construct explicitly a proof of the non-existence of an underlying joint distribution.
Interestingly, the Kochen-Specker theorem has recently been confirmed experimentally. Both physicists are still alive, and I think this is Nobel-worthy work.
I see that I already mentioned this in another blog entry:
http://www.stat.columbia.edu/~cook/movabletype/ar…
Apologies, Andrew!