
Dr. Popper: Or How I Learned to Stop Worrying and Love Metaphysics

Introduction to Falsificationism

Although his reputation among philosophers was never quite as exalted as it was among non-philosophers, Karl Popper was a pre-eminent figure in 20th century philosophy. As a non-philosopher, I won’t attempt to adjudicate which take on Popper is the more astute, but I think I can at least sympathize, if not fully agree, with philosophers who believe that Popper is overrated by non-philosophers. In an excellent blog post, Philippe Lemoine gives a good explanation of why philosophers look askance at falsificationism, Popper’s most important contribution to philosophy.

According to Popper, what distinguishes or demarcates a scientific statement from a non-scientific (metaphysical) statement is whether the statement can, or could be, disproved or refuted – falsified (in the sense of being shown to be false, not in the sense of being forged, misrepresented or fraudulently changed) – by an actual or potential observation. Vulnerability to potentially contradictory empirical evidence, according to Popper, is what makes science special, allowing it to progress through a kind of dialectical process of conjecture (hypothesis) and refutation (empirical testing) leading to further conjecture and refutation and so on.

Theories purporting to explain anything and everything are thus non-scientific or metaphysical. Claiming to be able to explain too much is a vice, not a virtue, in science. Science advances by risk-taking, not by playing it safe. If you’re not willing to put your theory at risk by saying that this and not that will happen — rather than saying that this or that will happen — you are playing it safe, and trying to explain too much is a way of playing it safe. This view of science, portrayed by Popper in modestly heroic terms, was not unappealing to scientists, and in part accounts for the positive reception of his work among them.

But this heroic view of science, as Lemoine nicely explains, was just a bit oversimplified. Theories never exist in a vacuum; there is always implicit or explicit background knowledge that informs and provides context for the application of any theory from which a prediction is deduced. To deduce a prediction from any theory, background knowledge, including complementary theories presumed to be valid for purposes of making the prediction, is necessary. Any prediction therefore relies not on a single theory alone but on a system of related theories and auxiliary assumptions.

So when a prediction is deduced from a theory, and the predicted event is not observed, it is never unambiguously clear which of the multiple assumptions underlying the prediction is responsible for the failure of the predicted event to be observed. The one-to-one logical dependence between a theory and a prediction upon which Popper’s heroic view of science depends doesn’t exist. Because the heroic view of science is too simplified, Lemoine considers it false, at least in the naïve and heroic form in which it is often portrayed by its proponents.

But, as Lemoine himself acknowledges, Popper was not unaware of these issues and actually dealt with some, if not all, of them. Popper therefore dismissed such criticisms, pointing to his various acknowledgments of, and even anticipations of and responses to, them. Nevertheless, his rhetorical style was generally not to qualify his position but to present it in stark terms, thereby reinforcing the view of his critics that he actually did espouse the naïve version of falsificationism that, only under duress, would be toned down to meet the objections raised against the usual unqualified version of his argument. Popper, after all, believed in making bold conjectures and framing a theory in the strongest possible terms, and he characteristically adopted an argumentative and polemical stance in staking out his positions.

Toned-Down Falsificationism

In his toned-down version of falsificationism, Popper acknowledged that one can never know whether a prediction fails because the underlying theory is false, because one of the auxiliary assumptions required to make the prediction is false, or because of an error in measurement. But that acknowledgment, Popper insisted, does not refute falsificationism, because falsificationism is not a scientific theory about how scientists do science; it is a normative theory about how scientists ought to do science. The normative implication of falsificationism is that scientists should not try to shield their theories from empirical disproof by making just-so adjustments through ad hoc auxiliary assumptions, e.g., ceteris paribus assumptions. Rather, they should accept the falsification of their theories when confronted by observations that conflict with the implications of those theories and then formulate new and better theories to replace the old ones.

But a strict methodological rule against adjusting auxiliary assumptions or making further assumptions of an ad hoc nature would have ruled out many fruitful theoretical developments resulting from attempts to account for failed predictions. For example, the planet Neptune was discovered in 1846 by scientists who posited (ad hoc) the existence of another planet to explain why the planet Uranus did not follow its predicted path. Rather than conclude that the Newtonian theory was falsified by the failure of Uranus to follow the orbital path predicted by Newtonian theory, the French astronomer Urbain Le Verrier posited the existence of another planet that would account for the path actually followed by Uranus. Now in this case, it was possible to observe the predicted position of the new planet, and its discovery in the predicted location turned out to be a sensational confirmation of Newtonian theory.

Popper therefore admitted that making an ad hoc assumption in order to save a theory from refutation was permissible under his version of normative falsificationism, but only if the ad hoc assumption was independently testable. But suppose that, under the circumstances, it would have been impossible to observe the existence of the predicted planet, at least with the observational tools then available, making the ad hoc assumption testable only in principle, but not in practice. Strictly adhering to Popper’s methodological requirement of being able to test independently any ad hoc assumption would have meant accepting the refutation of the Newtonian theory rather than positing the untestable — but true — other-planet hypothesis to account for the failed prediction of the orbital path of Uranus.

My point is not that ad hoc assumptions made to save a theory from falsification are unobjectionable, but that a strict methodological rule requiring rejection of any theory once it appears to be contradicted by empirical evidence, and prohibiting the use of any ad hoc assumption to save the theory unless the ad hoc assumption is independently testable, might well lead to the wrong conclusion, given the nuances and special circumstances associated with every case in which a theory seems to be contradicted by observed evidence. Such contradictions are rarely so blatant that the theory cannot be reconciled with the evidence. Indeed, as Popper himself recognized, all observations are themselves understood and interpreted in the light of theoretical presumptions. It is only in extreme cases that evidence cannot be interpreted in a way that more or less conforms to the theory under consideration. At first blush, the Copernican heliocentric view of the world seemed obviously contradicted by direct sensory observation: the earth seems flat and stationary, and the sun seems to rise and set. Empirical refutation could be avoided only by providing an alternative interpretation of the sensory data that could be reconciled with the apparent — and obvious — flatness and stationarity of the earth and the movement of the sun and moon in the heavens.

So the problem with falsificationism as a normative theory is that it’s not obvious why a moderately good, but less than perfect, theory should be abandoned simply because it’s not perfect and suffers from occasional predictive failures. To be sure, if a better theory than the one under consideration is available, one that predicts correctly whenever the latter predicts correctly and predicts more accurately whenever the latter fails, the alternative theory is surely preferable, but that simply underscores the point that evaluating any theory in isolation is not very informative. After all, every theory, being a simplification, is an imperfect representation of reality. It is only when two or more theories are available that scientists must try to determine which of them is preferable.

Oakeshott and the Poverty of Falsificationism

These problems with falsificationism were brought into clearer focus by Michael Oakeshott in his famous essay “Rationalism in Politics,” which, though not directed at Popper himself (Oakeshott was Popper’s colleague at the London School of Economics), can be read as a critique of Popper’s attempt to prescribe methodological rules for scientists to follow in carrying out their research. Methodological rules of the kind propounded by Popper are precisely the sort of supposedly rational rules of practice, intended to ensure the successful outcome of an undertaking, that Oakeshott believed to be ill-advised and hopelessly naïve. The rationalist conceit, in Oakeshott’s view, is that there are demonstrably correct answers to practical questions and that practical activity is rational only when it is based on demonstrably true moral or causal rules.

The entry on Michael Oakeshott in the Stanford Encyclopedia of Philosophy summarizes Oakeshott’s position as follows:

The error of Rationalism is to think that making decisions simply requires skill in the technique of applying rules or calculating consequences. In an early essay on this theme, Oakeshott distinguishes between “technical” and “traditional” knowledge. Technical knowledge is of facts or rules that can be easily learned and applied, even by those who are without experience or lack the relevant skills. Traditional knowledge, in contrast, means “knowing how” rather than “knowing that” (Ryle 1949). It is acquired by engaging in an activity and involves judgment in handling facts or rules (RP 12–17). The point is not that rules cannot be “applied” but rather that using them skillfully or prudently means going beyond the instructions they provide.

The idea that a scientist’s decision about when to abandon one theory and replace it with another can be reduced to the application of a Popperian falsificationist maxim ignores all the special circumstances and all the accumulated theoretical and practical knowledge that a truly expert scientist will bring to bear in studying and addressing such a problem. Here is how Oakeshott addresses the problem in his famous essay.

These two sorts of knowledge, then, distinguishable but inseparable, are the twin components of the knowledge involved in every human activity. In a practical art such as cookery, nobody supposes that the knowledge that belongs to the good cook is confined to what is or what may be written down in the cookery book: technique and what I have called practical knowledge combine to make skill in cookery wherever it exists. And the same is true of the fine arts, of painting, of music, of poetry: a high degree of technical knowledge, even where it is both subtle and ready, is one thing; the ability to create a work of art, the ability to compose something with real musical qualities, the ability to write a great sonnet, is another, and requires in addition to technique, this other sort of knowledge. Again these two sorts of knowledge are involved in any genuinely scientific activity. The natural scientist will certainly make use of observation and verification that belong to his technique, but these rules remain only one of the components of his knowledge; advances in scientific knowledge were never achieved merely by following the rules. . . .

Technical knowledge . . . is susceptible of formulation in rules, principles, directions, maxims – comprehensively, in propositions. It is possible to write down technical knowledge in a book. Consequently, it does not surprise us that when an artist writes about his art, he writes only about the technique of his art. This is so, not because he is ignorant of what may be called the aesthetic element, or thinks it unimportant, but because what he has to say about that he has said already (if he is a painter) in his pictures, and he knows no other way of saying it. . . . And it may be observed that this character of being susceptible of precise formulation gives to technical knowledge at least the appearance of certainty: it appears to be possible to be certain about a technique. On the other hand, it is characteristic of practical knowledge that it is not susceptible of formulation of that kind. Its normal expression is in a customary or traditional way of doing things, or, simply, in practice. And this gives it the appearance of imprecision and consequently of uncertainty, of being a matter of opinion, of probability rather than truth. It is indeed knowledge that is expressed in taste or connoisseurship, lacking rigidity and ready for the impress of the mind of the learner. . . .

Technical knowledge, in short, can be both taught and learned in the simplest meanings of these words. On the other hand, practical knowledge can neither be taught nor learned, but only imparted and acquired. It exists only in practice, and the only way to acquire it is by apprenticeship to a master – not because the master can teach it (he cannot), but because it can be acquired only by continuous contact with one who is perpetually practicing it. In the arts and in natural science what normally happens is that the pupil, in being taught and in learning the technique from his master, discovers himself to have acquired also another sort of knowledge than merely technical knowledge, without it ever having been precisely imparted and often without being able to say precisely what it is. Thus a pianist acquires artistry as well as technique, a chess-player style and insight into the game as well as knowledge of the moves, and a scientist acquires (among other things) the sort of judgement which tells him when his technique is leading him astray and the connoisseurship which enables him to distinguish the profitable from the unprofitable directions to explore.

Now, as I understand it, Rationalism is the assertion that what I have called practical knowledge is not knowledge at all, the assertion that, properly speaking, there is no knowledge which is not technical knowledge. The Rationalist holds that the only element of knowledge involved in any human activity is technical knowledge and that what I have called practical knowledge is really only a sort of nescience which would be negligible if it were not positively mischievous. (Rationalism in Politics and Other Essays, pp. 12-16)

Almost three years ago, I attended the History of Economics Society meeting at Duke University at which Jeff Biddle of Michigan State University delivered his Presidential Address, “Statistical Inference in Economics 1920-1965: Changes in Meaning and Practice,” published in the June 2017 issue of the Journal of the History of Economic Thought. The paper is a remarkable survey of economists’ differing attitudes towards using formal probability theory as the basis for making empirical inferences from data. The underlying assumptions of probability theory about the nature of the data were widely viewed as being too extreme to make probability theory an acceptable basis for empirical inferences from the data. The early negative attitudes toward accepting probability theory as the basis for making statistical inferences from data were gradually overcome (or disregarded). But as late as the 1960s, even though econometric techniques were becoming more widely accepted, a great deal of empirical work, including work by some of the leading empirical economists of the time, avoided using the techniques of statistical inference to assess the results of regression analysis. Only in the 1970s was there a rapid sea-change in professional opinion that made statistical inference based on explicit probabilistic assumptions about underlying data distributions the requisite technique for drawing empirical inferences from the analysis of economic data. In the final section of his paper, Biddle offers an explanation for this rapid change in professional attitude toward the use of probabilistic assumptions about data distributions as the required method of the empirical assessment of economic data.

By the 1970s, there was a broad consensus in the profession that inferential methods justified by probability theory—methods of producing estimates, of assessing the reliability of those estimates, and of testing hypotheses—were not only applicable to economic data, but were a necessary part of almost any attempt to generalize on the basis of economic data. . . .

This paper has been concerned with beliefs and practices of economists who wanted to use samples of statistical data as a basis for drawing conclusions about what was true, or probably true, in the world beyond the sample. In this setting, “mechanical objectivity” means employing a set of explicit and detailed rules and procedures to produce conclusions that are objective in the sense that if many different people took the same statistical information, and followed the same rules, they would come to exactly the same conclusions. The trustworthiness of the conclusion depends on the quality of the method. The classical theory of inference is a prime example of this sort of mechanical objectivity.

Porter [Trust in Numbers: The Pursuit of Objectivity in Science and Public Life] contrasts mechanical objectivity with an objectivity based on the “expert judgment” of those who analyze data. Expertise is acquired through a sanctioned training process, enhanced by experience, and displayed through a record of work meeting the approval of other experts. One’s faith in the analyst’s conclusions depends on one’s assessment of the quality of his disciplinary expertise and his commitment to the ideal of scientific objectivity. Elmer Working’s method of determining whether measured correlations represented true cause-and-effect relationships involved a good amount of expert judgment. So, too, did Gregg Lewis’s adjustments of the various estimates of the union/non-union wage gap, in light of problems with the data and peculiarities of the times and markets from which they came. Keynes and Persons pushed for a definition of statistical inference that incorporated space for the exercise of expert judgment; what Arthur Goldberger and Lawrence Klein referred to as ‘statistical inference’ had no explicit place for expert judgment.

Speaking in these terms, I would say that in the 1920s and 1930s, empirical economists explicitly acknowledged the need for expert judgment in making statistical inferences. At the same time, mechanical objectivity was valued—there are many examples of economists of that period employing rule-oriented, replicable procedures for drawing conclusions from economic data. The rejection of the classical theory of inference during this period was simply a rejection of one particular means for achieving mechanical objectivity. By the 1970s, however, this one type of mechanical objectivity had become an almost required part of the process of drawing conclusions from economic data, and was taught to every economics graduate student.

Porter emphasizes the tension between the desire for mechanically objective methods and the belief in the importance of expert judgment in interpreting statistical evidence. This tension can certainly be seen in economists’ writings on statistical inference throughout the twentieth century. However, it would be wrong to characterize what happened to statistical inference between the 1940s and the 1970s as a displacement of procedures requiring expert judgment by mechanically objective procedures. In the econometric textbooks published after 1960, explicit instruction on statistical inference was largely limited to instruction in the mechanically objective procedures of the classical theory of inference. It was understood, however, that expert judgment was still an important part of empirical economic analysis, particularly in the specification of the models to be estimated. But the disciplinary knowledge needed for this task was to be taught in other classes, using other textbooks.

And in practice, even after the statistical model had been chosen, the estimates and standard errors calculated, and the hypothesis tests conducted, there was still room to exercise a fair amount of judgment before drawing conclusions from the statistical results. Indeed, as Marcel Boumans (2015, pp. 84–85) emphasizes, no procedure for drawing conclusions from data, no matter how algorithmic or rule bound, can dispense entirely with the need for expert judgment. This fact, though largely unacknowledged in the post-1960s econometrics textbooks, would not be denied or decried by empirical economists of the 1970s or today.

This does not mean, however, that the widespread embrace of the classical theory of inference was simply a change in rhetoric. When application of classical inferential procedures became a necessary part of economists’ analyses of statistical data, the results of applying those procedures came to act as constraints on the set of claims that a researcher could credibly make to his peers on the basis of that data. For example, if a regression analysis of sample data yielded a large and positive partial correlation, but the correlation was not “statistically significant,” it would simply not be accepted as evidence that the “population” correlation was positive. If estimation of a statistical model produced a significant estimate of a relationship between two variables, but a statistical test led to rejection of an assumption required for the model to produce unbiased estimates, the evidence of a relationship would be heavily discounted.

So, as we consider the emergence of the post-1970s consensus on how to draw conclusions from samples of statistical data, there are arguably two things to be explained. First, how did it come about that using a mechanically objective procedure to generalize on the basis of statistical measures went from being a choice determined by the preferences of the analyst to a professional requirement, one that had real consequences for what economists would and would not assert on the basis of a body of statistical evidence? Second, why was it the classical theory of inference that became the required form of mechanical objectivity? . . .

Perhaps searching for an explanation that focuses on the classical theory of inference as a means of achieving mechanical objectivity emphasizes the wrong characteristic of that theory. In contrast to earlier forms of mechanical objectivity used by economists, such as standardized methods of time series decomposition employed since the 1920s, the classical theory of inference is derived from, and justified by, a body of formal mathematics with impeccable credentials: modern probability theory. During a period when the value placed on mathematical expression in economics was increasing, it may have been this feature of the classical theory of inference that increased its perceived value enough to overwhelm long-standing concerns that it was not applicable to economic data. In other words, maybe the chief causes of the profession’s embrace of the classical theory of inference are those that drove the broader mathematization of economics, and one should simply look to the literature that explores possible explanations for that phenomenon rather than seeking a special explanation of the embrace of the classical theory of inference.

I would suggest one more factor that might have made the classical theory of inference more attractive to economists in the 1950s and 1960s: the changing needs of pedagogy in graduate economics programs. As I have just argued, since the 1920s, economists have employed both judgment based on expertise and mechanically objective data-processing procedures when generalizing from economic data. One important difference between these two modes of analysis is how they are taught and learned. The classical theory of inference as used by economists can be taught to many students simultaneously as a set of rules and procedures, recorded in a textbook and applicable to “data” in general. This is in contrast to the judgment-based reasoning that combines knowledge of statistical methods with knowledge of the circumstances under which the particular data being analyzed were generated. This form of reasoning is harder to teach in a classroom or codify in a textbook, and is probably best taught using an apprenticeship model, such as that which ideally exists when an aspiring economist writes a thesis under the supervision of an experienced empirical researcher.

During the 1950s and 1960s, the ratio of PhD candidates to senior faculty in PhD-granting programs was increasing rapidly. One consequence of this, I suspect, was that experienced empirical economists had less time to devote to providing each interested student with individualized feedback on his attempts to analyze data, so that relatively more of a student’s training in empirical economics came in an econometrics classroom, using a book that taught statistical inference as the application of classical inference procedures. As training in empirical economics came more and more to be classroom training, competence in empirical economics came more and more to mean mastery of the mechanically objective techniques taught in the econometrics classroom, a competence displayed to others by application of those techniques. Less time in the training process being spent on judgment-based procedures for interpreting statistical results meant fewer researchers using such procedures, or looking for them when evaluating the work of others.

This process, if indeed it happened, would not explain why the classical theory of inference was the particular mechanically objective method that came to dominate classroom training in econometrics; for that, I would again point to the classical theory’s link to a general and mathematically formalistic theory. But it does help to explain why the application of mechanically objective procedures came to be regarded as a necessary means of determining the reliability of a set of statistical measures and the extent to which they provided evidence for assertions about reality. This conjecture fits in with a larger possibility that I believe is worth further exploration: that is, that the changing nature of graduate education in economics might sometimes be a cause as well as a consequence of changing research practices in economics. (pp. 167-70)

Biddle’s account of the change in the attitude of the economics profession about how inferences should be drawn from data about empirical relationships is strikingly similar to Oakeshott’s discussion, and it is depressing in its implications for the decline of expert judgment by economists, expert judgment having been replaced by mechanical and technical knowledge that can be objectively summarized in the form of rules or tests for statistical significance, itself an entirely arbitrary convention lacking any logical, or self-evident, justification.

But my point is not to condemn using rules derived from classical probability theory to assess the statistical significance of relationships estimated from historical data; it is to challenge the methodological prohibition against the kinds of expert judgments that statistically knowledgeable economists, including Nobel Prize winners such as Simon Kuznets, Milton Friedman, Theodore Schultz and Gary Becker, routinely made in their empirical studies. As Biddle notes:

In 1957, Milton Friedman published his theory of the consumption function. Friedman certainly understood statistical theory and probability theory as well as anyone in the profession in the 1950s, and he used statistical theory to derive testable hypotheses from his economic model: hypotheses about the relationships between estimates of the marginal propensity to consume for different groups and from different types of data. But one will search his book almost in vain for applications of the classical methods of inference. Six years later, Friedman and Anna Schwartz published their Monetary History of the United States, a work packed with graphs and tables of statistical data, as well as numerous generalizations based on that data. But the book contains no classical hypothesis tests, no confidence intervals, no reports of statistical significance or insignificance, and only a handful of regressions. (p. 164)

Friedman’s work on the Monetary History is still regarded as authoritative. My own view is that much of the Monetary History was either wrong or misleading. But my quarrel with the Monetary History mainly pertains to the era in which the US was on the gold standard, inasmuch as Friedman simply did not understand how the gold standard worked, either in theory or in practice, as McCloskey and Zecher showed in two important papers (here and here). Also see my posts about the empirical mistakes in the Monetary History (here and here). But Friedman’s problem was bad monetary theory, not bad empirical technique.

Friedman’s theoretical misunderstandings have no relationship to the misguided prohibition against doing quantitative empirical research without obeying the arbitrary methodological requirement that statistical estimates be derived in a way that measures the statistical significance of the estimated relationships. These methodological requirements have been adopted to support a self-defeating pretense to scientific rigor, necessitating the use of relatively advanced mathematical techniques to perform quantitative empirical research. The methodological requirements for measuring statistical relationships were never actually shown to generate more accurate or reliable statistical results than those derived from the less technically advanced, but in some respects more economically sophisticated, techniques that have almost totally been displaced. This is one more example of the fallacy that there is but one technique of research that ensures the discovery of truth, a mistake of which even Popper was never guilty.
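
To make the notion of a mechanically objective procedure concrete, here is a minimal sketch in Python, using the numpy and statsmodels libraries. The simulated data and the 5% cutoff are my own illustrative choices, not anything drawn from Biddle’s paper: the point is only that anyone who feeds the same numbers through the same rule reaches the same verdict, whatever they may know about how the numbers were generated.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: y depends modestly on x, with a lot of noise, so the
# point estimate can be sizeable while the t-test is inconclusive.
n = 30
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(scale=2.0, size=n)

# Classical OLS estimate and standard error for the slope coefficient.
result = sm.OLS(y, sm.add_constant(x)).fit()
beta = result.params[1]
pval = result.pvalues[1]

# The "mechanically objective" step: the rule, not the analyst, renders the verdict.
verdict = "significant at 5%" if pval < 0.05 else "not significant at 5%"
print(f"estimated coefficient: {beta:.3f}, p-value: {pval:.3f} ({verdict})")

A sizeable estimate that fails the (arbitrary) 5% test is simply set aside under the post-1970s convention, leaving no formal room for the kind of judgment that Kuznets or Friedman exercised.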

Methodological Prescriptions Go from Bad to Worse

The methodological requirement for the use of formal tests of statistical significance before any quantitative statistical estimate could be credited was a prelude, though it would be a stretch to link them causally, to another and more insidious form of methodological tyrannizing: the insistence that any macroeconomic model be derived from explicit micro-foundations based on the solution of an intertemporal-optimization exercise. Of course, the idea that such a model was in any way micro-founded was a pretense, the solution being derived only through the fiction of a single representative agent, rendering the entire optimization exercise fundamentally illegitimate and the exact opposite of a micro-founded model. Having already explained in previous posts why transforming microfoundations from a legitimate theoretical goal into a methodological necessity has taken a generation of macroeconomists down a blind alley (here, here, here, and here), I will only make the further comment that this is yet another example of the danger of elevating technique over practice and substance.

Popper’s More Important Contribution

This post has largely concurred with the negative assessment of Popper’s work registered by Lemoine. But I wish to end on a positive note, because I have learned a great deal from Popper, and even if he is overrated as a philosopher of science, he undoubtedly deserves great credit for suggesting falsifiability as the criterion by which to distinguish between science and metaphysics. Even if that criterion does not hold up, or holds up only when qualified to a greater extent than Popper admitted, Popper made a hugely important contribution by demolishing the startling claim of the Logical Positivists who in the 1920s and 1930s argued that only statements that can be empirically verified through direct or indirect observation have meaning, all other statements being meaningless or nonsensical. That position itself now seems to verge on the nonsensical. But at the time many of the world’s leading philosophers, including Ludwig Wittgenstein, no less, seemed to accept that remarkable view.

Thus, Popper’s demarcation between science and metaphysics had a two-fold significance. First, that it is not verifiability, but falsifiability, that distinguishes science from metaphysics. That’s the contribution for which Popper is usually remembered now. But it was really the other aspect of his contribution that was more significant: that even metaphysical, non-scientific, statements can be meaningful. According to the Logical Positivists, unless you are talking about something that can be empirically verified, you are talking nonsense. In other words, they were unwittingly hoisting themselves on their own petard, because their discussions about what is and what is not meaningful, being discussions about concepts, not empirically verifiable objects, were themselves – on the Positivists’ own criterion of meaning — meaningless and nonsensical.

Popper made the world safe for metaphysics, and the world is a better place as a result. Science is a wonderful enterprise, rewarding for its own sake and because it contributes to the well-being of many millions of human beings, though like many other human endeavors, it can also have unintended and unfortunate consequences. But metaphysics, because it was used as a term of abuse by the Positivists, is still, too often, used as an epithet. It shouldn’t be.

Certainly economists should aspire to tease out whatever empirical implications they can from their theories. But that doesn’t mean that an economic theory with no falsifiable implications is useless, which is the judgment on the basis of which Mark Blaug declared general equilibrium theory to be unscientific and useless, a judgment that I don’t think has stood the test of time. And even if general equilibrium theory is simply metaphysical, my response would be: so what? It could still serve as a source of inspiration and insight to us in framing other theories that may have falsifiable implications. And even if, in its current form, a theory has no empirical content, there is always the possibility that, through further discussion, critical analysis and creative thought, empirically falsifiable implications may yet become apparent.

Falsifiability is certainly a good quality for a theory to have, but even an unfalsifiable theory may be worth paying attention to and worth thinking about.

What’s so Great about Science? or, How I Learned to Stop Worrying and Love Metaphysics

A couple of weeks ago, a lot of people in a lot of places marched for science. What struck me about those marches is that there is almost nobody out there who is openly and explicitly campaigning against science. There are, of course, a few flat-earthers who, if one looks for them very diligently, can be found. But does anyone — including the flat-earthers themselves – think that they are serious? There are also Creationists who believe that the earth was created and designed by a Supreme Being – usually along the lines of the Biblical account in the Book of Genesis. But Creationists don’t reject science in general; they reject a particular scientific theory, because they believe it to be untrue, and they try to defend their beliefs with a variety of arguments couched in scientific terms. I don’t defend Creationist arguments, but just because someone makes a bad scientific argument, it doesn’t mean that the person making the argument is an opponent of science. To be sure, the reason that Creationists make bad arguments is that they hold a set of beliefs about how the world came to exist that aren’t based on science but on some religious or ideological belief system. But people come up with arguments all the time to justify beliefs for which they have no evidentiary or “scientific” basis.

I mean, one of the two greatest scientists who ever lived criticized quantum mechanics because he couldn’t accept that the world was not fully determined by the laws of nature, or, as he put it so pithily: “God does not play dice with the universe.” I understand that Einstein was not religious, and wasn’t making a religious argument, but he was basing his scientific view of what an acceptable theory should be on certain metaphysical predispositions that he held, and he was expressing his disinclination to accept a theory inconsistent with those predispositions. A scientific argument is judged on its merits, not on the motivations for advancing the argument. And I won’t even discuss the voluminous writings of the other one of the two greatest scientists who ever lived on alchemy and other occult topics.

Similarly, there are climate-change deniers who question the scientific basis for asserting that temperatures have been rising around the world, and that the increase in temperatures results from human activity that discharges greenhouse gasses into the atmosphere. Deniers of global warming may be biased and may be making bad scientific arguments, but the mere fact – and for purposes of this discussion I don’t dispute that it is a fact – that global warming is real and caused by human activity does not mean that disputing those facts unmasks the disputer as an opponent of science. R. A. Fisher, the greatest mathematical statistician of the first half of the twentieth century, who developed most of the statistical techniques now used in experimental research, severely damaged his reputation by rejecting or dismissing evidence that smoking tobacco is a primary cause of cancer. Some critics accused Fisher of having been compromised by financial inducements from the tobacco industry, while others attributed his positions to his own smoking habits or anti-puritanical tendencies. In any event, Fisher’s arguments against a causal link between smoking tobacco and lung cancer are now viewed as an embarrassing stain on an otherwise illustrious career. But Fisher’s lapse of judgment, and perhaps of ethics, doesn’t justify accusing him of opposition to science. Climate-change deniers don’t reject science; they reject or disagree with the conclusions of most climate scientists. They may have lousy reasons for their views – either that the climate is not changing or that whatever change has occurred is unrelated to the human production of greenhouse gasses – but holding wrong or biased views doesn’t make someone an opponent of science.

I don’t say that there are no people who dislike science – I mean who don’t like it because of what it stands for, not because they find it difficult or boring. Such people may be opposed to teaching science and to funding scientific research, and they don’t want scientific knowledge to influence public policy or the way people live. But, as far as I can tell, they have little influence. There is just no one out there who wants to outlaw scientific research or who is trying to criminalize the teaching of science. They may not want to fund science, but they aren’t trying to ban it. In fact, I doubt that the prestige and authority of science have ever been higher than they are now. Certainly religion, especially organized religion, to which science was once subordinate if not subservient, no longer exercises anything near the authority that science now does.

The reason for this extended introduction into the topic that I really want to discuss is to provide some context for my belief that economists worry too much about whether economics is really a science. It was such a validation for economists when the Swedish Central Bank piggy-backed on the storied Nobel Prize to create its ersatz “Nobel Memorial Prize” for economic science. (I note with regret the recent passing of William Baumol, whose failure to receive the Nobel Prize in economics, like that of Armen Alchian, was in fact a deplorable failure of good judgment on the part of the Nobel Committee.) And the self-consciousness of economists about the possibly dubious status of economics as a science is a reflection of the exalted status of science in society. So naturally, if one is seeking to increase the prestige of one’s own occupation and of the intellectual discipline in which one does research, it helps enormously to be able to say: “Oh, yes, I am an economist, and economics is a science, which means that I really am a scientist, just like those guys that win Nobel Prizes. It also helps to be able to show that your scientific research involves a lot of mathematics, because scientists use math in their theories, sometimes a lot of math, which makes it hard for non-scientists to understand what scientists are doing. We economists also use math in our theories, sometimes a lot of math, and that’s why it’s just as hard for non-economists to understand what we economists are doing as it is to understand what real scientists are doing. So we really are scientists, aren’t we?”

Where did this obsession with science come from? I think it’s fairly recent, but my sketchy knowledge of the history of science prevents me from getting too deeply into that discussion. Until relatively modern times, science was subsumed under the heading of philosophy — Greek for the love of wisdom. But philosophy is a very broad subject, so eventually that part of philosophy that was concerned with the world as it actually exists was called natural philosophy, as opposed to, say, ethical and moral philosophy. After the stunning achievements of Newton and his successors, and after Francis Bacon outlined an inductive method for achieving knowledge of the world, the disjunction between mere speculative thought and empirically based research, which is what science supposedly exemplifies, became increasingly sharp. And the inductive method seemed to be the right way to do science.

David Hume and Immanuel Kant struggled, with limited success, to make sense of induction, because a general proposition cannot be logically deduced from a set of observations, however numerous. Despite the logical problem of induction, early in the twentieth century a philosophical movement based in Vienna, called logical positivism, arrived at the conclusion that not only is all scientific knowledge acquired inductively through sensory experience and observation, but no meaning can be attached to any statement unless the statement makes reference to something about which we have or could have sensory experience; to be meaningful a statement must be verified, or at least verifiable, so that its truth could be either confirmed or refuted. Any reference to concepts that have no basis in sensory experience is simply meaningless, i.e., a form of nonsense. Thus, science became not just the epitome of valid, certain, reliable, verified knowledge, which is what people were led to believe by the stunning success of Newton’s theory; it became the exemplar of meaningful discourse. Unless our statements refer to some observable, verifiable object, we are talking nonsense. And in the first half of the twentieth century, logical positivism dominated academic philosophy, at least in the English-speaking world, thereby exercising great influence over how economists thought about their own discipline and its scientific status.

Logical positivism was subjected to rigorous criticism by Karl Popper in his early work Logik der Forschung (English translation: The Logic of Scientific Discovery). His central point was that scientific theories are not so much about what is or has been observed as about what cannot be observed. The empirical content of a scientific proposition consists in the range of observations that the theory says are not possible. The more observations excluded by the theory, the greater its empirical content. A theory that is consistent with any observation has no empirical content. Thus, paradoxically, scientific theories, under the logical positivist doctrine, would have to be considered nonsensical, because they tell us what can’t be observed. And because it is always possible that an excluded observation – the black swan – which our scientific theory tells us can’t be observed, will be observed, scientific theories can never be definitively verified. If a scientific theory can’t be verified, then, according to the positivists’ own criterion, the theory is nonsense. Of course, this just shows that the positivist criterion of meaning was nonsensical, because obviously scientific theories are completely meaningful despite being unverifiable.

Popper therefore concluded that verification or verifiability can’t be a criterion of meaning. In its place he proposed the criterion of falsification (i.e., refutation, not misrepresentation), but falsification became a criterion not for distinguishing between what is meaningful and what is meaningless, but between science and metaphysics. There is no reason why metaphysical statements (statements lacking empirical content) cannot be perfectly meaningful; they just aren’t scientific. Popper was misinterpreted by many to have simply substituted falsifiability for verifiability as a criterion of meaning; that was a mistaken interpretation, which Popper explicitly rejected.

So, in using the term “meaningful theorems” to refer to potentially refutable propositions that can be derived from economic theory using the method of comparative statics, Paul Samuelson in his Foundations of Economic Analysis adopted the interpretation of Popper’s demarcation criterion between science and metaphysics as if it were a demarcation criterion between meaning and nonsense. I conjecture that Samuelson’s unfortunate lapse into the discredited verbal usage of logical positivism may have reinforced the unhealthy inclination of economists to feel the need to prove their scientific credentials in order to even engage in meaningful discourse.
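
To see what a comparative-statics “meaningful theorem” looks like in the simplest case, consider a competitive market in which demand depends on the price p and a shift parameter α while supply depends on price alone. The notation is mine, and the example is a generic textbook one rather than one of Samuelson’s own:

\[
D(p,\alpha) = S(p), \qquad
D_p\,\frac{dp}{d\alpha} + D_\alpha = S_p\,\frac{dp}{d\alpha}
\quad\Longrightarrow\quad
\frac{dp}{d\alpha} = \frac{D_\alpha}{S_p - D_p}.
\]

Under the qualitative restrictions D_p < 0, S_p > 0, and D_α > 0, the theory implies dp/dα > 0: an increase in demand raises the equilibrium price. That is a proposition that observation could, at least in principle, contradict; in Popper’s terms it is falsifiable, which is not the same thing as saying that only such propositions are meaningful.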

While Popper certainly performed a valuable service in clearing up the positivist confusion about meaning, he adopted a very prescriptive methodology aimed at making scientific practice more scientific in the sense of exposing theories to, rather than immunizing them against, attempts at refutation, because, according to Popper, it is only after our theories survive powerful attempts to show that they are false that we can have confidence that those theories may be true, or at least close to being true. In principle, Popper was not wrong to encourage scientists to formulate theories that are empirically testable by specifying what kinds of observations would be inconsistent with those theories. But in practice, that advice has been difficult to follow, and not only because researchers try to avoid subjecting their pet theories to tests that might prove them wrong.

Although Popper often cited historical examples to support his view that science progresses through an ongoing process of theoretical conjecture and empirical refutation, historians of science have had no trouble finding instances in which scientists did not follow Popper’s methodological rules and continued to maintain theories even after they had been refuted by evidence or after other theories had been shown to generate more accurate predictions than their own. Popper parried this objection by saying that his methodological rules were not positive (i.e., descriptive of science), but normative (i.e., prescriptive of how to do good science). In other words, Popper’s scientific methodology was itself not empirically refutable and scientific, but empirically irrefutable and metaphysical. I point out the unscientific character of Popper’s methodology of science not to criticize Popper, but to show that Popper himself did not believe that science is the final authority and ultimate arbiter of scientific practice.

But the more important lesson from the critical discussions of Popper’s methodological rules seems to me to be that they are too rigid to accommodate all the considerations that are relevant to assessing scientific theories and deciding whether those theories should be discarded or, at least tentatively, maintained. And Popper’s methodological rules are especially ill-suited for economics and other disciplines in which the empirical implications of theories depend on a large number of jointly maintained hypotheses, so that it is hard to identify which of several maintained hypotheses is responsible for the failure of a predicted outcome to match the observed outcome. That, of course, is the well-known ceteris paribus problem, and it requires a very capable practitioner to know when to apply the ceteris paribus condition and which variables to hold constant and which to allow to vary. Popper’s methodological rules tell us to reject a theory when its predictions are mistaken, and Popper regarded the ceteris paribus qualification quite skeptically as an illegitimate immunizing stratagem. That describes a profound dilemma for economics: on the one hand, it is hard to imagine how economic theory could be applied without using the ceteris paribus qualification; on the other hand, the qualification diminishes the empirical content of economic theory.
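
A small simulation may make the dilemma concrete. It is entirely my own construction (the demand equation, the coefficients, and the correlation between price and income are invented for illustration): a true downward-sloping demand curve appears to be refuted when the ceteris paribus condition, income held constant, fails in the data.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200

# Hypothetical structure: demand slopes down in price, but income shifts
# demand and is positively correlated with price in the sample.
income = rng.normal(size=n)
price = 0.8 * income + rng.normal(scale=0.5, size=n)
quantity = -1.0 * price + 2.0 * income + rng.normal(scale=0.5, size=n)

# "Testing" the law of demand with the ceteris paribus condition violated:
# income is omitted, so the estimated price coefficient comes out positive.
naive = sm.OLS(quantity, sm.add_constant(price)).fit()

# Holding income constant (statistically) recovers the true negative slope.
controlled = sm.OLS(quantity, sm.add_constant(np.column_stack([price, income]))).fit()

print("price coefficient, income omitted:  %.2f" % naive.params[1])
print("price coefficient, income included: %.2f" % controlled.params[1])

The failed prediction by itself cannot tell the researcher whether the law of demand is false or whether the auxiliary ceteris paribus assumption is; deciding which is exactly the judgment call that Popper’s rules do not make for us.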

Empirical problems are amplified by the infirmities of the data that economists typically use to derive quantitative predictions from their models. The accuracy of the data is often questionable, and the relationships between the data and the theoretical concepts they are supposed to measure are often dubious. Moreover, the assumptions about the data-generating process (e.g., that random disturbances are independent and identically distributed, that observations are randomly selected, that omitted explanatory variables are uncorrelated with the included ones) necessary for the classical statistical techniques to generate unbiased estimates of the theoretical coefficients are almost impossibly stringent. Econometricians are certainly well aware of these issues, and they have developed methods of mitigating them, but the problems with the data routinely used by economists, and the complicated issues involved in developing and applying techniques to cope with those problems, make it very difficult to use statistical techniques to reach definitive conclusions about empirical questions.

Jeff Biddle, one of the leading contemporary historians of economics, has a wonderful paper (“Statistical Inference in Economics 1920-1965: Changes in Meaning and Practice”) – his 2016 presidential address to the History of Economics Society – discussing how the modern statistical techniques based on concepts and methods derived from probability theory gradually became the standard empirical and statistical techniques used by economists, even though many distinguished earlier researchers who were neither unaware of, nor unschooled in, the newer techniques believed them to be inappropriate for analyzing economic data. Here is the abstract of Biddle’s paper.

This paper reviews changes over time in the meaning that economists in the US attributed to the phrase “statistical inference”, as well as changes in how inference was conducted. Prior to WWII, leading statistical economists rejected probability theory as a source of measures and procedures to be used in statistical inference. Haavelmo and the econometricians associated with the early Cowles Commission developed an approach to statistical inference based on concepts and measures derived from probability theory, but the arguments they offered in defense of this approach were not always responsive to the concerns of earlier empirical economists that the data available to economists did not satisfy the assumptions required for such an approach. Despite this, after a period of about 25 years, a consensus developed that methods of inference derived from probability theory were an almost essential part of empirical research in economics. I close the paper with some speculation on possible reasons for this transformation in thinking about statistical inference.

I quote one passage from Biddle’s paper:

As I have noted, the leading statistical economists of the 1920s and 1930s were also unwilling to assume that any sample they might have was representative of the universe they cared about. This was particularly true of time series, and Haavelmo’s proposal to think of time series as a random selection of the output of a stable mechanism did not really address one of their concerns – that the structure of the “mechanism” could not be expected to remain stable for long periods of time. As Schultz pithily put it, “‘the universe’ of our time series does not ‘stay put’” (Schultz 1938, p. 215). Working commented that there was nothing in the theory of sampling that warranted our saying that “the conditions of covariance obtaining in the sample (would) hold true at any time in the future” (Advisory Committee 1928, p. 275). As I have already noted, Persons went further, arguing that treating a time series as a sample from which a future observation would be a random draw was not only inaccurate but ignored useful information about unusual circumstances surrounding various observations in the series, and the unusual circumstances likely to surround the future observations about which one wished to draw conclusions (Persons 1924, p. 7). And, the belief that samples were unlikely to be representative of the universe in which the economists had an interest applied to cross section data as well. The Cowles econometricians offered little to assuage these concerns except the hope that it would be possible to specify the equations describing the systematic part of the mechanism of interest in a way that captured the impact of factors that made for structural change in the case of time series, or factors that led cross section samples to be systematically different from the universe of interest.

It is not my purpose to argue that the economists who rejected the classical theory of inference had better arguments than the Cowles econometricians, or had a better approach to analyzing economic data given the nature of those data, the analytical tools available, and the potential for further development of those tools. I only wish to offer this account of the differences between the Cowles econometricians and the previously dominant professional opinion on appropriate methods of statistical inference as an example of a phenomenon that is not uncommon in the history of economics. Revolutions in economics, or “turns”, to use a currently more popular term, typically involve new concepts and analytical methods. But they also often involve a willingness to employ assumptions considered by most economists at the time to be too unrealistic, a willingness that arises because the assumptions allow progress to be made with the new concepts and methods. Obviously, in the decades after Haavelmo’s essay on the probability approach, there was a significant change in the list of assumptions about economic data that empirical economists were routinely willing to make in order to facilitate empirical research.
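
Schultz’s complaint that “‘the universe’ of our time series does not ‘stay put’” is easy to illustrate with a toy simulation of my own (the break point and the coefficients are invented, not anything taken from Biddle or Schultz): classical inference on one stretch of a series is exactly right about the mechanism that generated that stretch, and silent about the different mechanism generating the next one.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 120

# Hypothetical series whose "mechanism" shifts halfway through:
# the slope linking y to x changes from 1.0 to -0.5 at the break.
x = rng.normal(size=n)
slope = np.where(np.arange(n) < n // 2, 1.0, -0.5)
y = slope * x + rng.normal(scale=0.3, size=n)

# Classical inference on the first half, treated as a sample from a stable universe.
first = sm.OLS(y[: n // 2], sm.add_constant(x[: n // 2])).fit()
print("slope estimate, first half:  %.2f" % first.params[1])
print("95%% confidence interval:     [%.2f, %.2f]" % tuple(first.conf_int()[1]))

# The second half was generated by a different "universe", so the interval
# computed above says nothing useful about it.
second = sm.OLS(y[n // 2 :], sm.add_constant(x[n // 2 :])).fit()
print("slope estimate, second half: %.2f" % second.params[1])

Nothing in the mechanics of the procedure flags the problem; recognizing that the sample does not represent the universe of interest is exactly the sort of judgment on which the earlier statistical economists insisted.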

Let me now quote from a recent book (To Explain the World) by Steven Weinberg, perhaps – even though a movie about his life has not (yet) been made — the greatest living physicist:

Newton’s theory of gravitation made successful predictions for simple phenomena like planetary motion, but it could not give a quantitative account of more complicated phenomena, like the tides. We are in a similar position today with regard to the strong forces that hold quarks together inside the protons and neutrons inside the atomic nucleus, a theory known as quantum chromodynamics. This theory has been successful in accounting for certain processes at high energy, such as the production of various strongly interacting particles in the annihilation of energetic electrons and their antiparticles, and its successes convince us that the theory is correct. We cannot use the theory to calculate precise values for other things that we would like to explain, like the masses of the proton and neutron, because the calculations are too complicated. Here, as for Newton’s theory of the tides, the proper attitude is patience. Physical theories are validated when they give us the ability to calculate enough things that are sufficiently simple to allow reliable calculations, even if we can’t calculate everything that we might want to calculate.

So Weinberg is very much aware of the limits that even physics faces in making accurate predictions. Only a small subset (relative to the universe of physical phenomena) of simple effects can be calculated, but the capacity of physics to make very accurate predictions about simple phenomena gives us a measure of confidence that the theory would be reliable in making more complicated predictions if only we had the computing capacity to carry them out. But in economics the set of simple predictions that can be accurately made is almost nil, because economics is inherently a theory of complex social phenomena, and simplifying the real-world problems to which we apply the theory enough to allow testable predictions to be made is extremely difficult and hardly ever possible. Experimental economists try to create conditions in which this can be done in controlled settings, but whether these experimental results have much relevance for real-world applications is open to question.

The problematic relationship between economic theory and empirical evidence is deeply rooted in the nature of economic theory and the very complex nature of the phenomena that economic theory seeks to explain. It is very difficult to isolate simple real-world events in which competing economic theories can be put to decisive empirical tests based on unambiguous observations that are either consistent with or contrary to the predictions generated by those theories. Under those circumstances, if we apply the Popperian criterion for demarcation between science and metaphysics to economics, it is not at all clear to me whether economics falls more on the science side of the line than on the metaphysics side.

Certainly, there are refutable implications of economic theory that can be deduced, but these implications are often so hedged with qualifications that they are refutable only in principle, not in practice. Many fastidious economic methodologists, notably Mark Blaug, voiced unhappiness about this state of affairs and blamed economists for not being more ruthless in applying the Popperian test of empirical refutation to their theories. Surely Blaug had a point, but the infrequency of empirical refutation of theories in economics is, I think, less attributable to bad methodological practice on the part of economists than to the nature of the theories that economists work with and the inherent ambiguities of the empirical evidence with which those theories can be tested. We might as well face up to the fact that, to a large extent, empirical evidence is simply not clear-cut enough to force us to discard well-entrenched economic theories, because such theories can be adjusted and reformulated in response to apparently contrary evidence in ways that allow them to live on to fight another day; they typically have enough moving parts to be adjusted as needed to accommodate anomalous or inconvenient empirical evidence.

Popper’s somewhat disloyal disciple, Imre Lakatos, talked about scientific theories in the context of scientific research programs, a research program being an amalgam of related theories that share a common inner core of theoretical principles or axioms not themselves subject to refutation. Lakatos called this deep axiomatic core of principles the hard core of the research program. The hard core defines the program, so it is fundamentally fixed and not open to refutation. The empirical content of the research program is provided by a protective belt of specific theories that are subject to refutation and, when refuted, can be replaced as needed with alternative theories consistent with both the theoretical hard core and the empirical evidence. What determines the success of a scientific research program is whether it is progressive or degenerating. A progressive research program accumulates an increasingly dense, but evolving, protective belt of theories in response to new theoretical and empirical problems or puzzles generated within the program, keeping researchers busy and attracting into the program new researchers seeking problems to solve. In contrast, a degenerating research program is unable to find enough interesting new problems or puzzles to keep researchers busy, much less attract new ones.

Despite its Popperian origins, the largely sociological Lakatosian account of how science evolves and progresses was hardly congenial to Popper’s sensibilities, because the success of a research program is not strictly determined by the process of conjecture and refutation envisioned by Popper. But the important point for me is that a Lakatosian research program can be progressive even if it is metaphysical rather than scientific. What matters is that it offer opportunities for researchers to find and to solve, or even just to talk about solving, new problems, thereby attracting new researchers into the program.

It does appear that economics has for at least two centuries been a progressive research program. But it is not clear that it is really a scientific research program, because economic theory is so flexible that it can be adapted as needed to explain almost any set of observations. Almost any observation can be set up and rationalized as the solution of some sort of constrained optimization problem; all that is required is sufficient ingenuity on the part of the theorist to formulate the problem in such a way that the desired outcome can be derived as the solution of a constrained optimization problem. The hard core of the research program is therefore never at risk, and the protective belt can always be modified as needed to generate the sort of solution that is compatible with the theoretical hard core. The scope for refutation has thus been effectively narrowed to the vanishing point, leaving us with a progressive metaphysical research program.
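To see how little the constrained-optimization framework rules out by itself, here is a minimal sketch of the kind of rationalization I have in mind. It is my own toy example, not drawn from any particular study; the prices and quantities are made-up numbers. Given an arbitrary observed bundle and prices, choosing Cobb-Douglas utility weights equal to the observed expenditure shares makes the observed bundle come out as the solution of the consumer’s constrained optimization problem.

```python
# A minimal sketch (invented numbers): "rationalizing" an observed choice as the
# solution of a constrained optimization problem. With Cobb-Douglas weights set
# equal to observed expenditure shares, the observed bundle is the optimum.
import numpy as np
from scipy.optimize import minimize

p = np.array([2.0, 5.0])          # observed prices (hypothetical)
observed = np.array([6.0, 1.6])   # observed bundle (hypothetical)
m = p @ observed                  # implied income

alpha = (p * observed) / m        # utility weights chosen to fit the observation

def neg_log_utility(x):
    # Cobb-Douglas utility in logs, negated because we minimize
    return -np.sum(alpha * np.log(x))

# Maximize utility subject to the budget constraint p.x = m
res = minimize(neg_log_utility, x0=np.array([1.0, 1.0]),
               bounds=[(1e-6, None), (1e-6, None)],
               constraints=[{"type": "eq", "fun": lambda x: p @ x - m}])

print(res.x)   # approximately [6.0, 1.6]: the observed bundle emerges as the optimum
```

The point is not that anyone estimates preferences this way, only that with enough free parameters the optimization framework can be made to deliver almost any observed outcome, which is exactly why the hard core is never put at risk.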

I am not denying that it would be preferable if economics could be a truly scientific research program, but it is not clear to me how much can be done about it. The complexity of the phenomena, the multiplicity of the hypotheses required to explain the data, and the ambiguous and not fully reliable nature of most of the data that economists have available devilishly conspire to render Popperian falsificationism an illusory ideal in economics. That is not an excuse for cynicism, just a warning against unrealistic expectations about what economics can accomplish. And the last thing I am suggesting is that we stop paying attention to the data we have, or stop trying to improve the quality of the data we have to work with.


