Posts Tagged 'methodology'

Dr. Popper: Or How I Learned to Stop Worrying and Love Metaphysics

Introduction to Falsificationism

Although his reputation among philosophers was never quite as exalted as it was among non-philosophers, Karl Popper was a pre-eminent figure in 20th century philosophy. As a non-philosopher, I won’t attempt to adjudicate which take on Popper is the more astute, but I think I can at least sympathize, if not fully agree, with philosophers who believe that Popper is overrated by non-philosophers. In an excellent blog post, Phillipe Lemoine gives a good explanation of why philosophers look askance at falsificationism, Popper’s most important contribution to philosophy.

According to Popper, what distinguishes or demarcates a scientific statement from a non-scientific (metaphysical) statement is whether the statement can, or could be, disproved or refuted – falsified (in the sense of being shown to be false not in the sense of being forged, misrepresented or fraudulently changed) – by an actual or potential observation. Vulnerability to potentially contradictory empirical evidence, according to Popper, is what makes science special, allowing it to progress through a kind of dialectical process of conjecture (hypothesis) and refutation (empirical testing) leading to further conjecture and refutation and so on.

Theories purporting to explain anything and everything are thus non-scientific or metaphysical. Claiming to be able to explain too much is a vice, not a virtue, in science. Science advances by risk-taking, not by playing it safe. Trying to explain too much is actually playing it safe. If you’re not willing to take the chance of putting your theory at risk, by saying that this and not that will happen — rather than saying that this or that will happen — you’re playing it safe. This view of science, portrayed by Popper in modestly heroic terms, was not unappealing to scientists, and in part accounts for the positive reception of Popper’s work among scientists.

But this heroic view of science, as Lemoine nicely explains, was just a bit oversimplified. Theories never exist in a vacuum, there is always implicit or explicit background knowledge that informs and provides context for the application of any theory from which a prediction is deduced. To deduce a prediction from any theory, background knowledge, including complementary theories that are presumed to be valid for purposes of making a prediction, is necessary. Any prediction relies not just on a single theory but on a system of related theories and auxiliary assumptions.

So when a prediction is deduced from a theory, and the predicted event is not observed, it is never unambiguously clear which of the multiple assumptions underlying the prediction is responsible for the failure of the predicted event to be observed. The one-to-one logical dependence between a theory and a prediction upon which Popper’s heroic view of science depends doesn’t exist. Because the heroic view of science is too simplified, Lemoine considers it false, at least in the naïve and heroic form in which it is often portrayed by its proponents.

But, as Lemoine himself acknowledges, Popper was not unaware of these issues and actually dealt with some if not all of them. Popper therefore dismissed those criticisms pointing to his various acknowledgments and even anticipations of and responses to the criticisms. Nevertheless, his rhetorical style was generally not to qualify his position but to present it in stark terms, thereby reinforcing the view of his critics that he actually did espouse the naïve version of falsificationism that, only under duress, would be toned down to meet the objections raised to the usual unqualified version of his argument. Popper after all believed in making bold conjectures and framing a theory in the strongest possible terms and characteristically adopted an argumentative and polemical stance in staking out his positions.

Toned-Down Falsificationism

In his tone-downed version of falsificationism, Popper acknowledged that one can never know if a prediction fails because the underlying theory is false or because one of the auxiliary assumptions required to make the prediction is false, or even because of an error in measurement. But that acknowledgment, Popper insisted, does not refute falsificationism, because falsificationism is not a scientific theory about how scientists do science; it is a normative theory about how scientists ought to do science. The normative implication of falsificationism is that scientists should not try to shield their theories by making just-so adjustments in their theories through ad hoc auxiliary assumptions, e.g., ceteris paribus assumptions, to shield their theories from empirical disproof. Rather they should accept the falsification of their theories when confronted by observations that conflict with the implications of their theories and then formulate new and better theories to replace the old ones.

But a strict methodological rule against adjusting auxiliary assumptions or making further assumptions of an ad hoc nature would have ruled out many fruitful theoretical developments resulting from attempts to account for failed predictions. For example, the planet Neptune was discovered in 1846 by scientists who posited (ad hoc) the existence of another planet to explain why the planet Uranus did not follow its predicted path. Rather than conclude that the Newtonian theory was falsified by the failure of Uranus to follow the orbital path predicted by Newtonian theory, the French astronomer Urbain Le Verrier posited the existence of another planet that would account for the path actually followed by Uranus. Now in this case, it was possible to observe the predicted position of the new planet, and its discovery in the predicted location turned out to be a sensational confirmation of Newtonian theory.

Popper therefore admitted that making an ad hoc assumption in order to save a theory from refutation was permissible under his version of normative faslisificationism, but only if the ad hoc assumption was independently testable. But suppose that, under the circumstances, it would have been impossible to observe the existence of the predicted planet, at least with the observational tools then available, making the ad hoc assumption testable only in principle, but not in practice. Strictly adhering to Popper’s methodological requirement of being able to test independently any ad hoc assumption would have meant accepting the refutation of the Newtonian theory rather than positing the untestable — but true — ad hoc other-planet hypothesis to account for the failed prediction of the orbital path of Uranus.

My point is not that ad hoc assumptions to save a theory from falsification are ok, but to point out that a strict methodological rules requiring rejection of any theory once it appears to be contradicted by empirical evidence and prohibiting the use of any ad hoc assumption to save the theory unless the ad hoc assumption is independently testable might well lead to the wrong conclusion given the nuances and special circumstances associated with every case in which a theory seems to be contradicted by observed evidence. Such contradictions are rarely so blatant that theory cannot be reconciled with the evidence. Indeed, as Popper himself recognized, all observations are themselves understood and interpreted in the light of theoretical presumptions. It is only in extreme cases that evidence cannot be interpreted in a way that more or less conforms to the theory under consideration. At first blush, the Copernican heliocentric view of the world seemed obviously contradicted by direct sensory observation that earth seems flat and the sun rise and sets. Empirical refutation could be avoided only by providing an alternative interpretation of the sensory data that could be reconciled with the apparent — and obvious — flatness and stationarity of the earth and the movement of the sun and moon in the heavens.

So the problem with falsificationism as a normative theory is that it’s not obvious why a moderately good, but less than perfect, theory should be abandoned simply because it’s not perfect and suffers from occasional predictive failures. To be sure, if a better theory than the one under consideration is available, predicting correctly whenever the one under consideration predicts correctly and predicting more accurately than the one under consideration when the latter fails to predict correctly, the alternative theory is surely preferable, but that simply underscores the point that evaluating any theory in isolation is not very important. After all, every theory, being a simplification, is an imperfect representation of reality. It is only when two or more theories are available that scientists must try to determine which of them is preferable.

Oakeshott and the Poverty of Falsificationism

These problems with falsificationism were brought into clearer focus by Michael Oakeshott in his famous essay “Rationalism in Politics,” which though not directed at Popper himself (whose colleague at the London School of Economics he was) can be read as a critique of Popper’s attempt to prescribe methodological rules for scientists to follow in carrying out their research. Methodological rules of the kind propounded by Popper are precisely the sort of supposedly rational rules of practice intended to ensure the successful outcome of an undertaking that Oakeshott believed to be ill-advised and hopelessly naïve. The rationalist conceit in Oakesott’s view is that there are demonstrably correct answers to practical questions and that practical activity is rational only when it is based on demonstrably true moral or causal rules.

The entry on Michael Oakeshott in the Stanford Encyclopedia of Philosophy summarizes Oakeshott’s position as follows:

The error of Rationalism is to think that making decisions simply requires skill in the technique of applying rules or calculating consequences. In an early essay on this theme, Oakeshott distinguishes between “technical” and “traditional” knowledge. Technical knowledge is of facts or rules that can be easily learned and applied, even by those who are without experience or lack the relevant skills. Traditional knowledge, in contrast, means “knowing how” rather than “knowing that” (Ryle 1949). It is acquired by engaging in an activity and involves judgment in handling facts or rules (RP 12–17). The point is not that rules cannot be “applied” but rather that using them skillfully or prudently means going beyond the instructions they provide.

The idea that a scientist’s decision about when to abandon one theory and replace it with another can be reduced to the application of a Popperian falsificationist maxim ignores all the special circumstances and all the accumulated theoretical and practical knowledge that a truly expert scientist will bring to bear in studying and addressing such a problem. Here is how Oakeshott addresses the problem in his famous essay.

These two sorts of knowledge, then, distinguishable but inseparable, are the twin components of the knowledge involved in every human activity. In a practical art such as cookery, nobody supposes that the knowledge that belongs to the good cook is confined to what is or what may be written down in the cookery book: technique and what I have called practical knowledge combine to make skill in cookery wherever it exists. And the same is true of the fine arts, of painting, of music, of poetry: a high degree of technical knowledge, even where it is both subtle and ready, is one thing; the ability to create a work of art, the ability to compose something with real musical qualities, the ability to write a great sonnet, is another, and requires in addition to technique, this other sort of knowledge. Again these two sorts of knowledge are involved in any genuinely scientific activity. The natural scientist will certainly make use of observation and verification that belong to his technique, but these rules remain only one of the components of his knowledge; advances in scientific knowledge were never achieved merely by following the rules. . . .

Technical knowledge . . . is susceptible of formulation in rules, principles, directions, maxims – comprehensively, in propositions. It is possible to write down technical knowledge in a book. Consequently, it does not surprise us that when an artist writes about his art, he writes only about the technique of his art. This is so, not because he is ignorant of what may be called asesthetic element, or thinks it unimportant, but because what he has to say about that he has said already (if he is a painter) in his pictures, and he knows no other way of saying it. . . . And it may be observed that this character of being susceptible of precise formulation gives to technical knowledge at least the appearance of certainty: it appears to be possible to be certain about a technique. On the other hand, it is characteristic of practical knowledge that it is not susceptible of formulation of that kind. Its normal expression is in a customary or traditional way of doing things, or, simply, in practice. And this gives it the appearance of imprecision and consequently of uncertainty, of being a matter of opinion, of probability rather than truth. It is indeed knowledge that is expressed in taste or connoisseurship, lacking rigidity and ready for the impress of the mind of the learner. . . .

Technical knowledge, in short, an be both taught and learned in the simplest meanings of these words. On the other hand, practical knowledge can neither be taught nor learned, but only imparted and acquired. It exists only in practice, and the only way to acquire it is by apprenticeship to a master – not because the master can teach it (he cannot), but because it can be acquired only by continuous contact with one who is perpetually practicing it. In the arts and in natural science what normally happens is that the pupil, in being taught and in learning the technique from his master, discovers himself to have acquired also another sort of knowledge than merely technical knowledge, without it ever having been precisely imparted and often without being able to say precisely what it is. Thus a pianist acquires artistry as well as technique, a chess-player style and insight into the game as well as knowledge of the moves, and a scientist acquires (among other things) the sort of judgement which tells him when his technique is leading him astray and the connoisseurship which enables him to distinguish the profitable from the unprofitable directions to explore.

Now, as I understand it, Rationalism is the assertion that what I have called practical knowledge is not knowledge at all, the assertion that, properly speaking, there is no knowledge which is not technical knowledge. The Rationalist holds that the only element of knowledge involved in any human activity is technical knowledge and that what I have called practical knowledge is really only a sort of nescience which would be negligible if it were not positively mischievous. (Rationalism in Politics and Other Essays, pp. 12-16)

Almost three years ago, I attended the History of Economics Society meeting at Duke University at which Jeff Biddle of Michigan State University delivered his Presidential Address, “Statistical Inference in Economics 1920-1965: Changes in Meaning and Practice, published in the June 2017 issue of the Journal of the History of Economic Thought. The paper is a remarkable survey of the differing attitudes towards using formal probability theory as the basis for making empirical inferences from the data. The underlying assumptions of probability theory about the nature of the data were widely viewed as being too extreme to make probability theory an acceptable basis for empirical inferences from the data. However, the early negative attitudes toward accepting probability theory as the basis for making statistical inferences from data were gradually overcome (or disregarded). But as late as the 1960s, even though econometric techniques were becoming more widely accepted, a great deal of empirical work, including by some of the leading empirical economists of the time, avoided using the techniques of statistical inference to assess empirical data using regression analysis. Only in the 1970s was there a rapid sea-change in professional opinion that made statistical inference based on explicit probabilisitic assumptions about underlying data distributions the requisite technique for drawing empirical inferences from the analysis of economic data. In the final section of his paper, Biddle offers an explanation for this rapid change in professional attitude toward the use of probabilistic assumptions about data distributions as the required method of the empirical assessment of economic data.

By the 1970s, there was a broad consensus in the profession that inferential methods justified by probability theory—methods of producing estimates, of assessing the reliability of those estimates, and of testing hypotheses—were not only applicable to economic data, but were a necessary part of almost any attempt to generalize on the basis of economic data. . . .

This paper has been concerned with beliefs and practices of economists who wanted to use samples of statistical data as a basis for drawing conclusions about what was true, or probably true, in the world beyond the sample. In this setting, “mechanical objectivity” means employing a set of explicit and detailed rules and procedures to produce conclusions that are objective in the sense that if many different people took the same statistical information, and followed the same rules, they would come to exactly the same conclusions. The trustworthiness of the conclusion depends on the quality of the method. The classical theory of inference is a prime example of this sort of mechanical objectivity.

Porter [Trust in Numbers: The Pursuit of Objectivity in Science and Public Life] contrasts mechanical objectivity with an objectivity based on the “expert judgment” of those who analyze data. Expertise is acquired through a sanctioned training process, enhanced by experience, and displayed through a record of work meeting the approval of other experts. One’s faith in the analyst’s conclusions depends on one’s assessment of the quality of his disciplinary expertise and his commitment to the ideal of scientific objectivity. Elmer Working’s method of determining whether measured correlations represented true cause-and-effect relationships involved a good amount of expert judgment. So, too, did Gregg Lewis’s adjustments of the various estimates of the union/non-union wage gap, in light of problems with the data and peculiarities of the times and markets from which they came. Keynes and Persons pushed for a definition of statistical inference that incorporated space for the exercise of expert judgment; what Arthur Goldberger and Lawrence Klein referred to as ‘statistical inference’ had no explicit place for expert judgment.

Speaking in these terms, I would say that in the 1920s and 1930s, empirical economists explicitly acknowledged the need for expert judgment in making statistical inferences. At the same time, mechanical objectivity was valued—there are many examples of economists of that period employing rule-oriented, replicable procedures for drawing conclusions from economic data. The rejection of the classical theory of inference during this period was simply a rejection of one particular means for achieving mechanical objectivity. By the 1970s, however, this one type of mechanical objectivity had become an almost required part of the process of drawing conclusions from economic data, and was taught to every economics graduate student.

Porter emphasizes the tension between the desire for mechanically objective methods and the belief in the importance of expert judgment in interpreting statistical evidence. This tension can certainly be seen in economists’ writings on statistical inference throughout the twentieth century. However, it would be wrong to characterize what happened to statistical inference between the 1940s and the 1970s as a displace-ment of procedures requiring expert judgment by mechanically objective procedures. In the econometric textbooks published after 1960, explicit instruction on statistical inference was largely limited to instruction in the mechanically objective procedures of the classical theory of inference. It was understood, however, that expert judgment was still an important part of empirical economic analysis, particularly in the specification of the models to be estimated. But the disciplinary knowledge needed for this task was to be taught in other classes, using other textbooks.

And in practice, even after the statistical model had been chosen, the estimates and standard errors calculated, and the hypothesis tests conducted, there was still room to exercise a fair amount of judgment before drawing conclusions from the statistical results. Indeed, as Marcel Boumans (2015, pp. 84–85) emphasizes, no procedure for drawing conclusions from data, no matter how algorithmic or rule bound, can dispense entirely with the need for expert judgment. This fact, though largely unacknowledged in the post-1960s econometrics textbooks, would not be denied or decried by empirical economists of the 1970s or today.

This does not mean, however, that the widespread embrace of the classical theory of inference was simply a change in rhetoric. When application of classical inferential procedures became a necessary part of economists’ analyses of statistical data, the results of applying those procedures came to act as constraints on the set of claims that a researcher could credibly make to his peers on the basis of that data. For example, if a regression analysis of sample data yielded a large and positive partial correlation, but the correlation was not “statistically significant,” it would simply not be accepted as evidence that the “population” correlation was positive. If estimation of a statistical model produced a significant estimate of a relationship between two variables, but a statistical test led to rejection of an assumption required for the model to produce unbiased estimates, the evidence of a relationship would be heavily discounted.

So, as we consider the emergence of the post-1970s consensus on how to draw conclusions from samples of statistical data, there are arguably two things to be explained. First, how did it come about that using a mechanically objective procedure to generalize on the basis of statistical measures went from being a choice determined by the preferences of the analyst to a professional requirement, one that had real con-sequences for what economists would and would not assert on the basis of a body of statistical evidence? Second, why was it the classical theory of inference that became the required form of mechanical objectivity? . . .

Perhaps searching for an explanation that focuses on the classical theory of inference as a means of achieving mechanical objectivity emphasizes the wrong characteristic of that theory. In contrast to earlier forms of mechanical objectivity used by economists, such as standardized methods of time series decomposition employed since the 1920s, the classical theory of inference is derived from, and justified by, a body of formal mathematics with impeccable credentials: modern probability theory. During a period when the value placed on mathematical expression in economics was increasing, it may have been this feature of the classical theory of inference that increased its perceived value enough to overwhelm long-standing concerns that it was not applicable to economic data. In other words, maybe the chief causes of the profession’s embrace of the classical theory of inference are those that drove the broader mathematization of economics, and one should simply look to the literature that explores possible explanations for that phenomenon rather than seeking a special explanation of the embrace of the classical theory of inference.

I would suggest one more factor that might have made the classical theory of inference more attractive to economists in the 1950s and 1960s: the changing needs of pedagogy in graduate economics programs. As I have just argued, since the 1920s, economists have employed both judgment based on expertise and mechanically objective data-processing procedures when generalizing from economic data. One important difference between these two modes of analysis is how they are taught and learned. The classical theory of inference as used by economists can be taught to many students simultaneously as a set of rules and procedures, recorded in a textbook and applicable to “data” in general. This is in contrast to the judgment-based reasoning that combines knowledge of statistical methods with knowledge of the circumstances under which the particular data being analyzed were generated. This form of reasoning is harder to teach in a classroom or codify in a textbook, and is probably best taught using an apprenticeship model, such as that which ideally exists when an aspiring economist writes a thesis under the supervision of an experienced empirical researcher.

During the 1950s and 1960s, the ratio of PhD candidates to senior faculty in PhD-granting programs was increasing rapidly. One consequence of this, I suspect, was that experienced empirical economists had less time to devote to providing each interested student with individualized feedback on his attempts to analyze data, so that relatively more of a student’s training in empirical economics came in an econometrics classroom, using a book that taught statistical inference as the application of classical inference procedures. As training in empirical economics came more and more to be classroom training, competence in empirical economics came more and more to mean mastery of the mechanically objective techniques taught in the econometrics classroom, a competence displayed to others by application of those techniques. Less time in the training process being spent on judgment-based procedures for interpreting statistical results meant fewer researchers using such procedures, or looking for them when evaluating the work of others.

This process, if indeed it happened, would not explain why the classical theory of inference was the particular mechanically objective method that came to dominate classroom training in econometrics; for that, I would again point to the classical theory’s link to a general and mathematically formalistic theory. But it does help to explain why the application of mechanically objective procedures came to be regarded as a necessary means of determining the reliability of a set of statistical measures and the extent to which they provided evidence for assertions about reality. This conjecture fits in with a larger possibility that I believe is worth further exploration: that is, that the changing nature of graduate education in economics might sometimes be a cause as well as a consequence of changing research practices in economics. (pp. 167-70)

The correspondence between Biddle’s discussion of the change in the attitude of the economics profession about how inferences should be drawn from data about empirical relationships is strikingly similar to Oakeshott’s discussion and depressing in its implications for the decline of expert judgment by economics, expert judgment having been replaced by mechanical and technical knowledge that can be objectively summarized in the form of rules or tests for statistical significance, itself an entirely arbitrary convention lacking any logical, or self-evident, justification.

But my point is not to condemn using rules derived from classical probability theory to assess the significance of relationships statistically estimated from historical data, but to challenge the methodological prohibition against the kinds of expert judgments that many statistically knowledgeable economists like Nobel Prize winners such as Simon Kuznets, Milton Friedman, Theodore Schultz and Gary Becker routinely used to make in their empirical studies. As Biddle notes:

In 1957, Milton Friedman published his theory of the consumption function. Friedman certainly understood statistical theory and probability theory as well as anyone in the profession in the 1950s, and he used statistical theory to derive testable hypotheses from his economic model: hypotheses about the relationships between estimates of the marginal propensity to consume for different groups and from different types of data. But one will search his book almost in vain for applications of the classical methods of inference. Six years later, Friedman and Anna Schwartz published their Monetary History of the United States, a work packed with graphs and tables of statistical data, as well as numerous generalizations based on that data. But the book contains no classical hypothesis tests, no confidence intervals, no reports of statistical significance or insignificance, and only a handful of regressions. (p. 164)

Friedman’s work on the Monetary History is still regarded as authoritative. My own view is that much of the Monetary History was either wrong or misleading. But my quarrel with the Monetary History mainly pertains to the era in which the US was on the gold standard, inasmuch as Friedman simply did not understand how the gold standard worked, either in theory or in practice, as McCloskey and Zecher showed in two important papers (here and here). Also see my posts about the empirical mistakes in the Monetary History (here and here). But Friedman’s problem was bad monetary theory, not bad empirical technique.

Friedman’s theoretical misunderstandings have no relationship to the misguided prohibition against doing quantitative empirical research without obeying the arbitrary methodological requirement that statistical be derived in a way that measures the statistical significance of the estimated relationships. These methodological requirements have been adopted to support a self-defeating pretense to scientific rigor, necessitating the use of relatively advanced mathematical techniques to perform quantitative empirical research. The methodological requirements for measuring statistical relationships were never actually shown to be generate more accurate or reliable statistical results than those derived from the less technically advanced, but in some respects more economically sophisticated, techniques that have almost totally been displaced. One more example of the fallacy that there is but one technique of research that ensures the discovery of truth, a mistake even Popper was never guilty of.

Methodological Prescriptions Go from Bad to Worse

The methodological requirement for the use of formal tests of statistical significance before any quantitative statistical estimate could be credited was a prelude, though it would be a stretch to link them causally, to another and more insidious form of methodological tyrannizing: the insistence that any macroeconomic model be derived from explicit micro-foundations based on the solution of an intertemporal-optimization exercise. Of course, the idea that such a model was in any way micro-founded was a pretense, the solution being derived only through the fiction of a single representative agent, rendering the entire optimization exercise fundamentally illegitimate and the exact opposite of micro-founded model. Having already explained in previous posts why transforming microfoundations from a legitimate theoretical goal into methodological necessity has taken a generation of macroeconomists down a blind alley (here, here, here, and here) I will only make the further comment that this is yet another example of the danger of elevating technique over practice and substance.

Popper’s More Important Contribution

This post has largely concurred with the negative assessment of Popper’s work registered by Lemoine. But I wish to end on a positive note, because I have learned a great deal from Popper, and even if he is overrated as a philosopher of science, he undoubtedly deserves great credit for suggesting falsifiability as the criterion by which to distinguish between science and metaphysics. Even if that criterion does not hold up, or holds up only when qualified to a greater extent than Popper admitted, Popper made a hugely important contribution by demolishing the startling claim of the Logical Positivists who in the 1920s and 1930s argued that only statements that can be empirically verified through direct or indirect observation have meaning, all other statements being meaningless or nonsensical. That position itself now seems to verge on the nonsensical. But at the time many of the world’s leading philosophers, including Ludwig Wittgenstein, no less, seemed to accept that remarkable view.

Thus, Popper’s demarcation between science and metaphysics had a two-fold significance. First, that it is not verifiability, but falsifiability, that distinguishes science from metaphysics. That’s the contribution for which Popper is usually remembered now. But it was really the other aspect of his contribution that was more significant: that even metaphysical, non-scientific, statements can be meaningful. According to the Logical Positivists, unless you are talking about something that can be empirically verified, you are talking nonsense. In other words they were deliberately hoisting themselves on their petard, because their discussions about what is and what is not meaningful, being discussions about concepts, not empirically verifiable objects, were themselves – on the Positivists’ own criterion of meaning — meaningless and nonsensical.

Popper made the world safe for metaphysics, and the world is a better place as a result. Science is a wonderful enterprise, rewarding for its own sake and because it contributes to the well-being of many millions of human beings, though like many other human endeavors, it can also have unintended and unfortunate consequences. But metaphysics, because it was used as a term of abuse by the Positivists, is still, too often, used as an epithet. It shouldn’t be.

Certainly economists should aspire to tease out whatever empirical implications they can from their theories. But that doesn’t mean that an economic theory with no falsifiable implications is useless, a judgment whereby Mark Blaug declared general equilibrium theory to be unscientific and useless, a judgment that I don’t think has stood the test of time. And even if general equilibrium theory is simply metaphysical, my response would be: so what? It could still serve as a source of inspiration and insight to us in framing other theories that may have falsifiable implications. And even if, in its current form, a theory has no empirical content, there is always the possibility that, through further discussion, critical analysis and creative thought, empirically falsifiable implications may yet become apparent.

Falsifiability is certainly a good quality for a theory to have, but even an unfalsifiable theory may be worth paying attention to and worth thinking about.


Two Cheers (Well, Maybe Only One and a Half) for Falsificationism

Noah Smith recently wrote a defense (sort of) of falsificationism in response to Sean Carroll’s suggestion that the time has come for scientists to throw falisficationism overboard as a guide for scientific practice. While Noah isn’t ready to throw out falsification as a scientific ideal, he does acknowledge that not everything that scientists do is really falsifiable.

But, as Carroll himself seems to understand in arguing against falsificationism, even though a particular concept or entity may itself be unobservable (and thus unfalsifiable), the larger theory of which it is a part may still have implications that are falsifiable. This is the case in economics. A utility function or a preference ordering is not observable, but by imposing certain conditions on that utility function, one can derive some (weakly) testable implications. This is exactly what Karl Popper, who introduced and popularized the idea of falsificationism, meant when he said that the aim of science is to explain the known by the unknown. To posit an unobservable utility function or an unobservable string is not necessarily to engage in purely metaphysical speculation, but to do exactly what scientists have always done, to propose explanations that would somehow account for some problematic phenomenon that they had already observed. The explanations always (or at least frequently) involve positing something unobservable (e.g., gravitation) whose existence can only be indirectly perceived by comparing the implications (predictions) inferred from the existence of the unobservable entity with what we can actually observe. Here’s how Popper once put it:

Science is valued for its liberalizing influence as one of the greatest of the forces that make for human freedom.

According to the view of science which I am trying to defend here, this is due to the fact that scientists have dared (since Thales, Democritus, Plato’s Timaeus, and Aristarchus) to create myths, or conjectures, or theories, which are in striking contrast to the everyday world of common experience, yet able to explain some aspects of this world of common experience. Galileo pays homage to Aristarchus and Copernicus precisely because they dared to go beyond this known world of our senses: “I cannot,” he writes, “express strongly enough my unbounded admiration for the greatness of mind of these men who conceived [the heliocentric system] and held it to be true […], in violent opposition to the evidence of their own senses.” This is Galileo’s testimony to the liberalizing force of science. Such theories would be important even if they were no more than exercises for our imagination. But they are more than this, as can be seen from the fact that we submit them to severe tests by trying to deduce from them some of the regularities of the known world of common experience by trying to explain these regularities. And these attempts to explain the known by the unknown (as I have described them elsewhere) have immeasurably extended the realm of the known. They have added to the facts of our everyday world the invisible air, the antipodes, the circulation of the blood, the worlds of the telescope and the microscope, of electricity, and of tracer atoms showing us in detail the movements of matter within living bodies.  All these things are far from being mere instruments: they are witness to the intellectual conquest of our world by our minds.

So I think that Sean Carroll, rather than arguing against falisficationism, is really thinking of falsificationism in the broader terms that Popper himself laid out a long time ago. And I think that Noah’s shrug-ability suggestion is also, with appropriate adjustments for changes in expository style, entirely in the spirit of Popper’s view of falsificationism. But to make that point clear, one needs to understand what motivated Popper to propose falsifiability as a criterion for distinguishing between science and non-science. Popper’s aim was to overturn logical positivism, a philosophical doctrine associated with the group of eminent philosophers who made up what was known as the Vienna Circle in the 1920s and 1930s. Building on the British empiricist tradition in science and philosophy, the logical positivists argued that our knowledge of the external world is based on sensory experience, and that apart from the tautological truths of pure logic (of which mathematics is a part) there is no other knowledge. Furthermore, no meaning could be attached to any statement whose validity could not checked either by examining its logical validity as an inference from explicit premises or verified by sensory experience. According to this criterion, much of human discourse about ethics, morals, aesthetics, religion and much of philosophy was simply meaningless, aka metaphysics.

Popper, who grew up in Vienna and was on the periphery of the Vienna Circle, rejected the idea that logical tautologies and statements potentially verifiable by observation are the only conveyors of meaning between human beings. Metaphysical statements can be meaningful even if they can’t be confirmed by observation. Metaphysical statements are meaningful if they are coherent and are not nonsensical. If there is a problem with metaphysical statements, the problem is not necessarily because they have no meaning. In making this argument, Popper suggested an alternative criterion of demarcation to that between meaning and non-meaning: a criterion of demarcation between science and metaphysics. Science is indeed different from metaphysics, but the difference is not that science is meaningful and metaphysics is not. The difference is that scientific statements can be refuted (or falsified) by observations while metaphysical statements cannot be refuted by observations. As a matter of logic, the only way to refute a proposition by an observation is for the proposition to assert that the observation was not possible. Unless you can say what observation would refute what you are saying, you are engaging in metaphysical, not scientific, talk. This gave rise to Popper’s then very surprising result. If you positively assert the existence of something – an assertion potentially verifiable by observation, and hence for logical positivists the quintessential scientific statement — you are making a metaphysical, not a scientific, statement. The statement that something (e.g., God, a string, or a utility function) exists cannot be refuted by any observation. However the unobservable phenomenon may be part of a theory with implications that could be refuted by some observation. But in that case it would be the theory not the posited object that was refuted.

In fact, Popper thought that metaphysical statements not only could be meaningful, but could even be extremely useful, coining the term “metaphysical research programs,” because a metaphysical, unfalsifiable idea or theory could be the impetus for further research, possibly becoming scientifically fruitful in the way that evolutionary biology eventually sprang from the possibly unfalsifiable idea of survival of the fittest. That sounds to me pretty much like Noah’s idea of shrug-ability.

Popper was largely successful in overthrowing logical positivism, though whether it was entirely his doing (as he liked to claim) and whether it was fully overthrown are not so clear. One reason to think that it was not all his doing is that there is still a lot of confusion about what the falsification criterion actually means. Reading Noah Smith and Sean Carroll, I almost get the impression that they think the falsification criterion distinguishes not just between science and non-science but between meaning and non-meaning. Otherwise, why would anyone think that there is any problem with introducing an unfalsifiable concept into scientific discussion. When Popper argued that science should aim at proposing and testing falsifiable theories, he meant that one should not design a theory so that it can’t be tested, or adopt stratagems — ad hoc hypotheses — that serve only to account for otherwise falsifying observations. But if someone comes up with a creative new idea, and the idea can’t be tested, at least given the current observational technology, that is not a reason to reject the theory, especially if the new theory accounts for otherwise unexplained observations.

Another manifestation of Popper’s imperfect success in overthrowing logical positivism is that Paul Samuelson in his classic The Foundations of Economic Analysis chose to call the falsifiable implications of economic theory, meaningful theorems. By naming those implications “meaningful theorems,” Samuelson clearly was operating under the positivist presumption that only a proposition that could (at least in principle) be falsified by observation was meaningful. However, that formulation reflected an untenable compromise between Popper’s criterion for distinguishing science from metaphysics and the logical positivist criterion for distinguishing meaningful from meaningless statements. Instead of referring to meaningful theorems, Samuelson should have called them, more modestly, testable or scientific theorems.

So, at least as I read Popper, Noah Smith and Sean Carroll are only discovering what Popper already understood a long time ago.

At this point, some readers may be wondering why, having said all that, I seem to have trouble giving falisficationism (and Popper) even two cheers. So I am afraid that I will have to close this post on a somewhat critical note. The problem with Popper is that his rhetoric suggests that scientific methodology is a lot more important than it really is. Apart from some egregious examples like Marxism and Freudianism, which were deliberately formulated to exclude the possibility of refutation, there really aren’t that many theories entertained by scientists that can be ruled out of order on strictly methodological grounds. Popper can occasionally provide some methodological reminders to scientists to avoid relying on ad hoc theorizing — at least when a non-ad-hoc alternative is handy — but beyond that I don’t think methodology counts for very much in the day to day work of scientists. Many theories are difficult to falsify, but the difficulty is not necessarily the result of deliberate choices by the theorists, it is the result of the nature of the problem and the nature of the evidence that could potentially refute the theory. The evidence is what it is. It is nice to come up with a theory that predicts a novel fact that can be observed, but nature is not always so accommodating to our theories.

There is a kind of rationalistic (I am using “rationalistic” in the pejorative sense of Michael Oakeshott) faith that following the methodological rules that Popper worked so hard to formulate will guarantee scientific progress. Those rules tend to encourage an unrealistic focus on making theories testable (especially in economics) when by their nature the phenomena are too complex for theories to be formulated in ways that are susceptible to decisive testing. And although Popper recognized that empirical testing of a theory has very limited usefulness unless the theory is being compared to some alternative theory, too often discussions of theory testing are in the context of testing a single theory in isolation. Kuhn and others have pointed out that science is not routinely carried out in the way that Popper suggested it should be. To some extent, Popper acknowledged the truth of that observation, though he liked to cite examples from the history of science to illustrate his thesis, but argued that he was offering a normative, not a positive, theory of scientific discovery. But why should we assume that Popper had more insight into the process of discovery for particular sciences than the practitioners of those sciences actually doing the research? That is the nub of the criticism of Popper that I take away from Oakeshott’s work. Life and any form of endeavor involves the transmission of ways of doing things, traditions, that cannot be reduced to a set of rules, but require education, training, practice and experience. That’s what Kuhn called normal science. Normal science can go off the tracks too, but it is naïve to think that a list of methodological rules is what will keep science moving constantly in the right direction. Why should Popper’s rules necessarily trump the lessons that practitioners have absorbed from the scientific traditions in which they have been trained? I don’t believe that there is any surefire recipe for scientific progress.

Nevertheless, when I look at the way economics is now being practiced and taught, I can’t help but think that a dose of Popperianism might not be the worst thing that could be administered to modern economics. But that’s a discussion for another day.

Microfoundations (aka Macroeconomic Reductionism) Redux

In two recent blog posts (here and here), Simon Wren-Lewis wrote sensibly about microfoundations. Though triggered by Wren-Lewis’s posts, the following comments are not intended as criticisms of him, though I think he does give microfoundations (as they are now understood) too much credit. Rather, my criticism is aimed at the way microfoundations have come to be used to restrict the kind of macroeconomic explanations and models that are up for consideration among working macroeconomists. I have written about microfoundations before on this blog (here and here)  and some, if not most, of what I am going to say may be repetitive, but obviously the misconceptions associated with what Wren-Lewis calls the “microfoundations project” are not going to be dispelled by a couple of blog posts, so a little repetitiveness may not be such a bad thing. Jim Buchanan liked to quote the following passage from Herbert Spencer’s Data of Ethics:

Hence an amount of repetition which to some will probably appear tedious. I do not, however, much regret this almost unavoidable result; for only by varied iteration can alien conceptions be forced on reluctant minds.

When the idea of providing microfoundations for macroeconomics started to catch on in the late 1960s – and probably nowhere did they catch on sooner or with more enthusiasm than at UCLA – the idea resonated, because macroeconomics, which then mainly consisted of various versions of the Keynesian model, seemed to embody certain presumptions about how markets work that contradicted the presumptions of microeconomics about how markets work. In microeconomics, the primary mechanism for achieving equilibrium is the price (actually the relative price) of whatever good is being analyzed. A full (or general) microeconomic equilibrium involves a set of prices such that each of markets (whether for final outputs or for inputs into the productive process) are in equilibrium, equilibrium meaning that every agent is able to purchase or sell as much of any output or input as desired at the equilibrium price. The set of equilibrium prices not only achieves equilibrium, the equilibrium, under some conditions, has optimal properties, because each agent, in choosing how much to buy or sell of each output or input, is presumed to be acting in a way that is optimal given the preferences of the agent and the social constraints under which the agent operates. Those optimal properties don’t always follow from microeconomic presumptions, optimality being dependent on the particular assumptions (about preferences, production and exchange technology, and property rights) adopted by the analyst in modeling an individual market or an entire system of markets.

The problem with Keynesian macroeconomics was that it seemed to overlook, or ignore, or dismiss, or deny, the possibility that a price mechanism is operating — or could operate — to achieve equilibrium in the markets for goods and for labor services. In other words, the Keynesian model seemed to be saying that a macoreconomic equilibrium is compatible with the absence of market clearing, notwithstanding that the absence of market clearing had always been viewed as the defining characteristic of disequilibrium. Thus, from the perspective of microeconomic theory, if there is an excess supply of workers offering labor services, i.e., there are unemployed workers who would be willing to be employed at the same wage that currently employed workers are receiving, there ought to be market forces that would reduce wages to a level such that all workers willing to work at that wage could gain employment. Keynes, of course, had attempted to explain why workers could only reduce their nominal wages, not their real wages, and argued that nominal wage cuts would simply induce equivalent price reductions, leaving real wages and employment unchanged. The microeconomic reasoning on which that argument was based hinged on Keynes’s assumption that nominal wage cuts would trigger proportionate price cuts, but that assumption was not exactly convincing, if only because the percentage price cut would seem to depend not just on the percentage reduction in the nominal wage, but also on the labor intensity of the product, Keynes, habitually and inconsistently, arguing as if labor were the only factor of production while at the same time invoking the principle of diminishing marginal productivity.

At UCLA, the point of finding microfoundations was not to create a macroeconomics that would simply reflect the results and optimal properties of a full general equilibrium model. Indeed, what made UCLA approach to microeconomics distinctive was that it aimed at deriving testable implications from relaxing the usual informational and institutional assumptions (full information, zero transactions costs, fully defined and enforceable property rights) underlying conventional microeconomic theory. If the way forward in microeconomics was to move away from the extreme assumptions underlying the perfectly competitive model, then it seemed plausible that relaxing those assumptions would be fruitful in macroeconomics as well. That led Armen Alchian and others at UCLA to think of unemployment as largely a search phenomenon. For a while that approach seemed promising, and to some extent the promise was fulfilled, but many implications of a purely search-theoretic approach to unemployment don’t seem to be that well supported empirically. For example, search models suggest that in recessions, quits increase, and that workers become more likely to refuse offers of employment after the downturn than before. Neither of those implications seems to be true. A search model would suggest that workers are unemployed because they are refusing offers below their reservation wage, but in fact most workers are becoming unemployed because they are being laid off, and in recessions workers seem likely to accept offers of employment at the same wage that other workers are getting. Now it is possible to reinterpret workers’ behavior in recessions in a way that corresponds to the search-theoretic model, but the reinterpretation seems a bit of a stretch.

Even though he was an early exponent of the search theory of unemployment, Alchian greatly admired and frequently cited a 1974 paper by Donald Gordon “A Neoclassical Theory of Keynesian Unemployment,” which proposed an implicit-contract theory of employer-employee relationship. The idea was that workers make long-term commitments to their employers, and realizing their vulnerability, after having committed themselves to their employer, to exploitation by a unilateral wage cut imposed by the employer under threat of termination, expect some assurance from their employer that they will not be subjected to a unilateral demand to accept a wage cut. Such implicit understandings make it very difficult for employers, facing a reduction in demand, to force workers to accept a wage cut, because doing so would make it hard for the employer to retain those workers that are most highly valued and to attract new workers.

Gordon’s theory of implicit wage contracts has a certain similarity to Dennis Carlton’s explanation of why many suppliers don’t immediately raise prices to their steady customers. Like Gordon, Carlton posits the existence of implicit and sometimes explicit contracts in which customers commit to purchase minimum quantities or to purchase their “requirements” from a particular supplier. In return for the assurance of having a regular customer on whom the supplier can count, the supplier gives the customer assurance that he will receive his customary supply at the agreed upon price even if market conditions should change. Rather than raise the price in the event of a shortage, the supplier may feel that he is obligated to continue supplying his regular customers at the customary price, while raising the price to new or occasional customers to “market-clearing” levels. For certain kinds of supply relationships in which customer and supplier expect to continue transacting regularly over a long period of time, price is not the sole method by which allocation decisions are made.

Klein, Crawford and Alchian discussed a similar idea in their 1978 article about vertical integration as a means of avoiding or mitigating the threat of holdup when a supplier and a customer must invest in some sunk asset, e.g., a pipeline connection, for the supply relationship to be possible. The sunk investment implies that either party, under the right circumstances, could threaten to holdup the other party by threatening to withdraw from the relationship leaving the other party stuck with a useless fixed asset. Vertical integration avoids the problem by aligning the incentives of the two parties, eliminating the potential for holdup. Price rigidity can thus be viewed as a milder form of vertical integration in cases where transactors have a relatively long-term relationship and want to assure each other that they will not be taken advantage of after making a commitment (i.e., foregoing other trading opportunities) to the other party.

The search model is fairly easy to incorporate into a standard framework because search can be treated as a form of self-employment that is an alternative to accepting employment. The shape and position of the individual’s supply curve reflects his expectations about future wage offers that he will receive if he chooses not to accept employment in the current period. The more optimistic the worker’s expectation of future wages, the higher the worker’s reservation wage in the current period. The more certain the worker feels about the expected future wage, the more elastic is his supply curve in the neighborhood of the expected wage. Thus, despite its empirical shortcomings, the search model could serve as a convenient heuristic device for modeling cyclical increases in unemployment because of the unwillingness of workers to accept nominal wage cuts. From a macroeconomic modeling perspective, the incorrect or incomplete representation of the reason for the unwillingness of workers to accept wage cuts may be less important than the overall implication of the model, which is that unanticipated aggregate demand shocks can have significant and persistent effects on real output and employment. For example in his reformulation of macroeconomic theory, Earl Thompson, though he was certainly aware of Donald Gordon’s paper, relied exclusively on a search-theoretic rationale for Keynesian unemployment, and I don’t know (or can’t remember) if he had a specific objection to Gordon’s model or simply preferred to use the search-theoretic approach for pragmatic modeling reasons.

At any rate, these comments about the role of search models in modeling unemployment decisions are meant to illustrate why microfoundations could be useful for macroeconomics: by adding to the empirical content of macromodels, providing insight into the decisions or circumstances that lead workers to accept or reject employment in the aftermath of aggregate demand shocks, or why employers impose layoffs on workers rather than offer employment at reduced wages. The spectrum of such microeconomic theories of employer-employee relationships have provided us with a richer understanding of what the term “sticky wages” might actually be referring to, beyond the existence of minimum wage laws or collective bargaining contracts specifying nominal wages over a period of time for all covered employees.

In this context microfoundations meant providing a more theoretically satisfying, more micreconomically grounded explanation for a phenomenon – “sticky wages” – that seemed somehow crucial for generating the results of the Keynesian model. I don’t think that anyone would question that microfoundations in this narrow sense has been an important and useful area of research. And it is not microfoundations in this sense that is controversial. The sense in which microfoundations is controversial is whether a macroeconomic model must show that aggregate quantities that it generates can be shown to consistent with the optimizing choices of all agents in the model. In other words, the equilibrium solution of a macroeconomic model must be such that all agents are optimizing intertemporally, subject to whatever informational imperfections are specified by the model. If the model is not derived from or consistent with the solution to such an intertemporal optimization problem, the macromodel is now considered inadequate and unworthy of consideration. Here’s how Michael Woodford, a superb economist, but very much part of the stifling microfoundations consensus that has overtaken macroeconomics, put in his paper “The Convergence in Macroeconomics: Elements of the New Synthesis.”

But it is now accepted that one should know how to render one’s growth model and one’s business-cycle model consistent with one another in principle, on those occasions when it is necessary to make such connections. Similarly, microeconomic and macroeconomic analysis are no longer considered to involve fundamentally different principles, so that it should be possible to reconcile one’s views about household or firm behavior, or one’s view of the functioning of individual markets, with one’s model of the aggregate economy, when one needs to do so.

In this respect, the methodological stance of the New Classical school and the real business cycle theorists has become the mainstream. But this does not mean that the Keynesian goal of structural modeling of short-run aggregate dynamics has been abandoned. Instead, it is now understood how one can construct and analyze dynamic general-equilibrium models that incorporate a variety of types of adjustment frictions, that allow these models to provide fairly realistic representations of both shorter-run and longer-run responses to economic disturbances. In important respects, such models remain direct descendants of the Keynesian macroeconometric models of the early postwar period, though an important part of their DNA comes from neoclassical growth models as well.

Woodford argues that by incorporating various imperfections into their general equilibrium models, e.g.., imperfectly competitive output and labor markets, lags in the adjustment of wages and prices to changes in market conditions, search and matching frictions, it is possible to reconcile the existence of underutilized resources with intertemporal optimization by agents.

The insistence of monetarists, New Classicals, and early real business cycle theorists on the empirical relevance of models of perfect competitive equilibrium — a source of much controversy in past decades — is not what has now come to be generally accepted. Instead, what is important is having general-equilibrium models in the broad sense of requiring that all equations of the model be derived from mutually consistent foundations, and that the specified behavior of each economic unit make sense given the environment created by the behavior of the others. At one time, Walrasian competitive equilibrium models were the only kind of models with these features that were well understood; but this is no longer the case.

Woodford shows no recognition of the possibility of multiple equilibria, or that the evolution of an economic system and time-series data may be path-dependent, making the long-run neutrality propositions characterizing most DSGE models untenable. If the world – the data generating mechanism – is not like the world assumed by modern macroeconomics, the estimates derived from econometric models reflecting the worldview of modern macroeconomics will be inferior to estimates derived from an econometric model reflecting another, more accurate, world view. For example, if there are many possible equilibria depending on changes in expectational parameters or on the accidental deviations from an equilibrium time path, the idea of intertemporal optimization may not even be meaningful. Rather than optimize, agents may simply follow certain simple rules of thumb. But, on methodological principle, modern macroeconomics treats the estimates generated by any alternative econometric model insufficiently grounded in the microeconomic principles of intertemporal optimization as illegitimate.

Even worse from the perspective of microfoundations are the implications of something called the Sonnenchein-Mantel-Debreu Theorem, which, as I imperfectly understand it, says something like the following. Even granting the usual assumptions of the standard general equilibrium model — continuous individual demand and supply functions, homogeneity of degree zero in prices, Walras’s Law, and suitable boundary conditions on demand and supply functions, there is no guarantee that there is a unique stable equilibrium for such an economy. Thus, even apart from the dependence of equilibrium on expectations, there is no rationally expected equilibrium because there is no unique equilibrium to serve as an attractor for expectations. Thus, as I have pointed out before, as much as macroeconomics may require microfoundations, microeconomics requires macrofoundations, perhaps even more so.

Now let us compare the methodological demand for microfoundations for macroeconomics, which I would describe as a kind of macroeconomic methodological reductionism, with the reductionism of Newtonian physics. Newtonian physics reduced the Keplerian laws of planetary motion to more fundamental principles of gravitation governing the motion of all bodies celestial and terrestrial. In so doing, Newtonian physics achieved an astounding increase in explanatory power and empirical scope. What has the methodological reductionism of modern macroeconomics achieved? Reductionsim was not the source, but the result, of scientific progress. But as Carlaw and Lipsey demonstrated recently in an important paper, methodological reductionism in macroeconomics has resulted in a clear retrogression in empirical and explanatory power. Thus, methodological reductionism in macroeconomics is an antiscientific exercise in methodological authoritarianism.

About Me

David Glasner
Washington, DC

I am an economist in the Washington DC area. My research and writing has been mostly on monetary economics and policy and the history of economics. In my book Free Banking and Monetary Reform, I argued for a non-Monetarist non-Keynesian approach to monetary policy, based on a theory of a competitive supply of money. Over the years, I have become increasingly impressed by the similarities between my approach and that of R. G. Hawtrey and hope to bring Hawtrey's unduly neglected contributions to the attention of a wider audience.


Enter your email address to follow this blog and receive notifications of new posts by email.

Join 2,161 other followers

Follow Uneasy Money on