Ideas, Topics, and Mistakes

A lot of the posts this week seem to be concerned with the actual methods employed by topic modeling programs, whether it is perplexity at the amount of statistics one must learn in order to use them, or the scarcity of funds to hire a CS major who is actually competent in Java. These concerns seem to me well-founded, so, to avoid beating a dead horse, I’d like to focus on the (vast?) possibility of human error.

Since we’re dealing with text mining, and a relatively small subset of all texts is “literary,” and since this course is titled Digital Humanities (rather than Digital English), I wonder what these methods can tell us about the history of ideas. I decided to plug the ideas “capitalism” and “communism” (I know, try to contain your shock) into nGram and see what I could come up with:


So communism was slightly more widely discussed from the inception of the term until the lines cross in 1903; however, neither word was all that common. Fast forward until 1925, when the usage of “capitalism” begins to increase exponentially. Again, not very surprising; it doesn’t boggle my mind that, as America begins to enter the Great Depression, economics becomes a more commonly discussed topic. I’m not sure how to explain capitalism’s sharp dip between 1936 and 1939, but the rest seems to confirm (what I imagine to be) the standard narrative: communism peaks in 1968 before slinking off into the shadows, capitalism rises to a fever pitch by the 1980s, and, after the end of history, the two ideas flatline in comfortable stagnation. Brilliant.

But then again, communism and capitalism are global (?) phenomena, and their life as ideas is not confined to one language. What would happen if I did the same nGram search in the French corpus? I chose French for several reasons: 1) My enthusiasm for studying Casablanca above all the other films in Steve’s class has made me realize that, if TJ is our program’s resident Anglophile, I must certainly be the resident Francophile. 2) I feel that, of all the NATO countries, communism experienced its most vibrant political life in France. 3) In French, I just have to add an “e” to the end of the words, and I don’t feel like looking up the translations in Russian or Chinese.


Some interesting trends emerge. Communism attained quite a bit of popularity around the time Marx and Engels published the Communist Manifesto. From 1900 to 1960, there seemed to be a genuine struggle between the two ideas; there are even a few periods when communism seemed to be winning. Capitalism peaked in 1976 before falling rapidly, and the gap appears quite small by 2000. What do these two graphs tell us?


Well, that’s a bit surprising. From 1940 onwards, the use of capitalisme in French absolutely dwarfs capitalism in English. The overall discussion of economics seems to be far more frequent in the French corpus; at the close of the century, communisme in French was just about as commonly discussed as capitalism was in English.

So is nGram actually a useful tool for generating new knowledge? Perhaps. But a lot (most?) of this new knowledge might be attributable to the fact that I’m much more familiar with the history of these two ideas in English, and the results might seem quite mundane to a French scholar. Aside from the fact that there will always be someone who knows more than me, I realize that the process through which I decided to pursue these ideas’ history in French was entirely subjective (Francophilia, vague grasp of history, laziness). Of course such subjectivity is inescapable in our field; nevertheless, I’m worried by the severity of errors which could result from mixing this approach with quantitative research.

This seems especially apparent in topic modeling. This is where I’d really like to include some more graphs, but your imagination will have to suffice. What patterns might arise if we use the topic [labor, surplus, exploitation, value, commodity, class, means] to stand for communism? How would this differ from [struggle, revolution, vanguard, party, socialism, freedom, west]? Or [hegemony, coercive, structure, base, dialectic, philosophy, critical, negative]? Or all of them together? It will take someone smarter than me to figure this out. I guess the point of this rambling post is that I remain unconvinced that quantitative methods will necessarily produce more reliable information about history (literary or otherwise).
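For what it is worth, here is a minimal sketch of how one might start comparing those three word lists against a text. It is plain keyword counting, not a topic model: it only measures what share of a document's words falls into each hand-picked list, which is exactly the kind of subjective choice the post worries about. The labels `economic`, `political`, and `philosophical` and the function names are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter

# The three word lists from the post. The dictionary keys are
# hypothetical labels, not categories taken from any source.
WORD_LISTS = {
    "economic": ["labor", "surplus", "exploitation", "value",
                 "commodity", "class", "means"],
    "political": ["struggle", "revolution", "vanguard", "party",
                  "socialism", "freedom", "west"],
    "philosophical": ["hegemony", "coercive", "structure", "base",
                      "dialectic", "philosophy", "critical", "negative"],
}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def list_shares(text, word_lists=WORD_LISTS):
    """Return, for each word list, the fraction of tokens that belong to it."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        # Avoid dividing by zero on empty input.
        return {label: 0.0 for label in word_lists}
    counts = Counter(tokens)
    return {
        label: sum(counts[word] for word in words) / total
        for label, words in word_lists.items()
    }


if __name__ == "__main__":
    # A made-up sentence, used only to show the call.
    sample = "The party led the revolution while labor created value"
    for label, share in list_shares(sample).items():
        print(label, round(share, 3))
```

Running the same function over texts from different decades would show how the choice of list changes the apparent history of "communism," though swapping a single word in or out of a list could shift the result just as much.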


Algorithmic Theory

Success in small matters. Perseverance furthers. At the beginning good fortune, at the end disorder. Water over fire: the image of the condition in AFTER COMPLETION. Thus the superior man takes thought of misfortune and arms himself against it in advance.

  1. Nine at the beginning means: He breaks his wheels. He gets his tail in the water. No blame.
  2. Six in the second place means: The woman loses the curtain of her carriage. Do not run after it; on the seventh day you will get it.
  3. Nine in the third place means: The Illustrious Ancestor disciplines the Devil’s Country. After three years he conquers it. Inferior people must not be employed.
  4. Six in the fourth place means: The finest clothes turn to rags. Be careful all day long.
  5. Nine in the fifth place means: The neighbor in the east who slaughters an ox does not attain as much real happiness as the neighbor in the west with his small offering.
  6. Six at the top means: He gets his head in the water. Danger.

I was intrigued by Ramsay’s discussion of how the I Ching dissolves “boundaries between creation and interpretation,” generating a worldview “liberated from the suspicion that subjectivity compromises meaning” (45). Instead of one contiguous text, think of the I Ching as a set of 4096 texts which only become available through an algorithmic deformation. The random number generating process (completed with coins, yarrow stalks, or a program) determines whether each line of the hexagram is broken/changeable, broken/unchangeable, unbroken/changeable, or unbroken/unchangeable. This process is repeated six times for a total of 4^6, or 4096 possible outcomes.
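The arithmetic above can be checked with a short Python sketch. It assumes, purely for illustration, that each of the four line types is equally likely; the traditional coin and yarrow-stalk methods weight them differently, so this is not a faithful simulation of either. The names `LINE_TYPES`, `all_outcomes`, and `cast` are hypothetical.

```python
import itertools
import random

# The four line types named in the post.
LINE_TYPES = (
    "broken/changeable",
    "broken/unchangeable",
    "unbroken/changeable",
    "unbroken/unchangeable",
)
LINES_PER_HEXAGRAM = 6


def all_outcomes():
    """Enumerate every possible six-line result (4 choices per line)."""
    return list(itertools.product(LINE_TYPES, repeat=LINES_PER_HEXAGRAM))


def cast(rng=None):
    """Draw one six-line result, first line first.

    Assumption: every line type is equally likely, which is a
    simplification of the traditional casting methods.
    """
    rng = rng or random.Random()
    return tuple(rng.choice(LINE_TYPES) for _ in range(LINES_PER_HEXAGRAM))


if __name__ == "__main__":
    print(len(all_outcomes()))  # number of distinct outcomes
    for position, line in enumerate(cast(), start=1):
        print(position, line)
```

Here `len(all_outcomes())` is 4096, matching 4^6, and each call to `cast()` picks out one of those 4096 texts to be read.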

Ramsay uses the I Ching as an example of a text which can only be interpreted after it is deformed. Interpretation is inherently subjective because it requires the critic to choose one or more meanings from the set of all possible meanings contained within a text; some meanings are always left out. The I Ching sidesteps this dilemma by containing only one meaning: an algorithmic deformation will “determine the auspiciousness or inauspiciousness of a course of action and [give] some sense of how that course is likely to unfold” (38) in accordance with the reader’s ability to interpret it. The nonsensical nature of these deformations when taken at face value subverts the temptation to assume that the text itself contains some inner meaning which the reader must uncover (an assumption frequently made about horoscopes, for example); objective meaning only exists to the extent that the reader can subjectively arrive at it.

If I understand Ramsay correctly, he’s arguing in favor of developing algorithms which can strategically deform a text in order to enable readings which would have been otherwise impossible. In contrast to McGann’s narcissistic Ivanhoe Game, these deformations would be objectively produced through a pre-determined process. I wonder, however, if there might be some other way to merge the subjectivity of deformation with the objectivity of algorithms.

I wonder what it would look like to read a text based on an algorithmically determined theory. Instead of giving a text a feminist reading, for example, or a Marxist reading, one would employ some sort of algorithm to think up an entirely new theory and attempt to apply it to the text. After all, the I Ching is supposed to provide a theory for dealing with a specific aspect of everyday life. Put another way: what sort of reading of, say, “The Road Not Taken” might we come up with if we used the I Ching hexagram 63 (quoted above) as our theoretical text? What if we used hexagram xx (quoted below)? There are only so many post-colonial readings of Heart of Darkness; at some point they will all have been written. Algorithmic theories, if properly engineered, can be infinite.

In adversity it furthers one to be persevering. The light has sunk into the earth: the image of DARKENING OF THE LIGHT. Thus does the superior man live with the great mass: he veils his light, yet still shines.

  1. Nine at the beginning means: Darkening of the light during flight. He lowers his wings. The superior man does not eat for three days on his wanderings. But he has somewhere to go. The host has occasion to gossip about him.
  2. Six in the second place means: Darkening of the light injures him in the left thigh. He gives aid with the strength of a horse. Good fortune.
  3. Nine in the third place means: Darkening of the light during the hunt in the south. Their great leader is captured. One must not expect perseverance too soon.
  4. Six in the fourth place means: He penetrates the left side of the belly. One gets at the very heart of the darkening of the light, and leaves gate and courtyard.
  5. Six in the fifth place means: Darkening of the light as with Prince Chi. Perseverance furthers.
  6. Six at the top means: Not light but darkness. First he climbed up to heaven, then plunged into the depths of the earth.

(All quotations from the I Ching adapted from the Richard Wilhelm translation, chosen at random)