Ideas, Topics, and Mistakes

Many of this week’s posts seem concerned with the actual methods employed by topic modeling programs, whether that means perplexity at the amount of statistics one must learn in order to use them, or the scarcity of funds to hire a CS major who is actually competent in Java. These concerns seem well-founded to me, so, to avoid beating a dead horse, I’d like to focus on the (vast?) possibility of human error.

Since we’re dealing with text mining, and only a relatively small subset of all texts is “literary,” and since this course is titled Digital Humanities (rather than Digital English), I wonder: what can these methods tell us about the history of ideas? I decided to plug the terms “capitalism” and “communism” (I know, try to contain your shock) into nGram and see what I could come up with:

[Image: nGram chart of “capitalism” and “communism” in the English corpus]

So communism was slightly more widely discussed from the inception of the term until the lines cross in 1903, though neither word was all that common. Fast forward to 1925, when the usage of “capitalism” begins to climb steeply. Again, this is not very surprising; it doesn’t boggle my mind that, as America heads toward the Great Depression, economics becomes a more commonly discussed topic. I’m not sure how to explain capitalism’s sharp dip between 1936 and 1939, but the rest seems to confirm (what I imagine to be) the standard narrative: communism peaks in 1968 before slinking off into the shadows, capitalism rises to a fever pitch by the 1980s, and, after the end of history, the two ideas flatline in comfortable stagnation. Brilliant.

But then again, communism and capitalism are global (?) phenomena, and their life as ideas is not confined to one language. What would happen if I ran the same nGram search in the French corpus? I chose French for several reasons: 1) My enthusiasm for studying Casablanca above all the other films in Steve’s class has made me realize that, if TJ is our program’s resident Anglophile, I must be the resident Francophile. 2) I feel that, of all the NATO countries, France is where communism had its most vibrant political life. 3) In French, I just have to add an “e” to the end of each word, and I don’t feel like looking up the translations in Russian or Chinese.

[Image: nGram chart of “capitalisme” and “communisme” in the French corpus]

Some interesting trends emerge. Communism attained quite a bit of popularity around the time Marx and Engels published the Communist Manifesto (1848). From 1900 to 1960, there seemed to be a genuine struggle between the two ideas; there are even a few periods when communism seemed to be winning. Capitalism peaked in 1976 before falling rapidly, and the gap between the two appears quite small by 2000. What do these two graphs tell us when set side by side?

[Image: nGram chart comparing the English and French terms]

Well, that’s a bit surprising. From 1940 onwards, the use of “capitalisme” in French absolutely dwarfs “capitalism” in English. The overall discussion of economics seems to be far more frequent in the French corpus: at the close of the century, “communisme” in French was just about as commonly discussed as “capitalism” was in English.

So is nGram actually a useful tool for generating new knowledge? Perhaps. But a lot (most?) of this new knowledge might be attributable to the fact that I’m much more familiar with the history of these two ideas in English; the results might seem quite mundane to a French scholar. Aside from the fact that there will always be someone who knows more than me, I realize that the process through which I decided to pursue these ideas’ history in French was entirely subjective (Francophilia, a vague grasp of history, laziness). Of course, such subjectivity is inescapable in our field; nevertheless, I worry about how severe the errors could be when this approach is combined with quantitative research.

The problem seems especially acute in topic modeling. This is where I’d really like to include some more graphs, but your imagination will have to suffice. What patterns might arise if we use the topic [labor, surplus, exploitation, value, commodity, class, means] to stand for communism? How would this differ from [struggle, revolution, vanguard, party, socialism, freedom, west]? Or [hegemony, coercive, structure, base, dialectic, philosophy, critical, negative]? Or all of them together? It will take someone smarter than me to figure this out. I guess the point of this rambling post is that I remain unconvinced that quantitative methods will necessarily produce more reliable information about history, literary or otherwise.


3 thoughts on “Ideas, Topics, and Mistakes”

  1. Pingback: Week 7: Crunchy | "Digital Humanities": Emerging Tools and Debates in Literary Study

  2. I suppose this is where Ryan Heuser and Long Le-Khac would argue that when you continue to “square [your] hunch with additional data,” you can reach more solid conclusions than preliminary data would allow (84). Perhaps the point of topic modeling is that you eventually reach a point where you move beyond confirming what you already know and what already lines up with your preconceived expectations.
