Update on Visualizing 2001: A Space Odyssey

One of my major goals in this project is to streamline the process of creating a visualization by freeing myself from ImageJ. I’m currently working on a Python script which nominally accomplishes this goal, and I figure this is as good a time as any to give you all an update.

My shotorg script makes use of several free software tools. Perhaps the most essential is shotdetect, a program written by Google software engineer Johan Mathe that computes the time at which each shot begins and exports the data to an xml file. From there it’s just a matter of using the Beautiful Soup Python library to read the data (specifically the millisecond at which each shot begins and how many milliseconds it lasts) and feed arguments to ffmpeg, which will rip the specified frames from the film and place them in their own directory.
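The XML-to-ffmpeg step described above might look roughly like the sketch below. It is not the actual shotorg script. It uses Python's built-in xml.etree module instead of Beautiful Soup so that it runs without extra packages, and the element and attribute names (shot, msbegin, msduration), the sample data, and the file names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The tag and attribute names below are
# assumptions about shotdetect's output, not taken from the post.
SAMPLE_XML = """
<shotdetect>
  <body>
    <shots>
      <shot id="0" msbegin="0" msduration="4200"/>
      <shot id="1" msbegin="4200" msduration="1500"/>
    </shots>
  </body>
</shotdetect>
"""


def read_shots(xml_text):
    """Return a list of (start_ms, duration_ms) pairs, one per shot."""
    root = ET.fromstring(xml_text.strip())
    return [(int(shot.get("msbegin")), int(shot.get("msduration")))
            for shot in root.iter("shot")]


def ffmpeg_args(video, start_ms, duration_ms, out_dir, fps=12):
    """Build (but do not run) the ffmpeg argument list for one shot."""
    return [
        "ffmpeg",
        "-ss", f"{start_ms / 1000:.3f}",    # start time in seconds
        "-t", f"{duration_ms / 1000:.3f}",  # clip duration in seconds
        "-i", video,
        "-r", str(fps),
        "-f", "image2",
        f"{out_dir}/out%03d.png",
    ]


shots = read_shots(SAMPLE_XML)
# One command per shot, each writing into its own (hypothetical) directory.
commands = [ffmpeg_args("film.mp4", start, dur, f"shot{idx:03d}")
            for idx, (start, dur) in enumerate(shots)]
```

Each argument list can then be handed to subprocess.run once the output directories exist.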

Running the organize function on 2001: A Space Odyssey according to my current settings creates 596 folders and 27570 png files totaling 133.9 MB. This is the point in the process when I would normally turn to ImageJ; however, I’ve found that ImageMagick is able to do almost everything I want faster, more simply, and more reliably than ImageJ. I can also run ImageMagick from the command line, which makes it much easier to integrate into my Python script. The two visualization functions I’ve written so far, visualize and loop, transform each shot into a 1-row “film strip” montage and an animated gif, respectively. I wanted to make the visualize function go even further, composing each film strip into a larger montage of every shot in the film arranged chronologically from top to bottom, but doing so causes ImageMagick to run out of memory long before completing its task. Luckily I was able to run the function on several chunks of the film and finish the composition in GIMP.

This visualization’s original version was 112.1 MB with a resolution of 33000×21204, and each individual frame was only 80×36! I think the key element that sets a visualization like this apart from a bar chart (which it certainly resembles) is its interactivity. You can zoom in on any point in the graph until an individual frame fills your entire screen. However, this feature is for the most part neutralized when the image is resized for the web, and my computer can barely handle 100 MB images when I view them locally!

I’ll write more about how useful I think this process is in my next post. Meanwhile, if you’re interested in duplicating my results, feel free to download my horrifically crude and inelegant Python script and customize it for your computer. Creating visualizations for single scenes is super easy, just follow these steps:

ffmpeg -ss (start time) -t (duration of clip) -i (yourvideo) -r (frame rate) -s (size of ripped frames) -f image2 out%03d.png

This will start ripping frames from your video file at a specified point, naming them out001.png, out002.png, etc. (change %03d for more leading zeros). For example, entering

ffmpeg -ss 01:23:45 -t 00:00:05 -i /home/user/videos/my_video.mp4 -r 12 -s 1920x1080 -f image2 out%02d.png

will rip 1920×1080 frames from my_video.mp4 starting at 1:23:45 for 5 seconds, saving them as out01.png to out60.png. Now we can make a montage:

montage *.png -tile (colxrow) -geometry +0+0 -background black out.png

or a gif:

convert -delay 15 -loop 0 *.png out.gif
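Since the point of all this is to drive ImageMagick from Python, here is a minimal sketch of how the two commands above could be assembled as argument lists. The helper names montage_cmd and gif_cmd are made up for this example. The flags are the same ones shown in the commands above, and nothing is executed until a list is passed to subprocess.run.

```python
import glob
import subprocess


def montage_cmd(frames, columns, rows, out="out.png"):
    """Argument list for a tiled montage with no gaps between frames."""
    return ["montage", *frames,
            "-tile", f"{columns}x{rows}",
            "-geometry", "+0+0",
            "-background", "black",
            out]


def gif_cmd(frames, delay=15, out="out.gif"):
    """Argument list for an endlessly looping animated gif."""
    return ["convert", "-delay", str(delay), "-loop", "0", *frames, out]


def find_frames(pattern="*.png"):
    """Expand the wildcard in Python, since no shell does it for us here."""
    return sorted(glob.glob(pattern))


def run(cmd):
    """Run one command and raise an error if it fails."""
    subprocess.run(cmd, check=True)


# A 1-row film strip from three hypothetical frames.
example_frames = ["out01.png", "out02.png", "out03.png"]
strip = montage_cmd(example_frames, columns=len(example_frames), rows=1)
loop = gif_cmd(example_frames)
```

Calling run(strip) would then produce the strip, assuming ImageMagick is installed and the frames exist in the working directory.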

2001: A Space Odyssey as an Animated Barcode


Here you see a “movie barcode” created from 600 thumbnails, each 186×84, which represent the first frame of every shot in 2001: A Space Odyssey. I originally stumbled upon this monstrosity by randomly playing around with some of ImageJ’s built-in functions. This particular one is called “Reslice”; it transforms a stack of images into something like the above, with the option of exporting it as a single frame or an animated gif. At first, I was genuinely baffled by what I had created. Now I think I can explain it as follows.

The gif is 600 pixels wide, so it seems like a reasonable assumption that each 1px column represents a frame from the original film. The gif’s height is 186px, which corresponds to the width of the original thumbnails. And finally, the animation has 84 frames, which corresponds to the height of the original thumbnails.

So effectively what we’re seeing is a series of 1px-wide slices of the film arranged chronologically from left to right. Over time, we see 84 lengthwise slices of each frame, cutting (I think) from top to bottom. Think of it as a throbbing film-strip style montage. I think this is an interesting way of reducing frames to sheer color, almost as if I were making 84 different color palettes per frame and juxtaposing them in time. Not sure how useful this will be as an analytic tool, but you have to admit it’s damn mesmerizing!
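To check that this reading of the dimensions hangs together, here is a small pure-Python sketch of the axis swap I think is going on. It is my interpretation of the output, not ImageJ's actual Reslice code, and the tiny 3-image stack is invented for the demonstration. The same thing could be done in one line with a NumPy transpose.

```python
def reslice(stack):
    """Swap the axes of an image stack.

    stack[f][y][x] is pixel (x, y) of thumbnail f. The result is indexed
    as frames[y][x][f]: one animation frame per thumbnail row, in which
    column f holds row y of thumbnail f stood on end.
    """
    n_images = len(stack)
    height = len(stack[0])
    width = len(stack[0][0])
    return [[[stack[f][y][x] for f in range(n_images)]
             for x in range(width)]
            for y in range(height)]


# Toy stack: 3 thumbnails, each 4 px wide and 2 px high. Every "pixel" is
# a label recording where it came from, which makes the swap easy to trace.
toy_stack = [[[(f, y, x) for x in range(4)] for y in range(2)]
             for f in range(3)]
frames = reslice(toy_stack)

n_frames = len(frames)             # equals the thumbnail height
frame_height = len(frames[0])      # equals the thumbnail width
frame_width = len(frames[0][0])    # equals the number of thumbnails
```

Scaled up to 600 thumbnails of 186×84, the same swap gives 84 animation frames, each 600 pixels wide and 186 pixels high, which matches the gif.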

Eyes Wide Shut Graph


This is my first attempt at mapping frames onto an x-y axis. In this case, the x axis is median hue while the y axis is median saturation; all values were determined by Software Studies’s ImagePlot macro for ImageJ. I think the main limitation of this method is the sheer quantity of images I’m graphing. As you can see from my earlier post, the vast majority of frames in Eyes Wide Shut seem to be composed of very warm yellow colors; in this visualization, such frames would appear on the far left of the graph. However, since ImageJ processes the images in numerical order, chronologically early frames are often covered over by later ones. If you look closely, for instance, you can see that every frame from the ballroom scene has been obscured by later scenes (mostly the prostitute scene and the bedroom argument). The solution is obviously fewer images, but I’m not sure if I should simply decimate them or try to pick out representative frames from each shot.
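For anyone curious what “median hue” and “median saturation” mean in practice, here is a rough sketch using only Python's standard library. I don't know how ImagePlot computes its values, so treat this as one plausible definition, not a reimplementation. The function name and the sample pixels are invented for the example.

```python
import colorsys
from statistics import median


def median_hue_saturation(pixels):
    """Return (median hue, median saturation), both on a 0-1 scale.

    pixels is an iterable of (r, g, b) tuples with channels in 0-255.
    Caveat: hue is circular (0 and 1 are both red), so a plain median
    can mislead for frames dominated by reds.
    """
    hues = []
    saturations = []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hues.append(h)
        saturations.append(s)
    return median(hues), median(saturations)


# Three pure colors: red, green, blue (hypothetical test pixels).
primaries = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
hue, saturation = median_hue_saturation(primaries)
```

Each frame then becomes a single (hue, saturation) point, which is what gets placed on the x and y axes.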

Aside from that, I feel like this initial result is really promising. The graph really lets you visualize the extreme distance between Bill’s domestic life and his nocturnal wanderings, epitomized by the orgy scene. You can actually see the progression starting at the warm yellows of incandescent light bulbs and soft Christmas lights which illuminate Bill’s apartment. Next come the deep blues which characterize the scenes that most unsettle Bill and drive him into the New York underworld: Nick Nightingale’s description of his next job, Alice’s description of her “nightmare,” and Bill’s fantasy of his wife with the naval officer. Almost all of the reds occur in the costume store (although oddly juxtaposed with the final scene at the toy store), and they serve as the gateway to voluptuous pinks and violets. It’s interesting to note that, at the bottom of the saturation spectrum, Bill continues to go about his normal life (scenes from the office, from the hospital, walking home, etc.), completely oblivious of the turmoil above.

I’m excited to see how the rest of the visualizations come out. Obviously there is still tweaking to be done—I’m not sure what to make of the three or four vertical lines—and I’d like to be able to represent individual scenes in some way. Progress is slow, as each visualization takes close to an hour to complete. Nevertheless, we’ll soldier on; watch this space for further updates!

Conceiving of Kubrick, Quantifying Color

Because it was often not practical to collect data about the whole population, the idea of sampling was the foundation of 20th century applications of statistics.

In some application [sic] of media visualization, we face the same limitations. For instance, in our visualizations of Kingdom Hearts we sampled the complete videos of game play using a systematic sampling method. Ideally, if imageJ software was capable of creating a montage using all of video frames, we would not have to do this.

In preparation for this post, I attempted to use the imageJ software to create a montage using all of a video’s frames—specifically, those of the 1968 film 2001: A Space Odyssey. I used a program called Avidemux to extract each frame of the film and save it as a jpg, resulting in a total of 209,457 files (10.5 GB). Unfortunately, imageJ kept running out of memory before completing the montage creation process (what else would you expect from a program written in Java?). I’ve tweaked the settings a bit so that the montage only contains 1 out of every 100 frames at a fraction of their original size.  You can view the resulting image below:


I chose 2001 for several reasons. Manovich’s work seems to focus almost exclusively on Vertov’s films, especially on the length of their shots. Vertov is of course well known for his rapid-fire juxtaposition of short shots, a trait shared with several other early Soviet directors. I was curious to see how Manovich’s visualization methods would look when applied to a film generally known for its longer shots. I also wanted to see a montage of a color film, especially one such as 2001 which uses color in such interesting ways.

Several patterns emerge, but I’m not sure how interesting they are. The film begins and ends (“The Dawn of Man” and “Jupiter and Beyond the Infinite”) with a pure black screen accompanied by György Ligeti’s soundtrack (there is also an intermission which follows this same format). Colorful scenes similarly bookend the film, with drab blues and grays characterizing the two middle acts (“TMA-1” and “Jupiter Mission”). Individual scenes seem to be relatively monochromatic: orange for primordial earth, white on the space station, red in HAL’s processor core, and blue in the mysterious apartment.

Undoubtedly, I’d have to spend a lot more time tweaking the visualizations in order to really stumble upon anything interesting, but the possibilities intrigue me. Joe’s post argues that the most compelling element of Manovich’s project is the visualizations of entire shots “averaged” into a single image. This allows us to analyze the degree of camera movement within individual shots, along with the movement vectors of objects being filmed. I feel like analysis of color could be another incredibly productive use of Manovich’s techniques. In part 7 of his “Visualizing Vertov” project, Manovich graphs Man with a Movie Camera’s shots according to their grey scale x number of shapes. What if we graphed all the shots in 2001 according to their averaged hue x saturation? Over the course of random Googling in preparation for this post, I came across this site which features a “Movie Palette” of the film’s most commonly used colors. Although I’m not sure how the author arrived at this data, it seems like something which could be relevant to a Manovichian project.

To summarize in a single sentence (and provide another possible subtitle for our class): this stuff seems incredibly useful, but I have no idea how to use it. I’m still suspicious of quantitative analysis of words because their meaning is so thoroughly subjective, but things like color, camera movement, length of shots, and shapes feel a lot more like objective quantifiable data. I think further inquiry into this sub-field of DH has the potential to yield some really interesting and meaningful results.

Ideas, Topics, and Mistakes

A lot of the posts this week seem to be concerned with the actual methods employed by topic modeling programs, whether it is perplexity at the amount of statistics one must learn in order to use it, or the scarcity of funds to hire a CS major who is actually competent in Java. These concerns seem to me well-founded—so, to avoid beating the horse to death, I’d like to focus on the (vast?) possibility of human error.

Since we’re dealing with text mining, and a relatively small subset of all texts are “literary,” and since this course is titled Digital Humanities (rather than Digital English), I wonder what these methods can tell us about the history of ideas? I decided to plug the ideas “capitalism” and “communism” (I know, try to contain your shock) into nGram and see what I could come up with:


So communism was slightly more widely discussed from the inception of the term until the lines cross in 1903; however, neither word was all that common. Fast forward until 1925, when the usage of “capitalism” begins to increase exponentially. Again, not very surprising; it doesn’t boggle my mind that, as America begins to enter the Great Depression, economics becomes a more commonly discussed topic. I’m not sure how to explain capitalism’s sharp dip between 1936 and 1939, but the rest seems to confirm (what I imagine to be) the standard narrative: communism peaks in 1968 before slinking off into the shadows, capitalism rises to a fever pitch by the 1980s, and, after the end of history, the two ideas flatline in comfortable stagnation. Brilliant.

But then again, communism and capitalism are global (?) phenomena, and their life as ideas is not confined to one language. What would happen if I did the same nGram search in the French corpus (for several reasons: 1) My enthusiasm for studying Casablanca above all the other films in Steve’s class has made me realize that, if TJ is our program’s resident Anglophile, I must certainly be the resident Francophile. 2) I feel that, of all the NATO countries, communism experienced its most vibrant political life in France. 3) In French, I just have to add an “e” to the end of the words, and I don’t feel like looking up the translations in Russian or Chinese)?


Some interesting trends emerge. Communism attained quite a bit of popularity around the time Marx and Engels published the Communist Manifesto. From 1900-1960, there seemed to be a genuine struggle between the two ideas; there are even a few periods when communism seemed to be winning. Capitalism peaked in 1976 before falling rapidly, and the gap appears quite small by 2000. What do these two graphs tell us?


Well that’s a bit surprising. From 1940 onwards, the use of capitalisme in French absolutely dwarfs capitalism in English. The overall discussion of economics seems to be far more frequent in the French corpus—at the close of the century, communisme in French was just about as commonly discussed as capitalism was in English.

So is nGram actually a useful tool for generating new knowledge? Perhaps. But a lot (most?) of this new knowledge might be attributable to the fact that I’m much more familiar with the history of these two ideas in English, and the results might seem quite mundane to a French scholar. Aside from the fact that there will always be someone who knows more than me, I realize that the process through which I decided to pursue these ideas’ history in French was entirely subjective (Francophilia, vague grasp of history, laziness). Of course such subjectivity is inescapable in our field; nevertheless, I’m worried by the severity of errors which could result from mixing this approach with quantitative research.

This seems especially apparent in topic modeling. This is where I’d really like to include some more graphs, but your imagination will have to suffice. What patterns might arise if we use the topic [labor, surplus, exploitation, value, commodity, class, means] to stand for communism? How would this differ from [struggle, revolution, vanguard, party, socialism, freedom, west]? Or [hegemony, coercive, structure, base, dialectic, philosophy, critical, negative]? Or all of them together? It will take someone smarter than me to figure this out. I guess the point of this rambling post is that I remain unconvinced that quantitative methods will necessarily produce more reliable information about history (literary or otherwise).

Algorithmic Theory

Success in small matters. Perseverance furthers. At the beginning good fortune, at the end disorder. Water over fire: the image of the condition in AFTER COMPLETION. Thus the superior man takes thought of misfortune and arms himself against it in advance.

  1. Nine at the beginning means: He breaks his wheels. He gets his tail in the water. No blame.
  2. Six in the second place means: The woman loses the curtain of her carriage. Do not run after it; on the seventh day you will get it.
  3. Nine in the third place means: The Illustrious Ancestor disciplines the Devil’s Country.  After three years he conquers it. Inferior people must not be employed.
  4. Six in the fourth place means: The finest clothes turn to rags. Be careful all day long.
  5. Nine in the fifth place means: The neighbor in the east who slaughters an ox does not attain as much real happiness as the neighbor in the west with his small offering.
  6. Six at the top means: He gets his head in the water. Danger.

I was intrigued by Ramsay’s discussion of how the I Ching dissolves “boundaries between creation and interpretation,” generating a worldview “liberated from the suspicion that subjectivity compromises meaning” (45). Instead of one contiguous text, think of the I Ching as a set of 4096 texts which only become available through an algorithmic deformation. The random number generating process (completed with coins, yarrow stalks, or a program) determines whether each line of the hexagram is broken/changeable, broken/unchangeable, unbroken/changeable, or unbroken/unchangeable. This process is repeated six times for a total of 4^6, or 4096 possible outcomes.
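The counting above is easy to verify with a few lines of Python. The sketch below uses the traditional three-coin convention (heads counts 3, tails counts 2, so each line totals 6, 7, 8, or 9); that convention and all of the names are my additions, not something Ramsay spells out. Note that with coins the four line types are not equally likely, so the 4096 outcomes are not equally probable either.

```python
import random
from itertools import product

# Coin totals mapped to (line shape, changeability).
LINE_TYPES = {
    6: ("broken", "changeable"),
    7: ("unbroken", "unchangeable"),
    8: ("broken", "unchangeable"),
    9: ("unbroken", "changeable"),
}


def cast_line(rng):
    """Toss three coins: tails counts 2, heads counts 3."""
    return sum(rng.choice((2, 3)) for _ in range(3))


def cast_hexagram(rng):
    """Cast six lines, bottom line first."""
    return [cast_line(rng) for _ in range(6)]


# Every possible combination of four line types across six positions.
all_outcomes = list(product(LINE_TYPES, repeat=6))
total_outcomes = len(all_outcomes)  # 4 ** 6

# A seeded generator makes the "random" cast repeatable.
hexagram = cast_hexagram(random.Random(63))
described = [LINE_TYPES[value] for value in hexagram]
```

Swapping the seeded generator for random.Random() with no argument gives a fresh cast every time.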

Ramsay uses the I Ching as an example of a text which can only be interpreted after it is deformed. Interpretation is inherently subjective because it requires the critic to choose one or more meanings from the set of all possible meanings contained within a text; some meanings are always left out. The I Ching sidesteps this dilemma by containing only one meaning: an algorithmic deformation will “determine the auspiciousness or inauspiciousness of a course of action and [give] some sense of how that course is likely to unfold” (38) in accordance with the reader’s ability to interpret it. The nonsensical nature of these deformations when taken at face value subverts the temptation to assume that the text itself contains some inner meaning which the reader must uncover (an assumption frequently made about horoscopes, for example); objective meaning only exists to the extent that the reader can subjectively arrive at it.

If I understand Ramsay correctly, he’s arguing in favor of developing algorithms which can strategically deform a text in order to enable readings which would have been otherwise impossible. In contrast to McGann’s narcissistic Ivanhoe Game, these deformations would be objectively produced through a pre-determined process. I wonder, however, if there might be some other way to merge the subjectivity of deformation with the objectivity of algorithms.

I wonder what it would look like to read a text based on an algorithmically determined theory. Instead of giving a text a feminist reading, for example, or a Marxist reading, one would employ some sort of algorithm to think up an entirely new theory and attempt to apply it to the text. After all, the I Ching is supposed to provide a theory for dealing with a specific aspect of everyday life. Put another way: what sort of reading of, say, “The Road Not Taken” might we come up with if we used the I Ching hexagram 63 (quoted above) as our theoretical text? What if we used hexagram xx (quoted below)? There are only so many post-colonial readings of Heart of Darkness; at some point they will all have been written. Algorithmic theories, if properly engineered, can be infinite.

In adversity it furthers one to be persevering. The light has sunk into the earth: the image of DARKENING OF THE LIGHT. Thus does the superior man live with the great mass: he veils his light, yet still shines.

  1. Nine at the beginning means: Darkening of the light during flight. He lowers his wings. The superior man does not eat for three days on his wanderings. But he has somewhere to go. The host has occasion to gossip about him.
  2. Six in the second place means: Darkening of the light injures him in the left thigh. He gives aid with the strength of a horse. Good fortune.
  3. Nine in the third place means: Darkening of the light during the hunt in the south. Their great leader is captured. One must not expect perseverance too soon.
  4. Six in the fourth place means: He penetrates the left side of the belly. One gets at the very heart of the darkening of the light, and leaves gate and courtyard.
  5. Six in the fifth place means: Darkening of the light as with Prince Chi. Perseverance furthers.
  6. Six at the top means: Not light but darkness. First he climbed up to heaven, then plunged into the depths of the earth.

(All quotations from the I Ching adapted from the Richard Wilhelm translation, chosen at random)