below is a script for a talk i gave at the 2022 annual sfsu cinema studies conference. one day i might put it in a form better suited for reading. thank you to andrey norkin for taking the time to speak to me.

Hello all, and thank you for inviting me to speak at this year’s SFSU Cinema Studies conference. Seeing as this year’s theme was streaming, I wanted to talk about one very specific and interesting technology that has been spearheaded by Netflix and its development team, and what that technology means for filmmakers and scholars.

But first, there’s been a discourse surrounding Netflix specifically that I feel I should mention: the phenomenon of the “Netflix look.” This Netflix promotional video illustrates the sort of fluffy language Netflix PR uses to dress up the camera restrictions and tech specifications for its in-house productions. In Gita Jackson’s article “Why Does Everything On Netflix Look Like That?” for Vice, Jackson interviews J.D. Connor of USC about this visual phenomenon, and he points to the company’s approach to shooting resolutions: “‘They did what they call future proofing their content. They wanted it all to be shot in 4K HDR… When it gets compressed, and jams through the cable pipe, or the fiber to get to your television, Netflix takes as much information out of that as they can through compression in order to reduce the amount of data that's going through, so you have a smoother streaming experience.’” The aesthetic trends audiences are noticing on a platform like Netflix, where technical specifications create a recognizable “in-house look,” illustrate a blurring of the lines between platform, content, and technology. Connor’s statement is interesting, then: the notion that compression contributes to the visual result is there, but what if the techniques used to lower bitrate and squeeze more efficiency out of the encode, while maintaining visual fidelity, could be seen directly on the screen?

Enter film grain synthesis. Film grain is infamous for its ability to balloon data rates and video file sizes, which is bad news for streaming video. Traditional methods of compression tend to render film grain illegible and destroy the grain structure. Andrey Norkin and Neil Birkbeck, of Netflix and Google respectively, published a paper in 2018 detailing an alternate method: erase the film grain before encoding, model its characteristics algorithmically, and apply a regenerated grain layer to the video on the end-user’s screen [FIGURE 2]. Norkin and Birkbeck identify the randomness of film grain as the key culprit behind the difficulties of encoding and streaming it, and the method takes advantage of AV1, an open-source video codec whose toolset facilitates this process.
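Since I’ll be referring to this mechanism throughout the talk, it may help to see the decoder-side idea in miniature. Below is a deliberately simplified Python sketch of the autoregressive grain model that AV1-style synthesis is built on: a few coefficients grow an entire field of pseudo-random grain, which is then scaled by local brightness and added to the clean frame. The lag, coefficient values, and scaling curve here are my own illustrative choices, not the values or fixed-point layout of the actual AV1 specification.

```python
import numpy as np

def causal_offsets(lag):
    """Neighbors above and to the left of the current sample (raster order)."""
    return [(dy, dx) for dy in range(-lag, 1)
                     for dx in range(-lag, lag + 1)
                     if dy < 0 or dx < 0]

def synthesize_grain(height, width, ar_coeffs, noise_std, lag=2, seed=0):
    """Grow a grain field with a causal 2D autoregressive (AR) model:
    each sample is a weighted sum of already-generated neighbors plus
    fresh Gaussian noise, so a handful of numbers describes the texture."""
    offsets = causal_offsets(lag)
    assert len(ar_coeffs) == len(offsets)
    rng = np.random.default_rng(seed)
    g = np.zeros((height + lag, width + 2 * lag))   # zero-padded borders
    for y in range(lag, height + lag):
        for x in range(lag, width + lag):
            val = rng.normal(0.0, noise_std)
            for c, (dy, dx) in zip(ar_coeffs, offsets):
                val += c * g[y + dy, x + dx]
            g[y, x] = val
    return g[lag:, lag:width + lag]

def apply_grain(frame, grain, scale_points):
    """Scale the grain by a piecewise-linear function of pixel intensity
    (shadows and highlights can carry different grain strength), then add."""
    xs, ys = zip(*scale_points)
    return np.clip(frame + np.interp(frame, xs, ys) * grain, 0.0, 255.0)

# Illustrative use: flat gray frame, mild coefficients, grain peaking in midtones.
grain = synthesize_grain(240, 320, [0.02] * 12, noise_std=8.0)
frame = np.full((240, 320), 128.0)
grainy = apply_grain(frame, grain, [(0, 0.2), (128, 1.0), (255, 0.4)])
```

The point to hold onto is that the entire grain field is determined by a small parameter set (coefficients, a noise strength, a scaling curve), which is exactly what makes it cheap to send alongside the compressed video.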

I was able to correspond with Norkin over email for this talk. Due to confidentiality issues, he couldn’t answer questions about which titles distributed on Netflix got the AV1 compression treatment, or whether this software was Netflix-specific. However, he did clarify the purpose and vision of AV1 film grain synthesis: “Essentially, the film grain was originally a feature of analog film. Nowadays, it is often applied during the post-production if a movie is produced digitally. The idea behind the grain modeling technology was to enable better grain reproduction in video compression and streaming. The problem with compressing the grain with traditional video compression is that grain is random. Video codecs use prediction to reduce the size of the video and have difficulties predicting random things, such as grain. If one reduces the bitrate to fit the available bandwidth, the grain starts disappearing at a certain (usually quite early) point. The film grain synthesis essentially tries to capture the characteristics of the original grain in the parameters of the grain model, send these parameters to the decoder, and then use the model with the received parameters to reproduce the grain as close to the original as possible. The grain on the encoder side is typically removed by a denoising process and compression. So, the current approach is to try to reproduce the grain as close to the original as possible. The end user would not experiment with the grain, it would be applied in the way it was estimated in the encoder. The input videos are also the same as in the case of typical compression. There is a denoiser that removes the grain from the original video, and the estimator estimates the grain parameters.”
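To ground the encoder side of Norkin’s description (“there is a denoiser… and the estimator estimates the grain parameters”), here is a toy sketch of that step in the same vein as the earlier one. The Gaussian blur and plain least-squares fit are stand-ins I chose for legibility; a production encoder uses a far more careful denoiser and quantizes the parameters into the form the bitstream actually carries.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_grain_params(frame, lag=2, sigma=1.5):
    """Toy encoder-side step: denoise the frame, treat the residual as
    the grain signal, and fit causal AR coefficients to it."""
    frame = np.asarray(frame, dtype=np.float64)
    denoised = gaussian_filter(frame, sigma)        # stand-in denoiser
    residual = frame - denoised                     # "the grain"
    h, w = residual.shape
    offsets = [(dy, dx) for dy in range(-lag, 1)
                        for dx in range(-lag, lag + 1)
                        if dy < 0 or dx < 0]
    # Regression: predict each residual sample from its causal neighbors;
    # the fitted weights are the AR coefficients to signal to the decoder.
    A = np.stack([residual[lag + dy:h - lag + dy,
                           lag + dx:w - lag + dx].ravel()
                  for dy, dx in offsets], axis=1)
    b = residual[lag:h - lag, lag:w - lag].ravel()
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    noise_std = float(np.std(b - A @ coeffs))       # innovation strength
    return coeffs, noise_std, denoised
```

The stream then carries the denoised frames plus these few parameters, and the decoder runs something like the earlier synthesis sketch to put the grain back, which is why, as Norkin says, the end user never experiments with the grain: it is applied exactly as it was estimated.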

Now, I want to take a little time to illustrate my own personal journey with film grain synthesis. When I found out about Netflix’s use of AV1 to synthesize film grain, I had to try it myself. The process was… a difficult one. I’m a filmmaker by trade, and my knowledge of coding stops at displaying “hello world” text, so I had to dive deep to figure out how this works. Adding to that, Netflix has not officially released information about how its specific film grain synthesis is accomplished, so the process had to be reverse-engineered. What I found was fascinating, though. Using a clip from a 16mm film I made a couple of years back (a film about the recreation of nostalgic artifacts in the present day, which I thought an interesting bit of synergy), I tried to see how AV1’s film grain generation treats the image, and whether I could tell the difference myself. [CLIP 2] Here we have a ProRes version of the clip. It’s not raw or uncompressed by any means, but it is the initial data I had to work with once the film reels were digitized. It maintains a fair amount of the original grain structure, and would definitely choke a stream if served as-is. [CLIP 3] This is the same clip, encoded with H.264. [CLIP 4] And here is the AV1 encode, complete with the film grain synthesis at work. It’s apparent that AV1 has something here: the grain comes through far better than in the H.264 version. To my eye there is some wonkiness, but I admit that could be due to my familiarity with the original versions of these clips.
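For anyone who wants to repeat the experiment, the comparison versions can be produced with a couple of FFmpeg invocations; here is the rough shape of what I ran, driven from Python. This assumes an FFmpeg build with libx264 and libaom-av1; the filenames and bitrate are placeholders, and denoise-noise-level is, to my understanding, the libaom option that switches on the denoise-then-model grain path (its value is a strength you tune by eye).

```python
import subprocess

SRC = "reel_prores.mov"   # placeholder name for the digitized 16mm clip

# [CLIP 3] baseline: H.264 at a streaming-ish bitrate, video only.
subprocess.run(["ffmpeg", "-y", "-i", SRC, "-an",
                "-c:v", "libx264", "-b:v", "3M",
                "h264_version.mp4"], check=True)

# [CLIP 4] AV1 with film grain synthesis: the encoder strips the grain,
# estimates model parameters, and signals them for playback-time regeneration.
subprocess.run(["ffmpeg", "-y", "-i", SRC, "-an",
                "-c:v", "libaom-av1", "-b:v", "3M",
                "-cpu-used", "6",
                "-denoise-noise-level", "25",
                "av1_grain_version.mp4"], check=True)
```

Playing the AV1 file back requires a decoder that actually applies the signaled grain (dav1d-based players do), which is itself a nice demonstration of how much of the final image is now reconstituted on the viewer’s side.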

To me, this experiment illustrates a few things. One, this technology WORKS (if I can do a crude version of it, I’m sure Netflix engineers have figured out the advanced potential of these algorithms). Two, the separation of a “fully formed” image into malleable parts is here to stay, and is already used fairly regularly. This “ripping apart” of the image is still difficult for me to wrap my head around, given my “baked-in” conceptual presumption of film grain, but these techniques are not new; discourse surrounding adjacent techniques has been alive since the inception of digital cinema. Aylish Wood, VFX artist and professor at the University of Kent, detailed her conception of the “layers” of digital cinema and post-production. Wood argues that with the advent of digital post-production processes (via the Digital Intermediate, or DI), every part of the image became a distinct, mutable element open to manipulation and creative exploration by filmmakers: “The impact of the DI on color grading has already begun to be considered within cinema studies (Higgins; Prince), and though editing and color remain sites of expressive control through analog techniques and are extended through digital ones, the DI now also allows the isolation and control of individual elements of an image—a facet of micromanipulation that is only just beginning to be exploited” (74) [FIGURE 3]. Wood’s succinct characterization of this new mode of image manipulation applies directly to Norkin and Birkbeck’s work in film grain synthesis. An artifact that was once an inherent part of the image, made newly conspicuous by the transition to digital cinematography and processing, has become another manipulable element of the image. What complicates this paradigm in the case of film grain generation is the osmosis between post-production and exhibition: the algorithm rips a visual element out of the file and reintroduces it at playback, in a black-box process more legible to streaming engineers and web designers than to filmmakers. This is where post-cinema comes in.

Shane Denson, a prominent scholar of post-cinema, notes in his seminal work Discorrelated Images: “We do not usually think of our screens as cameras, but that is precisely what smart tvs and computational display devices of all sorts in fact are: each screening of a (digital or digitized) ‘film’ becomes in fact a refilming of it, as the smart tv generates millions of original images, more than the original film itself—images unanticipated by the filmmaker and not contained in the source material. To ‘render’ the film computationally is in fact to offer an original rendition of it, never before performed, and hence to re-produce the film through a decidedly post-cinematic camera” (41). Denson’s conception of the screen as a “camera,” suggesting the breakdown of the boundaries between exhibition, post-production, and even cinematic capture itself, is one of the hallmarks of the post-cinematic mode. The film grain synthesis capabilities of AV1 surely illustrate Denson’s point, recreating even the image’s artifacts in an end-user process that is arguably a more obvious screen-as-camera mechanism than the motion-smoothing television feature he mentions (after all, the new Arri Alexa 35 cinema camera features a digital film grain generation mechanism of its own).

But perhaps we can look back to find an exploratory conception of film grain synthesis. Instead of a digital future, a cinematic past could offer some connections. The fracturing of cinema as an experience is a hallmark of post-cinematic theory, but is it possible some answers lie in the distant history of film? And yes, literally, celluloid film? Mary Ann Doane remarks that there was a perceivable lack of sound in early cinema that resulted in “the gestures and contortions of the face” (33). With the advent of the Vitaphone, the film industry could record sound and play it back in theaters on a completely separate system, “timed” to sound (Gitt 262). The ontological relationship here is clear: a lack of film grain in digital images leads us to pursue a double-system technology to bring the artifacts back, much as early cinema brought sound into the fold (after all, digital artifacting is referred to as noise). Except where cinema’s lack of sound was felt in anticipation of a hypothetical future, here we experience a lack of the past, one that must be solved with new digital algorithms.

In her book “Blue Light of the Screen: On Horror, Ghosts, and God,” Claire Cronin reflects on the nature of digital images and their relationship to the photograph: “A ghost, in Catholic speechlessness, is like a photograph or silent film. A demon, with its howls and its promises, is more like a TV. Because both demons and ghosts lack physical bodies, however, we could say that all spirits are akin to electronic media. Flickering between presence and absence, able to cross boundaries of space and time, this media is ontologically ghostly. It is as incorporeal as thought… Because horror’s ghosts are noisy and rarely come for prayers, they are ‘phantoms’ rather than souls — from the Greek phantasma: image, unreality. Inside the screen, ghostly apparitions are made of light and time, and now, most often, digitized. The silicon-age phantom seems most virtual: unreal as both a superstitious fiction and an immaterial form. The power of this medium lies in its ancestor, photography, which reveals its pact with death in portraits of real ghosts: images of people once alive and now deceased. Like death, the camera takes a moving body and cuts it off from time, freezing it in its last place. The photo is a death mask, a memorial, a corpse” (47). Perhaps this synthesis algorithm acts in practice as a sort of transubstantiation, an attempt to make the phantom of digital streaming media, in its perfect, smoothed-out form, whole again. To experience living, textural cinema through the choked-out processes of streaming, we have resorted to eliminating the grain altogether and reconstituting the image with our own algorithmic jumping artifacts.

To me, this is a ghost story.