Writing With Images
Imagetext, Multiples, and Other Mixed Modes

It is axiomatic in the history of technologies that innovation in technology does not immediately bring about a corresponding change in forms and meanings. For example, academics who make a course Web page working from an already existing paper syllabus often simply put up the syllabus (i.e., the document they hand out the first day) as one continuous document with few or no links. They seem to be assuming that the students will print the document and refer to their paper copy as usual. If they thought instead that the students would use the document online to answer various questions over the course of the term, they might well design the page differently. Such transfers from a familiar medium to a new one will inevitably occur until the new medium and its possibilities become familiar. Special glory accrues to persons who discover expressive uses for a new piece of technology. Their uses of it, and the meanings they convey, are quickly picked up by others and come to seem quite natural and automatic. For example, the early photographers very soon discovered that double exposure or "compositing" two negatives onto a single print afforded the possibility of representing the thoughts or dreams of a depicted figure, and from thoughts it is just a short step to visiting spirits. This fairly quickly developed into the thriving cultural practice of "spirit photography" much debated in the later nineteenth century (see the online exhibit at The American Museum of Photography). Experiments with the expressive possibilities of compositing (photomontage) continued throughout the twentieth century, but photomontage remained a fairly difficult and somewhat arty practice until digital imaging tools made it extremely easy to do and digital photomontage moved into the mainstream, not, to be sure, for trafficking with spirits, but for other, newer meanings.
Photomontage is one source of "multiple" images—images that suggest not one viewing from one point but several—which are so much a part of Web visual sensibility. The emergence of digital text and image does transform many things, starting with the material work of fabricating texts and images, but these changes in material practice do not directly and immediately change signifying practices. It is this complicated picture of emerging practices, with main focus on the signifying ones, that this book sets out to trace.

In recent years, the creation of new software with new possibilities has outrun criticism and theory. The last substantial work on the semiotics of images, Gunther Kress and Theo van Leeuwen's Reading Images: The Grammar of Visual Design (1996), says nothing about digital imagery, HTML, or etext. In fact, it says nothing about Photoshop, which has transformed image-making more than anything since the invention of photography. It concentrates on single images rather than multiples (i.e. photomontage and collage); this may be sensible as a restriction to simple cases, but it has little to offer on the very strong revivals of multiple images stimulated to a considerable degree by Photoshop. These are not to be reckoned flaws of their ground-breaking work: they are simply a reflection of how fast things have been changing and how slow criticism and theory are to articulate a perspective on the current scene. Because Reading Images employs the somewhat questionable metaphor of grammar (i.e. a system) of visual signification, it does not readily accommodate change (as Kress himself later noted in "'English' at the Crossroads"), confining itself to the broad notion of the rise of the visual over the twentieth century.

The use of multiples, growing out of photomontage and collage, is one of three major innovations that characterize digital, especially Web, writing that will concern us. The other two are the development of new forms of imagetext (to use W.J.T. Mitchell's term) and the related preference for images and texts of mixed modes that require more or less simultaneous processing of text and image, or text and image and sound. Such mixtures did not begin with digital writing, of course, but the mixtures are far more various since experiments with new combinations are so easy to carry out. These new forms of Web textual-visual culture are arguably even more important than the development of the hypertext link, although the latter has attracted vastly more theory and analysis than these changes surrounding the visual. Sketching that argument here would take us off point; it may suffice to note that when critics discuss the connection of texts by hypertext linking, they very often have recourse to talk of juxtaposition, collage, and montage, and that a very active area is the visual modelling of hypertext-based structures. These are signs that the visual and visual/textual modes to be studied here are providing ideas even for the very textual worlds of theory and criticism.

Before examining specific developments in these three related areas, we will survey accounts of the general "rise of the visual" and of differences between the visual and the textual as signifying modes.

1. The Fall and Rise of Imagetext

Scholars and critics have begun writing accounts of this "pictorial turn," "turn to the visual," or "foregrounding of the visual." 1 These accounts cluster in two groups: stories of a Fall or falling out between words and images leading to the subordination of images to texts and closely related stories about the rise (or restoration) of images to the role of a new international medium or, alternatively, to the emergence of a new integration of imagetext or visual language. Within each cluster several main threads can be discerned.

One version of the Fall is W. J. T. Mitchell's, which starts from a notion of imagetext as a level of meaning which is not medium specific, in which word and image mutually complement and reinforce each other (not necessarily by "saying the same thing"). Mitchell disputes the Modernist notion that each medium has its own unique mode of operation and is best pursued by avoiding doing the work appropriate to another medium. This radical separation of media, Mitchell argues, may have the effect of protecting the visual arts from being annexed to talk and to verbal accounts of visual meaning, but it is not easy to find pure types anywhere in history nor is it exactly clear how we can talk about writing without involving the visual. There are no pure media, Mitchell argues, in the sense that Clement Greenberg among others assumed. So Modernism is the Fall for Mitchell, with its tendency to look down on Blake's illustrated books on the high art side and Trudeau's Doonesbury on the side of more popular culture. Mitchell identifies a major theme of post Modern art, especially visual art: how and in what interesting ways can the two signifying systems be set in motion within a single work?

For some, the root of many evils is the division of labor, and George Landow shows a certain Pre-Raphaelite nostalgia for a day gone by when an author might preside over the making of a whole book, text, layout, visuals, tout ensemble. Today, Landow says, authors are enjoined to "just write" and to leave the design and visual aspects of their work to other professionals (Landow, Hypertext 2.0: 49-52). This sends two messages: text is more important than look, and text people make poor designers. Other economic forces have conspired to separate authors from developing the visual aspects of their ideas, most notably the relatively greater cost of printing images and graphics than pages of print. Academics learn to keep the budget for graphics down, since they are not likely to recoup the extra printing costs from the sales of their books and articles. These attitudes are the dead hand of the past, in that images can now be very inexpensively included and disseminated and design ideas quickly implemented—on the Web. But that also involves a new model of publication which we have not yet fully assimilated or embraced. A related legacy of industrialization is the sharp split of visual representation in service of Utility (CAD, formerly "mechanical drawing") and in the service of Commerce (advertising) from Art—or, as writing instructors might say, expository graphics, persuasive graphics, creative graphics.

This last thread leads directly to a third view of the Fall which Kress and van Leeuwen put forth in Reading Images. They remind us that another site of the Edenic imagetext is childhood; children's books begin almost all image and no text, then fade in more text, then fade images out, so that as the years go by in school, then college, the images gradually wither away as the unquestioned sway of print in serious documents (the densely printed page) is learned and accepted by yet another generation. Images, we learn, are for the less capable readers, suitable for illustrated classic comic books, the manga of ideas. Not only that, the print gets smaller, another sign that more advanced literacy is expected. To which we might add that, as noted, images, especially big, colored images, traditionally have been costly to make and reproduce; whoever pays the bill expects a return on their investment. That is, we know that images are used to control our attention, disarm critical skepticism (especially humorous cartoony clip art) and focus our desires. So we become suspicious that we are being condescended to or manipulated with images, just as we are by television. And the emergence of the Web as a new medium for scholarly exchange and artistic exhibition has if anything exacerbated these attitudes. So rapidly has commercialism transformed the Web from the scholar's rather exclusive medium that writers and artists question whether publishing on the Web would be a waste of time because their writings and images (and sounds) will never be given a serious and sustained look on the Web.

But, say Kress and van Leeuwen, the days of the text's domination of the visual are rapidly drawing to a close. They see a turn toward the visual in which the ratio of text to image drops and the functions change: instead of image illustrating the main-meaning-bearing text, text now furnishes commentary on main-meaning-bearing images. Comparing a 1935 high school textbook treatment of magnetism to one from 1976 describing the concept of a circuit, Kress in a later article concludes that the image displays what the world is like, the text orients the reader in relation to that information (Kress, "'English' at the Crossroads":76). He links this change in textbooks to the difference in text/image ratios between high-brow newspapers and tabloids. (Jay David Bolter cites the rise of USA Today; others might point to the New York Times giving more and more inches of its front page to color photographs every day.) Some would (and do) find such changes distressing, since they doubt that images can convey all of the meanings we need to express with the precision and subtlety of natural languages. Jay David Bolter, for example, writes of the "Breakout of the Visual" from the containment of printed prose and speaks of rivalry and competition, assaults on prose by images throughout the twentieth century, and the dominance of images on the Web (Bolter, c.4). But Kress and van Leeuwen are not distressed, for their book aims to demonstrate that images draw upon a system of meaningful structures that is every bit as rich and complex as language and indeed is very similar to it, so that their title, Reading Images: The Grammar of Visual Design, for them does not involve much stretching or loosening of terms. For others, speaking of viewing as reading is the work of linguistic imperialism and the devil. Before making our own frontal assault on this most contentious of questions, it will be useful to collect the points on which images and texts have been said to differ.

2. Images Compared to Texts

Comparisons of texts and images are conducted at a high level of generality where one may suspend awareness that not all texts are linear and literal (some are poems) and not all images work by resemblance (some include diagrammatic and iconographic symbols). So, attempting to hold firmly in mind that the parenthesized properties are not those of the prototype texts and images being compared, we proceed.

  1. holistic v. analytic
  2. digital v. analog
  3. images v. propositions
  4. code v. context
  5. naturalism v. abstraction
  6. documentary v. manipulated
  7. art v. nonart

I will say a bit more about each of them before proceeding, using clip art to illustrate the issues. It is fair to note at the outset that conducting this discussion in words insidiously and perhaps irresistibly enacts language as the master discourse, as if in talking about images we make more precise and certain meanings that waver and swim in and out of focus (!) as we look at something. We will know that we are resisting this habit of thought when we feel that we have grasped what an image means without feeling the need to provide a verbal explication. There are times, as John Berger says, when appearances cohere into evidently meaningful configurations, and this cohering occurs without any obvious mediation by language (Berger and Mohr, 11-12). Nonetheless, as Benveniste noted in "Semiotics," language is the great interpreter of all other systems of signification.

Figure 0.1
Der Sprach Brockhaus, s.v. Lage

2.1 holistic v. analytic: Images are representations of shapes, objects and configurations of objects in space. It is much easier to convey what a circle is by drawing one than by offering a definition ("a line all the points of which are equidistant from a single point called the center") and similarly to convey what "inside" means by drawing a simple "canonical" container (a box, wouldn't you expect?) with a pointer or shading or simply the word inside in the interior of the space rather than trying to define it in simpler terms (so no "at or in the interior of"). As James Mathewson points out, science has many master images that are better conveyed with a pencil than a keyboard, such as path, circuit, gradient, branch, cycle. Also, relative magnitudes and locations can be grasped in a single glance at a photograph or drawing done to scale but are very tedious to describe verbally even partially. So, for example, Der Sprach Brockhaus uses cheaply printed old black and white clip art to illustrate the spatial relations conveyed by the German prepositions vor and hinter, and especially the subtle difference of vorn and hinten, by sketches with strongly indicated perspective. Yvonne M. Hansen claims that systems, patterns, and chaos more generally are best represented with images.

Much can be made also of the image as presenting itself with all of its parts simultaneously, where language is inevitably sequential and where its meaning is a compositional function of its component words. We may note certain things in an image that we would want to include in our overall account of what the image means, but we do not put this account together by some sort of compositional syntax. In very famous and much discussed lines at the end of Yeats' "Among School Children," the poet glimpses a moment of wholeness and exclaims "How can we know the dancer from the dance?" The illustrator might reply, "Grammar requires a subject and a verb, but to draw a dancer, or a dance, I would draw the dancer dancing."

Corel Office 2000/educaton/teachr6

Figure 0.2
ISOTYPE clip of teacher from Corel Office 2000 archive

2.2 digital v. analog: For Nelson Goodman, the fundamental property of images is that they are instances of "dense" rather than notational systems, where the latter is built of discrete elements, not continua. 2 Notational systems (Goodman's example is musical notation) have often been called "digital." It is sometimes said that word meaning is digital—that it can be given as a binary checklist of criteria that some thing or action must meet in order to be signified by the word. So, for example, pediatricians are medical doctors who restrict their practice to the treatment of children. The definition has to do with certification and function, not visible form. Similarly for teachers ("people who impart knowledge or skill"). In addition to lists of criteria for words, however, we have prototypes or stereotypes of typical instances of the category, and these are almost always thought of as images. Most contemporary desktop dictionaries include hundreds of illustrations to accompany definitions, suggesting that recognition of many things is done by matching to a visual prototype rather than checking off a list of features required by a definition.
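The contrast between the two ways of recognizing a teacher can be put schematically. What follows is only a rough sketch in Python, not a claim about how dictionaries or minds actually work: the feature labels are invented for illustration, and the measure of resemblance used (the Jaccard overlap of two feature sets) is a convenient stand-in chosen here, not something proposed by Goodman or by the dictionaries.

```python
# Hypothetical feature labels, invented for this illustration only.
TEACHER_CRITERIA = {"imparts knowledge or skill"}
TEACHER_PROTOTYPE = {
    "imparts knowledge or skill",
    "stands at front of classroom",
    "holds a pointer",
    "has a chalkboard behind",
}


def meets_checklist(features, criteria):
    """'Digital' test: True only if every criterial feature is present."""
    return criteria <= features


def prototype_similarity(features, prototype):
    """'Analog' test: Jaccard overlap with the prototype, from 0.0 to 1.0."""
    union = features | prototype
    return len(features & prototype) / len(union) if union else 1.0


# Someone who satisfies the definition but looks little like the stereotype.
online_tutor = {"imparts knowledge or skill", "sits at a laptop"}

# A figure that looks the part but shows nothing criterial.
clip_art_figure = {
    "stands at front of classroom",
    "holds a pointer",
    "has a chalkboard behind",
}

for name, feats in [("online tutor", online_tutor),
                    ("clip art figure", clip_art_figure)]:
    print(name,
          meets_checklist(feats, TEACHER_CRITERIA),
          prototype_similarity(feats, TEACHER_PROTOTYPE))
```

The checklist returns a flat yes or no, while the prototype match returns a degree of resemblance; the two can disagree, which is the situation the picture-maker faces in trying to signify "teacher."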

If handed a camera and sent off on assignment to produce a picture signifying "teacher," we would most likely find someone at the front of a classroom who would let us take a quick shot. The resulting picture would have a great deal in it that was not criterial for being a teacher (gender, gesture, clothes, posture, ethnicity, age, hair, facial features, ...) and would somehow have to be suppressed if the picture were meant to convey only the meaning "a person who imparts knowledge or skill." 3 Illustrators have a somewhat easier time of it, since they have latitude in what to put in and what to leave out. For example, there is a clip art style of "faceless people" for clip art illustrating "people at work," and even more extreme geometric simplifications as well. But anything more specific than the ISOTYPE-style teacher of Figure 0.2 will introduce features perhaps typical but not necessarily present in all instances of the category and hence begin to invoke and propagate stereotypes. In fact, wielding a pointer is surely not a criterial feature for a teacher, nor is wearing a skirt. The pointer is very much on a par with the bloody apple, globe of the world, or mortar board used to signal "teacher" in many pieces of clip art: surely these are not part of the definition of teacher. But word meaning cannot be viewed simply as a checklist of criterial features; it has an analogical side as well: many words have a last, loose sense given them in dictionaries "anything like or as like <word> in form or function," which extends blanket approval to analogical extension of the word's denotation set. (Check, for instance, dictionaries' definitions of the word tree.)

Semiotic principles (or rules of thumb) are more like analogical rules than rules of syntax in two ways: they are never necessary (unless explicitly invoked) and they are not fixed in number. They contrast in these ways with rules of grammar, which are rules for combining discrete elements (words and affixes) into connected graphs ("trees"). Every word and affix must be attached by one of the rules and (in most versions of syntax) only one of the rules, or else the string of words fails to parse as a sentence. But since images do not have definable parts ("words"), anything in an image may trigger a semiotic rule, and perhaps more than one rule, or none at all. This is what is meant by calling them rules of thumb and why it may be misleading to speak of a "grammar of visual design" except in a general sense. There is no such thing as parsing or failing to parse an image, unless in the sense of sorting out a complexly jumbled scene (of which we will see many instances later on). When interpreting texts, semiotic rules are not necessarily attached to words and affixes and apply in the same variable way they do with images. Semiotic rules for verbal texts include those for recognizing and interpreting figurative language, and with figurative language, we leave sentence grammar and sentence meaning behind.

2.3 images v. propositions: For an image to work like a sentence, it would have to have clearly defined, separate objects and some other parts indicating the relations of the objects. That is basically what a diagram is. As Stephen Kosslyn defines them,

Diagrams are pictures of objects or events that use conventionally defined symbols to convey information—to show the wiring of your kitchen, the nitrogen cycle, or the assembly of your model 1968 Mustang. Diagrams combine literal elements (pictures of parts) and symbolic ones (arrows to show movement, direction, or association; shading to show curvature). (Kosslyn, 244) 4

Or, to put it somewhat crudely, the image-bits representing objects function as nouns, the arrows and other shapes predicate the properties and relations of the nouns. Kosslyn's grouping of shading with the symbolic elements is a little surprising (one would have thought it a part of "literal" pictorial appearance), but he is probably thinking of the ways shading is used in a drafting class to indicate a contour, where it is not part of any systematic rendering of an illuminated scene, or of the use by artists of hatched lines and "bracelet shading" (shading by close, parallel curved lines that are taken to represent a curved surface, like so many bracelets following the shape of a limb). (See Willats, 135-38.) Note in Figure 0.3 the shading on the funnel, which is certainly no attempt at photorealism.

Figure 0.3
digitalgallerylive: Style 497

Figure 0.3 is a classic diagram complete with arrows, though one of the "objects" (information, data) has no canonical shape at all and the funnel represents a function rather than a form. (Another clip in this series has a spigot projecting from the back of the head with information coming out of it and with excitation marks around the head.) This particular style (Style 497 at digitalgallerylive) concentrates on mental events and is very ambitious and inventive in finding ways to diagram them, though it also happily uses standard symbols (light bulb for inspiration, brain for mind, padlock for mental limitation, flowerpot and stem for growth, plugs and wires for power).


Figure 0.4
digitalgallerylive: metaphor

Figure 0.4 seems to violate the rule of thumb that arrows convey the equivalent of predicates; the clue is that such an up-down-up arrow is heavily used in clip art for lines graphing trends, and so could stand in here for an up trend which the man is trying to "catch." This is an image that seems largely a literal imaging of the phrase "catch a trend"—all the physical associations of catching a fish do not appear relevant.


Figure 0.5
digitalgallerylive: teach

Although the vast majority of clip art is drawn with vector graphic drawing tools, not photographed, many clips are straight representational images: they portray typical scenes such as those of teaching where we can identify the various key entities and describe in full sentences what is going on, but there is no clear visual basis for segmenting the image into nouns and verbs, nor are the various parts shapes standing in for functions and other abstractions. So what we have in Figure 0.5 is a depicted scene, not a diagram, and hence not language-like in its way of meaning.

Figure 0.6
László Moholy-Nagy: Jealousy (George Eastman House collection) (1927)

A famous and influential style of diagram was developed by László Moholy-Nagy in the 1920s. It mixes photographic image and diagrammatic lines in an abstract, mainly white space, though one with depth cues. Some of these are collected and exhibited as art, but he made commercial ones as well, as did others such as Anton Stankowski. 5 The astonishingly contemporary look of these "photodiagrams" arises in part from their mixing the most concrete and abstract modes of representation, and using them to represent (here) the complexly but abstractly configured passion of jealousy. Other titles from this group include "The Broken Marriage," "Boxing, Militarism," "Love Your Neighbor," "Joseph and Potiphar's Wife," and "Rape of the Sabine" (all at the Getty Museum). Photodiagrams were very popular in the 1920s in many of the regions of the avant-garde from Gustav Klutsis and Alexandr Rodchenko to Jan Tschichold and Moholy-Nagy. The style is sometimes called photomontage, but the open, white, abstract space is quite different from what we will be calling photomontage, where images are merged in a single representational space.

Figure 0.7
Natsuki Kimura: Gone (2000)

It has traditionally been said that propositions with negative, hypothetical, potential, or future modality cannot be represented visually. Paul Messaris discusses a somewhat laborious way one could make a video of someone who was in a certain place but is not there now. Still images, however, cannot display a history. Natsuki Kimura uses the diagram convention of the dotted outline to indicate something that cannot be seen "here and now" but could be under other circumstances (namely, before now or after now). 6 However, he also needs to add the word gone, because the dotted outline could otherwise be taken as a future building rather than a past one. It is interesting that he does not here make use of the most common visual devices of reduced opacity and blurred outline; especially in photomontage, these are used to indicate that something is imagined or remembered or even perhaps an alien manifestation (Kimura himself does this elsewhere), and in any case of somewhat reduced modality. Of course, if the point is to depict absence, you can't show anything of it at all.

2.4 code v. context: Here we should distinguish between photographs, drawings, and diagrams: all are images, but Roland Barthes made the much discussed claim that only photographs signify without being coded. This does not mean that the code for drawings and diagrams is known and evident to everyone who sees them, or that there is only one possible decoding. Tech writing handbooks admonish writers to be sure to include keys spelling out the value of the shapes and symbols, but fairly often people use common images and symbols hoping that the context will make the value of the shape clear. This is the case with the "info brain funnel" above and I think it does succeed without a key specifying that the letters and numbers represent information, the brain represents the mind, and so on. Keys do assume at least the equivalence of shape and word, and perhaps the containment and control of shapes by words as well.


Figure 0.8
Corel Office 2000: environment/ preserve

Consider the case of the worldbulb (Figure 0.8), which could be taken as a roughly drawn novelty shop item (Archie McPhee's?), but finding it in the Corel Office 2000 Clipart archive under Environment/Preserve, we will search for some figurative extensions to interpret what is in that context some sort of metaphor. "The whole earth is inspired"? (or is it just Atlantic Rim?) "The earth had an idea"? "Think globally"? "You can see the whole world in a light bulb"? "Let your light so shine before men ..." The example would seem to illustrate the often-voiced observation that images convey the intentions of their makers rather weakly and lean upon captions and adjacent text and images for "anchorage" as Barthes called it.

2.5 naturalism v. abstraction: Parker and Deregowski distinguish eidolic from epitomic images according to whether the image makes some attempt to represent things in three dimensions. Epitomic images are epitomes of objects—they represent them as they are, not as they appear; they typically require less skill to draw (e.g. stick figures) and are earlier and more widely used than naturalistic (eidolic) renderings of objects in space. They can, however, be praised for abstracting away from the surfaces of things to features that are criterial for membership in the class of things denoted by the word—i.e. as supporting or expressing an analytic probing beneath the surface appearance of things to general, structural, and invariant properties. John Willats observes that Byzantine and Orthodox religious artists made only half-hearted attempts to represent three-dimensional bodies in space, as if deliberately undermining any illusion of physical presence in the very act of depicting the Father, the Son, the angels, and saints. (Willats, 339-50)

Figure 0.9
note brighter colors of sleeves on teacher and student

Depth is one of eight "markers of modality" according to Kress and van Leeuwen. Others include:

  • color saturation
  • color differentiation
  • color modulation
  • contextualization
  • pictorial detail
  • illumination
  • degrees of brightness

Figure 0.10
complete with linear perspective

Figure 0.11
flipped from digitalgallerylive version

These may be thought of as parameters with high values being associated with naturalistic representation (where the look of common-sense naturalism is to be found in 35mm color pictures (prints?) taken with a normal lens). One may wonder why Parker and Deregowski singled out the representation of depth; an at least partial answer would be that they are looking to compare styles of visual representation across a very wide range of cultural styles and levels of technology (e.g. from cave paintings to Greek vases etc.) and many of these parameters do not apply in all circumstances. Note in the photo clip (also from clipgallerylive) that the grass provides texture cues to depth and the color, contextualization, detail, and illumination values are much higher than those for even the most detailed vector clip. (These vector clips in their native WMF format have 2, 44, and 99 different colors, but the photoclip has over 14000.) The color in the photoclip (Figure 0.11) is actually rather weak in the blue midtones, giving the impression of very late afternoon sunlight, which is consistent with the long, deep shadows. The clip does not present a general or typical scene of instruction in the early years but looks more like a family or pre-school snapshot of a particular moment in the lives of these two individuals.

Figure 0.12
Scott McCloud: Abstraction Triangle

Scott McCloud, in Chapter Two of Understanding Comics: The Invisible Art, discusses the simplification of form, which is a major characteristic of cartooning and comics, arguing that it facilitates "amplification through simplification" whereby certain traits can be foregrounded. He also argues that the simplified figures of the comics more readily allow us to project onto them, citing in support the practice of making backgrounds considerably more realistic and detailed than the figure we are to identify with. McCloud also observes that images may depart from the fully naturalistic ("photographic") along two different parameters: towards abstraction of visual form (colors, shapes, etc. which do not represent at all) and towards the reduced detail of an iconic notation (happy face, stick figure) which ultimately breaks the visual link to represent symbolically (i.e. arbitrarily), as happens in the development of pictographic signs into the symbols of writing. These two parameters yield a triangular shaped space of possibilities, into which he plots the signature styles of a large number of writers of comics.

Figure 0.13
Donis Dondis's Fig. 4.20
reducing photograph to abstract structure

Dondis also discriminates two directions of abstraction: "Abstraction can exist in visual matters not only in the purity of a visual statement stripped down to minimal representational information but also as pure abstraction, which draws no connection with familiar visual data, environmental or experiential" (Dondis, 74). However, for her (unlike McCloud) any representational image, however detailed, always has also the abstract structure of elemental visual forces "which has enormous power over response" (80). From the examples given (Figure 0.13 is one), direct realization of the underlying abstract structure is very geometrical—in fact, exclusively polygons. She concludes the section on the level of abstract structure most grandly:

The abstract conveys the essential meaning, cutting through the conscious to the unconscious, from the experience of the substance of the sensory field directly to the nervous system, from event to perception. (81)

I don't know what there is to say about the power and unconscious appeal of deep structure polygons—perhaps nothing at all. It is not an idea that will be developed here.

This naturalism/abstraction scale is similar to Goodman's replete/attenuated one, but Goodman's example of a replete image is a Ukiyo-e woodcut by Hokusai (where every detail is said to signify) and the contrasting pole is represented by an EKG tracing, where only the values plotted on the line matter. "Replete" cannot be equated with abundance of naturalistic detail, since the latter may not signify very much at all. (See Elkins, 70-74; Salomon, 40-41)

We touch here on what philosophers call the problem of the "generic image": can there be an image of a chair-in-general? 7 Or at least a standard view of a typical chair? One use of various digital image processing filters and masks is to push the image in an "abstract" direction. And of course there is no reason why even a fairly naturalistically rendered image might not be taken generically if the context encouraged it. One place to find such images of concepts is in diagrams.

Our array of "over-shoulder-looking" depictions looks somewhat like Yvonne Larsen's "scale of abstraction" illustrated with representations of "shoe," starting with the word shoe and ending with a 3D model of a shoe.

Kress and van Leeuwen correctly point out that individual images may have different values on each of the parameters (i.e., may have a profile of, say, high saturation, low modulation, moderate contextualization, etc.) and further that even within individual images, some parts may be treated according to one profile, other parts to another, as for example when an advertisement renders the product with sharp focus and other markers of high modality while blurring and de-saturating other parts of the image. Presumably choices along these parameters enter into the meanings conveyed by the images. The advertising technique, for example, creates the effect of focusing on the product even if one tries to focus elsewhere.

Figure 0.14
Shirin Kouladjie

Hence an image does not have to maintain one mode throughout: Figure 0.14 is one that combines the abstraction of hand-plotted mathematical curves with photographed curves and with penciled words. The effect is that of an engineering student's notebook, or perhaps an interior notebook of the mind. It is a collage counterpointing the abstractness of the catenary curve to one of the things in the world of material flesh that have its shape, along with an exercise in drawing trees where the tree line itself describes a catenary. The effect is of a punning linkage of correspondences, a playful illustration of one of the fundamental ideas of science. (The artist, Shirin Kouladjie (Moalie), was a math major in college.) This image raises the issue of mixed modes which we will take up shortly (link provided for the impatient). The mixture of geometric and photographic modes makes this a direct descendant of Moholy-Nagy's photodiagrams.

Figure 0.15
Vuk Cosic: The History of Art for Airports- Venus

So now we should be in a position to answer the question: what makes Figure 0.15 hard to interpret? It is to be found in a rather small collection at a site called "History of Art for Airports." Does that help? You access it by clicking on the word "Venus." Does that? It represents one of the most reproduced works of art in the world, one which takes its identity as much from accidental damage as from the sculptor's hand. This is as wrong as anything can be for an ISOTYPE clip. There is no general category "statue of a nude missing both arms" and geometric bi-color is not capable of representing any of the fine details and modeling of the statue. Finally, though lightly draped from the waist, the Venus de Milo is not wearing a skirt, but a skirt there must be to mark gender according to the ISOTYPE "key."

2.6 documentary v. manipulated: This contrast begins with the common-sense notion of the camera as a recording eye; around this pole cluster the notions of no posing and no enhancement of the negative or print. Even the simplest snapshot allows certain adjustments, of course, such as choosing a flattering camera angle, removing distracting objects, and permitting some posing (which is what "Smile!" means). The purpose is to produce a decent-looking picture, not just the result of grab-and-shoot. We would still think of pictures taken under such circumstances as valid records of the way our subjects looked on the day the snapshot was taken. The raw recording function has made a strong digital comeback with surveillance cameras (used not only by banks, shops, apartment buildings, streets, and so on but quite heavily in multimedia installations, as for example those of Julia Scher). We now study the appearance of things on video monitors. Web cams have provided the possibility of "living documentary." In these uses, the technology does not allow much human intervention (selection, editing, filtering), and some of the fascination of Web cams arises from the sense of watching life being lived with only electronic mediation between ourselves and the scene. As a means of enhancing social and political awareness, documentary depends on our belief that the conditions or contradictions shown were there to be observed by anyone, so that we admire the photographer for the patience and cunning to make a good catch. Of course, there is no guarantee that the photographer did not fudge things a little or a lot, and this is the root of William J. Mitchell's claim that easy and undetectable digital manipulation spells the death of photography.

Manipulation can either be silent and hard or impossible to detect, or it can be quite open, as typically in the cases of photomontage and digitally "filtered" images. Although it has been much more discussed, concealed manipulation is an ethical problem rather than a semiotic one: that is, the image with concealed manipulation continues to be understood as it would without the manipulation. During the 1990s, the number of Photoshop and Photoshop-like filters burgeoned, and little is fixed or has been written about how, for example, inverted and rotated palettes change the sense of an image. We will take up these points in the chapter on Photomontage.

Documentary clearly aligns itself with "straight" photography, the Modernist school of hard-focus, un-retouched, un-manipulated photography associated with Paul Strand, Edward Weston, Edward Steichen, Imogen Cunningham, and Ansel Adams in the f64 group. However, it should be noted that documentary photography, though anchored to the faithful and transparent recording of the unique object and moment, is usually understood to be offering the image as a typical instance of a more general class or condition. Clive Scott distinguishes documentary from photojournalism on just these grounds: "The documentary photographer is pushing the actual away towards the typical, the representative, the exemplary, and expanding event into environment" (62). This pushing, Scott argues, is done by caption, label, and presentation: words, mostly, which attempt to supply cues to the photographer's intentions beyond what the image provides, photographs being "intentionally weak" (i.e., susceptible to being framed and read in quite different ways).

2.7 art v. nonart: Along with the rise of images/the visual we have the instituting of new academic studies such as Image Studies, Visual Culture, and New Media. These all define their objects of study as media, rather than specific deployments of the media. These studies embrace all images, high and low, pure and commercial, maps, xrays, stone carvings et cetera. But some sorting out into major divisions is necessary lest one go mad wandering the maze of similar and different everything. And so art v. nonart (other terms for "nonart" are discussed by James Elkins, pp. 53-54) comes back up as a possible division within Image Studies.

Figure 0.16
Robert Burton: Frontispiece to The Anatomy of Melancholy

The last chapter of Visual Explanations, Edward Tufte's classic work in information design, is devoted to what he calls "confections," which are assemblies of many visual events, selected and juxtaposed on paper, placed in scenes that no one could ever see (121). He offers as examples the highly detailed frontispiece illustrations from seventeenth-century books, science illustrations, several paintings, and a Soviet propaganda poster. He then begins to extend the reach of the term to include Joseph Cornell's assemblages, but pauses, distinguishing art from information design:

For art, collage (French, pasting) combines images so as to create pleasing or provoking visual experiences, hardly expressible in words and rarely based on words; on the other hand, confections brings images together to display visual information, often expressible in words and often derived from words. Confection-makers cut, paste, construct, and manage miniature theatres of information—a cognitive art that serves to illustrate an argument, make a point, explain a task, show how something works, list possibilities, narrate a story. (138)

To one trained in literary studies, this sounds like a reworked version of "the heresy of paraphrase." This was a New Critical doctrine that the meaning of a poem goes beyond anything that can be paraphrased ("discursive meaning") and that if a work of art does seem mainly to be illustrating an idea or taking a position, then it is an apologue or roman à clef or (shudder!) propaganda, all of them impure, inferior works of art. Similarly in visual art criticism, it evokes Clement Greenberg's Modernist view that each of the arts in its purest form is entirely distinct from the others and should be kept separate. (Tufte notes that confections frequently include words within the frame of the image.) The alignments invoked in Tufte's paragraph help to explain why we speak today of "information design" rather than "information art" and what MIT is trying to overcome in its Aesthetics + Computation Group and MIT/Leonardo in the book by Stephen Wilson, Information Arts: Intersections of Art, Science, and Technology. Art/nonart may not be the smartest way to divide the territory of Image Studies and in any case, note that Tufte fudges the lines as soon as he has drawn them with his "cognitive art."

Johanna Drucker points out that throughout the twentieth century, the rules governing the design of commercial, advertising texts have been broader and more varied than the texts of high culture. Convention breaking, eye catching, and advertising have all travelled together for a long time. Thus the Greenbergian strictures on purity and separation of the arts (and their associated semiotic modes) also functioned to separate Art from Nonart, especially commercial Nonart: "One effect of this division is a taboo against violating the literary page with any of the more obvious visual manipulations which typically characterize the commercial page" (Visible, p. 96).

Accounts such as Simon Morley's of the emergence of the new imagetext from the perspective of High, or gallery, art are caught in a paradox. In Writing on the Wall: Word and Image in Modern Art, Morley acknowledges that the Art/Nonart binary is heavily challenged by new media, but the avantgarde and new media (especially video) works he discusses were all exhibited in galleries, and he does not question that gallery art still epitomizes not High Culture, but Our Culture.

This study will focus on works that are semiotically adventurous or reflexive, and these are not likely to be found in science textbooks or TV advertisements (though the odds are small, they are better for the adverts than the textbooks). Upscale magazine advertisements are another matter, as they are ceaselessly in pursuit of the new look and new ways to ensnare readers. The adventurous are those who explore the uses of the medium with something less than complete assurance that they will be understood, and the reflexive users are those who make us aware of the rules and conventions of meaning-building that we normally apply without thinking. Such explorations and moves do not necessarily make art, or successful art, but they are one of the things artists are free to do and which some viewers and critics look for and value. Sometimes we will look at the work of illustrators, who constantly negotiate between new visual ideas and Artistic Directors. With respect to semiotic innovations in general, evidence does not support a pure trickle-down model with artists originating and illustrators parasitically copying the artists' work. All artists and all illustrators copy each other's work. Sometimes they are the same person (e.g. Man Ray).

3. Viewing Compared to Reading: Codes and Conventions

For our purposes, it will be useful to narrow and specify the meanings of viewing and reading to the processes by which we derive meaning from images and texts. These processes can fail: we can look at an image or book and not even grasp what we are looking at or attempting to read. We can sort the failures (and successes) that occur into two broad groups: those that have to do with "perceiving" the image as a scene (restricting the discussion for the time being to representational images) or the text as sentences and paragraphs that make sense, and those that have to do with "interpreting" the scenes and paragraphs as meaningful and significant. Reading can fail at the level of perceiving the text if one doesn't know the language in which the text is written. The language, then, is the code (or at least a major part of it) that enables readers to map the marks on the page onto sentences and paragraphs. The marks on the page are not iconic (they do not resemble the things they express) and hence are said to be arbitrarily related to the meanings the code assigns to them. Roland Barthes famously said that photographs can be perceived as scenes without using a code because they were made without using a code—directly, as it were, by light reflected from the surfaces of things darkening the chemicals deposited on the film. As soon as you recognize and categorize objects in the scene, however, you draw in much knowledge and many beliefs about the world, as you would in any instance of seeing something with or without photographic mediation.

There is an intermediate ground, however: a photograph (or any representational image) not only conveys a scene with objects in it; it conveys as well a particular framing and a view from a certain point on the scene, and in this area the debate about codedness and convention has been intense. Everyone would agree that interpretation relies heavily on learned conventions such as those of genre, and most would agree that at least some of the perception of images and of sentences is hardwired into the human brain, not conventional and not learned. The discussion is muddled by the term convention, which often but not always refers to an arbitrary mapping, where arbitrary means that an outsider would have difficulty figuring it out just on the face of it.

Figure 0.17

For example, Figure 0.17 does not depict the result of bombardment, but is a cut-away view allowing us to see the inner workings behind the surface of a device. In The Heritage of Giotto's Geometry, Samuel Edgerton shows how early seventeenth-century diagrams employing this convention as well as that of the transparent view were misunderstood by Chinese scholars redrawing the diagrams in translations of the European works. These "views," especially the transparent view, are ones that could be had only in the mind's eye. That is, there is no experience in our usual perceptual life that could provide a model for such a view, and we should have a little sympathy for the Chinese artists who interpreted the irregular edges of cutaway views as the smoke of spirit forces or billows of suddenly appearing waves. Once one knows the conventions, it seems plausible to understand the views "as if" the surfaces (not all of them, just the occluding ones) were transparent or partially cut away. The situation here is rather like that with compound nouns, where one can roughly guess a range of pertinence relations between the two nouns (pillow talk, bedroom eyes, water park) while still missing the specific meanings of the compounds. These still need to be learned as dictionary items. And indeed learning is a crucial issue: if one takes the view that viewing is highly conventional, and conventions require learning and instruction, then one has a prima facie case for teaching "visual literacy" alongside print literacy in the schools. If on the other hand, we hold that viewing (representational) images is very like viewing the world, then visual literacy as a special body of conventions and abilities shrinks considerably. That is the reason to examine this question more closely and in relation to the representation on a flat surface of three-dimensional space and the implicit scene of viewing.

Figure 0.18
Normal occlusion

occlusion: One of the ways we can determine the nearness or distance of objects relative to us in space arises from the occlusion of more distant bodies by nearer ones. Flattened onto a single plane, occlusion appears as one object (the gray square or cube) interrupting another (the charcoal circle), which we could interpret as simply overlaying the other, but which we take as located between the viewer and the other object, as in Figure 0.18. This projection into three dimensions seems anything but arbitrary, since occluding objects do seem to break the visual outlines of those occluded.

Figure 0.19

Figure 0.19 presents a slightly different situation, with the circle now appearing in front of the square, not by blocking our view of part of the square but by dimming it. So we read this as resulting from the circle being only partially opaque (perhaps a 4 f-stop neutral density filter). Viewing through filtering objects is not particularly rare or exotic and provides a plausible rationale for what we see as a three-dimensional arrangement. We can, however, view the square as the filter instead of the circle; in fact, Figure 0.19 has the ambiguity of a Necker cube, even without connecting edges.

Figure 0.20

Interestingly, Figure 0.20 resists a projection into three dimensions, since there are no naturally occurring situations where an intervening object, translucent or not, preserves the shape of the object in the background but lightens it. 8 Instead, it looks like a slightly unusual Venn diagram where the lighter gray indicates the intersection of the sets. If the maker wanted us to see this with one object occluding the other, she would have to stipulate it, since there is neither a naturalistic explanation nor a special convention for seeing things like Figure 0.20 in three dimensions.

These three diagrams also give a small illustration, by the way, of how computer graphics is offering us many combinations of shapes and images for which we have as yet no standard interpretations (conventions).
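
For readers who think in terms of image-editing software, the three overlap cases can be sketched as three blend operations applied to the region where the shapes intersect. This is only an illustrative sketch in Python: the gray values are hypothetical, and the formulas (ordinary opaque layering, "multiply," and "screen") are the common image-editing definitions, not a claim about how Figures 0.18 to 0.20 were actually produced.

```python
def opaque_over(front, back):
    """Figure 0.18 case: the nearer shape simply hides the farther one."""
    return front


def multiply(front, back):
    """Figure 0.19 case: a darkening filter; never lighter than either input."""
    return front * back


def screen(front, back):
    """Figure 0.20 case: the overlap is lightened; never darker than either input."""
    return 1.0 - (1.0 - front) * (1.0 - back)


# Hypothetical gray levels on a scale from 0.0 (black) to 1.0 (white).
square_gray = 0.6
circle_gray = 0.3

# Gray level of the region where the two shapes overlap, under each reading.
overlap_occluded = opaque_over(square_gray, circle_gray)   # square in front
overlap_dimmed = multiply(circle_gray, square_gray)        # circle as filter
overlap_lightened = screen(circle_gray, square_gray)       # no natural analogue
```

Under these assumptions, "multiply" gives the same result whichever shape is treated as the filter, which fits the ambiguity noted for Figure 0.19, whereas opaque layering depends entirely on which shape is in front. "Screen" makes the overlap lighter than both shapes, the case for which the text finds no natural three-dimensional reading.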

There are certain projections of three dimensions onto two, however, where the "rule of occlusion" does not apply. These we perceive as relatively more "naive" projections, since they tend to draw things as they are, not as they appear, or, put another way, to show all the things we might want to see, even if we have to change viewing angles to do it. Such are the examples of what Edgerton calls "squashed perspective" in pre-Renaissance European and Muslim culture. In showing more than you can see from a single viewing point, these older drawings foreshadow Cubism and David Hockney's "multiples."

Figure 0.21
View from SimCity 4

perspective: What we most often mean by perspective is linear or vanishing point perspective, a method of projecting three-dimensional space onto a plane that was developed in the Renaissance by a number of Italian painters and quickly became the most widely used projection throughout Europe. Its key features are that all parallel lines not parallel to the plane of viewing are drawn so as to converge on a single point (the vanishing point), and that all objects are drawn to scale within these converging parallels. When it was discovered and perfected in the Renaissance, it was proclaimed a miracle.

Nonetheless, another projection system, that of axonometric (aka parallel) perspective, is sometimes preferred by engineers because in it all dimensions that measure the same appear the same (i.e., there is no foreshortening or perspectival shrinking along the receding parallels, since they don't converge). It is used for various types of engineering drawings and by some computer game designers; it is the view we find in SimCity, for example. 9
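
The difference between the two systems can be made concrete with a small numerical sketch in Python. The function names, the focal length, and the shear values are assumptions made for illustration, and the parallel projection shown is a simple oblique variant, not the specific projection used in SimCity or in any particular engineering standard.

```python
def perspective_project(point, focal_length=1.0):
    """Vanishing point perspective: eye at the origin, picture plane at z = focal_length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the viewer (z > 0)")
    scale = focal_length / z  # farther points are scaled down more
    return (x * scale, y * scale)


def parallel_project(point, shear=(0.5, 0.5)):
    """Oblique parallel projection: depth shifts a point sideways but never shrinks it."""
    x, y, z = point
    return (x + shear[0] * z, y + shear[1] * z)


def drawn_length(p, q):
    """Length of the segment between two projected (2D) points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5


# Two vertical posts of the same real height (1 unit), one near and one far.
near_post = [(1.0, 0.0, 2.0), (1.0, 1.0, 2.0)]
far_post = [(1.0, 0.0, 4.0), (1.0, 1.0, 4.0)]

persp_near = drawn_length(*[perspective_project(p) for p in near_post])
persp_far = drawn_length(*[perspective_project(p) for p in far_post])
par_near = drawn_length(*[parallel_project(p) for p in near_post])
par_far = drawn_length(*[parallel_project(p) for p in far_post])
```

With these values the far post is drawn half as tall as the near one in perspective, while the parallel projection draws both posts at the same height, which is the property that makes it convenient for taking measurements off a drawing.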

Vanishing point perspective was taken in the heyday of conventionalism as a prime example of a convention of western European culture that purported to be natural—that purported, that is, to represent faithfully the way things in space look to anyone. A great deal has been written to this point; for this study, the question is whether perspective projections are arbitrary and need to be learned as such. Vanishing point perspective does not seem to be solely a cultural contrivance because it also appears in photographs (as does occlusion). This should be no surprise and no mystery, for the camera obscura was often used from the Renaissance on to do sketches for paintings. In fact, some scholars believe they can identify flaws in the particular lenses used by certain artists reflected in slight anomalies in the paintings that were ultimately produced. So if you can view photographs, you can view anything with vanishing point perspective. Conventionalists, therefore, have sought data on the comprehensibility of photographs to people with minimal contact with technology, but the results are equivocal. Early reports of noticeable differences were questioned, and it turned out that the pre-technological subjects of the study appeared to have more trouble with the rendering of the world in black-and-white outline (the Hudson Depth Perception Test) than with perspective, since they did better with color photographs. 10 Black and white outline drawings do seem to be a convention of representation that takes some learning, but not perspective.

Thus far, the conventionalist case has not picked up much support in the representation of depth by perspective and occlusion. One can, however, imagine conventionalists regrouping around the (mostly photographic) representation of subjectivity and social relations. Again we will consider two points of representation—angle of view and eye contact—which conventionalists understand to support their position.

angle of view: Kress and van Leeuwen, among others, discuss the standard point of film technique that low camera angle on a person indicates that the person is powerful relative to the viewer; high camera angle the reverse (Grammar, 146). But surely, Messaris says, looking up means looking toward greater power not because of any special convention of visual language but because we were all children and spent the earlier years of our lives looking up to more powerful individuals. So the meaning of low or high camera angle does not have to be learned as a convention of film. Proceeding in this way, he argues for the experiential grounding of a number of aspects of pictures that have been said to work by special visual conventions. He also discusses conventions for marking change of time and place (pages falling/being torn from a calendar, trains presumably conveying the characters from one setting to another) that have subsequently dropped away as no longer necessary.

Insofar as there is a convention involved in angle of view, it is that of imagining oneself inside the represented space so that one takes on a relation, spatial and what it suggests, to what is being viewed—even if, as in the case of SimCity, it is "God's view." This imagining oneself into the scene is crucial in the matter of the gaze as well.

Figure 0.22
Will Eisner on Framing and Point of View in the Comics

This imagining is not limited to viewing photographs or film; the famous writer of comics, Will Eisner, instructs writers in the use of framing and point of view first for clarity of scene and second for manipulating emotional states in the reader—detachment, fear, confinement, freedom—all of which he illustrates with the two views of Figure 0.22. He says, "The shape of the panel and perspective promote these reactions because we are responsive to environment," adding that smallness and confinement are "deep seated primitive feelings" (89). Indeed, this imagining might even more simply be forgetting that one is not part of the scene—forgetting, in other words, that one is viewing a representation, lapsing, as it were, into a sense of complete immediacy, to use Bolter and Grusin's term. There are a variety of effects, more than those already described, that could arise from this sense of being positioned in the same space as the object depicted, and it would seem that producing them all by means of conventions would miss a major generalization.

meeting the gaze: Consider our tendency to respond to photographs of people positioned fairly near the camera and looking directly at it as if we were encountering those people directly and were looking at each other in intersubjective recognition. Kress and van Leeuwen say that such images have a semiotic feature of appeal, which might be thought of as a vector drawn from the figure's eyes toward ours, a feature of "demand" (they call it) for recognition or response from the viewer. This situation contrasts with one where there is no "eye contact;" the figures "offer" themselves for view by the unobserved observer. Although they speak of demand/offer as a kind of binary, 11 and link the experience to the linguistic ones of address and grammatical person (as if the image is saying "you" to us), they do not refer to this phenomenon as a convention. What they are doing, however, is setting up demand/offer as the fundamental contrast of social interaction, not just speech acts.

It is discussed as a convention (of photography), however, by Martin Lister and Liz Wells in Van Leeuwen and Jewitt's Handbook of Visual Analysis. Analyzing a portrait of a young Black man by Robert Mapplethorpe, they note that he is gazing directly at the camera:

It is in looking at the camera that he appears to be looking at us. This is a convention known to portrait painters (who used linear perspective and ways of highlighting the eye, rather than a camera to achieve this) and film-makers who (except in special circumstances) strictly avoid the convention in order not to break the illusion that we are looking in on another world without ourselves being seen (75).

Lister and Wells cite camera position and framing as other photographic conventions. It is questionable, however, whether some or any of these points are specific photographic conventions. Indeed, it could be argued that only one "convention" is involved, namely that of viewers imagining themselves sharing a single space with the subject photographed, and in fact standing at the viewing point (i.e., where the camera is). And that space can function as space normally does socially in terms of proximity, gaze, and angle of view. This imagining, though a kind of immediacy, is not medium specific and is likely to take place with any fairly realistic scene of looking into a three-dimensional space. It does not seem to be activated by surveillance cameras and monitors, however, where the fixed position of the camera and low resolution and color prevent us from forgetting the mediation.

Although this line of thinking reduces the number of "visual conventions" quite significantly and not without a certain elegance, one might impatiently object, what's the difference? If we want to discuss the social and cultural forms and relations depicted in images—and there are many for whom analysis of images is such a springboard into cultural analysis or social semiotics—then who cares whether you call them conventions, especially when you make it clear that "in looking at a photograph, and finding meaning in it, we do not need to refer to a dictionary of conventions—we don't look them up" (75) because they are so well-known as to operate below the threshold of consciousness. This too makes them sound like linguistic rules that have been naturally acquired without instruction. This I suppose should relieve Messaris' concern that people will start writing such dictionaries and insert them into the curricula of the schools.

There are nonetheless two reasons to question the proliferation of "conventions" in the analysis of photographic images as well as realistic images done in other media: first, it amounts to a "linguistification" of visual meaning, as when Lister and Wells speak of photographs as the outcome of skill in handling 'language' (75); the shudder quotes are theirs, indicating a sense of some looseness or impropriety. But what or how much? The implication is, not much. But as Messaris says, "As I see it, what makes images unique as a mode of communication is precisely the fact that they are not merely another form of arbitrary signification" (39). Since one of the main themes of this study is how image and text can combine or rebound off each other in imagetext, it is crucial to maintain and articulate a sense of their differences. Second, it obscures the processes whereby the new, much less naturalistic images of collage, montage, and digital manipulation come to be understandable. These processes are not best thought of as the formation of new conventions linking visual, symbol-like things to meanings, but as the workings of analogical reasoning from common perceptual experience, just as one works out a meaning for viewer positioning in a depicted scene.

Figure 0.23
Applied Process Engineering Laboratory splash page

Figure 0.23 (a half-size, grayscaled version of the original) is another hybrid or mixed mode image that mixes the conventional, diagrammatic mode with the photographic. I found it a little confusing because the conventions are a little off center. The central diagram of the laboratory is viewed in isometric projection with cutaways of some of the roof and walls. The cutaways do not function in the standard way, which is to allow us to see what is occluded by the roof and wall. Those inside views are instead given in photographic form and linked in by the very dynamic, curved arrows (red in the colored version), almost as if the room-scenes were escaping through the holes like a gas into the environment.

What suppresses these and other possible readings of the image is a sense that it is a standard instance of the "overview" genre, here built around a literal over view. There is a semiotic rule for overview diagrams that the relations they convey are static. That is, these are not "flow" arrows, but that still leaves their excess unaccounted for. What they suggest to me is the action of pop-up windows, as if the drawing were an imagemap and the cutaway areas were anchors that, when clicked, opened windows displaying the room-scenes. It is of some interest that we can interpret this image at all, since the conventions are not strictly observed. This suggests that with some diagrams, though not blueprints, we view to grasp the gist. Such diagrams are not maps (what maps are is another matter, one to be addressed in Chapter Six).

4. Mixed Modes

Figure 0.24
MS clipgallerylive: teach

How then will text and image share the page/screen in the new imagetext? Will Eisner claims that since the 1940s we have already evolved one new format for the page, namely the comics. The comics do not pursue the full range of possibilities even within their limited resources because they are always (and by definition, for McCloud and Eisner) serial and overwhelmingly narrative. Their counterparts are developing rapidly on line: a number have cult followings (Homestar Runner, Neurotically Yours (and other adventures of Foamy the Squirrel), The Everyday Happenings of Weebl). These sites are done in simple Macromedia Flash animation with voices instead of text, but some have also reached print. Many of them can be downloaded for playing on a Palm or other PDA. Somewhat higher on the High/Low scale are animations entered into the Sundance Digital Festival such as Eun-Ha Paek's Strindberg and Helium.

McCloud's and Eisner's own books, however, are not narratives but explorations of comic book semiotics (though they don't use the term), and in fact they are examples of another emerging genre which as yet has no standard name. Perhaps the comics of ideas? These "very short introductions" and "Dummies' Guides" make no pretenses to scholarship, and do make us aware of some of the pretenses of scholarship, though I think they are effective at conveying the gist of some difficult academic discourses.

Another print-medium argument/demonstration for a new, image-rich mode of writing is Robert Horn's Visual Language. Horn is one of the most enthusiastic proponents of a new kind of literacy, but he chooses to avoid the either/or view and to depict the emergence of a new mode in which text and image are "tightly integrated." Neither medium alone is able to convey all the meanings one might want, and images in particular prove to be quite ambiguous when used without verbal and/or institutional framing and support. Horn gives examples of ambiguous images (57-8) in the course of arguing for VLicons (word + image). He reports for example that in one study of 108 ISOTYPE-style international symbols, 86 were clearly understood by fewer than half of the respondents and only three were understood by more than two-thirds of the respondents. He advocates not icons but "VLicons" (an icon with a word or two attached) and page layout that looks wildly overdecorated for a piece of academic prose, though it might not be bad as a web page, except that Horn uses very basic, cheaply printed old black and white clip art for his images. Horn works mainly with business and information graphics, but synthesizes a wide range of academic research and scholarship and knows enough linguistics to make a serious and extended argument that Visual Language (hereafter VL) is a true language. His emphasis on integration means that image and text for him are locked in a relatively simple, single relation of support.

Figure 0.24 is not one of Horn's, but is drawn from the clipgallerylive collection at Microsoft (forerunner of digitalgallerylive), where it occurs both with and without the word "teaching." Encountered without the word, it does occasion some hesitation as to what is being displayed. Thus the word assists an act of abstraction and suppression of irrelevant properties of the metaphor (e.g., that writing with chalk is impermanent; that ABCs are only the simplest rudiments). This image is unusual for the clip art archive in attempting to model metaphorically the act and result of teaching, rather than presenting a typical scene-of-teaching.

In an article delivered at the 23rd Annual Wittgenstein Symposium, placed on the Web, and announced as forthcoming in a collection of articles, the Hungarian philosopher Kristóf Nyíri reviews the debate over whether images are as capable of sustaining thought as words are and concludes that they are, and are perhaps even the primary medium of thinking. He points to the rise of images especially as advanced by computer technology: "The iconic revolution, made possible by the graphical capabilities of computer software which barely existed ten or fifteen years ago, now provides us with the instruments of a language in which verbal and visual elements coalesce." A little further down he amplifies this point, emphasizing more the rise of images than the emergence of Visual Language:

We might hypothesize that the ability to have images is, today, again on the ascent - this is what I regard as a first layer of change. Secondly, people are becoming familiar with pictures, are acquiring a rich experience of dealing with pictures, to an extent unprecedented throughout written history. And thirdly, to repeat, there is the change connected to the use of today's computers: the easiness of producing pictures, the increasing everyday possibility to communicate via pictures.

Because he is a Wittgenstein scholar addressing other Wittgenstein scholars, he cannot resist being drawn into the words v. images debate even as he, more wisely I think, describes the new mode as one of coalescence, which is very much in keeping with Tufte's multiple and mixed confections and with James Elkins's observation that today mixed modes are the norm for discourses that have not been sorted out by conventions of art. And it is even true that those conventions have been challenged by the avant-garde repeatedly in the twentieth century, from Dada through Conceptualism to contemporary net.art. Coalescence leaves open a wider field of possibilities than Horn's "tight integration" of word and icon in Visual Language; in fact, it is compatible with W. J. T. Mitchell's "imagetext" and his claim that text and image need not support each other to make the same point (which I take it is Horn's goal for information design) but may work complexly on and even against each other.

Figure 0.25
Robert Cumming: Leaning Structures (1975)

By mixed modes I am referring to semiotic modes: ways that signs mean, or ways that we read signs. Thus far our attention has been directed mainly to mixtures of words and images, which are certainly mixed modes, but we can also distinguish different kinds of semiotic modes within the general domain of "images." Figure 0.25, which is one corner of a larger image, is the third "geometry and the body" example cited in this introduction where some highly abstract, mathematical-looking lines are laid over bodies or link them in an abstract space (as in the Moholy-Nagy example and many others by him). This one, by Robert Cumming, clearly dates itself by its 1970s styles, and perhaps most directly and knowingly insists that it is a mixed mode in which we alternate among several modes and frames: the look of the leaning naked bodies, the symmetries and asymmetries and inferred play of forces, the "picture of couple at home" complete with pleasant, stoned smiles looking at the camera. This image is paired with one of another couple very similarly circumstanced but without a wedge to support their leaning pose. Taken together, they bear the title "Leaning Structures." Isn't anybody going to SAY anything? Semiotic modes differ in their purposes and characteristic applications, so that the mixing of disparate ones makes us more aware of mode than of content as we struggle to bring them into some sort of composed alignment. Some practices are characteristically mixed in mode, such as collage, and some are inherently mixed, as with photomontage, if only in the representation of different things in the same place. One might reasonably include the ISOTYPE Venus as a mixed mode image, though the other, naturalistic rendering in limestone, is not visible, only recalled. That is, we would be utterly unable to interpret the missing arm and broken-off one according to ISOTYPE semiotics or most other rules that might come to mind.

If we think of the recalled image of the Venus de Milo as triggered by the title "Venus," then we are on the verge of describing the action of a hypertext link, whether attached to text or image or part of image, and we can go on to say that texts with hypertext anchors are mixed in mode, since the anchors themselves are mixed (or dual), signifying in usual ways in the sentences or pages which we see and pointing at the same time to another page or image that we do not see (unless we click). Words and images which are hypertext anchors count for what they would on a printed page; in addition, they count for something else that they point to just in this instance. The closest print convention is that of the superscripted number, star, or dagger in text, or the number in an image, that leads to a footnote, which is to say another text that intersects the one at hand at just this point. Footnote markers can trigger a mixed-mode double-mindedness and double voicing as we switch from word in context to word in gloss, or word in use to word in commentary. Images are not annotated very often, of course, and there are stabilizing conventions limiting the kinds of mixing of text and image that can go on in most print media. But it is just these conventions that are breaking down as we experiment almost ceaselessly with different combinations of texts and various types of images. One of the major new forms on the Web is the image-text chain, where images and pieces of text link to one another via hypertext links.

partial screen captured from Mark Bernstein's More Than Legible

Figure 0.26
Screen from More Than Legible: on links that readers don't want to follow

Mark Bernstein has drawn several parallels between hypertext and visual modes in a talk given at Hypertext 2000 (and available as a Flash application online). What Bernstein appears to mean by juxtaposition is the effect of relatively inexplicit links—links, that is, that do not identify the connection of the second page to the one where the link appears in the underlined A anchor or in a tool-tip popup box or in the status bar. The resulting "you figure it out" effect is likened to the juxtaposition of collage ("juxtaposition of hypertext nodes in space") and of cinematic montage ("juxtaposition of hypertext nodes in time"). Pieces like Carol Flax and Trebor Scholz's Crossing the Divide have both, I think. We will not use the term montage this way in this document, but Bernstein shows how readily visual and hypertextual modes slide together.

So we have begun with Mitchell's claim that imagetext is becoming a (the?) central mode of expression. We have extended "imagetext" to include hypertext-linked imagetext. We can further gather these under the head of mixed modes. The current taste is for imagetexts and imagetext chains that assume semiotic dexterity on the part of viewers. We can understand this taste as an outcome of twentieth-century struggles between Modernism and Conceptualism (and other Post-isms), and similarly we can understand it as one outcome of semiotic practices nurtured in advertising, film, the comics, computer games, and TV. The pages that follow will have more to say from the former perspective than the latter, not because works of High Art and Art History are more original and richer in meaning, but because people have paid attention to how and what they mean. We treat them as objects for reflection, not for information, advocacy, entertainment, or identification with commodities. We will look at many intricate and engaging things, but our purpose is not appreciation but understanding of how they signify, though, to be sure, analyzing how something signifies can lead directly to appreciation.

The first chapter will deal with relations between words and images in imagetexts, beginning with Modernism's ban on any words in images at all (though this is also a convention of academic painting at the beginning of the twentieth century), thus confining imagetexts to advertisements, newspapers, and the comic pages, and with Magritte's explorations of representation by words and by images. We move then to relatively stable relations of framing and of subversion of image by text or text by image, and finally to the complete fusion of texts within images (textmontage) and chains of equivalent image/text links.

The next two chapters take up the two great twentieth-century modes, photomontage and collage. Chapter Two traces a history of photographs of double visual worlds as precursors of Photoshop's sculpted opacities, especially window reflections from Atget on. We then take up the characteristics of photomontage as visual form (soft, blending edges creating impossible worlds) and the ways we have learned to interpret these impossible scenes.

Chapter Three turns to collage as an anti-type of photomontage, where edges and fragments of images and texts are juxtaposed in ways very like hypertext. We will trace the development of digital collage, including animated and interactive collage (where you can arrange the fragments).

Chapters Four and Five discuss the bringing of contexts to bear on reading images and imagetexts. Chapter Four looks at the scene of viewing, specifically viewing images of people, some wearing no clothes. Here we critique and update prevailing accounts of voyeurism and the gaze with web sites that pursue the theme of reflexivity.

Chapter Five (Contexts) explores the inadequacies of a formalist view of the image as meaning by itself alone by reviewing the roles of museum and gallery in framing and reframing images and in providing a core "canon" of works to be alluded to and remade. It then examines the relation of the emerging body of net.art to museum and museum-like institutions and the interpretation of Web satire as culture-saturated.

Chapter Six addresses the visualization of abstract structures such as semantic relations of words, file directory structures, hypertext webs, and finally the Internet itself. Crucial here is the notion of visual metaphors and the possible collapse in some cases of map and territory. Because innovation in metaphors underlying particular structure maps is so rapid in this area, it is easy to see new semiotic rules forming.

This work exists both as a book and as a hypertext document available on line. The traditional division into sequentially numbered chapters would suggest that it is meant to be read in order, and the division of chapters into sections and frequently subsections would further suggest the hierarchical organization of a technical monograph. The numbering is really a guide to how the parts hold together and a navigational aid to the reader. The hypertext version also has a good bit of crosslinking, side pages, and demonstrations of the points discussed that are absent from the printed document. I would hope that the Web form would be used at least as a supplement to the print form, as it illustrates many imagetext effects that can only be described in words and indicated in snapshots in the book. For example, this book's cover for the on-line version is an example of dynamic collage (at least in Internet Explorer) slightly modified (by permission) from a page by Dirk Hine called Visual Semiotics.

Despite a sense of living in a time of extremely rapid transformation in the means and conditions of image and imagetext making, the focus of this study is not on where "We" are as a culture. With over 3 billion pages, the Web alone can furnish ample support for any number of claims, and when we add in all the other media, we approach a statistics of infinity. But forms do not signify by preponderance or familiarity only, any more than they do by slapping some things together and seeing what looks and sounds good. There are regularities and principles that emerge from looking at many works by many hands, things that are learned from viewing and sharpened by analysis. We are not living on the other side of a cataclysmic rupture with the past: that is the one grand generalization that the following pages will support.

