Duck Hunt

Google Artists + Machine Intelligence Grant Proposal

Kyle Steinfeld

Architecture was once a narrative art form. Before the printing press, people communicated through building: historical events, cultural symbols, and even religious laws were inscribed in stone and read through bas-relief and stained glass. As the burden of recording these narratives shifted from the physical permanence offered by the material longevity of stone, to the informational permanence offered by the easy replication of the printed word, the function of architectural ornament was lost. This cultural and technological shift was captured by Victor Hugo's adage "This will kill that. The book will kill the edifice", a 19th-century insight that anticipated the struggle of 20th-century architects to come to grips with new means of production, and new aesthetic approaches in which narrative concerns, such as ornament and figuration, largely took a back seat to practical concerns, such as structure and function. The foregrounding of practical concerns may be clearly seen in the history of Modernist architectural design, and may be read through the nature of the tools that were developed to support it. While a host of computational design tools are available to the contemporary architect, most are extensions of the early 20th-century Modernist project, and are in direct support of functional or structural justifications for design decisions. Very few design tools may be seen to support the Post-Modern project, as captured by Venturi and Scott Brown's notion of the "duck" and the "decorated shed", which emphasizes the sort of allegoric and symbolic structures of storytelling that are the pre-Modern inheritance of architecture.

It is in this context that this project seeks to rekindle the lost art of narrative architecture. Through the application of generative design tools that incorporate machine learning techniques of object recognition and classification, we aim to help foster the development of design tools that assist in the discovery of imagistic and symbolic significance in urban places and in three-dimensional forms.

The scope of work proposed here for the Artists + Machine Intelligence Grant builds on previous work, in which methods were developed for incorporating image classification models into an architectural design workflow known as "generative design". Summarized below, this revised generative design process aims to optimize digital 3d models of architectural forms for heretofore un-optimizable qualities, such as styles of single-family house forms, or the experiential qualities of interior space. The result is a tool that is able to account for imagistic and symbolic qualities of form alongside functional and structural ones. For the first time, designers are able to select a design that is both the most energy-efficient and the "most instagrammable"; to understand the trade-offs between structural stability and "heroic-looking" cantilevers; to explore new uncanny house forms that are neither Queen-Anne style nor Eichler style, but some strangely-recognizable hybrid of both.

Hunting a banana and a zucchini
Kyle Steinfeld, 2019

Extending this previous work, this project proposes two aims. The first seeks to extend the tools and methods already established in a manner that contributes to the creative commons, while the second seeks a broader impact of the work for a lay audience:

Where many of the above experiments relied on pre-existing classification models, such as the pre-trained ResNet50 model provided by Keras, less emphasis has been placed on training custom models for this specific purpose. For this reason, we propose here a collaboration with AMI to train new models that serve as effective "critics" in the process described above. More on this in a section below.
In architecture, exhibitions of physical design media play an important role in shaping the broader culture of design. Since the product of the work described above is inherently digital (the optimization of a CAD model for some tacit quality), there is value in bringing the digital 3d models that result into the physical world via digital fabrication, and to display these as a part of a larger exhibit of applications of AI in design. Financial support from the AMI grant will be critical to funding the production of these physical artifacts, as described in a section below.

The extension of these tools and the production of these artifacts supports the larger conjecture of the project: If contemporary design tools were able to incorporate allegoric and symbolic formal features, in addition to functional and structural ones, then architects would be empowered to tell better stories with their designs.

Background

In 1972, a moment in which the willfully austere and functionally-driven architectural style known as "Brutalism" had recently been declared a failure, Robert Venturi and Denise Scott Brown established an influential and enduring architectural movement by writing about ducks.

Seeking to reclaim the rich imagistic and narrativistic complexity of pre-Modern architecture, in their seminal work Learning from Las Vegas, Scott Brown and Venturi describe two ways in which architecture may symbolically express its function to the exterior. They asserted that a building can either be a "decorated shed", in which "systems of space and structure are directly at the service of program, and ornament is applied independently"; or a building can be a "duck", in which "systems of space, structure, and program are submerged and distorted by an overall form".

A Duck and a Decorated Shed
Robert Venturi and Denise Scott Brown, 1972

Through this work, Scott Brown and Venturi not only established a Postmodern alternative to Modernist dogma, which demanded unadorned forms optimized for functional performance, they also revived long-dormant formal design approaches related to symbolism and allegory. That is to say, they re-affirmed the validity of caring what a building "looks" like. Their work documented and legitimized these imagistic approaches, drawn from the prosaic architecture of Las Vegas, and demonstrated the appeal of such approaches for a more genteel audience.

Recommendation for a Monument
Robert Venturi and Denise Scott Brown, 1972

"For today's buildings, the decorated shed is more appropriate. Most of the major monuments of modern architecture today are really ducks - they try much too hard to fit their functions into an abstract conception of form, and end up being just big symbols for heroic modern architecture, like the new Boston City Hall. It's all a big symbol, though it won't admit it. How ridiculous - trying to make a piazza publico, like an Italian city-state!"
The New York Times. October 17, 1971

Nearly fifty years later, while much has changed technologically, an analogous aesthetic schism remains, as architects alternately justify formal decisions by appealing to functional or structural necessity, or by directly constructing symbolic associations. While no work of architecture is a pure exemplar of either, the duck-to-shed continuum remains a useful way of understanding how buildings may be "read" from their exterior.

However, while narrativistic justifications, such as the use of symbols and allegories, are employed as often in contemporary design as are functional or structural justifications, the tools architects use in support of the former are far less developed than those for the latter.

For example, architects have at their disposal well-developed structural analysis software tools, energy simulation tools, and access to consultants with the expertise to help designers understand and apply them. Given these analytical tools, combined with a parametric description of a set of possible building designs, it is possible to optimize for structural performance, for energy efficiency, or a host of other quantifiable metrics of building performance.

This approach to optimizing building designs is among the most powerful tools in the contemporary designer's toolkit, and has come to be known in architectural circles as "generative design".

Until recently, generative design means have not been applicable to imagistic, symbolic, or allegoric ends.

In architecture, "generative design" is understood as an iterative three-stage optimization process that consists of: generate, evaluate, iterate. While this method is well-known and well-established for applications in supporting functional or structural design intent, this powerful means has never been applied to narrativistic ends.

Until recently.

The Fresh Eyes Cluster was supported by Lobe.ai, Skidmore Owings & Merrill, and the SmartGeometry Organization

In 2018, a group of architects, academics, and machine learning engineers sought just this. In a previous scope of work, completed in collaboration with Kat Park, Adam Menges, and Samantha Walker, tools were developed and methods established for modifying the three-step generative design process. This work combined a user-defined parametric CAD model (Grasshopper for Rhino) that serves as an "actor" in the generation step, with an image classification model (any of a variety of CNN models produced using Lobe.ai) that serves as a "critic" in the evaluation step, and an out-of-the-box optimization solver (Opossum) that serves as a "stage" in the iteration step. The result is a process to optimize digital 3d models of architectural forms for heretofore un-optimizable qualities.

So, by modestly adjusting the nature of the evaluation step of the generative design process, we find a way forward from optimizing for quantifiable objectives, as is typical in generative design, to optimizing for more qualitative objectives, such as architectural typology or spatial experience.
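The generate, evaluate, iterate loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Fresh Eyes toolkit itself: the function names `actor`, `critic`, and `stage`, the three massing parameters, and their bounds are all hypothetical. The critic here is a simple proportion-based stand-in for an image classifier's confidence score, and a basic hill climb stands in for the optimization solver.

```python
import random


def actor(params):
    """Generate step: map parameters to a hypothetical house massing (meters)."""
    width, depth, height = params
    return {"width": width, "depth": depth, "height": height}


def critic(form):
    """Evaluate step: stand-in for a classifier's confidence score in [0, 1].

    A real critic would render the form and classify the image. This
    placeholder simply rewards tall, narrow proportions so that the
    example runs without a trained model.
    """
    slenderness = form["height"] / (form["width"] + form["depth"])
    return min(1.0, slenderness)


def stage(n_iterations=200, seed=0):
    """Iterate step: a simple hill climb (assumed stand-in for a solver)."""
    rng = random.Random(seed)
    # Hypothetical bounds for width, depth, and height.
    bounds = [(4.0, 12.0), (4.0, 12.0), (3.0, 10.0)]
    best = [rng.uniform(lo, hi) for lo, hi in bounds]
    best_score = critic(actor(best))
    history = [best_score]
    for _ in range(n_iterations):
        # Perturb the current best parameters, clamped to the bounds.
        candidate = [
            min(hi, max(lo, p + rng.gauss(0.0, 0.5)))
            for p, (lo, hi) in zip(best, bounds)
        ]
        score = critic(actor(candidate))
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


best, best_score, history = stage()
print("best parameters:", [round(p, 2) for p in best])
print("critic score:", round(best_score, 3))
```

Swapping the body of `critic` for a call to a trained image classification model is the only change needed to move from a quantifiable objective to a qualitative one; the actor and the stage are unaffected.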

It should be noted that, while the process described above is novel in the context of three-dimensional forms for architectural generative design, precedents exist for two-dimensional graphics. Tom White's "Perception Engines" in 2018 adopts a similar approach in the generation of Risographic prints.

Proposal

In the five-month scope related to the Artists + Machine Intelligence Grant, this project proposes two primary deliverables:

The physical production of a series of three-dimensional proposals for architectural massings of single-family detached homes. Each series of house forms will represent the application of some allegoric (duck-like) quality, as described below. The resulting models will be cast in plaster from 3d-printed molds, and set into a CNC-milled plywood base.
The training of a number of image classification models appropriate to the task of recognizing architectural form, and amenable to the proposed workflow.

In service of the first deliverable, a number of parametric models will be developed in Grasshopper that function as an "actor" in the processes described above. The role of each "actor" is to describe a set of possible quasi-architectural forms. The configuration of this parametric model is critical. The described forms should be constrained enough to describe a coherent set of feasible house forms. Here, coherence refers both to feasibility, in that an actor should produce forms that represent the rough proportions, size, and grounding of a house, and to formal similarity, in that one actor might produce "blobby" forms, another "spiky" forms, and a third angular ones. While constrained, each actor should also be capable of producing a sufficient variety of forms to make the action of the "critic" (described below) meaningful. That is to say that the selection of individuals from the design space offered by each "actor" must produce meaningful difference. For this reason, the parametric models produced in this scope of work demand a great deal of care and iterative development.

To perform as intended, the "actors" described above require "critics": image classification models able to make meaningful distinctions between useful classes of architectural form. Where many of the experiments in previous scopes of work relied on pre-existing classification models (such as ResNet50) and were constrained by the categories thereof (the 1000 selected ImageNet classifications, for example), a host of other possibilities may be found by training classification models specifically for the purposes of this project. For example, if it were possible to train a model to discern between massings of architectural types (church vs mosque, courtyard house vs Queen Anne) or of architectural styles (Hadid vs Kahn, Foster vs Pei), then what new hybrids between these categories might be possible? As there remain a number of open research questions here, as discussed in a section below, the development of these classification models offers some of the most fertile ground for collaboration.
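One way to make the question of hybrids concrete is to score a form by how plausibly a critic reads it as both of two classes at once. The sketch below is one assumed formulation, not a method established in the previous work: the function `hybrid_score`, the class labels, and the probability values are all hypothetical, and taking the smaller of two class probabilities is only one of several reasonable choices.

```python
def hybrid_score(probabilities, class_a, class_b):
    """Score how evenly a form is read as both classes.

    Takes the smaller of the two class probabilities, so the score is
    high only when the critic finds both classes plausible at once.
    """
    return min(probabilities.get(class_a, 0.0), probabilities.get(class_b, 0.0))


# Hypothetical critic outputs for three candidate massings.
candidates = {
    "form_1": {"queen_anne": 0.90, "eichler": 0.05, "other": 0.05},
    "form_2": {"queen_anne": 0.45, "eichler": 0.40, "other": 0.15},
    "form_3": {"queen_anne": 0.10, "eichler": 0.30, "other": 0.60},
}

most_hybrid = max(
    candidates,
    key=lambda name: hybrid_score(candidates[name], "queen_anne", "eichler"),
)
print(most_hybrid)
```

Under this formulation a form that is confidently one class scores poorly, as does a form that resembles neither, which is the behavior one would want from an objective that searches for recognizable hybrids.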

In parallel with the development of the "actors" and "critics" described above, across the five-month scope of work, a number of performances will bring together selected "actors" and "critics" on a "stage". As seen in the animations included in this proposal, each performance is a structured search through the design space of forms offered by the "actor" in order to better satisfy the demands of some number of "critics". It is through these performances that each series of architectural massings will be derived as "snapshots" of the process. The formal results will both stand on their own as individual proposals for a sited single-family home, as well as in series as an illustration of the novel design process proposed here.

In addition to the concrete deliverables of the three-dimensional house forms, this project produces a number of other outcomes that hold value to a broader creative community, including both the image-classification models that will be developed in this scope of work, as well as the already-established "Fresh Eyes" toolkit for connecting Grasshopper to hosted image classification models. Material related to each of these will be shared for non-commercial use on the relevant creative community platforms (GitHub and the FoodForRhino app store for Grasshopper).

Ties to Latest Research in ML

Among the most compelling new capacities of machine learning are those related to machine vision, in particular their application to image-classification tasks. In the context of our work, this capacity allows for a mode of design evaluation that stands in contrast with existing modes of understanding architectural performance, such as functional metrics or building simulation. For this reason, we are particularly interested in models able to recognize architecturally-relevant features and categorize architectural form.

It has recently been shown that most image classification models rely on texture-related image features (such as color and pattern) more than shape-related image features (such as contour or figure-ground) to make their predictions. This reliance on texture presents certain challenges to our approach, and presents obstacles to the training of new classification models that are well-suited for our work. By the same token, it also represents an opportunity to test more robust approaches to image classification that rely on contour and shape.

Benefits of a Collaboration with AMI

This project is advanced by a collaboration with AMI both through material support, as the related stipend allows for the production costs related to the proposed exhibit-ready pieces, and more importantly through critical technical consultation. As described in sections above, certain obstacles in the development of appropriate image classification models have limited the progress of this project. I anticipate that a collaboration with AMI would open up otherwise closed opportunities in this regard. This would not only further the aims of this specific project, but may also represent an opportunity to improve the performance of image classification for non-photographic applications.

"When it cast out eclecticism, Modern architecture submerged symbolism. ... By limiting itself to strident articulations of the pure architectural elements of space, structure, and program, Modern architecture's expression has become a dry expressionism, empty and boring - and in the end, irresponsible."

Learning from Las Vegas (Cambridge, Mass., and London: The MIT Press, 1972, 1977), pp. 101-3.