How to Turn Your PDF Logo into Animated Web Content with LogoMotion
Table of Links
Abstract and 1 Introduction
2 Related Work
2.1 Program Synthesis
2.2 Creativity Support Tools for Animation
2.3 Generative Tools for Design
3 Formative Steps
4 Logomotion System and 4.1 Input
4.2 Preprocess Visual Information
4.3 Visually-Grounded Code Synthesis
5 Evaluations
5.1 Evaluation: Program Repair
5.2 Methodology
5.3 Findings
6 Evaluation with Novices
7 Discussion and 7.1 Breaking Away from Templates
7.2 Generating Code Around Visuals
7.3 Limitations
8 Conclusion and References
4 LOGOMOTION SYSTEM
We present LogoMotion, an LLM-based method that automatically animates logos based on their content. The input is a static PDF document, which can consist of image and text layers. The output is an HTML page with JavaScript code that renders the animation. The pipeline has three steps: 1) preprocessing (for visual awareness), which represents the input in HTML and augments it with information about hierarchy, groupings, and descriptions of every element; 2) visually-grounded code generation, which takes the preprocessed HTML representation and the static image of the logo and outputs JavaScript animation code; and 3) visually-grounded program repair, which compares the last frame of the animation to the target image and performs LLM-based self-refinement if there are visual errors on any layer.
4.1 Input
A user begins by importing their PDF document into Illustrator. Within Illustrator, using ExtendScript, they can export their layered document into an HTML page. We use HTML as the underlying representation because it plays to the strengths of an LLM and provides a text representation of the canvas. The HTML representation includes the height, width, z-index, and top and bottom positions of every image element. Text elements are also represented as image layers: each word is captured as a separate image layer whose text content becomes its alt-text caption, except in the case of arced text (e.g., the logo title in Figure 1), where each letter is a separate image layer. Every element is given a random unique ID. This representation allows the LLM to understand what layers make up the logo image.
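For concreteness, the exported representation might look roughly like the following; the element IDs, file names, and dimensions are invented for illustration and are not taken from the system's actual output.

```html
<!-- Illustrative sketch of the exported canvas representation; IDs,
     dimensions, and positions are invented for this example. -->
<div id="canvas" style="position: relative; width: 800px; height: 600px;">
  <!-- An image layer; its caption is filled into alt during preprocessing. -->
  <img id="layer-x7f2" src="crops/layer-x7f2.png" alt=""
       style="position: absolute; top: 120px; bottom: 280px; width: 300px; height: 200px; z-index: 2;">
  <!-- A text layer: each word is exported as its own image, with the word
       itself stored as the alt text. -->
  <img id="layer-a91c" src="crops/layer-a91c.png" alt="Alpine"
       style="position: absolute; top: 360px; bottom: 180px; width: 220px; height: 60px; z-index: 3;">
</div>
```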
The ExtendScript script automatically extracts the bounding boxes and exports each layer into two PNG images: 1) a crop around the bounding box of the design element and 2) a magnified 512×512 version of the design element, which is passed to GPT-4-V for captioning.
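A minimal sketch of such an export script in ExtendScript is shown below, assuming one PNG per layer is produced by toggling layer visibility; the export options, output paths, and the omission of the per-layer bounding-box crop and 512×512 magnified version are simplifications for illustration, not the paper's actual implementation.

```javascript
// ExtendScript sketch (Adobe Illustrator): export each layer as its own PNG.
// Paths and options are assumptions; the real script additionally records
// each layer's bounding box and produces a magnified 512x512 crop.
var doc = app.activeDocument;
for (var i = 0; i < doc.layers.length; i++) {
  // Show only the layer being exported.
  for (var j = 0; j < doc.layers.length; j++) {
    doc.layers[j].visible = (j === i);
  }
  var opts = new ExportOptionsPNG24();
  opts.artBoardClipping = true;   // clip the export to the artboard
  opts.transparency = true;       // keep the background transparent
  var file = new File("~/logomotion/crops/layer_" + i + ".png");
  doc.exportFile(file, ExportType.PNG24, opts);
}
```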
4.2 Preprocess Visual Information
Given a basic HTML representation of the logo layout, the system does several pre-processing steps to add semantic information about the logo’s visual content.
4.2.1 Image descriptions. To provide information about what each layer depicts, we isolate each layer against a plain background and use GPT-4-V to produce descriptive text. We store this description in the element's alt text HTML attribute. This is pictured in Step 1 of Figure 3.
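A captioning call of this kind might be sketched as follows; the endpoint, model name, and prompt wording are assumptions made for illustration, since the paper only states that GPT-4-V produces the descriptions.

```javascript
// Sketch of a captioning request for one isolated layer image (Node.js 18+).
// Model name, endpoint, and prompt text are illustrative assumptions.
import fs from "node:fs";

async function captionLayer(pngPath) {
  const imageBase64 = fs.readFileSync(pngPath).toString("base64");
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4-vision-preview",
      max_tokens: 100,
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "Describe what this design element depicts in one short phrase." },
          { type: "image_url", image_url: { url: `data:image/png;base64,${imageBase64}` } },
        ],
      }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content; // stored as the layer's alt text
}
```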
4.2.2 Visual Hierarchy. To establish a visual hierarchy of elements, we give GPT-4-V the HTML representation of the canvas along with the logo image and ask it to classify each element into one of four categories: primary, secondary, text, or background. This step outputs a new HTML file that includes the role classification in the class name of every element (class="primary", class="secondary", etc.). From our formative work on logo animation, we learned that logos generally have one primary element that deserves the most attention in animation. Thus, we restrict the LLM to select exactly one primary element.
Because primary elements will receive characteristic motion, we need extra information to describe that motion. This includes the orientation of the image, which determines the direction from which the element should enter. For example, a car facing left should drive in from the left, while a car facing forward should start small and slowly enlarge as if it is driving towards you. We save this information in a variable that is used later in the suggestion of a design concept for the animation (the entrance description referenced in the prompt below).
4.2.3 Grouping Elements. In addition to providing a hierarchy, we needed to understand which elements visually and conceptually group together. There are usually many secondary elements with symmetry, similar positions, or other visual similarities that make it natural to animate them in together. For example, many stars in the night sky should twinkle together, or two mountains should rise together. To create groups, we called GPT-4-V to form subgroups over the elements that were tagged as secondary. We then reorganized the text representation of the canvas so that groups of secondary elements were placed together and made children of a parent element. The output of this step is shown as the AUGMENTED HTML in Step 2 of Figure 3.
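Combining the captioning, hierarchy, and grouping steps, the augmented HTML might look roughly like the sketch below; the element IDs, captions, and group are invented, and the positioning styles from the earlier representation are omitted for brevity.

```html
<!-- Sketch of the augmented HTML after captioning, hierarchy classification,
     and grouping; IDs, captions, and the group are invented for this example. -->
<div id="canvas">
  <img id="layer-x7f2" class="primary" src="crops/layer-x7f2.png" alt="a red car facing left">
  <div id="group-stars" class="secondary">
    <img id="layer-b3d1" class="secondary" src="crops/layer-b3d1.png" alt="a small white star">
    <img id="layer-c9e4" class="secondary" src="crops/layer-c9e4.png" alt="a small white star">
  </div>
  <img id="layer-a91c" class="text" src="crops/layer-a91c.png" alt="Alpine">
  <img id="layer-f002" class="background" src="crops/layer-f002.png" alt="a dark blue night sky">
</div>
```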
4.2.4 Design Concept. From early explorations with the system, we realized that to get the system to produce coherent animations that told a story, we needed to provide a design concept relating all of the elements together. Thus, before the code generation step, we requested that the LLM return a natural-language description of the animation. This stage encouraged the model to interpret the logo and connect image elements to relevant animation actions that they might take on in the real world. For example, a flower could bloom from the center of the screen by fading and scaling in, or a skier could ski in from the left side of the screen and rotate one turn to suggest a flip. The LLM was also instructed to give secondary and text elements a narrative description of their animation, so that their animations would not clash with the primary element. See the output of Step 3 in Figure 3 for an excerpted example of a design concept.
To automatically generate a design concept, we prompted GPT-4-V with the HTML file (augmented with the visual information) and an image of the logo, and asked it to write a design concept with the following prompt:
This image is of a logo that we would like to animate.
Here is the HTML representation of this logo: <HTML>
We want to implement a logo animation which has a hero moment on the primary element. The primary element in this caption should animate in a way that mimics its typical behavior or actions in the real world. We analyzed the image to decide if in its entrance it should take a path onto the screen or not: <entrance description>. Considering this information, suggest a motion that characterizes how this element ({primary element image caption}) could move while onscreen. Additionally, suggest how this element should be sequenced in the context of a logo reveal and the other elements. (Note that the element is an image layer, so parts within it cannot be animated.)
The output design concept was saved as a variable to input into the code generation step. Although our pipeline creates a design concept automatically, it can be edited or iterated on by users if they want more control and interaction with the system.
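To illustrate where the design concept leads, the code generation step could emit timeline-style JavaScript along the lines of the sketch below; the use of the anime.js library, the element IDs (taken from the invented HTML example above), and the specific motions are assumptions for this illustration, not the system's actual output.

```javascript
// Illustrative sketch of generated animation code for a design concept like
// "the car drives in from the left, the stars twinkle in, the text fades in".
// Assumes anime.js is loaded on the page and the element IDs exist in the DOM.
const tl = anime.timeline({ easing: "easeOutQuad", duration: 800 });

tl
  // Hero moment: the primary element (a car facing left) drives in from the left.
  .add({ targets: "#layer-x7f2", translateX: [-400, 0], opacity: [0, 1] })
  // Secondary group: the stars scale and fade in together, slightly staggered.
  .add({ targets: "#group-stars img", opacity: [0, 1], scale: [0.6, 1],
         delay: anime.stagger(120) }, "-=400")
  // Text layer fades in last to complete the logo reveal.
  .add({ targets: "#layer-a91c", opacity: [0, 1], translateY: [20, 0] }, "-=200");
```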