MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

Savya Khosla1,2, Aditi Tiwari2, Kushal Kafle1, Simon Jenni1, Handong Zhao1, John Collomosse1, Jing Shi1
1Adobe Research   2University of Illinois Urbana-Champaign
[Figure: MAGNET teaser]

Traditionally, LLMs are trained for text generation using unidirectional attention (depicted by black lines), whereas text encoders are trained for representation learning using bidirectional attention (depicted by gray lines).
MAGNET adapts the attention mechanism of LLMs to combine both unidirectional and bidirectional attention, enhancing them with representation learning and infilling capabilities, while retaining their core generative functions.

Abstract

While originally designed for unidirectional generative modeling, decoder-only large language models (LLMs) are increasingly being adapted for bidirectional modeling. However, unidirectional and bidirectional models are typically trained separately with distinct objectives (generation and representation learning). This separation overlooks the opportunity for developing a more versatile language model and for these objectives to complement each other. In this work, we propose MAGNET, a method for adapting decoder-only LLMs to generate robust representations and infill missing text spans. MAGNET employs three self-supervised training objectives and introduces an attention mechanism that combines bidirectional and causal attention, enabling unified training across all objectives. Our results demonstrate that LLMs adapted with MAGNET (1) surpass strong text encoders on token-level and sentence-level representation learning tasks, (2) generate contextually appropriate text infills by leveraging past and future contexts, (3) perform open-ended text generation without excessive repetition of words or phrases, and (4) preserve the knowledge and reasoning capability gained by the LLM during pretraining.

Method Overview

Modify the Attention

We modify the attention mechanism of the LLM by updating the attention mask to combine causal and bidirectional attention; a minimal code sketch follows the figure below.

  • All context tokens (shown in blue) attend to all other context tokens in the sequence.
  • All span tokens (shown in green) have causal attention among themselves and also attend to all context tokens.

[Figure: MAGNET attention]
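As a concrete illustration, here is a minimal PyTorch sketch of how such a hybrid mask could be constructed. The function name and boolean mask convention are ours; the paper's actual implementation may differ.

```python
import torch

def magnet_attention_mask(is_span: torch.Tensor) -> torch.Tensor:
    """Build a hybrid attention mask for one sequence.

    is_span: bool tensor of shape (seq_len,); True marks span tokens,
    False marks context tokens. Returns a (seq_len, seq_len) bool mask
    where mask[i, j] = True means query token i may attend to key token j.
    """
    n = is_span.shape[0]
    is_ctx = ~is_span
    q_span, k_span = is_span.unsqueeze(1), is_span.unsqueeze(0)
    q_ctx, k_ctx = is_ctx.unsqueeze(1), is_ctx.unsqueeze(0)

    # Lower-triangular matrix encodes causal (left-to-right) attention.
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))

    mask = q_ctx & k_ctx              # context -> context: bidirectional
    mask |= q_span & k_ctx            # span -> context: full access
    mask |= q_span & k_span & causal  # span -> span: causal only
    return mask
```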

Fine-Tune the Model

We fine-tune the model using three self-supervised learning objectives, enabling it to understand and leverage the modified attention mechanism. Fine-tuning is performed with LoRA for efficiency.

[Figure: MAGNET objectives]
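For reference, a LoRA adapter can be attached with the Hugging Face peft library roughly as follows. The rank, alpha, dropout, and target modules shown here are illustrative assumptions, not the paper's reported configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Illustrative LoRA hyperparameters; the paper's settings may differ.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```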

Results

Word-Level Representation Learning

MAGNET outperforms LLM2Vec, demonstrating the benefits of unified training for representation learning. MAGNET-adapted Llama-2-7B also outperforms strong encoders.

Model               Chunking   NER     POS-Tags
--- Encoder models ---
BERT-Large          71.77      90.09   75.12
XLNet-Large         79.70      93.67   83.02
DeBERTa-Large       85.74      94.97   86.49
StructBERT-Large    89.99      97.31   90.86
--- Llama 2 models ---
Llama-2-7B          88.23      96.59   91.53
LLM2Vec             89.66      96.05   90.53
LLM2Vec [MNTP]      91.61      97.16   92.61
MAGNET              92.64      98.31   93.34
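As a rough sketch of how token-level features for these probing tasks might be extracted (assuming a MAGNET-adapted model that treats the whole input as context tokens, i.e., fully bidirectional attention):

```python
import torch

@torch.no_grad()
def token_embeddings(model, tokenizer, text: str) -> torch.Tensor:
    """Return one representation per token for probing tasks
    (chunking, NER, POS tagging). Assumes the adapted model applies
    fully bidirectional attention when the entire input is context."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    # Final-layer hidden states: (batch=1, seq_len, hidden_dim).
    return outputs.hidden_states[-1][0]
```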
Sentence-Level Representation Learning

MAGNET also outperforms LLM2Vec and Echo Embeddings on sentence-level representation learning, despite these baselines being trained specifically for text encoding.

Model             STS12   STS13   STS14   STS15   STS16   STS-B   SICK-R   Avg
--- Encoder models (fine-tuned using SimCSE) ---
BERT-Base         68.40   82.41   74.38   80.91   78.56   76.85   72.23    76.25
RoBERTa-Base      70.16   81.77   73.24   81.36   80.65   80.22   68.56    76.57
RoBERTa-Large     72.86   83.99   75.62   84.77   81.80   81.98   71.26    78.90
--- Llama 2 models ---
Llama-2-7B        50.98   74.02   62.86   67.09   71.03   63.56   67.22    65.25
Echo Embeddings   52.40   72.40   61.24   72.67   73.51   65.73   64.39    66.05
LLM2Vec           65.39   79.26   72.98   82.72   81.02   78.32   71.77    75.92
MAGNET            67.98   84.66   77.67   84.17   79.44   82.88   78.77    79.36
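A simple way to turn per-token states into a single sentence vector is mean pooling, followed by cosine similarity to score STS pairs. The pooling choice here is an assumption for illustration, not necessarily the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sentence_embedding(model, tokenizer, text: str) -> torch.Tensor:
    """Mean-pool final hidden states into one sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
    mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, hidden_dim)

def sts_score(model, tokenizer, a: str, b: str) -> float:
    """Cosine similarity between two sentence embeddings,
    as used to score STS pairs."""
    return F.cosine_similarity(
        sentence_embedding(model, tokenizer, a),
        sentence_embedding(model, tokenizer, b),
    ).item()
```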
Text Infilling

MAGNET enables bidirectional context access, making it particularly effective for text infilling. A short sketch after the examples below shows how an infilling input maps onto the hybrid attention mask.

Context: John was at the gym. _____ He couldn’t get the barbell off his chest. He was too embarrassed to call for help. John suffocated as the weights crushed his chest.
    Uni-Llama: He was on the treadmill, running at a steady pace.
    FS-Llama:  He was doing bench presses.
    MAGNET:    John was doing bench presses when he accidentally dropped the barbell on top.

Context: Toni took a trip to the local fair with her family. _____ The balloon was red and star shaped. Unfortunately it slipped out of her hands. She was sad to see it go.
    Uni-Llama: She won a prize for her drawing of a cat.
    FS-Llama:  She won it in a game of darts.
    MAGNET:    She was so excited to see the balloon vendor and bought a red one.

Context: I was resting on my couch. _____ They raised their water guns. I immediately ran. I went outside and sprayed them with the hose when they followed.
    Uni-Llama: I was feeling a bit under the weather, so I decided to take a break.
    FS-Llama:  My neighbor’s kids were playing tag in the backyard.
    MAGNET:    I heard a noise and looked out the window and saw two boys with water guns.
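To make the mechanism concrete, the sketch below reuses magnet_attention_mask from the Method Overview: the sentences around the blank are context tokens and the blank is the span. The token counts are illustrative only.

```python
import torch

# Token counts before, inside, and after the blank are illustrative.
left_len, span_len, right_len = 6, 8, 12
is_span = torch.cat([
    torch.zeros(left_len, dtype=torch.bool),   # left context
    torch.ones(span_len, dtype=torch.bool),    # missing span to infill
    torch.zeros(right_len, dtype=torch.bool),  # right context
])
mask = magnet_attention_mask(is_span)  # defined in the Method Overview sketch

# Each span position sees the full left AND right context plus earlier
# span positions, so the infill is decoded left-to-right while
# conditioning on both sides of the blank.
assert mask[left_len, -1].item()                # first span token sees right context
assert not mask[left_len, left_len + 1].item()  # ...but not later span tokens
```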


Human evaluators also rate MAGNET's infills as more coherent than those from the other setups. The scores denote the percentage of infills judged contextually appropriate by human evaluators.

Method                      Human Evaluation Score (%)
Unidirectional Llama-2-7B   53.5
Zero-Shot Setup              5.5
Five-Shot Setup             54.5
MAGNET                      62.0
Repetition Problem

Unlike encoder models such as BERT and encoder-adapted LLMs such as LLM2Vec, MAGNET retains the ability to generate coherent and non-repetitive text.

Initial phrase: The film was well received

BERT:    The film was well received and the " " " " " " " " " " " " " " " " " are " " are " are " are are " are are the are the the the the the the the the the the the so so and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and ...

LLM2Vec: The film was well received by critics and audiences alike.
         The film was well received by critics and audiences alike.
         The film was well received by critics and audiences alike.
         The film was well received by critics and audiences alike.
         The film was well received by critics and audiences alike.
         The film was well received by critics and audiences alike.
         The film was well received by critics and audiences alike...

MAGNET:  The film was well received by critics and audiences alike and was nominated for several awards including the Academy Award for Best Visual Effects and the Saturn Award for Best Science Fiction Film. The film was also a commercial success and grossed over $100 million at the box office.

         == Plot ==
         In the year 2018, a meteorite crashes into the Pacific Ocean, causing a massive tsunami that destroys most of the world's coastal cities. The survivors of the disaster band together ...
Knowledge and Reasoning Tasks

MAGNET maintains the knowledge and reasoning abilities acquired during LLM pretraining.

Model        HellaSwag   BBH     ARC-Easy   ARC-Challenge   NQ      MMLU-Hum.   MMLU-STEM   MMLU-Soc.Sci.   MMLU-Other
Llama-2-7B   75.51       33.57   73.95      44.28           24.02   43.27       36.09       53.04           54.84
MAGNET       75.08       32.22   74.33      44.52           24.22   42.25       36.63       52.64           52.40

BibTeX

@inproceedings{khosla2025magnet,
    title={MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities},
    author={Savya Khosla and Aditi Tiwari and Kushal Kafle and Simon Jenni and Handong Zhao and John Collomosse and Jing Shi},
    booktitle={Association for Computational Linguistics},
    year={2025}
}