Google’s New Infini-Attention And SEO

Google's new Infini-attention and what it may mean for SEO

Google has published a research paper on a new technology called Infini-attention that allows models to process massive amounts of data with “infinitely long contexts,” while also being easy to insert into other models to vastly improve their capabilities.

This last part should be of interest to anyone who follows Google’s algorithm. Infini-attention is plug-and-play, meaning it’s relatively easy to insert into other models, including those used by Google’s core algorithm. The part about “infinitely long contexts” may have implications for how some of Google’s search systems work.

The name of the research paper is: Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Memory is computationally expensive for LLMs

Large language models (LLMs) have limitations on the amount of data they can process at once because the computational complexity and memory usage can increase significantly. Infini-Attention gives the LLM the ability to handle longer contexts while keeping the required memory and processing power low.

The research paper explains:

“Memory serves as a cornerstone of intelligence, enabling efficient computations tailored to specific contexts. However, Transformers … and Transformer-based LLMs … have limited context-dependent memory, due to the nature of the attention mechanism.

In fact, scaling LLMs to longer sequences (i.e. 1M tokens) is challenging with standard Transformer architectures, and serving longer and longer context models becomes financially expensive.”

And elsewhere, the research paper explains:

“Current transformer models are limited in their ability to process long sequences due to quadratic increases in computational and memory costs. Infini-attention aims to address this scalability issue.”

The researchers hypothesized that Infini-attention can scale to handle extremely long sequences with Transformers without the usual increases in computational and memory resources.
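To make the scaling problem concrete, here is a minimal back-of-the-envelope sketch (not from the paper) of how much memory just the attention score matrix needs as sequences grow; it assumes 4-byte floats and a single attention head.

```python
# Illustrative only: why standard attention becomes expensive at long context.
# The score matrix for one attention head alone needs seq_len * seq_len entries.
def attention_score_memory_gib(seq_len: int, bytes_per_float: int = 4) -> float:
    """Memory (GiB) for a single n x n attention score matrix."""
    return seq_len * seq_len * bytes_per_float / 1024**3

for n in (8_000, 32_000, 1_000_000):
    print(f"{n:>9} tokens -> {attention_score_memory_gib(n):,.1f} GiB per head")
```

At a million tokens, the score matrix alone runs into the terabytes, which is why simply training longer-context standard Transformers quickly becomes financially impractical.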

Three important features

Google’s Infini-attention addresses the shortcomings of the transformer model by incorporating three features that enable transformer-based LLMs to handle longer sequences without memory issues and to use context from earlier data in the sequence, not just data near the current point being processed.

The three characteristics of Infini-attention:

Compressive memory system
Long-term linear attention
Local masked attention

Compressive memory system

Infini-Attention uses what is called a compressive memory system. As more data is entered (as part of a long data sequence), the compressive memory system compresses some of the older information in order to reduce the amount of space needed to store the data.
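For readers who want a rough picture of what “compressing older information into a fixed-size store” can look like, here is a hedged sketch in the spirit of the paper’s associative-matrix memory; the function and variable names are mine, and a real implementation would differ.

```python
import numpy as np

def elu_plus_one(x):
    """ELU + 1 activation, keeps projected keys/queries positive (as in the paper)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def update_memory(M, z, K, V):
    """Fold one segment's keys K (n x d_k) and values V (n x d_v) into memory.

    M is a (d_k x d_v) associative matrix and z a (d_k,) normalization term;
    their sizes stay fixed no matter how many segments have been processed,
    which is what keeps memory use constant for arbitrarily long inputs.
    """
    sigma_K = elu_plus_one(K)
    return M + sigma_K.T @ V, z + sigma_K.sum(axis=0)
```

The key point is that the memory never grows: each new chunk of the sequence is folded into the same fixed-size matrix.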

Long-term linear attention

Infini-attention also uses what are called “long-term linear attention mechanisms” that allow the LLM to process data that exists earlier in the sequence of data being processed, allowing context to be retained. This is a departure from standard transformer-based LLMs.

This is important for tasks where the context exists in a larger data plane. It’s like being able to discuss the whole book and all the chapters and explain how the first chapter relates to another chapter closer to the end of the book.
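Continuing the hedged sketch above, reading earlier context back out of that fixed-size memory is a linear-attention lookup rather than a full dot-product over every past token; again, names are illustrative, not the paper’s code.

```python
import numpy as np

def elu_plus_one(x):
    return np.where(x > 0, x + 1.0, np.exp(x))

def retrieve_from_memory(M, z, Q, eps=1e-6):
    """Linear-attention read of long-term context for queries Q (n x d_k).

    Each query row pulls a value estimate out of the fixed-size memory M
    (d_k x d_v), normalized by z, without attending over every past token.
    """
    sigma_Q = elu_plus_one(Q)
    return (sigma_Q @ M) / (sigma_Q @ z + eps)[:, None]
```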

Local masked attention

In addition to long-term attention, Infini-attention also uses what is called local masked attention. This type of attention processes nearby (localized) parts of the input data, which is useful for responses that depend on the closest parts of the data.

The combination of local and long-term attention helps solve the problem of transformers being limited by how much input data they can remember and use for context.

The researchers explain:

“The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block.”
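A hedged sketch of what combining the two inside one block could look like: the local attention output and the long-term memory read are blended with a learned gate. The paper describes a learned scalar gate per head; the names below are mine.

```python
import numpy as np

def combine_attention(A_local, A_memory, beta):
    """Blend local masked-attention output with the long-term memory read.

    beta is a learned scalar; sigmoid(beta) decides how much weight the
    long-term memory gets versus the local context.
    """
    gate = 1.0 / (1.0 + np.exp(-beta))
    return gate * A_memory + (1.0 - gate) * A_local
```

In effect the model learns, per attention head, how much to lean on distant context versus nearby context.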

Results of experiments and tests

Infini-attention was tested against other models across several benchmarks involving long input sequences, such as long-context language modeling, passkey retrieval, and book summarization. Passkey retrieval is a test where the language model has to retrieve specific data from within an extremely long text sequence.

List of the three tests:

Long-context language modeling
Passkey retrieval
Book summarization

Long context language modeling and the perplexity score

The researchers write that Infini-attention outperformed baseline models and that increasing the training sequence length led to even greater improvements in perplexity score. Perplexity is a metric that measures language model performance, with lower scores indicating better performance.
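For reference, perplexity is simply the exponential of the average negative log-likelihood the model assigns to the true tokens; a quick sketch:

```python
import math

def perplexity(token_log_probs):
    """token_log_probs: natural-log probabilities the model gave each true token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that always gives the correct next token a 50% chance has perplexity 2.
print(perplexity([math.log(0.5)] * 10))  # -> 2.0
```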

The researchers shared their findings:

“Infini-Transformer outperforms both Transformer-XL and Memorizing Transformers baselines while maintaining 114x less memory parameters than the Memorizing Transformer model with a vector retrieval-based KV memory with length of 65K at its 9th layer. Infini-Transformer outperforms memorizing transformers with memory length of 65K and achieves 114x compression ratio.

We further increased the training sequence length to 100K from 32K and trained the models on Arxiv-math dataset. 100K training further decreased the perplexity score to 2.21 and 2.20 for Linear and Linear + Delta models.”

Passkey test

The passkey test is where a random number is hidden within a long text sequence, and the model must retrieve that hidden number. The passkey is hidden near the beginning, middle, or end of the long text. The model was able to solve the passkey test with context lengths of up to 1 million tokens.

“A 1B LLM naturally scales to 1M sequence length and solves the passkey retrieval task when injected with Infini-attention. Infini-Transformers solved the passkey task with up to 1M context length when fine-tuned on 5K length inputs. We report token-level retrieval accuracy for passkeys hidden in a different part (start/middle/end) of long inputs with lengths 32K to 1M.”
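A hedged sketch of how this kind of passkey prompt is typically constructed (the filler wording and function name are illustrative, not the paper’s exact benchmark code):

```python
import random

def build_passkey_prompt(num_filler_sentences=10_000, position="middle"):
    """Bury a random passkey in a long stretch of filler text.

    The model is then asked to repeat the passkey, which tests whether it can
    retrieve a detail from the start, middle, or end of a very long input.
    """
    passkey = random.randint(10_000, 99_999)
    filler = "The grass is green. The sky is blue. The sun is bright. "
    key_line = f"The pass key is {passkey}. Remember it. "
    insert_at = {"start": 0, "middle": num_filler_sentences // 2,
                 "end": num_filler_sentences - 1}[position]
    sentences = [filler] * num_filler_sentences
    sentences[insert_at] = key_line
    return "".join(sentences) + "What is the pass key?", passkey
```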

Book summary test

Infini-attention also excelled in the book summarization test, beating top benchmarks and achieving new state-of-the-art (SOTA) performance levels.

The results are described:

“Finally, we show that an 8B model with Infini-attention reaches a new SOTA result on a 500K length book summarization task after continual pre-training and task fine-tuning.

… We further scaled our approach by continuously pre-training an 8B LLM model with an input length of 8K for 30K steps. We then fine-tuned on a book summarization task, BookSum (Kryściński et al., 2021), where the goal is to generate a summary of an entire book text.

Our model outperforms the previous best results and achieves a new SOTA on BookSum by processing the entire text of the book. … There is a clear trend showing that with more text provided as input from books, our Infini-Transformers improve their summarization performance metric.”

Implications of Infini-Attention for SEO

Infini-attention is a breakthrough in modeling both long- and short-range attention with greater efficiency than previous approaches. It also supports “plug-and-play continual pre-training and long-context adaptation by design,” which means it can be easily integrated into existing models.

Finally, the “continual pre-training and long-context adaptation” makes it exceptionally useful for scenarios where the model must be constantly trained on new data. This last part is especially interesting because it could make Infini-attention useful for applications on the back end of Google’s search systems, particularly where it’s necessary to analyze long sequences of information and understand how a part near the beginning of the sequence relates to another part closer to the end.

Other articles have focused on the “infinitely long inputs” this model is capable of, but where it matters for SEO is that this ability to handle huge inputs and “leave no context behind” is what’s relevant to search marketing and to how some of Google’s systems might work if Google adapted Infini-attention into its core algorithm.

Read the research paper:

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Featured image by Shutterstock/JHVEPhoto


