CNN vs LLM

A Transformer Architecture

An LLM is a type of neural network. It uses a particular architecture called a transformer, which is designed to process and generate sequential data, like text. An architecture, in this context, describes how the neurons are connected to one another. All neural networks group their neurons into a number of layers; if there are many layers, the network is described as being “deep,” which is where the term “deep learning” comes from.
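To make the idea of layers concrete, here is a minimal sketch (plain Python, not any real framework; the function and variable names are illustrative): each “layer” is just a set of neurons, and each neuron computes a weighted sum of the previous layer’s outputs.

```python
def dense_layer(inputs, weights, biases):
    """One layer: each output neuron combines every input neuron's value."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]

# Two stacked layers; a network with many such layers is called "deep".
hidden = dense_layer([1.0, 2.0],
                     weights=[[0.5, -0.5], [1.0, 1.0]],
                     biases=[0.0, 0.1])
output = dense_layer(hidden, weights=[[1.0, 1.0]], biases=[0.0])
```

The output of one layer becomes the input to the next; what differs between architectures is which connections exist and how strong they are.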

Architectures differ in how their neurons are connected. In some, each neuron is connected to every neuron in the layer above it; in others, a neuron is connected only to the neurons near it in a grid. Convolutional Neural Networks (CNNs) take the latter approach: because their connections follow a grid (like the pixels in an image), the architecture works especially well on image data, and CNNs have formed the foundation of image recognition.
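The grid-local connectivity a CNN relies on can be sketched as a small convolution (a simplified illustration in plain Python, stride 1 and no padding; the names are mine, not a library’s): each output value depends only on a small neighbourhood of the input grid, never on the whole image at once.

```python
def conv2d_valid(image, kernel):
    """Slide a small kernel over a 2-D grid of pixel values."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Each output depends only on the kh x kw patch under the kernel.
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            ))
        out.append(row)
    return out

# A toy vertical-edge detector on a tiny 3x3 "image".
image = [[1, 1, 0],
         [1, 1, 0],
         [1, 1, 0]]
kernel = [[1, -1],
          [1, -1]]
edges = conv2d_valid(image, kernel)
```

The result is large exactly where the pixel values change from column to column, which is why this kind of local, grid-shaped connection pattern suits image data.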

A transformer is based on the idea of “attention,” whereby certain neurons are more strongly connected to (or “pay more attention to”) other neurons in a sequence. Text is read as a sequence, one word after another, with different parts of a sentence referring to or modifying others (such as an adjective that modifies a noun but not the verb). An architecture built to work on sequences, with different strengths of connection between different parts of that sequence, should therefore work well on text-based data.
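The attention idea can be sketched in a few lines (a toy version of scaled dot-product attention in plain Python; the vectors and names are made up for illustration): each position in the sequence scores its similarity to every other position, and its output is a weighted mix in which the most similar positions contribute most.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    d = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this position "attends to" each position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted mix of all positions' value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# A three-"word" toy sequence, each word represented by a 2-d vector.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(seq, seq, seq)
```

Every position ends up with a new representation that blends in the positions it attends to most, which is how, say, an adjective’s representation can be tied to the noun it modifies.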