Multimodal Neurons in
Artificial Neural Networks

[paper] [blog]

Gabriel Goh, Nick Cammarata †, Chelsea Voss †, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah

Outline

  1. Motivation & Results
  2. Background
    1. Feature Visualization
    2. A primer on CLIP
  3. Multimodal Neurons
    1. Faceted visualization
    2. Person Neurons
    3. Region Neurons
  4. Typographic Attacks
  5. Conclusion

Motivation & results

microscope

feature visualization

[paper]

A primer on CLIP

CLIP: Contrastive Language-Image Pre-training

[paper] [blog]

A primer on CLIP

  • Model(s) trained to predict image-text similarity
  • Trained on a huge dataset (400M image-text pairs)
  • Achieves impressive zero-shot performance on a diverse range of benchmarks
  • Natural-language supervision \(\rightarrow\) no need for manually annotated labels

[paper] [blog]
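The training setup above can be sketched in a few lines of numpy. This is a toy illustration, not CLIP's actual code: the embeddings are random stand-ins for the encoder outputs, and the temperature value is fixed rather than learned.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def cross_entropy(logits, targets):
    """Mean cross-entropy of a row-wise softmax against integer targets."""
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

# Toy stand-ins for the outputs of CLIP's image and text encoders;
# in the real model, both encoders project into a shared embedding space.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 512))  # a batch of 4 images
text_emb = rng.normal(size=(4, 512))   # their 4 matching captions

# Contrastive objective: matched pairs lie on the diagonal of the
# similarity matrix, and each image/caption must pick out its partner.
logits = 100.0 * cosine_sim(image_emb, text_emb)  # temperature-scaled
targets = np.arange(4)
loss = 0.5 * (cross_entropy(logits, targets)       # image -> text
              + cross_entropy(logits.T, targets))  # text -> image
```

Because the supervision signal is just "which caption goes with which image", no class labels are ever needed, which is what makes the 400M-pair web-scale dataset feasible.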

Multimodal neurons

Multi-faceted feature visualization

  • Neurons often fire for multiple "facets"
    • E.g. a grocery store neuron fires for the storefront as well as for rows of products
  • Standard feature visualization fails on such neurons
  • Past ideas
    • Use diverse seeds in the optimization [1]
    • Add a diversity term to the loss [2]
  • Approach
    • Train a linear probe \(W\) to classify images into a facet (e.g. face, text)
    • Add \(W\) to the visualization objective

[1] Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned by Each Neuron in Deep Neural Networks, A. Nguyen, J. Yosinski, J. Clune. [link]

[2] Feature Visualization, C. Olah, A. Mordvintsev, L. Schubert. [link]
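The combined objective can be sketched as follows. Everything here is a hypothetical stand-in: a linear "feature extractor" replaces the real network, the facet weight `lam` is an illustrative knob rather than a value from the paper, and a real run would backprop through the network instead of using a closed-form gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64
# Hypothetical stand-ins: a linear "feature extractor" in place of the
# network, the direction of the neuron being visualized, and a facet
# probe W trained elsewhere to separate e.g. face images from text images.
features = rng.normal(size=(D, D))
neuron = rng.normal(size=D)
facet_probe = rng.normal(size=D)
lam = 0.5  # facet weight; an illustrative knob, not from the paper

def objective(x):
    """Neuron activation plus the facet-probe term added to the objective."""
    h = features @ x
    return neuron @ h + lam * (facet_probe @ h)

# Gradient of the (linear) toy objective; a real feature visualization
# would compute this by backpropagation through the network.
g = features.T @ (neuron + lam * facet_probe)

# Gradient ascent from noise, as in feature visualization.
x = rng.normal(size=D)
x /= np.linalg.norm(x)
before = objective(x)
for _ in range(100):
    x = x + 0.01 * g
    x /= max(np.linalg.norm(x), 1.0)  # keep the "image" bounded
after = objective(x)
```

The facet term steers the optimization toward inputs that both excite the neuron and look like the chosen facet, so the same neuron can be visualized once per facet (e.g. its "face" rendition and its "text" rendition).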

person neurons

SPIDERMAN NEURON

  • Fires for photos of Spider-Man, drawings of him, and renderings of the text 'spiderman' (and 'spider')

person neurons: DONALD TRUMP NEURON

  • Case study: which dataset images activate the neuron, and how strongly?

region neurons

Typographic attacks

Attack demos: zero-shot classification vs. linear probe

Typographic attacks: the Stroop effect

Activations were gathered using the zero-shot methodology with the prompt "my favorite word, written in the color _____".
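The zero-shot methodology behind these attacks can be sketched with toy vectors. The apple/iPod example comes from the blog post; the embeddings below are made-up stand-ins, and `zero_shot_predict` is a simplified version of CLIP's zero-shot classifier (nearest text prompt by cosine similarity).

```python
import numpy as np

def zero_shot_predict(image_emb, prompt_embs, class_names):
    """Pick the class whose text-prompt embedding is most similar to the image."""
    sims = prompt_embs @ image_emb / (
        np.linalg.norm(prompt_embs, axis=1) * np.linalg.norm(image_emb))
    return class_names[int(np.argmax(sims))]

# Hypothetical embeddings: sticking a handwritten "iPod" label on an apple
# drags the image embedding toward the text class, flipping the prediction.
rng = np.random.default_rng(0)
apple_txt = rng.normal(size=64)
ipod_txt = rng.normal(size=64)
clean_img = apple_txt + 0.1 * rng.normal(size=64)  # embeds near "apple"
attacked_img = 0.3 * apple_txt + 0.7 * ipod_txt    # the written word dominates

prompts = np.stack([apple_txt, ipod_txt])
names = ["apple", "iPod"]
pred_clean = zero_shot_predict(clean_img, prompts, names)
pred_attacked = zero_shot_predict(attacked_img, prompts, names)
```

The attack is so effective precisely because the multimodal neurons respond to rendered text as strongly as to the object itself: a piece of paper and a marker suffice.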

Conclusion

  • Much like the human brain, CLIP appears to contain multimodal neurons
  • Feature Visualization and Dataset Search are powerful tools for inspecting neural networks
  • One can examine whole families (region, person, emotion) of these neurons
  • CLIP is vulnerable to low-tech, in-the-wild typographic attacks
  • Bias
    • A "Middle East" neuron associated with terrorism
    • An "immigration" neuron associated with Latin America

Resources