Using a new interpretability technique, Anthropic has managed to decipher the inner workings of its Claude 3 model. With it, they have found numerous patterns hidden inside the neural network, which has not only allowed them to understand better how the model works, but also to control it. Today we explain this work. (A minimal sketch of the underlying technique appears at the end of this description.)

INTERPRETABILITY ARTICLE - May 2024: https://www.anthropic.com/news/mappin...
INTERPRETABILITY ARTICLE - Oct 2023: https://www.anthropic.com/news/toward...
INTERPRETABILITY PAPER - Oct 2023: https://transformer-circuits.pub/2023...

EDITING: Carlos Santana Vega

-- MORE DOTCSV! --
NotCSV - Secondary Channel!: / notcsv
Patreon: / dotcsv
Facebook: / ai.dotcsv
Twitch: / dotcsv
Twitter: / dotcsv
Instagram: / dotcsv

-- MORE SCIENCE! --
This channel is part of the SCENIO outreach network. If you want to learn about other fantastic outreach projects, go here: http://scenio.es/colaboradores
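The technique behind the linked articles is dictionary learning with sparse autoencoders: a small network trained to rewrite a model's internal activations as a sparse combination of learned "feature" directions. Below is a minimal sketch of that idea; all names, dimensions, and hyperparameters are illustrative assumptions, not values from Anthropic's papers.

```python
# Minimal sparse autoencoder (SAE) sketch, in the spirit of the
# dictionary-learning approach from the linked interpretability papers.
# Dimensions and hyperparameters are illustrative, not from the papers.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Encoder maps model activations into an overcomplete
        # feature space (d_features >> d_model).
        self.encoder = nn.Linear(d_model, d_features)
        # Decoder reconstructs the activation as a sparse linear
        # combination of learned dictionary directions.
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))  # sparse, non-negative codes
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes most
    # feature activations to zero, encouraging interpretable features.
    mse = torch.mean((x - reconstruction) ** 2)
    sparsity = l1_coeff * features.abs().mean()
    return mse + sparsity

# Toy usage on random tensors standing in for real model activations.
sae = SparseAutoencoder(d_model=512, d_features=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
x = torch.randn(64, 512)
recon, feats = sae(x)
loss = sae_loss(x, recon, feats)
loss.backward()
opt.step()
```

Once trained, each decoder direction can be inspected (and its activation amplified or suppressed) to study and steer the model's behavior, which is the "control" aspect mentioned above.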