Discussions

I’ll be outside the US and seeing it in a theater with non-English subtitles, so I want to know how much of the dialogue I’ll miss since I only speak English. Thanks!

Top Comment:

I don’t remember exactly, but I think there’s a line or two early on where Orlok is doing a voiceover in Hungarian or Latin with English subtitles. Other than that 99% of the film is spoken in English.

2 weeks ago | Forum: r/roberteggers

New Anthropic research: Alignment faking in large language models. Claude often pretends to have different views during training, while actually maintaining its original preferences. (how resilient are local model in comparison?)

Main Post: New Anthropic research: Alignment faking in large language models. Claude often pretends to have different views during training, while actually maintaining its original preferences. (how resilient are local model in comparison?)

Top Comment:

I don't like this post because, if I understood it correctly, they are drawing conclusions based on what the model writes in the scratchpad for CoT and use the meaning of the written words as proof that everything works exactly as described.

I'm not convinced that CoT works as inductive reasoning, to be honest. If the models were tuned not on WikiHow but on something like 4chan, their CoT might just be long-winded repetitions of the marine copypasta or smth. Thus, we wouldn't be able to draw the same conclusions, since the meaning of the CoT would not really be representative of the model's computations.

In other words, what they are showing might not be explained by any mathematically expressible flaws of RLHF, but rather by the nature of the training data.

At least we now know another way of jailbreaking models, lol.

2 weeks ago | Forum: r/LocalLLaMA

Language Learning

Main Post: Language Learning

May 4, 2014 | Forum: r/languagelearning