Regarding point 2: as long as you understand it, that's fine.
Regarding point 3: I wasn't disagreeing with your view; I agree with it.
Regarding point 1: you can read their full paper, which shouldn't be hard to find. Here's an excerpt:
“We have trained language assistants that are both helpful and harmless without using human feedback labels for harmlessness. We referred to the technique as ‘constitutional AI’ (CAI) since we used a ‘constitution’ consisting of human-written principles. We established two methods: (1) Constitutional AI which ‘bootstraps’ a helpful RLHF’s instruction-following abilities to critique and revise its own responses so as to remove harmful content, and (2) RL with model-generated labels for harmlessness, which further improves harmlessness.
We used this method to train models that are both harmless and non-evasive, partially resolving an issue in [Bai et al., 2022].
By removing human feedback labels for harmlessness, we have moved further away from reliance on human supervision, and closer to the possibility of a self-supervised approach to alignment. However, in this work we still relied on human supervision in the form of helpfulness labels. We expect it is possible to achieve helpfulness and instruction-following without human feedback, starting from only a pretrained LM and extensive prompting, but we leave this for future work.
Our ultimate goal is not to remove human supervision entirely, but to make it more efficient, transparent, and targeted. All of our methods can leverage chain-of-thought [Nye et al., 2021, Wei et al., 2022] type reasoning – for critiques in the SL stage, and for evaluating comparisons for the RL stage – and we expect that a small number of very high-quality human demonstrations of this reasoning [Scheurer et al., Saunders et al., 2022] could be used to improve and focus performance. Natural language feedback is also more transparent, interpretable, and improveable as compared to a large dataset of human preference labels.
We leave it to future work to study the effectiveness of this type of feedback.”
Sorry, but “constitutional AI” is only a concept so far: a direction still being experimented with, not an established reality.
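To make method (1) from the excerpt more concrete, here is a minimal sketch of the supervised critique-and-revise loop it describes. This is only an illustration: `model_generate()` is a hypothetical stand-in for a real instruction-following model call, and the listed principles are illustrative, not Anthropic's actual constitution.

```python
import random

# Illustrative principles only; not the paper's actual constitution.
CONSTITUTION = [
    "Identify ways the response is harmful, unethical, or misleading.",
    "Identify ways the response could encourage dangerous activity.",
]

def model_generate(prompt: str) -> str:
    # Placeholder: a real implementation would call a helpful
    # RLHF language model here.
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(prompt: str, n_rounds: int = 2) -> str:
    """Generate a response, then repeatedly critique and revise it
    against randomly sampled constitutional principles."""
    response = model_generate(prompt)
    for _ in range(n_rounds):
        principle = random.choice(CONSTITUTION)
        critique = model_generate(
            f"Critique the response. {principle}\nResponse: {response}"
        )
        response = model_generate(
            "Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    # In the paper, the final revised responses are then used as
    # supervised fine-tuning data before the RL stage.
    return response
```

The RL stage (method 2) would then use model-generated harmlessness comparisons instead of human labels; that part is not sketched here.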