The good ending is hard-locked behind the bad ending, unless we are afforded the opportunity to spend a very long time working very hard to solve a series of wildly difficult problems.
Do you think there still can be room for people doing conceptual AI safety research (like Agent Foundations), because the ban would get harder to enforce over time and humanity would have to face those problems eventually?
In case we do succeed with the ban, I think we still need to have experts ready to work on alignment as soon as we can, and the only way to create them is to have people do research now.
I think people take shortcuts and give marketing free rein far too much to trust an AI which relies heavily on both.
Omg, I realized Connor looks like the Crypto Jesus man I met a year ago who tried to sell me a superintelligence cult. But he’s not Connor :D Followed.
“Safe” is doing a lot of carrying there. Safe for who, exactly? Safe for Alex Karp? No, thank you, pass. Safe for Capitalism? Also no, pass. Safe for Humanity? But what about non-human life? Thanks but no. Safe in the sense of helping to further decrease local entropy (inside ‘lifeforms’) throughout the cosmos? Now we are talking, perhaps not a perfect definition but a good starting point.
I have a couple of questions and I'd be happy if you respond.
1. Let's say there is a 20% chance that, if you do nothing, someone will create unsafe AI. If you have an idea for how to create safer AI, which has only a 1% chance of being unsafe, wouldn't it be wise and morally correct to make it? Someone might leave your project or read your papers, but there might still be a lower chance of catastrophe than if you do nothing. Is that reasoning correct? If people can coordinate and pause AI, that could be better for safety, but what if there is a risk that people might build AI before pausing it?
2. You assume that you have to create unsafe AI before you create safe AI. I don't understand why you make that assumption. Can't you figure out how to solve AI alignment first, and then figure out how to make the learning efficient enough? Is it impossible to solve AI alignment without fully solving capabilities? I don't know any reason to assume that it's impossible to solve AI alignment without fully solving capabilities.
3. As for "even if you had such an agenda, how do you execute it without accidentally, or due to some asshole leaving the project or reading your papers, building unsafe ASI along the way?"... If you have a way to align your AI and everyone at the company is aware of that solution, then if they leave the project, they will most likely create aligned AI too. The same is true for the papers, if you include information on how to align it in your paper. If there is some cost to implementing that alignment method, then someone might cut corners (which might still be better because of point 1), but what if the alignment method doesn't introduce a lot of additional cost?
I'm not saying that you are wrong, maybe you're right, but I would just like to discuss it to have better clarity.
That would be true if you didn’t have an enforceable global runtime governance layer. That is exactly the point. Safety cannot depend on trust alone. My startup's view is that advanced AI needs a real control layer, with monitoring, intervention, and fail-safe shutdown capacity across systems.
This is necessary for AI that is not superhuman, but straightforwardly impossible for AI systems that are.
See for reference: https://www.lesswrong.com/w/ai-boxing-containment
That argument is strong against naive AI boxing, but my point is different.
The question is not whether a superhuman model can eventually outsmart a weak container. It is whether it should ever be able to run, scale, connect, or update itself without an external governance layer controlling deployment, permissions, inference, and intervention.
That is a different problem from classic boxing.