Prevent loading of untrusted initrds #75
No reviewers
Labels
No labels
bug
dependency
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
review-next
security
stub
tool
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: raito/lanzaboote#75
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "unsigned-kernel"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Using a malicious boot loader specification entry, the kernel could be instructed to load untrusted initrds in two ways:
To prevent these security bypasses, the following solution is implemented:
LoadImage
. This loader does not need to verify signatures, because we already confirmed using the hash that the kernel is trusted.Fixes: https://github.com/nix-community/lanzaboote/issues/65
Please note that this implementation is still a bit rough around the edges. In particular, there are the following known shortcomings or changes in behavior (roughly in ascending order of severity, in my view):
Stuff is completely broken on architectures with incoherent instruction caches. This is relevant for supporting AArch64, which I already see in another pull request. This can be easily fixed by flushing the icache after relocating the kernel image, which I think requires a bit of architecture-specific code.Fixed, in the sense that this now breaks the build instead of executing things we didn't intend to execute at runtime.Thanks for the PR! I think @blitz can review this best.
If this function is marked
unsafe
it should have a/// # Safety
section. See here.That being said, I don't think this function should be marked
unsafe
to begin with, because it should always be safe to call. Is there some situations where it's not?Image
should free the allocated memory onDrop
. Then it is also safe to use a&[u8]
slice instead of pointer + size?This code find the end of the last section? If so you could say:
pe.sections.iter().map(|s| s.virtual_address + s.virtual_size).max_element()
This would also remove the limitations that the sections need to be ordered in the PE image.
When your at it, it would help readability to select a better name than
pos
.virt_end
?Do you have a pointer to documentation?
I don't understand why you take the largest virtual address of the binary and use it as page count here. Can you elaborate?
nit: I'm personally not a fan of C-like type conversions.
usize::try_from(pos).unwrap()
has the charm of being a nop on platforms where the conversion is safe and will fail in case you accidentally use it wrong.Or even better, have a small RAII wrapper around
allocate_pages
so we don't leak memory when we fail to load the image.I'm not sure what's going on here. Does relocations not work?
This is a great start. 👍
If we work on the safety of the code this will be a great addition and solve #65. I might find an hour or two to help with this, but I'm a bit swamped till FOSDEM.
Does anyone know how to create a new UEFI handle? It seems wrong to re-use the lanzaboote handle for the kernel we load. Or is this the way to go?
@RaitoBezarius
The function was pretty wildly memory unsafe due to almost no checks being performed. I added some easy checks, moved the unsafety to starting the image and documented it there.
I don't see this working since deallocating pages requires boot services. However, I implemented deallocating the kernel if it returns.
It's documented in the UEFI spec (e.g. section 2.1.1) that EFI applications get their code loaded as EfiLoaderCode and their data as EfiLoaderData. However, we can't do that since they need to be contiguous. So I checked what shim does in its code, and it turns out it uses EfiLoaderCode for everything.
The virtual address is relative to the image base.
We can locate a (set of) handles which can be opened for the LoadedImage protocol using
locate_protocol
: https://docs.rs/uefi/0.1.2/uefi/struct.BootServices.html#method.locate_handle_by_protocol AFAIKBut, it's going to probably return one of the handle we are holding right now.
PE has two different kinds of relocations: object relocations are what you linked, and they only get used when linking object files. Image base relocations apply to an executable (or shared library) and use a different mechanism I parse here.
(Side note: using the thing you linked would "work" in the sense that the Linux kernel happens to load, since it does not contain meaningful image base relocations. So would not bothering at all about relocations. Things staying that way is not an assumption I'd be willing to make, since that would silently break things.)
I don't see a way to allocate a new UEFI handle. LocateProtocol is definitely a bad choice: that returns "the first handle" supporting the protocol, which is some driver or systemd-boot, making it a much worse choice than our own handle. At least the uefi crate documents using
set_image
for pretty much our exact use-case, and shim also does the same.I do think this method returns the set of all handles supporting the protocol, but you're right.
That method doesn't exist any more, but I assume it used LocateHandle (not LocateProtocol, which only gives the first). In any case, if we need to reuse an existing handle, the only reasonable choice is our own.
Thanks for looking into this.
Ok.
Yes, that's what I meant!
@alois31 I've tried to make
Image::load
notunwrap
andexpect
so much any more. See the commit I pushed into the PR.This parsing code is pretty hard to understand. goblin already comes with scroll as a dependency, which we could use here to clear this code up.
Given that we basically only check whether there are any relocations here at all, we could also leave it as is. In this case, I would ask you to leave a comment here about improving the parsing.
PS. I removed the panic on unsupported relocations and chose to return an easily recognizable error instead. This would at least leave the user in the boot menu being able to select another item to boot instead of crashing the system.
I don't see any commits related to load, except for the unsupported relocation one. I have seen thestart
one, but that doesn't touch load (I also don't really like that one, but I will comment on that later today).Sorry, I didn't notice that one for some reason. I like it, perhaps overflowing in
bytes_to_pages
should returnOUT_OF_RESUORCES
instead of panicking as well?I'm not too fond of that code either. Incidentally, it actually contains a bug (missing division by 2), and I wondered why it doesn't trigger a panic. Turns out while the Linux kernel has a
.reloc
section with oneIMAGE_REL_BASED_ABSOLUTE
(i.e. pointless) relocation, this is actually not linked in the header. I'm probably going to remove the relocation processing code entirely and just returnUNSUPPORTED
(or abuseINCOMPATIBLE_VERSION
like you did) whenever encountering any linked relocation table at all.@ -0,0 +147,4 @@
loaded_image.set_load_options(
load_options.as_ptr() as *const u8,
u32::try_from(load_options.num_bytes()).unwrap(),
);
When the UEFI fails to open
LoadedImage
on our image handle, things are extremely messed up already. I'm not sure if there is any point in even bothering to try to return to the bootloader, which is likely not going to be able to work properly either.@ -0,0 +173,4 @@
status
}
}
This one is interesting. My thought was "failing to deallocate memory is just a resource leak". However, if this fails (and the UEFI is not extremely buggy), we have attempted a double free because the kernel already freed memory it shouldn't have freed. Given that this violates the
unsafe
contract, I'm actually tempted to panic here instead.This should at most be an addition instead of a replacement. I don't mean the kernel calling
ExitBootServices
and then (hopefully) not returning. I really meanExit
, which normally returns to the parent. However, since we reuse our image handle for the kernel, it actually returns to systemd-boot, and our resources are leaked.Yes, something like that would be a compromise for the time being.
@ -0,0 +173,4 @@
status
}
}
Fair enough. In this case, just leave a comment to explain why.
@ -0,0 +173,4 @@
status
}
}
Ah good point. Feel free to fix!
@alois31 With the changes around the relocation handling that you suggested, I would approve this PR. It would be nice to get this into the tree, because then we can tackle #58, which currently prevents users from gracefully recovering their boot when things are messed up. :)
I think I have already applied these changes before you posted your last comment (see the last three commits). Feel free to fix #58 before, I need to rebase anyway.
This is coming in 🔥 hot 🔥 for FOSDEM. ;)
Mmh. This broke the CI. I didn't even notice that there were no CI runs here. Not sure what happened... We might need to revert this.
Ah, the tests that check whether we reject an unsigned kernel are now obviously broken... :-D I won't revert, because this is not a functional issue. I hope I can look into this tonight, should be an easy fix. Maybe @nikstur has time as well.