Prevent loading of untrusted initrds #75

Merged
alois31 merged 3 commits from unsigned-kernel into master 2023-02-02 13:08:09 +00:00
alois31 commented 2023-01-23 17:49:16 +00:00 (Migrated from github.com)

Using a malicious boot loader specification entry, the kernel could be instructed to load untrusted initrds in two ways:

  • Since the kernel was signed, it could be loaded directly, with an arbitrary initrd (since neither systemd-boot nor Linux verify initrds).
  • The cmdline our stub received was passed on to the kernel. By loading the stub indirectly via boot loader specification, this could be abused to point the kernel to an arbitrary initrd, as well as instructing it to perform various other fun stuff.

To prevent these security bypasses, the following solution is implemented:

  • Use a custom PE loader, just enough to load the kernel, in order not to rely on LoadImage. This loader does not need to verify signatures, because we already confirmed using the hash that the kernel is trusted.
  • Stop signing kernels on installation, preventing the first bypass.
  • Always pass the cmdline built into the stub UKI to the kernel. By virtue of being built-in, this cmdline is trusted, and the second bypass is fixed.

Fixes: https://github.com/nix-community/lanzaboote/issues/65

Using a malicious boot loader specification entry, the kernel could be instructed to load untrusted initrds in two ways: * Since the kernel was signed, it could be loaded directly, with an arbitrary initrd (since neither systemd-boot nor Linux verify initrds). * The cmdline our stub received was passed on to the kernel. By loading the stub indirectly via boot loader specification, this could be abused to point the kernel to an arbitrary initrd, as well as instructing it to perform various other fun stuff. To prevent these security bypasses, the following solution is implemented: * Use a custom PE loader, just enough to load the kernel, in order not to rely on `LoadImage`. This loader does not need to verify signatures, because we already confirmed using the hash that the kernel is trusted. * Stop signing kernels on installation, preventing the first bypass. * Always pass the cmdline built into the stub UKI to the kernel. By virtue of being built-in, this cmdline is trusted, and the second bypass is fixed. Fixes: https://github.com/nix-community/lanzaboote/issues/65
alois31 commented 2023-01-23 18:00:38 +00:00 (Migrated from github.com)

Please note that this implementation is still a bit rough around the edges. In particular, there are the following known shortcomings or changes in behavior (roughly in ascending order of severity, in my view):

  • The kernel image does not get measured into PCR 4 any more. This should be harmless, because its hash already influenced PCR 4 as part of measuring the stub UKI.
  • The PE loader is extremely basic: it performs nearly no checks at all, and doesn't support most relocation types. I don't think this is a problem, because we only load trusted Linux kernels.
  • NX is not supported. I'm honestly not quite sure how to properly do that given that some pages contain sections with different permissions:
  0 .setup        00003dc0  0000000001000200  0000000001000200  00000200  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .reloc        00000020  0000000001003fc0  0000000001003fc0  00003fc0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .compat       00000020  0000000001003fe0  0000000001003fe0  00003fe0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .text         0082b000  0000000001004000  0000000001004000  00004000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  • Stuff is completely broken on architectures with incoherent instruction caches. This is relevant for supporting AArch64, which I already see in another pull request. This can be easily fixed by flushing the icache after relocating the kernel image, which I think requires a bit of architecture-specific code. Fixed, in the sense that this now breaks the build instead of executing things we didn't intend to execute at runtime.
Please note that this implementation is still a bit rough around the edges. In particular, there are the following known shortcomings or changes in behavior (roughly in ascending order of severity, in my view): * The kernel image does not get measured into PCR 4 any more. This should be harmless, because its hash already influenced PCR 4 as part of measuring the stub UKI. * The PE loader is extremely basic: it performs nearly no checks at all, and doesn't support most relocation types. I don't think this is a problem, because we only load trusted Linux kernels. * NX is not supported. I'm honestly not quite sure how to properly do that given that some pages contain sections with different permissions: ``` 0 .setup 00003dc0 0000000001000200 0000000001000200 00000200 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE 1 .reloc 00000020 0000000001003fc0 0000000001003fc0 00003fc0 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 2 .compat 00000020 0000000001003fe0 0000000001003fe0 00003fe0 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 3 .text 0082b000 0000000001004000 0000000001004000 00004000 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE ``` * ~Stuff is completely broken on architectures with incoherent instruction caches. This is relevant for supporting AArch64, which I already see in another pull request. This can be easily fixed by flushing the icache after relocating the kernel image, which I think requires a bit of architecture-specific code.~ Fixed, in the sense that this now breaks the build instead of executing things we didn't intend to execute at runtime.
nikstur commented 2023-01-23 18:13:59 +00:00 (Migrated from github.com)

Thanks for the PR! I think @blitz can review this best.

Thanks for the PR! I think @blitz can review this best.
blitz (Migrated from github.com) reviewed 2023-01-24 22:20:57 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:20:57 +00:00

If this function is marked unsafe it should have a /// # Safety section. See here.

That being said, I don't think this function should be marked unsafe to begin with, because it should always be safe to call. Is there some situations where it's not?

If this function is marked `unsafe` it should have a `/// # Safety` section. See [here](https://rust-lang.github.io/api-guidelines/documentation.html#function-docs-include-error-panic-and-safety-considerations-c-failure). That being said, I don't think this function should be marked `unsafe` to begin with, because it should always be safe to call. Is there some situations where it's not?
blitz (Migrated from github.com) reviewed 2023-01-24 22:23:35 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:23:34 +00:00

Image should free the allocated memory on Drop. Then it is also safe to use a &[u8] slice instead of pointer + size?

`Image` should free the allocated memory on `Drop`. Then it is also safe to use a `&[u8]` slice instead of pointer + size?
blitz (Migrated from github.com) reviewed 2023-01-24 22:27:18 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:27:18 +00:00

This code find the end of the last section? If so you could say:

pe.sections.iter().map(|s| s.virtual_address + s.virtual_size).max_element()

This would also remove the limitations that the sections need to be ordered in the PE image.

When your at it, it would help readability to select a better name than pos. virt_end?

This code find the end of the last section? If so you could say: `pe.sections.iter().map(|s| s.virtual_address + s.virtual_size).max_element()` This would also remove the limitations that the sections need to be ordered in the PE image. When your at it, it would help readability to select a better name than `pos`. `virt_end`?
blitz (Migrated from github.com) reviewed 2023-01-24 22:27:37 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:27:36 +00:00

Do you have a pointer to documentation?

Do you have a pointer to documentation?
blitz (Migrated from github.com) reviewed 2023-01-24 22:31:35 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:31:35 +00:00

I don't understand why you take the largest virtual address of the binary and use it as page count here. Can you elaborate?

nit: I'm personally not a fan of C-like type conversions. usize::try_from(pos).unwrap() has the charm of being a nop on platforms where the conversion is safe and will fail in case you accidentally use it wrong.

I don't understand why you take the largest virtual address of the binary and use it as page count here. Can you elaborate? nit: I'm personally not a fan of C-like type conversions. `usize::try_from(pos).unwrap()` has the charm of being a nop on platforms where the conversion is safe and will fail in case you accidentally use it wrong.
blitz (Migrated from github.com) reviewed 2023-01-24 22:33:02 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:33:02 +00:00

Or even better, have a small RAII wrapper around allocate_pages so we don't leak memory when we fail to load the image.

Or even better, have a small RAII wrapper around `allocate_pages` so we don't leak memory when we fail to load the image.
blitz (Migrated from github.com) reviewed 2023-01-24 22:37:32 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:37:32 +00:00

I'm not sure what's going on here. Does relocations not work?

I'm not sure what's going on here. Does [relocations](https://docs.rs/goblin/0.6.0/goblin/pe/section_table/struct.SectionTable.html#method.relocations) not work?
blitz (Migrated from github.com) requested changes 2023-01-24 22:39:48 +00:00
blitz (Migrated from github.com) left a comment

This is a great start. 👍

If we work on the safety of the code this will be a great addition and solve #65. I might find an hour or two to help with this, but I'm a bit swamped till FOSDEM.

This is a great start. :+1: If we work on the safety of the code this will be a great addition and solve #65. I might find an hour or two to help with this, but I'm a bit swamped till FOSDEM.
blitz (Migrated from github.com) reviewed 2023-01-24 22:41:29 +00:00
blitz (Migrated from github.com) commented 2023-01-24 22:41:29 +00:00

Does anyone know how to create a new UEFI handle? It seems wrong to re-use the lanzaboote handle for the kernel we load. Or is this the way to go?

@RaitoBezarius

Does anyone know how to create a new UEFI handle? It seems wrong to re-use the lanzaboote handle for the kernel we load. Or is this the way to go? @RaitoBezarius
alois31 (Migrated from github.com) reviewed 2023-01-25 18:03:58 +00:00
alois31 (Migrated from github.com) commented 2023-01-25 18:03:58 +00:00

The function was pretty wildly memory unsafe due to almost no checks being performed. I added some easy checks, moved the unsafety to starting the image and documented it there.

The function was pretty wildly memory unsafe due to almost no checks being performed. I added some easy checks, moved the unsafety to starting the image and documented it there.
alois31 (Migrated from github.com) reviewed 2023-01-25 18:05:00 +00:00
alois31 (Migrated from github.com) commented 2023-01-25 18:05:00 +00:00

I don't see this working since deallocating pages requires boot services. However, I implemented deallocating the kernel if it returns.

I don't see this working since deallocating pages requires boot services. However, I implemented deallocating the kernel if it returns.
alois31 (Migrated from github.com) reviewed 2023-01-25 18:07:26 +00:00
alois31 (Migrated from github.com) commented 2023-01-25 18:07:26 +00:00

It's documented in the UEFI spec (e.g. section 2.1.1) that EFI applications get their code loaded as EfiLoaderCode and their data as EfiLoaderData. However, we can't do that since they need to be contiguous. So I checked what shim does in its code, and it turns out it uses EfiLoaderCode for everything.

It's documented in the UEFI spec (e.g. section 2.1.1) that EFI applications get their code loaded as EfiLoaderCode and their data as EfiLoaderData. However, we can't do that since they need to be contiguous. So I checked what shim does in its code, and it turns out it uses EfiLoaderCode for everything.
alois31 (Migrated from github.com) reviewed 2023-01-25 18:08:45 +00:00
alois31 (Migrated from github.com) commented 2023-01-25 18:08:45 +00:00

The virtual address is relative to the image base.

The virtual address is relative to the image base.
RaitoBezarius (Migrated from github.com) reviewed 2023-01-25 18:11:39 +00:00
RaitoBezarius (Migrated from github.com) commented 2023-01-25 18:11:39 +00:00

We can locate a (set of) handles which can be opened for the LoadedImage protocol using locate_protocol: https://docs.rs/uefi/0.1.2/uefi/struct.BootServices.html#method.locate_handle_by_protocol AFAIK

But, it's going to probably return one of the handle we are holding right now.

We can locate a (set of) handles which can be opened for the LoadedImage protocol using `locate_protocol`: https://docs.rs/uefi/0.1.2/uefi/struct.BootServices.html#method.locate_handle_by_protocol AFAIK But, it's going to probably return one of the handle we are holding right now.
alois31 (Migrated from github.com) reviewed 2023-01-25 18:12:51 +00:00
alois31 (Migrated from github.com) commented 2023-01-25 18:12:50 +00:00

PE has two different kinds of relocations: object relocations are what you linked, and they only get used when linking object files. Image base relocations apply to an executable (or shared library) and use a different mechanism I parse here.

(Side note: using the thing you linked would "work" in the sense that the Linux kernel happens to load, since it does not contain meaningful image base relocations. So would not bothering at all about relocations. Things staying that way is not an assumption I'd be willing to make, since that would silently break things.)

PE has two different kinds of relocations: object relocations are what you linked, and they only get used when linking object files. Image base relocations apply to an executable (or shared library) and use a different mechanism I parse here. (Side note: using the thing you linked would "work" in the sense that the Linux kernel happens to load, since it does not contain meaningful image base relocations. So would not bothering at all about relocations. Things staying that way is not an assumption I'd be willing to make, since that would silently break things.)
alois31 (Migrated from github.com) reviewed 2023-01-25 18:15:30 +00:00
alois31 (Migrated from github.com) commented 2023-01-25 18:15:30 +00:00

I don't see a way to allocate a new UEFI handle. LocateProtocol is definitely a bad choice: that returns "the first handle" supporting the protocol, which is some driver or systemd-boot, making it a much worse choice than our own handle. At least the uefi crate documents using set_image for pretty much our exact use-case, and shim also does the same.

I don't see a way to allocate a new UEFI handle. LocateProtocol is definitely a bad choice: that returns "the first handle" supporting the protocol, which is some driver or systemd-boot, making it a much worse choice than our own handle. At least the uefi crate documents using `set_image` for pretty much our exact use-case, and shim also does the same.
RaitoBezarius (Migrated from github.com) reviewed 2023-01-25 18:17:11 +00:00
RaitoBezarius (Migrated from github.com) commented 2023-01-25 18:17:10 +00:00

I do think this method returns the set of all handles supporting the protocol, but you're right.

I do think this method returns the set of *all* handles supporting the protocol, but you're right.
alois31 (Migrated from github.com) reviewed 2023-01-25 18:22:08 +00:00
alois31 (Migrated from github.com) commented 2023-01-25 18:22:07 +00:00

That method doesn't exist any more, but I assume it used LocateHandle (not LocateProtocol, which only gives the first). In any case, if we need to reuse an existing handle, the only reasonable choice is our own.

That method doesn't exist any more, but I assume it used LocateHandle (not LocateProtocol, which only gives the first). In any case, if we need to reuse an existing handle, the only reasonable choice is our own.
blitz (Migrated from github.com) reviewed 2023-01-26 13:03:40 +00:00
blitz (Migrated from github.com) commented 2023-01-26 13:03:40 +00:00

Thanks for looking into this.

Thanks for looking into this.
blitz (Migrated from github.com) reviewed 2023-01-26 13:05:31 +00:00
blitz (Migrated from github.com) commented 2023-01-26 13:05:31 +00:00

Ok.

Ok.
blitz (Migrated from github.com) reviewed 2023-01-26 13:07:22 +00:00
blitz (Migrated from github.com) commented 2023-01-26 13:07:22 +00:00

Yes, that's what I meant!

Yes, that's what I meant!
blitz commented 2023-01-26 22:55:33 +00:00 (Migrated from github.com)

@alois31 I've tried to make Image::load not unwrap and expect so much any more. See the commit I pushed into the PR.

@alois31 I've tried to make `Image::load` not `unwrap` and `expect` so much any more. See the commit I pushed into the PR.
blitz (Migrated from github.com) reviewed 2023-01-26 23:20:50 +00:00
blitz (Migrated from github.com) commented 2023-01-26 23:20:50 +00:00

This parsing code is pretty hard to understand. goblin already comes with scroll as a dependency, which we could use here to clear this code up.

Given that we basically only check whether there are any relocations here at all, we could also leave it as is. In this case, I would ask you to leave a comment here about improving the parsing.

PS. I removed the panic on unsupported relocations and chose to return an easily recognizable error instead. This would at least leave the user in the boot menu being able to select another item to boot instead of crashing the system.

This parsing code is pretty hard to understand. goblin already comes with [scroll](https://docs.rs/scroll/0.11.0/scroll/) as a dependency, which we could use here to clear this code up. Given that we basically only check whether there are any relocations here at all, we could also leave it as is. In this case, I would ask you to leave a comment here about improving the parsing. PS. I removed the panic on unsupported relocations and chose to return an easily recognizable error instead. This would at least leave the user in the boot menu being able to select another item to boot instead of crashing the system.
alois31 commented 2023-01-27 07:31:22 +00:00 (Migrated from github.com)

@alois31 I've tried to make Image::load not unwrap and expect so much any more. See the commit I pushed into the PR.

I don't see any commits related to load, except for the unsupported relocation one. I have seen the start one, but that doesn't touch load (I also don't really like that one, but I will comment on that later today).

Sorry, I didn't notice that one for some reason. I like it, perhaps overflowing in bytes_to_pages should return OUT_OF_RESUORCES instead of panicking as well?

> @alois31 I've tried to make `Image::load` not `unwrap` and `expect` so much any more. See the commit I pushed into the PR. ~I don't see any commits related to load, except for the unsupported relocation one. I have seen the `start` one, but that doesn't touch load (I also don't really like that one, but I will comment on that later today).~ Sorry, I didn't notice that one for some reason. I like it, perhaps overflowing in `bytes_to_pages` should return `OUT_OF_RESUORCES` instead of panicking as well?
alois31 (Migrated from github.com) reviewed 2023-01-27 17:12:03 +00:00
alois31 (Migrated from github.com) commented 2023-01-27 17:12:02 +00:00

I'm not too fond of that code either. Incidentally, it actually contains a bug (missing division by 2), and I wondered why it doesn't trigger a panic. Turns out while the Linux kernel has a .reloc section with one IMAGE_REL_BASED_ABSOLUTE (i.e. pointless) relocation, this is actually not linked in the header. I'm probably going to remove the relocation processing code entirely and just return UNSUPPORTED (or abuse INCOMPATIBLE_VERSION like you did) whenever encountering any linked relocation table at all.

I'm not too fond of that code either. Incidentally, it actually contains a bug (missing division by 2), and I wondered why it doesn't trigger a panic. Turns out while the Linux kernel has a `.reloc` section with one `IMAGE_REL_BASED_ABSOLUTE` (i.e. pointless) relocation, this is actually not linked in the header. I'm probably going to remove the relocation processing code entirely and just return `UNSUPPORTED` (or abuse `INCOMPATIBLE_VERSION` like you did) whenever encountering any linked relocation table at all.
alois31 (Migrated from github.com) reviewed 2023-01-27 17:31:14 +00:00
@ -0,0 +147,4 @@
loaded_image.set_load_options(
load_options.as_ptr() as *const u8,
u32::try_from(load_options.num_bytes()).unwrap(),
);
alois31 (Migrated from github.com) commented 2023-01-27 17:14:30 +00:00

When the UEFI fails to open LoadedImage on our image handle, things are extremely messed up already. I'm not sure if there is any point in even bothering to try to return to the bootloader, which is likely not going to be able to work properly either.

When the UEFI fails to open `LoadedImage` on our image handle, things are extremely messed up already. I'm not sure if there is any point in even bothering to try to return to the bootloader, which is likely not going to be able to work properly either.
@ -0,0 +173,4 @@
status
}
}
alois31 (Migrated from github.com) commented 2023-01-27 17:16:49 +00:00

This one is interesting. My thought was "failing to deallocate memory is just a resource leak". However, if this fails (and the UEFI is not extremely buggy), we have attempted a double free because the kernel already freed memory it shouldn't have freed. Given that this violates the unsafe contract, I'm actually tempted to panic here instead.

This one is interesting. My thought was "failing to deallocate memory is just a resource leak". However, if this fails (and the UEFI is not extremely buggy), we have attempted a double free because the kernel already freed memory it shouldn't have freed. Given that this violates the `unsafe` contract, I'm actually tempted to panic here instead.
alois31 (Migrated from github.com) commented 2023-01-27 17:31:05 +00:00

This should at most be an addition instead of a replacement. I don't mean the kernel calling ExitBootServices and then (hopefully) not returning. I really mean Exit, which normally returns to the parent. However, since we reuse our image handle for the kernel, it actually returns to systemd-boot, and our resources are leaked.

This should at most be an addition instead of a replacement. I don't mean the kernel calling `ExitBootServices` and then (hopefully) not returning. I really mean `Exit`, which normally returns to the parent. However, since we reuse our image handle for the kernel, it actually returns to systemd-boot, and our resources are leaked.
blitz (Migrated from github.com) reviewed 2023-01-29 15:10:26 +00:00
blitz (Migrated from github.com) commented 2023-01-29 15:10:25 +00:00

Yes, something like that would be a compromise for the time being.

Yes, something like that would be a compromise for the time being.
blitz (Migrated from github.com) reviewed 2023-01-29 15:11:11 +00:00
@ -0,0 +173,4 @@
status
}
}
blitz (Migrated from github.com) commented 2023-01-29 15:11:11 +00:00

Fair enough. In this case, just leave a comment to explain why.

Fair enough. In this case, just leave a comment to explain why.
blitz (Migrated from github.com) reviewed 2023-01-29 15:12:04 +00:00
@ -0,0 +173,4 @@
status
}
}
blitz (Migrated from github.com) commented 2023-01-29 15:12:04 +00:00

Ah good point. Feel free to fix!

Ah good point. Feel free to fix!
blitz commented 2023-01-29 15:13:47 +00:00 (Migrated from github.com)

@alois31 With the changes around the relocation handling that you suggested, I would approve this PR. It would be nice to get this into the tree, because then we can tackle #58, which currently prevents users from gracefully recovering their boot when things are messed up. :)

@alois31 With the changes around the relocation handling that you suggested, I would approve this PR. It would be nice to get this into the tree, because then we can tackle #58, which currently prevents users from gracefully recovering their boot when things are messed up. :)
alois31 commented 2023-01-30 08:42:02 +00:00 (Migrated from github.com)

I think I have already applied these changes before you posted your last comment (see the last three commits). Feel free to fix #58 before, I need to rebase anyway.

I think I have already applied these changes before you posted your last comment (see the last three commits). Feel free to fix #58 before, I need to rebase anyway.
blitz (Migrated from github.com) approved these changes 2023-02-02 13:07:48 +00:00
blitz (Migrated from github.com) left a comment

This is coming in 🔥 hot 🔥 for FOSDEM. ;)

This is coming in :fire: hot :fire: for FOSDEM. ;)
blitz commented 2023-02-02 15:42:51 +00:00 (Migrated from github.com)

Mmh. This broke the CI. I didn't even notice that there were no CI runs here. Not sure what happened... We might need to revert this.

Mmh. This broke the CI. I didn't even notice that there were no CI runs here. Not sure what happened... We might need to revert this.
blitz commented 2023-02-02 15:43:29 +00:00 (Migrated from github.com)
3885f114a8d6d6e3641da83a0b45366c5777e825 is the first bad commit
commit 3885f114a8d6d6e3641da83a0b45366c5777e825
Author: Alois Wohlschlager <alois1@gmx-topmail.de>
Date:   Mon Jan 23 13:52:24 2023 +0100

    Do not sign the kernel
    
    Malicious boot loader specification entries could be used to make a
    signed kernel load arbitrary unprotected initrds. Since we do not want
    this, do not sign the kernel. This way, the only things allowed to boot
    are our UKI stubs, which do verify the initrd.

 rust/tool/src/install.rs | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

``` 3885f114a8d6d6e3641da83a0b45366c5777e825 is the first bad commit commit 3885f114a8d6d6e3641da83a0b45366c5777e825 Author: Alois Wohlschlager <alois1@gmx-topmail.de> Date: Mon Jan 23 13:52:24 2023 +0100 Do not sign the kernel Malicious boot loader specification entries could be used to make a signed kernel load arbitrary unprotected initrds. Since we do not want this, do not sign the kernel. This way, the only things allowed to boot are our UKI stubs, which do verify the initrd. rust/tool/src/install.rs | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) ```
blitz commented 2023-02-02 15:46:00 +00:00 (Migrated from github.com)

Ah, the tests that check whether we reject an unsigned kernel are now obviously broken... :-D I won't revert, because this is not a functional issue. I hope I can look into this tonight, should be an easy fix. Maybe @nikstur has time as well.

Ah, the tests that check whether we reject an unsigned kernel are now obviously broken... :-D I won't revert, because this is not a functional issue. I hope I can look into this tonight, should be an easy fix. Maybe @nikstur has time as well.
Sign in to join this conversation.
No description provided.