Project Idea: Working Suspend/Resume/Hibernate

This is a specification for a project idea that I had. Basically, it is to implement a working suspend/resume/hibernate system in FreeBSD. Due to the amount of work required to add functionality to device drivers for the suspend/resume events, this project, if approved, will be a community effort. This proposal includes some basic outlines of the procedures that I think are required for an effective suspend/resume/hibernate system.

I think the best way to do this is to use a kernel thread to monitor the interrupt counters for various hardware interrupts. See the -i option of vmstat(8) for an illustration. If the watched interrupt counts do not change within a certain time frame, then the suspend system procedures are invoked. The same applies to hard disk spin down (this is independent of suspend during normal operation) and monitor timeout (screen saver mechanism?). The checks are done once a minute to minimize system overhead. The resume thread checks interrupt counts once every second. If something happens (interrupt received), then the system is brought back up. During suspend mode, the CPU is placed in either a suspend or halt mode after the thread checks counters to minimize power consumption (a good thing for laptops).
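As a rough illustration of the counter-watching idea, here is a small user-space sketch in Python rather than kernel code. The class name `IdleTracker`, the device names, and the 20-minute timeout are all made up for the example; it only shows the "no counter changed within the time frame" test.

```python
from dataclasses import dataclass, field


@dataclass
class IdleTracker:
    """Tracks how long a set of watched interrupt counters has been unchanged."""
    timeout_s: float                      # inactivity timeout before suspend
    last_counts: dict = field(default_factory=dict)
    last_change_s: float = 0.0

    def sample(self, now_s: float, counts: dict) -> bool:
        """Feed one snapshot of {source: count}; return True once idle long enough."""
        if counts != self.last_counts:
            # A watched counter moved: record the activity and restart the clock.
            self.last_counts = dict(counts)
            self.last_change_s = now_s
            return False
        return now_s - self.last_change_s >= self.timeout_s


# Hypothetical device names and timeout, sampled at 0, 10 and 20 minutes.
tracker = IdleTracker(timeout_s=20 * 60)
tracker.sample(0, {"atkbd0": 100, "psm0": 40})     # first snapshot counts as activity
idle_at_10_min = tracker.sample(600, {"atkbd0": 100, "psm0": 40})
idle_at_20_min = tracker.sample(1200, {"atkbd0": 100, "psm0": 40})
woke_up = tracker.sample(1260, {"atkbd0": 101, "psm0": 40})
```

In the proposal the sampling interval would be one minute in normal mode and one second in suspend mode; the sketch leaves the caller to decide when to call `sample`.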

Speaking of the CPU, the automated CPU throttling thread would evaluate the system load statistics periodically, and if the utilization exceeds a preset percentage threshold, then the clock frequency of the CPU is increased to bring more computational bandwidth to bear. However, if the system idle percentage exceeds a preset threshold, then the clock frequency of the CPU is scaled back to reduce power consumption. This works because of the CMOS fabrication technology. If the digital states of CMOS logic circuits are not changing, then the current draw is usually in the picoamperes. But start increasing the rate at which the digital states change, and more power is drawn. This is due to the physical construction of a CMOS transistor, which is technically known as an Insulated Gate Field Effect Transistor. The control gate is insulated from the conduction channel and therefore forms a capacitor. Capacitors are like little batteries as they store an electrical charge. When the digital state transitions from 0 to 1 or 1 to 0, that capacitor has to either charge or discharge. That causes current flow, which translates to increased power consumption and increased heat generation. This is why lowering the clock frequency of the CPU can save a considerable amount of power. This will balance system performance with power consumption as the CPU is not wasting clock cycles during the idle loop (also a good thing for laptops).
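To make the throttling rule concrete, here is a minimal sketch in Python (not kernel code). The function name `next_frequency`, the list of frequency steps, and the 80%/90% thresholds are assumptions for illustration; the thresholds are one possible reading of the "Adaptive (10-20% CPU reserve, 80-90% utilization)" line in the spec below.

```python
def next_frequency(current_mhz, utilization_pct, levels_mhz,
                   raise_above_pct=90.0, lower_below_pct=80.0):
    """Pick the next CPU frequency from a sorted list of available levels."""
    i = levels_mhz.index(current_mhz)
    if utilization_pct > raise_above_pct and i < len(levels_mhz) - 1:
        return levels_mhz[i + 1]      # too little reserve: step up one level
    if utilization_pct < lower_below_pct and i > 0:
        return levels_mhz[i - 1]      # too much idle time: step down one level
    return current_mhz                # inside the target band (or at a limit): hold


levels = [800, 1200, 1600, 2000]      # hypothetical frequency steps in MHz
busy = next_frequency(1200, 95.0, levels)
idle = next_frequency(1200, 30.0, levels)
steady = next_frequency(1200, 85.0, levels)
```

Stepping one level at a time with a gap between the two thresholds keeps the frequency from flapping when the load sits near a single cutoff.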

I think that the hardest part to implement will be hibernate. I have come up with two different methods to use. I kinda like method 1 better because of the hardware issue discussed at the end of the proposal, but method 1 is far more challenging to implement than method 2. However, if there is a way to also save the hardware state of all the devices, then method 2 would be the way to go.

Please review and comment.

Thanks.

NOTE: School just started for the semester, so my personal time will be extremely limited as I am taking some pretty hard hitting classes this time around.


Code:
Project: Complete rewrite of FreeBSD APM system to allow a working
	suspend/resume/hibernate function.
	
Operational Characteristics:
  Operates completely within the kernel via new kernel process/thread
    "apmanproc"
  sysctl(8) tuning of operational parameters
    Supports two profiles
      Battery
      A/C
    Keyboard/Mouse timeout to suspend
    Keyboard/Mouse timeout to monitor off
    Automatic HD spin down inactivity timeout (HD activity only)
    Automatic CPU throttling based on current load/strategy
      Full Speed (No control)
      Fixed Speed (No control)
      Control based on system load
        Adaptive (10-20% CPU reserve, 80-90% utilization)
        Degraded (0% CPU reserve, always < 75% speed regardless of load)
    Sleep/Suspend Events
      Keyboard/Mouse inactivity timeout
      Network inactivity timeout
      Modem inactivity timeout
      Serial Port inactivity timeout (for serial console)
      Configurable via bit mask
    Wake up/Resume Events
      Keyboard/Mouse interrupt received
      Network interrupt received
      Modem interrupt received
      Serial Port interrupt received
      Configurable via bit mask

Sleep/Suspend Actions:
  1. Send pause signal to all userland processes
  2. Suspend all non-essential threads/processes via scheduler
  3. Save screen data and power off monitor
  4. Power down/suspend other hardware devices
  5. Spin down harddisks
  6. Launch watch thread that checks interrupt counters once
     every second to look for wake up events according to
     bit mask.

Wake up/Resume Actions:
  1. Spin up harddisks
  2. Wake up/resume hardware devices
    - No device probe needed (already known)
    - Reinit devices if needed (this is a device driver function)
  3. Restore screen data and power on monitor
  4. Resume threads/processes via scheduler
  5. Send resume signal to all userland processes
  6. Launch watch thread that checks interrupt counters once
     every minute for inactivity timeouts according to
     bit mask.

Hibernate:
  Procedure 1:
    ** Each process is saved in its own file
      /boot/hibernate  ?
      /var/hibernate   ?
    File contains following information
      VM Map
      Core Image
      Network Data
      Thread/Process Data (proc/thread structures)
      VNode Data
      Machine State
      Any other platform specific data

    Suspend all processes via the scheduler
    Flush all I/O buffers and deallocate
    Loop for each userland process:
      Read in swapped out pages (if swapon)
      Write core image to disk
      Write kernel data to disk (VM Map, VNode, etc)
      Deallocate memory resources
    Write hibernate flag to disk
    Power Off

  Procedure 2:
    ** Entire memory is written to disk in one big file
    
    Suspend all processes via the scheduler
    Flush all I/O buffers
    Write hardware peripheral state to disk (If Possible)
    Write all physical memory out to disk
    Write hibernate flag to disk
    Power Off

  Resume From Hibernate:
    In both cases, if the hibernate flag is found on disk, a very
    small module is loaded into memory.  This module reads the
    file(s) from disk.  However, the device probe will need to
    be done again because we are resuming from a power off state
    so the hardware will not be in the same state as it was
    before the power off.  This makes Procedure 1 more attractive
    because then once the kernel boots, the processes that were
    running at the time are restored.  More involved to write,
    but quite possibly the better solution.
 
Your ideas w.r.t. suspend/resume seem already to be covered by ACPI - at least they should be in there. Given the status of ACPI on many platforms and the portability of the code on different platforms not being Tier 1, well, doing it all over again could be worthwhile.

Regarding the hibernation, let me point you to a method already implemented which can write a process out to swap. There is often confusion about what paging and swapping are. What the system normally does when memory is scarce is called paging; swapping occurs when paging does not provide enough memory fast enough. Then individual processes are suspended for some time and completely removed from memory. Well, almost: what the kernel keeps is the process structure and kernel stack space.

You may also consider booting the complete kernel until init is launched but switching to resume when the system was hibernated. This would make sure the hardware is initialized and all drivers are working, and also that all kernel threads are available. You would only need to re-attach the processes to the process list and resume them. This would also require re-mounting the swap space without initialising it, because we still need the swapped information.

Some time ago I considered placing a bounty on a working hibernation which was not dependent on ACPI (and I guess a lot of users of non-Tier-1 hardware would like to chip in), but somehow I never got around to it.

What would you consider appropriate bai.. hmm bounty?

In case someone could give us transportable jails, jails you freeze on one system and which may be taken out of hibernation there or on a different system (provided that the kernel version is sufficiently close), that one would be welcomed to the hall of fame IMHO.
 
@Maelstorm: Please propose this on the freebsd-acpi@freebsd.org mailing list. There are a lot more FreeBSD developers around there than on the forums, and it would be a pity if your idea went unnoticed.
 
I didn't expect to get responses this quickly.

Crivens said:
Your ideas w.r.t. suspend/resume seem already to be covered by ACPI - at least they should be in there. Given the status of ACPI on many platforms and the portability of the code on different platforms not being Tier 1, well, doing it all over again could be worthwhile.

If the ACPI functions are already there, then why is the current implementation not working? Or is it? The method that I propose actually doesn't use ACPI except for very minimal things like power off (My knowledge of ACPI and its capabilities is extremely limited). I wasn't aware that there was a distinction made between paging and swapping. I always considered it as pages swapped out to disk. Thanks for the info.

You may also consider booting the complete kernel until init is launched but
switch to resume when the system was hibernated. This would make sure the hardware is
initialized and all drivers are working, also that all kernel threads are available.
You would only need to re-attach the processes to the process list and resume them.
This would also require to re-mount the swap space without doing the initialization
of the swap space because we still need the swapped information.

That's what I was trying to allude to with method 1. I thought about the swap space, which is why I listed it. I do not know what the mechanism is to initialize the swap space other than swapon. Is the swap space available to the kernel for read/write access before swapon is called? I would think it is since it's just another partition on the harddisk. How about we load the processes that are stored on the swap partition before the kernel transfers control to init? Or don't even call init since init would be one of the processes that was loaded from the swap.

Crivens said:
In case someone could give us transportable jails, jails you freeze on one system and which may be taken out of hibernation there or on a different system (provided that the kernel version is sufficiently close), that one would be welcomed to the hall of fame IMHO.

It is doable, but isn't a jail a combination of chroot and process jailing? It would be kinda hard to move the jailed process and its associated file system over to another machine unless you use something like tar to archive everything and NFS or a scripted ftp to transfer it. Method 1 does allow for single process freezing (it would be similar to a core dump, but restartable). I don't think that this was conceived of before.


Ime@ said:
@Maelstorm: Please propose this on the freebsd-acpi@freebsd.org mailing list. There are a lot more FreeBSD developers around there than on the forums, and it would be a pity if your idea went unnoticed.

I tend to stay away from the mailing lists as I prefer the forum environment. However, I will post a link to this thread on the mailing list and invite comments here. I would definitely like to get more people on the forums.
 
Maelstorm said:
If the ACPI functions are already there, then why is the current implementation not working? Or is it?

The status of ACPI depends a lot on the work the manufacturer is willing to spend on it. Since most of these consider all the world to run various versions of Windows, they only test against that. Just dump your ACPI and try to rebuild it straight away with the compiler supplied by Intel for this purpose. On my laptop, I get tons of warnings and even errors from the compiler. Do I need to say that power management does not work properly and suspend/resume not at all?

Maelstorm said:
Is the swap space available to the kernel for read/write access before swapon is called?

No. One point in favor of using the swap space instead of files is that you need to be able to access these files very early in booting. Should these files happen to end up on something like a crypto-loopback file system, you may have trouble dealing with that.
Keeping the kernel internal data separated from the user processes in the "freezer" would also enable you to have the swap space placed on some elaborate RAID system (like ZFS with compression and whatnot) while still being able to easily read the kernel data needed to access that file system from a separate partition which is a plain, stupid /dev/xxx. Much easier to do in kernel space, and on this point simple is better. Hibernate is not a place where I would value speed over 100% correctness.

Maelstorm said:
I would think it is since it's just another partition on the harddisk. How about we load the processes that are stored on the swap partition before the kernel transfers control to init? Or don't even call init since init would be one of the processes that was loaded from the swap.

That was the idea.

Maelstorm said:
It is doable, but isn't a jail a combination of chroot and process jailing? It would be kinda hard to move the jailed process and its associated file system over to another machine unless you use something like tar to archive everything and NFS or a scripted ftp to transfer it. Method 1 does allow for single process freezing (it would be similar to a core dump, but restartable). I don't think that this was conceived of before.

I have used cryopid on Linux, which does exactly that. It can freeze a process and restart it on other machines provided all the files are there as well.
 
I'm looking at the ACPI specification right now. It seems that ACPI is event driven. Knowing that, I think I'll write a kernel mode event handler that takes in all possible events and decides what to do. The way that I see it, there are only 6 possible actions that can be taken: ignore, sleep, wake, hibernate, soft off, power off. For instance, closing the lid on a laptop would initiate a hibernate, soft off, or power off, or could simply be ignored.
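A tiny sketch of that handler idea, in Python for brevity rather than as kernel code. The six actions come from the paragraph above; the event names and the default policy table are invented for the example and would be configurable in a real implementation.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"
    SLEEP = "sleep"
    WAKE = "wake"
    HIBERNATE = "hibernate"
    SOFT_OFF = "soft off"
    POWER_OFF = "power off"


# Hypothetical event names and policy; not taken from the ACPI specification.
POLICY = {
    "lid_closed": Action.HIBERNATE,
    "sleep_button": Action.SLEEP,
    "power_button": Action.SOFT_OFF,
    "inactivity_timeout": Action.SLEEP,
    "wake_interrupt": Action.WAKE,
}


def decide(event, policy=POLICY):
    """Map an incoming event to one of the six actions; unknown events are ignored."""
    return policy.get(event, Action.IGNORE)


lid_action = decide("lid_closed")
unknown_action = decide("some_unhandled_event")
```

Keeping the decision in a lookup table means the lid-close behaviour can be changed by editing one entry instead of touching the handler.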

I'm still going to have to think about this for a while because I would like to use the available hardware, but because of such wide variances, I'm thinking that a more complete software approach would be more in tune with FreeBSD becoming platform independent. IRQ counts are statistics that are already maintained by the kernel, I believe, so the only hardware issue is to transition the various devices to a low power state, and that is handled by the device drivers.
 
You may start by reading the manpage for acpi; it may provide several starting points for investigation.

Deciding if you can sleep based on IRQ statistics may not be possible. For example, you do not know which IRQs are of interest and which may be ignored. Going to sleep while the wifi only receives static noise may be desirable, but doing so while even a slow download is in progress will surely annoy the user. How will you know the difference?
 
Two things:

People have been working on ACPI and suspend/resume for a long time. These are just hard problems to solve. I've wondered if it would help to pick a certain reference machine like one of the popular Lenovo Thinkpad models and push to get complete suspend/resume support for that system. That could be a template and example for other systems. Even better if some cooler-than-average manufacturer could donate some systems.

Second, the mailing lists have been the preferred choice for FreeBSD developers for many years. Don't let the user interface keep you from that valuable source of information. One of the web/newsgroup interfaces like gmane.org might be helpful.
 
Actually, I do know the IRQs of interest. IRQ 1 is always the keyboard, and I think that IRQ 12 is the mouse. IRQs 3 and 4 are generally used for serial ports. This is why I plan to use a bitmask to determine which interrupts to monitor. Furthermore, device drivers can also collect statistics about the devices they control. Say, a USB Ethernet driver or a WiFi driver can maintain a counter of how many valid frames have been received, and those can be queried periodically (I set the IRQ monitoring thread to run once a minute in normal mode and once per second in sleep mode). I have also decided to isolate the actions from each other. If the mouse/keyboard does not have any activity in 15 minutes (configurable), then turn off the monitor. If there have been no HD interrupts for 3 minutes (also configurable), spin down the harddisks.
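Here is roughly what I mean by the bitmask and the independent timeouts, sketched in Python instead of kernel C. The function names are made up, bit N of the mask simply stands for IRQ N, and IRQ 14 in the example is only a stand-in for a disk controller.

```python
# Bit N set means "watch IRQ N": keyboard (1), mouse (12), serial ports (3, 4).
WATCH_MASK = (1 << 1) | (1 << 12) | (1 << 3) | (1 << 4)


def watched_activity(prev_counts, curr_counts, mask=WATCH_MASK):
    """Return True if any IRQ selected by the mask fired between two snapshots."""
    for irq, count in curr_counts.items():
        if mask & (1 << irq) and count != prev_counts.get(irq, 0):
            return True
    return False


def due_actions(idle_s, timeouts_s):
    """Each action has its own idle clock and its own timeout, in seconds."""
    return sorted(name for name, limit in timeouts_s.items()
                  if idle_s.get(name, 0) >= limit)


prev = {1: 10, 12: 5, 14: 900}
only_disk = watched_activity(prev, {1: 10, 12: 5, 14: 950})   # IRQ 14 not in mask
key_press = watched_activity(prev, {1: 11, 12: 5, 14: 900})

timeouts = {"monitor_off": 15 * 60, "disk_spin_down": 3 * 60}
due = due_actions({"monitor_off": 16 * 60, "disk_spin_down": 60}, timeouts)
```

Because every action looks only at its own idle clock, the monitor can go dark while the disks keep spinning, and the other way around.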

As for the WiFi slow download, I have a couple of ideas for that. From what I have observed of WiFi, there is a periodic heartbeat that is sent to/from the access point. This heartbeat operates at a very low rate...I think one or two packets per second. So any download will significantly increase the number of interrupts for the given time frame of 1 second. Now if the sleep timeout is 20 minutes, then there are packet bursts which increase that rate. If there are more than 10 interrupts in 1 second, then the WiFi is active and should not be powered down. The same thing with the harddisk. Furthermore, during a download, both the network and harddisk interrupts will be firing.
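The threshold test could look something like this, again as a Python sketch and not driver code. It assumes per-second interrupt counts are available for the device; the function names are hypothetical, and the default of 10 interrupts per second is just the figure from the paragraph above.

```python
def is_active(interrupts_in_window, window_s=1.0, threshold_per_s=10.0):
    """Treat a device as active when its interrupt rate exceeds the threshold."""
    return interrupts_in_window / window_s > threshold_per_s


def any_burst(per_second_counts, threshold_per_s=10.0):
    """True if any one-second sample in the timeout period looks like real traffic."""
    return any(is_active(n, 1.0, threshold_per_s) for n in per_second_counts)


heartbeat_only = any_burst([1, 2, 2, 1, 2])     # just the periodic heartbeat
slow_download = any_burst([1, 2, 40, 35, 2])    # bursts well above the heartbeat
```

A slow download still shows up as bursts above the heartbeat rate, so a single hit within the timeout period is enough to keep the WiFi powered.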

WBlock@: The code that I'm planning on writing is all about event generation and acting on those events. I figure that people have been working on it, but perhaps it's time to take a different approach. It seems to me that developers are trying to use a hardware approach to solve this problem. That works for some systems but not others, which is the problem that we have now. I'm proposing a mostly software solution that does the timing entirely in software and generates events based on that. There are some hardware generated events such as pressing the sleep or power button, or closing the lid of a laptop.

I don't know how Microsoft implements it in Windows, but it has worked on Windows for years. I think it is well past time to get this working on FreeBSD. Granted, FreeBSD is generally a server operating system and people who are looking to use it on a laptop/netbook should probably use PC-BSD instead, but this is functionality that is broken and should be fixed.

As for the mailing list, I'll just join it and be done with it. I have yet to receive a message from the list moderator.
 
Maelstorm said:
Granted, FreeBSD is generally a server operating system and people who are looking to use it on a laptop/netbook should probably use PC-BSD instead, but this is functionality that is broken and should be fixed.

Using PC-BSD doesn't gain one anything, since the underlying OS remains FreeBSD. Fairly identical installations of FreeBSD 9.1 beta on my laptop and desktop behave differently; fortunately my primary work machine, a non-portable workstation, resumes flawlessly.

Given unlimited budget and resources, maybe this would be fixed sooner, but with FreeBSD being a server-centric OS it is understandable that suspend/resume hasn't received more focus. On the other hand, making it easier to use FreeBSD as an OS for desktop users ought to be welcomed by developers of all sorts, and that in turn could lead to broader adoption of FreeBSD as a server OS too. Or maybe that's just wishful thinking.

In the meantime I hope my workstation continues to suspend and resume - I hold my breath each time I build and install world and kernel. ;)
 
Maelstorm said:
I don't know how Microsoft implements it in Windows, but it has worked on Windows for years.

Hardware manufacturers have employees that are paid to get ACPI working on their hardware in Windows.

As for the mailing list, I'll just join it and be done with it. I have yet to receive a message from the list moderator.

What moderator? :) Most FreeBSD mailing lists are unmoderated. You may also find the freebsd-wireless mailing list useful.
 