These are the ramblings of Matthijs Kooijman, concerning the software he hacks on, hobbies he has and occasionally his personal life.
Most content on this site is licensed under the WTFPL, version 2 (details).
My book about Arduino and XBee includes a chapter on battery power and sleeping. When I originally wrote it, it ended up at over twice the number of pages planned for it, so I had to severely cut down the content. Among the content removed was a large section about interrupts, sleeping and race conditions. Since I am not aware of any other online sources that cover this subject as thoroughly, I decided to publish this content separately as a blogpost, which is what you're looking at now.
In this blogpost, I will first explain interrupts and race conditions using a number of examples. Then sleeping is added into the mix, which again results in some interesting race conditions. All these examples have been written for Arduino boards using the AVR architecture, but the general concepts apply equally well to other platforms.
The basics of interrupts and sleeping on AVR are not covered in detail here. If you have no experience with this, I recommend these excellent articles on interrupts and on sleeping by Nick Gammon, which cover interrupts, sleeping and other powersaving in a lot of detail.
In this post, I will show relevant snippets of the code, but omit trivial things like constant definitions or pinmode settings. The full example sketches can be downloaded as a tarball and each will be separately linked below as well.
This first example will explore the use of interrupts, starting without sleeping; sleeping will be added later. The sketch will light up the internal led whenever a button is pressed and keep it lit until the button is not pressed for four seconds. Button presses will be detected using an external interrupt, using the Arduino attachInterrupt() function.
When an interrupt happens, an interrupt service routine (ISR) function will be called. Note that when using attachInterrupt(), you are not defining a real ISR, but just a normal function that gets called by the ISR that is hidden inside the Arduino code. This "ISR" looks like this:
// Time of the last buttonpress
volatile uint32_t last_press = 0;

// On a buttonpress: turn on led and record time
void buttonPress() {
  digitalWrite(LED_BUILTIN, HIGH);
  last_press = millis();
}
Pressing the button turns on the LED, and remembers the timestamp of the buttonpress. If the LED is already on, the LED will be unchanged, but the timestamp will be updated.
To keep this timestamp, a global variable is defined, so it can be accessed both from the interrupt handler and the loop. Note that it is declared with the keyword volatile. This keyword tells the compiler that the variable can change at any time, and that the compiler should not optimize away accesses to this variable. Formally, it is a bit more complicated than that, but in most cases it is enough to remember to use volatile on all variables that are written to inside an interrupt handler and read or written outside an interrupt handler.
To make sure that this function is called when the button is pressed, it has to be registered with the Arduino code:
void setup () {
  // Set up button
  pinMode(BUTTON_PIN, INPUT_PULLUP);
  attachInterrupt(BUTTON_INT, buttonPress, FALLING);
  // Set up led
  pinMode(LED_BUILTIN, OUTPUT);
}
This sets up the input and output pins and registers the buttonPress() interrupt handler. It should get called on every falling edge (so when the button is pressed).
Note that this example completely ignores switch bouncing (which is the effect that when you push or release a switch, it will very swiftly connect and disconnect a few times, causing the interrupt handler to trigger multiple times). Bouncing is not a problem for this example, but check out this article for more info on connecting switches, including some strategies for debouncing.
So, now you have a way to turn the light on, but you also need to turn it off when a timeout has passed. This is handled by the loop() function:
void loop () {
  if (millis() - last_press >= TIMEOUT)
    digitalWrite(LED_BUILTIN, LOW);
}
Here, (millis() - last_press) is the time since the last button press. Whenever that time reaches or exceeds TIMEOUT, the led is turned off (and, until the button is pressed again, it will continue being turned off every loop, which is not terribly useful, but won't hurt either).
Note that subtracting the two timestamps like this handles millis() overflow correctly. For more info, see this article.
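To see why the subtraction is overflow-safe, here is a minimal sketch that runs on a regular computer rather than on the Arduino. The names elapsedSince(), timeoutExpired() and wraparoundExample() are made up for this illustration and do not appear in the example sketch; the only thing borrowed from it is the (now - start >= timeout) comparison on uint32_t values.

```cpp
#include <assert.h>
#include <stdint.h>

// Elapsed time between two millis()-style timestamps. Unsigned subtraction
// wraps modulo 2^32, so the result stays correct across a counter overflow
// (as long as less than 2^32 ms have really passed).
uint32_t elapsedSince(uint32_t now, uint32_t start) {
  return now - start;
}

// True once at least `timeout` milliseconds have passed since `start`.
bool timeoutExpired(uint32_t now, uint32_t start, uint32_t timeout) {
  return elapsedSince(now, start) >= timeout;
}

// Worked example: a press 5 ms before the counter wraps to zero,
// checked 5 ms after the wrap.
void wraparoundExample() {
  const uint32_t press = UINT32_MAX - 4;  // 5 ticks before wrapping to 0
  const uint32_t now = 5;                 // 5 ticks after wrapping
  assert(elapsedSince(now, press) == 10u);
  assert(!timeoutExpired(now, press, 4000u));
}
```

Writing the check as millis() >= last_press + TIMEOUT instead would misbehave around the overflow, since the addition can wrap around while the comparison does not account for that.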
If you upload the sketch to your Arduino board, you will have a button-controlled blinky light with timeout. Perfect! Or is there perhaps still a problem?
Perhaps, while playing with your brand new blinky toy, you noticed it did not always work as expected. If not, go ahead and press the button repeatedly. As long as you press it at least once every four seconds, the led should always remain on, right? If you keep pressing the button (as fast as you want), you will see that sometimes the led actually turns off anyway. It might need a couple dozen of presses, but it should happen eventually.
How is this possible? To understand what is going on, you will have to understand that the AVR microcontroller is an 8-bit microcontroller. This means that most of its operations, and in particular accessing memory, happen one byte (8 bits) at a time.
Note that the last_press variable is a uint32_t variable, meaning it is 32 bits, or 4 bytes long. So when the loop() function needs to read it, each of these bytes is read from memory, one by one, using separate instructions.
Now consider what happens when the pin interrupt triggers in the middle of this operation. loop() will have fetched some of the bytes, but then the interrupt handler changes the variable in memory, after which loop() continues to fetch the remaining bytes from memory. The result is that some of the bytes come from the old value, while the rest come from the new value. This will likely cause the comparison to completely mess up and return true even when the timeout has not expired yet.
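The mixing of old and new bytes can be imitated on a regular computer. The sketch below is an illustration only: tornRead() and looksExpired() are hypothetical helpers, and the assumption that the low bytes are loaded first is mine (the compiler is free to pick any order).

```cpp
#include <assert.h>
#include <stdint.h>

// Imitate a 4-byte load that is interrupted after `bytesBeforeInterrupt`
// bytes: those bytes come from oldValue, the remaining ones from newValue.
// Assumption: the least significant byte is loaded first.
uint32_t tornRead(uint32_t oldValue, uint32_t newValue,
                  unsigned bytesBeforeInterrupt) {
  assert(bytesBeforeInterrupt <= 4);
  uint32_t result = 0;
  for (unsigned i = 0; i < 4; ++i) {
    const uint32_t source = (i < bytesBeforeInterrupt) ? oldValue : newValue;
    result |= source & (UINT32_C(0xFF) << (8 * i));
  }
  return result;
}

// Same comparison as in loop(), with the timestamps passed in explicitly.
bool looksExpired(uint32_t now, uint32_t lastPress, uint32_t timeout) {
  return now - lastPress >= timeout;
}
```

For example, with a previous press at 0x0000FF00 (65280 ms) and a new press at 0x00010200 (66048 ms), an interrupt after two bytes yields 0x0001FF00. That value lies in the future, so the subtraction wraps around to a huge number and the timeout seems to have expired, even though the presses were less than a second apart.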
If this seems rather unlikely: you have seen it happening, perhaps even multiple times if you kept going for a while. Since the microcontroller is not doing a whole lot except for checking the last_press variable over and over again, the chance of the interrupt triggering at the exact right time is actually fairly significant.
What you are seeing here is what is commonly called a race condition. Generally speaking, a race condition is present when two things need to happen in a certain order but it depends on chance whether they will actually happen in the right or wrong order. Typically, when a race condition is present, the events usually happen in the correct order, and only rarely the incorrect order occurs. This often makes race conditions particularly hard to reproduce, with problems occurring occasionally in your software in production, but never on the developer's system where they could be diagnosed. Because of this, recognizing race conditions early is a big win. Whether you are dealing with interrupts in a microcontroller, multiple threads or processes in a bigger operating system, or true concurrency with multiple processors or cores, wherever there are multiple concurrent threads of execution, race conditions will be lurking around the corner.
In this case, the correct ordering of events is that the interrupt should be handled either before or after all four bytes of last_press are loaded, but not in between the loading of the bytes. Another term often used for this is to say that last_press must be loaded atomically, meaning it should not be possible to be interrupted halfway.
If you look closely, you might find there is a second race condition. Consider what happens when the interrupt triggers after the last_press variable was loaded, but before the led is turned off. In this case, the led will be turned on by the ISR, but it is immediately turned off again by loop() (which had already decided it would turn off the led). So instead of staying on for 4 more seconds, the led stays off. Triggering this bug requires pressing the button at the exact moment the light is about to turn off, so it is very unlikely that you will trigger it in your testing. But if you designed a device containing this code and produced it a million times, some of your users would probably see the bug. It is common to say that the window for triggering this bug is very small.
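This second race can also be played through on a regular computer. The following is a minimal sketch under stated assumptions: the Sketch struct, loopOnce() and ledSurvivesLatePress() are invented for the illustration, and the "interrupt" is simply a function call injected between the timeout check and the led write.

```cpp
#include <assert.h>
#include <stdint.h>

// The state shared between loop() and the interrupt handler.
struct Sketch {
  uint32_t lastPress = 0;
  bool ledOn = false;
};

// Stand-in for the buttonPress() handler.
void buttonPress(Sketch &s, uint32_t now) {
  s.ledOn = true;
  s.lastPress = now;
}

// One pass through loop(), optionally with a button press arriving
// between the timeout check and the led write.
void loopOnce(Sketch &s, uint32_t now, uint32_t timeout,
              bool pressBetweenCheckAndWrite) {
  const bool expired = (now - s.lastPress >= timeout);  // the check
  if (pressBetweenCheckAndWrite)
    buttonPress(s, now);  // the interrupt fires here
  if (expired)
    s.ledOn = false;      // acts on a decision that is now stale
}

// Press at t=0, then press again exactly when the timeout check has passed.
bool ledSurvivesLatePress() {
  Sketch s;
  buttonPress(s, 0);
  assert(s.ledOn);
  loopOnce(s, 4000, 4000, true);
  return s.ledOn;  // false: the led is off although the button was just pressed
}
```

Note that the timestamp was updated by the late press, so the led stays off until the next press rather than coming back on by itself.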
The most common way to fix this is fairly simple: disable interrupts around code that needs to be executed atomically. This ensures that an interrupt cannot occur during the code, guaranteeing correct ordering. If any interrupt triggers while interrupts are disabled, the CPU will queue the interrupt (by setting an interrupt flag bit in a register) and as soon as interrupts are enabled again, the interrupt handler will run.
Usually, you should try to only disable interrupts for a very short time. The longer interrupts are disabled, the longer any queued interrupts will have to wait, which can cause problems with timing-sensitive applications (just like interrupt handlers themselves must be short).
So, what does this mean for the code? Just disable interrupts before the if and re-enable them afterward. This ensures that loading the last_press value, making the decision and turning off the led now all happen atomically, preventing the interrupt from triggering halfway. This looks like this:
void loop () {
  noInterrupts();
  if (millis() - last_press >= TIMEOUT)
    digitalWrite(LED_BUILTIN, LOW);
  interrupts();
}
This uses the noInterrupts() and interrupts() functions defined by Arduino to disable and re-enable interrupts globally (meaning all interrupts are disabled). You might also encounter cli() (clear interrupt bit) and sei() (set interrupt bit), which are the AVR-specific versions of the same functions. Using the Arduino versions makes it easier to port the code to other architectures too.
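One thing to keep in mind is that interrupts() enables interrupts unconditionally, even if they were already disabled before noInterrupts() was called. A common way around this is to save the previous state and restore it afterwards (on AVR, avr-libc offers ATOMIC_BLOCK(ATOMIC_RESTORESTATE) in util/atomic.h for this). The sketch below only imitates the idea on a regular computer: interruptsEnabled is a plain variable standing in for the global interrupt flag, and InterruptLock is a hypothetical class, not part of Arduino.

```cpp
#include <assert.h>

// Simulated global interrupt enable flag (stands in for the I bit in SREG).
static bool interruptsEnabled = true;

// Disables "interrupts" for the lifetime of the object and restores the
// previous state afterwards, instead of blindly enabling them.
class InterruptLock {
public:
  InterruptLock() : wasEnabled(interruptsEnabled) { interruptsEnabled = false; }
  ~InterruptLock() { interruptsEnabled = wasEnabled; }
  InterruptLock(const InterruptLock &) = delete;
  InterruptLock &operator=(const InterruptLock &) = delete;

private:
  bool wasEnabled;
};

// Leaving an inner lock must not re-enable interrupts while an outer lock
// is still held.
bool nestedLocksKeepInterruptsOff() {
  InterruptLock outer;
  {
    InterruptLock inner;
    assert(!interruptsEnabled);
  }
  return !interruptsEnabled;
}
```

For the small sketches in this post the plain noInterrupts() and interrupts() pair is fine; restoring the state mostly matters for helper functions that might be called with interrupts already disabled.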
With this change applied to the sketch, you should be able to keep punching the button over and over again, with the led staying on (until you stop pressing for four seconds, of course).
Now that you have an interrupt controlled button working, time to add sleeping. One of the reasons to use interrupts, is that (only) an interrupt can wake an Arduino from its sleep. This means that with the code shown above, once the LED is off and the Arduino is waiting for the next button press, it can just go to sleep, knowing it will be woken up when a button is pressed:
void loop () {
  noInterrupts();
  if (millis() - last_press >= TIMEOUT) {
    digitalWrite(LED_BUILTIN, LOW);
    doSleep();
  }
  interrupts();
}
This is just the previous loop() function, with the doSleep() call added after the led is turned off (the doSleep() function is shown below). Since a button press wakes up the microcontroller, it will end up waiting for a button press in slumber and only resuming with the code after doSleep() when a button was pressed.
Note that doSleep() is called with interrupts disabled. You might think it would be good to re-enable interrupts after turning off the LED, to keep them disabled as short as possible. However, this would introduce another race condition: consider what would happen if the button interrupt triggered after turning off the led, but before going to sleep. In this case, the interrupt handler would turn on the led, and then the microcontroller would go to sleep. During sleep the loop() function does not run to detect that four seconds have passed, so the led will stay on indefinitely (until pressing the button wakes up the microcontroller again).
By keeping interrupts disabled when calling doSleep(), this is avoided. However, the interrupts cannot remain disabled when actually going to sleep, since then the microcontroller can never wake up again. So they should be re-enabled just before going to sleep:
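#include <avr/sleep.h>

void doSleep() {
  set_sleep_mode(SLEEP_MODE_PWR_DOWN);
  sleep_enable();
  // Re-enable interrupts right before sleeping, so a wakeup cannot be missed
  interrupts();
  sleep_cpu();
  // Execution continues here after wakeup
  sleep_disable();
}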
But what if a button press happens between disabling interrupts and re-enabling them again just before sleeping? As interrupts are disabled, the interrupt handler cannot run, but a flag will be set and the ISR will run after interrupts are enabled again. Now, the AVR architecture guarantees that after interrupts are enabled, at least one instruction runs uninterrupted. In this case, this means that the sleep_cpu() function (which translates to a single sleep instruction) always runs. If an interrupt is pending, it will be processed after the sleep instruction, causing the microcontroller to wake up immediately again (which is not terribly efficient, but it is correct).
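The difference between the two orderings can be played through with a toy model on a regular computer. Everything here is hypothetical (the Mcu struct and its members are invented for the illustration); the only behavior taken from the text is that a press with interrupts disabled just sets a pending flag, and that the sleep instruction right after enabling interrupts always runs before a pending handler does.

```cpp
#include <assert.h>

struct Mcu {
  bool interruptsEnabled = true;
  bool pending = false;  // interrupt flag, set while interrupts are disabled
  bool asleep = false;
  bool ledOn = false;

  // The buttonPress() handler; running it also means the CPU has woken up.
  void runHandler() {
    ledOn = true;
    asleep = false;
  }

  void pressButton() {
    if (interruptsEnabled)
      runHandler();
    else
      pending = true;
  }

  // Models interrupts() directly followed by sleep_cpu(): the sleep
  // instruction runs first, then a pending interrupt wakes the CPU again.
  void enableInterruptsAndSleep() {
    interruptsEnabled = true;
    asleep = true;
    if (pending) {
      pending = false;
      runHandler();
    }
  }
};

// Interrupts re-enabled right after turning off the led (the racy order).
Mcu pressBeforeSleepEarlyEnable() {
  Mcu mcu;                         // led off, interrupts enabled
  mcu.pressButton();               // handler runs right away: led on
  mcu.enableInterruptsAndSleep();  // nothing pending, so the CPU stays asleep
  return mcu;                      // asleep with the led on: stuck
}

// Interrupts kept disabled until just before sleeping.
Mcu pressBeforeSleepLateEnable() {
  Mcu mcu;
  mcu.interruptsEnabled = false;   // noInterrupts() in loop()
  mcu.pressButton();               // only sets the pending flag
  assert(mcu.pending && !mcu.ledOn);
  mcu.enableInterruptsAndSleep();  // sleeps, then wakes up immediately
  return mcu;                      // awake with the led on: loop() continues
}
```

In the first case nobody is left to turn the led off again, while in the second case the microcontroller is awake again and loop() can handle the timeout as usual.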
So, with this sketch, the Arduino is sleeping while the LED is off, significantly reducing power usage. However, when waiting for these 4 seconds to pass, the Arduino is still running and consuming power. To also sleep while waiting, you will need some way to wake up after 4 seconds have passed.
With what you have seen so far, you could add an external timer that pulls a line high or low after some time. If you connect this line to an Arduino interrupt pin, you can have the Arduino wake up at the right moment. In fact, this approach is sometimes used combined with a real-time clock (RTC) module, which is particularly good at keeping accurate time and tracking long timeouts.
Fortunately, the AVR microcontrollers also feature a number of internal timers that can be used for this purpose. The most power efficient timer is the watchdog timer. Its original purpose is to automatically reset the microcontroller when it is locked up, by keeping a counter and resetting the system if the software does not regularly reset this counter.
However, the watchdog module has a second mode, where it does not reset the entire microcontroller, but just fires an interrupt. Since the watchdog timer runs on a private 128kHz oscillator, it can also run while in power-down mode (where all other clocks are disabled) and its interrupt can cause a wakeup. The downside of this low-power oscillator is that it is not very accurate, and the watchdog only supports a fixed set of intervals rather than arbitrary ones. If you need a more precise timing source, you should look at using power-save mode and running timer2 in asynchronous mode, using a secondary crystal.
To use the watchdog timer for the four-second timeout, the buttonPress() interrupt handler should be modified as follows:
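#include <avr/wdt.h>
#include "wdt_interrupt.h"

void buttonPress() {
  digitalWrite(LED_BUILTIN, HIGH);
  // Restart the watchdog counter and (re)enable its interrupt
  wdt_reset();
  enable_wdt_interrupt(WDTO_4S);
}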
As before, this turns on the led. However, instead of keeping a last_press timestamp, this resets the watchdog timer (restarting its counter if it was already counting) and enables the watchdog interrupt, configured to fire after four seconds, in case it was not running yet.
Note that this uses the enable_wdt_interrupt() function, which enables the watchdog timer in interrupt mode only, without the system reset it normally triggers. Unfortunately, avr-libc does not provide a function for this, so this uses a custom wdt_interrupt.h file that you need to put alongside the sketch.
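To give an idea of what such a function has to do, here is a rough sketch. It is not the contents of the wdt_interrupt.h file from the download, and the names wdtPrescalerBits(), kWdp3Bit and enableWatchdogInterruptSketch() are made up. It assumes an ATmega328P-style watchdog, where the control register is called WDTCSR and the WDP3 prescaler bit lives at bit 5; other AVRs may differ, so check the datasheet. The hardware part only compiles for AVR, the bit calculation also works on a regular computer.

```cpp
#include <assert.h>
#include <stdint.h>

// Bit position of WDP3 in WDTCSR (assumption: ATmega328P layout).
constexpr uint8_t kWdp3Bit = 5;

// Map a WDTO_* timeout constant (0..9) to the WDP3..WDP0 prescaler bits:
// the low three bits map directly, the fourth bit moves to the WDP3 position.
constexpr uint8_t wdtPrescalerBits(uint8_t timeout) {
  return static_cast<uint8_t>((timeout & 0x07) |
                              ((timeout & 0x08) ? (1u << kWdp3Bit) : 0u));
}

#if defined(__AVR__)
#include <avr/interrupt.h>
#include <avr/io.h>
#include <avr/wdt.h>

// Enable the watchdog in interrupt-only mode (WDIE set, WDE cleared).
void enableWatchdogInterruptSketch(uint8_t timeout) {
  const uint8_t oldSREG = SREG;
  cli();                // the timed sequence below must not be interrupted
  wdt_reset();
  MCUSR &= ~_BV(WDRF);  // WDRF overrides WDE, so clear it first
  // Timed sequence: set WDCE and WDE, then write the new configuration
  // within four clock cycles.
  WDTCSR |= _BV(WDCE) | _BV(WDE);
  WDTCSR = _BV(WDIE) | wdtPrescalerBits(timeout);
  SREG = oldSREG;       // restore the previous interrupt state
}
#endif
```

With WDTO_4S (which avr-libc defines as 8), wdtPrescalerBits() returns 0x20, so only the WDP3 bit is set.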
Now, when the watchdog timer expires, an interrupt is triggered. This wakes up the microcontroller and then runs the WDT_vect interrupt handler, which is defined as follows:
ISR(WDT_vect) {
  digitalWrite(LED_BUILTIN, LOW);
  wdt_disable();
}
When the timeout happens, the led is turned off, and the watchdog is disabled again (it will be re-enabled when you press the button).
Since both of the interrupt handlers above already take care of all behavior needed, there is nothing left for the loop() function to do other than to sleep:
void loop () {
  doSleep();
}
With this sketch, the race conditions you fixed earlier can no longer occur, since the loop() function no longer makes any decisions about whether to sleep, and the two interrupt handlers cannot interrupt each other (so each already runs atomically).
So, hopefully after reading this post, you will have a better idea of the challenges and race conditions involved when making an Arduino sleep. If you have further questions or remarks, feel free to leave them below!