Script for Undervolt Stress Testing
Latest revision as of 12:20, 12 September 2008
This script helps in calibrating voltages when undervolting a Pentium M processor.
People differ in how far they are willing to undervolt their systems. Some are eager to run their Pentium Ms at 700mV and abandon safety: they lower the voltage as far as they can without crashing the system, and perhaps raise it again by a small margin above the failure point. However, this provides only weak assurance, because some failures do not surface immediately. In the worst case, the system fails months later, and the blame is assigned to, say, a kernel upgrade or patch when the real cause was an intermittent lack of power.
Many would like to guard themselves against such a failure and have consequently opted to run a prime number stress test such as MPrime in "torture test" mode while they ramp down their voltages to find a comfortable margin from the failure point. However, following recommendations from a thread on the Linux-Thinkpad mailing list, perhaps even more can be done. This script not only runs MPrime but also toggles many of the laptop's power-demanding features on and off throughout the test. The idea is to expose more rapidly the corner cases in which the system might act up.
#!/bin/bash
#
# DESCRIPTION AND MOTIVATION 
# --------------------------
# Designed for undervolted laptops with frequency stepping, this script
# swings the system between aggressive and low power use, and also swings
# among the available frequencies.
# 
# The idea is that such extreme use of the system will likely explore corner
# cases where the system might fail.  Hopefully, such testing can curtail the
# time necessary to establish confidence in undervolted systems.
#
# In the background the MPrime program, a prime number search engine, runs in a
# "torture test" mode, in which it tests computations against known results and
# errs out if there's a discrepancy.  Unless it errs out, this script runs
# forever.
# 
# IMPLEMENTATION
# --------------
# The design of this script attempts to address laptops beyond the Thinkpad T42
# for which it was designed.  Many of the function definitions are prepended
# with conditionals that check the system for functionality and either bail out
# or disable features accordingly.
#
# In particular, the nature of what "aggressive" constitutes is defined by a 
# number of "toggle_" functions.  The pre-pended conditional to these functions
# appends the function name to $AGGRESSIVE_TOGGLES if the system appears to
# support the feature.  The toggle_aggression function then calls all the 
# functions in $AGGRESSIVE_TOGGLES.  Look at these "toggle_" functions for 
# examples of how to extend this script for other possible stressing.
#
# EXTERNAL PROGRAMS EMPLOYED
# --------------------------
# Test system integrity (required):  MPrime - http://www.mersenne.org/prime.htm
# Download files:  curl - http://curl.haxx.se
# Read random sectors from CD:  spew (for gorge) - http://spew.berlios.de
# Keep hard disk active:  stress - http://weather.ou.edu/~apw/projects/stress/
#
# EXECUTION
# ---------
# Read this script including all the warnings below, and then make sure all the
# variables in the "Script Globals" section are appropriately set. 
#
# This script uses the mprime binary with the "-t" switch for the MPrime
# "torture test."  This test by default uses all the memory available on the
# system.  However, if you run this system for many hours, your kernel may run
# out of memory, and kill mprime and this script.  To spare yourself this
# problem, use the "NightMemory=" and "DayMemory=" parameters in MPrime's
# local.ini file, a file typically in the same directory as the mprime
# executable (read the MPrime documentation for specifics).  The torture test
# by default uses the greater of these two settings, so just set them both a
# reasonable margin away from the total amount of memory available on your
# system.  On a system with 512MB of RAM, I set these parameters both to 448,
# and had enough memory left over to run my normal set of background processes.
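#
# For example, the 512MB setup described above would put these two lines in
# local.ini (a sketch only; pick values to suit your own amount of RAM):
#
#     NightMemory=448
#     DayMemory=448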
#
# The arguments of this script are "aggression" toggles to disable.  Any
# function below that begins with "toggle_$OPTION" can be disabled by using
# $OPTION as one of the arguments of this script.  Otherwise, every stress
# feature that the system supports is enabled by default.
#
# Because of Warning 3 below, I recommend you run this script as
#
#     stress_test 2>&1 | tee output
#
# so that you have a persistent record of what has happened in case your battery
# drains completely.
#
# Keeping in mind Warning 1.1, run the script for as long as it takes to 
# establish confidence in your system (a few hours, half a day, etc.).
#
# WARNINGS
# --------
# 1) This is a STRESS test, and it is very possible that you may witness some
# very bad behavior.  Some systems might already be on the verge of breaking,
# and this script might push them over the edge, and damage them irreparably.
# Especially since you've probably undervolted your system, please accept the
# inherent risk in running this script.  In fact, I have even seen some
# unexpected behavior on non-undervolted systems running this script.  
#
# 1.1) This is a STRESS test, and it will run your system very hot at times.
# Since you are probably running this test because you've undervolted your
# system, you presumably care a lot about conserving your battery's charge.
# However, running a system hot and needlessly running through charging cycles
# will tax your battery more than just normal use.  It is very difficult to
# even estimate how much of your battery's life you may throw away running
# this test.  In all likelihood on a battery that's not too old or too new, it
# should be imperceptible, and the security you'll gain after running this test
# will be worth it.  You can always run this script without the battery
# connected -- just run it with an "ac_via_smapi" argument to disable 
# toggling from the ac to battery power.
#
# 2) Please READ THIS SCRIPT BEFORE RUNNING IT.  It was very much designed for my
# personal system, and although it worked very well for my needs, it relies 
# heavily on a number of external programs for full functionality.  Finding these
# programs isn't so bad (with the exception of MPrime all were available as 
# Debian packages -- spew, gorge, curl, etc.).  As I noted above, I've tried to 
# structure this script such that it can be extended (as opposed to overwritten) 
# to support other functionality.  However, you should also read this script 
# entirely because it's not mature, so it's difficult for me to document all the 
# strange ways in which it might behave under various circumstances.
#
# 3) This script might drain your battery completely.  It has some strong measures
# to prevent that from happening, but I can't make guarantees.   
#
# 4) Be mindful that upon breaking out of this script, your system may not be
# in an agreeable state.  There is a bash trap that performs a lot of cleanup 
# if you exit with a Ctrl-C.  But I didn't make the code to revert the CD's speed, 
# the wireless device's original txpower, the display's brightness, etc.  Also, the 
# bash trap isn't perfect, and might fail to restore the system.
#
set -e  # Script designed to bail out on any irregularities.
##############################################
# SCRIPT GLOBALS                             #
#  (may need some adjusting for your system) #
##############################################
MPRIME_BIN="./gimps/mprime" # MPrime binary location (get from
                            #   http://www.mersenne.org/freesoft.htm)
AGGRESSIVE_SLEEP_SEC=90     # Seconds for "aggressive" testing interval when 
                            #   testing with a fixed frequency
NONAGGRESSIVE_SLEEP_SEC=120 # Seconds for non-"aggressive" testing interval
                            #   when testing with a fixed frequency
FREQ_CYCLE_SLEEP_SEC=15     # Seconds for each random frequency when testing
                            #   with a fixed aggression
FREQ_CYCLE_NUM=15           # Number of random frequencies to cycle through 
                            #   when testing with a fixed aggression
CAPACITY_LIMIT=50           # Minimum mWh required in battery before the script
                            #   takes time out to recharge the battery
SECONDS_TO_CHARGE=300       # Seconds to charge if $CAPACITY_LIMIT is reached
WIFI_DEVICE=eth1            # Set to garbage if you don't want to use wifi 
MAX_TXPOWER=20              # Tx power (dBm) used for wifi device in aggressive
                            #   mode (off in non-aggressive mode)
CDROM_DEV_FILE=/dev/hdc     # Set to garbage if you don't want to use the CD-ROM
MAX_CD_SPEED=24             # Speed of CD in aggressive mode (off in
                            #   non-aggressive mode)
# Some services need to be stopped to prevent a conflict with
# aggressive/non-aggressive mode settings.  These services are restarted in
# reverse order upon the script's exit.  You can customize the path to these
# scripts here if your flavor of GNU doesn't use /etc/init.d/.
#
SERVICES_TO_STOP="tpsmapi powernowd acpid sleepd laptop-mode"
PATH_TO_SERVICES_SCRIPTS="/etc/init.d"
# Some info that should be in SysFS or ProcFS.
#
SYS_CPU_DIR=/sys/devices/system/cpu/cpu0/cpufreq
FREQS="$(cat $SYS_CPU_DIR/scaling_available_frequencies)"
FREQS_ARRAY=($FREQS)
SYS_TPSMAPI_BAT_DIR=/sys/devices/platform/smapi/BAT0
IBM_ACPI_BRIGHTNESS_FILE=/proc/acpi/ibm/brightness
RF_KILL_FILE=/sys/class/net/$WIFI_DEVICE/device/rf_kill
############
# BINARIES #
############
#
# Establishes paths for all binaries to make it easier for functions to test if
# they are executable with 'test -x "$BINARY_BIN"'.  
#
{
  CURL_BIN=$(which curl)
  GORGE_BIN=$(which gorge)
  STRESS_BIN=$(which stress)
  IWCONFIG_BIN=$(which iwconfig)
  IFUP_BIN=$(which ifup)
  IFDOWN_BIN=$(which ifdown)
  EJECT_BIN=$(which eject)
  HDPARM_BIN=$(which hdparm)     # fallback for setting the CD speed
  CPUFREQSET_BIN=$(which cpufreq-set)
  KILLALL_BIN=$(which killall)
  PKILL_BIN=$(which pkill)       # needed by toggle_intel_wireless
  RENICE_BIN=$(which renice)
} || true
#############
# FUNCTIONS #
############# 
# clean_up()
#
# Kills mprime background job and starts services that were stopped at the
# beginning of the script's execution.
#
if [ ! -x "$KILLALL_BIN" ]
  then echo "Sorry, this script uses killall" ; exit 1
fi
for service in $SERVICES_TO_STOP ; do
  if [ ! -x "$PATH_TO_SERVICES_SCRIPTS/$service" ]
    then echo "$PATH_TO_SERVICES_SCRIPTS/$service can't be called." ; exit 1
  fi
done
clean_up()
{
  $KILLALL_BIN -q mprime || true
  if [ "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
  local SERVICES_TO_START=""
  for service in $SERVICES_TO_STOP
    do SERVICES_TO_START="$service $SERVICES_TO_START"
  done
  for service in $SERVICES_TO_START
    do $PATH_TO_SERVICES_SCRIPTS/$service start
  done
}
trap "echo 'cleaning up...' ; clean_up ; exit 1" SIGINT SIGTERM SIGHUP
# do_sleep()
#
# Before starting a testing interval, checks if the battery is low, and charges the
# battery if necessary.  After the testing interval, the running status of the 
# mprime background job is verified. 
#
# TODO: I've not addressed multiple batteries, APM, or ACPI.
#
if [ ! -r "$SYS_TPSMAPI_BAT_DIR/remaining_capacity" ] 
  then 
    echo -n "WARNING: Thinkpad SMAPI SysFS interface not " > /dev/stderr
    echo "available to detect if battery" > /dev/stderr
    echo -n "         level too low.  This script could drain " > /dev/stderr
    echo "all of your battery." > /dev/stderr
fi
do_sleep()
{
  if [ -r "$SYS_TPSMAPI_BAT_DIR/remaining_capacity" ] ; then
    local REMAINING_CAPACITY
    while REMAINING_CAPACITY=$(cat $SYS_TPSMAPI_BAT_DIR/remaining_capacity \
                                2> /dev/null) \
      && REMAINING_CAPACITY=${REMAINING_CAPACITY%% *} \
      && [ "$REMAINING_CAPACITY" ] \
      && [ "$REMAINING_CAPACITY" -lt "$CAPACITY_LIMIT" ] ; do
        echo ; echo -n "Battery is too low to continue, " 
        echo "taking a break to charge up."
        OLD_AGGRESSIVE="$AGGRESSIVE"
        if [ "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
        sleep $SECONDS_TO_CHARGE 
        if [ ! "$OLD_AGGRESSIVE" = "$AGGRESSIVE" ] ; then toggle_aggression ; fi
    done
  fi
  sleep $1
  if kill -0 $MPRIME_PID 2> /dev/null 
    then return 0
    else 
      echo ; echo "mprime bailed out here!"
      clean_up
      exit 1
  fi
}
# set_frequency()
#
# Changes the frequency of the processor to $1.
#
# TODO: Perhaps there should be other ways to change the frequency.
#       I found cpufreq-set convenient because it handles both ProcFS _and_
#       SysFS.
#
if [ ! -x "$CPUFREQSET_BIN" ] ; then
  echo "Sorry, the set_frequency() function needs to be updated" > /dev/stderr
  echo "    to change frequencies without cpufreq-set." > /dev/stderr
  exit 1
fi
set_frequency()
{
  $CPUFREQSET_BIN -f $1
}
# toggle_ac_via_smapi()
#
# If the system is a Thinkpad with the tp_smapi kernel module set up, the 
# AC power is cut in aggressive mode and restored in non-aggressive mode. 
#
if [ -w "$SYS_TPSMAPI_BAT_DIR/force_discharge" \
  -a -w "$SYS_TPSMAPI_BAT_DIR/inhibit_charge_minutes" ]
    then AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_ac_via_smapi"
fi
toggle_ac_via_smapi()
{
  if [ "$AGGRESSIVE" = "true" ]
    then
      echo 0 > $SYS_TPSMAPI_BAT_DIR/force_discharge 
      echo 0 > $SYS_TPSMAPI_BAT_DIR/inhibit_charge_minutes
    else 
      echo 1 > $SYS_TPSMAPI_BAT_DIR/force_discharge 
      echo 5 > $SYS_TPSMAPI_BAT_DIR/inhibit_charge_minutes
  fi
}
# toggle_ibm_acpi_brightness()
#
# If the Thinkpad ibm_acpi kernel module is set up, the brightness of screen
# is set to the brightest setting in aggressive mode and the dimmest setting
# otherwise.
#
if [ -w "$IBM_ACPI_BRIGHTNESS_FILE" ]
    then AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_ibm_acpi_brightness"
fi
toggle_ibm_acpi_brightness()
{
  if [ "$AGGRESSIVE" = "true" ]
    then echo level 0 > $IBM_ACPI_BRIGHTNESS_FILE
    else echo level 7 > $IBM_ACPI_BRIGHTNESS_FILE
  fi
}
# toggle_intel_wireless()
#
# Turns the wireless device on in power-hogging mode when aggressive, and
# turns the device off otherwise.
#
# NOTE: Designed for the Intel 2200BG open source driver, and may not be 
#   compatible with much else.  
#
if [ -w "$RF_KILL_FILE" -a -x "$PKILL_BIN" -a -x "$IFDOWN_BIN" \
  -a -x "$IFUP_BIN" -a -x "$IWCONFIG_BIN" -a "$WIFI_DEVICE" ] \
    && grep "$WIFI_DEVICE" /proc/net/wireless > /dev/null
      then 
        AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_intel_wireless"
        $IWCONFIG_BIN $WIFI_DEVICE txpower $MAX_TXPOWER
        $IWCONFIG_BIN $WIFI_DEVICE power off
fi
toggle_intel_wireless()
{
  if [ "$AGGRESSIVE" = "true" ]
    then echo 1 > $RF_KILL_FILE
    else 
      echo 0 > $RF_KILL_FILE
      $PKILL_BIN '^ifdown$|^ifup$' || true
      $IFDOWN_BIN $WIFI_DEVICE 2> /dev/null || true
      $IFUP_BIN $WIFI_DEVICE 2> /dev/null
      local NUM_OF_TRIES=0
      while $IWCONFIG_BIN $WIFI_DEVICE | grep unassociated > /dev/null \
          && [ "$NUM_OF_TRIES" -lt 15 ]
        do sleep 3
        NUM_OF_TRIES=$(($NUM_OF_TRIES + 1))
      done
  fi
}
# toggle_gorge()
#
# In an aggressive mode, reads data from the CD-ROM at random offsets using the 
# 'gorge' command (http://spew.berlios.de/).
#
# NOTE: Don't use a DVD, as the speed set by `eject' doesn't affect DVDs.
#
# NOTE: Make sure to use a CD with more than 450MB of data.
#
if [ -x "$GORGE_BIN" -a -x "$KILLALL_BIN" -a -r "$CDROM_DEV_FILE" ]
  then AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_gorge"
fi
toggle_gorge()
{
  if [ "$AGGRESSIVE" = "true" ]
    then $KILLALL_BIN -q $GORGE_BIN || true
    else 
      $GORGE_BIN -r 450M $CDROM_DEV_FILE 2> /dev/null &
      local GORGE_PID=$!
      #
      # My laptop needed a little priority push to get gorge CD reading started
      # in sync with the interval.
      #
      if [ -x "$RENICE_BIN" ]
        then $RENICE_BIN -2 -p $GORGE_PID > /dev/null
      fi
  fi
}
# toggle_stress()
#
# Runs the `stress' program (http://weather.ou.edu/~apw/projects/stress/) in 
# the aggressive mode with settings to issue a large number of write(), 
# unlink(), and sync() events.
#
if [ -x "$STRESS_BIN" -a -x "$KILLALL_BIN" ]
  then AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_stress"
fi
toggle_stress()
{
  if [ "$AGGRESSIVE" = "true" ]
    then $KILLALL_BIN -q $STRESS_BIN || true
    else $STRESS_BIN -q -i 1 -d 1 &
  fi
}
# toggle_curl()
#
# Downloads a file (to drain power through the wireless device) in the
# aggressive mode using `curl'.
#
if [ -x "$CURL_BIN" -a -x "$KILLALL_BIN" ]
  then AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_curl"
fi
toggle_curl()
{
  URL_FIRST_HALF="http://cdimage.debian.org/cdimage/weekly-builds/"
  URL_SECOND_HALF="i386/iso-cd/debian-testing-i386-binary-1.iso"
  if [ "$AGGRESSIVE" = "true" ]
    then $KILLALL_BIN -q $CURL_BIN || true
    else $CURL_BIN $URL_FIRST_HALF$URL_SECOND_HALF > /dev/null 2> /dev/null &
  fi
}
# toggle_aggression()
#
# Runs all the "toggle_" functions supported by the system unless specified
# as disabled in the script arguments.
#
for toggle_to_disable in "$@" 
  do AGGRESSIVE_TOGGLES=$(echo $AGGRESSIVE_TOGGLES \
                            | sed -e "s/toggle_$toggle_to_disable//")
done
toggle_aggression()
{ 
  for toggle in $AGGRESSIVE_TOGGLES ; do $toggle ; done
  if [ "$AGGRESSIVE" = "true" ]
    then AGGRESSIVE="false"
    else AGGRESSIVE="true"
  fi
}
#########
# SETUP #
#########
# Stopping services that might interfere with the system state this script
# controls (precondition satisfied in definition of clean_up).
#
for service in $SERVICES_TO_STOP
  do $PATH_TO_SERVICES_SCRIPTS/$service stop
done
# Setting CD to a fast speed 
#
if [ -r "$CDROM_DEV_FILE" ] ; then
  if [ -x "$EJECT_BIN" ] 
    then $EJECT_BIN -x $MAX_CD_SPEED $CDROM_DEV_FILE
  elif [ -x "$HDPARM_BIN" ]
    then $HDPARM_BIN -E $MAX_CD_SPEED $CDROM_DEV_FILE
  fi 
fi 
# Starting the prime number search
#
if [ ! -x "$MPRIME_BIN" ] ; then 
  echo "mprime program not executable/found." > /dev/stderr
  exit 1
fi  
$MPRIME_BIN -t > mprime_output.txt &
MPRIME_PID=$!
########
# BODY #
########
while true ; do
  for f in $FREQS ; do
    echo "Cycling aggression twice for ${f}kHz: "
    set_frequency $f
    if [ ! "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
    for i in 1 2 ; do
      echo "    high " ; do_sleep $AGGRESSIVE_SLEEP_SEC ; toggle_aggression
      echo "    low " ; do_sleep $NONAGGRESSIVE_SLEEP_SEC ; toggle_aggression
    done
    echo 
    for i in 1 2 ; do
      if [ $i -eq 1 ] 
        then
          if [ ! "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
          echo "Random freqs under high aggression: "
        else
          if [ "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
          echo "Random freqs under low aggression: "
      fi 
      for (( j=1 ; j<=$FREQ_CYCLE_NUM ; j+=1 )) ; do
        FREQ=${FREQS_ARRAY[$(($RANDOM % ${#FREQS_ARRAY[@]}))]}
        echo "    ${FREQ}..."
        set_frequency $FREQ
        do_sleep $FREQ_CYCLE_SLEEP_SEC
      done
      echo
    done
  done
done
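The header comments of the script suggest extending it by adding further "toggle_" functions. The following is a minimal, self-contained sketch of that registration pattern, not part of the original script. The names toggle_example and STATE_FILE are hypothetical and exist only for illustration; a temporary file stands in for a real device control file.

```shell
#!/bin/bash
# Sketch of the toggle registration pattern used by the stress test script.
# toggle_example and STATE_FILE are hypothetical, for illustration only.

STATE_FILE=$(mktemp)       # stands in for a real device control file
AGGRESSIVE="false"
AGGRESSIVE_TOGGLES=""

# Register the toggle only if the "feature" looks usable on this system.
if [ -w "$STATE_FILE" ]
  then AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_example"
fi
toggle_example()
{
  # When called, $AGGRESSIVE still holds the mode that is being left.
  if [ "$AGGRESSIVE" = "true" ]
    then echo off > "$STATE_FILE"
    else echo on > "$STATE_FILE"
  fi
}

# Drop any toggle named in the script arguments, e.g. "example".
for toggle_to_disable in "$@"
  do AGGRESSIVE_TOGGLES=$(echo $AGGRESSIVE_TOGGLES \
                            | sed -e "s/toggle_$toggle_to_disable//")
done

toggle_aggression()
{
  for toggle in $AGGRESSIVE_TOGGLES ; do $toggle ; done
  if [ "$AGGRESSIVE" = "true" ]
    then AGGRESSIVE="false"
    else AGGRESSIVE="true"
  fi
}

toggle_aggression ; FIRST_STATE=$(cat "$STATE_FILE")    # enter aggressive mode
toggle_aggression ; SECOND_STATE=$(cat "$STATE_FILE")   # leave it again
echo "$FIRST_STATE $SECOND_STATE"
rm -f "$STATE_FILE"
```

Note that each toggle function runs before the $AGGRESSIVE flag is flipped, so a branch guarded by "$AGGRESSIVE" = "true" is the one that switches the feature off. A real toggle would replace the writes to STATE_FILE with whatever turns the feature on or off.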
