Problem with e1000: EEPROM Checksum Is Not Valid

From ThinkWiki

Revision as of 18:16, 25 February 2008

Problem Description

On certain ThinkPads, the e1000 driver for the Intel Gigabit Ethernet controller fails to load with the following error message in /var/log/messages:

e1000: 0000:02:00.0: e1000_probe: The EEPROM Checksum Is Not Valid
e1000: probe of 0000:02:00.0 failed with error -5 

The problem is caused by a power-saving feature that interferes with normal operation: the first bytes read from the EEPROM come back corrupt, which results in a random or invalid MAC address (no other data is corrupted). The EEPROM checksum test catches this, and the driver refuses to load.
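To see why a corrupt first read is fatal, it helps to know what the checksum test does. In the 2.6.2x e1000 driver, e1000_validate_eeprom_checksum() (the function called in the 2.6.22 patch further down this page) adds up EEPROM words 0x00 through 0x3F as 16-bit values and accepts the EEPROM only if the sum is 0xBABA. The MAC address lives in the first three words, so garbage there breaks the sum. The sketch below reproduces that check from userspace on a raw EEPROM dump read from stdin. It is only an illustration: the function name eeprom_checksum_ok is hypothetical, and it assumes a little-endian host (od reads words in host byte order, which is the case on ThinkPads).

```shell
#!/bin/sh
# Hypothetical helper: re-check an e1000-style EEPROM checksum on a raw
# dump read from stdin. Words 0x00..0x3F (the first 128 bytes) must add
# up to 0xBABA modulo 2^16. Assumes a little-endian host for od -tu2.
eeprom_checksum_ok() {
    sum=0
    # -N128: only the first 64 16-bit words take part in the checksum
    for w in $(od -An -v -tu2 -N128); do
        sum=$(( (sum + w) & 0xFFFF ))
    done
    printf 'EEPROM word sum: 0x%04X (expected 0xBABA)\n' "$sum"
    [ "$sum" -eq $((0xBABA)) ]
}
```

A dump can only be taken while the driver is bound to the card, for example after loading it with the eeprom_bad_csum_allow=1 option described under "Via module parameter": ethtool -e eth0 raw on | eeprom_checksum_ok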

Solutions

Plug in an Ethernet cable, then reload the e1000 module (repeatedly if necessary) so that the hardware has a chance to detect a link.

Use e1000e (Kernel Patch)

Auke Kok published two patches in October 2007 that help solve both the "corrupted" EEPROM read and the bad latency. One patch moves many network cards over to the e1000e module (e1000 for PCI Express). The other disables some PCIe power-management features that caused the bad EEPROM read and some stability issues.

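[PATCH 3/4] e1000/e1000e: Move PCI-Express device IDs over to e1000e (http://kerneltrap.org/mailarchive/linux-netdev/2007/10/31/374579)
[PATCH 2/4] e1000e: Disable L1 ASPM power savings for 82573 mobile variants (http://kerneltrap.org/mailarchive/linux-netdev/2007/10/31/374573)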

Refer to LinuxHQ (http://www.linuxhq.com/patch-howto.html) for how to apply kernel patches.
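As a rough sketch of the usual procedure, the helper below first does a dry run to confirm that a patch applies cleanly to your kernel source tree and only then applies it for real. The function name apply_kernel_patch is hypothetical, and the sketch assumes GNU patch (for --dry-run) and a patch whose paths carry one leading directory, as produced by "diff -urN linux.orig/ linux/".

```shell
#!/bin/sh
# Hypothetical helper: dry-run a patch against a kernel tree, then apply it.
# Assumes GNU patch and paths with one leading directory (hence -p1).
apply_kernel_patch() {
    tree=$1
    patchfile=$2
    # -d enters the tree before patching; the patch file is read from stdin,
    # so a relative patch path is resolved from the current directory.
    patch -d "$tree" -p1 --dry-run < "$patchfile" &&
        patch -d "$tree" -p1 < "$patchfile"
}
```

For example: apply_kernel_patch /usr/src/linux ~/e1000e-device-ids.patch (both paths are placeholders), then rebuild and install the kernel as described in the LinuxHQ guide.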

From Lenovo

Lenovo provides a script that uses the ethtool command to update the card's settings. It is described as being for SLED 10, but the Linux distribution should not really matter. For some users none of the circumventions listed below helps, but this script does.

Via module parameter

Recent kernels (at least 2.6.22, possibly also 2.6.21) provide a module option that makes the e1000 driver ignore the error.

Load the module like this:

modprobe e1000 eeprom_bad_csum_allow=1

You can also set the parameter permanently through a file in /etc/modprobe.d or, on Debian/Ubuntu, by appending e1000.eeprom_bad_csum_allow=1 to the kernel command line in your bootloader.
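For the modprobe.d route, the parameter goes into an "options" line. The helper below is a hypothetical convenience wrapper (its name and the default file name e1000.conf are assumptions; older module-init-tools read any file in /etc/modprobe.d, newer versions only *.conf files). It appends the line only if it is not already there, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper: persist the e1000 option in a modprobe config file.
# Default path is an assumption; pass another file as $1 if needed.
persist_e1000_option() {
    conf=${1:-/etc/modprobe.d/e1000.conf}
    line='options e1000 eeprom_bad_csum_allow=1'
    # -qxF: quiet, whole-line, fixed-string match; append only if missing
    grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
}
```

Run it as root and then reload the driver: persist_e1000_option && modprobe -r e1000 && modprobe e1000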

From Mat's Blog

The fundamental solution is explained on Mat's Blog, which directs the reader to Intel's site to download PROBOOT.EXE. Extract the files from PROBOOT.EXE onto a bootable DOS device, boot from it, and run the command IBAUTIL -DEFCFG.

---

After running IBAUTIL -DEFCFG, your MAC address may have changed. You can then restore the real one with EEUPDATE.EXE, a DOS binary that you put on a bootable CD, for example: EEUPDATE /NIC=1 /MAC=XX:XX:XX:XX:XX:XX. After that, Linux no longer complains about an "invalid MAC address", and there is no need to use MACAddressChanger on Windows.

EEUPDATE.EXE can be found at: ftp://ftp.extensa.ru/Drivers/DESKTOPS/Ver_x900/BIOS_XP/AMT/

---

Circumvention

  • Upgrade your BIOS

Lenovo has published newer BIOS revisions that appear to fix the issue for some users. The BIOS upgrade turns off "Deep smart power down", which is known to cause problems at initialization time. The driver can re-enable the feature later if you want it; it works correctly at that point.

  • Insert a cable

Plugging in a network cable with an active link bypasses the problem.

  • Take the checksum twice

This bug report describes a fix: take the checksum twice. The first check reports a bad checksum and the second one succeeds (the problem seems to be triggered by a power-saving feature). This requires a change to the driver source and a kernel rebuild, but it is much better than a previous "fix" published here that disabled checksum checking entirely.

I updated the patch from the bug report for 2.6.22:

diff -urN linux-2.6.22-suspend2-r1.orig/drivers/net/e1000/e1000_main.c linux-2.6.22-suspend2-r1/drivers/net/e1000/e1000_main.c
--- linux-2.6.22-suspend2-r1.orig/drivers/net/e1000/e1000_main.c        2007-08-17 23:32:04.000000000 +0200
+++ linux-2.6.22-suspend2-r1/drivers/net/e1000/e1000_main.c     2007-09-05 16:39:11.000000000 +0200
@@ -999,16 +999,18 @@
                goto err_eeprom;
        }
 
-       /* before reading the EEPROM, reset the controller to
-        * put the device in a known good starting state */
-
-       e1000_reset_hw(&adapter->hw);
-
-       /* make sure the EEPROM is good */
-
        if (e1000_validate_eeprom_checksum(&adapter->hw) < 0) {
-               DPRINTK(PROBE, ERR, "The EEPROM Checksum Is Not Valid\n");
-               goto err_eeprom;
+               /* before reading the EEPROM, reset the controller to
+                * put the device in a known good starting state */
+               
+               e1000_reset_hw(&adapter->hw);
+
+               /* make sure the EEPROM is good */
+
+               if (e1000_validate_eeprom_checksum(&adapter->hw) < 0) {
+                       DPRINTK(PROBE, ERR, "The EEPROM Checksum Is Not Valid\n");
+                       goto err_eeprom;
+               }
        }
 
        /* copy the MAC address out of the EEPROM */

Ra 15:28, 5 September 2007 (UTC)


  • Remove/add kernel module

Removing and re-adding the kernel module is a possible workaround. As root, run:

# modprobe -r e1000
# modprobe e1000

Sometimes the commands have to be run twice before eth0 becomes usable. On some X60s this does not work at all.
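Since the remove/re-add cycle sometimes needs repeating, it can be retried automatically. The sketch below is hypothetical: the function name reload_e1000, the default interface eth0, the retry count, the two-second pause and the SYSFS_NET override (there only to make testing possible) are all assumptions. It treats the interface appearing under /sys/class/net as success, which means the driver bound to the card, not necessarily that a link is up.

```shell
#!/bin/sh
# Hypothetical helper: retry "modprobe -r e1000; modprobe e1000" until the
# interface shows up in sysfs. Run as root. All defaults are assumptions.
reload_e1000() {
    iface=${1:-eth0}
    tries=${2:-3}
    netdir=${SYSFS_NET:-/sys/class/net}   # overridable for testing only
    i=1
    while [ "$i" -le "$tries" ]; do
        modprobe -r e1000
        modprobe e1000
        sleep 2                            # give the PCI probe time to finish
        if [ -e "$netdir/$iface" ]; then
            echo "$iface present after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
    done
    echo "$iface still missing after $tries attempts" >&2
    return 1
}
```

For example, as root: reload_e1000 eth0 5. If it still fails after several attempts, one of the other circumventions on this page is probably needed.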

  • Disabling and re-enabling the NIC in the BIOS

For some users this fixed the issue permanently; for others it helped only temporarily.

  • Hacking the kernel to carry on even if the checksum is not valid

Although this is a very ugly hack, it works fine for me. Search drivers/net/e1000/e1000_main.c for the line containing the error message, then comment out the two lines after it, which set the error state and jump to the error-handling code. It has not damaged my hardware, but consider yourself warned.

See also