Connecting ESX to SAN: PCI Device resource allocation failure

Recently I installed VMware ESX on three identical IBM x3650 servers and fitted and connected two QLogic HBAs in each of them.

After installing ESX I ran into a problem with one of the machines when it tried to connect to the datastores. Two of the machines could see all presented LUNs perfectly after a rescan. One machine, however, did not even see an HBA: no WWNs were available. At first I thought the storage guy had made a mistake in presenting the LUNs, so I called him right away, but he assured me that the faulty machine was targeted. So the investigation continued…

Some quick searches on Google got me no further than a pointer to the HBA configuration utility, which is initialized at boot time, to check whether something was wrong with the HBA.

While rebooting the host I looked for the QLogic banner, but it did not appear. Instead of the banner I got an error message:

PCI device resource allocation error

[Screenshot: the error message shown at boot]

Next, when the host booted to the console screen, another error turned up:

[Screenshot: the error on the console screen]

When I saw these errors I was not happy, assuming I had a hardware failure.

When I googled the specific error message I got only a few hits. One of them (from an IBM developer site, I believe) stated that there is a limited amount of memory available for loading the ROM BIOS code of add-in devices, and that if you have a lot of devices, such as RAID controllers, HBAs and network controllers, you may reach a point where a device's ROM BIOS will not load. The author also suggested turning off unneeded ROM BIOS apps and PXE boot options.

Because I do indeed have up to four NICs AND a RAID controller AND two HBAs, I went into the BIOS and turned off the PXE boot option of the onboard NICs:

[Screenshots: the BIOS settings used to disable PXE boot on the onboard NICs]

After PXE was disabled on the onboard NICs, the system apparently had enough resources to load the QLogic firmware. ESX was also able to load the driver and see the presented LUNs, and the host booted normally, just like the others.
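
To make the explanation above a bit more concrete, here is a minimal Python sketch of the idea: a fixed-size memory window that device ROMs are loaded into one after another, until it is full. It is only an illustration. The 128 KiB window, the scan order and every ROM size are assumptions I made up for the example, not values measured on an x3650, and `load_option_roms` and `make_devices` are hypothetical helper names.

```python
# Illustrative model only. The 128 KiB window and every ROM size below are
# assumptions made up for this sketch, not values read from a real server.
OPTION_ROM_WINDOW_KIB = 128


def load_option_roms(devices, window_kib=OPTION_ROM_WINDOW_KIB):
    """Try to load each enabled ROM in scan order.

    devices is a list of (name, rom_size_kib, rom_enabled) tuples.
    Returns (loaded, failed) as two lists of device names.
    """
    free_kib = window_kib
    loaded, failed = [], []
    for name, rom_kib, rom_enabled in devices:
        if not rom_enabled:
            continue  # ROM disabled in the BIOS: it consumes no space
        if rom_kib <= free_kib:
            free_kib -= rom_kib
            loaded.append(name)
        else:
            failed.append(name)  # not enough room left for this ROM
    return loaded, failed


def make_devices(pxe_enabled):
    """Hypothetical device list, in an assumed scan order."""
    return [
        ("RAID controller", 48, True),
        ("Onboard NIC 1 (PXE)", 32, pxe_enabled),
        ("Onboard NIC 2 (PXE)", 32, pxe_enabled),
        ("QLogic HBA 1", 24, True),
        ("QLogic HBA 2", 24, True),
    ]


if __name__ == "__main__":
    for pxe_enabled in (True, False):
        loaded, failed = load_option_roms(make_devices(pxe_enabled))
        print(f"PXE enabled={pxe_enabled}: loaded={loaded} failed={failed}")
```

With these made-up numbers, both HBA ROMs fail to load while PXE is enabled, and both fit once the two PXE ROMs are switched off. Because the model loads ROMs in scan order, changing that order also changes which device ends up without space.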

6 comments

  1. Dennis Agterberg

    Great good-to-know article. Since you were installing 3 servers I assume they all had the same hardware, number of NICs, HBAs etc. Any idea as to why only one of the servers experienced the problem?

  2. Gerben

    Hi Dennis, thanks for your response. Your assumption was right, they all have the same hardware, the same number of NICs etc.
    I don’t know for sure, but I can imagine that the PCI slot order also makes a difference. It may well be that on the other two servers the PXE devices are last in line and are the ones that fail to load, which goes unnoticed because they are not used.

  3. Dennis Agterberg

    Hi Gerben, thanks for the explanation. That could well be the case. A bit of an unwanted situation then, because those ESX servers are not all the same. When I deploy identical servers (be it ESX or just some Windows servers) I make sure everything is the same, even the placement of cards in PCI slots. Anyway, a good thing you ran into this, good to know.

  4. Amar

    Hi,

    I added two NICs to my x3650 server and got the same error. I followed your steps and it works fine.

    Thank you so much

  5. Parias

    Interesting, I never would’ve thought to look for this. I hit a similar problem with an x3550. Thanks for publishing this!
