iSCSI woes

Justin, Randy, and another chap from Dell really helped with this, and now there is finally a solid connection.
There is an issue with multipath, but other than that, all is good.
--------
called again...

----------
Next steps: see if Dell can lend a hand.
---------------------------------------
Unsure: Ubuntu sees 500 GB drives, per:
dmesg | grep sd

but they ought to be bigger.
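A quick way to double-check the size the kernel reports for that disk (a sketch; /dev/sdb is assumed to be the iSCSI device):
lsblk /dev/sdb          # prints the block device size
sudo fdisk -l /dev/sdb  # should show the same capacity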

sudo fdisk /dev/sdb
n        (create a new partition)
p        (make it a primary partition)
enter    (accept the defaults for partition number and first/last sectors)
w        (write the partition table and exit)
The keystrokes above are entered inside the fdisk utility; see man fdisk for more detailed instructions. The cfdisk utility is sometimes more user-friendly.

Now create a file system on the new partition and mount it at /srv, as an example:

sudo mkfs.ext4 /dev/sdb1
sudo mount /dev/sdb1 /srv
Finally, add an entry to /etc/fstab to mount the iSCSI drive during boot:

/dev/sdb1 /srv ext4 defaults,auto,_netdev 0 0
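To check the fstab entry without waiting for a reboot, roughly:
sudo umount /srv
sudo mount -a     # mounts everything in /etc/fstab; an error here means the entry is wrong
df -h /srv        # confirm /dev/sdb1 is mounted on /srv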

=-------------

=============
Now the management port is connected directly to the server, at the same IP, and it is talking.
--------------------
Management port is on the switch.

Two iSCSI ports are connected: one directly into gar; the other, unsure.
==================================
The valid port was on 132.101.
Once eth1 was set to the same subnet, we were able to connect.
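For the record, putting eth1 on the array's subnet was roughly the following (the addresses are placeholders, not the real ones):
sudo ip addr add 192.0.2.20/24 dev eth1   # placeholder host address on the array's subnet
sudo ip link set eth1 up
ping -c 3 192.0.2.101                     # placeholder for the array port that answered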

It was suggested not to use a direct connection.

While managing the array, the connection was lost.

===================================
Let's try some more with Dell.
init name 1LMSGX1
InitiatorName=iqn.1993-08.org.debian:01:765faf945068
====================================
Try again today -- we need to let the PowerVault know about the initiator.
============
May need to tell the MD PowerVault about the initiator.
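To see the IQN that the PowerVault needs to be told about (a sketch; this is the standard open-iscsi location):
sudo cat /etc/iscsi/initiatorname.iscsi   # should print the InitiatorName= line noted above
That value then has to be added as a host/initiator on the array side, e.g. via the Dell MDSM utility.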

Tobias Kreidl CTP MEMBER
Posted 29 January 2014 - 06:37 PM
You should look at the IQN in that file and see if it matches what hosts you are allowed to connect from on your array (using the Dell MDSM utility).
-=Tobias

James Wheatley MEMBERS
#5

24 posts
Posted 29 January 2014 - 06:58 PM
Yeah, they both match the IQN that I have added under the Hosts on the Dell SAN:

Non-working:
# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2014-01.com.glldnvm2:198183cb
InitiatorAlias=glldnvm2

Working:
# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2013-03.com.glldnvm1:9c76ae97
InitiatorAlias=glldnvm1

Tobias Kreidl CTP MEMBER
#6

16,560 posts
Posted 29 January 2014 - 07:12 PM
Did you restart the iSCSI daemons on the host since the change? I assume you can at least ping the storage array interface, right?
-=Tobias
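(For reference, a rough sketch of that check on the host; the service name is the open-iscsi init script mentioned later in the thread:)
service open-iscsi restart    # restart the iSCSI initiator daemon
ping -c 3 192.168.20.201      # the controller IP for the problem host, per the later posts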

James Wheatley MEMBERS
#7

24 posts
Posted 29 January 2014 - 08:05 PM
Both daemons restarted, and I have just seen that pinging from VM2 is not connecting, which is strange as it is a direct network cable. I have checked all the IP settings and all are good; any ideas why I would not be able to ping the SAN?

Alan Lantz MEMBERS
#8

5,633 posts
Posted 29 January 2014 - 08:28 PM
Did you miss adding the storage management interfaces to the server after you joined the pool?

Alan Lantz
SysAdmin
City of Rogers, AR

James Wheatley MEMBERS
#9

24 posts
Posted 30 January 2014 - 08:26 AM
Both management interfaces were added just after joining to the pool:

Working server (can ping all):
192.168.10.10 (server) ---> 192.168.10.201 (SAN RAID Controller 1)
192.168.40.10 (Server) ---> 192.168.40.210 (SAN RAID Controller 2)

Non-working server (cannot ping):
192.168.20.10 (server) ---> 192.168.20.201 (SAN RAID Controller 1)
192.168.30.10 (server) ---> 192.168.30.210 (SAN RAID Controller 2)

James Wheatley MEMBERS
#10

24 posts
Posted 30 January 2014 - 09:48 AM
OK, I have now managed to get the two connections to ping; it turned out to be two bad cables!

I am still unable to repair the SR connections, though; I get the same error: "Logging in to the iSCSI target failed. Check username and password."

I have restarted the problem server but am unable to restart the Master until next week due to timings; would this make a difference?

James Wheatley MEMBERS
#11

24 posts
Posted 30 January 2014 - 09:57 AM
Discovery works as expected on the new Host:

# iscsiadm -m discovery -t st -p 192.168.20.201
192.168.10.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.10.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.40.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.30.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.40.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.20.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.30.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.20.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2

# iscsiadm -m discovery -t st -p 192.168.30.210
192.168.10.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.10.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.40.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.30.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.40.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.20.201:3260,1 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.30.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2
192.168.20.210:3260,2 iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2

James Wheatley MEMBERS
#12

24 posts
Posted 03 February 2014 - 09:13 AM
Any ideas as what could be the cause?

Everything from the command line seems to work fine; I am just unable to repair from XenCenter, and if I try to add a new SR using the new server IPs I get "Unable to connect to iSCSI service on target".

# iscsiadm --mode node --targetname iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2 --portal 192.168.20.201:3260 --login
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2, portal: 192.168.20.201,3260]
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2, portal: 192.168.20.201,3260]: successful
[root@GLLDNVM2 ~]# iscsiadm --mode node --targetname iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2 --portal 192.168.30.210:3260 --login
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2, portal: 192.168.30.210,3260]
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.690b11c0001923e00000000050f4eea2, portal: 192.168.30.210,3260]: successful

James Wheatley MEMBERS
#13

24 posts
Posted 19 March 2014 - 05:40 PM
OK, I can now get the SRs to connect, but only after I log in to the target from the command line. If I restart the server the drives do not connect until I run the login command and then repair the connections.

Anyone got any ideas on this?

James Cannon CITRIX EMPLOYEES
#14

4,402 posts
Posted 19 March 2014 - 10:08 PM
Hi James,

You could destroy the PBDs for the problem host.
xe pbd-list host-uuid= sr-uuid=
xe pbd-unplug uuid=
xe pbd-destroy uuid=

Once done, select "Repair" in XenCenter. One thing to note about XenCenter: once the storage is configured, there should be no changes made on the SAN that would affect the connection to the LUN. The PBD has the connection details the host needs to connect to the SAN.
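(A hypothetical run of that sequence, with placeholder UUIDs, just to show how the values chain together:)
# list the PBD joining this host to the SR (all UUIDs below are placeholders)
xe pbd-list host-uuid=<host-uuid> sr-uuid=<sr-uuid>
# unplug and destroy it using the uuid returned by the list command
xe pbd-unplug uuid=<pbd-uuid>
xe pbd-destroy uuid=<pbd-uuid>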

If the above does not work, we can play Linux admin and configure open-iscsi to be running when the host boots. To do that, we use the Red Hat chkconfig command:
chkconfig open-iscsi on

That will ensure the open-iscsi service is started when the host boots. Normally open-iscsi is started by the xapi service when needed.

Another thing we can play around with is /etc/rc.d/rc.local. We can add iscsiadm commands at the end of that file to run at host boot. I should note that any mods to dom0 will be lost on a host upgrade. It is a hack, but it may work.

The biggest issue I have seen is where a host has been added to the pool before the SAN was configured for access by that host. That sets us up on the wrong foot, unfortunately. :(
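(For reference, the rc.local hack described above would be roughly the lines below; the full path to iscsiadm is an assumption:)
# appended to /etc/rc.d/rc.local -- log in to all known iSCSI targets at boot
# (a workaround only; lost on host upgrade)
/sbin/iscsiadm -m node -L all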

James Wheatley MEMBERS
#15

24 posts
Posted 20 May 2014 - 02:17 PM
Hi James,

Thanks for the above but I'm afraid I have had no joy.

I even removed the problematic host from the pool and re-installed XenServer, making sure that the SAN was correctly set up to accept the new host before adding it to the pool. I was left with the exact same issue: unable to connect to the SAN unless I log in via the console first.

Would this point to an issue with how the pool is set up or with the SAN itself? Is there a procedure I missed when setting up the pool that could cause this issue?

Please note that the new Host can be removed and wiped at any time if you think there is something else I could try to get this to work.

James Cannon CITRIX EMPLOYEES
#16

4,402 posts
Posted 20 May 2014 - 04:26 PM
Reinstalling XenServer would only put the binaries back on the host. When it comes to XenServer, iSCSI is always an external dependency. On XenServer, the only thing that needs to be configured is an IP address on a NIC dedicated for storage (two IPs if you are using multipathing). If you can ping the SAN, then it is a matter of permission to access the LUN. You may want to compare the switch port settings that are used for storage and ensure they match the switch ports used by the other hosts.

I would double-check the SAN mappings. How are the other hosts allowed access? By IP range? IQN (iSCSI qualified name)? On some SANs you need to allow access to the LUN by more than one host (Dell EqualLogic); other SANs may require manual mapping of the LUN to each port on the SAN (EMC CLARiiON).

James Wheatley MEMBERS
#17

24 posts
Posted 23 May 2014 - 07:40 AM
All SAN mappings double checked and all are good. The hosts connect by direct connections (no switch involved) into a Dell PowerVault with 2 x 4 port cards, so 2 cables from each host split between the 2 controllers.

If I use the following commands I can get the host to connect to the SAN, but as soon as I restart the server it loses the connection. Does that mean that something is not running at startup?

iscsiadm -m node -L all
xe-toolstack-restart

James Cannon CITRIX EMPLOYEES
#18

4,402 posts
Posted 23 May 2014 - 04:58 PM
Hi James,

The PowerVault is an MD-series SAN? If yes, then you would have a preferred path for LUNs. With a direct connect, you may be directly connected to the controller that is not the preferred path. I would buy a cheap switch (if you do not have one at home or elsewhere) and then plug everything into that, for testing purposes. That way all hosts will be able to access the active controller (preferred path).

The commands you are running are just a workaround. I do not think it is a matter of something not running (you have the same software on all hosts); we would expect to see consistent behavior, unless you were messing around with chkconfig or another Linux utility and made file-level modifications, etc., which I doubt. Some people familiar with Linux do so and cause all kinds of mischief and grief when dom0 is treated like a standard Linux distro.

BEST ANSWER HELPFUL ANSWER
James Wheatley MEMBERS
#19

24 posts
Posted 29 July 2014 - 07:48 AM
Hi,

I can confirm that this is now fixed and the switch helped. Not sure what the underlying issue really was but my guess would be some traffic not getting through as intended unless I told it what to do and where to go?

Thanks for all your help.

----
It actually makes more sense for this appliance to be mounted on gar.
--installed iSCSI
--configured eth1
--connected the wire to eth1
--initiated discovery, but it failed (roughly the command sketched below)
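The discovery attempt was along these lines (the portal address is a placeholder for the array's iSCSI port):
sudo iscsiadm -m discovery -t st -p 192.0.2.101   # the step that failed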
================================
Oracle wants many preliminaries before delving into actual help.
While we can do that, I wonder whether the path forward is to move
the storage array to Ubuntu, either 100% or at least one of the two 1.8 TB volumes.

============
PowerVault firmware and software updated, assisted by Dell.
========
The storage array problem may be rooted in the OS.

Maybe we should really connect it to the switch, so it is accessible from other points, specifically from 'gar'.

Status: 

Priority: 

Normal