Tuesday, August 31, 2010

VMware View components

The View Connection Server component (CB) has been installed on the VM piermont. The Connection Server application handles desktop clients connecting to View desktops, as well as View administration. View Manager is now accessible from piermont's web server.

Next is the install of View Composer on VCS01 (vCenter Server). During the install, the View Composer service would not start. Some perusing found that the service needed a logon ID, so the dbuser ID was added to the service. This needs further investigation, since there was no way to specify it during the install.

Monday, August 30, 2010

VMware View

The reinstall of vCenter went smoothly, although there is a minor error with VUM.

The really old vCenter VM has been renamed "Piermont", moved to cresskill, and will be used for the next proof-of-concept exercise: VMware View.

Some RTFM is needed.

New vCenter Server

The time has come to re-install vCenter Server. The settings for the NY100 Data Center should all be captured in the databases on red-bank (vCenter and VUM), so the process should be to snapshot the existing image, uninstall VCS and VUM, then reinstall them, pointing at red-bank.

The rebuild was completed without incident.....

All ESXi hosts were upgraded from ESXi 4.0 Update 1 to ESXi 4.0 Update 2. ESXi version 4.1 is out but that will happen in the future.

Friday, August 27, 2010

Update on back end network speed

During one of my Storage vMotion tests, there were flashing lights on the front-door 100Mbps switch and no flashing lights on the back-door Gig-e. This signaled that the Storage vMotion operation was actually running over the slow network. The progress was also slow.

The first change was to disconnect Uber's front-door vNIC (100Mbps). I also made sure that the source host (local disk on westwood) and the destination host (shared iSCSI LUN served from Uber on cresskill) had vMotion-enabled vmkernel ports on the same dvSwitch port group.

Once the migration was restarted, the stats were showing 30Mbps network throughput outbound (local-to-shared) and closer to 50Mbps inbound. More testing of the network configuration is needed to verify why vMotion defaulted to the slow network.
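For next time, here is a quick way to confirm which vmkernel interface the migration traffic will actually use (a rough sketch, assuming Tech Support Mode shell access on the hosts; the IP below is a placeholder for the back-door Gig-e network):

# List the vmkernel NICs and note which subnet each vmk interface sits on;
# migration traffic leaves through the vmk port that has vMotion enabled.
esxcfg-vmknic -l

# Confirm the vmkernel stack can actually reach the other host over the Gig-e.
vmkping 192.168.2.140

# Show the vmkernel routing table to see which network wins by default.
esxcfg-route -l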


Wednesday, August 25, 2010

VMware Tools on Uber(s)

While both Uber and Uber2 now have NFS and iSCSI shared storage, a bottleneck was noticed on the network. With iSCSI configured on Uber, the initial cold Storage vMotion of a Windows Server 2008 VM (10GB partition) took the better part of 4 hours to complete.

Looking at network performance, the NIC seemed to be peaking at about 3Mbps, which calculated out correctly with the speed of the migration. At the moment, the E1000 virtual NIC is being used on Linux.

It is well documented that VMware Tools includes the VMXNET network drivers, which are optimized for VMs. While there is no pre-built package of the official VMware Tools for Ubuntu, VMware did release an open source version (open-vm-tools), which needs to be built.

Some searching found this procedure:

http://www.gorillapond.com/2009/03/09/install-vmware-tools-on-ubuntu-linux-reborn/
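For the record, there is also a package route that avoids the tarball build (a minimal sketch; the open-vm-tools / open-vm-dkms package names are an assumption for this Ubuntu release, and older releases used open-vm-source with module-assistant instead):

# Open source VMware Tools userland plus the kernel modules.
sudo apt-get update
sudo apt-get install open-vm-tools open-vm-dkms

# The paravirtual NIC driver ships with recent kernels; confirm it is there
# before switching the vNIC type to VMXNET 3 in the VM's settings.
sudo modprobe vmxnet3
lsmod | grep vmxnet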

Once again, Uber2 was used for the test install of VMware Tools and then Uber was updated.

After the code install and before changing the NIC card, network throughput while copying ISO files from Uber to Uber2 improved slightly to about 3.8Mbps. Once the VMXNET 3 NIC was configured, this jumped up to 7.7Mbps.

To test this with a VMware function, another Ubuntu server VM with 16GB of thick storage was Storage vMotion'ed from shared storage (Uber on cresskill) to local ESXi host storage (local disk on westwood). This cold Storage vMotion took about 6 minutes.

The initial Win2k8001 test will be redone (cold migration) for a real apples-to-apples comparison, but there was already a significant improvement after VMware Tools. Also going to look at "ethtool", which can report and set the link speed; the NICs could still be running at 10Mbps.
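Something like this (eth0 is a placeholder for whichever interface maps to the Gig-e vNIC):

# Report the negotiated link speed and duplex.
ethtool eth0

# Force 1000/full if autonegotiation settled on something slower. Note that
# paravirtual vNICs report a fixed speed and ignore this; it mostly matters
# for emulated NICs and the physical uplinks.
sudo ethtool -s eth0 speed 1000 duplex full autoneg off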

iSCSI on Uber

Uber, the primary Ubuntu VM housing shared storage for the NY100 Data Center, currently serves ISO images and VM files via NFS on separate Datastores.

The Uber2 VM was used to validate the configuration of an iSCSI target server on Ubuntu. Once that testing completed successfully, the final step was to port the configuration to Uber.

Uber now has a 100GB image file sitting on its 200GB vHard Drive (/dev/sdc). This 100GB sparse file was configured in iSCSI as LUN0. Once the LUN was configured, the iSCSI target service was started.
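Roughly, the target setup on Uber looked like this (a sketch using the Ubuntu iscsitarget / iSCSI Enterprise Target package, run as root; the backing-file path and IQN are placeholders):

# Create the 100GB sparse backing file on the 200GB drive (mounted here as
# /storage for illustration); seek with count=0 allocates no real blocks.
dd if=/dev/zero of=/storage/iscsi-lun0.img bs=1M seek=102400 count=0

# Define the target and LUN0 for ietd.
cat >> /etc/ietd.conf <<'EOF'
Target iqn.2010-08.netx.crm:uber.lun0
    Lun 0 Path=/storage/iscsi-lun0.img,Type=fileio
EOF

# On Ubuntu the service may also need ISCSITARGET_ENABLE=true in
# /etc/default/iscsitarget before it will start.
/etc/init.d/iscsitarget restart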

On northvale (one ESXi host in the Towers1 cluster), a new iSCSI storage adapter was configured and the 100GB iSCSI LUN on Uber was discovered. A new Datastore was added to northvale, formatted as VMFS-3, and this Datastore will be added to all hosts in the NY100 Data Center.

The plan is to move all VM storage from:

NFS - Datastore name shared-VM-store01
to
iSCSI - Datastore name shared-iscsci-store01

Will test Cold migration first, and then live Storage vMotion.

The build ISO images on the Datastore named shared-ISO-store01 will continue to be served via NFS.

Monday, August 23, 2010

Uber2 iSCSI working

Uber2 now has a virtual 200GB partition which is being shared as an iSCSI target. iSCSI initiators on ESXi Hosts in the Towers1 Cluster can now discover the LUNs on Uber2 and connect to them.

Next will be to set this up on Uber: adding a 100GB partition, configuring the Ubuntu iSCSI target, and testing DRS from iSCSI. Once complete, move all Cluster-aware VM's to iSCSI and only use NFS for the ISO share.

The logistics of moving cresskill VM's to thin provisioning is still on the list.

Space issues

All VM's in the NY100 datacenter were created with Thick provisioning. This is beginning to cause space issues, mainly on cresskill, where the Uber VM provides virtual shared storage for the Towers Cluster. The conversion from Thick to Thin can be performed by Storage vMotion, as one of the options in the wizard is to change the disk provisioning. Cloning would also work.

The clone function was tried first, on Uber2 (the iSCSI test Ubuntu VM). Once cloned, one of the NIC cards would not come up properly in Ubuntu, even though both cards were present in the VM's settings. After working through the following commands, the 2nd NIC was up:

modprobe
lshw -C network (showed 1 NIC disabled)
iwconfig (showed the actual selected "eth" numbers for the NICs)

Once the correct eth number was placed in /etc/network/interfaces, Uber2 was back in business.
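For future clones, the usual culprit on Ubuntu is that the clone gets new MAC addresses, so udev hands out new ethN names while /etc/network/interfaces still references the old ones. A sketch of the cleanup:

# udev records the MAC-to-ethN mappings here; after a clone the old entries
# still hold eth0/eth1, so the new NICs land on eth2/eth3. Delete the file
# (it is regenerated on reboot) or edit it to reuse the original names.
sudo rm /etc/udev/rules.d/70-persistent-net.rules

# Confirm which names the NICs actually received, then make sure
# /etc/network/interfaces brings up names that really exist.
lshw -C network
sudo /etc/init.d/networking restart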

The storage vMotion method was used going forward.

Cold vMotion of VCS01, red-bank, and Trenton into the Towers1 cluster will be needed in order to allow enough space for Uber to be migrated, since Uber has a 200GB partition for shared storage.

Thursday, August 19, 2010

Uber2 iSCSI almost configured

The 2nd utility test VM (Ubuntu) was ready for install of the iscsitarget services. Once installed, the configuration of the ietd.conf file went quickly. A 2nd vHard drive of 8GB was created as a test LUN.

While the ESXi server initiators could see the target under discovery and Rescan, there was an issue with the format. Will figure this out tomorrow.

VMware Update Manager installed

VUM was installed on vcs01, using red-bank's database engine. After the earlier work of moving vCenter off of MSDE, pointing the VUM configuration at SQL was trivial.

vcs01.crm.netx was selected as the install point for VUM, and we ran into some issues installing the VUM plug-in (name resolution): the vSphere client machine was not using the AD DNS server. Once this was "resolved", the plug-in went right in.

More configuration work is needed before attempting to let VUM do an upgrade on a newly built VM.

The ESXi servers are already up to date.

Uber down overnight

During backup maintenance to and from Uber (the Ubuntu VM that holds cluster shared storage and ISO files) and the physical XP machine (currently named tibet77, with a 1TB RAID), a large copy process hung Uber, and the subsequent reboot hung while running fsck. It was left overnight, and a reboot in the morning seemed to fix the issue. The vCenter logs are still showing connectivity problems to tibet77.

More work is needed on the utility002 iSCSI testing and on the possible replacement of tibet77 with an Ubuntu build serving all three available protocols (SMB, NFS, and iSCSI).

Wednesday, August 18, 2010

Maintenance before VUM

Working on some maintenance items in the NY100 vSphere Data Center. Since a number of items moved forward today, it is time to do some backups, grab additional ISO's and files for Uber's shared storage, and get ready for iSCSI testing on Utility002 VM (Uber2?). This VM is currently sitting in the Towers1 vSphere cluster so it will be cold vMotion'ed to cresskill.

Also, VUM will be installed and a few VM's will be updated with this tool. Tomorrow, a webinar is scheduled discussing vSphere monitoring.

SQL Server auth, scheduling, Limited disk in VCS01

VCS01 and red-bank VM's on cresskill are now members of AD and a domain id was created for SQL connections.

The process of getting vCenter to talk to SQL Server turned out to be very exciting. In the end, the database for vcs is on red-bank (the SQL Server VM). Obviously, VM snapshots played a large role in the testing and troubleshooting.

"vCenter server service" and "vCenter web service" were configured to logon with the new SQL domain id.

The SQL Express service dependency that was previously added to the vCenter services was removed from both services, allowing the local SQL Express services to be disabled and reducing the required resources on the vCenter VM.

The shutdown/startup scheduling was also tweaked on cresskill. Generally speaking, the process is to manually shut all VM's down before cresskill is brought down, but the restart of the VM's should be automatic and orderly (cresskill is not part of the DRS cluster and is a utility ESXi host).

Finally, the 8GB virtual disk (8GB C: partition) created for VCS01 was down to 1GB free. After increasing the virtual disk to 16GB, Dell's extpart.exe was used to grow the C: partition to 16GB. Now that the vCenter database is on the SQL VM, VCS01 should not grow much.


Tuesday, August 17, 2010

The crm.netx Domain; cresskill maintenance

Trenton is the lone Domain Controller (DC) and DNS is configured on it. VCS01 has been added to the domain (crm.netx) but still has issues connecting to red-bank's SQL engine, so vCenter was pointed back at VCS01's local SQL Express instance to continue testing (a good exercise: backtracking and making sure the local DB connection still works).

Performance trends will provide useful information on cresskill with all 4 VM's up. Start up and shutdown schedules are also being tweaked to ensure a smooth build up and tear down. Here is the startup sequence:

- Uber (utility Linux) needs to be up to make shared storage available
- Trenton needs to start next to make the AD domain available
- red-bank needs to be up next in order to have SQL 2005 available for clients
- VCS01 needs to be up to activate:
  - the vSphere DataCenter
  - the Towers1 Cluster
  - the Distributed virtual Switch (DvS)

The startup timing will be tuned after I run through several reboot cycles of cresskill.

Active Directory

Another win2k3 VM was built to house a Domain Controller for the environment. The new server (named Trenton) will also house the DNS role, forwarding requests to the internet router.

This VM server, along with VCS01, red-bank, and utility01, will sit on the ESXi host cresskill, which is not part of the vSphere cluster "The Towers1" and will not participate in DRS, HA, etc.

VCS01 and red-bank will join AD but will eventually use the backdoor gig-e for vCenter communications.

Issues with the vCenter DB connectivity

After installing SQL server 2005 on red-bank (new win2k3 server), it was time to move the vCenter files from VCS01's MSDE to red-bank:

- Shut down vCenter Server Services
- Detach the vCenter DB from MSDE
- Copy the two VCS files to Red-bank's SQL data folder
- Attach the DB to SQL Server 2005

Once completed, the ODBC connection needed to be updated to point at red-bank (Administrative Tools -> Data Sources (ODBC)). While the ODBC test was successful, vCenter would not start, which is probably related to service authentication from VCS01 to red-bank.

Will move forward on building AD such that authentication is centralized.

Monday, August 16, 2010

SQL Server, iSCSI test server

The new Win2k3 Server VM (named Red-bank) that will house SQL Server 2005 is up and running. The next step is to install SQL Server 2005 on it and then migrate the current vCenter DB from MSDE.

A new Ubuntu server VM (named "utility002") is up and the iSCSI target code is installed. A small (8GB) second vHard Drive was created for the testing. NFS client was also configured such that files from shared ISO's on utility001 could be pulled.
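The NFS client side is just a package and a mount (a sketch; the export path on utility001 is a placeholder):

# NFS client utilities, then mount the ISO export from utility001.
sudo apt-get install nfs-common
sudo mkdir -p /mnt/isos
sudo mount -t nfs utility001:/export/isos /mnt/isos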

Finally, a webinar on "Deploying Exchange 2010 on vSphere" is giving the team more depth in deploying VMware for this new Enterprise messaging solution.

The week's Agenda

Here are a few items for this week:

- Move vCenter Server database from MSDE to SQL Server (VM name = Red-Bank)
- Build an Active Directory domain with DNS (and possibly DHCP)
- Load and configure VMware Update Manager (VUM)
- Fully vet the particulars of the vSphere 4.1 install
- Test Linux iSCSI and determine if cresskill should move off of NFS shared storage
- Begin testing HA

Friday, August 13, 2010

ESX 3.5 Install

The install of ESX 3.5 completed, but it took close to 1 hour to boot up and the host still was not accessible. The host CPU was pegged at 100% throughout bootup.

A reinstall may be attempted again on cresskill with all VM's off (cresskill has 6 GB of physical RAM).

DRS in action

On the Towers Cluster, 3 VM's are running on westwood (win2kmove, win2k8001, testwin2k), with no VM's running on northvale. The Distributed Resource Scheduler (DRS) threshold is set to "moderately aggressive".

win2k8001 is a Windows Server 2008 x64 VM.

A small looping VBScript program is placed on the win2kmove VM. Each instance of this program will continuously run 3 million sine calculations, which will busy the CPU on westwood to simulate heavy activity.

Four of these looping scripts are started on VM win2kmove, which causes westwood's CPU to rise to over 60%. This exceeds the DRS threshold, and within 4 minutes vSphere's DRS moves VM win2k8001 from westwood to northvale. At the moment, there are no specific resource pools defined.

This test confirms DRS functionality.

vMotion, ESX 3.5 in a VM

Continuing to test manual vMotion and changing configuration to see the results.

Since we were unable to load ESX 3.5 onto any bare metal and the old server has not been put back together, an attempt will be made to load ESX 3.5 into a VM. This is just so that we can try an upgrade to 4.0 on it.

The ESX 3.5 VM install went fine but the boot process has taken a large amount of time, as the CPU is pegged on the ESXi 4 host. Early in this process all other active VM's were vMotion'ed off of this host to give it maximum resources.

Thursday, August 12, 2010

Migration, vMotion, Uber maintenance, etc.

After vMotioning VM's back and forth between westwood and northvale to make sure the configuration is bulletproof, I moved my vCenter Server VM from westwood to cresskill (the utility host) and put vCenter's files on the NFS shared storage served by the utility VM (let's call this VM Uber, the Ubuntu build).

In my testing haste, I did shut down Uber while vCenter had its files on shared storage! After realizing this, I powered Uber back up. While vCenter's Win2k3 event logs showed the missing disk, the server came back alive.

I am now thinking it is better to use the local cresskill host datastore so that I can shut down Uber while I am still logged into vCenter, so I just kicked off a Storage vMotion to move vCenter's files. The downside is, of course, a single point of failure (will resolve this next week).

I am getting yellow and red warnings on Uber about high CPU and memory usage. This is partially caused by the migration of vCenter's files to local storage on cresskill; various Cluster VM's also continue to read and write to Uber's shared storage.

Uber currently has 1.5 GB of virtual memory. I will push it to 2.5 GB shortly. This Ubuntu VM will be a memory and network workhorse going forward since I plan to put the files from most VM's on it.

I brought up my legacy XP utility machine to do some data backups onto its 1 TB Raid 1 array. Tomorrow, DRS testing and building a few more VM's, including x64 builds. Next week, make vCenter Server Fault Tolerant. Once done, consider making Uber FT.

Host vMotion successful

After much digging into CPUID and vSphere EVC mode, I was able to properly configure the cluster in EVC mode with two of my three hosts.

I am now able to successfully vMotion a Win2k3 VM from westwood to northvale (ESXi host names) and back again, leaving the network and storage as shared.

With the Win2k3 VM not doing any real work, the vMotion takes less than 1 minute. I am going to run some spin type scripts on the VM to see if I can detect any hiccups.

During the vMotion migration wizard, if I select cresskill (my ESXi utility host) I get a CPU incompatibility message. This host has a Pentium D, while the other two have Core 2 CPU's of different flavors.

vSphere Maintenance

Now that all three ESXi Hosts are joined to The Towers vSphere Data Center, I am using the vSphere Host Update Utility to make sure that all hosts are patched to the same level from the base (vSphere 4.0 Update 1).

I have been trying to figure out a way to move the vCenter Server VM (Win2k3) to my Utility host but there is an obvious catch-22 doing that, since vCenter needs to be active in order to do a cold migrate and I am still fighting the war of CPU versions for vMotion to function.

During patching of the esxi-cresskill-175 (new name for the Dell 745 utility host), I realized that the only way to do this is to log into esxi-westwood-140 (new name for Dell 780 host), shutdown vCenter Server and make a local Datastore copy of the files. Then, copy the new folder out to the new NFS shared storage on cresskill once vCenter is back up.

In the future, things like vCenter will always sit on shared storage.

For future reference, the new build host with the Intel D975XBX motherboard and Intel E6700 CPU will now be referred to as esxi-northvale-195 or "northvale" for short.

Wednesday, August 11, 2010

New Host, more on NFS, still no vMotion

The new machine (we can call it northvale for now) is fully functional as an ESXi host with access to the internet and the back-door Gig-e network.

NFS fun
One of the plans was to build a Linux VM and serve shared storage from it over NFS. I now have a 200 GB NFS mount off of the Linux VM talking to all hosts over the Gig-e. I can add additional storage as needed from the local datastore where this VM resides.
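The export itself is only a couple of lines on the Linux VM (a sketch; the export path and back-door subnet are placeholders):

# NFS server on the Ubuntu VM.
sudo apt-get install nfs-kernel-server

# Export a 200GB area to the ESXi hosts on the back-door Gig-e network;
# no_root_squash lets the ESXi NFS client write as root.
echo '/export/vmstore 192.168.2.0/24(rw,sync,no_root_squash,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# Each host then adds it as an NFS Datastore via the vSphere client
# (Configuration -> Storage -> Add Storage -> Network File System).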

This will simulate having shared storage from a remote server: Virtual Shared DASD (VSD). I did a cold move of this VM to the Dell 745 which will probably become a test/utility ESXi host (desktop machine) until I repurpose the machine to Windows 7 x64 later in the year.

vMotion
vMotion is still elusive based on the two ESXi hosts in my cluster (CPU's are E6700 & E8400). More digging....More RTFM....

Hardware
Finally for today, the 3rd and last hard drive arrived. Hopefully, this one will actually work!

First Boot

After assembling the new machine, I decided to wait on installing the PCI Intel Pro 1000 (which will connect to the back-door Gig-e) until I power it up and the machine looks like it will work. It has an onboard NIC that I can use for management access.

First boot saw PXE try to find its footing (no go), then no boot image. A few clicks in the BIOS and an ESX 3.5 CD in the DVD drive, and the box began loading ESX. As I saw with my Dell 745, 3.5 seems none too happy with these new boxes.

So I removed the 3.5 CD, inserted an ESXi 4.0 Update 1 CD, quickly installed it, and configured a static IP. Logging into the ESXi host from the vSphere client told me everything looks good.

Moving on to vCenter to add this new host to the Towers Data Center.


Tuesday, August 10, 2010

New ESXi hardware is in

All the components of the new build are in. This new machine will have power up tomorrow.

ESX 3.5 Upgrade will not happen today

I was all set to install ESX 3.5 on my Dell 745 until....the installer failed to recognize the SATA controller. OK, so I did not scrutinize the 3.5 HCL. Oh well, time to rebuild my old and slow IDE chassis and just have a 3.5 box in the vSphere Towers cluster for drill.

Hardware is enjoyable

I received the replacement hard drive yesterday for the original Dell 745 (the box without VT on the CPU). The new drive was also bad.

Fortunately, since I needed a hard drive for the new motherboard, I bought and received two 500GB SATA drives so I had a spare. I had already put one of those drives in the Dell machine and reinstalled ESXi 4, so I will put that back into the machine and will proceed with a fresh ESX 3.5 install.

The supplier apologized, said they would pull another drive, test it, and send.

Monday, August 9, 2010

NFS, SMB, and ESX 3.5 Upgrade

Still waiting for parts for the new build (mid-week??) so I moved forward on my Linux VM NFS testing.

My Linux utility VM now mounts an SMB share (as a client) from my utility XP machine, and some of the build ISO's have been copied over. Also, a 2nd vHard drive was added to the Linux VM to accommodate the ISO's.
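The SMB mount was along these lines (a sketch; the share name, credentials, and local paths are placeholders, and on this Ubuntu release the mount helper comes from the smbfs package rather than cifs-utils):

sudo apt-get install smbfs

# Mount the ISO share published by the XP utility machine.
sudo mkdir -p /mnt/xp-isos
sudo mount -t cifs //tibet77/isos /mnt/xp-isos -o username=backup

# Copy the build ISOs onto the Linux VM's 2nd vHard drive.
cp /mnt/xp-isos/*.iso /data/isos/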

One thing that is very clear: copying files from XP to Linux VM over the back door gig-e is much faster than copying from XP to the ESXi datastores. I have not looked closely with a sniffer to determine why this is, but it is something for the list.

An NFS mount point was added to the Linux VM and a new Datastore was added to the Towers Cluster. A Win2k3 build was performed to test this scenario. Obviously, with the new NFS mount point sitting on a VM on the virtual Distributed Switch, the Win2k3 build happens much faster. Wait until sysprep is configured!

Finally, the ISO for ESX 3.5 was procured from a colleague to allow testing of ESX 3.5 to ESXi 4 upgrade. This will happen in the next day or two.

Saturday, August 7, 2010

First ESXi machine is awake

Some parts have arrived for the new build.

After replacing the hard drive in the first ESXi machine, I installed ESXi 4.0 Update 1 and then joined the machine to the Towers cluster. The drive I installed is not the hard drive that is being sent as a replacement but it moves me off the dime so I can work on that machine again.

No comment on the new machine, as I now have several components that are not compatible. Sometimes it is easier to just buy a machine!!

Thursday, August 5, 2010

Testing iSCSI and NFS on Linux

While I wait for hardware to arrive for the new ESXi build and hard drive replacement in the first ESXi host, I have turned my attention back to my utility machine "tibet77" (XP, NFS, StarWind iSCSI target).

There still seem to be hiccups when accessing the 1TB raid array from ESXi, whether via NFS or iSCSI.

I installed a few Ubuntu VM's to begin the process of building a test utility machine on Linux running the NFS and iSCSI services. Once I have this down, I will install Ubuntu onto tibet77. Of course, the new build will need to talk turkey to the 1TB Raid array which is formatted NTFS today.
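If the array stays NTFS, the Ubuntu build can still mount it read/write with ntfs-3g and re-share it from there (a sketch; the device name is a placeholder):

# ntfs-3g gives read/write NTFS support, so the 1TB array would not have to
# be reformatted as ext3 before being served out over NFS/SMB/iSCSI again.
sudo apt-get install ntfs-3g
sudo mkdir -p /srv/raid
sudo mount -t ntfs-3g /dev/sdb1 /srv/raid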

I would rather not archive the data, reformat as ext3, then restore. We will see how successful I am at avoiding that.

Sunday, August 1, 2010

ESXi Host Hard Drive failure - New Motherboard

Hearing some "ticking" from my Dell 745 resolved to be the hard drive going south. A new drive is supposedly being sent by my supplier. The rebuild should not take too long and this box was going to be used to test ESXi 3.5 to 4.0 upgrade anyway.

I have a new motherboard coming in order to bring a 3rd ESXi Host into the Towers vSphere cluster, which will go into a spare case I have in the lab.