Unresponsive Server in ARD

For the past several weeks at work I’ve been gradually upgrading our OS X server from Tiger to Leopard. The process has certainly not been without hiccups, but it has gone smoothly for the most part.

After an initial false start attempting to upgrade the server in place, I ended up installing Leopard Server from a blank disk. This seemed to take care of most of the really strange things that were happening after the upgrade.

This particular server is of the headless XServe variety, so we primarily use Apple Remote Desktop to access it, in addition to the Server Admin Tools and SSH. Since installing Leopard on the server, however, I’ve noticed that at times it acts erratically. Usually I’ll first notice that the server either stops showing up in ARD or shows up as black, indicating that there is no ARD agent on the computer. I’ve tried restarting the computer, which fixes it, but that’s not a very good solution for obvious reasons.

I had also noticed while using Server Admin that sometimes the server CPU is running at completely full capacity, like in this screenshot:

[Screenshot: OS X Server CPU gone crazy]

The other day the server stopped responding in ARD again. As usual though, I was still able to access it through both Server Admin and SSH. After a little research, I found this useful page of commands, which includes this one-liner:

sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart -restart -agent -menu

Running this command restarts the ARD Agent, which is what we want if it is frozen. Once I did this, things got a little better, and the server came up in ARD as active. I tried controlling the server through ARD, but no dice: still no connection.

At this point I noticed that there was a user logged on to the server and I remembered that I had also been having problems with VNCDragHelper freezing. I found this on an Apple discussion page:

When remotely managing an XServe with OS 10.5.1 from a 10.4.11 client with ARD 3.2, several times (3 up till now) the server UI becomes unresponsive, at least finder. This even gets worse when trying to start the Application Monitor, then also the Dock freezes, and the Application Monitor UI never opens. When doing an ssh> sudo top, it shows that both “Application Monitor” and “VNCDragHelper” do consume almost 100% CPU. Luckily only on a Single core, but that keeps two cores (one processor 100% busy). killall “Activity Monitor” brings the activity monitor down, when sending it with Remote Desktop Unix command.

Perfect, that must be it. In SSH, I ran the following command:

sudo killall -9 VNCDragHelper

I also killed the loginwindow because that appeared to be frozen as well (judging from the top command that I ran):

sudo killall -9 loginwindow

Suddenly after running both those commands, the server leapt back to responsiveness. I was able to access it in ARD without problem. Also, after about an hour I checked the CPU diagram in Server Admin and was able to see a noticeable improvement.

[Screenshot: OS X Server CPU back to normal]

Now that’s a sight for sore eyes. For reference, I was running 10.5.3 and ARD 3.1 when this problem happened. I’m not sure that anything has been fixed in 10.5.4 though.
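Rather than eyeballing top over SSH, a small filter along these lines could list the processes that are pegging the CPU. This is only a sketch: the function name cpu_hogs and the default threshold of 90 are my own choices for illustration, and it assumes the input looks like the output of ps with pid, pcpu and comm columns.

```shell
#!/bin/sh
# cpu_hogs [THRESHOLD]
# Reads `ps -axo pid,pcpu,comm`-style lines on stdin and prints
# "pid cpu command" for every process at or above THRESHOLD percent CPU.
# The name and the default threshold (90) are illustrative assumptions.
cpu_hogs() {
    threshold=${1:-90}
    awk -v limit="$threshold" '
        NR == 1 && $1 == "PID" { next }    # skip the ps header line
        $2 + 0 >= limit + 0 {
            # Command names may contain spaces, so rejoin fields 3..NF.
            cmd = $3
            for (i = 4; i <= NF; i++) cmd = cmd " " $i
            print $1, $2, cmd
        }'
}

# Usage (on the server, over SSH):
#   ps -axo pid,pcpu,comm | cpu_hogs 90
```

Anything that shows up here for a long stretch (VNCDragHelper, in my case) is a candidate for the killall treatment described above.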

Using ssh in Script

I’ve been trying to write this script to roll over one of the log files on the lab computers. I also want it to copy the rolled file to our server. Now, I’ve already written two different scripts that accomplish this task. One resides on each computer and does the rolling task (I execute it on all the computers from ARD). The other is on the server and does the sftp bit.

Even though I’ve already got the working scripts, it’s of course not good enough. I’m accessing the remote computers in each script in one way or another. I’m also pretty much manipulating the same files. I should be able to combine the two scripts into one.

So, I wrote this script that combined both of the previous scripts. It has two basic parts. The first part uses ssh to log into the remote computer and does the following tasks:

- Copy the log file, naming the copy to include a date stamp
- gzip the rolled log file
- Delete the old log file
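Run locally, those three steps might look roughly like the sketch below. The function name roll_log and the YYYYMMDD stamp format are assumptions made for illustration, not what my actual script uses.

```shell
#!/bin/sh
# roll_log DIR FILE
# Copies DIR/FILE to a date-stamped name, gzips the copy, deletes the
# original, and prints the name of the compressed file.
# The function name and the %Y%m%d stamp format are illustrative assumptions.
roll_log() {
    log_dir=$1
    log_file=$2
    new_log_file="${log_file}.$(date +%Y%m%d)"

    # Bail out early if there is nothing to roll.
    [ -f "$log_dir/$log_file" ] || {
        echo "No such log: $log_dir/$log_file" >&2
        return 1
    }

    # Each step only runs if the previous one succeeded, so the original
    # log is never deleted unless the compressed copy exists.
    cp "$log_dir/$log_file" "$log_dir/$new_log_file" &&
    gzip -fq "$log_dir/$new_log_file" &&
    rm "$log_dir/$log_file" &&
    echo "${new_log_file}.gz"
}
```

Chaining the steps with && means a failed copy or gzip leaves the original log in place.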

The second part simply uses sftp to get the rolled log file.
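For the sftp half, something like the following sketch should do when key-based authentication is already set up. The function name fetch_rolled_log and its arguments are hypothetical, and it assumes the paths contain no spaces. The -b option puts sftp in batch mode, and a batch file of - tells it to read the commands from standard input.

```shell
#!/bin/sh
# fetch_rolled_log HOST REMOTE_PATH LOCAL_DIR
# Downloads one file from HOST using sftp in batch mode.
# Function and argument names are illustrative assumptions; paths are
# assumed to contain no spaces.
fetch_rolled_log() {
    remotehost=$1
    remote_path=$2
    local_dir=$3

    # "-b -" reads the batch commands from stdin. Batch mode never prompts,
    # so it needs non-interactive (key-based) authentication.
    printf 'get %s %s\n' "$remote_path" "$local_dir" |
        sftp -b - "$remotehost"
}

# Usage:
#   fetch_rolled_log lab01 /var/log/app.log.20080715.gz /tmp/logs
```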

Unfortunately, the ssh part of the script just does not work. Clearly, the problem is that I just do not understand how to run ssh from a batch file (or at least not interactively). Here is the relevant part of the script:

echo "OK, starting ssh block"
for remotehost in $computer_list; do
 echo "OK, rolling log on ${remotehost}"
 ssh $remotehost <<EOF

  cd ${log_source:?"Directory does not exist."}
  cp ${log_file} $new_log_file
  gzip -fq $new_log_file
  rm $log_file
  echo "New log file name: ${new_log_file}"
  echo "Finished rolling log file"
EOF
done

This generates the following error after the ssh line:

Pseudo-terminal will not be allocated because stdin is not a terminal.

I pretty much understand what this error means, but have no idea how to fix it. After doing a lot of searching, I came up with this format for calling ssh:

ssh -v -q -o "BatchMode=yes" $remotehost

It doesn’t work for me though. At least, I don’t know how to use it. There is lots of information online about how to set up the authorization keys (which I’ve done), but not about how to actually run ssh from a script. The stuff that I did find looked like it was written in Latin.

So, after deciding that I was outmatched by the ssh batch mode, I decided I should try to write the whole thing using just sftp. Just one problem though: I can’t cp a file in sftp! There is no way for me to copy a remote file while using sftp. The only way I can think of to get around this is to GET the file, rename it on my local machine, and then PUT the renamed copy back on the remote computer. But, well, that’s just dumb. So, now I’m stuck. And grumpy.
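For what it’s worth, the pseudo-terminal message is a warning that ssh prints when its input comes from a here-document, and the -T option tells ssh not to ask for a terminal at all. A sketch of how the ssh block might be restructured follows. The function name roll_remote_log is hypothetical, and it assumes key-based login works and that the directory and file names contain no spaces or shell metacharacters, since ssh joins the arguments into a single command line on the remote side.

```shell
#!/bin/sh
# roll_remote_log HOST LOG_DIR LOG_FILE NEW_LOG_FILE
# Runs the copy / gzip / delete steps on HOST without an interactive session.
# The function name is an illustrative assumption. Arguments are assumed to
# be free of spaces and shell metacharacters.
roll_remote_log() {
    remotehost=$1
    log_source=$2
    log_file=$3
    new_log_file=$4

    # -T                 do not request a pseudo-terminal
    # -o BatchMode=yes   fail instead of prompting for a password
    # sh -s -- a b c     run the script on stdin with a, b, c as $1, $2, $3
    # The quoted 'EOF' keeps the local shell from expanding $1..$3.
    ssh -T -o BatchMode=yes "$remotehost" \
        sh -s -- "$log_source" "$log_file" "$new_log_file" <<'EOF'
cd "$1" || exit 1
cp "$2" "$3" || exit 1
gzip -fq "$3" || exit 1
rm "$2" || exit 1
echo "Finished rolling log file: $3.gz"
EOF
}

# Usage, inside the loop over $computer_list:
#   roll_remote_log "$remotehost" "$log_source" "$log_file" "$new_log_file"
```

Passing the names as positional parameters keeps the remote script itself fixed, so there is no guessing about which side expands which variable.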