UF-Statistics UFL

How to operate TSM



Last modified: Mon Sep 10 16:38:02 EDT 2007

Overview

Some sketchy documentation exists at http://tsm.nerdc.ufl.edu. The real authority for tsm questions is Allen Rout <asr@ufl.edu>, who shouldn't mind getting cellphoned at any time day or night with tsm questions until such time as he improves the local docs.

TSM's system model appears simple. There is a tape robot fifteen feet long in the data center machine room, which contains all data that is currently backed up. This machine room is one of the most carefully protected locations on campus; however, at extra cost you can have copies of your data stored on tapes in another physical location. TSM keeps a database of all files on all tapes. When you do a "backup", it puts all files it doesn't have a copy of onto tape. There is a RAM and disk cache fronting these tapes for speed.

TSM's performance characteristics are flexible. If you're having a real emergency, explain your situation to asr and he may be able to delay other people's backups to favor your restores. The more notice you give, the more technical options exist. The data center is pleased to rise to the occasion of carefully pre-planned upgrade/juggle scenarios.

TSM is robust. If a backup run is killed, or the network is unplugged, or your machine reboots, something reasonable will happen. If TSM can't reconnect on the spot, it will save what it got so far and continue where it left off on the next run. If you ^Z a restore for two hours and then fg, it will remake the connection and continue. If you uncleanly terminate a restore, TSM will hold on to the remaining files as a pending restore, to the point that later backups fail until that restore completes or is canceled.

Tsm is busy. The tape drives are some of the busiest known to IBM. When you do a restore, you may see pauses while your tape is swapped several times. Be patient and rejoice that you aren't changing your own tapes. If you don't get a tape drive in a reasonable timeframe, such as 30 minutes, email osg-nsam-l@lists.ufl.edu and ask for one.

There is now a mirror of TSM in Atlanta, so the data is protected from local disasters such as large fires or tornados which might destroy portions of the UF campus.

Local installation

For Gentoo on Intel you want to emerge /usr/local/portage/app-admin/tsm-ba/tsm-ba-5.1.6.ebuild.

For Ubuntu on Intel:

http://lists.ibiblio.org/pipermail/unclug/2007-June/000397.html

I recently built up a linux box using Ubuntu and discovered that it's
not hard to set it up to use the TSM backup client.  Assuming you have
a passing familiarity with setting up TSM on RPM-based distributions,
here are the basic steps:

1. Install the "alien" package which lets you (among other things)
install RPM packages on Ubuntu or other Debian-based distros.

    $ sudo apt-get install alien

2.  Download the TSM client software from IBM

[ http://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v5r4/Linux/Linux86/v541/ls ]

3.  untar the TSM RPMs...

4.  Use alien to install the appropriate RPMs.
    $ sudo alien -i --scripts TIVsm-API.i386.rpm TIVsm-BA.i386.rpm

5.  Set up the normal TSM stuff (dsm.opt/dsm.sys/inclexcl.dsm).

[

mkdir /etc/tivoli
scp herring:/opt/tivoli/tsm/client/ba/bin/dsm.{sys,opt} .
emacs dsm.sys
	change nodename
	add "managedservices schedule"

add this new client to your tsm administrative domain

]

6.  Run a manual backup.

     $ sudo dsmc incremental

7.  [ does not apply, is about opening ports in firewall ]

8.  

[

Created /etc/init.d/dsmcad, added into /etc/init.d symlinks with
"update-rc.d dsmcad multiuser 95 05".  Symlink inspector bum(8) seems
to not find it, removed bum.

]

9.  Start the dsmc sched process

[ /etc/init.d/dsmcad start ]

10.  Check to see if the /var/log/dsmsched.log is correct.

     $ tail /var/log/dsmsched.log

06/29/07   13:33:13 --- SCHEDULEREC QUERY END
06/29/07   13:33:13 Next operation scheduled:
06/29/07   13:33:13 ------------------------------------------------------------
06/29/07   13:33:13 Schedule Name:         AM0230
06/29/07   13:33:13 Action:                Incremental
06/29/07   13:33:13 Objects:
06/29/07   13:33:13 Options:
06/29/07   13:33:13 Server Window Start:   02:30:00 on 06/30/07
06/29/07   13:33:13 ------------------------------------------------------------
06/29/07   13:33:13 Waiting to be contacted by the server.

-- 
   Alan Hoyle  -  alanh at unc.edu  -  http://www.alanhoyle.com/
     "I don't want the world, I just want your half." -TMBG
                Get Horizontal, Play Ultimate.
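The log check in step 10 can be scripted. The following is a minimal sketch, assuming the log keeps the "Server Window Start:" line format shown in the sample above; next_window is a hypothetical helper name, not part of TSM.

```shell
#!/bin/sh
# Print the most recent "Server Window Start" value found in a dsmsched.log.
# The log path is passed as the first argument; /var/log/dsmsched.log is the
# location used in the steps above.
next_window() {
    # Drop the timestamp prefix and the label, keeping e.g. "02:30:00 on 06/30/07".
    sed -n 's/^.*Server Window Start:[[:space:]]*//p' "$1" | tail -n 1
}

# Example: next_window /var/log/dsmsched.log
```

If the function prints nothing, the scheduler has not logged a next operation yet, which is a hint that dsmcad is not running or has not contacted the server.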

To administer TSM you will need a TSM administrator login. See asr.

Operation

The command-line editing within the tsm programs is bizarre to Unix fingers, and probably matches some mainframe convention.

To add an entire host to the backup system

# /opt/tivoli/tsm/client/admin/bin/dsmadmc

register node seahorse.stat.ufl.edu foofoo dom=STAT userid=none

"foofoo" is the literal campus-wide well-known non-password.

Then, on the host, do:

# /opt/tivoli/tsm/client/ba/bin/dsmc q restore

For username and password give your tsm admin login, not the
hostname.  This stores the password for future commands.

To remove an entire host from the backup system

# /opt/tivoli/tsm/client/admin/bin/dsmadmc

remove node seahorse.stat.ufl.edu

To delete a particular filesystem that you no longer want backups of

# /opt/tivoli/tsm/client/admin/bin/dsmadmc

delete filespace flounder.stat.ufl.edu /depot

To back up an individual host

# /opt/tivoli/tsm/client/ba/bin/dsmc

incremental

To tell which versions of a file are stored and available to restore from

minke# cd /export/home11/casella
minke# /opt/tivoli/tsm/client/ba/bin/dsmc
tsm> q back mbox -ina=yes

To restore a directory tree on an individual host to another location

# /opt/tivoli/tsm/client/ba/bin/dsmc

restore -sub=yes "/export/home/someuser/*" "/export/home/elsewhere/"

To restore to a particular point in time

minke# /opt/tivoli/tsm/client/ba/bin/dsmc restore -subdir=yes -pitd=2/3/2004 -pitt=14:00:00 "/export/home11/yang/*" "/export/home11/yang-restore/"
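Point-in-time restores are easy to mistype, so it can help to print the command and read it before running it. This is only a sketch: pit_restore_cmd is a hypothetical helper, it uses only the options shown above, and it expects the date and time in the same format as the example.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a point-in-time restore command.
# Arguments: date, time, source directory, destination directory.
pit_restore_cmd() {
    pitd=$1 pitt=$2 src=${3%/} dst=${4%/}   # strip one trailing slash, if any
    printf '%s restore -subdir=yes -pitd=%s -pitt=%s "%s/*" "%s/"\n' \
        /opt/tivoli/tsm/client/ba/bin/dsmc "$pitd" "$pitt" "$src" "$dst"
}

# Reproduces the command line from the example above.
pit_restore_cmd 2/3/2004 14:00:00 /export/home11/yang /export/home11/yang-restore
```

Restoring into a separate directory, as here, leaves the current files untouched while you compare them with the restored copies.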

To find out what errors are really occurring, with better error messages than you get from the command-line programs

tail -f /var/adm/dsmerror.log
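When the log is long, a count per message code shows what is going wrong most often. A minimal sketch, assuming client messages carry codes of the form ANSnnnnX (as in ANS1330S, quoted later in this document) and at most one code per line; ans_summary is a hypothetical name.

```shell
#!/bin/sh
# Count how often each ANSnnnnX message code appears in a dsmerror.log,
# most frequent first. Prints lines of the form "<count> <code>".
ans_summary() {
    sed -n 's/.*\(ANS[0-9][0-9][0-9][0-9][A-Z]\).*/\1/p' "$1" |
        sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Example: ans_summary /var/adm/dsmerror.log
```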

To create a schedule for automatically doing backups

# /opt/tivoli/tsm/client/admin/bin/dsmadmc

define schedule STAT STAT_NIGHTLY type=Client -
       action=incremental starttime=00:10 duration=120 durunits=Minutes -
       period=1 perunits=Days dayofweek=Any expiration=Never

To configure a Solaris host to do scheduled backups

ssh root@thehost
cd /etc/init.d
cp ~bb/tsm/etc_init.d_tsm /etc/init.d/tsm
chmod +x tsm
./tsm start
cd /etc/rc3.d
ln -s /etc/init.d/tsm S30tsm

# /opt/tivoli/tsm/client/admin/bin/dsmadmc

define association STAT STAT_NIGHTLY thehost.stat.ufl.edu

If your backups start failing in the middle with the error

ANS1330S This node currently has a pending restartable restore
session.  The requested operation cannot complete until this
session either completes or is canceled.

# /opt/tivoli/tsm/client/admin/bin/dsmadmc q restore

This will reveal the pending restore, but the cancel-restore privilege
isn't independently grantable, so asr has to cancel it.

To restore a failed machine's data onto a repair machine

# -- Method #1 --
# reset failed machine's tsm password
repair# dsmadmc update node failed.stat.ufl.edu thisisthenewpassword

# do the restore
repair# dsmc restore "/blah/*" "/other/" -sub=y -virtualnodename=failed.stat.ufl.edu
# prompted for machine username, hit enter
# prompted for machine password, enter thisisthenewpassword
# restore goes

# -- Method #2 --
# reset failed machine's tsm password
repair# dsmadmc update node failed.stat.ufl.edu thisisthenewpassword

# add new server stanza to repair machine
# at top of repair:/usr/bin/dsm.sys
defaultserver bighonkintaperobot

# at bottom of repair:/usr/bin/dsm.sys
servername alternate
* same connection information
commmethod tcpip
tcpserveraddress tsm-ext.cns.ufl.edu
tcpport 1609
passwordaccess generate
* different nodename
nodename failed.stat.ufl.edu

# then do restore
repair# dsmc restore "/blah/*" "/other/" -sub=y -se=alternate
# prompted for machine username, hit enter
# prompted for machine password, enter thisisthenewpassword
# restore goes
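The extra stanza for Method #2 can be generated rather than typed. This is a sketch only: alt_stanza is a hypothetical helper, and the connection values are copied from the example above, so check them against your own dsm.sys before using the output.

```shell
#!/bin/sh
# Print an extra dsm.sys server stanza that presents this machine as another node.
# Arguments: stanza name (used with -se=...), node name of the failed machine.
alt_stanza() {
    name=$1 node=$2
    cat <<EOF
servername $name
commmethod tcpip
tcpserveraddress tsm-ext.cns.ufl.edu
tcpport 1609
passwordaccess generate
nodename $node
EOF
}

# Prints the stanza used in the example above.
alt_stanza alternate failed.stat.ufl.edu
```

Review the output first, then append it to the bottom of dsm.sys on the repair machine and remove it again once the restore is done.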

More notes on cross-box restores

Give herring access to some of minke's files.  Takes ten minutes to run:

   minke# dsmc set acc backup '/tsm/eelpout-root/*/*' herring.stat.ufl.edu root

Restore minke's files on herring.  -replace=no means overlay: don't
replace existing files.  This was useful when tsm restore performance
for /var/amavis/quarantine was poor (speculation: not due to the large
number of files in the directory, but because deletes and adds were
scattered all over tape).  About half the files had been brought over
before a new approach was tried, and we wanted to keep that half:

   herring# dsmc restore '/tsm/eelpout-root/*' '/depot/tmp/eelpout-root-restore/' -sub=y -fromnode=minke.stat.ufl.edu  -replace=no

Additional notes from NSAM list

Date:         Fri, 28 Oct 2005 13:01:37 -0400
From: "Allen S. Rout" 
Subject:      A discussion of restore procedures...

If you want to restore some BOX-A files to BOX-B (presumedly because BOX-A is
non compos mentis) here's a convenient way to go about it.

+ Reconfigure BOX-B to communicate with the TSM server as though it were
  BOX-A.   In unix land or MSWIN land, you can do this by adding additional
  'servername' stanzas to the DSM.SYS file and issuing commands with an
  additional argument "-server=[something other than the default]".

+ "SET ACCESS" so that box-b may access files belonging to box-a. This is a
  persnickety procedure.   We eventually had to do

SET ACCESS backup PATH-FS6\BACKUPS:/BACKUPS*/*/* PATH-FS1.PATHOLOGY.UFL.EDU *

Yick.

Let me walk through that command line, or rather the fourth word in it:

 [node]\[volume]:/[directory]*/*/*

All of these are important to communicate the path to a novell box.  The
equivalent in MS is probably replete with dollar signs and "//" es.

Note the trailing "/*/*" This expresses the notion "All directories, and all
contents of those directories, under the named directory".  I prefer "-r", but
they didn't ask.

Setting this, and getting it right, was not so much "complicated", as "made
longer" by the fact that the Novell client attempts to contact the host server
TSA.  This introduced a two-minute timeout to most commands.  Once the timeout
completed, the commands were executed without incident, but the timeout was
initially nerve-wracking.

+ Return BOX-B to its normal configuration.  You should now be able to QUERY
  RESTORE and RESTORE with the argument "-fromnode=BOX-A", so you can access
  the remote backup records.  In the GUI client, you can change where the
  files are coming from by selecting 'utilities | Access another node'.
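The fourth word of the SET ACCESS command in the message above follows a fixed pattern, so it can be assembled from its parts. A minimal sketch, assuming the [node]\[volume]:/[directory]*/*/* layout described there; novell_filespec is a hypothetical name.

```shell
#!/bin/sh
# Build the file specification for SET ACCESS on a Novell node.
# Arguments: node, volume, top-level directory.
novell_filespec() {
    # '\\' in the printf format emits a single backslash between node and volume.
    printf '%s\\%s:/%s*/*/*\n' "$1" "$2" "$3"
}

# Prints the specification used in the message above.
novell_filespec PATH-FS6 BACKUPS BACKUPS
```

Quote the result when passing it on, so that the shell expands neither the backslash nor the asterisks.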




Last modified: Mon Sep 10 17:15:29 EDT 2007