Informatics Institute, UMDNJ
Bioinformatics
 

Short Unix Workshop: Introduction

Introduction

First, a credit. This tutorial has been inspired and enriched by a tutorial written by M.Stonebank@surrey.ac.uk.

The majority of bioinformatics software, and scientific software in general, is developed to run on the UNIX platform. For this reason, it is important that scientists know how to use UNIX. This workshop is designed to give you an introduction to some basic concepts about how UNIX works, and commands that you will use frequently.

There are many UNIX tutorials on the Web, and each introduces UNIX in its own way. This workshop introduces UNIX from the perspective of bioinformatics. While we will probably not use these commands frequently, understanding how an operating system works under the hood is useful for scientists. Besides, you can add UNIX to your resume!

You are welcome to use your UNIX account and the many UNIX tutorials available online to develop these skills for your own use. The good news is that these days you can do very fine bioinformatics investigations without having to care much about which operating system you are using.

UNIX and Servers

UNIX is a common multi-user operating system. By operating system, we mean the suite of programs that makes the computer work. UNIX is used by some of the workstations and servers within the school. Other workstations may be running different operating systems such as Linux or Windows.

A multi-user operating system allows multiple users to interact with the same computer, typically a server, at the same time. A single-user operating system, such as most versions of Microsoft Windows you have dealt with, is traditionally designed to be used by one person at a time. These definitions are blurring, however.

On X terminals and workstations, the X Window System provides a graphical interface between the user and UNIX. However, knowledge of UNIX is required for operations that are not covered by a graphical program, or when no X Window System is available, for example in a (secure) shell session.

The UNIX operating system

The UNIX operating system is made up of three parts: the kernel, the shell and the programs.

The kernel

The kernel of UNIX is the heart of the operating system: it allocates time and memory to programs and handles the filesystems and communications in response to system calls.

As an illustration of the way that the shell and the kernel work together, suppose a user types rm myfile (which has the effect of removing the file called myfile). The shell searches the filesystem for the file containing the program rm, and then requests the kernel, through system calls, to execute the program rm on myfile. When the process rm myfile has finished running, the shell then returns the UNIX prompt to the user, indicating that it is waiting for further commands.
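The steps described above (find the program, ask the kernel to run it, wait for it to finish) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real shell is implemented: the function name run_like_a_shell is hypothetical, the program is looked up on the PATH with shutil.which, and the code assumes a UNIX-like system where os.fork is available.

```python
import os
import shutil
import tempfile


def run_like_a_shell(command, *args):
    """Locate `command` on the PATH, ask the kernel to run it, and wait.

    Hypothetical helper for illustration; returns the program's exit code.
    """
    program = shutil.which(command)      # search step: find the program file
    if program is None:
        raise FileNotFoundError(command)

    pid = os.fork()                      # system call: create a child process
    if pid == 0:
        # Child: replace this process with the program (system call: exec).
        try:
            os.execv(program, [command, *args])
        finally:
            os._exit(127)                # only reached if exec itself failed

    _, status = os.waitpid(pid, 0)       # system call: wait for the child
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)          # child was killed by a signal


if __name__ == "__main__":
    # Create a scratch file, then remove it the way "rm myfile" would.
    fd, myfile = tempfile.mkstemp()
    os.close(fd)
    exit_code = run_like_a_shell("rm", myfile)
    print("exit code:", exit_code, "file still exists:", os.path.exists(myfile))
```

Once the child has finished, control returns to the caller, which corresponds to the shell printing its prompt again.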

The shell

The shell acts as an interface between the user and the kernel. When a user logs in, the login program checks the username and password, and then starts another program called the shell. The shell is a command line interpreter (CLI). It interprets the commands the user types in and arranges for them to be carried out. The commands are themselves programs: when they terminate, the shell gives the user another prompt waiting for the user to enter another command.

The adept user can customise their own shell, and users can use different shells on the same machine. Staff and students in the school have ksh, the Korn shell, by default.

The Korn shell is the most advanced of the shells that are "officially" distributed with UNIX systems. Some of the features of Korn shell include:

  • Command-line editing, which lets you revise the text commands you type before issuing them (using vi- or emacs-style editing keys).
  • Integrated programming features: the functionality of several external UNIX commands, including test, expr, getopt, and echo, has been integrated into the shell itself, enabling common programming tasks to be done more cleanly and without creating extra processes.
  • Control structures, especially the select construct, which enables easy menu generation.
  • Debugging primitives that make it possible to write tools that help programmers debug their shell code.
  • Regular expressions, well known to users of UNIX utilities like grep and awk, have been added to the standard set of filename wildcards and to the shell variable facility.
  • Advanced I/O features, including the ability to do two-way communication with concurrent processes (coroutines).
  • New options and variables that give you more ways to customize your environment.
  • Increased speed of shell code execution.
  • Security features that help protect against "Trojan horses" and other types of break-in schemes.

Files and processes

Everything in UNIX is either a file or a process.

A process is an executing program identified by a unique PID (process identifier).
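As a small illustration of PIDs, the sketch below starts a child process and compares the PID the parent sees with the PID the child reports about itself. The function name spawn_and_report is hypothetical, and the example assumes a UNIX-like system where the child is started directly, without an intermediate launcher.

```python
import os
import subprocess
import sys


def spawn_and_report():
    """Start a child Python process and return (pid seen by parent, pid seen by child)."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        stdout=subprocess.PIPE,
        text=True,
    )
    output, _ = child.communicate()      # wait for the child and read its output
    return child.pid, int(output)


if __name__ == "__main__":
    print("this process has PID", os.getpid())
    seen_by_parent, seen_by_child = spawn_and_report()
    print("child PID seen by parent:", seen_by_parent)
    print("child PID seen by child: ", seen_by_child)
```

The two child values should match each other and differ from the parent's own PID, since the kernel gives every running process its own identifier.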

A file is a collection of data. Files are created by users with text editors, by running programs, and so on.

Examples of files:

  • a document (report, essay, etc.)
  • the text of a program written in some high-level programming language
  • instructions comprehensible directly to the machine and incomprehensible to a casual user, for example, a collection of binary digits (an executable or binary file)
  • a directory, containing information about its contents, which may be a mixture of other directories (subdirectories) and ordinary files.
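To make the last point concrete, the following sketch asks the operating system what kind of file a given path is. The helper name kind_of_file is hypothetical; the calls os.stat, stat.S_ISDIR and stat.S_ISREG are from the Python standard library, and the paths are created in a temporary directory purely for the demonstration.

```python
import os
import stat
import tempfile


def kind_of_file(path):
    """Classify `path` as a directory, a regular file, or something else."""
    mode = os.stat(path).st_mode         # file type and permissions from the kernel
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        report = os.path.join(scratch, "report.txt")
        with open(report, "w") as handle:
            handle.write("an ordinary document\n")
        print(scratch, "->", kind_of_file(scratch))
        print(report, "->", kind_of_file(report))
```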

The Directory Structure

All the files are grouped together in the directory structure. The file-system is arranged in a hierarchical structure, like an inverted tree. The top of the hierarchy is traditionally called root.

[ Figure: Unix File System ]

In this diagram, the directory /users2/joe contains a subdirectory classes. As you will see, on our systems, most or all of you will be listed in a directory called /home.
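The hierarchy can also be seen by walking from a directory back up to the root. The sketch below uses the path from the diagram; the function name path_to_root is hypothetical, and PurePosixPath only manipulates the path text, so the directories do not need to exist on your machine.

```python
from pathlib import PurePosixPath


def path_to_root(path):
    """Return the chain of directories from `path` up to the root, nearest first."""
    start = PurePosixPath(path)
    return [str(start)] + [str(parent) for parent in start.parents]


if __name__ == "__main__":
    for directory in path_to_root("/users2/joe/classes"):
        print(directory)
    # prints:
    # /users2/joe/classes
    # /users2/joe
    # /users2
    # /
```

Every path eventually ends at /, the root at the top of the inverted tree.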

[ Next: Logging in and Basic Commands ]

Page last modified January 9, 2008

 
