# Databank descriptors

## What is a databank descriptor?

A databank descriptor is a text file containing the instructions used by *BeeDeeM* to download and install databanks.

There are two types of descriptor:

* **bank descriptor** (.dsc file): descriptor used to describe the installation of a single databank
* **global descriptor** (.gd file): descriptor used to start the installation of one or several databanks

## Description of databases to be managed: the bank descriptor

By default, *BeeDeeM* provides a non-exhaustive list of **descriptors** for processing various sequence databanks and biological classifications (ontologies). All of these files have the “.dsc” extension and are located in the *${conf}* directory.

Each file contains a group of instructions used by *BeeDeeM*:

* to download (via FTP) all files making up the complete distribution of a database
* to process the downloaded files to make them usable (decompressing, un-archiving, indexing, etc.)

Here is a sample bank descriptor that installs Uniprot\_SwissProt:

```
# Bank name
db.name=Uniprot_SwissProt
# Bank description
db.desc=UniprotKB/SwissProt databank (contains annotations).
# Bank type
db.type=p
# Bank location
db.ldir=${mirrordir}|p|Uniprot_SwissProt

# Server access
ftp.server=ftp.expasy.org
ftp.port=21
ftp.uname=anonymous
ftp.pswd=user@company.com
# Directory to locate files to download
ftp.rdir=/databases/uniprot/current_release/knowledgebase/complete
ftp.rdir.exclude=

# File(s) to retrieve
db.files.include=uniprot_sprot.dat.gz
db.files.exclude=

# Processing tasks
tasks.unit.post=gunzip,idxsw
tasks.global.post=delgz,deltmpidx,formatdb(lclid=false;check=true;nr=true)

# Keep previous release or not
history=0
```

The use of such a file will be explained in the next section.

The full format of the database descriptors is documented in section [Databank descriptor format](https://pgdurand.gitbook.io/beedeem/getting-started/descriptors-format).
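To make the structure of a bank descriptor concrete, here is a minimal Python sketch that reads the `key=value` lines of the sample above. It is not part of *BeeDeeM*: the helper names `parse_descriptor` and `split_tasks` are hypothetical, and the sketch assumes that a descriptor only contains `#` comments, blank lines and `key=value` pairs, and that task lists are comma separated, as they appear in the sample. The real parser may accept more than this.

```python
def parse_descriptor(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only: values such as
        # formatdb(lclid=false;...) contain '=' themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def split_tasks(value):
    """Split a comma-separated task list (assumed format, see lead-in)."""
    return [task.strip() for task in value.split(",") if task.strip()]


# Excerpt of the Uniprot_SwissProt descriptor shown above.
SAMPLE = """\
# Bank name
db.name=Uniprot_SwissProt
db.type=p
tasks.unit.post=gunzip,idxsw
tasks.global.post=delgz,deltmpidx,formatdb(lclid=false;check=true;nr=true)
history=0
"""

props = parse_descriptor(SAMPLE)
print(props["db.name"])
# Uniprot_SwissProt
print(split_tasks(props["tasks.global.post"]))
# ['delgz', 'deltmpidx', 'formatdb(lclid=false;check=true;nr=true)']
```

Because the parameters of `formatdb` are separated by semicolons inside the parentheses, splitting the task list on commas leaves that task in one piece.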

## Description of processing to be performed: the global descriptor

The processing that *BeeDeeM* will perform is described in a **global descriptor**.

Here is an example of such a descriptor:

```
# List of banks to retrieve (use bank descriptor name)
db.list=PDB_protein

# What to do (download or info)
db.main.task=download

# Restart a failed process
resume.date=none

# Parameters of the loader engine
task.delay=1000
ftp.delay=5000
ftp.retry=3

# Do we have to send an email to DBMS manager?
mail.smtp.host=
mail.smtp.port=
mail.smtp.sender.mail=
mail.smtp.sender.pswd=
mail.smtp.recipient.mail=
```

By default, *BeeDeeM* has a “test” descriptor for processing the installation of PDB Protein.

This descriptor is the file named 'test.gd' located in the directory *${conf}*.

Note: we will use 'test.gd' in the rest of this manual to explain how to use *BeeDeeM*. You can create other descriptors (*e.g.* by deriving them from 'test.gd'), but always save them in the directory *${conf}*.

Before starting any processing, it is **VERY IMPORTANT** to check the following two lines in the global descriptor:

```
db.list=PDB_Protein
resume.date=none
```

The first line is a comma-separated list of bank descriptors to use (without their ".dsc" extension). It defines which databank(s) will be installed during a single *BeeDeeM* run.

The second line gives a restart date. This line is only used in the case of a restart after a failure. If you start *BeeDeeM* for the first time or if you are updating the databases, it is absolutely imperative to set "resume.date" to the value *none*. All of this is explained in section [Advanced uses](https://pgdurand.gitbook.io/beedeem/getting-started/advanced-uses).
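The two checks described above can be sketched as a small pre-flight script. This is only an illustration, not a *BeeDeeM* tool: the function names are hypothetical, and the sketch assumes that each name in `db.list` corresponds to a file `<name>.dsc` stored directly in the directory you pass as `conf_dir`, and that you are preparing a first installation or an update (so `resume.date` is expected to be *none*).

```python
from pathlib import Path


def parse_descriptor(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_global_descriptor(text, conf_dir):
    """Return a list of problems found in a global descriptor (empty if none)."""
    props = parse_descriptor(text)
    problems = []

    # db.list: comma-separated bank descriptor names, without '.dsc'.
    names = [n.strip() for n in props.get("db.list", "").split(",") if n.strip()]
    if not names:
        problems.append("db.list is empty")
    for name in names:
        if name.endswith(".dsc"):
            problems.append(f"{name}: remove the '.dsc' extension")
        elif not (Path(conf_dir) / f"{name}.dsc").is_file():
            # Assumption: descriptors sit directly in conf_dir.
            problems.append(f"{name}: no file {name}.dsc in {conf_dir}")

    # resume.date: must be 'none' unless restarting after a failure.
    if props.get("resume.date") != "none":
        problems.append("resume.date is not 'none'")

    return problems
```

Calling `check_global_descriptor(Path("test.gd").read_text(), ".")` from inside the configuration directory would return an empty list when both lines look correct, and one message per problem otherwise.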

**But now, let's see how to install a databank using these descriptors!**
