# Databank descriptors

## What is a databank descriptor?

A databank descriptor is a text file containing the instructions used by *BeeDeeM* to download and install databanks.

There are two types of descriptor:

* **bank descriptor** (.dsc file): descriptor used to describe the installation of a single databank
* **global descriptor** (.gd file): descriptor used to start the installation of one or several databanks

## Description of databases to be managed: the bank descriptor

By default, *BeeDeeM* provides a non-exhaustive list of **descriptors** for processing various sequence databanks and biological classifications (ontologies). All of these files carry the “.dsc” extension and are located in the *${conf}* directory.

Each file contains a group of instructions used by *BeeDeeM*:

* to download (via FTP) all files making up the complete distribution of a database
* to process the downloaded files to make them usable (decompressing, un-archiving, indexing, etc.)

Here is a sample bank descriptor that installs Uniprot\_SwissProt:

```
# Bank name
db.name=Uniprot_SwissProt
# Bank description
db.desc=UniprotKB/SwissProt databank (contains annotations).
# Bank type
db.type=p
# Bank location
db.ldir=${mirrordir}|p|Uniprot_SwissProt

# Server access
ftp.server=ftp.expasy.org
ftp.port=21
ftp.uname=anonymous
ftp.pswd=user@company.com
# Directory to locate files to download
ftp.rdir=/databases/uniprot/current_release/knowledgebase/complete
ftp.rdir.exclude=

# File(s) to retrieve
db.files.include=uniprot_sprot.dat.gz
db.files.exclude=

# Processing tasks
tasks.unit.post=gunzip,idxsw
tasks.global.post=delgz,deltmpidx,formatdb(lclid=false;check=true;nr=true)

# Keep previous release or not
history=0
```

The use of such a file will be explained in the next section.

The full format of the database descriptors is documented in section [Databank descriptor format](/beedeem/getting-started/descriptors-format.md).
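To make the layout of a bank descriptor concrete, here is a minimal Python sketch that reads `key=value` lines and skips blank lines and `#` comments. It is only an illustration of the file layout shown above, not *BeeDeeM*'s own parser; the function name `parse_descriptor` and the shortened `SAMPLE` text are assumptions made for this example.

```python
def parse_descriptor(text):
    """Parse 'key=value' lines, ignoring blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only: values may contain '=' themselves,
        # as in formatdb(lclid=false;check=true;nr=true).
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Shortened copy of the sample descriptor shown above.
SAMPLE = """\
# Bank name
db.name=Uniprot_SwissProt
db.type=p
ftp.server=ftp.expasy.org
ftp.rdir.exclude=
tasks.unit.post=gunzip,idxsw
tasks.global.post=delgz,deltmpidx,formatdb(lclid=false;check=true;nr=true)
"""

props = parse_descriptor(SAMPLE)
unit_tasks = props["tasks.unit.post"].split(",")
print(props["db.name"], unit_tasks)
```

Splitting on the first `=` only keeps task arguments such as `formatdb(...)` intact, and keys with no value (for example `ftp.rdir.exclude=`) come back as empty strings.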

## Description of processing to be performed: the global descriptor

The processing that *BeeDeeM* will perform is described in a **global descriptor**.

Here is an example of such a descriptor:

```
# List of banks to retrieve (use bank descriptor name)
db.list=PDB_protein

# What to do (download or info)
db.main.task=download

# Restart a failed process
resume.date=none

# Parameters of the loader engine
task.delay=1000
ftp.delay=5000
ftp.retry=3

# Do we have to send an email to DBMS manager?
mail.smtp.host=
mail.smtp.port=
mail.smtp.sender.mail=
mail.smtp.sender.pswd=
mail.smtp.recipient.mail=
```

By default, *BeeDeeM* has a “test” descriptor for processing the installation of PDB Protein.

This descriptor is the file named 'test.gd' located in the directory *${conf}*.

Note: We will use the file 'test.gd' in the rest of this manual to explain how to use *BeeDeeM*. You can also create other descriptors (*e.g.* by deriving them from 'test.gd'); always be sure to save them in the directory *${conf}*.

Before starting any processing, it is **VERY IMPORTANT** to check the following two lines in the global descriptor:

```
db.list=PDB_Protein
resume.date=none
```

The first line is a comma-separated list of database descriptors to use (without their ".dsc" extension). It defines which databank(s) will be installed during a single *BeeDeeM* run.

The second line gives a restart date. This line is only used in the case of a restart after a failure. If you start *BeeDeeM* for the first time or if you are updating the databases, it is absolutely imperative to set "resume.date" to the value *none*. All of this is explained in section [Advanced uses](/beedeem/getting-started/advanced-uses.md).
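The two checks described above can be sketched as a small pre-flight script. This is a hedged illustration, not part of *BeeDeeM*: the function `check_global_descriptor`, the temporary directory standing in for *${conf}*, the bank name `Missing_Bank`, and the placeholder restart value are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def check_global_descriptor(props, conf_dir):
    """Return a list of problems found in a parsed global descriptor."""
    problems = []
    # db.list is a comma-separated list of descriptor names without ".dsc".
    names = [n.strip() for n in props.get("db.list", "").split(",") if n.strip()]
    if not names:
        problems.append("db.list is empty")
    for name in names:
        if not (Path(conf_dir) / f"{name}.dsc").is_file():
            problems.append(f"missing bank descriptor: {name}.dsc")
    # For a first run or an update, resume.date must be 'none'.
    if props.get("resume.date", "").strip() != "none":
        problems.append("resume.date is not 'none'")
    return problems


# A temporary directory stands in for ${conf} in this example.
with tempfile.TemporaryDirectory() as conf:
    (Path(conf) / "PDB_Protein.dsc").write_text("db.name=PDB_Protein\n")
    ok = check_global_descriptor(
        {"db.list": "PDB_Protein", "resume.date": "none"}, conf
    )
    bad = check_global_descriptor(
        # "some-date" is a placeholder, not a real restart date format.
        {"db.list": "PDB_Protein,Missing_Bank", "resume.date": "some-date"}, conf
    )

print(ok)
print(bad)
```

A value other than *none* is reported here because the sketch assumes a first installation or an update; for a genuine restart after a failure, that warning would be expected and can be ignored.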

**But now, let's see how to install a databank using these descriptors!**

