Plant Transcription Factor Database
v4.0 (previous versions: v1.0, v2.0, v3.0)
- The flowchart for construction of PlantTFDB
- Data source
- Pipeline to construct comprehensive protein dataset
- Family assignment rules
- Thresholds for domain identification
- Pipeline for parsing BLAST reciprocal best hits (RBHs) and inferring orthologous groups
- Pipeline for GO annotation
- Curation and projection of TF binding motifs
- Transcription factor information
- Multiple sequence alignment
- Phylogenetic trees
- Quick search
- TF prediction server
- Help for PlantRegMap
Transcription Factor Information
The ID of each transcription factor collected in PlantTFDB. For species with genome annotation, IDs from the genome annotation were adopted directly as PlantTFDB IDs. For species without genome annotation, each TF was assigned a unique ID consisting of three characters representing the species (e.g. Aan represents Artemisia annua) followed by six digits.
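As an illustration of the ID scheme for species without genome annotation, the sketch below composes and checks such IDs. It assumes the species code is written as one capital letter and two lowercase letters (as in "Aan") and that serial numbers are zero-padded to six digits; neither detail is specified beyond the single example, and make_tf_id and is_assigned_tf_id are hypothetical helpers, not part of PlantTFDB.

```python
import re

# Assumed layout (not confirmed by PlantTFDB): capital + two lowercase letters, then six digits.
SPECIES_CODE = re.compile(r"[A-Z][a-z]{2}")
ASSIGNED_TF_ID = re.compile(r"[A-Z][a-z]{2}[0-9]{6}")


def make_tf_id(species_code: str, serial: int) -> str:
    """Compose a hypothetical ID such as 'Aan000123' from a species code and a serial."""
    if not SPECIES_CODE.fullmatch(species_code):
        raise ValueError(f"unexpected species code: {species_code!r}")
    if not 0 <= serial <= 999_999:
        raise ValueError(f"serial must fit in six digits: {serial}")
    return f"{species_code}{serial:06d}"  # zero-padding is an assumption


def is_assigned_tf_id(tf_id: str) -> bool:
    """True if tf_id has the assumed species-code + six-digit shape."""
    return ASSIGNED_TF_ID.fullmatch(tf_id) is not None


print(make_tf_id("Aan", 123))          # Aan000123
print(is_assigned_tf_id("AT1G01010"))  # False: a genome-annotation locus ID
```

IDs taken directly from a genome annotation, such as Arabidopsis locus IDs, do not follow this pattern, which is why the check returns False for them.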
The taxonomic ID and lineage of each organism were collected from NCBI Taxonomy.
The gene (and its data source) encoding this transcription factor.
Gene Model ID
The ID of the gene model, extracted from the original data source. Gene model IDs can be searched on the advanced search page.
Gene Model Type
The type of gene model. There are three types of gene model in PlantTFDB:
'genome' -- gene models from genome annotation;
'PU_ref' -- gene models from PlantGDB and UniGene that were selected as the representative of a cluster of PUTs and UniGene sequences;
'PU_unref' -- gene models from PlantGDB and UniGene that were not selected as the representative of a cluster of PUTs and UniGene sequences.
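The three gene model types can be represented as a small enumeration. In this sketch the two boolean inputs (whether the model comes from a genome annotation, and whether it was chosen as the representative of its PUT/UniGene cluster) are hypothetical fields used for illustration; PlantTFDB's own data model is not described on this page.

```python
from enum import Enum


class GeneModelType(Enum):
    GENOME = "genome"      # gene model from a genome annotation
    PU_REF = "PU_ref"      # PlantGDB/UniGene model chosen as cluster representative
    PU_UNREF = "PU_unref"  # PlantGDB/UniGene model not chosen as representative


def classify_gene_model(from_genome: bool, is_representative: bool) -> GeneModelType:
    """Map a gene model's (hypothetical) origin flags to one of the three types."""
    if from_genome:
        return GeneModelType.GENOME
    # Otherwise the model came from PlantGDB PUTs or UniGene clusters.
    return GeneModelType.PU_REF if is_representative else GeneModelType.PU_UNREF
```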
The source from which the gene model was obtained.
The domain used to identify and classify transcription factors.
Domains and other features identified by InterProScan v5.
Plant Ontology (PO) annotations were downloaded from TAIR10 for A. thaliana and from the Plant Ontology Consortium for other species.
Nuclear Localization Signal
Nuclear localization signal (NLS) predicted by PredictNLS.
The best BLAST hit in PDB.
The expression description (tissue specificity and developmental stage) was collected from UniProt. The best BLAST hits in UniGene, GEO and Genevisible, together with direct links to Expression Atlas, AtGenExpress and ATTED-II, were added.
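A "best BLAST hit" is commonly picked from BLAST+ tabular output. The sketch below assumes hits are in the default 12-column -outfmt 6 layout and ranks them by bit score, breaking ties by the lower e-value; PlantTFDB's actual selection criteria and any e-value cutoff are not stated here, and best_hits is a hypothetical helper.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def _rank(hit: dict) -> tuple[float, float]:
    # Higher bit score wins; among equal bit scores, the lower e-value wins.
    return (hit["bitscore"], -hit["evalue"])


def best_hits(tabular_text: str) -> dict[str, dict]:
    """Return the single best hit for each query in -outfmt 6 text."""
    best: dict[str, dict] = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or _rank(row) > _rank(current):
            best[row["qseqid"]] = row
    return best
```

Ranking by bit score rather than e-value alone keeps the choice independent of database size, which affects e-values but not bit scores.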
Manually curated regulatory interactions were collected from ATRM.
Protein-promoter and protein-protein interaction data were collected from BioGRID, IntAct, and BIND.
Multiple Sequence Alignment
Multiple sequence alignments of full-length transcription factors were generated using T-Coffee (v9.03).
Multiple sequence alignments of domains were constructed using a hidden Markov model (HMM)-guided method.
Phylogenetic Trees
Phylogenetic trees for TFs of the same family within a species, and for TFs within the same orthologous group, were inferred using MrBayes (v3.2.6) under the WAG model for 50,000 generations; the resulting trees are unrooted.
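For orientation, the MrBayes settings named above (WAG, 50,000 generations) would typically appear in a mrbayes command block of a NEXUS file. The hypothetical Python helper below only writes such a block as text; it assumes the alignment is already in a NEXUS data block with datatype=protein, and every other setting (burn-in, number of runs, sampling frequency) is left at MrBayes defaults rather than reproducing PlantTFDB's exact configuration.

```python
def mrbayes_block(ngen: int = 50_000, aa_model: str = "wag") -> str:
    """Return a MrBayes command block fixing the amino-acid model and run length."""
    lines = [
        "begin mrbayes;",
        f"    prset aamodelpr=fixed({aa_model});",  # fixed WAG rate matrix
        f"    mcmc ngen={ngen};",                    # number of MCMC generations
        "    sumt;",                                 # summarize the sampled trees
        "end;",
    ]
    return "\n".join(lines)


print(mrbayes_block())
```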
Phylogenetic trees for TFs of a family across all species were inferred using FastTree (v2.1.9) under the WAG model with 100 bootstrap replicates; the resulting trees are unrooted.
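A hedged sketch of the FastTree step, assuming the executable is on PATH as FastTree (some installations name it fasttree or FastTreeMP). It assumes "100 bootstraps" corresponds to FastTree's -boot 100, which sets the number of resamples used for its local support values; a full bootstrap analysis would instead build trees from resampled alignments. The alignment file name is hypothetical.

```python
import shutil
import subprocess


def fasttree_command(alignment_path: str, bootstraps: int = 100) -> list[str]:
    """Argument list for a protein FastTree run under WAG with N support resamples."""
    return ["FastTree", "-wag", "-boot", str(bootstraps), alignment_path]


def run_fasttree(alignment_path: str, bootstraps: int = 100) -> str:
    """Run FastTree (assumed on PATH) and return the Newick tree it writes to stdout."""
    if shutil.which("FastTree") is None:
        raise FileNotFoundError("FastTree executable not found on PATH")
    result = subprocess.run(fasttree_command(alignment_path, bootstraps),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Separating command construction from execution lets the argument list be checked without FastTree installed.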
Quick Search
In the quick search box, you can search for a TF by its TF ID or common name.
TF Prediction Server
The TF prediction server has been upgraded in this version. The family assignment rules and thresholds determined by established methods (see details in the supplemental materials) are used to identify transcription factors in the input sequences. When users input nucleic acid sequences, ESTScan 3.0 is used to identify CDS regions in them and translate these into protein sequences. When the GC content of the input sequences is less than 48%, the ESTScan model trained on Arabidopsis thaliana mRNAs is used; otherwise, the model trained on Oryza sativa is used. If "Best hit in Arabidopsis thaliana" is checked, links to the best hits in Arabidopsis thaliana are added to the results for predicted transcription factors. Users can access the server here to identify TFs in multiple sequences.
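The GC-content rule above can be sketched as follows. Two details are assumptions not stated on this page: GC content is computed over all input sequences pooled together (rather than per sequence), and ambiguous bases such as N are excluded from the denominator. The returned strings are labels for illustration, not ESTScan parameter names, and both functions are hypothetical.

```python
def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T, U); other symbols are ignored."""
    s = seq.upper()
    unambiguous = sum(s.count(base) for base in "ACGTU")
    if unambiguous == 0:
        raise ValueError("no A/C/G/T/U bases in input")
    return 100.0 * (s.count("G") + s.count("C")) / unambiguous


def choose_estscan_model(seqs: list[str], threshold: float = 48.0) -> str:
    """Pick the ESTScan training set: Arabidopsis below the threshold, rice otherwise."""
    pooled = "".join(seqs)  # assumption: one GC value for the whole submission
    if gc_content(pooled) < threshold:
        return "Arabidopsis thaliana"
    return "Oryza sativa"


print(choose_estscan_model(["ATGAAATTTGCA", "ATGTTTAAACCC"]))  # Arabidopsis thaliana
```

Input at exactly 48% GC falls to the Oryza sativa model, matching the "less than 48%" wording.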