Catalog Format Changes
The catalog format has changed slightly over time. The toolkit should remain compatible with most versions so far, with the possible exception of backwards compatibility with single-column catalogs. Some of the changes concern the additional files that are included in the catalog directory.
1.0.0 (2012-12)
- Initial format bgzip file with 4 columns:
1.1.0 (2014?)
- Included datasource.properties and columns.tsv
- Directory structure should be:
- {SOURCE}/{VERSION}_{ASSEMBLY}[.p{PATCH}]/{TYPE}[_NODUPES].v{VERSION}
- Ex: /data5/bsi/catalogs/bior/v1/ESP/V2_GRCh38/variants_nodups.v1/
- Includes blacklist files for columns to ignore for wrapper apps like biorweb
- Chromosomes should conform to those in the human chromosome list and in sorted order
- Chromosome should be “UNKNOWN” when not known or not applicable, and min,max should be 0,1 (as far as we recall, at least one non-zero position was required). Previously these columns held dots or (. 0 0), as in:
- /data5/bsi/catalogs/bior/v1/omim/2013_02_27/genemap_GRCh37.tsv.bgz
- Dataset property added in datasource.properties
- Can have a single column for JSON in the catalog
- Added HumanReadableName to columns.tsv. By default, this is set to the same as the key
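The directory template listed above for 1.1.0 can be sketched as a small path builder. This is only an illustration, not part of the toolkit: the function `catalog_dir` and its parameter names are hypothetical. It assumes the two `{VERSION}` fields differ (data release first, catalog version second), as the example path suggests with `V2` and `v1`. The template writes `_NODUPES` while the example path uses `_nodups`; the sketch follows the example and leaves the suffix configurable.

```python
from pathlib import PurePosixPath
from typing import Optional


def catalog_dir(
    source: str,
    data_version: str,
    assembly: str,
    catalog_type: str,
    catalog_version: int,
    patch: Optional[int] = None,
    nodups: bool = False,
    nodups_suffix: str = "_nodups",  # assumption: spelling taken from the example path
) -> PurePosixPath:
    """Build {SOURCE}/{VERSION}_{ASSEMBLY}[.p{PATCH}]/{TYPE}[suffix].v{VERSION}."""
    # Middle component: data release and assembly, with an optional patch level.
    build = f"{data_version}_{assembly}"
    if patch is not None:
        build += f".p{patch}"
    # Leaf component: catalog type, optional no-duplicates suffix, catalog version.
    leaf = f"{catalog_type}{nodups_suffix if nodups else ''}.v{catalog_version}"
    return PurePosixPath(source) / build / leaf


# Relative part of the example path from the list above.
print(catalog_dir("ESP", "V2", "GRCh38", "variants", 1, nodups=True))
# ESP/V2_GRCh38/variants_nodups.v1
```

The result is relative to the catalog root, so it can be joined onto whatever base directory a site uses.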
1.1.1 (2017)
- Reflects the new automation format
- Will include markdown files
- Will include build files from automation, as well as stats (in build/ subdirectory)