ISA tools API¶
The ISA tools API is published on PyPI as the isatools
Python package. The package provides you, the developer, with a set of tools to quickly and easily build your own ISA objects, validate them, and convert
between serializations of ISA-formatted datasets and other formats/schemas
(e.g. SRA schemas). It also offers utility functions for file conversion and validation.
Note
isatools
is currently supported in Python 3.4, 3.5 and 3.6. Python 2.7 support is available in the py2
source code branch on GitHub.
- Installation
- ISA model
- Creating objects
- Tutorial: describing a simple experiment with objects
- Creating ISA objects based on study-design
- Converting between ISA formats
- Downloading files stored in Github
- Validating ISA-Tab and ISA JSON
- Importing data in ISA formats
- Exporting data in ISA formats
- Creating ISA content with a Sample and Assay plan
- Known Issues
License¶
This code is licensed under the CPAL License.
Installation¶
Requires: Python 3.4, 3.5 or 3.6; additionally Java 1.6+ for SRA and Biocrates importers
Installation from the Python Package Index¶
The ISA API is published on the Python Package Index (PyPI) as the isatools package (see https://pypi.python.org/pypi/isatools/), and you can use pip
to install it.
$ pip install isatools
Now you’re ready to get started!
Installation from sources¶
The ISA-API source code is hosted on GitHub at https://github.com/ISA-tools/isa-api. You can get the source code by running the following git command:
$ git clone https://github.com/ISA-tools/isa-api/
We recommend using a virtual environment for your Python projects. virtualenv
is a tool for creating isolated
Python runtime environments. It does not install separate copies of Python, but rather provides a clever way
to keep different environment configurations cleanly separated.
If you are on Mac OS X or Linux, one of the following two commands will work for you:
$ sudo easy_install virtualenv
or even better:
$ sudo pip install virtualenv
Then, you can create a virtual environment:
$ virtualenv venv
and activate it:
$ source venv/bin/activate
Finally, you should install the requirements with:
$ pip install -r requirements.txt
or
$ pip install --upgrade -r requirements.txt
if you want to upgrade the requirements already installed.
Install into your local Python environment with:
python setup.py install
or open up your favourite IDE to explore the codebase. We use JetBrains' PyCharm (https://www.jetbrains.com/pycharm/).
Now you’re ready to get started!
For full instructions on installing and using virtualenv
see their documentation.
Running tests¶
The tests in the ISA-API rely on datasets available in the tests branch of the ISAdatasets repository.
Thus, the first step for running the tests is to clone that branch to the tests/data folder from the root of your isa-api source code project:
git clone -b tests --single-branch http://github.com/ISA-tools/ISAdatasets tests/data
After that, you can run the test with the usual command:
python setup.py test
ISA data model¶
For the ISA tools API, we have represented the ISA model version 1.0 (see the ISA-Tab specification) with a set of JSON schemas, which provide the information the ISA model maintains for each of the objects.
The objective of designing and developing JSON schemas is to support a new serialization of the ISA-Tab model in JSON format, in addition to existing serializations in tabular format and RDF format.
The core set of schemas for ISA model version 1.0 can be found in the folder isatools/resources/schemas/isa_model_version_1_0_schemas/core.
The main object is the Investigation, which groups a set of Studies and maintains associated information such as Publications, People involved and the ontologies used for annotating the dataset.
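As a rough illustration of that nesting, here is a sketch using plain Python dicts standing in for the JSON schema instances (the key names below are approximate, not necessarily the schemas' exact property names):

```python
# Illustrative nesting of the ISA model: an Investigation groups Studies and
# holds publications, people and ontology source references.
investigation = {
    "identifier": "i1",
    "ontologySourceReferences": [{"name": "OBI"}],
    "publications": [{"title": "Experiments with Elephants"}],
    "people": [{"lastName": "Robertson"}],
    "studies": [
        {"identifier": "s1", "assays": []},  # each Study holds its own Assays
    ],
}

# The Investigation is the single top-level container for everything else:
assert len(investigation["studies"]) == 1
```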
Creating ISA content¶
The ISA API provides a set of Python classes that you can use to create ISA content with.
The three main objects that you need to create ISA content are:
Investigation
Study
Assay
…of course!
Important
As a pre-requisite to using ISA model classes, please make sure you have read and understood the ISA Abstract Model that the ISA formats are based on.
Getting started¶
In isatools.model
, the class Investigation
is used as the top level container for all other ISA content.
The Investigation
Python class corresponds to the
Investigation as defined in the
ISA Model Specification. For example, to create an empty
ISA structure consisting of an investigation with one study, you might use the following code:
>>> from isatools.model import *
>>> investigation = Investigation()
>>> investigation.studies.append(Study()) # adds a new default Study object to investigation
This code simply creates one Investigation
object, and adds a single Study
object to it. The constructor of
each of these objects creates empty structures for each of these. We can then inspect the structure by accessing
its instance variables as follows:
>>> investigation.studies
[<isatools.model.Study object>]
>>> investigation.studies[0].assays
[]
>>> investigation.title
''
Since we have not set any data in our ISA objects, these are by default mostly empty at the moment. We can set some instance variables with data as follows:
>>> investigation.title = "My ISA Investigation"
>>> investigation.title
'My ISA Investigation'
>>> investigation.studies[0].title = "My ISA Study"
>>> investigation.studies[0].title
'My ISA Study'
>>> investigation.studies[0].assays.append(Assay()) # adds a new default Assay object to study
>>> investigation.studies[0].assays
[<isatools.model.Assay object>]
If we then write these out to ISA-Tab, we can inspect the output written to an i_investigation.txt
file. We do this using the isatab
module's dump()
function on the Investigation
object we created, as follows:
>>> from isatools import isatab
>>> isatab.dump(investigation, 'tmp/') # dump out ISA-Tab to tmp/
<isatools.model.Investigation object>
If all went as expected, you should find an i_investigation.txt
file containing the standard Investigation sections and one Study section, structured as defined by the
ISA-Tab Specification.
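The investigation file is organised as upper-case section labels followed by rows of tab-separated fields. As a minimal standalone sketch of that layout (the snippet below parses an inline toy string rather than a real dump; the section labels follow the ISA-Tab specification):

```python
# Toy fragment shaped like an i_investigation.txt file: tab-separated rows
# grouped under upper-case section labels.
text = ("ONTOLOGY SOURCE REFERENCE\n"
        "Term Source Name\tOBI\n"
        "INVESTIGATION\n"
        "Investigation Title\tMy ISA Investigation\n"
        "STUDY\n"
        "Study Title\tMy ISA Study\n")

sections = {}
current = None
for line in text.splitlines():
    if line.isupper():            # section labels are upper-case rows
        current = line
        sections[current] = []
    elif current is not None:
        sections[current].append(line.split("\t"))

assert "INVESTIGATION" in sections
```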
Hint
Remember that when you dump()
ISA content, you do it on the Investigation
object. This means any
Study
and Assay
objects and content must be attached to the Investigation
for it to be serialized out.
Different classes in isatools.model
have class constructors and instance variables that roughly map to the
ISA Abstract Model. For full details of how to instantiate model classes, access and manipulate ISA data as objects,
please inspect the module’s docstrings.
Obviously this isn’t enough to create a fully populated ISA investigation, but we would recommend that you have a look
in the isatools.model
package to inspect all the docstring documentation that is included with each of the ISA
model classes.
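Standard Python introspection is enough to browse those docstrings; for example (shown here on a stand-in class rather than the real model, so the snippet is self-contained):

```python
import inspect

class Investigation:  # stand-in class, not the real isatools.model.Investigation
    """Top-level container for ISA content."""

# inspect.getdoc() returns the cleaned-up docstring of any class or function.
print(inspect.getdoc(Investigation))  # -> Top-level container for ISA content.
```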
Tutorial: describing a simple experiment¶
In this section, we provide a basic example of creating a complete experiment descriptor using the ISA API's model classes. The descriptor is deliberately simplified rather than fully realistic, but it demonstrates the range of component classes that you can use to create ISA content, including sample characteristics, ontology annotations and units.
Important
As a pre-requisite to using the ISA model classes, please make sure you have read and understood the ISA Abstract Model that the ISA formats are based on.
Firstly, we need to import the ISA API’s model classes from the isatools
PyPI package.
from isatools.model import *
Next, we build our descriptor encapsulated in a single Python function to simplify the example code. In a real application or script, you might decompose the functionality and hook it up to interactive components to solicit feedback from a user on-the-fly.
def create_descriptor():
    """Returns a simple but complete ISA-Tab 1.0 descriptor for illustration."""

    # Create an empty Investigation object and set some values to the instance variables.

    investigation = Investigation()
    investigation.identifier = "i1"
    investigation.title = "My Simple ISA Investigation"
    investigation.description = "We could alternatively use the class constructor's parameters to set some default " \
                                "values at the time of creation, however we want to demonstrate how to use the " \
                                "object's instance variables to set values."
    investigation.submission_date = "2016-11-03"
    investigation.public_release_date = "2016-11-03"

    # Create an empty Study object and set some values. The Study must have a filename, otherwise when we serialize it
    # to ISA-Tab we would not know where to write it. We must also attach the study to the investigation by adding it
    # to the 'investigation' object's list of studies.

    study = Study(filename="s_study.txt")
    study.identifier = "s1"
    study.title = "My ISA Study"
    study.description = "Like with the Investigation, we could use the class constructor to set some default values, " \
                        "but have chosen to demonstrate in this example the use of instance variables to set initial " \
                        "values."
    study.submission_date = "2016-11-03"
    study.public_release_date = "2016-11-03"
    investigation.studies.append(study)

    # Some instance variables are typed with different objects and lists of objects. For example, a Study can have a
    # list of design descriptors. A design descriptor is an Ontology Annotation describing the kind of study at hand.
    # Ontology Annotations should typically reference an Ontology Source. We demonstrate a mix of using the class
    # constructors and setting values with instance variables. Note that the OntologyAnnotation object
    # 'intervention_design' links its 'term_source' directly to the 'obi' object instance. To ensure the OntologySource
    # is encapsulated in the descriptor, it is added to a list of 'ontology_source_references' in the Investigation
    # object. The 'intervention_design' object is then added to the list of 'design_descriptors' held by the Study
    # object.

    obi = OntologySource(name='OBI', description="Ontology for Biomedical Investigations")
    investigation.ontology_source_references.append(obi)
    intervention_design = OntologyAnnotation(term_source=obi)
    intervention_design.term = "intervention design"
    intervention_design.term_accession = "http://purl.obolibrary.org/obo/OBI_0000115"
    study.design_descriptors.append(intervention_design)

    # Other instance variables common to both Investigation and Study objects include 'contacts' and 'publications',
    # each with lists of corresponding Person and Publication objects.

    contact = Person(first_name="Alice", last_name="Robertson", affiliation="University of Life",
                     roles=[OntologyAnnotation(term='submitter')])
    study.contacts.append(contact)
    publication = Publication(title="Experiments with Elephants", author_list="A. Robertson, B. Robertson")
    publication.pubmed_id = "12345678"
    publication.status = OntologyAnnotation(term="published")
    study.publications.append(publication)

    # To create the study graph that corresponds to the contents of the study table file (the s_*.txt file), we need
    # to create a process sequence. To do this we use the Process class and attach it to the Study object's
    # 'process_sequence' list instance variable. Each process must be linked with a Protocol object that is attached to
    # a Study object's 'protocols' list instance variable. The sample collection Process object usually has as input
    # a Source material and as output a Sample material.

    # Here we create one Source material object and attach it to our study.

    source = Source(name='source_material')
    study.sources.append(source)

    # Then we create three Sample objects, with organism as Homo Sapiens, and attach them to the study. We use the
    # utility function batch_create_materials() to clone a prototype material object. The function automatically
    # appends an index to the material name. In this case, three samples will be created, with the names
    # 'sample_material-0', 'sample_material-1' and 'sample_material-2'.

    prototype_sample = Sample(name='sample_material', derives_from=[source])
    ncbitaxon = OntologySource(name='NCBITaxon', description="NCBI Taxonomy")
    investigation.ontology_source_references.append(ncbitaxon)  # remember to add the newly declared ontology source to the parent investigation
    characteristic_organism = Characteristic(category=OntologyAnnotation(term="Organism"),
                                             value=OntologyAnnotation(term="Homo Sapiens", term_source=ncbitaxon,
                                                                      term_accession="http://purl.bioontology.org/ontology/NCBITAXON/9606"))
    prototype_sample.characteristics.append(characteristic_organism)
    study.samples = batch_create_materials(prototype_sample, n=3)  # creates a batch of 3 samples

    # Now we create a single Protocol object that represents our sample collection protocol, and attach it to the
    # study object. Protocols must be declared before we describe Processes, as a processing event of some sort
    # must execute some defined protocol. In the case of the class model, Protocols should therefore be declared
    # before Processes in order for the Process to be linked to one.

    sample_collection_protocol = Protocol(name="sample collection",
                                          protocol_type=OntologyAnnotation(term="sample collection"))
    study.protocols.append(sample_collection_protocol)
    sample_collection_process = Process(executes_protocol=sample_collection_protocol)

    # Next, we link our materials to the Process. In this particular case, we are describing a sample collection
    # process that takes one source material, and produces three different samples.
    #
    # (source_material)->(sample collection)->[(sample_material-0), (sample_material-1), (sample_material-2)]

    for src in study.sources:
        sample_collection_process.inputs.append(src)
    for sam in study.samples:
        sample_collection_process.outputs.append(sam)

    # Finally, attach the finished Process object to the study process_sequence. This can be done many times to
    # describe multiple sample collection events.

    study.process_sequence.append(sample_collection_process)

    # Next, we build an Assay object and attach two protocols, extraction and sequencing.

    assay = Assay(filename="a_assay.txt")
    extraction_protocol = Protocol(name='extraction', protocol_type=OntologyAnnotation(term="material extraction"))
    study.protocols.append(extraction_protocol)
    sequencing_protocol = Protocol(name='sequencing', protocol_type=OntologyAnnotation(term="material sequencing"))
    study.protocols.append(sequencing_protocol)

    # To build out assay graphs, we enumerate the samples from the study-level, and for each sample we create an
    # extraction process and a sequencing process. The extraction process takes as input a sample material, and
    # produces an extract material. The sequencing process takes the extract material and produces a data file. This
    # will produce three graphs, from sample material through to data, as follows:
    #
    # (sample_material-0)->(extraction)->(extract-0)->(sequencing)->(sequenced-data-0)
    # (sample_material-1)->(extraction)->(extract-1)->(sequencing)->(sequenced-data-1)
    # (sample_material-2)->(extraction)->(extract-2)->(sequencing)->(sequenced-data-2)
    #
    # Note that the extraction processes and sequencing processes are distinctly separate instances, where the three
    # graphs are NOT interconnected.

    for i, sample in enumerate(study.samples):
        # create an extraction process that executes the extraction protocol
        extraction_process = Process(executes_protocol=extraction_protocol)

        # extraction process takes as input a sample, and produces an extract material as output
        extraction_process.inputs.append(sample)
        material = Material(name="extract-{}".format(i))
        material.type = "Extract Name"
        extraction_process.outputs.append(material)

        # create a sequencing process that executes the sequencing protocol
        sequencing_process = Process(executes_protocol=sequencing_protocol)
        sequencing_process.name = "assay-name-{}".format(i)
        sequencing_process.inputs.append(extraction_process.outputs[0])

        # Sequencing process usually has an output data file
        datafile = DataFile(filename="sequenced-data-{}".format(i), label="Raw Data File")
        sequencing_process.outputs.append(datafile)

        # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set
        # these links for you. It is found in the isatools.model package
        plink(extraction_process, sequencing_process)

        # make sure the extract, data file, and the processes are attached to the assay
        assay.data_files.append(datafile)
        assay.samples.append(sample)
        assay.other_material.append(material)
        assay.process_sequence.append(extraction_process)
        assay.process_sequence.append(sequencing_process)

    assay.measurement_type = OntologyAnnotation(term="gene sequencing")
    assay.technology_type = OntologyAnnotation(term="nucleotide sequencing")

    # attach the assay to the study
    study.assays.append(assay)

    # IMPORTANT: remember to list all Characteristics used in the study object, as follows:
    study.characteristic_categories.append(characteristic_organism.category)
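For reference, the batch_create_materials() helper used above clones the prototype and suffixes an index to each clone's name. Its behaviour can be approximated by the following standalone sketch (a simplified stand-in, not the real implementation):

```python
import copy

def batch_create_materials_sketch(prototype, n=1):
    """Clone a prototype material n times, suffixing an index to each clone's name."""
    batch = []
    for i in range(n):
        clone = copy.deepcopy(prototype)  # deep copy so clones share no state
        clone["name"] = "{}-{}".format(prototype["name"], i)
        batch.append(clone)
    return batch

samples = batch_create_materials_sketch({"name": "sample_material"}, n=3)
print([s["name"] for s in samples])
# -> ['sample_material-0', 'sample_material-1', 'sample_material-2']
```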
To write out the ISA-Tab, you can use the isatab.dumps()
function:
    from isatools import isatab
    return isatab.dumps(investigation)  # dumps() writes out the ISA as a string representation of the ISA-Tab
The function listed above is designed to return all three files as a single string output for ease of inspection.
Alternatively you could do something like dump(isa_obj=investigation, output_path='./')
to write the files to
the file system.
Alternatively, to write out ISA JSON, you can use the ISAJSONEncoder
class with Python's json
package:
import json
from isatools.isajson import ISAJSONEncoder
# Note that the extra parameters sort_keys, indent and separators are to make the output more human-readable.
return json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
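The cls= argument works with any json.JSONEncoder subclass; here is a self-contained sketch of the same pattern on a toy class (not the real ISAJSONEncoder):

```python
import json

class Point:  # toy class standing in for the ISA model objects
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointEncoder(json.JSONEncoder):
    def default(self, o):  # called only for objects json cannot serialize natively
        if isinstance(o, Point):
            return {"x": o.x, "y": o.y}
        return super().default(o)

print(json.dumps(Point(1, 2), cls=PointEncoder, sort_keys=True))  # -> {"x": 1, "y": 2}
```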
The final lines of code are a main
routine to invoke the create_descriptor()
function.
if __name__ == '__main__':
    print(create_descriptor())
If you save the file as something like createSimpleISA.py, you can execute the script on the command line and view
the output by running:
python createSimpleISA.py
This example can be found in the isatools.examples
package, in
createSimpleISAtab.py and
createSimpleISAJSON.py.
createSimpleISAtab.py¶
An example of using the ISA model classes to create an ISA-Tab set of files.
#!/usr/bin/env python
from isatools.model import *


def create_descriptor():
    """Returns a simple but complete ISA-Tab 1.0 descriptor for illustration."""

    # Create an empty Investigation object and set some values to the instance variables.

    investigation = Investigation()
    investigation.identifier = "i1"
    investigation.title = "My Simple ISA Investigation"
    investigation.description = "We could alternatively use the class constructor's parameters to set some default " \
                                "values at the time of creation, however we want to demonstrate how to use the " \
                                "object's instance variables to set values."
    investigation.submission_date = "2016-11-03"
    investigation.public_release_date = "2016-11-03"

    # Create an empty Study object and set some values. The Study must have a filename, otherwise when we serialize it
    # to ISA-Tab we would not know where to write it. We must also attach the study to the investigation by adding it
    # to the 'investigation' object's list of studies.

    study = Study(filename="s_study.txt")
    study.identifier = "s1"
    study.title = "My ISA Study"
    study.description = "Like with the Investigation, we could use the class constructor to set some default values, " \
                        "but have chosen to demonstrate in this example the use of instance variables to set initial " \
                        "values."
    study.submission_date = "2016-11-03"
    study.public_release_date = "2016-11-03"
    investigation.studies.append(study)

    # Some instance variables are typed with different objects and lists of objects. For example, a Study can have a
    # list of design descriptors. A design descriptor is an Ontology Annotation describing the kind of study at hand.
    # Ontology Annotations should typically reference an Ontology Source. We demonstrate a mix of using the class
    # constructors and setting values with instance variables. Note that the OntologyAnnotation object
    # 'intervention_design' links its 'term_source' directly to the 'obi' object instance. To ensure the OntologySource
    # is encapsulated in the descriptor, it is added to a list of 'ontology_source_references' in the Investigation
    # object. The 'intervention_design' object is then added to the list of 'design_descriptors' held by the Study
    # object.

    obi = OntologySource(name='OBI', description="Ontology for Biomedical Investigations")
    investigation.ontology_source_references.append(obi)
    intervention_design = OntologyAnnotation(term_source=obi)
    intervention_design.term = "intervention design"
    intervention_design.term_accession = "http://purl.obolibrary.org/obo/OBI_0000115"
    study.design_descriptors.append(intervention_design)

    # Other instance variables common to both Investigation and Study objects include 'contacts' and 'publications',
    # each with lists of corresponding Person and Publication objects.

    contact = Person(first_name="Alice", last_name="Robertson", affiliation="University of Life",
                     roles=[OntologyAnnotation(term='submitter')])
    study.contacts.append(contact)
    publication = Publication(title="Experiments with Elephants", author_list="A. Robertson, B. Robertson")
    publication.pubmed_id = "12345678"
    publication.status = OntologyAnnotation(term="published")
    study.publications.append(publication)

    # To create the study graph that corresponds to the contents of the study table file (the s_*.txt file), we need
    # to create a process sequence. To do this we use the Process class and attach it to the Study object's
    # 'process_sequence' list instance variable. Each process must be linked with a Protocol object that is attached to
    # a Study object's 'protocols' list instance variable. The sample collection Process object usually has as input
    # a Source material and as output a Sample material.

    # Here we create one Source material object and attach it to our study.

    source = Source(name='source_material')
    study.sources.append(source)

    # Then we create three Sample objects, with organism as Homo Sapiens, and attach them to the study. We use the
    # utility function batch_create_materials() to clone a prototype material object. The function automatically
    # appends an index to the material name. In this case, three samples will be created, with the names
    # 'sample_material-0', 'sample_material-1' and 'sample_material-2'.

    prototype_sample = Sample(name='sample_material', derives_from=[source])
    ncbitaxon = OntologySource(name='NCBITaxon', description="NCBI Taxonomy")
    investigation.ontology_source_references.append(ncbitaxon)  # remember to add the newly declared ontology source to the parent investigation
    characteristic_organism = Characteristic(category=OntologyAnnotation(term="Organism"),
                                             value=OntologyAnnotation(term="Homo Sapiens", term_source=ncbitaxon,
                                                                      term_accession="http://purl.bioontology.org/ontology/NCBITAXON/9606"))
    prototype_sample.characteristics.append(characteristic_organism)
    study.samples = batch_create_materials(prototype_sample, n=3)  # creates a batch of 3 samples

    # Now we create a single Protocol object that represents our sample collection protocol, and attach it to the
    # study object. Protocols must be declared before we describe Processes, as a processing event of some sort
    # must execute some defined protocol. In the case of the class model, Protocols should therefore be declared
    # before Processes in order for the Process to be linked to one.

    sample_collection_protocol = Protocol(name="sample collection",
                                          protocol_type=OntologyAnnotation(term="sample collection"))
    study.protocols.append(sample_collection_protocol)
    sample_collection_process = Process(executes_protocol=sample_collection_protocol)

    # Next, we link our materials to the Process. In this particular case, we are describing a sample collection
    # process that takes one source material, and produces three different samples.
    #
    # (source_material)->(sample collection)->[(sample_material-0), (sample_material-1), (sample_material-2)]

    for src in study.sources:
        sample_collection_process.inputs.append(src)
    for sam in study.samples:
        sample_collection_process.outputs.append(sam)

    # Finally, attach the finished Process object to the study process_sequence. This can be done many times to
    # describe multiple sample collection events.

    study.process_sequence.append(sample_collection_process)

    # Next, we build an Assay object and attach two protocols, extraction and sequencing.

    assay = Assay(filename="a_assay.txt")
    extraction_protocol = Protocol(name='extraction', protocol_type=OntologyAnnotation(term="material extraction"))
    study.protocols.append(extraction_protocol)
    sequencing_protocol = Protocol(name='sequencing', protocol_type=OntologyAnnotation(term="material sequencing"))
    study.protocols.append(sequencing_protocol)

    # To build out assay graphs, we enumerate the samples from the study-level, and for each sample we create an
    # extraction process and a sequencing process. The extraction process takes as input a sample material, and
    # produces an extract material. The sequencing process takes the extract material and produces a data file. This
    # will produce three graphs, from sample material through to data, as follows:
    #
    # (sample_material-0)->(extraction)->(extract-0)->(sequencing)->(sequenced-data-0)
    # (sample_material-1)->(extraction)->(extract-1)->(sequencing)->(sequenced-data-1)
    # (sample_material-2)->(extraction)->(extract-2)->(sequencing)->(sequenced-data-2)
    #
    # Note that the extraction processes and sequencing processes are distinctly separate instances, where the three
    # graphs are NOT interconnected.

    for i, sample in enumerate(study.samples):
        # create an extraction process that executes the extraction protocol
        extraction_process = Process(executes_protocol=extraction_protocol)

        # extraction process takes as input a sample, and produces an extract material as output
        extraction_process.inputs.append(sample)
        material = Material(name="extract-{}".format(i))
        material.type = "Extract Name"
        extraction_process.outputs.append(material)

        # create a sequencing process that executes the sequencing protocol
        sequencing_process = Process(executes_protocol=sequencing_protocol)
        sequencing_process.name = "assay-name-{}".format(i)
        sequencing_process.inputs.append(extraction_process.outputs[0])

        # Sequencing process usually has an output data file
        datafile = DataFile(filename="sequenced-data-{}".format(i), label="Raw Data File", generated_from=[sample])
        sequencing_process.outputs.append(datafile)

        # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set
        # these links for you. It is found in the isatools.model package
        plink(extraction_process, sequencing_process)

        # make sure the extract, data file, and the processes are attached to the assay
        assay.samples.append(sample)
        assay.data_files.append(datafile)
        assay.other_material.append(material)
        assay.process_sequence.append(extraction_process)
        assay.process_sequence.append(sequencing_process)

    assay.measurement_type = OntologyAnnotation(term="gene sequencing")
    assay.technology_type = OntologyAnnotation(term="nucleotide sequencing")

    # attach the assay to the study
    study.assays.append(assay)

    from isatools import isatab
    return isatab.dumps(investigation)  # dumps() writes out the ISA as a string representation of the ISA-Tab


if __name__ == '__main__':
    print(create_descriptor())  # print the result to stdout
createSimpleISAJSON.py¶
An example of using the ISA model classes to create an ISA-JSON file.
#!/usr/bin/env python
from isatools.model import *
def create_descriptor():
"""Returns a simple but complete ISA-JSON 1.0 descriptor for illustration."""
# Create an empty Investigation object and set some values to the instance variables.
investigation = Investigation()
investigation.identifier = "i1"
investigation.title = "My Simple ISA Investigation"
investigation.description = "We could alternatively use the class constructor's parameters to set some default " \
"values at the time of creation, however we want to demonstrate how to use the " \
"object's instance variables to set values."
investigation.submission_date = "2016-11-03"
investigation.public_release_date = "2016-11-03"
# Create an empty Study object and set some values. The Study must have a filename, otherwise when we serialize it
# to ISA-Tab we would not know where to write it. We must also attach the study to the investigation by adding it
# to the 'investigation' object's list of studies.
study = Study(filename="s_study.txt")
study.identifier = "s1"
study.title = "My ISA Study"
study.description = "Like with the Investigation, we could use the class constructor to set some default values, " \
"but have chosen to demonstrate in this example the use of instance variables to set initial " \
"values."
study.submission_date = "2016-11-03"
study.public_release_date = "2016-11-03"
investigation.studies.append(study)
# Some instance variables are typed with different objects and lists of objects. For example, a Study can have a
# list of design descriptors. A design descriptor is an Ontology Annotation describing the kind of study at hand.
# Ontology Annotations should typically reference an Ontology Source. We demonstrate a mix of using the class
# constructors and setting values with instance variables. Note that the OntologyAnnotation object
# 'intervention_design' links its 'term_source' directly to the 'obi' object instance. To ensure the OntologySource
# is encapsulated in the descriptor, it is added to a list of 'ontology_source_references' in the Investigation
# object. The 'intervention_design' object is then added to the list of 'design_descriptors' held by the Study
# object.
obi = OntologySource(name='OBI', description="Ontology for Biomedical Investigations")
investigation.ontology_source_references.append(obi)
intervention_design = OntologyAnnotation(term_source=obi)
intervention_design.term = "intervention design"
intervention_design.term_accession = "http://purl.obolibrary.org/obo/OBI_0000115"
study.design_descriptors.append(intervention_design)
# Other instance variables common to both Investigation and Study objects include 'contacts' and 'publications',
# each with lists of corresponding Person and Publication objects.
contact = Person(first_name="Alice", last_name="Robertson", affiliation="University of Life", roles=[OntologyAnnotation(term='submitter')])
study.contacts.append(contact)
publication = Publication(title="Experiments with Elephants", author_list="A. Robertson, B. Robertson")
publication.pubmed_id = "12345678"
publication.status = OntologyAnnotation(term="published")
study.publications.append(publication)
# To create the study graph that corresponds to the contents of the study table file (the s_*.txt file), we need
# to create a process sequence. To do this we use the Process class and attach it to the Study object's
# 'process_sequence' list instance variable. Each process must be linked with a Protocol object that is attached to
# a Study object's 'protocols' list instance variable. The sample collection Process object usually has as input
# a Source material and as output a Sample material.
# Here we create one Source material object and attach it to our study.
source = Source(name='source_material')
study.sources.append(source)
# Then we create three Sample objects, with organism as Homo Sapiens, and attach them to the study. We use the utility function
# batch_create_material() to clone a prototype material object. The function automatically appends
# an index to the material name. In this case, three samples will be created, with the names
# 'sample_material-0', 'sample_material-1' and 'sample_material-2'.
prototype_sample = Sample(name='sample_material', derives_from=[source])
ncbitaxon = OntologySource(name='NCBITaxon', description="NCBI Taxonomy")
characteristic_organism = Characteristic(category=OntologyAnnotation(term="Organism"),
value=OntologyAnnotation(term="Homo Sapiens", term_source=ncbitaxon,
term_accession="http://purl.bioontology.org/ontology/NCBITAXON/9606"))
prototype_sample.characteristics.append(characteristic_organism)
study.samples = batch_create_materials(prototype_sample, n=3) # creates a batch of 3 samples
# Now we create a single Protocol object that represents our sample collection protocol, and attach it to the
# study object. Protocols must be declared before we describe Processes, as a processing event of some sort
# must execute some defined protocol. In the case of the class model, Protocols should therefore be declared
# before Processes in order for the Process to be linked to one.
sample_collection_protocol = Protocol(name="sample collection",
protocol_type=OntologyAnnotation(term="sample collection"))
study.protocols.append(sample_collection_protocol)
sample_collection_process = Process(executes_protocol=sample_collection_protocol)
# Next, we link our materials to the Process. In this particular case, we are describing a sample collection
# process that takes one source material, and produces three different samples.
#
# (source_material)->(sample collection)->[(sample_material-0), (sample_material-1), (sample_material-2)]
for src in study.sources:
sample_collection_process.inputs.append(src)
for sam in study.samples:
sample_collection_process.outputs.append(sam)
# Finally, attach the finished Process object to the study process_sequence. This can be done many times to
# describe multiple sample collection events.
study.process_sequence.append(sample_collection_process)
# Next, we build an Assay object and attach two protocols: extraction and sequencing.
assay = Assay(filename="a_assay.txt")
extraction_protocol = Protocol(name='extraction', protocol_type=OntologyAnnotation(term="material extraction"))
study.protocols.append(extraction_protocol)
sequencing_protocol = Protocol(name='sequencing', protocol_type=OntologyAnnotation(term="material sequencing"))
study.protocols.append(sequencing_protocol)
# To build out assay graphs, we enumerate the samples from the study level, and for each sample we create an
# extraction process and a sequencing process. The extraction process takes as input a sample material, and produces
# an extract material. The sequencing process takes the extract material and produces a data file. This will
# produce three graphs, from sample material through to data, as follows:
#
# (sample_material-0)->(extraction)->(extract-0)->(sequencing)->(sequenced-data-0)
# (sample_material-1)->(extraction)->(extract-1)->(sequencing)->(sequenced-data-1)
# (sample_material-2)->(extraction)->(extract-2)->(sequencing)->(sequenced-data-2)
#
# Note that the extraction processes and sequencing processes are distinctly separate instances, where the three
# graphs are NOT interconnected.
for i, sample in enumerate(study.samples):
# create an extraction process that executes the extraction protocol
extraction_process = Process(executes_protocol=extraction_protocol)
# extraction process takes as input a sample, and produces an extract material as output
extraction_process.inputs.append(sample)
material = Material(name="extract-{}".format(i))
material.type = "Extract Name"
extraction_process.outputs.append(material)
# create a sequencing process that executes the sequencing protocol
sequencing_process = Process(executes_protocol=sequencing_protocol)
sequencing_process.name = "assay-name-{}".format(i)
sequencing_process.inputs.append(extraction_process.outputs[0])
# Sequencing process usually has an output data file
datafile = DataFile(filename="sequenced-data-{}".format(i), label="Raw Data File", generated_from=[sample])
sequencing_process.outputs.append(datafile)
# Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set
# these links for you. It is found in the isatools.model package
plink(extraction_process, sequencing_process)
# make sure the extract, data file, and the processes are attached to the assay
assay.samples.append(sample)
assay.data_files.append(datafile)
assay.other_material.append(material)
assay.process_sequence.append(extraction_process)
assay.process_sequence.append(sequencing_process)
assay.measurement_type = OntologyAnnotation(term="gene sequencing")
assay.technology_type = OntologyAnnotation(term="nucleotide sequencing")
# attach the assay to the study
study.assays.append(assay)
import json
from isatools.isajson import ISAJSONEncoder
# To write JSON out, use the ISAJSONEncoder class with the json package and use dump() or dumps()
# Note that the extra parameters sort_keys, indent and separators are to make the output more human-readable.
return json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
if __name__ == '__main__':
print(create_descriptor()) # print the result to stdout
validateISAtab.py¶
An example program using the ISA-Tab validator to validate one or more ISA-Tab archives.
#!/usr/bin/env python
# Inspired by validateSBML.py example from libSBML Python API
from isatools import isatab
import sys
import os
def main(args):
"""usage: validateISAtab.py inputfile1 [inputfile2 ...]
"""
if len(args) < 2:
print(main.__doc__)
sys.exit(1)
numfiles = 0
invalid = 0
skipped = 0
totalerrors = 0
totalwarnings = 0
for i in range(1, len(args)):
print("---------------------------------------------------------------------------")
if not os.path.isfile(args[i]):
print("Cannot open file {}, skipping".format(args[i]))
skipped += 1
numfiles += 1
else:
with open(args[i]) as fp:
report = isatab.validate(fp)
numerrors = len(report['errors'])
numwarnings = len(report['warnings'])
if numerrors > 0:
invalid += 1
print("Validator found {} errors and {} warnings in this ISA-Tab archive".format(numerrors, numwarnings))
totalerrors += numerrors
totalwarnings += numwarnings
numfiles += 1
print("---------------------------------------------------------------------------")
print("Validated {} ISA-Tab archives, {} valid ISA-Tab archives, {} invalid ISA-Tab archives"
.format(numfiles - skipped, numfiles - invalid - skipped, invalid))
print("Found {} errors and {} warnings across all ISA-Tab archives".format(totalerrors, totalwarnings))
if invalid > 0:
sys.exit(1)
if __name__ == '__main__':
main(sys.argv)
validateISAjson.py¶
An example program using the ISA-JSON validator to validate one or more ISA-JSON files.
#!/usr/bin/env python
# Inspired by validateSBML.py example from libSBML Python API
from isatools import isajson
import sys
import os
def main(args):
"""usage: validateISAjson.py inputfile1 [inputfile2 ...]
"""
if len(args) < 2:
print(main.__doc__)
sys.exit(1)
numfiles = 0
invalid = 0
skipped = 0
totalerrors = 0
totalwarnings = 0
for i in range(1, len(args)):
print("---------------------------------------------------------------------------")
if not os.path.isfile(args[i]):
print("Cannot open file {}, skipping".format(args[i]))
skipped += 1
numfiles += 1
else:
with open(args[i]) as fp:
report = isajson.validate(fp)
numerrors = len(report['errors'])
numwarnings = len(report['warnings'])
if numerrors > 0:
invalid += 1
print("Validator found {} errors and {} warnings in this ISA-JSON file".format(numerrors, numwarnings))
totalerrors += numerrors
totalwarnings += numwarnings
numfiles += 1
print("---------------------------------------------------------------------------")
print("Validated {} ISA-JSONs, {} valid ISA-JSONs, {} invalid ISA-JSONs"
.format(numfiles - skipped, numfiles - invalid - skipped, invalid))
print("Found {} errors and {} warnings across all ISA-JSONs".format(totalerrors, totalwarnings))
if invalid > 0:
sys.exit(1)
if __name__ == '__main__':
main(sys.argv)
ISA Conversions¶
The ISA API includes a set of functions to allow you to convert between ISA formats, as well as between ISA formats and other formats.
These converters can be found in the isatools.convert
package.
Converting from ISA-Tab to ISA JSON¶
To convert from a directory ./tabdir/
containing valid ISA-Tab files (e.g. i_investigation.txt
, s_...txt
and a_...txt
files):
from isatools.convert import isatab2json
isa_json = isatab2json.convert('./tabdir/')
Hint
The conversions by default run the ISA validator to check for correctness of the input content. To skip the validation step, set the validate_first
parameter to False
by doing something like isatab2json.convert('./my/path/', validate_first=False)
.
Hint
The conversions by default use a legacy ISA-Tab parser, which has now been replaced with a faster version. To specify using the new parser, set the use_new_parser
parameter to True
by doing something like isatab2json.convert('./my/path/', use_new_parser=True)
.
Converting from ISA JSON to ISA-Tab¶
To convert from an ISA JSON file isa.json and write out ISA-Tab files to a target directory ./outdir/:
from isatools.convert import json2isatab
with open('isa.json') as file_pointer:
json2isatab.convert(file_pointer, './outdir/')
To turn off pre-conversion validation, use validate_first=False. By default it is set to validate_first=True.
The ISA API can also convert to and from other formats for import/export to relevant databases and services. For more on those conversions, please read the sections on Importing data in ISA formats and Exporting data in ISA formats.
Downloading files stored in Github¶
The ISA GitHub Adapter class¶
The GitHub API wrapper/adapter may be useful to retrieve ISA datasets (as JSON or Tabs) or configuration files in XML
format. The core class, IsaGitHubStorageAdapter
can be instantiated with or without authorisation.
Usage without authentication¶
If authentication is not required to access the resource, you can directly instantiate an adapter object and use it.
from isatools.net.storage_adapter import IsaGitHubStorageAdapter
from zipfile import ZipFile
adapter = IsaGitHubStorageAdapter()
adapter.retrieve('tests/data/BII-I-1', 'test_out_dir', owner='ISA-tools', repository='isa-api')
# retrieving a directory (containing either an ISA-Tab dataset or a set of configuration files)
# will return a file-like object containing the zipped content of the directory.
buf = adapter.retrieve('tests/data/BII-I-1', destination='test_out_dir', owner='ISA-tools',
repository='isa-api')
# Default owner is 'ISA-tools' and default repo is 'isa-api', so they can actually be omitted.
# Default destination directory is 'isa-target'
zip_file = ZipFile(buf)
# get the list of the files retrieved from the directory
zip_file.namelist()
# an ISA JSON dataset is returned as a standard JSON object
json_obj = adapter.retrieve('isatools/sampledata/BII-I-1.json', destination='test_out_dir',
owner='ISA-tools', repository='isa-api', validate_json=True)
# set write_to_file to False to avoid saving the resource to disk
json_obj = adapter.retrieve('isatools/sampledata/BII-I-1.json', write_to_file=False,
owner='ISA-tools', repository='isa-api', validate_json=True)
# retrieving a single configuration file returns an lxml ElementTree object:
xml_obj = adapter.retrieve('isaconfig-2013222/protein_expression_ge.xml',
repository='Configuration-Files')
# get root element for the configuration file
xml_obj.getroot()
Usage with authentication¶
To access GitHub as an authenticated user, the recommended way is to instantiate the storage adapter in a with statement.
with IsaGitHubStorageAdapter(username='yourusername', password='yourpw',
note='test_api') as adapter:
adapter.is_authenticated # True
# do stuff...
Otherwise, you must explicitly call the close() method to delete the current authorisation from the GitHub server:
adapter = IsaGitHubStorageAdapter(username='yourusername', password='youpw', note='test_api')
adapter.is_authenticated # True
# do stuff...
adapter.close()
Validating ISA-Tab and ISA JSON¶
Using the ISA API you can validate ISA-Tab and ISA JSON files.
Validating ISA-Tab¶
To validate ISA-Tab files in a given directory ./tabdir/
against a given configuration found in a directory ./isaconfig-default_v2015-07-02/
, do something like the following:
import os
from isatools import isatab
my_json_report = isatab.validate(open(os.path.join('./tabdir/', 'i_investigation.txt')), './isaconfig-default_v2015-07-02/')
making sure to point to the investigation file of your ISA-Tab, and providing the XML configurations. The validator will then read the location of your study and assay table files from the investigation file in order to validate those. If no path to XML configurations is provided, the ISA API will automatically select and use the isaconfig-default_v2015-07-02
configurations.
This ISA-Tab validator has been tested against the sample data sets BII-I-1, BII-S-3 and BII-S-7, that are found in the isatools
package.
The validator will return a JSON-formatted report of warnings and errors.
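As the validateISAtab.py example above shows, the report is a dict with 'errors' and 'warnings' lists. A minimal sketch of summarising such a report, assuming nothing about the report beyond those two keys:

```python
# Minimal sketch: summarise a validator report dict. The 'errors' and
# 'warnings' keys follow the validateISAtab.py example above; any further
# structure of the report is an assumption.
def summarise_report(report):
    n_errors = len(report.get('errors', []))
    n_warnings = len(report.get('warnings', []))
    return "{} error(s), {} warning(s)".format(n_errors, n_warnings)

# stand-in report, in place of real isatab.validate() output
example_report = {'errors': [], 'warnings': [{'message': 'a warning'}]}
print(summarise_report(example_report))  # 0 error(s), 1 warning(s)
```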
Validating ISA JSON¶
To validate an ISA JSON file against the ISA JSON version 1.0 specification, you can do something like:
from isatools import isajson
my_json_report = isajson.validate(open('isa.json'))
The rules we check for in the new validators are documented in this working document in Google spreadsheets. Please be aware that, as this is a working document, some of these rules may be amended as we get more feedback and the ISA API code evolves.
This ISA JSON validator has been tested against a range of dummy test data found in ISAdatasets
GitHub repository.
The validator will return a JSON-formatted report of warnings and errors.
Batch validation of ISA-Tab and ISA-JSON¶
To validate a batch of ISA-Tabs or ISA-JSONs, you can use the batch_validate()
function.
To validate a batch of ISA-Tabs, you can do something like:
from isatools import isatab
my_tabs = [
'/path/to/study1/',
'/path/to/study2/'
]
my_json_report = isatab.batch_validate(my_tabs, '/path/to/report.txt')
To validate a batch of ISA-JSONs, you can do something like
from isatools import isajson
my_jsons = [
'/path/to/study1.json',
'/path/to/study2.json'
]
my_json_report = isajson.batch_validate(my_jsons, '/path/to/report.txt')
In both cases, the batch validation will return a JSON-formatted report of warnings and errors.
Reformatting JSON reports¶
The JSON reports produced by the validators can be reformatted using a function found in the isatools.utils
module.
For example, to write out the report as a CSV textfile to report.txt
, you can do something like:
from isatools import utils
with open('report.txt', 'w') as report_file:
report_file.write(utils.format_report_csv(my_json_report))
Importing data into ISA formats¶
We have provided a number of modules that allow you to import data into ISA formats from well-known databases or
services. Imports from these services are supported by modules found in the isatools.net
package:
Importing from the MetaboLights database, to ISA¶
To import a MetaboLights study as ISA-Tab files, provide a MetaboLights accession number:
from isatools.net import mtbls as MTBLS
tmp_dir = MTBLS.get('MTBLS1')
This method downloads the ISA-Tab files for a study, and returns a string path to a temporary directory containing the ISA-Tab files.
To import a MetaboLights study as ISA JSON, provide a MetaboLights accession number:
from isatools.net import mtbls as MTBLS
isa_json = MTBLS.getj('MTBLS1')
This method gets the study and returns the ISA content as ISA JSON.
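The returned ISA JSON content can then be persisted with the standard json module. In this sketch, isa_json is a stand-in dict rather than real MetaboLights content, since MTBLS.getj() requires network access:

```python
import json
import os
import tempfile

# Sketch: write the ISA JSON returned by MTBLS.getj() to a file.
# 'isa_json' is a stand-in for the real return value of MTBLS.getj('MTBLS1').
isa_json = {'identifier': 'MTBLS1', 'studies': []}
out_path = os.path.join(tempfile.gettempdir(), 'MTBLS1.json')
with open(out_path, 'w') as out_fp:
    json.dump(isa_json, out_fp, indent=4)
```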
You can also do simple queries on MetaboLights studies to retrieve samples and related data files, based on factor selection:
from isatools.net import mtbls as MTBLS
MTBLS.get_factor_names('MTBLS1')
# response:
# {'Gender', 'Age'}
MTBLS.get_factor_values('MTBLS1', 'Gender')
# response:
# {'Male', 'Female'}
query = {
"Gender": "Male"
}
samples_and_files = MTBLS.get_data_files('MTBLS1', factor_query=query)
# response:
# [
# {
# 'sample': 'ADG10003u_007',
# 'data_files': ['ADG10003u_007.zip'],
# 'query_used': {'Gender': 'Male'}
# }, ...
# ]
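The list-of-dicts response above can be post-processed with plain Python. A sketch that flattens it into a list of file names; the 'data_files' field name follows the example response, and this helper is illustrative rather than part of the ISA API:

```python
# Sketch: flatten a get_data_files() response into a flat list of file names.
# The 'data_files' field name follows the example response above; this helper
# is not part of the ISA API.
def collect_data_files(samples_and_files):
    files = []
    for entry in samples_and_files:
        files.extend(entry.get('data_files', []))
    return files

response = [{'sample': 'ADG10003u_007',
             'data_files': ['ADG10003u_007.zip'],
             'query_used': {'Gender': 'Male'}}]
print(collect_data_files(response))  # ['ADG10003u_007.zip']
```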
Importing SRA from the European Nucleotide Archive, to ISA-Tab¶
Notice: this method depends on SAXON XSLT Processor
To import an SRA study from the European Nucleotide Archive (ENA) as ISA-Tab files, provide an ENA accession number and your path to the SAXON JAR file:
from isatools.net import sra2isatab
sra2isatab.sra_to_isatab_batch_convert('BN000001', 'your/path/to/saxon9.jar')
This method returns the ISA-Tab files as a byte stream (io.BytesIO
).
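Since the converter returns an io.BytesIO, its contents can be written out to a zip file as a sketch like the following; dummy_stream stands in for the converter's real return value:

```python
import io
import os
import tempfile

# Sketch: write an io.BytesIO such as the one returned by
# sra_to_isatab_batch_convert() out to a zip file on disk.
# 'dummy_stream' is stand-in content, not real converter output.
def save_byte_stream(stream, out_path):
    with open(out_path, 'wb') as out_file:
        out_file.write(stream.getvalue())

dummy_stream = io.BytesIO(b'PK\x03\x04')  # stand-in zip bytes
target = os.path.join(tempfile.gettempdir(), 'sra_isatab.zip')
save_byte_stream(dummy_stream, target)
```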
Importing from MetabolomicsWorkbench, to ISA-Tab¶
To import a study from the Metabolomics Workbench as ISA-Tab files, provide an accession number and your local path to write your files to:
from isatools.net.mw2isa import mw2isa_convert
success, study_id, validate = mw2isa_convert(studyid="ST000367", outputdir='tmp/', dl_option="no", validate_option="yes")
# If success == True, download and conversion ran OK. If validate == True, the ISA-Tabs generated passed validation
See isa-api/isatools/convert/mw2isa.py
Importing from Biocrates, to ISA-Tab¶
Notice: this method depends on SAXON XSLT Processor
See isa-api/isatools/net/biocrates2isatab.py
Importing from third-party formats is supported with our conversion modules found in the isatools.convert
package:
Importing mzML to ISA-Tab¶
To import metadata from mzML mass spectrometry files, the ISA API integrates with the mzml2isa
tool from https://github.com/ISA-tools/mzml2isa and can be run as follows:
from isatools.convert import mzml2isa
mzml2isa.convert('your/path/to/mzml/files/', 'tmp/', "My Study ID")
Importing SampleTab to ISA¶
To import metadata from SampleTab files (e.g. from EBI BioSamples database), you can do the following to import a SampleTab to ISA-Tab:
from isatools.convert import sampletab2isatab
with open('your/path/to/sampletab.txt', 'r') as input_sampletab:
sampletab2isatab.convert(input_sampletab, 'tmp/')
To import a SampleTab to ISA JSON, you can do:
from isatools.convert import sampletab2json
with open('your/path/to/sampletab.txt', 'r') as input_sampletab:
with open('your/path/to/myjson.json', 'w') as output_json:
sampletab2json.convert(input_sampletab, output_json)
You can also load SampleTab content directly into ISA Python objects:
from isatools import sampletab
with open('your/path/to/sampletab.txt', 'r') as input_sampletab:
ISA = sampletab.load(input_sampletab)
Importing MAGE-TAB to ISA¶
To import metadata from MAGE-TAB files (e.g. from EBI ArrayExpress database), you can do the following to import a MAGE-TAB to ISA-Tab:
from isatools.convert import magetab2isatab
with open('your/path/to/magetab.idf.txt', 'r') as input_magetab:
magetab2isatab.convert(input_magetab, 'tmp/')
To import a MAGE-TAB to ISA JSON, you can do:
from isatools.convert import magetab2json
with open('your/path/to/magetab.idf.txt', 'r') as input_magetab:
with open('your/path/to/myjson.json', 'w') as output_json:
magetab2json.convert(input_magetab, output_json)
Exporting data from ISA formats¶
We have provided a number of modules that allow you to export data from ISA formats to formats ready for consumption by
well-known databases or services in the following conversion modules found in the isatools.convert
package:
Converting from ISA-Tab to SRA XML¶
To convert from a directory ./tabdir/ containing valid ISA-Tab files and write the SRA XML files to a target directory ./outdir/, validating against a given configuration in ./isaconfig-default_v2015-07-02/:
from isatools.convert import isatab2sra
sra_settings={
"sra_broker": "MYORG",
"sra_center": "MYORG",
"sra_project": "MYORG",
"sra_broker_inform_on_status": "support@myorg.org",
"sra_broker_inform_on_error": "support@myorg.org",
"sra_broker_contact_name": "Support"
}
isatab2sra.convert('./tabdir/', './outdir/', sra_settings=sra_settings)
This method writes the SRA files out to ./outdir/
.
Note that when submitting SRA XML to ENA, you need to supply broker information as shown above in the sra_settings
JSON, customised to your own organisation’s settings.
Converting from ISA JSON file to SRA XML¶
To convert from an ISA JSON file isa.json and write out SRA XML files to a target directory ./outdir/:
sra_settings={
"sra_broker": "MYORG",
"sra_center": "MYORG",
"sra_project": "MYORG",
"sra_broker_inform_on_status": "support@myorg.org",
"sra_broker_inform_on_error": "support@myorg.org",
"sra_broker_contact_name": "Support"
}
from isatools.convert import json2sra
json2sra.convert(open('isa.json'), './outdir/', sra_settings=sra_settings)
This method writes the SRA files out to ./outdir/
.
Note that when submitting SRA XML to ENA, you need to supply broker information as shown above in the sra_settings
JSON, customised to your own organisation’s settings.
To turn off pre-conversion validation, use validate_first=False
. By default it is set to validate_first=True
.
Converting ISA content to SampleTab¶
To export metadata to SampleTab files (e.g. for the EBI BioSamples database), you can do the following to export an ISA-Tab to SampleTab:
from isatools.convert import isatab2sampletab
with open('your/path/to/i_investigation.txt', 'r') as input_investigation_file:
with open('your/path/to/sampletab.txt', 'w') as output_sampletab_file:
isatab2sampletab.convert(input_investigation_file, output_sampletab_file)
To export an ISA JSON file to SampleTab, you can do:
from isatools.convert import json2sampletab
with open('your/path/to/isa.json', 'r') as input_isajson_file:
with open('your/path/to/sampletab.txt', 'w') as output_sampletab_file:
json2sampletab.convert(input_isajson_file, output_sampletab_file)
You can also dump SampleTab content directly from ISA Python objects:
from isatools import sampletab
with open('your/path/to/sampletab.txt', 'w') as output_sampletab:
# Note: ISA would be a previously loaded or constructed root Investigation object
sampletab.dump(ISA, output_sampletab)
Exporting ISA content to MAGE-TAB¶
To export metadata to MAGE-TAB files (e.g. for the EBI ArrayExpress database), you can do the following to export an ISA-Tab to MAGE-TAB:
from isatools.convert import isatab2magetab
with open('your/path/to/i_investigation.txt', 'r') as input_investigation_file:
isatab2magetab.convert(input_investigation_file, 'your/output/path/')
To export an ISA JSON file to MAGE-TAB, you can do:
from isatools.convert import json2magetab
with open('your/path/to/i.json', 'r') as input_isajson_file:
json2magetab.convert(input_isajson_file, 'your/output/path/')
Creating ISA content with a Sample and Assay plan¶
The ISA API provides a set of classes that you can use to plan a study, in terms of sample collection and assay run patterns. Objects of these classes can then be used to dump ISA-Tab or ISA-JSON templates corresponding to your study design. They are found in the isatools.create.models
module.
Take a look at the isatools-notebooks GitHub repository for Jupyter Notebook examples of using the planning objects.
Known issues¶
isatools v0.9 package¶
- Various API packages have been refactored and moved. isatools.model.v1 is now in isatools.model. The packages isatools.validate, isatools.config and isatools.schemas have been removed. Some items from isatools.convert and isatools.io have been split into the isatools.io or isatools.net packages, where the latter has been created to distinguish parts of the ISA API that require network access
- Issue #153 is still outstanding since version 0.4, as per below; new issue #235 where missing `Protocol REF`s are sometimes incorrectly inserted on loading ISA-Tab
- New ISA creation functionality from planning objects in isatools.create.models currently only supports a limited number of technology types (DNA microarray, DNA sequencing, mass spectrometry, and NMR)
- The SRA/ENA importer and Biocrates importer rely on XSLT2 processing only available with SAXON and require a .jar file to run
- It has been decided to stop work on retrofitting support for Python 2.7 and focus on keeping up to date with the current and future Python versions
isatools v0.8 package¶
- Issue #153 is still outstanding, as per below; new issue #208 (ISA-Tab validation issue)
- The SRA/ENA importer and Biocrates importer rely on XSLT2 processing only available with SAXON and require a .jar file to run
isatools v0.7 package¶
- Issues #101, #153 are still outstanding, as per below
- The SRA/ENA importer and Biocrates importer rely on XSLT2 processing only available with SAXON and require a .jar file to run
isatools v0.6 package¶
- Issues #146, #101, #153 are still outstanding, as per below
- The SRA/ENA importer and Biocrates importer rely on XSLT2 processing only available with SAXON and require a .jar file to run
- We are aware that there may be some performance issues when loading and writing ISA-Tab documents with several thousand samples. This has been worked on, and there are ongoing efforts to improve the current performance of ISA-Tab I/O
isatools v0.5 package¶
- All issues inherited from v0.4 (see below)
- Currently only Python 3.4 and 3.5 are supported. Python 2.7 support is present in the py2 source branch on Github.
isatools v0.4 package¶
- For certain ISA-Tab table files, the ISA-Tab parser cannot disambiguate between process instances where a Name column required to qualify a Protocol REF has been left blank. Utility functions have been written to detect these anomalies and to assist in correcting them, in the isatools.utils package. #146 (see detail after bullet points)
- When converting to ISA JSON using UUID or counter Identifier Types, some elements are not detected, such as Array_Design_REF #101
- The ISA-Tab parser does not support reading Protein Assignment File, Peptide Assignment File and Post Translational Modification Assignment File columns, and therefore the isatab2* converters also do not support these #174
- The SRA/ENA importer in sra2isatab relies on XSLT2 processing functionality only available with SAXON, so you must provide the JAR file yourself to use it
- The sra2isatab converter does not support SRA pools #153
- The legacy functionality (marked in the documentation) relies on Java 1.6
To check for possible erroneous pooling events in an ISA-Tab archive, you can try something like:
>>> from isatools import utils
>>> utils.detect_isatab_process_pooling('tests/data/tab/MTBLS1/')
INFO: Converting ISA-Tab to ISA JSON...
INFO: Converting ISAtab to ISAjson for tests/data/tab/MTBLS1/
INFO: ... conversion finished.
Checking s_MTBLS1.txt
Checking a_mtbls1_metabolite_profiling_NMR_spectroscopy.txt
Possible process pooling detected on: #process/Extraction1
Possible process pooling detected on: #process/ADG_normalized_data.xlsx
[{'a_mtbls1_metabolite_profiling_NMR_spectroscopy.txt': ['#process/Extraction1', '#process/ADG_normalized_data.xlsx']}]
>>>
In this case, #process/Extraction1 is the pooling that we did not expect. This is a pooling on a single Extraction. From manual inspection of the ISA-Tab file a_mtbls1_metabolite_profiling_NMR_spectroscopy.txt we can then confirm that values are entirely missing from Extract Name, causing the parser to think the experimental graph converges on one process node. To rectify this, individual values should be put into this Name column. We can try to fix the erroneous pooling by filling out the empty Name column for the corresponding Protocol REF by doing the following:
>>> utils.insert_distinct_parameter(open('tests/data/tab/MTBLS1/a_mtbls1_metabolite_profiling_NMR_spectroscopy.txt', 'r+'), 'Extraction')
Are you sure you want to add a column of hash values in Extract Name? Y/(N)
>? Y
If successful, this will fill out the empty column with 8 character-long UUIDs (e.g. 4078cb03).
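The inserted values resemble truncated UUID hex strings. A sketch of how such 8-character identifiers could be generated; the actual implementation inside insert_distinct_parameter() may differ:

```python
import uuid

# Sketch: derive an 8-character identifier from a random UUID, similar in
# shape to the values inserted by insert_distinct_parameter(). This is an
# illustration only; the function's actual implementation may differ.
short_id = uuid.uuid4().hex[:8]
print(short_id)  # e.g. '4078cb03' (random on each run)
```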
Please be aware that these utility functions detect_isatab_process_pooling()
and insert_distinct_parameter()
are
there to help you manually fix your ISA-Tabs, not to automatically fix them for you. We will address this issue in more
depth in following releases.
isatools v0.3 package¶
- required constraints on JSON schemas cause validation failures for @id objects, meaning some constraints using JSON schemas cannot be used for validation #108
- Chained processes (i.e. a process followed by another process without any intermediate inputs and outputs; in ISAtab, a Protocol REF column followed by another Protocol REF column without any materials in between) are not currently supported. It is not recommended to attempt to use such patterns with this version of the isatools package #111
- When converting to ISA JSON using UUID or counter Identifier Types, some elements are not detected, such as Array_Design_REF #101
- The SRA/ENA importer in sra2isatab relies on XSLT2 processing functionality only available with SAXON, so you must provide the JAR file yourself to use it
- The legacy functionality (marked in the documentation) relies on Java 1.6
isatools v0.2 package¶
- required constraints on JSON schemas cause validation failures for @id objects, meaning some constraints using JSON schemas cannot be used for validation #108
- When converting to ISA JSON using UUID or counter Identifier Types, some elements are not detected, such as Array_Design_REF #101
- Protocol REF columns must be present in order for the ISA-Tab to JSON conversion to pick up processes in the process sequences #111
- Characteristics and Factor Values declared in assay tables in ISAtab are associated with Sample objects only. This means that when writing from Python objects, or converting from ISA JSON, to ISAtab these columns appear in the study table.
- Chained processes (i.e. a process followed by another process without any intermediate inputs and outputs; in ISAtab, a Protocol REF column followed by another Protocol REF column without any materials in between) are not currently supported. It is not recommended to attempt to use such patterns with this version of the isatools package #111
- For experimental graph patterns to work, they should follow relatively simple patterns, e.g. straight Sample -> … -> Materials -> … -> Data paths (per assay), or simple splitting and pooling. See the test package code for examples of what works.
- No ISA JSON configurations have been included that correspond with the following default XML configurations: clinical_chemistry.xml, and most are as yet untested.
isatools v0.1 package¶
- Characteristics and Factor Values declared in assay tables in ISAtab are associated with Sample objects only. This means that when writing from Python objects, or converting from ISA JSON, to ISAtab these columns appear in the study table.
- Chained processes (i.e. a process followed by another process without any intermediate inputs and outputs; in ISAtab, a Protocol REF column followed by another Protocol REF column without any materials in between) are not currently supported. It is not recommended to attempt to use such patterns with this version of the isatools package #111
- For experimental graph patterns to work, they should follow relatively simple patterns, e.g. straight Sample -> … -> Materials -> … -> Data paths (per assay), or simple splitting and pooling. See the test package code for examples of what works.
For a full up-to-date list of issues, or to report an issue or ask a question, please see the issue tracker.