2. Advanced uses of Pear¶
Here we introduce some more elaborated ways we can use pear to perform in-depth analyses and produce effective representations of the embedded distance matrices.
pear.toml¶
Pear will automatically look for a pear.toml
file in the working directory, alternatively, a .toml
file can be specified using the --config
flag. A .toml
file is just a convenient way of specifying many parameters. Doing so in an auxiliary file remove excessive clutter in the use of pear and promotes a more standardized way of performing a series of analyses.
We will guide you through the use of this additional tool.
!cat template_pear.toml
Here you can see the pear.toml
example stored in the same directory of this notebook. It is a perfect template for your future analyses!
Let's go thorugh all its parts:
-
[trees]
contains single file specifications. Each line associated with this key should direct pear to a file containing trees in Newick format. The nomenclature is "file\(n\)=filename", where \(n\) is just the index of the file, whereas filename is the path to the file itself. -
[dir]
contains directory and pattern specifications. Each directory should contain only tree-containing files, and should be indicated with a . Alternatively, apattern
can be indicated to narrow the research of the files. -
[collection]
stores details related to thetree_set
orset_collection
:-
output_file
specifies an alternative name and path for the distance matrix file; -
distance_matrix
indicates the path of a precomputed distance matrix; -
metadata
indicates a.csv
file containing metadata compatible with the collection. That means that the number of rows in the file should be equal to the number of trees in the collection. The information stored in metadata can be of any type (discrete or continuous) and can subsequently used in the representation of your data in the 3D embedding instead of the 3\(^{rd}\) dimension, or to color the points (trees).
-
-
[highlight]
allows for specifying specific trees in the set/collection which are going to highlighted in the final plots. The way one specifies this is by giving a list of indexes indicating which trees to be highlighted for a given set (either if that is part of a collection or not). You specify a list for a set by writing "file\(n\)"if the file has been indexed as such in the[files]
argument. Otherwise by using the name of the file (without extension: filename.trees is just filename) if the file has been specified through[dir]
selection.
-
[distance]
specifies themethod
used to compute the distance matrix. It can be chosen amonghashrf_RF
,hashrf_wRF
,smart_RF
,tqdist_quartet
,tqdist_triplet
. -
[embedding]
specifies themethod
used to compute the embedding of the distance matrix, thedimensions
of the embedding, and whether to display thequality
or not. Methods arepcoa
,tsne
,isomap
,lle
. -
[plot]
defines some aspects of the plots produced by pear:name_plot
specifies the name of the plot produced;plot_meta
indicates which feature to use to color the points in the graph, default value isSET-ID
which simply colors bytree_set
. ASTEP
meta-variable is present and indicates the index of a tree in atree_set
, it can be used to color trees when the ordering is important. Other meta-variables can be specified through the[metadata]
argument.-
z_axis
, similarly toplot_meta
, indicates an alternative meta-variable to be used in the plots. The selected meta-variable replaces the 3\(^{rd}\) dimension in the 3D graphs. select
indicates whether the graph should have a set of interactive buttons to display/hide specifictree_set
s or not.same_scale
indicates whether the same colorscale should be applied to everytree_set
or not.show
specifies whether the plot should be shown or not.
.toml
structure and the flags in pear, the arguments will be overscribed by the ones indicated on the command line. On an additional note related to this, the flag--meta
allows to specify on the command line a metadata file, replicating the behaviour of themetadata
argument in[collection]
.
Examples¶
Example 1¶
We use pear to analyze 3 runs of a MCMC algorithm, called Beast, used to estimate a phylogenetic tree structure.
We upload 2 files using the[trees]
argument and 1 by specifying the[dir]
and apattern
.
We compute the Robinson Foulds distances using hashrf.
We embed the distances in a 3-dimensional space using PCoA, and we plot the results colouring by STEP and highlighting some specific trees for each set. We also display some buttons to hide/show the sets in the plots, we use different colorscales (the default behaviour), and we show the results.!cat example_1.toml
!pear_ebi --config example_1.toml
[34mPEAR v0.[0m[1;34m1.85[0m [37mLooking into directory [0m[35m..[0m[35m/beast_trees/[0m[35m [0m[37m- pattern: [0m[35m*run2*[0m [95mYour input:[0m ───────────────────────────── Tree set collection containing [1;36m3003[0m trees; File: Set_collection_[93m72c842e8-e1ac-4a90-8fbc-535c4b10ef92[0m; Distance matrix: not computed. ───────────────────────────── beast_run1; Containing [1;36m1001[0m trees. beast_long; Containing [1;36m1001[0m trees. beast_run2; Containing [1;36m1001[0m trees. [2K[32m⠸[0m [1;32mCalculating distances...[0m0m [1A[2K[1;34mhashrf_RF | Done![0m [2K[32m⠋[0m [1;32mEmbedding distances...[0m0m [1A[2K[1;34mpcoa | Done![0m [93m- Leaving PEAR -[0m
Example 1 Continued¶
We now show that we can easily modify a single parameter without touching the
.toml
file by simply overriding it on the command line.
As an example, we change the embedding method to tSNE.!pear_ebi --config example_1.toml --tsne 3
[34mPEAR v0.[0m[1;34m1.85[0m [37mLooking into directory [0m[35m..[0m[35m/beast_trees/[0m[35m [0m[37m- pattern: [0m[35m*run2*[0m [95mYour input:[0m ───────────────────────────── Tree set collection containing [1;36m3003[0m trees; File: Set_collection_[93m9a55b5c8-53b6-40c3-b4c7-55df883b3258[0m; Distance matrix: not computed. ───────────────────────────── beast_run1; Containing [1;36m1001[0m trees. beast_long; Containing [1;36m1001[0m trees. beast_run2; Containing [1;36m1001[0m trees. [2K[32m⠼[0m [1;32mCalculating distances...[0m0m [1A[2K[1;34mhashrf_RF | Done![0m [2K[32m⠹[0m [1;32mEmbedding distances...[0m0m [1A[2K[1;34mtsne | Done![0m [93m- Leaving PEAR -[0m
Example 2¶
We use pear to compare 5 different algorithms used to estimate a phylogenetic tree structure with the real structure (we know it as we use simulated data).
We upload 6 files using the[trees]
argument.
We compute the Robinson Foulds distances using hashrf.
We embed the distances in a 3-dimensional space using PCoA, and we plot the results colouring by the Likelihood values obtained during the runs. We highlight the true tree structure.
We substitute the 3\(^{rd}\) dimension with the Likelihood scores, and we use the same colorscale to represent the likelihood of the proposed structures.!cat example_2.toml
!pear_ebi --config example_2.toml
[34mPEAR v0.[0m[1;34m1.85[0m [95mYour input:[0m ───────────────────────────── Tree set collection containing [1;36m138[0m trees; File: Set_collection_[93mec4084c4-a5a7-4878-abcc-93c634208b6e[0m; Distance matrix: computed. ───────────────────────────── IQtreeStartingTree_Trees; Containing [1;36m29[0m trees. MapleStartingTree_Trees; Containing [1;36m5[0m trees. ParsimonyRAxMLStartingTree_GTRmodel_Trees; Containing [1;36m47[0m trees. RAxMLNGStartingTree_Trees; Containing [1;36m26[0m trees. UshERStartingTree_Trees; Containing [1;36m30[0m trees. TrueTreeSimulations; Containing [1;36m1[0m trees. [2K[32m⠋[0m [1;32mEmbedding distances...[0m0m [1A[2K[1;34mpcoa | Done![0m [93m- Leaving PEAR -[0m
Example 2 Continued¶
We now show that we can easily run the same analyses in a very neat way by simply renaming our
.toml
file by simply renaming itpear.toml
.!pear_ebi
[34mPEAR v0.[0m[1;34m1.85[0m [95mYour input:[0m ───────────────────────────── Tree set collection containing [1;36m138[0m trees; File: Set_collection_[93mb87402e6-2e41-4902-8fd2-092a3f6c230a[0m; Distance matrix: computed. ───────────────────────────── IQtreeStartingTree_Trees; Containing [1;36m29[0m trees. MapleStartingTree_Trees; Containing [1;36m5[0m trees. ParsimonyRAxMLStartingTree_GTRmodel_Trees; Containing [1;36m47[0m trees. RAxMLNGStartingTree_Trees; Containing [1;36m26[0m trees. UshERStartingTree_Trees; Containing [1;36m30[0m trees. TrueTreeSimulations; Containing [1;36m1[0m trees. [2K[32m⠋[0m [1;32mEmbedding distances...[0m0m [1A[2K[1;34mpcoa | Done![0m [93m- Leaving PEAR -[0m
Last update: 2024-04-29