Tree iSAX
- class pyCFOFiSAX._tree_iSAX.TreeISAX(size_word, threshold, data_ts, base_cardinality=2, max_card_alphabet=128, boolean_card_max=True)
The TreeISAX class contains the iSAX distribution of the sequences to index and points to the first root node.
Warning
In this version, data_ts is mandatory: it is used to define the future breakpoints in advance.
- Parameters
size_word (int) – The number of SAX discretizations for each sequence
threshold (int) – The maximum capacity of the nodes of the tree
data_ts (numpy.ndarray) – Sequence array to be inserted
base_cardinality (int) – The smallest cardinality for encoding iSAX
max_card_alphabet (int) – The maximum cardinality for encoding iSAX; applies only if self.boolean_card_max == True
- Variables
size_word (int) – Number of letters contained in the SAX words indexed in the tree
threshold (int) – Threshold before a leaf is split into two leaf nodes
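To illustrate how size_word and the cardinality interact, here is a minimal sketch that encodes PAA values into SAX symbols. It assumes breakpoints that cut a normal distribution fitted to the data into equiprobable regions. The helper names gaussian_breakpoints and sax_word are hypothetical, and the way pyCFOFiSAX actually derives its breakpoints from data_ts may differ.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def gaussian_breakpoints(values, cardinality):
    """Return cardinality - 1 breakpoints cutting a fitted normal law into equiprobable regions.

    Hypothetical helper: an assumption about how breakpoints could be derived from data.
    """
    dist = NormalDist(mean(values), pstdev(values))
    return [dist.inv_cdf(k / cardinality) for k in range(1, cardinality)]


def sax_word(paa_values, breakpoints):
    """Map each PAA value to the index of the region it falls in (one symbol per segment)."""
    return [bisect_right(breakpoints, v) for v in paa_values]


data = [-1.5, -0.5, 0.0, 0.5, 1.5]
bkpt = gaussian_breakpoints(data, cardinality=2)  # a single breakpoint at the mean
word = sax_word([-0.8, 0.3, 1.1], bkpt)           # a word of length size_word = 3
```

With a cardinality of 2 each segment is encoded on one bit; raising the cardinality adds breakpoints and therefore finer regions.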
- _minmax_nodes()
Returns the breakpoints of the nodes of the tree. Uses _do_bkpt().
- Returns
The min and max breakpoints of the nodes of the tree
- Return type
numpy.ndarray
- _minmax_obj_vs_node(ntss_tmp, bool_print: bool = False)
Computes the minimum and maximum distances between the ntss_tmp sequences and the nodes of the tree.
- Parameters
ntss_tmp (numpy.ndarray) – Reference sequences
bool_print (boolean) – if True, displays the time taken by each preprocessing step
- Returns
The minimum distances between sequences and nodes, the maximum distances between sequences and nodes
- Return type
numpy.ndarray, numpy.ndarray
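These bounds can be pictured as the distance from a sequence in PAA form to the region of space covered by a node. The sketch below assumes each node is described by a lower and an upper breakpoint per segment, and it ignores any scaling by sequence length. The helper min_max_dist and this node representation are illustrative assumptions, not the library's internals.

```python
from math import sqrt


def min_max_dist(paa, lower, upper):
    """Return (min, max) Euclidean distance from a point to an axis-aligned box.

    Hypothetical helper: `lower` and `upper` hold one breakpoint pair per segment.
    """
    d_min = 0.0
    d_max = 0.0
    for x, lo, hi in zip(paa, lower, upper):
        # Nearest point of [lo, hi]: the gap is zero when x lies inside the interval.
        gap = max(lo - x, 0.0, x - hi)
        # The farthest point of the interval is always one of its two ends.
        far = max(abs(x - lo), abs(x - hi))
        d_min += gap * gap
        d_max += far * far
    return sqrt(d_min), sqrt(d_max)


d_min, d_max = min_max_dist([0.0, 3.0], lower=[-1.0, 0.0], upper=[1.0, 1.0])
```

Any sequence stored under the node is then at a distance between d_min and d_max from the reference sequence, which is what makes pruning possible.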
- _minmax_obj_vs_nodeleaf()
Computes the min and max distances between the ntss_tmp sequences and the leaf nodes of the tree.
Warning
Must be executed after _minmax_obj_vs_node() and distrib_nn_for_cdf().
- count_nodes_by_tree()
Returns the number of internal nodes and leaf nodes of the tree. Uses get_number_internal_and_terminal().
- Returns
The number of internal nodes, the number of leaf nodes
- Return type
int, int
- distrib_nn_for_cdf(ntss_tmp, bool_print: bool = False)
Computes the two indicators, the mean and the standard deviation of the distances, required to use the CDF of the normal distribution. The computation of these indicators is described in Scoring Message Stream Anomalies in Railway Communication Systems, L. Foulon et al., 2019, ICDM Workshop.
- Parameters
ntss_tmp (numpy.ndarray) – Reference sequences
bool_print (boolean) – if True, displays the node statistics on standard output
- Returns
- Return type
list(numpy.ndarray, numpy.array)
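To show why a mean and a standard deviation are enough to use the normal CDF, the sketch below summarizes a set of distances and estimates the share that falls under a given value. The function name and the sample values are hypothetical; the exact indicators used by the method are those described in the cited paper.

```python
from statistics import NormalDist, mean, pstdev


def fraction_below(distances, value):
    """Estimate the share of distances <= value with a fitted normal law."""
    mu = mean(distances)
    sigma = pstdev(distances)
    if sigma == 0.0:
        # Degenerate case: every distance is identical.
        return 1.0 if value >= mu else 0.0
    return NormalDist(mu, sigma).cdf(value)


distances = [1.0, 2.0, 3.0, 4.0, 5.0]
share = fraction_below(distances, 3.0)  # 3.0 is the mean of the sample
```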
- get_level_max()
Returns the maximum level, with the root at level 0
- Returns
The maximum depth level
- Return type
int
- get_list_nodes_and_barycentre()
Returns the lists of nodes and centroids
- Returns
List of nodes, list of leaf nodes, list of leaf centroids
- Return type
list, list, list
- get_list_nodes_leaf()
Returns the list of leaf nodes
- Returns
List of leaf nodes
- Return type
list
- get_nodes_of_level(level: int)
Returns the nodes of a given level, with the root at level 0
- Parameters
level (int) – The level of the tree to evaluate
- Returns
The nodes of the i-th level of the tree
- Return type
list
- get_number_internal_and_terminal()
Returns the number of internal nodes and leaf nodes
- Returns
The number of internal nodes, the number of leaf nodes
- Return type
int, int
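As a rough illustration of such a count, the sketch below walks a toy tree and separates internal nodes from leaves. The Node class is a hypothetical stand-in for the library's node objects. Treating the root as an internal node whenever it has children is an assumption.

```python
class Node:
    """Hypothetical minimal node: a node without children is a leaf."""

    def __init__(self, children=None):
        self.children = children or []


def count_internal_and_terminal(root):
    """Return (internal, leaves) by walking the tree with an explicit stack."""
    internal = leaves = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            internal += 1
            stack.extend(node.children)
        else:
            leaves += 1
    return internal, leaves


# Root with two children; the first child has two leaves of its own.
root = Node([Node([Node(), Node()]), Node()])
counts = count_internal_and_terminal(root)
```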
- get_size()
Returns the memory size of the tree, of its nodes, and of the sequences it contains
- Returns
Total memory size, memory size of the nodes, memory size of the sequences
- Return type
int, int, int
- get_size_and_width_and_number_types_nodes()
- Combines the results of the following functions:
get_size()
get_width_of_all_level()
get_number_internal_and_terminal()
- Returns
Total memory size, memory size of the nodes, memory size of the sequences, the number of nodes on each level, the number of internal nodes, the number of leaf nodes, and the number of sequences inserted in the tree
- Return type
int, int, int, list, int, int, int
- get_width_of_all_level()
Returns the width of every level as a list, with the root at level 0
- Returns
The number of nodes on each level of the tree
- Return type
list
- get_width_of_level(level: int)
Returns the width of a level, with the root at level 0
- Returns
The number of nodes on the given level of the tree
- Return type
int
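The level-based accessors above all rely on the same convention, root at level 0. A minimal breadth-first sketch of the per-level widths, using a hypothetical Node class rather than the library's own node type:

```python
class Node:
    """Hypothetical minimal node holding only its children."""

    def __init__(self, children=None):
        self.children = children or []


def width_of_all_levels(root):
    """Return the number of nodes on each level, the root being level 0."""
    widths = []
    current = [root]
    while current:
        widths.append(len(current))
        # Move one level down: gather the children of every node on this level.
        current = [child for node in current for child in node.children]
    return widths


root = Node([Node([Node(), Node()]), Node()])
widths = width_of_all_levels(root)
level_max = len(widths) - 1  # deepest level, counting the root as level 0
```

The width of a single level is then widths[level], and the maximum level follows from the length of the list.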
- insert(new_sequence)
Converts the new sequence to PAA values, then calls insert_paa().
- Parameters
new_sequence (numpy.array) – The new sequence to be inserted
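For readers unfamiliar with PAA (Piecewise Aggregate Approximation), the conversion step amounts to averaging the sequence over size_word segments. A minimal sketch, assuming the sequence length is a multiple of size_word; the function paa is illustrative and is not the library's implementation.

```python
def paa(sequence, size_word):
    """Average a sequence over size_word equal-length segments."""
    if len(sequence) % size_word != 0:
        raise ValueError("sequence length must be a multiple of size_word")
    step = len(sequence) // size_word
    return [sum(sequence[i:i + step]) / step for i in range(0, len(sequence), step)]


values = paa([1.0, 3.0, 2.0, 4.0, 10.0, 20.0], size_word=3)
```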
- insert_paa(new_paa)
Inserts a sequence given as PAA values by calling the insert function of the root node directly
- Parameters
new_paa (numpy.array) – The new sequence to be inserted, as PAA values
- number_nodes_visited(sub_query: numpy.array, ntss_tmp: numpy.ndarray)
Counts the average number of nodes visited in the tree when computing the approximation.
- Parameters
sub_query (numpy.array) – The sequence to be evaluated
ntss_tmp (numpy.ndarray) – Reference sequences
- Returns
The number of nodes visited in the tree for the iCFOF approximation
- Return type
numpy.array
- preprocessing_for_icfof(ntss_tmp, bool_print: bool = False, count_num_node: bool = False)
- Calls, for the id_tree tree, the two preprocessing methods: _minmax_obj_vs_node() and distrib_nn_for_cdf().
- Parameters
ntss_tmp – Reference sequences
bool_print (boolean) – if True, displays the time taken by each preprocessing step
count_num_node (boolean) – if True, counts the number of nodes
- Returns
If count_num_node is True, the number of nodes in the tree
- Return type
int
- vrang_list(sub_query: numpy.array, ntss_tmp: numpy.ndarray)
Gets the vrang list for the sub_query sequence in the tree. Necessary for computing the approximation. A faster variant that does not traverse the tree: vrang_list_faster().
- Parameters
sub_query – The sequence to be evaluated
ntss_tmp – Reference sequences (i.e. the reference history)
- Returns
The vrang list of sub_query
- Return type
list(float)
- vrang_list_faster(sub_query: numpy.array, ntss_tmp: numpy.ndarray)
Gets the vrang list for the sub_query sequence in the tree. Necessary for computing the approximation. This method is the fast version of vrang_list().
Note
This method does not traverse the tree; it prunes the leaf nodes directly. The leaves that are kept (not pruned) are then used by the approximation function.
- Parameters
sub_query – The sequence to be evaluated
ntss_tmp – Reference sequences (i.e. the reference history) in PAA format
- Returns
The vrang list of sub_query
- Return type
list(float)
Numba functions for Tree iSAX
For computing the vrang
- pyCFOFiSAX._tree_iSAX.vrang_seq_ref(distance, max_array, min_array, cdf_mean, cdf_std, num_ts_by_node, index_cdf_bin, cdf_bins)
Calculates the vrang from the distance between the sequence to be evaluated and the reference sequence.
- Parameters
distance (float) – The distance between the two sequences
max_array (np_array) – Max distances between the nodes of the tree and the reference sequence
min_array (np_array) – Min distances between the nodes of the tree and the reference sequence
cdf_mean (np_array) – The average distances between the nodes of the tree and the reference sequence
cdf_std (np_array) – Dispersion of distances in each leaf node
num_ts_by_node (np_array) – The number of sequences in each leaf node
index_cdf_bin (np_array) – The index of the CDF cdf_bins
cdf_bins (np_array) – CDF values of the standard normal distribution (centered at the origin)
- Returns
The vrang
- Return type
int
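One plausible reading of these parameters, offered only as an assumption, is sketched below. It is not the library's Numba implementation. Leaves entirely within the distance count in full, leaves entirely beyond it are pruned, and the remaining leaves are estimated with the normal CDF. The sketch replaces the cdf_bins lookup table with NormalDist.cdf and returns a float rather than an int. The name vrang_sketch is hypothetical.

```python
from statistics import NormalDist


def vrang_sketch(distance, max_array, min_array, cdf_mean, cdf_std, num_ts_by_node):
    """Estimate how many indexed sequences lie within `distance` of a reference sequence.

    Assumption: this pruning logic is inferred from the parameter descriptions only.
    """
    total = 0.0
    for d_max, d_min, mu, sigma, count in zip(
        max_array, min_array, cdf_mean, cdf_std, num_ts_by_node
    ):
        if d_max <= distance:
            total += count  # the whole leaf is within range
        elif d_min > distance:
            continue        # the whole leaf is out of range: pruned
        elif sigma > 0.0:
            # Undecided leaf: estimate the share of its sequences within range.
            total += count * NormalDist(mu, sigma).cdf(distance)
        else:
            total += count if distance >= mu else 0.0
    return total


estimate = vrang_sketch(
    distance=2.0,
    max_array=[1.5, 5.0, 3.0],
    min_array=[0.5, 3.0, 1.0],
    cdf_mean=[1.0, 4.0, 2.0],
    cdf_std=[0.2, 0.5, 0.5],
    num_ts_by_node=[4, 6, 10],
)
```

In this example the first leaf contributes all 4 of its sequences, the second none, and the third half of its 10 sequences because the distance equals its mean.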
- pyCFOFiSAX._tree_iSAX.vrang_list_for_all_seq_ref(len_seq_list, distance, max_array, min_array, cdf_mean, cdf_std, num_ts_by_node, index_cdf_bin, cdf_bins)
Uses the function vrang_seq_ref() for each reference sequence.
- Parameters
len_seq_list (float) – The number of reference sequences
distance (np_array) – The distances between the sequence to be evaluated and the reference sequences
max_array (np_ndarray) – Max distances between the nodes of the tree and the reference sequence
min_array (np_ndarray) – Min distances between the nodes of the tree and the reference sequence
cdf_mean (np_ndarray) – The average distances between the nodes of the tree and the reference sequence
cdf_std (np_array) – Dispersion of distances in each leaf node
num_ts_by_node (np_array) – The number of sequences in each leaf node
index_cdf_bin (np_array) – The index of the CDF cdf_bins
cdf_bins (np_array) – CDF values of the standard normal distribution (centered at the origin)
- Returns
The list of vrang values
- Return type
np_array
For counting visited nodes
- pyCFOFiSAX._tree_iSAX.nodes_visited_for_seq_ref(distance, max_array, min_array, list_parent_node)
- pyCFOFiSAX._tree_iSAX.nodes_visited_for_all_seq_ref(len_seq_list, distance, max_array, min_array, list_parent_node)