Tree iSAX

class pyCFOFiSAX._tree_iSAX.TreeISAX(size_word, threshold, data_ts, base_cardinality=2, max_card_alphabet=128, boolean_card_max=True)
The TreeISAX class, containing:
  • the iSAX distribution of the sequences to index

  • and a link to the first root node

Warning

In this version, data_ts is mandatory: it is used to define the future breakpoints in advance.

Parameters
  • size_word (int) – The number of SAX discretizations for each sequence

  • threshold (int) – The maximum capacity of the nodes of the tree

  • data_ts (numpy.ndarray) – Array of sequences to be inserted

  • base_cardinality (int) – The smallest cardinality for the iSAX encoding

  • max_card_alphabet (int) – The maximum cardinality for the iSAX encoding, used if self.boolean_card_max == True

Variables
  • size_word (int) – Number of letters contained in the SAX words indexed in the tree

  • threshold (int) – Threshold before a leaf is split into two leaf nodes
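
The warning above says that data_ts is needed to define the breakpoints in advance. As a rough illustration of what data-driven breakpoints can look like, the sketch below fits a normal distribution to a set of values and cuts it into equiprobable regions, as classic SAX does. The function names gaussian_breakpoints and sax_symbol are hypothetical, and the fitting strategy is an assumption; the library's own _do_bkpt() may proceed differently.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def gaussian_breakpoints(values, cardinality):
    """Return cardinality - 1 breakpoints splitting a normal distribution
    fitted to `values` into equiprobable regions (assumed strategy)."""
    dist = NormalDist(mu=mean(values), sigma=pstdev(values))
    return [dist.inv_cdf(k / cardinality) for k in range(1, cardinality)]


def sax_symbol(value, breakpoints):
    """Return the index of the region that contains `value`."""
    return bisect_right(breakpoints, value)


# Hypothetical data: breakpoints for a cardinality of 4.
breakpoints = gaussian_breakpoints([0, 1, 2, 3, 4], cardinality=4)
symbols = [sax_symbol(v, breakpoints) for v in (-10, 2.5, 10)]
```

With a cardinality of 4 there are three breakpoints, so every value maps to a symbol between 0 and 3.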

_minmax_nodes()

Returns the breakpoints of the nodes of the tree. Uses _do_bkpt().

Returns

The min and max breakpoints of the nodes of the tree

Return type

numpy.ndarray

_minmax_obj_vs_node(ntss_tmp, bool_print: bool = False)

Computes the minimum and maximum distances between the ntss_tmp sequences and the nodes of the tree.

Parameters
  • ntss_tmp (numpy.ndarray) – Reference sequences

  • bool_print (boolean) – if True, displays the time taken by each preprocessing step

Returns

Minimum distances between sequences and nodes, maximum distances between sequences and nodes

Return type

numpy.ndarray, numpy.ndarray

_minmax_obj_vs_nodeleaf()

Computes the min and max distances between the ntss_tmp sequences and the leaf nodes of the tree.

Warning

Must be executed after _minmax_obj_vs_node() and distrib_nn_for_cdf().

count_nodes_by_tree()

Returns the number of internal nodes and leaf nodes of the tree. Uses get_number_internal_and_terminal().

Returns

The number of internal nodes, the number of leaf nodes

Return type

int, int

distrib_nn_for_cdf(ntss_tmp, bool_print: bool = False)

Calculates the two indicators, the mean and the standard deviation of the distances, needed to use the CDF of the normal distribution. The computation of these indicators is described in Scoring Message Stream Anomalies in Railway Communication Systems, L. Foulon et al., 2019, ICDM Workshop.

Parameters
  • ntss_tmp (numpy.ndarray) – Reference sequences

  • bool_print (boolean) – if True, displays the node statistics on standard output

Returns

Return type

list(numpy.ndarray, numpy.array)
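
To show how a mean and a standard deviation feed the CDF of the normal distribution, here is a minimal sketch. It estimates the fraction of distances that fall below a given value. The function name estimated_fraction_closer is hypothetical, and computing the indicators from raw distances is an assumption made for illustration; the cited paper describes how the library actually derives them.

```python
from statistics import NormalDist, mean, pstdev


def estimated_fraction_closer(distances, query_distance):
    """Estimate the fraction of `distances` that are below `query_distance`,
    assuming the distances are normally distributed."""
    mu = mean(distances)
    sigma = pstdev(distances)
    if sigma == 0:
        # Degenerate case: all distances are equal, so no CDF is needed.
        return 1.0 if query_distance >= mu else 0.0
    return NormalDist(mu, sigma).cdf(query_distance)


# Hypothetical distances between one reference sequence and a node's content.
fraction = estimated_fraction_closer([1.0, 2.0, 3.0], query_distance=2.0)
```

Storing only the two indicators per node lets the fraction be estimated without keeping every individual distance.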

get_level_max()

Returns the maximum level of the tree, with the root at level 0.

Returns

The maximum depth level

Return type

int

get_list_nodes_and_barycentre()

Returns the lists of nodes and centroids.

Returns

List of nodes, list of leaf nodes, list of leaf centroids

Return type

list, list, list

get_list_nodes_leaf()

Returns the list of leaf nodes.

Returns

List of leaf nodes

Return type

list

get_nodes_of_level(level: int)

Returns the nodes of a level, with the root at level 0.

Parameters

level (int) – The level of the tree to evaluate

Returns

The nodes at the given level of the tree

Return type

list

get_number_internal_and_terminal()

Returns the number of internal nodes and leaf nodes.

Returns

The number of internal nodes, the number of leaf nodes

Return type

int, int
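
The counting described above can be sketched as a simple recursion. The Node class below is a hypothetical stand-in and not the library's node class, and treating the root as an internal node is an assumption.

```python
class Node:
    """Hypothetical tree node holding a list of child nodes."""

    def __init__(self, children=None):
        self.children = children or []


def count_internal_and_terminal(node):
    """Return (internal node count, leaf node count) for the subtree at `node`."""
    if not node.children:
        return 0, 1  # a node without children is a leaf
    internal, terminal = 1, 0  # the current node is internal
    for child in node.children:
        child_internal, child_terminal = count_internal_and_terminal(child)
        internal += child_internal
        terminal += child_terminal
    return internal, terminal


# Root with one internal child (two leaves below it) and one leaf child.
root = Node([Node([Node(), Node()]), Node()])
internal_count, leaf_count = count_internal_and_terminal(root)
```

For this small tree the root and its first child are internal, and the remaining three nodes are leaves.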

get_size()

Returns the memory size of the tree, of the nodes, and of the sequences contained in the tree.

Returns

Total memory size, memory size of the nodes, memory size of the sequences

Return type

int, int, int

get_size_and_width_and_number_types_nodes()
Groups the following functions:
  • get_size()

  • get_width_of_all_level()

  • get_number_internal_and_terminal()

Returns

Total memory size, memory size of the nodes, memory size of the sequences, the number of nodes on each level, the number of internal nodes, the number of leaf nodes, and the number of sequences inserted in the tree

Return type

int, int, int, list, int, int, int

get_width_of_all_level()

Returns the width of every level as a list, with the root at level 0.

Returns

The number of nodes on each level of the tree

Return type

list

get_width_of_level(level: int)

Returns the width of a level, with the root at level 0.

Returns

The number of nodes on the given level of the tree

Return type

int
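
The level-based helpers (get_level_max(), get_width_of_all_level(), get_width_of_level()) all rely on the convention that the root is at level 0. A breadth-first traversal is one way to obtain these values; the sketch below uses a hypothetical Node class and is not the library's implementation.

```python
from collections import deque


class Node:
    """Hypothetical tree node holding a list of child nodes."""

    def __init__(self, children=None):
        self.children = children or []


def width_of_all_levels(root):
    """Return a list whose i-th entry is the number of nodes at level i."""
    widths = []
    queue = deque([(root, 0)])  # the root is at level 0
    while queue:
        node, level = queue.popleft()
        if level == len(widths):
            widths.append(0)  # first node seen on this level
        widths[level] += 1
        for child in node.children:
            queue.append((child, level + 1))
    return widths


root = Node([Node([Node(), Node()]), Node()])
widths = width_of_all_levels(root)  # [1, 2, 2]
max_level = len(widths) - 1         # 2
width_of_level_1 = widths[1]        # 2
```

The maximum level and the width of a single level both fall out of the same list, which is why one traversal is enough for all three values.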

insert(new_sequence)

Converts the new sequence to PAA values, then calls insert_paa().

Parameters

new_sequence (numpy.array) – The new sequence to be inserted
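
The conversion to PAA (Piecewise Aggregate Approximation) replaces a sequence with the mean of each of its equal-length segments. A minimal sketch follows, assuming that the sequence length is a multiple of the word size; the function name paa is hypothetical and the library's own conversion may handle other lengths.

```python
def paa(sequence, size_word):
    """Return the PAA of `sequence`: the mean of each of `size_word` segments."""
    n = len(sequence)
    if size_word <= 0 or n % size_word != 0:
        raise ValueError("sequence length must be a positive multiple of size_word")
    segment_length = n // size_word
    return [
        sum(sequence[i * segment_length:(i + 1) * segment_length]) / segment_length
        for i in range(size_word)
    ]


# A sequence of 4 points reduced to a word of 2 values.
paa_values = paa([1, 3, 5, 7], size_word=2)  # [2.0, 6.0]
```

The resulting list has size_word values, which matches the number of letters in the SAX words indexed by the tree.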

insert_paa(new_paa)

Inserts the sequence by directly calling the insert function of the root node.

Parameters

new_paa (numpy.array) – The new sequence, as PAA values, to be inserted

number_nodes_visited(sub_query: numpy.array, ntss_tmp: numpy.ndarray)

Counts the average number of nodes visited in the tree when computing the approximation.

Parameters
  • sub_query (numpy.array) – The sequence to be evaluated

  • ntss_tmp (numpy.ndarray) – Reference sequences

Returns

The number of nodes visited in the tree for the iCFOF approximation

Return type

numpy.array

preprocessing_for_icfof(ntss_tmp, bool_print: bool = False, count_num_node: bool = False)
Calls, for the id_tree tree, the two preprocessing methods:
  • _minmax_obj_vs_node(),

  • distrib_nn_for_cdf().

Parameters
  • ntss_tmp – Reference sequences

  • bool_print (boolean) – if True, displays the time taken by each preprocessing step

  • count_num_node (boolean) – if True, counts the number of nodes

Returns

If count_num_node is True, the number of nodes in the tree

Return type

int

vrang_list(sub_query: numpy.array, ntss_tmp: numpy.ndarray)

Gets the vrang list for the sub_query sequence in the tree, which is needed to compute the approximation. A faster variant of this method that does not traverse the tree is available: vrang_list_faster().

Parameters
  • sub_query – The sequence to be evaluated

  • ntss_tmp – Reference sequences (i.e. the reference history)

Returns

The vrang list of sub_query

Return type

list(float)

vrang_list_faster(sub_query: numpy.array, ntss_tmp: numpy.ndarray)

Gets the vrang list for the sub_query sequence in the tree, which is needed to compute the approximation. This method is the fast version of vrang_list().

Note

This method does not traverse the tree, but directly prunes the leaf nodes. The leaves that are kept (not pruned) are used by the approximation function.

Parameters
  • sub_query – The sequence to be evaluated

  • ntss_tmp – Reference sequences (i.e. the reference history) in PAA format

Returns

The vrang list of sub_query

Return type

list(float)

Numba functions for Tree iSAX

For obtaining the vrang

pyCFOFiSAX._tree_iSAX.vrang_seq_ref(distance, max_array, min_array, cdf_mean, cdf_std, num_ts_by_node, index_cdf_bin, cdf_bins)

Calculates the vrang from the distance between the sequence to be evaluated and the reference sequence.

Parameters
  • distance (float) – The distance between the two sequences

  • max_array (np_array) – Max distances between the nodes of the tree and the reference sequence

  • min_array (np_array) – Min distances between the nodes of the tree and the reference sequence

  • cdf_mean (np_array) – The average distances between the nodes of the tree and the reference sequence

  • cdf_std (np_array) – Dispersion of distances in each leaf node

  • num_ts_by_node (np_array) – The number of sequences in each leaf node

  • index_cdf_bin (np_array) – The index of the CDF cdf_bins

  • cdf_bins (np_array) – CDF values of the normal distribution centered at the origin and standard deviation

Returns

The vrang

Return type

int
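
The parameters above suggest how min and max distances can prune leaf nodes while the normal CDF handles the remaining ones. The sketch below is one possible reading, not the library's Numba code. It assumes that the vrang is an estimated count of sequences closer to the reference sequence than the evaluated sequence. It also calls the CDF directly instead of using the index_cdf_bin and cdf_bins lookup tables, and the name vrang_sketch is hypothetical.

```python
from statistics import NormalDist


def vrang_sketch(distance, max_array, min_array, cdf_mean, cdf_std, num_ts_by_node):
    """Estimate how many indexed sequences are closer to the reference
    sequence than `distance` (assumed meaning of the vrang)."""
    total = 0.0
    nodes = zip(max_array, min_array, cdf_mean, cdf_std, num_ts_by_node)
    for d_max, d_min, mu, sigma, count in nodes:
        if distance >= d_max:
            total += count  # the whole node is closer: count every sequence
        elif distance <= d_min:
            continue  # the whole node is farther: prune it
        elif sigma > 0:
            # Partial overlap: estimate the share of the node with the CDF.
            total += count * NormalDist(mu, sigma).cdf(distance)
        elif distance >= mu:
            total += count  # no dispersion: all distances equal the mean
    return round(total)


# Three hypothetical leaf nodes: fully counted, pruned, partially counted.
vrang = vrang_sketch(
    distance=2.0,
    max_array=[1.0, 5.0, 4.0],
    min_array=[0.0, 3.0, 1.0],
    cdf_mean=[0.5, 4.0, 2.0],
    cdf_std=[0.2, 0.5, 1.0],
    num_ts_by_node=[10, 4, 6],
)
```

Only the third node needs the CDF here; the first two are settled by comparing the distance with the min and max bounds.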

pyCFOFiSAX._tree_iSAX.vrang_list_for_all_seq_ref(len_seq_list, distance, max_array, min_array, cdf_mean, cdf_std, num_ts_by_node, index_cdf_bin, cdf_bins)

Calls vrang_seq_ref() for each reference sequence.

Parameters
  • len_seq_list (float) – The number of reference sequences

  • distance (np_array) – The distances between the sequences

  • max_array (np_ndarray) – Max distances between the nodes of the tree and the reference sequence

  • min_array (np_ndarray) – Min distances between the nodes of the tree and the reference sequence

  • cdf_mean (np_ndarray) – The average distances between the nodes of the tree and the reference sequence

  • cdf_std (np_array) – Dispersion of distances in each leaf node

  • num_ts_by_node (np_array) – The number of sequences in each leaf node

  • index_cdf_bin (np_array) – The index of the CDF cdf_bins

  • cdf_bins (np_array) – CDF values of the normal distribution centered at the origin and standard deviation

Returns

The list of vrang values

Return type

np_array

For counting the visited nodes

pyCFOFiSAX._tree_iSAX.nodes_visited_for_seq_ref(distance, max_array, min_array, list_parent_node)
pyCFOFiSAX._tree_iSAX.nodes_visited_for_all_seq_ref(len_seq_list, distance, max_array, min_array, list_parent_node)