Data Mining

Our data mining research began in the early 1990s and has spanned the past decade. Its broad scope was motivated by real problem domains and covers the following areas:

1. Data Mining Algorithms:

  • Clustering of non-numerical and numerical attribute values for query relaxation
  • Multivariate classification for data with large attribute space
  • Temporal Data Mining
  • Mining MFI (Maximal Frequent Itemsets) for association rules
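To make the MFI notion concrete, here is a minimal brute-force sketch in Python. It is not the group's MFI mining algorithm. It assumes absolute-count support thresholds and small in-memory transaction sets, enumerates all frequent itemsets level by level (Apriori-style), and then keeps only those with no frequent proper superset, which is the definition of a maximal frequent itemset. The function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of all frequent itemsets.

    min_support is an absolute count: an itemset is frequent if it is
    contained in at least min_support transactions.
    """
    sets = [frozenset(t) for t in transactions]
    items = sorted({i for t in sets for i in t})
    frequent = []
    level = [frozenset([i]) for i in items]
    while level:
        # Keep only the candidates whose support meets the threshold.
        current = [c for c in level if sum(c <= t for t in sets) >= min_support]
        frequent.extend(current)
        # Join pairs of frequent k-itemsets into (k+1)-item candidates.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == len(a) + 1:
                candidates.add(union)
        level = list(candidates)
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Return the frequent itemsets that have no frequent proper superset."""
    freq = frequent_itemsets(transactions, min_support)
    return [s for s in freq if not any(s < other for other in freq)]
```

For example, with transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c,d}` and a support of 2, the only maximal frequent itemset is `{a,b,c}`. Raising the support to 3 yields `{a,b}`, `{a,c}` and `{b,c}`. Practical MFI miners avoid enumerating every frequent itemset. This sketch trades efficiency for a direct reading of the definition.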

2. Selected Data Mining Applications

1) Query Processing with sequence data

In this experiment we show how the MQuery tool is used to construct a sequence query, and how our approximate sequence matching technique then generates a set of matched sequences ranked by their nearness to the query, as shown in the following figures.


Fig. 1 shows the lung tumor sequence template.

Query 1: Query Sequence with two images
Fig. 2 A query sequence of two CT lung images showing a growing tumor, constructed from the template in Fig. 1.

Executing the query opens a Result Viewer window that displays the query results.

Fig. 3 KMeD searches the sequence database using the approximate matching technique, and the best matches are returned and displayed in the Result Viewer. The lung tumor features used are the distances from the x and y centroids and the tumor area, with each feature shown in its own column. The first four columns hold the features of the first image, and the next four hold those of the second image. The distance gives the nearness of each answer sequence to the target sequence. Each row represents one answer, and the rows are ranked by distance. The image sequences of the top three results are shown in Figure 4.

Figure 4 Image sequences of the first, second, and third query results.
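The ranking step in Fig. 3 can be illustrated with a minimal Python sketch. KMeD's actual nearness measure is not specified here, so this sketch assumes plain Euclidean distance over the concatenated per-image feature vectors and equal-length sequences, with no feature weighting, normalization, or subsequence alignment. The names `sequence_distance` and `rank_sequences` and the example patient ids are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Features of one image, e.g. (x-centroid distance, y-centroid distance, area).
Features = Sequence[float]


def sequence_distance(query: Sequence[Features], candidate: Sequence[Features]) -> float:
    """Euclidean distance between two equal-length image sequences.

    Per-image feature vectors are concatenated, mirroring the Result Viewer
    layout in which each image contributes its own block of columns.
    """
    if len(query) != len(candidate):
        raise ValueError("sequences must contain the same number of images")
    q = [v for image in query for v in image]
    c = [v for image in candidate for v in image]
    if len(q) != len(c):
        raise ValueError("images must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def rank_sequences(query: Sequence[Features],
                   database: Dict[str, Sequence[Sequence[float]]],
                   k: int = 3) -> List[Tuple[str, float]]:
    """Return the k nearest sequences as (id, distance), nearest first."""
    scored = [(sid, sequence_distance(query, seq)) for sid, seq in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

Usage: with a two-image query `[(1.0, 2.0, 10.0), (1.5, 2.5, 14.0)]`, a stored sequence that differs only by one unit of area in the second image is at distance 1.0 and is ranked ahead of one that differs by two units in the first image (distance 2.0). Because raw Euclidean distance lets large-valued features such as area dominate, a real system would likely scale or weight the features.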

Query 2: Query Sequence with three images


Figure 5 A query sequence of three CT lung images showing a growing tumor, constructed from the template in Fig. 1.

Figure 6 KMeD searches the sequence database using the approximate matching technique, and the best matches are returned and displayed in the Result Viewer. The lung tumor features used are the distances from the x and y centroids and the tumor area, with each feature shown in its own column. The first four columns hold the features of the first image, the next four hold those of the second image, and the next four hold those of the third image. The distance gives the nearness of each answer sequence to the target sequence. Each row represents one answer, and the rows are ranked by distance. The image sequences of the top two results are shown in Figure 7.


Figure 7 Image sequences of the first and second query results.
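The Figure 6 caption describes one result row as the per-image features laid side by side, image after image. The following Python sketch shows one way such a row could be assembled from segmented images. It assumes a binary tumor mask and a binary reference mask (for example, the lung) per image, and it reads "distance from the x and y centroid" as the x and y offsets of the tumor centroid from the reference centroid. That interpretation, the function names, and the mask format are all hypothetical. Only the three features named in the caption are computed, since the fourth column is not described.

```python
from typing import Iterable, List, Tuple

Mask = List[List[int]]  # 1 = pixel belongs to the region, 0 = background


def region_stats(mask: Mask) -> Tuple[float, float, int]:
    """Return (centroid_x, centroid_y, area_in_pixels) of the nonzero pixels."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    area = len(xs)
    if area == 0:
        raise ValueError("mask contains no region pixels")
    return sum(xs) / area, sum(ys) / area, area


def tumor_features(tumor: Mask, reference: Mask) -> Tuple[float, float, int]:
    """Hypothetical per-image features: x and y offsets of the tumor centroid
    from the reference centroid, plus the tumor area in pixels."""
    tx, ty, area = region_stats(tumor)
    rx, ry, _ = region_stats(reference)
    return tx - rx, ty - ry, area


def sequence_row(frames: Iterable[Tuple[Mask, Mask]]) -> List[float]:
    """Concatenate the features of each (tumor, reference) pair, first image first."""
    return [v for tumor, reference in frames for v in tumor_features(tumor, reference)]
```

For a three-image query like the one in Figure 5, `sequence_row` would produce nine values: three per image, in image order. A tumor that grows between scans shows up as an increasing area value from one block to the next.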

2) Control of Systems with MEMS Sensors and Actuators via Data Mining Techniques

  • MEMS sensors and actuators
  • Dynamic control
  • Delta wing flight control
  • Temporal and spatial data mining

3) Drug Exposure Side Effects from Mining Pregnancy Data

4) Challenges and Techniques for Mining Clinical Data

3. References