Assignment 2: Due 16 Sept 2013
2. Experiment with different types of binning on the glass data (installed with WEKA). Try equal-width and equal-depth binning. Try various numbers of bins. Use OneR on the original data and on your binned data. Compare the results.
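Equal-depth (equal-frequency) binning sorts the values and places cut points so that each bin holds roughly the same number of instances. A minimal sketch follows; the function names are our own, and placing each cut midway between neighbouring sorted values is an assumption that mirrors the style of the intervals in the WEKA output, not WEKA's exact algorithm.

```python
from bisect import bisect_left

def equal_frequency_cuts(values, n_bins):
    """Interior cut points so that each bin holds roughly len(values) / n_bins values.

    Assumption: each cut sits midway between the last value of one bin and the
    first value of the next; duplicate cuts caused by tied values are dropped.
    """
    xs = sorted(values)
    n = len(xs)
    cuts = []
    for i in range(1, n_bins):
        pos = round(i * n / n_bins)  # index of the first value in bin i
        if 0 < pos < n:
            cut = (xs[pos - 1] + xs[pos]) / 2
            if not cuts or cut > cuts[-1]:
                cuts.append(cut)
    return cuts

def assign_bin(x, cuts):
    """Bin index for the intervals (-inf, c0], (c0, c1], ..., (c_last, inf)."""
    return bisect_left(cuts, x)

data = [1, 2, 3, 4, 5, 6, 7, 8]
cuts = equal_frequency_cuts(data, 4)
print(cuts)                                 # [2.5, 4.5, 6.5]
print([assign_bin(x, cuts) for x in data])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Using `bisect_left` makes a value equal to a cut point fall into the lower bin, matching the half-open `(a-b]` intervals WEKA prints.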

# Equal-frequency (equal-depth) binning in WEKA, 10 bins, classified with OneR; setting ignoreClass to true or false made no difference

=== Run information ===

Scheme:weka.classifiers.rules.OneR -B 6
Relation:     Glass-weka.filters.unsupervised.attribute.Discretize-F-B10-M-1.0-Rfirst-last
Instances:    214
Attributes:   10
              RI
              Na
              Mg
              Al
              Si
              K
              Ca
              Ba
              Fe
              Type
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Al:
	'(-inf-0.85]'	-> build wind float
	'(0.85-1.145]'	-> build wind float
	'(1.145-1.225]'	-> build wind float
	'(1.225-1.285]'	-> build wind float
	'(1.285-1.355]'	-> build wind float
	'(1.355-1.475]'	-> build wind non-float
	'(1.475-1.555]'	-> build wind non-float
	'(1.555-1.725]'	-> build wind non-float
	'(1.725-2.07]'	-> headlamps
	'(2.07-inf)'	-> headlamps
(123/214 instances correct)


Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         121               56.5421 %
Incorrectly Classified Instances        93               43.4579 %
Kappa statistic                          0.3829
Mean absolute error                      0.1242
Root mean squared error                  0.3524
Relative absolute error                 58.6353 %
Root relative squared error            108.5738 %
Total Number of Instances              214

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.8       0.313      0.554     0.8       0.655      0.744    build wind float
                 0.513     0.21       0.574     0.513     0.542      0.652    build wind non-float
                 0         0          0         0         0          0.5      vehic wind float
                 0         0          0         0         0          ?        vehic wind non-float
                 0         0          0         0         0          0.5      containers
                 0         0          0         0         0          0.5      tableware
                 0.897     0.103      0.578     0.897     0.703      0.897    headlamps
Weighted Avg.    0.565     0.191      0.463     0.565     0.502      0.687

=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 56 14  0  0  0  0  0 |  a = build wind float
 28 39  0  0  0  0  9 |  b = build wind non-float
 11  5  0  0  0  0  1 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  6  0  0  0  0  7 |  e = containers
  3  4  0  0  0  0  2 |  f = tableware
  3  0  0  0  0  0 26 |  g = headlamps
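The accuracy and Kappa statistic in the summary can be checked from the confusion matrix. A minimal sketch, assuming WEKA reports the standard Cohen's kappa (observed agreement corrected for the agreement expected from the row and column totals); the matrix is copied from this run and the function name is our own.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix (rows = actual)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # agreement expected by chance if predictions were independent of the true class
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

confusion = [
    [56, 14, 0, 0, 0, 0, 0],   # build wind float
    [28, 39, 0, 0, 0, 0, 9],   # build wind non-float
    [11, 5, 0, 0, 0, 0, 1],    # vehic wind float
    [0, 0, 0, 0, 0, 0, 0],     # vehic wind non-float
    [0, 6, 0, 0, 0, 0, 7],     # containers
    [3, 4, 0, 0, 0, 0, 2],     # tableware
    [3, 0, 0, 0, 0, 0, 26],    # headlamps
]
acc, kappa = accuracy_and_kappa(confusion)
print(f"accuracy = {acc:.4%}, kappa = {kappa:.4f}")  # accuracy = 56.5421%, kappa = 0.3829
```

Both values agree with the 56.5421 % and 0.3829 reported in the summary for this run.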


# Rule of thumb for the number of bins: bins = log2(214) ≈ 7.74, rounded to 8; equal-frequency binning, ignoreClass = false
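The bin count above is simple arithmetic. For comparison, Sturges' rule adds one bin, k = ⌈log2 n⌉ + 1, which would give 9 bins for 214 instances; that alternative was not run here.

```python
import math

n_instances = 214
log_rule = math.log2(n_instances)                # rule of thumb used in this run
sturges = math.ceil(math.log2(n_instances)) + 1  # Sturges' rule, for comparison only
print(f"log2(214) = {log_rule:.2f} -> {round(log_rule)} bins")  # log2(214) = 7.74 -> 8 bins
print(f"Sturges: ceil(log2 n) + 1 = {sturges} bins")            # Sturges: ceil(log2 n) + 1 = 9 bins
```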

results
=== Run information ===

Scheme:weka.classifiers.rules.OneR -B 6
Relation:     Glass-weka.filters.unsupervised.attribute.Discretize-F-B8-M-1.0-Rfirst-last
Instances:    214
Attributes:   10
              RI
              Na
              Mg
              Al
              Si
              K
              Ca
              Ba
              Fe
              Type
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Al:
	'(-inf-0.905]'	-> build wind float
	'(0.905-1.185]'	-> build wind float
	'(1.185-1.275]'	-> build wind float
	'(1.275-1.355]'	-> build wind float
	'(1.355-1.5]'	-> build wind non-float
	'(1.5-1.625]'	-> build wind non-float
	'(1.625-1.985]'	-> build wind non-float
	'(1.985-inf)'	-> headlamps
(122/214 instances correct)


Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         122               57.0093 %
Incorrectly Classified Instances        92               42.9907 %
Kappa statistic                          0.3735
Mean absolute error                      0.1228
Root mean squared error                  0.3505
Relative absolute error                 58.0048 %
Root relative squared error            107.9885 %
Total Number of Instances              214

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.829     0.319      0.558     0.829     0.667      0.755    build wind float
                 0.592     0.275      0.542     0.592     0.566      0.658    build wind non-float
                 0         0          0         0         0          0.5      vehic wind float
                 0         0          0         0         0          ?        vehic wind non-float
                 0         0          0         0         0          0.5      containers
                 0         0          0         0         0          0.5      tableware
                 0.655     0.043      0.704     0.655     0.679      0.806    headlamps
Weighted Avg.    0.57      0.208      0.47      0.57      0.511      0.681

=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 58 12  0  0  0  0  0 |  a = build wind float
 28 45  0  0  0  0  3 |  b = build wind non-float
 12  5  0  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  9  0  0  0  0  4 |  e = containers
  3  5  0  0  0  0  1 |  f = tableware
  3  7  0  0  0  0 19 |  g = headlamps


# Equal-width binning, 8 bins, ignoreClass = false, findNumBins (optimize the number of bins) = false
=== Run information ===

Scheme:weka.classifiers.rules.OneR -B 6
Relation:     Glass-weka.filters.unsupervised.attribute.Discretize-B8-M-1.0-Rfirst-last
Instances:    214
Attributes:   10
              RI
              Na
              Mg
              Al
              Si
              K
              Ca
              Ba
              Fe
              Type
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Al:
	'(-inf-0.69125]'	-> build wind float
	'(0.69125-1.0925]'	-> build wind float
	'(1.0925-1.49375]'	-> build wind float
	'(1.49375-1.895]'	-> build wind non-float
	'(1.895-2.29625]'	-> headlamps
	'(2.29625-2.6975]'	-> headlamps
	'(2.6975-3.09875]'	-> headlamps
	'(3.09875-inf)'	-> containers
(115/214 instances correct)


Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         110               51.4019 %
Incorrectly Classified Instances       104               48.5981 %
Kappa statistic                          0.2979
Mean absolute error                      0.1389
Root mean squared error                  0.3726
Relative absolute error                 65.5707 %
Root relative squared error            114.8155 %
Total Number of Instances              214

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.914     0.451      0.496     0.914     0.643      0.731    build wind float
                 0.355     0.21       0.482     0.355     0.409      0.573    build wind non-float
                 0         0          0         0         0          0.5      vehic wind float
                 0         0          0         0         0          ?        vehic wind non-float
                 0         0.01       0         0         0          0.495    containers
                 0         0          0         0         0          0.5      tableware
                 0.655     0.043      0.704     0.655     0.679      0.806    headlamps
Weighted Avg.    0.514     0.229      0.429     0.514     0.448      0.643

=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 64  6  0  0  0  0  0 |  a = build wind float
 45 27  0  0  0  0  4 |  b = build wind non-float
 12  5  0  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  2  8  0  0  0  0  3 |  e = containers
  3  5  0  0  0  0  1 |  f = tableware
  3  5  0  0  2  0 19 |  g = headlamps
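Equal-width binning splits an attribute's range into intervals of identical width, so the cut points are lo + i·(hi − lo)/k. The sketch below reproduces the Al cut points of this run, assuming an Al range of 0.29 to 3.5; that range is inferred from the model (first cut minus one width, last cut plus one width), not read from the data file.

```python
def equal_width_cuts(lo, hi, n_bins):
    """Interior cut points that split [lo, hi] into n_bins intervals of equal width."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

# Al range assumed to be 0.29 to 3.5, inferred from the cut points in this run's model
al_cuts = equal_width_cuts(0.29, 3.5, 8)
print([round(c, 5) for c in al_cuts])
# [0.69125, 1.0925, 1.49375, 1.895, 2.29625, 2.6975, 3.09875]
```

With an assumed Mg range of 0 to 4.49, the same formula gives the Mg cut points (0.56125, 1.1225, ..., 3.92875) that appear in the optimized-bins run.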


# Same settings as the previous run (equal width, 8 bins), but with findNumBins = true so the number of bins is optimized for each attribute (-O); setting ignoreClass to true or false made no difference
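WEKA describes the -O option as choosing the number of equal-width bins with a leave-one-out estimate. The sketch below is our own illustration of one such estimate, a leave-one-out histogram log-likelihood searched over 1..max_bins; it is an assumption, not WEKA's source, and may differ in details such as weighting and tie-breaking. Both function names are hypothetical.

```python
import math

def loo_log_likelihood(values, n_bins):
    """Leave-one-out log-likelihood of an equal-width histogram (higher is better).

    Each value is scored by the density of its own bin with that value removed,
    (count - 1) / width; the constant 1 / (n - 1) factor is dropped because it does
    not change which bin count wins. A bin holding a single value gives density 0,
    so that bin count is rejected outright.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        k = min(int((x - lo) / width), n_bins - 1)  # the maximum falls into the last bin
        counts[k] += 1
    score = 0.0
    for c in counts:
        if c == 1:
            return -math.inf
        if c > 1:
            score += c * math.log((c - 1) / width)
    return score

def best_num_bins(values, max_bins):
    """Try 1..max_bins equal-width bins and keep the highest-scoring count."""
    if min(values) == max(values):
        return 1  # a constant attribute cannot be split
    return max(range(1, max_bins + 1), key=lambda k: loo_log_likelihood(values, k))

# toy data: a tight cluster near 0 plus two outlying values at 1
sample = [0.0] * 10 + [0.1] * 10 + [1.0] * 2
print(best_num_bins(sample, 10))  # 9
```

On the toy data the score keeps rising while the cluster still fits in one narrow bin and drops once 10 bins split it, which illustrates why this kind of criterion can pick a different bin count for every attribute.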
=== Run information ===

Scheme:weka.classifiers.rules.OneR -B 6
Relation:     Glass-weka.filters.unsupervised.attribute.Discretize-O-B8-M-1.0-Rfirst-last
Instances:    214
Attributes:   10
              RI
              Na
              Mg
              Al
              Si
              K
              Ca
              Ba
              Fe
              Type
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Mg:
	'(-inf-0.56125]'	-> headlamps
	'(0.56125-1.1225]'	-> build wind non-float
	'(1.1225-1.68375]'	-> build wind non-float
	'(1.68375-2.245]'	-> containers
	'(2.245-2.80625]'	-> build wind non-float
	'(2.80625-3.3675]'	-> build wind non-float
	'(3.3675-3.92875]'	-> build wind float
	'(3.92875-inf)'	-> build wind non-float
(107/214 instances correct)


Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          96               44.8598 %
Incorrectly Classified Instances       118               55.1402 %
Kappa statistic                          0.2231
Mean absolute error                      0.1575
Root mean squared error                  0.3969
Relative absolute error                 74.3975 %
Root relative squared error            122.2995 %
Total Number of Instances              214

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.9       0.507      0.463     0.9       0.612      0.697    build wind float
                 0.211     0.138      0.457     0.211     0.288      0.536    build wind non-float
                 0         0          0         0         0          0.5      vehic wind float
                 0         0          0         0         0          ?        vehic wind non-float
                 0         0.03       0         0         0          0.485    containers
                 0         0.01       0         0         0          0.495    tableware
                 0.586     0.097      0.486     0.586     0.531      0.744    headlamps
Weighted Avg.    0.449     0.23       0.38      0.449     0.374      0.609

=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 63  7  0  0  0  0  0 |  a = build wind float
 50 16  0  0  1  1  8 |  b = build wind non-float
 17  0  0  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  5  0  0  0  1  7 |  e = containers
  0  4  0  0  2  0  3 |  f = tableware
  6  3  0  0  3  0 17 |  g = headlamps


# Original data without binning (OneR discretizes the numeric attributes itself, with minimum bucket size -B 6)
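For reference, the core of OneR on already-discretized data fits in a few lines: for each attribute, every value predicts its majority class, and the attribute whose rule makes the fewest training errors wins. This is a simplified sketch under our own assumptions (nominal attributes only, ties go to the first class seen, function name is ours); it omits WEKA's own discretization of numeric attributes with the -B 6 minimum bucket size, which is what produced the Al thresholds in this run.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (attribute index, value -> class rule, training errors) of the best rule."""
    best = None
    for a in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[a]][y] += 1
        # each attribute value predicts its most frequent class
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

rows = [("low", "x"), ("low", "y"), ("high", "x"), ("high", "y")]
labels = ["a", "a", "b", "b"]
print(one_r(rows, labels))  # (0, {'low': 'a', 'high': 'b'}, 0)
```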
=== Run information ===

Scheme:weka.classifiers.rules.OneR -B 6
Relation:     Glass
Instances:    214
Attributes:   10
              RI
              Na
              Mg
              Al
              Si
              K
              Ca
              Ba
              Fe
              Type
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Al:
	< 0.905	-> build wind float
	< 1.1150000000000002	-> build wind non-float
	< 1.2149999999999999	-> build wind float
	< 1.2650000000000001	-> build wind non-float
	< 1.42	-> build wind float
	< 1.815	-> build wind non-float
	< 2.95	-> headlamps
	>= 2.95	-> containers
(135/214 instances correct)


Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         124               57.9439 %
Incorrectly Classified Instances        90               42.0561 %
Kappa statistic                          0.3946
Mean absolute error                      0.1202
Root mean squared error                  0.3466
Relative absolute error                 56.7438 %
Root relative squared error            106.8083 %
Total Number of Instances              214

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.786     0.299      0.561     0.786     0.655      0.744    build wind float
                 0.579     0.268      0.543     0.579     0.561      0.655    build wind non-float
                 0         0          0         0         0          0.5      vehic wind float
                 0         0          0         0         0          ?        vehic wind non-float
                 0.231     0          1         0.231     0.375      0.615    containers
                 0         0          0         0         0          0.5      tableware
                 0.759     0.054      0.688     0.759     0.721      0.852    headlamps
Weighted Avg.    0.579     0.2        0.53      0.579     0.534      0.69

=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 55 15  0  0  0  0  0 |  a = build wind float
 26 44  0  0  0  0  6 |  b = build wind non-float
 12  5  0  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  7  0  0  3  0  3 |  e = containers
  3  5  0  0  0  0  1 |  f = tableware
  2  5  0  0  0  0 22 |  g = headlamps
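To compare the runs as the assignment asks, the 10-fold cross-validation results can be tabulated. The counts below are copied from the summaries above; only the percentages are recomputed. Ranked this way, the unbinned data does best (124/214), the two equal-frequency runs follow within three instances (122 and 121), equal width with 8 bins drops to 110, and equal width with optimized bins is lowest at 96.

```python
TOTAL = 214
runs = [
    ("equal frequency, 10 bins", 121),
    ("equal frequency, 8 bins", 122),
    ("equal width, 8 bins", 110),
    ("equal width, optimized bins (-O)", 96),
    ("no binning (OneR's own discretization)", 124),
]
ranked = sorted(runs, key=lambda r: r[1], reverse=True)
for name, correct in ranked:
    print(f"{name:42s} {correct:4d}/{TOTAL}  {100 * correct / TOTAL:6.2f} %")
```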