What sort of things can you do?
You can use data to develop categories, clusters and classifications.
In k-Nearest Neighbour (k-NN) classification a data point is classified by a majority vote of its k nearest neighbours. If k=1 the green circle will be classed with the red triangles, because its single nearest neighbour is a red triangle. If k=3 it will again be classed with the red triangles, because the majority of the 3 nearest neighbours are triangles. If k=5 it will be classified as a blue square, because 3 of the 5 nearest neighbours are blue squares. Clearly the choice of k is critical. An alternative method is to weight each neighbour's vote by its distance from the point, so that closer neighbours count for more.
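To make the vote concrete, here is a minimal sketch in Python. The coordinates are made up for illustration (they are not taken from any real data set) and are chosen so that the answer flips as k grows, as described above. The function name knn_classify and the inverse-distance weighting are assumptions of this sketch, and other weighting schemes are possible.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_classify(query, labelled_points, k, weighted=False):
    """Classify `query` by a vote among its k nearest labelled points."""
    # Keep the k labelled points closest to the query.
    nearest = sorted(labelled_points, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter()
    for point, label in nearest:
        d = dist(query, point)
        # Plain k-NN: one vote each. Weighted variant: closer points count more.
        # The small constant only guards against division by zero.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]


# Hypothetical points: (coordinates, label). The query plays the green circle.
points = [
    ((1.0, 0.0), "triangle"),
    ((0.0, 1.5), "square"),
    ((-1.6, 0.0), "triangle"),
    ((0.0, -2.0), "square"),
    ((2.5, 0.0), "square"),
]
query = (0.0, 0.0)

for k in (1, 3, 5):
    print(k, knn_classify(query, points, k))
print("weighted, k=5:", knn_classify(query, points, 5, weighted=True))
```

With these made-up points the plain vote gives triangle for k=1 and k=3 but square for k=5, while the distance-weighted vote at k=5 goes back to triangle because the two triangles are closer than the three squares.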
You can try to discover behaviours which occur together. For example, in the sentence "This is the life!" there are 2xe, 1xf, 2xh etc. (ignoring case, so that T and t count as the same letter). If we only keep letters with 2 or more occurrences, the "frequent 1-sequences" are: 2xe, 2xh, 3xi, 2xs, 2xt. If we seek the 2-sequences that begin with one of these letters we have: e_, e!, hi, he, if, is, is, s_, s_, th, th (where _ stands for a space). Using a frequency threshold of 2 again, we are left with is, s_ and th as our frequent 2-sequences. Moving to 3-sequences (again only extending those we have identified as frequent 2-sequences) with another threshold of 2, we have only 1 frequent 3-sequence: is_. Moving to 4-sequences we find is_i and is_t. Neither of these passes the threshold, so the algorithm stops. What have we learnt? The 1-sequences could tell us something about the commonest letters in English, and the 2-sequences tell us that 'is' and 'th' are frequent combinations and that 's' often happens at the end of words. The 3-sequences tell us that 'is' often happens at the end of words.
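The walk-through above can be sketched in a few lines of Python. This is only an illustration of the level-by-level idea as described here, not a full sequence-mining algorithm: the function name frequent_sequences is made up, letters are lower-cased, spaces are written as _, and a longer sequence is only counted if it extends a sequence that was frequent at the previous level.

```python
from collections import Counter


def frequent_sequences(text, threshold=2):
    """Return the frequent sequences of each length, shortest first."""
    text = text.lower().replace(" ", "_")
    # Level 1: count letters only (no spaces or punctuation).
    counts = Counter(ch for ch in text if ch.isalpha())
    frequent = {seq for seq, n in counts.items() if n >= threshold}
    levels = []
    length = 1
    while frequent:
        levels.append(sorted(frequent))
        length += 1
        # Only count substrings that extend a frequent shorter sequence.
        candidates = Counter(
            text[i:i + length]
            for i in range(len(text) - length + 1)
            if text[i:i + length - 1] in frequent
        )
        frequent = {seq for seq, n in candidates.items() if n >= threshold}
    return levels


for level in frequent_sequences("This is the life!"):
    print(level)
```

For the example sentence this prints the letters e, h, i, s, t, then is, s_, th, then is_, and stops because no 4-sequence occurs twice.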
So what?
We now have a predictive framework: if you get an i expect an s (and then a space), if you get an s expect a space, if you get a t expect an h.
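As a rough sketch of how such a predictive framework might look in code, the Python below turns the frequent 2-sequences into "if you see this, expect that" rules. The name build_predictor is hypothetical, and a real system would need far more data than one sentence.

```python
from collections import Counter, defaultdict


def build_predictor(text, threshold=2):
    """Map each character to the character that most often follows it,
    using only pairs that occur at least `threshold` times."""
    text = text.lower().replace(" ", "_")
    pair_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    followers = defaultdict(Counter)
    for pair, n in pair_counts.items():
        if n >= threshold:
            followers[pair[0]][pair[1]] = n
    # Keep the most frequent follower for each starting character.
    return {first: nxt.most_common(1)[0][0] for first, nxt in followers.items()}


rules = build_predictor("This is the life!")
for first, expected in rules.items():
    print(f"if you get {first!r} expect {expected!r}")
```

For the example sentence the rules are t then h, i then s, and s then a space (shown as _).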
We could use this process to create 'recommendations' à la Amazon: if you enjoyed doing those sums you might like to try these. We could also use it to create diagnoses, enabling us to identify the appropriate intervention for the measured behaviour.