The experiment: This started as trying to answer the question "what do the decisions from one school tell me about other schools?" Basically I am playing around with machine learning models for classification. I also wanted to see if there is anything we can learn from the models themselves.
The data: 19271 myLSN users from the 05-06 cycle to the present who applied to and heard back from T13 schools. For every past cycle, applications still marked 'Pending' were treated as dings; current-cycle 'Pending' applications were ignored. I am looking at the initial decision, so both 'WL Accept' and 'WL Reject' are counted as WL.
Weka is used for the classification.
Evaluation: 10-fold cross-validation.
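For anyone unfamiliar with the evaluation scheme, here is a rough sketch of k-fold cross-validation. Weka handles this internally; the majority-class baseline below is just a stand-in for a real classifier, and the data is invented, not LSN data:

```python
# Minimal sketch of k-fold cross-validation using only the stdlib.
# A trivial majority-class "model" stands in for the real classifiers.
from collections import Counter

def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_size, rem = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < rem else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(labels, k=10):
    """Accuracy of a majority-class baseline under k-fold CV."""
    folds = k_fold_indices(len(labels), k)
    correct = 0
    for test in folds:
        test_set = set(test)
        # Train on everything outside the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in test_set]
        majority = Counter(train).most_common(1)[0][0]
        # Score the held-out fold against the majority-class prediction.
        correct += sum(1 for i in test if labels[i] == majority)
    return correct / len(labels)
```

In a real run each fold's model is retrained from scratch, exactly as above, so no fold is ever scored by a model that saw it during training.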
First pass: I looked at OneR, which tries to find the single attribute whose one rule best classifies the data. This is very simplistic, but you always start with the simplest thing that could work, and I found the results interesting.
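As a rough sketch of what OneR does (my runs used Weka's implementation; the code and toy data below are illustrative only, with categorical attributes assumed already discretized):

```python
# Sketch of OneR: for each attribute, map each of its values to the
# most frequent class for that value, then keep the attribute whose
# rule makes the fewest errors on the training data.
from collections import Counter, defaultdict

def one_r(rows, target):
    """rows: list of dicts sharing the same keys; target: class attribute.
    Returns (best_attribute, value->class rule, training accuracy)."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Count class frequencies for each value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # The rule predicts the majority class for each attribute value.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(1 for row in rows if rule[row[attr]] == row[target])
        acc = correct / len(rows)
        if best is None or acc > best[2]:
            best = (attr, rule, acc)
    return best

# Invented toy data, NOT the LSN numbers: predict Yale from Stanford.
toy = [
    {'stanford': 'Accept', 'yale': 'Accept'},
    {'stanford': 'Accept', 'yale': 'Accept'},
    {'stanford': 'Reject', 'yale': 'Reject'},
    {'stanford': 'Reject', 'yale': 'Reject'},
    {'stanford': 'Reject', 'yale': 'Accept'},
]
attr, rule, acc = one_r(toy, 'yale')
```

Weka's OneR also discretizes numeric attributes by bucketing them, which is how rules like the LSAT thresholds below come out of it.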
- Yale - Best predictor: Stanford's result. The rules 'Stanford Accept -> Yale Accept' and 'Stanford WL or Stanford Reject -> Yale Reject' correctly predict the Yale result for 78.4804% of users.
- Harvard - Best predictor: Yale's result. The rules 'Yale Accept or Yale WL -> Harvard Accept' and 'Yale Reject -> Harvard Reject' correctly predict the Harvard result for 67.9604% of users.
- Stanford - Best predictor: Yale's result. The rules 'Yale Accept or WL -> Stanford Accept' and 'Yale Reject -> Stanford Reject' correctly predict the Stanford result for 75.5683% of users.
- Columbia - Best predictor: NYU's result. The rules 'NYU Accept -> Columbia Accept', 'NYU WL -> Columbia WL', and 'NYU Reject -> Columbia Reject' correctly predict the Columbia result for 65.1188% of users.
- Chicago - Best predictor: NYU's result. The rules 'NYU Accept -> Chicago Accept', 'NYU WL -> Chicago WL', and 'NYU Reject -> Chicago Reject' correctly predict the Chicago result for 61.8695% of users.
- NYU - Best predictor: LSAT score. The rules 'LSAT < 170.5 -> NYU Reject' and 'LSAT >= 170.5 -> NYU Accept' correctly predict the NYU result for 66.0256% of users.
- Penn - Best predictor: UVA's result. The rules 'UVA Accept -> Penn Accept', 'UVA WL -> Penn WL', and 'UVA Reject -> Penn Reject' correctly predict the Penn result for 56.84% of users.
- Duke - Best predictor: LSAT score. The rules 'LSAT < 168.5 -> Duke Reject' and 'LSAT >= 168.5 -> Duke Accept' correctly predict the Duke result for 61.8403% of users.
- Berkeley - Best predictor: Harvard's result. The rules 'Harvard Accept or WL -> Berkeley Accept' and 'Harvard Reject -> Berkeley Reject' correctly predict the Berkeley result for 77.3727% of users.
- UVA - Best predictor: Penn's result. The rules 'Penn Accept -> UVA Accept', 'Penn WL -> UVA WL', and 'Penn Reject -> UVA Reject' correctly predict the UVA result for 54.14% of users.
- Michigan - Best predictor: Duke's result. The rules 'Duke Accept -> Michigan Accept', 'Duke WL -> Michigan WL', and 'Duke Reject -> Michigan Reject' correctly predict the Michigan result for 55.3852% of users.
- Northwestern - Best predictor: NYU's result. The rules 'NYU Accept or WL -> Northwestern Accept' and 'NYU Reject -> Northwestern Reject' correctly predict the Northwestern result for 59.6167% of users.
- Cornell - Best predictor: Michigan's result. The rules 'Michigan Accept or WL -> Cornell Accept' and 'Michigan Reject -> Cornell Reject' correctly predict the Cornell result for 60.693% of users.
Discussion: For almost every school, there is another school whose decision predicts at least 55% of that school's decisions, and in some cases much more (Harvard's decision predicts 77% of Berkeley results). It is interesting that the LSAT is a better predictor for NYU and Duke than any other school's decision is, but this shouldn't be read as saying that the LSAT is the only thing those schools care about: LSAT and GPA are correlated for the majority of applicants, and law school applicants generally have strong undergraduate GPAs.
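If anyone wants to check the LSAT/GPA correlation claim on their own export of the data, a plain Pearson correlation is enough; this sketch uses only the standard library, and the example numbers are invented:

```python
# Pearson correlation coefficient, stdlib only.
import math

def pearson(xs, ys):
    """Correlation in [-1, 1]; assumes equal-length, non-constant lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (LSAT, GPA) pairs, purely illustrative:
lsats = [165, 168, 170, 172, 175]
gpas = [3.4, 3.5, 3.7, 3.6, 3.9]
r = pearson(lsats, gpas)
```

A value of `r` near +1 would mean the two credentials move together, which is exactly why a single-attribute rule on LSAT can look deceptively strong.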
I haven't done any error analysis or dug into these results in depth, but I will on request.
Please let me know if there is anything you would like to see from the models.
There is more to come: the next set of results will be from Naive Bayes models (because they seem to work best) and from decision trees (because they are the most human-readable).