Input:
Two Files:
1. Capacity - This contains (variable,limit)
2. [login to view URL] - This contains (id,variable,value)
Output:
[login to view URL] (id, variable, value)
The process:
1. Import both Input Files
2. Create a subset of Score containing:
a. The single highest value per variable
b. The id must be unique
3. Write this to an output table
4. Remove the id selected above from the score table (so that it does not get used again).
5. If the count per variable in the output table = [login to view URL], remove the variable from the score table (so that the variable is not selected above its capacity).
6. Cycle back to step 2 until there are no more records in score.
7. Write out the [login to view URL] file.
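The steps above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not a reference implementation. The names allocate and run, and the file paths passed to run, are hypothetical. It assumes that when one id is the top record for several variables in the same round, the variable with the higher value keeps it and the others fall back to their next best unused id. It also assumes a variable with no Capacity row has no limit. Each variable is sorted once and then walked with a cursor, so the score table is never rescanned.

```python
import csv
from collections import defaultdict

def allocate(scores, capacity):
    """scores: iterable of (id, variable, value); capacity: dict variable -> limit.
    Returns the selected (id, variable, value) rows in selection order."""
    by_var = defaultdict(list)
    for id_, var, val in scores:
        by_var[var].append((val, id_))
    for rows in by_var.values():
        rows.sort(key=lambda r: -r[0])  # sort once, best value first

    # Assumption: a variable missing from capacity is treated as unlimited.
    limit = lambda var: capacity.get(var, float("inf"))
    active = {v for v in by_var if limit(v) > 0}
    pos = {v: 0 for v in by_var}   # cursor into each sorted list
    used = set()                   # ids already written to the output
    count = defaultdict(int)
    output = []

    def advance(var):
        """Move the cursor past used ids; return the row there, or None."""
        rows, i = by_var[var], pos[var]
        while i < len(rows) and rows[i][1] in used:
            i += 1
        pos[var] = i
        return rows[i] if i < len(rows) else None

    while active:
        # Step 2: current best unused record per remaining variable.
        heads = []
        for var in list(active):
            row = advance(var)
            if row is None:
                active.discard(var)  # no records left for this variable
            else:
                heads.append((row[0], var))
        # Assumed tie-break: higher value picks first within a round.
        heads.sort(reverse=True)
        for _, var in heads:
            row = advance(var)  # re-check: the id may have been taken this round
            if row is None:
                active.discard(var)
                continue
            val, id_ = row
            used.add(id_)                   # step 4: id cannot be reused
            output.append((id_, var, val))  # step 3
            count[var] += 1
            if count[var] >= limit(var):    # step 5: capacity reached
                active.discard(var)
    return output

def run(capacity_path, score_path, out_path):
    """Hypothetical file wrapper: reads both CSVs, writes id,variable,value."""
    with open(capacity_path, newline="") as f:
        capacity = {r["variable"]: int(r["limit"]) for r in csv.DictReader(f)}
    with open(score_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        scores = [(r[0], r[1], float(r[2])) for r in reader]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "variable", "value"])
        writer.writerows(allocate(scores, capacity))
```

If the expected first round resolves shared ids differently, only the ordering of heads would need to change.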
Additional Note:
Sample data:
Capacity:-
variable,limit
AGT110,1875
AGT120,2187
AGT139,1562
AGT146,1875
AGT150,2187
AGT152,2187
AGT153,1875
AGT155,1875
AGT157,1250
Score:-
clientId,variable,value
14313738,AGT160,0.06359
14317404,AGT162,0.07821
14362561,AGT155,0.21406
14321720,AGT161,0.32384
14314713,AGT162,0.07992
14369775,AGT163,0.1069
14353801,AGT153,0.06878
14365883,AGT165,0.08649125
14332650,AGT153,0.07421
The output file should only contain 31870 records.
The first round should look as follows:-
id,variable,value
14348744,AGT161,0.72528
14349046,AGT160,0.66255
14310070,AGT155,0.57827
14348800,AGT110,0.53863
14349598,AGT159,0.44097
14355688,AGT146,0.43844
14353679,AGT162,0.41105
14352158,AGT120,0.36759
14349991,AGT139,0.33552
14368664,AGT164,0.31323
14349627,AGT158,0.29891
14349800,AGT152,0.28776
14364479,AGT163,0.25394
14355732,AGT157,0.23732
14349641,AGT150,0.23238
14357995,AGT153,0.33374
14353811,AGT166,0.3119712
14349070,AGT165,0.3096063
Lastly, the code must be fast, very fast.