The "cursor" implementation of sampling is the Java application sampleRelation.java. Before running it, edit its source code so that it connects to your own SQL Server instance with the appropriate username and password.
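The source of sampleRelation.java is not reproduced here, so the following is only a hypothetical sketch of the kind of settings that need editing. The class, constant and database names are placeholders. The URL follows the format of Microsoft's JDBC driver for SQL Server, but the driver and URL that the original application uses may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch: these names are placeholders, not taken from sampleRelation.java.
public class ConnectionSettings {
    // Replace these placeholder values with your own server, database and credentials.
    public static final String HOST = "localhost";
    public static final int PORT = 1433;               // default SQL Server TCP port
    public static final String DATABASE = "mydatabase";
    public static final String USER = "myuser";
    public static final String PASSWORD = "mypassword";

    // Builds a connection URL in the format accepted by Microsoft's JDBC driver for SQL Server.
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    // Opens a connection; needs the SQL Server JDBC driver on the classpath and a reachable server.
    public static Connection connect() throws SQLException {
        return DriverManager.getConnection(jdbcUrl(HOST, PORT, DATABASE), USER, PASSWORD);
    }

    public static void main(String[] args) {
        // Only prints the URL, so this runs without a database.
        System.out.println(jdbcUrl(HOST, PORT, DATABASE));
    }
}
```

Keeping the server details in one place means only these constants change between installations. Reading them from command-line arguments or environment variables would avoid recompiling the application.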
The "deterministic" version of the sampling is executed using the SQL script sampleRelations.sql.
You can run the script with isql, for example: isql -U <user> -P <passwd> -d <database name> -i sampleRelations.sql
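NOTE: This script creates the relations R1Sample and R2Sample with an extra column S (not described in the paper) that records the sample size S used to produce each row, so a single pair of tables holds the samples for all sample sizes. The script creates sample sizes S = 1, 2, 4, ..., 256 (that is, S = 2^I for I = 0, ..., 8). To use different sample sizes, change @UPPERLIMIT or the POWER(2, @I) expression.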
-- Recreate the sample tables (the IF guards avoid an error on the first run, when the tables do not exist yet)
IF OBJECT_ID('dbo.R1Sample') IS NOT NULL
    DROP TABLE [dbo].[R1Sample]
CREATE TABLE [dbo].[R1Sample] (
    [tid]   int          NOT NULL,
    [token] varchar (80) NOT NULL,
    [c]     int          NOT NULL,
    [S]     int          NOT NULL,
    PRIMARY KEY (S, tid, token),
    FOREIGN KEY (tid)   REFERENCES R1,
    FOREIGN KEY (token) REFERENCES R1IDF
)

IF OBJECT_ID('dbo.R2Sample') IS NOT NULL
    DROP TABLE [dbo].[R2Sample]
CREATE TABLE [dbo].[R2Sample] (
    [tid]   int          NOT NULL,
    [token] varchar (80) NOT NULL,
    [c]     int          NOT NULL,
    [S]     int          NOT NULL,
    PRIMARY KEY (S, tid, token),
    FOREIGN KEY (tid)   REFERENCES R2,
    FOREIGN KEY (token) REFERENCES R2IDF
)
GO

DECLARE @S int
DECLARE @I int
DECLARE @UPPERLIMIT int

-- The upper limit in the sample size will be 2^@UPPERLIMIT
SET @I = 0
SET @UPPERLIMIT = 8   -- So we will create sample sizes S = 1, 2, 4, ... 256

WHILE @I <= @UPPERLIMIT
BEGIN
    SET @S = POWER(2, @I)

    -- c = ROUND(S * weight / total); rows whose rounded count is 0 are not stored
    INSERT INTO R1Sample(tid, token, c, S)
    SELECT rw.tid AS tid, rw.token AS token,
           ROUND( (rw.weight/rs.total) * @S, 0 ) AS c, @S AS S
    FROM R1Weights rw, R1Sum rs
    WHERE rw.token = rs.token
      AND ROUND( (rw.weight/rs.total) * @S, 0 ) > 0

    INSERT INTO R2Sample(tid, token, c, S)
    SELECT rw.tid AS tid, rw.token AS token,
           ROUND( (rw.weight/rs.total) * @S, 0 ) AS c, @S AS S
    FROM R2Weights rw, R2Sum rs
    WHERE rw.token = rs.token
      AND ROUND( (rw.weight/rs.total) * @S, 0 ) > 0

    SET @I = @I + 1
END
GO
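To make the arithmetic of the WHILE loop concrete, here is a minimal in-memory Java sketch, not part of the distribution, of what the script computes for a single token. For each sample size S = 2^I, every tuple gets c = ROUND(S * weight / total), and tuples whose rounded count is 0 are skipped. The class and method names are hypothetical. The sketch assumes that R1Sum.total (or R2Sum.total) is the sum of the token's weights in R1Weights (or R2Weights), and that the weights are floating-point values. If those columns were declared as int, SQL Server would perform integer division in rw.weight/rs.total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the WHILE loop in sampleRelations.sql for a single token.
// Assumption: weights maps tid -> weight for one token, and total is their sum
// (the role R1Sum.total / R2Sum.total is assumed to play in the script).
public class DeterministicSample {

    // One output row, mirroring the (tid, c, S) columns of R1Sample for a fixed token.
    public static final class Row {
        public final int tid;
        public final long c;
        public final int s;

        Row(int tid, long c, int s) {
            this.tid = tid;
            this.c = c;
            this.s = s;
        }

        @Override
        public String toString() {
            return "(tid=" + tid + ", c=" + c + ", S=" + s + ")";
        }
    }

    // Produces rows for S = 2^0, 2^1, ..., 2^upperLimit, like POWER(2, @I) in the script.
    public static List<Row> sampleToken(Map<Integer, Double> weights, int upperLimit) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i <= upperLimit; i++) {
            int s = 1 << i;
            for (Map.Entry<Integer, Double> e : weights.entrySet()) {
                // Same expression as ROUND((rw.weight/rs.total) * @S, 0). Math.round rounds
                // halves up, which agrees with ROUND for non-negative values
                // (up to floating-point edge cases).
                long c = Math.round(e.getValue() / total * s);
                if (c > 0) {               // the script stores only rows with a positive count
                    rows.add(new Row(e.getKey(), c, s));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, Double> weights = new TreeMap<>();  // sorted by tid for readable output
        weights.put(1, 5.0);
        weights.put(2, 3.0);
        weights.put(3, 2.0);
        for (Row r : sampleToken(weights, 2)) {
            System.out.println(r);
        }
    }
}
```

With weights 5, 3 and 2 and an upper limit of 2, main prints (tid=1, c=1, S=1), then (tid=1, c=1, S=2) and (tid=2, c=1, S=2), then (tid=1, c=2, S=4), (tid=2, c=1, S=4) and (tid=3, c=1, S=4). Low-weight tuples only enter the sample once S is large enough for their rounded share to reach 1.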