machine learning - Does test file in weka requires same or less number of features as train?

Question

Welcome To Ask or Share your Answers For Others

machine learning - Does test file in weka requires same or less number of features as train?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have prepared two different .arff files from two different datasets one for testing and other for training. Each of them have equal instances but different features changing the dimensionality of feature vector for each file. When i did cross-validation on each of these files, they are working perfectly. This shows .arff files are properly prepared and don't have any error.

Now if i use the train file having less dimensionality compared to test file for evaluation. I get a following error.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5986
at  weka.classifiers.bayes.NaiveBayesMultinomial.probOfDocGivenClass(NaiveBayesMultinomial.java:295)
at weka.classifiers.bayes.NaiveBayesMultinomial.distributionForInstance(NaiveBayesMultinomial.java:254)
at weka.classifiers.Evaluation.evaluationForSingleInstance(Evaluation.java:1657)
at weka.classifiers.Evaluation.evaluateModelOnceAndRecordPrediction(Evaluation.java:1694)
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1574)
at TrainCrossValidateARFF.main(TrainCrossValidateARFF.java:44)

Does test file in weka requires same or less number of features as train ? Code for evaluation

public class TrainCrossValidateARFF{
    private static DecimalFormat df = new DecimalFormat("#.##");
    public static void main(String args[]) throws Exception
    {
            if (args.length != 1 && args.length != 2) {
                    System.out.println("USAGE: CrossValidateARFF <arff_file> [<stop_words_file>]");
                    System.exit(-1);
            }
            String TrainarffFilePath = args[0];
            DataSource ds = new DataSource(TrainarffFilePath);
            Instances Train = ds.getDataSet();
            Train.setClassIndex(Train.numAttributes() - 1);

            String TestarffFilePath = args[1];
            DataSource ds1 = new DataSource(TestarffFilePath);
            Instances Test  = ds1.getDataSet();
            // setting class attribute
            Test.setClassIndex(Test.numAttributes() - 1);

            System.out.println("-----------"+TrainarffFilePath+"--------------");
            System.out.println("-----------"+TestarffFilePath+"--------------");
            NaiveBayesMultinomial naiveBayes = new NaiveBayesMultinomial();
            naiveBayes.buildClassifier(Train);

            Evaluation eval = new Evaluation(Train);
            eval.evaluateModel(naiveBayes,Test);
            System.out.println(eval.toSummaryString("
Results
======
", false));
}
}

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

503 views

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:31:48+0000

Does test file in weka requires same or less number of features as train ? Code for evaluation

Same number of features are necessary. You may need to insert ? for class attribute too.

According to Weka Architect Mark Hall

To be compatible, the header information of the two sets of instances needs to be the same - same number of attributes, with the same names in the same order. Furthermore, any nominal attributes must have the same values declared in the same order in both sets of instances. For unknown class values in your test set just set the value of each to missing - i.e "?".

Categories

machine learning - Does test file in weka requires same or less number of features as train?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags