I'm working with a dataset that has 12 variables, and I want to write a function that takes two inputs (numberOfAttributes, supportThreshold).
For example, with input (4, .6), I'd like to retrieve every combination of values across 4 attributes that accounts for more than 60% of the rows in the dataset.
Here's my code:
def attributesSet(numberOfAttributes, supportThreshold):
    import csv
    import pandas as pd
    import itertools
    import math
    names = ['age','sex','education','country','race','status','workclass','occupation',
             'hours-per-week','income','capital-gain','capital-loss']
    combinations = []
    final = []
    for comb in itertools.combinations(names,numberOfAttributes):
        combinations.append(list(comb))
    c = pd.read_csv('census.csv')
    c.columns= names
    total = len(c.index)
    required = supportThreshold*total
    for i in combinations:
        g = c.groupby(i).size().sort_values(ascending=False)
        groups = g[g>required].index
        satisfied = list(groups)
        for j in satisfied:
            row = ''
            for t in j:
                row = row + t
                if j.index(t) != len(j)-1:
                    row = row + ','
            final.append(''+row)
    return final
My code works until I set numberOfAttributes to 1, in which case each output string has a comma between every character. Does anyone know how I can fix this?
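
One possible explanation, sketched under assumptions rather than checked against census.csv: when groupby is given a single column, the index of the result holds plain values (for example a string) instead of tuples, so the loop `for t in j` walks over the characters of that string. The helper below, format_group_key (a hypothetical name, not part of pandas), wraps a bare value in a one-element tuple before joining, so the single-column and multi-column cases produce the same comma-separated format.

```python
def format_group_key(key):
    """Return a group key as a comma-separated string.

    Assumption: the key is a tuple when grouping by several columns
    and a bare value (such as a string) when grouping by one column.
    """
    if not isinstance(key, tuple):
        # Wrap the bare value so it is treated as one item,
        # not iterated character by character.
        key = (key,)
    # str() lets numeric values be joined alongside strings.
    return ','.join(str(value) for value in key)


print(format_group_key(('Male', 'White')))  # Male,White
print(format_group_key('Male'))             # Male
```

Inside the `for j in satisfied:` loop, the inner character loop could then be replaced with `final.append(format_group_key(j))`.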