Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm trying to compare cells within a data frame using pandas. the data looks like that:

seqnames, start, end, width, strand, s1, s2, s3, sn
1, Ha412HOChr01, 1, 220000, 220000, CN2, CN10, CN2, CN2
2, Ha412HOChr01, 1, 220000, 220000, CN2, CN2, CN2, CN2
3, Ha412HOChr01, 1, 220000, 220000, CN2, CN4, CN2, CN2
n, Ha412HOChr01, 1, 220000, 220000, CN2, CN2, CN2, CN6

I was able to make individual comparisons with the following code

import pandas as pd

df = pd.read_csv("test.csv") 

if df.iloc[0,5] != df.iloc[0,6]:
  print("yay!")
else:
  print("not intersting...")

I would like to iterate a comparison between s1 and all the other s columns, line by line in a loop or in any other more efficient methods. when i've tried the following code:

df = pd.read_csv("test.csv") 
df.columns
#make sure to change in future analysis
ref = df[' Sunflower_14_S8']
all_the_rest = df.drop(['seqnames', ' start', ' end', ' width', ' strand'], axis=1)
#all_the_rest.columns

OP = ref.eq(all_the_rest)
OP.to_csv("OP.csv")

i've got a wired output

0,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False444,False,False,False,False,False,False,False,False,False,False,False,False,False

it seems like it compare all the characters instead of the strings I'm new to programming and I'm stuck, appreciate your help!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
4.2k views
Welcome To Ask or Share your Answers For Others

1 Answer

Does this help?

import pandas as pd
# define a list of columns you want to compare
cols = ['s1', 's2', 's3']
# some sample data
df = pd.DataFrame(columns=cols)
df['s1'] = ['CN2', 'CN10', 'CN2', 'CN2']
df['s2'] = ['CN2', 'CN2', 'CN2', 'CN2']
df['s3'] = ['CN2', 'CN2', 'CN2', 'CN6']
# remove 's1' from the list of columns
cols_except_s1 = [x for x in cols if x!='s1']
# create a blank dataframe to hold our comparisons
df_comparison = pd.DataFrame(columns=cols_except_s1)
# iterate through each other column, comparing it against 's1'
for x in cols_except_s1:
    comparison_series = df['s1'] == df[x]
    df_comparison[x] = comparison_series
# the result is a dataframe that has columns of Boolean values
print(df_comparison)

outputs

    s2  s3
0   True    True
1   False   False
2   True    True
3   True    False

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...