Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I am trying to improve my lookup table run time.

import pandas as pd

dest_df = pd.DataFrame({"dest": ["uk LHR", "from ROM", "City:LONDON", "planetoronto", " rome rome", "junk plane"]})  # 300,000 rows
city_df_lookup = pd.DataFrame({"places": ["london", " paris", "toronto", "rome"],
                               "code": ["LHR", "PAR", "YTO", "ROM"]})  # around 10,000 rows
code = city_df_lookup.code.tolist()
places = city_df_lookup.places.tolist()

def select(x):
    # Return the place for the first code found in x (case-sensitive); None if no code matches.
    for co, pl in zip(code, places):
        if co in x:
            return pl

dest_df["dest_match"] = dest_df["dest"].apply(select)

dest_df.head()

   dest          dest_match
0  uk LHR        london
1  from ROM      rome
2  City:LONDON   None
3  planetoronto  None
4  rome rome     None
5  junk plane    None

Unfortunately, the code above takes too long. I would also like the loop to string-match dest_df.dest against city_df_lookup.places, not only against the codes.

The desired output is:

   dest          dest_match
0  uk LHR        london
1  from ROM      rome
2  City:LONDON   london
3  planetoronto  toronto
4  rome rome     rome
5  junk plane    No Match

I was thinking of using ahocorasick, but I am not sure whether there is a simpler method.
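One possibly simpler alternative to an Aho-Corasick automaton is a single precompiled regular expression that contains every code and place name, so each destination string is scanned once instead of being tested against every lookup row in a Python loop. The block below is a minimal sketch of that idea using only the standard `re` module and the sample data above. The helper names `alternation` and `match_dest` are hypothetical, and it assumes codes should stay case-sensitive (as in the original loop) while place names are matched case-insensitively.

```python
import re

# Sample data from the question; the real lookup has around 10,000 rows.
places = ["london", " paris", "toronto", "rome"]
codes = ["LHR", "PAR", "YTO", "ROM"]
dests = ["uk LHR", "from ROM", "City:LONDON", "planetoronto", " rome rome", "junk plane"]

# Normalise the place names (note the stray leading space in " paris").
places = [p.strip().lower() for p in places]
code_to_place = dict(zip(codes, places))

def alternation(words):
    # Hypothetical helper: escape each key and put longer keys first,
    # so a longer key wins over a shorter key that is its prefix.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))

# Codes are matched case-sensitively; place names are matched
# case-insensitively through the scoped inline flag (?i:...).
pattern = re.compile(
    f"(?P<code>{alternation(codes)})|(?i:(?P<place>{alternation(places)}))"
)

def match_dest(text, default="No Match"):
    # Hypothetical helper: return the place for the leftmost code or
    # place name found in text, or the default when nothing matches.
    m = pattern.search(text)
    if m is None:
        return default
    if m.group("code") is not None:
        return code_to_place[m.group("code")]
    return m.group("place").lower()

results = [match_dest(d) for d in dests]
print(results)
# ['london', 'rome', 'london', 'toronto', 'rome', 'No Match']
```

With the DataFrames from the question, the same function could be applied as dest_df["dest_match"] = dest_df["dest"].map(match_dest). This is substring matching, so short keys can hit inside unrelated words, and with roughly 10,000 alternatives the regex engine may still be slow; if so, an Aho-Corasick automaton is likely the better fit.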

Question from: https://stackoverflow.com/questions/65904049/large-scale-string-matching-between-different-dataframes-python


1 Answer

Waiting for answers
