-
Notifications
You must be signed in to change notification settings - Fork 192
AssertionError while running NER #38
Copy link
Copy link
Open
Labels
documentationImprovements or additions to documentationImprovements or additions to documentation
Description
您好,參考官網的範例寫了一個 get_nlp_result function,會 iterate data_df 的 row,將文字資料 row[text_col] 依序送進 ws、pos 和 ner function 處理
在跑ner的時候有時會遇到 AssertionError error,請問這是甚麼問題造成的呢?
def get_nlp_result(data_df, id_col, text_col):
start = time.time()
pos_list = []
entity_list = []
sentence_list = []
for index, row in data_df.iterrows(): # document level
# print(f"\ndocument {index}")
# clean data
result = []
tmp = Sentence_Segmentation(row[text_col])
flat_list = [item for sublist in tmp for item in sublist]
# ckip
w_sentence_list = ws(flat_list, coerce_dictionary = dictionary2) # set dictionary
pos_sentence_list = pos(w_sentence_list)
entity_sentence_list = ner(w_sentence_list, pos_sentence_list)
for i, sentence in enumerate(flat_list): # sentence level
# print(f"sentence {i}: {sentence}")
sentence_list.append([row[id_col],sentence])
temp_tokens = get_pos(row[id_col],w_sentence_list[i], pos_sentence_list[i])
temp_entites = get_ner(row[id_col],entity_sentence_list[i])
pos_list.append(temp_tokens)
if len(temp_entites) != 0:
entity_list.append(temp_entites)
pos_flat = [item for sublist in pos_list for item in sublist]
entity_flat = [item for sublist in entity_list for item in sublist]
pos_table = pd.DataFrame(data=pos_flat,
columns=[id_col,'word','pos'])
entity_table = pd.DataFrame(data=entity_flat,
columns=[id_col,'word','ner'])
sentence_table = pd.DataFrame(data=sentence_list,
columns=[id_col,'sentence'])
end = time.time()
print("time costing: {}".format(end - start))
return pos_table, entity_table, sentence_table
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
documentationImprovements or additions to documentationImprovements or additions to documentation
