How to fix a word count function
The countwords function is supposed to remove all stop words, but when I run the code I get a list of strings that still contain stop words, so I was wondering where I am going wrong.
The first few lines of the text file are:
and the evening and the morning were the first day. and god said let there be a firmament in the midst of the waters and let it divide the waters from the waters. and god made the firmament and divided the waters which were under the firmament from the waters which were above the firmamenin the beginning god created the heaven and the earth. and the earth was without form and void; and darkness was upon the face of the deep. and the spirit of god moved upon the face of the waters.
What I have tried:
import re

filename="bibleSentences.15.txt"

def getData(filename):
    with open(filename,'r') as f:
        #converting to list where each element is an individual line of text file
        lines=[line.rstrip() for line in f]
        return lines

getData(filename)

def normalize(filename):
    #converting all letters to lowercase
    lowercase_lines=[x.lower() for x in getData(filename)]
    #strip out all non-word or tab or space characters(remove punts)
    stripped_lines=[re.sub(r"[^\w \t]+", "", x) for x in lowercase_lines]
    print(stripped_lines)
    return stripped_lines

normalize(filename)

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords.words('english')
stopwords=set(stopwords.words('english'))

def countwords(filename):
    output_array=[]
    for sentence in normalize(filename):
        temp_list=[]
        for word in sentence.split():
            if word.lower() not in stopwords:
                temp_list.append(word)
        output_array.append(''.join(temp_list))
    print(output_array)
    return output_array

output=countwords(filename)
print(output)
countwords(filename)
Richard MacCutchan
The structure of your program is a bit haphazard. Put all the functions at the beginning and the main code at the bottom. Also, do not put function calls after each function for no reason.
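A minimal sketch of that layout, with functions first and a single block of main code at the bottom, might look like the following. Several things here are assumptions rather than part of the original code: the names get_data, remove_stopwords and STOP_WORDS are made up for illustration, the small STOP_WORDS set only stands in for set(stopwords.words('english')) so the sketch runs without the NLTK download, and the main code uses an in-memory sample line instead of the file. Two details of the posted code may also explain the output: normalize prints the lines before any stop words are removed, and ''.join(temp_list) runs the remaining words together, so the sketch drops that print and joins with a space.

```python
import re

# Illustrative stand-in for set(stopwords.words('english')) from NLTK,
# kept tiny so the sketch runs without downloading the corpus.
STOP_WORDS = {"and", "the", "were", "was", "in", "of", "it", "be", "there", "from", "which"}


def get_data(filename):
    """Return the lines of a text file with trailing whitespace removed."""
    with open(filename, "r") as f:
        return [line.rstrip() for line in f]


def normalize(lines):
    """Lowercase each line and strip everything except word characters, spaces and tabs."""
    return [re.sub(r"[^\w \t]+", "", line.lower()) for line in lines]


def remove_stopwords(lines, stop_words):
    """Drop stop words from each normalized line and rejoin the rest with spaces."""
    output = []
    for sentence in normalize(lines):
        kept = [word for word in sentence.split() if word not in stop_words]
        output.append(" ".join(kept))
    return output


if __name__ == "__main__":
    # In the real program this would be: lines = get_data("bibleSentences.15.txt")
    sample = ["And the evening and the morning were the first day."]
    print(remove_stopwords(sample, STOP_WORDS))  # ['evening morning first day']
```

Each function is called once, from the main block, so the only thing printed is the final result with the stop words removed.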