aurps

How to fix a word-count function


The countwords function is supposed to remove all stop words, but when I run the code I get a list of strings that still contain stop words, so I was wondering where I am going wrong.


The first few lines of the text file are:
and the evening and the morning were the first day.
and god said let there be a firmament in the midst of the waters and let it divide the waters from the waters.
and god made the firmament and divided the waters which were under the firmament from the waters which were above the firmamenin the beginning god created the heaven and the earth.
and the earth was without form and void; and darkness was upon the face of the deep.
and the spirit of god moved upon the face of the waters.


What I have tried:

import re
filename="bibleSentences.15.txt"

def getData(filename):
  with open(filename,'r') as f:
    #converting to list where each element is an individual line of text file
    lines=[line.rstrip() for line in f]
    return lines
getData(filename)

def normalize(filename):
    #converting all letters to lowercase
    lowercase_lines=[x.lower() for x in getData(filename)]
    #strip out everything except word characters, tabs and spaces (removes punctuation)
    stripped_lines=[re.sub(r"[^\w \t]+", "", x) for x in lowercase_lines]
    print(stripped_lines)
    return stripped_lines
normalize(filename)

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords.words('english')
stopwords=set(stopwords.words('english'))

def countwords(filename):
  output_array=[]
  for sentence in normalize(filename):
    temp_list=[]
    for word in sentence.split():
      if word.lower() not in stopwords:
        temp_list.append(word)
    output_array.append(''.join(temp_list))
    print(output_array)
    return output_array
output=countwords(filename)
print(output)
countwords(filename)

Richard MacCutchan

The structure of your program is somewhat haphazard. Put all the functions at the top and the main code at the bottom. Also, do not place a call after each function definition for no reason.
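
As a rough illustration of that layout, here is a minimal sketch with the functions first and a single entry point at the bottom. It also changes three things visible in the posted code: the print inside normalize shows the unfiltered lines (which is probably the list with stop words you are seeing), the return inside the for loop ends countwords after the first sentence, and ''.join glues the remaining words together without spaces. The names get_data, remove_stop_words, main and STOP_WORDS are hypothetical, and the small hand-written STOP_WORDS set is only an assumed stand-in for the NLTK list so the sketch runs without a download.

```python
import re

# Assumption: a tiny hand-picked set standing in for
# set(nltk.corpus.stopwords.words('english')) from the original post.
STOP_WORDS = {"and", "the", "were", "of", "was", "upon", "in", "a",
              "be", "there", "it", "from", "which", "under", "above", "let"}


def get_data(filename):
    """Return the lines of a text file with trailing whitespace removed."""
    with open(filename, "r") as f:
        return [line.rstrip() for line in f]


def normalize(lines):
    """Lowercase each line and drop everything except word characters, spaces and tabs."""
    return [re.sub(r"[^\w \t]+", "", line.lower()) for line in lines]


def remove_stop_words(lines, stop_words=STOP_WORDS):
    """Return one string per input line with the stop words filtered out."""
    output = []
    for sentence in normalize(lines):
        kept = [word for word in sentence.split() if word not in stop_words]
        # Join with a space so the surviving words stay separated.
        output.append(" ".join(kept))
    # Return only after every sentence has been processed.
    return output


def main():
    # Two sample lines taken from the question, used instead of the file
    # so the sketch can run on its own.
    sample = [
        "and the evening and the morning were the first day.",
        "and the spirit of god moved upon the face of the waters.",
    ]
    for line in remove_stop_words(sample):
        print(line)


if __name__ == "__main__":
    main()
```

To run it on the real data, pass get_data("bibleSentences.15.txt") to remove_stop_words and supply the NLTK set as stop_words. Storing that set under a name other than stopwords avoids overwriting the imported module, as the posted code does.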
