How can I dynamically scrape data from different websites based on an input?
I am trying to build a system that, given an input, returns specific, relevant information about it by scraping the web (for example: given a software name, output information about its releases). How should I go about building a scraper for such a system?
What I have tried:
I have done web scraping before using Beautiful Soup, but that was limited to getting information from a single, specific website. In this case, I might have to scrape websites at dynamically built URLs (such as the wiki page of the input software, or official product pages shown in Google search results), and different software websites and wikis use different structures to display release data. Are there any other approaches for getting such information about different software products in a structured way?
Richard MacCutchan
Exactly the same way as scraping a single site, but you will need to get the list of website addresses from somewhere.
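To make that a bit more concrete, here is a rough sketch of one way it could be laid out: build a list of candidate URLs from the input, then pick an extractor per site, with a generic fallback for sites you have not written a parser for. Everything named here (`candidate_urls`, `EXTRACTORS`, `extract_generic`, `scrape_releases`, the User-Agent string) is made up for illustration, and the only URL pattern used is the Wikipedia article one. The version-number regex is a crude stand-in for real per-site parsing, which you would probably still do with Beautiful Soup.

```python
import re
from urllib.parse import quote, urlparse
from urllib.request import Request, urlopen

# Crude pattern for version-like strings such as 3.12 or 1.2.0 (an assumption,
# not a reliable way to detect releases on arbitrary pages).
VERSION_RE = re.compile(r"\b\d+\.\d+(?:\.\d+)*\b")

# Hypothetical registry: hostname -> function(html) -> list of strings.
# Site-specific parsers (e.g. written with Beautiful Soup) would be added here.
EXTRACTORS = {}


def candidate_urls(name):
    """Build candidate page URLs for a software name.

    Only the Wikipedia article pattern is used here; a real system would
    extend this list, for example from a search API.
    """
    slug = quote(name.strip().replace(" ", "_"))
    return [f"https://en.wikipedia.org/wiki/{slug}"]


def extract_generic(html):
    """Fallback extractor: unique version-like strings in page order."""
    seen = []
    for version in VERSION_RE.findall(html):
        if version not in seen:
            seen.append(version)
    return seen


def extractor_for(url):
    """Pick the site-specific extractor if one is registered, else the fallback."""
    host = urlparse(url).hostname or ""
    return EXTRACTORS.get(host, extract_generic)


def fetch(url):
    """Download a page as text (network access required)."""
    req = Request(url, headers={"User-Agent": "release-scraper-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_releases(name, fetch_page=fetch):
    """Return {url: extracted items} for every candidate URL that could be fetched."""
    results = {}
    for url in candidate_urls(name):
        try:
            html = fetch_page(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            continue
        results[url] = extractor_for(url)(html)
    return results
```

Calling `scrape_releases("Beautiful Soup")` would fetch the Wikipedia article and run the fallback extractor on it. Passing `fetch_page` as a parameter lets you test the dispatch logic with canned HTML instead of hitting the network. Check each site's terms of use and robots.txt before scraping it.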