[FIXED] Iterating through different selectors on a web page to store them in one big DF

Issue

I came here earlier today with a question about this project that was answered very quickly, so I'm back. The code below scrapes the given website, retrieves the data, and adds a column indicating which instance of the table is being scraped. The next struggle I'm facing is loading all instances of game recency into big_df, with a column that replicates what the game recency drop-down currently shows. If anyone could help me with this last piece of my puzzle, I would be very grateful.

https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd 

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
big_df = pd.DataFrame()
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")

webdriver_service = Service(r'chromedriver\chromedriver') ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 20)
url = "https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php"
driver.get(url)
t.sleep(60)


tables_list = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//ul[@class="pills pos-filter pull-left"]/li')))

for x in tables_list:
    x.click()
    print('selected', x.text)
    t.sleep(2)
    table = wait.until(EC.element_to_be_clickable((By.XPATH, '//table[@id="data-table"]')))
    df = pd.read_html(table.get_attribute('outerHTML'))[0]
    df['Category'] = x.text.strip()
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
    print('done, moving to next table')
print(big_df)
big_df.to_csv('fanduel.csv')

Solution

Here is one way to achieve your end goal:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd 

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
big_df = pd.DataFrame()
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")

webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 20)
url = "https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php"
driver.get(url)

select_recency_options = [x.text for x in wait.until(EC.presence_of_all_elements_located((By.XPATH, '//select[@class="game-change"]/option')))]
for option in select_recency_options:
    select_recency = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//select[@class="game-change"]'))))
    select_recency.select_by_visible_text(option)
    print('selected', option)
    t.sleep(2)

    tables_list = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//ul[@class="pills pos-filter pull-left"]/li')))

    for x in tables_list:
        x.click()
        print('selected', x.text)
        t.sleep(2)
        table = wait.until(EC.element_to_be_clickable((By.XPATH, '//table[@id="data-table"]')))
        df = pd.read_html(table.get_attribute('outerHTML'))[0]
        df['Category'] = x.text.strip()
        df['Recency'] = option
        big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
        print('done, moving to next table')
display(big_df)  # IPython/Jupyter only; use print(big_df) in a plain script
big_df.to_csv('fanduel.csv')

The result is a (bigger) dataframe:

     Team                       PTS    REB    AST   3PM   STL   BLK   TO    FD PTS  Category  Recency
0    HOUHouston Rockets         23.54  9.10   5.10  2.54  1.88  1.15  2.65  48.55   ALL       Season
1    OKCOklahoma City Thunder   22.22  9.61   5.19  2.70  1.67  1.18  2.52  47.57   ALL       Season
2    PORPortland Trail Blazers  22.96  8.92   5.31  2.74  1.63  0.99  2.65  46.84   ALL       Season
3    SACSacramento Kings        23.00  9.10   5.03  2.58  1.61  0.95  2.50  46.65   ALL       Season
4    ORLOrlando Magic           22.35  9.39   4.94  2.62  1.57  1.04  2.50  46.36   ALL       Season
...  ...                        ...    ...    ...   ...   ...   ...   ...   ...     ...       ...
715  TORToronto Raptors         23.33  13.97  2.77  0.57  0.84  1.88  3.38  49.03   C         Last 30
716  NYKNew York Knicks         19.78  15.40  2.94  0.53  0.90  1.92  2.17  48.96   C         Last 30
717  BKNBrooklyn Nets           19.69  13.60  3.16  0.86  1.10  2.25  2.06  48.74   C         Last 30
718  BOSBoston Celtics          17.79  11.95  3.75  0.41  1.66  1.80  2.54  45.60   C         Last 30
719  MIAMiami Heat              17.41  14.19  2.16  0.50  1.01  1.52  1.75  43.52   C         Last 30
720 rows × 11 columns
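
As a possible variation on the nested loop, each table can be collected in a list and concatenated once at the end, and a row count per Recency/Category pair can then confirm that every combination was captured. The sketch below is only an illustration: fake_table is a hypothetical stand-in for the pd.read_html call on the live page, the PTS values are made up, and the option and category labels are just examples, so it runs without a browser.

```python
import pandas as pd


def fake_table(value):
    """Hypothetical stand-in for one scraped table (made-up numbers)."""
    return pd.DataFrame({
        "Team": ["HOUHouston Rockets", "OKCOklahoma City Thunder"],
        "PTS": [float(value), float(value) + 1.0],
    })


recency_options = ["Season", "Last 30"]  # illustrative labels only
categories = ["ALL", "C"]                # illustrative labels only

frames = []
for i, option in enumerate(recency_options):
    for j, category in enumerate(categories):
        df = fake_table(i + j)       # the real code would parse the table here
        df["Category"] = category    # which position tab was selected
        df["Recency"] = option       # which drop-down option was selected
        frames.append(df)

# Concatenate once instead of growing big_df inside the loop
big_df = pd.concat(frames, ignore_index=True)

# One row count per (Recency, Category) pair; equal counts suggest no table was missed
counts = big_df.groupby(["Recency", "Category"]).size()
print(counts)
```

With the real scrape, a pair that is missing from counts, or one with fewer rows than the others, would point to a table that was read before the page had finished updating.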


Answered by –
Barry the Platipus


Answer checked by –
Terry (FixError Volunteer)
