Does oddsportal block scrapers somehow?

  • owain
    SBR Rookie
    • 09-19-17
    • 8

    #1
    I'm writing a python script using Requests and BeautifulSoup to scrape very select data from oddsportal; I'm only in the early stages, but it's functional on every website I've tried except for oddsportal.

    The issue occurs during the HTML grabbing. Regardless of which specific oddsportal page I try to grab, whether it be the homepage or a league page, it returns the same oddsportal error page HTML which contains none of the relevant information, along with "Odds Portal: Page Not Found" and "The page you requested is not available", etc. I can post the entire HTML if necessary, but it really does contain nothing of interest.

    I'm wondering if this is due to oddsportal blocking the HTML grab in some way, or whether it's due to javascript or something else that I'm missing. Any insight and/or remedies would be much appreciated.
  • vampire assassin
    SBR Sharp
    • 03-09-18
    • 296

    #2
    Originally posted by owain
    I'm writing a python script using Requests and BeautifulSoup to scrape very select data from oddsportal; I'm only in the early stages, but it's functional on every website I've tried except for oddsportal.

    The issue occurs during the HTML grabbing. Regardless of which specific oddsportal page I try to grab, whether it be the homepage or a league page, it returns the same oddsportal error page HTML which contains none of the relevant information, along with "Odds Portal: Page Not Found" and "The page you requested is not available", etc. I can post the entire HTML if necessary, but it really does contain nothing of interest.

    I'm wondering if this is due to oddsportal blocking the HTML grab in some way, or whether it's due to javascript or something else that I'm missing. Any insight and/or remedies would be much appreciated.
    Oddsportal's site is formatted weirdly. I had trouble scraping it also. I'm no expert scraper, but the windows inside the display blew up my scraper.
    • allnighter
      SBR Wise Guy
      • 10-12-17
      • 708

      #3
      I'm also very interested in learning more about Python and web scraping, but I'm, at best, a noob at coding.

      You might try to ask this question on the stack overflow site. If it's a coding matter they might be able to help.
      Cheers.
      • A4K
        SBR Hall of Famer
        • 10-08-12
        • 5243

        #4
        You guys would be much better served if you hire someone on Fiverr to do the scraping for you. Cost you a few bucks but it will save you a lot of time.
        • allnighter
          SBR Wise Guy
          • 10-12-17
          • 708

          #5
          Originally posted by A4K
          You guys would be much better served if you hire someone on Fiverr to do the scraping for you. Cost you a few bucks but it will save you a lot of time.
          Thanks for the info A4K.
          I kind of challenged myself to learn more coding and scraping, but I think I'll eventually end up paying for that :)
          • Larkman
            SBR Rookie
            • 06-03-18
            • 29

            #6
            They use JavaScript to populate the data tables from JSON data, so a straight programmatic HTTP request receives the HTML before that population happens and the odds data won't be there. You have two options. One is to open your browser's network console, work out which calls fetch the JSON, and replicate them programmatically, which I tried and is a pain in the arse. The other is to use a headless browser via something like Selenium/PhantomJS, which behaves like a normal browser and lets you scrape the HTML after the data has been populated; that is much easier but considerably slower when scraping a large number of pages.

            I have a fully functional oddsportal scraper in C# so I'm happy to answer any more specific questions you have.
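A rough Python sketch of the headless-browser route described above, not a tested OddsPortal scraper. It assumes the `selenium` package and a matching chromedriver are installed; `fetch_rendered_html`, `count_table_rows`, the `"table"` selector and the two HTML snippets are made up for illustration and say nothing about OddsPortal's real markup.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags, as a crude check for populated tables."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_table_rows(html):
    """Return the number of <tr> elements found in an HTML string."""
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rendered_html(url, ready_selector="table", wait_seconds=10):
    """Load a page in headless Chrome and return the HTML after scripts ran.

    Selenium is imported here so the helpers above work without it.
    ready_selector is an assumption: pick an element that only exists
    once the data has been populated.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the chosen element is present, or time out.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ready_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


# Hypothetical snippets: what a plain HTTP request might see versus
# what a browser might see after JavaScript has filled the table.
STATIC_HTML = '<table id="odds"><tbody></tbody></table>'
RENDERED_HTML = (
    '<table id="odds"><tbody>'
    "<tr><td>Home</td><td>1.85</td></tr>"
    "<tr><td>Away</td><td>2.10</td></tr>"
    "</tbody></table>"
)
```

Comparing `count_table_rows` on the plain response and on the output of `fetch_rendered_html` is a quick way to confirm that the missing data really comes from JavaScript population rather than from a block.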
            • Larkman
              SBR Rookie
              • 06-03-18
              • 29

              #7
              Also, now that I read your OP again, the reason you get a page not found error in the HTML is that programmatic HTTP requests usually don't send a browser-like user-agent string (Python's Requests, for instance, sends its own "python-requests/<version>" default), and OddsPortal returns that page whenever there is no user-agent. Set it to "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0" or some other valid string and you should at least not get the error anymore.
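A minimal sketch of the user-agent fix in Python, using only the standard library so it runs without extra installs. `build_request` and `fetch_html` are hypothetical helper names, and the user-agent value is the one quoted above. With Requests, the same dict can be passed as `requests.get(url, headers=HEADERS)`.

```python
import urllib.request

# User-agent string quoted in the post above; any valid browser string should do.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) "
    "Gecko/20100101 Firefox/60.0"
)
HEADERS = {"User-Agent": USER_AGENT}


def build_request(url):
    """Create a request object that carries the browser-like user-agent."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Example call (performs a real network request):
# html = fetch_html("https://www.oddsportal.com/")
```

This only addresses the error page. The odds tables are still filled in by JavaScript, so the returned HTML may not contain them.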
              • turtledoves
                SBR MVP
                • 08-27-17
                • 3398

                #8
                What URL are you trying to scrape? Try passing a Referer header:

                "Referer: abc"

                Selenium with chromedriver is the most future-proof option, because to the site it looks like a real browser is visiting. If you go the other route, you might have to keep changing your scripts to fake the headers and play the cat-and-mouse game.
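For the header-faking route, a small Python sketch of sending a Referer alongside the user-agent and checking whether the error page came back anyway. The helper names are hypothetical, the Referer value is just an assumption (the site's own homepage), and the marker strings are the ones the original poster reported seeing.

```python
import urllib.request

# Phrases the original poster reported in the error page HTML.
ERROR_MARKERS = (
    "Page Not Found",
    "The page you requested is not available",
)


def browser_like_headers(user_agent, referer):
    """Return the headers a browser would typically send with a page load."""
    return {"User-Agent": user_agent, "Referer": referer}


def build_request_with_referer(url, user_agent, referer):
    """Create a request carrying both headers; nothing is sent yet."""
    headers = browser_like_headers(user_agent, referer)
    return urllib.request.Request(url, headers=headers)


def looks_like_error_page(html):
    """True if the HTML contains any of the known error-page phrases."""
    return any(marker in html for marker in ERROR_MARKERS)
```

Running `looks_like_error_page` on every response makes it obvious when the site changes what it accepts, which is the moment the scripts need updating.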

                Maybe find inspiration from this thread if it's blocked in the future:

                Hello, It seems that all endpoints are not available since saturday. I get an "Access Denied" error now whereas it worked great until friday 04/07/2017. Others users have the same problem than me ?...
                Last edited by turtledoves; 06-05-18, 06:54 PM.
                • owain
                  SBR Rookie
                  • 09-19-17
                  • 8

                  #9
                  Thanks for this information. I have to be honest here and admit that I have been getting some help from my son with this. He is abroad at the moment so I'll update him as soon as he gets back.
                  Thanks again.
                  • arwar
                    SBR High Roller
                    • 07-09-09
                    • 208

                    #10
                    Interesting about the headless browser!
                    • owain
                      SBR Rookie
                      • 09-19-17
                      • 8

                      #11
                      I took A4K's advice, and paid someone to do it.
                      Last edited by owain; 08-21-18, 10:26 AM. Reason: not finished
                      • turtledoves
                        SBR MVP
                        • 08-27-17
                        • 3398

                        #12