Python Series: MLB Default Lineups from Rotowire

  • Waterstpub87
    SBR MVP
    • 09-09-09
    • 4102

    #1
    Python Series: MLB Default Lineups from Rotowire
    Back in the day, maybe 4 or 5 years ago, it would take me 45 minutes to update my MLB spreadsheet every night. One large task was updating lineups, specifically substituting the platoon players against left-handed pitchers. At the time, there was a website that had a grid of all the batting lineups. That site was taken down, forcing me to look elsewhere for default lineups. I originally wrote the scraper in VBA, then rewrote it in Python to capture the daily lineups from Rotowire.

    Sharing here, hope that it can help someone save some time.

    Please don't reply until the sequence is finished.

    Thanks.

    Will consider doing a series of these types of automations if there is interest.
  • Waterstpub87
    SBR MVP
    • 09-09-09
    • 4102

    #2
    Step 1: Installing Python:

    I use the Anaconda distribution. I find it easy to use and easy to troubleshoot code with.

    Where to get it: https://www.anaconda.com/products/distribution

    Install it and move on to the next step.
    • Waterstpub87
      SBR MVP
      • 09-09-09
      • 4102

      #3
      Step 2: Installing Selenium:

      The Python package I use for web scraping is Selenium. This particular script uses Google Chrome. If you need a different browser, you will have to search for the proper driver (Step 3).

      1. Search for Anaconda Prompt in your start menu
      2. It will load a CMD-like screen
      3. Type pip install selenium

      This will download the Selenium package to your machine, for use in Python.
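To confirm the install worked before moving on, a quick check along these lines can be run from Spyder or the same prompt. This is only a sketch: `is_installed` is a helper name made up for illustration, and it relies on the standard library's `importlib.util.find_spec`, which reports whether a package can be located without actually importing it.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # The two packages the scraping script imports
    for name in ("selenium", "pandas"):
        if is_installed(name):
            print(name, "-> found")
        else:
            print(name, "-> missing, try: pip install " + name)
```

If either one shows as missing, re-run the pip step from the Anaconda Prompt rather than a plain CMD window, so it installs into the Anaconda environment.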
      • Waterstpub87
        SBR MVP
        • 09-09-09
        • 4102

        #4
        Step 3: Getting the Chrome Driver

        You need a driver to be able to use Selenium. Selenium runs the browser in a test mode, sends it to the website, and does the scraping through it.

        Go here to retrieve it: https://chromedriver.chromium.org/downloads

        Select the one that matches your version of Chrome. If Chrome updates itself, you will need to re-download the driver for the new version.

        It will download a folder. I tend to copy the driver out of it and put it in Documents or somewhere easy to get to.
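A rough way to sanity-check the match, assuming you have copied the two version strings by hand (Chrome shows its version under Help > About Google Chrome, and the driver version is in the download name). As far as I understand ChromeDriver's version-selection notes, the first three parts of the version number are what need to agree; treat that rule, and the helper names below, as assumptions for illustration.

```python
def version_prefix(version, parts=3):
    """Return the first `parts` dot-separated components as a tuple of ints."""
    return tuple(int(p) for p in version.strip().split(".")[:parts])


def driver_matches_chrome(chrome_version, driver_version):
    """True when browser and driver agree on major.minor.build."""
    return version_prefix(chrome_version) == version_prefix(driver_version)


# Example version strings, made up for illustration
print(driver_matches_chrome("100.0.4896.127", "100.0.4896.60"))  # True
print(driver_matches_chrome("101.0.4951.41", "100.0.4896.60"))   # False
```

When the check comes back False after a Chrome update, that is the cue to download a fresh driver.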
        Last edited by Waterstpub87; 04-24-22, 11:23 PM.
        • Waterstpub87
          SBR MVP
          • 09-09-09
          • 4102

          #5
          Step 4: Opening Python

          I use Spyder as the development environment.

          Search for Spyder in your start menu.

          Click to open

          It will open with a temp file loaded.

          Click new file

          Save as "MLB lineup scrape" or a file name of your choosing. Save it where you want your end files to go, something like Documents or a folder on your desktop.
          Paste the code below.

          Edit the path to your driver file. It needs to be inside the single quotes ' '. It does not need an extension; it should just end in chromedriver. The path needs double slashes \\, like:
          'C:\\user\\documents\\chromedriver'

          Then click the green run button.
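A quick aside on why the slashes are doubled: inside a normal Python string a single backslash starts an escape sequence, so `\\` is how you write one literal backslash (and a path like `'C:\user'` fails outright in Python 3, because `\u` starts a Unicode escape). A small sketch of equivalent ways to write the same Windows path; the path is just the example from above, not a real location.

```python
from pathlib import PureWindowsPath

# Three spellings of the same path
escaped = 'C:\\user\\documents\\chromedriver'  # doubled backslashes
raw = r'C:\user\documents\chromedriver'        # raw string, no doubling needed
forward = 'C:/user/documents/chromedriver'     # forward slashes

print(escaped == raw)                                        # True
print(PureWindowsPath(forward) == PureWindowsPath(escaped))  # True
print(PureWindowsPath(escaped).name)                         # chromedriver
```

Any of the three should work in the script, so use whichever is easiest to paste.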

          Code:
          import pandas as pd

          from selenium import webdriver
          from selenium.webdriver.chrome.service import Service
          from selenium.webdriver.common.by import By


          # Selenium 4 takes the driver path through a Service object
          driver = webdriver.Chrome(service=Service('Path to chrome driver'))

          teams = ['BAL',
                   'BOS',
                   'CWS',
                   'CLE',
                   'DET',
                   'HOU',
                   'KC',
                   'LAA',
                   'MIN',
                   'NYY',
                   'OAK',
                   'SEA',
                   'TB',
                   'TEX',
                   'TOR',
                   'ARI',
                   'ATL',
                   'CHC',
                   'CIN',
                   'COL',
                   'LAD',
                   'MIA',
                   'MIL',
                   'NYM',
                   'PHI',
                   'PIT',
                   'SD',
                   'SF',
                   'STL',
                   'WAS']

          vsRight = None
          VsLeft = None

          for x in teams:

              driver.get('https://www.rotowire.com/baseball/batting-orders.php?team='+x)
              # Each lineup on the team page is an ordered list
              battertables = driver.find_elements(By.XPATH, '//ol[@class="list is-rankings pad-5-10"]')

              # battertables[1] is the default lineup vs right-handed pitchers
              rbatteritems = battertables[1].find_elements(By.XPATH, './/li[@class="md-text"]')
              rbatters = [z.text for z in rbatteritems]

              # battertables[2] is the default lineup vs left-handed pitchers
              Lbatteritems = battertables[2].find_elements(By.XPATH, './/li[@class="md-text"]')
              Lbatters = [z.text for z in Lbatteritems]

              # One column per team, named after the team code
              teamvsr = pd.DataFrame({x: rbatters})
              teamvsl = pd.DataFrame({x: Lbatters})

              # The first team starts each table; later teams are joined on as new columns
              if vsRight is None:
                  vsRight = teamvsr
                  VsLeft = teamvsl
              else:
                  vsRight = vsRight.join(teamvsr)
                  VsLeft = VsLeft.join(teamvsl)


          # quit() closes the browser and also shuts down the driver process
          driver.quit()
          vsRight.to_csv('VsRightBattingLineups.csv')
          VsLeft.to_csv('VsLeftBattingLineups.csv')
          The script should pull up Chrome, and you should see it going between the team tabs. It should output two CSV files, one with the default lineups vs right-handed pitchers, the other with the defaults vs left-handed pitchers.
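One thing worth knowing about how the team columns get stitched together: `DataFrame.join` lines rows up by index and, by default, keeps only the rows of the frame on the left. So if a later team ever lists more batters than the first team, the extra rows are dropped. Below is a minimal sketch of that behaviour, with `pd.concat(..., axis=1)` as an alternative that keeps every row. The player names are placeholders, not scraped data.

```python
import pandas as pd

# Placeholder lineups of different lengths
bal = pd.DataFrame({'BAL': ['Batter 1', 'Batter 2']})
bos = pd.DataFrame({'BOS': ['Batter A', 'Batter B', 'Batter C']})

# join keeps only the left frame's index, so 'Batter C' is lost
joined = bal.join(bos)

# concat along columns keeps all rows and pads the shorter column with NaN
combined = pd.concat([bal, bos], axis=1)

print(len(joined))    # 2
print(len(combined))  # 3
print(combined)
```

With nine-man lineups for every team this never comes up, so it only matters if a page returns a short or long list.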
          Last edited by Waterstpub87; 04-24-22, 11:42 PM.
          • Waterstpub87
            SBR MVP
            • 09-09-09
            • 4102

            #6
            That should be it. Feel free to reply. Let me know if you run into any issues.
            • KVB
              SBR Aristocracy
              • 05-29-14
              • 74817

              #7
              Very nice Waterst.

              • Optional
                Administrator
                • 06-10-10
                • 60798

                #8
                Nice, easy-to-understand tutorial.

                Nice example to help new people get started with Python too.
                • LT Profits
                  SBR Aristocracy
                  • 10-27-06
                  • 90963

                  #9
                  If you are referring to Roster Resource, the grid still exists. It has moved to FanGraphs with its manager, Jason Martinez.
                  • Waterstpub87
                    SBR MVP
                    • 09-09-09
                    • 4102

                    #10
                    Originally posted by LT Profits
                    If you are referring to Roster Resource, the grid still exists. It has moved to FanGraphs with its manager, Jason Martinez.
                    I was. Emailed back and forth with him many times.

                    I could never find the grid again. Maybe they added it back this year. The grid stopped being updated, maybe in 2020. They then moved to FanGraphs, and maybe did not have the grid.

                    At that point, I had built the VBA to scrape the actual pages, the underlying ones from Roster Resource. The FanGraphs pages never loaded in VBA, something about how they were structured.

                    My issue with that was that the platoon was never listed as a separate lineup. It required a manual fix, which is annoying as hell.

                    This particular process does not require that, only a true/false flag with an index to load the vs left-handed lineup when required.
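The true/false idea is simple enough to sketch in Python as well. This is not how my spreadsheet does it, just the same logic under assumed names: `vs_right` and `vs_left` stand in for one column from each CSV, and `pitcher_is_lefty` is the flag.

```python
# Placeholder data standing in for one team column from each CSV
vs_right = {'BAL': ['R1', 'R2', 'R3']}
vs_left = {'BAL': ['L1', 'L2', 'L3']}


def default_lineup(team, pitcher_is_lefty):
    """Pick the team's default batting order for the opposing starter's hand."""
    source = vs_left if pitcher_is_lefty else vs_right
    return source[team]


def batter_in_slot(team, slot, pitcher_is_lefty):
    """Return the batter in a 1-based batting-order slot."""
    return default_lineup(team, pitcher_is_lefty)[slot - 1]


print(batter_in_slot('BAL', 1, pitcher_is_lefty=True))   # L1
print(batter_in_slot('BAL', 1, pitcher_is_lefty=False))  # R1
```

The slot index plays the same role as the index lookup in the spreadsheet.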
                    • LT Profits
                      SBR Aristocracy
                      • 10-27-06
                      • 90963

                      #11
                      Originally posted by Waterstpub87
                      I was. Emailed back and forth with him many times.

                      I could never find the grid again. Maybe they added it back this year. The grid stopped being updated, maybe in 2020. They then moved to FanGraphs, and maybe did not have the grid.

                      At that point, I had built the VBA to scrape the actual pages, the underlying ones from Roster Resource. The FanGraphs pages never loaded in VBA, something about how they were structured.

                      My issue with that was that the platoon was never listed as a separate lineup. It required a manual fix, which is annoying as hell.

                      This particular process does not require that, only a true/false flag with an index to load the vs left-handed lineup when required.
                      Yes, Jason and I used to message often, although we have not since last season. I used to help him with opener pitchers (e.g., Long tonight for the Giants).

                      I still update my defaults manually because a major flaw with the script is that it does not account for players who are known to be out injured but are still on the active roster, as they are still listed in the default lineup. Two examples are JD Martinez currently (pending tonight) and Buxton of Minnesota last week.
                      • KVB
                        SBR Aristocracy
                        • 05-29-14
                        • 74817

                        #12
                        Yes, players injured but still in the default lineup is an issue.

                        With all the juggling of lineups and pitchers these days, it has to be addressed.

                        I often feel like every time I get things updated and running, I just create another issue where something manual must still be done.

                        Always something manual...lol.
                        • Waterstpub87
                          SBR MVP
                          • 09-09-09
                          • 4102

                          #13
                          If you had an injured player list, which you could generate somewhere, you could add something to the above script to replace the player or print a message alerting you to do so.

                          I don't know your level of scripting knowledge, so my apologies if this is an obvious solution.
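Something like the following is what I have in mind for the alert version. It is a sketch only: `flag_injured` is a made-up helper, the names are placeholders, and it assumes the scraped text for each batter is just the player name. If the page text carries a position or anything else, the match would need loosening.

```python
def flag_injured(lineup, injured):
    """Return (slot, name) pairs for lineup entries found on the injured list."""
    injured_lower = {name.lower() for name in injured}
    return [(slot, name)
            for slot, name in enumerate(lineup, start=1)
            if name.lower() in injured_lower]


# Placeholder names; `injured` would come from whatever injury source you trust
lineup = ['Player A', 'Player B', 'Player C']
injured = ['player b']

for slot, name in flag_injured(lineup, injured):
    print(f'Check slot {slot}: {name} is on the injured list')
```

It only prints a warning and leaves the lineup alone, so the replacement stays a manual decision.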

                          Like I said, originally I was using Jason's grid. His grid looked like it loaded from the underlying team pages. I audited this once and discovered massive differences; I think it was after the trade deadline a few years ago. On the grid, the traded players were on the old team, and on the individual team pages, they were on the new team. I wrote him, and he said it had errors and that he was moving to FanGraphs. No blame, it was a great thing to get for free, and it's probably a lot of work, so I totally get it. You're not going to get something institutional quality for free run by one guy.

                          The old team pages were still in Google Docs, which fed FanGraphs. So I wrote VBA to scrape from the individual pages. The VBA took like 15 minutes or so to run, and VBA with scraping has a tendency to hang. On top of this, I still had to manually adjust the lineups.

                          I built this earlier to get around it. I might add the injury thing, but my feeling is that places like Rotowire are likely doing a decent enough job, and I don't have a lot of free time. It adds a bit of error to the process, I'm sure, but that's why you have tolerances.

                          Edit to say I might be making this up. It was a few years ago, and there was a reason I stopped using the grid. I could look at my emails from then, but it isn't important.
                          Last edited by Waterstpub87; 04-25-22, 12:31 PM.
                          • LT Profits
                            SBR Aristocracy
                            • 10-27-06
                            • 90963

                            #14
                            Originally posted by Waterstpub87
                            If you had an injured player list, which you could generate somewhere, you could add something to the above script to replace the player or print a message alerting you to do so.

                            I don't know your level of scripting knowledge, so my apologies if this is an obvious solution.

                            Like I said, originally I was using Jason's grid. His grid looked like it loaded from the underlying team pages. I audited this once and discovered massive differences; I think it was after the trade deadline a few years ago. On the grid, the traded players were on the old team, and on the individual team pages, they were on the new team. I wrote him, and he said it had errors and that he was moving to FanGraphs. No blame, it was a great thing to get for free, and it's probably a lot of work, so I totally get it. You're not going to get something institutional quality for free run by one guy.

                            The old team pages were still in Google Docs, which fed FanGraphs. So I wrote VBA to scrape from the individual pages. The VBA took like 15 minutes or so to run, and VBA with scraping has a tendency to hang. On top of this, I still had to manually adjust the lineups.

                            I built this earlier to get around it. I might add the injury thing, but my feeling is that places like Rotowire are likely doing a decent enough job, and I don't have a lot of free time. It adds a bit of error to the process, I'm sure, but that's why you have tolerances.

                            Edit to say I might be making this up. It was a few years ago, and there was a reason I started using the grid. I could look at my emails from then, but it isn't important.
                            Thing is I use actual anticipated lineups, so for example subbing Buxton's replacement in Buxton's leadoff spot does not work when that guy bats 9th. Plate appearances is a model component.
                            • oilcountry99
                              SBR Wise Guy
                              • 08-29-10
                              • 707

                              #15
                              @Waterstreetpub87
                              Thanks for this. I don't use Python or scrape, but it's a great working example. Would love to see more. Thanks for sharing.
                              • Waterstpub87
                                SBR MVP
                                • 09-09-09
                                • 4102

                                #16
                                Originally posted by LT Profits
                                Thing is I use actual anticipated lineups, so for example subbing Buxton's replacement in Buxton's leadoff spot does not work when that guy bats 9th. Plate appearances is a model component.
                                I like this idea. Agree with you, actual batting position is important. If the Rotowire stuff is actually accurate, I hope this solves it.
                                • oilcountry99
                                  SBR Wise Guy
                                  • 08-29-10
                                  • 707

                                  #17
                                  Do you know if these lineups are more accurate than Rotogrinders' expected lineups?
                                  • Waterstpub87
                                    SBR MVP
                                    • 09-09-09
                                    • 4102

                                    #18
                                    Originally posted by oilcountry99
                                    Do you know if these lineups are more accurate than Rotogrinders' expected lineups?
                                    No idea. I've spot-checked it a few times when games start. They seem decent enough. I bet a lot of props, and I don't have a lot of missing players vs what DraftKings has.

                                    Also, it is being updated at least daily. I have a part of the Excel model that checks if I have data on a player. I keep having to add new players as the lineups change. I have to do this daily, so I assume it's pretty frequently updated.
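The "do I have data on this player" check lives in Excel for me, but the same idea is easy to sketch in Python. The function name and the sample names here are hypothetical; `known_players` stands for whatever list of players your model already has data for.

```python
def missing_players(lineup_names, known_players):
    """Return lineup names with no data yet, in batting order, without repeats."""
    known = set(known_players)
    seen = set()
    missing = []
    for name in lineup_names:
        if name not in known and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing


# Placeholder names for illustration
todays_lineup = ['Player A', 'New Guy', 'Player C', 'New Guy']
known_players = ['Player A', 'Player B', 'Player C']

print(missing_players(todays_lineup, known_players))  # ['New Guy']
```

Running it over every column of both CSVs would give the list of players that need adding to the model that day.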
                                    Last edited by Waterstpub87; 04-25-22, 01:34 PM.
                                    • Waterstpub87
                                      SBR MVP
                                      • 09-09-09
                                      • 4102

                                      #19
                                      Originally posted by oilcountry99
                                      @Waterstreetpub87
                                      Thanks for this. I don't use Python or scrape, but it's a great working example. Would love to see more. Thanks for sharing.
                                      Never too late to start. I've had to teach several people Python at work. Was a VBA guy, became a Python guy. It's like going from a shitty 1980s Honda to a Ferrari. They're both cars, but there is a world of difference.

                                      Appreciate the kind words from SBR luminaries such as KVB and Optional.
                                      • potamushippo
                                        SBR Rookie
                                        • 03-06-19
                                        • 14

                                        #20
                                        Thanks. Tried to give you points but forum throws message
                                        This user unable to receive points.
                                        • Optional
                                          Administrator
                                          • 06-10-10
                                          • 60798

                                          #21
                                          Originally posted by potamushippo
                                          Thanks. Tried to give you points but forum throws message
                                          This user unable to receive points.
                                          You can only send 2 points a day as a non-Pro member, which I assume is what you mean.

                                          It's free to upgrade to Pro membership right now. Just click here and choose any option and submit the form and you will be approved. https://www.sportsbookreview.com/forum/sbr-pro/