PROBLEMS ENCOUNTERED DURING RECONSTRUCTION OF THE GAME-BY-GAME RECORD OF A HISTORICAL BASEBALL SEASON
In using newspaper records to reconstruct the principal statistics of a baseball team’s season (wins and losses, primarily, scores, pitching records, etc.) for the purpose of obtaining a day-by-day record of that team’s progress, there are certain challenges to overcome.
For example, in completing my initial “rough draft” of the 1911 season of the Minneapolis Millers for an upcoming report, I found that my primary gauge of accuracy, the win/loss totals, did not reconcile with the official totals given in various publications. The official record for the Millers, who won their second straight American Association championship in 1911, was 99 wins and 66 losses. My game-by-game record arrived at a total of 98 wins and 67 losses. But I was so careful!
In the past I have reconstructed a baseball season by using microfilm records photocopied from an original source, most notably Sporting Life magazine, which was a comprehensive sporting publication giving each box score for every season through, roughly, 1923.
My procedure involves some pretty pure data entry. I start by setting up a spreadsheet with the categories I need, taking each box score and recording the results into my spreadsheet. I organize my record by separating home stands from road trips. This makes it easier for me to go back and check my record for errors.
While recently revisiting my complete collection of box scores for the 1911 season in an attempt to correct the record, I drew up a list of things to check to make sure a clerical error did not cause the discrepancy. It’s happened before. Other problems arise from incomplete reporting in the original record. Here is a summary of the items I’ve come up with which could be at the root of an error within the document:
1. Sequence of numbers. Game numbers (game #1 for opening day, game #2 for the following game, etc.), win numbers, loss numbers and dates can all be a source of a sequence error and should be checked early in the process of discovering an error. I’ve found that it’s best to do this after the document is printed out, then examine the document for any potential sequence problems on a sheet-by-sheet basis. One place where an error in sequence can happen is between sheets. Always check the numbers at the bottom of Sheet A and compare them with those at the top of Sheet B to make sure the sequence carries over without a gap or a repeat.
2. Proper Credit. In recording a result, it is possible that a win or loss gets tallied in the wrong column. Because this is less likely than a typical clerical error, it can be checked later in the accuracy-check process, but it should not be overlooked. Here it is especially important to record summary figures at the end of each home stand or road trip: a line giving the total for each column should close each section, and those totals should reconcile with the source box scores for that section.
3. Wins are Wins. During the data entry process, it is possible that a win was given to the wrong team, either because of a mistake in the original record (always check the line score for each game to make sure) or simply by mental error. Always check as you go along to make sure you’ve assigned the win or loss to the corresponding team. If the score given in the box score indicates a win for Milwaukee, make sure Milwaukee is given credit for the win. The same goes for losses. Otherwise the record will not reconcile properly. Some of these ideas may seem redundant, but experience will prove that each time you double-check or triple-check your document you will be enhancing its accuracy and avoiding future errors.
4. Missing Box Score. If you have undertaken the painstaking process of double-checking your document for each of the above types of errors and your document’s figures continue to be different from the official record, it is likely that the original record failed to include a box score, perhaps the second game of a double-header or an abbreviated contest. Murphy’s law dictates that you will likely find the error somewhere toward the last two or three weeks of the season, after you’ve spent hours examining the first dozen weeks! In such cases as these you will need to consult a different source and compare the team’s won/loss record in the standings with the record you’ve arrived at.
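For anyone who keeps the record in a form a program can read, the first three checks above can be run mechanically. What follows is only a rough sketch in Python, not the procedure I actually use. It assumes each game is one row carrying the game number, the date, the runs scored by each side, and the running win and loss numbers. The field names (game_no, team_runs, opp_runs and so on) and the sample rows are invented for illustration and are not actual 1911 results. A tie game is assumed to leave both totals unchanged.

```python
from dataclasses import dataclass


@dataclass
class Game:
    game_no: int    # sequential game number (1 = opening day)
    date: str       # ISO date string, so plain string comparison sorts correctly
    team_runs: int  # runs scored by the team being tracked
    opp_runs: int   # runs scored by the opponent
    wins: int       # running win number recorded after this game
    losses: int     # running loss number recorded after this game


def find_problems(games):
    """Return (row_number, message) pairs for rows that fail a check."""
    problems = []
    prev_wins, prev_losses, prev_date = 0, 0, ""
    for row, g in enumerate(games, start=1):
        # Check 1: game numbers and dates must run in sequence.
        # Equal dates are allowed, since double-headers share a date.
        if g.game_no != row:
            problems.append((row, f"game number {g.game_no}, expected {row}"))
        if g.date < prev_date:
            problems.append((row, f"date {g.date} precedes {prev_date}"))

        # Checks 2 and 3: the score decides which column advances.
        if g.team_runs > g.opp_runs:
            expected = (prev_wins + 1, prev_losses)
        elif g.team_runs < g.opp_runs:
            expected = (prev_wins, prev_losses + 1)
        else:
            expected = (prev_wins, prev_losses)  # assumption: ties change nothing
        if (g.wins, g.losses) != expected:
            problems.append(
                (row, f"tally {g.wins}-{g.losses}, expected {expected[0]}-{expected[1]}")
            )

        # Continue from the recorded values so one slip is reported only once.
        prev_wins, prev_losses, prev_date = g.wins, g.losses, g.date
    return problems


# Invented rows for illustration only. The third game is a 6-1 win that was
# tallied in the loss column.
sample = [
    Game(1, "1911-04-12", 5, 3, 1, 0),
    Game(2, "1911-04-13", 2, 4, 1, 1),
    Game(3, "1911-04-14", 6, 1, 1, 2),
]
for row, message in find_problems(sample):
    print(f"row {row}: {message}")
```

Because the check carries on from whatever numbers were actually recorded, a single misplaced tally is flagged at the row where it happened rather than on every row after it.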
(NOTE: When a won/loss record is tallied within a spreadsheet, it is best to record it as a numeral in sequential fashion, rather than as a “check” or an “x.” For example, if Milwaukee wins on Opening Day, the numeral “one” is placed in the win column. If they win the next day, the numeral “two” is recorded, and so on. This allows the data to be examined more readily. The disadvantage is that every later entry will have to be corrected once an error is discovered, but it is worth the time that takes. Such an approach is the equivalent of dropping bread crumbs as you walk through the woods…it’s nice to be able to trace your steps if you need to.)
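Running numerals also make it possible to narrow down where a missing box score sits, by comparing the record against standings copied from a second source on a handful of dates. The sketch below, again in Python, is one hedged way of doing that. The function names record_as_of and first_divergence, the layout of the rows, and all the sample figures are my own inventions for illustration, not actual results. It also assumes the standings dates have already been lined up with the game dates, since a newspaper often prints standings the morning after.

```python
from bisect import bisect_right


def record_as_of(rows, date):
    """Return (wins, losses) after the last game played on or before `date`.

    `rows` is a list of (date, wins, losses) tuples sorted by date, one per
    game, holding the running win and loss numerals from the spreadsheet.
    """
    dates = [r[0] for r in rows]
    i = bisect_right(dates, date)
    if i == 0:
        return (0, 0)  # no games played yet
    return (rows[i - 1][1], rows[i - 1][2])


def first_divergence(rows, standings):
    """Return the earliest checkpoint where the two records disagree.

    `standings` maps a date to the (wins, losses) printed in a second source.
    Returns (date, my_record, published_record), or None if all agree.
    """
    for date in sorted(standings):
        mine = record_as_of(rows, date)
        if mine != standings[date]:
            return date, mine, standings[date]
    return None


# Invented figures for illustration only: a double-header on the 2nd,
# an off day on the 3rd, and one game recorded on the 4th.
rows = [
    ("1911-09-01", 1, 0),
    ("1911-09-02", 2, 0),
    ("1911-09-02", 2, 1),
    ("1911-09-04", 3, 1),
]
standings = {
    "1911-09-01": (1, 0),
    "1911-09-03": (2, 1),
    "1911-09-04": (4, 1),
}
print(first_divergence(rows, standings))
```

In this made-up case the two records agree through the 3rd and part ways on the 4th, where the second source shows one more win. That points to a game on that date, perhaps half of a double-header, that never made it into the spreadsheet. The more checkpoint dates you copy down, the narrower the stretch you have to re-read.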