For the past century, pollen analysis has served as a primary tool for inferring past changes in vegetation composition and structure (Birks et al. 2016, Edwards et al. 2017). Pollen-based inferences are supported by empirical studies comparing modern pollen assemblages with modern vegetation composition. In one approach, pollen abundances (usually percentages) for individual taxa are compared directly with quantitative estimates of abundance in the surrounding vegetation (Jackson 1994, Davis 2000). This approach has been applied most frequently using spatially extensive but coarse-scale forest inventory data (Webb et al. 1981, Bradshaw & Webb 1985, Prentice & Webb 1986, Prentice et al. 1987, Paciorek & McLachlan 2009, Dawson et al. 2016, Kujawa et al. 2016). In these studies, forest composition usually cannot be estimated accurately within a 1- to 10-km radius of the individual sites, owing to the limited spatial density of forest inventory data. A few studies have compared vegetation composition within 50-100 m of pollen-sampling sites, but in these cases the pollen is from forest-floor assemblages (Bradshaw 1981, Jackson & Wong 1994, Jackson & Kearsley 1998) or from small forest hollows (Calcote 1995, 1998, Parshall & Calcote 2001). Largely lacking are pollen assemblage data from lake sediments paired with local forest composition measured within 100 to 1000 m of the lake margins (Jackson 1990). This absence represents a substantial gap in our ability to understand and model pollen-vegetation relationships, because lakes are the primary source of fossil-pollen sequences worldwide, and because the leptokurtic nature of pollen dispersal ensures that local vegetation has an important effect on pollen composition in sediments (Jackson 1994, Sugita 1994, 2007a, 2007b, Jackson & Lyford 1999). Here, I present a data set pairing modern pollen assemblages from 33 small lakes in the forested northeastern United States (Fig. 1) with forest composition data measured within 20, 50, 100, 500, and 1000 m of the lake margins. This data set incorporates most of the sites used in Jackson (1990), adds 16 new sites, and reports the vegetation data by species in absolute units (i.e., total basal area), which allows various weightings and transformations to be applied. The data set should be of value to paleoecologists and forest ecologists in understanding, modeling, and validating the pollen-vegetation relationships that are at the heart of paleoecological inference.
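As an illustration of the kinds of weightings and transformations that absolute basal-area data permit, the following minimal Python sketch converts basal area to percentages and applies a simple inverse-distance weighting. Everything in it is an assumption made for illustration: the taxa and numbers are invented, the function names are hypothetical, the totals are assumed to be cumulative within each radius, and the inverse-distance scheme is one generic choice rather than the weighting used with this data set.

```python
# Illustrative sketch only: the values below are invented, not taken from the data set.
# Assumption: basal-area totals (m^2) are cumulative within each radius from the lake margin.
RADII_M = [20, 50, 100, 500, 1000]

cumulative_basal_area = {
    "Pinus strobus": [0.5, 1.5, 3.0, 20.0, 60.0],
    "Tsuga canadensis": [1.0, 2.0, 3.0, 10.0, 20.0],
    "Acer rubrum": [0.5, 0.5, 2.0, 10.0, 20.0],
}


def to_percent(values_by_taxon):
    """Convert absolute values per taxon to percentages of their sum."""
    total = sum(values_by_taxon.values())
    return {taxon: 100.0 * value / total for taxon, value in values_by_taxon.items()}


def ring_increments(cumulative):
    """Convert totals within nested radii to totals per concentric ring."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def inverse_distance_weighted(cumulative, radii):
    """Sum ring totals, each divided by the midpoint distance of its ring (hypothetical 1/d weighting)."""
    rings = ring_increments(cumulative)
    inner_edges = [0] + radii[:-1]
    midpoints = [(inner + outer) / 2 for inner, outer in zip(inner_edges, radii)]
    return sum(ring / distance for ring, distance in zip(rings, midpoints))


# Unweighted percentages within 100 m of the lake margin (third radius).
within_100m = {taxon: totals[2] for taxon, totals in cumulative_basal_area.items()}
percent_100m = to_percent(within_100m)

# Distance-weighted percentages using all five radii.
weighted = {
    taxon: inverse_distance_weighted(totals, RADII_M)
    for taxon, totals in cumulative_basal_area.items()
}
percent_weighted = to_percent(weighted)
```

Because the source values are absolute rather than percentages, the same table can be re-expressed under other weightings (for example, a different distance function) without any loss of information.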