About

Project Background

The goal of the Mosaic project is to extend the collection and distribution of historical census microdata to regions such as Continental Europe and the ancient world, for which complete, centralized records have not survived. Instead, these regions typically have only partially surviving, decentralized records. The Mosaic project, through its many partners, plans to put the pieces back together again. Censuses offer a unique resource for comparative research because they can be integrated across time and space. Mosaic data employ coding schemes based on those used by IPUMS International for contemporary global microcensuses and for the historical microcensuses originally collected through the NAPP project.

About the Data

Most data released by the Mosaic project are not representative of a region or country, especially where the data stem from research concentrating on one specific settlement.

To be included in the Mosaic project, a dataset must meet baseline requirements:

  • The data source should list individual persons, preferably by name
  • The data source should list all persons of a settlement or area, not only household heads, men, or adults
  • The data source should list individuals by residence units (houses, hearths, domestic groups, or households)

Characteristics that should be either given explicitly or possible to infer:

  • Age
  • Sex
  • Relationship to household head
  • Marital status
  • Place of enumeration
  • Year of enumeration
  • (first name)
  • (last name)
     

Most data files have been transformed into a harmonized format. Some unharmonized data files are also available. The data download page describes these characteristics for each dataset. Each dataset is available as a zipped file containing three files:

  • a CSV file with codes for each variable
  • a CSV file with value labels for each variable
  • a readme file with the citation you must use for this dataset
     

Note: These files use Unicode (UTF-8) character encoding, "." as the decimal separator, and the semicolon as the column delimiter, and they list the variable names in the first line of the file. If your software uses "," as the decimal separator, open the CSV file in a text editor and replace each "." with "," before importing the file.
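
The file layout described in the note can be read with Python's standard csv module. The following is a minimal sketch, not an official Mosaic import script: the column names (person_id, age, weight) and the file name mosaic_dataset.csv are hypothetical and stand in for whatever a given dataset contains. It also shows one cautious way to switch to a decimal comma, changing only fields that parse as numbers so that text values containing a "." stay intact.

```python
import csv
import io


def read_semicolon_csv(stream):
    """Parse a semicolon-delimited CSV whose first line holds the variable names."""
    reader = csv.DictReader(stream, delimiter=";")
    return list(reader)


def decimal_point_to_comma(value):
    """Swap '.' for ',' only when the field parses as a number."""
    try:
        float(value)
    except ValueError:
        return value  # not numeric (e.g. a name or place), leave unchanged
    return value.replace(".", ",")


# Hypothetical in-memory sample; the column names are illustrative only.
SAMPLE = "person_id;age;weight\n1;34;1.5\n2;7;0.75\n"

rows = read_semicolon_csv(io.StringIO(SAMPLE))
converted = [
    {name: decimal_point_to_comma(value) for name, value in row.items()}
    for row in rows
]

# With a real file, open it as UTF-8 instead of using the in-memory sample:
# with open("mosaic_dataset.csv", encoding="utf-8", newline="") as f:
#     rows = read_semicolon_csv(f)
```

All values are returned as strings, so convert individual variables to numbers as your analysis requires.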

Harmonized files contain 30 variables, which are described in detail in a documentation file. [Download Here]

The CSV file with codes for each variable can be imported into any software able to read CSV files. Scripts are available for importing CSV files into SPSS and R. [Download both scripts here]

The CSV file with value labels for each variable is intended for users who prefer to analyze the data in a spreadsheet program such as MS Excel or LibreOffice Calc rather than in statistical software.

Managing Board

Joshua R. Goldstein (University of California, Berkeley)

Siegfried Gruber (University of Graz)

Kees Mandemakers (International Institute of Social History, Amsterdam)

Péter Öri (Demographic Research Institute, Budapest)

Steven Ruggles (Minnesota Population Center, Minneapolis)

Mikolaj Szoltysek (The Cardinal Wyszyński University in Warsaw)

Project Team

Pictured here is the former team of the Laboratory of Historical Demography at the Max Planck Institute for Demographic Research in Rostock, where the Mosaic project started in 2011.

Mosaic team standing on a beach

© Peter Wilhelm / MPIDR

From left: Fred Heiden, Joshua R. Goldstein, Rembrandt D. Scholz, Saskia C. Hin, Mikolaj Szoltysek, Siegfried Gruber, Martin Dinter, Sebastian Klüsener. Not pictured: Christa Matthys and Barbara Zuber-Goldstein.

These two photos show research assistants and student helpers who worked for the Mosaic project in spring 2013.

The coding and harmonizing team, from left: Johannes Kummerow, Antje Diebermann, Martin Dinter, Siegfried Gruber, Carolin Dettlof, Juliane Schapper, Ringo Tiedemann, Jonas Richter-Dumke, Johannes Heinsdorf, Martin Petry, Martin Peters, Mathias Voigt, Maria Bilo.

The data transcription team, from left: Carolin Dettlof, Antje Diebermann, Johannes Heinsdorf, Siegfried Gruber, Sabrina Kasch, Alexander Neumann, Miriam Pretz, Saskia Gagern, Martin Dinter.

Not pictured: Jana Amtsberg, Christine Bach, Susanne Beck, Michael Christoph, Tim Gundlach, Julia Harder, Andreas Höhn, Caroline Holz, Kevin Kockot, Katrin Krüger, Viatcheslav Obodzinskiy, Martin Stark, Thomas Stein, Nadine Tesch, Kathrin Thomsen, Stephanie Zylla.