Pradhan Mantri Gram Sadak Yojana


The Ministry of Rural Development has released geo-tagged data for more than 7,00,000 rural educational,
agro and health facilities that were surveyed for the selection of roads under PMGSY-III. The
dataset can be accessed from the OMMAS homepage by selecting “Other Reports” and clicking
on “Facility Details”. The report allows data to be downloaded for one state at a time. Please read the
FAQs before proceeding.

We invite government departments, academia, startups, etc. to use this data to inform policy and fill
gaps in rural India.

The dataset, titled “PMGSY Rural Facilities Dataset”, is dedicated to the government frontline engineers
who are responsible for the planning, construction, monitoring and maintenance of rural roads under
PMGSY and who also spearheaded the collection of this dataset.

Frequently Asked Questions about PMGSY Rural Facilities Dataset 2020

Q1. For what purpose was this data originally collected, and who collected it?

A. The data was collected for the purpose of selecting and ranking roads for PMGSY-III. It was collected
by frontline government road engineers or, in some cases, contracted out to a third party. The selection
mechanism is described in detail in the PMGSY-III guidelines.

Q2. In what file format is the dataset available?

A. The dataset can be downloaded as Excel workbooks or PDF files.

Q3. How many Facility Categories are there?

A. There are four categories in total: Medical, Agro, Education and Transport/Admin.

Q4. What is the unit “Habitation” and what does the Habitation code represent?

A. “Habitation” is the lowest geographical unit. It is unique to PMGSY and inherited from the first
phase of the program, which had the objective of providing connectivity to all unconnected rural
habitations with an all-weather road. Some states have used revenue villages as habitations
from PMGSY-II onwards, but the unit is still referred to as a habitation. The Habitation code is the internal
primary key used to identify habitations and is not related to LGD or Census 2011.

Q5. Why is XYZ facility missing in the dataset?

A. Facilities in urban areas were not surveyed. A facility in a rural habitation may also
be missing from this dataset for reasons explained under the process of data collection (see Q9).

Q6. Under what license has the data been released? What are the terms and conditions?

A. The data has been released under the Government Open Data License, India. The terms and conditions
can be read here:

Q7. How should this dataset be cited?

A. Users should cite this data as “PMGSY Rural Dataset, 2020”

Q8. Which facilities were surveyed?

A. The list of facilities to be surveyed as per the guidelines of the scheme can be seen on page 37
of the PMGSY-III Guidelines.

Examples: High Schools, Higher Secondary Schools, Vet Hospitals, PHCs, CHCs, Bedded Hospitals, Bus Stands,
Block HQs, Panchayat HQs, Banks, Fuel Stations, Cold Storages, Agro Industries, Pack Houses,
Collection Centres, etc.

Q9. How was this data collected?

A. A common mobile application was developed by C-DAC and used by field engineers/third-party
consultants to undertake the survey. Technical training on the application was conducted and
basic guidelines were provided on how to conduct the survey. The states were allowed to interpret
the definitions of facilities as long as they remained within the overall categories defined by the PMGSY
guidelines. For example, some states have chosen to consider taxi stands in place of bus stands, as taxis
are the primary mode of transport in the region concerned. Similarly, the definition of an agricultural
industry is very context-specific. Some states have chosen to survey public as well as private facilities,
whereas others have limited the survey to public facilities only.

Q10. Can this dataset be used for comparing rural facilities across states?

A. Any comparison should be made with an understanding of the following constraints: no strict
definitions/terminology were employed, the primary data collectors are not trained enumerators in most
cases, and accuracy may vary across blocks/districts/states.

Q11. Were the facilities surveyed audited?

A. A maker-checker mechanism was instituted at the state level and a sample of facilities was
audited centrally for accuracy. This may not mean that all facilities surveyed are accurate or

Q12. Why is data for certain geographies entirely missing?

A. The dataset was collected for the purpose of PMGSY-III, and states become eligible for the
scheme only after meeting certain conditions. Not all states/UTs have been onboarded at present.

Q13. Where can I access the digitized roads and habitations under PMGSY?

A. As of now you can only view them at That data hasn’t been opened yet.

Q14. Why do certain facilities not have lat/long attached?

A. This would mean that the survey is not yet complete in the block concerned, as all facilities to be
used for the primary purpose of the program need to be geo-tagged through the common mobile application.
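
As a rough illustration of how a user might find the blocks where records still lack coordinates, the sketch below counts such records per block. It assumes the workbook rows have already been loaded as dictionaries. The column names `block`, `latitude` and `longitude` are hypothetical and should be replaced with the headers found in the downloaded workbook.

```python
import math
from collections import Counter


def parse_coord(value):
    """Return the value as a float, or None if it is blank or not numeric."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    try:
        number = float(text)
    except ValueError:
        return None
    return None if math.isnan(number) else number


def has_coordinates(row, lat_key="latitude", lon_key="longitude"):
    """True when both latitude and longitude are present and numeric."""
    # Column names are assumptions; adjust to the actual workbook headers.
    return (
        parse_coord(row.get(lat_key)) is not None
        and parse_coord(row.get(lon_key)) is not None
    )


def missing_by_block(rows, block_key="block"):
    """Count records without usable coordinates, grouped by block."""
    return Counter(
        row.get(block_key, "unknown") for row in rows if not has_coordinates(row)
    )


# Made-up sample rows, for illustration only.
sample_rows = [
    {"block": "Block A", "latitude": "26.85", "longitude": "80.95"},
    {"block": "Block A", "latitude": "", "longitude": ""},
    {"block": "Block B", "latitude": None, "longitude": "77.10"},
]

for block, count in sorted(missing_by_block(sample_rows).items()):
    print(block, count)
```

A non-zero count for a block suggests that the survey there may still be in progress.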

Q15. Certain facilities seem to be outside the geographic extent of the country/state/district etc.
Why is that so?

A. The common mobile application uses the GPS coordinates provided by the mobile handset used by the
surveyor. The accuracy depends on the handset, the warm-up period and the region in which the facilities are
being geo-tagged.
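
Users who wish to screen out such points before analysis could start with a coarse bounding-box check, sketched below. The bounding box is only an approximate envelope around India chosen for illustration, not an official boundary, and the field names `id`, `latitude` and `longitude` are hypothetical. Checking against a state or district would need the actual boundary polygons, which this sketch does not attempt.

```python
# Approximate envelope around India, including island territories.
# These limits are an assumption for illustration, not an official boundary.
INDIA_BBOX = {"min_lat": 6.5, "max_lat": 37.5, "min_lon": 68.0, "max_lon": 97.5}


def within_bbox(lat, lon, bbox=INDIA_BBOX):
    """True when the point lies inside the bounding box (edges included)."""
    return (
        bbox["min_lat"] <= lat <= bbox["max_lat"]
        and bbox["min_lon"] <= lon <= bbox["max_lon"]
    )


def flag_outliers(rows, lat_key="latitude", lon_key="longitude"):
    """Return the rows whose coordinates fall outside the bounding box."""
    # Field names are assumptions; rows are expected to hold numeric values.
    return [row for row in rows if not within_bbox(row[lat_key], row[lon_key])]


# Made-up sample rows, for illustration only.
sample = [
    {"id": 1, "latitude": 26.85, "longitude": 80.95},  # inside the envelope
    {"id": 2, "latitude": 0.0, "longitude": 0.0},      # clearly outside
    {"id": 3, "latitude": 12.97, "longitude": 77.59},  # inside the envelope
]

for row in flag_outliers(sample):
    print("Check facility", row["id"])
```

A point that passes this check can still be in the wrong state or district, so the result is best treated as a first filter only.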