Sources

Data on these quangoes is hard to come by, as many of them publish no statistics at all. However, the sources below can be pieced together into a coherent picture.

quangoes.csv
quangocrats.csv
decisions.csv
graphs.json
afuera.db
queries.sql

The government website lists all public bodies
The Public Bodies Act 2011 worked for a while
Colin Mackie's list of civil servants since 1900
The Civil Service Yearbook has been published since 1972
The Taxpayers' Alliance often publishes financial analysis
The Cabinet Office produces its own summary
Despite the mess, data.gov.uk can be useful
Generally, government data is years out of date
Occasionally good data can be found on GitHub

Credits

This data was collected, cleaned, and processed for The Restorationist by Alex Coppen, an English engineer based in California.

Legal oversight was provided for The Restorationist by Michael Reiners, an English Barrister based in London.

Open Source

This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). The copyright holder retains all intellectual property rights to this work; however, recipients are granted the following permissions:

To reproduce, distribute, and communicate the material via any medium or format
To adapt, transform, and build upon the material for any purpose, including commercial utilisation

The aforementioned permissions are contingent upon adherence to the following condition: appropriate attribution must be provided by clearly indicating the original creator (Alex Coppen), incorporating a link to the licence, and specifying whether modifications have been implemented. Such attribution shall be provided in a reasonable manner, but not in any way that suggests the licensor endorses you or your utilisation of the work.

For the complete terms and conditions of this licence, please refer to: https://creativecommons.org/licenses/by/4.0/legalcode

Disclaimer

Whilst considerable effort has been undertaken to ensure the accuracy and quality of this dataset, it is provided "as is" without any warranty, express or implied. The dataset may contain errors, omissions, or inconsistencies. Users are advised to exercise independent judgement when utilising this dataset and to verify any critical information independently. The copyright holder shall not be liable for any damages or losses arising from the use of, or inability to use, this data.

Usage in AI Systems

This dataset can easily be optimised for Retrieval-Augmented Generation (RAG) applications. To implement it effectively:

Convert the data into vector embeddings using a model compatible with your RAG architecture. For optimal results, ensure documents are chunked appropriately (250-1000 tokens depending on your use case).
Load the embeddings into a vector database such as Pinecone, Weaviate, or Chroma for efficient similarity searching.
When implementing the retrieval component, experiment with different similarity metrics (cosine, dot product, euclidean) to determine which works best with this dataset.
When generating responses, provide the retrieved context alongside your query to your LLM, adjusting the number of retrieved documents based on your accuracy requirements.
Monitor relevance metrics to ensure quality retrievals and continuously refine your embedding and chunking strategies as needed.
For tracking people and relationships between data points, use a graph database such as Neo4j.
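
The retrieval steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended implementation: the embed function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, whitespace-separated words approximate tokens, and the three documents are placeholders rather than real rows from this dataset. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_tokens=250):
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def embed(text):
    """Toy embedding: term counts. A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Place the retrieved context alongside the query, ready to send to an LLM."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {query}"


# Placeholder documents for illustration only, NOT real rows from the dataset.
documents = [
    "Example Body A regulates widgets and reports to Department X",
    "Example Body B funds museums and reports to Department Y",
    "Example Body C advises on transport safety",
]

# The "vector store": a list of (chunk, embedding) pairs held in memory.
index = [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]

question = "which body funds museums"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping cosine for a dot-product or Euclidean function in retrieve is one way to compare similarity metrics, and changing k adjusts how many retrieved documents reach the LLM.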

Contributions are welcome and encouraged.
