Developer & Data Documentation
The Raras Knowledge Graph
An open, FAIR-compliant knowledge graph of rare diseases, the largest in Latin America. It links diseases, phenotypes, genes, medications, clinical trials, research and Brazilian public-health (SUS) data through open standards, and serves them as Linked Data that anyone can query, download and reuse. Think of it as a “Wikidata for rare diseases”: persistent identifiers, machine-readable provenance, and a public-domain (CC0) core.
Overview
The graph integrates Orphanet, the Human Phenotype Ontology (HPO), OMIM, MONDO, ClinVar, Open Targets, PubMed and ClinicalTrials.gov with Brazilian public-health data (DATASUS, CEAF, SIGTAP, PNTN, PCDT). Every disease carries Portuguese clinical content and crosswalks to international ontologies, so the data is usable both for Brazilian care pathways and for global federation (ERN, EJP RD, GA4GH, NCATS Translator).
Raras-originated triples are dedicated to the public domain under CC0 1.0; upstream sources keep their own open licenses (see Data sources & licensing). Full license terms are on the license page.
Quick start
No authentication, no API key. Every endpoint is public and CORS-enabled.
1 — Query with SPARQL
curl -G https://raras.org/api/sparql \
  --data-urlencode 'query=SELECT ?orphaCode ?name WHERE {
    ?d a rnc:Disease ;
       rnp:orphaCode ?orphaCode ;
       rdfs:label ?name .
  } LIMIT 5' \
  -H 'Accept: application/sparql-results+json'
2 — Get one disease as RDF
curl 'https://raras.org/api/rdf?type=Disease&format=turtle' | head -40
3 — Query with GraphQL
curl https://raras.org/api/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query":"{ disease(orphaCode:\"166\"){ name mondoId genes{ symbol } } }"}'
Data model
The graph has 8 entity classes and 45 properties. Classes are aligned with the Biolink Model, schema.org and OBO ontologies, so the same vocabulary works across SPARQL, GraphQL and TRAPI. Every term dereferences to its own RDF definition — see /class/Disease and /property/orphaCode, or browse the full vocabulary at /ontology.
Entity classes
| Class | Count | Description | Aligned with |
|---|---|---|---|
| rnc:Disease | 10,468 | A rare disease | biolink:Disease, schema:MedicalCondition |
| rnc:Phenotype | 11,652 | An HPO phenotypic feature | biolink:PhenotypicFeature |
| rnc:Gene | 5,571 | A human gene | biolink:Gene |
| rnc:Medication | 123 | A treatment / drug | biolink:ChemicalEntity |
| rnc:ClinicalTrial | 505 | A ClinicalTrials.gov study | — |
| rnc:Paper | 5,539 | A curated research paper | biolink:Publication |
| rnc:ReferenceCenter | 81 | A CNES-coded treatment center | — |
| rnc:Community | 10,471 | A patient support community | — |
Disease properties
| Property | Range | Meaning |
|---|---|---|
| rnp:orphaCode | string | Orphanet code (primary identifier) |
| rnp:mondoId | string | MONDO Disease Ontology ID |
| rnp:omimId | string | OMIM ID (identifier only) |
| rnp:icd10Code | string | CID-10 (Brazilian ICD-10) |
| rnp:prevalenceClass | string | Orphanet prevalence class |
| rnp:inheritance | string | Mode(s) of inheritance |
| rnp:ageOfOnset | string | Age(s) of onset (HPO-aligned) |
| rnp:clinicalDescription | string | Clinical description (PT-BR) |
| rnp:activeTrialCount | integer | Number of active clinical trials |
| rnp:paperCount | integer | Number of curated papers |
| rnp:susCoverageScore | float | Brazilian SUS coverage score |
| rnp:wikidataId | string | Wikidata QID |
| rdfs:label / skos:prefLabel | lang string | Disease name (PT-BR / EN) |
| owl:sameAs | IRI | Cross-references (Orphanet, MONDO, OMIM, Wikidata) |
Relationships
| Predicate | Target | Meaning |
|---|---|---|
| rnp:hasPhenotype | Phenotype | Disease presents this HPO phenotype (frequency-annotated) |
| rnp:associatedGene | Gene | Disease is associated with this gene |
| rnp:hasMedication | Medication | Treated with this medication |
| rnp:hasSUSMedication | Medication | Covered by Brazilian SUS (CEAF) |
| rnp:hasTrial | ClinicalTrial | Has an associated clinical trial |
| rnp:hasPaper | Paper | Has a curated research paper |
| rnp:hasReferenceCenter | ReferenceCenter | Treated at this reference center |
Biolink aliases (e.g. biolink:has_phenotype, biolink:treats) are accepted as synonyms for TRAPI/Translator compatibility.
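The relationship table above maps naturally onto a small query builder. The following is a minimal Python sketch, not part of the Raras tooling: the function name `related_entities_query` is hypothetical, and it assumes Orphanet codes are purely numeric strings such as "166" and that the default prefixes are available on the endpoint.

```python
# Relationship predicates copied from the table above.
RELATIONSHIPS = {
    "hasPhenotype", "associatedGene", "hasMedication", "hasSUSMedication",
    "hasTrial", "hasPaper", "hasReferenceCenter",
}

def related_entities_query(orpha_code, relationship, limit=10):
    """Build a SELECT query listing the targets of one relationship for one disease."""
    if relationship not in RELATIONSHIPS:
        raise ValueError(f"unknown relationship: {relationship!r}")
    # Assumption: Orphanet codes are numeric; this also keeps quotes out of the query.
    if not orpha_code.isdigit():
        raise ValueError("orpha_code is assumed to be numeric, e.g. '166'")
    return (
        "SELECT ?target WHERE {\n"
        "  ?d a rnc:Disease ;\n"
        f'     rnp:orphaCode "{orpha_code}" ;\n'
        f"     rnp:{relationship} ?target .\n"
        f"}} LIMIT {int(limit)}"
    )

print(related_entities_query("166", "associatedGene", limit=5))
```

The generated string can be passed as the `query` parameter of the SPARQL endpoint. Validating the predicate against a fixed set keeps typos from silently returning empty results.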
Identifiers & cross-references
Like Wikidata, every entity has a persistent, dereferenceable identifier. Resources live under https://raras.org/id/ and resolve to RDF via content negotiation (HTTP 303 to HTML, or Turtle / JSON-LD with the matching Accept header).
| Type prefix | Entity | Example RARAS ID |
|---|---|---|
| D | Disease | D00166 |
| C | ReferenceCenter | C00081 |
| P | Protocol | P00012 |
| A | Association | A00007 |
| G | Community | G00451 |
| M | Medication | M00123 |
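As an illustration of the identifier scheme in the table above, the sketch below splits a RARAS ID into its entity type and numeric part and builds a resource IRI. It rests on two assumptions that the text does not confirm: that an ID is exactly one prefix letter followed by digits, and that the IRI is the ID appended directly to `https://raras.org/id/`. The function names are hypothetical.

```python
import re

# Type prefixes copied from the table above.
TYPE_PREFIXES = {
    "D": "Disease",
    "C": "ReferenceCenter",
    "P": "Protocol",
    "A": "Association",
    "G": "Community",
    "M": "Medication",
}
ID_BASE = "https://raras.org/id/"

def parse_raras_id(raras_id):
    """Return (entity type, numeric part) for an ID such as 'D00166'."""
    match = re.fullmatch(r"([A-Z])(\d+)", raras_id)
    if match is None or match.group(1) not in TYPE_PREFIXES:
        raise ValueError(f"not a recognised RARAS ID: {raras_id!r}")
    return TYPE_PREFIXES[match.group(1)], int(match.group(2))

def resource_iri(raras_id):
    """Build the resource IRI (assumed layout: base + ID), validating the ID first."""
    parse_raras_id(raras_id)
    return ID_BASE + raras_id

print(parse_raras_id("D00166"))
print(resource_iri("M00123"))
```

A client would then request the IRI with `Accept: text/turtle` or `Accept: application/ld+json` to receive RDF.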
Each disease emits owl:sameAs links to external authorities, enabling round-trip integration with the wider Linked Open Data cloud:
- Orphanet: http://www.orpha.net/ORDO/Orphanet_{code}
- MONDO: http://purl.obolibrary.org/obo/MONDO_{id}
- OMIM: https://omim.org/entry/{id}
- HPO: http://purl.obolibrary.org/obo/HP_{id}
- Wikidata: http://www.wikidata.org/entity/{QID}
- HGNC (genes): https://www.genenames.org/.../HGNC:{id}
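The IRI patterns above can be filled in mechanically. The sketch below is one way to do it in Python, assuming that each placeholder takes the local part of the identifier only (for example "0001250" for an HPO term, not "HP:0001250"). The HGNC pattern is left out because its path is elided in the list; `same_as_iris` is a hypothetical helper name.

```python
# Templates copied from the list above; HGNC is omitted because its path is elided.
SAMEAS_TEMPLATES = {
    "orphanet": "http://www.orpha.net/ORDO/Orphanet_{}",
    "mondo": "http://purl.obolibrary.org/obo/MONDO_{}",
    "omim": "https://omim.org/entry/{}",
    "hpo": "http://purl.obolibrary.org/obo/HP_{}",
    "wikidata": "http://www.wikidata.org/entity/{}",
}

def same_as_iris(**local_ids):
    """Map authority name -> IRI for each identifier given as a keyword argument."""
    iris = {}
    for authority, local_id in local_ids.items():
        if authority not in SAMEAS_TEMPLATES:
            raise ValueError(f"no template for authority: {authority!r}")
        iris[authority] = SAMEAS_TEMPLATES[authority].format(local_id)
    return iris

print(same_as_iris(orphanet="166", hpo="0001250"))
```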
Auditability & the Disease Twin
The graph is not a one-off import. Every one of the 10,468 diseases is continuously maintained by an autonomous “Disease Twin” agent that verifies facts against official sources, keeps the data fresh, discovers new sources, and proposes research hypotheses. This is what makes the dataset auditable: every fact is traceable to a source and a verification timestamp.
- Fact verification: atomic claims (FActScore-style) are checked against authority-ranked official URLs (DOU > bvsms > gov.br > ANVISA). Claims are confidence-graded, and stale claims are demoted.
- Freshness: a control-plane coverage matrix tracks every dimension. All 10,468 diseases are re-checked within 30 days; gene data within 180 days. Populated data stays 100% within SLA.
- Source discovery: an agent autonomously finds specialized sources the fixed pipeline misses (disease registries, ERNs, locus-specific variant DBs) and verifies each URL before trusting it.
- Research hypotheses: a co-scientist layer (generate → debate → evolve) proposes novel, testable gaps per disease, grounded in the twin’s current knowledge.
Each release is stamped with provenance (dcterms:publisher, prov:wasAttributedTo) and a unique fingerprint (dcterms:hasVersion), so any copy stays traceable. See the license page for attribution terms, and the live Disease Twin transparency dashboard for real-time coverage and freshness across all diseases.
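The freshness windows mentioned in this section (30 days for diseases, 180 days for gene data) amount to a simple age check on the verification timestamp. The sketch below shows that check in Python; it is not the Disease Twin's actual implementation, the dimension labels "disease" and "gene" are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

# Re-check windows stated in the text; the dimension names are hypothetical labels.
SLA_WINDOWS = {
    "disease": timedelta(days=30),
    "gene": timedelta(days=180),
}

def is_within_sla(last_verified, dimension, now=None):
    """True if the last verification is no older than the window for `dimension`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_verified <= SLA_WINDOWS[dimension]

now = datetime(2025, 1, 31, tzinfo=timezone.utc)
verified = datetime(2025, 1, 1, tzinfo=timezone.utc)  # exactly 30 days earlier
print(is_within_sla(verified, "disease", now))
```

A consumer of the data could apply the same check to any verification timestamp it reads from the graph to decide whether a fact is still inside its window.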
Access methods
The same data is exposed through multiple open protocols — pick the one that fits your use case.
| Endpoint | Protocol | Best for |
|---|---|---|
| /api/sparql | SPARQL 1.1 | Researchers, semantic web, federation |
| /api/rdf | RDF / Linked Data | Ontology ingestion, triple stores |
| /api/graphql | GraphQL | App & frontend developers |
| /api/graph/public | JSON | Network visualization |
| /api/downloads | Bulk dump | Full-graph download (N-Triples, CSV) |
| /.well-known/void | VOID / DCAT | Dataset metadata & discovery |
| /api/beacon | GA4GH Beacon v2 | Variant / disease discovery |
| /api/phenopackets | GA4GH Phenopackets v2 | Clinical phenotype exchange |
SPARQL support
- Supported: SELECT (DISTINCT, aggregates COUNT/SUM/AVG/MIN/MAX), ASK, CONSTRUCT, DESCRIBE, OPTIONAL, FILTER (=, !=, <, >, CONTAINS, STRSTARTS, STRENDS, REGEX, BOUND), ORDER BY, GROUP BY, LIMIT, OFFSET.
- Not yet supported (returns HTTP 501): UNION, MINUS, SERVICE, subqueries, property paths.
- Rate limit: 120 requests/min per IP; 15s query timeout.
- The bare endpoint (no query) returns a machine-readable service description.
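Because unsupported features return HTTP 501, a client may want to catch the obvious cases before sending a query. The following Python sketch is a rough keyword heuristic, not a SPARQL parser: it only looks for the UNION, MINUS and SERVICE keywords outside double-quoted literals, and it does not detect subqueries or property paths. The function name is hypothetical.

```python
import re

# Keywords the endpoint is documented to reject with HTTP 501.
UNSUPPORTED_KEYWORDS = ("UNION", "MINUS", "SERVICE")

# Matches a double-quoted string literal, including escaped characters.
_STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')

def unsupported_keywords(query):
    """Return the unsupported keywords that appear in `query` outside string literals."""
    stripped = _STRING_LITERAL.sub('""', query)
    return [
        keyword
        for keyword in UNSUPPORTED_KEYWORDS
        if re.search(rf"\b{keyword}\b", stripped, flags=re.IGNORECASE)
    ]

print(unsupported_keywords("SELECT * WHERE { { ?s ?p ?o } UNION { ?o ?p ?s } }"))
```

An empty list does not guarantee that the query will be accepted; it only rules out the three keyword cases.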
Namespaces
Default prefixes are pre-declared on the SPARQL endpoint, so you can use them without a PREFIX header.
| Prefix | IRI |
|---|---|
| rnc: | https://raras.org/class/ |
| rnp: | https://raras.org/property/ |
| rn: | https://raras.org/resource/ |
| raras: | https://raras.org/id/ |
| orphanet: | http://www.orpha.net/ORDO/Orphanet_ |
| mondo: | http://purl.obolibrary.org/obo/MONDO_ |
| hp: | http://purl.obolibrary.org/obo/HP_ |
| omim: | https://omim.org/entry/ |
| wd: | http://www.wikidata.org/entity/ |
| rdfs:, owl:, skos:, dcterms:, void:, schema: | standard W3C / metadata vocabularies |
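When processing results outside the endpoint, it can help to expand prefixed names to full IRIs with the same mapping. The Python sketch below covers only the Raras-specific and ontology prefixes from the table; `expand_curie` is a hypothetical helper, and the standard W3C prefixes are left out for brevity.

```python
# Prefix-to-IRI mapping copied from the table above.
PREFIXES = {
    "rnc": "https://raras.org/class/",
    "rnp": "https://raras.org/property/",
    "rn": "https://raras.org/resource/",
    "raras": "https://raras.org/id/",
    "orphanet": "http://www.orpha.net/ORDO/Orphanet_",
    "mondo": "http://purl.obolibrary.org/obo/MONDO_",
    "hp": "http://purl.obolibrary.org/obo/HP_",
    "omim": "https://omim.org/entry/",
    "wd": "http://www.wikidata.org/entity/",
}

def expand_curie(curie):
    """Expand a prefixed name such as 'rnp:orphaCode' to its full IRI."""
    prefix, separator, local = curie.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local

print(expand_curie("rnp:orphaCode"))
```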
SPARQL examples
Diseases linked to a specific HPO phenotype
SELECT ?orphaCode ?name WHERE {
  ?d a rnc:Disease ;
     rnp:orphaCode ?orphaCode ;
     rdfs:label ?name ;
     rnp:hasPhenotype ?p .
  ?p rnp:hpoId "HP:0001250" .
}
Count diseases with active clinical trials
SELECT (COUNT(?d) AS ?count) WHERE {
  ?d a rnc:Disease ;
     rnp:activeTrialCount ?n .
  FILTER(?n > 0)
}
CONSTRUCT RDF for one disease
CONSTRUCT {
  ?d rdfs:label ?name ;
     rnp:orphaCode ?orphaCode ;
     rnp:mondoId ?mondoId .
} WHERE {
  ?d a rnc:Disease ;
     rnp:orphaCode ?orphaCode ;
     rdfs:label ?name .
  FILTER(?orphaCode = "166")
  OPTIONAL { ?d rnp:mondoId ?mondoId }
}
Check existence (ASK)
ASK { ?d rnp:orphaCode "166" }
GraphQL
The GraphQL endpoint at /api/graphql has introspection enabled. Key queries: searchDiseases, disease(orphaCode), diseaseByMondo, diseasesByPhenotypes (differential-diagnosis ranking), similarDiseases (vector similarity), searchPhenotypes, referenceCenters, stats and serviceInfo.
{
  disease(orphaCode: "166") {
    name
    mondoId
    cid10
    phenotypes(limit: 5) { phenotype { hpoId name } frequency }
    genes { symbol hgncId }
    susCoverage { coverageScore ceafMedicationCount }
  }
}
Formats & content negotiation
RDF endpoints negotiate by Accept header or a ?format= query parameter:
| Format | Accept header | ?format= |
|---|---|---|
| Turtle | text/turtle | turtle |
| JSON-LD | application/ld+json | jsonld |
| N-Triples | application/n-triples | nt |
| SPARQL Results JSON | application/sparql-results+json | (default for SELECT) |
| CSV / TSV | text/csv, text/tab-separated-values | — |
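On the client side, the table above reduces to picking an Accept header and, for SELECT queries, flattening the SPARQL Results JSON document. The Python sketch below shows both steps without any network call. The result shape follows the W3C SPARQL 1.1 Query Results JSON Format; the sample payload is hand-written for illustration and is not real output from the endpoint, and the helper names are hypothetical.

```python
import json

# Accept headers for the RDF formats, copied from the table above.
ACCEPT_BY_FORMAT = {
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
    "nt": "application/n-triples",
}

def accept_header(fmt):
    """Return the Accept header value for a ?format= name from the table."""
    if fmt not in ACCEPT_BY_FORMAT:
        raise ValueError(f"unknown format: {fmt!r}")
    return ACCEPT_BY_FORMAT[fmt]

def rows_from_sparql_json(payload):
    """Flatten a SPARQL Results JSON document into a list of plain dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a row (e.g. under OPTIONAL), so skip missing keys.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows

# Hand-written sample in the W3C result shape; the values are placeholders.
SAMPLE = """{
  "head": {"vars": ["orphaCode", "name"]},
  "results": {"bindings": [
    {"orphaCode": {"type": "literal", "value": "166"},
     "name": {"type": "literal", "value": "Example disease"}},
    {"orphaCode": {"type": "literal", "value": "999"}}
  ]}
}"""

print(rows_from_sparql_json(SAMPLE))
```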
Data sources & licensing
Per-partition licensing follows the Wikidata / Open Targets convention: the Raras-originated layer is CC0, upstream sources keep their licenses. The combined dataset / bulk dump is distributed under CC-BY 4.0 (it embeds CC-BY sources, so attribution is required), while Raras-originated triples remain CC0. These terms are emitted machine-readably in the VOID descriptor as void:subset + dcterms:license. See the license page for details.
| Partition | License |
|---|---|
| Raras-originated (IDs, crosswalks, PT translations, SUS integration) | CC0 1.0 |
| Orphanet (nomenclature, cross-refs) | CC BY 4.0 |
| Human Phenotype Ontology (HPO) | CC BY 4.0 |
| MONDO | CC BY 4.0 |
| Wikidata | CC0 1.0 |
| Open Targets | CC0 1.0 |
| ClinVar | Public domain |
| PubMed (metadata) | Public domain |
| OMIM | Identifiers only — no text redistributed |
Rate limits & versioning
- Rate limit: 120 requests/min per IP on the SPARQL endpoint; 15s query timeout.
- CORS: all read endpoints send Access-Control-Allow-Origin: *.
- Versioning: dataset version and last-modified date are published in the VOID descriptor (dcterms:modified).
- Read-only: SPARQL UPDATE (INSERT/DELETE/LOAD) is rejected with HTTP 403.
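A client that sends many SPARQL queries can stay under the 120 requests/min limit by spacing its calls. The sketch below is one conservative approach in Python: it enforces a minimum gap of 60/120 = 0.5 seconds between consecutive requests, which is stricter than the stated per-minute limit strictly requires. The `Throttle` class is hypothetical; the clock and sleep functions are injectable so that the behaviour can be checked without waiting.

```python
import time

class Throttle:
    """Space calls so that at most `max_per_minute` happen in any minute."""

    def __init__(self, max_per_minute=120, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

throttle = Throttle(max_per_minute=120)
throttle.wait()  # call this immediately before each request
```

Calling `throttle.wait()` before every request keeps a single process within the limit; several processes behind one IP would need to share the budget.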
How to cite
When reusing the dataset, please cite:
Raras — Brazilian Rare Disease Knowledge Graph (RarasNet).
https://raras.org — Raras-originated triples CC0 1.0; combined dataset CC-BY 4.0.
Upstream sources retain their own licenses (see /license).
Questions, federation requests or data corrections: [email protected]. Full license terms: /license.