Dataset
- class sdmxthon.model.dataset.Dataset(structure: DataStructureDefinition | None = None, dataflow: DataFlowDefinition | None = None, dataset_attributes: dict | None = None, attached_attributes: dict | None = None, data=None, unique_id: str | None = None, structure_type: str | None = None)
Bases: object
An organised collection of data.
- Parameters:
structure (DataStructureDefinition) – Associates the DataStructureDefinition with the Dataset
dataflow (DataFlowDefinition) – Associates the DataFlowDefinition with the Dataset
dataset_attributes (dict) – Contains all the attributes from the DataSet class of the Information Model. Keys allowed are “reportingBegin”, “reportingEnd”, “dataExtractionDate”, “validFrom”, “validTo”, “publicationYear”, “publicationPeriod”, “action”, “setId”, “dimensionAtObservation”
attached_attributes (dict) – Contains all the attributes at a Dataset level
data (pandas.DataFrame) – Any object compatible with pandas.DataFrame()
unique_id (str) – Internal attribute holding the full id of the dataset's structure, in the format “AgencyID:ID(Version)”
structure_type (str) – Internal attribute indicating which kind of structure the dataset refers to. Can only be “structure” or “dataflow”.
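The constraints stated for the constructor arguments can be checked before a Dataset is built. The sketch below is a hypothetical helper (check_dataset_args is not part of sdmxthon) that uses only the standard library. The allowed keys and structure_type values are copied from the list above; the regular expression for the “AgencyID:ID(Version)” format is an assumption about what counts as a valid id.

```python
import re

# Keys listed as allowed for dataset_attributes in this reference.
ALLOWED_DATASET_ATTRIBUTE_KEYS = {
    "reportingBegin", "reportingEnd", "dataExtractionDate", "validFrom",
    "validTo", "publicationYear", "publicationPeriod", "action", "setId",
    "dimensionAtObservation",
}

# Assumed shape of "AgencyID:ID(Version)", e.g. the made-up "AGENCY:MY_DF(1.0)".
UNIQUE_ID_PATTERN = re.compile(
    r"^(?P<agency>[^:()]+):(?P<id>[^:()]+)\((?P<version>[^()]+)\)$"
)

ALLOWED_STRUCTURE_TYPES = {"structure", "dataflow"}


def check_dataset_args(dataset_attributes=None, unique_id=None, structure_type=None):
    """Hypothetical pre-check: return a list of problems (empty if none)."""
    problems = []
    for key in dataset_attributes or {}:
        if key not in ALLOWED_DATASET_ATTRIBUTE_KEYS:
            problems.append(f"unknown dataset attribute: {key}")
    if unique_id is not None and not UNIQUE_ID_PATTERN.match(unique_id):
        problems.append(f"unique_id not in AgencyID:ID(Version) format: {unique_id}")
    if structure_type is not None and structure_type not in ALLOWED_STRUCTURE_TYPES:
        problems.append(f"structure_type must be 'structure' or 'dataflow': {structure_type}")
    return problems
```

For example, check_dataset_args({"action": "Replace"}, "AGENCY:MY_DF(1.0)", "dataflow") returns an empty list, while an unknown key such as "foo" produces one problem entry.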
- property attached_attributes: dict
Contains all the attributes at a Dataset level with NoSpecifiedRelationship
- property data: DataFrame
Pandas DataFrame that holds all the data
- property dataflow: DataFlowDefinition
Associates the DataFlowDefinition to the Dataset
- Class:
DataFlowDefinition
- property dataset_attributes: dict
Contains all the attributes from the DataSet class of the Information Model. Keys allowed: “reportingBegin”, “reportingEnd”, “dataExtractionDate”, “validFrom”, “validTo”, “publicationYear”, “publicationPeriod”, “action”, “setId”, “dimensionAtObservation”
- Class:
dict
- property dim_at_obs
Extracts the dimensionAtObservation from the dataset_attributes
- fmr_validation(host: str = 'localhost', port: int = 8080, use_https: bool = False, delimiter: str = 'comma', max_retries: int = 10, interval_time: float = 0.5)
Uploads the data to an FMR (Fusion Metadata Registry) instance and performs validation
- Parameters:
host (str) – The FMR instance host (default is ‘localhost’)
port (int) – The FMR instance port (default is 8080)
use_https (bool) – A boolean indicating whether to use HTTPS (default is False)
delimiter (str) – The delimiter used in the CSV file (options: ‘comma’, ‘semicolon’, ‘tab’, ‘space’; default is ‘comma’)
max_retries (int) – The maximum number of retries for checking validation status (default is 10)
interval_time (float) – The interval time between retries in seconds (default is 0.5)
- Returns:
The validation status if successful
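As the max_retries and interval_time parameters suggest, the validation status is checked repeatedly until it is available. The retry logic could look roughly like the sketch below. It is not the sdmxthon implementation: check_status is a hypothetical callable standing in for the status request sent to FMR, the mapping from delimiter names to characters is an assumption, and raising TimeoutError when retries run out is a choice made here, since this reference does not say what happens in that case.

```python
import time

# Assumed mapping from the documented delimiter names to characters.
DELIMITERS = {"comma": ",", "semicolon": ";", "tab": "\t", "space": " "}


def poll_status(check_status, max_retries=10, interval_time=0.5):
    """Call check_status() until it returns a non-None status.

    check_status is hypothetical: it returns None while validation is
    still running and the final status once it is available.
    """
    for attempt in range(max_retries):
        if attempt:
            # Wait between attempts, but not before the first one.
            time.sleep(interval_time)
        status = check_status()
        if status is not None:
            return status
    raise TimeoutError(f"no validation status after {max_retries} attempts")
```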
- read_csv(path_to_csv: str, **kwargs)
Loads the data from a CSV file. Check the Pandas read_csv docs. Kwargs are supported
- Parameters:
path_to_csv (str) – Path to CSV file
- read_excel(path_to_excel: str, **kwargs)
Loads the data from an Excel file. Check the Pandas read_excel docs. Kwargs are supported
- Parameters:
path_to_excel (str) – Path to Excel file
- read_json(path_to_json: str, **kwargs)
Loads the data from a JSON. Check the Pandas read_json docs. Kwargs are supported
- Parameters:
path_to_json (str) – Path to JSON file
- set_dimension_at_observation(dim_at_obs)
Sets the dimensionAtObservation
- Parameters:
dim_at_obs (str) – Dimension At Observation
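The dim_at_obs property reads dimensionAtObservation from dataset_attributes. A minimal stand-in class showing how that property and this setter could fit together is sketched below. It assumes the setter writes to the same dataset_attributes key, which this reference does not state explicitly; DatasetAttributesStandIn is hypothetical and is not the sdmxthon class.

```python
class DatasetAttributesStandIn:
    """Hypothetical stand-in illustrating dim_at_obs; not the sdmxthon Dataset."""

    def __init__(self, dataset_attributes=None):
        # Copy so the caller's dict is not mutated by the setter.
        self.dataset_attributes = dict(dataset_attributes or {})

    @property
    def dim_at_obs(self):
        # Read the value stored under the dimensionAtObservation key.
        return self.dataset_attributes.get("dimensionAtObservation")

    def set_dimension_at_observation(self, dim_at_obs):
        # Assumption: store under the same key that dim_at_obs reads.
        self.dataset_attributes["dimensionAtObservation"] = dim_at_obs
```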
- structural_validation()
Performs a Structural Validation on the Data.
- Returns:
A list of errors as defined in the Validation Page.
- property structure: DataStructureDefinition
Associates the DataStructureDefinition to the DataSet
- Class:
DataStructureDefinition
- property structure_type
Extracts the structure_type
- to_csv(path_to_csv: str | None = None, **kwargs)
Writes the data to a CSV file. Kwargs are supported.
- Parameters:
path_to_csv (str) – Path to save as CSV file
- to_feather(path_to_feather: str, **kwargs)
Writes the data in the Apache Feather format. Kwargs are supported.
- Parameters:
path_to_feather (str) – Path to Feather file
- to_json(path_to_json: str | None = None)
Serializes the data following the JSON specification described in the library documentation
- Parameters:
path_to_json (str) – Path to save as JSON file
- to_sdmx_csv(version: int, output_path: str | None = None)
Converts a dataset to an SDMX CSV format
- Parameters:
version (int) – The SDMX-CSV version (1 or 2)
output_path (str) – The path where the resulting SDMX CSV file will be saved
- Returns:
The SDMX CSV data as a string if no output path is provided
Important
The SDMX CSV version must be 1 or 2. Please refer to this link for more info: https://wiki.sdmxcloud.org/SDMX-CSV
Uses pandas.DataFrame.to_csv with specific parameters to ensure the file is compatible with the SDMX-CSV standard (e.g. no index, uses header, comma delimiter, custom column names for the first two columns)
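To illustrate what the leading identification columns look like, the sketch below writes rows with Python's csv module instead of pandas. It assumes that SDMX-CSV 1.0 starts each row with a DATAFLOW column holding the “AgencyID:ID(Version)” id, and that version 2.0 starts with STRUCTURE and STRUCTURE_ID columns. sketch_sdmx_csv is hypothetical, and the real to_sdmx_csv output (for example an ACTION column, or the exact value placed in STRUCTURE) may differ.

```python
import csv
import io


def sketch_sdmx_csv(rows, columns, unique_id, version=2, structure="dataflow"):
    """Hypothetical: prefix each row with SDMX-CSV identification columns."""
    if version not in (1, 2):
        raise ValueError("SDMX-CSV version must be 1 or 2")
    if version == 1:
        header = ["DATAFLOW"] + list(columns)
        prefix = [unique_id]
    else:
        header = ["STRUCTURE", "STRUCTURE_ID"] + list(columns)
        prefix = [structure, unique_id]
    buffer = io.StringIO()
    # Comma delimiter, header row, no index column.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(prefix + list(row))
    return buffer.getvalue()
```

With rows [("A", "2020", 1.5)], columns ["FREQ", "TIME_PERIOD", "OBS_VALUE"] and the made-up id "AGENCY:MY_DF(1.0)", version 2 produces a header line of STRUCTURE,STRUCTURE_ID,FREQ,TIME_PERIOD,OBS_VALUE followed by dataflow,AGENCY:MY_DF(1.0),A,2020,1.5.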
- to_xml(output_path: str = '', message_type: MessageTypeEnum = MessageTypeEnum.StructureSpecificDataSet, header: Header | None = None, id_: str = 'test', test: str = 'true', prepared: datetime | None = None, sender: str = 'Unknown', receiver: str = 'Not_supplied', prettyprint=True)
Serializes the data to SDMX-ML 2.1, using the message type given by message_type (StructureSpecific, Generic or Metadata)
- Parameters:
message_type (MessageTypeEnum) – Format of the Message in SDMX-ML
output_path (str) – Path to save the file, defaults to ‘’
prettyprint (bool) – Saves the file formatted to be human-readable
header (Header) – Header to be written, defaults to None
Important
If the header argument is not None, the remaining arguments below are ignored
- Parameters:
id_ (str) – ID of the Header, defaults to ‘test’
test (str) – Mark as test file, defaults to ‘true’
prepared (datetime) – Date and time at which the message was prepared, defaults to the current date and time
sender (str) – ID of the Sender, defaults to ‘Unknown’
receiver (str) – ID of the Receiver, defaults to ‘Not_supplied’
- Returns:
StringIO object, if output_path is ‘’
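The precedence between header and the individual header fields can be sketched as follows. resolve_header_fields is hypothetical, and the plain dict stands in for the sdmxthon Header object, whose constructor is not documented here; only the defaults and the “header wins” rule come from the parameter list above.

```python
from datetime import datetime


def resolve_header_fields(header=None, id_="test", test="true", prepared=None,
                          sender="Unknown", receiver="Not_supplied"):
    """Hypothetical: return the header to use for the message."""
    if header is not None:
        # An explicit header takes precedence; the other fields are ignored.
        return header
    return {
        "id": id_,
        "test": test,
        # prepared falls back to the current date and time.
        "prepared": prepared if prepared is not None else datetime.now(),
        "sender": sender,
        "receiver": receiver,
    }
```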
- property unique_id
Extracts the unique_id