Dataset

class sdmxthon.model.dataset.Dataset(structure: DataStructureDefinition | None = None, dataflow: DataFlowDefinition | None = None, dataset_attributes: dict | None = None, attached_attributes: dict | None = None, data=None, unique_id: str | None = None, structure_type: str | None = None)

Bases: object

An organised collection of data.

Parameters:
  • structure (DataStructureDefinition) – Associates a DataStructureDefinition with the Dataset

  • dataflow (DataFlowDefinition) – Associates a DataFlowDefinition with the Dataset

  • dataset_attributes (dict) – Contains all the attributes from the DataSet class of the Information Model. Allowed keys are “reportingBegin”, “reportingEnd”, “dataExtractionDate”, “validFrom”, “validTo”, “publicationYear”, “publicationPeriod”, “action”, “setId”, “dimensionAtObservation”

  • attached_attributes (dict) – Contains all the attributes attached at Dataset level

  • data (pandas.DataFrame) – Any object that can be passed to pandas.DataFrame()

  • unique_id (str) – Internal attribute holding a full ID for the dataset, in the format “AgencyID:ID(Version)”

  • structure_type (str) – Internal attribute holding the structure type of the dataset. It can only be “structure” or “dataflow”.
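The unique_id format and the structure_type constraint above can be checked with a few lines of standard-library Python. This is a minimal sketch, not sdmxthon code: parse_unique_id and check_structure_type are hypothetical helpers, and the regular expression assumes that the agency, ID and version never contain ':' or parentheses.

```python
from __future__ import annotations

import re

# Hypothetical pattern for "AgencyID:ID(Version)"; assumes no ':' or
# parentheses inside the individual parts.
UNIQUE_ID_PATTERN = re.compile(
    r"^(?P<agency>[^:()]+):(?P<id>[^:()]+)\((?P<version>[^()]+)\)$"
)

ALLOWED_STRUCTURE_TYPES = {"structure", "dataflow"}


def parse_unique_id(unique_id: str) -> tuple[str, str, str]:
    """Return (agency, id, version), or raise ValueError if the format is wrong."""
    match = UNIQUE_ID_PATTERN.match(unique_id)
    if match is None:
        raise ValueError(f"Expected 'AgencyID:ID(Version)', got {unique_id!r}")
    return match["agency"], match["id"], match["version"]


def check_structure_type(structure_type: str) -> str:
    """Reject any structure_type other than 'structure' or 'dataflow'."""
    if structure_type not in ALLOWED_STRUCTURE_TYPES:
        raise ValueError(
            f"structure_type must be 'structure' or 'dataflow', got {structure_type!r}"
        )
    return structure_type


print(parse_unique_id("ECB:EXR(1.0)"))  # ('ECB', 'EXR', '1.0')
```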

property attached_attributes: dict

Contains all the attributes attached at Dataset level (those whose attribute relationship is NoSpecifiedRelationship)

property data: DataFrame

Pandas DataFrame that holds all the data

property dataflow: DataFlowDefinition

Associates the DataFlowDefinition with the Dataset

Class:

DataFlowDefinition

property dataset_attributes: dict

Contains all the attributes from the DataSet class of the Information Model. Allowed keys: “reportingBegin”, “reportingEnd”, “dataExtractionDate”, “validFrom”, “validTo”, “publicationYear”, “publicationPeriod”, “action”, “setId”, “dimensionAtObservation”

Class:

dict
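A minimal sketch of checking a dataset_attributes dict against the allowed keys listed above. The helper unknown_dataset_attributes is hypothetical; this section does not say whether Dataset itself rejects or ignores unknown keys.

```python
from __future__ import annotations

# Allowed keys, copied from the dataset_attributes description above.
ALLOWED_DATASET_ATTRIBUTES = frozenset({
    "reportingBegin", "reportingEnd", "dataExtractionDate", "validFrom",
    "validTo", "publicationYear", "publicationPeriod", "action", "setId",
    "dimensionAtObservation",
})


def unknown_dataset_attributes(attributes: dict) -> list[str]:
    """Hypothetical check: return the keys not in the allowed set, sorted."""
    return sorted(key for key in attributes if key not in ALLOWED_DATASET_ATTRIBUTES)


attrs = {"action": "Replace", "dimensionAtObservation": "TIME_PERIOD", "reportBegin": "2020"}
print(unknown_dataset_attributes(attrs))  # ['reportBegin']
```

Here the misspelled "reportBegin" is flagged, while the two valid keys pass.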

property dim_at_obs

Extracts the dimensionAtObservation from the dataset_attributes

fmr_validation(host: str = 'localhost', port: int = 8080, use_https: bool = False, delimiter: str = 'comma', max_retries: int = 10, interval_time: float = 0.5)

Uploads the data to a Fusion Metadata Registry (FMR) instance and runs its validation

Parameters:
  • host (str) – The FMR instance host (default is ‘localhost’)

  • port (int) – The FMR instance port (default is 8080)

  • use_https (bool) – A boolean indicating whether to use HTTPS (default is False)

  • delimiter (str) – The delimiter used in the CSV file (options: ‘comma’, ‘semicolon’, ‘tab’, ‘space’; default is ‘comma’)

  • max_retries (int) – The maximum number of retries for checking validation status (default is 10)

  • interval_time (float) – The interval time between retries in seconds (default is 0.5)

Returns:

The validation status if successful
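The max_retries and interval_time parameters describe a polling loop. The sketch below shows that pattern with the standard library only, assuming the status check can be wrapped in a callable that returns None while validation is still running. check_status, poll_status and the "Complete" status are hypothetical and say nothing about the actual FMR REST API.

```python
import time
from typing import Callable, Optional


def poll_status(check_status: Callable[[], Optional[str]],
                max_retries: int = 10,
                interval_time: float = 0.5) -> str:
    """Call check_status until it returns a status or the retries run out.

    check_status is a hypothetical callable standing in for the request to FMR.
    """
    for _ in range(max_retries):
        status = check_status()
        if status is not None:
            return status
        # Wait before asking again.
        time.sleep(interval_time)
    raise TimeoutError(f"No validation status after {max_retries} attempts")


# Usage: a fake status source that becomes ready on the third call.
responses = iter([None, None, "Complete"])
print(poll_status(lambda: next(responses), interval_time=0.01))  # Complete
```

With the defaults, the loop waits at most about 10 × 0.5 = 5 seconds before giving up.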

read_csv(path_to_csv: str, **kwargs)

Loads the data from a CSV file. Kwargs are supported and passed to pandas.read_csv; see its documentation for the available options.

Parameters:

path_to_csv (str) – Path to CSV file
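To illustrate how forwarded kwargs change parsing, the sketch below uses csv.DictReader as a stand-in for pandas.read_csv; load_rows is a hypothetical helper. Assuming read_csv passes its kwargs on unchanged, the equivalent call on a Dataset would be dataset.read_csv("obs.csv", sep=";"), since sep is a pandas.read_csv argument.

```python
from __future__ import annotations

import csv
import os
import tempfile


def load_rows(path_to_csv: str, **kwargs) -> list[dict]:
    """Hypothetical stand-in for read_csv: extra keyword arguments are passed
    straight through to the underlying reader (csv.DictReader here)."""
    with open(path_to_csv, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, **kwargs))


# Write a small semicolon-delimited file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as tmp:
    tmp.write("FREQ;TIME_PERIOD;OBS_VALUE\nA;2020;1.5\nA;2021;1.7\n")
    path = tmp.name

rows = load_rows(path, delimiter=";")  # the kwarg changes how the file is split
os.remove(path)
print(rows[0])  # {'FREQ': 'A', 'TIME_PERIOD': '2020', 'OBS_VALUE': '1.5'}
```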

read_excel(path_to_excel: str, **kwargs)

Loads the data from an Excel file. Kwargs are supported and passed to pandas.read_excel; see its documentation for the available options.

Parameters:

path_to_excel (str) – Path to Excel file

read_json(path_to_json: str, **kwargs)

Loads the data from a JSON file. Kwargs are supported and passed to pandas.read_json; see its documentation for the available options.

Parameters:

path_to_json (str) – Path to JSON file

set_dimension_at_observation(dim_at_obs)

Sets the dimensionAtObservation.

Parameters:

dim_at_obs (str) – Dimension at observation
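The relationship between dim_at_obs and set_dimension_at_observation can be sketched with a hypothetical minimal class. DatasetSketch is not sdmxthon's Dataset, and returning None when the key is missing is an assumption of this sketch.

```python
class DatasetSketch:
    """Hypothetical minimal class, not sdmxthon's Dataset: it only mimics how
    dim_at_obs and set_dimension_at_observation relate to dataset_attributes."""

    def __init__(self, dataset_attributes=None):
        # Copy so the caller's dict is not modified.
        self.dataset_attributes = dict(dataset_attributes or {})

    @property
    def dim_at_obs(self):
        # Read dimensionAtObservation from dataset_attributes (None if missing here).
        return self.dataset_attributes.get("dimensionAtObservation")

    def set_dimension_at_observation(self, dim_at_obs: str) -> None:
        # Store the value under the same key the property reads.
        self.dataset_attributes["dimensionAtObservation"] = dim_at_obs


ds = DatasetSketch({"action": "Replace"})
print(ds.dim_at_obs)  # None
ds.set_dimension_at_observation("TIME_PERIOD")
print(ds.dim_at_obs)  # TIME_PERIOD
```

In SDMX, dimensionAtObservation usually names a dimension such as TIME_PERIOD, or takes the special value AllDimensions for a flat representation.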

structural_validation()

Performs a Structural Validation on the Data.

Returns:

A list of errors, as defined on the Validation page.
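The kind of checks a structural validation performs can be sketched as follows. validate_rows and the error dictionaries it returns are hypothetical; the real error format is the one described on the Validation page. The sketch only checks for missing columns and for codes outside a codelist.

```python
def validate_rows(rows, codelists, required_columns):
    """Hypothetical structural check, not sdmxthon's implementation.

    rows: list of dicts, one per observation
    codelists: column name -> set of allowed codes
    required_columns: columns every row must contain
    Returns a list of error dicts; an empty list means no problems were found.
    """
    errors = []
    for index, row in enumerate(rows):
        for column in required_columns:
            if column not in row:
                errors.append({"row": index, "column": column, "error": "missing column"})
        for column, allowed in codelists.items():
            value = row.get(column)
            if value is not None and value not in allowed:
                errors.append({"row": index, "column": column,
                               "error": f"code {value!r} not in codelist"})
    return errors


rows = [
    {"FREQ": "A", "TIME_PERIOD": "2020", "OBS_VALUE": "1.5"},
    {"FREQ": "X", "TIME_PERIOD": "2021", "OBS_VALUE": "1.7"},  # invalid code
    {"FREQ": "A", "OBS_VALUE": "1.9"},                         # missing TIME_PERIOD
]
errors = validate_rows(rows, {"FREQ": {"A", "Q", "M"}}, ["FREQ", "TIME_PERIOD", "OBS_VALUE"])
for error in errors:
    print(error)
```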

property structure: DataStructureDefinition

Associates the DataStructureDefinition with the Dataset

Class:

DataStructureDefinition

property structure_type

Extracts the structure_type

to_csv(path_to_csv: str | None = None, **kwargs)

Writes the data to a CSV file. Kwargs are supported.

Parameters:

path_to_csv (str) – Path where the CSV file is saved

to_feather(path_to_feather: str, **kwargs)

Writes the data in Apache Feather format. Kwargs are supported.

Parameters:

path_to_feather (str) – Path to Feather file

to_json(path_to_json: str | None = None)

Serializes the data following the JSON specification given in the library documentation

Parameters:

path_to_json (str) – Path where the JSON file is saved

to_sdmx_csv(version: int, output_path: str | None = None)

Converts the dataset to SDMX-CSV format

Parameters:
  • version (int) – The SDMX-CSV version, either 1 or 2

  • output_path (str) – The path where the resulting SDMX-CSV file will be saved

Returns:

The SDMX CSV data as a string if no output path is provided

Important

The SDMX-CSV version must be 1 or 2. For more information, see https://wiki.sdmxcloud.org/SDMX-CSV

Internally uses pandas.DataFrame.to_csv with specific parameters so that the file complies with the SDMX-CSV standard (e.g. no index, a header row, a comma delimiter, and custom names for the first two columns)
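The column layout described above can be sketched with the standard csv module. to_sdmx_csv_sketch is hypothetical and not sdmxthon's implementation. The DATAFLOW column for version 1 and the STRUCTURE and STRUCTURE_ID columns for version 2 follow the SDMX-CSV specifications; any further columns those specifications define are left out.

```python
import csv
import io


def to_sdmx_csv_sketch(rows, columns, unique_id, version=1, structure="dataflow"):
    """Hypothetical writer, not sdmxthon's: prepends the SDMX-CSV identification
    columns to each row and writes comma-separated text without an index."""
    if version == 1:
        # SDMX-CSV 1.0: a single DATAFLOW column holding "Agency:ID(Version)".
        header = ["DATAFLOW"] + columns
        prefix = [unique_id]
    elif version == 2:
        # SDMX-CSV 2.0: STRUCTURE (type of structure) and STRUCTURE_ID columns.
        header = ["STRUCTURE", "STRUCTURE_ID"] + columns
        prefix = [structure, unique_id]
    else:
        raise ValueError("version must be 1 or 2")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # comma is the default delimiter
    writer.writerow(header)
    for row in rows:
        writer.writerow(prefix + [row[c] for c in columns])
    return buffer.getvalue()


rows = [{"FREQ": "A", "TIME_PERIOD": "2020", "OBS_VALUE": "1.5"}]
print(to_sdmx_csv_sketch(rows, ["FREQ", "TIME_PERIOD", "OBS_VALUE"], "ECB:EXR(1.0)", version=2))
```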

to_xml(output_path: str = '', message_type: MessageTypeEnum = MessageTypeEnum.StructureSpecificDataSet, header: Header | None = None, id_: str = 'test', test: str = 'true', prepared: datetime | None = None, sender: str = 'Unknown', receiver: str = 'Not_supplied', prettyprint=True)

Serializes the data to SDMX-ML 2.1 using the given message_type (StructureSpecific, Generic or Metadata)

Parameters:
  • message_type (MessageTypeEnum) – Format of the Message in SDMX-ML

  • output_path (str) – Path to save the file, defaults to ‘’

  • prettyprint (bool) – If True, formats the output to be human-readable, defaults to True

  • header (Header) – Header to be written, defaults to None

Important

If the header argument is not None, the arguments below are not used

Parameters:
  • id_ (str) – ID of the Header, defaults to ‘test’

  • test (str) – Mark as test file, defaults to ‘true’

  • prepared (datetime) – Datetime of the preparation of the Message, defaults to current date and time

  • sender (str) – ID of the Sender, defaults to ‘Unknown’

  • receiver (str) – ID of the Receiver, defaults to ‘Not_supplied’

Returns:

A StringIO object, if output_path is ‘’
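The precedence rule in the Important note can be sketched as follows. resolve_header is hypothetical, and a plain dict stands in for the Header class, whose attributes are not described in this section.

```python
from datetime import datetime


def resolve_header(header=None, id_="test", test="true", prepared=None,
                   sender="Unknown", receiver="Not_supplied"):
    """Hypothetical sketch of the documented rule: an explicit header wins and
    the individual header arguments are ignored."""
    if header is not None:
        return header
    return {
        "id": id_,
        "test": test,
        # prepared defaults to the current date and time.
        "prepared": prepared if prepared is not None else datetime.now(),
        "sender": sender,
        "receiver": receiver,
    }


custom = {"id": "IREF001", "test": "false", "sender": "ECB"}
print(resolve_header(custom, id_="ignored") is custom)  # True
print(resolve_header(sender="BIS")["sender"])  # BIS
```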

property unique_id

Extracts the unique_id