ConnectionRefusedError when loading a dataset for the second time


Hello,

I use the following lines to load a PennyLane dataset:

import pennylane as qml
H2datasets = qml.data.load("qchem", molname="H2O", basis="STO-3G", bondlength=1.1)

This works fine the first time I execute it: the dataset is saved in ‘./datasets/qchem/H2O’, as expected. From what I understand from the documentation, executing the line again should now load the dataset from the saved file. However, I get the following error message:

ConnectionRefusedError                    Traceback (most recent call last)
File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/aiohttp/connector.py:1025, in TCPConnector._wrap_create_connection(self, req, timeout, client_error, *args, **kwargs)
   1022     async with ceil_timeout(
   1023         timeout.sock_connect, ceil_threshold=timeout.ceil_threshold
   1024     ):
-> 1025         return await self._loop.create_connection(*args, **kwargs)
   1026 except cert_errors as exc:

File ~/.pyenv/versions/3.11.9/lib/python3.11/asyncio/base_events.py:1086, in BaseEventLoop.create_connection(self, protocol_factory, host, port, ssl, family, proto, flags, sock, local_addr, server_hostname, ssl_handshake_timeout, ssl_shutdown_timeout, happy_eyeballs_delay, interleave)
   1085 if len(exceptions) == 1:
-> 1086     raise exceptions[0]
   1087 else:
   1088     # If they all have the same str(), raise one.

File ~/.pyenv/versions/3.11.9/lib/python3.11/asyncio/base_events.py:1070, in BaseEventLoop.create_connection(self, protocol_factory, host, port, ssl, family, proto, flags, sock, local_addr, server_hostname, ssl_handshake_timeout, ssl_shutdown_timeout, happy_eyeballs_delay, interleave)
   1069 try:
-> 1070     sock = await self._connect_sock(
   1071         exceptions, addrinfo, laddr_infos)
   1072     break

File ~/.pyenv/versions/3.11.9/lib/python3.11/asyncio/base_events.py:974, in BaseEventLoop._connect_sock(self, exceptions, addr_info, local_addr_infos)
    973             raise OSError(f"no matching local address with {family=} found")
--> 974 await self.sock_connect(sock, address)
    975 return sock

File ~/.pyenv/versions/3.11.9/lib/python3.11/asyncio/selector_events.py:638, in BaseSelectorEventLoop.sock_connect(self, sock, address)
    637 try:
--> 638     return await fut
    639 finally:
    640     # Needed to break cycles when an exception occurs.

File ~/.pyenv/versions/3.11.9/lib/python3.11/asyncio/selector_events.py:678, in BaseSelectorEventLoop._sock_connect_cb(self, fut, sock, address)
    676     if err != 0:
    677         # Jump to any except clause below.
--> 678         raise OSError(err, f'Connect call failed {address}')
    679 except (BlockingIOError, InterruptedError):
    680     # socket is still registered, the callback will be retried later

ConnectionRefusedError: [Errno 111] Connect call failed ('18.66.102.109', 443)

The above exception was the direct cause of the following exception:

ClientConnectorError                      Traceback (most recent call last)
File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/implementations/http.py:422, in HTTPFileSystem._info(self, url, **kwargs)
    420 try:
    421     info.update(
--> 422         await _file_info(
    423             self.encode_url(url),
    424             size_policy=policy,
    425             session=session,
    426             **self.kwargs,
    427             **kwargs,
    428         )
    429     )
    430     if info.get("size") is not None:

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/implementations/http.py:831, in _file_info(url, session, size_policy, **kwargs)
    830 elif size_policy == "get":
--> 831     r = await session.get(url, allow_redirects=ar, **kwargs)
    832 else:

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/aiohttp/client.py:581, in ClientSession._request(self, method, str_or_url, params, data, json, cookies, headers, skip_auto_headers, auth, allow_redirects, max_redirects, compress, chunked, expect100, raise_for_status, read_until_eof, proxy, proxy_auth, timeout, verify_ssl, fingerprint, ssl_context, ssl, server_hostname, proxy_headers, trace_request_ctx, read_bufsize, auto_decompress, max_line_size, max_field_size)
    580         assert self._connector is not None
--> 581         conn = await self._connector.connect(
    582             req, traces=traces, timeout=real_timeout
    583         )
    584 except asyncio.TimeoutError as exc:

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/aiohttp/connector.py:544, in BaseConnector.connect(self, req, traces, timeout)
    543 try:
--> 544     proto = await self._create_connection(req, traces, timeout)
    545     if self._closed:

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/aiohttp/connector.py:944, in TCPConnector._create_connection(self, req, traces, timeout)
    943 else:
--> 944     _, proto = await self._create_direct_connection(req, traces, timeout)
    946 return proto

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/aiohttp/connector.py:1257, in TCPConnector._create_direct_connection(self, req, traces, timeout, client_error)
   1256 assert last_exc is not None
-> 1257 raise last_exc

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/aiohttp/connector.py:1226, in TCPConnector._create_direct_connection(self, req, traces, timeout, client_error)
   1225 try:
-> 1226     transp, proto = await self._wrap_create_connection(
   1227         self._factory,
   1228         host,
   1229         port,
   1230         timeout=timeout,
   1231         ssl=sslcontext,
   1232         family=hinfo["family"],
   1233         proto=hinfo["proto"],
   1234         flags=hinfo["flags"],
   1235         server_hostname=server_hostname,
   1236         local_addr=self._local_addr,
   1237         req=req,
   1238         client_error=client_error,
   1239     )
   1240 except ClientConnectorError as exc:

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/aiohttp/connector.py:1033, in TCPConnector._wrap_create_connection(self, req, timeout, client_error, *args, **kwargs)
   1032     raise
-> 1033 raise client_error(req.connection_key, exc) from exc

ClientConnectorError: Cannot connect to host datasets.cloud.pennylane.ai:443 ssl:default [Connect call failed ('18.66.102.109', 443)]

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
/home/cai/PennylaneDatasetMolecularScaling/test_loading_dataset.py in line 3
      61 #%%
      62 import pennylane as qml
----> 63 H2datasets = qml.data.load("qchem", molname="H2O", basis="STO-3G", bondlength=1.1)

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/pennylane/data/data_manager/__init__.py:279, in load(data_name, attributes, folder_path, force, num_threads, block_size, **params)
    277     for result in results.done:
    278         if result.exception() is not None:
--> 279             raise result.exception()
    281 return [Dataset.open(Path(dest_path), "a") for dest_path in dest_paths]

File ~/.pyenv/versions/3.11.9/lib/python3.11/concurrent/futures/thread.py:58, in _WorkItem.run(self)
     55     return
     57 try:
---> 58     result = self.fn(*self.args, **self.kwargs)
     59 except BaseException as exc:
     60     self.future.set_exception(exc)

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/pennylane/data/data_manager/__init__.py:132, in _download_dataset(data_path, dest, attributes, block_size, force)
    129 s3_url = f"{S3_URL}/{url_safe_datapath}"
    131 if attributes is not None or dest.exists():
--> 132     _download_partial(
    133         s3_url, dest=dest, attributes=attributes, overwrite=force, block_size=block_size
    134     )
    135 else:
    136     _download_full(s3_url, dest=dest)

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/pennylane/data/data_manager/__init__.py:84, in _download_partial(s3_url, dest, attributes, overwrite, block_size)
     82     attributes_to_fetch.update(attributes)
     83 else:
---> 84     remote_dataset = Dataset(open_hdf5_s3(s3_url, block_size=block_size))
     85     attributes_to_fetch.update(remote_dataset.attrs)
     87 if not overwrite:

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/pennylane/data/base/hdf5.py:114, in open_hdf5_s3(s3_url, block_size)
    111 memory_cache_args = {"cache_type": "mmap", "block_size": block_size}
    112 fs = fsspec.open(s3_url, **memory_cache_args)
--> 114 return h5py.File(fs.open())

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/core.py:147, in OpenFile.open(self)
    140 def open(self):
    141     """Materialise this as a real open file without context
    142 
    143     The OpenFile object should be explicitly closed to avoid enclosed file
    144     instances persisting. You must, therefore, keep a reference to the OpenFile
    145     during the life of the file-like it generates.
    146     """
--> 147     return self.__enter__()

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/core.py:105, in OpenFile.__enter__(self)
    102 mode = self.mode.replace("t", "").replace("b", "") + "b"
    104 try:
--> 105     f = self.fs.open(self.path, mode=mode)
    106 except FileNotFoundError as e:
    107     if has_magic(self.path):

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/spec.py:1303, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1301 else:
   1302     ac = kwargs.pop("autocommit", not self._intrans)
-> 1303     f = self._open(
   1304         path,
   1305         mode=mode,
   1306         block_size=block_size,
   1307         autocommit=ac,
   1308         cache_options=cache_options,
   1309         **kwargs,
   1310     )
   1311     if compression is not None:
   1312         from fsspec.compression import compr

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/implementations/http.py:361, in HTTPFileSystem._open(self, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs)
    359 kw["asynchronous"] = self.asynchronous
    360 kw.update(kwargs)
--> 361 size = size or self.info(path, **kwargs)["size"]
    362 session = sync(self.loop, self.set_session)
    363 if block_size and size:

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    115 @functools.wraps(func)
    116 def wrapper(*args, **kwargs):
    117     self = obj or args[0]
--> 118     return sync(self.loop, func, *args, **kwargs)

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File ~/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages/fsspec/implementations/http.py:435, in HTTPFileSystem._info(self, url, **kwargs)
    432     except Exception as exc:
    433         if policy == "get":
    434             # If get failed, then raise a FileNotFoundError
--> 435             raise FileNotFoundError(url) from exc
    436         logger.debug("", exc_info=exc)
    438 return {"name": url, "size": None, **info, "type": "file"}

FileNotFoundError: https://datasets.cloud.pennylane.ai/datasets/h5/qchem/H2O/STO-3G/1.1/H2O_STO-3G_1.1.h5

Used versions:
Name: PennyLane
Version: 0.36.0
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: /home/cai/venvs/venv_pennylaneDatasets/lib/python3.11/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane_Lightning

Platform info: Linux-5.4.0-182-generic-x86_64-with-glibc2.31
Python version: 3.11.9
Numpy version: 1.23.5
Scipy version: 1.13.1
Installed devices:

  • default.clifford (PennyLane-0.36.0)
  • default.gaussian (PennyLane-0.36.0)
  • default.mixed (PennyLane-0.36.0)
  • default.qubit (PennyLane-0.36.0)
  • default.qubit.autograd (PennyLane-0.36.0)
  • default.qubit.jax (PennyLane-0.36.0)
  • default.qubit.legacy (PennyLane-0.36.0)
  • default.qubit.tf (PennyLane-0.36.0)
  • default.qubit.torch (PennyLane-0.36.0)
  • default.qutrit (PennyLane-0.36.0)
  • default.qutrit.mixed (PennyLane-0.36.0)
  • null.qubit (PennyLane-0.36.0)
  • lightning.qubit (PennyLane_Lightning-0.36.0)

Hi @qnbc, thank you for reporting this issue! We will look into it and get back to you on this.

Hi @qnbc,

That should work as you described. After running those lines, H2datasets should be a list with one PennyLane dataset object inside it.

While we investigate, you can try loading it manually by specifying the file path:

ds = qml.data.Dataset.open('./datasets/qchem/H2O/STO-3G/1.1/H2O_STO-3G_1.1.h5', mode='r')

The second time you run the code, is it from the same kernel? That is, are you running in the same jupyter notebook/ same terminal instance?
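As a side note, the local path in that suggestion mirrors the S3 layout visible in the traceback (`.../qchem/H2O/STO-3G/1.1/H2O_STO-3G_1.1.h5`). A small sketch of that mapping, inferred from the URL and destination path above rather than from any documented API, can help locate the cached file for other molecules:

```python
from pathlib import Path

def local_dataset_path(folder, data_name, molname, basis, bondlength):
    """Build the cache path matching the layout seen in the traceback:
    <folder>/<data_name>/<molname>/<basis>/<bondlength>/<molname>_<basis>_<bondlength>.h5
    """
    filename = f"{molname}_{basis}_{bondlength}.h5"
    return Path(folder) / data_name / molname / basis / str(bondlength) / filename

print(local_dataset_path("./datasets", "qchem", "H2O", "STO-3G", "1.1"))
# -> datasets/qchem/H2O/STO-3G/1.1/H2O_STO-3G_1.1.h5
```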


Hello,

Thank you for the feedback.

Yes, I run the code from the same kernel in both executions.

I also found that there seems to be an issue with the saved data: opening the dataset as you proposed gives me an empty dataset, whereas when I download the file manually in the browser, opening it works.
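A truncated or zero-byte download would be consistent with an "empty" dataset. One way to check whether the cached file is at least a plausible HDF5 file, without extra dependencies, is to look for the HDF5 superblock signature (the first 8 bytes of every valid HDF5 file). This is a generic diagnostic sketch, not part of the PennyLane API:

```python
from pathlib import Path

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # signature at the start of every HDF5 file

def looks_like_hdf5(path):
    """Return True if the file exists and starts with the HDF5 signature."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size < len(HDF5_MAGIC):
        return False
    with p.open("rb") as f:
        return f.read(len(HDF5_MAGIC)) == HDF5_MAGIC

# e.g. looks_like_hdf5("./datasets/qchem/H2O/STO-3G/1.1/H2O_STO-3G_1.1.h5")
```

If this returns False for the saved file, the download was corrupted rather than the dataset itself being empty.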

Thanks for bringing this to our attention. We’ll take a look to see if we can reproduce it and find a solution.

In the meantime, does qml.data.Dataset.open() serve as a temporary solution?

Thank you. Yes, a workaround via open() works.

That’s great to hear @qnbc !