When I run this command I get an error. Can someone help?

@Jack_Lindsay_Kraken1 When I run this command, new_frame = frame.compute(), it keeps showing: Unable to allocate 3.21 GiB for an array with shape (2, 215599182) and data type object. Do you know how I can fix this problem? Thanks.
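
For context on the number in that error, here is a minimal sketch of the arithmetic. It assumes a 64-bit Python build, where an object-dtype array stores one 8-byte pointer per element. The shape is copied from the message; `POINTER_BYTES` is a name introduced here for illustration. The leading 2 in the shape most likely means two object-dtype columns are being allocated as one block, though that is an inference and not something the message states.

```python
# Reproduce the size reported in the error message.
# Assumption: 64-bit Python, so each element of an object-dtype array
# is an 8-byte pointer (the objects pointed to are not counted here).
POINTER_BYTES = 8

shape = (2, 215599182)  # shape taken from the error message
n_elements = shape[0] * shape[1]
n_bytes = n_elements * POINTER_BYTES
gib = n_bytes / 2**30  # bytes -> GiB

print(f"{n_elements:,} elements -> {gib:.2f} GiB")
```

This covers only the pointers. The Python objects they refer to (strings, for example) take additional memory on top of the 3.21 GiB.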

Try importing Client

code:

from dask.distributed import Client

client = Client(processes=False)

then run:

new_df = client.persist(df_to_convert)

here is the documentation:

https://distributed.dask.org/en/latest/manage-computation.html

let me know what that outputs

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\utils.py", line 665, in log_errors
    yield
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.core - ERROR - addresses should be strings or tuples, got None
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.utils - ERROR - addresses should be strings or tuples, got None
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\utils.py", line 665, in log_errors
    yield
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.core - ERROR - addresses should be strings or tuples, got None
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x000001AF5F407D08>>, <Task finished coro=<Nanny._on_exit() done, defined at C:\Anaconda3\lib\site-packages\distributed\nanny.py:387> exception=TypeError('addresses should be strings or tuples, got None')>)
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
    ret = callback()
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 767, in _discard_future_result
    future.result()
  File "C:\Anaconda3\lib\site-packages\distributed\nanny.py", line 390, in _on_exit
    await self.scheduler.unregister(address=self.worker_address)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 556, in send_recv
    raise exc.with_traceback(tb)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x000001AF5F407D08>>, <Task finished coro=<Nanny._on_exit() done, defined at C:\Anaconda3\lib\site-packages\distributed\nanny.py:387> exception=TypeError('addresses should be strings or tuples, got None')>)
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
    ret = callback()
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 767, in _discard_future_result
    future.result()
  File "C:\Anaconda3\lib\site-packages\distributed\nanny.py", line 390, in _on_exit
    await self.scheduler.unregister(address=self.worker_address)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 556, in send_recv
    raise exc.with_traceback(tb)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.utils - ERROR - addresses should be strings or tuples, got None
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\utils.py", line 665, in log_errors
    yield
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.core - ERROR - addresses should be strings or tuples, got None
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.utils - ERROR - addresses should be strings or tuples, got None
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\utils.py", line 665, in log_errors
    yield
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.core - ERROR - addresses should be strings or tuples, got None
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x000001AF5F407D08>>, <Task finished coro=<Nanny._on_exit() done, defined at C:\Anaconda3\lib\site-packages\distributed\nanny.py:387> exception=TypeError('addresses should be strings or tuples, got None')>)
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
    ret = callback()
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 767, in _discard_future_result
    future.result()
  File "C:\Anaconda3\lib\site-packages\distributed\nanny.py", line 390, in _on_exit
    await self.scheduler.unregister(address=self.worker_address)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 556, in send_recv
    raise exc.with_traceback(tb)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x000001AF5F407D08>>, <Task finished coro=<Nanny._on_exit() done, defined at C:\Anaconda3\lib\site-packages\distributed\nanny.py:387> exception=TypeError('addresses should be strings or tuples, got None')>)
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
    ret = callback()
  File "C:\Anaconda3\lib\site-packages\tornado\ioloop.py", line 767, in _discard_future_result
    future.result()
  File "C:\Anaconda3\lib\site-packages\distributed\nanny.py", line 390, in _on_exit
    await self.scheduler.unregister(address=self.worker_address)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 556, in send_recv
    raise exc.with_traceback(tb)
  File "C:\Anaconda3\lib\site-packages\distributed\core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 2135, in remove_worker
    address = self.coerce_address(address)
  File "C:\Anaconda3\lib\site-packages\distributed\scheduler.py", line 4844, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None

from dask.distributed import Client
client = Client(process=False)

The Jupyter notebook is still running, even though the error message is shown.

That's normal, just let me know if it finishes.

how many rows and columns do you have?

(215599182, 6)

Still running. The data was collected before.

That is a hefty dataset. Let it run until it completes or times out.

Okay, thank you very much. I will give you an update tomorrow.

great. just keep me posted

:+1:

I'm not sure how much RAM is available to your Jupyter notebook, but Colab gives 12 GB in the free tier. If that one takes forever, it might be worth checking out Colab.

@Jack_Lindsay_Kraken1 The code is still running. My PC has 16 GB of RAM. I will read your file about Colab.

If your computer has 16 GB of RAM, you might be better off running it in a local IDE like PyCharm (you can get it free if you are a student) or some other IDE. I am not sure about Jupyter notebooks' memory constraints, but it sounds like you aren't using all of the RAM available. Can you give me all of the files you are trying to merge and then turn into a pandas DataFrame so I can recreate it?
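
As a rough check on whether the data can fit at all, here is a sketch that estimates a lower bound for the reported shape (215599182, 6). It assumes 8 bytes per cell and treats the 16 GB of RAM as 16 GiB for simplicity. `min_frame_gib` is a hypothetical helper written for this thread, not part of pandas or Dask.

```python
def min_frame_gib(n_rows: int, n_cols: int, bytes_per_cell: int = 8) -> float:
    """Lower bound, in GiB, for a fully materialised table.

    Assumption: every cell costs at least `bytes_per_cell` bytes
    (8 bytes matches int64, float64, or an object pointer).
    """
    return n_rows * n_cols * bytes_per_cell / 2**30


n_rows, n_cols = 215599182, 6  # shape reported earlier in the thread
ram_gib = 16                   # assumption: 16 GB of RAM treated as 16 GiB

estimate = min_frame_gib(n_rows, n_cols)
print(f"at least {estimate:.2f} GiB needed, {ram_gib} GiB installed")
```

Under these assumptions the bound is about 9.64 GiB. Object columns usually cost far more than 8 bytes per value once the underlying Python objects are counted, so the real footprint could exceed this by a wide margin, and a single compute() into pandas may not fit in 16 GB whichever IDE is used.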

Thank you very much. I will send you a message about the file.