Neeraj Krishnan
05/24/2024, 6:51 PMimport ray
import daft
ray.init()
daft.read_sql(<sql query>, mssql://<user>:<password>@<server>:<port>/<db>, parallelism=<parallelism>, partition_col=<col>)
Which almost immediately returns
RuntimeError: Timed out in bb8
If it really is a timeout, there doesn’t seem like a way to configure it as part of the read_sql call.
Verified that the server is up, and the creds are correct because I can use pyodbc + ray dataset’s read_sql to read the data.Neeraj Krishnan
05/24/2024, 6:56 PMTraceback (most recent call last):
File "/tmp/ray/session_2024-05-24_09-29-07_153171_8/runtime_resources/working_dir_files/_ray_pkg_b59887b0d434edc1/daft_test.py", line 44, in <module>
daft_df = daft.read_sql(sql, db_connection_string)
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/api_annotations.py", line 43, in _wrap
return timed_func(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/analytics.py", line 215, in tracked_fn
result = fn(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/io/_sql.py", line 84, in read_sql
sql_operator = SQLScanOperator(
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/sql/sql_scan.py", line 47, in __init__
self._schema = self._attempt_schema_read()
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/sql/sql_scan.py", line 112, in _attempt_schema_read
pa_table = self.conn.read(self.sql, limit=1)
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/sql/sql_connection.py", line 63, in read
return self._execute_sql_query(self._construct_sql_query(sql, projection, predicate))
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/sql/sql_connection.py", line 101, in _execute_sql_query
return self._execute_sql_query_with_connectorx(sql)
File "/home/ray/anaconda3/lib/python3.9/site-packages/daft/sql/sql_connection.py", line 116, in _execute_sql_query_with_connectorx
raise RuntimeError(f"Failed to execute sql: {sql} with url: {self.conn}, error: {e}") from e
...
error: Timed out in bb8
Neeraj Krishnan
05/24/2024, 7:05 PMjay
05/24/2024, 7:31 PM?driver=SQL+Server&trusted_connection={trusted_conn}'
Let me know if that works?Neeraj Krishnan
05/24/2024, 7:42 PMdriver=SQL+Server
also results in the same error 😕jay
05/24/2024, 7:44 PMimport connectorx
username = 'my_username'
password = '1234'
server = 'SERVER1'
database = 'db1'
trusted_conn = 'no' # or yes
query = "SELECT * FROM table"
mssql_url = f'mssql://{username}:{password}@{server}/{database}?driver=SQL+Server&trusted_connection={trusted_conn}'
df = connectorx.read_sql(mssql_url, query)
Neeraj Krishnan
05/24/2024, 7:51 PMNeeraj Krishnan
05/24/2024, 7:55 PMRuntimeError: Timed out in bb8
The stalled connection was from encrypt=true which was a red-herring presumablyjay
05/24/2024, 7:58 PMjay
05/24/2024, 8:03 PMNeeraj Krishnan
05/24/2024, 8:04 PMjay
05/24/2024, 8:05 PMjay
05/24/2024, 8:07 PMNeeraj Krishnan
05/24/2024, 8:10 PMjay
05/24/2024, 8:32 PMdef create_conn():
return sqlalchemy.create_engine("mssql+pyodbc://<user>:<password>@<server>:<port>/<db>").connect()
df = daft.read_sql("SELECT * FROM my_table", create_conn)
I think you’ll need to specify mssql+pyodbc://
so that SQLAlchemy will use pyodbc for the connection under the hood. Let me know if that works for you!Colin Ho
05/29/2024, 6:32 PMNeeraj Krishnan
05/31/2024, 3:40 PMUserWarning: Failed to execute the query with limit 1: Failed to execute sql: SELECT *
FROM (<original query>) AS subquery
LIMIT 1
which is expected since sqlserver does not support limit
(uses TOP
)
However, after stating Attempting to read the entire table
we still get:
Incorrect syntax near 'LIMIT
...
[SQL: SELECT *
FROM (SELECT *
FROM (<original query>) AS subquery
LIMIT 1) AS subquery]
using daft 0.2.21
https://github.com/Eventual-Inc/Daft/blob/v0.2.21/daft/sql/sql_connection.py
where the second query construction is supposed to leave LIMIT out of the query. B_ut_ the sql
arg passed to it is the shadowed sql variable which has the wrapping select * and limit already added to it.
Also, if its of interest later to someone
connection_url = sqlalchemy.engine.URL.create(
"mssql+pyodbc",
username=user,
password=password,
host=host,
port=1433,
database=database,
query={
"driver": "ODBC Driver 18 for SQL Server",
},
)
def create_conn():
return sqlalchemy.create_engine(connection_url).connect()
Where ODBC drivers are installed per:
<https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver15&tabs=ubuntu18-install%2Calpine17-install%2Cdebian8-install%2Credhat7-13-install%2Crhel7-offline>
is what worked connection-wise.Colin Ho
05/31/2024, 6:09 PMlimit
in version 0.2.21
We've actually updated Daft starting from version 0.2.22
to use a separate SQL parsing library called SQLGlot to construct our sql queries, which should make us compatible across more dialects. However, I do need to make a small fix to allow it to work with SQL server (essentially SQLGlot recognizes the sql server dialect as tsql
instead of mssql
), which should be ready in the next release!
https://github.com/Eventual-Inc/Daft/pull/2330 @jay could you take a look?jay
05/31/2024, 6:28 PMNeeraj Krishnan
05/31/2024, 6:28 PMjay
05/31/2024, 6:29 PMColin Ho
06/04/2024, 8:49 PM0.2.26
), the new fixes should be ready to try out!Neeraj Krishnan
06/04/2024, 9:17 PMNeeraj Krishnan
06/06/2024, 1:48 PMColin Ho
06/06/2024, 3:58 PM