Custom SQL Code

Modified on Thu, 5 Dec, 2024 at 1:08 PM

Description

Custom code enables a user to write and run his own SQL code in order to transform a tabular dataset.

Advanced functionality, not recommended for inexperienced users

This complex feature requires advanced knowledge of SQL code. For assistance, contact our Customer Success team or visit our Support Portal. Also consider using other manipulator steps in combination to achieve your intended result.

Application

Even though the platform offers numerous functions to transform tables, users might sometimes want to transform their data in a way so specific that no general solution is available. This function will allow such users to manipulate the data with total freedom and flexibility. The function can be used for example to manually modify the values of the table, restructure the table, to create new columns in a complex way.

How to use

Knowledge of SQL code is required for this step.

Select the dataset on which you want to apply the code.
Write the code in the code section.

Here are a few important points:

Each snippet of code must contain FROM {dataset_1} in a single line, to reference the selected dataset
The code block cannot end with a semi-colon

After your code is written, you have the following options before running the step:

Add a description to your code that will be visible from the notebook (e.g. “data filtered in my specific way”). If no description is added, the step will just print the custom code. Especially if the code becomes longer it is highly recommended to add a description instead.
Enable Save output under a different name if the step should not overwrite the initial dataset. If this option is not enabled the dataset will be overwritten.

Note: We've created custom GPTs to auto-generate SQL queries to help you use this manipulator (requires a Pro OpenAI Plan):

If you have an Amazon AWS installation (using Athena SQL): link here
If you have a Microsoft Azure installation (using Synapse): link here

Examples

The examples below assume using data available for selection in the Tabular loader under workshop > Other data sets > Automotive Dynamics Track Testing > TrackTesting.csv.

The initial list of columns in this dataset are

name, time, steeringtorque, steeringangle, wheelfront_forcex, wheelfront_forcey, wheelfront_forcez, wheelfront_momentx, wheelfront_momenty, wheelfront_momentz, wheelrear_forcex, wheelrear_forcey, wheelrear_forcez, wheelrear_momentx, wheelrear_momenty, wheelrear_momentz, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle

Renaming a column

SELECT name AS test_id, time
FROM {dataset_1}

Select/Reorder columns

SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}

Remove missing values

SELECT * FROM {dataset_1} 
WHERE NOT (car_angle IS NULL AND car_velocity IS NULL)

Remove duplicates

SELECT DISTINCT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}

Random subset

SELECT * FROM {dataset_1}
WHERE RAND() <= 0.6

Group by

SELECT test_id, MAX(car_angle) AS max_car_angle
FROM {dataset_1}
GROUP BY test_id

Sort by

SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}
ORDER BY test_id, time

Filter numeric values

SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}
WHERE steeringangle BETWEEN -50 AND 50

Clip values

SELECT
    test_id,
    time,
    steeringtorque,
    CASE
        WHEN steeringangle < -50 THEN -50
        WHEN steeringangle > 50 THEN 50
        ELSE steeringangle
    END AS steeringangle,
    car_accelerationx,
    car_accelerationy,
    car_accelerationz,
    car_velocity,
    car_angle
FROM {dataset_1}