Description
Custom code enables a user to write and run his own SQL code in order to transform a tabular dataset.
Advanced functionality, not recommended for inexperienced users
This complex feature requires advanced knowledge of SQL code. For assistance, contact our Customer Success team or visit our Support Portal. Also consider using other manipulator steps in combination to achieve your intended result.Application
Even though the platform offers numerous functions to transform tables, users might sometimes want to transform their data in a way so specific that no general solution is available. This function will allow such users to manipulate the data with total freedom and flexibility. The function can be used for example to manually modify the values of the table, restructure the table, to create new columns in a complex way.
How to use
Knowledge of SQL code is required for this step.
- Select the dataset on which you want to apply the code.
- Write the code in the code section.
Here are a few important points:
Each snippet of code must contain FROM {dataset_1} in a single line, to reference the selected dataset
The code block cannot end with a semi-colon
After your code is written, you have the following options before running the step:
- Add a description to your code that will be visible from the notebook (e.g. “data filtered in my specific way”). If no description is added, the step will just print the custom code. Especially if the code becomes longer it is highly recommended to add a description instead.
- Enable Save output under a different name if the step should not overwrite the initial dataset. If this option is not enabled the dataset will be overwritten.
Note: We've created custom GPTs to auto-generate SQL queries to help you use this manipulator (requires a Pro OpenAI Plan):
- If you have an Amazon AWS installation (using Athena SQL): link here
- If you have a Microsoft Azure installation (using Synapse): link here
Examples
The examples below assume using data available for selection in the Tabular loader under workshop > Other data sets > Automotive Dynamics Track Testing > TrackTesting.csv.
The initial list of columns in this dataset are
name, time, steeringtorque, steeringangle, wheelfront_forcex, wheelfront_forcey, wheelfront_forcez, wheelfront_momentx, wheelfront_momenty, wheelfront_momentz, wheelrear_forcex, wheelrear_forcey, wheelrear_forcez, wheelrear_momentx, wheelrear_momenty, wheelrear_momentz, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
Renaming a column
SELECT name AS test_id, time
FROM {dataset_1}
Select/Reorder columns
SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}Remove missing values
SELECT * FROM {dataset_1}
WHERE NOT (car_angle IS NULL AND car_velocity IS NULL)Remove duplicates
SELECT DISTINCT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}Random subset
SELECT * FROM {dataset_1}
WHERE RAND() <= 0.6Group by
SELECT test_id, MAX(car_angle) AS max_car_angle
FROM {dataset_1}
GROUP BY test_idSort by
SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}
ORDER BY test_id, timeFilter numeric values
SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
FROM {dataset_1}
WHERE steeringangle BETWEEN -50 AND 50Clip values
SELECT
test_id,
time,
steeringtorque,
CASE
WHEN steeringangle < -50 THEN -50
WHEN steeringangle > 50 THEN 50
ELSE steeringangle
END AS steeringangle,
car_accelerationx,
car_accelerationy,
car_accelerationz,
car_velocity,
car_angle
FROM {dataset_1}Was this article helpful?
That’s Great!
Thank you for your feedback
Sorry! We couldn't be helpful
Thank you for your feedback
Feedback sent
We appreciate your effort and will try to fix the article