Description
Custom code enables a user to write and run his own SQL code in order to transform a tabular dataset.
Advanced functionality, not recommended for inexperienced users
This complex feature requires advanced knowledge of SQL code. For assistance, contact our Customer Success team or visit our Support Portal. Also consider using other manipulator steps in combination to achieve your intended result.
Application
Even though the platform offers numerous functions to transform tables, users might sometimes want to transform their data in a way so specific that no general solution is available. This function will allow such users to manipulate the data with total freedom and flexibility. The function can be used for example to manually modify the values of the table, restructure the table, to create new columns in a complex way.
How to use
Knowledge of SQL code is required for this step.
- Select the dataset on which you want to apply the code.
- Write the code in the code section.
Here are a few important points:
Each snippet of code must contain FROM {dataset_1} in a single line, to reference the selected dataset
The code block cannot end with a semi-colon
After your code is written, you have the following options before running the step:
- Add a description to your code that will be visible from the notebook (e.g. “data filtered in my specific way”). If no description is added, the step will just print the custom code. Especially if the code becomes longer it is highly recommended to add a description instead.
- Enable Save output under a different name if the step should not overwrite the initial dataset. If this option is not enabled the dataset will be overwritten.
Note: We've created custom GPTs to auto-generate SQL queries to help you use this manipulator (requires a Pro OpenAI Plan):
- If you have an Amazon AWS installation (using Athena SQL): link here
- If you have a Microsoft Azure installation (using Synapse): link here
Examples
The examples below assume using data available for selection in the Tabular loader under workshop > Other data sets > Automotive Dynamics Track Testing > TrackTesting.csv.
The initial list of columns in this dataset are
name, time, steeringtorque, steeringangle, wheelfront_forcex, wheelfront_forcey, wheelfront_forcez, wheelfront_momentx, wheelfront_momenty, wheelfront_momentz, wheelrear_forcex, wheelrear_forcey, wheelrear_forcez, wheelrear_momentx, wheelrear_momenty, wheelrear_momentz, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle
Renaming a column
SELECT name AS test_id, time FROM {dataset_1}
Select/Reorder columns
SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle FROM {dataset_1}
Remove missing values
SELECT * FROM {dataset_1} WHERE NOT (car_angle IS NULL AND car_velocity IS NULL)
Remove duplicates
SELECT DISTINCT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle FROM {dataset_1}
Random subset
SELECT * FROM {dataset_1} WHERE RAND() <= 0.6
Group by
SELECT test_id, MAX(car_angle) AS max_car_angle FROM {dataset_1} GROUP BY test_id
Sort by
SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle FROM {dataset_1} ORDER BY test_id, time
Filter numeric values
SELECT test_id, time, steeringtorque, steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle FROM {dataset_1} WHERE steeringangle BETWEEN -50 AND 50
Clip values
SELECT test_id, time, steeringtorque, CASE WHEN steeringangle < -50 THEN -50 WHEN steeringangle > 50 THEN 50 ELSE steeringangle END AS steeringangle, car_accelerationx, car_accelerationy, car_accelerationz, car_velocity, car_angle FROM {dataset_1}
Was this article helpful?
That’s Great!
Thank you for your feedback
Sorry! We couldn't be helpful
Thank you for your feedback
Feedback sent
We appreciate your effort and will try to fix the article