Sort | Sort within group | Ab initio GDE | ETL | overview | Interview

Category: Data Processing

Tags: components, data, execution, parameters, sorting

Entities: allow unsorted, major key, max-core, minor key, sort component, sort within group component

Summary

    Business Fundamentals
    • The sort component is used to arrange data based on a specified key in either ascending or descending order.
    • It offers options for handling null values and sequence ordering, including custom orders.
    • The max-core parameter of the sort component sets the maximum amount of memory used for sorting at once, with a default of 96 MB.
    • The sort component writes temporary files to disk during execution if the data exceeds max-core.
    Advanced Sorting Techniques
    • Sort within group component allows sorting of data based on a secondary key after the primary key has been sorted.
    • The major key represents the already sorted primary key, while the minor key is used for further sorting within groups.
    • The allow unsorted parameter, when set to true, permits unsorted data in the major key.
    • Max-core for sort within group is 10 MB by default (versus 96 MB for sort) and otherwise works the same way as in the sort component.
    • Sort within group writes temporary files to disk if the data exceeds the specified max-core.
    Takeaways
    • Use the sort component to organize data based on key attributes with customizable order and null handling.
    • Implement sort within group for advanced sorting needs where multiple keys are involved.
    • Adjust the max-core parameters to optimize performance based on data volume.
    • Expect both components to spill to temporary files on disk when a large dataset exceeds max-core.
    • Ensure major key data is sorted or set allow unsorted to true to prevent execution failure.

    Transcript

    00:00

    [Music] Welcome to my YouTube channel. In this video, we'll discuss the Ab Initio Sort and Sort within Group components.

    00:19

    We need the Sort component when we use a Rollup, Join, Dedup Sorted, or Scan component and set its sorted input parameter to true, because that component then requires sorted data.

    00:37

    Sometimes the input data arrives unsorted, out of order, and then we need the Sort component. If we don't use Sort and the downstream component has sorted input set to true, the graph will fail.

    00:55

    When do we need Sort within Group? Suppose the data is already sorted on a primary key and we also need to sort on a second key.

    01:14

    Sort within Group works on that second key. Think of it as having two keys: a primary key and a secondary key.

    01:31

    The primary key is the major key: the data already arrives sorted on it, and we specify it in Sort within Group. The secondary key is the minor key, and the component sorts the data based on the minor key.

    01:48

    We'll discuss these two components in detail in this video.
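
A minimal Python sketch (not Ab Initio code) of why a downstream component with sorted input set to true needs a Sort in front of it. The function `rollup_sorted`, the field names `cust` and `amount`, and the error message are hypothetical and chosen only for illustration; the point is that such a component only compares each record with the previous one, so out-of-order input is an error it cannot repair.

```python
from itertools import groupby
from operator import itemgetter


def rollup_sorted(records, key):
    """Sum 'amount' per key, assuming records arrive sorted by key.

    Hypothetical stand-in for a component with sorted input set to true:
    it walks runs of adjacent equal keys and fails on a key that goes
    backwards instead of trying to fix the order.
    """
    totals = []
    previous = None
    for value, group in groupby(records, key=itemgetter(key)):
        if previous is not None and value < previous:
            raise ValueError(f"out-of-order key: {value!r} after {previous!r}")
        totals.append((value, sum(r["amount"] for r in group)))
        previous = value
    return totals


unsorted = [
    {"cust": "B", "amount": 5},
    {"cust": "A", "amount": 7},
    {"cust": "B", "amount": 1},
]

# Sorting first (the job of the Sort component) makes the rollup safe.
result = rollup_sorted(sorted(unsorted, key=itemgetter("cust")), "cust")
print(result)
```

Calling `rollup_sorted(unsorted, "cust")` directly raises `ValueError`, which is the sketch's analogue of the graph failing on unsorted input.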

    So let's start. As the name itself suggests, the Sort component orders the data based on the key that you specify.

    02:06

    This component provides two parameters: one is key and the second is max-core. The key determines how the records are ordered.

    02:24

    Within the key you can give a direction, ascending or descending, and the component produces its output records in that direction.

    02:39

    There is also an option for records that contain nulls: you can choose whether null sorts low or null sorts high.

    02:54

    The third option is sequence, which controls the ordering you want. Machine is the default; the others are phone book, index, and custom.

    03:10

    With custom you design your own order across digits, lowercase, uppercase, alphanumeric, white space, null, and user-defined types.

    03:26

    So that is the key parameter for the Sort component, with its options: direction, null order, and sequence.
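
The direction and null-order options described above can be sketched in Python (not Ab Initio DML). The function `sort_records` and its parameters `descending` and `nulls_high` are hypothetical names; the sequence option is not modeled, and how this sketch combines descending order with null placement (the whole order is simply reversed) is an assumption of the sketch, not a statement about the real component.

```python
def sort_records(records, field, descending=False, nulls_high=False):
    """Sort dicts by one field with a direction and a null ordering.

    nulls_high=False ranks None below every real value;
    nulls_high=True ranks None above every real value.
    """
    def sort_key(record):
        value = record[field]
        is_null = value is None
        # Rank nulls below (0) or above (2) all real values (1).
        rank = (2 if nulls_high else 0) if is_null else 1
        # Nulls are only ever compared with other nulls, so a
        # placeholder is enough for their second tuple element.
        return (rank, "" if is_null else value)

    return sorted(records, key=sort_key, reverse=descending)


rows = [{"id": 3}, {"id": None}, {"id": 1}]

ascending_nulls_low = [r["id"] for r in sort_records(rows, "id")]
ascending_nulls_high = [r["id"] for r in sort_records(rows, "id", nulls_high=True)]
descending_nulls_low = [r["id"] for r in sort_records(rows, "id", descending=True)]

print(ascending_nulls_low)   # [None, 1, 3]
print(ascending_nulls_high)  # [1, 3, None]
print(descending_nulls_low)  # [3, 1, None]
```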

    The next parameter we'll discuss is max-core.

    03:41

    For the Sort component, the default value of max-core is 96 MB. You can set it based on your requirement.

    03:57

    You can raise it above 96 MB, but 96 MB is the default. You can specify zero as well.

    04:12

    With a small value and a huge volume of data, though, you will find that performance is reduced.

    04:27

    The component writes temporary files to your disk when the data exceeds the value defined in max-core.

    Now let's quickly go through the runtime behavior of the Sort component. It reads the records from all the input flows connected to it.

    04:43

    If the data is larger than the size defined in the max-core parameter, it is split into temporary files.

    05:00

    After reading all the input records, the component works through the temporary files generated in your default directory; the records in each one are ordered based on the sort key.

    05:19

    Then it merges all the temporary files, maintaining the sort order, and finally writes the result to the output port. Those are the steps of the Sort component's runtime behavior.
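
The runtime steps above (read, spill sorted chunks to temporary files, merge) follow the general pattern of an external merge sort. Below is a minimal Python sketch of that pattern, not the component's actual implementation. The function `external_sort` is hypothetical, and `max_in_memory` counts records rather than bytes, which is a simplification of max-core made for this example.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory):
    """Sort an iterable of ints, buffering at most max_in_memory at once."""
    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory chunk and write it out as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for value in values:
        buffer.append(value)
        if len(buffer) >= max_in_memory:
            spill()

    if not run_paths:
        # The limit was never reached: no temporary files are needed.
        return sorted(buffer)
    if buffer:
        spill()

    def read_run(path):
        with open(path) as handle:
            for line in handle:
                yield int(line)

    try:
        # Merge the sorted runs while maintaining the overall order.
        return list(heapq.merge(*(read_run(p) for p in run_paths)))
    finally:
        for path in run_paths:
            os.remove(path)


print(external_sort([5, 3, 9, 1, 4, 8, 2], max_in_memory=3))
```

A smaller `max_in_memory` produces more temporary runs to write and merge, which mirrors the transcript's point that a low max-core with a large input reduces performance.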

    Now we'll discuss the Sort within Group component.

    05:37

    Ab Initio gives us the option to produce output sorted on a secondary key. Here the input arrives sorted on one key only.

    05:53

    If you also want to order it by a secondary key, you can use the Sort within Group component.

    06:08

    The description I have written here says that Sort within Group refines the sorting of records already sorted according to one key specifier, meaning one key on which the data is already sorted. Next, it says that it sorts the records within the groups formed by the first sort according to a second key specifier.

    06:23

    So it works on your second key and produces the records in that order.

    These are the parameters the Sort within Group component provides. First, the major key.

    06:39

    In major key we specify the field on which we know the input already arrives sorted.

    06:56

    In minor key we specify the field that we want this component to sort on. There is also the allow unsorted parameter, which is false by default.

    07:13

    If you set it to true, the component accepts data that is unsorted on the major key, that is, the primary or first key. Next we have max-core.

    07:30

    For this component the default max-core is 10 MB. It behaves the same way as in the Sort component: it specifies the maximum size for the component, and if the data exceeds that size, the component writes temporary files to your disk.

    07:46

    Now let's discuss the runtime behavior of the Sort within Group component. First, it reads the records from all the flows connected to its input port, up to the size you have specified in the max-core parameter.

    08:02

    If the data exceeds that size, it writes temporary files to disk, in your working directory.

    08:20

    From those temporary files it arranges the records based on your key. Second, Sort within Group assumes that the input is sorted according to the major key parameter.

    08:36

    If the input is not sorted on the major key, the allow unsorted parameter must be set to true. The component checks this: if you have not set it to true, execution fails with an error that an out-of-order record has arrived.

    08:52

    If this step is fine, it moves on to the third step.

    09:07

    It reads the records from the temporary files created in your working directory, merges them, orders them based on the minor key parameter, and writes the output. That is how the Sort within Group component works overall.
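
The major key, minor key, and allow unsorted behavior described above can be sketched in Python (not Ab Initio code). The function `sort_within_group` and the field names `dept` and `salary` are hypothetical. With `allow_unsorted=True` the sketch treats every change in the major key as the start of a new group, which is an assumption about how unsorted input is handled; max-core buffering and temporary files are not modeled here.

```python
from itertools import groupby
from operator import itemgetter


def sort_within_group(records, major, minor, allow_unsorted=False):
    """Sort each run of equal major-key records by the minor key.

    The input is expected to be sorted on `major` already. If a major
    key goes backwards and allow_unsorted is False, raise an error;
    otherwise each change in the major key just starts a new group.
    """
    output = []
    previous = None
    for value, group in groupby(records, key=itemgetter(major)):
        if previous is not None and value < previous and not allow_unsorted:
            raise ValueError(f"out-of-order major key: {value!r}")
        output.extend(sorted(group, key=itemgetter(minor)))
        previous = value
    return output


rows = [
    {"dept": "A", "salary": 300},
    {"dept": "A", "salary": 100},
    {"dept": "B", "salary": 200},
    {"dept": "B", "salary": 50},
]

result = sort_within_group(rows, major="dept", minor="salary")
print([(r["dept"], r["salary"]) for r in result])
```

Only the records inside each `dept` group are reordered, so the work per group is small compared with re-sorting the whole input on both keys.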

    Do you have any questions on these two components, Sort and Sort within Group?

    09:23

    You can post them in the comment section, and you can also reach out to me on the number I have given in the description.

    09:40

    Thank you so much for watching this video.