Protocol of Building a Single Pipeline
This page describes how to build a single pipeline.
Follow the steps below, in order, to build a single pipeline successfully.
Steps of the protocol for building a single pipeline:
Step 1: Determining the processes between the input data and the final results:
Compare the input data with the final results that the pipeline should generate, and work out which processes are needed in between to produce those results from the input data you have. Write these processes down on paper in chronological order.
Step 2: Dividing the processes into more specific subprocesses:
Now that you have written down the processes of your single pipeline on paper, you have to look at each process very carefully. For each process, decide whether it can be divided into separate subprocesses. If so, split that process into multiple, more specific subprocesses.
Go through all the processes several times to find more subprocesses. When you cannot find any more subprocesses, you have finished step 2. From now on, each subprocess counts as a normal process.
Tip: Use a flowchart for this step.
Step 3: Searching for already existing programs for some processes:
It is very important that you do not program a script for a process if a program for it already exists. To prevent that, search the internet, for each process, for programs that already carry out that specific task. If you find such a program, you have to use it for that process and never write your own script. Only if you discover that no program exists for a specific process do you program a script for it yourself.
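Searching the internet cannot be automated in a useful way, but once you have a list of candidate programs it can help to check which of them are already installed on your Linux machine. The following is a minimal sketch in Bash of that check only; the function names are made up for this example, and the candidate names in the example call are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: report which candidate programs are already installed.
# The function names are hypothetical; they are not part of the protocol.

# Succeeds (exit status 0) if the program is found on the PATH.
program_exists() {
    command -v "$1" >/dev/null 2>&1
}

# Print one line per candidate: "found" or "missing".
report_programs() {
    for program in "$@"; do
        if program_exists "$program"; then
            echo "found:   $program"
        else
            echo "missing: $program"
        fi
    done
}

# Example call with placeholder candidates:
# report_programs sort awk some_program_that_is_not_installed
```

A program reported as missing may still exist on the internet, so this check only tells you what does not need to be installed anymore.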
Step 4: Determining the programming language(s) for each process:
Now that you know for which processes you have to write scripts, you have to choose, for each script, the programming language or combination of programming languages in which it will be written. Each programming language has its own advantages and disadvantages.
Which programming language(s) to use for a script depends on the kinds of actions that have to be carried out during that process and/or on what the results of that process will be.
Here are some examples of choices of programming language(s) in several situations:
- When statistics are applied during the process, the programming language R should be used.
- When drawings are made during the process, the programming language Java should be used.
- When next-generation sequencing data are being analyzed, the programming language Python should be used.
- When big text files are being analyzed during the process, the programming language Bash should be used.
- When only specific records have to be extracted from a big CSV text file and strings have to be analyzed after that during the process, a combination of R and Python should be used.
N.B.: The single pipeline itself (highway) always has to be programmed in Bash in Linux.
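To illustrate how a Bash highway can tie together scripts written in different languages, here is a minimal sketch. It is not the Majops highway skeleton; the function names, the script names `extract_records.R` and `analyze_strings.py`, and the file names are hypothetical. It assumes that `Rscript` and `python3` are installed when the highway is actually run.

```shell
#!/usr/bin/env bash
# Sketch of a highway: run each process in chronological order and
# stop at the first process that fails.
# All script and file names below are hypothetical placeholders.

# Print the name of the step, then run the remaining arguments as a command.
run_step() {
    step_name="$1"
    shift
    echo "Running: $step_name"
    if ! "$@"; then
        echo "Failed:  $step_name" >&2
        return 1
    fi
}

# The highway itself: one line per process, each in its own language.
highway() {
    input_file="$1"
    output_dir="$2"
    run_step "extract records" Rscript extract_records.R "$input_file" "$output_dir/records.csv" || return 1
    run_step "analyze strings" python3 analyze_strings.py "$output_dir/records.csv" "$output_dir/result.txt" || return 1
}

# Example call:
# highway bottles.csv results
```

Keeping the highway this thin means that the choice of language stays a decision per script, while the order of the processes is visible in one place.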
Step 5: Programming a script for each remaining process:
You can now start programming a script, in the chosen programming language, for each remaining process for which no program exists yet.
N.B.: You have to use one of the Majops Skeleton Scripts for programming each script. Furthermore, you have to follow the rules of Majops. That means that your single pipeline meets the single pipeline characteristics of Majops and your scripts meet the script characteristics of Majops.
Step 6: Testing the scripts and pipeline:
After you have programmed all the scripts and you use all the existing programs in the correct way, you can test each script, each program and your single pipeline as a whole. If everything works fine, you have finished this step.
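One simple way to test a single script or program is to run it on a small input and compare the output with the output you expect. The following Bash sketch shows that idea; the function name `check_output` is made up for this example, and `echo` only stands in for one of your own scripts.

```shell
#!/usr/bin/env bash
# Sketch: test one process by comparing its actual output with the
# expected output. The function name is hypothetical.

# Usage: check_output <test name> <expected output> <command> [arguments...]
check_output() {
    test_name="$1"
    expected="$2"
    shift 2
    # Capture both standard output and standard error of the command.
    actual="$("$@" 2>&1)"
    if [ "$actual" = "$expected" ]; then
        echo "PASS: $test_name"
    else
        echo "FAIL: $test_name (expected '$expected', got '$actual')"
        return 1
    fi
}

# Example with a standard command standing in for one of your scripts:
# check_output "echo test" "hello" echo hello
```

The same function can be called once for every script, and once for the whole pipeline on a small test data set.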
Step 7: Finalizing the scripts:
When your scripts work perfectly, you have to carry out the following steps in each script:
- You have to comment your code.
- You need to remove all dead code.
- You have to write the Show Usage Information:
  In the Show Usage Information, the following has to be explained about the script, in this order from top to bottom:
  - Function of the script
  - Way of usage (of the command line arguments)
  - Example of starting the script
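As an illustration of the order described above, here is a minimal Bash sketch of a Show Usage Information function. It is not taken from a Majops Skeleton Script; the script name `count_records.sh`, its two arguments and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the Show Usage Information of one script.
# The script name and its arguments are hypothetical.

# Print the function, the way of usage and an example, in that order.
show_usage() {
    cat <<'EOF'
Function:
  Counts the records in a CSV file and writes the count to an output file.

Usage:
  count_records.sh <input.csv> <output.txt>
    <input.csv>   CSV file to read
    <output.txt>  file that receives the record count

Example:
  ./count_records.sh bottles.csv bottle_count.txt
EOF
}

# Show the usage information when the number of arguments is wrong.
check_arguments() {
    if [ "$#" -ne 2 ]; then
        show_usage
        return 1
    fi
}
```

Calling `check_arguments "$@"` at the top of the script shows the usage information whenever the script is started with the wrong number of arguments.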
Step 8: Making the start bash script:
All your scripts have been programmed now, but to start the pipeline itself a start bash script must be programmed.
You always have to use the Start Majops Skeleton Script Bash Pipeline as the starting point for it.
In that start bash script, the following items have to be programmed in this order from top to bottom:
- You have to write the Show Usage Information:
  In the Show Usage Information, the following has to be explained about the pipeline, in this order from top to bottom:
  - Function of the pipeline
  - Way of usage (of the command line arguments)
  - Example of starting the pipeline with the start bash script
- Put all the scripts and programs that are called and used while your pipeline is running in the file check list, in alphanumerical order.
- Write down the questions that the user of the pipeline has to answer to generate the right command line arguments for your pipeline.
- Program the code that generates the right command line arguments for your pipeline from the answers of the user.
- At the end of the start bash script, call your pipeline through the highway bash script, passing the right command line arguments to it.
N.B.: This is only one line of code!
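The items after the Show Usage Information can be sketched in Bash as follows. This is not the Start Majops Skeleton Script Bash Pipeline, only an illustration of the order of the items. The file names in the file check list, the three questions, the options `-i`, `-o` and `-k`, and the function names are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the body of a start bash script.
# All file names, questions and options are hypothetical.

# File check list, in alphanumerical order.
FILE_CHECK_LIST="analyze_strings.py extract_records.R highway.sh"

# Report every file of the file check list that is missing in a directory.
check_files() {
    directory="$1"
    missing=0
    for file in $FILE_CHECK_LIST; do
        if [ ! -e "$directory/$file" ]; then
            echo "Missing file: $file" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Ask the questions, generate the arguments and start the pipeline.
# $1: directory with the pipeline files, $2: the highway bash script.
start_pipeline() {
    pipeline_dir="$1"
    highway="$2"
    check_files "$pipeline_dir" || return 1

    # Questions for the user; the answers are read from standard input.
    printf 'Which input file should be used? ' >&2
    read -r input_file
    printf 'Which output directory should be used? ' >&2
    read -r output_dir
    printf 'Keep temporary files (y/n)? ' >&2
    read -r keep_temp

    # Generate the command line arguments from the answers.
    set -- -i "$input_file" -o "$output_dir"
    if [ "$keep_temp" = "y" ]; then
        set -- "$@" -k
    fi

    # The single line that starts the pipeline through the highway script:
    "$highway" "$@"
}

# Example call at the end of the start bash script:
# start_pipeline "." ./highway.sh
```

Building the arguments with `set --` keeps each answer as one separate argument, so file names that contain spaces are passed to the highway script unchanged.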
Step 9: Making a flowchart of your pipeline:
You have to make a flowchart of your pipeline. It is best to use a flowchart program for this; a good free program is Dia. The flowchart must mention all the scripts, and you have to draw it either from left to right or from top to bottom.
Save the flowchart as a PNG picture and name the PNG file in the following way:
Flowchart_[ pipeline name ].png
Example:
Flowchart_PlasticBottlesCreatorPipeline.png
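If you want to generate the file name in a script instead of typing it, the naming rule above can be written as a small Bash function. The function name `flowchart_file_name` is made up for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: build the flowchart file name from the pipeline name.
# The function name is hypothetical.

# Usage: flowchart_file_name <pipeline name>
flowchart_file_name() {
    echo "Flowchart_$1.png"
}

# Example call:
# flowchart_file_name PlasticBottlesCreatorPipeline
```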
Step 10: Finalizing and archiving your pipeline:
You have completed all the previous steps. Now you can finalize and archive your pipeline for future users.
All the scripts, including the start bash script, and the flowchart need to be compressed into one zip file. This zip file gets the same name as the pipeline:
[ pipeline name ].zip
Example:
PlasticBottlesCreatorPipeline.zip
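The archiving step can be sketched in Bash as follows, assuming that the `zip` command line tool is installed. The function name `archive_pipeline` and the file names in the example call are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compress the pipeline files into [ pipeline name ].zip.
# Assumes that the zip command line tool is installed.

# Usage: archive_pipeline <pipeline name> <file> [more files...]
archive_pipeline() {
    pipeline_name="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "No files given to archive." >&2
        return 1
    fi
    # -j stores the files without their directory paths.
    zip -j "${pipeline_name}.zip" "$@"
}

# Example call:
# archive_pipeline PlasticBottlesCreatorPipeline start.sh highway.sh \
#     Flowchart_PlasticBottlesCreatorPipeline.png
```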