Dajbych.net


Working with Azure Batch

4 minutes to read

Azure Batch is a very practical and highly customizable service. It is one of the Azure Compute services designed for compute-intensive or data-intensive tasks. Combined with Azure Storage, it makes a pair of draft horses that can handle unusual workloads. Setup is very easy, and the programming interface is intuitive and easy to code against. There is even an API to retrieve files from the working directory of your application.

What workload is Azure Batch good for

Azure Batch is good for data transformation. It is ideal when you need a large amount of computing power for a short period of time. I probably would not recommend Azure Batch for high-availability scenarios; that is where Service Fabric serves much better. I don't even know how to apply OS updates to Azure Batch compute nodes, because I never needed them. I allocated over a hundred virtual machines, computed what I needed, and deallocated them when the job was done.

Where input and output are stored

It is inefficient to access a single database from hundreds of machines at the same time. Azure Batch is prepared to read data from and write data to Azure Blob Storage, so you don't have to write this logic yourself. Azure Batch downloads the input files from storage to the compute node before the job starts. When the job is finished, it automatically uploads the output files back to storage (a blob that already exists under the same name is overwritten).
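To make the idea of declared inputs and outputs more concrete, here is a minimal sketch that builds a task definition as a plain Python dictionary. The field names (`resourceFiles`, `outputFiles`, `uploadOptions`) follow the shape of the Batch REST API as I understand it; the helper `build_task`, the task id, the file names, the command line and the container URLs are all hypothetical placeholders, and nothing is sent to Azure.

```python
import json


def build_task(task_id, command_line, input_url, input_name,
               output_pattern, output_container_url):
    """Describe one task together with the files Batch should move for it."""
    return {
        "id": task_id,
        "commandLine": command_line,
        # Input: downloaded to the compute node before the command runs.
        "resourceFiles": [{"httpUrl": input_url, "filePath": input_name}],
        # Output: files matching the pattern are uploaded after the command ends.
        "outputFiles": [
            {
                "filePattern": output_pattern,
                "destination": {
                    "container": {"containerUrl": output_container_url}
                },
                "uploadOptions": {"uploadCondition": "taskcompletion"},
            }
        ],
    }


# Every value below is a made-up placeholder, not a real resource.
task = build_task(
    task_id="transform-001",
    command_line="cmd /c MyApp.exe input-001.csv",
    input_url="https://example.blob.core.windows.net/input/input-001.csv",
    input_name="input-001.csv",
    output_pattern="*.out",
    output_container_url="https://example.blob.core.windows.net/output",
)
print(json.dumps(task, indent=2))
```

In a real setup the URLs would need to carry credentials (for example a shared access signature), which this sketch leaves out.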

How to code the logic that is being executed

The business logic must be packed into a traditional, old-fashioned executable (.exe) application. That is cool because you are not limited by any sandbox; the only limit is the boundary of the virtual machine. In other words, the application can run with administrator privileges.

Application deployment & updates

The application is stored as a single ZIP archive in Azure Storage. From there it is deployed to every compute node after the node is allocated. You can easily update the app, but you must restart the compute node for the updated application to be deployed automatically. You are not limited to a single application (although you cannot exceed 20 applications), so you can separate application dependencies into individual packages that can be updated independently.
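Producing the ZIP archive is easy to script. The following is a minimal sketch that uses only the Python standard library; the function name `package_application` and the file names `MyApp.exe` and `helper.dll` are hypothetical, and uploading the archive to the Batch account is not shown.

```python
import tempfile
import zipfile
from pathlib import Path


def package_application(app_dir, zip_path):
    """Pack every file under app_dir into one ZIP archive, keeping relative paths.

    zip_path should live outside app_dir, otherwise the archive would
    try to include itself.
    """
    app_dir = Path(app_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(app_dir.rglob("*")):
            if path.is_file():
                archive.write(path, arcname=path.relative_to(app_dir).as_posix())
    return zip_path


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app"
    (app / "lib").mkdir(parents=True)
    (app / "MyApp.exe").write_bytes(b"placeholder")
    (app / "lib" / "helper.dll").write_bytes(b"placeholder")

    archive_path = package_application(app, Path(tmp) / "MyApp.zip")
    with zipfile.ZipFile(archive_path) as archive:
        names = sorted(archive.namelist())

print(names)
```

Keeping dependencies in a second archive built the same way is one way to get the independently updatable packages mentioned above.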

Organization

Virtual machine nodes are grouped into pools. All tasks of the same kind are grouped into jobs. Each job must be assigned to one pool. One Azure Batch account can contain multiple pools. A task can declare multiple tasks it depends on, all of which must be completed before the task itself is started. A task consists of two important parts: a unique name and a command that is executed on the command line and calls one of your applications on the compute node. Azure Batch itself is an orchestration service that holds a list of your jobs and takes care of the compute nodes in your pools.
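The effect of task dependencies can be modelled locally. The sketch below is not the Batch API; it only uses `graphlib` from the Python standard library (Python 3.9 or newer) to compute one valid order in which hypothetical tasks could run, given which tasks each one depends on. The task names are invented for illustration.

```python
from graphlib import TopologicalSorter  # available since Python 3.9


def execution_order(depends_on):
    """Return task ids so that every task comes after the tasks it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical tasks: "merge" must wait for both "split-a" and "split-b",
# and "report" must wait for "merge".
depends_on = {
    "split-a": [],
    "split-b": [],
    "merge": ["split-a", "split-b"],
    "report": ["merge"],
}

order = execution_order(depends_on)
print(order)
```

Tasks without a dependency between them (here the two split tasks) may appear in either order, which matches the fact that they could run in parallel on different compute nodes.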

Management

You can manage Azure Batch manually in the Azure Portal, with the Azure CLI, or in Batch Explorer. There is also an option to manage it programmatically in Python or in .NET (Core) using the Azure.Batch NuGet package. A new pool can be created only with Active Directory Batch account credentials. The pool can be scaled manually or automatically by a custom script that has an overview of pending tasks.
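The scaling decision itself is usually simple arithmetic. As a rough sketch in Python (the real automatic scaling in Azure Batch is written in its own formula language, which is not shown here), the function below derives a pool size from the number of pending tasks. The function name, the default of 4 tasks per node and the limit of 100 nodes are assumptions chosen for illustration.

```python
import math


def target_node_count(pending_tasks, tasks_per_node=4, min_nodes=0, max_nodes=100):
    """Pick a pool size that covers the pending tasks, clamped to the given limits."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    wanted = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, wanted))


for pending in (0, 10, 1000):
    print(pending, "->", target_node_count(pending))
```

With these defaults an empty queue scales the pool down to zero nodes, and a very long queue is capped at the maximum so that costs stay bounded.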

How to start experimenting with Azure Batch