Showing posts with label Distributed Applications. Show all posts

Tuesday, May 24, 2022

Learning some useful Windows OS Concepts

The Windows operating system provides several technologies for developing distributed, scalable, network-based applications. Some of the less common ones, and how they are used, are discussed below.

Overlapped IO 

  • Provides the fastest asynchronous I/O capability the OS offers, enabling concurrent reads/writes by multiple threads. The .NET managed thread pool supports it out of the box.
    • Overlapped IO is useful when IOs are executed in parallel. For example, when a data file is assembled from strips received in UDP packets.
    • Operations such as ReadFile / WriteFile take an OVERLAPPED structure parameter, which carries the file offset of the operation.
    • To perform asynchronous overlapped IO, the file is opened with the CreateFile API using the FILE_FLAG_OVERLAPPED flag.
    • In the asynchronous case, ReadFile / WriteFile may return immediately, before the operation completes (the call returns FALSE and GetLastError reports ERROR_IO_PENDING).
    • The GetOverlappedResult API can be used to check the status of the IO operation.
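The strip-assembly case above can be sketched in C# without P/Invoke. This is a minimal sketch that assumes .NET 6 or later: `FileOptions.Asynchronous` requests an overlapped handle on Windows, and `RandomAccess.WriteAsync` / `RandomAccess.ReadAsync` issue offset-based operations against it. The file name, strip size and strip count are made up for illustration and are not taken from the original project.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Win32.SafeHandles;

static class OverlappedDemo
{
    // Writes strips at independent offsets, then reads strip 5 back and returns its first byte.
    public static async Task<byte> RunAsync()
    {
        const int stripSize = 4096, stripCount = 8;                          // illustrative values
        string path = Path.Combine(Path.GetTempPath(), "strips_demo.dat");   // hypothetical file

        try
        {
            // FileOptions.Asynchronous maps to FILE_FLAG_OVERLAPPED on Windows.
            using SafeFileHandle handle = File.OpenHandle(
                path, FileMode.Create, FileAccess.ReadWrite, FileShare.None, FileOptions.Asynchronous);

            // Strips "arrive" in reverse order; each write carries its own file offset,
            // so no shared file pointer is involved.
            Task[] writes = Enumerable.Range(0, stripCount)
                .Reverse()
                .Select(i =>
                {
                    byte[] strip = Enumerable.Repeat((byte)i, stripSize).ToArray();
                    return RandomAccess.WriteAsync(handle, strip, (long)i * stripSize).AsTask();
                })
                .ToArray();
            await Task.WhenAll(writes);

            byte[] check = new byte[stripSize];
            await RandomAccess.ReadAsync(handle, check, 5L * stripSize);
            return check[0];
        }
        finally
        {
            File.Delete(path);
        }
    }

    static async Task Main() => Console.WriteLine(await RunAsync());
}
```

Because every operation names its own offset, the order in which the strips complete does not matter.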

    IO Completion Port

    • IO completion ports (IOCP) provide the fastest asynchronous IO capability the OS offers, enabling concurrent reads/writes by multiple threads.
      • IOCPs are useful for processing, in one central place, the results of multiple async IOs executed in parallel on multiple handles (files, sockets, pipes etc.). For example, a web server such as IIS.
      • To use IOCP functionality, first create an IOCP by calling the CreateIoCompletionPort API with INVALID_HANDLE_VALUE as the file handle.
      • Create a thread pool in which multiple threads wait on the IOCP in the GetQueuedCompletionStatus API to process completed IOs.
      • The handles on which async IOs are performed should be associated with the above IOCP. A unique completion key can also be supplied for each handle.
      • Upon completion of an asynchronous overlapped IO on one of these handles, one of the threads from the thread pool is released to process the result.
      • The .NET library provides a separate IOCP thread pool. Classes such as FileStream use it internally to support async operations.
      • A good Winsock based example in C++ can be found here
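The port-plus-worker-threads pattern described above can be sketched in C# with P/Invoke. This is a minimal, Windows-only sketch: to stay self-contained it posts completion packets by hand with PostQueuedCompletionStatus instead of associating real file or socket handles with the port. The worker count, the completion keys and the use of key 0 as a shutdown signal are conventions of this sketch only.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Threading;

static class IocpDemo
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr CreateIoCompletionPort(IntPtr fileHandle, IntPtr existingPort,
        UIntPtr completionKey, uint concurrentThreads);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetQueuedCompletionStatus(IntPtr port, out uint bytesTransferred,
        out UIntPtr completionKey, out IntPtr overlapped, uint timeoutMs);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool PostQueuedCompletionStatus(IntPtr port, uint bytesTransferred,
        UIntPtr completionKey, IntPtr overlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool CloseHandle(IntPtr handle);

    static readonly IntPtr InvalidHandleValue = new IntPtr(-1);
    const uint Infinite = 0xFFFFFFFF;

    // Returns the number of completion packets handled by the workers.
    public static int Run()
    {
        // Passing INVALID_HANDLE_VALUE creates a new port not yet tied to any handle.
        IntPtr port = CreateIoCompletionPort(InvalidHandleValue, IntPtr.Zero, UIntPtr.Zero, 0);
        if (port == IntPtr.Zero) throw new Win32Exception();

        int processed = 0;
        var workers = new Thread[2];
        for (int w = 0; w < workers.Length; w++)
        {
            int id = w;
            workers[w] = new Thread(() =>
            {
                // Each worker blocks here until a completion packet is available.
                while (GetQueuedCompletionStatus(port, out uint bytes, out UIntPtr key,
                                                 out IntPtr overlapped, Infinite))
                {
                    if (key == UIntPtr.Zero) break;   // key 0: shutdown signal (sketch convention)
                    Console.WriteLine($"worker {id}: key={key}, bytes={bytes}");
                    Interlocked.Increment(ref processed);
                }
            });
            workers[w].Start();
        }

        // Real code would call CreateIoCompletionPort(handle, port, key, 0) for each
        // overlapped handle; here the completions are simulated.
        for (uint key = 1; key <= 4; key++)
            PostQueuedCompletionStatus(port, 100 * key, (UIntPtr)key, IntPtr.Zero);
        for (int w = 0; w < workers.Length; w++)
            PostQueuedCompletionStatus(port, 0, UIntPtr.Zero, IntPtr.Zero);

        foreach (Thread t in workers) t.Join();
        CloseHandle(port);
        return processed;
    }

    static void Main() => Console.WriteLine($"processed {Run()} packets");
}
```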

      File Mapping backed by NTFS sparse file

      • Sparse data consists of data chunks containing only 0s.
      • In a file system, these chunks occupy valuable disk space without adding much value.
      • NTFS supports a file type called a sparse file.
      • The big advantage of a sparse file is that the sparse data it contains is not committed to disk, as seen in the screenshot below.


      • The OS keeps track of these sparse data segments in its internal data structures. When sparse data is read, the OS returns an array of 0s instead of physically reading the file.
      • When real data is written to a sparse file, the OS updates its internal data structures to reflect the changes.
      • One of the disadvantages of shared memory is that either huge blocks of memory need to be committed upfront, or complicated programming is required to commit memory on the fly for previously reserved address space.
      • A memory mapped file backed by an NTFS sparse file gives access to a large address space without committing disk space.
      • The OS transparently commits only as much storage as the application actually writes into the address space.
      • When an unwritten part of the address space is read, no fault is raised; instead a block of 0s is returned.
      Usage Example
      • During acquisition, the data files generated vary in number and size.
      • The total size of the data files generated can vary from 100MB to 1.5GB.
      • If the storage is allocated in shared memory, it is underutilized, since most of the committed pages contain sparse data. This also affects the overall performance of the system.
      • As an alternative, if the storage is allocated in a memory mapped file, the disk space is committed upfront. Again, most pages will hold sparse data.
      • An NTFS sparse file backed MMF eliminates both of these shortcomings, since sparse data is never committed to disk.
      • The difference can be seen in the screenshot above.
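A sparse-file-backed mapping can be sketched in C# as follows. This is a minimal, Windows-only sketch that assumes .NET 6 or later and an NTFS temp folder; it is not the code from the linked sources. The file is marked sparse with the FSCTL_SET_SPARSE control code, extended to 1 GB, and then mapped. The file name, capacity and offset are illustrative.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class SparseMmfDemo
{
    const uint FSCTL_SET_SPARSE = 0x000900C4;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle device, uint controlCode,
        IntPtr inBuffer, uint inSize, IntPtr outBuffer, uint outSize,
        out uint bytesReturned, IntPtr overlapped);

    // Returns (byte read from an untouched region, same byte after writing 42 to it).
    public static (byte untouched, byte written) Run()
    {
        const long capacity = 1L << 30;   // 1 GB of mapped file, illustrative
        const long offset = 512L << 20;   // touch a region in the middle
        string path = Path.Combine(Path.GetTempPath(), "sparse_demo.dat");   // hypothetical file

        try
        {
            using var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite, FileShare.None);

            // Mark the file sparse BEFORE extending it, so the zero range is never allocated.
            if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_SPARSE,
                                 IntPtr.Zero, 0, IntPtr.Zero, 0, out _, IntPtr.Zero))
                throw new Win32Exception();
            fs.SetLength(capacity);

            using var mmf = MemoryMappedFile.CreateFromFile(
                fs, null, capacity, MemoryMappedFileAccess.ReadWrite,
                HandleInheritability.None, leaveOpen: true);
            using var view = mmf.CreateViewAccessor(offset, 64 * 1024);

            byte untouched = view.ReadByte(0);   // unwritten region reads back as 0
            view.Write(0, (byte)42);             // only written pages need disk space
            return (untouched, view.ReadByte(0));
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main() => Console.WriteLine(Run());
}
```

While the program runs, the file's "Size on disk" in Explorer should stay far below its 1 GB logical size.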

      Source and Binaries can be found here.

      Wednesday, May 4, 2022

      Ultra fast compression and decompression Win32 APIs for realtime applications



      The NTFS file system internally uses ultra fast, realtime data compression and decompression.
      Starting with Windows 10, it is also used internally by the OS, as shown below.

      The same APIs, RtlCompressBuffer and RtlDecompressBuffer, can be used by user applications as well.
      More info in the MSDN page.
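Calling these APIs from C# can be sketched with P/Invoke against ntdll.dll. This is a minimal, Windows-only sketch, not the linked test application: it assumes the LZNT1 format with the standard engine, uses a synthetic repetitive 1 MB buffer as input, and sizes the output buffer with some slack. The constants are the values from the Windows headers; the buffer sizes and the data pattern are illustrative.

```csharp
using System;
using System.Runtime.InteropServices;

static class RtlCompressionDemo
{
    const ushort COMPRESSION_FORMAT_LZNT1 = 0x0002;
    const ushort COMPRESSION_ENGINE_STANDARD = 0x0000;

    [DllImport("ntdll.dll")]
    static extern uint RtlGetCompressionWorkSpaceSize(ushort formatAndEngine,
        out uint bufferWorkSpaceSize, out uint fragmentWorkSpaceSize);

    [DllImport("ntdll.dll")]
    static extern uint RtlCompressBuffer(ushort formatAndEngine,
        byte[] uncompressed, uint uncompressedSize,
        [Out] byte[] compressed, uint compressedSize,
        uint chunkSize, out uint finalCompressedSize, IntPtr workSpace);

    [DllImport("ntdll.dll")]
    static extern uint RtlDecompressBuffer(ushort format,
        [Out] byte[] uncompressed, uint uncompressedSize,
        byte[] compressed, uint compressedSize, out uint finalUncompressedSize);

    static void Check(uint status)
    {
        // The calls return an NTSTATUS; 0 is STATUS_SUCCESS.
        if (status != 0) throw new InvalidOperationException($"NTSTATUS 0x{status:X8}");
    }

    // Returns the compressed size and whether the round trip restored the input.
    public static (int compressedSize, bool roundTripOk) Run()
    {
        ushort format = COMPRESSION_FORMAT_LZNT1 | COMPRESSION_ENGINE_STANDARD;

        byte[] input = new byte[1 << 20];                     // synthetic, repetitive data
        for (int i = 0; i < input.Length; i++) input[i] = (byte)(i % 64);

        Check(RtlGetCompressionWorkSpaceSize(format, out uint workSpaceSize, out _));
        IntPtr workSpace = Marshal.AllocHGlobal((int)workSpaceSize);
        try
        {
            byte[] compressed = new byte[input.Length + 4096];   // slack for poorly compressible data
            Check(RtlCompressBuffer(format, input, (uint)input.Length,
                                    compressed, (uint)compressed.Length,
                                    4096, out uint compressedSize, workSpace));

            byte[] output = new byte[input.Length];
            Check(RtlDecompressBuffer(COMPRESSION_FORMAT_LZNT1, output, (uint)output.Length,
                                      compressed, compressedSize, out uint restoredSize));

            bool ok = restoredSize == input.Length && output.AsSpan().SequenceEqual(input);
            return ((int)compressedSize, ok);
        }
        finally
        {
            Marshal.FreeHGlobal(workSpace);
        }
    }

    static void Main()
    {
        var (size, ok) = Run();
        Console.WriteLine($"compressed size: {size}, round trip ok: {ok}");
    }
}
```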

      The tables below show data collected for the first 10 data files during an acquisition run.
      Savings of up to 38% can be achieved over a standard 1.8 MB data file. Compression times as low as 16 milliseconds and expansion times as low as 15 milliseconds were clocked per data file.


      In another example below, a 5 MB DICOM image was compressed by almost 40% (to about 61% of its original size) in 38 ms and decompressed in 8 ms.

      TestApp  IM_0047

      Compression [fast]
      uncompressed size:      5509714 compressed size:        3378829 time in ms:38

      Decompression [fast]
      compressed size:        3378829 uncompressed size:      5509714 time in ms:8


      Compression [slow]
      uncompressed size:      5509714 compressed size:        3203440 time in ms:6235

      Decompression [slow]
      compressed size:        3203440 uncompressed size:      5509714 time in ms:8




      Source and Binaries can be found here.

      Saturday, April 30, 2022

      Running Custom applications using task scheduler when the system is idle

      It is often necessary to run background tasks when the system is idle.
      For example, performing background encryption / decryption with the manage-bde tool on volumes containing data, when this could not be completed during installation.
      In such cases the task scheduler can be used to schedule a task under the system account on the OnIdle condition (the system becomes idle, with no user interaction and no background process running), after a delay of, say, 1 minute. This task may launch a batch script start_idle_task.cmd. As it runs under the system account, it will run even when no one is logged in.
      When the system leaves the idle state, say due to a user action, the task is immediately terminated if it is still running.
      In some cases, when the on-idle task is terminated, it is desirable to take an action such as running another task. As Windows provides no direct way to achieve this, it can be implemented by tracking the termination of the start_idle_task.cmd process from a different process, using the WMI class Win32_ProcessStopTrace, and then launching the new task.

      Example:
      Create a batch file start_idle_task.cmd as below. This batch file is started when the system becomes idle, and it triggers ProcessWatcher.exe when it is terminated by a user action.
      rem ProcessWatcher will trigger when start_idle_task.cmd is terminated due to user action
      start D:\Github\TechBlog\IdleTask\Scripts\ProcessWatcher.exe
      echo started %date% %time% >> c:\temp\idletask.log
      rem run any command to start when system is idle
      rem start manage-bde -resume d:
      rem wait for the task scheduler to kill
      pause
      Create a task scheduler task that is launched when the system becomes idle. This task runs under the system account and executes start_idle_task.cmd.
      schtasks /create /f /sc onidle /i 1 /tn idletaskstart /tr "D:\Github\TechBlog\IdleTask\Scripts\start_idle_task.cmd" /ru system

      ProcessWatcher.exe
      Create a WinForms project. Remove Form1, add a reference to System.Management, and add the following lines to Program.cs. The program waits for the system to terminate start_idle_task.cmd, then executes customaction() and exits.
             
      using System;
      using System.Diagnostics;
      using System.Linq;
      using System.Management;      // requires a reference to System.Management
      using System.Windows.Forms;

      namespace ProcessWatcher
      {
          static class Program
          {
              static ManagementEventWatcher processStopEvent;

              // Action to run once the idle task has been terminated.
              static void customaction()
              {
                  System.IO.File.AppendAllText(@"c:\temp\processwatcher.txt", "ended " + DateTime.Now + Environment.NewLine);
              }

              // Raised by WMI when the watched parent process stops.
              static void processStopEvent_EventArrived(object sender, EventArrivedEventArgs e)
              {
                  try
                  {
                      customaction();
                      processStopEvent.EventArrived -= processStopEvent_EventArrived;
                      processStopEvent.Stop();
                  }
                  catch
                  {
                      // Best effort: ignore failures and exit anyway.
                  }
                  Environment.Exit(0);
              }

              /// <summary>
              /// The main entry point for the application.
              /// </summary>
              [STAThread]
              static void Main()
              {
                  // Look up our parent: the cmd.exe instance running start_idle_task.cmd.
                  int parentpid;
                  using (var query = new ManagementObjectSearcher(
                      "SELECT ParentProcessId " +
                      "FROM Win32_Process " +
                      "WHERE ProcessId=" + Process.GetCurrentProcess().Id))
                  {
                      parentpid = query
                          .Get()
                          .OfType<ManagementObject>()
                          .Select(p => (int)(uint)p["ParentProcessId"])
                          .FirstOrDefault();
                  }

                  // Subscribe to the stop event of the parent process only.
                  processStopEvent = new ManagementEventWatcher("SELECT * FROM Win32_ProcessStopTrace WHERE ProcessID=" + parentpid);
                  processStopEvent.EventArrived += processStopEvent_EventArrived;
                  processStopEvent.Start();

                  // Run a message loop without a form so the process stays alive.
                  Application.EnableVisualStyles();
                  Application.SetCompatibleTextRenderingDefault(false);
                  Application.Run();
              }
          }
      }
             
       
      Output
      c:\temp\idletask.log
      started 07-06-2022  4:19:21.85 
      ended 07-06-2022 11:59:01

      Source and Binaries can be found here.

      Saturday, April 23, 2022

      Share large allocated memory across applications without duplication


      In some client-server applications running on the same box, a large amount of data may need to be shared. For example, an image acquisition application sharing an image with its clients. This typically involves using file mapping objects to share memory across processes. However, this can be overkill when only two processes are involved. Instead, the ReadProcessMemory and WriteProcessMemory APIs can be used.
      This means that Process A can share its array with Process B without any IPC beyond passing its PID and the address of the array. Debuggers exploit this.

      In the demo below, the Writer process fills an array and shares it with the Reader process. The Reader process reads the array using the ReadProcessMemory API. It then repopulates the array using the WriteProcessMemory API, and the new contents are visible to the Writer process.





      Sequence

      Reader process starts
      Writer process starts

      Writer Process: 
      creates an array of 100000  in writer process
      Writes 100000 bytes containing 1
      signals Reader with process handle and array memory address
      sleeps for 5 seconds

      Reader Process: 
      Opens array using writer process handle and array memory address
      Reads 100000 bytes containing 1
      Writes 100000 bytes containing 2

      Writer Process: 
      Reads 100000 bytes containing 2
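The read and write steps of this sequence can be sketched in C# with P/Invoke. To stay self-contained, this Windows-only sketch (assuming .NET 5 or later for `Environment.ProcessId`) opens its own process rather than a second one, so a single program plays both Writer and Reader; in the two-process demo the Reader would pass the Writer's PID to OpenProcess and use the address received from the Writer. The array is pinned so that its address stays valid while it is accessed.

```csharp
using System;
using System.ComponentModel;
using System.Linq;
using System.Runtime.InteropServices;

static class ProcessMemoryDemo
{
    const uint PROCESS_VM_OPERATION = 0x0008;
    const uint PROCESS_VM_READ = 0x0010;
    const uint PROCESS_VM_WRITE = 0x0020;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr OpenProcess(uint desiredAccess, bool inheritHandle, int processId);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool ReadProcessMemory(IntPtr process, IntPtr baseAddress,
        [Out] byte[] buffer, UIntPtr size, out UIntPtr bytesRead);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool WriteProcessMemory(IntPtr process, IntPtr baseAddress,
        byte[] buffer, UIntPtr size, out UIntPtr bytesWritten);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool CloseHandle(IntPtr handle);

    // Returns true if the array was read as all 1s and afterwards contains all 2s.
    public static bool Run()
    {
        // "Writer" side: an array of 100000 bytes containing 1, pinned so the GC cannot move it.
        byte[] shared = Enumerable.Repeat((byte)1, 100000).ToArray();
        GCHandle pin = GCHandle.Alloc(shared, GCHandleType.Pinned);

        // "Reader" side: open the target process (here: ourselves) by PID.
        IntPtr process = OpenProcess(
            PROCESS_VM_OPERATION | PROCESS_VM_READ | PROCESS_VM_WRITE, false, Environment.ProcessId);
        try
        {
            if (process == IntPtr.Zero) throw new Win32Exception();
            IntPtr address = pin.AddrOfPinnedObject();

            byte[] copy = new byte[shared.Length];
            if (!ReadProcessMemory(process, address, copy, (UIntPtr)(uint)copy.Length, out _))
                throw new Win32Exception();
            bool readOnes = copy.All(b => b == 1);

            byte[] twos = Enumerable.Repeat((byte)2, shared.Length).ToArray();
            if (!WriteProcessMemory(process, address, twos, (UIntPtr)(uint)twos.Length, out _))
                throw new Win32Exception();

            // The owner of the array now sees the bytes written through the process handle.
            return readOnes && shared.All(b => b == 2);
        }
        finally
        {
            if (process != IntPtr.Zero) CloseHandle(process);
            pin.Free();
        }
    }

    static void Main() => Console.WriteLine(Run());
}
```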

      Source and Binaries can be found here.


      Saturday, April 9, 2022

      Use System Cache memory for your applications


      Windows sets aside cache memory to speed up disk IO. For example, when a user launches an application for the first time, it is loaded into cache memory. The next time the user launches the same application, it loads faster, since it is loaded from cache memory instead of from disk.
      The size of the cache is determined during system startup.

      The red box in the screenshot below shows size of the cached files on the system.

      The system cache is highly scalable and memory management is done by the cache manager in conjunction with memory manager. The following video describes internals for system cache.



      User applications can also take advantage of this, so that data is written directly to cache memory and read back from there. The trick is to set the FILE_ATTRIBUTE_TEMPORARY attribute during file creation. This instructs the cache manager to keep the data in cache memory rather than write it to disk.

      It is also important that file readers do not try to open the file in any way, as this causes the file contents to be written to disk. The alternative is to create a duplicate handle and use it to access the file.
      Using overlapped IO for reads and writes ensures proper management of the file pointer by readers and writers.
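Creating such a file from C# can be sketched with a P/Invoke call to CreateFileW, since FileStream has no way to pass FILE_ATTRIBUTE_TEMPORARY at creation time. This is a minimal, Windows-only, single-process sketch: it writes and reads through the same handle and does not show handing a duplicate handle to a second process. FILE_FLAG_DELETE_ON_CLOSE is an addition of this sketch so that the demo file cleans itself up, and the location of test1.dat in the temp folder is assumed.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class TemporaryFileDemo
{
    const uint GENERIC_READ = 0x80000000;
    const uint GENERIC_WRITE = 0x40000000;
    const uint CREATE_ALWAYS = 2;
    const uint FILE_ATTRIBUTE_TEMPORARY = 0x00000100;
    const uint FILE_FLAG_DELETE_ON_CLOSE = 0x04000000;   // added by this sketch for cleanup

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern SafeFileHandle CreateFileW(string fileName, uint desiredAccess, uint shareMode,
        IntPtr securityAttributes, uint creationDisposition, uint flagsAndAttributes,
        IntPtr templateFile);

    // Returns true if 100000 bytes containing 1 were written and read back intact.
    public static bool Run()
    {
        string path = Path.Combine(Path.GetTempPath(), "test1.dat");   // assumed location

        // Share mode 0: nobody else may open the file by name while this handle is open.
        SafeFileHandle handle = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, IntPtr.Zero,
            CREATE_ALWAYS, FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE, IntPtr.Zero);
        if (handle.IsInvalid) throw new Win32Exception();

        using var stream = new FileStream(handle, FileAccess.ReadWrite);

        byte[] data = Enumerable.Repeat((byte)1, 100000).ToArray();
        stream.Write(data, 0, data.Length);
        stream.Flush();          // pushes FileStream's own buffer to the OS, not to disk

        stream.Position = 0;
        byte[] readBack = new byte[data.Length];
        int total = 0;
        while (total < readBack.Length)
        {
            int n = stream.Read(readBack, total, readBack.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total == data.Length && readBack.All(b => b == 1);
    }

    static void Main() => Console.WriteLine(Run());
}
```

The attribute is a hint: the system avoids writing the data back to storage only while sufficient cache memory is available.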

      The files listed in the screenshot below are the actual cached data files from an acquisition. The red box indicates that 0 bytes are committed to disk for all of the cached data files. Similarly, the cyan rectangle indicates the actual size of the selected data file.


      Example:
      In this example, the writer process writes 100000 bytes to the system cache via a temporary file, as in the first screenshot, and the reader process reads the same bytes from the system cache, as in the second screenshot. The third screenshot shows that all of the data stays in system cache memory and nothing is committed to disk.






      Sequence

      Reader process starts
      Writer process starts

      Writer Process: 
      creates test1.dat on system cache
      Writes 100000 bytes containing 1
      signals Reader with duplicate file handle
      sleeps for 5 seconds

      Reader Process: 
      Opens test1.dat on system cache using duplicate file handle
      Reads 100000 bytes containing 1
      Writes 100000 bytes containing 2

      Writer Process:
      Reads 100000 bytes containing 2


      Source and Binaries can be found here.

      Thursday, March 31, 2022

      Simple Managed Inter Process Communication(IPC) Framework



      If the need is only cross-process communication within one box, frameworks such as WCF are rather heavy, as they come with a learning curve and a complex setup. Home-grown solutions such as message-based implementations, on the other hand, suffer from inflexibility and heavy maintenance.
       The simpleIPC framework tries to strike the right balance by combining interface-based programming with zero setup.

      IPC Overview
      A lightweight Inter Process Communication (IPC) mechanism between processes on the same PC can be implemented in .NET using a named event object or a window kernel object, together with shared memory. Some of the features:
      • Interface based programming
      • Duplex communication support
      • Servers are identified by a unique string, which clients use to generate a proxy for communication.
      • Servers / clients generate stubs / proxies from an interface using reflection.
      • As the proxy is based on the RealProxy object, IntelliSense is supported in the Visual Studio editor.
      • Easy to debug: just a single file of code.
      • To access a managed server from an unmanaged client, reverse P/Invoke can be used.
      As shown in the diagram above, IPC consists of a Client, Proxy, Stub and Server, as described below.

      • Client uses proxy based on RealProxy to make API call
      • Proxy serializes the input data using binary formatter to shared memory and signals stub
      • Stub deserializes the input data using binary formatter from shared memory and makes the call on the server using reflection
      • Stub serializes the results to shared memory and signals Proxy
      • Proxy deserializes the results from shared memory and returns it to the client
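The five steps above can be sketched in C# as follows. This is not the simpleIPC source and it swaps in different building blocks, assuming .NET 6 or later: `DispatchProxy` stands in for RealProxy, System.Text.Json for the binary formatter, and a plain in-process delegate for the shared memory and signalling. The `IRegistrar` interface is a hypothetical, simplified contract without out parameters.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;
using System.Text.Json;

// Hypothetical contract shared by client and server.
public interface IRegistrar
{
    string Register(string name, int priority);
}

public class RegistrarServer : IRegistrar
{
    public string Register(string name, int priority) =>
        $"registered {name} with priority {priority}";
}

// Stub: deserializes the request and makes the call on the server using reflection.
public class Stub
{
    private readonly object _server;
    private readonly Type _contract;

    public Stub(object server, Type contract) { _server = server; _contract = contract; }

    public byte[] Handle(byte[] request)
    {
        using JsonDocument doc = JsonDocument.Parse(request);
        string methodName = doc.RootElement.GetProperty("Method").GetString()!;
        MethodInfo target = _contract.GetMethod(methodName)!;

        // Convert each JSON argument to the parameter type declared on the interface.
        JsonElement[] raw = doc.RootElement.GetProperty("Args").EnumerateArray().ToArray();
        object?[] args = target.GetParameters()
            .Select((p, i) => raw[i].Deserialize(p.ParameterType))
            .ToArray();

        object? result = target.Invoke(_server, args);
        return JsonSerializer.SerializeToUtf8Bytes(result, target.ReturnType);
    }
}

// Proxy: serializes the call, hands it to the transport, deserializes the reply.
public class IpcProxy : DispatchProxy
{
    // Stand-in for "write to shared memory and signal the stub".
    public Func<byte[], byte[]>? Transport { get; set; }

    protected override object? Invoke(MethodInfo? targetMethod, object?[]? args)
    {
        byte[] request = JsonSerializer.SerializeToUtf8Bytes(
            new { Method = targetMethod!.Name, Args = args });
        byte[] reply = Transport!(request);
        return JsonSerializer.Deserialize(reply, targetMethod.ReturnType);
    }
}

public static class IpcDemo
{
    public static string Run()
    {
        var stub = new Stub(new RegistrarServer(), typeof(IRegistrar));

        IRegistrar proxy = DispatchProxy.Create<IRegistrar, IpcProxy>();
        ((IpcProxy)proxy).Transport = stub.Handle;

        // The client only sees the interface; the call travels as bytes.
        return proxy.Register("client1", 3);
    }

    public static void Main() => Console.WriteLine(Run());
}
```

Because the client codes against the interface, swapping the delegate for a real shared-memory transport would not change the calling code.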
      Implementation
      The framework comes in two flavors: window kernel object based and named event kernel object based.
      The window based flavor uses a window kernel object to implement the server.


      As shown above, multiple windows are hosted in a thread. Each thread hosts a server.
      • A window based container can support multiple servers per thread
      • A window based server cannot work across window stations
      The named object based flavor uses a named event object to implement the server.
      • A named event object based container supports only one server
      • A named event object based server can work across window stations
      Example
      The interface  ICallInterface shown below is implemented by the server and ICallbackInterface is implemented by the client.
      namespace Example
      {
          [Serializable]
          public class regdata
          {
              public string name;
              public string regid;
          }
      
          public interface ICallInterface
          {
              string current { get; }
              string register(string name, string cbservername, out int ticket);
          }
      
          public interface ICallbackInterface
          {
              void update(int ticket, regdata data);
          }
      
      }


      Windows Simple IPC  Demo

      Server Process
      Simple IPC Server name for ICallback : winserver


      Client Process
      Simple IPC server name for ICallbackInterface : winclient




      Named Object Demo

      Server Process
      Simple IPC Server name for ICallback : namedserver


      Client Process
      Simple IPC server name for ICallbackInterface :  namedclient


      Operation

      Server process starts
      client process starts

      Client:
      creates proxy for server
      calls register() on server using proxy

      Server:
      executes method register() 
      server returns success with Ticket #100

      Client:
      results are received from proxy : success, Ticket:100

      Server:
      creates proxy for client implementing callback interface
      calls update() method on the proxy
      Client:
      executes update() method

      Source and Binaries can be found here.