I have a problem with a mounted drive in Azure Machine Learning when using git clone and docker build.

Bijlsma, H. (Hessel) DOMC 20 Reputation points
2025-11-26T09:27:36.5066667+00:00

On my compute in an Azure Machine Learning workspace, I perform a git clone of my Azure DevOps repository. The git clone takes a lot of time and results in an error:

```
Cloning into 'UC...
Password for 'https://******@dev.azure.com':
remote: Azure Repos
remote: Found 342 objects to send. (179 ms)
Receiving objects: 100% (342/342), 224.21 KiB | 9.00 KiB/s, done.
fatal: premature end of pack file, 406 bytes missing
fatal: fetch-pack: invalid index-pack output
```
Azure Machine Learning

Answer accepted by question author
  1. SRILAKSHMI C 10,805 Reputation points Microsoft External Staff Moderator
    2025-11-26T12:56:29.0833333+00:00

    Hello Bijlsma, H. (Hessel) DOMC,

    Welcome to Microsoft Q&A, and thank you for reaching out.

    I understand that you’re running into two separate issues here: slow or failing git clone operations, and Docker builds timing out. Both have the same underlying cause: how Azure Machine Learning handles mounted storage.

    When a datastore is mounted inside an Azure ML compute, it isn’t a local disk. It’s a network-backed FUSE mount. This works well for reading datasets, but it has very different performance characteristics compared to the compute’s local SSD.
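
    To see whether a given directory sits on mounted storage or on local disk, you can look at the filesystem type behind it. The following is a minimal sketch, assuming GNU `df` is available on the compute. The helper names `fs_type_of` and `is_remote_fstype` are made up for illustration, and the list of type names treated as remote is an assumption, not an exhaustive list.

    ```shell
    # Print the filesystem type backing a path (relies on GNU df's --output option).
    fs_type_of() {
        df --output=fstype "$1" | tail -n 1
    }

    # Return 0 if a filesystem type name looks FUSE- or network-backed.
    # The patterns below are illustrative assumptions; extend them as needed.
    is_remote_fstype() {
        case "$1" in
            fuse*|cifs|smb*|nfs*) return 0 ;;
            *) return 1 ;;
        esac
    }

    # Usage (not run here):
    #   is_remote_fstype "$(fs_type_of .)" && echo "mounted storage: clone and build elsewhere"
    ```

    If the check reports a remote type for your working directory, that is a sign to move the Git or Docker work to a local path first.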

    Why git clone Fails on a Mounted Drive

    The error:

    fatal: premature end of pack file
    fetch-pack: invalid index-pack output
    

    usually means the pack file couldn’t be streamed reliably. On mounted storage:

    • throughput is much lower than local disk

    • latency is significantly higher

    • frequent small reads/writes (as Git does) perform poorly

    Git has to unpack hundreds of small objects, and the mounted drive simply can’t keep up. That’s also why the progress stalls and interactive authentication (“Password for…”) times out.

    Why Docker Builds Time Out

    Docker builds rely on:

    • thousands of filesystem reads/writes

    • creating and diffing layers

    • scanning build context efficiently

    Mounted storage introduces too much latency for Docker’s I/O pattern, so builds hang or time out. This is expected: mounted drives in AML were never intended for build operations.

    The fact that everything works correctly on the VM’s local filesystem confirms that the mounted drive is the bottleneck.
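
    If you want to measure the difference yourself, a rough small-file write test can be run against both locations. This is only a sketch, assuming GNU `date` (for `%N`) and a writable target directory. `time_small_writes` is a hypothetical helper, and because it does not flush caches the numbers are indicative rather than a proper benchmark.

    ```shell
    # Create COUNT small files under DIR and print the elapsed time in milliseconds.
    # Assumes GNU date, which supports %N (nanoseconds).
    time_small_writes() {
        local dir="$1" count="${2:-500}"
        local work start end i
        work="$(mktemp -d "$dir/iotest.XXXXXX")" || return 1
        start="$(date +%s%N)"
        i=0
        while [ "$i" -lt "$count" ]; do
            printf 'x' > "$work/f$i"
            i=$((i + 1))
        done
        end="$(date +%s%N)"
        rm -rf "$work"
        echo $(( (end - start) / 1000000 ))
    }

    # Usage (not run here):
    #   time_small_writes /tmp
    #   time_small_writes /mnt/azureml/<your_datastore_path>
    ```

    A much larger number for the mounted path than for `/tmp` would be consistent with the explanation above.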

    Recommended Solution

    Here’s the pattern that works reliably and is recommended by Azure ML engineering:

    1. Don’t Git clone directly into the mounted directory

    Instead, clone your repo into the compute’s local disk:

    cd /tmp   # or /home/azureuser
    git clone https://<username>:<PAT>@dev.azure.com/...
    

    Using a PAT avoids interactive prompts and reduces timeouts.
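
    A PAT placed in the clone URL ends up in the remote URL stored in `.git/config`. One variation, sketched below, passes the PAT as an HTTP Basic header through Git's `http.extraHeader` setting instead. It assumes the PAT is held in an environment variable, here given the hypothetical name `AZDO_PAT`. The helper names are also made up for illustration, and `<repo-url>` stands for your own repository URL.

    ```shell
    # Build the HTTP Basic header for a PAT (empty user name, PAT as the password).
    # tr strips the line wraps that base64 adds for long input.
    pat_auth_header() {
        local pat="$1"
        printf 'Authorization: Basic %s' "$(printf ':%s' "$pat" | base64 | tr -d '\n')"
    }

    # Clone onto local disk without embedding the PAT in the URL.
    # AZDO_PAT is a hypothetical environment variable name.
    clone_with_pat() {
        local repo_url="$1" dest="$2"
        git -c http.extraHeader="$(pat_auth_header "$AZDO_PAT")" clone "$repo_url" "$dest"
    }

    # Usage (not run here):
    #   export AZDO_PAT=...          # set this outside any script you commit
    #   clone_with_pat "<repo-url>" /tmp/myrepo
    ```

    Either way, the clone target should be a local path such as `/tmp`, not the mounted directory.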

    2. Don’t run Docker builds from a mounted datastore

    Move (or copy) the project to local storage before building:

    cp -r /mnt/azureml/<your_datastore_path>/myrepo /tmp/myrepo
    cd /tmp/myrepo
    docker build -t myimage .
    

    This avoids all latency issues from the mounted drive.
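
    The copy step can be wrapped in a small helper so that each build starts from a fresh local directory. This is a sketch under the assumption that `/tmp` is on the compute's local disk. `stage_locally` is a hypothetical name, and the datastore path in the usage comment is the same placeholder as above.

    ```shell
    # Copy SRC into a fresh directory under LOCAL_ROOT (default /tmp) and print the new path.
    stage_locally() {
        local src="$1" local_root="${2:-/tmp}"
        local dest
        dest="$(mktemp -d "$local_root/build.XXXXXX")" || return 1
        # -a preserves modes and timestamps; "src/." copies the contents, dotfiles included
        cp -a "$src/." "$dest/" || return 1
        echo "$dest"
    }

    # Usage (not run here):
    #   ctx="$(stage_locally /mnt/azureml/<your_datastore_path>/myrepo)"
    #   docker build -t myimage "$ctx"
    ```

    The one-time copy still reads from the mounted drive, but the many small reads Docker performs during the build then hit local disk only.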

    3. If you want automation

    You can use:

    • a startup script to clone the repo onto the local disk automatically

    • ACR Tasks or a CI/CD pipeline (Azure DevOps/GitHub Actions) to build images outside AML and push them to ACR

    AML is designed to use Docker images, not build them on top of mounted storage.
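
    For the startup-script option, the script needs to be safe to run more than once. A minimal sketch of that idea follows. `clone_or_update` is a hypothetical helper, the destination path is an assumption, and authentication is left out and would have to be handled non-interactively.

    ```shell
    # Clone REPO_URL into DEST on local disk, or fast-forward it if a clone already exists.
    clone_or_update() {
        local repo_url="$1" dest="$2"
        if [ -d "$dest/.git" ]; then
            # Existing working copy: only fast-forward, never create merge commits
            git -C "$dest" pull --ff-only
        else
            mkdir -p "$(dirname "$dest")"
            git clone "$repo_url" "$dest"
        fi
    }

    # Usage (not run here):
    #   clone_or_update "<repo-url>" /tmp/myrepo
    ```

    Keeping the destination on local disk is the important part. The pull branch just avoids re-downloading the repository on every start.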

    Why This Happens

    Mounted datastores in AML are optimized for reading datasets, not for development workflows. Git and Docker expect fast local disks. Network-mounted storage cannot deliver the I/O performance required for:

    • packfile streaming

    • untarring large objects

    • Docker layer diffing

    • rapid small-file operations

    So, although the mounted drive works fine for reading data during training, it's not suitable for Git/Docker workflows.

    The behavior you’re seeing is expected with mounted storage in Azure ML. For reliable cloning and building:

    • always use the compute node’s local filesystem

    • avoid Git and Docker directly on mounted datastores

    • optionally build images in CI/CD and pull them into AML


    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please take a moment to accept this response. Your feedback is greatly appreciated.

    Thank you!

