Hello Bijlsma, H. (Hessel) DOMC,
Welcome to Microsoft Q&A, and thank you for reaching out.
I understand that you're running into two separate issues, slow or failing git clone operations and Docker builds timing out, but both have the same underlying cause: how Azure Machine Learning handles mounted storage.
When a datastore is mounted inside an Azure ML compute, it isn’t a local disk. It’s a network-backed FUSE mount. This works well for reading datasets, but it has very different performance characteristics compared to the compute’s local SSD.
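To confirm whether a given directory is on such a mount, you can inspect its filesystem type. The following is a minimal sketch, assuming a Linux compute with GNU coreutils, where `stat -f -c %T` reports FUSE filesystems with a name starting with "fuse" (for example "fuseblk"). The function names are illustrative, not part of any Azure ML tooling.

```shell
#!/usr/bin/env bash
# Sketch: report whether a path sits on a FUSE mount (Linux, GNU coreutils assumed).

# Print the filesystem type name for a path, e.g. "ext2/ext3", "tmpfs", "fuseblk".
fs_type() {
    stat -f -c %T "$1"
}

# Return 0 if a filesystem type name looks like FUSE (assumption: name starts with "fuse").
is_fuse_type() {
    case "$1" in
        fuse*) return 0 ;;
        *)     return 1 ;;
    esac
}

path="${1:-$PWD}"
type_name="$(fs_type "$path")"
if is_fuse_type "$type_name"; then
    echo "$path is on a FUSE mount ($type_name): avoid git clone and docker build here"
else
    echo "$path is on $type_name: not a FUSE mount"
fi
```

Run it with the directory you intend to work in as the first argument. Without an argument it checks the current directory.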
Why git clone Fails on a Mounted Drive
The error:
fatal: premature end of pack file
fetch-pack: invalid index-pack output
usually means the pack file couldn’t be streamed reliably. On mounted storage:
- throughput is much lower than local disk
- latency is significantly higher
- frequent small reads/writes (as Git does) perform poorly
Git has to unpack hundreds of small objects, and the mounted drive simply can’t keep up. That’s also why the progress stalls and interactive authentication (“Password for…”) times out.
Why Docker Builds Time Out
Docker builds rely on:
- thousands of filesystem reads/writes
- creating and diffing layers
- scanning build context efficiently
Mounted storage introduces too much latency for Docker’s I/O pattern, so builds hang or time out. This is expected: mounted drives in AML were never intended for build operations.
The fact that everything works correctly on the VM’s local filesystem confirms that the mounted drive is the bottleneck.
Recommended Solution
Here’s the pattern that works reliably and is recommended by Azure ML engineering:
1. Don’t Git clone directly into the mounted directory
Instead, clone your repo into the compute’s local disk:
cd /tmp # or /home/azureuser
git clone https://<username>:<PAT>@dev.azure.com/...
Using a PAT avoids interactive prompts and reduces timeouts.
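If you would rather not embed the PAT in the clone URL (where it can end up in shell history and in the repo's saved remote URL), one alternative is to pass it as an HTTP header for that single command. This is a minimal sketch, assuming the PAT is used as the password of a Basic authentication header with an empty username. The helper name and the `AZDO_PAT` variable are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build an HTTP Basic Authorization header from a PAT (empty username, PAT as password).
basic_auth_header() {
    local pat="$1"
    # tr strips the line wrapping that some base64 implementations add to long output.
    printf 'Authorization: Basic %s' "$(printf ':%s' "$pat" | base64 | tr -d '\n')"
}

# Usage sketch (AZDO_PAT is a hypothetical variable name; the URL is elided as above):
#   git -c http.extraHeader="$(basic_auth_header "$AZDO_PAT")" \
#       clone https://dev.azure.com/... /tmp/myrepo
```

Because the header is supplied with `-c`, it applies only to that one git invocation and is not written to the repository configuration.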
2. Don’t run Docker builds from a mounted datastore
Move (or copy) the project to local storage before building:
cp -r /mnt/azureml/<your_datastore_path>/myrepo /tmp/myrepo
cd /tmp/myrepo
docker build -t myimage .
This avoids all latency issues from the mounted drive.
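The copy step can be wrapped in a small helper so that it fails clearly instead of silently overwriting an earlier copy. This is a sketch only; the function name is illustrative and the paths are the placeholders from the commands above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy a project directory to a local destination, preserving modes and symlinks.
# Refuses to run if the source is missing or the destination already exists.
stage_locally() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")"
    cp -a "$src" "$dest"
}

# Usage sketch (placeholders as in the commands above):
#   stage_locally "/mnt/azureml/<your_datastore_path>/myrepo" /tmp/myrepo
#   docker build -t myimage /tmp/myrepo
```

Passing the local directory to `docker build` as the build context keeps every read during the build on the local disk.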
3. If you want automation
You can use:
- a startup script to clone the repo onto the local disk automatically
- ACR Tasks or a CI/CD pipeline (Azure DevOps/GitHub Actions) to build images outside AML and push them to ACR
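For the startup-script option, a sketch along these lines could work. It assumes git can already authenticate non-interactively (for example with a PAT), and the `REPO_URL` and `TARGET_DIR` variables and the function name are hypothetical. The clone is skipped when a checkout already exists, so the script is safe to run on every start.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings: supply your repository URL and a local (non-mounted) target.
REPO_URL="${REPO_URL:-}"
TARGET_DIR="${TARGET_DIR:-/tmp/myrepo}"

# Clone the repository unless a checkout already exists at the target.
ensure_local_clone() {
    local url="$1" target="$2"
    if [ -d "$target/.git" ]; then
        echo "already cloned: $target"
        return 0
    fi
    mkdir -p "$(dirname "$target")"
    git clone "$url" "$target"
}

# Do nothing unless a repository URL was provided.
if [ -n "$REPO_URL" ]; then
    ensure_local_clone "$REPO_URL" "$TARGET_DIR"
fi
```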
AML is designed to use Docker images, not build them on top of mounted storage.
Why This Happens
Mounted datastores in AML are optimized for reading datasets, not for development workflows. Git and Docker expect fast local disks. Network-mounted storage cannot deliver the I/O performance required for:
- packfile streaming
- untarring large objects
- Docker layer diffing
- rapid small-file operations
So, although the mounted drive works fine for reading data during training, it's not suitable for Git/Docker workflows.
The behavior you’re seeing is expected with mounted storage in Azure ML. For reliable cloning and building:
- always use the compute node’s local filesystem
- avoid Git and Docker directly on mounted datastores
- optionally build images in CI/CD and pull them into AML
Please refer to these resources:
- Git integration for Azure Machine Learning
- Troubleshooting environment issues
- Azure CLI Installation Guide
- Use a custom container to deploy a model to an online endpoint
I hope this helps. Do let me know if you have any further questions.
If this answers your question, could you please take a moment to accept this response? Your feedback is greatly appreciated.
Thank you!