Middle-mile logistics describes the problem of routing parcels through a network of hubs linked by a fixed set of trucks. The main challenge stems from the trucks' finite capacity: allocating a parcel to a specific truck may block another parcel from using that truck, so all parcel routes must be solved for simultaneously. Exact solution methods scale poorly with problem size, and real-world instances are intractable. Instead, we turn to reinforcement learning (RL) by rephrasing the middle-mile problem as a multi-object goal-conditioned Markov decision process. The key ingredients of our proposed method for parcel routing are the extraction of small feature graphs from the environment state and the combination of graph neural networks with model-free RL. Several open challenges remain, and we provide an open-source implementation of the environment to encourage stronger cooperation between the reinforcement learning and logistics communities.
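
To make the two ingredients concrete, the listing below gives a minimal sketch, not the paper's implementation, of how an extracted feature graph might be scored by a graph neural network whose parameters are trained with a model-free policy gradient. All names, feature choices, dimensions, and the single message-passing round are illustrative assumptions; in particular, the construction of node_feats and adj from the environment state is left to the environment and is only stubbed here with random tensors.

import torch
import torch.nn as nn
from torch.distributions import Categorical


class GNNPolicy(nn.Module):
    """One round of message passing over the extracted feature graph,
    followed by one logit per candidate outgoing truck."""

    def __init__(self, node_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encode = nn.Linear(node_dim, hidden_dim)
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, node_feats, adj, truck_idx):
        # node_feats: (N, node_dim) features of hubs, trucks, and the parcel's goal
        # adj:        (N, N) row-normalised adjacency of the small feature graph
        # truck_idx:  (K,) indices of the nodes representing candidate trucks
        h = torch.relu(self.encode(node_feats))
        h = torch.relu(h + adj @ self.message(h))    # aggregate neighbour messages
        return self.score(h[truck_idx]).squeeze(-1)  # (K,) logits, one per truck


# Toy usage: a feature graph with 5 nodes, 2 of which are candidate trucks.
policy = GNNPolicy(node_dim=8)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

node_feats = torch.randn(5, 8)      # stand-in for features extracted from the state
adj = torch.full((5, 5), 0.2)       # stand-in for the local hub/truck topology
truck_idx = torch.tensor([3, 4])

logits = policy(node_feats, adj, truck_idx)
dist = Categorical(logits=logits)
action = dist.sample()              # choose one outgoing truck for the parcel
reward = 1.0                        # e.g. parcel delivered (illustrative signal)

loss = -dist.log_prob(action) * reward   # REINFORCE-style model-free update
optimizer.zero_grad()
loss.backward()
optimizer.step()

Because the policy only sees a small local graph rather than the full network state, the same parameters can in principle be reused for every parcel-to-truck decision, which is what makes the graph-based formulation attractive for large instances.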