People first meet the direct product (also called the direct sum) and the tensor product in wildly different places, because these constructions are so universal in mathematics, physics and computer science. I first learned about them in either multivariable calculus or abstract algebra. In multivariable calculus, or more precisely in linear algebra, the distinction is pretty clear from dimension considerations. The direct sum of two vector spaces has dimension equal to the sum of the dimensions of the summands, while the tensor product has dimension equal to the product of the dimensions of the factors. There the direct sum is the more easily grasped one, because it has a nice interpretation as the concatenation of two vectors. The direct sum of two matrices is simply a bigger block diagonal matrix whose blocks are the summand matrices. The tensor product of two vectors, on the other hand, actually requires some calculation (in particular, multiplication). For instance, if $v = (v_1, v_2)$ and $w = (w_1, w_2)$, then $v \otimes w = (v_1 w_1, v_1 w_2, v_2 w_1, v_2 w_2)$. The operator version (the Kronecker product of matrices) similarly requires multiplying the matrix entries. But tensors are useful because they allow the introduction of differential forms, or multilinear maps in general, which formalize high-dimensional integration, as well as change of base ring. In fact, a matrix is a 2-tensor. In differential geometry, the curvature tensor encodes much of the local geometric information of a Riemannian metric. Such information not only has geometric meaning but is relevant in modern physics as well. For instance, the energy-momentum tensor of general relativity and the stress tensors of fluid dynamics and materials science are well-studied tensor fields in terms of which partial differential equations can be written. Of course, one can write down a system of PDEs purely in terms of the components of these tensor fields, but doing so would lose the big picture and obscure the multilinearity of the quantities. Another example from physics is so-called second quantization.
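To make the dimension count concrete, here is a minimal pure-Python sketch. It assumes vectors are plain lists and matrices are lists of rows in the standard basis, and it takes the tensor product to be the Kronecker product; the helper names `direct_sum_vec`, `tensor_vec`, `direct_sum_mat`, `tensor_mat` and `matvec` are hypothetical, chosen only for illustration. The last two checks show that each operator construction acts on the matching vector construction: $(A \oplus B)(v \oplus w) = Av \oplus Bw$ and $(A \otimes B)(v \otimes w) = Av \otimes Bw$.

```python
def direct_sum_vec(v, w):
    """Direct sum of vectors: plain concatenation, dimension len(v) + len(w)."""
    return list(v) + list(w)


def tensor_vec(v, w):
    """Tensor (Kronecker) product of vectors: entries v[i] * w[j] in
    row-major order, dimension len(v) * len(w)."""
    return [a * b for a in v for b in w]


def direct_sum_mat(A, B):
    """Block-diagonal matrix with A top-left, B bottom-right, zeros elsewhere."""
    n, m = len(A[0]), len(B[0])
    top = [list(row) + [0] * m for row in A]
    bottom = [[0] * n + list(row) for row in B]
    return top + bottom


def tensor_mat(A, B):
    """Kronecker product: entry ((i, k), (j, l)) is A[i][j] * B[k][l]."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matvec(M, x):
    """Ordinary matrix-vector product."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


v, w = [1, 2], [3, 4, 5]
A = [[1, 2], [3, 4]]
B = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

print(direct_sum_vec(v, w))  # [1, 2, 3, 4, 5]      (dimension 2 + 3)
print(tensor_vec(v, w))      # [3, 4, 5, 6, 8, 10]  (dimension 2 * 3)

# Each operator construction acts on the matching vector construction.
print(matvec(direct_sum_mat(A, B), direct_sum_vec(v, w))
      == direct_sum_vec(matvec(A, v), matvec(B, w)))  # True
print(matvec(tensor_mat(A, B), tensor_vec(v, w))
      == tensor_vec(matvec(A, v), matvec(B, w)))      # True
```

The direct sum only rearranges entries, while every entry of the tensor product is a product of one entry from each factor, which is exactly the extra "multiplication" work mentioned above.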
If first quantization is about turning observables in spacetime into operators that act on the configuration space of a single particle, whose states are treated as functions over spacetime, then second quantization is about multiple particles. The natural operations associated with multiple particles are then tensor products of single-particle operators, wave functions, etc. For distinguishable particles we use the full tensor product; for indistinguishable particles we use the symmetric or the antisymmetric tensor product, corresponding to bosonic or fermionic statistics respectively. Furthermore, we can even think of all of spacetime as an index set of particles, so that associated with each point in spacetime is a creation operator and an annihilation operator. The corresponding configuration space then becomes an uncountably infinite tensor product of single-particle states. Thus, in some sense, tensor products are more universal than direct products in nature. In terms of elementary school mathematics, the direct product corresponds to the so-called principle of addition (even though no addition takes place when you take the direct product of two vectors), whereas the tensor product corresponds to the principle of multiplication, which underlies many combinatorial objects such as permutations, binomial coefficients, etc. I remember sometimes getting confused about which one to use back in the day.
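Both the boson/fermion distinction and the addition versus multiplication principle can be illustrated with a small sketch. It assumes a finite-dimensional single-particle space with standard basis vectors and builds unnormalized two-particle states; the function names are hypothetical. For $k$ particles with $n$ single-particle states each, distinguishable particles live in the full tensor power (dimension $n^k$), bosons in the symmetric power (dimension $\binom{n+k-1}{k}$) and fermions in the antisymmetric power (dimension $\binom{n}{k}$).

```python
from itertools import product
from math import comb


def tensor(v, w):
    """Two-particle product state v ⊗ w in the standard basis."""
    return [a * b for a in v for b in w]


def symmetrize(v, w):
    """Unnormalized bosonic state v⊗w + w⊗v."""
    return [x + y for x, y in zip(tensor(v, w), tensor(w, v))]


def antisymmetrize(v, w):
    """Unnormalized fermionic state v⊗w - w⊗v."""
    return [x - y for x, y in zip(tensor(v, w), tensor(w, v))]


def k_particle_dims(n, k):
    """State-space dimensions for k particles, each with n single-particle states."""
    return {
        "distinguishable": n ** k,     # full tensor power
        "bosons": comb(n + k - 1, k),  # symmetric power: multisets of size k
        "fermions": comb(n, k),        # exterior power: subsets of size k
    }


e0, e1 = [1, 0], [0, 1]
print(antisymmetrize(e0, e0))  # [0, 0, 0, 0]: two fermions cannot share a state
print(antisymmetrize(e0, e1))  # [0, 1, -1, 0]
print(symmetrize(e0, e0))      # [2, 0, 0, 0]: two bosons can
print(k_particle_dims(3, 2))   # {'distinguishable': 9, 'bosons': 6, 'fermions': 3}

# Principle of addition vs. principle of multiplication for sets of sizes 3 and 4.
A, B = range(3), range(4)
print(len(A) + len(B))           # 7: pick one item from A *or* from B
print(len(list(product(A, B))))  # 12: pick one from A *and* one from B
```

For two particles the symmetric and antisymmetric dimensions add up to the full tensor product ($6 + 3 = 9$ above); for three or more particles they no longer do.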
A final, confusing appearance of the tensor product comes in probability theory. Given two probability spaces $(A, \mu)$ and $(B, \nu)$, one can take their product $A \times B$ and consider the product probability measure $\mu \times \nu$. The question now is: should this product be a direct product or a tensor product? If we follow the linear algebra analogy, since the space is now the product of two smaller spaces, one would expect a direct product. But the problem is that $\mu$ and $\nu$ do not act on the elements of $A$ and $B$, but rather on subsets of $A$ and $B$! This can be confusing in the infinite setting, so let's think about what happens when $A$ and $B$ are finite sets. Then subsets of $A$, for example, are nothing but binary vectors indexed by the elements of $A$. A subset of $A \times B$ is likewise a binary vector indexed by pairs $(a, b)$, so the total dimension for the product is $|A| \cdot |B|$ rather than $|A| + |B|$. This should be enough to convince you that the product measure is a tensor product; indeed, $(\mu \times \nu)(\{(a, b)\}) = \mu(\{a\})\,\nu(\{b\})$, which is exactly the outer product of the two probability vectors. So when would we use the direct product of two measures? That's when we take the disjoint union of two probability spaces. But the result has total measure 2 rather than 1, so it is not a probability space; that's why you never hear about the direct sum of two probability measures. One open question: one can define Gaussian probability measures on Hilbert spaces. So what would be the corresponding measure on the tensor product of two Hilbert spaces, each of which is already equipped with a Gaussian probability measure? Take each constituent space to be finite dimensional, for instance.
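In the finite case the tensor-product picture can be checked directly. The sketch below assumes each measure is given by its point masses (a dict from elements to probabilities, using `Fraction` for exact arithmetic); the helper names are hypothetical. It builds the product measure as the outer product of the two probability vectors, verifies $(\mu \times \nu)(S \times T) = \mu(S)\,\nu(T)$ for every pair of subsets, and shows that concatenating the point masses on a disjoint union (the direct-sum analogue) gives total mass 2.

```python
from fractions import Fraction
from itertools import chain, combinations


def product_measure(p, q):
    """Point masses of mu x nu on A x B: the outer product p ⊗ q."""
    return {(a, b): pa * qb for a, pa in p.items() for b, qb in q.items()}


def direct_sum_measure(p, q):
    """Concatenate point masses on the disjoint union of A and B (tagged by origin)."""
    out = {("A", a): pa for a, pa in p.items()}
    out.update({("B", b): qb for b, qb in q.items()})
    return out


def measure(m, subset):
    """Measure of a subset of a finite space: sum of its point masses."""
    return sum(m[x] for x in subset)


def subsets(s):
    """All subsets of a finite set, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {d: Fraction(1, 6) for d in range(1, 7)}
joint = product_measure(coin, die)

print(len(joint), sum(joint.values()))  # 12 1  (dimension 2 * 6, total mass 1)

# Rectangle property: (mu x nu)(S x T) = mu(S) * nu(T) for all subsets S, T.
ok = all(
    measure(joint, [(a, b) for a in S for b in T]) == measure(coin, S) * measure(die, T)
    for S in subsets(coin)
    for T in subsets(die)
)
print(ok)  # True

union = direct_sum_measure(coin, die)
print(len(union), sum(union.values()))  # 8 2  (dimension 2 + 6, total mass 2)
```

Dividing the disjoint-union masses by 2 would restore a probability measure, but that is a mixture rather than a plain direct sum.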