Amazon.com Inc. has reportedly developed a multimodal large language model that could debut as early as next week. The Information on Wednesday cited sources as saying that the algorithm is known as ...
Transformer-based models have rapidly spread from text to speech, vision, and other modalities. This has created challenges for the development of Neural Processing Units (NPUs). NPUs must now ...
Microsoft has unveiled two new additions to its Phi-4 family of small language models: Phi-4-multimodal, which integrates speech, vision, and text, and Phi-4-mini. In December 2024, Microsoft ...
A generalized architectural blueprint for building efficient MLLMs. This template achieves efficiency through a combination of component choices and data flow optimization. Key strategies include: (1) ...
Chinese AI startup Zhipu AI aka Z.ai has released its GLM-4.6V series, a new generation of open-source vision-language models (VLMs) optimized for multimodal reasoning, frontend automation, and ...