Awesome Chinese LLM
An awesome-style curated list of open-source Chinese large language models, focused on smaller-scale models suitable for private deployment, along with domain-specific fine-tunes, applications, datasets, and tutorials.
About this tool
Awesome Chinese LLM
URL: https://github.com/HqWu-HITCS/Awesome-Chinese-LLM
Category: Themed Directories
Tags: ai, llm, open-source
Overview
Awesome Chinese LLM is a curated, "awesome-style" directory of open-source Chinese large language model resources. It emphasizes smaller-scale models that are feasible for private deployment and lower-cost training, and aggregates base models, domain-specific fine-tunes, applications, datasets, and tutorials related to Chinese LLMs.
Features
1. Focus and Scope
- Concentrates on Chinese-language LLMs and related tooling.
- Prioritizes smaller-scale models that:
  - Can be run by individuals or small teams.
  - Are suitable for private deployment (see the loading sketch after this list).
  - Have lower training and deployment costs.
- Includes resources across the full ecosystem:
  - Base / foundation models.
  - Domain-specific fine-tuned models.
  - Applications built on top of LLMs.
  - Datasets for training and evaluation.
  - Tutorials and learning materials.
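To make "suitable for private deployment" concrete, the sketch below loads one of the smaller chat models from the overview table for local inference with Hugging Face Transformers. This is a minimal sketch, not part of the list itself: the checkpoint `Qwen/Qwen2.5-7B-Instruct` is just one example (any of the ~7B Base/Chat models works similarly), and it assumes `transformers`, `torch`, and `accelerate` are installed with enough GPU or CPU memory available.

```python
# Minimal local-inference sketch for a smaller-scale Chinese chat model.
# Assumptions: transformers, torch, and accelerate are installed; the
# model id below is an example and can be swapped for another family.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # example checkpoint, not prescribed by the list

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically when supported
    device_map="auto",    # place weights on available GPU(s), else CPU
)

# Build a chat-style prompt with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "用一句话介绍一下大语言模型。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```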
2. Base Model Overview Table
Provides a comparative overview of commonly used base models:

| Model | Variants | Parameters | Training tokens | Max sequence length | Commercial use |
|---|---|---|---|---|---|
| ChatGLM | ChatGLM / ChatGLM2 / ChatGLM3 / ChatGLM4 (Base & Chat) | ~6B | ~1T / 1.4T | 2K / 32K | Allowed |
| LLaMA | LLaMA / LLaMA2 / LLaMA3 (Base & Chat) | 7B / 8B / 13B / 33B / 70B | ~1T / 2T | 2K / 4K | Partially allowed (depends on version/license) |
| Baichuan | Baichuan / Baichuan2 (Base & Chat) | 7B / 13B | ~1.2T / 1.4T | 4K | Allowed |
| Qwen (通义千问) | Qwen / Qwen1.5 / Qwen2 / Qwen2.5 (Base, Chat, VL) | 7B / 14B / 32B / 72B / 110B | ~2.2T / 3T / 18T | 8K / 32K | Allowed |
| BLOOM | BLOOM | 1B / 7B / 176B-MT | ~1.5T | 2K | Allowed |
| Aquila | Aquila / Aquila2 (Base / Chat) | 7B / 34B | n/a | 2K | Allowed |
| InternLM | InternLM / InternLM2 / InternLM2.5 (Base / Chat / VL) | 7B / 20B | n/a | up to 200K | Allowed |
| Mixtral | Base & Chat | 8×7B (Mixture-of-Experts) | n/a | 32K | Allowed |
| Yi | Base & Chat | 6B / 9B / 34B | ~3T | up to 200K | Allowed |
| DeepSeek | Base & Chat | 1.3B / 7B / 33B / 67B | n/a | 4K | Allowed |
| XVERSE | Base & Chat | 7B / 13B / 65B / A4.2B | ~2.6T / 3.2T | 8K / 16K / 256K | Allowed |
3. Structured Directory of Resources
The repository is organized as an "awesome list" with a table of contents that includes (among others):
- 1. 模型 (Models)
  - 1.1 文本 LLM 模型 (Text LLMs)
  - 1.2 多模态 LLM 模型 (Multimodal LLMs)
(Additional sections for applications, datasets, and tutorials exist but are not fully visible in the provided excerpt.)
4. Scale and Community Activity
- Tracks and collects 100+ Chinese LLM-related open-source resources.
- Hosted as a public GitHub repository, allowing community contributions via pull requests.
- As of this snapshot, the repository shows substantial community interest (stars and forks), indicating active maintenance and ecosystem relevance.
5. Contribution Guidelines (High-Level)
- Encourages contributions of:
  - New open-source models.
  - Applications built on Chinese LLMs.
  - Datasets and tutorials.
- Requests contributors to follow a consistent entry format (illustrated after this list), including:
  - Repository link.
  - Star count.
  - Concise introduction/description.
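For illustration only (a hypothetical entry; the repository's actual contribution template may differ in its details), a submission covering these three elements could look like:

- [Example-Chinese-LLM](https://github.com/example/Example-Chinese-LLM) ⭐ 1.2k: a one-line description of the model's scale, training data, and license.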
Pricing
- This is an open-source, free GitHub directory.
- No pricing plans or paid tiers are indicated.
Similar Products
- A GitHub-hosted awesome list that curates frameworks, tools, and resources for building and deploying AI agents, including multi-agent systems and autonomous coding assistants. It is explicitly tagged as an "awesome" and "awesome-list" repository, making it directly relevant to the broader meta collection of awesome directories.
- Awesome-LLM-RL is an awesome-style curated list focused on reinforcement learning with large language models. It catalogs open-source frameworks, libraries, and learning resources, including projects built on Ray, vLLM, ZeRO-3, and HuggingFace Transformers, serving as a specialized awesome directory within the broader AI and LLM ecosystem.
- Awesome-Vibe-Coding is a curated "awesome" list of open-source projects, tools, and learning resources for vibe coding (AI-assisted, modern software development workflows). It organizes AI development toolkits, web-based IDEs, cloud-based agents, and educational materials, fitting into the broader ecosystem of meta awesome directories focused on artificial intelligence and large language models.
- An awesome list of beginner-friendly open-source projects and tutorials, focused on first-timer and newcomer contributions, with GitHub star and fork metadata for each entry.
- A curated collection that regularly shares interesting, entry-level open-source projects on GitHub. While not branded as a traditional awesome list, it serves as an awesome-style discovery directory for noteworthy GitHub repositories, especially for beginners.
- An awesome-list style directory of alternative open-source front-ends for major internet platforms (YouTube, Twitter/X, Reddit, Instagram, etc.). It catalogs privacy-focused, tracker-free UI alternatives and is part of the wider awesome-lists ecosystem.