Deep learning is a machine learning method based on artificial neural networks: it uses multi-layer networks to automatically learn features from data. The following are several common deep learning algorithms:
Feedforward neural networks (FNNs) are the most basic deep learning architecture: data flows through the network in one direction without forming loops. FNNs are very effective for classification and regression problems.
CNN is a deep learning network specifically designed to process image data. It uses convolutional layers to automatically extract spatial features in images. It is often used for tasks such as image classification and object detection.
RNN can process sequence data, such as time series, language models, etc. It uses a loop structure to enable the network to remember previous inputs, and is widely used in speech recognition, natural language processing, etc.
LSTM is an improved version of RNN that solves the long-term dependency problem in traditional RNN, allowing it to maintain key information over longer sequences.
An autoencoder is an unsupervised learning method used for dimensionality reduction and data denoising. It compresses the input data into a low-dimensional hidden layer and then attempts to reconstruct the original data.
GAN consists of a generator that tries to generate realistic data and a discriminator that tries to differentiate between real and generated data. GAN is widely used in tasks such as image generation and style transfer.
The Transformer is a model based on the attention mechanism that performs particularly well in natural language processing. It can handle long sequences and trains faster than RNNs.
TensorFlow is an open-source machine learning framework developed by the Google Brain team. It uses the concept of data flow graphs to let developers build complex neural networks; its name comes from the flow of tensors (multi-dimensional arrays) through the computation graph.
TensorFlow is designed in multiple layers to balance flexibility and development efficiency:
The typical process for developing a model in TensorFlow is as follows:
| Stage | Description |
|---|---|
| Data preparation | Use the `tf.data` API for data reading, cleaning, and preprocessing. |
| Model building | Define the network layers via `tf.keras.Sequential` or the Functional API. |
| Compilation and training | Set the optimizer and loss function, then run `model.fit()`. |
| Evaluation and deployment | Verify model accuracy and export in the `SavedModel` format for deployment. |
The following is an example of building a simple linear regression model using Keras:
import tensorflow as tf
import numpy as np
# Build model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])
# Compile model
model.compile(optimizer='sgd', loss='mean_squared_error')
# Prepare test data
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
# Train model
model.fit(xs, ys, epochs=500, verbose=0)
# Make predictions
print("Prediction result:", model.predict(np.array([[10.0]])))
Keras is a high-level neural network API written in Python, designed to enable rapid experimentation. It was originally developed by François Chollet and is now the official high-level interface of TensorFlow (`tf.keras`). The core design principles of Keras are user-friendliness, modularity, and extensibility, allowing developers to build deep learning models with minimal code.
| Approach | Features | Applicable scenarios |
|---|---|---|
| Sequential API | Layers stacked simply one after another. | Single-input, single-output linear stacking models. |
| Functional API | Can define complex graphs and supports multiple inputs/outputs. | Residual networks (ResNet), multi-branch models. |
| Subclassing | Customize behavior by inheriting from the `Model` class. | Research scenarios that require full control over the forward-pass logic. |
Completing a machine learning task in Keras usually involves the following five steps:
`fit()` feeds the model data for learning, `evaluate()` checks performance on the test set, and `predict()` produces predictions for new data. The following is the standard way to use Keras to build a simple image classification network (such as for MNIST):
from tensorflow.keras import layers, models
# 1. Define Sequential model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),   # Input layer
    layers.Dense(128, activation='relu'),   # Hidden layer
    layers.Dropout(0.2),                    # Prevent overfitting
    layers.Dense(10, activation='softmax')  # Output layer
])
# 2. Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# 3. Training (assuming x_train, y_train already exist)
# model.fit(x_train, y_train, epochs=5)
Since TensorFlow 2.0, Keras has been its default high-level API. This means you can use the simple syntax of Keras while taking advantage of TensorFlow’s underlying distributed training, TPU acceleration, and powerful deployment capabilities (such as TensorFlow Serving).
In Keras, the `Layer` is the basic building block of a neural network. Each layer encapsulates specific computation logic (such as matrix multiplication) and state (its weights). A model is essentially multiple layers connected into a structure through which data flows.
| Category | Common layers (tf.keras.layers) | Main function |
|---|---|---|
| Core | `Dense` | Fully connected layer; computes \(y = f(Wx + b)\). |
| Convolutional | `Conv2D`, `Conv1D` | Feature extraction, often used on images or time series. |
| Pooling | `MaxPooling2D`, `AveragePooling2D` | Reduces dimensionality and computation while retaining key features. |
| Recurrent | `LSTM`, `GRU`, `SimpleRNN` | Processes sequence data (such as text or stock prices); has memory. |
| Regularization | `Dropout`, `BatchNormalization` | Prevents overfitting and accelerates convergence. |
Most layers accept an `activation` parameter (e.g. `'relu'`, `'sigmoid'`, `'softmax'`) that determines the nonlinear transformation of the output. In addition to layers that compute features, there are special layers that transform data structures, such as `Flatten`.
The following shows how various layers work together in an image processing model:
from tensorflow.keras import layers, models
model = models.Sequential([
    # Convolutional layer extracts spatial features
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    # Pooling layer compresses features
    layers.MaxPooling2D((2, 2)),
    # Flatten layer prepares for the fully connected layers
    layers.Flatten(),
    # Fully connected layer for learning
    layers.Dense(64, activation='relu'),
    # Dropout to prevent overfitting
    layers.Dropout(0.5),
    # Output layer (assuming 10 categories)
    layers.Dense(10, activation='softmax')
])
Each layer has trainable parameters. For example, for `Dense(units=10)` with an input dimension of 50, the parameter count is \(50 \times 10\) (weights) + \(10\) (biases) = \(510\). You can use `model.summary()` to view the parameter distribution of each layer.
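As a sanity check, the arithmetic above can be reproduced with a small helper (`dense_param_count` is a hypothetical function for illustration, not part of Keras):

```python
def dense_param_count(input_dim, units):
    """Parameters of a fully connected layer: weights plus biases."""
    return input_dim * units + units

# 50 inputs feeding Dense(units=10): 50*10 weights + 10 biases
print(dense_param_count(50, 10))  # 510
```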
A Feedforward Neural Network (FNN) is the most basic neural network architecture. Data enters at the input layer, is transformed in one or more hidden layers, and the result is emitted from the output layer. Data always flows forward, with no loops or feedback paths.
The following uses `tf.keras.Sequential` to build a standard multilayer perceptron (MLP), suitable for structured-data tasks such as Iris classification or house-price prediction:
from tensorflow.keras import layers, models
def build_fnn_model(input_dim, num_classes):
    model = models.Sequential([
        # Input layer and first hidden layer
        layers.Dense(64, activation='relu', input_shape=(input_dim,)),
        # Second hidden layer
        layers.Dense(32, activation='relu'),
        # Regularization layer (optional) to reduce overfitting
        layers.Dropout(0.2),
        # Output layer (softmax for multi-class classification; use linear for regression)
        layers.Dense(num_classes, activation='softmax')
    ])
    return model
# Assume there are 20 input features and 3 categories of classification targets
model = build_fnn_model(input_dim=20, num_classes=3)
model.summary()
When compiling, you need to select an appropriate loss function (Loss Function) according to the task type:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Execute training
# model.fit(x_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
| Component | Common setting | Description |
|---|---|---|
| Dense (fully connected layer) | `units=64` | Connects every neuron in the previous layer to this one to learn nonlinear feature combinations. |
| ReLU (activation function) | `activation='relu'` | Mitigates the vanishing-gradient problem; currently the most common hidden-layer activation. |
| Softmax (output function) | `activation='softmax'` | Transforms the output into a probability distribution in which all class probabilities sum to 1. |
| Adam (optimizer) | `optimizer='adam'` | Adjusts the learning rate automatically; usually converges quickly and stably. |
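The softmax behavior described in the table can be sketched in plain NumPy (an illustrative stand-in for the Keras activation, not the Keras API itself):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # probabilities sum to 1; the largest logit gets the largest probability
```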
If you see `ValueError: Input 0 of layer dense is incompatible with the layer`, use `traceback.format_exc()` and check whether the shape of the input data matches the `input_shape` definition.

A Convolutional Neural Network (CNN) mainly consists of convolutional layers, pooling layers, and fully connected (Dense) layers. The convolutional layers extract spatial features from the image, the pooling layers reduce the data dimensionality, and the fully connected layers make the final classification decision.
The following uses `tf.keras.Sequential` to build a classic CNN model, suitable for image classification tasks such as MNIST or CIFAR-10:
from tensorflow.keras import layers, models
def build_cnn_model(input_shape, num_classes):
    model = models.Sequential([
        # First convolution/pooling block: extract basic features
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        # Second convolution/pooling block: extract higher-order features
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        # Third convolution block: further refine features
        layers.Conv2D(64, (3, 3), activation='relu'),
        # Flatten and fully connected layers: turn feature maps into class scores
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model
# Build the model (assuming the input image is 28x28 grayscale and the number of categories is 10)
model = build_cnn_model(input_shape=(28, 28, 1), num_classes=10)
model.summary()
After building the model, you need to specify the optimizer, loss function, and evaluation metrics. For multi-class problems, it is common to use the `Adam` optimizer and `SparseCategoricalCrossentropy` loss.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Assume that there are already prepared training data x_train, y_train
# model.fit(x_train, y_train, epochs=10, batch_size=64)
| Layer | Key parameter examples | Main function |
|---|---|---|
| Conv2D | `filters=32, kernel_size=(3,3)` | Convolves kernels with the image to extract local features such as edges and textures. |
| MaxPooling2D | `pool_size=(2,2)` | Keeps the maximum of each region, lowering the resolution to reduce computation and overfitting. |
| Flatten | none | Flattens the multi-dimensional tensor into a one-dimensional vector for the final classifier. |
| Dense | `units=10, activation='softmax'` | Maps the extracted features to class probabilities. |
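To illustrate what `MaxPooling2D` computes, here is a minimal NumPy sketch of 2×2 max pooling with stride 2 (`max_pool_2x2` is a hypothetical helper, not a Keras API):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D feature map."""
    h, w = x.shape
    # Group the map into 2x2 blocks and take the max of each block
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 8, 2],
                 [3, 2, 1, 1]], dtype=float)
print(max_pool_2x2(fmap))  # block maxima: 4, 5 / 3, 8
```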
You can insert `layers.Dropout(0.5)` to randomly drop neurons and improve generalization. Data-augmentation layers such as `tf.keras.layers.RandomFlip` automatically flip or rotate images during training to increase sample diversity. If training fails, use `traceback.format_exc()` to check for tensor shape mismatches or out-of-memory (OOM) errors.

A Recurrent Neural Network (RNN) is designed to process sequence data, such as time series, speech, or natural language. Unlike an FNN, an RNN has "memory": hidden-layer neurons pass the current information on to the next time step, which lets the network capture contextual correlations in the data.
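The "memory" mechanism described above can be sketched in a few lines of NumPy: each hidden state is computed from both the current input and the previous state (weight shapes and scales here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, features, hidden = 4, 3, 5
Wx = rng.normal(size=(features, hidden)) * 0.1  # input-to-hidden weights
Wh = rng.normal(size=(hidden, hidden)) * 0.1    # hidden-to-hidden (the "memory" path)
b = np.zeros(hidden)

h = np.zeros(hidden)  # initial hidden state
for t in range(timesteps):
    x_t = rng.normal(size=features)        # input at time step t
    h = np.tanh(x_t @ Wx + h @ Wh + b)     # new state depends on the previous one

print(h.shape)  # (5,)
```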
In practice, to avoid the vanishing-gradient problem caused by long sequences, we usually use `LSTM` or `GRU` layers. The following builds a simple LSTM model to predict time series (such as stock prices or temperatures):
from tensorflow.keras import layers, models
def build_rnn_model(timesteps, features):
    model = models.Sequential([
        # LSTM layer: requires 3-D input (samples, timesteps, features)
        layers.LSTM(50, activation='relu', input_shape=(timesteps, features), return_sequences=True),
        # Second LSTM layer: does not return the sequence (return_sequences=False is the default)
        layers.LSTM(50, activation='relu'),
        # Fully connected layers for the final output
        layers.Dense(25),
        layers.Dense(1)  # Assuming a regression problem: predict the next value
    ])
    return model
# Assume that the data of the past 10 days are observed, and there are 5 features per day
model = build_rnn_model(timesteps=10, features=5)
model.summary()
RNN layers have strict requirements on the input data shape, and this is where beginners most often hit errors:
| Dimension | Description | Example |
|---|---|---|
| Samples | The total number of training samples. | 1000 records |
| Timesteps | The length of the sequence (time window). | The past 30 days |
| Features | The number of features at each time point. | Opening price, closing price, trading volume |
| Layer name | Features | Suggested scenarios |
|---|---|---|
| SimpleRNN | The most basic structure, fast operation but extremely short memory. | Very short sequences or simple patterns. |
| LSTM | It has a gate mechanism and can retain long-term memory. | Long text processing, complex time series prediction. |
| GRU | A simplified version of LSTM with fewer parameters and faster training. | An alternative to LSTM when computing resources are limited. |
Setting `clipnorm=1.0` on the optimizer clips gradients and can improve training stability. When stacking recurrent layers, every layer except the last needs `return_sequences=True`. If you see `Input 0 of layer lstm is incompatible with the layer`, use `traceback.format_exc()` and check `X_train.shape` to confirm the input really is a three-dimensional tensor.

# Compilation example
model.compile(optimizer='adam', loss='mean_squared_error')
Long Short-Term Memory (LSTM) is a special type of RNN, originally designed to solve the vanishing-gradient problem that traditional RNNs face on long sequences. It uses a gate mechanism (forget gate, input gate, output gate) to control which information is retained or discarded, allowing it to capture long-term dependencies in data.
The following builds a two-layer stacked LSTM model, often used for stock price prediction, power-load forecasting, or weather-sensor data analysis:
from tensorflow.keras import layers, models
def build_lstm_model(timesteps, features):
    model = models.Sequential([
        # First LSTM layer: return_sequences=True is required to pass the sequence on
        layers.LSTM(units=50, return_sequences=True, input_shape=(timesteps, features)),
        layers.Dropout(0.2),  # Prevent overfitting
        # Second LSTM layer: the last recurrent layer usually does not return the sequence
        layers.LSTM(units=50, return_sequences=False),
        layers.Dropout(0.2),
        # Fully connected output layer
        layers.Dense(units=1)  # Predict a single value
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
# Example: Observe 60 time points in the past, each point has 1 feature (such as price)
model = build_lstm_model(timesteps=60, features=1)
model.summary()
The input to an LSTM must be a three-dimensional tensor `(Samples, Timesteps, Features)`. Before feeding data into the model, it is often necessary to convert it with NumPy:
import numpy as np
# Assume the original data is a one-dimensional sequence
data = np.random.rand(1000, 1)
# Convert to (samples, timesteps, features)
# Example: use a sliding window of 60 past values per sample
X_train = []
for i in range(60, 1000):
    X_train.append(data[i-60:i, 0])
X_train = np.array(X_train)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
| Parameter | Description |
|---|---|
| units | The number of neurons in the hidden layer; represents the memory capacity of the model. |
| return_sequences | If `True`, outputs the entire sequence; if `False`, outputs only the last time step. |
| input_shape | The format is `(timesteps, features)`; it does not include the number of samples. |
| Dropout | Randomly sets some units to 0 during training, effectively reducing the risk of overfitting. |
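The parameter count that `model.summary()` reports for an LSTM layer can be checked by hand: assuming the standard Keras formulation, each of the four gates has input weights, recurrent weights, and a bias (`lstm_param_count` is a hypothetical helper for illustration):

```python
def lstm_param_count(units, input_dim):
    """4 gates, each with input weights, recurrent weights, and biases."""
    return 4 * (units * (input_dim + units) + units)

# First LSTM layer above: units=50 with one input feature
print(lstm_param_count(50, 1))  # 10400
```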
If you see `expected ndim=3, found ndim=2`, check whether the input data has been converted to 3-D with `reshape`. LSTMs are sensitive to feature scale, so it is common to use `MinMaxScaler` to scale the data to between 0 and 1; if memory runs out, reduce `batch_size`, or use `traceback.format_exc()` to inspect the specific cause of an error.

An autoencoder is an unsupervised-learning neural network designed to compress input data into a low-dimensional representation (encoding) and then reconstruct the original data from it (decoding). It consists of two parts: an encoder, which compresses the input, and a decoder, which reconstructs it.
The following uses the Keras Functional API to build a basic autoencoder for image denoising or dimensionality reduction:
from tensorflow.keras import layers, models
# Set the input dimension (assumed to be a 28x28 image after flattening)
input_dim = 784
encoding_dim = 32  # Compressed feature dimension
# 1. Define Encoder
input_img = layers.Input(shape=(input_dim,))
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
# 2. Define decoder (Decoder)
decoded = layers.Dense(input_dim, activation='sigmoid')(encoded)
# 3. Build an autoencoder model (including input to reconstructed output)
autoencoder = models.Model(input_img, decoded)
# 4. Build a separate encoder model (for feature extraction)
encoder = models.Model(input_img, encoded)
# 5. Compile the model (binary cross-entropy suits inputs scaled to [0, 1]; MSE is another common choice)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
| Application | Main purpose | Characteristics |
|---|---|---|
| Dimensionality reduction | A nonlinear alternative to PCA | Can capture nonlinear feature relationships. |
| Image denoising | Remove image noise | The input is a noisy image; the target is the original clean image. |
| Anomaly detection | Detect credit card fraud or equipment failure | A sample with a large reconstruction error is flagged as an anomaly. |
| Generative modeling | VAE (variational autoencoder) | New samples can be generated by sampling from the latent space. |
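The anomaly-detection idea in the table can be sketched with NumPy: compare each sample against its reconstruction (hand-written stand-in values here, where a trained autoencoder would normally produce them) and flag samples whose error exceeds a threshold:

```python
import numpy as np

def reconstruction_errors(x, x_reconstructed):
    # Mean squared error per sample
    return np.mean((x - x_reconstructed) ** 2, axis=1)

# Stand-in reconstructions: the first two samples reconstruct well,
# the anomalous last sample does not
x = np.array([[0.10, 0.20], [0.15, 0.22], [0.90, 0.95]])
x_hat = np.array([[0.11, 0.21], [0.14, 0.20], [0.30, 0.40]])

errors = reconstruction_errors(x, x_hat)
threshold = 0.01  # hypothetical threshold, chosen from validation data in practice
print(errors > threshold)  # flags only the last sample
```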
When processing images, it is better to use convolutional layers: the encoder uses `Conv2D` and `MaxPooling2D`, while the decoder uses `UpSampling2D` or `Conv2DTranspose`:
# Encoder part
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img_2d)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
# Decoder part
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
When training an autoencoder, the target `y_train` is `x_train` itself, i.e. `model.fit(x_train, x_train, ...)`. If reconstruction quality is poor, `encoding_dim` may be too small to capture the data's characteristics; use `traceback.format_exc()` to debug, and check the activation function (the output layer should be sigmoid or linear depending on the data range).

A Generative Adversarial Network (GAN) consists of two competing neural networks: a generator, which tries to produce realistic data, and a discriminator, which tries to distinguish real data from generated data.
Both evolve during training: the generator learns to produce more realistic data, while the discriminator learns to become a sharper inspector. This dynamic balance ultimately enables the generator to produce high-quality, realistic samples.
The following shows a basic GAN structure for generating MNIST-like handwritten digits:
from tensorflow.keras import layers, models, optimizers
# 1. Define generator
def build_generator(latent_dim):
    model = models.Sequential([
        layers.Dense(128, input_dim=latent_dim),
        layers.LeakyReLU(alpha=0.2),
        layers.Dense(256),
        layers.LeakyReLU(alpha=0.2),
        layers.Dense(784, activation='tanh')  # Output range is between -1 and 1
    ])
    return model

# 2. Define discriminator
def build_discriminator():
    model = models.Sequential([
        layers.Dense(256, input_dim=784),
        layers.LeakyReLU(alpha=0.2),
        layers.Dropout(0.3),
        layers.Dense(1, activation='sigmoid')  # Binary classification: real or fake
    ])
    model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(0.0002, 0.5))
    return model

# 3. Define adversarial network (combined model)
def build_gan(generator, discriminator):
    discriminator.trainable = False  # Freeze the discriminator inside the combined model
    model = models.Sequential([generator, discriminator])
    model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(0.0002, 0.5))
    return model
The training of GAN is different from that of general models. It requires alternate training of the discriminator and generator:
| Training phase | Steps | Objective |
|---|---|---|
| Train the discriminator | Feed half real images and half generated images, with labels 1 and 0. | Maximize the accuracy of telling real from fake. |
| Train the generator | Feed random noise through the adversarial network with all labels set to 1 (pretending the fakes are real). | Minimize the chance that the discriminator detects a forgery. |
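The label bookkeeping for the two phases above can be sketched with NumPy (the batch size and latent dimension are illustrative):

```python
import numpy as np

batch_size, latent_dim = 32, 100

# Discriminator phase: real images labeled 1, generated images labeled 0
y_real = np.ones((batch_size, 1))
y_fake = np.zeros((batch_size, 1))

# Generator phase: sample noise for the combined model, with labels all set
# to 1 so the generator is rewarded for fooling the discriminator
noise = np.random.normal(0, 1, (batch_size, latent_dim))
y_gan = np.ones((batch_size, 1))

print(noise.shape, y_real.mean(), y_fake.mean(), y_gan.mean())
```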
When processing images, switching to convolutional layers greatly improves the quality of the generated images. The generator uses `Conv2DTranspose` (transposed convolution) to enlarge the feature maps:
# Example of transposed convolution layer in generator
model.add(layers.Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
model.add(layers.LeakyReLU(alpha=0.2))
Use `traceback.format_exc()` to catch and log exceptions in custom training loops, and periodically save the images the generator produces so you can observe its visual evolution.

The Transformer abandons the recurrent structure of traditional RNNs and is based entirely on the attention mechanism. Its core is the multi-head self-attention layer, which processes all positions in the sequence at once, effectively solving the long-range dependency problem; it is the cornerstone of current large models such as BERT and GPT.
In Keras, we usually encapsulate the core unit of Transformer into a custom layer or function. A standard Transformer Block includes multi-head attention, addition and normalization (Add & Norm), and feed forward network (Feed Forward).
from tensorflow import keras
from tensorflow.keras import layers
def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # 1. Multi-Head Self-Attention
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(inputs, inputs)
    x = layers.Dropout(dropout)(x)
    res = x + inputs  # Residual connection
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    # 2. Feed Forward Network
    x_ff = layers.Dense(ff_dim, activation="relu")(x)
    x_ff = layers.Dropout(dropout)(x_ff)
    x_ff = layers.Dense(inputs.shape[-1])(x_ff)
    x = x_ff + x  # Residual connection
    return layers.LayerNormalization(epsilon=1e-6)(x)
The following shows how to apply Transformer to sequence classification tasks (such as sentiment analysis or time series classification):
def build_transformer_model(input_shape, head_size, num_heads, ff_dim, num_transformer_blocks, mlp_units, num_classes, dropout=0):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    # Stack multiple Transformer encoder blocks
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)
    # Global average pooling and final classification layers
    x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

# Example parameters
model = build_transformer_model(
    input_shape=(100, 64),  # 100 time points, 64 features each
    head_size=256,
    num_heads=4,
    ff_dim=4,
    num_transformer_blocks=4,
    mlp_units=[128],
    num_classes=2,
    dropout=0.1
)
model.summary()
| Component | Description |
|---|---|
| MultiHeadAttention | Computes the correlation strength between positions in the sequence to capture contextual information. |
| Positional Encoding | Because the Transformer processes positions in parallel, explicit position information must be added (usually at the input layer). |
| LayerNormalization | Stabilizes neuron activations and accelerates convergence; differs from the BatchNorm commonly used in CNNs. |
| Residual Connection | The `x + inputs` shortcut makes gradients easier to propagate and prevents degradation in deep networks. |
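The core computation inside `MultiHeadAttention`, scaled dot-product attention for a single head, can be sketched in NumPy (the shapes here are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity between positions, scaled
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(1)
seq_len, d_k = 4, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # output keeps the input shape; each weight row sums to 1
```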
`Incompatible shapes` errors usually occur at the residual connections. Make sure that after the attention or Dense layers, the output dimension exactly matches that of `inputs`.

PyTorch is an open-source machine learning framework based on the Torch library, developed mainly by the AI research team at Meta (formerly Facebook). Designed Python-first, with an emphasis on flexibility and dynamic computation, it has become the most popular framework in academic research and is widely used in industry.
| Component name | Main purpose |
|---|---|
| torch.nn | Contains various neural network layers (such as Linear, Conv2d) and loss functions. |
| torch.optim | Provides optimization algorithms such as SGD, Adam, and RMSprop. |
| torch.utils.data | Handles data loading, including `Dataset` and `DataLoader`. |
| torchvision | A toolkit specially designed for computer vision, including commonly used data sets, model architectures and image conversions. |
Developing a model in PyTorch typically follows these steps:
1. Prepare the data: subclass the `Dataset` class and batch it with `DataLoader`.
2. Define the model: subclass `nn.Module`, declaring the layers in `__init__` and the forward-propagation logic in `forward`.
3. Train: compute the loss, back-propagate, and update the weights with an optimizer.

The following is a simple linear regression implementation:
import torch
import torch.nn as nn
# 1. Define model architecture
class LinearModel(nn.Module):
    def __init__(self):
        super(LinearModel, self).__init__()
        self.linear = nn.Linear(1, 1)  # 1 input feature, 1 output

    def forward(self, x):
        return self.linear(x)
model = LinearModel()
# 2. Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# 3. Training loop (simplified version)
# for inputs, targets in dataloader:
#     outputs = model(inputs)
#     loss = criterion(outputs, targets)
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
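The commented training loop above mirrors plain gradient descent. The same idea can be written out by hand in NumPy for a linear model (an illustrative sketch, not PyTorch autograd), fitting the y = 2x - 1 relationship used in the Keras example earlier:

```python
import numpy as np

# Data following y = 2x - 1
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x - 1

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    y_pred = w * x + b                       # forward pass (model(inputs))
    grad_w = 2 * np.mean((y_pred - y) * x)   # gradients of MSE (loss.backward())
    grad_b = 2 * np.mean(y_pred - y)
    w -= lr * grad_w                         # parameter update (optimizer.step())
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # ≈ 2.0 and -1.0
```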
Classifying multiple sets of time-series data with a PyTorch classifier model
Assume the multiple sets of time-series data form a labeled dataset containing several classes. We need to preprocess the data into PyTorch's `Dataset` and `DataLoader` format for training and testing.
import torch
from torch.utils.data import DataLoader, Dataset
# Assume that each set of data has characteristics at multiple time points
class TimeSeriesDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return torch.tensor(self.data[idx], dtype=torch.float32), torch.tensor(self.labels[idx], dtype=torch.long)
Here we build a simple long short-term memory (LSTM) model to process the time-series data and classify the final output into multiple categories. Below is a simple LSTM model example.
import torch.nn as nn
class LSTMClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(LSTMClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initial hidden and cell states (stored sizes avoid relying on globals)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        # Use the output of the last time step for classification
        out = self.fc(out[:, -1, :])
        return out
Next, set the loss function and optimizer, and feed the data into the model for training.
import torch.optim as optim
# Model parameters
input_size = 10   # Number of features at each time point
hidden_size = 64
num_layers = 2
num_classes = 3   # Number of classes
model = LSTMClassifier(input_size, hidden_size, num_layers, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop (assumes train_loader wraps a TimeSeriesDataset)
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')
Evaluate model performance on test data.
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Test accuracy: {100 * correct / total:.2f}%')
The most common reason you cannot enable the GPU is that the installed PyTorch build is the `+cpu` version.
Before installing the GPU-enabled version, you must first remove the existing CPU-only version to avoid library conflicts:
pip uninstall torch torchvision torchaudio
If your CUDA version is 13.1, you need to install a corresponding or compatible PyTorch build. Note that PyTorch wheels are compiled against specific CUDA versions.
Please execute the following command (install PyTorch that supports the latest CUDA version):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Note: as of this writing, the official stable PyTorch release may not yet ship a cu131 tag, but the cu124 (or latest CUDA) build usually runs on 13.x drivers, which are backward compatible.
Once the installation is complete, execute your check script again. If successful, you should see the following results:
- `torch.cuda.is_available()`: `True`
- `torch.version.cuda`: displayed as `12.4` (or the version you installed)

`transformers` is a powerful package developed by Hugging Face, designed for natural language processing (NLP) and other machine-learning tasks. It provides convenient access to a wide variety of pre-trained models, allowing developers to use state-of-the-art techniques with minimal setup.
The `transformers` package can be installed using pip:
pip install transformers
Here is a simple example of using a pre-trained model for text classification:
from transformers import pipeline
# Load sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")
# Perform sentiment analysis
results = classifier("Hugging Face's Transformers kit is awesome!")
print(results)
transformersThe suite is an important tool for developers and researchers in the NLP field. Its rich model library and friendly API make it the first choice for building and deploying the most advanced machine learning applications.
from transformers import AutoModelForCausalLM, AutoTokenizer
# Select a pre-trained model, such as GPT-2
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Input prompt
prompt = "A long time ago, in a far away country,"
# Convert the prompt into the model's token IDs
input_ids = tokenizer.encode(prompt, return_tensors="pt")
# Use the model to generate text
output = model.generate(
    input_ids,
    max_length=50,            # Maximum length of the generated sequence (in tokens)
    num_return_sequences=1,   # Number of sequences to return
    temperature=0.7,          # Controls generation diversity
    top_k=50,                 # Restrict candidates to the top-k tokens
    top_p=0.9,                # Nucleus (top-p) sampling
    do_sample=True            # Enable sampling for varied output
)
# Convert the generated encoding back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
prompt = "In the future era of artificial intelligence,"
output = model.generate(
tokenizer.encode(prompt, return_tensors="pt"),
max_length=100,
temperature=1.0,
top_p=0.95,
do_sample=True
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
prompt = "The main function of artificial intelligence is"
output = model.generate(
tokenizer.encode(prompt, return_tensors="pt"),
max_length=50,
temperature=0.5, # ignored when do_sample=False
do_sample=False # deterministic (greedy) generation
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
The examples above show how to use a transformers model for text generation. Depending on your needs, you can adjust the parameters to produce diverse or precise text, which suits creative writing, completing technical documents, and similar scenarios.
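The effect of these parameters can be illustrated with a toy sampler. This is a simplified sketch (real decoders work over full vocabularies and combine top-k with top-p), not the transformers implementation:

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=50, rng=None):
    """Toy temperature + top-k sampling over a single logits vector."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64) / temperature  # sharpen or flatten
    if top_k < len(logits):
        cutoff = np.sort(logits)[-top_k]                       # k-th highest score
        logits = np.where(logits >= cutoff, logits, -np.inf)   # drop the rest
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Lower temperature and smaller top_k push sampling toward the argmax
print(sample_next_token([1.0, 2.0, 8.0], temperature=0.5, top_k=1))  # always 2
```

With top_k=1 the sampler degenerates to greedy decoding; raising temperature flattens the distribution and increases diversity.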
When you use Hugging Face's transformers package, the pre-trained files for a model and its tokenizer are downloaded to a default cache directory. If you need a different location, specify the cache_dir parameter when loading the model or tokenizer.
from transformers import AutoModel, AutoTokenizer
# Custom cache directory
cache_directory = "./my_custom_cache"
# Load the model and tokenizer, specify the cache directory
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir=cache_directory)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir=cache_directory)
You can also change the global cache directory by setting environment variables to ensure that all models and tokenizers use the same cache location.
import os

# Set the global cache directory BEFORE importing transformers,
# since the environment variable is read at import time
os.environ["TRANSFORMERS_CACHE"] = "./my_global_cache"

from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
To check the location of the default cache directory, you can use the following code:
from transformers.utils import default_cache_path
print("Default cache directory:", default_cache_path)
By setting cache_dir or the environment variables, you can easily manage the cache directory of the transformers suite, improving project flexibility and resource management.
The models of most AI frameworks (such as Hugging Face Transformers) are stored in preset directories when downloaded for reuse. The following are preset directories for some common frameworks:
~/.cache/huggingface/transformers/ (Hugging Face Transformers)
~/.tensorflow_hub/ (TensorFlow Hub)
~/.cache/torch/hub/ (PyTorch Hub)

Some model files are large and require management. The default download directory can be changed through environment variables or program parameters. Here are some specific methods:
Modify it through the HF_HOME or TRANSFORMERS_CACHE environment variable:
export HF_HOME=/your/custom/path
export TRANSFORMERS_CACHE=/your/custom/path
Or specify in the program:
import os

# Set the variables before importing transformers; they are read at import time
os.environ["HF_HOME"] = "/your/custom/path"
os.environ["TRANSFORMERS_CACHE"] = "/your/custom/path"

from transformers import AutoModel

model = AutoModel.from_pretrained("model-name")
Set the TFHUB_CACHE_DIR environment variable:
export TFHUB_CACHE_DIR=/your/custom/path
Set in the program:
import os
os.environ["TFHUB_CACHE_DIR"] = "/your/custom/path"
Set the TORCH_HOME environment variable:
export TORCH_HOME=/your/custom/path
Set in the program:
import os
os.environ["TORCH_HOME"] = "/your/custom/path"
To confirm whether the model is downloaded to the specified directory, you can check the directory contents:
ls /your/custom/path
Or print the current directory in the program:
import os
print(os.environ.get("HF_HOME"))
print(os.environ.get("TRANSFORMERS_CACHE"))
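The precedence between these variables can be summarized with a small sketch. This is simplified, not the library's actual code (the exact resolution logic has shifted between versions, and newer releases fold TRANSFORMERS_CACHE into HF_HOME):

```python
import os

def resolve_transformers_cache(env=None):
    """Simplified sketch of the cache-directory precedence:
    TRANSFORMERS_CACHE > HF_HOME/hub > ~/.cache/huggingface/hub"""
    env = env if env is not None else os.environ
    if "TRANSFORMERS_CACHE" in env:
        return env["TRANSFORMERS_CACHE"]
    hf_home = env.get(
        "HF_HOME",
        os.path.join(os.path.expanduser("~"), ".cache", "huggingface"),
    )
    return os.path.join(hf_home, "hub")

print(resolve_transformers_cache({"TRANSFORMERS_CACHE": "/models/cache"}))  # /models/cache
```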
File type: standard Hugging Face model file (pytorch_model.bin), converted to FP16 format.
Use: compared to FP32, FP16 halves memory usage and improves compute throughput.
Typical usage: GPU inference with PyTorch or TensorFlow.
How to save as FP16:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("model_name")
model.half() # Convert to FP16 format
model.save_pretrained("./fp16_model")
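The halving of memory is easy to check directly. Here a NumPy array stands in for the model weights (illustrative only, not a transformers API):

```python
import numpy as np

# A weight matrix standing in for model parameters
w32 = np.random.randn(1024, 1024).astype(np.float32)
w16 = w32.astype(np.float16)

print(w32.nbytes // 1024, "KiB in FP32")  # 4096 KiB
print(w16.nbytes // 1024, "KiB in FP16")  # 2048 KiB

# The rounding error introduced by the conversion is small
max_err = np.abs(w32 - w16.astype(np.float32)).max()
```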
File type: similar to FP16, but designed for better numerical stability.
Use: stable inference and training on BF16-capable GPUs such as the NVIDIA A100 and H100.
How to use:
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("model_name", torch_dtype=torch.bfloat16).cuda()
File type: INT8-quantized Hugging Face model file.
Use: significantly reduces memory usage with minimal accuracy loss.
How to store INT8 models:
from transformers import AutoModelForCausalLM

# Requires the bitsandbytes package (newer versions use BitsAndBytesConfig
# instead of the load_in_8bit shortcut)
model = AutoModelForCausalLM.from_pretrained("model_name", device_map="auto", load_in_8bit=True)
model.save_pretrained("./int8_model")
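The basic idea behind INT8 quantization can be sketched in a few lines. This is a symmetric per-tensor scheme for illustration only; bitsandbytes uses finer-grained, outlier-aware methods:

```python
import numpy as np

def quantize_int8(w):
    """Map float weights onto the int8 range [-127, 127] with one scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).clip(-127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)

# INT8 storage is a quarter of FP32...
print(q.nbytes, "bytes vs", w.nbytes, "bytes")
# ...and the worst-case rounding error is bounded by half a quantization step
max_err = np.abs(dequantize_int8(q, scale) - w).max()
```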
File type: .onnx
Use: cross-platform GPU inference, supported by ONNX Runtime and TensorRT.
How to convert to ONNX:
pip install optimum[onnxruntime]
optimum-cli export onnx --model=model_name ./onnx_model
ONNX inference example:
from onnxruntime import InferenceSession

session = InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
# outputs = session.run(None, {"input_ids": input_ids})  # input names depend on the exported model
File type: .engine
Use: NVIDIA's proprietary format for high-performance inference, supporting FP16 and INT8.
To convert to TensorRT:
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
TensorRT inference example:
import tensorrt as trt

# Deserializing and running a .engine file goes through the TensorRT runtime API:
# create a trt.Runtime, deserialize the engine bytes, then execute with bound buffers.
TorchScript model: uses PyTorch's .pt format.
Save example:
import torch

scripted_model = torch.jit.script(model)
scripted_model.save("model.pt")
# Reload later without the original Python class definition:
loaded = torch.jit.load("model.pt")
| Format | File extension | Optimized for | Framework |
|---|---|---|---|
| FP16 | .bin | General purpose GPU inference | PyTorch, TensorFlow |
| BF16 | .bin | Numerical stability | PyTorch, TensorFlow |
| INT8 | .bin | Low-memory GPU inference | Hugging Face + bitsandbytes |
| ONNX | .onnx | Cross-platform GPU | ONNX Runtime, TensorRT |
| TensorRT | .engine | NVIDIA GPU | TensorRT |
GGML (named after its author, Georgi Gerganov) is a tensor library and model format designed for high-performance, low-resource scenarios. Its core purpose is efficient storage and inference of models, making it especially suitable for memory-constrained devices; it has since largely been superseded by its successor format, GGUF.
Steps to convert a machine learning model to GGML format:
1. Obtain the original model weights (e.g. pytorch_model.bin).
2. Run the conversion script provided by llama.cpp.
3. The output is a quantized file such as model.ggml.q4_0.bin.

Llama (Large Language Model Meta AI) is a large language model developed by Meta, designed for generating natural language text, answering questions, and performing language-understanding tasks.
These models are known for their efficiency, providing high-quality text generation results with relatively few hardware resources.
Natural language generation:Used to create stories, articles or conversations.
Question and answer system:Support user queries and generate accurate answers.
Language translation:Supports translation tasks between multiple languages.
Language understanding:Suitable for tasks such as summarization and sentiment analysis.
High efficiency:Llama requires fewer resources to train than other large models.
Openness:Meta provides support for research and commercial applications and promotes the development of the community.
flexibility:Models can run on a variety of hardware platforms, including CPUs and GPUs.
1. Install the required tools:
pip install transformers
pip install sentencepiece
2. Load the model:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repo: requires accepting Meta's license on Hugging Face and logging in;
# the "-hf" variant is the checkpoint in transformers format
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").cuda()
input_text = "What is Llama?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0]))
Llama provides multiple versions, the main difference lies in the number of parameters of the model (such as 7B, 13B, 70B):
CPU:The GGUF format can be used for efficient inference.
GPU:Supports FP16 or BF16 format for improved performance.
FP16 quantization: reduces memory usage and increases inference speed.
INT8 quantization: suitable for resource-constrained devices, with minimal accuracy loss.
Mistral is a large-scale language model developed by Mistral AI that focuses on providing efficient and accurate natural language processing capabilities.
The model is known for its highly optimized architecture and streamlined computing resource requirements, making it suitable for a variety of language generation and understanding tasks.
Efficiency: Mistral is designed with a modern Transformer architecture, providing fast and accurate inference.
Openness:The Mistral model is open source, allowing users to run it locally and customize it.
Scalability:The model supports multiple quantization formats and is suitable for different hardware environments.
Privacy protection:Can be deployed in a local environment to avoid the risk of data leakage.
Content generation:Including article writing, conversation generation and copywriting.
Language understanding:Used for tasks such as text classification and sentiment analysis.
Educational applications:Provide teaching assistance and answer academic questions.
Automation system:Integrate into customer service systems or other automated processes.
1. Install dependencies:
pip install transformers
2. Download the model:Download Mistral model archives from Hugging Face or other official sources.
3. Load the model:
from transformers import AutoModelForCausalLM, AutoTokenizer

# The published checkpoint names carry a version suffix, e.g. Mistral-7B-v0.1
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1").cuda()
input_text = "What is Mistral?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0]))
CPU:Supports high-performance CPU inference and can be optimized using the GGUF format.
GPU:Supports FP16 or BF16 formats for optimal performance on NVIDIA GPUs.
Mistral is available in several versions, the main differences being the number of parameters and performance:
Quantization format: use INT8 or GGUF formats to reduce memory requirements and improve inference efficiency.
Performance optimization:Leverage hardware features such as the AVX instruction set or CUDA acceleration for efficient computing.
Gemma is an open, lightweight large language model family released by Google, designed for efficient natural language processing tasks.
With its versatility and scalability at its core, the model supports text generation, language understanding, and translation tasks in multiple languages.
Multi-language support:Gemma can handle multiple languages, making it suitable for global application scenarios.
Lightweight:The model is highly optimized for resource-constrained hardware.
Scalability:Supports running on multiple hardware environments, including CPU and GPU.
Open source:Open source code makes it convenient for users to carry out secondary development and customization.
Natural language generation:Suitable for content creation, article writing and conversation generation.
Language understanding:Including tasks such as sentiment analysis, topic classification, and text summarization.
Machine translation:Provide highly accurate multi-language translation services.
Education and Research:As a teaching aid or research analysis platform.
1. Install dependencies:
pip install transformers
2. Download the model: Gemma checkpoints are published by Google on Hugging Face (gated; you must accept the license before downloading).
3. Load the model:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Gated repo: requires accepting Google's license on Hugging Face;
# "google/gemma-2b" is the smallest published checkpoint
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b").cuda()
# Prepare the input text
input_text = "What is Gemma?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.cuda()
# Generate output
outputs = model.generate(input_ids, max_length=50)
print(tokenizer.decode(outputs[0]))
Gemma provides multiple versions to meet different needs:
CPU:Supports CPU inference and is suitable for low resource environments.
GPU:Runs on CUDA-enabled GPUs for high performance.
Quantization techniques: supports INT8 and FP16 formats, reducing memory usage while maintaining stable performance.
Hardware optimization:Leverage hardware features such as the AVX instruction set to further speed up inference.
GPT4All is an open source large-scale language model designed for natural language processing on local devices without cloud dependencies.
This model provides efficient text generation capabilities, is suitable for multiple language application scenarios, and is designed for low-resource hardware.
Open source:GPT4All is completely open source and can be customized according to your needs.
Run locally:Supports running on PC, laptop or server, no network connection required.
Lightweight:Can run on CPU and low-spec GPU, reducing hardware requirements.
Privacy protection:Since all operations are performed locally, user data is not leaked to external servers.
Content creation:Use it for writing articles, stories, blogs or technical documents.
Question and answer system:Create a Q&A assistant that works offline.
Educational assistance:Acts as a learning and problem-solving tool to help users understand complex concepts.
Development assistance:Used to generate code or provide program development suggestions.
1. Install dependencies:
pip install gpt4all
2. Download the model file: download the required model file from the GPT4All official website (in .bin or .gguf format).
3. Load the model:
from gpt4all import GPT4All
model = GPT4All("gpt4all-lora-quantized.bin")
response = model.generate("Hello, what is GPT4All?", max_tokens=100)
print(response)
.bin:Common quantized model format, suitable for most devices.
.gguf:A CPU-optimized format suitable for efficient inference on low-resource devices.
CPU:GPT4All supports high-performance CPU inference and is suitable for GPU-less environments.
GPU:If you have an NVIDIA GPU, you can use PyTorch or other frameworks for acceleration.
Quantized models: use INT8 or other quantization techniques to reduce memory usage.
Performance optimization:Leverage hardware features such as the AVX instruction set to speed up inference.
No. GPT4All does not depend on torch (PyTorch) or TensorFlow at all.
Python GPT4All
↓
C++ backend (llama.cpp CPU version)
↓
.gguf model
When only the model file name is passed in, it will be automatically downloaded to the cache folder in the user's home directory:
~/.cache/gpt4all/
from gpt4all import GPT4All
model = GPT4All(
"Meta-Llama-3-8B-Instruct.Q4_0.gguf",
model_path="D:/llm_models/gpt4all"
)
Not supported. GPT4All's Python API is currently CPU-only.
The following error occurs:
TypeError: GPT4All.__init__() got an unexpected keyword argument 'n_gpu_layers'
It means that the currently installed GPT4All Python binding does not implement GPU related parameters, which is normal behavior.
import ollama
response = ollama.chat(
model="llama3:8b-instruct",
messages=[
{"role": "user", "content": "Explain CUDA in one sentence"}
]
)
print(response["message"]["content"])
Characteristics:
from llama_cpp import Llama
llm = Llama(
model_path="Meta-Llama-3-8B-Instruct.Q4_0.gguf",
n_gpu_layers=999,
n_ctx=4096
)
print(llm("Explain CUDA in one sentence")["choices"][0]["text"])
Characteristics:
This is the most common and intuitive method, suitable for developers who have fixed remote servers (such as company or laboratory hosts).
When a single GPU cannot meet the demand, or tasks need to be dynamically assigned to different nodes, a dedicated distributed framework can be used.
If developers do not have hardware equipment themselves, they can take advantage of on-demand cloud GPU resources.
| Approach | Implementation |
|---|---|
| interactive platform | Use Google Colab or Kaggle Kernels to directly obtain free or paid remote GPUs (such as T4, A100) through the browser. |
| Computing power rental service | Rent specific GPU containers through RunPod or Lambda Labs, and use Docker images to quickly deploy Python execution environments. |
| Enterprise API | Use AWS SageMaker or GCP Vertex AI to encapsulate Python scripts into Jobs and send them to the cloud for execution. The system will automatically allocate GPU resources and recycle them after the operation is completed. |
When using remote computing power, the bottleneck often lies in network transmission rather than the calculation itself. The following strategies are recommended:
Ray is an open source distributed computing framework designed to let Python applications scale easily from a single machine to a large cluster. It addresses Python's single-process performance limits when dealing with large-scale machine learning, data processing, and real-time inference.
Ray transforms standard Python code into distributed tasks through simple decorators:
Tasks: functions defined with @ray.remote. They are stateless and ideal for processing independent computational jobs in parallel.
Actors: classes defined with @ray.remote. They are stateful and can persist data between multiple tasks, making them suitable for simulation environments or model serving.

For developers who need serious computing power, Ray provides an extremely simple GPU scheduling mechanism. You don't need to manage CUDA device numbers manually; just specify the resource requirements when defining the remote task:
import ray

ray.init()

@ray.remote(num_gpus=1)
def train_model(data):
    # Ray automatically schedules this task on a node with a free GPU
    # and sets CUDA_VISIBLE_DEVICES for it
    return "Training Complete"

result = ray.get(train_model.remote(None))  # blocks until the task finishes
This abstraction allows developers to focus on algorithm logic without worrying about the allocation and recycling of underlying hardware resources.
Ray is more than just an execution engine, it also includes a series of libraries optimized for AI workflows:
| library name | Main functions |
|---|---|
| Ray Data | Data loading and transformation for large-scale machine learning training, supporting streaming processing. |
| Ray Train | Simplify distributed model training (supports PyTorch, TensorFlow, Horovod, etc.). |
| Ray Tune | Efficient hyperparameter optimization (Hyperparameter Tuning) framework. |
| Ray Serve | Used to deploy machine learning models, with automatic expansion and load balancing functions. |
Ray's greatest value lies in its low latency and high throughput. Compared with traditional task queues (such as Celery) or heavyweight distributed frameworks (such as Spark), Ray's syntax is closer to native Python, and its dynamic graph scheduling can handle highly complex, interdependent computing tasks.
PyTorch RPC (Remote Procedure Call) is a distributed training framework officially provided by PyTorch, designed to support complex models that cannot easily be handled with data parallelism. It allows one node (worker) to call a function on another, remote node and obtain the result, or hold a reference to a remote object, as if it were local.
| Method name | Execution characteristics |
|---|---|
| rpc_sync | Synchronous call. After sending the request, the calling thread blocks until the response arrives from the remote end. |
| rpc_async | Asynchronous call. Immediately returns a Future object; the program can continue with other work and collect the result later. |
| remote | Remote construction. Creates an object on the remote node and returns an RRef, suitable for building a remote Parameter Server. |
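The difference between rpc_sync and rpc_async can be sketched with standard-library futures. This is a local stand-in for illustration only; the real API lives in torch.distributed.rpc and requires a multi-process setup:

```python
from concurrent.futures import ThreadPoolExecutor

def remote_fn(x):
    # Stand-in for a function executed on a remote worker
    return x * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    # rpc_sync style: block until the remote result arrives
    sync_result = executor.submit(remote_fn, 21).result()

    # rpc_async style: receive a Future immediately, collect the result later
    fut = executor.submit(remote_fn, 21)
    # ... the caller is free to do other work here ...
    async_result = fut.result()

print(sync_result, async_result)  # 42 42
```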
PyTorch RPC is particularly suitable for the following types of highly complex distributed tasks:
When using the RPC framework, developers need to face higher debugging difficulties than general training. Since cross-machine communication is involved, network latency and bandwidth often become performance bottlenecks. In addition, the life cycle management of objects (through reference counting of RRef) is also more complicated in a distributed environment, and it is necessary to ensure that remote objects can be correctly recycled when they are no longer needed.
email: [email protected]