Born Human

A Programmer's Self-Cultivation


Guidelines

Best practices

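Notes on creating a new cluster:

  1. Specify a dedicated VNet so that network controls can be managed flexibly.
  2. For a cluster with many nodes, request a dedicated DB to keep Ambari monitoring responsive, and raise its tier. The more machines there are, the higher the tier should be.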

Azure Cosmos commands

d:\project\APGold\autopilotservice\Global\VirtualEnvironments\AdsBI\AdsMz-Test-MW1-MSNMediation>sd edit deployment.int
deployment.int - file(s) not on client.

Edit directly

  1. add/edit
  2. Install CodeFlow and submit a PR
    \codeflow.redmond.corp.microsoft.com\public\cf2Launcher.cmd

What CodeFlow is for

What a VE is for:
What a PE is for:

Before

D:\project\APGold\autopilotservice\Global\VirtualEnvironments\AdsBI\AdsMz-Test-MW1-MSNMediation

D:\project\APGold\autopilotservice\MW1\AdsMz-Prod-MW1

A ServiceGroup is mandatory for every machine function

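A VE can be thought of as storing the global config for a project; all PEs share this config.
The design goal is to avoid repeated releases when several PEs are deployed for the same project: updating the VE is enough for the related PEs to pick up the change.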

https://msazure.visualstudio.com

D:\project\APGold\autopilotservice\Global\VirtualEnvironments\AdsBI\AdsMz-Prod-MW1-MSNMediation

07/26/21 16:54:01.841,error processing AdsMz-Test-MW1: Rollout 'AdsBI-AdsMz-Test-MW1-MzOrchestration-VE.6799404_17553440741849145769_0.Rollout_AdsBI_Streaming_CFR' cannot be kicked due to status: InRollout - Rollout in progress..Rollout 'AdsBI-AdsMz-Test-MW1-MSNMediation-VE.6827694_4064931696698155710_0.Rollout_AdsBI_Streaming_CFR' cannot be kicked due to status: OtherRollout - Waiting on other rollout on MF AdsBI_Streaming_CFR in progress: AdsBI_Streaming_CFR.AdvertiserAggs_94d0ef91_90546_1_15-0-897+94d0ef91_CL6670088_5920165827906938206_0.AdsBI-AdsMz-Test-MW1-AdvertiserAggs-VE.csv:AdsBI_Streaming_CFR.AgoraAggs_76700988_85934_1_16-0-173+76700988_CL6670088_4056715873563314682_0.AdsBI-AdsMz-Test-MW1-AgoraAggs-VE.csv:AdsBI_Streaming_CFR.CACFR_9a63868e_90143_2_1-0-2408+9a63868e_CL6670088_11167218336511992544_0.AdsBI-AdsMz-Test-MW1-CACFR-VE.csv:AdsBI_Streaming_CFR.CFR_2d2cea68_91067_1_17-0-6662+2d2cea68_CL6670088_18415503406869780426_0.AdsBI-AdsMz-Test-MW1-CFR-VE.csv:AdsBI_Streaming_CFR.KpiAggs_fdd5f92d_90180_1_17-0-433+fdd5f92d_CL6670088_6626197140932378707_0.AdsBI-AdsMz-Test-MW1-KPIAggs-VE.csv:AdsBI_Streaming_CFR.Monetization_de0db608_91071_1_1-0-2808+de0db608_CL6670088_351842135959690630_0.AdsBI-AdsMz-Test-MW1-Monetization-VE.csv:AdsBI_Streaming_CFR.Orchestration_93eb645e_91078_1_16-0-557+93eb645e_CL6799404_17553440741849145769_0.AdsBI-AdsMz-Test-MW1-MzOrchestration-VE.csv:AdsBI_Streaming_CFR.PublisherAggs_55bd6dd9_90534_1_17-0-474+55bd6dd9_CL6670088_15361642287162420254_0.AdsBI-AdsMz-Test-MW1-PublisherAggs-VE.csv:AdsBI_Streaming_CFR.Mediation_865d3b39_90170_1_merge_20210629_-1_CL6670088_3189803787641037971_0.AdsBI-AdsMz-Test-MW1-Mediation-VE.csv:AdsBI_Streaming_CFR.MSNMediation_4a63e32e_91077_2_0-1-28+4a63e32e33_CL6827694_4798040603905411799_0.AdsBI-AdsMz-Test-MW1-MSNMediation-VE.csv:.OM.Autopilot-AutopilotClient-VE.csv..; .. Some rollouts not triggered.
Deployment: 6827694_4798040603905411799 (this is NOT the latest deployment.ini change number)
Downloaded 89 files, 21510455 bytes, compressed size: 8267353 bytes.
Failed to build services: Failed to build service 'SAMHourlyS2B.MSNMediation_4a63e32e_91077_2_0-1-28+4a63e32e33': Proxy error: not all data was received (DP: 25.68.114.53; by MW01NAP0000036C); ; ;Log:e:GetStreamChunkMapWithRetry:stream does not exist [response from server] : GetStreamChunkMap('stream://AdsBI-AdsMz-Test-MW1-MSNMediation-VE/app/ServiceMaps/MSNMediationServiceMap.ini'; 'MSNMediation_4a63e32e_91077_2_0-1-28+4a63e32e33') failed ;Log:w:ApDynamicStorage::StorageConnectionEx::DownloadBufferFromStreamWithRetry:Stream (stream://AdsBI-AdsMz-Test-MW1-MSNMediation-VE/app/ServiceMaps/MSNMediationServiceMap.ini _ MSNMediation_4a63e32e_91077_2_0-1-28+4a63e32e33) not found in storage ;
Majority of the building errors are temporary and will be fixed automatically.
If there is no change or progress in the app deployment log after 15 minutes, contact apswat for production environments and aptalk for non-production environment.
Look in App Deployment log for more info

Here are common error messages and possible fixes:
Msg: Error 53: The network path was not found.: ac: 0
Fix: Try adding REDMOND@ to the build path.

Msg: Msg: EDP010385: APSEQREAD::GetDataPointer error 2
Fix: Try adding REDMOND@ to the build path.

Msg: Proxy reported error: EDP010196: ApckBuilder Error 3: The system cannot find the path specified.
Fix: Look for missing directories under your build drop.

For more troubleshooting info, please refer to App Deployment Troubleshooting
Deployment: 6827694_9979618793807238839_0 (this is NOT the latest deployment.ini change number)
Downloaded [unknown] files, [unknown] bytes.
Failed to import version '6827694_9979618793807238839' of image for environment 'AdsBI-AdsMz-Test-MW1-MSNMediation-VE': Error downloading image from remote storage (stream 'stream://AdsBI-AdsMz-Test-MW1-MSNMediation-VE/app/image.ini@6827694_9979618793807238839')

Severity Code Description Project File Line Suppression State
Error MSB3680 The source file “D:\project\Ads.BI.MSNMediation\private\src\Batch/Autopilot/ExternalConfigs/Microsoft.BingAds.BI.APDeploy.exe.config” does not exist. Microsoft.BI.MSNMediation.HourlyS2B.Drop D:\project\Ads.BI.MSNMediation\private\src\Batch\build\targets\CreateDrop.targets 47

Severity Code Description Project File Line Suppression State
Error MSB3030 Could not copy the file “D:\project\Ads.BI.MSNMediation\private\src\Batch/SAMHourlyS2B/Microsoft.BI.MSNMediation.HourlyS2B.Drop/bin\Debug\net472**” because it was not found. Microsoft.BI.MSNMediation.HourlyS2B.Drop D:\project\Ads.BI.MSNMediation\private\src\Batch\build\targets\CreateDrop.targets 47

c:\Users\jingqicao.nuget\packages\microsoft.bingads.bi.apdeploy\10.5.2

NuGet package changes made outside the SDK, such as deleting a package

resolved Show files that have been merged but not submitted (independent of the directory the command is run from)
retype Reappraise the file type for files on the client
revert Discard changes from an opened file
review List and track changelists (for the review daemon)

sd review: how to search for only one person's reviews

FAREAST\jingqicao

sd opened

sd submit -c

Scheduling tasks in Azure DevOps

Steps: Onboard AP App Deployment using AzDeployer (Stratus)

  1. Prepare workflow folder, master config and workflow config.
  2. Prepare APDrop.
  3. Create APGold virtual environment.
  4. Set up the corresponding Azure DevOps CD pipeline. Add an Azure DevOps release pipeline.
##[warning]C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\amd64\Microsoft.Common.CurrentVersion.targets(2203,5): Warning MSB3245: Could not resolve this reference. Could not locate the assembly "Google.Protobuf, Version=3.6.1.0, Culture=neutral, PublicKeyToken=a7d26565bac4d604, processorArchitecture=MSIL". Check to make sure the assembly exists on disk. If this reference is required by your code, you may get compilation errors.


externalFeedCredentials: 'nuget-msazure-oldbond, nuget-msblox-azuregenevamonitoring, nuget-mscosmos-cosmosprod, nuget-mscosmos-cosmostest, nuget-msdata-bigdata, nuget-ossmsft-oss_all, nuget-trill-trill'
2021-08-02 06:33:05 ThreadId:   69	PauseAndResumeWorkflows() Process: MSNBI_S2BHourly_0 UserCommand: Run
2021-08-02 06:33:05 ThreadId: 79 Getting information about event AdsBI_MSNBI_S2B_Hourly_Done
2021-08-02 06:33:05 ThreadId: 88 Getting information about event AdsBI_MSNBI_S2B_Hourly_Done
2021-08-02 06:33:05 ThreadId: 83 Getting information about event AdsBI_MSNBI_S2B_Hourly_Done
2021-08-02 06:33:05 ThreadId: 83 Got SQL exception: System.Data.SqlClient.SqlException (0x80131904): -999999:3:2--prc_ProcessStateGetNextDelta-212--Violation of PRIMARY KEY constraint 'PK_PROCESSPRESTATE'. Cannot insert duplicate key in object 'dbo.ProcessPrestate'. The duplicate key value is (-1931531102, 20210731 1100).
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader()
at Microsoft.AdCenter.WorkflowHost.CommunicationWorkflow.<>c__DisplayClass32_0.<GetLastParam>b__1(SqlConnection sqlConnection)
at Microsoft.AdCenter.WorkflowHost.SqlConnectionHelper.RunQuery(Action`1 command)
ClientConnectionId:0fd2a26b-bb72-4bbd-baec-a16499dff840
Error Number:50000,State:1,Class:16
ClientConnectionId before routing:bfc0cfb4-53ac-4a8f-a985-44e2e512efe2
Routing Destination:b9d5d60a293f.tr147.westus2-a.worker.database.windows.net,11020
connection string: Server=adsdwctest.database.windows.net;Database=DWC_DB;User ID=AdsDataSI_Execution;Password=AnotherPassword8!
2021-08-02T08:28:02.5629452Z          Checking compatibility for System.Security.Cryptography.Primitives 4.3.0 with .NETFramework,Version=v4.7.2.
2021-08-02T08:28:02.5630121Z All packages and projects are compatible with .NETFramework,Version=v4.7.2.
2021-08-02T08:28:02.6040861Z Committing restore...
2021-08-02T08:28:02.6041989Z Generating MSBuild file D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\obj\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj.nuget.g.props.
2021-08-02T08:28:02.6048518Z Generating MSBuild file D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\obj\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj.nuget.g.targets.
2021-08-02T08:28:02.6053133Z Writing assets file to disk. Path: D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\obj\project.assets.json
2021-08-02T08:28:02.6174412Z Writing cache file to disk. Path: D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\obj\project.nuget.cache
2021-08-02T08:28:02.6187683Z Persisting dg to D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\obj\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj.nuget.dgspec.json
2021-08-02T08:28:02.6200559Z Failed to restore D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj (in 17.33 sec).
2021-08-02T08:28:02.6260383Z
2021-08-02T08:28:02.6264281Z NuGet Config files used:
2021-08-02T08:28:02.6264975Z D:\a\1\Nuget\tempNuGet_22994537.config
2021-08-02T08:28:02.6265449Z
2021-08-02T08:28:02.6266040Z Feeds used:
2021-08-02T08:28:02.6266664Z https://msasg.pkgs.visualstudio.com/Shared%20Data/_packaging/Ads.BI.SubjectArea.Upstreams/nuget/v3/index.json
2021-08-02T08:28:02.6267321Z
2021-08-02T08:28:02.6267938Z Installed:
2021-08-02T08:28:02.6281713Z 60 package(s) to D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj
2021-08-02T08:28:02.6297867Z Done executing task "RestoreTask" -- FAILED.
2021-08-02T08:28:02.6302830Z 1>Done building target "Restore" in project "dirs.proj" -- FAILED.
2021-08-02T08:28:02.6303710Z 1>Done Building Project "D:\a\1\s\private\src\Batch\dirs.proj" (Restore target(s)) -- FAILED.
2021-08-02T08:28:02.6405509Z
2021-08-02T08:28:02.6547930Z Build FAILED.
2021-08-02T08:28:02.6562009Z
2021-08-02T08:28:02.6580770Z "D:\a\1\s\private\src\Batch\dirs.proj" (Restore target) (1) ->
2021-08-02T08:28:02.6589059Z (Restore target) ->
2021-08-02T08:28:02.6596250Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Detected package downgrade: Microsoft.BI.Common from 4.21.0 to 4.5.0. Reference the package directly from the project to select a different version. [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6599483Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Microsoft.BI.MSNMediation.HourlyS2B.Drop -> Ads.BI.StreamingToBatch 2.2.0 -> Microsoft.BI.Common (>= 4.21.0) [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6602014Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Microsoft.BI.MSNMediation.HourlyS2B.Drop -> Microsoft.BI.Common (>= 4.5.0) [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6603542Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Detected package downgrade: Microsoft.BI.Common from 4.12.0 to 4.5.0. Reference the package directly from the project to select a different version. [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6605044Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Microsoft.BI.MSNMediation.HourlyS2B.Drop -> Ads.BI.Orchestration.Workflows 0.1.0 -> Microsoft.BI.Common (>= 4.12.0) [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6606611Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Microsoft.BI.MSNMediation.HourlyS2B.Drop -> Microsoft.BI.Common (>= 4.5.0) [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6608307Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Detected package downgrade: Microsoft.Bingads.Dwc.Tools from 2.2.0 to 2.0.5102033.20. Reference the package directly from the project to select a different version. [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6609947Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Microsoft.BI.MSNMediation.HourlyS2B.Drop -> Ads.BI.Orchestration.Workflows 0.1.0 -> Microsoft.Bingads.Dwc.Tools (>= 2.2.0) [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6612139Z D:\a\1\s\private\src\Batch\SAMHourlyS2B\Microsoft.BI.MSNMediation.HourlyS2B.Drop\Microsoft.BI.MSNMediation.HourlyS2B.Drop.csproj : error NU1605: Microsoft.BI.MSNMediation.HourlyS2B.Drop -> Microsoft.BingAds.Dwc.Tools (>= 2.0.5102033.20) [D:\a\1\s\private\src\Batch\dirs.proj]
2021-08-02T08:28:02.6612903Z
2021-08-02T08:28:02.6613320Z 0 Warning(s)
2021-08-02T08:28:02.6613727Z 3 Error(s)
2021-08-02T08:28:02.6613981Z
2021-08-02T08:28:02.6614474Z Time Elapsed 00:00:19.93
2021-08-02T08:28:02.7009246Z ##[error]Error: The process 'C:\Program Files\dotnet\dotnet.exe' failed with exit code 1
2021-08-02T08:28:02.7020201Z ##[error]Packages failed to restore









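This shows that when the project directly references a package at a lower version than one of its dependencies requires, NuGet lets the direct reference win and applies that lower version to the whole dependency graph, reporting the downgrade as NU1605 (an error in this build). This is similar to how dependency resolution works in Java build tools.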
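The downgrade check can be illustrated with a small Python sketch. It is a simplified model for illustration only, not NuGet's actual resolver: it compares each direct reference with the minimum versions requested through other packages and lists the conflicts. The package names and versions are copied from the NU1605 lines in the log above; the function names `parse_version` and `find_downgrades` are hypothetical.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '4.21.0' into (4, 21, 0) so versions compare numerically.

    Only purely numeric versions are handled; prerelease suffixes are not.
    """
    return tuple(int(part) for part in text.split("."))


def find_downgrades(
    direct: Dict[str, str],
    transitive: Dict[str, List[Tuple[str, str]]],
) -> List[Tuple[str, str, str, str]]:
    """Return (package, parent, minimum, direct_version) for every transitive
    minimum that is higher than the version the project references directly."""
    downgrades = []
    for package, requirements in transitive.items():
        if package not in direct:
            continue  # no direct reference, so nothing can be downgraded
        for parent, minimum in requirements:
            if parse_version(minimum) > parse_version(direct[package]):
                downgrades.append((package, parent, minimum, direct[package]))
    return downgrades


# Direct references of the project, as reported in the log.
DIRECT = {
    "Microsoft.BI.Common": "4.5.0",
    "Microsoft.BingAds.Dwc.Tools": "2.0.5102033.20",
}

# Minimum versions requested by other packages, as reported in the log.
TRANSITIVE = {
    "Microsoft.BI.Common": [
        ("Ads.BI.StreamingToBatch 2.2.0", "4.21.0"),
        ("Ads.BI.Orchestration.Workflows 0.1.0", "4.12.0"),
    ],
    "Microsoft.BingAds.Dwc.Tools": [
        ("Ads.BI.Orchestration.Workflows 0.1.0", "2.2.0"),
    ],
}

DOWNGRADES = find_downgrades(DIRECT, TRANSITIVE)
for package, parent, minimum, pinned in DOWNGRADES:
    print(f"{package}: {parent} needs >= {minimum}, project references {pinned}")
```

With the data from the log this reports three conflicts, matching the "3 Error(s)" line. One likely fix, as the NuGet message itself suggests, is to raise the direct references to at least the highest required minimum.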



<PackageReference Include="Ads.BI.BinplacingTools">
<Version>3.0.0</Version>
</PackageReference>
<PackageReference Include="Ads.BI.PipelineApps.Egress">
<Version>6.1.0</Version>
</PackageReference>
<PackageReference Include="Microsoft.AdCenter.ExecuteSSISPackage">
<Version>1.1.5102398-pacman</Version>
</PackageReference>
<PackageReference Include="Microsoft.Azure.KeyVault.Core">
<Version>1.0.0</Version>
</PackageReference>
<PackageReference Include="Microsoft.BingAds.Dwc.Engine.Library">
<Version>2.1.0</Version>
</PackageReference>
<PackageReference Include="Microsoft.Search.Autopilot">
<Version>1.0.0</Version>
</PackageReference>
<PackageReference Include="WindowsAzure.Storage">
<Version>7.1.2</Version>
</PackageReference>

ads Data live-site tracking v2

Deployment process for FASTBI streaming pipeline - V2


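How it works

Why can today's large AI models work out so quickly which articles to consult for your question and then summarize them? Which steps or functional modules are involved?

That is a fairly technical question. The ability you have noticed, quickly locating and summarizing reference material, is not the magic of a single model but the result of a carefully engineered system working together. Such a system is usually called RAG (retrieval-augmented generation).

Put simply, the whole process breaks down into three core stages, each handled by dedicated functional modules.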

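Stage 1: Understanding your question (query processing)

The goal of this stage is to turn your conversational question into a "key" that the system can use for precise retrieval.

  • Module 1: Query understanding. The system first analyzes your question, substitutes synonyms (for example, reading "how to lose weight" as "fat-loss methods"), recognizes proper nouns, and may even resolve a vague reference such as "Apple's latest phone" to a specific model such as "iPhone 15 Pro Max".
  • Module 2: Query rewriting and expansion. To raise the hit rate, the system may use a large model to rewrite your question into several search queries from different angles, or split it into sub-questions (such as "Apple financial report" and "iPhone sales").
  • Module 3: Intent recognition. The system quickly decides whether you want the latest news (needs real-time information), academic research (needs a paper corpus), or general knowledge (needs encyclopedic sources), so that the later search goes to the right knowledge base.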
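The query-processing modules above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: real systems typically use a language model for rewriting and a trained classifier for intent, whereas here a hand-written synonym table (`SYNONYMS`) and keyword lists (`INTENT_KEYWORDS`) stand in for both. All names and table entries are hypothetical.

```python
from typing import Dict, List

# Hypothetical synonym table standing in for query understanding.
SYNONYMS: Dict[str, str] = {"lose weight": "reduce body fat"}

# Hypothetical keyword lists standing in for an intent classifier.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "news": ["latest", "today"],
    "academic": ["paper", "study"],
}


def expand_query(query: str) -> List[str]:
    """Return the original query plus one lowercased variant per matching synonym."""
    variants = [query]
    lowered = query.lower()
    for phrase, replacement in SYNONYMS.items():
        if phrase in lowered:
            variants.append(lowered.replace(phrase, replacement))
    return variants


def detect_intent(query: str) -> str:
    """Pick the first intent whose keyword appears as a word in the query."""
    words = query.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "general"  # fall back to the general knowledge base


print(expand_query("How to lose weight"))
print(detect_intent("latest iPhone sales"))
```

The expanded variants would each be sent to retrieval, and the detected intent would select which knowledge base to search.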

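Stage 2: Quickly finding relevant articles (retrieval)

This stage is the key to being fast. The system does not read the original documents one by one; it matches them by their "semantic fingerprints".

  • Core module: vector database and embedding model
    1. Preprocessing: ahead of time, the system passes every article in the knowledge base through an embedding model, which converts it into a "vector" made of numbers. The vector acts as the article's semantic fingerprint: articles with similar meaning have fingerprints that lie close together in the vector space.
    2. Real-time conversion: when your question arrives, the system uses the same embedding model to convert the question into a vector as well.
    3. Similarity search: the system compares your question's vector with the very large set of article vectors in the database using a mathematical measure (such as cosine similarity) and finds the N closest articles. This relies on an efficient approximate nearest-neighbor index (such as HNSW), so it is very fast, typically taking only milliseconds even across millions of documents.
  • Supplementary module: hybrid retrieval and reranking. For better precision, the system also combines traditional keyword search (for example, matching the proper nouns in your question) and uses a lightweight reranking model to order the initial candidates from most to least relevant.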
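The three retrieval steps can be sketched as follows. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, and the search is a brute-force scan rather than an HNSW index. The documents and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(docs: Dict[str, str]) -> Dict[str, Counter]:
    """Step 1 (preprocessing): embed every document once, ahead of time."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def search(index: Dict[str, Counter], query: str, top_n: int = 2) -> List[Tuple[str, float]]:
    """Steps 2 and 3: embed the query, then return the top_n most similar documents."""
    query_vector = embed(query)
    scored = [(doc_id, cosine_similarity(query_vector, vector))
              for doc_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical three-document knowledge base.
DOCS = {
    "nobel": "the 2024 nobel prize in physics was awarded to hopfield and hinton",
    "diet": "methods for losing fat include diet and exercise",
    "phone": "the latest iphone model and its sales figures",
}
INDEX = build_index(DOCS)
RESULTS = search(INDEX, "who won the 2024 nobel prize in physics", top_n=2)
print(RESULTS)
```

The index is built once, so only the query embedding and the comparison happen at question time. A real system would replace the linear scan in `search` with an approximate index to keep latency low on large collections.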

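Stage 3: Summarizing and answering (generation)

The system does not simply stitch the articles together; it has the large model act as an "editor who is good at summarizing".

  • Core module: large language model

    1. Building the context: the system packs your original question and the most relevant articles (or passages) found in stage 2 into a carefully designed prompt template. For example:

      "Using the reference material below, answer the user's question concisely and accurately in Chinese. If the material does not contain the answer, say you do not know.\nReference material:\n[Article 1 content]\n[Article 2 content]\n…\nUser question: [your question]"

    2. Generating the answer: given this prompt, the large model generates its answer based on the supplied material. It extracts the key information, organizes the logic, summarizes in its own words, and appends the reference sources at the end.

    3. Streaming output: while the answer is being generated, the system returns it piece by piece with a "typewriter effect", which makes the response feel very fast.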
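The context-building step can be sketched like this. It is a minimal sketch assuming the template wording shown above (rendered in English); the function name `build_prompt`, the `max_passages` parameter, and the passage numbering are hypothetical choices, and the call to the model itself is left out.

```python
from typing import List

# Template modeled on the example prompt above; the wording is illustrative.
PROMPT_TEMPLATE = (
    "Using the reference material below, answer the user's question "
    "concisely and accurately. If the material does not contain the answer, "
    "say you do not know.\n"
    "Reference material:\n{references}\n"
    "User question: {question}"
)


def build_prompt(question: str, passages: List[str], max_passages: int = 3) -> str:
    """Pack the question and the top retrieved passages into one prompt string."""
    # Number the passages so the model can cite them by index.
    numbered = [f"[{i}] {passage}"
                for i, passage in enumerate(passages[:max_passages], start=1)]
    return PROMPT_TEMPLATE.format(references="\n".join(numbered), question=question)


PROMPT = build_prompt(
    "Who won the 2024 Nobel Prize in Physics?",
    ["The 2024 Nobel Prize in Physics was awarded to Hopfield and Hinton."],
)
print(PROMPT)
```

Limiting the number of passages keeps the prompt within the model's context window, and numbering them gives the model a simple way to point back to its sources.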

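A simple walkthrough

Suppose you ask: "Who won the 2024 Nobel Prize in Physics?"

  1. Query processing: the system identifies the key terms "2024", "Nobel Prize in Physics", and "winner".
  2. Retrieval: the system converts these terms into a vector and searches vector databases built from Wikipedia, news archives, and similar sources. It quickly locates a news article titled "2024 Nobel Prize in Physics".
  3. Generation: the system sends the question and the article's content to the large model. The model reads the article, extracts "John J. Hopfield and Geoffrey E. Hinton", and composes a fluent answer: "The 2024 Nobel Prize in Physics was awarded to…". It also attaches a link to the reference article.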

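Summary: why is it so fast?

  • Preprocessing: the articles' vector fingerprints are computed in advance and take no time when you ask your question.
  • Vector retrieval: it is essentially numerical computation, which is orders of magnitude faster than scanning documents by reading them.
  • Parallel computation: modern GPU (graphics processing unit) and TPU (tensor processing unit) hardware can perform a large number of computations at the same time.
  • Streaming output: the model shows you the first token as soon as it is computed, which shortens the perceived wait.

The RAG architecture works like attaching to the large model an external "library" that can be extended indefinitely and searched very quickly. It lets the model draw on broad, up-to-date external knowledge and reduces fabrication, because answers are grounded in source material. It has become a mainstream industrial approach to the problems of keeping a large model's knowledge current and limiting hallucination.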