A small pitfall with URL encoding in Go

Original content; please credit the source when reposting.

Posted by Weakyon Blog on May 4, 2017

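I. Problem description

In Go 1.7, URL encoding is handled by the net/url package.

This package has a small trap, though.

The core question is whether a space should be encoded as %20 or as + in HTTP.

Judging from the API list, encoding is done with QueryEscape and decoding with QueryUnescape.

QueryEscape encodes a space as +, while QueryUnescape decodes both %20 and + to a space.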

fmt.Println(url.QueryEscape(" test")) // the space is encoded as +
s, err := url.QueryUnescape("%20+test") // both %20 and + decode to a space
if err != nil {
	fmt.Println(err)
	return
}
fmt.Println(s)

Output:

+test
  test

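Our tester called a similar function in a client written in Java, which encoded the spaces in the URI as +. Go's http server library did not treat that + as a space and still parsed it as a literal +.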

II. Analysis

I took a look at the Go source code.

The relevant http server code is in the readRequest function in net/http/request.go:

if req.URL, err = url.ParseRequestURI(rawurl); err != nil {
    return nil, err
}

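So the http server does not use QueryUnescape when it parses the URI.

QueryUnescape and ParseRequestURI both end up calling the same net/url function, unescape.

They differ only in the mode argument they pass.

QueryUnescape passes encodeQueryComponent, while ParseRequestURI passes encodePath.

In the unescape function: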

case '+':
    if mode == encodeQueryComponent {
        t[j] = ' '
    } else {
        t[j] = '+'
    }

This matches the behavior we ran into.

QueryUnescape turns both + and %20 into a space; ParseRequestURI only turns %20 into a space and leaves + untouched.
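To make the difference concrete, here is a minimal sketch that decodes the same raw string once as a request path and once as a query component. The helper name decodeBoth and the sample input are my own, chosen for illustration; only url.ParseRequestURI and url.QueryUnescape come from net/url.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeBoth is a hypothetical helper: it decodes raw as a request path
// (via url.ParseRequestURI) and as a query component (via url.QueryUnescape).
func decodeBoth(raw string) (asPath, asQuery string, err error) {
	// ParseRequestURI expects an absolute path, so prepend a slash.
	u, err := url.ParseRequestURI("/" + raw)
	if err != nil {
		return "", "", err
	}
	asQuery, err = url.QueryUnescape(raw)
	if err != nil {
		return "", "", err
	}
	// Drop the slash we added before returning the decoded path.
	return u.Path[1:], asQuery, nil
}

func main() {
	p, q, err := decodeBoth("a+b%20c")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("path:  %q\n", p) // the + survives, only %20 becomes a space
	fmt.Printf("query: %q\n", q) // both + and %20 become spaces
}
```

The path keeps the literal +, which is exactly what the Java client ran into.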

III. Digging deeper

I looked up some related material:

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.

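In short: historically + was treated as a space, but under the current RFC 3986:

1 In the URI itself, a space can only be encoded as %20

Go parses the URI with ParseRequestURI, which conforms to the standard.

2 A query string, treated as application/x-www-form-urlencoded (a format defined by the HTML specification rather than by RFC 3986), may encode a space as either + or %20

Go's implementation:

parsePostForm in net/http's request.go calls ParseQuery: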

case ct == "application/x-www-form-urlencoded":
    var reader io.Reader = r.Body
    maxFormSize := int64(1<<63 - 1)
    if _, ok := r.Body.(*maxBytesReader); !ok {
        maxFormSize = int64(10 << 20) // 10 MB is a lot of text.
        reader = io.LimitReader(r.Body, maxFormSize+1)
    }
    b, e := ioutil.ReadAll(reader)
    if e != nil {
        if err == nil {
            err = e
        }
        break
    }
    if int64(len(b)) > maxFormSize {
        err = errors.New("http: POST too large")
        return
    }
    vs, e = url.ParseQuery(string(b))
    if err == nil {
        err = e
    }

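ParseQuery in turn calls QueryUnescape, so by the conclusion of Section II this conforms to the standard.

3 A multipart form upload, sent as multipart/form-data, performs no encoding step at all

The multipart upload entry point ParseMultipartForm first calls ParseForm and then ReadForm: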
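A quick way to check this path is to hand ParseForm a urlencoded body directly. The sketch below is illustrative only: the helper postFormValue and the example.invalid URL are my own assumptions, and the request is built in-process instead of going through a real server.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// postFormValue is a hypothetical helper: it builds a POST request whose body
// is an application/x-www-form-urlencoded string and returns the decoded
// value of key as ParseForm sees it.
func postFormValue(body, key string) (string, error) {
	req, err := http.NewRequest("POST", "http://example.invalid/submit", strings.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	// ParseForm reads the body and decodes it with url.ParseQuery.
	if err := req.ParseForm(); err != nil {
		return "", err
	}
	return req.PostForm.Get(key), nil
}

func main() {
	v, err := postFormValue("msg=a+b%20c", "msg")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", v) // both + and %20 come back as spaces
}
```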

func (r *Request) ParseForm() error {
    var err error
    if r.PostForm == nil {
        if r.Method == "POST" || r.Method == "PUT" || r.Method == "PATCH" {
            r.PostForm, err = parsePostForm(r)
        }
        if r.PostForm == nil {
            r.PostForm = make(url.Values)
        }
    }
    if r.Form == nil {
        if len(r.PostForm) > 0 {
            r.Form = make(url.Values)
            copyValues(r.Form, r.PostForm)
        }
        var newValues url.Values
        if r.URL != nil {
            var e error
            newValues, e = url.ParseQuery(r.URL.RawQuery)
            if err == nil {
                err = e
            }
        }
        if newValues == nil {
            newValues = make(url.Values)
        }
        if r.Form == nil {
            r.Form = newValues
        } else {
            copyValues(r.Form, newValues)
        }
    }
    return err
}

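As you can see, ParseQuery is called here, so the query string is parsed even for multipart/form-data requests,

and the result becomes part of the form (the query string values can then be read either with r.URL.Query().Get or with FormValue).

In ReadForm, by contrast, the field values go through no decoding at all.

IV. Summary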
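The following sketch puts both behaviors side by side for a multipart request. The helper multipartValues, the field name msg, the query parameter q, and the example.invalid URL are all made up for this illustration; the request is built in-process with mime/multipart rather than sent over the network.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// multipartValues is a hypothetical helper: it builds a multipart/form-data
// POST with one field "msg" set to fieldValue and a URL query "q=<rawQ>",
// then returns both values as the server-side form parsing sees them.
func multipartValues(rawQ, fieldValue string) (field, query string, err error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err = w.WriteField("msg", fieldValue); err != nil {
		return "", "", err
	}
	if err = w.Close(); err != nil {
		return "", "", err
	}
	req, err := http.NewRequest("POST", "http://example.invalid/upload?q="+rawQ, &body)
	if err != nil {
		return "", "", err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	// ParseMultipartForm calls ParseForm (query string) and then ReadForm (body).
	if err = req.ParseMultipartForm(1 << 20); err != nil {
		return "", "", err
	}
	return req.FormValue("msg"), req.FormValue("q"), nil
}

func main() {
	field, query, err := multipartValues("a+b%20c", "a+b%20c")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("multipart field: %q\n", field) // left exactly as sent
	fmt.Printf("query string:    %q\n", query) // + and %20 decoded to spaces
}
```

The same byte sequence is decoded in the query string but passed through verbatim in the multipart field.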

Note that for email links, you do need %20 and not + after the ?. For example, mailto:support@example.org?subject=I%20need%20help. If you tried that with +, the email will open with +es instead of spaces.

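So in mailto links, a space encoded as + in the query string is not decoded back to a space either.

In my view, therefore:

In a URL, whether in the path or in the query string, encode a space as %20.

In multipart/form-data, do not encode the values at all.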
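One way to follow the first rule on the Go side is to post-process the output of QueryEscape. This is only a sketch: queryEscape20 is a name I made up, not a function in net/url. The replacement is safe because QueryEscape encodes a literal + as %2B, so any + left in its output can only stand for a space.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// queryEscape20 is a hypothetical helper: it escapes s like url.QueryEscape
// but emits %20 instead of + for spaces.
func queryEscape20(s string) string {
	// A literal '+' in s is already %2B at this point, so every remaining
	// '+' is an encoded space.
	return strings.Replace(url.QueryEscape(s), "+", "%20", -1)
}

func main() {
	fmt.Println(queryEscape20("a b+c")) // a%20b%2Bc
}
```

From Go 1.8 onward, net/url also provides url.PathEscape, which encodes a space as %20 and is meant for path segments.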
